Network Working Group R. Fox
Request for Comments: 1106
June 1989
TCP Big Window and Nak Options
Status of this Memo
This memo discusses two extensions to the TCP protocol to provide a
more efficient operation over a network with a high bandwidth*delay
product. The extensions described in this document have been
implemented and shown to work using resources at NASA. This memo
describes an Experimental Protocol; these extensions are not proposed
as an Internet standard, but as a starting point for further
research. Distribution of this memo is unlimited.
Abstract
Two extensions to the TCP protocol are described in this RFC in order
to provide a more efficient operation over a network with a high
bandwidth*delay product. The main issue that still needs to be
solved is congestion versus noise. This issue is touched on in this
memo, but further research is still needed on the applicability of
the extensions in the Internet as a whole infrastructure and not just
high bandwidth*delay product networks. Even with this outstanding
issue, this document does describe the use of these options in the
isolated satellite network environment to help facilitate more
efficient use of this special medium and to help offload bulk data
transfers from links needed for interactive use.
1. Introduction
Recent work on TCP has shown great performance gains over a variety
of network paths [1]. However, these changes still do not work well
over network paths that have a large round trip delay (satellite with
a 600 ms round trip delay) or a very large bandwidth
(transcontinental DS3 line). These two networks exhibit a higher
bandwidth*delay product, over 10**6 bits, than the 10**5 bits that
TCP is currently limited to. This high bandwidth*delay product
refers to the amount of data that may be unacknowledged so that all
of the network's bandwidth is being utilized by TCP. This may also be
referred to as "filling the pipe" [2] so that the sender of data can
always put data onto the network and the receiver will always have
something to read, and neither end of the connection will be forced
to wait for the other end.
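As a rough illustration of the bandwidth*delay arithmetic above, the following sketch computes the pipe size for a T1 satellite path and for a DS3 path and compares the first against the largest 16 bit window. The T1 and DS3 line rates (1.544 and 44.736 Mbit/s) and the 30 ms DS3 delay quoted later in this memo are used as inputs; the function and variable names are illustrative only.

```python
def bandwidth_delay_bits(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bits that must be unacknowledged (in flight) to keep the path full."""
    return bandwidth_bps * rtt_seconds


# Largest window a 16 bit field can advertise, expressed in bits.
MAX_16BIT_WINDOW_BITS = 65_535 * 8

# T1 satellite path: 1.544 Mbit/s with a 600 ms round trip delay.
t1_pipe = bandwidth_delay_bits(1_544_000, 0.600)

# DS3 path: 44.736 Mbit/s, assuming the 30 ms delay mentioned in section 3.
ds3_pipe = bandwidth_delay_bits(44_736_000, 0.030)

print(f"T1 satellite pipe: {t1_pipe:.0f} bits")
print(f"DS3 pipe:          {ds3_pipe:.0f} bits")
print(f"16 bit window:     {MAX_16BIT_WINDOW_BITS} bits")
print(f"share of the T1 pipe a 16 bit window can fill: "
      f"{MAX_16BIT_WINDOW_BITS / t1_pipe:.2f}")
```

Under these inputs the 16 bit window covers only a little more than half of the T1 satellite pipe, which is the shortfall the big window option is meant to address.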
After the last batch of algorithm improvements to TCP, performance
Fox [Page 1]
RFC 1106 TCP Big Window and Nak Options June 1989
over high bandwidth*delay networks is still very poor. It appears
that no algorithm changes alone will make any significant
improvements over high bandwidth*delay networks, but will require an
extension to the protocol itself. This RFC discusses two possible
options to TCP for this purpose.
The two options implemented and discussed in this RFC are:
1. NAKs
This extension allows the receiver of data to inform the sender
that a packet of data was not received and needs to be resent.
This option proves to be useful over any network path (both high
and low bandwidth*delay type networks) that experiences periodic
errors such as lost packets, noisy links, or dropped packets due
to congestion. The information conveyed by this option is
advisory and, if ignored, does not have any effect on TCP
whatsoever.
2. Big Windows
This option will give a method of expanding the current 16 bit (64
Kbytes) TCP window to 32 bits, of which 30 bits (over 1 gigabyte)
are allowed for the receive window. (This is the maximum window size
allowed in TCP due to the requirement of TCP to detect old data
versus new data. For a good explanation please see [2].) No
changes are required to the standard TCP header [6]. The 16 bit
field in the TCP header that is used to convey the receive window
will remain unchanged. The 32 bit receive window is achieved
through the use of an option that contains the upper half of the
window. It is this option that is necessary to fill large data
pipes such as a satellite link.
This RFC is broken up into the following sections: section 2 will
discuss the operation of the NAK option in greater detail, section 3
will discuss the big window option in greater detail. Section 4 will
discuss other effects of the big windows and nak feature when used
together. Included in this section will be a brief discussion on the
effects of congestion versus noise to TCP and possible options for
satellite networks. Section 5 will be a conclusion with some hints
as to what future development may be done at NASA, and then an
appendix containing some test results is included.
2. NAK Option
Any packet loss in a high bandwidth*delay network will have a
catastrophic effect on throughput because of the simple
acknowledgement of TCP. TCP always acks the stream of data that has
successfully been received and tells the sender the next byte of data
of the stream that is expected. If a packet is lost and succeeding
packets arrive, the current protocol has no way of telling the sender
that it missed one packet but received following packets. TCP
currently resends all of the data over again, after a timeout or when
the sender suspects a lost packet due to a duplicate ack algorithm [1],
until the receiver receives the lost packet and can then ack the lost
packet as well as succeeding packets received. On a normal low
bandwidth*delay network this effect is minimal if the timeout period
is set short enough. However, on a long delay network such as a T1
satellite channel this is catastrophic because by the time the lost
packet can be sent and the ack returned the TCP window would have
been exhausted and both the sender and receiver would be temporarily
stalled waiting for the packet and ack to fully travel the data pipe.
This causes the pipe to become empty and requires the sender to
refill the pipe after the ack is received. This will cause a minimum
of 3*X bandwidth loss, where X is the one way delay of the medium, and
may be much higher depending on the size of the timeout period and
bandwidth*delay product. It is 1X for the packet to be resent, 1X for
the ack to be received and 1X for the next packet being sent to reach
the destination. This calculation assumes that the window size is
much smaller than the pipe size (window = 1/2 data pipe or 1X), which
is the typical case with the current TCP window limitation over long
delay networks such as a T1 satellite link.
An attempt to reduce this wasted bandwidth from 3*X was introduced in
[1] by having the sender resend a packet after it notices that a
number of consecutively received acks acknowledge already
acknowledged data. On a typical network this will reduce the lost
bandwidth to almost nil, since the packet will be resent before the
TCP window is exhausted and with the data pipe being much smaller
than the TCP window, the data pipe will not become empty and no
bandwidth will be lost. On a high delay network the reduction of
lost bandwidth is minimal such that lost bandwidth is still
significant. On a very noisy satellite, for instance, the lost
bandwidth is very high (see appendix for some performance figures)
and performance is very poor.
There are two methods of informing the sender of lost data:
selective acknowledgements and NAKs. Selective acknowledgements have
been the object of research in a number of experimental protocols
including VMTP [3], NETBLT [4], and SatFTP [5]. The idea behind
selective acks is that the receiver tells the sender which pieces it
received so that the sender can resend the data not acked but already
sent once. NAKs, on the other hand, tell the sender that a particular
packet of data needs to be resent.
There are a couple of disadvantages of selective acks. Namely, in
some of the protocols mentioned above, the receiver waits a certain
time before sending the selective ack so that acks may be bundled up.
This delay can cause some wasted bandwidth and requires more complex
state information than the simple nak. Even if the receiver doesn't
bundle up the selective acks but sends them as it notices that
packets have been lost, more complex state information is needed to
determine which packets have been acked and which packets need to be
resent. With naks, only the immediate data needed to move the left
edge of the window is naked, thus almost completely eliminating all
state information.
The selective ack has one advantage over naks. If the link is very
noisy and packets are being lost close together, then the sender will
find out about all of the missing data at once and can send all of
the missing data out immediately in an attempt to move the left
window edge in the acknowledge number of the TCP header, thus keeping
the data pipe flowing. Whereas with naks, the sender will be
notified of lost packets one at a time and this will cause the sender
to process extra packets compared to selective acks. However,
empirical studies have shown that most lost packets occur far enough
apart that the advantage of selective acks over naks is rarely seen.
Also, if naks are sent out as soon as a packet has been determined
lost, then the advantage of selective acks becomes no more than
possibly a more aesthetic algorithm for handling lost data, but
offers no gains over naks as described in this paper. It is for this
reason that the simplicity of naks was chosen over selective acks for
the current implementation.
2.1 Implementation details
When the receiver of data notices a gap between the expected sequence
number and the actual sequence number of the packet received, the
receiver can assume that the data between the two sequence numbers is
either going to arrive late or is lost forever. Since the receiver
cannot distinguish between the two events, a nak should be sent in
the TCP option field. Naking a packet still destined to arrive has
the effect of causing the sender to resend the packet, wasting one
packet's worth of bandwidth. Since this event is fairly rare, the
lost bandwidth is insignificant as compared to that of not sending a
nak when the packet is not going to arrive. The option will take the
form as follows:
+========+=========+=========================+================+
+option= + length= + sequence number of + number of +
+ A + 7 + first byte being naked + segments naked +
+========+=========+=========================+================+
This option contains the first sequence number not received and a
count of how many segments of bytes need to be resent, where a
segment is the size of the current TCP MSS being used for the
connection. Since a nak is an advisory piece of information, the
sending of a nak is unreliable and no means for retransmitting a nak
is provided at this time.
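The 7 byte layout described above (option kind, length, 4 byte sequence number, 1 byte segment count) could be encoded along the following lines. This is only a sketch: the memo names the option symbolically as "A", so the numeric kind used here (NAK_KIND) is a placeholder assumption, and the helper names are hypothetical.

```python
import struct
from typing import Tuple

NAK_KIND = 253  # assumption: placeholder value; the memo only calls it "A"
NAK_LEN = 7     # kind (1) + length (1) + sequence number (4) + count (1)


def segments_missing(expected_seq: int, received_seq: int, mss: int) -> int:
    """Number of MSS-sized segments needed to cover the gap (rounded up)."""
    gap = (received_seq - expected_seq) & 0xFFFFFFFF  # 32 bit sequence space
    return -(-gap // mss)


def pack_nak(first_missing_seq: int, segments: int) -> bytes:
    """Build the option: first byte being naked plus the segment count."""
    if not 0 <= segments <= 0xFF:
        raise ValueError("segment count must fit in one byte")
    return struct.pack("!BBIB", NAK_KIND, NAK_LEN,
                       first_missing_seq & 0xFFFFFFFF, segments)


def unpack_nak(option: bytes) -> Tuple[int, int]:
    """Return (first missing sequence number, segments naked)."""
    kind, length, seq, segments = struct.unpack("!BBIB", option)
    if kind != NAK_KIND or length != NAK_LEN:
        raise ValueError("not a nak option")
    return seq, segments


# Receiver expected byte 1000 but a packet starting at byte 2461 arrived.
count = segments_missing(1000, 2461, mss=1460)
option = pack_nak(1000, count)
print(len(option), unpack_nak(option))
```

The one byte count limits a single nak to 255 segments, which follows directly from the length of 7 given in the option diagram.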
When the sender of data receives the option it may either choose to
do nothing or it will resend the missing data immediately and then
continue sending data where it left off before receiving the nak.
The receiver will keep track of the last nak sent so that it will not
repeat the same nak. If it were to repeat the same nak the protocol
could get into the mode where on every reception of data the receiver
would nak the first missing data frame. Since the data pipe may be
very large, by the time the first nak is read and responded to by the
sender, many naks would have been sent by the receiver. Since the
sender does not know that the naks are repetitious it will resend the
data each time, thus wasting the network bandwidth with useless
retransmissions of the same piece of data. Having an unreliable nak
may result in a nak being damaged and not being received by the
sender, and in this case, we will let the tcp recover by its normal
means. Empirical data has shown that the likelihood of the nak being
lost is quite small and thus, this advisory nak option works quite
well.
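The receiver behaviour described above (nak the first missing byte once, then stay quiet until the gap changes) can be sketched as below. This is a simplified model rather than the 4.3bsd code: it ignores sequence number wraparound and overlapping segments, and the class and method names are hypothetical.

```python
from typing import Dict, Optional


class NakReceiver:
    """Toy receiver that remembers the last nak so it is never repeated."""

    def __init__(self, initial_seq: int) -> None:
        self.expected = initial_seq          # next in-order byte wanted
        self.buffered: Dict[int, int] = {}   # out-of-order data: seq -> length
        self.last_nak: Optional[int] = None  # sequence number last naked

    def on_segment(self, seq: int, length: int) -> Optional[int]:
        """Process one data segment; return a sequence number to nak, or None."""
        if seq == self.expected:
            # In-order data: advance, then absorb anything already buffered.
            self.expected += length
            while self.expected in self.buffered:
                self.expected += self.buffered.pop(self.expected)
            return None
        if seq > self.expected:
            # A gap exists. Hold the data and nak the gap only once.
            self.buffered[seq] = length
            if self.last_nak != self.expected:
                self.last_nak = self.expected
                return self.expected
        return None  # duplicate of old data, or a gap already naked


receiver = NakReceiver(initial_seq=0)
events = [receiver.on_segment(0, 100),    # in order
          receiver.on_segment(200, 100),  # bytes 100..199 missing: nak once
          receiver.on_segment(300, 100),  # same gap: no second nak
          receiver.on_segment(100, 100)]  # retransmission fills the gap
print(events, receiver.expected)
```

Without the `last_nak` check, every segment arriving behind the gap would trigger another nak and, as the text notes, another retransmission of the same data.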
3. Big Window Option
Currently TCP has a 16 bit window limitation built into the protocol.
This limits the amount of outstanding unacknowledged data to 64
Kbytes. We have already seen that some networks have a pipe larger
than 64 Kbytes. A T1 satellite channel and a cross country DS3
network with a 30ms delay have data pipes much larger than 64 Kbytes.
Thus, even on a perfectly conditioned link with no bandwidth wasted
due to errors, the data pipe will not be filled and bandwidth will be
wasted. What is needed is the ability to send more unacknowledged
data. This is achieved by having bigger windows, bigger than the
current limitation of 16 bits. This option expands the window
size to 30 bits or over 1 gigabyte by literally expanding the window
size mechanism currently used by TCP. The added option contains the
upper 15 bits of the window while the lower 16 bits will continue to
go where they normally go [6] in the TCP header.
A TCP session will use the big window options only if both sides
agree to use them, otherwise the option is not used and the normal 16
bit windows will be used. Once the 2 sides agree to use the big
windows then every packet thereafter will be expected to contain the
window option with the current upper 15 bits of the window. The
negotiation to decide whether or not to use the bigger windows takes
place during the SYN and SYN ACK segments of the TCP connection
startup process. The originator of the connection will include in
the SYN segment the following option:
  1 byte    1 byte      4 bytes
+=========+==========+===============+
+option=B + length=6 + 30 bit window +
+=========+==========+===============+
If the other end of the connection wants to use big windows it will
include the same option back in the SYN ACK segment that it must
send. At this point, both sides have agreed to use big windows and
the specified windows will be used. It should be noted that the SYN
and SYN ACK segments will use the small windows, and once the big
window option has been negotiated then the bigger windows will be
used.
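The SYN / SYN ACK exchange above amounts to "big windows are on only if both segments carry option B". A minimal sketch of that rule follows, assuming each segment's options have already been split into individual byte strings. The numeric kind (BIG_WINDOW_KIND) is a placeholder, since the memo only calls the option "B", and the helper names are hypothetical.

```python
import struct
from typing import List

BIG_WINDOW_KIND = 254        # assumption: placeholder for option "B"
BIG_WINDOW_LEN = 6           # kind (1) + length (1) + window (4)
MAX_WINDOW = (1 << 30) - 1   # 30 bits are allowed for the window


def pack_big_window_offer(window: int) -> bytes:
    """Option B as carried in the SYN and echoed in the SYN ACK."""
    if not 0 <= window <= MAX_WINDOW:
        raise ValueError("window must fit in 30 bits")
    return struct.pack("!BBI", BIG_WINDOW_KIND, BIG_WINDOW_LEN, window)


def carries_offer(options: List[bytes]) -> bool:
    """True if any option in the list is a well-formed option B."""
    return any(len(opt) == BIG_WINDOW_LEN
               and opt[0] == BIG_WINDOW_KIND
               and opt[1] == BIG_WINDOW_LEN
               for opt in options)


def big_windows_agreed(syn_options: List[bytes],
                       syn_ack_options: List[bytes]) -> bool:
    """Both sides must include the option, otherwise 16 bit windows are used."""
    return carries_offer(syn_options) and carries_offer(syn_ack_options)


offer = pack_big_window_offer(200_000)
print(big_windows_agreed([offer], [offer]))  # peer echoed the option
print(big_windows_agreed([offer], []))       # peer does not implement it
```

A peer that does not implement the option simply omits it from its SYN ACK, so the connection falls back to the normal 16 bit window.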
Once both sides have agreed to use 32 bit windows the protocol will
function just as it did before with no difference in operation, even
in the event of lost packets. This claim holds true since the
rcv_wnd and snd_wnd variables of tcp contain the 16 bit windows until
the big window option is negotiated and then they are replaced with
the appropriate 32 bit values. Thus, the use of big windows becomes
part of the state information kept by TCP.
Other methods of expanding the windows have been presented, including
a window multiple [2] or streaming [5], but this solution is more
elegant in the sense that it is a true extension of the window that
one day may easily become part of the protocol and not just be an
option to the protocol.
3.1 How does it work
Once a connection has decided to use big windows every succeeding
packet must contain the following option:
+=========+==========+==========================+
+option=C + length=4 + upper 15 bits of rcv_wnd +
+=========+==========+==========================+
With all segments sent, the sender supplies the size of its receive
window. If the connection is only using 16 bits then this option is
not supplied, otherwise the lower 16 bits of the receive window go
into the tcp header where they currently reside [6] and the upper 15
bits of the window are put into the data portion of the option C.
When the receiver processes the packet it must first reform the
window and then process the packet as it would in the absence of the
option.
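Splitting the receive window across the header field and option C, and reforming it on the other side, can be sketched as follows. The function names are hypothetical; the 30 bit limit and the 16/15 bit split are taken from the text above.

```python
from typing import Tuple

MAX_BIG_WINDOW = (1 << 30) - 1  # 30 bits are allowed for the receive window


def split_window(rcv_wnd: int) -> Tuple[int, int]:
    """Return (16 bit TCP header window field, upper bits for option C)."""
    if not 0 <= rcv_wnd <= MAX_BIG_WINDOW:
        raise ValueError("receive window must fit in 30 bits")
    return rcv_wnd & 0xFFFF, rcv_wnd >> 16


def reform_window(header_field: int, option_field: int) -> int:
    """Rebuild the full window before processing the rest of the segment."""
    return ((option_field & 0x7FFF) << 16) | (header_field & 0xFFFF)


low, high = split_window(200_000)
print(low, high, reform_window(low, high))
```

A window below 64 Kbytes yields an upper field of zero, so the header field alone carries the same value it would without the option.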
3.2 Impact of changes
In implementing the first version of the big window option there was
very little change required to the source. State information must be
added to the protocol to determine if the big window option is to be
used and all 16 bit variables that dealt with window information must
now become 32 bit quantities. A future document will describe in
more detail the changes required to the 4.3 bsd tcp source code.
Test results of the window change only are presented in the appendix.
Expanding 16 bit quantities to 32 bit quantities in the TCP
control block in the source (4.3 bsd source) may cause the structure
to become larger than the mbuf used to hold the structure. Care must
be taken to ensure this doesn't occur with your system or
undetermined events may take place.
4. Effects of Big Windows and Naks when used together
With big windows alone, transfer times over a satellite were quite
impressive with the absence of any introduced errors. However, when
an error simulator was used to create random errors during transfers,
performance went down extremely fast. When the nak option was added
to the big window option performance in the face of errors went up
some but not to the level that was expected. This section will
discuss some issues that were overcome to produce the results given
in the appendix.
4.1 Window Size and Nak benefits
Without errors, the window size required to keep the data pipe full
is equal to the round trip delay * throughput desired, or the data
pipe bandwidth (called Z from now on). This and other calculations
assume that processing time of the hosts is negligible. In the event
of an error (without NAKs), the window size needs to become larger
than Z in order to keep the data pipe full while the sender is
waiting for the ack of the resent packet. If the window size is
equal to Z and we assume that the retransmission timer is equal
to Z, then when a packet is lost, the retransmission timer will go
off as the last piece of data in the window is sent. In this case,
the lost piece of data can be resent with no delay. The data pipe
will empty out because it will take 1/2Z worth of data to get the ack
back to the sender, and an additional 1/2Z worth of data to get the
data pipe refilled with new data. This causes the required window to be
2Z, 1Z to keep the data pipe full during normal operations and 1Z to
keep the data pipe full while waiting for a lost packet to be resent
and acked.
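To put numbers on Z and 2Z, the sketch below uses the satellite channel described in the appendix (a 1.544 Mbit link with a 580ms round trip delay) and, like the text, treats host processing time as negligible. The function name is illustrative.

```python
def pipe_size_bytes(throughput_bps: float, rtt_seconds: float) -> float:
    """Z: round trip delay * throughput desired, converted to bytes."""
    return throughput_bps * rtt_seconds / 8


# Appendix test channel: 1.544 Mbit/s, 580 ms round trip delay.
z = pipe_size_bytes(1_544_000, 0.580)
window_no_errors = z        # keeps the pipe full when nothing is lost
window_with_loss = 2 * z    # also covers the resend and its ack

print(f"Z  = {z:.0f} bytes")
print(f"2Z = {window_with_loss:.0f} bytes")
print(f"Z exceeds the 16 bit window limit: {z > 65_535}")
```

On these figures even the error-free requirement Z is already well beyond what a 16 bit window can advertise.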
If the same scenario in the last paragraph is used with the addition
of NAKs, the required window size still needs to be 2Z to avoid
wasting any bandwidth in the event of a dropped packet. This appears
to mean that the nak option does not provide any benefits at all.
Testing showed that the retransmission timer was larger than the data
pipe and in the event of errors became much bigger than the data
pipe, because of the retransmission backoff. Thus, the nak option
bounds the required window to 2Z such that in the event of an error
there is no lost bandwidth, even with the retransmission timer
fluctuations. The results in the appendix show that by using naks,
bandwidth waste associated with the retransmission timer facility is
eliminated.
4.2 Congestion vs Noise
An issue that must be looked at when implementing both the NAKs and
big window scheme together is in the area of congestion versus lost
packets due to the medium, or noise. In the recent algorithm
enhancements [1], slow start was introduced so that whenever a data
transfer is being started on a connection or right after a dropped
packet, the effective send window would be set to a very small size
(typically would equal the MSS being used). This is done so that a
new connection would not cause congestion by immediately overloading
the network, and so that an existing connection would back off the
network if a packet was dropped due to congestion and allow the
network to clear up. If a connection using big windows loses a
packet due to the medium (a packet corrupted by an error) the last
thing that should be done is to close the send window so that the
connection can only send 1 packet and must use the slow start
algorithm to slowly work itself back up to sending full windows worth
of data. This algorithm would quickly limit the usefulness of the
big window and nak options over lossy links.
On the other hand, if a packet was dropped due to congestion and the
sender assumes the packet was dropped because of noise the sender
will continue sending large amounts of data. This action will cause
the congestion to continue, more packets will be dropped, and that
part of the network will collapse. In this instance, the sender
would want to back off from sending at the current window limit.
Using the current slow start mechanism over a satellite builds up the
window too slowly [1]. Possibly a better solution would be for the
window to be opened 2*R*log2(W) instead of R*log2(W) [1] (open window
by 2 packets instead of 1 for each acked packet). This will reduce
the wasted bandwidth by opening the window much quicker while giving
the network a chance to clear up. More experimentation is necessary
to find the optimal rate of opening the window, especially when large
windows are being used.
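The effect of the parenthetical rule above (open the window by 2 packets rather than 1 for each acked packet) can be explored with a toy model. This is not the analysis from [1]: it assumes every packet in the window is acked once per round trip, no losses occur while the window opens, and the window starts at one packet. The function name is hypothetical.

```python
def round_trips_to_open(target_packets: int, opened_per_ack: int) -> int:
    """Round trips needed for the window to grow from 1 to the target size."""
    window = 1
    round_trips = 0
    while window < target_packets:
        # Each of the `window` packets sent this round trip is acked,
        # and each ack opens the window by `opened_per_ack` packets.
        window += opened_per_ack * window
        round_trips += 1
    return round_trips


standard = round_trips_to_open(64, opened_per_ack=1)
faster = round_trips_to_open(64, opened_per_ack=2)
print(f"1 packet per ack:  {standard} round trips to reach 64 packets")
print(f"2 packets per ack: {faster} round trips to reach 64 packets")
```

In this model the window doubles per round trip in the first case and triples in the second, so on a path with a round trip delay of over half a second each round trip saved is significant.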
The current recommendation for TCP is to use the slow start mechanism
in the event of any lost packet. If an application knows that it
will be using a satellite with a high error rate, it doesn't make
sense to force it to use the slow start mechanism for every dropped
packet. Instead, the application should be able to choose what
action should happen in the event of a lost packet. In the BSD
environment, a setsockopt call should be provided so that the
application may inform TCP to handle lost packets in a special way
for this particular connection. If the error rate of a link is
known to be small, then using slow start with the modified rate from
above will cause the amount of bandwidth loss to be very small with
respect to the amount of bandwidth actually utilized. In this case,
the setsockopt call should not be used. What is really needed is a
way for a host to determine if a packet or packets are being dropped
due to congestion or noise. Then, the host can choose to do the
right thing. This will require a mechanism like source quench to be
used. For this to happen more experimentation is necessary to
determine a solid definition on the use of this mechanism. Now it is
believed by some that using source quench to avoid congestion only
adds to the problem, and does not help suppress it.
The TCP used to gather the results in the appendix for the big window
with nak experiment assumed that lost packets were the result of
noise and not congestion. This assumption was used to show how to
make the current TCP work in such an environment. The actual
satellite used in the experiment (when the satellite simulator was
not used) only experienced an error rate around 10e-10. With this
error rate it is suggested that in practice when big windows are used
over the link, TCP should use the slow start mechanism for all lost
packets with the 2*R*log2(W) rate discussed above. Under most
situations when long delay networks are being used (transcontinental
DS3 networks using fiber with very low error rates, or satellite
links with low error rates) big windows and naks should be used with
the assumption that lost packets are the result of congestion until a
better algorithm is devised [7].
Another problem noticed, while testing the effects of slow start over
a satellite link, was that at times the retransmission timer was set so
restrictively that milliseconds before a naked packet's ack was
received the retransmission timer would go off due to a timed packet
within the send window. The timer was set at the round trip delay of
the network, allowing no time for packet processing. If this timer
went off due to congestion then backing off is the right thing to do;
otherwise, to avoid the scenario discovered by experimentation, the
transmit timer should be set a little longer so that the
retransmission timer does not go off too early. Care must be taken
to make sure the right thing is done in the implementation in
question so that a packet isn't retransmitted too soon, and blamed on
congestion when in fact, the ack is on its way.
4.3 Duplicate Acks
Another problem found with the 4.3bsd implementation is in the area
of duplicate acks. When the sender of data receives a certain number
of acks (3 in the current Berkeley release) that acknowledge
previously acked data, it then assumes that a packet has been
lost and will resend the one packet assumed lost, and close its send
window as if the network is congested, and the slow start algorithm
mentioned above will be used to open the send window. This facility is
no longer needed since the sender can use the reception of a nak as
its indicator that a particular packet was dropped. If the nak
packet is lost then the retransmit timer will go off and the packet
will be retransmitted by normal means. If a sender's algorithm
continues to count duplicate acks the sender will find itself
possibly receiving many duplicate acks after it has already resent
the packet due to a nak being received, because of the large size of
the data pipe. By receiving all of these duplicate acks the sender
may find itself doing nothing but resending the same packet of data
unnecessarily while keeping the send window closed for absolutely no
reason. By removing this feature of the implementation a user can
expect to find a satellite connection working much better in the face
of errors, and other connections should not see any performance loss,
but a slight improvement in performance if anything at all.
5. Conclusion
This paper has described two new options that if used will make TCP a
more efficient protocol in the face of errors and a more efficient
protocol over networks that have a high bandwidth*delay product
without decreasing performance over more common networks. If a
system that implements the options talks with one that does not, the
two systems should still be able to communicate with no problems.
This assumes that the system doesn't use the option numbers defined
in this paper in some other way or doesn't panic when faced with an
option that the machine does not implement. Currently at NASA, there
are many machines that do not implement either option and communicate
just fine with the systems that do implement them.
The drive for implementing big windows has been the direct result of
trying to make TCP more efficient over large delay networks [2,3,4,5]
such as a T1 satellite. However, another practical use of large
windows is becoming more apparent as the local area networks being
developed are becoming faster and supporting much larger MTUs.
Hyperchannel, for instance, has been stated to be able to support 1
Mega bit MTUs in their new line of products. With the current
implementation of TCP, hyperchannel is not utilized as efficiently
as it should be because the physical medium's MTU is larger than the
maximum window of the protocol being used. By increasing the TCP
window size, better utilization of networks like hyperchannel will be
gained instantly because the sender can send 64 Kbyte packets (IP
limitation) but not have to operate in a stop and wait fashion.
Future work is being started to increase the IP maximum datagram size
so that even better utilization of fast local area networks will be
seen by having the TCP/IP protocols being able to send large packets
over mediums with very large MTUs. This will hopefully eliminate
the network protocol as the bottleneck in data transfers while
workstations and workstation file system technology advance even
more than they already have.
An area of concern when using the big window mechanism is the use of
machine resources. When running over a satellite and a packet is
dropped such that 2Z (where Z is the data pipe size from section 4.1)
worth of data is unacknowledged, both ends of the connection need to
be able to buffer the data using machine mbufs (or whatever mechanism
the machine uses), usually a valuable and scarce commodity. If the
window size is not chosen properly, some machines will crash when the
memory is all used up, or it will keep other parts of the system from
running. Thus, setting the window to some fairly large arbitrary
number is not a good idea, especially on a general purpose machine
where many users log on at any time. What is currently being
engineered at NASA is the ability for certain programs to use the
setsockopt feature of 4.3bsd to ask to use big windows such that the
average user may not have access to the large windows, thus limiting
the use of big windows to applications that absolutely need them and
protecting a valuable system resource.
6. References
[1] Jacobson, V., "Congestion Avoidance and Control", SIGCOMM 88,
Stanford, Ca., August 1988.
[2] Jacobson, V., and R. Braden, "TCP Extensions for Long-Delay
Paths", LBL, USC/Information Sciences Institute, RFC 1072,
October 1988.
[3] Cheriton, D., "VMTP: Versatile Message Transaction Protocol", RFC
1045, Stanford University, February 1988.
[4] Clark, D., M. Lambert, and L. Zhang, "NETBLT: A Bulk Data
Transfer Protocol", RFC 998, MIT, March 1987.
[5] Fox, R., "Draft of Proposed Solution for High Delay Circuit
Transfer", GE/NAS Internal Document, March 1988.
[6] Postel, J., "Transmission Control Protocol - DARPA Internet
Program Protocol Specification", RFC 793, DARPA, September 1981.
[7] Leiner, B., "Critical Issues in High Bandwidth Networking", RFC
1077, DARPA, November 1988.
7. Appendix
Both options have been implemented and tested. Contained in this
section is some performance data gathered to support the use of these
options. The satellite channel used was a 1.544 Mbit link with a
580ms round trip delay. All values are given as units of bytes.
TCP with Big Windows, No Naks
|---------------transfer rates----------------------|
Window Size | no error | 10e-7 error rate | 10e-6 error rate |
-----------------------------------------------------------------
64K | 94K | 53K | 14K |
-----------------------------------------------------------------
72K | 106K | 51K | 15K |
-----------------------------------------------------------------
80K | 115K | 42K | 14K |
-----------------------------------------------------------------
92K | 115K | 43K | 14K |
-----------------------------------------------------------------
100K | 135K | 66K | 15K |
-----------------------------------------------------------------
112K | 126K | 53K | 17K |
-----------------------------------------------------------------
124K | 154K | 45K | 14K |
-----------------------------------------------------------------
136K | 160K | 66K | 15K |
-----------------------------------------------------------------
156K | 167K | 45K | 14K |
-----------------------------------------------------------------
Figure 1.
TCP with Big Windows, and Naks
|---------------transfer rates----------------------|
Window Size | no error | 10e-7 error rate | 10e-6 error rate |
-----------------------------------------------------------------
64K | 95K | 83K | 43K |
-----------------------------------------------------------------
72K | 104K | 87K | 49K |
-----------------------------------------------------------------
80K | 117K | 96K | 62K |
-----------------------------------------------------------------
92K | 124K | 119K | 39K |
-----------------------------------------------------------------
100K | 140K | 124K | 35K |
-----------------------------------------------------------------
112K | 151K | 126K | 53K |
-----------------------------------------------------------------
124K | 160K | 140K | 36K |
-----------------------------------------------------------------
136K | 167K | 148K | 38K |
-----------------------------------------------------------------
156K | 167K | 160K | 38K |
-----------------------------------------------------------------
Figure 2.
With a 10e-6 error rate, many naks as well as data packets were
dropped, causing the wild swing in transfer times. Also, please note
that the machines used are SGI Iris 2500 Turbos with the 3.6 OS with
the new TCP enhancements. The performance associated with the Irises
is slower than a Sun 3/260, but due to some source code
restrictions the Iris was used. Initial results on the Sun showed slightly
higher performance and less variance.
Author's Address
Richard Fox
950 Linden #208
Sunnyvale, Cal, 94086
EMail: rfox@tandem.