As per Relevance of the word research, we have this rfc below:
Network Working Group M. Allman,
Request for Comments: 2760 NASA Glenn Research Center/BBN
Category: Informational S.
D.
J.
D.
NASA Glenn Research
T.
University of California at
J.
J.
University of Southern California/
H.
S.
Ohio
K.
The MITRE
J.
Pittsburgh Supercomputing
February 2000

Ongoing TCP Research Related to Satellites

Status of this Memo

This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2000). All Rights Reserved.

Abstract

This document outlines possible TCP enhancements that may allow TCP
to better utilize the available bandwidth provided by networks
containing satellite links. The algorithms and mechanisms outlined
have not been judged to be mature enough to be recommended by the
IETF. The goal of this document is to educate researchers as to the
current work and progress being done in TCP research related to
satellite networks.
Allman, et al. Informational [Page 1]
RFC 2760 Ongoing TCP Research Related to Satellites February 2000

Table of Contents

1         Introduction
2         Satellite Architectures
2.1       Asymmetric Satellite Networks
2.2       Satellite Link as Last Hop
2.3       Hybrid Satellite Networks
2.4       Point-to-Point Satellite Networks
2.5       Multiple Satellite Hops
3         Mitigations
3.1       TCP For Transactions
3.2       Slow Start
3.2.1     Larger Initial Window
3.2.2     Byte Counting
3.2.3     Delayed ACKs After Slow Start
3.2.4     Terminating Slow Start
3.3       Loss Recovery
3.3.1     Non-SACK Based Mechanisms
3.3.2     SACK Based Mechanisms
3.3.3     Explicit Congestion Notification
3.3.4     Detecting Corruption Loss
3.4       Congestion Avoidance
3.5       Multiple Data Connections
3.6       Pacing TCP Segments
3.7       TCP Header Compression
3.8       Sharing TCP State Among Similar Connections
3.9       ACK Congestion Control
3.10      ACK Filtering
4         Conclusions
5         Security Considerations
6         Acknowledgments
7         References
8         Authors' Addresses
9         Full Copyright Statement

1 Introduction

This document outlines mechanisms that may help the Transmission
Control Protocol (TCP) [Pos81] better utilize the bandwidth provided
by long-delay satellite environments. These mechanisms may also help
in other environments or for other protocols. The proposals outlined
in this document are currently being studied throughout the research
community. Therefore, these mechanisms are not mature enough to be
recommended for wide-spread use by the IETF. However, some of these
mechanisms may be safely used today. It is hoped that this document
will stimulate further study into the described mechanisms. If, at
some point, the mechanisms discussed in this memo prove to be safe
and appropriate to be recommended for general use, the appropriate
IETF documents will be written.

It should be noted that non-TCP mechanisms that help performance over
satellite links do exist (e.g., application-level changes, queueing
disciplines, etc.). However, outlining these non-TCP mitigations is
beyond the scope of this document and therefore is left as future
work. Additionally, there are a number of mitigations to TCP's
performance problems that involve very active intervention by
gateways along the end-to-end path from the sender to the receiver.
Documenting the pros and cons of such solutions is also left as
future work.

2 Satellite Architectures

Specific characteristics of satellite links and the impact these
characteristics have on TCP are presented in RFC 2488 [AGS99]. This
section discusses several possible topologies where satellite links
may be integrated into the global Internet. Each mitigation outlined
in section 3 includes a discussion of which environments the
mechanism is expected to benefit.

2.1 Asymmetric Satellite Networks

Some satellite networks exhibit a bandwidth asymmetry, a larger data
rate in one direction than the reverse direction, because of limits
on the transmission power and the antenna size at one end of the
link. Meanwhile, some other satellite systems are unidirectional and
use a non-satellite return path (such as a dialup modem link). The
nature of most TCP traffic is asymmetric with data flowing in one
direction and acknowledgments in the opposite direction. However, the
term asymmetric in this document refers to different physical
capacities in the forward and return links. Asymmetry has been shown
to be a problem for TCP [BPK97,BPK98].

2.2 Satellite Link as Last Hop

Satellite links that provide service directly to end users, as
opposed to satellite links located in the middle of a network, may
allow for specialized design of protocols used over the last hop.
Some satellite providers use the satellite link as a shared high
speed downlink to users with a lower speed, non-shared terrestrial
link that is used as a return link for requests and acknowledgments.
Many times this creates an asymmetric network, as discussed above.

2.3 Hybrid Satellite Networks

In the more general case, satellite links may be located at any point
in the network topology. In this case, the satellite link acts as
just another link between two gateways. In this environment, a given
connection may be sent over terrestrial links (including terrestrial
wireless), as well as satellite links. On the other hand, a
connection could also travel over only the terrestrial network or
only over the satellite portion of the network.

2.4 Point-to-Point Satellite Networks

In point-to-point satellite networks, the only hop in the network is
over the satellite link. This pure satellite environment exhibits
only the problems associated with the satellite links, as outlined in
[AGS99]. Since this is a private network, some mitigations that are
not appropriate for shared networks can be considered.

2.5 Multiple Satellite Hops

In some situations, network traffic may traverse multiple satellite
hops between the source and the destination. Such an environment
aggravates the satellite characteristics described in [AGS99].

3 Mitigations

The following sections will discuss various techniques for mitigating
the problems TCP faces in the satellite environment. Each of the
following sections will be organized as follows: First, each
mitigation will be briefly outlined. Next, research work involving
the mechanism in question will be briefly discussed. Next, the
implementation issues of the mechanism will be presented (including
whether or not the particular mechanism presents any dangers to
shared networks). Then a discussion of the mechanism's potential
with regard to the topologies outlined above is given. Finally, the
relationships and possible interactions with other TCP mechanisms are
outlined. The reader is expected to be familiar with the TCP
terminology used in [AGS99].

3.1 TCP For Transactions

3.1.1 Mitigation Description

TCP uses a three-way handshake to set up a connection between two
hosts [Pos81]. This connection setup requires 1-1.5 round-trip times
(RTTs), depending upon whether the data sender started the connection
actively or passively. This startup time can be eliminated by using
TCP extensions for transactions (T/TCP) [Bra94]. After the first
connection between a pair of hosts is established, T/TCP is able to
bypass the three-way handshake, allowing the data sender to begin
transmitting data in the first segment sent (along with the SYN).
This is especially helpful for short request/response traffic, as it
saves a potentially long setup phase when no useful data is being
transmitted.

3.1.2 Research

T/TCP is outlined and analyzed in [Bra92,Bra94].

3.1.3 Implementation Issues

T/TCP requires changes in the TCP stacks of both the data sender and
the data receiver. While T/TCP is safe to implement in shared
networks from a congestion control perspective, several security
implications of sending data in the first data segment have been
identified [ddKI99].

3.1.4 Topology Considerations

It is expected that T/TCP will be equally beneficial in all
environments outlined in section 2.

3.1.5 Possible Interaction and Relationships with Other Research

T/TCP allows data transfer to start more rapidly, much like using a
larger initial congestion window (see section 3.2.1), delayed ACKs
after slow start (section 3.2.3) or byte counting (section 3.2.2).

3.2 Slow Start

The slow start algorithm is used to gradually increase the size of
TCP's congestion window (cwnd) [Jac88,Ste97,APS99]. The algorithm is
an important safe-guard against transmitting an inappropriate amount
of data into the network when the connection starts up. However,
slow start can also waste available network capacity, especially in
long-delay networks [All97a,Hay97]. Slow start is particularly
inefficient for transfers that are short compared to the
delay*bandwidth product of the network (e.g., WWW transfers).

Delayed ACKs are another source of wasted capacity during the slow
start phase. RFC 1122 [Bra89] suggests data receivers refrain from
ACKing every incoming data segment. However, every second full-sized
segment should be ACKed. If a second full-sized segment does not
arrive within a given timeout, an ACK must be generated (this timeout
cannot exceed 500 ms). Since the data sender increases the size of
cwnd based on the number of arriving ACKs, reducing the number of
ACKs slows the cwnd growth rate. In addition, when TCP starts
sending, it sends 1 segment. When using delayed ACKs a second
segment must arrive before an ACK is sent. Therefore, the receiver
is always forced to wait for the delayed ACK timer to expire before
ACKing the first segment, which also increases the transfer time.

Several proposals have suggested ways to make slow start less time
consuming. These proposals are briefly outlined below, with
references to the research work given.
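
The effect of delayed ACKs on cwnd growth described above can be
illustrated with a short Python sketch. It is an idealized model and
not taken from any of the cited work: it assumes no loss, assumes the
sender adds 1 segment to cwnd per arriving ACK, and models the delayed
ACK timer by rounding up when an odd segment is left over. The
function name and parameters are hypothetical.

```python
import math


def rtts_to_reach(target_segments, segments_per_ack, initial_cwnd=1):
    """Count RTTs of slow start needed for cwnd to reach target_segments.

    Idealized model (an assumption of this sketch): no loss, and cwnd
    grows by 1 segment for every ACK that arrives (ACK counting).
    """
    cwnd = initial_cwnd
    rtts = 0
    while cwnd < target_segments:
        # One ACK per `segments_per_ack` segments; a trailing odd segment
        # is still ACKed once the delayed ACK timer expires.
        acks = math.ceil(cwnd / segments_per_ack)
        cwnd += acks
        rtts += 1
    return rtts


every_segment = rtts_to_reach(64, segments_per_ack=1)  # ACK every segment
delayed = rtts_to_reach(64, segments_per_ack=2)        # delayed ACKs
print(every_segment, delayed)
```

Under these assumptions, reaching a 64 segment window takes 6 RTTs
when every segment is ACKed and 10 RTTs with delayed ACKs. The model
ignores the extra delayed ACK timeout incurred by the first segment.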

3.2.1 Larger Initial Window

3.2.1.1 Mitigation Description

One method that will reduce the amount of time required by slow start
(and therefore, the amount of wasted capacity) is to increase the
initial value of cwnd. An experimental TCP extension outlined in
[AFP98] allows the initial size of cwnd to be increased from 1
segment to that given in equation (1).

      min (4*MSS, max (2*MSS, 4380 bytes))               (1)

By increasing the initial value of cwnd, more packets are sent during
the first RTT of data transmission, which will trigger more ACKs,
allowing the congestion window to open more rapidly. In addition, by
sending at least 2 segments initially, the first segment does not
need to wait for the delayed ACK timer to expire as is the case when
the initial size of cwnd is 1 segment (as discussed above).
Therefore, the value of cwnd given in equation (1) saves up to 3 RTTs
and a delayed ACK timeout when compared to an initial cwnd of 1
segment.

Also, we note that RFC 2581 [APS99], a standards-track document,
allows a TCP to use an initial cwnd of up to 2 segments. This change
is highly recommended for satellite networks.
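
As a small worked example, equation (1) can be evaluated for a few
segment sizes. The Python sketch below is illustrative only; the
function name is hypothetical and the MSS values are chosen simply to
exercise each branch of the min/max expression.

```python
def initial_cwnd_bytes(mss):
    """Upper bound on the initial cwnd, in bytes, per equation (1)."""
    return min(4 * mss, max(2 * mss, 4380))


for mss in (536, 1460, 4000):
    limit = initial_cwnd_bytes(mss)
    print(mss, limit, limit // mss)  # MSS, bytes, whole segments
```

For an MSS of 536 bytes the bound is 4 segments, for 1460 bytes it is
4380 bytes (3 segments), and for 4000 bytes it is 2 segments.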

3.2.1.2 Research

Several researchers have studied the use of a larger initial window
in various environments. [Nic97] and [KAGT98] show a reduction in
WWW page transfer time over hybrid fiber coax (HFC) and satellite
links respectively. Furthermore, it has been shown that using an
initial cwnd of 4 segments does not negatively impact overall
performance over dialup modem links with a small number of buffers
[SP98]. [AHO98] shows an improvement in transfer time for 16 KB
files across the Internet and dialup modem links when using a larger
initial value for cwnd. However, a slight increase in dropped
segments was also shown. Finally, [PN98] shows improved transfer
time for WWW traffic in simulations with competing traffic, in
addition to a small increase in the drop rate.

3.2.1.3 Implementation Issues

The use of a larger initial cwnd value requires changes to the
sender's TCP stack. Using an initial congestion window of 2 segments
is allowed by RFC 2581 [APS99]. Using an initial congestion window
of 3 or 4 segments is not expected to present any danger of
congestion collapse [AFP98], but may degrade performance in some
networks.

3.2.1.4 Topology Considerations

It is expected that the use of a large initial window would be
equally beneficial to all network architectures outlined in section
2.

3.2.1.5 Possible Interaction and Relationships with Other Research

Using a fixed larger initial congestion window decreases the impact
of a long RTT on transfer time (especially for short transfers) at
the cost of bursting data into a network with unknown conditions. A
mechanism that mitigates bursts may make the use of a larger initial
congestion window more appropriate (e.g., limiting the size of
line-rate bursts [FF96] or pacing the segments in a burst [VH97a]).

Also, using delayed ACKs only after slow start (as outlined in
section 3.2.3) offers an alternative way to immediately ACK the first
segment of a transfer and open the congestion window more rapidly.
Finally, using some form of TCP state sharing among a number of
connections (as discussed in 3.8) may provide an alternative to using
a fixed larger initial window.

3.2.2 Byte Counting

3.2.2.1 Mitigation Description

As discussed above, the wide-spread use of delayed ACKs increases the
time needed by a TCP sender to increase the size of the congestion
window during slow start. This is especially harmful to flows
traversing long-delay GEO satellite links. One mechanism that has
been suggested to mitigate the problems caused by delayed ACKs is the
use of "byte counting", rather than standard ACK counting
[All97a,All98]. Using standard ACK counting, the congestion window
is increased by 1 segment for each ACK received during slow start.
However, using byte counting the congestion window increase is based
on the number of previously unacknowledged bytes covered by each
incoming ACK, rather than on the number of ACKs received. This makes
the increase relative to the amount of data transmitted, rather than
being dependent on the ACK interval used by the receiver.

Two forms of byte counting are studied in [All98]. The first is
unlimited byte counting (UBC). This mechanism simply uses the number
of previously unacknowledged bytes to increase the congestion window
each time an ACK arrives. The second form is limited byte counting
(LBC). LBC limits the amount of cwnd increase to 2 segments. This
limit throttles the size of the burst of data sent in response to a
"stretch ACK" [Pax97]. Stretch ACKs are acknowledgments that cover
more than 2 segments of previously unacknowledged data. Stretch ACKs
can occur by design [Joh95] (although this is not standard), due to
implementation bugs [All97b,PADHV99] or due to ACK loss. [All98]
shows that LBC prevents large line-rate bursts when compared to UBC,
and therefore offers fewer dropped segments and better performance.
In addition, UBC causes large bursts during slow start based loss
recovery due to the large cumulative ACKs that can arrive during loss
recovery. The behavior of UBC during loss recovery can cause large
decreases in performance and [All98] strongly recommends UBC not be
deployed without further study into mitigating the large bursts.

Note: The standards track RFC 2581 [APS99] allows a TCP to use byte
counting to increase cwnd during congestion avoidance, but not
during slow start.
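
The difference between the three increase rules during slow start can
be sketched in a few lines of Python. This is a simplified
illustration of the rules as described above, not code from [All98];
the function name and the mode labels are hypothetical.

```python
def cwnd_increase(bytes_acked, mss, mode):
    """Slow start cwnd increase, in bytes, for one arriving ACK.

    bytes_acked is the number of previously unacknowledged bytes the
    ACK covers. mode is "ack" (standard ACK counting), "ubc"
    (unlimited byte counting) or "lbc" (limited byte counting).
    """
    if mode == "ack":
        return mss                        # 1 segment per ACK
    if mode == "ubc":
        return bytes_acked                # everything the ACK covers
    if mode == "lbc":
        return min(bytes_acked, 2 * mss)  # capped at 2 segments
    raise ValueError("unknown mode: " + mode)


mss = 1000
for covered in (2 * mss, 6 * mss):  # a delayed ACK, then a stretch ACK
    print(covered, [cwnd_increase(covered, mss, m) for m in ("ack", "ubc", "lbc")])
```

For an ordinary delayed ACK covering 2 segments, UBC and LBC agree and
both grow cwnd twice as fast as ACK counting. For a stretch ACK
covering 6 segments, UBC opens the window by 6 segments at once while
LBC limits the increase to 2 segments.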

3.2.2.2 Research

Using byte counting, as opposed to standard ACK counting, has been
shown to reduce the amount of time needed to increase the value of
cwnd to an appropriate size in satellite networks [All97a]. In
addition, [All98] presents a simulation comparison of byte counting
and the standard cwnd increase algorithm in uncongested networks and
networks with competing traffic. This study found that the limited
form of byte counting outlined above can improve performance, while
also increasing the drop rate slightly.

[BPK97,BPK98] also investigated unlimited byte counting in
conjunction with various ACK filtering algorithms (discussed in
section 3.10) in asymmetric networks.

3.2.2.3 Implementation Issues

Changing from ACK counting to byte counting requires changes to the
data sender's TCP stack. Byte counting violates the algorithm for
increasing the congestion window outlined in RFC 2581 [APS99] (by
making congestion window growth more aggressive during slow start)
and therefore should not be used in shared networks.

3.2.2.4 Topology Considerations

It has been suggested by some (and roundly criticized by others) that
byte counting will allow TCP to provide uniform cwnd increase,
regardless of the ACKing behavior of the receiver. In addition, byte
counting also mitigates the slowed window growth caused by
receivers that generate stretch ACKs because of the capacity of the
return link, as discussed in [BPK97,BPK98]. Therefore, this change
is expected to be especially beneficial to asymmetric networks.

3.2.2.5 Possible Interaction and Relationships with Other Research

Unlimited byte counting should not be used without a method to
mitigate the potentially large line-rate bursts the algorithm can
cause. Also, LBC may send bursts that are too large for the given
network conditions. In this case, LBC may also benefit from an
algorithm that would lessen the impact of line-rate bursts of
segments. Also note that using delayed ACKs only after slow start
(as outlined in section 3.2.3) negates the limited byte counting
algorithm because each ACK covers only one segment during slow start.
Therefore, both ACK counting and byte counting yield the same
increase in the congestion window at this point (in the first RTT).

3.2.3 Delayed ACKs After Slow Start

3.2.3.1 Mitigation Description

As discussed above, TCP senders use the number of incoming ACKs to
increase the congestion window during slow start. Since delayed
ACKs reduce the number of ACKs returned by the receiver by roughly
half, the rate of growth of the congestion window is reduced. One
proposed solution to this problem is to use delayed ACKs only after
the slow start (DAASS) phase. This provides more ACKs while TCP is
aggressively increasing the congestion window and fewer ACKs while
TCP is in steady state, which conserves network resources.

3.2.3.2 Research

[All98] shows that in simulation, using delayed ACKs after slow start
(DAASS) improves transfer time when compared to a receiver that
always generates delayed ACKs. However, DAASS also slightly
increases the loss rate due to the increased rate of cwnd growth.

3.2.3.3 Implementation Issues

The major problem with DAASS is in the implementation. The receiver
has to somehow know when the sender is using the slow start
algorithm. The receiver could implement a heuristic that attempts to
watch the change in the amount of data being received and change the
ACKing behavior accordingly. Or, the sender could send a message (a
flipped bit in the TCP header, perhaps) indicating that it was using
slow start. The implementation of DAASS is, therefore, an open
issue.

Using DAASS does not violate the TCP congestion control specification
[APS99]. However, the standards (RFC 2581 [APS99]) currently
recommend using delayed acknowledgments and DAASS goes (partially)
against this recommendation.

3.2.3.4 Topology Considerations

DAASS should work equally well in all scenarios presented in section
2. However, in asymmetric networks it may aggravate ACK congestion
in the return link, due to the increased number of ACKs (see sections
3.9 and 3.10 for a more detailed discussion of ACK congestion).

3.2.3.5 Possible Interaction and Relationships with Other Research

DAASS has several possible interactions with other proposals made in
the research community. DAASS can aggravate congestion on the path
between the data receiver and the data sender due to the increased
number of returning acknowledgments. This can have an especially
adverse effect on asymmetric networks that are prone to experiencing
ACK congestion. As outlined in sections 3.9 and 3.10, several
mitigations have been proposed to reduce the number of ACKs that are
passed over a low-bandwidth return link. Using DAASS will increase
the number of ACKs sent by the receiver. The interaction between
DAASS and the methods for reducing the number of ACKs is an open
research question. Also, as noted in section 3.2.1.5 above, DAASS
provides some of the same benefits as using a larger initial
congestion window and therefore it may not be desirable to use both
mechanisms together. However, this remains an open question.
Finally, DAASS and limited byte counting are both used to increase
the rate at which the congestion window is opened. The DAASS
algorithm substantially reduces the impact limited byte counting has
on the rate of congestion window increase.

3.2.4 Terminating Slow Start

3.2.4.1 Mitigation Description

The initial slow start phase is used by TCP to determine an
appropriate congestion window size for the given network conditions
[Jac88]. Slow start is terminated when TCP detects congestion, or
when the size of cwnd reaches the size of the receiver's advertised
window. Slow start is also terminated if cwnd grows beyond a certain
size. The threshold at which TCP ends slow start and begins using
the congestion avoidance algorithm is called "ssthresh" [Jac88]. In
most implementations, the initial value for ssthresh is the
receiver's advertised window. During slow start, TCP roughly doubles
the size of cwnd every RTT and therefore can overwhelm the network
with at most twice as many segments as the network can handle. By
setting ssthresh to a value less than the receiver's advertised
window initially, the sender may avoid overwhelming the network with
twice the appropriate number of segments. Hoe [Hoe96] proposes using
the packet-pair algorithm [Kes91] and the measured RTT to determine a
more appropriate value for ssthresh. The algorithm observes the
spacing between the first few returning ACKs to determine the
bandwidth of the bottleneck link. Together with the measured RTT,
the delay*bandwidth product is determined and ssthresh is set to this
value. When TCP's cwnd reaches this reduced ssthresh, slow start is
terminated and transmission continues using congestion avoidance,
which is a more conservative algorithm for increasing the size of the
congestion window.
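
The estimate described above amounts to a short calculation, sketched
below in Python. This is a rough illustration and not the algorithm
of [Hoe96]: it assumes equal-sized segments sent back to back, one ACK
per segment, and ACK spacing that reflects the bottleneck rate. The
use of the median spacing, the function name and its parameters are
all assumptions of this sketch.

```python
from statistics import median


def estimate_ssthresh(ack_times, segment_bytes, rtt):
    """Estimate ssthresh (bytes) as the delay*bandwidth product.

    ack_times: arrival times, in seconds, of the first few ACKs.
    segment_bytes: size of the segment each ACK acknowledges.
    rtt: measured round-trip time in seconds.
    """
    if len(ack_times) < 2:
        raise ValueError("need at least two ACK arrival times")
    # Spacing between consecutive ACKs approximates the time the
    # bottleneck link needs to forward one segment.
    gaps = [later - earlier for earlier, later in zip(ack_times, ack_times[1:])]
    bandwidth = segment_bytes / median(gaps)  # bytes per second
    return int(bandwidth * rtt)


# ACKs for 1000-byte segments arriving 0.25 s apart, with a 0.5 s RTT.
print(estimate_ssthresh([0.0, 0.25, 0.5, 0.75], 1000, 0.5))
```

With delayed ACKs or an asymmetric return path, the ACK spacing no
longer tracks the forward bottleneck and an estimate of this kind
becomes unreliable.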

3.2.4.2 Research

It has been shown that estimating ssthresh can improve performance
and decrease packet loss in simulations [Hoe96]. However, obtaining
an accurate estimate of the available bandwidth in a dynamic network
is very challenging, especially attempting to do so on the sending
side of the TCP connection [AP99]. Therefore, before this mechanism
is widely deployed, bandwidth estimation must be studied in more
detail.

3.2.4.3 Implementation Issues

As outlined in [Hoe96], estimating ssthresh requires changes to the
data sender's TCP stack. As suggested in [AP99], bandwidth estimates
may be more accurate when taken by the TCP receiver, and therefore
both sender and receiver changes would be required. Estimating
ssthresh is safe to implement in production networks from a
congestion control perspective, as it can only make TCP more
conservative than outlined in RFC 2581 [APS99] (assuming the TCP
implementation is using an initial ssthresh of infinity as allowed by
[APS99]).

3.2.4.4 Topology Considerations

It is expected that this mechanism will work equally well in all
symmetric topologies outlined in section 2. However, asymmetric
links pose a special problem, as the rate of the returning ACKs may
not be the bottleneck bandwidth in the forward direction. This can
lead to the sender setting ssthresh too low. Premature termination
of slow start can hurt performance, as congestion avoidance opens
cwnd more conservatively. Receiver-based bandwidth estimators do not
suffer from this problem.

3.2.4.5 Possible Interaction and Relationships with Other Research

Terminating slow start at the right time is useful to avoid multiple
dropped segments. However, using a selective acknowledgment-based
loss recovery scheme (as outlined in section 3.3.2) can drastically
improve TCP's ability to quickly recover from multiple lost segments.
Therefore, it may not be as important to terminate slow start before
a large loss event occurs. [AP99] shows that using delayed
acknowledgments [Bra89] reduces the effectiveness of sender-side
bandwidth estimation. Therefore, using delayed ACKs only after slow
start (as outlined in section 3.2.3) may make bandwidth estimation
more feasible.

3.3 Loss Recovery

3.3.1 Non-SACK Based Mechanisms

3.3.1.1 Mitigation Description

Several similar algorithms have been developed and studied that
improve TCP's ability to recover from multiple lost segments in a
window of data without relying on the (often long) retransmission
timeout. These sender-side algorithms, known as NewReno TCP, do not
depend on the availability of selective acknowledgments (SACKs)
[MMFR96].

These algorithms generally work by updating the fast recovery
algorithm to use information provided by "partial ACKs" to trigger
retransmissions. A partial ACK covers some new data, but not all
data outstanding when a particular loss event starts. For instance,
consider the case when segment N is retransmitted using the fast
retransmit algorithm and segment M is the last segment sent when
segment N is resent. If segment N is the only segment lost, the ACK
elicited by the retransmission of segment N would be for segment M.
If, however, segment N+1 was also lost, the ACK elicited by the
retransmission of segment N will be N+1. This can be taken as an
indication that segment N+1 was lost and used to trigger a
retransmission.
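
The partial ACK test in the example above can be sketched as follows.
This Python fragment is a simplified illustration working in whole
segment numbers, not an implementation of any of the NewReno variants;
the function name, the return values, and the convention that an ACK
names the next segment the receiver expects are assumptions of this
sketch.

```python
def newreno_on_ack(next_expected, recover):
    """Classify an ACK that arrives during fast recovery.

    next_expected: lowest segment number the receiver still needs,
        as indicated by the cumulative ACK.
    recover: the last segment sent when the loss was detected
        (segment M in the example above).
    """
    if next_expected > recover:
        # Everything outstanding at the time of the loss has arrived.
        return ("exit_recovery", None)
    # Partial ACK: new data was covered, but segment `next_expected`
    # is still missing, so retransmit it without waiting for the RTO.
    return ("retransmit", next_expected)


N, M = 10, 20
print(newreno_on_ack(M + 1, recover=M))  # only segment N was lost
print(newreno_on_ack(N + 1, recover=M))  # segment N+1 was lost as well
```

Each partial ACK repairs one further loss, so recovering from k lost
segments in a window takes roughly k round trips with this approach.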

3.3.1.2 Research

Hoe [Hoe95,Hoe96] introduced the idea of using partial ACKs to
trigger retransmissions and showed that doing so could improve
performance. [FF96] shows that in some cases using partial ACKs to
trigger retransmissions reduces the time required to recover from
multiple lost segments. However, [FF96] also shows that in some
cases (many lost segments) relying on the RTO timer can improve
performance over simply using partial ACKs to trigger all
retransmissions. [HK99] shows that using partial ACKs to trigger
retransmissions, in conjunction with SACK, improves performance when
compared to TCP using fast retransmit/fast recovery in a satellite
environment. Finally, [FH99] describes several slightly different
variants of NewReno.

3.3.1.3 Implementation Issues

Implementing these fast recovery enhancements requires changes to the
sender-side TCP stack. These changes can safely be implemented in
production networks and are allowed by RFC 2581 [APS99].

3.3.1.4 Topology Considerations

It is expected that these changes will work well in all environments
outlined in section 2.

3.3.1.5 Possible Interaction and Relationships with Other Research

See section 3.3.2.2.5.

3.3.2 SACK Based Mechanisms

3.3.2.1 Fast Recovery with SACK

3.3.2.1.1 Mitigation Description

Fall and Floyd [FF96] describe a conservative extension to the fast
recovery algorithm that takes into account information provided by
selective acknowledgments (SACKs) [MMFR96] sent by the receiver. The
algorithm starts after fast retransmit triggers the resending of a
segment. As with fast retransmit, the algorithm cuts cwnd in half
when a loss is detected. The algorithm keeps a variable called
"pipe", which is an estimate of the number of outstanding segments in
the network. The pipe variable is decremented by 1 segment for each
duplicate ACK that arrives with new SACK information. The pipe
variable is incremented by 1 for each new or retransmitted segment
sent. A segment may be sent when the value of pipe is less than cwnd
(this segment is either a retransmission per the SACK information or
a new segment if the SACK information indicates that no more
retransmits are needed).

This algorithm generally allows TCP to recover from multiple segment
losses in a window of data within one RTT of loss detection. Like
the forward acknowledgment (FACK) algorithm described below, the SACK
information allows the pipe algorithm to decouple the choice of when
to send a segment from the choice of what segment to send.

[APS99] allows the use of this algorithm, as it is consistent with
the spirit of the fast recovery algorithm.
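
The bookkeeping for the pipe variable described above is small enough
to sketch in Python. The sketch covers only the three rules stated in
the text (decrement on a duplicate ACK with new SACK information,
increment on each transmission, send while pipe is less than cwnd) and
counts in whole segments. The class name, the initial value of pipe,
and the floor of 1 segment on cwnd are assumptions of this sketch and
are not taken from [FF96].

```python
class PipeRecovery:
    """Bookkeeping for the "pipe" variable during SACK-based recovery."""

    def __init__(self, cwnd, outstanding):
        # cwnd is cut in half when the loss is detected (the floor of
        # 1 segment is an assumption of this sketch).
        self.cwnd = max(cwnd // 2, 1)
        # Assumption: pipe starts at the number of segments that were
        # outstanding when recovery began.
        self.pipe = outstanding

    def on_dup_ack_with_new_sack(self):
        self.pipe -= 1  # one more segment has left the network

    def can_send(self):
        return self.pipe < self.cwnd

    def on_segment_sent(self):
        self.pipe += 1  # applies to new and retransmitted segments


rec = PipeRecovery(cwnd=10, outstanding=10)
sent = 0
for _ in range(8):  # eight duplicate ACKs carrying new SACK information
    rec.on_dup_ack_with_new_sack()
    if rec.can_send():
        rec.on_segment_sent()
        sent += 1
print(rec.cwnd, rec.pipe, sent)
```

In this run the first five duplicate ACKs only drain pipe down to the
halved cwnd; each later ACK then releases exactly one segment. Which
segment is sent (a retransmission or new data) is decided separately
from the SACK information.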

3.3.2.1.2 Research

[FF96] shows that the above described SACK algorithm performs better
than several non-SACK based recovery algorithms when 1-4 segments
are lost from a window of data. [AHKO97] shows that the algorithm
improves performance over satellite links. Hayes [Hay97] shows that
in certain circumstances, the SACK algorithm can hurt performance by
generating a large line-rate burst of data at the end of loss
recovery, which causes further loss.

3.3.2.1.3 Implementation Issues

This algorithm is implemented in the sender's TCP stack. However, it
relies on SACK information generated by the receiver. This algorithm
is safe for shared networks and is allowed by RFC 2581 [APS99].

3.3.2.1.4 Topology Considerations

It is expected that the pipe algorithm will work equally well in all
scenarios presented in section 2.

3.3.2.1.5 Possible Interaction and Relationships with Other Research

See section 3.3.2.2.5.

3.3.2.2 Forward Acknowledgment

3.3.2.2.1 Mitigation Description

The Forward Acknowledgment (FACK) algorithm [MM96a,MM96b] was
developed to improve TCP congestion control during loss recovery.
FACK uses TCP SACK options to glean additional information about the
congestion state, adding more precise control to the injection of
data into the network during recovery. FACK decouples the congestion
control algorithms from the data recovery algorithms to provide a
simple and direct way to use SACK to improve congestion control. Due
to the separation of these two algorithms, new data may be sent
during recovery to sustain TCP's self-clock when there is no further
data to retransmit.

The most recent version of FACK is Rate-Halving [MM96b], in which one
packet is sent for every two ACKs received during recovery.
Transmitting a segment for every-other ACK has the result of reducing
the congestion window in one round trip to half of the number of
packets that were successfully handled by the network (so when cwnd
is too large by more than a factor of two it still gets reduced to
half of what the network can sustain). Another important aspect of
FACK with Rate-Halving is that it sustains the ACK self-clock during
recovery because transmitting a packet for every-other ACK does not
require half a cwnd of data to drain from the network before
transmitting, as required by the fast recovery algorithm
[Ste97,APS99].

In addition, the FACK with Rate-Halving implementation provides
Thresholded Retransmission to each lost segment. "Tcprexmtthresh" is
the number of duplicate ACKs required by TCP to trigger a fast
retransmit and enter recovery. FACK applies thresholded
retransmission to all segments by waiting until tcprexmtthresh SACK
blocks indicate that a given segment is missing before resending the
segment. This allows reasonable behavior on links that reorder
segments. As described above, FACK sends a segment for every second
ACK received during recovery. New segments are transmitted except
when tcprexmtthresh SACK blocks have been observed for a dropped
segment, at which point the dropped segment is retransmitted.

[APS99] allows the use of this algorithm, as it is consistent with
the spirit of the fast recovery algorithm.
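
The two rules described above, sending on every second ACK and
retransmitting a segment only once tcprexmtthresh SACK blocks report
it missing, can be combined in a short Python sketch. This is a
loose illustration of the description in this section, not the
implementation from [MM96b]. The function name, the data structures,
and the threshold value of 3 are assumptions of this sketch.

```python
TCPREXMTTHRESH = 3  # assumed value; the usual duplicate ACK threshold


def fack_on_ack(ack_count, sack_evidence, retransmitted):
    """Decide what to send for the ack_count-th ACK (1-based) of recovery.

    sack_evidence maps a missing segment number to the number of SACK
    blocks that have indicated it is missing. retransmitted is a set of
    segments already resent; it is updated in place.
    Returns None, ("retransmit", segment) or ("new", None).
    """
    if ack_count % 2 != 0:
        return None  # Rate-Halving: transmit only on every second ACK
    for seg in sorted(sack_evidence):
        if sack_evidence[seg] >= TCPREXMTTHRESH and seg not in retransmitted:
            retransmitted.add(seg)
            return ("retransmit", seg)
    # No segment has crossed the threshold, so send new data to keep
    # the ACK self-clock running.
    return ("new", None)


evidence = {5: 3, 7: 1}  # segment 5 reported missing 3 times, segment 7 once
done = set()
print([fack_on_ack(i, evidence, done) for i in range(1, 5)])
```

Segment 7 is not retransmitted here because only one SACK block has
reported it missing, which is the behavior that tolerates reordering.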

3.3.2.2.2 Research

The original FACK algorithm is outlined in [MM96a]. The algorithm
was later enhanced to include Rate-Halving [MM96b]. The real-world
performance of FACK with Rate-Halving was shown to be much closer to
the theoretical maximum for TCP than either TCP Reno or the SACK
based extensions to fast recovery outlined in section 3.3.2.1
[MSMO97].
3.3.2.2.3 Implementation
In order to use FACK, the sender's TCP stack must be modified.
addition, the receiver must be able to generate SACK options
obtain the full benefit of using FACK. The FACK algorithm is
for shared networks and is allowed by RFC 2581 [APS99].
3.3.2.2.4 Topology
FACK is expected to improve performance in all environments
in section 2. Since it is better able to sustain its self-clock
TCP Reno, it may be considerably more attractive over long
paths
3.3.2.2.5 Possible Interaction and Relationships with Other Research
Both SACK-based loss recovery algorithms described above (the fast
recovery enhancement and the FACK algorithm) are similar in that they
attempt to effectively repair multiple lost segments from a window of
data. Which of the SACK-based loss recovery algorithms to use is
still an open research question. In addition, these algorithms are
similar to the non-SACK NewReno algorithm described in section 3.3.1,
in that they attempt to recover from multiple lost segments without
reverting to using the retransmission timer. As has been shown, the
above SACK-based algorithms are more robust than the NewReno
algorithm. However, the SACK algorithm requires a cooperating TCP
receiver, which the NewReno algorithm does not. A reasonable TCP
implementation might include both a SACK-based and a NewReno-based
loss recovery algorithm such that the sender can use the most
appropriate loss recovery algorithm based on whether or not the
receiver supports SACKs. Finally, both SACK-based and non-SACK-based
versions of fast recovery have been shown to transmit a large burst
of data upon leaving loss recovery, in some cases [Hay97].
Therefore, the algorithms may benefit from some burst suppression
algorithm.
3.3.3 Explicit Congestion Notification
3.3.3.1 Mitigation Description
Explicit congestion notification (ECN) allows routers to inform TCP
senders about imminent congestion without dropping segments. Two
major forms of ECN have been studied. A router employing backward
ECN (BECN) transmits messages directly to the data originator,
informing it of congestion. IP routers can accomplish this with an
ICMP Source Quench message. The arrival of a BECN signal may or may
not mean that a TCP data segment has been dropped, but it is a clear
indication that the TCP sender should reduce its sending rate (i.e.,
the value of cwnd). The second major form of congestion notification
is forward ECN (FECN). FECN routers mark data segments with a
special tag when congestion is imminent, but forward the data
segment. The data receiver then echoes the congestion information
back to the sender in the ACK packet. A description of a FECN
mechanism for TCP/IP is given in [RF99].
As described in [RF99], senders transmit segments with an
"ECN-Capable Transport" bit set in the IP header of each packet. If a
router employing an active queueing strategy, such as Random Early
Detection (RED) [FJ93,BCC+98], would otherwise drop this segment, the
"Congestion Experienced" bit in the IP header is set instead. Upon
reception, the information is echoed back to TCP senders using a bit
in the TCP header. The TCP sender adjusts the congestion window just
as it would if a segment was dropped.
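The mark-and-echo sequence above can be sketched in a few lines of
Python. This is an illustration only: the Packet class, the function
names, and the choice to model the sender's reaction as halving cwnd
(with a floor of one segment) are assumptions for the example, not
details taken from [RF99].

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ect: bool          # "ECN-Capable Transport" bit set by the sender
    ce: bool = False   # "Congestion Experienced" bit set by a router


def router_forward(pkt, congested):
    """Return the packet to forward, or None if it is dropped."""
    if not congested:
        return pkt
    if pkt.ect:
        pkt.ce = True  # mark instead of dropping
        return pkt
    return None        # non-ECN traffic is dropped as before


def receiver_echo(pkt):
    """Receiver copies the congestion mark into the ACK's echo bit."""
    return pkt.ce


def sender_on_ack(cwnd, echo):
    """React to an echoed mark as to a drop (halving is assumed here)."""
    return max(cwnd // 2, 1) if echo else cwnd
```

The point of the sketch is that the ECN-capable segment is delivered
while the sender still receives the congestion signal.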
The implementation of ECN as specified in [RF99] requires the
deployment of active queue management mechanisms in the affected
routers. This allows the routers to signal congestion by sending TCP
a small number of "congestion signals" (segment drops or ECN
messages), rather than discarding a large number of segments, as can
happen when TCP overwhelms a drop-tail router queue.
Since satellite networks generally have higher bit-error rates than
terrestrial networks, determining whether a segment was lost due to
congestion or corruption may allow TCP to achieve better performance
in high BER environments than currently possible (due to TCP's
assumption that all loss is due to congestion). While not a solution
to this problem, adding an ECN mechanism to TCP may be a part of a
mechanism that will help achieve this goal. See section 3.3.4 for a
more detailed discussion of differentiating between corruption and
congestion based losses.
3.3.3.2 Research
[Flo94] shows that ECN is effective in reducing the segment loss rate,
which yields better performance especially for short and interactive
TCP connections. Furthermore, [Flo94] also shows that ECN avoids
some unnecessary and costly TCP retransmission timeouts. Finally,
[Flo94] also considers some of the advantages and disadvantages of
various forms of explicit congestion notification.
3.3.3.3 Implementation Issues
Deployment of ECN requires changes to the TCP implementation on both
sender and receiver. Additionally, deployment of ECN requires
deployment of some active queue management infrastructure in routers.
RED is assumed in most ECN discussions, because RED is already
identifying segments to drop, even before its buffer space is
exhausted. ECN simply allows the delivery of "marked" segments while
still notifying the end nodes that congestion is occurring along the
path. ECN is safe (from a congestion control perspective) for shared
networks, as it maintains the same TCP congestion control principles
as are used when congestion is detected via segment drops.
3.3.3.4 Topology Considerations
It is expected that none of the environments outlined in section 2
will present a bias towards or against ECN traffic.
3.3.3.5 Possible Interaction and Relationships with Other Research
Note that some form of active queueing is necessary to use ECN (e.g.,
RED queueing).
3.3.4 Detecting Corruption Loss
Differentiating between congestion (loss of segments due to router
buffer overflow or imminent buffer overflow) and corruption (loss of
segments due to damaged bits) is a difficult problem for TCP. This
differentiation is particularly important because the action that TCP
should take in the two cases is entirely different. In the case of
corruption, TCP should merely retransmit the damaged segment as soon
as its loss is detected; there is no need for TCP to adjust its
congestion window. On the other hand, as has been widely discussed
above, when the TCP sender detects congestion, it should immediately
reduce its congestion window to avoid making the congestion worse.
TCP's defined behavior, as motivated by [Jac88,Jac90] and defined in
[Bra89,Ste97,APS99], is to assume that all loss is due to congestion
and to trigger the congestion control algorithms, as defined in
[Ste97,APS99]. The loss may be detected using the fast retransmit
algorithm, or in the worst case is detected by the expiration of
TCP's retransmission timer.
TCP's assumption that loss is due to congestion rather than
corruption is a conservative mechanism that prevents congestion
collapse [Jac88,FF98]. Over satellite networks, however, as in many
wireless environments, loss due to corruption is more common than on
terrestrial networks. One common partial solution to this problem is
to add Forward Error Correction (FEC) to the data that's sent over
the satellite/wireless link. A more complete discussion of the
benefits of FEC can be found in [AGS99]. However, given that FEC
does not always work or cannot be universally applied, other
mechanisms have been studied to attempt to make TCP able to
differentiate between congestion-based and corruption-based loss.
TCP segments that have been corrupted are most often dropped by
intervening routers when link-level checksum mechanisms detect that
an incoming frame has errors. Occasionally, a TCP segment containing
an error may survive without detection until it arrives at the TCP
receiving host, at which point it will almost always either fail the
IP header checksum or the TCP checksum and be discarded as in the
link-level error case. Unfortunately, in either of these cases, it's
not generally safe for the node detecting the corruption to return
information about the corrupt packet to the TCP sender because the
sending address itself might have been corrupted.
3.3.4.1 Mitigation Description
Because the probability of link errors on a satellite link is
relatively greater than on a hardwired link, it is particularly
important that the TCP sender retransmit these lost segments without
reducing its congestion window. Because corrupt segments do not
indicate congestion, there is no need for the TCP sender to enter a
congestion avoidance phase, which may waste available bandwidth.
Simulations performed in [SF98] show a performance improvement when
TCP can properly differentiate between corruption and
congestion of wireless links.
Perhaps the greatest research challenge in detecting corruption is
getting TCP (a transport-layer protocol) to receive appropriate
information from either the network layer (IP) or the link layer.
Much of the work done to date has involved link-layer mechanisms that
retransmit damaged segments. The challenge seems to be to get these
mechanisms to make repairs in such a way that TCP understands what
happened and can respond appropriately.
3.3.4.2 Research
Research into corruption detection to date has focused primarily on
making the link level detect errors and then perform link-level
retransmissions. This work is summarized in [BKVP97,BPSK96]. One of
the problems with this promising technique is that it causes an
effective reordering of the segments from the TCP receiver's point of
view. As a simple example, if segments A B C D are sent across a
noisy link and segment B is corrupted, segments C and D may have
already crossed the link before B can be retransmitted at the link
level, causing them to arrive at the TCP receiver in the order A C D
B. This segment reordering would cause the TCP receiver to generate
duplicate ACKs upon the arrival of segments C and D. If the
reordering was bad enough, the duplicate ACKs would trigger the fast
retransmit algorithm in the TCP sender. Research presented in [MV98]
proposes the idea of suppressing
or delaying the duplicate ACKs in the reverse direction to counteract
this behavior. Alternatively, proposals that make TCP more robust in
the face of re-ordered segment arrivals [Flo99] may reduce the side
effects of the re-ordering caused by link-layer retransmissions.
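The A C D B example can be traced with a small Python sketch of a
cumulative-ACK receiver. The function names and the use of letters as
sequence numbers are assumptions made for the illustration; delayed
ACKs and SACK are deliberately left out.

```python
def cumulative_acks(arrival_order, first="A"):
    """Return the ACK sent after each arrival.

    Each ACK names the next in-order segment the receiver expects,
    with segments labelled by consecutive letters.
    """
    received = set()
    expected = first
    acks = []
    for seg in arrival_order:
        received.add(seg)
        # Advance past every segment that is now contiguous.
        while expected in received:
            expected = chr(ord(expected) + 1)
        acks.append(expected)
    return acks


def duplicate_ack_count(acks):
    """Count ACKs that repeat the immediately preceding ACK."""
    return sum(1 for prev, cur in zip(acks, acks[1:]) if prev == cur)
```

For the arrival order A C D B the receiver asks for B three times in a
row, so the sender sees two duplicate ACKs. A longer run of segments
overtaking B would produce enough duplicates to trigger fast
retransmit.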
A more high-level approach, outlined in [DMT96], uses a new
"corruption experienced" ICMP error message generated by routers that
detect corruption. These messages are sent in the forward direction,
toward the packet's destination, rather than in the reverse direction
as is done with ICMP Source Quench messages. Sending the error
messages in the forward direction allows this feedback to work over
asymmetric paths. As noted above, generating an error message in
response to a damaged packet is problematic because the source and
destination addresses may not be valid. The mechanism outlined in
[DMT96] gets around this problem by having the routers maintain a
small cache of recent packet destinations; when the router
experiences an error rate above some threshold, it sends an ICMP
corruption-experienced message to all of the destinations in its
cache. Each TCP receiver then must return this information to its
respective TCP sender (through a TCP option). Upon receiving an ACK
with this "corruption-experienced" option, the TCP sender assumes
that packet loss is due to corruption rather than congestion for two
round trip times (RTT) or until it receives additional link state
information (such as "link down", source quench, or additional
"corruption experienced" messages). Note that in shared networks,
ignoring segment loss for 2 RTTs may aggravate congestion by making
TCP unresponsive.
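The sender-side behavior described above might look roughly like the
following Python sketch. It is an assumption-laden illustration, not
the mechanism of [DMT96]: the Sender class, its method names, the
halving of cwnd on congestion loss, and the choice to let a source
quench end the interval immediately are all invented for the example.

```python
class Sender:
    def __init__(self, cwnd, rtt):
        self.cwnd = cwnd              # congestion window, in segments
        self.rtt = rtt                # round-trip time estimate, seconds
        self.corruption_until = 0.0   # loss before this time is blamed
                                      # on corruption, not congestion

    def on_corruption_option(self, now):
        """An ACK carrying the corruption-experienced option arrived."""
        self.corruption_until = now + 2 * self.rtt

    def on_source_quench(self):
        """A congestion signal ends the interval early (assumed)."""
        self.corruption_until = 0.0

    def on_loss(self, now):
        """Retransmit; shrink cwnd only if loss is treated as congestion."""
        if now >= self.corruption_until:
            self.cwnd = max(self.cwnd // 2, 1)
        return self.cwnd
```

During the two-RTT interval the sender does not react to loss at all,
which is exactly the unresponsiveness that makes the scheme a concern
on shared networks.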
3.3.4.3 Implementation Issues
All of the techniques discussed above require changes to at least the
TCP sending and receiving stacks, as well as intermediate routers.
Due to the concerns over possibly ignoring congestion signals (i.e.,
segment drops), the above algorithm is not recommended for use in
shared networks.
3.3.4.4 Topology Considerations
It is expected that corruption detection would, in general, be
beneficial in all environments outlined in section 2. It would be
particularly beneficial in the satellite/wireless environment over
which these errors may be more prevalent.
3.3.4.5 Possible Interaction and Relationships with Other Research
SACK-based loss recovery algorithms (as described in 3.3.2) may
reduce the impact of corrupted segments on mostly clean links because
recovery will be able to happen more rapidly (and without relying on
the retransmission timer). Note that while SACK-based loss recovery
helps, throughput will still suffer in the face of non-congestion
related packet loss.
3.4 Congestion Avoidance
3.4.1 Mitigation Description
During congestion avoidance, in the absence of loss, the TCP sender
adds approximately one segment to its congestion window during each
RTT [Jac88,Ste97,APS99]. Several researchers have observed that this
policy leads to unfair sharing of bandwidth when multiple connections
with different RTTs traverse the same bottleneck link, with the long
RTT connections obtaining only a small fraction of their fair share
of the bandwidth.
One effective solution to this problem is to deploy fair queueing and
TCP-friendly buffer management in network routers [Sut98]. However,
in the absence of help from the network, other researchers have
investigated changes to the congestion avoidance policy at the TCP
sender, as described in [Flo91,HK98].
3.4.2 Research
The "Constant-Rate" increase policy has been studied in [Flo91,HK98].
It attempts to equalize the rate at which TCP senders increase their
sending rate during congestion avoidance. Both [Flo91] and [HK98]
illustrate cases in which the "Constant-Rate" policy largely corrects
the bias against long RTT connections, although [HK98] presents some
evidence that such a policy may be difficult to incrementally deploy
in an operational network. The proper selection of a constant (for
the constant rate of increase) is an open issue.
The "Increase-by-K" policy can be selectively used by long RTT
connections in a heterogeneous environment. This policy simply
changes the slope of the linear increase, with connections over a
given RTT threshold adding "K" segments to the congestion window
every RTT, instead of one. [HK98] presents evidence that this
policy, when used with small values of "K", may be successful in
reducing the unfairness while keeping the link utilization high, when
a small number of connections share a bottleneck link. The selection
of the constant "K," the RTT threshold to invoke this policy, and
performance under a large number of flows are all open issues.
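The per-RTT window updates for the two policies can be compared with
the Python sketch below. The values K = 2, a 0.2 second RTT threshold,
and c = 100 are hypothetical, chosen only to make the example concrete
(the text above notes that selecting them is an open issue). The
"Constant-Rate" update of c * RTT^2 segments per RTT is an assumed
formulation: it is the increment that makes the sending rate
(cwnd / RTT) grow by c segments per second each second, independent of
the RTT.

```python
def standard_increase(cwnd):
    """Standard congestion avoidance: about one segment per RTT."""
    return cwnd + 1


def increase_by_k(cwnd, rtt, k=2, rtt_threshold=0.2):
    """Add k segments per RTT when the RTT exceeds the threshold."""
    return cwnd + (k if rtt > rtt_threshold else 1)


def constant_rate_increase(cwnd, rtt, c=100.0):
    """Add c * rtt**2 segments per RTT (assumed form of "Constant-Rate").

    cwnd / rtt, the sending rate, then grows by c segments per second
    each second regardless of the RTT.
    """
    return cwnd + c * rtt * rtt
```

With the standard policy the same calculation gives a rate growth of
1 / RTT^2, which is why a connection with a 0.5 second RTT ramps up far
more slowly than one with a 0.05 second RTT.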
3.4.3 Implementation Issues
Implementation of either the "Constant-Rate" or "Increase-by-K"
policies requires a change to the congestion avoidance mechanism at
the TCP sender. In the case of "Constant-Rate," such a change must
be implemented globally. Additionally, the TCP sender must have a
reasonably accurate estimate of the RTT of the connection. The
algorithms outlined above violate the congestion avoidance algorithm
as outlined in RFC 2581 [APS99] and therefore should not be
implemented in shared networks at this time.
3.4.4 Topology Considerations
These solutions are applicable to all satellite networks that are
integrated with a terrestrial network, in which satellite connections
may be competing with terrestrial connections for the same bottleneck
link.
3.4.5 Possible Interaction and Relationships with Other Research
As shown in [PADHV99], increasing the congestion window by multiple
segments per RTT can cause TCP to drop multiple segments and force a
retransmission timeout in some versions of TCP. Therefore, the above
changes to the congestion avoidance algorithm may need to be
accompanied by a SACK-based loss recovery algorithm that can quickly
repair multiple dropped segments.
3.5 Multiple Data Connections
3.5.1 Mitigation Description
One method that has been used to overcome TCP's inefficiencies in the
satellite environment is to use multiple TCP flows to transfer a
given file. The use of N TCP connections makes the sender N times
more aggressive and therefore can improve throughput in some
situations. Using N TCP connections can impact the transfer
and the network in a number of ways, which are listed below.
1. The transfer is able to start transmission using an effective
congestion window of N segments, rather than a single segment as
one TCP flow uses. This allows the transfer to more quickly
increase the effective cwnd size to an appropriate size for the
given network. However, in some circumstances an initial window
of N segments is inappropriate for the network conditions. In
this case, a transfer utilizing more than one connection may
aggravate congestion.
2. During the congestion avoidance phase, the transfer increases the
effective cwnd by N segments per RTT, rather than the one segment
per RTT increase that a single TCP connection provides. Again,
this can aid the transfer by more rapidly increasing the effective
cwnd to an appropriate point. However, this rate of increase can
also be too aggressive for the network conditions. In this case,
the use of multiple data connections can aggravate congestion in
the network.
3. Using multiple connections can provide a very large overall
congestion window. This can be an advantage for TCP
implementations that do not support the TCP window scaling
extension [JBB92]. However, the aggregate cwnd size across all N
connections is equivalent to using a TCP implementation that
supports large windows.
4. The overall cwnd decrease in the face of dropped segments is
reduced when using N parallel connections. A single TCP
connection reduces the effective size of cwnd to half when a
single segment loss is detected. When utilizing N connections
each using a window of W bytes, a single drop reduces the window
to:
(N * W) - (W / 2)
Clearly this is a less dramatic reduction in the effective cwnd size
than when using a single TCP connection. And, the amount by which
the cwnd is decreased is further reduced by increasing N.
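The arithmetic in item 4 can be checked with a short Python sketch.
The function names and the 64000-byte window are illustrative
assumptions; the formula is the one given above.

```python
def window_after_drop(n, w):
    """Aggregate window after one drop across n connections of w bytes."""
    return n * w - w / 2


def fractional_reduction(n, w):
    """Fraction of the aggregate window given up because of one drop."""
    return (n * w - window_after_drop(n, w)) / (n * w)
```

The fraction works out to 1 / (2N): one half for a single connection,
but only one eighth of the aggregate window for four connections,
which is what makes the parallel transfer so much less responsive to
a congestion signal.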
The use of multiple data connections can increase the ability of
non-SACK TCP implementations to quickly recover from multiple dropped
segments without resorting to a timeout, assuming the dropped
segments cross connections.
The use of multiple parallel connections makes TCP overly aggressive
for many environments and can contribute to congestive collapse in
shared networks [FF99]. The advantages provided by using multiple
TCP connections are now largely provided by TCP extensions (larger
windows, SACKs, etc.). Therefore, the use of a single TCP connection
is more "network friendly" than using multiple parallel connections.
However, using multiple parallel TCP connections may provide
performance improvement in private networks.
3.5.2 Research
Research on the use of multiple parallel TCP connections shows
improved performance [IL92,Hah94,AOK95,AKO96]. In addition, research
has shown that multiple TCP connections can outperform a single
modern TCP connection (with large windows and SACK) [AHKO97].
However, these studies did not consider the impact of using multiple
TCP connections on competing traffic. [FF99] argues that using
multiple simultaneous connections to transfer a given file may lead
to congestive collapse in shared networks.
3.5.3 Implementation Issues
To utilize multiple parallel TCP connections, a client application and
the corresponding server must be customized. As outlined in [FF99],
using multiple parallel TCP connections is not safe (from a
congestion control perspective) in shared networks and should not be
used.
3.5.4 Topological Considerations
As stated above, [FF99] outlines that the use of multiple parallel
connections in a shared network, such as the Internet, may lead to
congestive collapse. However, the use of multiple connections may be
safe and beneficial in private networks. The specific topology being
used will dictate the number of parallel connections required. Some
work has been done to determine the appropriate number of connections
on the fly [AKO96], but such a mechanism is far from complete.
3.5.5 Possible Interaction and Relationships with Other Research
Using multiple concurrent TCP connections enables use of a large
congestion window, much like the TCP window scaling option [JBB92].
In addition, a larger initial congestion window is achieved, similar
to using [AFP98] or TCB sharing (see section 3.8).
3.6 Pacing TCP Segments
3.6.1 Mitigation Description
Slow-start takes several round trips to fully open the TCP congestion
window over routes with high bandwidth-delay products. For short TCP
connections (such as WWW traffic with HTTP/1.0), the slow-start
overhead can preclude effective use of the high-bandwidth satellite
links. When senders implement slow-start restart after a TCP
connection goes idle (suggested by Jacobson and Karels [JK92]),
performance is reduced in long-lived (but bursty) connections (such
as HTTP/1.1, which uses persistent TCP connections to transfer
multiple WWW page elements) [Hei97a].
Rate-based pacing (RBP) is a technique, used in the absence of
incoming ACKs, where the data sender temporarily paces TCP segments
at a given rate to restart the ACK clock. Upon receipt of the first
ACK, pacing is discontinued and normal TCP ACK clocking resumes. The
pacing rate may either be known from recent traffic estimates (when
restarting an idle connection or from recent prior connections), or
may be known through external means (perhaps in a point-to-point or
point-to-multipoint satellite network where available bandwidth can
be assumed to be large).
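As a rough illustration of the idea, the Python sketch below computes
when paced segments would be sent and which of them go out before the
first ACK returns. The function names, the fixed pacing rate, and the
idea of precomputing a schedule are assumptions for the example; they
are not drawn from the RBP prototypes.

```python
def pacing_schedule(num_segments, rate, start=0.0):
    """Send times (seconds) for segments paced at `rate` segments/second."""
    interval = 1.0 / rate
    return [start + i * interval for i in range(num_segments)]


def segments_to_pace(schedule, first_ack_time):
    """Keep only the sends due before the first ACK ends pacing."""
    return [t for t in schedule if t < first_ack_time]
```

Segments scheduled after the first ACK arrives are not sent by the
pacing timer; from that point normal ACK clocking decides when they
leave.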
In addition, pacing data during the first RTT of a transfer may allow
TCP to make effective use of high bandwidth-delay links even for
short transfers. However, in order to pace segments during the first
RTT a TCP will have to be using a non-standard initial congestion
window and a new mechanism to pace outgoing segments rather than send
them back-to-back. Determining an appropriate size for the initial
cwnd is an open research question. Pacing can also be used to reduce
bursts in general (due to buggy TCPs or byte counting, see section
3.2.2 for a discussion on byte counting).
3.6.2 Research
Simulation studies of rate-based pacing for WWW-like traffic have
shown reductions in router congestion and drop rates [VH97a]. In
this environment, RBP substantially improves performance compared to
slow-start-after-idle for intermittent senders, and it slightly
improves performance over burst-full-cwnd-after-idle (because of
drops) [VH98]. More recently, pacing has been suggested to eliminate
burstiness in networks with ACK filtering [BPK97].
3.6.3 Implementation Issues
RBP requires only sender-side changes to TCP. Prototype
implementations of RBP are available [VH97b]. RBP requires an
additional sender timer for pacing. The overhead of timer-driven
data transfer is often considered too high for practical use.
Preliminary experiments suggest that in RBP this overhead is minimal
because RBP only requires this timer