As per Relevance of the word addition, we have this RFC below:
Network Working Group K.
Request for Comments: 3168 TeraOptic
Updates: 2474, 2401, 793 S.
Obsoletes: 2481
Category: Standards Track D.
September 2001
The Addition of Explicit Congestion Notification (ECN) to IP
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2001). All Rights Reserved.
Abstract
This memo specifies the incorporation of ECN (Explicit Congestion
Notification) to TCP and IP, including ECN's use of two bits in the
IP header.
Table of Contents
1. Introduction.................................................. 3
2. Conventions and Acronyms...................................... 5
3. Assumptions and General Principles............................ 5
4. Active Queue Management (AQM)................................. 6
5. Explicit Congestion Notification in IP........................ 6
5.1. ECN as an Indication of Persistent Congestion............... 10
5.2. Dropped or Corrupted Packets................................ 11
5.3. Fragmentation............................................... 11
6. Support from the Transport Protocol........................... 12
6.1. TCP......................................................... 13
6.1.1 TCP Initialization......................................... 14
6.1.1.1. Middlebox Issues........................................ 16
6.1.1.2. Robust TCP Initialization with an Echoed Reserved Field. 17
6.1.2. The TCP Sender............................................ 18
6.1.3. The TCP Receiver.......................................... 19
6.1.4. Congestion on the ACK-path................................ 20
6.1.5. Retransmitted TCP packets................................. 20
Ramakrishnan, et al. Standards Track [Page 1]
RFC 3168 The Addition of ECN to IP September 2001
6.1.6. TCP Window Probes......................................... 22
7. Non-compliance by the End Nodes............................... 22
8. Non-compliance in the Network................................. 24
8.1. Complications Introduced by Split Paths..................... 25
9. Encapsulated Packets.......................................... 25
9.1. IP packets encapsulated in IP............................... 25
9.1.1. The Limited-functionality and Full-functionality Options.. 27
9.1.2. Changes to the ECN Field within an IP Tunnel.............. 28
9.2. IPsec Tunnels............................................... 29
9.2.1. Negotiation between Tunnel Endpoints...................... 31
9.2.1.1. ECN Tunnel Security Association Database Field.......... 32
9.2.1.2. ECN Tunnel Security Association Attribute............... 32
9.2.1.3. Changes to IPsec Tunnel Header Processing............... 33
9.2.2. Changes to the ECN Field within an IPsec Tunnel........... 35
9.2.3. Comments for IPsec Support................................ 35
9.3. IP packets encapsulated in non-IP Packet Headers............ 36
10. Issues Raised by Monitoring and Policing Devices............. 36
11. Evaluations of ECN........................................... 37
11.1. Related Work Evaluating ECN................................ 37
11.2. A Discussion of the ECN nonce.............................. 37
11.2.1. The Incremental Deployment of ECT(1) in Routers.......... 38
12. Summary of changes required in IP and TCP.................... 38
13. Conclusions.................................................. 40
14. Acknowledgements............................................. 41
15. References................................................... 41
16. Security Considerations...................................... 45
17. IPv4 Header Checksum Recalculation........................... 45
18. Possible Changes to the ECN Field in the Network............. 45
18.1. Possible Changes to the IP Header.......................... 46
18.1.1. Erasing the Congestion Indication........................ 46
18.1.2. Falsely Reporting Congestion............................. 47
18.1.3. Disabling ECN-Capability................................. 47
18.1.4. Falsely Indicating ECN-Capability........................ 47
18.2. Information carried in the Transport Header................ 48
18.3. Split Paths................................................ 49
19. Implications of Subverting End-to-End Congestion Control..... 50
19.1. Implications for the Network and for Competing Flows....... 50
19.2. Implications for the Subverted Flow........................ 53
19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion
Control.................................................... 54
20. The Motivation for the ECT Codepoints........................ 54
20.1. The Motivation for an ECT Codepoint........................ 54
20.2. The Motivation for two ECT Codepoints...................... 55
21. Why use Two Bits in the IP Header?........................... 57
22. Historical Definitions for the IPv4 TOS Octet................ 58
23. IANA Considerations.......................................... 60
23.1. IPv4 TOS Byte and IPv6 Traffic Class Octet................. 60
23.2. TCP Header Flags........................................... 61
23.3. IPSEC Security Association Attributes....................... 62
24. Authors' Addresses........................................... 62
25. Full Copyright Statement..................................... 63
1. Introduction
We begin by describing TCP's use of packet drops as an indication of
congestion. Next we explain that with the addition of active queue
management (e.g., RED) to the Internet infrastructure, where routers
detect congestion before the queue overflows, routers are no longer
limited to packet drops as an indication of congestion. Routers can
instead set the Congestion Experienced (CE) codepoint in the IP
header of packets from ECN-capable transports. We describe when the
CE codepoint is to be set in routers, and describe modifications
needed to TCP to make it ECN-capable. Modifications to other
transport protocols (e.g., unreliable unicast or multicast, reliable
multicast, other reliable unicast transport protocols) could be
considered as those protocols are developed and advance through the
standards process. We also describe in this document the issues
involving the use of ECN within IP tunnels, and within IPsec tunnels
in particular.
One of the guiding principles for this document is that, to the
extent possible, the mechanisms specified here be incrementally
deployable. One challenge to the principle of incremental deployment
has been the prior existence of some IP tunnels that were not
compatible with the use of ECN. As ECN becomes deployed, non-
compatible IP tunnels will have to be upgraded to conform to this
document.
This document obsoletes RFC 2481, "A Proposal to add Explicit
Congestion Notification (ECN) to IP", which defined ECN as an
Experimental Protocol for the Internet Community. This document also
updates RFC 2474, "Definition of the Differentiated Services Field
(DS Field) in the IPv4 and IPv6 Headers", in defining the ECN field
in the IP header, RFC 2401, "Security Architecture for the Internet
Protocol" to change the handling of IPv4 TOS Byte and IPv6 Traffic
Class Octet in tunnel mode header construction to be compatible with
the use of ECN, and RFC 793, "Transmission Control Protocol", in
defining two new flags in the TCP header.
TCP's congestion control and avoidance algorithms are based on the
notion that the network is a black-box [Jacobson88, Jacobson90]. The
network's state of congestion or otherwise is determined by end-
systems probing for the network state, by gradually increasing the
load on the network (by increasing the window of packets that are
outstanding in the network) until the network becomes congested and a
packet is lost. Treating the network as a "black-box" and treating
loss as an indication of congestion in the network is appropriate for
pure best-effort data carried by TCP, with little or no sensitivity
to delay or loss of individual packets. In addition, TCP's
congestion management algorithms have techniques built-in (such as
Fast Retransmit and Fast Recovery) to minimize the impact of losses,
from a throughput perspective. However, these mechanisms are not
intended to help applications that are in fact sensitive to the delay
or loss of one or more individual packets. Interactive traffic such
as telnet, web-browsing, and transfer of audio and video data can be
sensitive to packet losses (especially when using an unreliable data
delivery transport such as UDP) or to the increased latency of the
packet caused by the need to retransmit the packet after a loss (with
the reliable data delivery semantics provided by TCP).
Since TCP determines the appropriate congestion window to use by
gradually increasing the window size until it experiences a dropped
packet, this causes the queues at the bottleneck router to build up.
With most packet drop policies at the router that are not sensitive
to the load placed by each individual flow (e.g., tail-drop on queue
overflow), this means that some of the packets of latency-sensitive
flows may be dropped. In addition, such drop policies lead to
synchronization of loss across multiple flows.
Active queue management mechanisms detect congestion before the queue
overflows, and provide an indication of this congestion to the end
nodes. Thus, active queue management can reduce unnecessary queuing
delay for all traffic sharing that queue. The advantages of active
queue management are discussed in RFC 2309 [RFC2309]. Active queue
management avoids some of the bad properties of dropping on queue
overflow, including the undesirable synchronization of loss across
multiple flows. More importantly, active queue management means that
transport protocols with mechanisms for congestion control (e.g.,
TCP) do not have to rely on buffer overflow as the only indication of
congestion.
Active queue management mechanisms may use one of several methods for
indicating congestion to end-nodes. One is to use packet drops, as is
currently done. However, active queue management allows the router to
separate policies of queuing or dropping packets from the policies
for indicating congestion. Thus, active queue management allows
routers to use the Congestion Experienced (CE) codepoint in a packet
header as an indication of congestion, instead of relying solely on
packet drops. This has the potential of reducing the impact of loss
on latency-sensitive flows.
There exist some middleboxes (firewalls, load balancers, or intrusion
detection systems) in the Internet that either drop a TCP SYN packet
configured to negotiate ECN, or respond with a RST. This document
specifies procedures that TCP implementations may use to provide
robust connectivity even in the presence of such equipment.
2. Conventions and Acronyms
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [RFC2119].
3. Assumptions and General Principles
In this section, we describe some of the important design principles
and assumptions that guided the design choices in this proposal.
* Because ECN is likely to be adopted gradually, accommodating
migration is essential. Some routers may still only drop packets
to indicate congestion, and some end-systems may not be ECN-
capable. The most viable strategy is one that accommodates
incremental deployment without having to resort to "islands" of
ECN-capable and non-ECN-capable environments.
* New mechanisms for congestion control and avoidance need to co-
exist and cooperate with existing mechanisms for congestion
control. In particular, new mechanisms have to co-exist with
TCP's current methods of adapting to congestion and with
routers' current practice of dropping packets in periods of
congestion.
* Congestion may persist over different time-scales. The time
scales that we are concerned with are congestion events that may
last longer than a round-trip time.
* The number of packets in an individual flow (e.g., TCP
connection or an exchange using UDP) may range from a small
number of packets to quite a large number. We are interested in
managing the congestion caused by flows that send enough packets
so that they are still active when network feedback reaches
them.
* Asymmetric routing is likely to be a normal occurrence in the
Internet. The path (sequence of links and routers) followed by
data packets may be different from the path followed by the
acknowledgment packets in the reverse direction.
* Many routers process the "regular" headers in IP packets more
efficiently than they process the header information in IP
options. This suggests keeping congestion experienced
information in the regular headers of an IP packet.
* It must be recognized that not all end-systems will cooperate in
mechanisms for congestion control. However, new mechanisms
shouldn't make it easier for TCP applications to disable TCP
congestion control. The benefit of lying about participating in
new mechanisms such as ECN-capability should be small.
4. Active Queue Management (AQM)
Random Early Detection (RED) is one mechanism for Active Queue
Management (AQM) that has been proposed to detect incipient
congestion [FJ93], and is currently being deployed in the Internet
[RFC2309]. AQM is meant to be a general mechanism using one of
several alternatives for congestion indication, but in the absence of
ECN, AQM is restricted to using packet drops as a mechanism for
congestion indication. AQM drops packets based on the average queue
length exceeding a threshold, rather than only when the queue
overflows. However, because AQM may drop packets before the queue
actually overflows, AQM is not always forced by memory limitations to
discard the packet.
AQM can set a Congestion Experienced (CE) codepoint in the packet
header instead of dropping the packet, when such a field is provided
in the IP header and understood by the transport protocol. The use
of the CE codepoint with ECN allows the receiver(s) to receive the
packet, avoiding the potential for excessive delays due to
retransmissions after packet losses. We use the term 'CE packet' to
denote a packet that has the CE codepoint set.
5. Explicit Congestion Notification in IP
This document specifies that the Internet provide a congestion
indication for incipient congestion (as in RED and earlier work
[RJ90]) where the notification can sometimes be through marking
packets rather than dropping them. This uses an ECN field in the IP
header with two bits, making four ECN codepoints, '00' to '11'. The
ECN-Capable Transport (ECT) codepoints '10' and '01' are set by the
data sender to indicate that the end-points of the transport protocol
are ECN-capable; we call them ECT(0) and ECT(1) respectively. The
phrase "the ECT codepoint" in this document refers to either of the
two ECT codepoints. Routers treat the ECT(0) and ECT(1) codepoints
as equivalent. Senders are free to use either the ECT(0) or the
ECT(1) codepoint to indicate ECT, on a packet-by-packet basis.
The use of both the two codepoints for ECT, ECT(0) and ECT(1), is
motivated primarily by the desire to allow mechanisms for the data
sender to verify that network elements are not erasing the CE
codepoint, and that data receivers are properly reporting to the
sender the receipt of packets with the CE codepoint set, as required
by the transport protocol. Guidelines for the senders and receivers
to differentiate between the ECT(0) and ECT(1) codepoints will be
addressed in separate documents, for each transport protocol. In
particular, this document does not address mechanisms for TCP end-
nodes to differentiate between the ECT(0) and ECT(1) codepoints.
Protocols and senders that only require a single ECT codepoint SHOULD
use ECT(0).
The not-ECT codepoint '00' indicates a packet that is not using ECN.
The CE codepoint '11' is set by a router to indicate congestion to
the end nodes. Routers that have a packet arriving at a full queue
drop the packet, just as they do in the absence of ECN.
   +-----+-----+
   | ECN FIELD |
   +-----+-----+
     ECT   CE         [Obsolete] RFC 2481 names for the ECN bits.
      0     0         Not-ECT
      0     1         ECT(1)
      1     0         ECT(0)
      1     1         CE
   Figure 1: The ECN Field in IP.
The use of two ECT codepoints essentially gives a one-bit ECN nonce
in packet headers, and routers necessarily "erase" the nonce when
they set the CE codepoint [SCWA99]. For example, routers that erased
the CE codepoint would face additional difficulty in reconstructing
the original nonce, and thus repeated erasure of the CE codepoint
would be more likely to be detected by the end-nodes. The ECN nonce
also can address the problem of misbehaving transport receivers lying
to the transport sender about whether or not the CE codepoint was set
in a packet. The motivations for the use of two ECT codepoints are
discussed in more detail in Section 20, along with some discussion of
alternate possibilities for the fourth ECT codepoint (that is, the
codepoint '01'). Backwards compatibility with earlier ECN
implementations that do not understand the ECT(1) codepoint is
discussed in Section 11.
In RFC 2481 [RFC2481], the ECN field was divided into the ECN-Capable
Transport (ECT) bit and the CE bit. The ECN field with only the
ECN-Capable Transport (ECT) bit set in RFC 2481 corresponds to the
ECT(0) codepoint in this document, and the ECN field with both the
ECT and CE bit in RFC 2481 corresponds to the CE codepoint in this
document. The '01' codepoint was left undefined in RFC 2481, and
this is the reason for recommending the use of ECT(0) when only a
single ECT codepoint is needed.
      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |          DS FIELD, DSCP           | ECN FIELD |
   +-----+-----+-----+-----+-----+-----+-----+-----+
     DSCP: differentiated services codepoint
     ECN:  Explicit Congestion Notification
   Figure 2: The Differentiated Services and ECN Fields in IP.
Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field.
The IPv4 TOS octet corresponds to the Traffic Class octet in IPv6,
and the ECN field is defined identically in both cases. The
definitions for the IPv4 TOS octet [RFC791] and the IPv6 Traffic
Class octet have been superseded by the six-bit DS (Differentiated
Services) Field [RFC2474, RFC2780]. Bits 6 and 7 are listed in
[RFC2474] as Currently Unused, and are specified in RFC 2780 as
approved for experimental use for ECN. Section 22 gives a brief
history of the TOS octet.
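As an illustration only, the layout in Figures 1 and 2 can be sketched in Python. Because the figures number bits from the most significant end, bits 6 and 7 are the two least-significant bits of the octet. The names `ECN`, `split_tos`, and `with_ecn` are hypothetical helpers introduced for this sketch and do not appear in the specification.

```python
from enum import IntEnum
from typing import Tuple


class ECN(IntEnum):
    """The four ECN codepoints from Figure 1 (names are this sketch's own)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def split_tos(tos: int) -> Tuple[int, ECN]:
    """Split an IPv4 TOS / IPv6 Traffic Class octet into (DSCP, ECN)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS / Traffic Class must fit in one octet")
    dscp = tos >> 2          # bits 0-5: six-bit DS field
    ecn = ECN(tos & 0b11)    # bits 6-7: two-bit ECN field
    return dscp, ecn


def with_ecn(tos: int, ecn: ECN) -> int:
    """Return the octet with its ECN field replaced and the DSCP untouched."""
    return (tos & 0xFC) | int(ecn)
```

For example, `split_tos(0xBA)` yields DSCP 46 with `ECN.ECT_0`, and `with_ecn(0xBA, ECN.CE)` yields `0xBB`.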
Because of the unstable history of the TOS octet, the use of the ECN
field as specified in this document cannot be guaranteed to be
backwards compatible with those past uses of these two bits that
pre-date ECN. The potential dangers of this lack of backwards
compatibility are discussed in Section 22.
Upon the receipt by an ECN-Capable transport of a single CE packet,
the congestion control algorithms followed at the end-systems MUST be
essentially the same as the congestion control response to a *single*
dropped packet. For example, for ECN-Capable TCP the source TCP is
required to halve its congestion window for any window of data
containing either a packet drop or an ECN indication.
One reason for requiring that the congestion-control response to the
CE packet be essentially the same as the response to a dropped packet
is to accommodate the incremental deployment of ECN in both end-
systems and in routers. Some routers may drop ECN-Capable packets
(e.g., using the same AQM policies for congestion detection) while
other routers set the CE codepoint, for equivalent levels of
congestion. Similarly, a router might drop a non-ECN-Capable packet
but set the CE codepoint in an ECN-Capable packet, for equivalent
levels of congestion. If there were different congestion control
responses to a CE codepoint than to a packet drop, this could result
in unfair treatment for different flows.
An additional goal is that the end-systems should react to congestion
at most once per window of data (i.e., at most once per round-trip
time), to avoid reacting multiple times to multiple indications of
congestion within a round-trip time.
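A minimal sketch of the "halve the window, but at most once per window of data" behavior follows. It is an illustration under stated assumptions, not the specified TCP sender: the class `WindowedSender`, its packet-counted `cwnd`, and the `recover` marker are hypothetical names, and sequence numbers count whole packets for simplicity.

```python
class WindowedSender:
    """Toy sender that reacts to congestion at most once per window."""

    def __init__(self, cwnd: int) -> None:
        self.cwnd = cwnd      # congestion window, in packets (assumption)
        self.snd_nxt = 0      # next packet number to be sent
        self.recover = 0      # value of snd_nxt at the last reduction

    def on_send(self, packets: int) -> None:
        self.snd_nxt += packets

    def on_ack(self, ack: int, congestion: bool) -> bool:
        """Process a cumulative ACK; return True if cwnd was reduced.

        `congestion` stands for either a detected drop or an ECN
        indication, which are treated identically.
        """
        if congestion and ack > self.recover:
            # First indication for this window of data: halve the window
            # and remember how much data was outstanding at that moment.
            self.cwnd = max(self.cwnd // 2, 1)
            self.recover = self.snd_nxt
            return True
        # Later indications that refer to the same window are ignored.
        return False
```

With `cwnd = 10` and ten packets outstanding, two congestion indications on ACKs 1 and 2 reduce the window only once, to 5; a further indication is acted on only after data sent beyond the remembered point has been acknowledged.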
For a router, the CE codepoint of an ECN-Capable packet SHOULD only
be set if the router would otherwise have dropped the packet as an
indication of congestion to the end nodes. When the router's buffer
is not yet full and the router is prepared to drop a packet to inform
end nodes of incipient congestion, the router should first check to
see if the ECT codepoint is set in that packet's IP header. If so,
then instead of dropping the packet, the router MAY instead set the
CE codepoint in the IP header.
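The router-side choice described above can be sketched as a small decision function. This is an illustration only: `aqm_action` and its boolean inputs are hypothetical, the decision that the AQM "wants to signal congestion" is assumed to come from elsewhere, and a router that always prefers marking is assumed even though marking is only a MAY.

```python
from typing import Optional, Tuple

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def aqm_action(ecn: int, queue_full: bool,
               signal_congestion: bool) -> Tuple[str, Optional[int]]:
    """Return (action, outgoing ECN codepoint) for one arriving packet.

    action is "forward", "mark", or "drop"; the codepoint is None on drop.
    """
    if queue_full:
        # No buffer space: drop, exactly as in the absence of ECN.
        return "drop", None
    if not signal_congestion:
        # No incipient congestion: forward with the ECN field unchanged.
        return "forward", ecn
    if ecn in (ECT_0, ECT_1):
        # ECN-capable packet that would otherwise have been dropped.
        return "mark", CE
    if ecn == CE:
        # Already marked upstream: the codepoint is left unchanged.
        return "forward", CE
    # Not-ECT packet: a drop is the only available congestion signal.
    return "drop", None
```

For instance, `aqm_action(ECT_0, False, True)` gives `("mark", CE)`, while the same call with `NOT_ECT` gives `("drop", None)`.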
An environment where all end nodes were ECN-Capable could allow new
criteria to be developed for setting the CE codepoint, and new
congestion control mechanisms for end-node reaction to CE packets.
However, this is a research issue, and as such is not addressed in
this document.
When a CE packet (i.e., a packet that has the CE codepoint set) is
received by a router, the CE codepoint is left unchanged, and the
packet is transmitted as usual. When severe congestion has occurred
and the router's queue is full, then the router has no choice but to
drop some packet when a new packet arrives. We anticipate that such
packet losses will become relatively infrequent when a majority of
end-systems become ECN-Capable and participate in TCP or other
compatible congestion control mechanisms. In an ECN-Capable
environment that is adequately-provisioned, packet losses should
occur primarily during transients or in the presence of non-
cooperating sources.
The above discussion of when CE may be set instead of dropping a
packet applies by default to all Differentiated Services Per-Hop
Behaviors (PHBs) [RFC 2475]. Specifications for PHBs MAY provide
more specifics on how a compliant implementation is to choose between
setting CE and dropping a packet, but this is NOT REQUIRED. A router
MUST NOT set CE instead of dropping a packet when the drop that would
occur is caused by reasons other than congestion or the desire to
indicate incipient congestion to end nodes (e.g., a diffserv edge
node may be configured to unconditionally drop certain classes of
traffic to prevent them from entering its diffserv domain).
We expect that routers will set the CE codepoint in response to
incipient congestion as indicated by the average queue size, using
the RED algorithms suggested in [FJ93, RFC2309]. To the best of our
knowledge, this is the only proposal currently under discussion in
the IETF for routers to drop packets proactively, before the buffer
overflows. However, this document does not attempt to specify a
particular mechanism for active queue management, leaving that
endeavor, if needed, to other areas of the IETF. While ECN is
inextricably tied up with the need to have a reasonable active queue
management mechanism at the router, the reverse does not hold; active
queue management mechanisms have been developed and deployed
independent of ECN, using packet drops as indications of congestion
in the absence of ECN in the IP architecture.
5.1. ECN as an Indication of Persistent Congestion
We emphasize that a *single* packet with the CE codepoint set in an
IP packet causes the transport layer to respond, in terms of
congestion control, as it would to a packet drop. The instantaneous
queue size is likely to see considerable variations even when the
router does not experience persistent congestion. As such, it is
important that transient congestion at a router, reflected by the
instantaneous queue size reaching a threshold much smaller than the
capacity of the queue, not trigger a reaction at the transport layer.
Therefore, the CE codepoint should not be set by a router based on
the instantaneous queue size.
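To illustrate why an averaged queue size filters out transients, the sketch below uses an exponentially weighted moving average of the kind used by RED. It is deliberately simplified: a single threshold replaces RED's minimum and maximum thresholds and probabilistic marking, and the class name, default weight, and default threshold are assumptions of this example rather than values taken from the specification.

```python
class AverageQueue:
    """Tracks an exponentially weighted moving average of queue length."""

    def __init__(self, weight: float = 0.002, threshold: float = 5.0) -> None:
        self.weight = weight        # EWMA gain (assumed value)
        self.threshold = threshold  # single marking threshold (simplification)
        self.avg = 0.0

    def update(self, instantaneous_len: int) -> float:
        """Fold one sample of the instantaneous queue length into the average."""
        self.avg += self.weight * (instantaneous_len - self.avg)
        return self.avg

    def signals_congestion(self) -> bool:
        """True only once the *average*, not a single sample, is too high."""
        return self.avg > self.threshold
```

With `weight=0.1` and `threshold=5.0`, one sample of 20 packets raises the average only to 2.0, so no congestion is signalled; ten consecutive samples of 20 push the average above the threshold.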
For example, since the ATM and Frame Relay mechanisms for congestion
indication have typically been defined without an associated notion
of average queue size as the basis for determining that an
intermediate node is congested, we believe that they provide a very
noisy signal. The TCP-sender reaction specified in this document for
ECN is NOT the appropriate reaction for such a noisy signal of
congestion notification. However, if the routers that interface to
the ATM network have a way of maintaining the average queue at the
interface, and use it to come to a reliable determination that the
ATM subnet is congested, they may use the ECN notification that is
defined here.
We continue to encourage experiments in techniques at layer 2 (e.g.,
in ATM switches or Frame Relay switches) to take advantage of ECN.
For example, using a scheme such as RED (where packet marking is
based on the average queue length exceeding a threshold), layer 2
devices could provide a reasonably reliable indication of congestion.
When all the layer 2 devices in a path set that layer's own
Congestion Experienced codepoint (e.g., the EFCI bit for ATM, the
FECN bit in Frame Relay) in this reliable manner, then the interface
router to the layer 2 network could copy the state of that layer 2
Congestion Experienced codepoint into the CE codepoint in the IP
header. We recognize that this is not the current practice, nor is
it in current standards. However, encouraging experimentation in this
manner may provide the information needed to enable evolution of
existing layer 2 mechanisms to provide a more reliable means of
congestion indication, when they use a single bit for indicating
congestion.
5.2. Dropped or Corrupted Packets
For the proposed use for ECN in this document (that is, for a
transport protocol such as TCP for which a dropped data packet is an
indication of congestion), end nodes detect dropped data packets, and
the congestion response of the end nodes to a dropped data packet is
at least as strong as the congestion response to a received CE
packet. To ensure the reliable delivery of the congestion indication
of the CE codepoint, an ECT codepoint MUST NOT be set in a packet
unless the loss of that packet in the network would be detected by
the end nodes and interpreted as an indication of congestion.
Transport protocols such as TCP do not necessarily detect all packet
drops, such as the drop of a "pure" ACK packet; for example, TCP does
not reduce the arrival rate of subsequent ACK packets in response to
an earlier dropped ACK packet. Any proposal for extending ECN-
Capability to such packets would have to address issues such as the
case of an ACK packet that was marked with the CE codepoint but was
later dropped in the network. We believe that this aspect is still
the subject of research, so this document specifies that at this
time, "pure" ACK packets MUST NOT indicate ECN-Capability.
Similarly, if a CE packet is dropped later in the network due to
corruption (bit errors), the end nodes should still invoke congestion
control, just as TCP would today in response to a dropped data
packet. This issue of corrupted CE packets would have to be
considered in any proposal for the network to distinguish between
packets dropped due to corruption, and packets dropped due to
congestion or buffer overflow. In particular, the ubiquitous
deployment of ECN would not, in and of itself, be a sufficient
development to allow end-nodes to interpret packet drops as
indications of corruption rather than congestion.
5.3. Fragmentation
ECN-capable packets MAY have the DF (Don't Fragment) bit set.
Reassembly of a fragmented packet MUST NOT lose indications of
congestion. In other words, if any fragment of an IP packet to be
reassembled has the CE codepoint set, then one of two actions MUST be
taken:
* Set the CE codepoint on the reassembled packet. However, this
MUST NOT occur if any of the other fragments contributing to
this reassembly carries the Not-ECT codepoint.
* The packet is dropped, instead of being reassembled, for any
other reason.
If both actions are applicable, either MAY be chosen. Reassembly of
a fragmented packet MUST NOT change the ECN codepoint when all of the
fragments carry the same codepoint.
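The reassembly rules above can be sketched as follows, as an illustration only. The function name `reassembled_ecn` is hypothetical. For mixtures of codepoints that contain no CE, the rules stated so far do not dictate an outcome, so this sketch arbitrarily keeps the first fragment's codepoint; that choice is an assumption of the example.

```python
from typing import Optional, Sequence

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def reassembled_ecn(fragment_ecns: Sequence[int]) -> Optional[int]:
    """Return the ECN codepoint for the reassembled packet, or None to drop."""
    if not fragment_ecns:
        raise ValueError("at least one fragment is required")
    codepoints = set(fragment_ecns)
    if len(codepoints) == 1:
        # All fragments agree: the codepoint MUST NOT be changed.
        return fragment_ecns[0]
    if CE in codepoints:
        if NOT_ECT in codepoints:
            # CE must not be set on the result, yet the congestion
            # indication must not be lost, so dropping is what remains.
            return None
        # Preserve the congestion indication.
        return CE
    # Mixture without CE: arbitrary choice made by this sketch only.
    return fragment_ecns[0]
```

For example, fragments carrying ECT(0), CE, ECT(0) reassemble to CE, whereas fragments carrying CE and Not-ECT cause the packet to be dropped.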
We would note that because RFC 2481 did not specify reassembly
behavior, older ECN implementations conformant with that Experimental
RFC do not necessarily perform reassembly correctly, in terms of
preserving the CE codepoint in a fragment. The sender could avoid
the consequences of this behavior by setting the DF bit in ECN-
Capable packets.
Situations may arise in which the above reassembly specification is
insufficiently precise. For example, if there is a malicious or
broken entity in the path at or after the fragmentation point, packet
fragments could carry a mixture of ECT(0), ECT(1), and/or Not-ECT
codepoints. The reassembly specification above does not place
requirements on reassembly of fragments in this case. In situations
where more precise reassembly behavior would be required, protocol
specifications SHOULD instead specify that DF MUST be set in all
ECN-capable packets sent by the protocol.
6. Support from the Transport Protocol
ECN requires support from the transport protocol, in addition to the
functionality given by the ECN field in the IP packet header. The
transport protocol might require negotiation between the endpoints
during setup to determine that all of the endpoints are ECN-capable,
so that the sender can set the ECT codepoint in transmitted packets.
Second, the transport protocol must be capable of reacting
appropriately to the receipt of CE packets. This reaction could be
in the form of the data receiver informing the data sender of the
received CE packet (e.g., TCP), of the data receiver unsubscribing to
a layered multicast group (e.g., RLM [MJV96]), or of some other
action that ultimately reduces the arrival rate of that flow on that
congested link. CE packets indicate persistent rather than transient
congestion (see Section 5.1), and hence reactions to the receipt of
CE packets should be those appropriate for persistent congestion.
This document only addresses the addition of ECN Capability to TCP,
leaving issues of ECN in other transport protocols to further
research. For TCP, ECN requires three new pieces of functionality:
negotiation between the endpoints during connection setup to
determine if they are both ECN-capable; an ECN-Echo (ECE) flag in the
TCP header so that the data receiver can inform the data sender when
a CE packet has been received; and a Congestion Window Reduced (CWR)
flag in the TCP header so that the data sender can inform the data
receiver that the congestion window has been reduced. The support
required from other transport protocols is likely to be different,
particularly for unreliable or reliable multicast transport
protocols, and will have to be determined as other transport
protocols are brought to the IETF for standardization.
In a mild abuse of terminology, in this document we refer to `TCP
packets' instead of `TCP segments'.
6.1. TCP
The following sections describe in detail the proposed use of ECN in
TCP. This proposal is described in essentially the same form in
[Floyd94]. We assume that the source TCP uses the standard congestion
control algorithms of Slow-start, Fast Retransmit and Fast Recovery
[RFC2581].
This proposal specifies two new flags in the Reserved field of the
TCP header. The TCP mechanism for negotiating ECN-Capability uses
the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved
field of the TCP header is designated as the ECN-Echo flag. The
location of the 6-bit Reserved field in the TCP header is shown in
Figure 4 of RFC 793 [RFC793] (and is reproduced below for
completeness). This specification of the ECN Field leaves the
Reserved field as a 4-bit field using bits 4-7.
To enable the TCP receiver to determine when to stop setting the
ECN-Echo flag, we introduce a second new flag in the TCP header, the
CWR flag. The CWR flag is assigned to Bit 8 in the Reserved field of
the TCP header.
     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   |               |                       | U | A | P | R | S | F |
   | Header Length |        Reserved       | R | C | S | S | Y | I |
   |               |                       | G | K | H | T | N | N |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   Figure 3: The old definition of bytes 13 and 14 of the TCP
   header.
     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   |               |               | C | E | U | A | P | R | S | F |
   | Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
   |               |               | R | E | G | K | H | T | N | N |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   Figure 4: The new definition of bytes 13 and 14 of the TCP
   Header.
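As an illustration of the layout in Figure 4, the following sketch decodes bytes 13 and 14 of a TCP header (offsets 12 and 13 when counting from zero) into the header length, the remaining 4-bit Reserved field, and the eight flags. The function name `parse_tcp_flags` and the dictionary keys are hypothetical choices made for this example.

```python
import struct
from typing import Dict

# Flag masks within the low byte of the 16-bit word; bit 8 in the figure
# (CWR) is the most significant bit of that byte.
FLAG_MASKS = {
    "CWR": 0x80, "ECE": 0x40, "URG": 0x20, "ACK": 0x10,
    "PSH": 0x08, "RST": 0x04, "SYN": 0x02, "FIN": 0x01,
}


def parse_tcp_flags(tcp_header: bytes) -> Dict[str, int]:
    """Decode header length, Reserved bits, and flags from a TCP header."""
    if len(tcp_header) < 14:
        raise ValueError("need at least the first 14 bytes of the TCP header")
    (word,) = struct.unpack_from("!H", tcp_header, 12)  # network byte order
    fields = {
        "header_length_words": word >> 12,   # bits 0-3
        "reserved": (word >> 8) & 0x0F,      # bits 4-7
    }
    for name, mask in FLAG_MASKS.items():
        fields[name] = 1 if word & mask else 0
    return fields
```

For example, a header whose bytes 13 and 14 are `0x50 0xC2` decodes to a header length of 5 words with the CWR, ECE, and SYN flags set.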
Thus, ECN uses the ECT and CE flags in the IP header (as shown in
Figure 1) for signaling between routers and connection endpoints, and
uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure
4) for TCP-endpoint to TCP-endpoint signaling. For a TCP connection,
a typical sequence of events in an ECN-based reaction to congestion
is as follows:
* An ECT codepoint is set in packets transmitted by the sender to
indicate that ECN is supported by the transport entities for
these packets.
* An ECN-capable router detects impending congestion and detects
that an ECT codepoint is set in the packet it is about to drop.
Instead of dropping the packet, the router chooses to set the CE
codepoint in the IP header and forwards the packet.
* The receiver receives the packet with the CE codepoint set, and
sets the ECN-Echo flag in its next TCP ACK sent to the sender.
* The sender receives the TCP ACK with ECN-Echo set, and reacts to
the congestion as if a packet had been dropped.
* The sender sets the CWR flag in the TCP header of the next
packet sent to the receiver to acknowledge its receipt of and
reaction to the ECN-Echo flag.
The negotiation for using ECN by the TCP transport entities and the
use of the ECN-Echo and CWR flags is described in more detail in the
sections below.
6.1.1 TCP Initialization
In the TCP connection setup phase, the source and destination TCPs
exchange information about their willingness to use ECN. Subsequent
to the completion of this negotiation, the TCP sender sets an ECT
codepoint in the IP header of data packets to indicate to the network
that the transport is capable and willing to participate in ECN for
this packet. This indicates to the routers that they may mark this
packet with the CE codepoint, if they would like to use that as a
method of congestion notification. If the TCP connection does not
wish to use ECN notification for a particular packet, the sending TCP
sets the ECN codepoint to not-ECT, and the TCP receiver ignores the
CE codepoint in the received packet.
For this discussion, we designate the initiating host as Host A and
the responding host as Host B. We call a SYN packet with the ECE and
CWR flags set an "ECN-setup SYN packet", and we call a SYN packet
with at least one of the ECE and CWR flags not set a "non-ECN-setup
SYN packet". Similarly, we call a SYN-ACK packet with only the ECE
flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and
we call a SYN-ACK packet with any other configuration of the ECE and
CWR flags a "non-ECN-setup SYN-ACK packet".
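The four packet classes defined above can be expressed as two small
predicates. The following Python sketch is illustrative only; the
function names and the representation of a packet's flags as a set of
strings are assumptions of the sketch, not part of this specification.

```python
def is_ecn_setup_syn(flags: set) -> bool:
    """A SYN (without ACK) that has both ECE and CWR set."""
    return "SYN" in flags and "ACK" not in flags and {"ECE", "CWR"} <= flags


def is_ecn_setup_syn_ack(flags: set) -> bool:
    """A SYN-ACK that has ECE set and CWR clear."""
    return {"SYN", "ACK", "ECE"} <= flags and "CWR" not in flags


def classify(flags: set) -> str:
    """Name the packet class for a SYN or SYN-ACK (hypothetical helper)."""
    if "SYN" not in flags:
        raise ValueError("not a SYN or SYN-ACK packet")
    if "ACK" in flags:
        setup = is_ecn_setup_syn_ack(flags)
        return "ECN-setup SYN-ACK" if setup else "non-ECN-setup SYN-ACK"
    setup = is_ecn_setup_syn(flags)
    return "ECN-setup SYN" if setup else "non-ECN-setup SYN"
```

Note that a SYN-ACK carrying both ECE and CWR falls into the
"non-ECN-setup SYN-ACK" class, as does a SYN-ACK carrying neither.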
Before a TCP connection can use ECN, Host A sends an ECN-setup SYN
packet, and Host B sends an ECN-setup SYN-ACK packet. For a SYN
packet, the setting of both ECE and CWR in the ECN-setup SYN packet
is defined as an indication that the sending TCP is ECN-Capable,
rather than as an indication of congestion or of response to
congestion. More precisely, an ECN-setup SYN packet indicates that
the TCP implementation transmitting the SYN packet will participate
in ECN as both a sender and receiver. Specifically, as a receiver,
it will respond to incoming data packets that have the CE codepoint
set in the IP header by setting ECE in outgoing TCP Acknowledgement
(ACK) packets. As a sender, it will respond to incoming packets that
have ECE set by reducing the congestion window and setting CWR when
appropriate. An ECN-setup SYN packet does not commit the TCP sender
to setting the ECT codepoint in any or all of the packets it may
transmit. However, the commitment to respond appropriately to
incoming packets with the CE codepoint set remains even if the TCP
sender in a later transmission, within this TCP connection, sends a
SYN packet without ECE and CWR set.
When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag
but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an
indication that the TCP transmitting the SYN-ACK packet is ECN-
Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does
not commit the TCP host to setting the ECT codepoint in transmitted
packets.
The following rules apply to the sending of ECN-setup packets within
a TCP connection, where a TCP connection is defined by the standard
rules for TCP connection establishment and termination.
* If a host has received an ECN-setup SYN packet, then it MAY send
an ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an
ECN-setup SYN-ACK packet.
* A host MUST NOT set ECT on data packets unless it has sent at
least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has
received at least one ECN-setup SYN or ECN-setup SYN-ACK packet,
and has sent no non-ECN-setup SYN or non-ECN-setup SYN-ACK
packet. If a host has received at least one non-ECN-setup SYN
or non-ECN-setup SYN-ACK packet, then it SHOULD NOT set ECT on
data packets.
* If a host ever sets the ECT codepoint on a data packet, then
that host MUST correctly set/clear the CWR TCP bit on all
subsequent packets in the connection.
* If a host has sent at least one ECN-setup SYN or ECN-setup SYN-
ACK packet, and has received no non-ECN-setup SYN or non-ECN-
setup SYN-ACK packet, then if that host receives TCP data
packets with ECT and CE codepoints set in the IP header, then
that host MUST process these packets as specified for an ECN-
capable connection.
* A host that is not willing to use ECN on a TCP connection SHOULD
clear both the ECE and CWR flags in all non-ECN-setup SYN and/or
SYN-ACK packets that it sends to indicate this unwillingness.
Receivers MUST correctly handle all forms of the non-ECN-setup
SYN and SYN-ACK packets.
* A host MUST NOT set ECT on SYN or SYN-ACK packets.
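The rules above depend only on which kinds of SYN and SYN-ACK packets
a host has sent and received on the connection. A minimal Python
sketch of that bookkeeping follows. The class and attribute names are
hypothetical, and the sketch makes one simplifying assumption: it
treats the SHOULD NOT in the second rule as a prohibition.

```python
from dataclasses import dataclass


@dataclass
class EcnHandshakeHistory:
    """Per-connection record of setup packets (hypothetical structure)."""
    sent_ecn_setup: bool = False          # sent an ECN-setup SYN or SYN-ACK
    sent_non_ecn_setup: bool = False      # sent a non-ECN-setup SYN or SYN-ACK
    received_ecn_setup: bool = False      # received an ECN-setup SYN or SYN-ACK
    received_non_ecn_setup: bool = False  # received a non-ECN-setup one

    def may_set_ect_on_data(self) -> bool:
        # MUST NOT unless both directions saw an ECN-setup packet
        # and this host never sent a non-ECN-setup packet.
        if not (self.sent_ecn_setup and self.received_ecn_setup):
            return False
        if self.sent_non_ecn_setup:
            return False
        # SHOULD NOT case, treated as "no" in this sketch (assumption).
        return not self.received_non_ecn_setup

    def must_process_ce(self) -> bool:
        # The commitment to honor CE marks on received data packets.
        return self.sent_ecn_setup and not self.received_non_ecn_setup
```

The two methods are deliberately separate: a host can be obliged to
process CE marks on incoming data even when it is not permitted to
set ECT on its own data packets.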
A TCP client enters TIME-WAIT state after receiving a FIN-ACK, and
transitions to CLOSED state after a timeout. Many TCP
implementations create a new TCP connection if they receive an in-
window SYN packet during TIME-WAIT state. When a TCP host enters
TIME-WAIT or CLOSED state, it should ignore any previous state about
the negotiation of ECN for that connection.
6.1.1.1. Middlebox Issues
ECN introduces the use of the ECN-Echo and CWR flags in the TCP
header (as shown in Figure 4) for initialization. There exist some
faulty firewalls, load balancers, and intrusion detection systems in
the Internet that either drop an ECN-setup SYN packet or respond with
a RST, in the belief that such a packet (with these bits set) is a
signature for a port-scanning tool that could be used in a denial-
of-service attack. Some of the offending equipment has been
identified, and a web page [FIXES] contains a list of non-compliant
products and the fixes posted by the vendors, where these are
available. The TBIT web page [TBIT] lists some of the web servers
affected by this faulty equipment. We mention this in this document
as a warning to the community of this problem.
To provide robust connectivity even in the presence of such faulty
equipment, a host that receives a RST in response to the transmission
of an ECN-setup SYN packet MAY resend a SYN with CWR and ECE cleared.
This could result in a TCP connection being established without using
ECN.
A host that receives no reply to an ECN-setup SYN within the normal
SYN retransmission timeout interval MAY resend the SYN and any
subsequent SYN retransmissions with CWR and ECE cleared. To overcome
normal packet loss that results in the original SYN being lost, the
originating host may retransmit one or more ECN-setup SYN packets
before giving up and retransmitting the SYN with the CWR and ECE bits
cleared.
We note that in this case, the following example scenario is
possible:
(1) Host A: Sends an ECN-setup SYN
(2) Host B: Sends an ECN-setup SYN/ACK, packet is dropped or delayed
(3) Host A: Sends a non-ECN-setup SYN
(4) Host B: Sends a non-ECN-setup SYN/ACK
We note that in this case, following the procedures above, neither
Host A nor Host B may set the ECT bit on data packets. Further, an
important consequence of the rules for ECN setup and usage in Section
6.1.1 is that a host is forbidden from using the reception of ECT
data packets as an implicit signal that the other host is ECN-
capable.
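The fallback behavior and the four-step scenario above can be
sketched in Python as follows. This is an illustration under stated
assumptions: the function names are hypothetical, the number of
ECN-setup attempts before falling back is a local policy choice that
this document does not fix, and the SHOULD NOT for a host that has
received a non-ECN-setup packet is treated as a prohibition.

```python
def syn_flag_plan(total_syns: int, ecn_setup_syns: int) -> list:
    """Flags for each SYN (re)transmission, ECN-setup attempts first."""
    return [
        {"SYN", "ECE", "CWR"} if attempt < ecn_setup_syns else {"SYN"}
        for attempt in range(total_syns)
    ]


def may_set_ect(sent: list, received: list) -> bool:
    """Apply the Section 6.1.1 rules to a host's setup history.

    sent and received hold one bool per SYN or SYN-ACK packet,
    True when that packet was an ECN-setup packet.
    """
    return bool(sent) and bool(received) and all(sent) and all(received)


# Steps (1)-(4) above, with the delayed SYN-ACK eventually arriving.
host_a = may_set_ect(sent=[True, False], received=[True, False])
host_b = may_set_ect(sent=[True, False], received=[True, False])
```

Both host_a and host_b evaluate to False, since each host has sent a
non-ECN-setup packet during the exchange.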
6.1.1.2. Robust TCP Initialization with an Echoed Reserved Field
There is the question of why we chose to have the TCP sending the SYN
set two ECN-related flags in the Reserved field of the TCP header for
the SYN packet, while the responding TCP sending the SYN-ACK sets
only one ECN-related flag in the SYN-ACK packet. This asymmetry is
necessary for the robust negotiation of ECN-capability with some
deployed TCP implementations. There exists at least one faulty TCP
implementation in which TCP receivers set the Reserved field of the
TCP header in ACK packets (and hence the SYN-ACK) simply to reflect
the Reserved field of the TCP header in the received data packet.
Because the TCP SYN packet sets the ECN-Echo and CWR flags to
indicate ECN-capability, while the SYN-ACK packet sets only the ECN-
Echo flag, the sending TCP correctly interprets a receiver's
reflection of its own flags in the Reserved field as an indication
that the receiver is not ECN-capable. The sending TCP is not misled
by a faulty TCP implementation sending a SYN-ACK packet that simply
reflects the Reserved field of the incoming SYN packet.
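The value of the asymmetry can be seen in a short Python sketch. The
function faulty_syn_ack is a hypothetical model of the reflecting
implementation described above, and peer_is_ecn_capable is a
hypothetical name for the check made by the host that sent the SYN.

```python
ECN_FLAGS = {"ECE", "CWR"}


def faulty_syn_ack(syn_flags: set) -> set:
    """Model of a faulty peer that copies the SYN's ECN flags back."""
    return {"SYN", "ACK"} | (syn_flags & ECN_FLAGS)


def peer_is_ecn_capable(syn_ack_flags: set) -> bool:
    """An ECN-setup SYN-ACK has ECE set and CWR clear."""
    return "ECE" in syn_ack_flags and "CWR" not in syn_ack_flags


# With the asymmetric design, the reflection carries both flags
# and is therefore not mistaken for an ECN-setup SYN-ACK.
reflected = faulty_syn_ack({"SYN", "ECE", "CWR"})
asymmetric_result = peer_is_ecn_capable(reflected)

# Counterfactual: had the SYN carried ECE alone, the reflection
# would be indistinguishable from a genuine ECN-setup SYN-ACK.
symmetric_result = peer_is_ecn_capable(faulty_syn_ack({"SYN", "ECE"}))
```

Here asymmetric_result is False and symmetric_result is True, which
is why a single-flag SYN would not have been robust.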
6.1.2. The TCP Sender
For a TCP connection using ECN, new data packets are transmitted with
an ECT codepoint set in the IP header. When only one ECT codepoint
is needed by a sender for all packets sent on a TCP connection,
ECT(0) SHOULD be used. If the sender receives an ECN-Echo (ECE) ACK
packet (that is, an ACK packet with the ECN-Echo flag set in the TCP
header), then the sender knows that congestion was encountered in the
network on the path from the sender to the receiver. The indication
of congestion should be treated just as a congestion loss in non-
ECN-Capable TCP. That is, the TCP source halves the congestion window
"cwnd" and reduces the slow start threshold "ssthresh". The sending
TCP SHOULD NOT increase the congestion window in response to the
receipt of an ECN-Echo ACK packet.
TCP should not react to congestion indications more than once every
window of data (or more loosely, more than once every round-trip
time). That is, the TCP sender's congestion window should be reduced
only once in response to a series of dropped and/or CE packets from a
single window of data. In addition, the TCP source should not
decrease the slow-start threshold, ssthresh, if it has been decreased
within the last round trip time. However, if any retransmitted
packets are dropped, then this is interpreted by the source TCP as a
new instance of congestion.
After the source TCP reduces its congestion window in response to a
CE packet, incoming acknowledgments that continue to arrive can
"clock out" outgoing packets as allowed by the reduced congestion
window. If the congestion window consists of only one MSS (maximum
segment size), and the sending TCP receives an ECN-Echo ACK packet,
then the sending TCP should in principle still reduce its congestion
window in half. However, the value of the congestion window is
bounded below by a value of one MSS. If the sending TCP were to
continue to send, using a congestion window of 1 MSS, this results in
the transmission of one packet per round-trip time. It is necessary
to still reduce the sending rate of the TCP sender even further, on
receipt of an ECN-Echo packet when the congestion window is one. We
use the retransmit timer as a means of reducing the rate further in
this circumstance. Therefore, the sending TCP MUST reset the
retransmit timer on receiving the ECN-Echo packet when the congestion
window is one. The sending TCP will then be able to send a new
packet only when the retransmit timer expires.
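The sender behavior described so far in this section can be sketched
in Python as follows. This is a simplified model, not an
implementation: the class and attribute names are hypothetical,
sequence-number wraparound is ignored, ssthresh is simply set to the
halved window rather than computed as in [RFC2581], and a counter
stands in for the act of resetting the retransmit timer.

```python
from dataclasses import dataclass


@dataclass
class EcnSender:
    """Reaction of a sender to ECN-Echo ACKs (hypothetical model)."""
    mss: int                # maximum segment size, in bytes
    cwnd: int               # congestion window, in bytes
    ssthresh: int           # slow start threshold, in bytes
    snd_nxt: int = 0        # next sequence number to be sent
    reduced_until: int = 0  # data sent before the last reaction
    timer_resets: int = 0   # stands in for resetting the timer

    def on_ack(self, ack_seq: int, ece: bool) -> None:
        # React at most once per window of data: ignore ECN-Echo on
        # ACKs for data sent before the previous reaction.
        if not ece or ack_seq <= self.reduced_until:
            return
        self.reduced_until = self.snd_nxt
        if self.cwnd <= self.mss:
            # Window is already at its floor of one MSS, so slow
            # down further by resetting the retransmit timer.
            self.timer_resets += 1
            return
        # Halve the window, bounded below by one MSS.
        self.cwnd = max(self.cwnd // 2, self.mss)
        self.ssthresh = self.cwnd  # simplification (assumption)
```

With mss=1000, cwnd=8000 and snd_nxt=8000, a first ECN-Echo ACK
halves cwnd to 4000, and further ECN-Echo ACKs for sequence numbers
up to 8000 leave it unchanged.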
When an ECN-Capable TCP sender reduces its congestion window for any
reason (because of a retransmit timeout, a Fast Retransmit, or in
response to an ECN Notification), the TCP sender sets the CWR flag in
the TCP header of the first new data packet sent after the window
reduction. If that data packet is dropped in the network, then the
sending TCP will have to reduce the congestion window again and
retransmit the dropped packet.
We ensure that the "Congestion Window Reduced" information is
reliably delivered to the TCP receiver. This comes about from the
fact that if the new data packet carrying the CWR flag is dropped,
then the TCP sender will have to again reduce its congestion window,
and send another new data packet with the CWR flag set. Thus, the
CWR bit in the TCP header SHOULD NOT be set on retransmitted packets.
When the TCP data sender is ready to set the CWR bit after reducing
the congestion window, it SHOULD set the CWR bit only on the first
new data packet that it transmits.
[Floyd94] discusses TCP's response to ECN in more detail. [Floyd98]
discusses the validation test in the ns simulator, which illustrates
a wide range of ECN scenarios. These scenarios include the following:
an ECN followed by another ECN, a Fast Retransmit, or a Retransmit
Timeout; a Retransmit Timeout or a Fast Retransmit followed by an
ECN; and a congestion window of one packet followed by an ECN.
TCP follows existing algorithms for sending data packets in response
to incoming ACKs, multiple duplicate acknowledgments, or retransmit
timeouts [RFC2581]. TCP also follows the normal procedures for
increasing the congestion window when it receives ACK packets without
the ECN-Echo bit set [RFC2581].
6.1.3. The TCP Receiver
When TCP receives a CE data packet at the destination end-system, the
TCP data receiver sets the ECN-Echo flag in the TCP header of the
subsequent ACK packet. If there is any ACK withholding implemented,
as in current "delayed-ACK" TCP implementations where the TCP
receiver can send an ACK for two arriving data packets, then the
ECN-Echo flag in the ACK packet will be set to '1' if the CE
codepoint is set in any of the data packets being acknowledged. That
is, if any of the received data packets are CE packets, then the
returning ACK has the ECN-Echo flag set.
To provide robustness against the possibility of a dropped ACK packet
carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in
a series of ACK packets sent subsequently. The TCP receiver uses the
CWR flag received from the TCP sender to determine when to stop
setting the ECN-Echo flag.
After a TCP receiver sends an ACK packet with the ECN-Echo bit set,
that TCP receiver continues to set the ECN-Echo flag in all the ACK
packets it sends (whether they acknowledge CE data packets or non-CE
data packets) until it receives a CWR packet (a packet with the CWR
flag set). After the receipt of the CWR packet, acknowledgments for
subsequent non-CE data packets do not have the ECN-Echo flag set. If
another CE packet is received by the data receiver, the receiver
would once again send ACK packets with the ECN-Echo flag set. While
the receipt of a CWR packet does not guarantee that the data sender
received the ECN-Echo message, this does suggest that the data sender
reduced its congestion window at some point *after* it sent the data
packet for which the CE codepoint was set.
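The receiver behavior described above amounts to a one-bit latch that
is set by a CE packet and cleared by a CWR packet. A minimal Python
sketch follows; the class and method names are hypothetical. For a
packet that carries both CWR and CE, the sketch clears the latch and
then sets it again, which is an assumption of the sketch.

```python
class EcnReceiver:
    """Tracks whether outgoing ACKs should carry ECN-Echo."""

    def __init__(self) -> None:
        self.echoing = False

    def on_data(self, ce: bool, cwr: bool) -> None:
        """Process one arriving in-window data packet."""
        if cwr:
            # The sender reports that it reduced its window.
            self.echoing = False
        if ce:
            # A new congestion indication (re)starts the echo.
            self.echoing = True

    def ack_has_ece(self) -> bool:
        """Value of the ECN-Echo flag for the next ACK sent."""
        return self.echoing
```

Because the latch stays set across packets, a delayed ACK that covers
two data packets carries ECN-Echo if either of them was a CE packet.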
We have already specified that a TCP sender is not required to reduce
its congestion window more than once per window of data. Some care
is required if the TCP sender is to avoid unnecessary reductions of
the congestion window when a window of data includes both dropped
packets and (marked) CE packets. This is illustrated in [Floyd98].
6.1.4. Congestion on the ACK-path
For the current generation of TCP congestion control algorithms, pure
acknowledgement packets (e.g., packets that do not contain any
accompanying data) MUST be sent with the not-ECT codepoint. Current
TCP receivers have no mechanisms for reducing traffic on the ACK-path
in response to congestion notification. Mechanisms for responding to
congestion on the ACK-path are areas for current and future research.
(One simple possibility would be for the sender to reduce its
congestion window when it receives a pure ACK packet with the CE
codepoint set). For current TCP implementations, a single dropped ACK
generally has only a very small effect on the TCP's sending rate.
6.1.5. Retransmitted TCP packets
This document specifies ECN-capable TCP implementations MUST NOT set
either ECT codepoint (ECT(0) or ECT(1)) in the IP header for
retransmitted data packets, and that the TCP data receiver SHOULD
ignore the ECN field on arriving data packets that are outside of the
receiver's current window. This is for greater security against
denial-of-service attacks, as well as for robustness of the ECN
congestion indication with packets that are dropped later in the
network.
First, we note that if the TCP sender were to set an ECT codepoint on
a retransmitted packet, then if an unnecessarily-retransmitted packet
was later dropped in the network, the end nodes would never receive
the indication of congestion from the router setting the CE
codepoint. Thus, setting an ECT codepoint on retransmitted data
packets is not consistent with the robust delivery of the congestion
indication even for packets that are later dropped in the network.
In addition, an attacker capable of spoofing the IP source address of
the TCP sender could send data packets with arbitrary sequence
numbers, with the CE codepoint set in the IP header. On receiving
this spoofed data packet, the TCP data receiver would determine that
the data does not lie in the current receive window, and return a
duplicate acknowledgement. We define an out-of-window packet at the
TCP data receiver as a data packet that lies outside the receiver's
current window. On receiving an out-of-window packet, the TCP data
receiver has to decide whether or not to treat the CE codepoint in
the packet header as a valid indication of congestion, and therefore
whether to return ECN-Echo indications to the TCP data sender. If
the TCP data receiver ignored the CE codepoint in an out-of-window
packet, then the TCP data sender would not receive this possibly-
legitimate indication of congestion from the network, resulting in a
violation of end-to-end congestion control. On the other hand, if
the TCP data receiver honors the CE indication in the out-of-window
packet, and reports the indication of congestion to the TCP data
sender, then the malicious node that created the spoofed, out-of-
window packet has successfully "attacked" the TCP connection by
forcing the data sender to unnecessarily reduce (halve) its
congestion window. To prevent such a denial-of-service attack, we
specify that a legitimate TCP data sender MUST NOT set an ECT
codepoint on retransmitted data packets, and that the TCP data
receiver SHOULD ignore the CE codepoint on out-of-window packets.
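The receiver-side check can be sketched in Python as follows. The
function names and parameters are hypothetical. The sketch assumes
that a packet counts as in-window when at least one of its data bytes
falls inside the receive window, and it ignores sequence-number
wraparound; this document does not spell out either detail.

```python
def in_receive_window(seq: int, length: int,
                      rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if any data byte lies in [rcv_nxt, rcv_nxt + rcv_wnd)."""
    if length <= 0 or rcv_wnd <= 0:
        return False
    return seq < rcv_nxt + rcv_wnd and seq + length > rcv_nxt


def should_echo_congestion(ce: bool, seq: int, length: int,
                           rcv_nxt: int, rcv_wnd: int) -> bool:
    """Honor a CE mark only when the data packet is in-window."""
    return ce and in_receive_window(seq, length, rcv_nxt, rcv_wnd)
```

With rcv_nxt=1000 and rcv_wnd=2000, a CE-marked packet at sequence
number 5000 is ignored, while a CE-marked packet at sequence number
1000 causes ECN-Echo to be returned.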
One drawback of not setting ECT(0) or ECT(1) on retransmitted packets
is that it denies ECN protection for retransmitted packets. However,
for an ECN-capable TCP connection in a fully-ECN-capable environment
with mild congestion, packets should rarely be dropped due to
congestion in the first place, and so instances of retransmitted
packets should rarely arise. If packets are being retransmitted,
then there are already packet losses (from corruption or from
congestion) that ECN has been unable to prevent.
We note that if the router sets the CE codepoint for an ECN-capable
data packet within a TCP connection, then the TCP connection is
guaranteed to receive that indication of congestion, or to receive
some other indication of congestion within the same window of data,
even if this packet is dropped or reordered in the network. We
consider two cases, when the packet is later retransmitted, and when
the packet is not later retransmitted.
In the first case, if the packet is either dropped or delayed, and at
some point retransmitted by the data sender, then the retransmission
is a result of a Fast Retransmit or a Retransmit Timeout for either
that packet or for some prior packet in the same window of data. In
this case, because the data sender already has retransmitted this
packet, we know that the data sender has already responded to an
indication of congestion for some packet within the same window of
data as the original packet. Thus, even if the first transmission of
the packet is dropped in the network, or is delayed, if it had the CE
codepoint set, and is later ignored by the data receiver as an out-
of-window packet, this is not a problem, because the sender has
already responded to an indication of congestion for that window of
data.
In the second case, if the packet is never retransmitted by the data
sender, then this data packet is the only copy of this data received
by the data receiver, and therefore arrives at the data receiver as
an in-window packet, regardless of how much the packet might be
delayed or reordered. In this case, if the CE codepoint is set on
the packet within the network, this will be treated by the data
receiver as a valid indication of congestion.
6.1.6. TCP Window Probes
When the TCP data receiver advertises a zero window, the TCP data
sender sends window probes to determine if the receiver's window has
increased. Window probe packets do not contain any user data except
for the sequence number, which is a byte. If a window probe packet
is dropped in the network, this loss is not detected by the receiver.
Therefore, the TCP data sender MUST NOT set either an ECT codepoint
or the CWR bit on window probe packets.
However, because window probes use exact sequence numbers, they
cannot be easily spoofed in denial-of-service attacks. Therefore, if
a window probe arrives with the CE codepoint set, then the receiver
SHOULD respond to the ECN indications.
7. Non-compliance by the End Nodes
This section discusses concerns about the vulnerability of ECN to
non-compliant end-nodes (i.e., end nodes that set the ECT codepoint
in transmitted packets but do not respond to received CE packets).
We argue that the addition of ECN to the IP architecture will not
significantly increase the current vulnerability of the architecture
to unresponsive flows.
Even for non-ECN environments, there are serious concerns about the
damage that can be done by non-compliant or unresponsive flows (that
is, flows that do not respond to congestion control indications by
reducing their arrival rate at the congested link). For example, an
end-node could "turn off congestion control" by not reducing its
congestion window in response to packet drops. This is a concern for
the current Internet. It has been argued that routers will have to
deploy mechanisms to detect and differentially treat packets from
non-compliant flows [RFC2309,FF99]. It has also been suggested that
techniques such as end-to-end per-flow scheduling and isolation of
one flow from another, differentiated services, or end-to-end
reservations could remove some of the more damaging effects of
unresponsive flows.
It might seem that dropping packets in itself is an adequate
deterrent for non-compliance, and that the use of ECN removes this
deterrent. We would argue in response that (1) ECN-capable routers
preserve packet-dropping behavior in times of high congestion; and
(2) even in times of high congestion, dropping packets in itself is
not an adequate deterrent for non-compliance.
First, ECN-Capable routers will only mark packets (as opposed to
dropping them) when the packet marking rate is reasonably low. During
periods where the average queue size exceeds an upper threshold, and
therefore the potential packet marking rate would be high, our
recommendation is that routers drop packets rather than set the CE
codepoint in packet headers.
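The recommendation above can be sketched in Python as follows, for a
packet that the router's queue management has already selected to
carry a congestion signal. The function name and its parameters are
hypothetical, and how the average queue size and the upper threshold
are obtained is left to the queue management algorithm in use. The
codepoint values are those of the ECN field defined in Section 5.

```python
# Codepoints of the two-bit ECN field in the IP header.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def congestion_action(ecn_field: int, avg_queue: float,
                      upper_threshold: float) -> str:
    """Return "mark" (set CE and forward) or "drop"."""
    if avg_queue > upper_threshold:
        # High congestion: drop, even for ECN-capable packets.
        return "drop"
    if ecn_field == NOT_ECT:
        # The transport did not signal ECN capability.
        return "drop"
    return "mark"
```

In this sketch an ECT(0) packet is marked while the average queue is
below the threshold and dropped once the threshold is exceeded.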
During the periods of low or moderate packet marking rates when ECN
would be deployed, there would be little deterrent effect on
unresponsive flows of dropping rather than marking those packets. For
example, delay-insensitive flows using reliable delivery might have
an incentive to increase rather than to decrease their sending rate
in the presence of dropped packets. Similarly, delay-sensitive flows
using unreliable delivery might increase their use of FEC in response
to an increased packet drop rate, increasing rather than decreasing
their sending rate. For the same reasons, we do not believe that
packet dropping itself is an effective deterrent for non-compliance
even in an environment of high packet drop rates, when all flows are
sharing the same packet drop rate.
Several methods have been proposed to identify and restrict non-
compliant or unresponsive flows. The addition of ECN to the network
environment would not in any way increase the difficulty of designing
and deploying such mechanisms. If anything, the addition of ECN to
the architecture would make the job of identifying unresponsive flows
slightly easier. For example, in an ECN-Capable environment routers
are not limited to information about packets that are dropped or have
the CE codepoint set at that router itself; in such an environment,
routers could also take note of arriving CE packets that indicate
congestion encountered by that packet earlier in the path.
8. Non-compliance in the Network
This section considers the issues when a router is operating,
possibly maliciously, to modify either of the bits in the ECN field.
We note that in IPv4, the IP header is protected from bit errors by a
header checksum; this is not the case in IPv6. Thus for IPv6 the
ECN field can be accidentally modified by bit errors on links or in
routers without being detected by an IP header checksum.
By tampering with the bits in the ECN field, an adversary (or a
broken router) could do one or more of the following: falsely report
congestion, disable ECN-Capability for an individual packet, erase
the ECN congestion indication, or falsely indicate ECN-Capability.
Section 18 systematically examines the various cases by which the ECN
field could be modified. The important criterion considered in
determining the consequences of such modifications is whether it is
likely to lead to poorer behavior in any dimension (throughput,
delay, fairness or functionality) than if a router were to drop