As per Relevance of the word configured, we have this rfc below:
Network Working Group A.
Request for Comments: 3247 Cisco Systems, Inc
Category: Informational J.C.R.
K.
J.Y. Le
A.
Celion
W.
S.
PMC-
V.
Nortel
C.
AT&T
K.K.
TeraOptic
March 2002
Supplemental Information for the New
of the EF PHB (Expedited Forwarding Per-Hop Behavior
Status of this
This memo provides information for the Internet community. It
not specify an Internet standard of any kind. Distribution of
memo is unlimited
Copyright
Copyright (C) The Internet Society (2001). All Rights Reserved
This document was written during the process of clarification
RFC2598 "An Expedited Forwarding PHB" that led to the publication
revised specification of EF "An Expedited Forwarding PHB".
primary motivation is providing additional explanation to the
EF definition and its properties. The document also
additional implementation examples and gives some guidance
computation of the numerical parameters of the new definition
several well known schedulers and router architectures
Charny, et. al. Informational [Page 1]
RFC 3247 Supplemental Information March 2002
Table of
1 Introduction ........................................... 2
2 Definition of EF PHB ................................... 3
2.1 The formal definition .................................. 3
2.2 Relation to Packet Scale Rate Guarantee ................ 6
2.3 The need for dual characterization of EF PHB ........... 7
3 Per Packet delay ....................................... 9
3.1 Single hop delay bound ................................. 9
3.2 Multi-hop worst case delay ............................. 10
4 Packet loss ............................................ 10
5 Implementation considerations .......................... 11
5.1 The output buffered model with EF FIFO at the output. .. 12
5.1.1 Strict Non-preemptive Priority Queue ................... 12
5.1.2 WF2Q ................................................... 13
5.1.3 Deficit Round Robin (DRR) .............................. 13
5.1.4 Start-Time Fair Queuing and Self-Clocked Fair Queuing .. 13
5.2 Router with Internal Delay and EF FIFO at the output ... 13
6 Security Considerations ................................ 14
7 References ............................................. 14
Appendix A. Difficulties with the RFC 2598 EF PHB Definition .. 16
Appendix B. Alternative Characterization of Packet Scale
Guarantee ......................................... 20
Acknowledgements .............................................. 22
Authors' Addresses ............................................ 22
Full Copyright Statement ...................................... 24
1.
The Expedited Forwarding (EF) Per-Hop Behavior (PHB) was designed
be used to build a low-loss, low-latency, low-jitter,
bandwidth service. The potential benefits of this service,
therefore the EF PHB, are enormous. Because of the great value
this PHB, it is critical that the forwarding behavior required of
delivered by an EF-compliant node be specific, quantifiable,
unambiguous
Unfortunately, the definition of EF PHB in the original RFC2598 [10]
was not sufficiently precise (see Appendix A and [4]). A
precise definition is given in [6]. This document is intended to
in the understanding of the properties of the new definition
provide supplemental information not included in the text of [6]
sake of brevity
This document is outlined as follows. In section 2, we
restate the definition for EF PHB of [6]. We then provide
additional discussion of this definition and describe some of
properties. We discuss the issues associated with per-packet
Charny, et. al. Informational [Page 2]
RFC 3247 Supplemental Information March 2002
and loss in sections 3 and 4. In section 5 we discuss the impact
known scheduling architectures on the critical parameters of the
definition. We also discuss the impact of deviation of real
from the ideal output-buffered model on the magnitude of the
parameters in the definition
2. Definition of EF
2.1. The formal
An intuitive explanation of the new EF definition is described
[6]. Here we restate the formal definition from [6] verbatim
A node that supports EF on an interface I at some configured rate
MUST satisfy the following equations
d_j <= f_j + E_a for all j>0 (eq_1)
where f_j is defined iteratively
f_0 = 0, d_0 = 0
f_j = max(a_j, min(d_j-1, f_j-1)) + l_j/R, for all j > 0 (eq_2)
In this definition
- d_j is the time that the last bit of the j-th EF packet
depart actually leaves the node from the interface I
- f_j is the target departure time for the j-th EF packet
depart from I, the "ideal" time at or before which the last
of that packet should leave the node
- a_j is the time that the last bit of the j-th EF
destined to the output I actually arrives at the node
- l_j is the size (bits) of the j-th EF packet to depart from I
l_j is measured on the IP datagram (IP header plus payload)
does not include any lower layer (e.g. MAC layer) overhead
- R is the EF configured rate at output I (in bits/second).
- E_a is the error term for the treatment of the EF aggregate
Note that E_a represents the worst case deviation
actual departure time of an EF packet and ideal departure
of the same packet, i.e. E_a provides an upper bound on (d_j -
f_j) for all j
Charny, et. al. Informational [Page 3]
RFC 3247 Supplemental Information March 2002
- d_0 and f_0 do not refer to a real packet departure but
used purely for the purposes of the recursion. The time
should be chosen such that no EF packets are in the system
time 0.
- for the definitions of a_j and d_j, the "last bit" of
packet includes the layer 2 trailer if present, because
packet cannot generally be considered available for
until such a trailer has been received
An EF-compliant node MUST be able to be characterized by the range
possible R values that it can support on each of its interfaces
conforming to these equations, and the value of E_a that can be
on each interface. R may be line rate or less. E_a MAY be
as a worst-case value for all possible R values or MAY be
as a function of R
Note also that, since a node may have multiple inputs and
internal scheduling, the j-th EF packet to arrive at the
destined for a certain interface may not be the j-th EF packet
depart from that interface. It is in this sense that eq_1 and eq_2
are unaware of packet identity
In addition, a node that supports EF on an interface I at
configured rate R MUST satisfy the following equations
D_j <= F_j + E_p for all j>0 (eq_3)
where F_j is defined iteratively
F_0 = 0, D_0 = 0
F_j = max(A_j, min(D_j-1, F_j-1)) + L_j/R, for all j > 0 (eq_4)
In this definition
- D_j is the actual departure time of the individual EF
that arrived at the node destined for interface I at time A_j
i.e., given a packet which was the j-th EF packet destined
I to arrive at the node via any input, D_j is the time at
the last bit of that individual packet actually leaves the
from the interface I
- F_j is the target departure time for the individual EF
that arrived at the node destined for interface I at time A_j
Charny, et. al. Informational [Page 4]
RFC 3247 Supplemental Information March 2002
- A_j is the time that the last bit of the j-th EF
destined to the output I to arrive actually arrives at
node
- L_j is the size (bits) of the j-th EF packet to arrive at
node that is destined to output I. L_j is measured on the
datagram (IP header plus payload) and does not include
lower layer (e.g. MAC layer) overhead
- R is the EF configured rate at output I (in bits/second).
- E_p is the error term for the treatment of individual
packets. Note that E_p represents the worst case
between the actual departure time of an EF packet and the
departure time of the same packet, i.e. E_p provides an
bound on (D_j - F_j) for all j
- D_0 and F_0 do not refer to a real packet departure but
used purely for the purposes of the recursion. The time
should be chosen such that no EF packets are in the system
time 0.
- for the definitions of A_j and D_j, the "last bit" of
packet includes the layer 2 trailer if present, because
packet cannot generally be considered available for
until such a trailer has been received
It is the fact that D_j and F_j refer to departure times for the j-
packet to arrive that makes eq_3 and eq_4 aware of packet identity
This is the critical distinction between the last two equations
the first two
An EF-compliant node SHOULD be able to be characterized by the
of possible R values that it can support on each of its
while conforming to these equations, and the value of E_p that can
met on each interface. E_p MAY be specified as a worst-case
for all possible R values or MAY be expressed as a function of R.
E_p value of "undefined" MAY be specified
Finally, there is an additional recommendation in [6] that an
compliant node SHOULD NOT reorder packets within a micorflow
The definitions described in this section are referred to
aggregate and packet-identity-aware packet scale rate
[4],[2]. An alternative mathematical characterization of
scale rate guarantee is given in Appendix B
Charny, et. al. Informational [Page 5]
RFC 3247 Supplemental Information March 2002
2.2. Relation to Packet Scale Rate
Consider the case of an ideal output-buffered device with an EF
at the output. For such a device, the i-th packet to arrive to
device is also the i-th packet to depart from the device. Therefore
in this ideal model the aggregate behavior and packet-identity-
characteristics are identical, and E_a = E_p. In this section
therefore omit the subscript and refer to the latency term simply
E
It could be shown that for such an ideal device the definition
section 2.1 is stronger than the well-known rate-latency curve [2]
the sense that if a scheduler satisfies the EF definition it
satisfies the rate-latency curve. As a result, all the
known for the rate-latency curve also apply to the modified
definition. However, we argue below that the definition of
2.1 is more suitable to reflect the intent of EF PHB than the rate
latency curve
It is shown in [2] that the rate-latency curve is equivalent to
following definition
Definition of Rate Latency Curve (RLC):
D(j) <= F'(j) + E (eq_5)
F'(0)=0, F'(j)=max(a(j), F'(j-1))+ L(j)/R for all j>0 (eq_6)
It can be easily verified that the EF definition of section 2.1
stronger than RLC by noticing that for all j, F'(j) >= F(j).
It is easy to see that F'(j) in the definition of RLC corresponds
the time the j-th departure should have occurred should the
aggregate be constantly served exactly at its configured rate R
Following the common convention, we refer to F'(j) as the "
finish time" of the j-th packet to depart
The intuitive meaning of the rate-latency curve of RLC is that
packet is served at most time E later than this packet would
service in the fluid model
For RLC (and hence for the stronger EF definition) it holds that
any interval (0,t) the EF aggregate gets close to the desired
rate R (as long as there is enough traffic to sustain this rate).
The discrepancy between the ideal and the actual service in
interval depends on the latency term E, which in turn depends on
Charny, et. al. Informational [Page 6]
RFC 3247 Supplemental Information March 2002
scheduling implementation. The smaller E, the smaller the
between the configured rate and the actual rate achieved by
scheduler
While RLC guarantees the desired rate to the EF aggregate in
intervals (0,t) to within a specified error, it may
result in large gaps in service. For example, suppose that (a
number) N of identical EF packets of length L arrived from
interfaces to the EF queue in the absence of any non-EF traffic
Then any work-conserving scheduler will serve all N packets at
speed. When the last packet is sent at time NL/C, where C is
capacity of output link, F'(N) will be equal to NL/R. That is,
scheduler is running ahead of ideal, since NL/C < NL/R for R < C
Suppose now that at time NL/C a large number of non-EF
arrive, followed by a single EF packet. Then the scheduler
legitimately delay starting to send the EF packet until
F'(N+1)=(N+1)L/R + E - L/C. This means that the EF aggregate
have no service at all in the interval (NL/C, (N+1)L/R + E - L/C).
This interval can be quite large if R is substantially smaller
C. In essence, the EF aggregate can be "punished" by a gap
service for receiving faster service than its configured rate at
beginning
The new EF definition alleviates this problem by introducing the
min(D(j-1), F(j-1)) in the recursion. Essentially, this means
the fluid finishing time is "reset" if that packet is sent before
"ideal" departure time. As a consequence of that, for the case
the EF aggregate is served in the FIFO order, suppose a
arrives at time t to a server satisfying the EF definition.
packet will be transmitted no later than time t + Q(t)/R + E,
Q(t) is the EF queue size at time t (including the packet
discussion)[4].
2.3. The need for dual characterization of EF
In a more general case, where either the output scheduler does
serve the EF packets in a FIFO order, or the variable internal
in the device reorders packets while delivering them to the
(or both), the i-th packet destined to a given output interface
arrive to the device may no longer be the i-th packet to depart
that interface. In that case the packet-identity-aware and
aggregate definitions are no longer identical
The aggregate behavior definition can be viewed as a truly
characteristic of the service provided to EF packets. For
analogy, consider a dark reservoir to which all arriving packets
placed. A scheduler is allowed to pick a packet from the
in a random order, without any knowledge of the order of
Charny, et. al. Informational [Page 7]
RFC 3247 Supplemental Information March 2002
arrivals. The aggregate part of the definition measures the
of the output rate provided to the EF aggregate as a whole.
smaller E_a, the more accurate is the assurance that the reservoir
drained at least at the configured rate
Note that in this reservoir analogy packets of EF aggregate may
arbitrarily reordered. However, the definition of EF PHB given
[6] explicitly requires that no packet reordering occur within
microflow. This requirement restricts the
implementations, or, in the reservoir analogy, the order of
packets out of the reservoir to make sure that packets within
microflow are not reordered, but it still allows reordering at
aggregate level
Note that reordering within the aggregate, as long as there is
flow-level reordering, does not necessarily reflect a "bad" service
Consider for example a scheduler that arbitrates among 10
EF "flows" with diverse rates. A scheduler that is aware of the
requirements may choose to send a packet of the faster flow before
packet of the slower flow to maintain lower jitter at the flow level
In particular, an ideal "flow"-aware WFQ scheduler will
reordering within the aggregate, while maintaining packet
and small jitter at the flow level
It is intuitively clear that for such a scheduler, as well as for
simpler FIFO scheduler, the "accuracy" of the service rate is
for minimizing "flow"-level jitter. The packet-identity-
definition quantifies this accuracy of the service rate
However, the small value of E_a does not give any assurances
the absolute value of per-packet delay. In fact, if the input
exceeds the configured rate, the aggregate behavior definition
result in arbitrarily large delay of a subset of packets. This
the primary motivation for the packet-identity-aware definition
The primary goal of the packet-aware characterization of the
implementation is that, unlike the aggregate
characterization, it provides a way to find a per-packet delay
as a function of input traffic parameters
While the aggregate behavior definition characterizes the accuracy
the service rate of the entire EF aggregate, the packet-identity
aware part of the definition characterizes the deviation of
device from an ideal server that serves the EF aggregate in
order at least at the configured rate
The value of E_p in the packet-identity-aware definition is
affected by two factors: the accuracy of the aggregate rate
Charny, et. al. Informational [Page 8]
RFC 3247 Supplemental Information March 2002
and the degree of packet reordering within the EF aggregate (
the constraint that packets within the same microflow are
reordered). Therefore, a sub-aggregate aware device that provides
ideal service rate to the aggregate, and also provides an ideal
service for each of the sub-aggregates, may nevertheless have a
large value of E_p (in this case E_p must be at least equal to
ratio of the maximum packet size divided by the smallest rate of
sub aggregate). As a result, a large value of E_p does
necessarily mean that the service provided to EF aggregate is bad -
rather it may be an indication that the service is good, but non
FIFO. On the other hand, a large value of E_p may also mean that
aggregate service is very inaccurate (bursty), and hence in this
the large value of E_p reflects a poor quality of implementation
As a result, a large number of E_p does not necessarily provide
guidance on the quality of the EF implementation. However, a
value of E_p does indicate a high quality FIFO implementation
Since E_p and E_a relate to different aspects of the
implementation, they should be considered together to determine
quality of the implementation
3. Per Packet
The primary motivation for the packet-identity-aware definition
that it allows quantification of the per-packet delay bound.
section discusses the issues with computing per-packet delay
3.1. Single hop delay
If the total traffic arriving to an output port I from all inputs
constrained by a leaky bucket with parameters (R, B), where R is
configured rate at I, and B is the bucket depth (burst), then
delay of any packet departing from I is bounded by D_p, given
D_p = B/R + E_p (eq_7)
(see appendix B).
Because the delay bound depends on the configured rate R and
input burstiness B, it is desirable for both of these parameters
be visible to a user of the device. A PDB desiring a
delay bound may need to limit the range of configured rates
allowed burstiness that it can support in order to deliver
bound. Equation (eq_7) provides a means for determining
acceptable operating region for the device with a given E_p. It
also be useful to limit the total offered load to a given output
some rate R_1 < R (e.g. to obtain end-to-end delay bounds [5]).
Charny, et. al. Informational [Page 9]
RFC 3247 Supplemental Information March 2002
is important to realize that, while R_1 may also be a
parameter of the device, the delay bound in (eq_7) does not depend
it. It may be possible to get better bounds explicitly using
bound on the input rate, but the bound (eq_7) does not take
of this information
3.2. Multi-hop worst case
Although the PHB defines inherently local behavior, in this
we briefly discuss the issue of per-packet delay as the
traverses several hops implementing EF PHB. Given a delay
(eq_7) at a single hop, it is tempting to conclude that per-
bound across h hops is simply h times the bound (eq_7). However
this is not necessarily the case, unless B represents the worst
input burstiness across all nodes in the network
Unfortunately, obtaining such a worst case value of B is not trivial
If EF PHB is implemented using aggregate class-based scheduling
all EF packets share a single FIFO, the effect of jitter
may result in an increase in burstiness from hop to hop.
particular, it can be shown that unless severe restrictions on
utilization are imposed, even if all EF flows are ideally shaped
the ingress, then for any value of delay D it is possible
construct a network where EF utilization on any link is bounded
to exceed a given factor, no flow traverses more than a
number of hops, but there exists a packet that experiences a
more than D [5]. This result implies that the ability to limit
worst case burstiness and the resulting end-to-end delay
several hops may require not only limiting EF utilization on
links, but also constraining the global network topology.
topology constraints would need to be specified in the definition
any PDB built on top of EF PHB, if such PDB requires a strict
case delay bound
4. Packet
Any device with finite buffering may need to drop packets if
input burstiness becomes sufficiently high. To meet the low
objective of EF, a node may be characterized by the operating
in which loss of EF due to congestion will not occur. This may
specified as a token bucket of rate r <= R and burst size B that
be offered from all inputs to a given output interface without loss
However, as discussed in the previous section, the phenomenon
jitter accumulation makes it generally difficult to guarantee
the input burstiness never exceeds the specified operating region
nodes internal to the DiffServ domain. A no-loss guarantee
multiple hops may require specification of constraints on
Charny, et. al. Informational [Page 10]
RFC 3247 Supplemental Information March 2002
topology which are outside the scope of inherently local
of a PHB. Thus, it must be possible to establish whether a
conforms to the EF definition even when some packets are lost
This can be done by performing an "off-line" test of conformance
equations (eq_1)- (eq_4). After observing a sequence of
entering and leaving the node, the packets which did not leave
assumed lost and are notionally removed from the input stream.
remaining packets now constitute the arrival stream and the
which left the node constitute the departure stream. Conformance
the equations can thus be verified by considering only those
that successfully passed through the node
Note that specification of which packets are lost in the case
loss does occur is beyond the scope of the definition of EF PHB
However, those packets that were not lost must conform to
equations definition of EF PHB in section 2.1.
5. Implementation
A packet passing through a router will experience delay for a
of reasons. Two familiar components of this delay are the time
packet spends in a buffer at an outgoing link waiting for
scheduler to select it and the time it takes to actually transmit
packet on the outgoing line
There may be other components of a packet's delay through a router
however. A router might have to do some amount of header
before the packet can be given to the correct output scheduler,
example. In another case a router may have a FIFO buffer (called
transmission queue in [7]) where the packet sits after being
by the output scheduler but before it is transmitted. In cases
as these, the extra delay a packet may experience can be
for by absorbing it into the latency terms E_a and E_p
Implementing EF on a router with a multi-stage switch fabric
special attention. A packet may experience additional delays due
the fact that it must compete with other traffic for
resources at multiple contention points in the switch core.
delay an EF packet may experience before it even reaches the output
link scheduler should be included in the latency term. Input
buffered and input/output-buffered routers based on crossbar
may also require modification of their latency terms. The
such as the speedup factor and the choice of crossbar
algorithms may affect the latency terms substantially
Charny, et. al. Informational [Page 11]
RFC 3247 Supplemental Information March 2002
Delay in the switch core comes from two sources, both of which
be considered. The first part of this delay is the fixed delay
packet experiences regardless of the other traffic. This
of the delay includes the time it takes for things such as
segmentation and reassembly in cell based cores, enqueueing
dequeuing at each stage, and transmission between stages. The
part of the switch core delay is variable and depends on the type
amount of other traffic traversing the core. This delay comes
if the stages in the core mix traffic flowing between
input/output port pairs. Thus, EF packets must compete against
traffic for forwarding resources in the core. Some of
competing traffic may even be traffic from other, non-EF aggregates
This introduces extra delay, that can also be absorbed by the
term in the definition
To capture these considerations, in this section we will consider
simplified implementation examples. The first is an ideal
buffered node where packets entering the device from an
interface are immediately delivered to the output scheduler. In
model the properties of the output scheduler fully define the
of the parameters E_a and E_p. We will consider the case where
output scheduler implements aggregate class-based queuing, so
all EF packets share a single queue. We will discuss the values
E_a and E_p for a variety of class-based schedulers
considered
The second example will consider a router modeled as a black box
a known bound on the variable delay a packet can experience from
time it arrives to an input to the time it is delivered to
destination output. The output scheduler in isolation is assumed
be an aggregate scheduler where all EF packets share a single
queue, with a known value of E_a(S)=E_p(S)=E(S). This model
a reasonable abstraction to a large class of router implementations
5.1. The output buffered model with EF FIFO at the output
As has been mentioned earlier, in this model E_a = E_p, so we
omit the subscript and refer to both terms as latency E.
remainder of this subsection discusses E for a number of
implementations
5.1.1. Strict Non-preemptive Priority
A Strict Priority scheduler in which all EF packets share a
FIFO queue which is served at strict non-preemptive priority
other queues satisfies the EF definition with the latency term E =
MTU/C where MTU is the maximum packet size and C is the speed of
output link
Charny, et. al. Informational [Page 12]
RFC 3247 Supplemental Information March 2002
5.1.2. WF2
Another scheduler that satisfies the EF definition with a
latency term is WF2Q described in [1]. A class-based WF2Q scheduler
in which all EF traffic shares a single queue with the
corresponding to the configured rate of the EF aggregate
the EF definition with the latency term E = MTU/C+MTU/R
5.1.3. Deficit Round Robin (DRR
For DRR [12], E can be shown to grow linearly
N*(r_max/r_min)*MTU, where r_min and r_max denote the smallest
the largest rate among the rate assignments of all queues in
scheduler, and N is the number of queues in the scheduler
5.1.4. Start-Time Fair Queuing and Self-Clocked Fair
For Start-Time Fair Queuing (SFQ) [9] and Self-Clocked Fair
(SCFQ) [8] E can be shown to grow linearly with the number of
in the scheduler
5.2. Router with Internal Delay and EF FIFO at the
In this section we consider a router which is modeled as follows.
packet entering the router may experience a variable delay D_v with
known upper bound D. That is, 0<=D_v<=D. At the output all
packets share a single class queue. Class queues are scheduled by
scheduler with a known value E_p(S)=E(S) (where E(S) corresponds
the model where this scheduler is implemented in an ideal
buffered device).
The computation of E_p is more complicated in this case. For
device, it can be shown that E_p = E(S)+2D+2B/R (see [13]).
Recall from the discussion of section 3 that bounding
burstiness B may not be easy in a general topology. In the
of the knowledge of a bound on B one can bound E_p as E_p = E(S) +
D*C_inp/R (see [13]).
Note also that the bounds in this section are derived using only
bound on the variable portion of the interval delay and the
bound of the output scheduler. If more details about
architecture of a device are available, it may be possible to
better bounds
Charny, et. al. Informational [Page 13]
RFC 3247 Supplemental Information March 2002
6. Security
This informational document provides additional information to aid
understanding of the EF PHB described in [6]. It adds no
functions to it. As a result, it adds no security issues to
described in that specification
7.
[1] J.C.R. Bennett and H. Zhang, "WF2Q: Worst-case Fair
Fair Queuing", INFOCOM'96, March 1996.
[2] J.-Y. Le Boudec, P. Thiran, "Network Calculus", Springer
Lecture Notes in Computer Science volume 2050, June 2001
(available online at http://lcawww.epfl.ch).
[3] Bradner, S., "Key Words for Use in RFCs to Indicate
Levels", BCP 14, RFC 2119, March 1997.
[4] J.C.R. Bennett, K. Benson, A. Charny, W. Courtney, J.Y.
Boudec, "Delay Jitter Bounds and Packet Scale Rate
for Expedited Forwarding", Proc. Infocom 2001, April 2001.
[5] A. Charny, J.-Y. Le Boudec "Delay Bounds in a Network
Aggregate Scheduling". Proc. of QoFIS'2000, September 25-26,
2000, Berlin, Germany
[6] Davie, B., Charny, A., Baker, F., Bennett, J.C.R., Benson, K.,
Boudec, J., Chiu, A., Courtney, W., Davari, S., Firoiu, V.,
Kalmanek, C., Ramakrishnan, K.K. and D. Stiliadis, "
Expedited Forwarding PHB (Per-Hop Behavior)", RFC 3246,
2002.
[7] T. Ferrari and P. F. Chimento, "A Measurement-Based Analysis
Expedited Forwarding PHB Mechanisms," Eighth
Workshop on Quality of Service, Pittsburgh, PA, June 2000.
[8] S.J. Golestani. "A Self-clocked Fair Queuing Scheme for Broad
band Applications". In Proceedings of IEEE INFOCOM'94,
636-646, Toronto, CA, April 1994.
[9] P. Goyal, H.M. Vin, and H. Chen. "Start-time Fair Queuing:
Scheduling Algorithm for Integrated Services". In
of the ACM-SIGCOMM 96, pages 157-168, Palo Alto, CA,
1996.
[10] Jacobson, V., Nichols, K. and K. Poduri, "An
Forwarding PHB", RFC 2598, June 1999.
Charny, et. al. Informational [Page 14]
RFC 3247 Supplemental Information March 2002
[11] Jacobson, V., Nichols, K. and K. Poduri, "The 'Virtual Wire
Behavior Aggregate", Work in Progress
[12] M. Shreedhar and G. Varghese. "Efficient Fair Queuing
Deficit Round Robin". In Proceedings of SIGCOMM'95,
231-243, Boston, MA, September 1995.
[13] Le Boudec, J.-Y., Charny, A. "Packet Scale Rate Guarantee
non-FIFO Nodes", Infocom 2002, New York, June 2002.
Charny, et. al. Informational [Page 15]
RFC 3247 Supplemental Information March 2002
Appendix A. Difficulties with the RFC 2598 EF PHB
The definition of the EF PHB as given in [10] states
"The EF PHB is defined as a forwarding treatment for a
diffserv aggregate where the departure rate of the aggregate'
packets from any diffserv node must equal or exceed a
rate. The EF traffic SHOULD receive this rate independent of
intensity of any other traffic attempting to transit the node.
[the EF PHB departure rate] SHOULD average at least the
rate when measured over any time interval equal to or longer than
time it takes to send an output link MTU sized packet at
configured rate."
A literal interpretation of the definition would consider
behaviors given in the next two subsections as non-compliant.
definition also unnecessarily constrains the maximum
rate of an EF aggregate
A.1 Perfectly-Clocked
Consider the following stream forwarded from a router with EF
configured rate R=C/2, where C is the output line rate. In
illustration, E is an MTU-sized EF packet while x is a non-EF
or unused capacity, also of size MTU
E x E x E x E x E x E x...
|-----|
The interval between the vertical bars is 3*MTU/C, which is
than MTU/(C/2), and so is subject to the EF PHB definition.
this interval, 3*MTU/2 bits of the EF aggregate should be forwarded
but only MTU bits are forwarded. Therefore, while this
pattern should be considered compliant under any
interpretation of the EF PHB, it actually does not formally
with the definition of RFC 2598.
Note that this forwarding pattern can occur in any work-
scheduler in an ideal output-buffered architecture where EF
arrive in a perfectly clocked manner according to the above
and are forwarded according to exactly the same pattern in
absence of any non-EF traffic
Trivial as this example may be, it reveals the lack of
precision in the formal definition. The fact that no work-
scheduler can formally comply with the definition is unfortunate,
appears to warrant some changes to the definition that would
this problem
Charny, et. al. Informational [Page 16]
RFC 3247 Supplemental Information March 2002
The underlying reason for the problem described here is quite
- one can only expect that the EF aggregate is served at
rate in some interval where there is enough backlog of EF packets
sustain that rate. In the example above the packets come in
at the rate at which they are served, and so there is no
backlog. Certainly, if the input rate is even smaller than
configured rate of the EF aggregate, there will be no backlog
well, and a similar formal difficulty will occur
A seemingly simple solution to this difficulty might be to
that the EF aggregate is served at its configured rate only when
queue is backlogged. However, as we show in the remainder of
section, this solution does not suffice
A.2 Router Internal
We now argue that the example considered in the previous section
not as trivial as it may seem at first glance
Consider a router with EF configured rate R = C/2 as in the
example, but with an internal delay of 3T (where T = MTU/C)
the time that a packet arrives at the router and the time that it
first eligible for forwarding at the output link. Such things
header processing, route look-up, and delay in switching through
multi-layer fabric could cause this delay. Now suppose that
traffic arrives regularly at a rate of (2/3)R = C/3. The router
perform as shown below
EF Packet Number 1 2 3 4 5 6 ...
Arrival (at router) 0 3T 6T 9T 12T 15T ...
Arrival (at scheduler) 3T 6T 9T 12T 15T 18T ...
Departure 4T 7T 10T 13T 16T 19T ...
Again, the output does not satisfy the RFC 2598 definition of EF PHB
As in the previous example, the underlying reason for this problem
that the scheduler cannot forward EF traffic faster than it arrives
However, it can be easily seen that the existence of internal
causes one packet to be inside the router at all times. An
observer will rightfully conclude that the number of EF packets
arrived to the router is always at least one greater than the
of EF packets that left the router, and therefore the EF aggregate
constantly backlogged. However, while the EF aggregate
continuously backlogged, the observed output rate is
strictly less that the configured rate
Charny, et. al. Informational [Page 17]
RFC 3247 Supplemental Information March 2002
This example indicates that the simple addition of the condition
EF aggregate must receive its configured rate only when the
aggregate is backlogged does not suffice in this case
Yet, the problem described here is of fundamental importance
practice. Most routers have a certain amount of internal delay.
vendor declaring EF compliance is not expected to
declare the details of the internals of the router. Therefore,
existence of internal delay may cause a perfectly reasonable
implementation to display seemingly non-conformant behavior, which
clearly undesirable
A.3 Maximum Configurable Rate and Provisioning
It is well understood that with any non-preemptive scheduler,
RFC-2598-compliant configurable rate for an EF aggregate
exceed C/2 [11]. This is because an MTU-sized EF packet may
to an empty queue at time t just as an MTU-sized non-EF packet
service. The maximum number of EF bits that could be
during the interval [t, t + 2*MTU/C] is MTU. But if configured
R > C/2, then this interval would be of length greater than MTU/R
and more than MTU EF bits would have to be served during
interval for the router to be compliant. Thus, R must be no
than C/2.
It can be shown that for schedulers other than PQ, such as
implementations of WFQ, the maximum compliant configured rate may
much smaller than 50%. For example, for SCFQ [8] the
configured rate cannot exceed C/N, where N is the number of queues
the scheduler. For WRR, mentioned as compliant in section 2.2 of
2598, this limitation is even more severe. This is because in
schedulers a packet arriving to an empty EF queue may be forced
wait until one packet from each other queue (in the case of SCFQ)
until several packets from each other queue (in the case of WRR)
served before it will finally be forwarded
While it is frequently assumed that the configured rate of EF
will be substantially smaller than the link bandwidth,
requirement that this rate should never exceed 50% of the
bandwidth appears unnecessarily limiting. For example, in a
connected mesh network, where any flow traverses a single link on
way from source to its destination there seems no compelling
to limit the amount of EF traffic to 50% (or an even
percentage for some schedulers) of the link bandwidth
Another, perhaps even more striking example is the fact that even
TDM circuit with dedicated slots cannot be configured to forward
packets at more than 50% of the link speed without violating RFC 2598
Charny, et. al. Informational [Page 18]
RFC 3247 Supplemental Information March 2002
(unless the entire link is configured for EF). If the
rate of EF traffic is greater than 50% (but less than the
speed), there will always exist an interval longer than MTU/R
which less than the configured rate is achieved. For example
suppose the configured rate of the EF aggregate is 2C/3. Then
forwarding pattern of the TDM circuit might
E E x E E x E E x ...
|---|
where only one packet is served in the marked interval of length 2T =
2MTU/C. But at least 4/3 MTU would have to be served during
interval by a router in compliance with the definition in RFC 2598.
The fact that even a TDM line cannot be booked over 50% by EF
indicates that the restriction is artificial and unnecessary
A.4 The Non-trivial Nature of the
One possibility to correct the problems discussed in the
sections might be to attempt to clarify the definition of
intervals to which the definition applied or by averaging
multiple intervals. However, an attempt to do so meets
considerable analytical and implementation difficulties.
example, attempting to align interval start times with some epochs
the forwarded stream appears to require a certain degree of
clock synchronization and is fraught with the risk
misinterpretation and mistake in practice
Another approach might be to allow averaging of the rates over
larger time scale. However, it is unclear exactly what finite
scale would suffice in all reasonable cases. Furthermore,
approach would compromise the notion of very short-term time
guarantees that are the essence of EF PHB
We also explored a combination of two simple fixes. The first is
addition of the condition that the only intervals subject to
definition are those that fall inside a period during which the
aggregate is continuously backlogged in the router (i.e., when an
packet is in the router). The second is the addition of an
(latency) term that could serve as a figure-of-merit in
advertising of EF services
With the addition of these two changes the candidate
becomes as follows
Charny, et. al. Informational [Page 19]
RFC 3247 Supplemental Information March 2002
In any interval of time (t1, t2) in which EF traffic is
backlogged, at least R(t2 - t1 - E) bits of EF traffic must
served, where R is the configured rate for the EF aggregate and E
an implementation-specific latency term
The "continuously backlogged" condition eliminates the insufficient
packets-to-forward difficulty while the addition of the latency
of size MTU/C resolves the perfectly-clocked forwarding
(section A.1), and also removes the limitation on EF configured rate
However, neither fix (nor the two of them together) resolves
example of section A.2. To see this, recall that in the example
section A.2 the EF aggregate is continuously backlogged, but
service rate of the EF aggregate is consistently smaller than
configured rate, and therefore no finite latency term will suffice
bring the example into conformance
Appendix B. Alternative Characterization of Packet Scale Rate
The proofs of several bounds in this document can be found in [13].
These proofs use an algebraic characterization of the
definition given by (eq_1), (eq_2), and packet identity
definition given by (eq_3), (eq_4). Since this characterization
of interest on its own, we present it in this section
Theorem B1. Characterization of the aggregate definition
f_n
Consider a system where packets are numbered 1, 2, ... in order
arrival. As in the aggregate definition, call a_n the n-th
time, d_n - the n-th departure time, and l_n the size of the n-
packet to depart. Define by convention d_0=0. The
definition with rate R and latency E_a is equivalent to saying
for all n and all 0<=j<= n-1:
d_n <= E_a + d_j + (l_(j+1) + ... + l_n)/R (eq_b1)
there exists some j+1 <= k <= n such
d_n <= E_a + a_k + (l_k + ... + l_n)/R (eq_b2)
Charny, et. al. Informational [Page 20]
RFC 3247 Supplemental Information March 2002
Theorem B2. Characterization of packet-identity-aware
without F_n
Consider a system where packets are numbered 1, 2, ... in order
arrival. As in the packet-identity-aware definition, call A_n, D_
the arrival and departure times for the n-th packet, and L_n the
of this packet. Define by convention D_0=0. The packet
aware definition with rate R and latency E_p is equivalent to
that for all n and all 0<=j<= n-1:
D_n <= E_p + D_j + (L_{j+1} + ... + L_n)/R (eq_b3)
there exists some j+1 <= k <= n such
D_n <= E_p + A_k + (L_k + ... + L_n)/R (eq_b4)
For the proofs of both Theorems, see [13].
Charny, et. al. Informational [Page 21]
RFC 3247 Supplemental Information March 2002
This document could not have been written without Fred Baker,
Davie and Dimitrios Stiliadis. Their time, support and
comments were invaluable
Authors'
Anna
Cisco
300 Apollo
Chelmsford, MA 01824
EMail: acharny@cisco.
Jon
3 Highwood Drive
Tewksbury, MA 01876
EMail: jcrb@motorola.
Kent
Tellabs Research
3740 Edison Lake Parkway #101
Mishawaka, IN 46545
EMail: Kent.Benson@tellabs.
Jean-Yves Le
ICA-EPFL,
Ecublens, CH-1015
Lausanne-EPFL,
EMail: jean-yves.leboudec@epfl.
Angela
Celion
1 Sheila Drive, Suite 2
Tinton Falls, NJ 07724
EMail: angela.chiu@celion.
Charny, et. al. Informational [Page 22]
RFC 3247 Supplemental Information March 2002
Bill
Bldg. 201/3702
One Space
Redondo Beach, CA 90278
EMail: bill.courtney@trw.
Shahram
PMC-Sierra
411 Legget
Ottawa, ON K2K 3C9,
EMail: shahram_davari@pmc-sierra.
Victor
Nortel
600 Tech
Billerica, MA 01821
EMail: vfiroiu@nortelnetworks.
Charles
AT&T Labs-
180 Park Avenue, Room A113,
Florham Park
EMail: crk@research.att.
K.K.
TeraOptic Networks, Inc
686 W. Maude
Sunnyvale, CA 94086
EMail: kk@teraoptic.
Charny, et. al. Informational [Page 23]
RFC 3247 Supplemental Information March 2002
Full Copyright
Copyright (C) The Internet Society (2001). All Rights Reserved
This document and translations of it may be copied and furnished
others, and derivative works that comment on or otherwise explain
or assist in its implementation may be prepared, copied,
and distributed, in whole or in part, without restriction of
kind, provided that the above copyright notice and this paragraph
included on all such copies and derivative works. However,
document itself may not be modified in any way, such as by
the copyright notice or references to the Internet Society or
Internet organizations, except as needed for the purpose
developing Internet standards in which case the procedures
copyrights defined in the Internet Standards process must
followed, or as required to translate it into languages other
English
The limited permissions granted above are perpetual and will not
revoked by the Internet Society or its successors or assigns
This document and the information contained herein is provided on
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED,
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE
Funding for the RFC Editor function is currently provided by
Internet Society
Charny, et. al. Informational [Page 24]
if you see any problems within the linking, don't worry be happy,
this is version 0.1 of the Relevance System and you gotta expect some crappy subroutines sometimes,
just be content we did not write this in Java, which would have made this "bigger and better" HAHAHHA.
RFC documents can be found at I.E.T.F.
Relevance System Copyright © 2002 Spectrum WorldResearch
other technical nosh by ServerMasters Corporation
collaboration of BobX