CBT is a multicast routing architecture that builds a single
tree per group which is shared by all of the group's senders
receivers. Most multicast algorithms build one multicast tree
sender (subnetwork), the tree being rooted at the sender' subnetwork. The primary advantage of the shared tree approach
that it typically offers more favourable scaling characteristics
all other multicast algorithms
CBT is progressing through the IDMR working group of the IETF.
CBT protocol is described in an accompanying document [1]. For this
and all IDMR-related documents, see http://www.cs.ucl.ac.uk/ietf/
TABLE OF
1. Background................................................... 2
2. Introduction................................................. 2
3. Source Based Tree Algorithms................................. 3
3.1 Distance-Vector MulticastAlgorithm...................... 4
3.2 Link State MulticastAlgorithm........................... 5
3.3 The Motivation for Shared Trees.......................... 5
4. CBT - The New Architecture................................... 7
4.1 Design Requirements...................................... 7
4.2 Components & Functions................................... 8
4.2.1 CBT Control Message RetransmissionStrategy........ 10
4.2.2 Non-Member Sending................................. 11
5. Interoperability with Other Multicast Routing Protocols ..... 11
Shared trees were first described by Wall in his investigation
low-delay approaches to broadcast and selectivebroadcast [3].
concluded that delay will not be minimal, as with shortest-
trees, but the delay can be kept within bounds that may
acceptable. Back then, the benefits and uses of multicast were
fully understood, and it wasn't until much later that the multicast address space was defined (class D space [4]). Deering'
work [2] in the late 1980's was pioneering in that he defined the multicast service model, and invented algorithms which allow hosts
arbitrarily join and leave a multicast group. All of Deering' multicast algorithms build source-rooted delivery trees, with delivery tree per sender subnetwork. These algorithms are
in [2].
The multicast-capable part of the Internet, the MBONE, continues
expand rapidly. The obvious popularity and growth of multicast
that the scaling aspects of wide-area multicasting cannot
overlooked; some predictions talk of thousands of groups
present at any one time in the Internet
All multicast routing protocols make use of IGMP [5], a protocol
operates between hosts and multicast router(s) belonging to the subnetwork. IGMP enables the subnet's multicast router(s) to
group membershippresence on its directly attached links, so that multicast data arrives, it knows over which of its links to send
copy of the packet
In our description of the MBONE so far, we have assumed that multicast routers on the MBONE are running the same multicast protocol. In reality, this is not the case; the MBONE is a
of autonomously administeredmulticast regions, each region
by one or more multicast-capable border routers. Each
independently chooses to run whichever multicast routing
best suits its needs, and the regions interconnect via the "
region", which currently runs the Distance Vector Multicast Protocol (DVMRP) [6]. Therefore, it follows that a region's
router(s) must interoperate with DVMRP
Different algorithms use differenttechniques for establishing distribution tree. If we classify these algorithms into source-
tree algorithms and shared tree algorithms, we'll see that different classes have considerably different
characteristics, and the characteristics of the resulting
differ too, for example, average delay. Let's look at source-
tree algorithms first
Most source-based tree multicast algorithms are often referred to
"dense-mode" algorithms; they assume that the receiver
densely populates the domain of operation, and therefore
accompanying overhead (in terms of state, bandwidth usage, and/ processing costs) is justified. Whilst this might be the case in
local environment, wide-area group membership tends to be
distributed throughout the Internet. There may be "pockets"
denseness, but if one views the global picture, wide-area groups
to be sparsely distributed
Source-based multicast trees are either built by a distance-
style algorithm, which may be implemented separately from the
routing algorithm (as is the case with DVMRP), or the multicast
may be built using the information present in the underlying
routing table (as is the case with PIM-DM [7]). The other
used for building source-based trees is the link-state algorithm ( protocolinstance being M-OSPF [8]).
The distance-vector multicastalgorithm builds a multicast
tree using a variant of the Reverse-Path Forwardingtechnique [9].
The technique basically is as follows: when a multicast receives a multicast data packet, if the packet arrives on interface used to reach the source of the packet, the packet
forwarded over all outgoing interfaces, except leaf subnets with
members attached. A "leaf" subnet is one which no router would
to reach the souce of a multicast packet. If the data packet does
arrive over the link that would be used to reach the source,
packet is discarded
This constitutes a "broadcast & prune" approach to multicast
construction; when a data packet reaches a leaf router, if
router has no membershipregistered on any of its directly
subnetworks, the router sends a prune message one hop back
the source. The receiving router then checks its leaf subnets
group membership, and checks whether it has received a prune from
of its downstream routers (downstream with respect to the source).
If so, the router itself can send a prune upstream over the
leading to the source
The sender and receiver of a prune message must cache the
group> pair being reported, for a "lifetime" which is at
granularity of minutes. Unless a router's prune information
refreshed by the receipt of a new prune for
its "lifetime" expires, that information is removed, allowing data
flow over the branch again. State that expires in this way
referred to as "soft state".
Interestingly, routers that do not lead to group members are
the state overhead incurred by prune messages. For wide-
multicasting, which potentially has to support many thousands
active groups, each of which may be sparsely distributed, technique clearly does not scale
Routers implementing a link state algorithm periodically reachabilityinformation to their directly attached neighbours,
flood this throughout the routing domain in so-called link
update packets. Deering extended the link state algorithm
multicasting by having a router additionally detect group
changes on its incident links before flooding this information
link state packets
However, the flooding (reliable broadcasting) of group information is the predominant factor preventing the link multicastalgorithm being applicable over the wide-area. The
limiting factor is the processing cost of the Dijkstra calculation
compute the shortest-path tree for each active source
3.3. The Motivation for Shared
The algorithms described in the previous sections clearly
the need for a multicastalgorithm(s) that is more scalable. CBT
designed primarily to address the topic of scalability; a shared architecture offers an improvement in scalability over source
architectures by a factor of the number of active sources (
source is usually a subnetworkaggregate). Source trees scale O(S *
G), since a distinctdelivery tree is built per active source.
trees eliminate the source (S) scaling factor; all sources use
same shared tree, and hence a shared tree scales O(G).
implication of this is that applications with many active senders
such as distributedinteractivesimulationapplications, distributed video-gaming (where most receivers are also senders),
have a significantly lesser impact on underlying multicast routing
shared trees are used
In the "back of the envelope" table below we compare the amount
state required by CBT and DVMRP for different group sizes different numbers of active sources
Shared trees also incur significantbandwidth and state
compared with source trees; firstly, the tree only spans a group'
receivers (including links/routers leading to receivers) -- there
no cost to routers/links in other parts of the network. Secondly
routers between a non-member sender and the delivery tree are
incurred any cost pertaining to multicast, and indeed, these
need not even be multicast-capable -- packets from non-member
are encapsulated and unicast to a core on the tree
b b b-----
\ | |
\ | |
b---b b------
/ \ / KEY....
/ \/
b X---b-----b X =
/ \ b = on-tree
/ \
/ \
b b------
/ \ |
/ \ |
b b
Figure 2: CBT
4. CBT - The New
4.1. Design
The CBT shared tree design was geared towards several
objectives
o scalability - the CBT designers decided not to sacrifice CBT'
O(G) scaling characteric to optimize delay using SPTs, as
PIM. This was an important design decision, and one, we think
was taken with foresight; once multicasting becomes ubiquitous
router state maintenance will be a predominant scaling factor
It is possible in some circumstances to improve/optimize
delay of shared trees by other means. For example, a broadcast
type lecture with a single sender (or limited set
infrequently changing senders) could have its core placed in locality of the sender, allowing the CBT to emulate a shortest
path tree (SPT) whilst still maintaining its O(G)
characteristic. More generally, because CBT does not
source-specific state, it is particularly suited to many applications
o robustness - source-based tree algorithms are clearly robust;
sender simply sends its data, and intervening routers "conspire
to get the data where it needs to, creating state along the way
This is the so-called "data driven" approach -- there is
set-up protocol involved
It is not as easy to achieve the same degree of robustness
shared tree algorithms; a shared tree's core router connectivity between all group members, and is thus a
point of failure. Protocol mechanisms must be present
ensure a core failure is detected quickly, and the
reconnected quickly using a replacement core router
o simplicity - the CBT protocol is relatively simple compared
most other multicast routing protocols. This simplicity can
to enhancedperformance compared to other protocols
The CBT protocol is designed to build and maintain a shared distribution tree that spans only those networks and links leading
interested receivers
To achieve this, a host first expresses its interest in joining
group by multicasting an IGMP host membership report [5] across attached link. On receiving this report, a local CBT aware
invokes the tree joining process (unless it has already)
generating a JOIN_REQUEST message, which is sent to the next hop
the path towards the group's core router (how the local
discovers which core to join is discussed in section 6). This
message must be explicitly acknowledged (JOIN_ACK) either by the
router itself, or by another router that is on the unicast
between the sending router and the core, which itself has successfully joined the tree
router that originated the join message, the new receiver can
traffic sent to the group
Loops cannot be created in a CBT tree because a) there is only
active core per group, and b) tree building/maintenance
which may lead to the creation of tree loops are avoided.
example, if a router's upstream neighbour becomes unreachable,
router immediately "flushes" all of its downstream branches,
them to individually rejoin if necessary. Transient unicast loops
not pose a threat because a new join message that loops back
itself will never get acknowledged, and thus eventually times out
The state created in routers by the sending or receiving of
JOIN_ACK is bi-directional - data can flow either way along a
"branch", and the state is group specific - it consists of the
address and a list of local interfaces over which join messages
the group have previously been acknowledged. There is no concept
"incoming" or "outgoing" interfaces, though it is necessary to
able to distinguish the upstreaminterface from any
interfaces. In CBT, these interfaces are known as the "parent"
"child" interfaces, respectively
Each router that comprises a CBT multicast tree, except the
router, is responsible for maintaining its upstream link, provided
has interested downstream receivers, i.e. the child interface list
not NULL. A child interface is one over which a member host
directly attached, or one over which a downstream on-tree router attached. This "tree maintenance" is achieved by each
router periodically sending a "keepalive" message (ECHO_REQUEST)
its upstream neighbour, i.e. its parent router on the tree.
keepalive message is sent to represent entries with the same parent
thereby improving scalability on links which are shared by
groups. On multicast capable links, a keepalive is multicast to
"all-cbt-routers" group (IANA assigned as 224.0.0.15); this has
The ECHO_REQUEST does not contain any group information;
ECHO_REPLY does, but only periodically. To maintain information between parent and child, the parent
reports, in a ECHO_REPLY, all groups for which it has state,
each of its child interfaces for those groups. This group-
echo reply is not prompted explicitly by the receipt of an
request message. A child is notified of the time to expect the
echo reply message containing group information in an echo
prompted by a child's echo request. The frequency of parent reporting is at the granularity of minutes
It cannot be assumed all of the routers on a multi-access link have
uniform view of unicast routing; this is particularly the case when
multi-access link spans two or more unicast routing domains.
could lead to multipleupstream tree branches being formed (an condition) unless steps are taken to ensure all routers on the
agree which is the upstream router for a particular group.
routers attached to a multi-access link participate in an electionmechanism that elects a single router, the designated
(DR), as the link's upstream router for all groups. Since the
might not be the link's best next-hop for a particular core router
this may result in join messages being re-directed back across
multi-access link. If this happens, the re-directed join message
unicast across the link by the DR to the best next-hop,
preventing a looping scenario. This re-direction only ever
to join messages. Whilst this is suboptimal for join messages,
are generated infrequently, multicast data never traverses a
more than once (either natively, or encapsulated).
Certain CBT control messages illicit a response of some sort. Lack response may be due to an upstream router crashing, or the loss
the original message, or its response. To detect these events,
retransmits those control messages for which it expects a response
if that response is not forthcoming within the retransmission interval, which varies depending on the type of message involved
There is an upper bound (typically 3) on the number
retransmissions of the original message before an exception
is raised
If a non-member sender's local router is already on-tree for
group being sent to, the subnet's upstream router simply forwards
data packet over all outgoing interfaces corresponding to
group's forwarding cache entry. This is in contrast to PIM-SM [18]
which must encapsulate data from a non-member sender, irrespective
whether the local router has joined the tree. This is due to PIM'
uni-directional state
If the sender's subnet is not attached to the group tree, the
DR must encapsulate the data packet and unicast it to the group'
core router, where it is decapsulated and disseminated over all
interfaces, as specified by the core's forwarding cache entry for
group. The data packet encapsulation method is IP-in-IP [14].
Routers in between a non-member sender and the group's core need
know anything about the multicast group, and indeed may even multicast-unaware. This makes CBT particulary attractive applications with non-member senders
The interoperability mechanisms for interfacing CBT with DVMRP
defined in [15].
6. Core Router
Core router discovery is by far the most controversial and
aspect of shared tree multicast architectures, particularly in
context of inter-domain multicast routing (IDMR). There have
many proposals over the past three years or so, including
core addresses in a multicast session directory like "sdr" [11],
manual placement, and the HPIM [12] approach of strictly dividing
Manual configuration of leaf routers with mappings
the other option (note: leaf routers only); this imposes a degree administrative burden - the mapping for a particular group must
coordinated across all leaf routers to ensure consistency. Hence
this method does not scale particularly well. However, it is
that "better" trees will result from this method, and it is also
only available option for inter-domain core discovery available
A subset of the domain's routers are configured to be CBT
core routers. Each candidate core router periodically (default
60 secs) advertises itself to the domain's Bootstrap Router (BSR),
using "Core Advertisement" messages. The BSR is itself
dynamically from all (or participating) routers in the domain.
domain's elected BSR collects "Core Advertisement" messages candidate core routers and periodically advertises a candidate
set (CC-set) to each other router in the domain, using
hopby-hop unicast forwarding. The BSR uses "BootstrapMessages" advertise the CC-set. Together, "Core Advertisements" and " Messages" comprise the "bootstrap" protocol
When a router receives an IGMP host membership report from one of
directly attached hosts, the local router uses a hash function on reported group address, the result of which is used as an index
the CC-set. This is how local routers discover which core to use
a particular group
Note the hash function is specifically tailored such that a
number of consecutive groups always hash to the same core
Furthermore, bootstrapmessages can carry a "group mask",
limiting a CC-set to a particular range of groups. This can
reduce traffic concentration at the core
If a BSR detects a particular core as being unreachable (it has
announced its availability within some period), it deletes relevant core from the CC-set sent in its next bootstrap message
This is how a local router discovers a group's core is unreachable
the router must re-hash for each affected group and join the new
after removing the old state. The removal of the "old" state
the sending of a QUIT_NOTIFICATIONupstream, and a FLUSH_TREE downstream
Beyond these, little published work exists on the topic of security
Special thanks goes to Paul Francis, NTT Japan, for the
brainstorming sessions that brought about this work
Clay Shields' work on OCBT [17] identified various failure
with a multi-core architecture, resulting in the specification of
single core architecture
Others that have contributed to the progress of CBT include
Carlberg, Eric Crawley, Jon Crowcroft, Mark Handley, Ahmed Helmy
Nitin Jain, Alan O'Neill, Steven Ostrowsksi, Radia Perlman,
Reeve, Benny Rodrig, Martin Tatham, Dave Thaler, Sue Thompson,
White, and other participants of the IETF IDMR working group
Thanks also to 3Com Corporation and British Telecom Plc for
this work
[17] The Ordered Core Based Tree Protocol; C. Shields and J.J
Garcia- Luna-Aceves; In Proceedings of IEEE Infocom'97, Kobe, Japan
April 1997; http://www.cse.ucsc.edu/research/ccrg/publications/info
comm97ocbt.ps.
if you see any problems within the linking, don't worry be happy,
this is version 0.1 of the Relevance System and you gotta expect some crappy subroutines sometimes,
just be content we did not write this in Java, which would have made this "bigger and better" HAHAHHA.