As per Relevance of the word structure, we have this rfc below:






RFC: 817



MODULARITY AND EFFICIENCY IN PROTOCOL IMPLEMENTATION

David D. Clark
MIT Laboratory for Computer Science
Computer Systems and Communications Group
July, 1982


1.  Introduction


Many protocol implementers have made the unpleasant discovery that
their packages do not run quite as fast as they had hoped. The blame
for this widely observed problem has been attributed to a variety of
causes, ranging from details in the design of the protocol to the
underlying structure of the host operating system. This RFC will
discuss some of the commonly encountered reasons why protocol
implementations seem to run slowly.


Experience suggests that one of the most important factors in
determining the performance of an implementation is the manner in which
that implementation is modularized and integrated into the host
operating system. For this reason, it is useful to discuss the question
of how an implementation is structured at the same time that we consider
how it will perform. In fact, this RFC will argue that modularity is
one of the chief villains in attempting to obtain good performance, so
that the designer is faced with a delicate and inevitable tradeoff
between good structure and good performance. Further, the single factor
which most strongly determines how well this conflict can be resolved is
not the protocol but the operating system.



2.  Efficiency Considerations


There are many aspects to efficiency. One aspect is sending data
at minimum transmission cost, which is a critical aspect of common
carrier communications, if not in local area network communications.
Another aspect is sending data at a high rate, which may not be possible
at all if the net is very slow, but which may be the one central design
constraint when taking advantage of a local net with high raw bandwidth.
The final consideration is doing the above with minimum expenditure of
computer resources. This last may be necessary to achieve high speed,
but in the case of the slow net may be important only in that the
resources used up, for example cpu cycles, are costly or otherwise
needed. It is worth pointing out that these different goals often
conflict; for example it is often possible to trade off efficient use of
the computer against efficient use of the network. Thus, there may be
no such thing as a successful general purpose protocol implementation.


The simplest measure of performance is throughput, measured in bits
per second. It is worth doing a few simple computations in order to get
a feeling for the magnitude of the problems involved. Assume that data
is being sent from one machine to another in packets of 576 bytes, the
maximum generally acceptable internet packet size. Allowing for header
overhead, this packet size permits 4288 bits in each packet. If a
useful throughput of 10,000 bits per second is desired, then a data
bearing packet must leave the sending host about every 430 milliseconds,
a little over two per second. This is clearly not difficult to achieve.
However, if one wishes to achieve 100 kilobits per second throughput,
the packet must leave the host every 43 milliseconds, and to achieve one
megabit per second, which is not at all unreasonable on a high-speed
local net, the packets must be spaced no more than 4.3 milliseconds
apart.
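
The arithmetic above can be reproduced with a short Python sketch. The
40-byte header allowance is an inference from the 4288-bit figure (576
bytes less 40 bytes is 536 bytes, or 4288 bits); the function and
constant names are illustrative only.

```python
PACKET_BYTES = 576   # maximum generally acceptable internet packet size
HEADER_BYTES = 40    # assumed: inferred from the 4288-bit payload figure


def payload_bits(packet_bytes=PACKET_BYTES, header_bytes=HEADER_BYTES):
    """Useful data bits carried by one packet."""
    return (packet_bytes - header_bytes) * 8


def packet_interval_ms(throughput_bps, packet_bytes=PACKET_BYTES):
    """Largest spacing between packets, in milliseconds, that sustains the rate."""
    return payload_bits(packet_bytes) / throughput_bps * 1000.0


for rate in (10_000, 100_000, 1_000_000):
    print(f"{rate:>9} bit/s: one packet every {packet_interval_ms(rate):.1f} ms")
```

The three rates give intervals of roughly 430, 43 and 4.3 milliseconds,
matching the figures in the text.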


These latter numbers are a slightly more alarming goal for which to
set one's sights. Many operating systems take a substantial fraction of
a millisecond just to service an interrupt. If the protocol has been
structured as a process, it is necessary to go through a process
scheduling before the protocol code can even begin to run. If any piece
of a protocol package or its data must be fetched from disk, real time
delays of between 30 and 100 milliseconds can be expected. If the
protocol must compete for cpu resources with other processes of the
system, it may be necessary to wait a scheduling quantum before the
protocol can run. Many systems have a scheduling quantum of 100
milliseconds or more. Considering these sorts of numbers, it becomes
immediately clear that the protocol must be fitted into the operating
system in a thorough and effective manner if anything like reasonable
throughput is to be achieved.


There is one obvious conclusion immediately suggested by even this
simple analysis. Except in very special circumstances, when many
packets are being processed at once, the cost of processing a packet is
dominated by factors, such as cpu scheduling, which are independent of
the packet size. This suggests two general rules which any
implementation ought to obey. First, send data in large packets.
Obviously, if processing time per packet is a constant, then throughput
will be directly proportional to the packet size. Second, never send an
unneeded packet. Unneeded packets use up just as many resources as a
packet full of data, but perform no useful function. RFC 813, "Window
and Acknowledgement Strategy in TCP", discusses one aspect of reducing
the number of packets sent per useful data byte. This document will
mention other attacks on the same problem.
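
The first rule follows from simple proportionality, which the sketch
below makes concrete. The 5 millisecond per-packet cost is a made-up
figure chosen only for illustration, and the function name is
hypothetical.

```python
def max_throughput_bps(payload_bytes, per_packet_cost_ms):
    """Upper bound on throughput when every packet costs a fixed amount of time."""
    packets_per_second = 1000.0 / per_packet_cost_ms
    return payload_bytes * 8 * packets_per_second


# Hypothetical fixed cost of 5 ms per packet (scheduling, interrupts, and so on).
for size in (1, 64, 536):
    print(f"{size:>3} data bytes per packet: {max_throughput_bps(size, 5.0):>9.0f} bit/s")
```

Under this model, halving the data carried per packet halves the
achievable throughput, whatever the fixed cost happens to be.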


The above analysis suggests that there are two main parts to the
problem of achieving good protocol performance. The first has to do
with how the protocol implementation is integrated into the host
operating system. The second has to do with how the protocol package
itself is organized internally. This document will consider each of
these topics in turn.


3.  The Protocol vs. the Operating System


There are normally three reasonable ways in which to add a protocol
to an operating system. The protocol can be in a process that is
provided by the operating system, or it can be part of the kernel of the
operating system itself, or it can be put in a separate communications
processor or front end machine. This decision is strongly influenced by
details of hardware architecture and operating system design; each of
these three approaches has its own advantages and disadvantages.


The "process" is the abstraction which most operating systems use
to provide the execution environment for user programs. A very simple
path for implementing a protocol is to obtain a process from the
operating system and implement the protocol to run in it.
Superficially, this approach has a number of advantages. Since
modifications to the kernel are not required, the job can be done by
someone who is not an expert in the kernel structure. Since it is often
impossible to find somebody who is experienced both in the structure of
the operating system and the structure of the protocol, this path, from
a management point of view, is often extremely appealing. Unfortunately,
putting a protocol in a process has a number of disadvantages, related
to both structure and performance. First, as was discussed above,
process scheduling can be a significant source of real-time delay.
There is not only the actual cost of going through the scheduler, but
the problem that the operating system may not have the right sort of
priority tools to bring the process into execution quickly whenever
there is work to be done.


Structurally, the difficulty with putting a protocol in a process
is that the protocol may be providing services, for example support of
data streams, which are normally obtained by going to special kernel
entry points. Depending on the generality of the operating system, it
may be impossible to take a program which is accustomed to reading
through a kernel entry point, and redirect it so it is reading the data
from a process. The most extreme example of this problem occurs when
implementing server telnet. In almost all systems, the device handler
for the locally attached teletypes is located inside the kernel, and
programs read and write from their teletype by making kernel calls. If
server telnet is implemented in a process, it is then necessary to take
the data streams provided by server telnet and somehow get them back
down inside the kernel so that they mimic the interface provided by
local teletypes. It is usually the case that special kernel
modification is necessary to achieve this structure, which somewhat
defeats the benefit of having removed the protocol from the kernel in
the first place.


Clearly, then, there are advantages to putting the protocol package
in the kernel. Structurally, it is reasonable to view the network as a
device, and device drivers are traditionally contained in the kernel.
Presumably, the problems associated with process scheduling can be
sidestepped, at least to a certain extent, by placing the code inside the
kernel. And it is obviously easier to make the server telnet channels
mimic the local teletype channels if they are both realized in the same
level in the kernel.


However, implementation of protocols in the kernel has its own set
of pitfalls. First, network protocols have a characteristic which is
shared by almost no other device: they require rather complex actions
to be performed as a result of a timeout. The problem with this
requirement is that the kernel often has no facility by which a program
can be brought into execution as a result of the timer event. What is
really needed, of course, is a special sort of process inside the
kernel. Most systems lack this mechanism. Failing that, the only
execution mechanism available is to run at interrupt time.


There are substantial drawbacks to implementing a protocol to run
at interrupt time. First, the actions performed may be somewhat complex
and time consuming, compared to the maximum amount of time that the
operating system is prepared to spend servicing an interrupt. Problems
can arise if interrupts are masked for too long. This is particularly
bad when running as a result of a clock interrupt, which can imply that
the clock interrupt is masked. Second, the environment provided by an
interrupt handler is usually extremely primitive compared to the
environment of a process. There are usually a variety of system
facilities which are unavailable while running in an interrupt handler.
The most important of these is the ability to suspend execution pending
the arrival of some event or message. It is a cardinal rule of almost
every known operating system that one must not invoke the scheduler
while running in an interrupt handler. Thus, the programmer who is
forced to implement all or part of his protocol package as an interrupt
handler must be the best sort of expert in the operating system
involved, and must be prepared for development sessions filled with
obscure bugs which crash not just the protocol package but the entire
operating system.


A final problem with processing at interrupt time is that the
system scheduler has no control over the percentage of system time used
by the protocol handler. If a large number of packets arrive, from a
foreign host that is either malfunctioning or fast, all of the time may
be spent in the interrupt handler, effectively killing the system.


There are other problems associated with putting protocols into an
operating system kernel. The simplest problem often encountered is that
the kernel address space is simply too small to hold the piece of code
in question. This is a rather artificial sort of problem, but it is a
severe problem nonetheless in many machines. It is an appallingly
unpleasant experience to do an implementation with the knowledge that
for every byte of new feature put in one must find some other byte of
old feature to throw out. It is hopeless to expect an effective and
general implementation under this kind of constraint. Another problem
is that the protocol package, once it is thoroughly entwined in the
operating system, may need to be redone every time the operating system
changes. If the protocol and the operating system are not maintained by
the same group, this makes maintenance of the protocol package a
perpetual headache.


The third option for protocol implementation is to take the
protocol package and move it outside the machine entirely, on to a
separate processor dedicated to this kind of task. Such a machine is
often described as a communications processor or a front-end processor.
There are several advantages to this approach. First, the operating
system on the communications processor can be tailored for precisely
this kind of task. This makes the job of implementation much easier.
Second, one does not need to redo the task for every machine to which
the protocol is to be added. It may be possible to reuse the same
front-end machine on different host computers. Since the task need not
be done as many times, one might hope that more attention could be paid
to doing it right. Given a careful implementation in an environment
which is optimized for this kind of task, the resulting package should
turn out to be very efficient. Unfortunately, there are also problems
with this approach. There is, of course, a financial problem associated
with buying an additional computer. In many cases, this is not a
problem at all since the cost is negligible compared to what the
programmer would cost to do the job in the mainframe itself. More
fundamentally, the communications processor approach does not completely
sidestep the problems raised above. The reason is that the
communications processor, since it is a separate machine, must be
attached to the mainframe by some mechanism. Whatever that mechanism,
code is required in the mainframe to deal with it. It can be argued
that the program to deal with the communications processor is simpler
than the program to implement the entire protocol package. Even if that
is so, the communications processor interface package is still a
protocol in nature, with all of the same structural problems. Thus, all
of the issues raised above must still be faced. In addition to those
problems, there are some other, more subtle problems associated with an
outboard implementation of a protocol. We will return to these problems
later.


There is a way of attaching a communications processor to a
mainframe host which sidesteps all of the mainframe implementation
problems, which is to use some preexisting interface on the host machine
as the port by which a communications processor is attached. This
strategy is often used as a last stage of desperation when the software
on the host computer is so intractable that it cannot be changed in any
way. Unfortunately, it is almost inevitably the case that all of the
available interfaces are totally unsuitable for this purpose, so the
result is unsatisfactory at best. The most common way in which this
form of attachment occurs is when a network connection is being used to
mimic local teletypes. In this case, the front-end processor can be
attached to the mainframe by simply providing a number of wires out of
the front-end processor, each corresponding to a connection, which are
plugged into teletype ports on the mainframe computer. (Because of the
appearance of the physical configuration which results from this
arrangement, Michael Padlipsky has described this as the "milking
machine" approach to computer networking.) This strategy solves the
immediate problem of providing remote access to a host, but it is
extremely inflexible. The channels being provided to the host are
restricted by the host software to one purpose only, remote login. It
is impossible to use them for any other purpose, such as file transfer
or sending mail, so the host is integrated into the network environment
in an extremely limited and inflexible manner. If this is the best that
can be done, then it should be tolerated. Otherwise, implementors
should be strongly encouraged to take a more flexible approach.


4.  Protocol Layering


The previous discussion suggested that there was a decision to be
made as to where a protocol ought to be implemented. In fact, the
decision is much more complicated than that, for the goal is not to
implement a single protocol, but to implement a whole family of protocol
layers, starting with a device driver or local network driver at the
bottom, then IP and TCP, and eventually reaching the application
specific protocol, such as Telnet, FTP and SMTP on the top. Clearly,
the bottommost of these layers is somewhere within the kernel, since the
physical device driver for the net is almost inevitably located there.
Equally clearly, the top layers of this package, which provide the user
his ability to perform the remote login function or to send mail, are
not entirely contained within the kernel. Thus, the question is not
whether the protocol family shall be inside or outside the kernel, but
how it shall be sliced in two between that part inside and that part
outside.


Since protocols come nicely layered, an obvious proposal is that
one of the layer interfaces should be the point at which the inside and
outside components are sliced apart. Most systems have been implemented
in this way, and many have been made to work quite effectively. One
obvious place to slice is at the upper interface of TCP. Since TCP
provides a bidirectional byte stream, which is somewhat similar to the
I/O facility provided by most operating systems, it is possible to make
the interface to TCP almost mimic the interface to other existing
devices. Except in the matter of opening a connection, and dealing with
peculiar failures, the software using TCP need not know that it is a
network connection, rather than a local I/O stream that is providing the
communications function. This approach does put TCP inside the kernel,
which raises all the problems addressed above. It also raises the
problem that the interface to the IP layer can, if the programmer is not
careful, become excessively buried inside the kernel. It must be
remembered that things other than TCP are expected to run on top of IP.
The IP interface must be made accessible, even if TCP sits on top of it
inside the kernel.


Another obvious place to slice is above Telnet. The advantage of
slicing above Telnet is that it solves the problem of having remote
login channels emulate local teletype channels. The disadvantage of
putting Telnet into the kernel is that the amount of code which has now
been included there is getting remarkably large. In some early
implementations, the size of the network package, when one includes
protocols at the level of Telnet, rivals the size of the rest of the
supervisor. This leads to vague feelings that all is not right.


Any attempt to slice through a lower layer boundary, for example
between internet and TCP, reveals one fundamental problem. The TCP
layer, as well as the IP layer, performs a demultiplexing function on
incoming datagrams. Until the TCP header has been examined, it is not
possible to know for which user the packet is ultimately destined.
Therefore, if TCP, as a whole, is moved outside the kernel, it is
necessary to create one separate process called the TCP process, which
performs the TCP multiplexing function, and probably all of the rest of
TCP processing as well. This means that incoming data destined for a
user process involves not just a scheduling of the user process, but
scheduling the TCP process first.


This suggests an alternative structuring strategy which slices
through the protocols, not along an established layer boundary, but
along a functional boundary having to do with demultiplexing. In this
approach, certain parts of IP and certain parts of TCP are placed in the
kernel. The amount of code placed there is sufficient so that when an
incoming datagram arrives, it is possible to know for which process that
datagram is ultimately destined. The datagram is then routed directly
to the final process, where additional IP and TCP processing is
performed on it. This removes from the kernel any requirement for timer
based actions, since they can be done by the process provided by the
user. This structure has the additional advantage of reducing the
amount of code required in the kernel, so that it is suitable for
systems where kernel space is at a premium. RFC 814, titled "Names,
Addresses, Ports, and Routes," discusses this rather orthogonal slicing
strategy in more detail.
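
A minimal Python sketch of the kernel-resident demultiplexing piece
follows. It is an illustration only, not a description of any particular
system: the class names, the dictionary-based datagram, and the
per-process queue are all assumptions made for the example. The point is
that the kernel part looks only at the addressing fields and leaves all
remaining IP and TCP processing to the owning process.

```python
from collections import deque
from typing import NamedTuple


class ConnKey(NamedTuple):
    """Fields that identify the owning connection (hypothetical layout)."""
    local_addr: str
    local_port: int
    remote_addr: str
    remote_port: int


class KernelDemux:
    """Hypothetical kernel piece: maps a datagram to its owner's queue."""

    def __init__(self):
        self._queues = {}

    def register(self, key):
        """Called when a user process opens a connection; returns its queue."""
        queue = deque()
        self._queues[key] = queue
        return queue

    def deliver(self, datagram):
        """Examine only the addressing fields, then hand the datagram over."""
        key = ConnKey(datagram["dst_addr"], datagram["dst_port"],
                      datagram["src_addr"], datagram["src_port"])
        queue = self._queues.get(key)
        if queue is None:
            return False          # no owner; a real system would reject it
        queue.append(datagram)    # remaining IP/TCP work is done by the owner
        return True
```

Timers, retransmission and acknowledgement generation would all live in
the process that drains the queue, which is what keeps the kernel part
small.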


A related discussion of protocol layering and multiplexing can be
found in Cohen and Postel [1].


5.  Breaking Down the Barriers


In fact, the implementor should be sensitive to the possibility of
even more peculiar slicing strategies in dividing up the various
protocol layers between the kernel and the one or more user processes.
The result of the strategy proposed above was that part of TCP should
execute in the process of the user. In other words, instead of having
one TCP process for the system, there is one TCP process per connection.
Given this architecture, it is no longer necessary to imagine that all
of the TCPs are identical. One TCP could be optimized for high
throughput applications, such as file transfer. Another TCP could be
optimized for small low delay applications such as Telnet. In fact, it
would be possible to produce a TCP which was somewhat integrated with
the Telnet or FTP on top of it. Such an integration is extremely
important, for it can lead to a kind of efficiency which more
traditional structures are incapable of producing. Earlier, this paper
pointed out that one of the important rules to achieving efficiency was
to send the minimum number of packets for a given amount of data. The
idea of protocol layering interacts very strongly (and poorly) with this
goal, because independent layers have independent ideas about when
packets should be sent, and unless these layers can somehow be brought
into cooperation, additional packets will flow. The best example of
this is the operation of server telnet in a character at a time remote
echo mode on top of TCP. When a packet containing a character arrives
at a server host, each layer has a different response to that packet.
TCP has an obligation to acknowledge the packet. Either server telnet
or the application layer above has an obligation to echo the character
received in the packet. If the character is a Telnet control sequence,
then Telnet has additional actions which it must perform in response to
the packet. The result of this, in most implementations, is that
several packets are sent back in response to the one arriving packet.
Combining all of these return messages into one packet is important for
several reasons. First, of course, it reduces the number of packets
being sent over the net, which directly reduces the charges incurred for
many common carrier tariff structures. Second, it reduces the number of
scheduling actions which will occur inside both hosts, which, as was
discussed above, is extremely important in improving throughput.


The way to achieve this goal of packet sharing is to break down the
barrier between the layers of the protocols, in a very restrained and
careful manner, so that a limited amount of information can leak across
the barrier to enable one layer to optimize its behavior with respect to
the desires of the layers above and below it. For example, it would
represent an improvement if TCP, when it received a packet, could ask
the layer above whether or not it would be worth pausing for a few
milliseconds before sending an acknowledgement in order to see if the
upper layer would have any outgoing data to send. Dallying before
sending the acknowledgement produces precisely the right sort of
optimization if the client of TCP is server Telnet. However, dallying
before sending an acknowledgement is absolutely unacceptable if TCP is
being used for file transfer, for in file transfer there is almost never
data flowing in the reverse direction, and the delay in sending the
acknowledgement probably translates directly into a delay in obtaining
the next packets. Thus, TCP must know a little about the layers above
it to adjust its performance as needed.
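
The idea of letting one bit of information leak across the layer
boundary can be sketched as follows. Everything here is hypothetical:
the class names, the wants_ack_delay attribute, and the tuple
representation of packets are invented for the example, and the dally
timer itself is omitted (the sketch assumes any reply is available
before the dally would expire).

```python
class EchoServer:
    """Hypothetical upper layer: server Telnet in remote-echo mode."""
    wants_ack_delay = True       # it usually has data to send right away

    def on_data(self, data):
        return data              # echo the character back


class FileReceiver:
    """Hypothetical upper layer for file transfer: nothing flows back."""
    wants_ack_delay = False

    def on_data(self, data):
        return b""


def respond(upper, data):
    """Return the packets sent in response to one arriving data packet."""
    reply = upper.on_data(data)
    if upper.wants_ack_delay and reply:
        return [("ACK+DATA", reply)]    # acknowledgement rides on the echo
    packets = [("ACK", b"")]            # acknowledge at once, never dally
    if reply:
        packets.append(("DATA", reply))
    return packets


print(respond(EchoServer(), b"a"))
print(respond(FileReceiver(), b"block"))
```

With the hint, the echo case produces one return packet instead of two,
while the file transfer case is acknowledged without delay.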


It would be possible to imagine a general purpose TCP which was
equipped with all sorts of special mechanisms by which it would query
the layer above and modify its behavior accordingly. In the structures
suggested above, in which there is not one but several TCPs, the TCP can
simply be modified so that it produces the correct behavior as a matter
of course. This structure has the disadvantage that there will be
several implementations of TCP existing on a single machine, which can
mean more maintenance headaches if a problem is found where TCP needs to
be changed. However, it is probably the case that each of the TCPs will
be substantially simpler than the general purpose TCP which would
otherwise have been built. There are some experimental projects
currently under way which suggest that this approach may make designing
of a TCP, or almost any other layer, substantially easier, so that the
total effort involved in bringing up a complete package is actually less
if this approach is followed. This approach is by no means generally
accepted, but deserves some consideration.



The general conclusion to be drawn from this sort of consideration
is that a layer boundary has both a benefit and a penalty. A visible
layer boundary, with a well specified interface, provides a form of
isolation between two layers which allows one to be changed with the
confidence that the other one will not stop working as a result.
However, a firm layer boundary almost inevitably leads to inefficient
operation. This can easily be seen by analogy with other aspects of
operating systems. Consider, for example, file systems. A typical
operating system provides a file system, which is a highly abstracted
representation of a disk. The interface is highly formalized, and
presumed to be highly stable. This makes it very easy for naive users
to have access to disks without having to write a great deal of
software. The existence of a file system is clearly beneficial. On the
other hand, it is clear that the restricted interface to a file system
almost inevitably leads to inefficiency. If the interface is organized
as a sequential read and write of bytes, then there will be people who
wish to do high throughput transfers who cannot achieve their goal. If
the interface is a virtual memory interface, then other users will
regret the necessity of building a byte stream interface on top of the
memory mapped file. The most objectionable inefficiency results when a
highly sophisticated package, such as a database management package,
must be built on top of an existing operating system. Almost
inevitably, the implementors of the database system attempt to reject
the file system and obtain direct access to the disks. They have
sacrificed modularity for efficiency.


The same conflict appears in networking, in a rather extreme form.
The concept of a protocol is still unknown and frightening to most naive
programmers. The idea that they might have to implement a protocol, or
even part of a protocol, as part of some application package, is a
dreadful thought. And thus there is great pressure to hide the function
of the net behind a very hard barrier. On the other hand, the kind of
inefficiency which results from this is a particularly undesirable sort
of inefficiency, for it shows up, among other things, in increasing the
cost of the communications resource used up to achieve the application
goal. In cases where one must pay for one's communications costs, they
usually turn out to be the dominant cost within the system. Thus, doing
an excessively good job of packaging up the protocols in an inflexible
manner has a direct impact on increasing the cost of the critical
resource within the system. This is a dilemma which will probably only
be solved when programmers become somewhat less alarmed about protocols,
so that they are willing to weave a certain amount of protocol structure
into their application program, much as application programs today weave
parts of database management systems into the structure of their
application program.


An extreme example of putting the protocol package behind a firm
layer boundary occurs when the protocol package is relegated to a
front-end processor. In this case the interface to the protocol is some
other protocol. It is difficult to imagine how to build close
cooperation between layers when they are that far separated.
Realistically, one of the prices which must be associated with an
implementation so physically modularized is that the performance will
suffer as a result. Of course, a separate processor for protocols could
be very closely integrated into the mainframe architecture, with
interprocessor co-ordination signals, shared memory, and similar
features. Such a physical modularity might work very well, but there is
little documented experience with this closely coupled architecture for
protocol support.


6.  Efficiency of Protocol Processing


To this point, this document has considered how a protocol package
should be broken into modules, and how those modules should be
distributed between free standing machines, the operating system kernel,
and one or more user processes. It is now time to consider the other
half of the efficiency question, which is what can be done to speed the
execution of those programs that actually implement the protocols. We
will make some specific observations about TCP and IP, and then conclude
with a few generalities.


IP is a simple protocol, especially with respect to the processing
of normal packets, so it should be easy to get it to perform
efficiently. The only area of any complexity related to actual packet
processing has to do with fragmentation and reassembly. The reader is
referred to RFC 815, titled "IP Datagram Reassembly Algorithms", for
specific consideration of this point.


Most costs in the IP layer come from table look up functions, as
opposed to packet processing functions. An outgoing packet requires two
translation functions to be performed. The internet address must be
translated to a target gateway, and a gateway address must be translated
to a local network number (if the host is attached to more than one
network). It is easy to build a simple implementation of these table
look up functions that in fact performs very poorly. The programmer
should keep in mind that there may be as many as a thousand network
numbers in a typical configuration. Linear searching of a thousand
entry table on every packet is extremely unsuitable. In fact, it may be
worth asking TCP to cache a hint for each connection, which can be
handed down to IP each time a packet is sent, to try to avoid the
overhead of a table look up.
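
One way the per-connection hint might look is sketched below. The class
names, the tuple returned by send, and the invalidate_route hook are all
hypothetical; the sketch also uses a hashed table rather than a linear
search, which is a separate improvement from the hint itself.

```python
class RouteTable:
    """Hypothetical IP table mapping a network number to (gateway, interface)."""

    def __init__(self, routes):
        self._routes = dict(routes)   # hashed lookup instead of a linear scan
        self.lookups = 0              # counter kept only to show the saving

    def lookup(self, network):
        self.lookups += 1
        return self._routes[network]


class Connection:
    """Hypothetical TCP connection that caches its route as a hint for IP."""

    def __init__(self, table, network):
        self._table = table
        self._network = network
        self._route_hint = None

    def send(self, payload):
        if self._route_hint is None:           # first packet: real lookup
            self._route_hint = self._table.lookup(self._network)
        gateway, interface = self._route_hint  # later packets: reuse the hint
        return (interface, gateway, payload)

    def invalidate_route(self):
        """Drop the hint, for example when the gateway appears to have failed."""
        self._route_hint = None
```

Because the cached value is only a hint, the connection needs some way
to discard it when it stops being valid; invalidate_route stands in for
that here.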


TCP is a more complex protocol, and presents many more
opportunities for getting things wrong. There is one area which is
generally accepted as causing noticeable and substantial overhead as
part of TCP processing. This is computation of the checksum. It would
be nice if this cost could be avoided somehow, but the idea of an
end-to-end checksum is absolutely central to the functioning of TCP. No
host implementor should think of omitting the validation of a checksum
on incoming data.


Various clever tricks have been used to try to minimize the cost of
computing the checksum. If it is possible to add additional microcoded
instructions to the machine, a checksum instruction is the most obvious
candidate. Since computing the checksum involves picking up every byte
of the segment and examining it, it is possible to combine the operation
of computing the checksum with the operation of copying the segment from
one location to another. Since a number of data copies are probably
already required as part of the processing structure, this kind of
sharing might conceivably pay off if it didn't cause too much trouble to
the modularity of the program. Finally, computation of the checksum
seems to be one place where careful attention to the details of the
algorithm used can make a drastic difference in the throughput of the
program. The Multics system provides one of the best case studies of
this, since Multics is about as poorly organized to perform this
function as any machine implementing TCP. Multics is a 36-bit word
machine, with four 9-bit bytes per word. The eight-bit bytes of a TCP
segment are laid down packed in memory, ignoring word boundaries. This
means that when it is necessary to pick up the data as a set of 16-bit
units for the purpose of adding them to compute checksums, horrible
masking and shifting is required for each 16-bit value. An early
version of a program using this strategy required 6 milliseconds to
checksum a 576-byte segment. Obviously, at this point, checksum
computation was becoming the central bottleneck to throughput. A more
careful recoding of this algorithm reduced the checksum processing time
to less than one millisecond. The strategy used was extremely dirty.
It involved adding up carefully selected words of the area in which the
data lay, knowing that for those particular words, the 16-bit values
were properly aligned inside the words. Only after the addition had
been done were the various sums shifted, and finally added to produce
the eventual checksum. This kind of highly specialized programming is
probably not acceptable if used everywhere within an operating system.
It is clearly appropriate for one highly localized function which can be
clearly identified as an extreme performance bottleneck.
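
The idea of combining the checksum with a copy can be illustrated in
Python. This is a sketch of the general technique, not of the Multics
code: the function names are invented, the TCP pseudo-header is left out,
and only the 16-bit one's-complement sum over the data is shown. Carries
are folded back in once at the end rather than after every addition.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of big-endian 16-bit words, complemented."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                   # fold the carries in at the end
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def copy_and_checksum(src: bytes, dst: bytearray, offset: int = 0) -> int:
    """Copy src into dst at offset and return the checksum from the same pass."""
    total = 0
    n = len(src)
    for i in range(0, n - 1, 2):
        hi, lo = src[i], src[i + 1]
        dst[offset + i] = hi             # the copy ...
        dst[offset + i + 1] = lo
        total += (hi << 8) | lo          # ... and the sum share one loop
    if n % 2:                            # trailing odd byte, padded with zero
        dst[offset + n - 1] = src[n - 1]
        total += src[n - 1] << 8
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

In an interpreted language the single loop is no faster than a library
copy followed by a separate sum; the sketch shows the structure that pays
off when each byte fetched from memory is the dominant cost.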


Another area of TCP processing which may cause performance

is the overhead of examining all of the possible flags and options

21


occur in each incoming packet. One paper, by Bunch and Day [2], asserts

that the overhead of packet header processing is actually an important

limiting factor in throughput computation. Not all measurement

experiments have tended to support this result. To whatever extent it

is true, however, there is an obvious strategy which the implementor

ought to use in designing his program. He should build his program to

optimize the expected case. It is easy, especially when first designing

a program, to pay equal attention to all of the possible outcomes of

every test. In practice, however, few of these will ever happen. A TCP

should be built on the assumption that the next packet to arrive will

have absolutely nothing special about it, and will be the next one

expected in the sequence space. One or two tests are sufficient to

determine that the expected set of control flags is on. (The ACK flag

should be on; the Push flag may or may not be on. No other flags should

be on.) One test is sufficient to determine that the sequence number of

the incoming packet is one greater than the last sequence number

received. In almost every case, that will be the actual result. Again,

using the Multics system as an example, failure to optimize the case of

receiving the expected sequence number had a detectable effect on the

performance of the system. The particular problem arose when a number

of packets arrived at once. TCP attempted to process all of these

packets before waking the user. As a result, by the time the last

packet arrived, there was a threaded list of packets which had several

items on it. When a new packet arrived, the list was searched to find

the location into which the packet should be inserted. Obviously, the

list should be searched from highest sequence number to lowest sequence

number, because one is expecting to receive a packet which comes after

those already received. By mistake, the list was searched from front to

back, starting with the packets with the lowest sequence number. The

amount of time spent searching this list backwards was easily detectable

in the metering measurements.
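
Both points in the paragraph above can be sketched briefly in Python.
This is a minimal illustration under stated assumptions, not code from
any real TCP: the Segment type and the names is_expected and
insert_from_tail are hypothetical, the list is an ordinary Python list
rather than a threaded list, and sequence number wraparound is ignored.
The flag values are the standard TCP header bit positions.

```python
from dataclasses import dataclass

# TCP control flag bits, as laid out in the TCP header.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


@dataclass
class Segment:  # hypothetical type, for illustration only
    seq: int
    flags: int
    data: bytes = b""


def is_expected(seg: Segment, rcv_nxt: int) -> bool:
    """Fast-path test: ACK on, Push ignored, nothing else, next in sequence."""
    # Masking out PSH lets a single comparison cover the whole flag check.
    return (seg.flags & ~PSH) == ACK and seg.seq == rcv_nxt


def insert_from_tail(pending: list, seg: Segment) -> int:
    """Insert seg into a list kept sorted by seq, scanning from the tail.

    Returns the number of comparisons made, so the cost is visible.
    Sequence number wraparound is ignored in this sketch.
    """
    comparisons = 0
    i = len(pending)
    while i > 0:
        comparisons += 1
        if pending[i - 1].seq <= seg.seq:
            break
        i -= 1
    pending.insert(i, seg)
    return comparisons
```

When the new packet follows those already held, which is the expected
case, the scan from the tail stops after one comparison. A scan from
the front would have to walk the whole list to reach the same place.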


Other data structures can be organized to optimize the action which

is normally taken on them. For example, the retransmission queue is

very seldom actually used for retransmission, so it should not be

organized to optimize that action. In fact, it should be organized to

optimize the discarding of things from it when the acknowledgement

arrives. In many cases, the easiest way to do this is not to save the

packet at all, but to reconstruct it only if it needs to be

retransmitted, starting from the data as it was originally buffered by

the user.
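
One way to picture a retransmission structure organized around
discarding is the following Python sketch. It is an assumption-laden
illustration, not a description of any particular implementation: the
class name SendState and its methods are hypothetical, only the user's
data is kept (no per-packet copies), and sequence number wraparound is
ignored.

```python
class SendState:  # hypothetical name, for illustration only
    """Holds unacknowledged user data; no per-packet copies are saved."""

    def __init__(self, initial_seq: int) -> None:
        self.snd_una = initial_seq   # oldest unacknowledged sequence number
        self.buffer = bytearray()    # user data from snd_una onward

    def write(self, data: bytes) -> None:
        """Buffer data handed over by the user."""
        self.buffer += data

    def on_ack(self, ack: int) -> None:
        """The common action: discard everything the ACK covers."""
        acked = ack - self.snd_una
        if 0 < acked <= len(self.buffer):
            del self.buffer[:acked]
            self.snd_una = ack
        # Stale or out-of-range ACKs are simply ignored in this sketch.

    def rebuild(self, seq: int, length: int) -> bytes:
        """The rare action: reconstruct a payload for retransmission."""
        start = seq - self.snd_una
        if start < 0:
            raise ValueError("data already acknowledged")
        return bytes(self.buffer[start:start + length])
```

Here an acknowledgement costs one subtraction and one trim of the
buffer, while the work of building a packet is paid only when a
retransmission is actually needed.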


There is another generality, at least as important as optimizing

the common case, which is to avoid copying data any more times than

necessary. One more result from the Multics TCP may prove enlightening

here. Multics takes between two and three milliseconds within the TCP

layer to process an incoming packet, depending on its size. For a

576-byte packet, the three milliseconds are used up approximately as follows.

One millisecond is used computing the checksum. Six hundred

microseconds are spent copying the data. (The data is copied twice, at

.3 milliseconds a copy.) One of those copy operations could correctly

be included as part of the checksum cost, since it is done to get the

data on a known word boundary to optimize the checksum algorithm.

However, the copy also performs another necessary transfer at the same

time. Header processing and packet resequencing take .7 milliseconds.

The rest of the time is used in miscellaneous processing, such as

removing packets from the retransmission queue which are acknowledged by

this packet. Data copying is the second most expensive single operation

after data checksumming. Some implementations, often because of an

excessively layered modularity, end up copying the data around a great

deal. Other implementations end up copying the data because there is no

shared memory between processes, and the data must be moved from process

to process via a kernel operation. Unless the amount of this activity

is kept strictly under control, it will quickly become the major

performance bottleneck.
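
As a small illustration of handing data between layers without copying
it, the following Python sketch uses the built-in memoryview type as a
stand-in for passing a pointer and a length. The function names
split_packet and copy_payload are hypothetical, and a real
implementation would of course face operating system constraints that
this sketch does not model.

```python
def split_packet(packet: bytearray, header_len: int):
    """Return (header, payload) views that share storage with packet."""
    view = memoryview(packet)
    # Slicing a memoryview creates new views, not new copies of the data.
    return view[:header_len], view[header_len:]


def copy_payload(packet: bytearray, header_len: int) -> bytes:
    """The copying alternative, shown only for contrast."""
    # Slicing a bytearray copies the bytes, and bytes() copies them again.
    return bytes(packet[header_len:])
```

A layer that receives the payload view sees the same storage that the
lower layer filled in, so a later change to the packet buffer is
visible through the view but not through the copied payload.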


7. Conclusions


This document has addressed two aspects of obtaining performance

from a protocol implementation, the way in which the protocol is layered

and integrated into the operating system, and the way in which the

detailed handling of the packet is optimized. It would be nice if one

or the other of these costs would completely dominate, so that all of

one's attention could be concentrated there. Regrettably, this is not

so. Depending on the particular sort of traffic one is getting, for

example, whether Telnet one-byte packets or file transfer maximum size

packets at maximum speed, one can expect to see one or the other cost

being the major bottleneck to throughput. Most implementors who have

studied their programs in an attempt to find out where the time was

going have reached the unsatisfactory conclusion that it is going

equally to all parts of their program. With the possible exception of

checksum processing, very few people have ever found that their

performance problems were due to a single, horrible bottleneck which

they could fix by a single stroke of inventive programming. Rather, the

performance was something which was improved by painstaking tuning of

the entire program.


Most discussions of protocols begin by introducing the concept of

layering, which tends to suggest that layering is a fundamentally

wonderful idea which should be a part of every consideration of

protocols. In fact, layering is a mixed blessing. Clearly, a layer

interface is necessary whenever more than one client of a particular

layer is to be allowed to use that same layer. But an interface,

precisely because it is fixed, inevitably leads to a lack of complete

understanding as to what one layer wishes to obtain from another. This

has to lead to inefficiency. Furthermore, layering is a potential snare

in that one is tempted to think that a layer boundary, which was an

artifact of the specification procedure, is in fact the proper boundary

to use in modularizing the implementation. Again, in certain cases, an

architected layer must correspond to an implemented layer, precisely so

that several clients can have access to that layer in a reasonably

straightforward manner. In other cases, cunning rearrangement of the

implemented module boundaries to match with various functions, such as

the demultiplexing of incoming packets, or the sending of asynchronous

outgoing packets, can lead to unexpected performance improvements

compared to more traditional implementation strategies. Finally, good

performance is something which is difficult to retrofit onto an existing

program. Since performance is influenced, not just by the fine detail,

but by the gross structure, it is sometimes the case that in order to

obtain a substantial performance improvement, it is necessary to

completely redo the program from the bottom up. This is a great

disappointment to programmers, especially those doing a protocol

implementation for the first time. Programmers who are somewhat

inexperienced and unfamiliar with protocols are sufficiently concerned

with getting their program logically correct that they do not have the

capacity to think at the same time about the performance of the

structure they are building. Only after they have achieved a logically

correct program do they discover that they have done so in a way which

has precluded real performance. Clearly, it is more difficult to design

a program thinking from the start about both logical correctness and

performance. With time, as implementors as a group learn more about the

appropriate structures to use for building protocols, it will be

possible to proceed with an implementation project having more

confidence that the structure is rational, that the program will work,

and that the program will work well. Those of us now implementing

protocols have the privilege of being on the forefront of this learning

process. It should be no surprise that our programs sometimes suffer

from the uncertainty we bring to bear on them.

Citations


[1] Cohen and Postel, "On Protocol Multiplexing", Sixth Data

Communications Symposium, ACM/IEEE, November 1979.


[2] Bunch and Day, "Control Structure Overhead in TCP", Trends and

Applications: Computer Networking, NBS Symposium, May 1980.








