NWG/RFC# 636 JDB BPC RST DCW3 MLK 23-OCT-75 22:27 30490
TIP/TENEX Reliability
RFC 636 J. Burchfiel - BBN-
B. Cosell - BBN-
NIC 30490 R. Tomlinson - BBN-
D. Walden - BBN-
10 June 1974
TIP/TENEX Reliability
During the past months we have felt strong pressure to improve the
reliability of TIP/TENEX network connections, as improvement in the
reliability of users' connections between TENEXs and TIPs would have a
major impact on the appearance of overall network reliability due to the
large number and high visibility of TENEXs and TIPs. Despite the
emphasis on TIP/TENEX interaction, all work done applies equally well to
interactions between Hosts of any type.
The remainder of this RFC gives a sketch of our plan for improving the
reliability of connections between TIPs and TENEXs. Major portions of
this plan have already been implemented (TIP version 322; TENEX version
1.32) and are now undergoing final test prior to release throughout the
network. Completion of the implementation of the plan is expected in
the next quarter.
Our plan for improving the reliability of TIP/TENEX connections is
concerned with obtaining and maintaining TIP/TENEX connections,
gracefully recovering from lost connections, and providing
messages to the user whenever the state of his connection changes.
When a TIP user attempts to open a connection to any Host, the Host may
be down. In this case it would be helpful to provide the user with
information about the extent of the Host's unavailability. To permit
this, we modified the IMP program to accept and utilize information from
a Host about when the Host will be back up and for what reason it is
down. TENEX is to be modified to supply such information before it goes
down, or through manual means, after it has gone down. When the TIP
user then attempts to connect to the down TENEX, the IMP local to the
TENEX returns the information about why and for how long TENEX will be
down. The TIP is to be modified to report this sort of information to
the user; e.g., "Host unavailable because of hardware maintenance --
expected available Tuesday at 16:30 GMT".
The TIP's logger is presently not reentrant. Thus, no single TIP user
can be allowed to tie up the logger for too long at a time; and the TIP
therefore enforces a timeout of arbitrary length (about 60 seconds) on
logger use. However, a heavily loaded Host cannot be guaranteed
to respond within 60 seconds to a TIP login request, and at present TIP
users sometimes cannot get connected to a heavily loaded TENEX. To
correct this problem, the TIP logger will be made reentrant and the
timeout on logger use will be eliminated.
One notorious soft spot in the Host/Host protocol which degrades the
reliability of connections is the Host/Host protocol
allocate mechanism. Low frequency software bugs, intermittent hardware
bugs, etc., can lead to the incremental allocates associated with a
connection getting out of synchronization. When this happens it
appears to the user as if the connection just "hung up". An
addition to the Host/Host protocol to allow connection allocates to be
resynchronized has been designed and implemented for both the TIP and
TENEX.
TENEX has a number of internal consistency checks (called "bughalts")
which occasionally cause TENEX to halt. Frequently, after diagnosis by
system personnel, TENEX can be made to proceed without loss from the
viewpoint of local users. A mechanism is being provided which allows
TENEX to proceed in this case from the point of view of TIP users of
TENEX.
The appropriate mechanism entails the following: TENEX will not drop
its ready line during a bughalt (from which TENEX can usually proceed
successfully), nor will it clear its NCP tables and abort all
connections. Instead, after a bughalt TENEX will: discard the message
it is currently receiving, as the IMP has returned an Incomplete
Transmission to the source for this message; reinitialize the interface
to the IMP; and resynchronize, on all connections possible, Host/Host
protocol allocate inconsistencies due to lost messages, RFNMs etc. The
latter is done with the same mechanism described above. This procedure
is not guaranteed to save all data -- a tiny bit may be lost -- but this
is of secondary importance to maintaining the connection over the
bughalt.
The TIP user must be kept fully informed as TENEX halts and then
continues. Therefore, the TIP has been modified to report "Host not
responding -- connection suspended" when it senses that TENEX has halted
(it does this by properly interpreting messages returned by the
destination IMP). When TENEX resumes service after proceeding from a
bughalt, the above procedure notifies the TIP that service is restored,
and the TIP has been modified to report "Service resumed" to all users
of that Host.
On the other hand, the service interruption may not be proceedable and
TENEX may have to do a total system reload and restart. In this case,
TENEX will clear its NCP connection tables and send a Host/Host protocol
reset command to all other Hosts. On receiving this reset command, the
TIP will report "Host reset -- connection closed" to all users of that
Host with suspended connections. The TIP user can then re-login to the
TENEX or to some other Host.
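The three user-visible reports described above can be read as a small
state machine kept by the TIP for each connection. The following is only
a sketch under stated assumptions: the class name, the event names and
the state names are hypothetical and do not come from the TIP
implementation; only the quoted message texts are taken from this RFC.

```python
from enum import Enum, auto


class HostEvent(Enum):
    """Hypothetical event names; the RFC does not name these events."""
    HALTED = auto()   # destination IMP indicates the Host has halted
    RESUMED = auto()  # Host proceeded from the bughalt
    RESET = auto()    # Host/Host reset command received after a reload


class TipConnection:
    """One TIP user's connection to a Host (illustrative only)."""

    def __init__(self):
        self.state = "open"

    def on_event(self, event):
        """Apply an event and return the text reported to the user, if any."""
        if event is HostEvent.HALTED and self.state == "open":
            self.state = "suspended"
            return "Host not responding -- connection suspended"
        if event is HostEvent.RESUMED and self.state == "suspended":
            self.state = "open"
            return "Service resumed"
        if event is HostEvent.RESET and self.state == "suspended":
            self.state = "closed"
            return "Host reset -- connection closed"
        # Transitions not described in the RFC are left alone in this sketch.
        return None
```

A halted Host followed by a successful proceed takes the connection from
"open" to "suspended" and back to "open"; a reload takes it from
"suspended" to "closed".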
Of course, the user may not have the patience to wait for service to
resume after a TENEX bughalt. Instead, he may unilaterally choose to
connect to some other Host, ignoring the previously suspended
connection. If TENEX is then able to proceed, its NCP will still think
its connection to the TIP is good and suitable for use. Thus, we have a
connection which the TIP thinks is closed and TENEX thinks is open, a
phenomenon known as the "half-closed connection". An ad hoc
procedure for cleanly completing the closing of such a connection has
been specified and implemented for the TIP and TENEX.
Since TENEX will maintain connections across service interruptions, a
TIP user will be required to take the security procedure of telling the
TIP to "forget" his suspended connection before abandoning his terminal.
The command @H 0 (for example) will guarantee that his connection will
not be reestablished on resumption of service. Otherwise, his job
would be left at the mercy of anyone who acquires that terminal.
An appendix follows which describes the Host/Host protocol changes made.
These changes are backward compatible (with the exception that Hosts
which have not implemented these changes will sometimes receive
unrecognizable Host/Host protocol commands which they presumably discard
without suffering harm). These protocol changes are ad hoc in nature,
but in light of their backward compatibility and potential utility,
their addition to the TIP and TENEX NCPs was okayed without (we believe)
any implication that other Hosts have to implement them (although we
would encourage their widespread implementation).
Appendix - Ad Hoc Change to Host-Host Protocol
A.1 Introduction
The current Host-Host protocol (NIC #8246) contains no provisions
for resynchronizing the status information kept at the two ends of
each connection. In particular, if either host suffers a service
interruption, or if a control message is lost or corrupted in an
interface or in the subnet, the status information at the two ends
of the connection will be inconsistent.
Since the current protocol provides no way to correct this
condition, the NCPs at the two ends stay "confused" forever. An
occasional frustrating symptom of this effect is the "lost
allocate" phenomenon, where the receiving NCP believes that it has
bit and message allocations outstanding, while the sending NCP
believes that it does not have any allocation. As a result,
information flow over that connection can never be restarted.
Use of the Host-Host RST (reset) command is inappropriate here, as
it destroys all connections between the two hosts. What is needed
is a way to resynchronize only the affected connection without
disturbing any others.
A second troublesome symptom of inconsistency in status
information is the "half-closed" connection: after a service
interruption or network partitioning, one NCP may believe that a
connection is still open, while the other believes that the
connection is closed (does not exist). When such an inconsistency
is discovered, the "open" end of the connection should be closed.
A.2 The RAR, RAS and RAP commands
To achieve resynchronization of allocation, we add the following
three commands to the host-host protocol.
          8 bits   8 bits
        -------------------
        !        !        !
   16   !  RAR   !  link  !
        !        !        !
        -------------------
        Reset Allocation by Receiver
          8 bits   8 bits
        -------------------
        !        !        !
   17   !  RAS   !  link  !
        !        !        !
        -------------------
        Reset Allocation by Sender
          8 bits   8 bits
        -------------------
        !        !        !
   20   !  RAP   !  link  !
        !        !        !
        -------------------
        Reset Allocation Please
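As the diagrams show, each of these commands is two 8-bit fields: a
command code followed by a link number. A minimal sketch of packing and
unpacking them might look as follows. The function names are
hypothetical. The codes are taken as printed in the diagrams; the
sequence 16, 17, 20 skips 18 and 19, which suggests octal notation, and
the sketch assumes octal. That reading is an assumption, not something
this RFC states.

```python
# Command codes as printed in the diagrams, ASSUMED to be octal.
OPCODES = {"RAR": 0o16, "RAS": 0o17, "RAP": 0o20}
NAMES = {code: name for name, code in OPCODES.items()}


def encode(name, link):
    """Pack a command into two bytes: command code, then link number."""
    if not 0 <= link <= 0xFF:
        raise ValueError("link must fit in 8 bits")
    return bytes([OPCODES[name], link])


def decode(data):
    """Unpack two bytes into (command name, link number)."""
    if len(data) != 2:
        raise ValueError("expected exactly two bytes")
    code, link = data
    if code not in NAMES:
        raise ValueError("unrecognized command code")
    return NAMES[code], link
```

For example, `encode("RAS", 5)` yields the two bytes 15 and 5 (decimal)
under the octal assumption, and `decode` reverses the packing.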
The RAS command is sent from the Host sending on "link" to the
Host receiving on "link". This command may be sent whenever the
sending Host desires to resynch the status information associated
with the connection (and doesn't have a message in transit through
the network). Some circumstances in which the sending Host may
choose to do this are:
1) After a timeout when there is traffic to move but no
allocation (assumes that an allocation has been lost);
2) When an inconsistent event occurs associated with that
connection (e.g. an outstanding allocation in excess of 2^32
bits or 2^16 messages);
3) After the sending host has suffered an interruption of
network service;
4) In response to a RAP (see below).
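The four circumstances listed above can be collected into a single
predicate. This is a sketch only: the function and parameter names are
hypothetical, and it always honors a RAP, whereas the RFC leaves that
choice to the sending Host.

```python
MAX_BITS = 2 ** 32      # outstanding bit allocation limit cited in the text
MAX_MESSAGES = 2 ** 16  # outstanding message allocation limit cited in the text


def should_send_ras(*, message_in_transit=False,
                    timed_out_without_allocation=False,
                    bits_outstanding=0, messages_outstanding=0,
                    service_interrupted=False, rap_received=False):
    """Decide whether the sending Host should issue an RAS on a link."""
    # An RAS may be sent only when no message is in transit on the link.
    if message_in_transit:
        return False
    inconsistent = (bits_outstanding > MAX_BITS
                    or messages_outstanding > MAX_MESSAGES)
    return (timed_out_without_allocation   # case 1
            or inconsistent                # case 2
            or service_interrupted         # case 3
            or rap_received)               # case 4 (always honored here)
```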
The RAR command is sent from the Host receiving on "link" to the
Host sending on "link" in response to an RAS. It marks the
completion of the connection resynchronization. When the RAR is
returned the connection is in the known state of having no
messages in transit in either direction and the allocations are
zero. The receiving Host may then start afresh with a new
allocation and normal message transmission can proceed. Since the
RAR may be sent ONLY in response to an RAS, there are no races in
the resynchronization. All of the initiative lies with the
sending Host.
If the receiving Host detects an anomalous situation, however,
there is no way to inform the sending Host that a
resynchronization is desirable. For this purpose, the RAP command
is provided. It constitutes a "suggestion" on the part of the
receiving Host that the sending Host resynchronize; the sending
Host is free to honor it or not as it sees fit. Since there is no
obligatory response to a RAP, the receiving Host may send them as
frequently as it chooses and no harm can occur. For example, if a
message in excess of the allocate arrives, the receiving Host
might send RAPs every few seconds until the sending Host replies,
with no fears of races if one or more RAPs pass a RAS in the
network.
A.3 Resynchronization Procedure
The resynchronization sequence below may be initiated only by the
sender, either for internally generated reasons or upon the receipt
of a RAP.
a) Sender - decision to resynchronize
1) Set state to "Wait-for-RAR" (Defer transmission of new
messages.)
2) Wait until no RFNM outstanding
3) Send RAS
4) Zero allocation
5) Ignore allocates until RAR received
6) Set state to "Open" (Resume normal message transmission
subject to flow control.)
b) Receiver - receipt of RAS
1) Send RAR
2) Zero allocation
3) Send a new allocation
When the sender is in the "Wait-for-RAR" state it is not permitted
to send new regular messages. (Note that steps 4 and 5 will
ensure this in the normal course of events.) With the return of
the RAR the pipeline contains no messages and no allocates, so the
outstanding allocation variables at both ends are forced into
agreement by setting them both to zero. The receiver will then
reconsider bit and message allocation, and send an ALL command for
any allocation it cares to do.
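The sender and receiver steps of this section can be sketched as two
small classes. This is an illustration under stated assumptions, not the
TIP or TENEX code: the class, method and attribute names are
hypothetical, the network is modelled as plain return values, and the
caller is assumed to retry `start_resync` once outstanding RFNMs have
arrived.

```python
class Sender:
    """Sending end of one link (illustrative only)."""

    def __init__(self, bits=0, messages=0):
        self.state = "Open"
        self.bits = bits
        self.messages = messages
        self.rfnms_outstanding = 0

    def start_resync(self):
        self.state = "Wait-for-RAR"         # step 1: defer new messages
        if self.rfnms_outstanding > 0:      # step 2: wait for RFNMs
            return None                     # caller retries later
        self.bits = self.messages = 0       # step 4: zero allocation
        return "RAS"                        # step 3: command to transmit

    def on_all(self, bits, messages):
        if self.state == "Wait-for-RAR":    # step 5: ignore stale allocates
            return
        self.bits += bits
        self.messages += messages

    def on_rar(self):
        if self.state == "Wait-for-RAR":
            self.state = "Open"             # step 6: resume transmission


class Receiver:
    """Receiving end of one link (illustrative only)."""

    def __init__(self, bits=0, messages=0):
        self.bits = bits
        self.messages = messages

    def on_ras(self, new_bits, new_messages):
        self.bits = self.messages = 0       # step 2: zero allocation
        self.bits, self.messages = new_bits, new_messages
        # step 1 (RAR) goes out first, then step 3 (a fresh allocation).
        return ["RAR", ("ALL", new_bits, new_messages)]
```

In a "lost allocate" situation, a `Sender(0, 0)` facing a
`Receiver(1000, 2)` ends up, after `start_resync`, `on_ras`, `on_rar`
and the new ALL, with both ends holding the same allocation.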
A.4 The Problem of Half-closed Connections
The above procedures provide a way to resynchronize a connection
after a brief lapse by a communications component, which results
in lost messages or allocates for an open connection.
A longer and more severe interruption of communication may result
from a partitioning of the subnet or from a service interruption
on one of the communicating hosts. It is undesirable to tie up
resources indefinitely under such circumstances, so the user is
provided with the option of freeing up these resources (including
himself) by unilaterally dissolving the connection. Here
"unilaterally" means sending the CLS command and closing the
connection without receiving the CLS acknowledgement. Note that
this is legal only if the subnet indicates that the destination is
dead.
When service is restored after such an interruption, the status
information at the two ends of the connection is out of
synchronization. One end believes that the connection is open,
and may proceed to use the connection. The disconnecting end
believes that the connection is closed (does not exist), and may
proceed to re-initialize communication by opening a new connection
(RTS or STR command) using the same socket pair or same link.
The resynchronization needed here is to properly close the open
end of the connection when the inconsistency is detected. We
accomplish this by specifying consistency checks and adding a new
pair of commands.
A.5 The NXS and NXR Commands
The "missing CLS" situation described above can manifest itself in
two ways. The first way involves action taken by the NCP at the
"open" end of the connection. It may continue to send regular
messages on the link of the half-closed connection, or control
messages referencing its link. The closed end should respond with
an NXS if the message referred to a non-existent transmit link
(e.g. was an ALL) or NXR if the message referred to a non-existent
receive link (e.g. a data message). On receipt of such an NXS or
NXR message, the NCP at the "open" end should close the connection
by modifying its tables (without sending any CLS command), thus
bringing both ends into agreement.
          8 bits   8 bits
        -------------------
        !        !        !
   21   !  NXR   !  link  !
        !        !        !
        -------------------
        Non-existent Receive link
          8 bits   8 bits
        -------------------
        !        !        !
   22   !  NXS   !  link  !
        !        !        !
        -------------------
        Non-existent Send link
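The behavior of the two ends described in this section can be sketched
as a pair of functions. The names, the "data"/"ALL" message labels and
the table representation (a set of `(side, link)` pairs) are all
hypothetical choices made for illustration; the RFC does not describe
how an NCP stores its tables.

```python
def closed_end_reply(kind, link, send_links, receive_links):
    """Reply chosen by the end that believes the connection is closed.

    kind is "data" for a regular message or "ALL" for an allocate.
    """
    if kind == "ALL" and link not in send_links:
        return ("NXS", link)   # allocate refers to a non-existent send link
    if kind == "data" and link not in receive_links:
        return ("NXR", link)   # data refers to a non-existent receive link
    return None                # the link exists; no complaint needed


def open_end_on_reply(table, command, link):
    """Close the half-closed connection at the "open" end.

    NXR means the peer has no receive link, so the local send side goes;
    NXS means the peer has no send link, so the local receive side goes.
    Returns the list of commands to transmit, which is empty because no
    CLS is sent.
    """
    side = "send" if command == "NXR" else "receive"
    table.discard((side, link))
    return []
```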
A.6 Consistency Checks
A second way this inconsistency can show up involves actions
initiated by the NCP at the "closed" end. It may (thinking the
connection is closed) send an STR or RTS to reopen the connection.
The NCP at the "open" end should detect the inconsistency when it
receives such an RTS or STR command, because it specifies the same
socket pair as an existing open connection, or, in the case of an
RTS, the same link. In this case, the NCP at the "open" end
should close the connection (without sending any CLS command) to
bring the two ends into agreement before responding to the
RTS/STR.
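The consistency check can be sketched as a lookup over the open
connections held for one foreign Host. This is a simplified
illustration: the function names and the dictionary layout of a
connection entry are hypothetical, and send and receive directions are
not distinguished.

```python
def stale_connections(open_connections, command, local_socket,
                      foreign_socket, link=None):
    """Find open connections that an incoming RTS/STR contradicts."""
    stale = []
    for conn in open_connections:
        same_pair = (conn["local"], conn["foreign"]) == (local_socket,
                                                         foreign_socket)
        # Only an RTS names a link, so only an RTS can clash on the link.
        same_link = (command == "RTS" and link is not None
                     and conn["link"] == link)
        if same_pair or same_link:
            stale.append(conn)
    return stale


def handle_open_request(open_connections, command, local_socket,
                        foreign_socket, link=None):
    """Silently close clashing connections before answering the request.

    Returns how many stale connections were removed; no CLS is sent.
    """
    stale = stale_connections(open_connections, command, local_socket,
                              foreign_socket, link)
    for conn in stale:
        open_connections.remove(conn)
    return len(stale)
```

After `handle_open_request` returns, the tables at the "open" end agree
with the "closed" end and the RTS or STR can be processed as an ordinary
request for a new connection.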
A.7 Conclusion
The scheme presented in Section A.2 to resynchronize allocation
has one very important property: the data stream is preserved
through the exchange. Since no data is lost, it is safe to
initiate resynchronization from either end at any time. When in
doubt, resynchronize.
The consistency checks for RTS and STR, and the NXR and NXS
commands provide the synchronization needed to complete the
closing of "half-closed" connections.
The protocol changes above