Network Working Group                                          C. Weider
Request for Comments: 1913
Category: Standards Track                                     J. Fullton
                                                                S. Spero
                                                           February 1996
              Architecture of the Whois++ Index Service
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Abstract
The authors describe an architecture for indexing in distributed
databases, and apply this to the WHOIS++ protocol.
1. Purpose
The WHOIS++ directory service [Deutsch, et al, 1995] is intended to
provide a simple, extensible directory service predicated on a
template-based information model and a flexible query language. This
document describes a general architecture designed for indexing
distributed databases, and then applies that architecture to link
together many of these WHOIS++ servers into a distributed, searchable
wide area directory service.
2. Scope
This document details a distributed, easily maintained architecture
for providing a unified index to a large number of distributed
WHOIS++ servers. This architecture can be used with systems other
than WHOIS++ to provide a distributed directory service which is also
searchable.
3. Motivation and Introduction
It seems clear that with the vast amount of directory information
potentially available on the Internet, it is simply not feasible to
build a centralized directory to serve all this information. If we
are to distribute the directory service, the easiest (although not
Weider, et al Standards Track [Page 1]
RFC 1913 Architecture of the Whois++ Index Service February 1996
necessarily the best) way of building the directory service is to
build a hierarchy of directory information collection agents. In this
architecture, a directory query is delivered to a certain agent in
the tree, and then handed up or down, as appropriate, so that the
query is delivered to the agent which holds the information which
fills the query. This approach has been tried before, most notably
in some implementations of the X.500 standard. However, there are a
number of major flaws with the approach as it has been taken. This
new Index Service is designed to fix these flaws.
3.1. The search problem
One of the primary assumptions made by recent implementations of
distributed directory services is that every entry resides in some
location in a hierarchical name space. While this arrangement is
ideal for reading the entry once one knows its location, it is not as
good when one is searching for the location in the namespace of those
entries which meet some set of criteria. If the only criteria we know
about a desired entry are items which do not appear in the namespace,
we are forced to do a global query. Whenever we issue a global query
(at the root of the namespace), or a query at the top of a given
subtree in the namespace, that query is replicated to "all" subtrees
of the starting point. The replication of the query to all subtrees
is not necessarily a problem; queries are cheap. However, every
server to which the query has been replicated must process that
query, even if it has no entries which match the specified criteria.
This part of the global query processing is quite expensive. A poorly
designed namespace or a thin namespace can cause the vast majority of
queries to be replicated globally, but a very broad namespace can
cause its own navigation problems. Because of these problems, search
has been turned off at high levels of the X.500 namespace.
3.2. The location problem
With global search turned off, one must know in advance how the name
space is laid out so that one can guide a query to a proper location.
Also, the layout of the namespace then becomes critical to a user's
ability to find the desired information. Thus there are endless
battles about how to lay out the name space to best serve a given set
of users, and enormous headaches whenever it becomes apparent that
the current namespace is unsuited to the current usages and must be
changed (as recently happened in X.500). Also, assuming one does
impose multiple hierarchies on the entries through use of the
namespace, the mechanisms to maintain these multiple hierarchies in
X.500 do not exist yet, and it is possible to move entries out from
under their pointers. Also, there is as yet no agreement on how the
X.500 namespace should look even for the White Pages types of
information that is currently installed in the X.500 pilot project.
3.3. The Yellow Pages problem
Current implementations of this hierarchical architecture have also
been unsuited to solving the Yellow Pages problem; that is, the
problem of easily and flexibly building special-purpose directories
(say of molecular biologists) and of automatically maintaining these
directories once they have been built. In particular, the attributes
appropriate to the new directory must be built into the namespace
because that is the only way to segregate related entries into a
place where they can be found without a global search. Also, there is
a classification problem; how does one adequately specify the proper
categories so that people other than the creator of the directory can
find the correct subtree? Additionally, there is the problem of
actually finding the data to put into the subtree; if one must
traverse the hierarchy to find the data, we have to look globally for
the proper entries.
3.4. Solutions
The problems examined in this section can be addressed by a
combination of two new techniques: directory meshes and forward
knowledge.
4. Directory meshes and forward knowledge
We'll hold off for a moment on describing the actual architecture
used in our solution to these problems and concentrate on a high
level description of what solutions are provided by our conceptual
approach. To begin with, although every entry in WHOIS++ does indeed
have a unique identifier (resides in a specific location in the
namespace) the navigational algorithms to reach a specific entry do
not necessarily depend on the identifier the entry has been assigned.
The Index Service gets around the namespace and hierarchy problems by
creating a directory mesh on top of the entries. Each layer of the
mesh has a set of 'forward knowledge' which indicates the contents of
the various servers at the next lower layer of the mesh. Thus when a
query is received by a server in a given layer of the mesh, it can
prune the search tree and hand the query off to only those lower
level servers which have indicated that they might be able to answer
it. Thus search becomes feasible at all levels of the mesh. In the
current version of this architecture, we have chosen a certain set of
information to hand up the mesh as forward knowledge. This may or may
not be exactly the set of information required to construct a truly
searchable directory, but the protocol itself doesn't restrict the
types of information which can be handed around.
In addition, the protocols designed to maintain the forward knowledge
will also work perfectly well to provide replication of servers for
redundancy and robustness. In this case, the forward knowledge handed
around by the protocols is the entire database of entries held by the
replicated server.
Another benefit provided by the mesh of index servers is that since
the entry identification scheme has been decoupled from the
navigation service, multiple hierarchies can be built and easily
maintained on top of the existing data. Also, the user does not need
to know in advance where in the mesh the entry is contained.
Also, the Yellow Pages problem now becomes tractable, as the index
servers can pick and choose between information proffered by a given
server; because we have an architecture that allows for automatic
polling of data, special purpose directories become easy to construct
and to maintain.
5. Components of the Index Service
5.1. WHOIS++ servers
The whois++ service is described in [Deutsch, et al, 1995]. As that
service specifies only the query language, the information model, and
the server responses, whois++ services can be provided by a wide
variety of databases and directory services. However, to participate
in the Index Service, that underlying database must also be able to
generate a 'centroid', or some other type of forward knowledge, for
the data it serves.
5.2. Centroids as forward knowledge
The centroid of a server comprises a list of the templates and
attributes used by that server, and a word list for each attribute.
The word list for a given attribute contains one occurrence of every
word which appears at least once in that attribute in some record in
that server's data, and nothing else.
A word is any token delimited by blank spaces, newlines, or the '@'
character, in the value of an attribute.
For example, if a whois++ server contains exactly three records, as
follows:
Record 1 Record 2
Template: User Template:
First Name: John First Name:
Last Name: Smith Last Name:
Favourite Drink: Labatt Beer Favourite Drink: Molson
Record 3
Template:
Domain Name: foo.
Contact Name: Mike
the centroid for this server would be
Template:
First Name:
Last Name:
Favourite Drink:
Template:
Domain Name: foo.
Contact Name:
It is this information which is handed up the tree to provide forward
knowledge. As we mention above, this may not turn out to be the
ideal solution for forward knowledge, and we suspect that there may
be a number of different sets of forward knowledge used in the Index
Service. However, the directory architecture is in a very real sense
independent of what types of forward knowledge are handed around, and
it is entirely possible to build a unified directory which uses many
types of forward knowledge.
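
The centroid definition above can be illustrated with a short Python
sketch. It is only an illustration under stated assumptions: the
function name build_centroid, the dictionary representation of a
record, and the sample records are hypothetical and are not part of
the protocol. Words are split on blank spaces, newlines, and '@', as
described in this section.

```python
import re
from collections import defaultdict


def build_centroid(records):
    """Return {template: {attribute: sorted list of unique words}}.

    Each record is assumed to be a dict with a "Template" key plus
    attribute/value pairs (a hypothetical in-memory representation).
    """
    centroid = defaultdict(lambda: defaultdict(set))
    for record in records:
        template = record["Template"]
        for attribute, value in record.items():
            if attribute == "Template":
                continue
            # A word is any token delimited by blanks, newlines, or '@'.
            words = [w for w in re.split(r"[ \n@]+", value) if w]
            centroid[template][attribute].update(words)
    # One occurrence of every word per attribute, and nothing else.
    return {
        template: {attr: sorted(words) for attr, words in attrs.items()}
        for template, attrs in centroid.items()
    }


# Hypothetical sample data, not taken from the specification.
records = [
    {"Template": "User", "First Name": "John", "Last Name": "Smith",
     "Favourite Drink": "Labatt Beer"},
    {"Template": "User", "First Name": "Jane", "Last Name": "Smith",
     "Favourite Drink": "Tea"},
    {"Template": "Contact", "Email": "jane@example.org"},
]
centroid = build_centroid(records)
```

Note that "Smith" appears once in the resulting word list even though
two records contain it, which is what keeps a centroid much smaller
than the data it summarizes.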
5.3. Index servers and Index server Architecture
A whois++ index server collects and collates the centroids (or other
forward knowledge) of either a number of whois++ servers or of a
number of other index servers. An index server must be able to
generate a centroid for the information it contains. In addition, an
index server can index any other server it wishes, which allows one
base level server (or index server) to participate in many
hierarchies in the directory mesh.
5.3.1. Queries to index servers
An index server will take a query in standard whois++ format, search
its collections of centroids and other forward information, determine
which servers hold records which may fill that query, and then
notify the user's client of the next servers to contact to submit
the query (referral in the X.500 model). An index server can also
contain primary data of its own, and thus act as both an index server
and a base level server. In this case, the index server's response to
a query may be a mix of records and referral pointers.
5.3.2. Index server distribution model and centroid propagation
The diagram below illustrates how a mesh of index servers
might be created for a set of whois++ servers. Although it looks like
a hierarchy, the protocols allow (for example) server A to be indexed
by both server D and by server H.
  whois++               index                   index
  servers               servers                 servers
                        for                     for
  _______               whois++                 lower-level
 |       |              servers                 index servers
 |   A   |__
 |_______|  \            _______
             \----------|       |
  _______               |   D   |__             ______
 |       |   /----------|_______|  \           |      |
 |   B   |__/                       \----------|      |
 |_______|                                     |  F   |
                                    /----------|______|
                                   /
  _______                _______  /
 |       |              |       |-
 |   C   |--------------|   E   |
 |_______|              |_______|-
                                  \
                                   \
  _______                           \           ______
 |       |                           \---------|      |
 |   G   |-------------------------------------|  H   |
 |_______|                                     |______|
          Figure 1: Sample layout of the Index Service mesh
In the portion of the index tree shown above, whois++ servers A and B
hand their centroids up to index server D, whois++ server C hands its
centroid up to index server E, and index servers D and E hand their
centroids up to index server F. Servers E and G also hand their
centroids up to H.
The number of levels of index servers, and the number of index
servers at each level, will depend on the number of whois++ servers
deployed, and the response time of individual layers of the server
tree. These numbers will have to be determined in the field.
5.3.3. Centroid propagation and changes to centroids
Centroid propagation is initiated by an authenticated POLL command
(sec. 6.2). The format of the POLL command allows the poller to
request the centroid of any or all templates and attributes held by
the polled server. After the polled server has authenticated the
poller, it determines which of the requested centroids the poller is
allowed to request, and then issues a CENTROID-CHANGES report (sec.
6.3) to transmit the data. When the poller receives the CENTROID-
CHANGES report, it can authenticate the pollee to determine whether
to add the centroid changes to its data. Additionally, if a given
pollee knows what pollers hold centroids from the pollee, it can
signal to those pollers the fact that its centroid has changed by
issuing a DATA-CHANGED command. The poller can then determine if and
when to issue a new POLL request to get the updated information. The
DATA-CHANGED command is included in this protocol to allow
'interactive' updating of critical information.
5.3.4. Centroid propagation and mesh traversal
When an index server issues a POLL request, it may indicate to the
polled server what relationship it has to the polled server. This
information can be used to help traverse the directory mesh. Two
fields are specified in the current proposal to transmit the
relationship information, although it is expected that richer
relationship information will be shared in future revisions of this
protocol.
One field used for this information is the Hierarchy field, which can
take on three values. The first is 'topology', which indicates that
the indexing server is at a higher level in the network topology
(e.g. indexes the whole regional ISP). The second is 'geographical',
which indicates that the polling server covers a geographical area
subsuming the pollee. The third is 'administrative', which indicates
that the indexing server covers an administrative domain subsuming
the pollee.
The second field used for this information is the Description field,
which contains the DESCRIBE record of the polling server. This allows
users to obtain richer metainformation for the directory mesh,
enabling them to expand queries more effectively.
5.3.5. Query handling and passing algorithms
When an index server receives a query, it searches its collection of
centroids and determines which servers hold records which may fill
that query. As whois++ becomes widely deployed, it is expected that
some index servers may specialize in indexing certain whois++
templates or perhaps even certain fields within those templates. If
an index server obtains a match with the query "for those template
fields and attributes the server indexes", it is to be considered a
match for the purpose of forwarding the query.
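
A minimal Python sketch of this matching rule follows. It is one
possible reading, not a definitive implementation: the function name
servers_to_ask, the (attribute, word) representation of a query term,
and the server handles are all hypothetical. Terms on attributes that
the index server does not index are skipped, so only the indexed
attributes decide whether a server is referred.

```python
def servers_to_ask(query_terms, centroids, indexed_fields=None):
    """Return the handles of servers whose centroid may fill the query.

    query_terms:    list of (attribute, word) pairs, and-ed together.
    centroids:      {server_handle: {attribute: set of words}}.
    indexed_fields: attributes this index server indexes, or None if
                    it indexes everything (hypothetical parameter).
    """
    # Only terms on indexed attributes take part in the match.
    relevant = [
        (attribute, word)
        for attribute, word in query_terms
        if indexed_fields is None or attribute in indexed_fields
    ]
    referrals = []
    for handle, centroid in centroids.items():
        if all(word in centroid.get(attribute, set())
               for attribute, word in relevant):
            referrals.append(handle)
    return sorted(referrals)


# Hypothetical centroids held by an index server.
centroids = {
    "SERVER-A": {"Last Name": {"Smith"}, "First Name": {"John"}},
    "SERVER-B": {"Last Name": {"Jones"}, "First Name": {"John"}},
}
query = [("Last Name", "Smith"), ("First Name", "John")]
full_match = servers_to_ask(query, centroids)
specialized = servers_to_ask(query, centroids, indexed_fields={"First Name"})
```

A specialized index server therefore refers more broadly than a
general one: here it cannot rule out SERVER-B, because it holds no
forward knowledge about the Last Name attribute.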
5.3.5.1. Query referral
Query referral is the process of informing a client which servers to
contact next to resolve a query. The syntax for notifying a client
is outlined in section 6.5.
5.3.6 Loop control
Since there are no a priori restrictions on which servers may poll
which other servers, and since a given server may participate in many
sub-meshes, mechanisms must be installed to allow the detection of
cycles in the polling relationships. This is accomplished in the
current protocol by including a hop-count on polling relationships.
Each time a polled server generates forward information, it informs
the polling server about its current hopcount, which is the maximum
of the hopcounts of all the servers it polls, plus 1. A base level
server (one which polls no other servers) will have a hopcount of 0.
When a server decides to poll a new server, if its hopcount goes up,
then it must inform all the other servers which poll it about
its new hopcount. A maximum hopcount (8 in the current version) will
help the servers detect polling loops.
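
The hopcount rule can be sketched in Python as follows. This is a
centralized simplification for illustration only: in the protocol
each server learns the hopcounts of the servers it polls from their
reports, whereas here a single hypothetical table, polls, lists every
polling relationship. The names hopcount and PollingLoopSuspected are
likewise assumptions.

```python
MAX_HOPCOUNT = 8  # maximum hopcount in the current version


class PollingLoopSuspected(Exception):
    """Raised when a hopcount would exceed MAX_HOPCOUNT."""


def hopcount(server, polls, _depth=0):
    """Hopcount of a server: 0 for a base level server, otherwise the
    maximum hopcount of the servers it polls, plus 1.

    polls maps a server name to the list of servers it polls.
    """
    if _depth > MAX_HOPCOUNT:
        # A cycle makes the count grow without bound, so exceeding the
        # maximum is treated as evidence of a polling loop.
        raise PollingLoopSuspected(server)
    polled = polls.get(server, [])
    if not polled:
        return 0
    return 1 + max(hopcount(p, polls, _depth + 1) for p in polled)


# Polling relationships from Figure 1: D polls A and B, E polls C,
# F polls D and E, H polls E and G.
figure_1 = {"D": ["A", "B"], "E": ["C"], "F": ["D", "E"], "H": ["E", "G"]}
```

With the relationships of Figure 1, servers A, B, C and G have a
hopcount of 0, D and E have a hopcount of 1, and F and H have a
hopcount of 2.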
A second approach to loop detection is to do all the work in the
client, which would determine which new referrals have already
appeared in the referral list, and then simply iterate the referral
process until there are no new servers to ask. An algorithm to
accomplish this in WHOIS++ is detailed in [Faltstrom 95].
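
The client-side approach can be sketched as below. This is a generic
illustration of iterating referrals while remembering which servers
have already been asked; it is not the algorithm of [Faltstrom 95].
The function resolve and the callback ask, which stands in for a real
network exchange, are hypothetical.

```python
def resolve(query, start_servers, ask):
    """Follow referrals until there are no new servers to ask.

    ask(server, query) must return (records, referrals), where
    referrals is a list of further servers to contact.
    """
    asked = set()
    pending = list(start_servers)
    records = []
    while pending:
        server = pending.pop(0)
        if server in asked:
            continue  # already appeared in the referral list
        asked.add(server)
        found, referrals = ask(server, query)
        records.extend(found)
        pending.extend(r for r in referrals if r not in asked)
    return records, asked


# A hypothetical mesh containing referral cycles (D and E refer to F).
mesh = {
    "F": ([], ["D", "E"]),
    "D": ([], ["A", "F"]),
    "E": (["record-from-E"], ["F"]),
    "A": (["record-from-A"], []),
}
records, asked = resolve("smith", ["F"], lambda server, query: mesh[server])
```

Because every server is asked at most once, the loop terminates even
when the referrals form a cycle.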
6. Syntax for operations of the Index Service
The syntax for each protocol component is listed below. In addition,
each section lists which of these attributes are required and which
are optional for that component. All timestamps must
be in the format YYYYMMDDHHMM and in GMT.
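
As a small illustration of the timestamp format, the following Python
sketch converts between timezone-aware datetime values and the
YYYYMMDDHHMM form. The helper names index_timestamp and
parse_index_timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def index_timestamp(moment):
    """Format a timezone-aware datetime as YYYYMMDDHHMM in GMT."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M")


def parse_index_timestamp(text):
    """Parse a YYYYMMDDHHMM timestamp, interpreting it as GMT."""
    return datetime.strptime(text, "%Y%m%d%H%M").replace(tzinfo=timezone.utc)


# A local time one hour ahead of GMT is converted before formatting.
local = datetime(1995, 1, 28, 10, 30, tzinfo=timezone(timedelta(hours=1)))
stamp = index_timestamp(local)
```

Passing a timezone-aware value matters here: astimezone() treats a
naive datetime as local time, which may not be what the caller meant.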
6.1. Data changed syntax
The data changed template looks like this:
# DATA-CHANGED
Version-number: // version number of index service software, used to
// insure compatibility. Current value is 1.0
Time-of-latest-centroid-change: // time stamp of latest centroid
// change, GMT
Time-of-message-generation: // time when this message was generated,
// GMT
Server-handle: // IANA unique identifier for this server
Host-Name: // Host name of this server (current name)
Host-Port: // Port number of this server (current port)
Best-time-to-poll: // For heavily used servers, this will identify
// when the server is likely to be lightly
// loaded so that response to the poll will be
// speedy, GMT
Authentication-type: // Type of authentication used by server, or NONE
Authentication-data: // data for authentication
# END // This line must be used to terminate the data changed
// message
Required/optional
Version-Number
Time-of-latest-centroid-change
Time-of-message-generation
Server-handle
Host-Name
Host-Port
Best-time-to-poll
Authentication-type
Authentication-data
6.2. Polling syntax
# POLL
Version-number: // version number of poller's index software, used to
// insure compatibility
Type-of-poll: // type of forward data requested. CENTROID or QUERY
// are the only ones currently defined
Poll-scope: // Selects bounds within which data will be returned.
// See note below
Start-time: // give me all the centroid changes starting at this
// time, GMT
End-time: // ending at this time, GMT
Template: // a standard whois++ template name, or the keyword ALL,
// for a full update
Field: // used to limit centroid update information to specific
// fields, is either a specific field name, a list of field
// names, or the keyword ALL
Server-handle: // IANA unique identifier for the polling server,
// this handle may optionally be cached by the polled
// server to announce future changes
Host-Name: // Host name of the polling server
Host-Port: // Port number of the polling server
Hierarchy: // This field indicates the relationship which the poller
// bears to the pollee. Typical values might include
// 'Topology', 'Geographical', or 'Administrative'
Description: // This field contains the DESCRIBE record of the
// polling server
Authentication-type: // Type of authentication used by poller, or NONE
Authentication-data: // Data for authentication
# END // This line must be used to terminate the poll message
Note: For poll type CENTROID, the allowable values for Poll Scope are
FULL and RELATIVE. Support of the FULL value is required; this
provides a complete listing of the centroid or other forward
information. RELATIVE indicates that these are the relative changes
in the centroid since the last report to the polling server.
For poll type QUERY, the allowable values for Poll Scope are a blank
line, which indicates that all records are to be returned, or a
WHOIS++ query, which indicates that just those records which satisfy
the query are to be returned. N.B. Security considerations may
require additional authentication for successful response to the
Blank Line Poll Scope. This value has been included for server
replication.
A polling server may wish to index different types of information
than the polled server has collected. The POLLED-FOR command will
indicate which servers the polled server has contacted.
Required/Optional
Version-Number REQUIRED, value is 1.0
Type-Of-Poll REQUIRED, values CENTROID and QUERY are
Poll-scope REQUIRED If Type-of-poll is CENTROID, FULL is required
RELATIVE is
If Type-of-poll is QUERY, Blank line
required, and WHOIS++-type queries
Start-time
End-Time
Template
Field
Server-handle
Host-Name
Host-Port
Hierarchy
Description
Authentication-Type:
Authentication-data:
Example of a POLL command
# POLL
Version-number: 1.0
Type-of-poll:
Poll-scope:
Start-time: 199501281030+0100
Template:
Field:
Server-handle: BUNYIP01
Host-Name: services.bunyip.
Host-Port: 7070
Hierarchy:
# END
6.3. Centroid change report
As the centroid change report contains nested multiply-occurring
blocks, each multiply occurring block is surrounded *in this paper*
by curly braces '{', '}'. These curly braces are NOT part of the
syntax; they are for identification purposes only.
The syntax of a Data: item is either a list of words, one word per
line, or the keyword ANY.
The keyword ANY as the only item of a Data: list means that any value
for this field should be treated as a hit by the indexing server.
The field Any-field: needs more explanation than can be given in the
body of the syntax description below. It can take two values, True or
False. If the value is True, the pollee is indicating that there are
fields in this template which are not being exported to the polling
server, but which it wishes to have treated as a hit. Thus, when the
polling server gets a query which has a term requesting a field not in
this list for this template, the polling server will treat that term
as a 'hit'.
If the value is False, the pollee is indicating that there are no
other fields for this template which should be treated as a hit. This
field is required because the basic model for the WHOIS++ query
syntax requires that the results of each search term be 'and'-ed
together. This field allows polled servers to export data only for
non-sensitive fields, yet still get referrals of queries which
contain sensitive terms.
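
The interaction of Any-field with the ANY keyword and the and-ing of
search terms can be sketched in Python. The function names term_hits
and template_hits, the in-memory layout of the exported data, and the
field names are hypothetical; the sketch covers a single template.

```python
def term_hits(field, word, exported, any_field):
    """Decide whether one query term counts as a hit.

    exported maps each exported field to a set of words, or to the
    string "ANY" when the pollee sent the ANY keyword for that field.
    any_field is the boolean value of Any-field for the template.
    """
    if field not in exported:
        # Field not in the exported list: a hit only if Any-field is True.
        return any_field
    data = exported[field]
    if data == "ANY":
        return True  # any value for this field is a hit
    return word in data


def template_hits(terms, exported, any_field):
    """All terms are and-ed together, as in the WHOIS++ query model."""
    return all(term_hits(f, w, exported, any_field) for f, w in terms)


# Hypothetical export: Name is listed, Title is ANY, Phone is withheld.
exported = {"Name": {"smith"}, "Title": "ANY"}
query = [("Name", "smith"), ("Phone", "555")]
referred_when_true = template_hits(query, exported, any_field=True)
referred_when_false = template_hits(query, exported, any_field=False)
```

With Any-field set to True, the term on the withheld Phone field does
not block the referral, so the pollee still receives the query and can
evaluate the sensitive term itself.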
IMPORTANT: The data listed in the centroid must be in the ISO-8859-1
character set in this version of the indexing protocol. Use of any
other character set is a violation of the protocol. Note that the
base-level server is also specified to use ISO-8859-1 [Deutsch, et
al, 1995].
# CENTROID-CHANGES
Version-number: // version number of pollee's index software, used to
// insure compatibility
Start-time: // change list starting time, GMT
End-time: // change list ending time, GMT
Server-handle: // IANA unique identifier of the responding server
Case-sensitive: // states whether data is case sensitive or case
// insensitive. values are TRUE or FALSE
Authentication-type: // Type of authentication used by pollee, or NONE
Authentication-data: // Data for authentication
Compression-type: // Type of compression used on the data, or NONE
Size-of-compressed-data: // size of compressed data if compression
// is used
Operation: // One of 3 keywords: ADD, DELETE, FULL
// ADD - add these entries to the centroid for this server
// DELETE - delete these entries from the centroid of this
// server
// FULL - the full centroid as of end-time follows
{ // The multiply occurring template block starts here
# BEGIN TEMPLATE
Template: // a standard whois++ template name
Any-field: // TRUE or FALSE. See beginning of 6.3 for explanation.
{ // the template contains multiple field blocks
# BEGIN FIELD
Field: // a field name within that template
Data: // Either the keyword *ANY*, or
// the word list itself, one per line, cr/lf terminated,
// each line starting with a dash character ('-').
# END FIELD
} // the field ends with END FIELD
# END TEMPLATE
} // the template block ends with END TEMPLATE
# END CENTROID-CHANGES // This line must be used to terminate the
// centroid change report
For each template, all fields must be listed, or queries will not be
referred correctly.
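
How a poller might fold a received report into its stored centroid
can be sketched as follows, based on the three Operation keywords
described above. The function name apply_centroid_changes and the
nested-dictionary layout are assumptions made for illustration;
parsing of the report itself is not shown.

```python
def apply_centroid_changes(stored, operation, changes):
    """Return a new centroid after applying one CENTROID-CHANGES report.

    stored and changes map template -> field -> set of words.
    operation is one of the keywords ADD, DELETE, or FULL.
    """
    if operation not in ("ADD", "DELETE", "FULL"):
        raise ValueError(f"unknown Operation: {operation!r}")
    if operation == "FULL":
        # The report carries the full centroid, so start from nothing.
        base, operation = {}, "ADD"
    else:
        base = stored
    # Copy, so the caller's stored centroid is left untouched.
    result = {t: {f: set(ws) for f, ws in fields.items()}
              for t, fields in base.items()}
    for template, fields in changes.items():
        for field, words in fields.items():
            target = result.setdefault(template, {}).setdefault(field, set())
            if operation == "ADD":
                target |= set(words)
            else:
                target -= set(words)
    return result


# Hypothetical stored centroid and updates.
stored = {"USER": {"Name": {"smith", "jones"}}}
added = apply_centroid_changes(stored, "ADD", {"USER": {"Name": {"brown"}}})
deleted = apply_centroid_changes(added, "DELETE", {"USER": {"Name": {"jones"}}})
```

Returning a new structure rather than updating in place leaves the
previous centroid available if the report later fails authentication.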
Required/Optional
Version-number REQUIRED, value is 1.0
Start-time REQUIRED (even if the centroid type is FULL)
End-time REQUIRED (even if the centroid type is FULL)
Server-handle
Case-Sensitive
Authentication-Type
Authentication-Data
Compression-type
Size-of-compressed-data OPTIONAL (even if compression is used)
Operation OPTIONAL, if used, support for all three values is required
Tokenization-type
#BEGIN TEMPLATE
Template
Any-field
#BEGIN FIELD
Field
Data
#END FIELD
#END TEMPLATE
#END CENTROID-CHANGES
Example
# CENTROID-CHANGES
Version-number: 1.0
Start-time: 197001010000
End-time: 199503012336
Server-handle: BUNYIP01
# BEGIN TEMPLATE
Template:
Any-field:
# BEGIN FIELD
Field:
Data:
-
-
-
# END FIELD
# BEGIN FIELD
Field:
Data: paf@bunyip.
-malin.linnerborg@paf.
# END FIELD
# END TEMPLATE
# END CENTROID-CHANGES
6.4 QUERY and POLLEES responses
The response to a QUERY command is done in WHOIS++ format.
6.5. Query referral
When referrals are included in the body of a response to a query,
each referral is listed in a separate SERVER-TO-ASK block as shown
below.
# SERVER-TO-ASK
Version-number: // version number of index software, used to insure
// compatibility
Body-of-Query: // the original query goes here
Server-Handle: // WHOIS++ handle of the referred server
Host-Name: // DNS name or IP address of the referred server
Port-Number: // Port number to which to connect, if different from the
// WHOIS++ port number
# END
Required/Optional
Version-number REQUIRED, value should be 1.0
Body-of-query
Server-Handle
Host-Name
Port-Number OPTIONAL, must be used if different from port 63
Example
# SERVER-TO-ASK
Version-Number: 1.0
Server-Handle: SUNETSE01
Host-Name: sunic.sunet.
Port-Number: 63
# END
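
A client could extract referrals from such blocks along the lines of
the Python sketch below. It assumes that each block starts with a
"# SERVER-TO-ASK" line and ends with a "# END" line, and that a
missing Port-Number means port 63, as the table above indicates. The
function name, the lower-casing of field names, and the sample host
and handle are hypothetical.

```python
DEFAULT_WHOISPP_PORT = 63  # used when Port-Number is omitted


def parse_server_to_ask(text):
    """Return one dict per SERVER-TO-ASK block found in text.

    Field names are lower-cased because the specification's examples
    mix "Version-number" and "Version-Number".
    """
    referrals = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.upper().startswith("# SERVER-TO-ASK"):
            current = {}
        elif line.upper().startswith("# END") and current is not None:
            current.setdefault("port-number", str(DEFAULT_WHOISPP_PORT))
            referrals.append(current)
            current = None
        elif current is not None and ":" in line:
            # Split at the first colon only, so values may contain colons.
            name, _, value = line.partition(":")
            current[name.strip().lower()] = value.strip()
    return referrals


# Hypothetical response body containing a single referral.
response = """# SERVER-TO-ASK
 Version-Number: 1.0
 Server-Handle: EXAMPLE01
 Host-Name: whois.example.org
# END
"""
referrals = parse_server_to_ask(response)
```

Lines outside a SERVER-TO-ASK block are ignored, so the same function
can be run over a response that mixes records and referrals.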
7: Reply Codes
In addition to the reply codes listed in [Deutsch 95] for the basic
WHOIS++ client/server interaction, the following reply codes are used
in version 1.0 of this protocol.
113 Requested method not available    Unable to provide a requested
                                      compression method. Contacted
                                      server will send requested
                                      data in different format.
227 Update request acknowledged       A DATA-CHANGED transmission
                                      has been accepted and logged
                                      for further action.
503 Required attribute missing        A REQUIRED attribute is
                                      missing in an interaction.
504 Desired server unreachable        The desired server is
                                      unreachable.
505 Desired server unavailable        The desired server fails to
                                      respond to requests, but host
                                      is still reachable.
8. References
[Deutsch 95] Deutsch, et al., "Architecture of the WHOIS++ service",
RFC 1835, August 1995.
[Faltstrom 95] Faltstrom, P., et al., "How to Interact with a WHOIS++
Mesh", RFC 1914, February 1996.
9. Security Considerations
Security issues are not discussed in this memo.
10. Authors' Addresses
Chris
Bunyip Information Systems, Inc
310 St. Catherine St.
Montreal, PQ H2X 2A
Phone: +1-514-875-8611
Fax: +1-514-875-6134
EMail: clw@bunyip.
Jim
MCNC Center for
Post Office Box 12889
3021 Cornwallis
Research Triangle
North Carolina 27709-2889
Phone: 410-795-5422
Fax: 410-795-5422
EMail: fullton@cnidr.
Simon
EMail: ses@eit.