Network Working Group                                           R. Moats
Request for Comments: 2517                                      R. Huber
Category: Informational                                             AT&T
                                                           February 1999

Building Directories from DNS: Experiences from WWWSeeker

Status of this Memo

This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1999). All Rights Reserved.

Abstract

There has been much discussion and several documents written about
the need for an Internet Directory. Recently, this discussion has
focused on ways to discover an organization's domain name without
relying on use of DNS as a directory service. This memo discusses
lessons that were learned during InterNIC Directory and Database
Services' development and operation of WWWSeeker, an application that
finds a web site given information about the name and location of an
organization. The back end database that drives this application was
built from information obtained from domain registries via WHOIS and
other protocols. We present this information to help future
implementors avoid some of the blind alleys that we have already
explored. This work builds on the Netfind system that was created by
Mike Schwartz and his team at the University of Colorado at Boulder
[1].

1. Introduction

Over time, there have been several RFCs [2, 3, 4] about approaches
for providing Internet Directories. Many of the earlier documents
discussed white pages directories that supply mappings from a
person's name to their telephone number, email address, etc.

More recently, there has been discussion of directories that map from
a company name to a domain name or web site. Many people are using
DNS as a directory today to find this type of information about a
given company. Typically when DNS is used, users guess the domain
name of the company they are looking for and then prepend "www.".
This makes it highly desirable for a company to have an easily
Moats & Huber Informational [Page 1]
RFC 2517 Building Directories from DNS February 1999
guessable name.

There are two major problems here. As the number of assigned names
increases, it becomes more difficult to get an easily guessable name.
Also, the TLD must be guessed as well as the name. While many users
just guess ".COM" as the "default" TLD today, there are many
two-letter country code top-level domains in current use as well as
other gTLDs (.NET, .ORG, and possibly .EDU) with the prospect of
additional gTLDs in the future. As the number of TLDs in general use
increases, guessing gets more difficult.

Between July 1996 and our shutdown in March 1998, the InterNIC
Directory and Database Services project maintained the Netfind search
engine [1] and the associated database that maps organization
information to domain names. This database thus acted as the type of
Internet directory that associates company names with domain names.
We also built WWWSeeker, a system that used the Netfind database to
find web sites associated with a given organization. The experience
gained from maintaining and growing this database provides valuable
insight into the issues of providing a directory service. We present
it here to allow future implementors to avoid some of the blind
alleys that we have already explored.

2. Directory Population

2.1 What to do?

There are two issues in populating a directory: finding all the
domain names (building the skeleton) and associating those domains
with entities (adding the meat). These two issues are discussed
below.

2.2 Building the skeleton

In "building the skeleton", it is popular to suggest using a variant
of a "tree walk" to determine the domains that need to be added to
the directory. Our experience is that this is neither a reasonable
nor an efficient proposal for maintaining such a directory. Except
for some infrequent and long-standing DNS surveys [5], DNS "tree
walks" tend to be discouraged by the Internet community, especially
given that the frequency of DNS changes would require a new tree walk
monthly (if not more often). Instead, our experience has shown that
data on allocated DNS domains can usually be retrieved in bulk
fashion with FTP, HTTP, or Gopher (we have used each of these for
particular TLDs). This has the added advantage of both "building the
skeleton" and "adding the meat" at the same time. Our favorite
method for finding a server that has allocated DNS domain information
is to start with the list maintained at
http://www.alldomains.com/countryindex.html and go from there.
Before this was available, it was necessary to hunt for a server
using trial and error.

When maintaining the database, existing domains may be verified via
direct DNS lookups rather than a "tree walk." "Tree walks" should
therefore be the choice of last resort for directory population, and
bulk retrieval should be used whenever possible.
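
The verification step described above might be sketched as follows.
This is an illustration only, not the code used by the project. The
function names are hypothetical, and an address lookup through the
Python standard library stands in for a direct DNS lookup. A domain
can exist without an address record, so a real check would more likely
query NS or SOA records, which needs a resolver library outside the
standard library.

```python
import socket
from typing import Callable, Dict, Iterable


def resolves(domain: str,
             lookup: Callable[[str], object] = socket.gethostbyname) -> bool:
    """Return True if a direct lookup of `domain` succeeds.

    Assumption: an address lookup is used as a proxy for "the domain
    still exists". The lookup function can be replaced for testing.
    """
    try:
        lookup(domain)
        return True
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return False


def verify_domains(domains: Iterable[str],
                   lookup: Callable[[str], object] = socket.gethostbyname
                   ) -> Dict[str, bool]:
    """Check each known domain individually; no tree walk is involved."""
    return {domain: resolves(domain, lookup) for domain in domains}
```

Each known domain costs one query, so the load on the DNS is bounded
by the size of the existing database rather than by the size of the
name space.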

2.3 Adding the meat

A possibility for populating a directory ("adding the meat") is to
use an automated system that makes repeated queries using the WHOIS
protocol to gather information about the organization that owns a
domain. The queries would be made against a WHOIS server located
with the above method. At the conclusion of the InterNIC Directory
and Database Services project, our backend database contained about
2.9 million records built from data that could be retrieved via
WHOIS. The entire database contained 3.25 million records, with the
additional records coming from sources other than WHOIS.
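
A minimal sketch of one such automated query is given below. It is
not the tool the project used. It assumes only the basic WHOIS
exchange: open a TCP connection to port 43, send the query followed by
CRLF, and read until the server closes the connection. The function
names are hypothetical, and the "Key: value" line layout that
parse_fields() expects is an assumption, since WHOIS servers do not
share a common output format.

```python
import socket
from typing import Dict, List


def whois_query(server: str, query: str, port: int = 43,
                timeout: float = 10.0) -> str:
    """Send one WHOIS query and return the raw text of the reply."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # server closed the connection
                break
            chunks.append(data)
    # latin-1 never fails to decode, which suits replies of unknown encoding.
    return b"".join(chunks).decode("latin-1")


def parse_fields(text: str) -> Dict[str, List[str]]:
    """Collect 'Key: value' lines; a key may repeat (e.g. name servers)."""
    fields: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if line.lstrip().startswith(("%", "#")):   # common comment markers
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields
```

In practice each registry needs its own parsing rules, and the parsed
values still need the cleanup discussed in the next paragraph.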

In our experience this information contains many factual and
typographical errors and requires further examination and processing
to improve its quality. Further, TLD registrars that support WHOIS
typically only support WHOIS information for second level domains
(e.g. ne.us) as opposed to lower level domains (e.g.
windrose.omaha.ne.us). Also, there are TLDs without registrars, TLDs
without WHOIS support, and still other TLDs that use other methods
(HTTP, FTP, gopher) for providing organizational information. Based
on our experience, an implementor of an internet directory needs to
support multiple protocols for directory population. An automated
WHOIS search tool is necessary, but isn't enough.
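
One way to organize support for several protocols is a table that maps
each TLD to whatever retrieval routine works for it. The sketch below
is a hypothetical design, not a description of the InterNIC software.
The class name, the method names, and the (domain, organization) pair
returned by each routine are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A fetcher takes a TLD and yields (domain, organization) pairs. How it
# gets them (WHOIS, FTP, HTTP, Gopher) is hidden behind the function.
Fetcher = Callable[[str], Iterable[Tuple[str, str]]]


class Populator:
    """Dispatch directory population to a per-TLD retrieval routine."""

    def __init__(self) -> None:
        self._fetchers: Dict[str, Fetcher] = {}

    @staticmethod
    def _normalize(tld: str) -> str:
        return tld.lower().lstrip(".")

    def register(self, tld: str, fetcher: Fetcher) -> None:
        self._fetchers[self._normalize(tld)] = fetcher

    def populate(self, tlds: Iterable[str]
                 ) -> Tuple[Dict[str, str], List[str]]:
        """Return the gathered directory and the TLDs with no known source."""
        directory: Dict[str, str] = {}
        unsupported: List[str] = []
        for tld in tlds:
            fetcher = self._fetchers.get(self._normalize(tld))
            if fetcher is None:
                unsupported.append(tld)
                continue
            for domain, organization in fetcher(tld):
                directory[domain.lower()] = organization
        return directory, unsupported
```

Reporting the unsupported TLDs separately keeps the gaps visible, so a
source for them can be hunted down by hand.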

3. Directory Updating: Full Rebuilds vs Incremental Updates

Given the size of our database in April 1998 when it was last
generated, a complete rebuild of the part of the database that is
available from WHOIS lookups would require between 134.2 and 167.8
days just for WHOIS lookups from a Sun SPARCstation 20. This estimate
does not include other considerations (for example, inverting the
token tree required about 24 hours processing time on a Sun
SPARCstation 20) that would increase the amount of time to rebuild
the entire database.
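
The memo does not state the per-lookup time behind this estimate. As
a rough check, and assuming the estimate covers the roughly 2.9
million WHOIS-derived records mentioned in section 2.3 with lookups
made one after another, the quoted range works out to about 4 to 5
seconds per lookup. The function names below are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def rebuild_days(records: int, seconds_per_lookup: float) -> float:
    """Days needed to look up every record sequentially on one machine."""
    return records * seconds_per_lookup / SECONDS_PER_DAY


def implied_seconds_per_lookup(records: int, days: float) -> float:
    """Per-lookup time implied by a total rebuild time in days."""
    return days * SECONDS_PER_DAY / records


if __name__ == "__main__":
    records = 2_900_000  # assumption: the "about 2.9 million" WHOIS records
    for days in (134.2, 167.8):
        seconds = implied_seconds_per_lookup(records, days)
        print(f"{days} days implies about {seconds:.2f} s per lookup")
```

The same arithmetic shows why the lookup time dominates: at these
rates the 24 hours needed to invert the token tree is under one
percent of the total.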

Whether this is feasible depends on the frequency of database updates
provided. Because of the rate of growth of allocated domain names
(150K-200K new allocated domains per month in early 1998), we
provided monthly updates of the database. To rebuild the database
each month (based on the above time estimate) would require between 3
and 5 machines to be dedicated full time (independent of machine
architecture). Instead, we checkpointed the allocated domain list
and rebuilt on an incremental basis during one weekend of the month.
This allowed us to complete the update on between 1 and 4 machines (3
Sun SPARCstation 20s and a dual-processor Sparcserver 690) without
full dedication over a couple of days. Further, by coupling
incremental updates with periodic refresh of existing data (which can
be done during another part of the month and doesn't require full
dedication of machine hardware), older records would be periodically
updated when the underlying information changes. The tradeoff is
timeliness and accuracy of data (some data in the database may be
old) against hardware and processing costs.
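
The incremental approach can be sketched as a comparison of the
checkpointed domain list against the newly retrieved one, plus a
separate pass that refreshes the records checked least recently. This
is an illustration under assumptions, not the project's code: the
function names are hypothetical, domain lists are taken to be plain
strings, and the "last checked" values are taken to be comparable
numbers such as timestamps.

```python
from typing import Dict, Iterable, List, Set, Tuple


def diff_against_checkpoint(checkpoint: Iterable[str],
                            current: Iterable[str]
                            ) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) domains relative to the last checkpoint."""
    old = {d.strip().lower() for d in checkpoint if d.strip()}
    new = {d.strip().lower() for d in current if d.strip()}
    return new - old, old - new


def refresh_batch(last_checked: Dict[str, float],
                  batch_size: int) -> List[str]:
    """Pick the `batch_size` domains whose records are the stalest."""
    ordered = sorted(last_checked, key=lambda d: (last_checked[d], d))
    return ordered[:batch_size]
```

Only the added domains need new lookups during the monthly update,
while the refresh batches can be spread over the rest of the month.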

4. Directory Presentation: Distributed vs Monolithic

While a distributed directory is a desirable goal, we maintained our
database as a monolithic structure. Given past growth, it is not
clear at what point migrating to a distributed directory becomes
actually necessary to support customer queries. Our last database
contained over 3.25 million records in a flat ASCII file. Searching
was done by a PERL script over an inverted tree (itself produced by a
PERL script). While admittedly primitive, this configuration
supported over 200,000 database queries per month from our production
servers.
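
The memo does not describe the layout of the inverted tree. The
sketch below uses a plain inverted index (token to record numbers) as
a stand-in to show the general idea of searching a flat file of
records by keyword. The function names, the tokenizing rule, and the
sample records in the usage note are assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def build_index(records: List[str]) -> Dict[str, Set[int]]:
    """Map each token to the numbers of the records that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for number, record in enumerate(records):
        for token in tokenize(record):
            index[token].add(number)
    return index


def search(index: Dict[str, Set[int]], records: List[str],
           query: str) -> List[str]:
    """Return the records that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return [records[i] for i in sorted(hits)]
```

With made-up records such as "example.com Example Widgets Omaha
Nebraska", a query for "example omaha" returns only the records that
contain both tokens. The index is built once per database update and
then only read, which suits a monolithic, periodically rebuilt
database.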

Increasing the database size only requires more disk space to hold
the database and inverted tree. Of course, using database technology
would probably improve performance and scalability, but we had not
reached the point where this technology was required.

5. Security

The underlying data for the type of directory discussed in this
document is already generally available through WHOIS, DNS, and other
standard interfaces. No new information is made available by using
these techniques though many types of search become much easier. To
the extent that easier access to this data makes it easier to find
specific sites or machines to attack, security may be decreased.

The protocols discussed here do not have built-in security features.
If one source machine is spoofed while the directory data is being
gathered, substantial amounts of incorrect and misleading data could
be pulled in to the directory and be spread to a wider audience.

In general, building a directory from registry data will not open any
new security holes since the data is already available to the public.
Existing security and accuracy problems with the data sources are
likely to be amplified.

6. Acknowledgments

The work described in this document was partially supported by the
National Science Foundation under Cooperative Agreement NCR-9218179.

7. References

[1] Schwartz, M. F. and C. Pu, "Applying an Information Gathering
    Architecture to Netfind: A White Pages Tool for a Changing and
    Growing Internet", University of Colorado Technical Report
    CU-CS-656-93, December 1993, revised July 1994.
    URL:ftp://ftp.cs.colorado.edu/pub/cs/techreports/schwartz/

[2] Sollins, K., "Plan for Internet Directory Services", RFC 1107,
    July 1989.

[3] Hardcastle-Kille, S., Huizer, E., Cerf, V., Hobby, R. and S.
    Kent, "A Strategic Plan for Deploying an Internet X.500 Directory
    Service", RFC 1430, February 1993.

[4] Postel, J. and C. Anderson, "White Pages Meeting Report", RFC
    1588, February 1994.

[5] Lottor, M., "Network Wizards Internet Domain Survey", available
    from http://www.nw.com/zone/WWW/top.

8. Authors' Addresses

Ryan Moats
AT&T
15621 Drexel
Omaha, NE 68135-2358

EMail: jayhawk@att.

Rick Huber
AT&T
Room C3-3B30, 200 Laurel Ave.
Middletown, NJ 07748

EMail: rvh@att.

9. Full Copyright Statement

Copyright (C) The Internet Society (1999). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.