On the relevance of the word "specific", here is the RFC:

Network Working Group T. Berners-Lee
Request for Comments: 1630
Category: Informational June 1994


Universal Resource Identifiers in WWW

A Unifying Syntax for the Expression of
Names and Addresses of Objects on the Network
as used in the World-Wide Web

Status of this Memo

This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.

IESG Note

Note that the work contained in this memo does not describe an
Internet standard. An Internet standard for general Resource
Identifiers is under development within the IETF.

Introduction

This document defines the syntax used by the World-Wide Web
initiative to encode the names and addresses of objects on the
Internet. The web is considered to include objects accessed using an
extendable number of protocols, existing, invented for the web
itself, or to be invented in the future. Access instructions for an
individual object under a given protocol are encoded into forms of
address string. Other protocols allow the use of object names of
various forms. In order to abstract the idea of a generic object,
the web needs the concepts of the universal set of objects, and of
the universal set of names or addresses of objects.

A Universal Resource Identifier (URI) is a member of this universal
set of names in registered name spaces and addresses referring to
registered protocols or name spaces. A Uniform Resource Locator
(URL), defined elsewhere, is a form of URI which expresses an address
which maps onto an access algorithm using network protocols. The
URI schemes which correspond to the (still mutating) concept of
URLs are listed here. The Uniform Resource Name (URN) debate attempts
to define a name space (and presumably resolution protocols) for
persistent object names. This area is not addressed by this document,
which is written in order to document existing practice and provide a
reference point for URL and URN discussions.




Berners-Lee [Page 1]

RFC 1630 URIs in WWW June 1994


The world-wide web protocols are discussed on the mailing list
www-talk-request@info.cern.ch, and the newsgroup comp.infosystems.www
is preferable for beginner's questions. The mailing list
uri-request@bunyip.com has discussion related particularly to the URI
issue. The author may be contacted as timbl@info.cern.ch.

This document is available in hypertext form at:

http://info.cern.ch/hypertext/WWW/Addressing/URL/URI_Overview.

The Need For a Universal Syntax

This section describes the concept of the URI and does not form part
of the specification.

Many protocols and systems for document search and retrieval are
currently in use, and many more protocols or refinements of existing
protocols are to be expected in a field whose expansion is explosive.

These systems are aiming to achieve global search and readership of
documents across differing computing platforms, and despite a
plethora of protocols and data formats. As protocols evolve,
gateways can allow global access to remain possible. As data formats
evolve, format conversion programs can preserve global access. There
is one area, however, in which it is impractical to make conversions,
and that is in the names and addresses used to identify objects.
This is because names and addresses of objects are passed on in many
ways, from the backs of envelopes to hypertext objects, and may have
a long life.

A common feature of almost all the data models of past and proposed
systems is something which can be mapped onto a concept of "object"
and some kind of name, address, or identifier for that object. One
can therefore define a set of name spaces in which these objects can
be said to exist.

Practical systems need to access and mix objects which are part of
different existing and proposed systems. Therefore, the concept of
the universal set of all objects, and hence the universal set of
names and addresses, in all name spaces, becomes important. This
allows names in different spaces to be treated in a common way, even
though names in different spaces have differing characteristics, as
do the objects to which they refer.

URIs

This document defines a way to encapsulate a name in any
registered name space, and label it with the name space,
producing a member of the universal set. Such an encoded,
labelled member of this set is known as a Universal Resource
Identifier, or URI.

The universal syntax allows access to objects available using
existing protocols, and may be extended with technology.

The specification of the URI syntax does not imply anything about
the properties of names and addresses in the various name spaces
which are mapped onto the set of URI strings. The properties
follow from the specifications of the protocols and the
usage conventions for each scheme.

URLs

For existing Internet access protocols, it is necessary in most
cases to define the encoding of the access algorithm into
something concise enough to be termed an address. URIs which refer
to objects accessed with existing protocols are known as "Uniform
Resource Locators" (URLs) and are listed here as used in WWW, but
are to be formally defined in a separate document.

URNs

There is currently a drive to define a space of more persistent
names than any URLs. These "Uniform Resource Names" are the
subject of an IETF working group's discussions. (See Sollins and
Masinter, Functional Specifications for URNs, circulated
informally.)

The URI syntax and URL forms have been in widespread use by
World-Wide Web software since 1990.



Design Criteria and Choices

This section is not part of the specification: it is simply an
explanation of the way in which the specification was derived.

Design criteria

The syntax was designed to be:

Extensible   New naming schemes may be added later.

Complete     It is possible to encode any naming scheme.

Printable    It is possible to express any URI using
             7-bit ASCII characters so that URIs may,
             if necessary, be passed using pen and ink.

Choices for a universal syntax

For the syntax itself there is little choice except for the order
and punctuation of the elements, and the acceptable characters and
escaping rules.

The extensibility requirement is met by allowing an arbitrary (but
registered) string to be used as a prefix. A prefix is chosen
because left-to-right parsing is more common than right-to-left. The
choice of a colon as separator of the prefix from the rest of the
URI was arbitrary.

The decoding of the rest of the string is defined as a function of
the prefix. New prefixes are introduced for new schemes as
necessary, in agreement with the registration authority. The
registration of a new scheme clearly requires the definition of
the decoding of the URI into a given name space, and a definition
of the properties and, where applicable, resolution protocols, for
the name space.

The completeness requirement is easily met by allowing
particularly strange or plain binary names to be encoded in base
16 or 64 using the acceptable characters.

The printability requirement could have been met by requiring
schemes to encode characters not part of a basic set. This led to
many discussions of what the basic set should be. A difficult
case, for example, is when an ISO Latin-1 string appears in a URL,
and within an application with ISO Latin-1 capability, it can be
handled intact. However, for transport in general, the non-ASCII
characters need to be escaped.

The solution to this was to specify a safe set of characters, and
a general escaping scheme which may be used for encoding "unsafe"
characters. This "safe" set is suitable, for example, for use in
electronic mail. This is the canonical form of a URI.

The choice of escape character for introducing representations of
non-allowed characters also tends to be a matter of taste. An
ANSI standard exists in the C language, using the back-slash
character "\". The use of this character on unix command lines,
however, can be a problem as it is interpreted by many shell
programs, and would have itself to be escaped. It is also a
character which is not available on certain keyboards. The equals
sign is commonly used in the encoding of names having
attribute=value pairs. The percent sign was eventually chosen as
a suitable escape character.

There is a conflict between the need to be able to represent
characters including spaces within a URI directly, and the need to
be able to use a URI in environments which have limited character
sets or in which certain characters are prone to corruption. This
conflict has been resolved by use of a hexadecimal escaping
method which may be applied to any characters forbidden in a given
context. When URLs are moved between contexts, the set of
characters escaped may be enlarged or reduced unambiguously.

The use of white space characters is risky in URIs to be printed
or sent by electronic mail, and the use of multiple white space
characters is very risky. This is because of the
introduction of extraneous white space when lines are wrapped by
systems such as mail, or by sheer necessity of narrow column width,
and because of the inter-conversion of various forms of white
space which occurs during character code conversion and the
transfer of text between applications. This is why the canonical
form for URIs has all white spaces encoded.



This section describes the syntax for URIs as used in the World-Wide
Web initiative. The generic syntax provides a framework for new
schemes for names to be resolved using as yet undefined protocols.

URI syntax

A complete URI consists of a naming scheme specifier followed by a
string whose format is a function of the naming scheme. For locators
of information on the Internet, a common syntax is used for the
address part. A BNF description of the URL syntax is given in a
later section. The components are as follows. Fragment identifiers
and relative URIs are not involved in the basic URL definition.

SCHEME

Within the URI of an object, the first element is the name of the
scheme, separated from the rest of the object by a colon.

PATH

The rest of the URI follows the colon in a format depending on the
scheme. The path is interpreted in a manner dependent on the
protocol being used. However, when it contains slashes, these
must imply a hierarchical structure.

Reserved characters

The path in the URI has a significance defined by the
scheme. Typically, it is used to encode a name in a given name
space, or an algorithm for accessing an object. In either case, the
encoding may use those characters allowed by the BNF syntax, or
hexadecimal encoding of other characters.

Some of the reserved characters have special uses as defined here.

THE PERCENT SIGN

The percent sign ("%", ASCII 25 hex) is used as the escape
character in the encoding scheme and is never allowed for anything
else.

HIERARCHICAL FORMS

The slash ("/", ASCII 2F hex) character is reserved for the
delimiting of substrings whose relationship is hierarchical. This
enables partial forms of the URI. Substrings consisting of single
or double dots ("." or "..") are similarly reserved.

The significance of the slash between two segments is that the
segment of the path to the left is more significant than the
segment of the path to the right. ("Significance" in this case
refers solely to closeness to the root of the hierarchical
structure and makes no value judgement!)


The similarity to unix and other disk operating system filename
conventions should be taken as purely coincidental, and should
not be taken to indicate that URIs should be interpreted as
file names.

HASH FOR FRAGMENT IDENTIFIERS

The hash ("#", ASCII 23 hex) character is reserved as a delimiter
to separate the URI of an object from a fragment identifier.

QUERY STRINGS

The question mark ("?", ASCII 3F hex) is used to delimit the
boundary between the URI of a queryable object, and a set of words
used to express a query on that object. When this form is used,
the combined URI stands for the object which results from the
query being applied to the original object.

Within the query string, the plus sign is reserved as shorthand
notation for a space. Therefore, real plus signs must be encoded.
This method was used to make query URIs easier to pass in systems
which did not allow spaces.
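As a rough illustration of the plus-for-space convention, the sketch below relies on Python's standard `urllib.parse.quote_plus` and `unquote_plus`, which apply the same rule: spaces become "+", and a literal "+" is escaped as %2B. The helper names are hypothetical; using Latin-1 mirrors the "ISO Latin 1 code" wording of the conventional encoding, and which other characters stay unescaped is Python's choice, not this document's.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_query(text: str) -> str:
    """Encode free text for a query string (hypothetical helper).

    Spaces become "+"; a real "+" and every other character except
    ASCII letters, digits and "_.-~" become %XX escapes of their
    ISO Latin-1 code.
    """
    return quote_plus(text, safe="", encoding="latin-1")


def decode_query(query: str) -> str:
    """Reverse encode_query: "+" becomes a space, then %XX is decoded."""
    return unquote_plus(query, encoding="latin-1")


print(encode_query("a+b c"))           # a%2Bb+c
print(decode_query("caf%E9+au+lait"))  # café au lait
```

Decoding "+" before the %XX escapes is what keeps an encoded plus sign (%2B) distinct from a space.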

The query string represents some operation applied to the object,
but this specification gives no common syntax or semantics for it.
In practice the syntax and semantics may depend on the scheme and
may even depend on the base URI.

OTHER RESERVED CHARACTERS

The asterisk ("*", ASCII 2A hex) and exclamation mark ("!", ASCII
21 hex) are reserved for use as having special significance within
specific schemes.

Unsafe characters

In canonical form, certain characters such as spaces, control
characters, some characters whose ASCII code is used differently
in different national character variant 7-bit sets, and all 8-bit
characters beyond DEL (7F hex) of the ISO Latin-1 set, shall not be
used unencoded. This is a recommendation for trouble-free
interchange, and as indicated below, the encoded set may be extended
or reduced.


Encoding reserved characters

When a system uses a local addressing scheme, it is useful to
provide a mapping from local addresses into URIs so that references
to objects within the addressing scheme may be referred to globally,
and possibly accessed through gateway servers.

For a new naming scheme, any mapping scheme may be defined provided
it is unambiguous, reversible, and provides valid URIs. It is
recommended that where hierarchical aspects to the local naming
scheme exist, they be mapped onto the hierarchical URL path syntax
in order to allow the partial form to be used.

It is also recommended that the conventional scheme below be used
in all cases except for any scheme which encodes binary data as
opposed to text, in which case a more compact encoding such as
hexadecimal or base 64 might be more appropriate. For example, the
conventional URI encoding method is used for mapping WAIS, FTP,
Prospero and Gopher addresses in the URI specification.

CONVENTIONAL URI ENCODING SCHEME

Where the local naming scheme uses ASCII characters which are not
allowed in the URI, these may be represented in the URL by a
percent sign "%" immediately followed by two hexadecimal digits
(0-9, A-F) giving the ISO Latin-1 code for that character.
Character codes other than those allowed by the syntax shall not
be used unencoded in a URI.
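A minimal sketch of the conventional encoding, assuming a local naming scheme whose names are strings of ISO Latin-1 characters. `encode_local_name`, its `ALLOWED` set and the `keep` parameter are hypothetical: the real set of characters allowed unencoded is fixed by the URI BNF, which is not reproduced here, so this sketch deliberately escapes more than it strictly needs to. `keep` lets a hierarchical delimiter such as "/" pass through, in line with the recommendation to map hierarchy onto the URL path syntax.

```python
import string

# Conservative stand-in for the characters the URI syntax allows unencoded.
# This is an assumption for illustration; the BNF defines the real set.
ALLOWED = set(string.ascii_letters + string.digits + "-_.")


def encode_local_name(name: str, keep: str = "") -> str:
    """Apply the conventional URI encoding to a local name.

    Every character outside ALLOWED (and not listed in `keep`) becomes "%"
    followed by two hexadecimal digits giving its ISO Latin-1 code.
    """
    if "%" in keep:
        raise ValueError('"%" must always be encoded')
    out = []
    for byte in name.encode("latin-1"):  # UnicodeEncodeError outside Latin-1
        ch = chr(byte)
        out.append(ch if ch in ALLOWED or ch in keep else "%{:02X}".format(byte))
    return "".join(out)


print(encode_local_name("Annual Report 100%"))       # Annual%20Report%20100%25
print(encode_local_name("docs/café.txt", keep="/"))  # docs/caf%E9.txt
```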

REDUCED OR INCREASED SAFE CHARACTER SETS

The same encoding method may be used for encoding characters whose
use, although technically allowed in a URI, would be unwise due to
problems of corruption by imperfect gateways or to the use of
variant character sets, or which would simply be awkward in a given
environment. Because a % sign always indicates an encoded character,
a URI may be made "safer" simply by encoding any characters
considered unsafe, while leaving already encoded characters still
encoded. Similarly, in cases where a larger set of characters is
acceptable, % signs can be selectively and reversibly expanded.

Before two URIs can be compared, it is therefore necessary to
bring them to the same encoding level.

However, the reserved characters mentioned above have a different
significance when encoded, and so may NEVER be encoded and
unencoded in this way.


The percent sign intended as such must always be encoded, as its
presence otherwise always indicates an encoding. Sequences which
start with a percent sign but are not followed by two hexadecimal
characters are reserved for future extension. (See Example 3.)

Example 1

The URIs

http://info.cern.ch/albert/bertram/marie-claude

and

http://info.cern.ch/albert/bertram/marie%2Dclaude

are identical, as the %2D encodes a hyphen character.

Example 2

The URIs

http://info.cern.ch/albert/bertram/marie-claude

and

http://info.cern.ch/albert/bertram%2Fmarie-claude

are NOT identical, as in the second case the encoded slash does not
have hierarchical significance.

Example 3

The URIs

fxqn:/us/va/reston/cnri/ietf/24/asdf%*.

and

news:12345667123%asdghfh@info.cern.ch

are illegal, as all % characters imply encodings, and there is no
decoding defined for "%*" or "%as" in this recommendation.
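The comparison rule and the three examples can be checked with a small normalizer. This is a sketch under stated assumptions: `normalize`, `same_uri` and `is_legal` are hypothetical names; the set of characters it is willing to decode (letters, digits, "-", "_", ".") is a conservative choice made here, not a list taken from this document; and it upper-cases the hex digits of any escapes it leaves in place. Reserved characters such as "/" are never decoded, because their encoded and unencoded forms differ in meaning.

```python
import string

# Characters whose %XX form this sketch decodes before comparison.
# Reserved characters ("/", "?", "#", "%", "*", "!", "+") are deliberately absent.
DECODABLE = set(string.ascii_letters + string.digits + "-_.")


def normalize(uri: str) -> str:
    """Bring a URI to a common encoding level for comparison.

    Raises ValueError for a "%" not followed by two hexadecimal digits,
    since such sequences are reserved for future extension.
    """
    out, i = [], 0
    while i < len(uri):
        if uri[i] != "%":
            out.append(uri[i])
            i += 1
            continue
        pair = uri[i + 1:i + 3]
        if len(pair) != 2 or not all(c in string.hexdigits for c in pair):
            raise ValueError("illegal escape %r at offset %d" % (uri[i:i + 3], i))
        ch = chr(int(pair, 16))
        # Decode harmless characters; keep everything else escaped.
        out.append(ch if ch in DECODABLE else "%" + pair.upper())
        i += 3
    return "".join(out)


def same_uri(a: str, b: str) -> bool:
    return normalize(a) == normalize(b)


def is_legal(uri: str) -> bool:
    try:
        normalize(uri)
    except ValueError:
        return False
    return True


base = "http://info.cern.ch/albert/"
print(same_uri(base + "bertram/marie-claude", base + "bertram/marie%2Dclaude"))  # True
print(same_uri(base + "bertram/marie-claude", base + "bertram%2Fmarie-claude"))  # False
print(is_legal("news:12345667123%asdghfh@info.cern.ch"))                          # False
```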

Partial (relative) form

Within an object whose URI is well defined, the URI of another
object may be given in abbreviated form, where parts of the two URIs
are the same. This allows objects within a group to refer to each other
without requiring the space for a complete reference, and it
incidentally allows the group of objects to be moved without changing
any references. It must be emphasized that when a reference is
passed in anything other than a well controlled context, the full
form must always be used.

In World-Wide Web applications, the context URI is that of the
document or object containing a reference. In this case partial URIs
can be generated in virtual objects or stored in real objects,
without the need for dramatic change if the higher-order parts of a
hierarchical naming system are modified. Apart from terseness, this
gives greater robustness to practical systems, by enabling
information hiding between system components.

The partial form relies on a property of the URI syntax that certain
characters ("/") and certain path elements ("..", ".") have a
significance reserved for representing a hierarchical space, and
must be recognized as such by both clients and servers.

A partial form can be distinguished from an absolute form in that the
latter must have a colon and that colon must occur before any slash
characters. Systems not requiring partial forms should not use any
unencoded slashes in their naming schemes. If they do, absolute forms
will still work, but confusion may result. (See note on Gopher
below.)

The rules for the use of a partial name relative to the URI of the
context are:

If the scheme parts are different, the whole absolute URI must
be given. Otherwise, the scheme is omitted, and:

If the partial URI starts with a non-zero number of consecutive
slashes, then everything from the context URI up to (but not
including) the first occurrence of exactly the same number of
consecutive slashes which has no greater number of consecutive
slashes anywhere to the right of it is taken to be the same and
so prepended to the partial URL to form the full URL. Otherwise:

The last part of the path of the context URI (anything following
the rightmost slash) is removed, and the given partial URI is
appended in its place, and then:

Within the result, all occurrences of "xxx/../" or "/." are
recursively removed, where xxx, ".." and "." are complete path
elements.
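The rules above can be sketched in Python as follows. This is one reading of the text, not a reference implementation: "exactly the same number of consecutive slashes" is taken to mean a run of slashes that is not part of a longer run, and the clause about greater runs to the right is not modeled. With that reading, expanding g, /g, //g, ../g and g:h against magic://a/b/c//d/e/f gives magic://a/b/c//d/e/g, magic://a/g, magic://g, magic://a/b/c//d/g and g:h. The helper names are hypothetical, query and fragment parts of the context get no special treatment, and a context with no matching slash run is reported as an error because the rules do not say what should happen.

```python
import re


def _is_absolute(uri: str) -> bool:
    # Absolute form: there is a colon, and it occurs before any slash.
    colon, slash = uri.find(":"), uri.find("/")
    return colon > 0 and (slash == -1 or colon < slash)


def _remove_dots(uri: str) -> str:
    """Recursively remove "/." and "xxx/../", where each is a complete element."""
    parts = uri.split("/")
    changed = True
    while changed:
        changed = False
        for i in range(1, len(parts)):
            if parts[i] == ".":                      # the "/." case
                del parts[i]
                changed = True
                break
        if changed:
            continue
        for i in range(1, len(parts) - 2):           # ".." must be followed by "/"
            if parts[i] not in ("", ".", "..") and parts[i + 1] == "..":
                del parts[i:i + 2]                   # the "xxx/../" case
                changed = True
                break
    return "/".join(parts)


def resolve(context: str, partial: str) -> str:
    """Expand `partial` relative to the absolute URI `context`."""
    if _is_absolute(partial):
        return partial                               # scheme given: use as is
    n = len(partial) - len(partial.lstrip("/"))
    if n:
        # Keep the context up to the first run of exactly n slashes.
        for run in re.finditer(r"/+", context):
            if len(run.group()) == n:
                return _remove_dots(context[:run.start()] + partial)
        raise ValueError("context has no run of %d slashes" % n)
    # Replace whatever follows the rightmost slash of the context.
    return _remove_dots(context[:context.rfind("/") + 1] + partial)


for p in ["g", "/g", "//g", "../g", "g:h"]:
    print(p, resolve("magic://a/b/c//d/e/f", p))
```

Because the context is cut at its rightmost slash, magic://a/b/c//d/e/ and magic://a/b/c//d/e/f give identical expansions, which is the trailing-slash behaviour the notes below describe.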



Note: Trailing slashes

If a path of the context locator ends in slash, partial URIs are
treated differently from the URI with the same path but without a
trailing slash. The trailing slash indicates a void segment of the
path.

Note: Gopher

The gopher system does not have the concept of relative URIs, and
the gopher community currently allows / as data characters in gopher
URLs without escaping them to %2F. Relative forms may not in general
be used for documents served by gopher servers. If they are used,
then WWW software assumes, normally correctly, that in fact they do
have hierarchical significance despite the specifications. The use of
HTTP rather than gopher protocol is however recommended.



In the context of URI

magic://a/b/c//d/e/f

the partial URIs would expand as follows:

g        magic://a/b/c//d/e/g

/g       magic://a/g

//g      magic://g

../g     magic://a/b/c//d/g

g:h      g:h

and in the context of the URI

magic://a/b/c//d/e/

the results would be exactly the same.

Fragment-id

This represents a part of, fragment of, or a sub-function within,
an object. Its syntax and semantics are defined by the application
responsible for the object, or the specification of the content
type of the object. The only definition here is of the allowed
characters by which it may be represented in a URL.


Specific syntaxes for representing fragments in text documents by
line and character range, or in graphics by coordinates, or in
structured documents using ladders, are suitable for
standardization but not defined here.

The fragment-id follows the URL of the whole object from which it
is separated by a hash sign (#). If the fragment-id is void, the
hash sign may be omitted: a void fragment-id with or without the
hash sign means that the URL refers to the whole object.
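Splitting a URL into the URL of the whole object and its fragment-id needs only the first hash sign, since "#" is reserved for this purpose. A small sketch, with the hypothetical helper `split_fragment` treating an absent and a void fragment-id alike, as described above:

```python
from typing import Optional, Tuple


def split_fragment(url: str) -> Tuple[str, Optional[str]]:
    """Return (URL of the whole object, fragment-id or None)."""
    whole, _, fragment = url.partition("#")
    # A void fragment-id and a missing one both mean "the whole object".
    return whole, (fragment or None)


print(split_fragment("http://www.myu.edu/org/admin/people#andy"))
# ('http://www.myu.edu/org/admin/people', 'andy')
```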

While this hook is allowed for identification of fragments, the
question of addressing of parts of objects, or of the grouping of
objects and relationship between contained and containing objects,
is not addressed by this document.

Fragment identifiers do NOT address the question of objects which
are different versions of a "living" object, nor of expressing the
relationships between different versions and the living object.

There is no implication that a fragment identifier refers to
anything which can be extracted as an object in its own right. It
may, for example, refer to an indivisible point within an object.

Specific Schemes

The mapping for URIs onto some existing standard and experimental
protocols is outlined in the BNF syntax definition. Notes on
particular protocols follow. These URIs are frequently referred to
as URLs, though the exact definition of the term URL is still under
discussion (March 1993). The schemes covered are:

http        Hypertext Transfer Protocol (examples)

ftp         File Transfer Protocol

gopher      Gopher protocol

mailto      Electronic mail address

news        Usenet news

telnet, rlogin and tn3270
            Reference to interactive sessions

wais        Wide Area Information Servers

file        Local file access


The following schemes are proposed as essential to the unification
of the web with electronic mail, but not currently (to the author's
knowledge) implemented:

mid         Message identifiers for electronic mail

cid         Content identifiers for MIME body parts

The schemes for X.500, network management database, and Whois++
have not been specified and may be the subject of further study.
Schemes for Prospero, and restricted NNTP use are not currently
implemented as far as the author is aware.

The "urn" prefix is reserved for use in encoding a Uniform Resource
Name when that has been developed by the IETF working group.

New schemes may be registered at a later time.

HTTP

The HTTP protocol specifies that the path is handled transparently
by those who handle URLs, except for the servers which de-reference
them. The path is passed by the client to the server with any
request, but is not otherwise understood by the client.

The host details are not passed on to the server when the URL is an
HTTP URL which refers to the server in question. In this case the
string sent starts with the slash which follows the host details.
However, when an HTTP server is being used as a gateway (or "proxy")
then the entire URI, whether HTTP or some other scheme, is passed on
the HTTP command line. The search part, if present, is sent as part
of the HTTP command, and may in this respect be treated as part of
the path. No fragment-id part of a WWW URI (the hash sign and
following) is sent with the request. Spaces and control characters
in URLs must be escaped for transmission in HTTP, as must other
disallowed characters.
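The rules for what an HTTP client actually sends can be summarised in a short sketch. `request_target` is a hypothetical helper built on Python's `urllib.parse.urlsplit`, which follows later URL syntax rules but splits simple http: URLs like the ones below in the expected way. Treating an empty path as "/" is an assumption not stated above. The path is passed through without decoding, so escapes such as %2F reach the server unchanged.

```python
from urllib.parse import urlsplit


def request_target(url: str, via_proxy: bool = False) -> str:
    """Return the URI string an HTTP client would send for `url`."""
    url = url.split("#", 1)[0]        # the fragment-id is never sent
    if via_proxy:
        return url                    # a gateway ("proxy") receives the entire URI
    parts = urlsplit(url)
    target = parts.path or "/"        # assumption: an empty path is sent as "/"
    if parts.query:
        target += "?" + parts.query   # the search part travels with the path
    return target


print(request_target("http://www.myu.edu/org/admin/people#andy"))  # /org/admin/people
print(request_target("http://info.cern.ch:8000/imaginary/"))        # /imaginary/
```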



These examples are not part of the specification: they are
provided as illustrations only. The URI of the "welcome" page of a
server is

http://www.my.work.com

As the rest of the URL (after the hostname and port) is opaque to
the client, it shows great variety, but the following are all
fairly typical:


http://www.my.uni.edu/info/matriculation/enroling.

http://info.my.org/AboutUs/

http://www.library.my.town.va.us/Catalogue/76523471236%2Fwen44--4.98

http://www.my.org/462F4F2D4241522A314159265358979323846

A URL for a server on a port other than 80 looks like

http://info.cern.ch:8000/imaginary/

A reference to a particular part of a document may, including the
fragment identifier, look like

http://www.myu.edu/org/admin/people#andy

in which case the string "#andy" is not sent to the server, but is
retained by the client and used when the whole object has been
retrieved.

A search on a text database might look like

http://info.my.org/AboutUs/Index/Phonebook?

and on another

http://info.cern.ch/RDB/EMP?*%20where%20name%%3

In all cases the client passes the path string to the server
uninterpreted; it is not for the client to deduce anything from it.

FTP

The ftp: prefix indicates that the FTP protocol is used, as defined
in STD 9, RFC 959 or any successor. The port number, if present,
gives the port of the FTP server if not the FTP default.

User name and password

The syntax allows for the inclusion of a user name and even a
password for those systems which do not use the anonymous FTP
convention. The default, however, if no user or password is
supplied, will be to use that convention, viz. that the user name
is "anonymous" and the password the user's Internet-style mail
address.


Where possible, this mail address should correspond to a
mail address for the user, and preferably give a DNS host name
which resolves to the IP address of the client. Note that servers
currently vary in their treatment of the anonymous password.

The FTP protocol allows for a sequence of CWD commands (change
working directory) and a TYPE command prior to service commands
such as RETR (retrieve) or NLST (name list) which actually access a
file.

The arguments of any CWD commands are successive segment parts of
the URL delimited by slash, and the final segment is suitable as
the filename argument to the RETR command for retrieval or the
directory argument to NLST.
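A sketch of how a client might turn an ftp: URL into control commands, combining the anonymous-login default with the CWD/RETR mapping just described. `ftp_commands` is hypothetical: it uses the RFC 959 command name NLST for a directory listing, percent-decodes each path segment before sending it (an assumption), ignores any ";type=" suffix, and treats a void final segment (a trailing slash) as a request for a listing. Host names and addresses are placeholders.

```python
from typing import List
from urllib.parse import urlsplit, unquote


def ftp_commands(url: str, mail_address: str) -> List[str]:
    """Translate an ftp: URL into the FTP control commands a client might send."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        # Neither supplied: fall back to the anonymous FTP convention.
        user, password = "anonymous", mail_address
    else:
        user = unquote(parts.username or "")
        password = unquote(parts.password or "")
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    *directories, final = [unquote(seg) for seg in path.split("/")]
    commands = ["USER " + user, "PASS " + password]
    commands += ["CWD " + d for d in directories]
    # A void final segment (trailing slash) means "list the directory".
    commands.append("RETR " + final if final else "NLST")
    return commands


print(ftp_commands("ftp://ftp.example.org/pub/docs/readme.txt", "tim@client.example.org"))
# ['USER anonymous', 'PASS tim@client.example.org', 'CWD pub', 'CWD docs', 'RETR readme.txt']
```

Sending one CWD per segment, rather than a single CWD with the whole path, keeps the client from assuming that the server uses "/" as its own directory delimiter.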

For some file systems (Unix in particular), the "/" used to denote
the hierarchical structure of the URL corresponds to the delimiter
used to construct a file name hierarchy, and thus, the filename
will look the same as the URL path. This does NOT mean that the
URL is a Unix filename.

Note: Retrieving subsequent URLs from the same host

There is no common hierarchical model to the FTP protocol, so if a
directory change command has been given, it is impossible in
general to deduce what sequence should be given to navigate to
another directory for a second retrieval, if the paths are
different. The only reliable algorithm is to disconnect and
reestablish the control connection.

Data type

The data content type of a file can only, in the general FTP case,
be deduced from the name, normally the suffix of the name. This
is not standardized. An alternative is for it to be transferred
in information outside the URL. A suitable FTP transfer type (for
example binary "I" or text "A") must in turn be deduced from the
data content type. It is recommended that conventions for
suffixes of public archives be established, but it is outside the
scope of this standard.

An FTP URL may optionally specify the FTP data transfer type by
which an object is to be retrieved. Most of the methods correspond
to the FTP "Data Types" ASCII and IMAGE for the retrieval of a
document, as specified in FTP by the TYPE command. One method
indicates directory access.


The data type is specified by a suffix to the URL. The possible
suffixes are:

;type=<type code>   Use FTP type as given to perform the
                    transfer.

/                   Use FTP directory list commands.

The type code is in the format defined in RFC 959 except that
SPACE IS OMITTED FROM THE URL.
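Because the space is omitted from the URL, a client has to put it back when it issues the TYPE command. A minimal sketch, assuming the type code is one of the RFC 959 type letters (A, E, I, L), optionally followed by its parameter (a format code such as N, or a byte size for L). `split_type` and the regular expression are hypothetical.

```python
import re
from typing import Optional, Tuple

# ";type=" followed by an RFC 959 type letter and an optional parameter.
_TYPE_SUFFIX = re.compile(r";type=([AEILaeil])([A-Za-z0-9]*)$")


def split_type(path: str) -> Tuple[str, Optional[str]]:
    """Return (path without suffix, TYPE command) or (path, None)."""
    m = _TYPE_SUFFIX.search(path)
    if m is None:
        return path, None
    kind, param = m.group(1).upper(), m.group(2).upper()
    # Re-insert the space that RFC 959 puts between type and parameter.
    command = "TYPE " + kind + (" " + param if param else "")
    return path[:m.start()], command


print(split_type("pub/raw.bin;type=L8"))  # ('pub/raw.bin', 'TYPE L 8')
```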

Transfer mode

Stream Mode is always used.

Gopher

The gopher URL specifies the host and optionally the port to which
the client should connect. This is followed by a slash and a
gopher type code. This type code is used by the client to determine
how to interpret the server's reply and is not for sending to the
server. The command string to be sent to the server follows the
gopher type character. It consists of the gopher selector string
followed by any "Gopher plus" syntax, but omitting the trailing
CR LF pair.

When the gopher command string contains characters (such as
CR LF and HT characters) not allowed in a URL, these are encoded
using the conventional encoding.

Note that some gopher selector strings begin with a copy of the
gopher type character, in which case that character will occur
twice consecutively. Also note that the gopher selector string may
be an empty string since this is how gopher clients refer to the
top-level directory on a gopher server.

If the encoded command string (with trailing CR LF stripped) would
be void, then the gopher type character may be omitted and "1"
(ASCII 31 hex) is assumed.
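The decoding of a gopher URL into the type character and the command string can be sketched as follows. `gopher_request` is hypothetical: it relies on `urllib.parse.urlsplit` to strip the host and port, returns the command without the trailing CR LF (which the client appends when sending), and decodes escapes as ISO Latin-1. Selectors containing a literal "?" or "#" would need more care than this sketch takes, and the host name is a placeholder.

```python
from typing import Tuple
from urllib.parse import urlsplit, unquote


def gopher_request(url: str) -> Tuple[str, str]:
    """Return (gopher type character, command string without CR LF)."""
    path = urlsplit(url).path
    if path.startswith("/"):
        path = path[1:]                  # the slash after host[:port]
    if not path:
        return "1", ""                   # void command: type "1" is assumed
    return path[0], unquote(path[1:], encoding="latin-1")


# The type character "0" is followed by a selector that starts with "0".
print(gopher_request("gopher://gopher.example.org/00/about.txt"))  # ('0', '0/about.txt')
```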

Note that slash "/" in gopher selector strings may not correspond to
a level in a hierarchical structure.


Mailto

This allows a URL to specify an RFC 822 addr-spec mail address. Note
that use of %, for example as used in forming a gatewayed mail
address, requires conversion to %25 in a URL.
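For example, a hypothetical helper that builds a mailto: URL only needs to escape the percent sign for gatewayed addresses like the one described above. This sketch assumes the addr-spec contains no other characters that would need encoding, and the addresses shown are placeholders.

```python
def mailto_url(addr_spec: str) -> str:
    """Build a mailto: URL, escaping each "%" in the addr-spec as %25."""
    return "mailto:" + addr_spec.replace("%", "%25")


print(mailto_url("user%relay.example.org@gateway.example.org"))
# mailto:user%25relay.example.org@gateway.example.org
```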

News

The news locators refer to either news group names or article
identifiers which must conform to the rules for a Message-Id of RFC
1036 (Horton 1987). A message identifier may be distinguished from a
news group name by the presence of the commercial at "@" character.
These rules imply that within an article, a reference to a news group
or to another article will be a valid URL (in the partial form).
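The "@" test is enough to tell the two kinds of news URL apart and to choose an NNTP command. This sketch uses hypothetical helper names. It uses the RFC 977 ARTICLE command for message identifiers, which NNTP expects inside angle brackets, and GROUP for news group names. The identifiers shown are placeholders.

```python
from typing import Tuple


def classify_news(url: str) -> Tuple[str, str]:
    """Return ("article", message_id) or ("group", group_name)."""
    if not url.startswith("news:"):
        raise ValueError("not a news: URL")
    rest = url[len("news:"):]
    # Only message identifiers contain the commercial at sign.
    return ("article", rest) if "@" in rest else ("group", rest)


def nntp_command(url: str) -> str:
    kind, value = classify_news(url)
    return "ARTICLE <%s>" % value if kind == "article" else "GROUP " + value


print(nntp_command("news:comp.infosystems.www"))  # GROUP comp.infosystems.www
print(nntp_command("news:1234@host.example"))     # ARTICLE <1234@host.example>
```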

A news URL may be dereferenced using NNTP (RFC 977, Kantor 1986)
(the ARTICLE by message-id command) or using any other protocol for
the conveyance of usenet news articles, or by reference to a body of
news articles already received.

Note 1:

Among URLs the "news" URLs are anomalous in that they are
location-independent. They are unsuitable as URNs
because the NNTP architecture relies on the expiry of articles and
therefore a small number of articles being available at any time.
When a news: URL is quoted, the assumption is that the reader
fetches the article or group from his or her local news host.
Host names are therefore NOT part of news URLs.

Note 2:

An outstanding problem is that the message identifier is
insufficient to allow the retrieval of an expired article, as no
algorithm exists for deriving an archive site and file name. The
addition of the date and news group set to the article's URL would
allow this if a directory existed of archive sites by news group.

A suggested subject of study, in conjunction with the NNTP working
group, is a further possible extension: allowing the naming of
threads as addressable objects.

Telnet, rlogin, tn3270

The use of URLs to represent interactive sessions is an
extension to their uses for objects. This allows access to
information systems which only provide an interactive service, and
no information server. As information within the service cannot
be addressed individually or, in general, automatically retrieved,
this is a less desirable, though currently common, solution.

URN

The "Universal Resource Name" is currently (March 1993) under
development in the IETF. A requirements specification is in
preparation. It currently looks as though it will be a short string
suitable for encoding in URI syntax, for which case the "urn:" prefix
is reserved. The URN shall be encoded precisely as defined in the
(future) URN standard, except in that:

If the official description of the URN syntax includes
constant wrapper characters, then they shall not be omitted from
the URI encoding of the URN.

If the URN has a hierarchical nature, then the slash
shall be used in the URI encoding.

If the URN has a hierarchical nature, the most significant part
shall be encoded on the left in the URI encoding.

Any characters with reserved meanings in the URI syntax shall be
escaped.

These rules of course apply to any URI scheme. It is of course
possible that the URN syntax will be chosen such that the URI
encoding will be a 1-1 transcription.

An example might be a name such as

urn:/iana/dns/ch/cern/cn/techdoc/94/1642-3

but the reader should refer to the latest URN drafts and
specifications.

WAIS

The current public domain WAIS implementation requires that a client
know the "type" of an object prior to retrieval. This value is
returned along with the internal object identifier in the search
response. It has been encoded into the path part of the URL in order
to make the URL sufficient for the retrieval of the object.

Within the WAIS world, names do not of course need to be prefixed by
"wais:" (by the partial form rules).



The wpath of a WAIS URL consists of encoded fields of the WAIS
identifier, in the same order as in the WAIS identifier. For each
field, the identifier field number is the digits before the equals
sign, and the field contents follow, encoded in the conventional
encoding, terminated by ";".
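The wpath layout described above (field number, "=", encoded contents, ";") can be sketched as a pair of functions. `encode_wpath` and `decode_wpath` are hypothetical. The field numbers and contents in the example are invented placeholders rather than real WAIS identifier fields. Python's `quote` stands in for the conventional encoding, so "=" and ";" inside a field are always escaped and the split on ";" stays unambiguous.

```python
from typing import List, Tuple
from urllib.parse import quote, unquote


def encode_wpath(fields: List[Tuple[int, str]]) -> str:
    """Encode (field number, contents) pairs, each terminated by ";"."""
    return "".join("%d=%s;" % (num, quote(text, safe="", encoding="latin-1"))
                   for num, text in fields)


def decode_wpath(wpath: str) -> List[Tuple[int, str]]:
    fields = []
    for item in wpath.split(";")[:-1]:    # the last ";" terminates the final field
        num, _, text = item.partition("=")
        fields.append((int(num), unquote(text, encoding="latin-1")))
    return fields


wpath = encode_wpath([(1, "db.src"), (2, "doc 42;a")])
print(wpath)                # 1=db.src;2=doc%2042%3Ba;
print(decode_wpath(wpath))  # [(1, 'db.src'), (2, 'doc 42;a')]
```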

File

The other URI schemes (except nntp) share the property that they
equally valid at any geographical place

There is however a real practical requirement to be able to
a URL for an object in a machine's local file system

The syntax is similar to the ftp syntax, but in this case the
is used to donate boundaries between directory levels of
hierarchical file system is used. The "client" software converts
file URL into a file name in the local file name conventions.
allows local files to be treated just as network objects without
necessity to use a network server for access. This may be used
example for defining a user's "home" document in WWW

There is clearly a danger of confusion if a link made to a local
file is followed by someone on a different system, with
unexpected and possibly harmful results. Therefore, the convention
is that even a "file" URL is provided with a host part. This allows
a client on another system to know that it cannot access the file
system, or perhaps to use some other local mechanism to access the
file.

The special value "localhost" is used in the host field to
that the filename should really be used on whatever host one is
This for example allows links to be made to files which
distribted on many machines, or to "your unix local password file
subject of course to consistency across the users of the data

A void host field is equivalent to "localhost".
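A client's handling of "file" URLs might look like the sketch below.
It assumes a POSIX-style local file system (so the slashes can be kept
as directory separators) and treats the machine's own host name, as
reported by socket.gethostname(), as equivalent to "localhost"; that
last choice is an assumption, not a rule stated here. The function
name is hypothetical.

```python
import socket
from typing import Optional
from urllib.parse import unquote, urlsplit


def file_url_to_local_path(url: str) -> Optional[str]:
    """Return a local file name for a "file" URL, or None when the URL
    names a file on another host that this client cannot reach directly.

    Assumes a POSIX-style local file system, where the URL's slashes can be
    kept as directory separators after unescaping.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file URL: " + url)
    host = parts.hostname or ""  # a void host field gives ""
    # A void host and "localhost" both mean "whatever host one is on".
    if host not in ("", "localhost", socket.gethostname().lower()):
        return None  # belongs to some other system: do not open locally
    return unquote(parts.path)
```

Returning None instead of guessing follows the convention above: a
client that sees a foreign host part knows it cannot access that file
system directly.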

Message-ID

For systems which include information transferred using mail
protocols, there is a need to be able to make cross-references
between different items of information, even though, by the nature of
mail, those items are only available to a restricted set of people.

Two schemes are defined. The first, "mid:", refers to the STD 11,
RFC 822 Message-Id of a mail message. This identifier is used in
RFC 822, for example in the References and In-Reply-To fields.

The rest of the URL after the "mid:" is the RFC822 msg-id with
constant <> wrapper removed, leaving an identifier whose format
fact happens to be the same as addr-spec format for mailboxes (
the semantics are different).
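The conversion between a "mid:" URL and an RFC 822 msg-id amounts to
removing or restoring the <> wrapper. A minimal sketch with
hypothetical helper names; the escaping policy (Python's
urllib.parse.quote, keeping "@") is an assumption, and the message
identifier in the test is invented.

```python
from urllib.parse import quote, unquote


def mid_url_to_message_id(url: str) -> str:
    """Turn a "mid:" URL back into an RFC 822 msg-id, <> wrapper included."""
    if not url.startswith("mid:"):
        raise ValueError("not a mid URL: " + url)
    return "<" + unquote(url[len("mid:"):]) + ">"


def message_id_to_mid_url(msg_id: str) -> str:
    """Strip the constant <> wrapper from a msg-id and prefix "mid:"."""
    msg_id = msg_id.strip()
    if not (msg_id.startswith("<") and msg_id.endswith(">")):
        raise ValueError("expected a msg-id wrapped in <>: " + msg_id)
    # quote() leaves letters, digits, "_.-~" and "@" alone and escapes the rest.
    return "mid:" + quote(msg_id[1:-1], safe="@")
```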

The use of a "mid" URL implies access to a body of mail
received. If a message has been distributed using NNTP or
usenet protocols over the news system, then the "news:" form
be used

Content-ID

The second scheme, "cid:", is similar to "mid:", but makes
to a body part of a MIME message by the value of its content-
field. This allows, for example, a master document being the
part of a multipart/related MIME message to refer to component
which are transferred in the same message



Beware, however, that content identifiers are only required to be
unique within the context of a given MIME message, and so the cid
URL is only meaningful within the context of the same MIME message.
For a reference outside the message, it would need to be appended to
the message-id of the whole message. A syntax for this has not
been defined.
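Resolving a "cid:" URL against the message that contains it could be
sketched with Python's standard email package. The sketch assumes, by
analogy with "mid:", that the <> wrapper of the Content-ID is dropped
in the URL; the helper name resolve_cid is hypothetical. It searches
only the one message, since the identifier has no meaning outside it.

```python
import email
from email.message import Message
from typing import Optional
from urllib.parse import unquote


def resolve_cid(message: Message, cid_url: str) -> Optional[Message]:
    """Find the body part of `message` referred to by a "cid:" URL.

    Content identifiers are only unique within one MIME message, so the
    search is confined to the parts of that same message.
    """
    if not cid_url.startswith("cid:"):
        raise ValueError("not a cid URL: " + cid_url)
    wanted = "<" + unquote(cid_url[len("cid:"):]) + ">"
    for part in message.walk():  # the message itself, then every body part
        if part.get("Content-ID", "").strip() == wanted:
            return part
    return None
```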

Schemes for Further Study

X500

The mapping of X.500 names onto URLs is not defined here. A
decision is required as to whether "distinguished names" or "user
friendly names" (ufn), or both, should be allowed. If any
punctuation conversions are needed from the adopted X.500
representation (such as the use of slashes between parts of a ufn),
they must be defined. This is a subject for study.

WHOIS++

This prefix describes access using the "whois++" scheme, which is in
the process of definition. The host name part is the same as for
other IP-based schemes. The path part can be either a handle for a
whois object, or a valid whois query string. This is a subject for
further study.

NETWORK MANAGEMENT

This is a subject for study.

NNTP

This is an alternative form of reference for news articles,
specifically for use with NNTP servers, and particularly with
incomplete server implementations which do not allow retrieval by
message identifier. In all other cases the "news" scheme should
be used.

The news server name, newsgroup name, and index number of the
article within the newsgroup on that particular server are given.
The NNTP protocol must be used.

Note 1.

This form of URL is not globally accessible, as most
NNTP servers only allow access from local clients. Note also that
article numbers within groups vary from server to server.

This form of URL should not be quoted outside this local area.
It should not be used within news articles intended for wider
circulation than the one server. This is a local identifier
for a resource which is often available globally, and so is not
recommended except in the case in which incomplete
implementations on the local server force its adoption.

PROSPERO

The Prospero (Neuman, 1991) directory service is used to resolve the
URL, yielding an access method for the object (which can then itself
be represented as a URL if translated). The host part contains a
host name or internet address. The port part is optional.

The path part contains a host-specific object name and an optional
version number. If present, the version number is separated from the
host-specific object name by the characters "%00" (percent zero
zero), this being an escaped string terminator (null). External
Prospero links are represented as URLs of the underlying access
method and are not represented as Prospero URLs.
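Splitting a Prospero URL into its parts might be done as in this
sketch. It follows the BNF later in this memo, where one slash
separates hostport from hsoname, and it locates the version by the
"%00" separator before any unescaping. The host name, port and object
name in the example are invented, and the helper name is hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import unquote, urlsplit


def parse_prospero_url(
    url: str,
) -> Tuple[Optional[str], Optional[int], str, Optional[str]]:
    """Split a Prospero URL into (host, port, hsoname, version).

    The version, when present, follows the escaped null "%00". Any
    attributes that the BNF allows after the version are left attached to
    it, since this sketch does not decide where they begin.
    """
    parts = urlsplit(url)
    if parts.scheme != "prospero":
        raise ValueError("not a Prospero URL: " + url)
    # The BNF has "hostport / hsoname": drop that one separating slash.
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    # Split before unescaping, so "%00" is still recognisable.
    hsoname, sep, version = path.partition("%00")
    return parts.hostname, parts.port, unquote(hsoname), (version if sep else None)
```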

Registration of naming schemes

A new naming scheme may be introduced by defining a mapping onto a
conforming URL syntax, using a new prefix. Experimental prefixes may
be used by mutual agreement between parties, and must start with the
characters "x-". The scheme name "urn:" is reserved for the work in
progress on a scheme for more persistent names.

It is proposed that the Internet Assigned Numbers Authority (IANA)
perform the function of registration of new schemes. Any submission
of a new URI scheme must include a definition of an algorithm for the
retrieval of any object within that scheme. The algorithm must take
the URI and produce either a set of URL(s) which will lead to the
desired object, or the object itself, in a well-defined or
determinable format.

It is recommended that those proposing a new scheme demonstrate its
utility and operability by the provision of a gateway which will
provide images of objects in the new scheme for clients using an
existing protocol. If the new scheme is not a locator scheme, then
the properties of names in the new space should be clearly defined.
It is likewise recommended that, where a protocol allows for
retrieval by URL, the client software have provision for being
configured to use specific gateway locators for indirect access
through new naming schemes.
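The prefix rules above can be expressed as a small check. This is an
illustrative sketch only: the function and its result labels are
hypothetical, the pattern follows the "scheme = ialpha" production of
the generic BNF below, and the case-sensitive comparison with "x-" and
"urn" is an assumption.

```python
import re

# scheme = ialpha: a letter followed by xalpha characters (letters, digits,
# the "safe" and "extra" punctuation). The "%" hex hex escape form of
# xalpha is left out of this pattern for brevity.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9$_@.&!*\"'(),-]*")


def classify_scheme(name: str) -> str:
    """Hypothetical helper applying the naming rules above to a prefix."""
    if not SCHEME_RE.fullmatch(name):
        return "invalid"
    if name.startswith("x-"):
        return "experimental"  # usable by mutual agreement between parties
    if name == "urn":
        return "reserved"  # held for the work on persistent names
    return "registered"  # anything else is expected to be registered (IANA)
```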

BNF of Generic URI Syntax

This is a BNF-like description of the URI syntax, at the level at
which specific schemes are not considered.

A vertical line "|" indicates alternatives, and [brackets]
optional parts. Spaces are represented by the word "space", and
vertical line character by "vline". Single letters stand for
letters. All words of more than one letter below are
described somewhere in this description

The "generic" production gives a higher level parsing of the
URIs as the other productions. The "national" and "punctuation
characters do not appear in any productions and therefore may
appear in URIs

fragmentaddress uri [ # fragmentid ]

uri scheme : path [ ? search ]

scheme ialpha

path void | xpalphas [ / path ]

search xalphas [ + search ]

fragmentid xalphas

xalpha alpha | digit | safe | extra | escape

xalphas xalpha [ xalphas ]

xpalpha xalpha | +

xpalphas xpalpha [ xpalphas ]

ialpha alpha [ xalphas ]

alpha a | b | c | d | e | f | g | h | i | j | k |
l | m | n | o | p | q | r | s | t | u | v |
w | x | y | z | A | B | C | D | E | F | G |
H | I | J | K | L | M | N | O | P | Q | R |
S | T | U | V | W | X | Y | Z

digit 0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

safe $ | - | _ | @ | . | &

extra ! | * | " | ' | ( | ) | ,

reserved = | ; | / | # | ? | : | space

escape % hex hex

hex digit | a | b | c | d | e | f | A | B | C |
D | E | F

national { | } | vline | [ | ] | \ | ^ | ~

punctuation < | >

void

(end of URI BNF)
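The generic productions can be used to split any URI into its named
parts without knowing the scheme. The sketch below does only that: it
splits at "#", ":" and "?" and breaks the search part into
"+"-separated terms, but it does not check the path against the path
production. Names are hypothetical and the example address is
invented.

```python
import re
from typing import List, NamedTuple, Optional
from urllib.parse import unquote

# ialpha = alpha [ xalphas ]; the "%" hex hex escape form of xalpha is
# omitted from this pattern for brevity.
IALPHA_RE = re.compile(r"[A-Za-z][A-Za-z0-9$_@.&!*\"'(),-]*")


class GenericURI(NamedTuple):
    scheme: str
    path: str
    search: Optional[List[str]]  # search terms, split on "+"
    fragmentid: Optional[str]


def parse_generic(address: str) -> GenericURI:
    # fragmentaddress = uri [ # fragmentid ]
    uri, hash_sign, fragmentid = address.partition("#")
    # uri = scheme : path [ ? search ]
    scheme, colon, rest = uri.partition(":")
    if not colon or not IALPHA_RE.fullmatch(scheme):
        raise ValueError("no valid scheme in " + repr(address))
    path, qmark, search = rest.partition("?")
    # search = xalphas [ + search ]: split on "+" first, then unescape, so
    # an escaped "%2B" stays inside its term.
    terms = [unquote(t) for t in search.split("+")] if qmark else None
    return GenericURI(scheme, path, terms, fragmentid if hash_sign else None)
```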

BNF for specific URL schemes

This is a BNF-like description of the Uniform Resource Locator
syntax. A vertical line "|" indicates alternatives, and [brackets]
indicate optional parts. Spaces are represented by the word "space",
and the vertical line character by "vline". Single letters stand for
single letters. All words of more than one letter below are entities
described somewhere in this description.

The current IETF URI Working Group preference is for the prefixedurl
production. (Nov 1993. July 93: url).

The "national" and "punctuation" characters do not appear in
productions and therefore may not appear in URLs

The "afsaddress" is left in as historical note, but is not a
production

prefixedurl u r l : url

url httpaddress | ftpaddress | newsaddress |
nntpaddress | prosperoaddress | telnetaddress
| gopheraddress | waisaddress |
mailtoaddress | midaddress | cidaddress

scheme ialpha

httpaddress h t t p : / / hostport [ / path ] [ ?
search ]

ftpaddress f t p : / / login / path [ ftptype ]

afsaddress a f s : / / cellname / path

newsaddress n e w s : groupart

nntpaddress n n t p : group / digits

midaddress m i d : addr-spec

cidaddress c i d : content-identifier

mailtoaddress m a i l t o : xalphas @ hostname

waisaddress waisindex | waisdoc

waisindex w a i s : / / hostport / database [ ? search
]

waisdoc w a i s : / / hostport / database / wtype /
wpath

wpath digits = path ; [ wpath ]

groupart * | group | article

group ialpha [ . group ]

article xalphas @ host

database xalphas

wtype xalphas

prosperoaddress prosperolink

prosperolink p r o s p e r o : / / hostport / hsoname [ %
0 0 version [ attributes ] ]

hsoname path

version digits

attributes attribute [ attributes ]

attribute alphanums

telnetaddress t e l n e t : / / login

gopheraddress g o p h e r : / / hostport [/ gtype [
gcommand ] ]

login [ user [ : password ] @ ] hostport

hostport host [ : port ]

host hostname | hostnumber

ftptype A formcode | E formcode | I | L digits

formcode N | T | C

cellname hostname

hostname ialpha [ . hostname ]

hostnumber digits . digits . digits . digits

port digits

gcommand path

path void | segment [ / path ]

segment xpalphas

search xalphas [ + search ]

user alphanum2 [ user ]

password alphanum2 [ password ]

fragmentid xalphas

gtype xalpha

alphanum2 alpha | digit | - | _ | . | +

xalpha alpha | digit | safe | extra | escape

xalphas xalpha [ xalphas ]

xpalpha xalpha | +

xpalphas xpalpha [ xpalphas ]

ialpha alpha [ xalphas ]

alpha a | b | c | d | e | f | g | h | i | j | k |
l | m | n | o | p | q | r | s | t | u | v |
w | x | y | z | A | B | C | D | E | F | G |
H | I | J | K | L | M | N | O | P | Q | R |
S | T | U | V | W | X | Y | Z

digit 0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

safe $ | - | _ | @ | . | & | + | -

extra ! | * | " | ' | ( | ) | ,

reserved = | ; | / | # | ? | : | space

escape % hex hex

hex digit | a | b | c | d | e | f | A | B | C |
D | E | F

national { | } | vline | [ | ] | \ | ^ | ~

punctuation < | >

digits digit [ digits ]

alphanum alpha | digit

alphanums alphanum [ alphanums ]

void

(end of URL BNF)
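As an example of the specific-scheme productions, the login and
hostport parts shared by ftp and telnet addresses can be matched with
a regular expression. This sketch simplifies the hostname production
(see the comment in the code); the function name is hypothetical and
the host names in the tests are examples only.

```python
import re
from typing import Optional, Tuple

# login = [ user [ : password ] @ ] hostport, hostport = host [ : port ].
# user and password use alphanum2 (letters, digits, "-", "_", ".", "+").
# Simplification: hostname labels are limited to letters, digits and "-",
# because ialpha as written would also admit "@" and ".", making the split
# ambiguous.
LOGIN_RE = re.compile(
    r"(?:(?P<user>[A-Za-z0-9_.+-]+)(?::(?P<password>[A-Za-z0-9_.+-]+))?@)?"
    r"(?P<host>[A-Za-z][A-Za-z0-9-]*(?:\.[A-Za-z][A-Za-z0-9-]*)*"
    r"|\d+\.\d+\.\d+\.\d+)"
    r"(?::(?P<port>\d+))?"
)


def parse_login(
    login: str,
) -> Tuple[Optional[str], Optional[str], str, Optional[int]]:
    """Split the login part of an ftp or telnet URL into
    (user, password, host, port)."""
    match = LOGIN_RE.fullmatch(login)
    if match is None:
        raise ValueError("does not match the login production: " + login)
    port = match.group("port")
    return (match.group("user"), match.group("password"),
            match.group("host"), int(port) if port else None)
```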

References

Alberti, R., et al., "Notes on the Internet Gopher Protocol",
University of Minnesota, December 1991.

Berners-Lee, T., "Hypertext Transfer Protocol (HTTP)", CERN,
1991, as updated from time to time

Crocker, D., "Standard for ARPA Internet Text Messages", STD 11,
RFC 822, UDel, August 1982.

Davis, F., et al., "WAIS Interface Protocol: Prototype Functional
Specification", Thinking Machines Corporation, April 23, 1990.

International Standards Organization, Information and Documentation -
Search and Retrieve Application Protocol Specification for Open
Systems Interconnection, ISO-10163.

Horton, M., and R. Adams, "Standard for Interchange of USENET
Messages", RFC 1036, AT&T Bell Laboratories, Center for Seismic
Studies, December 1987.

Huitema, C., "Naming: strategies and techniques", Computer Networks
and ISDN Systems 23 (1991) 107-110.

Kahle, B., "Document Identifiers, or International Standard Book
Numbers for the Electronic Age", //quake.think.com/pub/wais/doc/doc-ids.txt

Kantor, B., and P. Lapsley, "Network News
Transfer Protocol", RFC 977, UC San Diego & UC Berkeley, February
1986. internic.net/rfc/rfc977.txt

Kunze, J., "Requirements for URLs", Work in Progress.

Lynch, C., Coalition for Networked Information: "Workshop on ID and
Reference Structures for Networked Information", November 1991.
discussion-archives?lynch

Mockapetris, P., "Domain Names - Concepts and Facilities", STD 13,
RFC 1034, USC/Information Sciences Institute, November 1987,
internic.net/rfc/rfc1034.txt

Neuman, B. Clifford, "Prospero: A Tool for Organizing Internet
Resources", Electronic Networking: Research, Applications and
Policy, Vol 1 No 2, Meckler Westport CT USA, 1992.

Postel, J., and J. Reynolds, "File Transfer Protocol (FTP)", STD 9,
RFC 959, USC/Information Sciences Institute, October 1985.
internic.net/rfc/rfc959.txt

Sollins, K., and L. Masinter, "Requirements for URNs", Work in
Progress.

Yeong, W., "Towards Networked Information Retrieval", Technical
91-06-25-01, June 1991, Performance Systems International, Inc

Yeong, W., "Representing Public Archives in the Directory", Work
Progress, November 1991, now expired

Security Considerations

Security issues are not discussed in this memo.

Author's Address

Tim Berners-Lee
World-Wide Web project
CERN
1211 Geneva 23,
Switzerland

Phone: +41 (22)767 3755
Fax: +41 (22)767 7155
EMail: timbl@info.cern.ch