Regarding the relevance of the word "resource", here is the RFC in question:

Network Working Group                                     T. Berners-Lee
Request for Comments: 1738                                          CERN
Category: Standards Track                                    L. Masinter
                                                       Xerox Corporation
                                                             M. McCahill
                                                 University of Minnesota
                                                                 Editors
                                                           December 1994


Uniform Resource Locators (URL)

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Abstract

This document specifies a Uniform Resource Locator (URL), the syntax
and semantics of formalized information for location and access of
resources via the Internet.

1. Introduction

This document describes the syntax and semantics for a compact string
representation for a resource available via the Internet. These
strings are called "Uniform Resource Locators" (URLs).

The specification is derived from concepts introduced by the World-
Wide Web global information initiative, whose use of such objects
dates from 1990 and is described in "Universal Resource Identifiers
in WWW", RFC 1630. The specification of URLs is designed to meet the
requirements laid out in "Functional Requirements for Internet
Resource Locators" [12].

This document was written by the URI working group of the Internet
Engineering Task Force. Comments may be addressed to the editors, or
to the URI-WG . Discussions of the group are
at







Berners-Lee, Masinter & McCahill [Page 1]

RFC 1738 Uniform Resource Locators (URL) December 1994


2. General URL Syntax

Just as there are many different methods of access to resources,
there are several schemes for describing the location of such
resources.

The generic syntax for URLs provides a framework for new schemes to
be established using protocols other than those defined in this
document.

URLs are used to `locate' resources, by providing an abstract
identification of the resource location. Having located a resource,
a system may perform a variety of operations on the resource, as
might be characterized by such words as `access', `update',
`replace', `find attributes'. In general, only the `access' method
needs to be specified for any URL scheme.

2.1. The main parts of URLs

A full BNF description of the URL syntax is given in Section 5.

In general, URLs are written as follows:

<scheme>:<scheme-specific-part>

A URL contains the name of the scheme being used (<scheme>) followed
by a colon and then a string (the <scheme-specific-part>) whose
interpretation depends on the scheme.

Scheme names consist of a sequence of characters. The lower case
letters "a"--"z", digits, and the characters plus ("+"), period
("."), and hyphen ("-") are allowed. For resiliency, programs
interpreting URLs should treat upper case letters as equivalent to
lower case in scheme names (e.g., allow "HTTP" as well as "http").
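
The split into a scheme and a scheme-specific part can be sketched in a
few lines of Python. This is only an illustration: the function name
split_scheme is hypothetical and does not come from the RFC. It folds
the scheme to lower case, as recommended above for resiliency, and then
checks it against the allowed characters.

```python
import re

# Allowed scheme characters per Section 2.1:
# lower case letters, digits, "+", "." and "-".
_SCHEME_RE = re.compile(r"[a-z0-9+.-]+")


def split_scheme(url):
    """Return (scheme, scheme_specific_part), with the scheme in lower case."""
    scheme, colon, rest = url.partition(":")
    scheme = scheme.lower()  # treat "HTTP" the same as "http"
    if not colon or not _SCHEME_RE.fullmatch(scheme):
        raise ValueError("no valid '<scheme>:' prefix in %r" % url)
    return scheme, rest
```

For instance, split_scheme("HTTP://host.dom/") yields the scheme "http"
and the scheme-specific part "//host.dom/".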

2.2. URL Character Encoding Issues

URLs are sequences of characters, i.e., letters, digits, and special
characters. A URL may be represented in a variety of ways: e.g., ink
on paper, or a sequence of octets in a coded character set. The
interpretation of a URL depends only on the identity of the
characters used.

In most URL schemes, the sequences of characters in different parts
of a URL are used to represent sequences of octets used in Internet
protocols. For example, in the ftp scheme, the host name, directory
name and file names are such sequences of octets, represented by
parts of the URL. Within those parts, an octet may be represented by
the character which has that octet as its code within the US-ASCII
[20] coded character set.

In addition, octets may be encoded by a character triplet consisting
of the character "%" followed by the two hexadecimal digits (from
"0123456789ABCDEF") which form the hexadecimal value of the octet.
(The characters "abcdef" may also be used in hexadecimal encodings.)
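
The triplet encoding can be shown with a short Python sketch. The
helper names encode_octet and decode_triplet are hypothetical, chosen
only for this illustration; decoding accepts both upper and lower case
hexadecimal digits, as the text allows.

```python
_HEX_DIGITS = "0123456789ABCDEFabcdef"


def encode_octet(octet):
    """Encode one octet (0-255) as "%" followed by two hexadecimal digits."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("an octet must be in the range 0-255")
    return "%{:02X}".format(octet)


def decode_triplet(triplet):
    """Decode a "%XX" triplet back to the octet value it represents."""
    if (len(triplet) != 3 or triplet[0] != "%"
            or any(ch not in _HEX_DIGITS for ch in triplet[1:])):
        raise ValueError("expected '%' followed by two hexadecimal digits")
    return int(triplet[1:], 16)
```

Under this sketch the space character (octet 20 hexadecimal) becomes
"%20", and both "%2F" and "%2f" decode to the octet for "/".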

Octets must be encoded if they have no corresponding graphic
character within the US-ASCII coded character set, if the use of the
corresponding character is unsafe, or if the corresponding character
is reserved for some other interpretation within the particular URL
scheme.

No corresponding graphic US-ASCII:

URLs are written only with the graphic printable characters of the
US-ASCII coded character set. The octets 80-FF hexadecimal are not
used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
control characters; these must be encoded.

Unsafe:

Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark ('"') is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".

All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.








Reserved:

Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded. The characters ";",
"/", "?", ":", "@", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme.

Usually a URL has the same interpretation when an octet is
represented by a character and when it is encoded. However, this is
not true for reserved characters: encoding a character reserved for a
particular scheme may change the semantics of a URL.

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

On the other hand, characters that are not required to be encoded
(including alphanumerics) may be encoded within the scheme-specific
part of a URL, as long as they are not being used for a reserved
purpose.
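
The encoding rules of this section can be condensed into a small
Python sketch. The names UNRESERVED, RESERVED and encode_octets are
hypothetical. The encoder is written by hand rather than with
urllib.parse.quote because, in current Python versions, quote leaves
"~" unencoded, whereas this document lists "~" among the unsafe
characters.

```python
import string

# Characters Section 2.2 allows unencoded: alphanumerics plus "$-_.+!*'(),".
UNRESERVED = set(string.ascii_letters + string.digits + "$-_.+!*'(),")
# Characters that a scheme may reserve for a special meaning.
RESERVED = set(";/?:@=&")


def encode_octets(data, keep_reserved=""):
    """Percent-encode a bytes value.

    keep_reserved lists the reserved characters that are being used for
    their reserved purpose and must therefore stay unencoded.
    """
    allowed = UNRESERVED | (set(keep_reserved) & RESERVED)
    out = []
    for octet in data:
        ch = chr(octet)
        out.append(ch if ch in allowed else "%{:02X}".format(octet))
    return "".join(out)
```

With this sketch, encode_octets(b"etc/motd") encodes the slash as data
("etc%2Fmotd"), while encode_octets(b"etc/motd", keep_reserved="/")
keeps it as a hierarchy delimiter ("etc/motd").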

2.3 Hierarchical schemes and relative links

In some cases, URLs are used to locate resources that contain
pointers to other resources. In some cases, those pointers are
represented as relative links where the expression of the location of
the second resource is in terms of "in the same place as this one
except with the following relative path". Relative links are not
described in this document. However, the use of relative links
depends on the original URL containing a hierarchical structure
against which the relative link is based.

Some URL schemes (such as the ftp, http, and file schemes) contain
names that can be considered hierarchical; the components of the
hierarchy are separated by "/".













3. Specific Schemes

The mapping for some existing standard and experimental protocols is
outlined in the BNF syntax definition. Notes on particular protocols
follow. The schemes covered are:

ftp                     File Transfer protocol
http                    Hypertext Transfer Protocol
gopher                  The Gopher protocol
mailto                  Electronic mail address
news                    USENET news
nntp                    USENET news using NNTP access
telnet                  Reference to interactive sessions
wais                    Wide Area Information Servers
file                    Host-specific file names
prospero                Prospero Directory Service

Other schemes may be specified by future specifications. Section 4 of
this document describes how new schemes may be registered, and lists
some scheme names that are under development.

3.1. Common Internet Scheme Syntax

While the syntax for the rest of the URL may vary depending on the
particular scheme selected, URL schemes that involve the direct use
of an IP-based protocol to a specified host on the Internet use a
common syntax for the scheme-specific data:

//<user>:<password>@<host>:<port>/<url-path>

Some or all of the parts "<user>:<password>@", ":<password>",
":<port>", and "/<url-path>" may be excluded. The scheme specific
data start with a double slash "//" to indicate that it complies with
the common Internet scheme syntax. The different components obey the
following rules:

user
    An optional user name. Some schemes (e.g., ftp) allow the
    specification of a user name.

password
    An optional password. If present, it follows the user
    name separated from it by a colon.

The user name (and password), if present, are followed by a
commercial at-sign "@". Within the user and password field, any ":",
"@", or "/" must be encoded.




Note that an empty user name or password is different than no user
name or password; there is no way to specify a password without
specifying a user name. E.g., <URL:ftp://@host.com/> has an empty
user name and no password, <URL:ftp://host.com/> has no user name,
while <URL:ftp://foo:@host.com/> has a user name of "foo" and an
empty password.

host
    The fully qualified domain name of a network host, or its IP
    address as a set of four decimal digit groups separated by
    ".". Fully qualified domain names take the form as described
    in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
    [5]: a sequence of domain labels separated by ".", each domain
    label starting and ending with an alphanumerical character and
    possibly also containing "-" characters. The rightmost domain
    label will never start with a digit, though, which
    syntactically distinguishes all domain names from the IP
    addresses.

port
    The port number to connect to. Most schemes designate
    protocols that have a default port number. Another port number
    may optionally be supplied, in decimal, separated from the
    host by a colon. If the port is omitted, the colon is as well.

url-path
    The rest of the locator consists of data specific to the
    scheme, and is known as the "url-path". It supplies the
    details of how the specified resource can be accessed. Note
    that the "/" between the host (or port) and the url-path is
    NOT part of the url-path.

The url-path syntax depends on the scheme being used, as does the
manner in which it is interpreted.
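
The common syntax of this section can be taken apart with a short
Python sketch. The function name parse_ip_schemepart and the returned
dictionary keys are hypothetical. To preserve the distinction between
an empty and a missing user name or password, absent parts are reported
as None and empty-but-present parts as the empty string.

```python
from urllib.parse import unquote


def parse_ip_schemepart(schemepart):
    """Split the common Internet scheme syntax into its components.

    Expects the scheme-specific part, i.e. everything after "<scheme>:".
    """
    if not schemepart.startswith("//"):
        raise ValueError("common Internet scheme syntax starts with '//'")
    # The "/" after host (or port) is a delimiter, not part of the url-path.
    login, slash, url_path = schemepart[2:].partition("/")
    user = password = port = None
    if "@" in login:
        userinfo, _, hostport = login.rpartition("@")
        user_text, colon, password_text = userinfo.partition(":")
        user = unquote(user_text)
        password = unquote(password_text) if colon else None
    else:
        hostport = login
    host, colon, port_text = hostport.partition(":")
    if colon:
        port = int(port_text)  # the port is given in decimal
    return {"user": user, "password": password, "host": host,
            "port": port, "url_path": url_path if slash else None}
```

For example, "//myname@host.dom/etc/motd" gives the user "myname", no
password (None), the host "host.dom", no explicit port, and the
url-path "etc/motd".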

3.2. FTP

The FTP URL scheme is used to designate files and directories on
Internet hosts accessible using the FTP protocol (RFC959).

An FTP URL follows the syntax described in Section 3.1. If :<port> is
omitted, the port defaults to 21.









3.2.1. FTP Name and Password

A user name and password may be supplied; they are used in the ftp
"USER" and "PASS" commands after first making the connection to the
FTP server. If no user name or password is supplied and one is
requested by the FTP server, the conventions for "anonymous" FTP are
to be used, as follows:

    The user name "anonymous" is supplied.

    The password is supplied as the Internet e-mail address
    of the end user accessing the resource.

If the URL supplies a user name but no password, and the remote
server requests a password, the program interpreting the FTP URL
should request one from the user.

3.2.2. FTP url-path

The url-path of an FTP URL has the following syntax:

<cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>

Where <cwd1> through <cwdN> and <name> are (possibly encoded) strings
and <typecode> is one of the characters "a", "i", or "d". The part
";type=<typecode>" may be omitted. The <cwdx> and <name> parts may be
empty. The whole url-path may be omitted, including the "/"
delimiting it from the prefix containing user, password, host, and
port.

The url-path is interpreted as a series of FTP commands as follows:

    Each of the <cwd> elements is to be supplied, sequentially, as the
    argument to a CWD (change working directory) command.

    If the typecode is "d", perform a NLST (name list) command with
    <name> as the argument, and interpret the results as a file
    directory listing.

    Otherwise, perform a TYPE command with <typecode> as the argument,
    and then access the file whose name is <name> (for example, using
    the RETR command.)

Within a name or CWD component, the characters "/" and ";" are
reserved and must be encoded. The components are decoded prior to
their use in the FTP protocol. In particular, if the appropriate FTP
sequence to access a particular file requires supplying a string
containing a "/" as an argument to a CWD or RETR command, it is
necessary to encode each "/".

For example, the URL <URL:ftp://myname@host.dom/%2Fetc/motd> is
interpreted by FTP-ing to "host.dom", logging in as "myname"
(prompting for a password if it is asked for), and then executing
"CWD /etc" and then "RETR motd". This has a different meaning from
<URL:ftp://myname@host.dom/etc/motd> which would "CWD etc" and then
"RETR motd"; the initial "CWD" might be executed relative to the
default directory for "myname". On the other hand,
<URL:ftp://myname@host.dom//etc/motd>, would "CWD " with a null
argument, then "CWD etc", and then "RETR motd".
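
The interpretation rules of this section can be checked with a minimal
Python sketch that turns a url-path into the FTP commands a client
would issue. The function name ftp_commands is hypothetical. Two
choices are assumptions of the sketch rather than requirements of the
text: RETR is used as the access command, and the typecode is sent in
upper case.

```python
from urllib.parse import unquote


def ftp_commands(url_path):
    """Translate an FTP url-path into a list of FTP commands.

    url_path is the part after the "/" that follows host (or port).
    """
    if not url_path:
        return []  # the whole url-path may be omitted
    path, _, typecode = url_path.partition(";type=")
    *cwds, name = path.split("/")
    # Components are decoded only after splitting on the reserved "/".
    commands = ["CWD " + unquote(cwd) for cwd in cwds]
    if typecode.lower() == "d":
        commands.append("NLST " + unquote(name))
    else:
        if typecode:
            commands.append("TYPE " + typecode.upper())
        commands.append("RETR " + unquote(name))
    return commands
```

Applied to the url-paths discussed above, "%2Fetc/motd" produces
"CWD /etc" then "RETR motd", "etc/motd" produces "CWD etc" then
"RETR motd", and "/etc/motd" (the double-slash case) produces "CWD "
with an empty argument, then "CWD etc", then "RETR motd".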

FTP URLs may also be used for other operations; for example, it is
possible to update a file on a remote file server, or infer
information about it from the directory listings. The mechanism for
doing so is not spelled out here.

3.2.3. FTP Typecode is Optional

The entire ;type=<typecode> part of an FTP URL is optional. If it is
omitted, the client program interpreting the URL must guess the
appropriate mode to use. In general, the data content type of a file
can only be guessed from the name, e.g., from the suffix of the name;
the appropriate type code to be used for transfer of the file can
then be deduced from the data content of the file.

3.2.4 Hierarchy

For some file systems, the "/" used to denote the hierarchical
structure of the URL corresponds to the delimiter used to construct a
file name hierarchy, and thus, the filename will look similar to the
URL path. This does NOT mean that the URL is a Unix filename.

3.2.5. Optimization

Clients accessing resources via FTP may employ additional heuristics
to optimize the interaction. For some FTP servers, for example, it
may be reasonable to keep the control connection open while accessing
multiple URLs from the same server. However, there is no common
hierarchical model to the FTP protocol, so if a directory change
command has been given, it is impossible in general to deduce what
sequence should be given to navigate to another directory for a
second retrieval, if the paths are different. The only reliable
algorithm is to disconnect and reestablish the control connection.







3.3. HTTP

The HTTP URL scheme is used to designate Internet resources
accessible using HTTP (HyperText Transfer Protocol).

The HTTP protocol is specified elsewhere. This specification only
describes the syntax of HTTP URLs.

An HTTP URL takes the form:

http://<host>:<port>/<path>?<searchpart>

where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 80. No user name or password is
allowed. <path> is an HTTP selector, and <searchpart> is a query
string. The <path> is optional, as is the <searchpart> and its
preceding "?". If neither <path> nor <searchpart> is present, the "/"
may also be omitted.

Within the <path> and <searchpart> components, "/", ";", "?" are
reserved. The "/" character may be used within HTTP to designate a
hierarchical structure.
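
A minimal Python sketch of the HTTP URL form described in this section
follows. The function name parse_http_url is hypothetical; the sketch
applies the default port of 80, treats the path and search part as
optional, and does not validate the characters of each component.

```python
def parse_http_url(url):
    """Return (host, port, path, searchpart) for an http URL.

    searchpart is None when no "?" is present.
    """
    prefix = "http://"
    if not url.lower().startswith(prefix):
        raise ValueError("not an http URL: %r" % url)
    hostport, _, rest = url[len(prefix):].partition("/")
    host, colon, port_text = hostport.partition(":")
    port = int(port_text) if colon else 80  # default port for HTTP
    path, qmark, search = rest.partition("?")
    return host, port, path, (search if qmark else None)
```

For example, "http://host.dom" yields the host "host.dom", port 80, an
empty path and no search part.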

3.4. GOPHER

The Gopher URL scheme is used to designate Internet resources
accessible using the Gopher protocol.

The base Gopher protocol is described in RFC 1436 and supports items
and collections of items (directories). The Gopher+ protocol is a set
of upward compatible extensions to the base Gopher protocol and is
described in [2]. Gopher+ supports associating arbitrary sets of
attributes and alternate data representations with Gopher items.
Gopher URLs accommodate both Gopher and Gopher+ items and item
attributes.

3.4.1. Gopher URL syntax

A Gopher URL takes the form:

gopher://<host>:<port>/<gopher-path>

where <gopher-path> is one of

<gophertype><selector>
<gophertype><selector>%09<search>
<gophertype><selector>%09<search>%09<gopher+_string>

If :<port> is omitted, the port defaults to 70. <gophertype> is a
single-character field to denote the Gopher type of the resource to
which the URL refers. The entire <gopher-path> may also be empty, in
which case the delimiting "/" is also optional and the <gophertype>
defaults to "1".

<selector> is the Gopher selector string. In the Gopher protocol,
Gopher selector strings are a sequence of octets which may contain
any octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal
(US-ASCII character LF), and 0D (US-ASCII character CR).

Gopher clients specify which item to retrieve by sending the Gopher
selector string to a Gopher server.

Within the <gopher-path>, no characters are reserved.

Note that some Gopher <selector> strings begin with a copy of the
<gophertype> character, in which case that character will occur twice
consecutively. The Gopher selector string may be an empty string;
this is how Gopher clients refer to the top-level directory on a
Gopher server.

3.4.2 Specifying URLs for Gopher Search Engines

If the URL refers to a search to be submitted to a Gopher search
engine, the selector is followed by an encoded tab (%09) and the
search string. To submit a search to a Gopher search engine, the
Gopher client sends the <selector> string (after decoding), a tab,
and the search string to the Gopher server.

3.4.3 URL syntax for Gopher+ items

URLs for Gopher+ items have a second encoded tab (%09) and a Gopher+
string. Note that in this case, the %09<search> string must be
supplied, although the <search> element may be the empty string.

The <gopher+_string> is used to represent information required for
retrieval of the Gopher+ item. Gopher+ items may have alternate
views, arbitrary sets of attributes, and may have electronic forms
associated with them.

To retrieve the data associated with a Gopher+ URL, a client will
connect to the server and send the Gopher selector, followed by a tab
and the search string (which may be empty), followed by a tab and the
Gopher+ commands.
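
The Gopher path layout and the resulting request can be sketched in
Python as follows. The function names parse_gopher_path and
gopher_request are hypothetical. The sketch assumes the Gopher type is
a single unencoded character, and it terminates the request with CR LF
as in the base Gopher protocol of RFC 1436.

```python
from urllib.parse import unquote


def parse_gopher_path(gopher_path):
    """Split a Gopher path into (gophertype, selector, search, gopher_plus).

    Fields that are absent from the URL are returned as None.
    """
    if not gopher_path:
        return "1", "", None, None  # empty path: the type defaults to "1"
    gophertype, rest = gopher_path[0], gopher_path[1:]
    fields = rest.split("%09", 2)  # encoded tabs separate the fields
    selector = unquote(fields[0])
    search = unquote(fields[1]) if len(fields) > 1 else None
    gopher_plus = unquote(fields[2]) if len(fields) > 2 else None
    return gophertype, selector, search, gopher_plus


def gopher_request(selector, search=None, gopher_plus=None):
    """Build the request line a client sends for the decoded fields."""
    parts = [selector]
    if gopher_plus is not None:
        # A Gopher+ string requires the search field, even if it is empty.
        parts += [search or "", gopher_plus]
    elif search is not None:
        parts.append(search)
    return "\t".join(parts) + "\r\n"
```

The fields are split on the literal "%09" before decoding, so an
encoded tab always acts as a separator and never becomes part of a
selector.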






3.4.4 Default Gopher+ data representation

When a Gopher server returns a directory listing to a client, the
Gopher+ items are tagged with either a "+" (denoting Gopher+ items)
or a "?" (denoting Gopher+ items which have a +ASK form associated
with them). A Gopher URL with a Gopher+ string consisting of only a
"+" refers to the default view (data representation) of the item
while a Gopher+ string containing only a "?" refers to an item with a
Gopher electronic form associated with it.

3.4.5 Gopher+ items with electronic forms

Gopher+ items which have a +ASK associated with them (i.e. Gopher+
items tagged with a "?") require the client to fetch the item's +ASK
attribute to get the form definition, and then ask the user to fill
out the form and return the user's responses along with the selector
string to retrieve the item. Gopher+ clients know how to do this but
depend on the "?" tag in the Gopher+ item description to know when to
handle this case. The "?" is used in the Gopher+ string to be
consistent with Gopher+ protocol's use of this symbol.

3.4.6 Gopher+ item attribute collections

To refer to the Gopher+ attributes of an item, the Gopher URL's
Gopher+ string consists of "!" or "$". "!" refers to all of a
Gopher+ item's attributes. "$" refers to all the item attributes for
all items in a Gopher directory.

3.4.7 Referring to specific Gopher+ attributes

To refer to specific attributes, the URL's gopher+_string is
"!<attribute_name>" or "$<attribute_name>". For example, to refer to
the attribute containing the abstract of an item, the gopher+_string
would be "!+ABSTRACT".

To refer to several attributes, the gopher+_string consists of the
attribute names separated by coded spaces. For example,
"!+ABSTRACT%20+SMELL" refers to the +ABSTRACT and +SMELL attributes
of an item.

3.4.8 URL syntax for Gopher+ alternate views

Gopher+ allows for optional alternate data representations (alternate
views) of items. To retrieve a Gopher+ alternate view, a Gopher+
client sends the appropriate view and language identifier (found in
the item's +VIEW attribute). To refer to a specific Gopher+ alternate
view, the URL's Gopher+ string would be in the form:

+<view_name>%20<language_name>

For example, a Gopher+ string of "+application/postscript%20Es_ES"
refers to the Spanish language postscript alternate view of a Gopher+
item.

3.4.9 URL syntax for Gopher+ electronic forms

The gopher+_string for a URL that refers to an item referenced by a
Gopher+ electronic form (an ASK block) filled out with specific
values is a coded version of what the client sends to the server.
The gopher+_string is of the form:

+%091%0D%0A+-1%0D%0A<ask_item1_value>%0D%0A<ask_item2_value>%0D%0A.%0D%0A

To retrieve this item, the Gopher client sends:

<a_gopher_selector><tab>+<tab>1<cr><lf>
+-1<cr><lf>
<ask_item1_value><cr><lf>
<ask_item2_value><cr><lf>
.<cr><lf>

to the Gopher server.

3.5. MAILTO

The mailto URL scheme is used to designate the Internet mailing
address of an individual or service. No additional information other
than an Internet mailing address is present or implied.

A mailto URL takes the form:

mailto:<rfc822-addr-spec>

where <rfc822-addr-spec> is (the encoding of an) addr-spec, as
specified in RFC 822 [6]. Within mailto URLs, there are no reserved
characters.

Note that the percent sign ("%") is commonly used within RFC 822
addresses and must be encoded.

Unlike many URLs, the mailto scheme does not represent a data object
to be accessed directly; there is no sense in which it designates an
object. It has a different use than the message/external-body type in
MIME.





3.6. NEWS

The news URL scheme is used to refer to either news groups or
individual articles of USENET news, as specified in RFC 1036.

A news URL takes one of two forms:

news:<newsgroup-name>
news:<message-id>

A <newsgroup-name> is a period-delimited hierarchical name, such as
"comp.infosystems.www.misc". A <message-id> corresponds to the
Message-ID of section 2.1.5 of RFC 1036, without the enclosing "<"
and ">"; it takes the form <unique>@<full_domain_name>. A message
identifier may be distinguished from a news group name by the
presence of the commercial at "@" character. No additional characters
are reserved within the components of a news URL.

If <newsgroup-name> is "*" (as in <URL:news:*>), it is used to refer
to "all available news groups".

The news URLs are unusual in that by themselves, they do not contain
sufficient information to locate a single resource, but, rather, are
location-independent.
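
The rule for telling the two forms of news URL apart can be sketched
in Python. The function name classify_news_url and the returned labels
are hypothetical; the only test applied is the presence of the
commercial at "@" character, with "*" handled as the special case
described above.

```python
def classify_news_url(url):
    """Return (kind, value) where kind names the form of the news URL."""
    scheme, colon, rest = url.partition(":")
    if not colon or scheme.lower() != "news" or not rest:
        raise ValueError("not a news URL: %r" % url)
    if rest == "*":
        return "all-groups", rest
    if "@" in rest:
        return "message-id", rest  # an "@" marks a message identifier
    return "newsgroup", rest
```

For example, "news:comp.infosystems.www.misc" is classified as a
newsgroup.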

3.7. NNTP

The nntp URL scheme is an alternative method of referencing news
articles, useful for specifying news articles from NNTP servers (RFC
977).

An nntp URL takes the form:

nntp://<host>:<port>/<newsgroup-name>/<article-number>

where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 119.

The <newsgroup-name> is the name of the group, while the
<article-number> is the numeric id of the article within that
newsgroup.

Note that while nntp: URLs specify a unique location for the article
resource, most NNTP servers currently on the Internet today are
configured only to allow access from local clients, and thus nntp
URLs do not designate globally accessible resources. Thus, the news:
form of URL is preferred as a way of identifying news articles.





3.8. TELNET

The Telnet URL scheme is used to designate interactive services that
may be accessed by the Telnet protocol.

A telnet URL takes the form:

telnet://<user>:<password>@<host>:<port>/

as specified in Section 3.1. The final "/" character may be omitted.
If :<port> is omitted, the port defaults to 23. The :<password> can
be omitted, as well as the whole <user>:<password> part.

This URL does not designate a data object, but rather an interactive
service. Remote interactive services vary widely in the means by
which they allow remote logins; in practice, the <user> and
<password> supplied are advisory only: clients accessing a telnet URL
merely advise the user of the suggested username and password.

3.9. WAIS

The WAIS URL scheme is used to designate WAIS databases, searches, or
individual documents available from a WAIS database. WAIS is
described in [7]. The WAIS protocol is described in RFC 1625 [17];
although the WAIS protocol is based on Z39.50-1988, the WAIS URL
scheme is not intended for use with arbitrary Z39.50 services.

A WAIS URL takes one of the following forms:

wais://<host>:<port>/<database>
wais://<host>:<port>/<database>?<search>
wais://<host>:<port>/<database>/<wtype>/<wpath>

where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 210. The first form designates a
WAIS database that is available for searching. The second form
designates a particular search. <database> is the name of the WAIS
database being queried.

The third form designates a particular document within a WAIS
database to be retrieved. In this form <wtype> is the WAIS
designation of the type of the object. Many WAIS implementations
require that a client know the "type" of an object prior to
retrieval, the type being returned along with the internal object
identifier in the search response. The <wtype> is included in the
URL in order to allow the client interpreting the URL adequate
information to actually retrieve the document.

The <wpath> of a WAIS URL consists of the WAIS document-id, encoded
as necessary using the method described in Section 2.2. The WAIS
document-id should be treated opaquely; it may only be decomposed by
the server that issued it.

3.10 FILES

The file URL scheme is used to designate files accessible on a
particular host computer. This scheme, unlike most other URL schemes,
does not designate a resource that is universally accessible over the
Internet.

A file URL takes the form:

file://<host>/<path>

where <host> is the fully qualified domain name of the system on
which the <path> is accessible, and <path> is a hierarchical
directory path of the form <directory>/<directory>/.../<name>.

For example, a VMS file

DISK$USER:[MY.NOTES]NOTE123456.TXT

might become

<URL:file://vms.host.edu/disk$user/my/notes/note12345.txt>

As a special case, <host> can be the string "localhost" or the empty
string; this is interpreted as `the machine from which the URL is
being interpreted'.
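
The handling of the host in a file URL, including the special case for
the local machine, can be sketched in Python. The function name
parse_file_url is hypothetical, and the host name used in the usage
note is made up for the illustration.

```python
def parse_file_url(url):
    """Return (host, path, is_local) for a file URL.

    is_local is True when the host is "localhost" or the empty string.
    """
    prefix = "file://"
    if not url.lower().startswith(prefix):
        raise ValueError("not a file URL: %r" % url)
    host, slash, path = url[len(prefix):].partition("/")
    if not slash:
        raise ValueError("a file URL needs a '/' after the host")
    is_local = host == "" or host.lower() == "localhost"
    return host, path, is_local
```

With this sketch, "file:///etc/motd" and "file://localhost/etc/motd"
are both treated as local, while "file://files.example.org/etc/motd"
(a hypothetical host) is not.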

The file URL scheme is unusual in that it does not specify an
Internet protocol or access method for such files; as such, its
utility in network protocols between hosts is limited.

3.11 PROSPERO

The Prospero URL scheme is used to designate resources that are
accessed via the Prospero Directory Service. The Prospero protocol is
described elsewhere [14].

A prospero URL takes the form:

prospero://<host>:<port>/<hsoname>;<field>=<value>

where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 1525. No username or password is
allowed.

The <hsoname> is the host-specific object name in the Prospero
protocol, suitably encoded. This name is opaque and interpreted by
the Prospero server. The semicolon ";" is reserved and may not
appear without quoting in the <hsoname>.

Prospero URLs are interpreted by contacting a Prospero directory
server on the specified host and port to determine appropriate access
methods for a resource, which might themselves be represented as
different URLs. External Prospero links are represented as URLs of
the underlying access method and are not represented as Prospero
URLs.

Note that a slash "/" may appear in the <hsoname> without quoting and
no significance may be assumed by the application. Though slashes
may indicate hierarchical structure on the server, such structure is
not guaranteed. Note that many <hsoname>s begin with a slash, in
which case the host or port will be followed by a double slash: the
slash from the URL syntax, followed by the initial slash from the
<hsoname>. (E.g., <URL:prospero://host.dom//pros/name> designates a
<hsoname> of "/pros/name".)

In addition, after the <hsoname>, optional fields and values
associated with a Prospero link may be specified as part of the URL.
When present, each field/value pair is separated from each other and
from the rest of the URL by a ";" (semicolon). The name of the field
and its value are separated by a "=" (equal sign). If present, these
fields serve to identify the target of the URL. For example, the
OBJECT-VERSION field can be specified to identify a specific version
of an object.
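
The separation of the host-specific object name from its field/value
pairs can be sketched in Python. The function name parse_prospero_path
is hypothetical, and the version value "3" in the usage note is made
up for the illustration.

```python
from urllib.parse import unquote


def parse_prospero_path(path):
    """Split the part after host (or port) and "/" into (hsoname, fields).

    fields maps each field name to its value.
    """
    # ";" is reserved, so it can only appear as a field separator.
    hsoname, *fieldspecs = path.split(";")
    fields = {}
    for spec in fieldspecs:
        name, _, value = spec.partition("=")
        fields[unquote(name)] = unquote(value)
    return unquote(hsoname), fields
```

For example, "/pros/name;OBJECT-VERSION=3" gives the object name
"/pros/name" and the single field OBJECT-VERSION with the value "3".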

4. REGISTRATION OF NEW SCHEMES

A new scheme may be introduced by defining a mapping onto a
conforming URL syntax, using a new prefix. URLs for experimental
schemes may be used by mutual agreement between parties. Scheme names
starting with the characters "x-" are reserved for experimental
purposes.

The Internet Assigned Numbers Authority (IANA) will maintain a
registry of URL schemes. Any submission of a new URL scheme must
include a definition of an algorithm for accessing of resources
within that scheme and the syntax for representing such a scheme.

URL schemes must have demonstrable utility and operability. One way
to provide such a demonstration is via a gateway which provides
objects in the new scheme for clients using an existing protocol. If
the new scheme does not locate resources that are data objects, the
properties of names in the new space must be clearly defined.

New schemes should try to follow the same syntactic conventions of
existing schemes, where appropriate. It is likewise recommended
that, where a protocol allows for retrieval by URL, the client
software have provision for being configured to use specific gateway
locators for indirect access through new naming schemes.

The following schemes have been proposed at various times, but this
document does not define their syntax or use at this time. It is
suggested that IANA reserve their scheme names for future definition:

afs              Andrew File System global file names.
mid              Message identifiers for electronic mail.
cid              Content identifiers for MIME body parts.
nfs              Network File System (NFS) file names.
tn3270           Interactive 3270 emulation sessions.
mailserver       Access to data available from mail servers.
z39.50           Access to ANSI Z39.50 services.

5. BNF for specific URL schemes

This is a BNF-like description of the Uniform Resource Locator
syntax, using the conventions of RFC822, except that "|" is used to
designate alternatives, and brackets [] are used around optional or
repeated elements. Briefly, literals are quoted with "", optional
elements are enclosed in [brackets], and elements may be preceded
with <n>* to designate n or more repetitions of the following
element; n defaults to 0.

; The generic form of a URL is:

genericurl     = scheme ":" schemepart

; Specific predefined schemes are defined here; new schemes
; may be registered with IANA

url            = httpurl | ftpurl | newsurl |
                 nntpurl | telneturl | gopherurl |
                 waisurl | mailtourl | fileurl |
                 prosperourl | otherurl

; new schemes follow the general syntax
otherurl       = genericurl

; the scheme is in lower case; interpreters should use case-ignore
scheme         = 1*[ lowalpha | digit | "+" | "-" | "." ]
schemepart     = *xchar | ip-schemepart

; URL schemeparts for ip based protocols:

ip-schemepart  = "//" login [ "/" urlpath ]

login          = [ user [ ":" password ] "@" ] hostport
hostport       = host [ ":" port ]
host           = hostname | hostnumber
hostname       = *[ domainlabel "." ] toplabel
domainlabel    = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
toplabel       = alpha | alpha *[ alphadigit | "-" ] alphadigit
alphadigit     = alpha | digit
hostnumber     = digits "." digits "." digits "." digits
port           = digits
user           = *[ uchar | ";" | "?" | "&" | "=" ]
password       = *[ uchar | ";" | "?" | "&" | "=" ]
urlpath        = *xchar    ; depends on protocol see section 3.1

; The predefined schemes:

; FTP (see also RFC959)

ftpurl         = "ftp://" login [ "/" fpath [ ";type=" ftptype ]]
fpath          = fsegment *[ "/" fsegment ]
fsegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
ftptype        = "A" | "I" | "D" | "a" | "i" | "d"

; FILE

fileurl        = "file://" [ host | "localhost" ] "/" fpath

; HTTP

httpurl        = "http://" hostport [ "/" hpath [ "?" search ]]
hpath          = hsegment *[ "/" hsegment ]
hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
search         = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

; GOPHER (see also RFC1436)

gopherurl      = "gopher://" hostport [ "/" [ gtype [ selector
                 [ "%09" search [ "%09" gopher+_string ] ] ] ] ]
gtype          = xchar
selector       = *xchar
gopher+_string = *xchar



; MAILTO (see also RFC822)

mailtourl      = "mailto:" encoded822addr
encoded822addr = 1*xchar               ; further defined in RFC822

; NEWS (see also RFC1036)

newsurl        = "news:" grouppart
grouppart      = "*" | group | article
group          = alpha *[ alpha | digit | "-" | "." | "+" | "_" ]
article        = 1*[ uchar | ";" | "/" | "?" | ":" | "&" | "=" ] "@" host

; NNTP (see also RFC977)

nntpurl        = "nntp://" hostport "/" group [ "/" digits ]

; TELNET

telneturl      = "telnet://" login [ "/" ]

; WAIS (see also RFC1625)

waisurl        = waisdatabase | waisindex | waisdoc
waisdatabase   = "wais://" hostport "/" database
waisindex      = "wais://" hostport "/" database "?" search
waisdoc        = "wais://" hostport "/" database "/" wtype "/" wpath
database       = *uchar
wtype          = *uchar
wpath          = *uchar

; PROSPERO

prosperourl    = "prospero://" hostport "/" ppath *[ fieldspec ]
ppath          = psegment *[ "/" psegment ]
psegment       = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
fieldspec      = ";" fieldname "=" fieldvalue
fieldname      = *[ uchar | "?" | ":" | "@" | "&" ]
fieldvalue     = *[ uchar | "?" | ":" | "@" | "&" ]

; Miscellaneous

lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
           "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
           "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
           "y" | "z"
hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
          "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
          "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"



alpha = lowalpha | hialpha
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
        "8" | "9"
safe = "$" | "-" | "_" | "." | "+"
extra = "!" | "*" | "'" | "(" | ")" | ","
national = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
punctuation = "<" | ">" | "#" | "%" | <">


reserved = ";" | "/" | "?" | ":" | "@" | "&" | "="
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
      "a" | "b" | "c" | "d" | "e" | "f"
escape = "%" hex hex

unreserved = alpha | digit | safe | extra
uchar = unreserved | escape
xchar = unreserved | reserved | escape
digits = 1*digit
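
As an illustration only, the httpurl rule and the character classes
above can be transcribed into a regular expression. The sketch below is
one possible reading of the grammar, not a reference implementation.
The name is_http_url and the example.com host names are hypothetical,
and the domainlabel pattern follows the rule defined earlier in the
grammar. It matches the literal lowercase "http://" prefix exactly as
the BNF writes it.

```python
import re

# Character classes transcribed from the miscellaneous definitions.
UNRESERVED = r"A-Za-z0-9$\-_.+!*'(),"   # alpha | digit | safe | extra
ESCAPE = r"%[0-9A-Fa-f]{2}"             # "%" hex hex

# hsegment and search share one definition in the grammar:
#   *[ uchar | ";" | ":" | "@" | "&" | "=" ]
HSEGMENT = rf"(?:[{UNRESERVED};:@&=]|{ESCAPE})*"
SEARCH = HSEGMENT
HPATH = rf"{HSEGMENT}(?:/{HSEGMENT})*"

# hostport = host [ ":" port ], host = hostname | hostnumber
DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
HOSTNAME = rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}"
HOSTNUMBER = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
HOSTPORT = rf"(?:{HOSTNAME}|{HOSTNUMBER})(?::[0-9]+)?"

# httpurl = "http://" hostport [ "/" hpath [ "?" search ]]
HTTPURL = re.compile(rf"http://{HOSTPORT}(?:/{HPATH}(?:\?{SEARCH})?)?")


def is_http_url(text):
    """Return True if the whole string matches the httpurl rule."""
    return HTTPURL.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in (
        "http://www.example.com/a/b;x=1?q=a%20b",
        "http://127.0.0.1:8080/",
        "http://www.example.com/~user",   # "~" is a national character
        "http://www.example.com/a b",     # unencoded space
    ):
        print(candidate, is_http_url(candidate))
```

Characters outside uchar, such as "~" or a space, have to appear in
escaped form ("%7E", "%20") before a string matches under this reading.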

6. Security Considerations

The URL scheme does not in itself pose a security threat. Users
should beware that there is no general guarantee that a URL which at
one time points to a given object continues to do so, and does not
even at some later time point to a different object due to the
movement of objects on servers.

A URL-related security threat is that it is sometimes possible to
construct a URL such that an attempt to perform a harmless idempotent
operation such as the retrieval of the object will in fact cause a
possibly damaging remote operation to occur. The unsafe URL is
typically constructed by specifying a port number other than that
reserved for the network protocol in question. The client
unwittingly contacts a server which is in fact running a different
protocol. The content of the URL contains instructions which when
interpreted according to this other protocol cause an unexpected
operation. An example has been the use of gopher URLs to cause a rude
message to be sent via a SMTP server. Caution should be used when
using any URL which specifies a port number other than the default
for the protocol, especially when it is a number within the reserved
space.
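
A client might apply this caution with a check along the following
lines. This is a minimal sketch under stated assumptions: the function
name port_warning and the example host names are hypothetical, the
default ports are those given in the scheme descriptions earlier in
this document, and "reserved space" is read here as port numbers below
1024, which is an interpretation rather than something this section
defines.

```python
from urllib.parse import urlsplit

# Default ports for the schemes described in this document.
DEFAULT_PORTS = {
    "ftp": 21,
    "http": 80,
    "gopher": 70,
    "nntp": 119,
    "telnet": 23,
    "wais": 210,
    "prospero": 1525,
}

# Assumption: "reserved space" means the range below 1024.
RESERVED_LIMIT = 1024


def port_warning(url):
    """Return a warning string for a suspicious port, or None."""
    parts = urlsplit(url)
    default = DEFAULT_PORTS.get(parts.scheme)
    port = parts.port  # None when the URL names no port
    if port is None or default is None or port == default:
        return None
    if port < RESERVED_LIMIT:
        return "non-default port in the reserved range"
    return "non-default port"


if __name__ == "__main__":
    # A gopher URL aimed at the SMTP port, as in the example above.
    print(port_warning("gopher://gopher.example.org:25/"))
    print(port_warning("http://www.example.org/"))
```

What to do with a warning (refuse, prompt the user, log) is a policy
decision this sketch leaves open.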

Care should be taken when URLs contain embedded encoded delimiters
for a given protocol (for example, CR and LF characters for telnet
protocols) that these are not unencoded before transmission. This
would violate the protocol but could be used to simulate an extra
operation or parameter, again causing an unexpected and possibly
harmful remote operation to be performed.
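
One way to spot such URLs before they are processed is to look for the
escapes of CR ("%0D") and LF ("%0A") in the still-encoded string. The
sketch below assumes that only these two delimiters matter; other
protocols may have further delimiters worth checking. The function name
and the example URLs are hypothetical.

```python
import re

# "%0D" is the escape for CR and "%0A" the escape for LF; hex digits
# may be written in either case.
ENCODED_CR_OR_LF = re.compile(r"%0[AaDd]")


def has_encoded_line_delimiters(url):
    """Return True if the URL carries an escaped CR or LF."""
    return ENCODED_CR_OR_LF.search(url) is not None


if __name__ == "__main__":
    print(has_encoded_line_delimiters(
        "gopher://h.example:25/1%0D%0AHELO"))
    print(has_encoded_line_delimiters(
        "http://www.example.org/a%20b"))
```

The check runs on the encoded form on purpose: once the escapes have
been decoded, the delimiters are indistinguishable from ones the
protocol itself would send.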



The use of URLs containing passwords that should be secret is clearly
unwise.

7. Acknowledgements

This paper builds on the basic WWW design (RFC 1630) and much
discussion of these issues by many people on the network. The
discussion was particularly stimulated by articles by Clifford Lynch,
Brewster Kahle [10] and Wengyik Yeong [18]. Contributions from John
Curran, Clifford Neuman, Ed Vielmetti and later the IETF URL BOF and
URI working group were incorporated.

Most recently, careful readings and comments by Dan Connolly, Ned
Freed, Roy Fielding, Guido van Rossum, Michael Dolan, Bert Bos, John
Kunze, Olle Jarnefors, Peter Svanberg and many others have helped to
refine this RFC.



































APPENDIX: Recommendations for URLs in Context

URIs, including URLs, are intended to be transmitted through
protocols which provide a context for their interpretation.

In some cases, it will be necessary to distinguish URLs from other
possible data structures in a syntactic structure. In this case, it is
recommended that URLs be preceded with a prefix consisting of the
characters "URL:". For example, this prefix may be used to
distinguish URLs from other kinds of URIs.

In addition, there are many occasions when URLs are included in other
kinds of text; examples include electronic mail, USENET news
messages, or printed on paper. In such cases, it is convenient to
have a separate syntactic wrapper that delimits the URL and separates
it from the rest of the text, and in particular from punctuation
marks that might be mistaken for part of the URL. For this purpose, it
is recommended that angle brackets ("<" and ">"), along with the
prefix "URL:", be used to delimit the boundaries of the URL. This
wrapper does not form part of the URL and should not be used in
contexts in which delimiters are already specified.

In the case where a fragment/anchor identifier is associated with a
URL (following a "#"), the identifier would be placed within the
brackets as well.
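
The wrapper described above can be recognized with a short pattern.
The following is a sketch only: the function name extract_wrapped_urls
and the sample sentence are hypothetical, and the pattern assumes that
"<" and ">" never occur unencoded inside a URL, which agrees with their
listing under punctuation in the BNF.

```python
import re

# "<URL:" ... ">" with everything between the brackets captured.
# A fragment identifier after "#" stays inside the brackets and is
# therefore part of the captured text.
WRAPPED_URL = re.compile(r"<URL:([^<>]*)>")


def extract_wrapped_urls(text):
    """Return the contents of every <URL:...> wrapper in the text."""
    return WRAPPED_URL.findall(text)


if __name__ == "__main__":
    sample = ("See <URL:http://www.example.org/doc.html#intro>, "
              "then reply.")
    print(extract_wrapped_urls(sample))
```

Because the closing bracket ends the match, the comma that follows the
wrapper in the sample sentence is not mistaken for part of the URL.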

In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
need to be added to break long URLs across lines. The whitespace
should be ignored when extracting the URL.

No whitespace should be introduced after a hyphen ("-") character.
Because some typesetters and printers may (erroneously) introduce a
hyphen at the end of line when breaking a line, the interpreter of a
URL containing a line break immediately after a hyphen should ignore
all unencoded whitespace around the line break, and should be aware
that the hyphen may or may not actually be part of the URL.
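
The whitespace and hyphen rules can be sketched as two small helpers.
Both names (unfold_url, hyphen_candidates) and the example URLs are
hypothetical. The input is assumed to be the text found between the
wrapper brackets, and returning both readings for the hyphen case is
one possible design, not something this appendix prescribes.

```python
import re


def unfold_url(wrapped_body):
    """Drop all unencoded whitespace from a URL broken across lines."""
    return re.sub(r"\s+", "", wrapped_body)


def hyphen_candidates(wrapped_body):
    """Return the possible URLs when a line break follows a hyphen.

    The first candidate keeps the hyphen; the second treats it as one
    a typesetter introduced. When no hyphen sits directly before a
    line break, a single candidate is returned.
    """
    with_hyphen = unfold_url(wrapped_body)
    # Remove a hyphen that is immediately followed by a line break,
    # together with the whitespace around that break.
    stripped = re.sub(r"-[ \t]*\r?\n\s*", "", wrapped_body)
    without_hyphen = unfold_url(stripped)
    if with_hyphen == without_hyphen:
        return [with_hyphen]
    return [with_hyphen, without_hyphen]


if __name__ == "__main__":
    print(unfold_url("http://www.example.org/long/\n   path"))
    print(hyphen_candidates("http://www.example.org/some-\n  page"))
```

A caller that gets two candidates could try the hyphenated form first
and fall back to the other if the first cannot be resolved.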

Examples

Yes, Jim, I found it under type=d> but you can probably pick it up from ternic.net/rfc>. Note the warning in internic
net/instructions/overview.html#WARNING>.








References

[1] Anklesaria, F., McCahill, M., Lindner, P., Johnson, D.,
Torrey, D., and B. Alberti, "The Internet Gopher Protocol
(a distributed document search and retrieval protocol)",
RFC 1436, University of Minnesota, March 1993.
internic.net/rfc/rfc1436.txt;type=a

[2] Anklesaria, F., Lindner, P., McCahill, M., Torrey, D.,
Johnson, D., and B. Alberti, "Gopher+: Upward compatible
enhancements to the Internet Gopher protocol",
University of Minnesota, July 1993.
/Gopher+/Gopher+.txt

[3] Berners-Lee, T., "Universal Resource Identifiers in WWW: A
Unifying Syntax for the Expression of Names and Addresses of
Objects on the Network as used in the World-Wide Web", RFC
1630, CERN, June 1994.
internic.net/rfc/rfc1630.txt

[4] Berners-Lee, T., "Hypertext Transfer Protocol (HTTP)",
CERN, November 1993.

[5] Braden, R., Editor, "Requirements for Internet Hosts --
Application and Support", STD 3, RFC 1123, IETF, October 1989.
internic.net/rfc/rfc1123.txt

[6] Crocker, D. "Standard for the Format of ARPA Internet Text
Messages", STD 11, RFC 822, UDEL, April 1982.
internic.net/rfc/rfc822.txt

[7] Davis, F., Kahle, B., Morris, H., Salem, J., Shen, T., Wang, R.,
Sui, J., and M. Grinbaum, "WAIS Interface Protocol Prototype
Functional Specification", (v1.5), Thinking Machines
Corporation, April 1990.

[8] Horton, M. and R. Adams, "Standard For Interchange of USENET
Messages", RFC 1036, AT&T Bell Laboratories, Center for Seismic
Studies, December 1987.
internic.net/rfc/rfc1036.txt

[9] Huitema, C., "Naming: Strategies and Techniques", Computer
Networks and ISDN Systems 23 (1991) 107-110.





[10] Kahle, B., "Document Identifiers, or International Standard
Book Numbers for the Electronic Age", 1991.

[11] Kantor, B. and P. Lapsley, "Network News Transfer Protocol:
A Proposed Standard for the Stream-Based Transmission of News",
RFC 977, UC San Diego & UC Berkeley, February 1986.
internic.net/rfc/rfc977.txt

[12] Kunze, J., "Functional Requirements for Internet Resource
Locators", Work in Progress, December 1994.
internic.net/internet-
/draft-ietf-uri-irl-fun-req-02.txt

[13] Mockapetris, P., "Domain Names - Concepts and Facilities",
STD 13, RFC 1034, USC/Information Sciences Institute,
November 1987.
internic.net/rfc/rfc1034.txt

[14] Neuman, B., and S. Augart, "The Prospero Protocol",
USC/Information Sciences Institute, June 1993.
/prospero-protocol.PS.Z

[15] Postel, J. and J. Reynolds, "File Transfer Protocol (FTP)",
STD 9, RFC 959, USC/Information Sciences Institute,
October 1985.
internic.net/rfc/rfc959.txt

[16] Sollins, K. and L. Masinter, "Functional Requirements for
Uniform Resource Names", RFC 1737, MIT/LCS, Xerox Corporation,
December 1994.
internic.net/rfc/rfc1737.txt

[17] St. Pierre, M, Fullton, J., Gamiel, K., Goldman, J., Kahle, B.,
Kunze, J., Morris, H., and F. Schiettecatte, "WAIS over
Z39.50-1988", RFC 1625, WAIS, Inc., CNIDR, Thinking Machines
Corp., UC Berkeley, FS Consulting, June 1994.
internic.net/rfc/rfc1625.txt

[18] Yeong, W. "Towards Networked Information Retrieval", Technical
report 91-06-25-01, Performance Systems International, Inc.,
June 1991.

[19] Yeong, W., "Representing Public Archives in the Directory",
Work in Progress, November 1991.





[20] "Coded Character Set -- 7-bit American Standard Code for
Information Interchange", ANSI X3.4-1986.

Editors' Addresses

Tim Berners-Lee
World-Wide Web project
CERN,
1211 Geneva 23,
Switzerland

Phone: +41 (22)767 3755
Fax: +41 (22)767 7155
EMail: timbl@info.cern.


Larry Masinter
Xerox PARC
3333 Coyote Hill Road
Palo Alto, CA 94034

Phone: (415) 812-4365
Fax: (415) 812-4333
EMail: masinter@parc.xerox.


Mark McCahill
Computer and Information Services,
University of Minnesota
Room 152 Shepherd Labs
100 Union Street SE
Minneapolis, MN 55455

Phone: (612) 625 1300
EMail: mpm@boombox.micro.umn.















