Network Working Group H. Alvestrand
Request for Comments: 3066 Cisco
BCP: 47 January 2001
Obsoletes: 1766
Category: Best Current Practice
Tags for the Identification of Languages
Status of this Memo
This document specifies an Internet Best Current Practices for the
Internet Community, and requests discussion and suggestions for
improvements. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2001). All Rights Reserved.
Abstract
This document describes a language tag for use in cases where it is
desired to indicate the language used in an information object, how
to register values for use in this language tag, and a construct for
matching such language tags.
1. Introduction
Human beings on our planet have, past and present, used a number of
languages. There are many reasons why one would want to identify the
language used when presenting information.
In some contexts, it is possible to have information available in
more than one language, or it might be possible to provide tools
(such as dictionaries) to assist in the understanding of a language.
Also, many types of information processing require knowledge of the
language in which information is expressed in order for that process
to be performed on the information; for example spell-checking,
computer-synthesized speech, Braille, or high-quality print
renderings.
One means of indicating the language used is by labeling the
information content with an identifier for the language that is used
in this information content.
Alvestrand Best Current Practice [Page 1]
RFC 3066 Tags for Identification of Languages January 2001
This document specifies an identifier mechanism, a registration
function for values to be used with that identifier mechanism, and a
construct for matching against those values.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC 2119].
2. The Language tag
2.1 Language tag syntax
The language tag is composed of one or more parts: A primary language
subtag and a (possibly empty) series of subsequent subtags.
The syntax of this tag in ABNF [RFC 2234] is:
Language-Tag = Primary-subtag *( "-" Subtag )
Primary-subtag = 1*8ALPHA
Subtag = 1*8(ALPHA / DIGIT)
The productions ALPHA and DIGIT are imported from RFC 2234; they
denote respectively the characters A to Z in upper or lower case and
the digits from 0 to 9. The character "-" is HYPHEN-MINUS (ABNF:
%x2D).
All tags are to be treated as case insensitive; there exist
conventions for capitalization of some of them, but these should not
be taken to carry meaning. For instance, [ISO 3166] recommends that
country codes are capitalized (MN Mongolia), while [ISO 639]
recommends that language codes are written in lower case (mn
Mongolian).
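The syntax and the case rule above can be sketched in Python. This is
only an illustration, not part of the specification: the names
is_well_formed and same_tag are hypothetical, and the check is purely
syntactic (it does not consult ISO 639, ISO 3166 or the IANA registry).

```python
import re

# Language-Tag   = Primary-subtag *( "-" Subtag )
# Primary-subtag = 1*8ALPHA
# Subtag         = 1*8(ALPHA / DIGIT)
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """Return True if `tag` matches the ABNF above (syntax only)."""
    return _LANGUAGE_TAG.fullmatch(tag) is not None


def same_tag(a: str, b: str) -> bool:
    """Compare two well-formed tags without regard to case."""
    return a.lower() == b.lower()
```

Under these assumptions, is_well_formed("sgn-US-MA") holds, while
is_well_formed("en_US") does not, since "_" is not HYPHEN-MINUS.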
2.2 Language tag sources
The namespace of language tags is administered by the Internet
Assigned Numbers Authority (IANA) [RFC 2860] according to the rules
in section 3 of this document.
The following rules apply to the primary subtag:
- All 2-letter subtags are interpreted according to assignments found
in ISO standard 639, "Code for the representation of names of
languages" [ISO 639], or assignments subsequently made by the ISO
639 part 1 maintenance agency or governing standardization bodies.
(Note: A revision is underway, and is expected to be released as
ISO 639-1:2000)
- All 3-letter subtags are interpreted according to assignments found
in ISO 639 part 2, "Codes for the representation of names of
languages -- Part 2: Alpha-3 code [ISO 639-2]", or assignments
subsequently made by the ISO 639 part 2 maintenance agency or
governing standardization bodies.
- The value "i" is reserved for IANA-defined registrations.
- The value "x" is reserved for private use. Subtags of "x" shall
not be registered by the IANA.
- Other values shall not be assigned except by revision of this
standard.
The reason for reserving all other tags is to be open towards new
revisions of ISO 639; the use of "i" and "x" is the minimum we can do
here to be able to extend the mechanism to meet our immediate
requirements.
The following rules apply to the second subtag:
- All 2-letter subtags are interpreted as ISO 3166 alpha-2 country
codes from [ISO 3166], or subsequently assigned by the ISO 3166
maintenance agency or governing standardization bodies, denoting
the area to which this language variant relates.
- Tags with second subtags of 3 to 8 letters may be registered with
IANA, according to the rules in chapter 5 of this document.
- Tags with 1-letter second subtags may not be assigned except after
revision of this standard.
There are no rules apart from the syntactic ones for the third and
subsequent subtags.
Tags constructed wholly from the codes that are assigned
interpretations by this chapter do not need to be registered with
IANA before use.
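The rules for the primary and second subtag can be sketched as a
classification by shape. This is an illustration under stated
assumptions: the function names and the returned labels are
hypothetical, and the code only looks at length and character class;
it does not check whether a code is actually assigned by ISO 639 or
ISO 3166.

```python
import string

_LETTERS = set(string.ascii_letters)


def _all_letters(subtag: str) -> bool:
    """True for a non-empty subtag made only of A-Z / a-z."""
    return bool(subtag) and all(c in _LETTERS for c in subtag)


def classify_primary(subtag: str) -> str:
    """Label a primary subtag according to the rules listed above."""
    s = subtag.lower()
    if s == "i":
        return "iana-registered"
    if s == "x":
        return "private-use"
    if _all_letters(s) and len(s) == 2:
        return "iso-639-1"
    if _all_letters(s) and len(s) == 3:
        return "iso-639-2"
    return "reserved"  # not assignable without revising the standard


def classify_second(subtag: str) -> str:
    """Label a second subtag according to the rules listed above."""
    if not _all_letters(subtag):
        # Assumption: the rules speak only of letters, so subtags
        # containing digits are treated here as not covered.
        return "unspecified"
    if len(subtag) == 1:
        return "reserved"
    if len(subtag) == 2:
        return "iso-3166-country"
    if 3 <= len(subtag) <= 8:
        return "iana-registrable"
    return "unspecified"
```

For the tag sgn-US-MA, classify_primary("sgn") gives "iso-639-2" and
classify_second("US") gives "iso-3166-country"; the third subtag is
subject only to the syntactic rules.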
The information in a subtag may for instance be:
- Country identification, such as en-US (this usage is described in
ISO 639)
- Dialect or variant information, such as en-
- Languages not listed in ISO 639 that are not variants of any listed
language, which can be registered with the i-prefix, such as i
- Region identification, such as sgn-US-MA (Martha's Vineyard Sign
Language, which is found in the state of Massachusetts, US)
This document leaves the decision on what tags are appropriate or not
to the registration process described in section 3.
ISO 639 defines a maintenance agency for additions to and changes in
the list of languages in ISO 639. This agency is:
International Information Centre for Terminology (Infoterm)
P.O. Box 130
A-1021
Phone: +43 1 26 75 35 Ext. 312
Fax: +43 1 216 32 72
ISO 639-2 defines a maintenance agency for additions to and changes
in the list of languages in ISO 639-2. This agency is:
Library of Congress
Network Development and MARC Standards
Washington, D.C. 20540
Phone: +1 202 707 6237
Fax: +1 202 707 0115
URL: http://www.loc.gov/standards/iso639
The maintenance agency for ISO 3166 (country codes) is:
ISO 3166 Maintenance Agency
c/o DIN Deutsches Institut fuer
Burggrafenstrasse 6
Postfach 1107
D-10787
Phone: +49 30 26 01 320
Fax: +49 30 26 01 231
URL: http://www.din.de/gremien/nas/nabd/iso3166ma
ISO 3166 reserves the country codes AA, QM-QZ, XA-XZ and ZZ as
user-assigned codes. These MUST NOT be used to form language tags.
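A check for the user-assigned ranges named above might look like the
following sketch. The function name is hypothetical; the ranges are
the ones listed in the text (AA, QM-QZ, XA-XZ and ZZ).

```python
import re


def is_user_assigned_country(code: str) -> bool:
    """True if `code` falls in AA, QM-QZ, XA-XZ or ZZ (any case)."""
    if re.fullmatch(r"[A-Za-z]{2}", code) is None:
        return False
    c = code.upper()
    return (
        c in ("AA", "ZZ")
        or (c[0] == "Q" and "M" <= c[1] <= "Z")
        or c[0] == "X"  # XA-XZ covers every code starting with X
    )
```

A tag builder could reject any second subtag for which this returns
True before forming a language tag.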
2.3 Choice of language tag
One may occasionally be faced with several possible tags for the same
body of text.
Interoperability is best served if all users send the same tag, and
use the same tag for the same language for all documents. If an
application has requirements that make the rules here inapplicable,
the application protocol specification MUST specify how the procedure
varies from the one given here.
The text below is based on the set of tags known to the tagging
entity.
1. Use the most precise tagging known to the sender that can be
ascertained and is useful within the application context.
2. When a language has both an ISO 639-1 2-character code and an ISO
639-2 3-character code, you MUST use the tag derived from the ISO
639-1 2-character code.
3. When a language has no ISO 639-1 2-character code, and the ISO
639-2/T (Terminology) code and the ISO 639-2/B (Bibliographic)
code differ, you MUST use the Terminology code. NOTE: At present,
all languages for which there is a difference have 2-character
codes, and the displeasure of developers about the existence of 2
code sets has been adequately communicated to ISO. So this
situation will hopefully not arise.
4. When a language has both an IANA-registered tag (i-something) and
a tag derived from an ISO registered code, you MUST use the ISO
tag. NOTE: When such a situation is discovered, the IANA-
registered tag SHOULD be deprecated as soon as possible.
5. You SHOULD NOT use the UND (Undetermined) code unless the protocol
in use forces you to give a value for the language tag, even if
the language is unknown. Omitting the tag is preferred.
6. You SHOULD NOT use the MUL (Multiple) tag if the protocol allows
you to use multiple languages, as is the case for the
Content-Language: header.
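Rules 2 to 5 above amount to a fixed order of preference, which can
be sketched as follows. This is an illustration only: the function
name, its parameters and the must_tag flag are hypothetical, the
caller is assumed to supply whatever codes it knows for one language,
and rule 1 (precision) and rule 6 (multiple languages) are not
modelled.

```python
from typing import Optional


def choose_tag(
    iso639_1: Optional[str] = None,
    iso639_2_t: Optional[str] = None,
    iso639_2_b: Optional[str] = None,
    iana_tag: Optional[str] = None,
    must_tag: bool = False,
) -> Optional[str]:
    """Pick one tag for a language from the codes known to the tagger."""
    if iso639_1:
        return iso639_1      # rule 2: the 2-character code wins
    if iso639_2_t:
        return iso639_2_t    # rule 3: Terminology before Bibliographic
    if iso639_2_b:
        return iso639_2_b    # assumption: only the B code is known
    if iana_tag:
        return iana_tag      # rule 4: used only when no ISO code exists
    # rule 5: omit the tag unless the protocol forces a value
    return "und" if must_tag else None
```

For German, which has the codes de, deu (Terminology) and ger
(Bibliographic), choose_tag("de", "deu", "ger") returns "de".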
NOTE: In order to avoid versioning difficulties in applications such
as that of RFC 1766, the ISO 639 Registration Authority Joint
Advisory Committee (RA-JAC) has agreed on the following policy
statement:
"After the publication of ISO/DIS 639-1 as an International
Standard, no new 2-letter code shall be added to ISO 639-1 unless a
3-letter code is also added at the same time to ISO 639-2. In
addition, no language with a 3-letter code available at the time of
publication of ISO 639-1 which at that time had no 2-letter code
shall be subsequently given a 2-letter code."
This will ensure that, for example, a user who implements "hwi"
(Hawaiian), which currently has no 2-letter code, will not find his
or her data invalidated by eventual addition of a 2-letter code for
that language."
2.4 Meaning of the language tag
The language tag always defines a language as spoken (or written,
signed or otherwise signaled) by human beings for communication of
information to other human beings. Computer languages such as
programming languages are explicitly excluded. There is no
guaranteed relationship between languages whose tags begin with the
same series of subtags; specifically, they are NOT guaranteed to be
mutually intelligible, although it will sometimes be the case that
they are.
The relationship between the tag and the information it relates to is
defined by the standard describing the context in which it appears.
Accordingly, this section can only give possible examples of its
usage.
- For a single information object, it could be taken as the set of
languages that is required for a complete comprehension of the
complete object.
Example: Plain text documents.
- For an aggregation of information objects, it should be taken as
the set of languages used inside components of that aggregation.
Examples: Document stores and libraries.
- For information objects whose purpose is to provide alternatives,
the set of tags associated with it should be regarded as a hint
that the content is provided in several languages, and that one has
to inspect each of the alternatives in order to find its language
or languages. In this case, a tag with multiple languages does not
mean that one needs to be multi-lingual to get complete
understanding of the document.
Example: MIME multipart/alternative.
- In markup languages, such as HTML and XML, language information can
be added to each part of the document identified by the markup
structure (including the whole document itself). For example, one
could write <span lang="FR">C'est la vie.</span> inside a Norwegian
document; the Norwegian-speaking user could then access a
French-Norwegian dictionary to find out what the marked section
meant. If the user were listening to that document through a speech
synthesis interface, this information could be used to signal the
synthesizer to appropriately apply French text-to-speech
pronunciation rules to that span of text, instead of misapplying the
Norwegian rules.
2.5 Language-range
Since the publication of RFC 1766, it has become apparent that there
is a need to define a term for a set of languages whose tags all
begin with the same sequence of subtags.
The following definition of language-range is derived from HTTP/1.1
[RFC 2616].
language-range = language-tag / "*"
That is, a language-range has the same syntax as a language-tag, or
is the single character "*".
A language-range matches a language-tag if it exactly equals the tag,
or if it exactly equals a prefix of the tag such that the first
character following the prefix is "-".
The special range "*" matches any tag. A protocol which uses
language ranges may specify additional rules about the semantics of
"*"; for instance, HTTP/1.1 specifies that the range "*" matches only
languages not matched by any other range within an "Accept-Language:"
header.
NOTE: This use of a prefix matching rule does not imply that language
tags are assigned to languages in such a way that it is always true
that if a user understands a language with a certain tag, then this
user will also understand all languages with tags for which this tag
is a prefix. The prefix rule simply allows the use of prefix tags if
this is the case.
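The matching rule of this section can be sketched in a few lines of
Python. The function name range_matches is hypothetical, both
arguments are assumed to be syntactically valid, and the
protocol-specific semantics of "*" (such as those of HTTP/1.1) are
not modelled; here "*" simply matches every tag.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """True if `language_range` matches `tag` under the prefix rule."""
    if language_range == "*":
        return True
    r = language_range.lower()  # tags are case insensitive
    t = tag.lower()
    # Exact match, or a prefix that is followed directly by "-".
    return t == r or t.startswith(r + "-")
```

Note that the range "en" matches "en-US" but not a hypothetical tag
"eng", because the character following the prefix must be "-".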
3. IANA registration procedure for language tags
The procedure given here MUST be used by anyone who wants to use a
language tag not given an interpretation in chapter 2.2 of this
document or previously registered with IANA.
This procedure MAY also be used to register information with the IANA
about a tag defined by this document, for instance if one wishes to
make publicly available a reference to the definition for a language
such as sgn-US (American Sign Language).
Tags with a first subtag of "x" need not, and cannot, be registered.
The process starts by filling out the registration form reproduced
below.
----------------------------------------------------------------------
LANGUAGE TAG REGISTRATION
Name of requester :
E-mail address of requester:
Tag to be registered :
English name of language :
Native name of language (transcribed into ASCII):
Reference to published description of the language (book or article):
Any other relevant information:
----------------------------------------------------------------------
The language form must be sent to <ietf-languages@iana.org> for a
2-week review period before it can be submitted to IANA. (This is an
open list. Requests to be added should be sent to
<ietf-languages-request@iana.org>.)
When the two week period has passed, the language tag reviewer, who
is appointed by the IETF Applications Area Director, either forwards
the request to IANA@IANA.ORG, or rejects it because of significant
objections raised on the list. Note that the reviewer can raise
objections on the list himself, if he so desires. The important
thing is that the objection must be made publicly.
The applicant is free to modify a rejected application with
additional information and submit it again; this restarts the 2-week
comment period.
Decisions made by the reviewer may be appealed to the IESG [RFC 2028]
under the same rules as other IETF decisions [RFC 2026]. All
registered forms are available online in the directory
http://www.iana.org/numbers.html under "languages".
Updates of registrations follow the same procedure as registrations.
The language tag reviewer decides whether to allow a new registrant
to update a registration made by someone else; in the normal case,
objections by the original registrant would carry extra weight in
such a decision.
There is no deletion of registrations; when some registered tag
should not be used any more, for instance because a corresponding ISO
639 code has been registered, the registration should be amended by
adding a remark like "DEPRECATED: use <new code> instead" to the
"other relevant information" section.
Note: The purpose of the "published description" is intended as an
aid to people trying to verify whether a language is registered, or
what language a particular tag refers to. In most cases, reference
to an authoritative grammar or dictionary of the language will be
useful; in cases where no such work exists, other well known works
describing that language or in that language may be appropriate. The
language tag reviewer decides what constitutes a "good enough"
reference material.
4. Security Considerations
The only security issue that has been raised with language tags since
the publication of RFC 1766, which stated that "Security issues are
believed to be irrelevant to this memo", is a concern with language
ranges used in content negotiation - that they may be used to infer
the nationality of the sender, and thus identify potential targets
for surveillance.
This is a special case of the general problem that anything you send
is visible to the receiving party; it is useful to be aware that such
concerns can exist in some cases.
The evaluation of the exact magnitude of the threat, and any possible
countermeasures, is left to each application protocol.
5. Character set considerations
Language tags may always be presented using the characters A-Z, a-z,
0-9 and HYPHEN-MINUS, which are present in most character sets, so
presentation of language tags should not have any character set
issues.
The issue of deciding upon the rendering of a character set based on
the language tag is not addressed in this memo; however, it is
thought impossible to make such a decision correctly for all cases
unless means of switching language in the middle of a text are
defined (for example, a rendering engine that decides font based on
Japanese or Chinese language may produce suboptimal output when a
mixed Japanese-Chinese text is encountered).
6. Acknowledgements
This document has benefited from many rounds of review and comments
in various fora of the IETF and the Internet working groups.
Any list of contributors is bound to be incomplete; please regard the
following as only a selection from the group of people who have
contributed to make this document what it is today.
In alphabetical order:
Glenn Adams, Tim Berners-Lee, Marc Blanchet, Nathaniel Borenstein
Eric Brunner, Sean M. Burke, John Clews, Jim Conklin,
Constable, John Cowan, Mark Crispin, Dave Crocker, Mark Davis,
Duerst, Michael Everson, Ned Freed, Tim Goodwin, Dirk-Willem
Gulik, Marion Gunn, Paul Hoffman, Olle Jarnefors, Kent Karlsson,
Klensin, Alain LaBonte, Chris Newman, Keith Moore, Masataka Ohta
Keld Jorn Simonsen, Otto Stolz, Rhys Weatherley, Misha Wolf,
Yergeau and many, many others
Special thanks must go to Michael Everson, who has served as language
tag reviewer for almost the complete period since the publication of
RFC 1766, and has provided a great deal of input to this revision.
7. Author's Address
Harald Tveit Alvestrand
Cisco
Weidemanns vei 27
7043
Phone: +47 73 50 33 52
EMail: Harald@Alvestrand.
8. References
[ISO 639] ISO 639:1988 (E/F) - Code for the representation of names
of languages - The International Organization for
Standardization, 1st edition, 1988-04-01 Prepared by
ISO/TC 37 - Terminology (principles and coordination).
Note that a new version (ISO 639-1:2000) is in
preparation at the time of this writing.
[ISO 639-2] ISO 639-2:1998 - Codes for the representation of names of
languages -- Part 2: Alpha-3 code - edition 1,
1998-11-01, 66 pages, prepared by a Joint Working Group of
ISO TC46/SC4 and ISO TC37/SC2.
[ISO 3166] ISO 3166:1988 (E/F) - Codes for the representation of
names of countries - The International Organization for
Standardization, 3rd edition, 1988-08-15.
[RFC 1327] Kille, S., "Mapping between X.400 (1988) / ISO 10021 and
RFC 822", RFC 1327, May 1992.
[RFC 1521] Borenstein, N., and N. Freed, "MIME Part One: Mechanisms
for Specifying and Describing the Format of Internet
Message Bodies", RFC 1521, September 1993.
[RFC 2026] Bradner, S., "The Internet Standards Process -- Revision
3", BCP 9, RFC 2026, October 1996.
[RFC 2028] Hovey, R. and S. Bradner, "The Organizations Involved in
the IETF Standards Process", BCP 11, RFC 2028, October
1996.
[RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC 2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 2234, November 1997.
[RFC 2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
[RFC 2860] Carpenter, B., Baker, F. and M. Roberts, "Memorandum of
Understanding Concerning the Technical Work of the
Internet Assigned Numbers Authority", RFC 2860, June
2000.
Appendix A: Language Tag Reference Material
The Library of Congress, maintainers of ISO 639-2, has made the list
of languages registered available on the Internet.
At the time of this writing, it can be found at
http://www.loc.gov/standards/iso639-2/langhome.
The IANA registration forms for registered language codes can be
found at http://www.iana.org/numbers.html under "languages".
The ISO 3166 Maintenance Agency has published Web pages at
http://www.din.de/gremien/nas/nabd/iso3166ma
Appendix B: Changes from RFC 1766
- Email list address changed from ietf-types@uninett.no to
ietf-languages@iana.org
- Updated author's address
- Added language-range construct from HTTP/1.1
- Added use of ISO 639-2 language codes
- Added reference to Library of Congress lists of language codes
- Changed examples to use registered tags
- Added "Any other information" to registration form
- Added description of procedure for updating registrations
- Changed target category for document from standards track to BCP
- Moved the content-language header definition into another document
- Added numbers to the permitted characters in language tags
Full Copyright Statement
Copyright (C) The Internet Society (2001). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Funding for the RFC Editor function is currently provided by the
Internet Society.