Network Working Group K. Tamaru
Request for Comments: 2237 Microsoft Corporation
Category: Informational November 1997
Japanese Character Encoding for Internet Messages
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (1997). All Rights Reserved.
1. Abstract
This memo defines an encoding scheme for Japanese characters and
describes "ISO-2022-JP-1", which is used in electronic mail [RFC 822]
and network news [RFC 1036]. This memo also provides a listing
of the Japanese character sets that can be used in this encoding
scheme.
2. Requirements Notation
This document uses terms that appear in capital letters to indicate
particular requirements of this specification. Those terms are
"MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY". The meaning of
each term is found in [RFC-2119].
3. Introduction
RFC 1468 defines the way Japanese characters are encoded, as does
this memo. RFC 1468 defines the use of JIS X 0208 as the
double-byte character set in ISO-2022-JP text.
Today, many operating systems support proprietary extended Kanji
characters or JIS X 0212. This includes the Unicode character set,
which does not conform to JIS X 0201 or JIS X 0208. Therefore, this
limits the ability to communicate precise information
because of the limited availability of Kanji characters.
JIS (Japanese Industrial Standards) defines JIS X 0212 as "code of the
Tamaru Informational [Page 1]
RFC 2237 Japanese Character Encoding November 1997
supplementary Japanese graphic character set for information
interchange". Most Japanese characters used in regular
electronic mail can be accommodated in JIS X 0201, JIS
X 0208 and JIS X 0212.
It is also recognized that there is a tendency to use Unicode;
however, Unicode is not yet widely used, and older electronic mail
systems impose certain limitations. The purpose
of this memo is therefore to add the capability of writing out JIS X 0212.
This memo does not describe any representation of iso-2022-jp-1
version information in addition to JIS X 0212 support.
4. Description
In "ISO-2022-JP-1" text, the initial character code of the message is
ASCII. The "double-byte-seq" (see the "Formal Syntax" section) (ESC "$"
"B" / ESC "$" "@" / ESC "$" "(" "D") is the only designator that
indicates that the following characters are double-byte, and it is
valid until another escape sequence appears. Using (ESC "$" "@") for
double-byte character encoding is strongly discouraged; new
implementations SHOULD use only (ESC "$" "B") for double-byte encoding
instead.
The end of "ISO-2022-JP-1" text MUST be in ASCII. It is also strongly
recommended to switch back to ASCII at the end of each line, rather
than to JIS X 0201-Roman, if there is any non-ASCII character in the middle
of a line.
Since "ISO-2022-JP-1" is designed to add the capability of writing
out JIS X 0212, "ISO-2022-JP" text MUST be used if the message does
not contain any JIS X 0212 characters.
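As an illustration of the rule above, the following Python sketch tries
"ISO-2022-JP" first and falls back to "ISO-2022-JP-1" only when that fails.
It assumes that the CPython standard codecs named iso2022_jp and
iso2022_jp_1 correspond to the two encodings. The function name
choose_charset is hypothetical and is not defined by this memo.

```python
from typing import Tuple


def choose_charset(text: str) -> Tuple[str, bytes]:
    """Return (MIME charset label, encoded bytes) for text.

    Assumption: Python's "iso2022_jp" and "iso2022_jp_1" codecs implement
    ISO-2022-JP and ISO-2022-JP-1 respectively.
    """
    try:
        # No JIS X 0212 character is needed, so plain ISO-2022-JP is used.
        return "iso-2022-jp", text.encode("iso2022_jp")
    except UnicodeEncodeError:
        # At least one character is outside ISO-2022-JP. If it is also
        # outside JIS X 0212, this second call raises UnicodeEncodeError.
        return "iso-2022-jp-1", text.encode("iso2022_jp_1")


# U+65E5 U+672C U+8A9E are all in JIS X 0208.
label_0208, data_0208 = choose_charset("\u65e5\u672c\u8a9e")

# U+4E02 is in JIS X 0212 but not in JIS X 0208.
label_0212, data_0212 = choose_charset("\u4e02")
```

In the second call the encoded bytes are expected to begin with the
JIS X 0212 designator ESC "$" "(" "D" and to end with a return to ASCII.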
JIS X 0201-Roman is not identical to ASCII: it differs in two
characters (the yen sign and the overline replace the backslash and the tilde).
The following list shows the escape sequences and character sets that
can be used in "ISO-2022-JP-1" text. The registered number in the ISO
2375 Register, which allows double-byte ideographic scripts to be
encoded within the ISO/IEC 2022 code structure, is indicated as reg#
below.
reg#  character set      ESC sequence                designated to
------------------------------------------------------------------
6     ASCII              ESC 2/8 4/2      ESC ( B    G0
42    JIS X 0208-1978    ESC 2/4 4/0      ESC $ @    G0
87    JIS X 0208-1983    ESC 2/4 4/2      ESC $ B    G0
14    JIS X 0201-Roman   ESC 2/8 4/10     ESC ( J    G0
159   JIS X 0212-1990    ESC 2/4 2/8 4/4  ESC $ ( D  G0
Other restrictions are given in the Formal Syntax below.
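The escape sequences listed above can be used to split a byte string into
runs that share one character set. The following Python sketch is one
possible way to do this. The names DESIGNATORS and split_segments are
hypothetical, and the sketch only recognizes the five sequences in the
table; it does not validate the payload bytes.

```python
# Escape sequence -> character set name, taken from the table above.
DESIGNATORS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
    b"\x1b$(D": "JIS X 0212-1990",
}


def split_segments(data: bytes):
    """Return a list of (character set, payload bytes) runs found in data."""
    charset = "ASCII"   # the text starts in ASCII
    segments = []
    start = 0           # offset where the current run begins
    i = 0
    while i < len(data):
        if data[i] != 0x1B:
            i += 1
            continue
        # An ESC byte must begin one of the known designators.
        for seq, name in DESIGNATORS.items():
            if data.startswith(seq, i):
                if i > start:
                    segments.append((charset, data[start:i]))
                charset = name
                i += len(seq)
                start = i
                break
        else:
            raise ValueError(f"unknown escape sequence at offset {i}")
    if start < len(data):
        segments.append((charset, data[start:]))
    return segments


# Hand-built sample: ASCII, two JIS X 0208 characters, one JIS X 0212
# character, then ASCII again.
sample = b"JIS: \x1b$BF|K\\\x1b$(D0!\x1b(B ok"
runs = split_segments(sample)
# runs == [("ASCII", b"JIS: "), ("JIS X 0208-1983", b"F|K\\"),
#          ("JIS X 0212-1990", b"0!"), ("ASCII", b" ok")]
```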
5. Formal Syntax
The notational conventions used here are identical to those used in
STD 11, RFC 822 [RFC822].
The * (asterisk) convention is as follows:
l*m something
meaning at least l and at most m somethings, with l and m taking
default values of 0 and infinity, respectively.
iso-2022-jp-1-text  = *( line CRLF ) [line]
line                = (*single-byte-char *segment
                       single-byte-seq *single-byte-char) /
                      *single-byte-char
segment             = single-byte-segment / double-byte-segment
single-byte-segment = single-byte-seq *single-byte-char
double-byte-segment = double-byte-seq *(one-of-94 one-of-94)
reset-seq           = ESC "(" ( "B" / "J" )
single-byte-seq     = ESC "(" ( "B" / "J" )
double-byte-seq     = (ESC "$" ( "@" / "B" )) /
                      (ESC "$" "(" "D" )
CRLF                = CR LF
                                                  ; ( Octal, Decimal.)
ESC                 = <ISO 2022 ESC, escape>      ; (    33,      27.)
SI                  = <ISO 2022 SI, shift-in>     ; (    17,      15.)
SO                  = <ISO 2022 SO, shift-out>    ; (    16,      14.)
CR                  = <ASCII CR, carriage return> ; (    15,      13.)
LF                  = <ASCII LF, linefeed>        ; (    12,      10.)
one-of-94           = <any one of 94 values>      ; (41-176, 33.-126.)
one-of-96           = <any one of 96 values>      ; (40-177, 32.-127.)
7BIT                = <any 7-bit value>           ; ( 0-177,  0.-127.)
single-byte-char    = <any 7BIT, including bare CR & bare LF,
                       but NOT including CRLF, and not including
                       ESC, SI, SO>
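The "line" rule above can be checked mechanically. The following Python
sketch is a minimal validator written for illustration; the name
is_valid_line is hypothetical. It takes one line without its terminating
CRLF and assumes the line starts in single-byte mode. It checks only the
grammar, not whether the two-byte values are assigned characters, and it
does not enforce the separate requirement that the whole text end in ASCII.

```python
SINGLE_BYTE_SEQS = (b"\x1b(B", b"\x1b(J")
DOUBLE_BYTE_SEQS = (b"\x1b$@", b"\x1b$B", b"\x1b$(D")
FORBIDDEN = {0x1B, 0x0E, 0x0F}   # ESC, SO, SI are not single-byte-char


def is_valid_line(line: bytes) -> bool:
    """Check one line (CRLF already removed) against the "line" rule."""
    if b"\r\n" in line:
        return False                      # only bare CR or bare LF is allowed
    double = False                        # a line begins in single-byte mode
    i = 0
    while i < len(line):
        byte = line[i]
        if byte == 0x1B:
            single = next((s for s in SINGLE_BYTE_SEQS if line.startswith(s, i)), None)
            multi = next((s for s in DOUBLE_BYTE_SEQS if line.startswith(s, i)), None)
            if single is not None:
                double = False
                i += len(single)
            elif multi is not None:
                double = True
                i += len(multi)
            else:
                return False              # ESC that starts no known sequence
        elif double:
            pair = line[i:i + 2]          # (one-of-94 one-of-94)
            if len(pair) != 2 or not all(0x21 <= c <= 0x7E for c in pair):
                return False
            i += 2
        else:
            if byte > 0x7F or byte in FORBIDDEN:
                return False              # not a valid single-byte-char
            i += 1
    # The rule requires a single-byte-seq after the last double-byte segment.
    return not double


ok_plain = is_valid_line(b"plain ASCII")
ok_kanji = is_valid_line(b"\x1b$BF|K\\\x1b(B")
bad_unterminated = is_valid_line(b"\x1b$(D0!")
```

Here ok_plain and ok_kanji are accepted, while bad_unterminated is rejected
because the line ends while a double-byte character set is still designated.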
6. Security Considerations
This memo raises no known security issues.
7. MIME Considerations
The name to be used for the Japanese encoding scheme in content is
"ISO-2022-JP-1". When this name is used in the MIME message form, it
would be:
Content-Type: text/plain; charset=iso-2022-jp-1
Since "ISO-2022-JP-1" is a 7bit encoding, it is unnecessary to
encode it in another format by specifying the "Content-Transfer-Encoding"
header. Also, applying Base64 or Quoted-Printable encoding
MAY cause today's software to fail to decode the message.
"ISO-2022-JP-1" can be used in MIME headers. "ISO-2022-JP-1"
text can also be used with Base64 or Quoted-Printable encoding.
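The following Python sketch shows one way to build such a message with the
standard email package. It is an illustration only. The function name
build_message is hypothetical, the sketch assumes that CPython's
iso2022_jp_1 codec implements "ISO-2022-JP-1", and it assumes the text
really contains JIS X 0212 characters. The headers are set by hand so that
the body stays 7bit and no transfer encoding is applied.

```python
from email.message import Message


def build_message(text: str) -> Message:
    """Wrap text in a text/plain message labelled iso-2022-jp-1 (sketch)."""
    body = text.encode("iso2022_jp_1")    # 7-bit bytes with escape sequences
    msg = Message()
    msg["Content-Type"] = "text/plain; charset=iso-2022-jp-1"
    msg["Content-Transfer-Encoding"] = "7bit"
    # Every byte is below 0x80, so the body can be stored as an ASCII string.
    msg.set_payload(body.decode("ascii"))
    return msg


msg = build_message("\u4e02")             # U+4E02 is a JIS X 0212 character
wire = msg.as_string()                    # headers, blank line, then the body
```

A receiver can recover the text by encoding the payload back to ASCII bytes
and decoding those bytes with the charset named in the Content-Type header.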
8. Additional Information
As long as mail systems are capable of writing out Unicode, it is
recommended to also write out Unicode text in addition to
"ISO-2022-JP-1" text. Writing out "ISO-2022-JP" text in addition to
"ISO-2022-JP-1" is also strongly encouraged for backward compatibility
reasons.
Some mail systems write out 8bit characters in the 'parameter' and
'value' fields defined in [RFC 822] and [RFC 1521]. 8bit characters MUST
NOT be used in those fields. Implementations of future mail
systems SHOULD support those only for interoperability reasons.
9. References
[ISO2022]
International Organization for Standardization (ISO),
"Information processing -- ISO 7-bit and 8-bit coded
character sets -- Code extension techniques",
International Standard, Ref. No. ISO 2022-1986 (E).
[ISOREG]
International Organization for Standardization (ISO),
"International Register of Coded Character Sets To Be Used
With Escape Sequences".
[RFC-822]
Crocker, D., "Standard for the Format of ARPA Internet
Text Messages", STD 11, RFC 822, August 1982.
[RFC-1468]
Murai, J., Crispin, M., and E. van der Poel, "Japanese
Character Encoding for Internet Messages", RFC 1468, June
1993.
[RFC-1766]
Alvestrand, H., "Tags for the Identification of
Languages", RFC 1766, March 1995.
[RFC-2045]
Freed, N., and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message
Bodies", RFC 2045, December 1996.
[RFC-2046]
Freed, N., and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046,
December 1996.
[RFC-2047]
Moore, K., "Multipurpose Internet Mail Extensions (MIME)
Part Three: Representation of Non-ASCII Text in Internet
Message Headers", RFC 2047, December 1996.
[RFC-2048]
Freed, N., Klensin, J. and J. Postel, "Multipurpose
Internet Mail Extensions (MIME) Part Four:
Registration Procedures", RFC 2048, December 1996.
[RFC-2049]
Freed, N., and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Five: Conformance Criteria and
Examples", RFC 2049, December 1996.
[RFC-2119]
Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", RFC 2119, March 1997.
Author's Address
Kenzaburo Tamaru
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
EMail: kenzat@microsoft.
Full Copyright Statement
Copyright (C) The Internet Society (1997). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.