Network Working Group M. Ohta
Request for Comments: 1554 Tokyo Institute of Technology
Category: Informational K. Handa
December 1993
ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP
Status of this Memo
This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.
Introduction
This memo describes a text encoding scheme: "ISO-2022-JP-2", which is
used experimentally for electronic mail [RFC822] and network news
[RFC1036] messages in several Japanese networks. The encoding is a
multilingual extension of "ISO-2022-JP", the existing encoding for
Japanese [2022JP]. The encoding is supported by an Emacs based
multilingual text editor: MULE [MULE].
The name, "ISO-2022-JP-2", is intended to be used in the "charset"
parameter field of MIME headers (see [MIME1] and [MIME2]).
Description
The text with "ISO-2022-JP-2" starts in ASCII [ASCII], and switches
to other character sets of ISO 2022 [ISO2022] through limited
combinations of escape sequences. All the characters are encoded
with 7 bits only.
At the beginning of text, the existence of an announcer sequence:
"ESC 2/0 4/1 ESC 2/0 4/6 ESC 2/0 5/10" is (though omitted) assumed.
Thus, characters of 94 character sets are designated to G0 and
invoked as GL. C1 control characters are represented with 7 bits.
Characters of 96 character sets are designated to G2 and invoked with
SS2 (single shift two, "ESC 4/14" or "ESC N").
For example, the escape sequence "ESC 2/4 2/8 4/3" or "ESC $ ( C"
indicates that the bytes following the escape sequence are Korean KSC
characters, which are encoded in two bytes each. The escape sequence
"ESC 2/14 4/1" or "ESC . A" indicates that ISO 8859-1 is designated
to G2. After the designation, the single shifted sequence "ESC 4/14
4/1" or "ESC N A" is interpreted to represent a character "A with
acute".
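The column/row notation used above ("2/4" meaning column 2, row 4) maps to a byte value of 16 * column + row. The following is a minimal sketch of that conversion; the function name notation_to_bytes is hypothetical and is not part of this memo.

```python
def notation_to_bytes(seq: str) -> bytes:
    """Convert column/row notation such as 'ESC 2/4 2/8 4/3' to raw bytes."""
    out = bytearray()
    for token in seq.split():
        if token == "ESC":
            out.append(0x1B)                      # ESC is 1/11
        else:
            column, row = token.split("/")
            out.append(int(column) * 16 + int(row))
    return bytes(out)


print(notation_to_bytes("ESC 2/4 2/8 4/3"))   # b'\x1b$(C'
print(notation_to_bytes("ESC 2/14 4/1"))      # b'\x1b.A'
print(notation_to_bytes("ESC 4/14 4/1"))      # b'\x1bNA'
```

The three calls reproduce the printable forms "ESC $ ( C", "ESC . A" and "ESC N A" quoted in the example.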
Ohta & Handa [Page 1]
RFC 1554 Multilingual Extension of ISO-2022-JP December 1993
The following table gives the escape sequences and the character sets
used in "ISO-2022-JP-2" messages. The reg# is the registration number
in ISO's registry [ISOREG].
                       94 character sets
reg#  character set      ESC sequence                designated to
------------------------------------------------------------------
6     ASCII              ESC 2/8 4/2      ESC ( B    G0
42    JIS X 0208-1978    ESC 2/4 4/0      ESC $ @    G0
87    JIS X 0208-1983    ESC 2/4 4/2      ESC $ B    G0
14    JIS X 0201-Roman   ESC 2/8 4/10     ESC ( J    G0
58    GB2312-1980        ESC 2/4 4/1      ESC $ A    G0
149   KSC5601-1987       ESC 2/4 2/8 4/3  ESC $ ( C  G0
159   JIS X 0212-1990    ESC 2/4 2/8 4/4  ESC $ ( D  G0
                       96 character sets
reg#  character set      ESC sequence                designated to
------------------------------------------------------------------
100   ISO8859-1          ESC 2/14 4/1     ESC . A    G2
126   ISO8859-7(Greek)   ESC 2/14 4/6     ESC . F    G2
For further information about the character sets and the escape
sequences, see [ISO2022] and [ISOREG].
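As an illustration, the two tables can be held in a lookup keyed by the raw escape sequence. This is a sketch only: the names DESIGNATIONS and list_designations are hypothetical, and the bytes-per-character column is an addition that is not in the tables (it follows from the "ESC $" multi-byte form).

```python
import re

# escape sequence -> (character set, bytes per character, designated to)
DESIGNATIONS = {
    b"\x1b(B":  ("ASCII", 1, "G0"),
    b"\x1b$@":  ("JIS X 0208-1978", 2, "G0"),
    b"\x1b$B":  ("JIS X 0208-1983", 2, "G0"),
    b"\x1b(J":  ("JIS X 0201-Roman", 1, "G0"),
    b"\x1b$A":  ("GB2312-1980", 2, "G0"),
    b"\x1b$(C": ("KSC5601-1987", 2, "G0"),
    b"\x1b$(D": ("JIS X 0212-1990", 2, "G0"),
    b"\x1b.A":  ("ISO8859-1", 1, "G2"),
    b"\x1b.F":  ("ISO8859-7(Greek)", 1, "G2"),
}

# Every sequence starts with ESC, which never occurs inside a character,
# so a plain search cannot match in the middle of a two-byte character.
_PATTERN = re.compile(b"|".join(re.escape(seq) for seq in DESIGNATIONS))


def list_designations(data: bytes) -> list:
    """Return the names of the character sets designated in data, in order."""
    return [DESIGNATIONS[m.group()][0] for m in _PATTERN.finditer(data)]


sample = b"\x1b$BF|K\\8l\x1b(B"     # hypothetical sample text
print(list_designations(sample))    # ['JIS X 0208-1983', 'ASCII']
```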
If there is any G0 designation in text, there must be a switch to
ASCII or to JIS X 0201-Roman before a space character (but not
necessarily before "ESC 4/14 2/0" or "ESC N ' '") or control
characters such as tab or CRLF. This means that the next line starts
in the character set that was switched to before the end of the
previous line. Though the designation to JIS X 0201-Roman is allowed
for backward compatibility to "ISO-2022-JP", its use is discouraged.
Applications such as pagers and editors which randomly seek within a
text file encoded with "ISO-2022-JP-2" may assume that all the lines
begin with ASCII, not with JIS X 0201-Roman.
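The rule about switching back before a space or control character can be checked mechanically. The sketch below is an assumption-laden illustration, not a full parser: the function name g0_violations is hypothetical, and it only tracks whether G0 currently holds a one-byte or a two-byte set.

```python
def g0_violations(data: bytes) -> list:
    """Return offsets of spaces/controls seen while G0 holds a two-byte set."""
    double_byte = False          # text starts in ASCII
    bad = []
    i = 0
    while i < len(data):
        if data[i] == 0x1B:                          # ESC
            rest = data[i + 1:i + 4]
            if rest[:1] in (b"N", b"."):             # single shift / G2 designation
                i += 3                               # skip ESC, 'N' or '.', one byte
            elif rest[:2] in (b"(B", b"(J"):         # ASCII / JIS X 0201-Roman
                double_byte = False
                i += 3
            elif rest[:3] in (b"$(C", b"$(D"):
                double_byte = True
                i += 4
            elif rest[:2] in (b"$@", b"$A", b"$B"):
                double_byte = True
                i += 3
            else:
                raise ValueError("unexpected escape sequence at offset %d" % i)
            continue
        if double_byte and data[i] <= 0x20:          # control character or space
            bad.append(i)
        i += 1
    return bad


print(g0_violations(b"\x1b$BF|K\\8l\x1b(B \r\n"))   # []
print(g0_violations(b"\x1b$BF|\r\n8l\x1b(B"))       # [5, 6]
```

A single-shifted space ("ESC N ' '") is skipped as a unit, so it is not reported, in line with the exception stated in the paragraph above.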
At the beginning of a line, information on G2 designation of the
previous line is cleared. New designation must be given before a
character in 96 character sets is used in the line.
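A minimal sketch of the per-line G2 handling follows. It assumes lines that contain only ASCII, G2 designations and single shifts (G0 switching is left out), and it relies on Python's standard "iso8859_1" and "iso8859_7" codecs to map the 8-bit value. The function names are hypothetical.

```python
G2_CODECS = {b"A": "iso8859_1", b"F": "iso8859_7"}   # final byte -> Python codec


def decode_single_shifts(line: bytes) -> str:
    """Decode one line made of ASCII, G2 designations and single shifts."""
    g2 = None                    # G2 designation is cleared at each line start
    out = []
    i = 0
    while i < len(line):
        if line[i:i + 2] == b"\x1b.":                # g2-desig-seq
            g2 = G2_CODECS[line[i + 2:i + 3]]
            i += 3
        elif line[i:i + 2] == b"\x1bN":              # single-shift-char
            if g2 is None:
                raise ValueError("single shift before G2 designation in line")
            # The 7-bit code selects the character in the right half (GR)
            # of the 8-bit set, so set the high bit before decoding.
            out.append(bytes([line[i + 2] | 0x80]).decode(g2))
            i += 3
        else:
            out.append(chr(line[i]))
            i += 1
    return "".join(out)


def decode_text(data: bytes) -> str:
    """Decode CRLF-separated lines, forgetting G2 between lines."""
    return "\r\n".join(decode_single_shifts(l) for l in data.split(b"\r\n"))


print(decode_single_shifts(b"\x1b.A\x1bNA") == "\u00c1")   # True: A with acute
```

With this sketch, b"\x1b.A\x1bNA\r\n\x1bNA" is rejected, because the second line uses a single shift without designating G2 again.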
The text must end in ASCII designated to G0.
As the "ISO-2022-JP", and thus, "ISO-2022-JP-2", is designed to
represent English and modern Japanese, left-to-right directionality
is assumed if the text is displayed horizontally.
Users of "ISO-2022-JP-2" must be aware that some common transport
such as old Bnews cannot relay a 7-bit value 7/15 (decimal 127),
which is used to encode, say, "y with diaeresis" of ISO 8859-1.
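A sender that wants to avoid such transports could test for the problematic code before posting. The following is only a sketch; the function name uses_del_code is hypothetical.

```python
def uses_del_code(data: bytes) -> bool:
    """True if a single-shifted character is coded as 7/15 (0x7F)."""
    # ESC never occurs inside a character, so "ESC N 0x7F" can only be
    # a single shift followed by the 7/15 code.
    return b"\x1bN\x7f" in data


# 7/15 with the high bit set is 15/15 (0xFF), "y with diaeresis" in ISO 8859-1.
print(bytes([0x7F | 0x80]).decode("iso8859_1") == "\u00ff")   # True
print(uses_del_code(b"\x1b.A\x1bN\x7f"))                      # True
```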
Other restrictions are given in the Formal Syntax section below.
Formal Syntax
The notational conventions used here are identical to those used in
STD 11, RFC 822 [RFC822].
The * (asterisk) convention is as follows:
   l*m something
meaning at least l and at most m somethings, with l and m taking
default values of 0 and infinity, respectively.
message             = headers 1*(CRLF text)
                                       ; see also [MIME1] "body-part"
                                       ; note: must end in ASCII
text                = *(single-byte-char /
                        g2-desig-seq /
                        single-shift-char)
                      [*segment
                       reset-seq
                       *(single-byte-char /
                         g2-desig-seq /
                         single-shift-char ) ]
                                       ; note: g2-desig-seq must
                                       ; precede single-shift-char
headers             = <see [RFC822] "fields" and [MIME1] "body-part">
segment             = single-byte-segment / double-byte-segment
single-byte-segment = single-byte-seq
                      *(single-byte-char /
                        g2-desig-seq /
                        single-shift-char )
double-byte-segment = double-byte-seq
                      *((one-of-94 one-of-94) /
                        g2-desig-seq /
                        single-shift-char )
reset-seq           = ESC "(" ( "B" / "J" )
single-byte-seq     = ESC "(" ( "B" / "J" )
double-byte-seq     = (ESC "$" ( "@" / "A" / "B" )) /
                      (ESC "$" "(" ( "C" / "D" ))
g2-desig-seq        = ESC "." ( "A" / "F" )
single-shift-seq    = ESC "N"
single-shift-char   = single-shift-seq one-of-96
CRLF                = CR LF
                                          ; ( Octal, Decimal.)
ESC                 = <ISO 2022 ESC, escape>      ; (    33,      27.)
SI                  = <ISO 2022 SI, shift-in>     ; (    17,      15.)
SO                  = <ISO 2022 SO, shift-out>    ; (    16,      14.)
CR                  = <ASCII CR, carriage return> ; (    15,      13.)
LF                  = <ASCII LF, linefeed>        ; (    12,      10.)
one-of-94           = <any one of 94 values>      ; (41-176, 33.-126.)
one-of-96           = <any one of 96 values>      ; (40-177, 32.-127.)
7BIT                = <any 7-bit value>           ; ( 0-177,  0.-127.)
single-byte-char    = <any 7BIT, including bare CR & bare LF, but NOT
                       including CRLF, and not including ESC, SI, SO>
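The grammar for "text" is regular, so it can be transcribed into a regular expression. The sketch below is one possible transcription under stated assumptions: the body is split on CRLF first (so bare CR and bare LF remain legal single-byte characters), the names TEXT and is_valid_body are hypothetical, and the note that g2-desig-seq must precede single-shift-char is not enforced.

```python
import re

ESC = rb"\x1b"
SB_CHAR = rb"[\x00-\x0d\x10-\x1a\x1c-\x7f]"      # 7BIT minus SO, SI, ESC
G2_DESIG = ESC + rb"\.[AF]"                      # g2-desig-seq
SS_CHAR = ESC + rb"N[\x20-\x7f]"                 # single-shift-seq one-of-96
PLAIN = rb"(?:" + SB_CHAR + rb"|" + G2_DESIG + rb"|" + SS_CHAR + rb")*"
RESET = ESC + rb"\([BJ]"                         # reset-seq
SB_SEGMENT = ESC + rb"\([BJ]" + PLAIN            # single-byte-segment
DB_SEQ = rb"(?:" + ESC + rb"\$[@AB]|" + ESC + rb"\$\([CD])"
DB_SEGMENT = (DB_SEQ + rb"(?:[\x21-\x7e]{2}|" + G2_DESIG + rb"|"
              + SS_CHAR + rb")*")                # double-byte-segment
TEXT = re.compile(
    PLAIN + rb"(?:(?:" + SB_SEGMENT + rb"|" + DB_SEGMENT + rb")*"
    + RESET + PLAIN + rb")?"
)


def is_valid_body(body: bytes) -> bool:
    """Check every CRLF-separated text of a message body against TEXT."""
    return all(TEXT.fullmatch(text) for text in body.split(b"\r\n"))


print(is_valid_body(b"\x1b$BF|K\\8l\x1b(B\r\nplain"))   # True
print(is_valid_body(b"\x1b$BF|"))                       # False: no reset-seq
```

Because each segment alternative begins with ESC, the nested repetitions cannot overlap, which keeps matching linear in practice.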
MIME Considerations
The name given to the character encoding is "ISO-2022-JP-2". This
name is intended to be used in MIME messages as follows:
Content-Type: text/plain; charset=iso-2022-jp-2
The "ISO-2022-JP-2" encoding is already in 7-bit form, so it is not
necessary to use a Content-Transfer-Encoding header. It should be
noted that applying the Base64 or Quoted-Printable encoding will
render the message unreadable in non-MIME-compliant software.
"ISO-2022-JP-2" may also be used in MIME headers. Both "B" and "Q"
encoding could be useful with "ISO-2022-JP-2" text.
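As a sketch of header use, the "B" encoding of [MIME2] wraps the Base64 form of the encoded bytes in an encoded-word. The helper name encode_b_word and the sample bytes are hypothetical; decode_header is from Python's standard email.header module and is used here only to check the round trip.

```python
import base64
from email.header import decode_header

# Hypothetical sample: three JIS X 0208-1983 characters, ending in ASCII.
raw = b"\x1b$BF|K\\8l\x1b(B"


def encode_b_word(data: bytes, charset: str = "ISO-2022-JP-2") -> str:
    """Build a "B" encoded-word from already encoded ISO-2022-JP-2 bytes."""
    return "=?%s?B?%s?=" % (charset, base64.b64encode(data).decode("ascii"))


subject_line = "Subject: " + encode_b_word(raw)
content_type_line = "Content-Type: text/plain; charset=iso-2022-jp-2"

# decode_header returns the original bytes and the lower-cased charset name.
print(decode_header(encode_b_word(raw)) == [(raw, "iso-2022-jp-2")])   # True
```

The body itself can be sent as is, since every byte is below 128; only the header value needs the encoded-word wrapper.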
References
[ASCII] American National Standards Institute, "Coded character set
-- 7-bit American national standard code for information
interchange", ANSI X3.4-1986.
[ISO2022] International Organization for Standardization (ISO),
"Information processing -- ISO 7-bit and 8-bit coded
character sets -- Code extension techniques",
International Standard, Ref. No. ISO 2022-1986 (E).
[ISOREG] International Organization for Standardization (ISO),
"International Register of Coded Character Sets To Be Used
With Escape Sequences".
[MIME1] Borenstein, N., and N. Freed, "MIME (Multipurpose Internet
Mail Extensions) Part One: Mechanisms for Specifying and
Describing the Format of Internet Message Bodies", RFC 1521,
September 1993.
[MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part
Two: Message Header Extensions for Non-ASCII Text", RFC 1522,
September 1993.
[RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text
Messages", STD 11, RFC 822, UDEL, August 1982.
[RFC1036] Horton M., and R. Adams, "Standard for Interchange of
USENET Messages", RFC 1036, AT&T Bell Laboratories, Center
for Seismic Studies, December 1987.
[2022JP] Murai, J., Crispin, M., and E. van der Poel, "Japanese
Character Encoding for Internet Messages", RFC 1468, June
1993.
[MULE] Nishikimi, M., Handa, K., and S. Tomura, "Mule: MULtilingual
Enhancement to GNU Emacs", Proc. of INET'93, August, 1993.
Acknowledgements
This memo is the result of discussion between various people in a
news group: fj.kanji and is reviewed by a mailing list: jp-
@iij.ad.jp. The Authors wish to thank in particular Prof.
Wada for his suggestions based on profound knowledge in ISO 2022 and
related standards.
Security Considerations
Security issues are not discussed in this memo.
Authors' Addresses
Masataka Ohta
Tokyo Institute of Technology
2-12-1, O-okayama, Meguro-ku,
Tokyo 152, JAPAN
Phone: +81-3-5499-7084
Fax: +81-3-3729-1940
EMail: mohta@cc.titech.ac.
Ken'ichi Handa
Electrotechnical Laboratory
Umezono 1-1-4, Tsukuba,
Ibaraki 305, JAPAN
Phone: +81-298-58-5916
Fax: +81-298-58-5918
EMail: handa@etl.go.