Network Working Group                                            M. Ohta
Request For Comments: 1815                 Tokyo Institute of Technology
Category: Informational                                        July 1995


Character Sets ISO-10646 and ISO-10646-J-1

Status of this Memo

This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.



Though the ISO character set standard ISO 10646 is specified
reasonably well for European characters, it is not so useful in a
fully internationalized environment.

For the practical use of ISO 10646, a lot of external profiling, such
as restriction of characters, restriction of combinations of
characters and addition of language information, is necessary.

This memo provides information on such profiling, along with charset
names for each profiled instance.

Though every effort has been made to make the resulting charsets as
useful as a 10646 based charset can be, the result is not so good. So,
the charsets defined in this memo are for reference purposes only, and
their use for practical purposes is strongly discouraged.



This memo describes two text encoding schemes based on ISO 10646
[10646].

As ISO 10646 specifies too little about how text is visualized, to
practically use ISO 10646, it is necessary to restrict the standard
minimally and then add some amount of profiling information.

For ISO 2022 [ISO2022] based national standards, sufficient profiling
information is provided by national standardization bodies, but, for
ISO 10646, such profiling is not yet provided.

As the profiling of ISO 10646 largely affects which character or
combination of characters could be properly displayed, changes to the
profiling of ISO 10646 are as significant as additions of new
character sets to ISO 2022.



M. Ohta Informational [Page 1]

RFC 1815 Character Sets ISO-10646 and ISO-10646-J-1 July 1995


That is, it's impractical to support the entirety of ISO 10646 (more
restriction or profiling can always be added), so a client needs to
know whether some restriction or profiling is being used before it
can decide whether to display the body part. Thus, it is necessary to
provide a distinct charset name for each variation of ISO 10646.

For example, in Japan with the Japanese version of Windows NT, only
those characters already supported by MS Kanji code (mostly equivalent
to JIS X 0208 [JISX0208]) can be displayed, because no other font
pattern is commonly provided.

The other problem of ISO 10646 for Han characters is that, to display
them in the quality required for daily plain text processing in
China/Japan/Korea, it is necessary to add profiling information on
which one of Chinese/Japanese/Korean the text is using. It should be
noted that this feature makes multilingual
Chinese/Japanese/Korean text with ISO 10646 impractical.

Also, just as [RFC1521] was unclear about how bi-directionality
should be supported with "ISO-8859-6" and "ISO-8859-8", which was
corrected by [RFC1556], it is also unclear how bi-directionality
could be supported with ISO 10646. There are too many ways to
support bi-directionality. So, until some bi-directionality
mechanism(s) becomes widely supported, it is necessary to exclude
characters for languages which require bi-directionality support
from the minimal variation. It should be noted that, though ISO
10646 is intended to be free from long term states, save for the
profiling information, introduction of bi-directionality with ISO
10646 does require long term states.

Combining characters also cause problems. In many countries where
combining characters based on [ISO2022] are used, there are
restrictions on how combining characters are ordered [TIS]. Without
such restrictions, the result of combination is completely undefined,
which is the current state of ISO 10646. That is, if some
combination is allowed in one implementation while another does
not support it, communication between them is difficult unless ISO
10646 is profiled to be the least common set of widely supported
combinations. So, again, until combination restrictions are
developed for each language, it is necessary to exclude characters
for such languages from the minimal variation.
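
The interoperability problem with combining characters can be
illustrated with a short Python sketch. It uses the standard
unicodedata module and Unicode normalization, neither of which is part
of this memo; both are assumptions used only to show that one visible
character can have two different coded representations.

```python
import unicodedata

# Two codings of the same visible text.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
combined = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT


def has_combining(text: str) -> bool:
    """Return True if any character in text is a combining character."""
    return any(unicodedata.combining(ch) != 0 for ch in text)


# A receiver that compares code values sees two different strings.
same_code_values = precomposed == combined

# Only an extra agreed rule (here NFC normalization, an assumption that
# postdates this memo) makes the two forms compare equal.
same_after_nfc = unicodedata.normalize("NFC", combined) == precomposed
```

A profile that excludes combining characters avoids the question
entirely, since only the precomposed form can then appear.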

Conjoining characters, too, may or may not be supported, which
requires another profiling.

According to those considerations, this memo defines two variations
of ISO 10646. They are "ISO-10646" as the minimal basic variation and
"ISO-10646-J-1" as the variation which could be useful in Japan.





Finally, this memo, by no means, promotes the use of ISO 10646 on the
Internet. Its use is strongly discouraged when there are other
charsets which can encode the same information. Families of ISO 10646
based charsets, like ISO 2022 based charsets, only form a set of
mutually incompatible encoding systems and, unlike ISO 2022 based
charsets [2022INT], they can not be merged together into a single
world wide charset.

Description of "ISO-10646"

ISO-10646 is profiled to be the most basic part of the family of
encodings based on ISO 10646 and contains the following collections
of graphic characters.

collection number and name        positions   further restriction
------------------------------------------------------------------
1  BASIC LATIN                    0020-007E
2  LATIN-1 SUPPLEMENT             00A0-00FF

C0 and C1 control characters may also be used as specified in
section 16 of ISO 10646.
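
As a rough illustration, the repertoire above could be checked as in
the following Python sketch. The function name and the allow_controls
parameter are hypothetical, and the end positions 007E and 00FF are
assumed from the standard BASIC LATIN and LATIN-1 SUPPLEMENT
collections.

```python
# Graphic character collections of the "ISO-10646" profile
# (inclusive code positions; end positions are assumed, see above).
GRAPHIC_RANGES = [
    (0x0020, 0x007E),  # 1 BASIC LATIN
    (0x00A0, 0x00FF),  # 2 LATIN-1 SUPPLEMENT
]

# C0 (0000-001F) and C1 (0080-009F) control characters.
CONTROL_RANGES = [(0x0000, 0x001F), (0x0080, 0x009F)]


def in_iso_10646_profile(text: str, allow_controls: bool = True) -> bool:
    """Return True if every character of text is inside the profile."""
    ranges = GRAPHIC_RANGES + (CONTROL_RANGES if allow_controls else [])
    return all(
        any(first <= ord(ch) <= last for first, last in ranges)
        for ch in text
    )
```

A sender could run such a check before labelling a body part with
charset "ISO-10646", and fall back to another charset when it fails.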

Text in "ISO-10646" is encoded in 16 bit big endian form.
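
A minimal Python sketch of the 16 bit big endian form follows. The
function names are hypothetical, and the sketch adds no byte order
mark, since this memo does not mention one.

```python
def encode_16bit_be(text: str) -> bytes:
    """Encode each character as two octets, high octet first."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError(f"U+{code:04X} does not fit in 16 bits")
        out += code.to_bytes(2, "big")
    return bytes(out)


def decode_16bit_be(data: bytes) -> str:
    """Decode octet pairs back into characters."""
    if len(data) % 2:
        raise ValueError("odd number of octets")
    return "".join(
        chr(int.from_bytes(data[i:i + 2], "big"))
        for i in range(0, len(data), 2)
    )
```

For example, encode_16bit_be("A") yields the two octets 00 41, so every
character of this profile has a zero high octet.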

As no combining characters are included, "ISO-10646" can be used with
applications at implementation level 1.

Left-to-right directionality should be used.

The encoding is implemented by Windows/NT.

For practical communication, use of "ISO-10646" is discouraged and
"ISO-8859-1" [RFC1345] should be used instead.




















Description of "ISO-10646-J-1"

ISO-10646-J-1 is profiled to be useful for Japanese PC users who use
the Japanese version of Windows/NT and contains the following graphic
characters.

collection number and name        positions   further restriction
------------------------------------------------------------------
1  BASIC LATIN                    0020-007E
2  LATIN-1 SUPPLEMENT             00A0-00FF
8  BASIC GREEK                    0370-03CF
10 CYRILLIC                       0400-04FF
32 GENERAL PUNCTUATION            2000-206F   See note 1, below
39 MATHEMATICAL OPERATORS         2200-22FF   See note 1, below
44 BOX DRAWING                    2500-257F
49 CJK SYMBOLS AND PUNCTUATION    3000-303F   See note 1, below
50 HIRAGANA                       3040-309F
51 KATAKANA                       30A0-30FF
60 CJK UNIFIED IDEOGRAPHS         4E00-9FFF   See note 1, below
62 CJK COMPATIBILITY IDEOGRAPHS   F900-FAFF   See note 1, below
66 CJK COMPATIBILITY FORMS        FE30-FE4F
69 HALFWIDTH AND FULLWIDTH FORMS  FF00-FFEF

Note 1: Most of the characters are excluded. That is, only those
characters of JIS X 0208 [JISX0208] are included. The reason is that
the Japanese version of Windows/NT has fonts for them only and most
of the users can not read messages which contain other characters.
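
The table and note 1 can be combined into a membership test for
graphic characters, sketched below in Python. Several points are
assumptions: the end positions of the collections are taken from the
standard collection definitions, the function names are hypothetical,
and JIS X 0208 membership is only approximated by asking Python's
shift_jis codec whether a character encodes to two octets. The codec's
mapping tables may differ from the JIS X 0208-1990 tables in a few
positions, and control characters are not covered here.

```python
# (first, last, restricted) for each collection; "restricted" marks
# the rows that carry "See note 1".
COLLECTIONS = [
    (0x0020, 0x007E, False),  # 1  BASIC LATIN
    (0x00A0, 0x00FF, False),  # 2  LATIN-1 SUPPLEMENT
    (0x0370, 0x03CF, False),  # 8  BASIC GREEK
    (0x0400, 0x04FF, False),  # 10 CYRILLIC
    (0x2000, 0x206F, True),   # 32 GENERAL PUNCTUATION
    (0x2200, 0x22FF, True),   # 39 MATHEMATICAL OPERATORS
    (0x2500, 0x257F, False),  # 44 BOX DRAWING
    (0x3000, 0x303F, True),   # 49 CJK SYMBOLS AND PUNCTUATION
    (0x3040, 0x309F, False),  # 50 HIRAGANA
    (0x30A0, 0x30FF, False),  # 51 KATAKANA
    (0x4E00, 0x9FFF, True),   # 60 CJK UNIFIED IDEOGRAPHS
    (0xF900, 0xFAFF, True),   # 62 CJK COMPATIBILITY IDEOGRAPHS
    (0xFE30, 0xFE4F, False),  # 66 CJK COMPATIBILITY FORMS
    (0xFF00, 0xFFEF, False),  # 69 HALFWIDTH AND FULLWIDTH FORMS
]


def in_jis_x_0208(ch: str) -> bool:
    """Approximate JIS X 0208 membership (assumption: shift_jis codec)."""
    try:
        # Two-octet Shift JIS codes correspond to JIS X 0208 characters.
        return len(ch.encode("shift_jis")) == 2
    except UnicodeEncodeError:
        return False


def in_iso_10646_j_1(ch: str) -> bool:
    """Return True if the graphic character ch is inside the profile."""
    code = ord(ch)
    for first, last, restricted in COLLECTIONS:
        if first <= code <= last:
            return in_jis_x_0208(ch) if restricted else True
    return False
```

With this sketch, a Japanese ideograph covered by JIS X 0208 passes,
while a simplified Chinese ideograph inside the same collection range
is rejected by the note 1 restriction.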

C0 and C1 control characters may also be used as specified in
section 16 of ISO 10646.

Text in "ISO-10646-J-1" is encoded in 16 bit big endian form.

Shapes of Han characters should be those of Japanese Han, that is,
those in column "J" in section 26 of ISO 10646.

As no combining characters are included, "ISO-10646-J-1" can be used
with applications at implementation level 1.

Characters in "HALFWIDTH AND FULLWIDTH FORMS" are treated as
different characters from the normal width characters.
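
The following Python sketch illustrates what keeping the width variants
distinct means in practice. The use of NFKC normalization is an
assumption brought in for contrast; it is not part of this memo.

```python
import unicodedata

latin_a = "\u0041"       # LATIN CAPITAL LETTER A
fullwidth_a = "\uFF21"   # FULLWIDTH LATIN CAPITAL LETTER A

# Comparing code values keeps the two characters distinct.
distinct = latin_a != fullwidth_a

# NFKC compatibility normalization would fold the fullwidth form into
# the normal width form, so a comparison that keeps them distinct
# should not apply it.
folded_equal = unicodedata.normalize("NFKC", fullwidth_a) == latin_a
```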

When text is displayed horizontally, left-to-right directionality
should be used.

For practical communication, use of "ISO-10646-J-1" is discouraged and
"ISO-2022-JP" [2022JP] should be used instead.






MIME

The names given to the character encoding methods described in this
memo are, respectively, "ISO-10646" and "ISO-10646-J-1". These names
are intended to be used in MIME messages as follows:

Content-Type: text/plain; charset=iso-10646

The ISO-10646 and ISO-10646-J-1 encodings are in 16-bit form, so it is
often necessary to use a Content-Transfer-Encoding header. Base64
should be useful.
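
A minimal Python sketch of such a message follows. The function name
build_message is hypothetical, and Python's "utf-16-be" codec is used
as a stand-in for the 16 bit big endian form, which is an assumption
that holds only for characters that fit in 16 bits.

```python
import base64


def build_message(text: str, charset: str = "iso-10646") -> str:
    """Return MIME headers and a Base64 body for 16-bit text."""
    # Assumption: "utf-16-be" matches the 16 bit big endian form for
    # characters in the range 0000-FFFF.
    data = text.encode("utf-16-be")
    # encodebytes() wraps its output in lines of at most 76 characters.
    body = base64.encodebytes(data).decode("ascii")
    return (
        "MIME-Version: 1.0\r\n"
        f"Content-Type: text/plain; charset={charset}\r\n"
        "Content-Transfer-Encoding: base64\r\n"
        "\r\n"
        + body.replace("\n", "\r\n")
    )
```

Base64 is needed because roughly every other octet of such text is
zero, which a 7-bit mail transport can not carry as is.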

ISO-10646 and ISO-10646-J-1 may also be used in MIME Part 2
headers [RFC1522]. The "B" encoding should be used with them.
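
A header word in the "B" encoding could be produced as in the Python
sketch below. The function name encode_word is hypothetical, and
"utf-16-be" is again an assumed stand-in for the 16 bit big endian
form. The 75 character limit on an encoded-word comes from [RFC1522].

```python
import base64


def encode_word(text: str, charset: str = "iso-10646") -> str:
    """Return text as a single "B" encoded-word for a mail header."""
    data = text.encode("utf-16-be")  # assumed 16 bit big endian form
    token = base64.b64encode(data).decode("ascii")
    word = f"=?{charset}?B?{token}?="
    if len(word) > 75:
        # Longer text has to be split into several encoded-words.
        raise ValueError("encoded-word longer than 75 characters")
    return word
```

For example, encode_word("Hi") returns "=?iso-10646?B?AEgAaQ==?=".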



References

[10646] International Organization for Standardization (ISO),
"Universal Multiple-Octet Coded Character Set (UCS)",
International Standard, Ref. No. ISO/IEC 10646-1:1993
(E).

[2022INT] (An Internet Draft "draft-ohta-text-encoding-*.txt" may
be available).

[2022JP] Murai, J., Crispin, M., and E. van der Poel, "Japanese
Character Encoding for Internet Messages", RFC 1468,
June 1993.

[ISO2022] International Organization for Standardization (ISO),
"Information processing -- ISO 7-bit and 8-bit coded
character sets -- Code extension techniques",
International Standard, Ref. No. ISO 2022-1986 (E).

[JISX0208] Japanese Standards Association, "Code of the Japanese
graphic character set for information interchange", JIS
X 0208-1990.

[RFC1345] Simonsen, K., "Character Mnemonics & Character Sets",
RFC 1345, Rationel Almen Planlaegning, June 1992.

[RFC1521] Borenstein, N., and N. Freed, "MIME (Multipurpose
Internet Mail Extensions) Part One: Mechanisms for
Specifying and Describing the Format of Internet Message
Bodies", RFC 1521, September 1993.








[RFC1522] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
Part Two: Message Header Extensions for Non-ASCII Text",
RFC 1522, September 1993.

[RFC1556] Nussbacher, H., "Handling of Bi-directional Texts in
MIME", RFC 1556, Israeli Inter-University Computer Center,
December 1993.

[TIS] Thai Industrial Standard for Thai Character Code for
Computer, TIS 620-2533:1990.

Security Considerations

Security issues are not discussed in this memo.

Author's Address

Masataka Ohta
Tokyo Institute of Technology
2-12-1, O-okayama, Meguro-ku
Tokyo 152, JAPAN

Phone: +81-3-5499-7084
Fax: +81-3-3729-1940
EMail: mohta@cc.titech.ac.

























