Network Working Group                                            Y. Wei
Request for Comments: 1842                       AsiaInfo Services Inc.
Category: Informational                                        Y. Zhang
                                                          Harvard Univ.
                                                                  J. Li
                                                             Rice Univ.
                                                                J. Ding
                                                 AsiaInfo Services Inc.
                                                               Y. Jiang
                                                      Univ. of Maryland
                                                            August 1995


ASCII Printable Characters-Based Chinese Character Encoding
for Internet Messages

Status of this Memo

This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.

Abstract

This document describes the encoding used in electronic mail [RFC822]
and network news [RFC1036] messages over the Internet. The 7-bit
representation of GB 2312 Chinese text was specified by Fung Fung Lee
of Stanford University [Lee89] and implemented in various software
packages under different platforms (see the appendix for a partial list
of the available software packages that support this encoding
method). It has been further tested and used in the Usenet newsgroups
alt.chinese.text and chinese.* as well as various other network
forums with considerable success. Future extensions of this encoding
method can accommodate additional GB character sets and other East
Asian language character sets [Wei94].

The name given to this encoding is "HZ-GB-2312", which is intended to
be used in the "charset" parameter field of MIME headers (see [MIME1]
and [MIME2]).












Wei, et al Informational [Page 1]

RFC 1842 ASCII/Chinese Character Encoding August 1995


Table of Contents

1. Introduction
2. Description
3. Formal Syntax
4. MIME Considerations
5. Background Information
6. References
7. Acknowledgements
8. Security Considerations
9. Authors' Addresses
10. Appendix: List of Software Implementing HZ Representation

1. Introduction

Chinese characters (and those of other East Asian languages) are
encoded in multiple bytes to guarantee sufficient coding space for the
large number of glyphs these languages contain. With the proliferation
of internetwork traffic around the world, it has become necessary to
find ways to facilitate the transfer of text in multiple-byte character
set languages (hereafter "Chinese text") over the Internet.

Two layers of concern need to be addressed by any mechanism whose
purpose is to transfer Chinese text over the Internet. The first is
the application layer, in which the applications concerned should be
able to recognize the encoding of the text and/or the different
character sets which might be mixed in the text, and process it
accordingly. The second layer is the actual transport of Chinese text
from point A to point B over the Internet. Because the prevailing mail
transport protocol used over the Internet, the Simple Mail Transfer
Protocol (SMTP), was designed originally for the ASCII character set
only, many Internet mail agents are not 8-bit clean and therefore
introduce challenges for any attempt to actually implement a mechanism
for the transport of Chinese text over the Internet.

Here we describe a mechanism for the transmission of Chinese text over
the network. This mechanism has been implemented by various software
packages dealing with multi-language support and has been tested on
USENET newsgroups and other types of Internet forums for the last two
years. The test results show that the HZ representation can pass
through almost all existing mail delivery agents without being
corrupted. The HZ representation currently handles the GB2312-80
Chinese character set only. Further expansion to other Chinese
encoding systems and to other East Asian languages is under
consideration.

2. Description

For arbitrary mixed text containing both Chinese coded text strings and
ASCII text strings, we designate two distinguishable text modes,
ASCII mode and HZ mode, as the only two states allowed in the text.
At any given time, the text is either in one of these two modes or in
transition from one to the other. In HZ mode, only printable
ASCII characters (0x21-0x7E) are meaningful, and the basic text
unit is two bytes long.

In ASCII mode, the basic text unit is one (1) byte, with
the exception of '~~', which is the special sequence representing the
ASCII character '~'. In both ASCII mode and HZ mode, '~' leads an
escape sequence. However, as the basic text unit in HZ mode is
two bytes long, only a '~' character that appears as the first
byte of a two-byte character frame is considered the start
of an escape sequence.

The default mode is ASCII mode. Each line of text starts in the
default ASCII mode. Therefore, all Chinese character strings are to
be enclosed in a '~{' and '~}' pair within the same text line.

The escape sequences defined are the following:

~{ ---- escape from ASCII mode to GB2312 HZ mode
~} ---- escape from HZ mode to ASCII mode
~~ ---- ASCII character '~' in ASCII mode
~\n ---- line continuation in ASCII mode
~[!-z|] ---- reserved for future HZ mode character sets

A few examples of the 7-bit representation of Chinese GB coded text,
taken directly from [Lee89], are listed below:

Example 1: (Suppose there is no line size limit.)
This sentence is in ASCII.
The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye.

Example 2: (Suppose the maximum line size is 42.)
This sentence is in ASCII.
The next sentence is in GB.~{<:Ky2;S{#,~}~
~{NpJ)l6HK!#~}Bye.

Example 3: (Suppose a new line is started for every mode switch.)
This sentence is in ASCII.
The next sentence is in GB.~
~{<:Ky2;S{#,NpJ)l6HK!#~}~
Bye.
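
The three examples differ only in where lines are broken, so a decoder should produce the same result for each. The Python sketch below is one possible reading of the rules in this section, not a reference implementation. The name hz_to_gb is hypothetical. The sketch assumes LF-only line endings, assumes the desired output is the common 8-bit form of GB 2312 in which both bytes of a character have the high bit set, and rejects the reserved escape sequences and any line break inside HZ mode.

```python
def hz_to_gb(data: bytes) -> bytes:
    """Convert HZ text to 8-bit GB 2312 bytes (high bit set on both
    bytes of every Chinese character); ASCII passes through."""
    out = bytearray()
    hz_mode = False                      # the default mode is ASCII
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if not hz_mode:
            if b != 0x7E:                # ordinary one-byte unit
                out.append(b)
                i += 1
                continue
            nxt = data[i + 1:i + 2]      # byte after '~' (may be empty)
            if nxt == b"~":              # "~~" is a literal '~'
                out.append(0x7E)
            elif nxt == b"{":            # "~{" enters HZ mode
                hz_mode = True
            elif nxt == b"\n":           # "~\n" joins two lines
                pass
            else:
                raise ValueError(f"unsupported escape at offset {i}")
            i += 2
        else:
            unit = data[i:i + 2]         # HZ mode works in 2-byte units
            if unit == b"~}":            # "~}" returns to ASCII mode
                hz_mode = False
            elif (len(unit) == 2 and unit[0] != 0x7E
                  and all(0x21 <= c <= 0x7E for c in unit)):
                out += bytes(c | 0x80 for c in unit)
            else:
                raise ValueError(f"invalid HZ unit at offset {i}")
            i += 2
    if hz_mode:
        raise ValueError("text ended in HZ mode")
    return bytes(out)
```

Because HZ mode is consumed two bytes at a time, a '{' or '}' that falls in the second byte of a character (as in 'S{' above) is never mistaken for part of an escape sequence.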




3. Formal Syntax

The notational conventions used here are identical to those used in
RFC 822 [RFC822].

The * (asterisk) convention is as follows:

l*m something

meaning at least l and at most m somethings, with l and m taking
default values of 0 and infinity, respectively.


message             = headers 1*( CRLF *single-byte-char *segment
                      single-byte-seq *single-byte-char )
                                     ; see also [MIME1] "body-part"
                                     ; note: must end in ASCII

headers             = <see [RFC822] "fields" and [MIME1] "body-part">

segment             = single-byte-segment / double-byte-segment

single-byte-segment = 1*single-byte-char

double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )

single-byte-seq     = "~}"

double-byte-seq     = "~{"

CRLF                = CR LF

                                     ; ( Octal, Decimal.)

CR                  = <ASCII CR, carriage return> ; (    15,      13.)

LF                  = <ASCII LF, linefeed>        ; (    12,      10.)

one-of-94           = <any one of 94 values>      ; (41-176, 33.-126.)

single-byte-char    = <any 7BIT, including bare CR & bare LF, but NOT
                      including CRLF, not including "~"> / "~~"

7BIT                = <any 7-bit value>           ; ( 0-177,  0.-127.)
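
For a single line of body text, the grammar can be approximated by a regular expression. The Python sketch below is a simplification under stated assumptions: it checks one line with its terminator already removed, it excludes bare CR and LF, it does not handle the line continuation or the reserved escapes, and it restricts the first byte of each two-byte character to 0x21-0x77 rather than the full one-of-94 range. The names HZ_LINE and is_valid_hz_line are hypothetical.

```python
import re

# One line of HZ text: ASCII-mode bytes other than '~', the "~~"
# escape, or a "~{ ... ~}" segment of one or more two-byte characters.
HZ_LINE = re.compile(
    rb"(?:[\x00-\x09\x0b\x0c\x0e-\x7d\x7f]"  # 7-bit byte except LF, CR, '~'
    rb"|~~"                                  # literal tilde
    rb"|~\{(?:[!-w][!-~])+~\})*"             # double-byte segment
)


def is_valid_hz_line(line: bytes) -> bool:
    """Return True if the whole line matches the simplified grammar."""
    return HZ_LINE.fullmatch(line) is not None
```

Limiting the first byte to '!' through 'w' keeps the pattern unambiguous, since the closing "~}" can then never be read as a two-byte character.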









4. MIME Considerations

The name given to the HZ character encoding is "HZ-GB-2312". This
name is intended to be used in MIME messages as follows:

Content-Type: text/plain; charset=HZ-GB-2312

The HZ-GB-2312 encoding is already in 7-bit form, so it is not
necessary to use a Content-Transfer-Encoding header.
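
As a hedged illustration, a message carrying this header could be assembled with Python's standard email package as sketched below. The sample text and variable names are made up for the example, and the body is produced with Python's built-in "hz" codec, which is an implementation choice of this sketch rather than anything the memo prescribes.

```python
from email.message import Message

# Sample text: ASCII followed by two GB 2312 characters (built from
# their 8-bit GB bytes so the source file itself stays ASCII).
text = "Hello " + b"\xc4\xe3\xba\xc3".decode("gb2312")
body = text.encode("hz")   # HZ output consists of 7-bit bytes only

msg = Message()
msg["MIME-Version"] = "1.0"
msg["Content-Type"] = "text/plain; charset=HZ-GB-2312"
# No Content-Transfer-Encoding header is set: the body is already 7-bit.
msg.set_payload(body.decode("ascii"))

print(msg.as_string())
```

A receiving program can read the charset parameter back with msg.get_content_charset() and decode the payload accordingly.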

5. Background Information

A GB code is a two-byte character with the first byte in the
range 0x21-0x77 and the second byte in the range 0x21-0x7E. As the
printable ASCII characters are single-byte characters in the
range 0x21-0x7E, two printable ASCII characters can represent a
two-byte GB coded Chinese character if a proper escape sequence is used
to indicate the text mode. This forms the basis of the HZ 7-bit
representation method described above. Further, by using a
printable ASCII character, '~', as the leading byte of the escape
sequence, the HZ representation eliminates the need to reserve any
non-printable ASCII characters, which are commonly used by
application programs (as well as system environments) for flow
control or other special signaling. Therefore, the HZ
representation method described here has the least probability of
interfering with the host and network environment. It is also more
convenient for applications to implement.
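
The byte ranges above can be checked with a few lines of arithmetic. The Python sketch below validates a 7-bit GB pair and maps it to the 8-bit form. The function names are hypothetical, and the row and cell numbering (each byte minus 0x20) follows the usual 94 by 94 code-table convention, which is an assumption of this example and not taken from this memo.

```python
from typing import Tuple


def gb_row_cell(pair: bytes) -> Tuple[int, int]:
    """Return the (row, cell) position of a 7-bit GB pair after
    validating the byte ranges given in the text."""
    if len(pair) != 2:
        raise ValueError("a GB code is exactly two bytes")
    first, second = pair
    if not 0x21 <= first <= 0x77:
        raise ValueError("first byte out of range 0x21-0x77")
    if not 0x21 <= second <= 0x7E:
        raise ValueError("second byte out of range 0x21-0x7E")
    return first - 0x20, second - 0x20


def to_8bit_gb(pair: bytes) -> bytes:
    """Set the high bit on both bytes of a valid 7-bit GB pair."""
    gb_row_cell(pair)  # validation only
    return bytes(b | 0x80 for b in pair)
```

For instance, the pair "<:" (0x3C 0x3A) falls in row 28, cell 26, and its 8-bit form is 0xBC 0xBA.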

The HZ representation method has been implemented in various application
software across computer hardware platforms. It has also been tested
for more than two years over the USENET newsgroups alt.chinese.text and
chinese.* for the transmission of Chinese text over the Internet.
The points of origin of the transferred Chinese texts are
geographically scattered around the world and under the control
of vastly different system and network environments. Therefore, such a
test group may well represent a rather complete sample of the present
Internet world. The successful test of the HZ representation method
therefore builds confidence that it is well suited for
transmitting multi-byte text messages over the Internet.

Under the HZ representation, ASCII text remains as 7-bit characters, and
therefore the HZ representation together with the 7-bit ASCII character
set can be viewed as forming a superset of characters.

6. References

[ASCII] American National Standards Institute, "Coded character set
-- 7-bit American national standard code for information
interchange", ANSI X3.4-1986.

[GB 2312] Technical Administrative Bureau of P.R. China, "Coding of
Chinese Ideogram Set for Information Interchange Basic Set",
GB 2312-80.

[Lee89] Lee, F., "HZ - A Data Format for Exchanging Files of
Arbitrarily Mixed Chinese and ASCII characters", RFC 1843,
Stanford University, August 1995.

[MIME1] Borenstein, N., and N. Freed, "MIME (Multipurpose Internet
Mail Extensions) Part One: Mechanisms for Specifying and Describing
the Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
September 1993.

[MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
Part Two: Message Header Extensions for Non-ASCII Text", RFC 1522,
University of Tennessee, September 1993.

[RFC822] Crocker, D., "Standard for the Format of ARPA Internet
Text Messages", STD 11, RFC 822, UDEL, August 1982.

[RFC1036] Horton, M., and R. Adams, "Standard for Interchange of
USENET Messages", RFC 1036, AT&T Bell Laboratories, Center for
Seismic Studies, December 1987.

[Wei94] Wei, Yagui, "A Proposal for a Consolidated Collection of
East Asian Language Coding Standards Using Solely ASCII Printable
Characters", June 30, 1994.

7. Acknowledgements

Many people have been involved in the design and specification of the
HZ 7-bit Chinese representation system at different stages. Most notable
among them are Ed Lai, Chunqing Cheng, Fung Fung Lee, and Ricky
Yeung. This document is merely a recollection of thoughts and efforts
made collectively by this group of people, whose devotion has led to
the current success of the HZ Chinese representation over the
Internet. Further, the authors wish to thank AsiaInfo Services Inc.
for sponsoring the preparation of this document and for facilitating
the communication needed to refine this document.

8. Security Considerations

Security issues are not discussed in this memo.

9. Authors' Addresses

Ya-Gui Wei
AsiaInfo Services Inc.
One Galleria
13355 Noel Rd. Suite 1340
Dallas, TX 75240

Phone: (214) 788-4141
Fax: (214) 788-0729
EMail: HZRFC@usai.asiainfo.


Yun Fei Zhang
Harvard University
MS 66
60 Garden St.
Cambridge, MA 02138

Phone: (617)-860-9444
EMail: zhang@orion.harvard.


Jian Q. Li
Rice University
ONS - MS 119
P.O. Box 1892
Houston, Texas 77251-1892

Phone: (713)285-5328
EMail: jian@is.rice.


Jian Ding
ISTIC Bldg, Room 431
15 Fuxing Road
Beijing, China 100038

Phone: 86 10 853-7120
Fax: 86 10 853-7123
EMail: ding@Beijing.AsiaInfo.


Yuan Jiang
Electrical Engineering
University of Maryland
College Park, MD 20742

Phone: 301-405-3729
EMail: yjj@eng.umd.

10. Appendix: List of Software Implementing HZ Representation

In the following, we have compiled a list of software packages that
support the HZ Chinese representation method. Though this list is far
from complete, it shows that support for the HZ representation has been
implemented for major hardware and software platforms. For more
information on the listed software packages (and for other
information pertaining to Chinese computing), please refer to the
Internet site ftp://ftp.ifcss.org/pub/software/ or its mirrors at
the following sites:

at Beijing, China: ftp://info.bta.net.cn:/pub/software/;
at Shanghai, China: ftp://info.bta.net.cn:/pub/software/;
at Taiwan: ftp://nctuccca.edu.tw/pub/Chinese/ifcss/;
or ftp://ftp.edu.tw:/Chinese/ifcss/software/;
at Singapore: ftp://ftp.technet.sg:/pub/chinese/;
at California, U.S.A.: ftp://cnd.org/pub/software/.

The software packages below are listed by name, followed
by the current version number, release date (in parentheses) and the
author(s) of the software. A brief description of the features
of the software starts on the line immediately after the headline,
led by the character string "--". Two consecutive packages are separated
by a blank line.

zwdos (V2.2, March 5, 1993) by Wei Ya-Gui
-- MS-DOS kernel extension that gives DOS text mode programs the
ability to enter, display, manipulate and print 'zW' and HZ
Chinese text. Small memory requirement. Supports EGA,
VGA or Hercules Monographic displays.

HZ (V2.0, Feb. 7, 1995) by Fung F. Lee
-- Conversion from HZ to GB, GB to HZ, and zW to HZ respectively.
Versions for PC, Mac and Unix exist.

XingXing (V4.2, Mar. 29, 1995) by Wang
-- Chinese word processor for PC.

NJStar (V3.00, Feb. 10, 1994) by Hongbo Ni
-- GB Word Processor (Viewer, editor, printing, converter).
Supports EGA/(mono)VGA/SuperVGA monitors, and
printers, Chinese<->English dictionary lookup,
and glossary; Includes more than 20 Chinese input methods
with Intelligent LianXiang and fuzzy Pinyin; Speed up
sentence based Pinyin; Reads and writes GB, HZ, zW & Big5 files;
DOS Shell; Configurable.

QuickStar (V3.0, June 7, 1995) by Anthony
-- Compact size Chinese edit software for PC. PinYin, CiZu,
WuBi, GuoBiao, ASCII etc. input methods. Translates to/from GB,
HZ and Big5 coded Chinese files.

cnprint (V2.6, Jan. 25, 1995) by Yidao
-- Prints GB/HZ/BIG5/JIS/KSC/UTF8 etc. or converts to PostScript
(conforms to EPSF-3.0). Both DOS and UNIX versions available.

dm24 (V2.0, Sept. 1993) by Gongquan Chen
-- Chinese GB/HZ printing program for EPSON 24pin

HXLASER (V2.6, Feb. 1994) by Chen,
-- A GB/HZ/BIG5 file printing program for HP LaserJet plus
later model printers

CNVIEW (V3.0, Jan. 1, 1995) by Jifang
-- View GB/Hz/Big5 encoded Chinese text file on IBM-
&

ZWLIST (V1.1, Nov. 24, 1993) by Gongquan
-- Chinese HZ/GB/BIG5 File Browser for

zwTool (V1.0, Oct. 30,1993) by Gongquan
-- a MSDOS TSR program for input of Chinese characters in
mode; Developed primarily for Chinese programmers using
(Integrated Development Environment, like Borland's
languages); Supports GB/HZ; EGA/VGA required

DateStar (V1.1) by Youzhen
-- Chinese Calendar Producer. Displays Chinese and
calendar in ASCII code, BIG-5 code, GuoBiao code (
Standard), and HZ code (Network

MacViewHZ (V2.21 Dec. 93) by Xiaodong
-- Display and print GB/HZ or BIG5 coded Chinese text files
Macintosh without Chinese OS system, with easy to use
user interface including multiple windows and simple
features such as delete, copy, cut and paste

MacHZTerm (V0.52) by Xin
-- a communication program using CommToolBox, capable
displaying GB, HZ, Big5 texts on line. No Chinese OS required
System 7 recommended

HanziTerm (V0.5) by Ricky
-- A terminal emulator for Mac Chinese OS 6.0.x or later.
Supports 8-bit character codes, HZ, and zW.

Tex-Edit-HZ (V1.0, Dec. 18 1993) by Tom Bender and Tie Zeng
-- A MAC WorldScript savvy Text editor with HZ<->GB
feature


MacBlue Telnet (V2.6.6, Feb 16, 1995) by
-- A Telnet program that can handle all Chinese
(such as HZ, GB, Big5, ET etc), EUC-JIS and EUC-KSC; based
NCSA Telnet with built-in hanzi input methods

rnMac (V1.3b5) by Roy
-- Offline Newsreader including GB <-> HZ

Weiqi267 (V2.67) by Xiangbo
-- record Weiqi games and transfer them through net
GB, HZ 100 % compatible (but Russian char disabled).
There is a user guide in HZ coding
* Now can also be used for Chinese Chess

TwinBridge (V3.2, Nov. 16, 1994) by Twinbridge Software
-- an interface between Windows and applications, it
Chinese character processing in Windows applications
Word for Windows, Ami Pro, Excel, etc
You can edit Chinese characters like English
in most of applications

WinHZ (V1.1, April 13, 1995) by Tian
-- HZ extension for Chinese systems for

HZcomm (V1.5, Nov. 14, 1993) by Nick Ke Ning
-- HZ coding supported communication program under
Windows System (GB internal coded). Good for reading/
HZ coded E-mail and news(alt.chinese.text) on line
Windows 3.1 for PCs

SimpTerm (V0.8.0) by Jianqing
-- A Chinese communication program for MS-Windows 3.1
with build in support for BIG5, HZ and GB encoded text

ChPad (V1.31) by Tian
-- GUO BIAO and HZ file browser for MS WINDOWS 3.1

SilkRoad (V1.0) by Antony C.
-- GB/HZ Viewer for MS-Windows 3.1

gnus-chinese (V1.0, Apr. 26, 1994) by Ning Mosberger-
-- Converts HZ articles to the code understandable by the
terminal automatically in the GNUS newsreader (for GNU EMACS).
Requires a conversion program (e.g. hz2gb and gb2hz) to do the
actual conversion.

irchat (V2.4jp4cn0) by HIROSE
-- irc client e-lisp program on
patched to handle HZ and Big
now we can read/write all JIS/HZ/Big5 simultaneously on

hztty (V2.0, Jan. 29, 1994) by Yongguang
-- This program turns a tty session from one encoding to another.
For example, running hztty on cxterm can allow you to
read/write Chinese in HZ format.

BeTTY/CCF/B5Encode package (V1.534, 1995.03.22) by Jing-Shin
-- a chinese code conversion package for codes widely
in Taiwan and the GB code widely used in Mainland,
a 7-bit Big5 encoding method (B5Encode3/B5E3, an
to HZ encoding for GB),
including off-line converters (CCF/Chinese Code Filters
B5E/B5Encode) and an on-line converter (BeTTY) which
your native chinese terminal to become aware of the
systems widely used in Taiwan and GB, HZ encoding

gb2jis & jis2gb (V1.5, 1995.5.11) by Koichi
-- convert GB (or HZ) to/from JIS with two-letter


gb2ps (V2.02) by Wei
-- convert GB/HZ to postscript, supports simple page
(change chinese fonts and font size, cover page,
number, etc). Five chinese fonts are provided in
release, they are Song, Kai, Fang Song, Hei and
The HZ ENCODING is also supported

ChiRK (V1.2a) by Bo
-- GB/HZ/BIG5 text viewer on terminals (or emulations)
of displaying Tektronics 401x graphics, such as GraphOn,
VT240/330, Xterm, Tektool on Sun, EM4105 on PC
VersaTerm-Pro on Mac, etc

Multi-Localization Enhancement of NCSA Mosaic X 2.4 (V2.4.0)
by TAKADA,
-- A patch to make use of various national character sets in
Mosaic for X 2.4. You can switch between character sets in
Mosaic. Supports ISO 8859-X, KOI-8, GB, HZ, BIG5, KSC & JIS.