Network Working Group K. Moore
Request for Comments: 1342 University of Tennessee
June 1992


Representation of Non-ASCII Text in Internet Message Headers

Status of this Memo

This RFC specifies an IAB standards track protocol for the Internet
community, and requests discussion and suggestions for improvements.
Please refer to the current edition of the "IAB Official Protocol
Standards" for the standardization state and status of this protocol.
Distribution of this memo is unlimited.



Abstract

This memo describes an extension to the message format defined in [1]
(known to the IETF Mail Extensions Working Group as "RFC 1341"), to
allow the representation of character sets other than ASCII in RFC
822 message headers. The extensions described were designed to be
highly compatible with existing Internet mail handling software, and
to be easily implemented in mail readers that support RFC 1341.



Introduction

RFC 1341 describes a mechanism for denoting textual body parts which
are coded in various character sets, as well as methods for encoding
such body parts as sequences of printable ASCII characters. This
memo describes similar techniques to allow the encoding of non-ASCII
text in various portions of an RFC 822 [2] message header, in a manner
which is unlikely to confuse existing message handling software.

Like the encoding techniques described in RFC 1341, the techniques
outlined here were designed to allow the use of non-ASCII characters
in message headers in a way which is unlikely to be disturbed by the
quirks of existing Internet mail handling programs. In particular,
some mail relaying programs are known to (a) delete some message
header fields while retaining others, (b) rearrange the order of
addresses in To or Cc fields, (c) rearrange the (vertical) order of
header fields, and/or (d) "wrap" message headers at different places
than those in the original message. In addition, some mail reading
programs are known to have difficulty correctly parsing message
headers which, while legal according to RFC 822, make use of
backslash-quoting to "hide" special characters such as "<", ",", ":",
or which exploit other infrequently-used features of that
specification.




Moore [Page 1]

RFC 1342 Non-ASCII Mail Headers June 1992


While it is unfortunate that these programs do not correctly
interpret RFC 822 headers, to "break" these programs would cause
severe operational problems for the Internet mail system. The
extensions described in this memo therefore do not rely on little-
used features of RFC 822. Instead, certain sequences of "ordinary"
printable ASCII characters (which are assumed to be unlikely to
otherwise appear in message headers) are reserved for use as encoded
data. The characters used in these encodings are restricted to those
which do not have special meanings in the context in which the
encoded text appears.



Syntax of encoded-words

An "encoded-word" is a sequence of printable ASCII characters that
begins with "=?", ends with "?=", and has two "?"s in between. It
specifies a character set and an encoding method, and also includes
the original text encoded as ASCII characters, according to the rules
for that encoding method.

A mail composer that implements this specification will provide a
means of inputting non-ASCII text in header fields, but will translate
these fields (or appropriate portions of these fields) into encoded-
words before inserting them into the message header.

A mail reader that implements this specification will recognize
encoded-words when they appear in certain portions of the message
header. Instead of displaying the encoded-word "as is", it will
reverse the encoding and display the original text in the designated
character set.

An "encoded-word" is more precisely defined by the following
grammar, using the notation of RFC 822:

encoded-word = "=" "?" charset "?" encoding "?" encoded-text "?" "="

charset = token ; legal charsets defined by RFC 1341

encoding = token ; Either "B" or "Q"

token = 1*<Any CHAR except SPACE, CTLs, and tspecials>

tspecials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" /
            <"> / "/" / "[" / "]" / "?" / "." / "="

encoded-text = 1*<Any printable ASCII character other than "?" or
               ; SPACE> (but see "Use of encoded-words in message
               ; headers", below)
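
The grammar can be checked mechanically. The following Python sketch is an illustration only and is not part of this memo. The names parse_encoded_word, TSPECIALS, PRINTABLE and TOKEN_CHARS are hypothetical, "printable ASCII" is assumed to mean codes 33 through 126, and the encoding letter is accepted in either case because one of the examples in this memo uses a lower-case "b".

```python
# tspecials as listed in the grammar; a token may not contain any of them.
TSPECIALS = set('()<>@,;:\\"/[]?.=')
# Assumption: "printable ASCII" means codes 33..126 (no SPACE, no CTLs).
PRINTABLE = {chr(c) for c in range(33, 127)}
TOKEN_CHARS = PRINTABLE - TSPECIALS


def parse_encoded_word(word):
    """Return (charset, encoding, encoded_text), or None if the syntax fails."""
    if not (word.startswith("=?") and word.endswith("?=")):
        return None
    parts = word[2:-2].split("?")
    if len(parts) != 3:          # exactly two "?" between the delimiters
        return None
    charset, encoding, text = parts
    if not charset or not set(charset) <= TOKEN_CHARS:
        return None
    if encoding.upper() not in ("B", "Q"):
        return None
    if not text or not set(text) <= PRINTABLE - {"?"}:
        return None
    return charset, encoding.upper(), text
```

The check is purely syntactic: it does not verify that the charset is known or that the encoded-text is valid for the chosen encoding.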






An encoded-word may not be more than 75 characters long, including
charset, encoding, encoded-text, and delimiters. If it is desirable
to encode more text than will fit in an encoded-word of 75
characters, multiple encoded-words (separated by SPACE or newline)
may be used. Message header lines that contain one or more encoded-
words should be no more than 76 characters long. NOTE: These
restrictions are included not only to ease interoperability through
internetwork mail gateways, but also to impose a limit on the amount
of lookahead a header parser must employ (while looking for a final
?= delimiter) before it can decide whether a token is an encoded-word
or something else.
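
As an illustration of the 75-character limit, the Python sketch below splits a long string into several "B" encoded-words. It is not part of this memo: the name b_encode_words is hypothetical, and the sketch assumes a single-byte character set (such as ISO-8859-1) so that cutting the byte string never splits a character.

```python
import base64

MAX_WORD = 75  # limit on a whole encoded-word, delimiters included


def b_encode_words(text, charset="ISO-8859-1"):
    """Split `text` into "B" encoded-words of at most MAX_WORD characters."""
    data = text.encode(charset)
    overhead = len("=?" + charset + "?B?" + "?=")
    # base64 emits 4 characters per 3 input bytes, so cut in whole groups.
    chunk = ((MAX_WORD - overhead) // 4) * 3
    if chunk <= 0:
        raise ValueError("charset name leaves no room for encoded-text")
    words = []
    for start in range(0, len(data), chunk):
        payload = base64.b64encode(data[start:start + chunk]).decode("ascii")
        words.append("=?%s?B?%s?=" % (charset, payload))
    return words
```

The caller would still need to join the resulting words with SPACE or newline so that no header line exceeds 76 characters.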

Initially, the legal values for "encoding" are "Q" and "B". These
encodings are described below. The "Q" encoding is recommended for
use with Latin character sets, and the "B" encoding for all others.
Nevertheless, a mail reader which claims to recognize encoded-words
MUST be able to accept either encoding for any character set which it
supports.

Only a subset of the printable ASCII characters may be used in
encoded-text. The SPACE character is not allowed, so that the
beginning and end of an encoded-word are obvious. The "?" character
is used within an encoded-word to separate the various portions of
the encoded-word from one another, and thus cannot appear in the
encoded-text portion. Other characters are also illegal in certain
contexts. For example, an encoded-word in a "phrase" preceding an
address in a From header field may not contain any of the "specials"
defined in RFC 822. Finally, certain other characters are disallowed
in some contexts, to ensure reliability for messages that pass
through internetwork mail gateways.

The "B" encoding automatically meets these requirements. The "Q"
encoding allows a wide range of printable characters to be used in
non-critical locations in the message header (e.g., Subject), with
fewer characters available for use in other locations.

The "B" encoding

The "B" encoding is identical to the "BASE64" encoding defined by RFC
1341.
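
A minimal Python sketch of the "B" encoding follows. It is an illustration, not part of this memo: the names b_decode and b_encode are hypothetical, and the sketch relies on the base64 module of the Python standard library. No length check is made.

```python
import base64


def b_decode(encoded_text, charset):
    """Decode the encoded-text of a "B" encoded-word into a string."""
    raw = base64.b64decode(encoded_text, validate=True)
    return raw.decode(charset)


def b_encode(text, charset):
    """Build a complete "B" encoded-word (no 75-character check here)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?%s?B?%s?=" % (charset, payload)
```

For instance, the encoded-text "SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=" with charset ISO-8859-1 decodes to "If you can read this yo".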

The "Q" encoding

The "Q" encoding is similar to the "Quoted-Printable" content-
transfer-encoding defined in RFC 1341. It is designed to allow text
containing mostly ASCII characters to be decipherable on an ASCII
terminal without decoding.






1. Any 8-bit value may be represented by a "=" followed by two
hexadecimal digits. For example, if the character set in use
were ISO-8859-1, the "=" character would thus be encoded as
"=3D", and a SPACE by "=20".

2. The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
represented as "_" (underscore, ASCII 95.). (This character may
not pass through some internetwork mail gateways, but its use
will greatly enhance readability of "Q" encoded data with mail
readers that do not support this encoding.) Note that the "_"
always represents hexadecimal 20, even if the SPACE character
occupies a different code position in the character set in use.

3. 8-bit values which correspond to printable ASCII characters other
than "=", "?", "_" (underscore), and SPACE may be represented as
those characters. (But see "Use of encoded-words in message
headers", below).
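
The three rules can be sketched in Python as follows. This is an illustration only, not part of this memo; the names q_decode and q_encode are hypothetical. The encoder covers the unrestricted case only and does not apply the tighter per-context limits described under "Use of encoded-words in message headers".

```python
import string


def q_decode(encoded_text):
    """Turn "Q" encoded-text into the raw 8-bit values it represents."""
    out = bytearray()
    i = 0
    while i < len(encoded_text):
        ch = encoded_text[i]
        if ch == "=":                      # rule 1: "=" plus two hex digits
            pair = encoded_text[i + 1:i + 3]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError("malformed hex escape")
            out.append(int(pair, 16))
            i += 3
        elif ch == "_":                    # rule 2: "_" always means hex 20
            out.append(0x20)
            i += 1
        else:                              # rule 3: the character itself
            out.append(ord(ch))
            i += 1
    return bytes(out)


def q_encode(data):
    """Encode raw 8-bit values as "Q" encoded-text (unrestricted context)."""
    out = []
    for byte in data:
        ch = chr(byte)
        if byte == 0x20:
            out.append("_")
        elif 0x21 <= byte <= 0x7E and ch not in "=?_":
            out.append(ch)
        else:
            out.append("=%02X" % byte)
    return "".join(out)
```

Interpreting the decoded bytes as text is a separate step that depends on the charset named in the encoded-word.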

Character sets

In an encoded-word, the character set associated with the unencoded
text is specified by a charset. A charset can be any of the
character set names allowed in an RFC 1341 "charset" parameter of a
"text/plain" body part. (See section 7.1.1 of RFC 1341 for a list of
valid charset parameters).

When there is a possibility of using more than one character set to
represent the text in an encoded-word, and in the absence of private
agreements between sender and recipients of a message, it is
recommended that members of the ISO-8859-* series be used in
preference to other character sets. Among the various ISO-8859-*
character sets, the lowest-numbered set which contains all of the
required characters should be used.
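
The "lowest-numbered set" recommendation can be sketched in Python as below. This is an illustration, not part of this memo: the name pick_charset is hypothetical, the candidate range ISO-8859-1 through ISO-8859-9 is an assumption, and the repertoire of each set is taken from the codecs shipped with Python rather than from the standards themselves.

```python
def pick_charset(text, numbers=range(1, 10)):
    """Return the lowest-numbered ISO-8859-* name that can represent `text`."""
    for n in numbers:
        name = "ISO-8859-%d" % n
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue          # some character is missing from this set
        return name
    return None               # no candidate set covers the text
```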

Use of encoded-words in message headers

A sequence of one or more encoded-words is used to represent non-
ASCII textual data within a header field. An encoded-word must be
separated from an adjacent encoded-word, "word", "text", "ctext", or
"special" by a linear white-space character or a newline. When
displaying a particular header "field" (in the RFC 822 sense)
containing one or more encoded-words, an unencoded SPACE character
that immediately follows the encoded-word is not displayed. A
newline that immediately follows an encoded-word is not displayed,
unless the encoded-word is the last token in that "field". (This is
to allow the use of multiple encoded-words to represent long strings
of unencoded text, without having to separate encoded-words where
spaces occur in the unencoded text.)
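
The display rule for the SPACE that follows an encoded-word can be sketched in Python as below. This is an illustration, not part of this memo: the names ENCODED and display_field are hypothetical, the field body is assumed to be already unfolded onto one line, and encoded-words inside comments or with unknown charsets are not handled.

```python
import base64
import re

# An encoded-word plus the single SPACE (or end of field) that follows it.
ENCODED = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([!->@-~]+)\?=(?: |$)")


def _decode(match):
    charset, encoding, text = match.groups()
    if encoding in "Bb":
        raw = base64.b64decode(text)
    else:
        # "Q": "_" means hex 20, "=XX" is an 8-bit value, the rest is literal.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda m: bytes([int(m.group(1), 16)]),
            text.replace("_", " ").encode("ascii"),
        )
    return raw.decode(charset)


def display_field(body):
    """Decode encoded-words in a field body, dropping the SPACE after each."""
    return ENCODED.sub(_decode, body)
```

With this rule a sender who wants a visible space after an encoded-word has to put it inside the encoded-word, as in "=?ISO-8859-1?Q?Andr=E9_?= Pirard".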





An encoded-word may appear in a message header or body part header
according to the following rules:

- An encoded-word may replace a "text" token (as defined by RFC 822)
in: (1) a Subject or Comments header field, (2) any extension message
header field, (3) any user-defined message header field, or (4) any
RFC 1341 body part header field (such as Content-Description) for
which the field body contains only "text"s.

- An encoded-word may appear within a comment delimited by "(" and
")", i.e., wherever a "ctext" is allowed. More precisely, the RFC 822
definition for "comment" is amended as follows:

comment = "(" *(ctext / quoted-pair / comment / encoded-word) ")"

A "Q"-encoded encoded-word which appears in a comment MUST NOT contain
the characters "(", ")" or "\".

- As a replacement for a "word" entity within a "phrase", for example,
one that precedes an address in a From, To, or Cc header. The
definition for phrase from RFC 822 thus becomes:

phrase = 1*(encoded-word / word)

In this case the set of characters that may be used in a "Q"-encoded
encoded-word is restricted to: <upper and lower case ASCII letters,
decimal digits, "!", "*", "+", "-", "/", "=", and "_" (underscore,
ASCII 95.)>.

These are the ONLY locations where an encoded-word may appear. In
particular, an encoded-word MUST NOT appear in any portion of an
"address". In addition, an encoded-word MUST NOT be used in a
Received header field.
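
The restricted character set for a "Q"-encoded encoded-word inside a "phrase" can be sketched in Python as below. This is an illustration, not part of this memo. The names PHRASE_LITERALS, q_encode_phrase and phrase_safe are hypothetical, and the sketch assumes that ASCII letters are permitted alongside the digits and punctuation listed, as the From example "Keith_Moore" suggests. "=" and "_" are treated as permitted only in their roles as escape character and SPACE marker.

```python
import string

# Characters that may stand for themselves inside a "phrase" (assumption:
# ASCII letters are included along with digits and "!", "*", "+", "-", "/").
PHRASE_LITERALS = set(string.ascii_letters + string.digits + "!*+-/")


def q_encode_phrase(data):
    """ "Q"-encode raw bytes using only characters allowed in a "phrase"."""
    out = []
    for byte in data:
        ch = chr(byte)
        if byte == 0x20:
            out.append("_")
        elif ch in PHRASE_LITERALS:
            out.append(ch)
        else:
            out.append("=%02X" % byte)   # everything else is hex-escaped
    return "".join(out)


def phrase_safe(encoded_text):
    """True if the encoded-text uses only the characters allowed in a phrase."""
    return set(encoded_text) <= PHRASE_LITERALS | {"=", "_"}
```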

Whenever such words appear in a header being displayed, an
mail reader will decode the text and render it appropriately

Only textual data (printable and white space characters) should be
encoded using this scheme. However, since these encoding schemes
allow the encoding of arbitrary 8-bit values, mail readers that
implement this decoding should also ensure that display of the
decoded data on the recipient's terminal will not cause unwanted
side-effects.
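
One possible precaution, sketched in Python, is to replace non-printable characters in the decoded text before it reaches the terminal. This is an illustration only, not a method prescribed by this memo; the name safe_for_display and the choice of "?" as the replacement are assumptions.

```python
def safe_for_display(text, replacement="?"):
    """Replace control and other non-printable characters in decoded text."""
    # str.isprintable() is true for SPACE and false for control codes
    # such as ESC or BEL, which could otherwise reach the terminal.
    return "".join(ch if ch.isprintable() else replacement for ch in text)
```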

Use of these methods to encode non-textual data (e.g., pictures or
sounds) is not defined by this memo. Use of encoded-words to
represent strings of purely ASCII characters is allowed, but
discouraged.





Recognition of encoded-words in message headers

An encoded-word may be distinguished from an ordinary "word", "text",
or "ctext", as follows: An encoded-word begins with "=?", ends with
"?=", contains exactly four "?" characters including the delimiters,
and is followed by a SPACE or newline. If the "word", "text", or
"ctext" does not meet the above tests, it should be displayed as it
appears in the message header.

If the mail reader does not support the character set used, it may
either display the encoded-word as ordinary text (i.e., as it appears
in the header), or it may substitute an appropriate message
indicating that the decoded text could not be displayed.
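
The recognition test and the fallback for unsupported character sets can be sketched in Python as below. This is an illustration, not part of this memo: the name display_word and its placeholder parameter are hypothetical, "supported" is taken to mean "known to the Python codec registry", and the caller is assumed to have already split the field at SPACE or newline.

```python
import base64
import re


def display_word(word, placeholder=None):
    """Decode `word` if it is a recognizable encoded-word, else return it."""
    # Recognition: begins "=?", ends "?=", exactly four "?" in total.
    if not (word.startswith("=?") and word.endswith("?=")
            and word.count("?") == 4):
        return word
    charset, encoding, text = word[2:-2].split("?")
    try:
        if encoding.upper() == "B":
            raw = base64.b64decode(text, validate=True)
        elif encoding.upper() == "Q":
            raw = re.sub(rb"=([0-9A-Fa-f]{2})",
                         lambda m: bytes([int(m.group(1), 16)]),
                         text.replace("_", " ").encode("ascii"))
        else:
            return word                    # unknown encoding: show as-is
        return raw.decode(charset)
    except (LookupError, ValueError):
        # Unsupported charset or undecodable data: show the word as it
        # appears, or substitute the caller's message.
        return word if placeholder is None else placeholder
```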



Conformance

A mail composing program claiming compliance with this specification
MUST ensure that any string of printable ASCII characters in a
message header that begins with "=?" and ends with "?=" be a valid
encoded-word.

A mail reading program claiming compliance with this specification
must be able to distinguish encoded-words from "text", "ctext", or
"word"s anytime they appear in appropriate places in message headers.
The program must be able to display unencoded text if the character
set is "US-ASCII". For the ISO-8859-* character sets, the mail
reading program must at least be able to display the characters which
are also in the ASCII set.



Examples

From: =?US-ASCII?Q?Keith_Moore?=
To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=
CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
 =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=

From: =?ISO-8859-1?Q?Olle_J=E4rnefors?=
To: ietf-822@dimacs.rutgers.edu, ojarnef@admin.kth.
Subject: Time for ISO 10646?

To: Dave Crocker stanford.edu
Cc: ietf-822@dimacs.rutgers.edu, paf@comsol.
From: =?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?=
Subject: Re: RFC-HDR care and








From: Nathaniel Borenstein (=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=)
To: Greg Vaudreuil, Ned,
 Keith Moore
Subject: Test of new header
MIME-Version: 1.0
Content-type: text/plain; charset=ISO-8859-1



References

[1] Borenstein N., and N. Freed, "MIME (Multipurpose Internet Mail
Extensions): Mechanisms for Specifying and Describing the Format
of Internet Message Bodies", RFC 1341, Bellcore, Innosoft,
June 1992.

[2] Crocker, D., "Standard for the Format of ARPA Internet Text
Messages", RFC 822, UDEL, August 1982.

Security Considerations

Security issues are not discussed in this memo.

Author's Address

Keith Moore
University of Tennessee
107 Ayres Hall
Knoxville TN 37996-1301

EMail: moore@cs.utk.



















