On the relevance of the word "Internet", consider the RFC below:

Network Working Group N. Borenstein, Bellcore
Request for Comments: 1341 N. Freed, Innosoft
June 1992



MIME (Multipurpose Internet Mail Extensions):


Mechanisms for Specifying and Describing
the Format of Internet Message Bodies


Status of this Memo

This RFC specifies an IAB standards track protocol for the
Internet community, and requests discussion and suggestions
for improvements. Please refer to the current edition of
the "IAB Official Protocol Standards" for the
standardization state and status of this protocol.
Distribution of this memo is unlimited.



Abstract

RFC 822 defines a message representation protocol which
specifies considerable detail about message headers, but
which leaves the message content, or message body, as flat
ASCII text. This document redefines the format of message
bodies to allow multi-part textual and non-textual message
bodies to be represented and exchanged without loss of
information. This is based on earlier work documented in
RFC 934 and RFC 1049, but extends and revises that work.
Because RFC 822 said so little about message bodies, this
document is largely orthogonal to (rather than a revision
of) RFC 822.

In particular, this document is designed to provide
facilities to include multiple objects in a single message,
to represent body text in character sets other than US-
ASCII, to represent formatted multi-font text messages, to
represent non-textual material such as images and audio
fragments, and generally to facilitate later extensions
defining new types of Internet mail for use by cooperating
mail agents.

This document does NOT extend Internet mail header fields to
permit anything other than US-ASCII text data. It is
recognized that such extensions are necessary, and they are
the subject of a companion document [RFC-1342].

A table of contents appears at the end of this document.






Borenstein & Freed [Page i]







1 Introduction

Since its publication in 1982, RFC 822 [RFC-822] has defined
the standard format of textual mail messages on the
Internet. Its success has been such that the RFC 822 format
has been adopted, wholly or partially, well beyond the
confines of the Internet and the Internet SMTP transport
defined by RFC 821 [RFC-821]. As the format has seen wider
use, a number of limitations have proven increasingly
restrictive for the user community.

RFC 822 was intended to specify a format for text messages.
As such, non-text messages, such as multimedia messages that
might include audio or images, are simply not mentioned.
Even in the case of text, however, RFC 822 is inadequate for
the needs of mail users whose languages require the use of
character sets richer than US ASCII [US-ASCII]. Since RFC
822 does not specify mechanisms for mail containing audio,
video, Asian language text, or even text in most European
languages, additional specifications are needed.

One of the notable limitations of RFC 821/822 based mail
systems is the fact that they limit the contents of
electronic mail messages to relatively short lines of
seven-bit ASCII. This forces users to convert any non-
textual data that they may wish to send into seven-bit bytes
representable as printable ASCII characters before invoking
a local mail UA (User Agent, a program with which human
users send and receive mail). Examples of such encodings
currently used in the Internet include pure hexadecimal,
uuencode, the 3-in-4 base 64 scheme specified in RFC 1113,
the Andrew Toolkit Representation [ATK], and many others.

The limitations of RFC 822 mail become even more apparent as
gateways are designed to allow for the exchange of mail
messages between RFC 822 hosts and X.400 hosts. X.400 [X400]
specifies mechanisms for the inclusion of non-textual body
parts within electronic mail messages. The current
standards for the mapping of X.400 messages to RFC 822
messages specify that either X.400 non-textual body parts
should be converted to (not encoded in) an ASCII format, or
that they should be discarded, notifying the RFC 822 user
that discarding has occurred. This is clearly undesirable,
as information that a user may wish to receive is lost.
Even though a user's UA may not have the capability of
dealing with the non-textual body part, the user might have
some mechanism external to the UA that can extract useful
information from the body part. Moreover, it does not allow
for the fact that the message may eventually be gatewayed
back into an X.400 message handling system (i.e., the X.400
message is "tunneled" through Internet mail), where the
non-textual information would definitely become useful
again.




RFC 1341    MIME: Multipurpose Internet Mail Extensions    June 1992


This document describes several mechanisms that combine to
solve most of these problems without introducing any serious
incompatibilities with the existing world of RFC 822 mail.
In particular, it describes:

1. A MIME-Version header field, which uses a version number
to declare a message to be conformant with this
specification and allows mail processing agents to
distinguish between such messages and those generated
by older or non-conformant software, which is presumed
to lack such a field.

2. A Content-Type header field, generalized from RFC 1049
[RFC-1049], which can be used to specify the type and
subtype of data in the body of a message and to fully
specify the native representation (encoding) of such
data.

2.a. A "text" Content-Type value, which can be used to
represent textual information in a number of
character sets and formatted text description
languages in a standardized manner.

2.b. A "multipart" Content-Type value, which can be
used to combine several body parts, possibly of
differing types of data, into a single message.

2.c. An "application" Content-Type value, which can be
used to transmit application data or binary data,
and hence, among other uses, to implement an
electronic mail file transfer service.

2.d. A "message" Content-Type value, for encapsulating
a mail message.

2.e. An "image" Content-Type value, for transmitting
still image (picture) data.

2.f. An "audio" Content-Type value, for transmitting
audio or voice data.

2.g. A "video" Content-Type value, for transmitting
video or moving image data, possibly with audio as
part of the composite video data format.

3. A Content-Transfer-Encoding header field, which can be
used to specify an auxiliary encoding that was applied
to the data in order to allow it to pass through mail
transport mechanisms which may have data or character
set limitations.

4. Two optional header fields that can be used to further
describe the data in a message body, the Content-ID and
Content-Description header fields.



MIME has been carefully designed as an extensible mechanism,
and it is expected that the set of content-type/subtype
pairs and their associated parameters will grow
significantly with time. Several other MIME fields, notably
including character set names, are likely to have new values
defined over time. In order to ensure that the set of such
values is developed in an orderly, well-specified, and
public manner, MIME defines a registration process which
uses the Internet Assigned Numbers Authority (IANA) as a
central registry for such values. Appendix F provides
details about how IANA registration is accomplished.

Finally, to specify and promote interoperability, Appendix A
of this document provides a basic applicability statement
for a subset of the above mechanisms that defines a minimal
level of "conformance" with this document.

HISTORICAL NOTE: Several of the mechanisms described in
this document may seem somewhat strange or even baroque at
first reading. It is important to note that compatibility
with existing standards AND robustness across existing
practice were two of the highest priorities of the working
group that developed this document. In particular,
compatibility was always favored over elegance.

2 Notations, Conventions, and Generic BNF Grammar

This document is being published in two versions, one as
plain ASCII text and one as PostScript. The latter is
recommended, though the textual contents are identical. An
Andrew-format copy of this document is also available from
the first author (Borenstein).

Although the mechanisms specified in this document are all
described in prose, most are also described formally in the
modified BNF notation of RFC 822. Implementors will need to
be familiar with this notation in order to understand this
specification, and are referred to RFC 822 for a complete
explanation of the modified BNF notation.

Some of the modified BNF in this document makes reference to
syntactic entities that are defined in RFC 822 and not in
this document. A complete formal grammar, then, is obtained
by combining the collected grammar appendix of this document
with that of RFC 822.

The term CRLF, in this document, refers to the sequence of
the two ASCII characters CR (13) and LF (10) which, taken
together, in this order, denote a line break in RFC 822
mail.

The term "character set", wherever it is used in this
document, refers to a coded character set, in the sense of
ISO character set standardization work, and must not be
misinterpreted as meaning "a set of characters."

The term "message", when not further qualified, means either
the (complete or "top-level") message being transferred on a
network, or a message encapsulated in a body of type
"message".

The term "body part", in this document, means one of the
parts of the body of a multipart entity. A body part has a
header and a body, so it makes sense to speak about the body
of a body part.

The term "entity", in this document, means either a message
or a body part. All kinds of entities share the property
that they have a header and a body.

The term "body", when not further qualified, means the body
of an entity, that is the body of either a message or of a
body part.

Note: the previous four definitions are clearly circular.
This is unavoidable, since the overall structure of a MIME
message is indeed recursive.

In this document, all numeric and octet values are given in
decimal notation.

It must be noted that Content-Type values, subtypes, and
parameter names as defined in this document are case-
insensitive. However, parameter values are case-sensitive
unless otherwise specified for the specific parameter.

FORMATTING NOTE: This document has been carefully formatted
for ease of reading. The PostScript version of this
document, in particular, places notes like this one, which
may be skipped by the reader, in a smaller, italicized
font, and indents them as well. In the text version, only the
indentation is preserved, so if you are reading the text
version of this you might consider using the PostScript
version instead. However, all such notes will be indented
and preceded by "NOTE:" or some similar introduction, even
in the text version.

The primary purpose of these non-essential notes is to
convey information about the rationale of this document, or
to place this document in the proper historical or
evolutionary context. Such information may be skipped by
those who are focused entirely on building a compliant
implementation, but may be of use to those who wish to
understand why this document is written as it is.

For ease of recognition, all BNF definitions have been
placed in a fixed-width font in the PostScript version of
this document.



3 The MIME-Version Header Field

Since RFC 822 was published in 1982, there has really been
only one format standard for Internet messages, and there
has been little perceived need to declare the format
standard in use. This document is an independent document
that complements RFC 822. Although the extensions in this
document have been defined in such a way as to be compatible
with RFC 822, there are still circumstances in which it
might be desirable for a mail-processing agent to know
whether a message was composed with the new standard in
mind.

Therefore, this document defines a new header field, "MIME-
Version", which is to be used to declare the version of the
Internet message body format standard in use.

Messages composed in accordance with this document MUST
include such a header field, with the following verbatim
text:

MIME-Version: 1.0

The presence of this header field is an assertion that the
message has been composed in compliance with this document.

Since it is possible that a future document might extend the
message format standard again, a formal BNF is given for the
content of the MIME-Version field:

MIME-Version := text

Thus, future format specifiers, which might replace or
extend "1.0", are (minimally) constrained by the definition
of "text", which appears in RFC 822.

Note that the MIME-Version header field is required at the
top level of a message. It is not required for each body
part of a multipart entity. It is required for the embedded
headers of a body of type "message" if and only if the
embedded message is itself claimed to be MIME-compliant.
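
As an illustration of the top-level check described above, here is a
minimal sketch in Python. It assumes the standard library "email"
package for header parsing; the function name declares_mime_1_0 and the
sample messages are hypothetical, and the strict comparison against
"1.0" is a simplification rather than a complete treatment of the field.

```python
from email import message_from_string


def declares_mime_1_0(raw_message: str) -> bool:
    """Return True if the top-level header carries 'MIME-Version: 1.0'.

    Hypothetical helper: only the outermost header is inspected, since
    the field is not required on each body part of a multipart entity.
    """
    msg = message_from_string(raw_message)
    # Header field names are matched case-insensitively by the email package.
    value = msg.get("MIME-Version")
    return value is not None and value.strip() == "1.0"


# Illustrative messages (addresses are placeholders).
sample = "From: a@example.com\nMIME-Version: 1.0\n\nHello\n"
legacy = "From: a@example.com\n\nHello\n"

print(declares_mime_1_0(sample))
print(declares_mime_1_0(legacy))
```

A message without the field is not rejected here; it is simply reported
as not asserting conformance, which matches how older software is
expected to behave.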

4 The Content-Type Header Field

The purpose of the Content-Type field is to describe the
data contained in the body fully enough that the receiving
user agent can pick an appropriate agent or mechanism to
present the data to the user, or otherwise deal with the
data in an appropriate manner.

HISTORICAL NOTE: The Content-Type header field was first
defined in RFC 1049. RFC 1049 Content-types used a simpler
and less powerful syntax, but one that is largely compatible
with the mechanism given here.

The Content-Type header field is used to specify the nature
of the data in the body of an entity, by giving type and
subtype identifiers, and by providing auxiliary information
that may be required for certain types. After the type and
subtype names, the remainder of the header field is simply a
set of parameters, specified in an attribute/value notation.
The set of meaningful parameters differs for the different
types. The ordering of parameters is not significant.
Among the defined parameters is a "charset" parameter by
which the character set used in the body may be declared.
Comments are allowed in accordance with RFC 822 rules for
structured header fields.

In general, the top-level Content-Type is used to declare
the general type of data, while the subtype specifies a
specific format for that type of data. Thus, a Content-Type
of "image/xyz" is enough to tell a user agent that the data
is an image, even if the user agent has no knowledge of the
specific image format "xyz". Such information can be used,
for example, to decide whether or not to show a user the raw
data from an unrecognized subtype -- such an action might be
reasonable for unrecognized subtypes of text, but not for
unrecognized subtypes of image or audio. For this reason,
registered subtypes of audio, image, text, and video should
not contain embedded information that is really of a
different type. Such compound types should be represented
using the "multipart" or "application" types.

Parameters are modifiers of the content-subtype, and do not
fundamentally affect the requirements of the host system.
Although most parameters make sense only with certain
content-types, others are "global" in the sense that they
might apply to any subtype. For example, the "boundary"
parameter makes sense only for the "multipart" content-type,
but the "charset" parameter might make sense with several
content-types.

An initial set of seven Content-Types is defined by this
document. This set of top-level names is intended to be
substantially complete. It is expected that additions to
the larger set of supported types can generally be
accomplished by the creation of new subtypes of these
initial types. In the future, more top-level types may be
defined only by an extension to this standard. If another
primary type is to be used for any reason, it must be given
a name starting with "X-" to indicate its non-standard
status and to avoid a potential conflict with a future
official name.

In the Extended BNF notation of RFC 822, a Content-Type
header field value is defined as follows:

Content-Type := type "/" subtype *[";" parameter]

type := "application" / "audio"
      / "image" / "message"
      / "multipart" / "text"
      / "video" / x-token

x-token := <The two characters "X-" followed, with no
           intervening white space, by any token>

subtype := token

parameter := attribute "=" value

attribute := token

value := token / quoted-string

token := 1*<any CHAR except SPACE, CTLs, or tspecials>

tspecials := "(" / ")" / "<" / ">" / "@" ; Must be in
           / "," / ";" / ":" / "\" / <"> ; quoted-string,
           / "/" / "[" / "]" / "?" / "." ; to use within
           / "=" ; parameter values

Note that the definition of "tspecials" is the same as the
RFC 822 definition of "specials" with the addition of the
three characters "/", "?", and "=".

Note also that a subtype specification is MANDATORY. There
are no default subtypes.

The type, subtype, and parameter names are not case
sensitive. For example, TEXT, Text, and TeXt are all
equivalent. Parameter values are normally case sensitive,
but certain parameters are interpreted to be case-
insensitive, depending on the intended use. (For example,
multipart boundaries are case-sensitive, but the "access-
type" for message/External-body is not case-sensitive.)
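
The grammar and case rules above can be sketched as a small parser.
This is a minimal illustration in Python under stated assumptions: the
names TOKEN, QUOTED and parse_content_type are hypothetical, the token
character class is derived from the "tspecials" list by hand, and RFC
822 comments in parentheses are deliberately not handled.

```python
import re

# token: printable US-ASCII (33..126) minus the tspecials listed in the grammar.
TOKEN = r"[!#$%&'*+\-0-9A-Za-z^_`{|}~]+"
# quoted-string, simplified: anything but '"', '\' and CR, or a backslash pair.
QUOTED = r'"(?:[^"\\\r]|\\.)*"'

TYPE_RE = re.compile(rf"\s*({TOKEN})/({TOKEN})")
PARAM_RE = re.compile(rf"\s*;\s*({TOKEN})\s*=\s*({TOKEN}|{QUOTED})")


def parse_content_type(value: str):
    """Split a Content-Type field value into (type, subtype, parameters)."""
    match = TYPE_RE.match(value)
    if match is None:
        # The subtype is mandatory, so a bare "text" is rejected.
        raise ValueError("expected type/subtype")
    # Type and subtype are case-insensitive: normalise to lowercase.
    maintype, subtype = match.group(1).lower(), match.group(2).lower()
    params = {}
    pos = match.end()
    while pos < len(value):
        param = PARAM_RE.match(value, pos)
        if param is None:
            if value[pos:].strip() == "":
                break  # only trailing white space remains
            raise ValueError("malformed parameter near %r" % value[pos:])
        name, raw = param.group(1).lower(), param.group(2)
        if raw.startswith('"'):
            # Strip the quotes and undo backslash escapes.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        # Parameter names are lowercased; values keep their case.
        params[name] = raw
        pos = param.end()
    return maintype, subtype, params


print(parse_content_type('TEXT/Plain; CharSet="ISO-8859-1"'))
```

Parameter values are returned unchanged because their case sensitivity
depends on the individual parameter; a caller that knows a value is
case-insensitive can lowercase it afterwards.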

Beyond this syntax, the only constraint on the definition of
subtype names is the desire that their uses must not
conflict. That is, it would be undesirable to have two
different communities using "Content-Type:
application/foobar" to mean two different things. The
process of defining new content-subtypes, then, is not
intended to be a mechanism for imposing restrictions, but
simply a mechanism for publicizing the usages. There are,
therefore, two acceptable mechanisms for defining new
Content-Type subtypes:

1. Private values (starting with "X-") may be
defined bilaterally between two cooperating
agents without outside registration or
standardization.

2. New standard values must be documented,
registered with, and approved by IANA, as
described in Appendix F. Where intended for
public use, the formats they refer to must
also be defined by a published specification,
and possibly offered for standardization.

The seven standard initial predefined Content-Types are
detailed in the bulk of this document. They are:

text -- textual information. The primary subtype,
"plain", indicates plain (unformatted) text. No
special software is required to get the full
meaning of the text, aside from support for the
indicated character set. Subtypes are to be used
for enriched text in forms where application
software may enhance the appearance of the text,
but such software must not be required in order to
get the general idea of the content. Possible
subtypes thus include any readable word processor
format. A very simple and portable subtype,
richtext, is defined in this document.
multipart -- data consisting of multiple parts of
independent data types. Four initial subtypes
are defined, including the primary "mixed"
subtype, "alternative" for representing the same
data in multiple formats, "parallel" for parts
intended to be viewed simultaneously, and "digest"
for multipart entities in which each part is of
type "message".
message -- an encapsulated message. A body of
Content-Type "message" is itself a fully formatted
RFC 822 conformant message which may contain its
own different Content-Type header field. The
primary subtype is "rfc822". The "partial"
subtype is defined for partial messages, to permit
the fragmented transmission of bodies that are
thought to be too large to be passed through mail
transport facilities. Another subtype,
"External-body", is defined for specifying large
bodies by reference to an external data source.
image -- image data. Image requires a display device
(such as a graphical display, a printer, or a FAX
machine) to view the information. Initial
subtypes are defined for two widely-used image
formats, jpeg and gif.
audio -- audio data, with initial subtype "basic".
Audio requires an audio output device (such as a
speaker or a telephone) to "display" the contents.
video -- video data. Video requires the capability to
display moving images, typically including
specialized hardware and software. The initial
subtype is "mpeg".
application -- some other kind of data, typically
either uninterpreted binary data or information to
be processed by a mail-based application. The
primary subtype, "octet-stream", is to be used in
the case of uninterpreted binary data, in which
case the simplest recommended action is to offer
to write the information into a file for the user.
Two additional subtypes, "ODA" and "PostScript",
are defined for transporting ODA and PostScript
documents in bodies. Other expected uses for
"application" include spreadsheets, data for
mail-based scheduling systems, and languages for
"active" (computational) email. (Note that active
email entails several security considerations,
which are discussed later in this memo,
particularly in the context of
application/PostScript.)

Default RFC 822 messages are typed by this protocol as plain
text in the US-ASCII character set, which can be explicitly
specified as "Content-type: text/plain; charset=us-ascii".
If no Content-Type is specified, either by error or by an
older user agent, this default is assumed. In the presence
of a MIME-Version header field, a receiving User Agent can
also assume that plain US-ASCII text was the sender's
intent. In the absence of a MIME-Version specification,
plain US-ASCII text must still be assumed, but the sender's
intent might have been otherwise.

RATIONALE: In the absence of any Content-Type header field
or MIME-Version header field, it is impossible to be certain
that a message is actually text in the US-ASCII character
set, since it might well be a message that, using the
conventions that predate this document, includes text in
another character set or non-textual data in a manner that
cannot be automatically recognized (e.g., a uuencoded
compressed UNIX tar file). Although there is no fully
acceptable alternative to treating such untyped messages as
"text/plain; charset=us-ascii", implementors should remain
aware that if a message lacks both the MIME-Version and the
Content-Type header fields, it may in practice contain
almost anything.



It should be noted that the list of Content-Type values
given here may be augmented in time, via the mechanisms
described above, and that the set of subtypes is expected to
grow substantially.

When a mail reader encounters mail with an unknown Content-
type value, it should generally treat it as equivalent to
"application/octet-stream", as described later in this
document.
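
The two fallback rules in this section (a missing Content-Type means
plain US-ASCII text, an unknown one is treated as uninterpreted binary
data) might be combined as follows. This is a sketch only: the function
effective_content_type and the "supported" set are hypothetical names,
and what counts as "unknown" is assumed to be whatever the local mail
reader does not list.

```python
DEFAULT_TYPE = "text/plain; charset=us-ascii"
FALLBACK_TYPE = "application/octet-stream"


def effective_content_type(header_value, supported):
    """Pick the Content-Type a reader should act on.

    header_value: raw field value, or None when the field is absent.
    supported: lowercase "type/subtype" strings this reader understands
               (an assumption made for the example).
    """
    if header_value is None or not header_value.strip():
        # No Content-Type at all: assume plain US-ASCII text.
        return DEFAULT_TYPE
    # Drop parameters; type and subtype compare case-insensitively.
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type in supported:
        return header_value.strip()
    # Unknown value: treat as uninterpreted binary data.
    return FALLBACK_TYPE


reader_supports = {"text/plain", "image/gif"}
print(effective_content_type(None, reader_supports))
print(effective_content_type("image/xyz", reader_supports))
```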

5 The Content-Transfer-Encoding Header Field

Many Content-Types which could usefully be transported via
email are represented, in their "natural" format, as 8-bit
character or binary data. Such data cannot be transmitted
over some transport protocols. For example, RFC 821
restricts mail messages to 7-bit US-ASCII data with 1000
character lines.

It is necessary, therefore, to define a standard mechanism
for re-encoding such data into a 7-bit short-line format.
This document specifies that such encodings will be
indicated by a new "Content-Transfer-Encoding" header field.
The Content-Transfer-Encoding field is used to indicate the
type of transformation that has been used in order to
represent the body in an acceptable manner for transport.

Unlike Content-Types, a proliferation of Content-Transfer-
Encoding values is undesirable and unnecessary. However,
establishing only a single Content-Transfer-Encoding
mechanism does not seem possible. There is a tradeoff
between the desire for a compact and efficient encoding of
largely-binary data and the desire for a readable encoding
of data that is mostly, but not entirely, 7-bit data. For
this reason, at least two encoding mechanisms are necessary:
a "readable" encoding and a "dense" encoding.

The Content-Transfer-Encoding field is designed to specify
an invertible mapping between the "native" representation of
a type of data and a representation that can be readily
exchanged using 7 bit mail transport protocols, such as
those defined by RFC 821 (SMTP). This field has not been
defined by any previous standard. The field's value is a
single token specifying the type of encoding, as enumerated
below. Formally:

Content-Transfer-Encoding := "BASE64" / "QUOTED-PRINTABLE" /
                             "8BIT" / "7BIT" /
                             "BINARY" / x-token

These values are not case sensitive. That is, Base64 and
BASE64 and bAsE64 are all equivalent. An encoding type of
7BIT requires that the body is already in a seven-bit mail-
ready representation. This is the default value -- that is,
"Content-Transfer-Encoding: 7BIT" is assumed if the
Content-Transfer-Encoding header field is not present.

The values "8bit", "7bit", and "binary" all imply that NO
encoding has been performed. However, they are potentially
useful as indications of the kind of data contained in the
object, and therefore of the kind of encoding that might
need to be performed for transmission in a given transport
system. "7bit" means that the data is all represented as
short lines of US-ASCII data. "8bit" means that the lines
are short, but there may be non-ASCII characters (octets
with the high-order bit set). "Binary" means that not only
may non-ASCII characters be present, but also that the lines
are not necessarily short enough for SMTP transport.
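
To make the three unencoded labels concrete, the sketch below picks one
for a given body. It is an illustration under assumptions, not a rule
taken from this document: the function name classify_body is
hypothetical, the 1000-character limit is counted in octets including
the CRLF, and a bare CR or LF is treated as a reason to fall back to
"binary".

```python
MAX_LINE = 1000  # assumed limit, in octets, including the trailing CRLF


def classify_body(data: bytes) -> str:
    """Return "7bit", "8bit" or "binary" for an unencoded body."""
    lines = data.split(b"\r\n")
    short_lines = all(len(line) + 2 <= MAX_LINE for line in lines)
    # A CR or LF that is not part of a CRLF pair breaks line semantics.
    bare_breaks = any(b"\r" in line or b"\n" in line for line in lines)
    if not short_lines or bare_breaks:
        return "binary"
    if any(octet > 127 for octet in data):
        # Short lines, but some octets have the high-order bit set.
        return "8bit"
    return "7bit"


print(classify_body(b"hello\r\nworld\r\n"))
print(classify_body("caf\u00e9\r\n".encode("latin-1")))
print(classify_body(b"x" * 5000))
```

The returned label describes the data only; whether a given transport
may carry "8bit" or "binary" data is a separate question.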

The difference between "8bit" (or any other conceivable
bit-width token) and the "binary" token is that "binary"
does not require adherence to any limits on line length or
to the SMTP CRLF semantics, while the bit-width tokens do
require such adherence. If the body contains data in any
bit-width other than 7-bit, the appropriate bit-width
Content-Transfer-Encoding token must be used (e.g., "8bit"
for unencoded 8 bit wide data). If the body contains binary
data, the "binary" Content-Transfer-Encoding token must be
used.

NOTE: The distinction between the Content-Transfer-Encoding
values of "binary," "8bit," etc. may seem unimportant, in
that all of them really mean "none" -- that is, there has
been no encoding of the data for transport. However, clear
labeling will be of enormous value to gateways between
future mail transport systems with differing capabilities in
transporting data that do not meet the restrictions of RFC
821 transport.

As of the publication of this document, there are no
standardized Internet transports for which it is legitimate
to include unencoded 8-bit or binary data in mail bodies.
Thus there are no circumstances in which the "8bit" or
"binary" Content-Transfer-Encoding is actually legal on the
Internet. However, in the event that 8-bit or binary mail
transport becomes a reality in Internet mail, or when this
document is used in conjunction with any other 8-bit or
binary-capable transport mechanism, 8-bit or binary bodies
should be labeled as such using this mechanism.

NOTE: The five values defined for the Content-Transfer-
Encoding field imply nothing about the Content-Type other
than the algorithm by which it was encoded or the transport
system requirements if unencoded.

Implementors may, if necessary, define new Content-
Transfer-Encoding values, but must use an x-token, which is
a name prefixed by "X-" to indicate its non-standard status,
e.g., "Content-Transfer-Encoding: x-my-new-encoding".
However, unlike Content-Types and subtypes, the creation of
new Content-Transfer-Encoding values is explicitly and
strongly discouraged, as it seems likely to hinder
interoperability with little potential benefit. Their use
is allowed only as the result of an agreement between
cooperating user agents.

If a Content-Transfer-Encoding header field appears as part
of a message header, it applies to the entire body of that
message. If a Content-Transfer-Encoding header field
appears as part of a body part's headers, it applies only to
the body of that body part. If an entity is of type
"multipart" or "message", the Content-Transfer-Encoding is
not permitted to have any value other than a bit width
(e.g., "7bit", "8bit", etc.) or "binary".
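
The constraints on this field (five case-insensitive standard values,
x-tokens for private use, a default of 7BIT, and no real encoding on
"multipart" or "message" entities) can be checked mechanically. The
following is a minimal sketch; check_transfer_encoding and the two
constant sets are hypothetical names chosen for the example.

```python
IDENTITY = {"7bit", "8bit", "binary"}  # values that imply no encoding
STANDARD = IDENTITY | {"base64", "quoted-printable"}


def check_transfer_encoding(content_type: str, encoding: str = "7bit") -> str:
    """Validate a Content-Transfer-Encoding value for a given Content-Type.

    Returns the normalised (lowercase) encoding, or raises ValueError.
    The default argument mirrors the rule that 7BIT is assumed when the
    header field is absent.
    """
    enc = encoding.strip().lower()  # values are not case sensitive
    is_x_token = enc.startswith("x-") and len(enc) > 2
    if enc not in STANDARD and not is_x_token:
        raise ValueError("unknown Content-Transfer-Encoding: %r" % encoding)
    maintype = content_type.split("/", 1)[0].strip().lower()
    if maintype in {"multipart", "message"} and enc not in IDENTITY:
        # Composite entities may only carry 7bit, 8bit or binary.
        raise ValueError("%s entities must not be encoded with %s" % (maintype, enc))
    return enc


print(check_transfer_encoding("text/plain", "BASE64"))
print(check_transfer_encoding("message/rfc822"))
```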

It should be noted that email is character-oriented, so that
the mechanisms described here are mechanisms for encoding
arbitrary byte streams, not bit streams. If a bit stream is
to be encoded via one of these mechanisms, it must first be
converted to an 8-bit byte stream using the network standard
bit order ("big-endian"), in which the earlier bits in a
stream become the higher-order bits in a byte. A bit stream
not ending at an 8-bit boundary must be padded with zeroes.
This document provides a mechanism for noting the addition
of such padding in the case of the application Content-Type,
which has a "padding" parameter.
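
The bit-to-byte conversion described above can be sketched in a few
lines of Python. The function name pack_bits is hypothetical; the
returned padding count is the kind of value that the "padding"
parameter would record, though how it is transmitted is outside this
example.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, earliest bit first.

    Returns (data, padding) where padding is the number of zero bits
    appended to reach an 8-bit boundary.
    """
    bits = list(bits)
    padding = (-len(bits)) % 8
    bits += [0] * padding  # pad with zeroes at the end of the stream
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            # Earlier bits end up in the higher-order positions.
            byte = (byte << 1) | (bit & 1)
        out.append(byte)
    return bytes(out), padding


data, padding = pack_bits([1, 0, 1])
print(data.hex(), padding)
```

For the three-bit stream 1, 0, 1 the single resulting byte is binary
10100000, with five padding bits.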

The encoding mechanisms defined here explicitly encode all
data in ASCII. Thus, for example, suppose an entity has
header fields such as:

Content-Type: text/plain; charset=ISO-8859-1
Content-transfer-encoding: base64

This should be interpreted to mean that the body is a base64
ASCII encoding of data that was originally in ISO-8859-1,
and will be in that character set again after decoding.
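
A short sketch of that interpretation, using Python's standard base64
module: the transfer encoding is undone first, and only then is the
declared character set applied. The function name decode_text_body and
the sample text are illustrative assumptions.

```python
import base64


def decode_text_body(body_ascii: str, charset: str) -> str:
    """Undo the transfer encoding, then interpret the octets in charset."""
    raw_octets = base64.b64decode(body_ascii)  # ASCII text -> original octets
    return raw_octets.decode(charset)          # octets -> characters


# Round trip: ISO-8859-1 text is carried as pure ASCII on the wire.
original = "Gr\u00fc\u00dfe"
wire = base64.b64encode(original.encode("iso-8859-1")).decode("ascii")
print(wire)
print(decode_text_body(wire, "iso-8859-1") == original)
```

The order matters: applying the character set before base64 decoding
would operate on the ASCII encoding rather than on the original data.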

The following sections will define the two standard encoding
mechanisms. The definition of new content-transfer-
encodings is explicitly discouraged and should only occur
when absolutely necessary. All content-transfer-encoding
namespace except that beginning with "X-" is explicitly
reserved to the IANA for future use. Private agreements
about content-transfer-encodings are also explicitly
discouraged.

Certain Content-Transfer-Encoding values may only be used on
certain Content-Types. In particular, it is expressly
forbidden to use any encodings other than "7bit", "8bit", or
"binary" with any Content-Type that recursively includes
other Content-Type fields, notably the "multipart" and
"message" Content-Types. All encodings that are desired for
bodies of type multipart or message must be done at the
innermost level, by encoding the actual body that needs to
be encoded.

NOTE ON ENCODING RESTRICTIONS: Though the prohibition
against using content-transfer-encodings on data of type
multipart or message may seem overly restrictive, it is
necessary to prevent nested encodings, in which data are
passed through an encoding algorithm multiple times, and
must be decoded multiple times in order to be properly
viewed. Nested encodings add considerable complexity to
user agents: aside from the obvious efficiency problems
with such multiple encodings, they can obscure the basic
structure of a message. In particular, they can imply that
several decoding operations are necessary simply to find out
what types of objects a message contains. Banning nested
encodings may complicate the job of certain mail gateways,
but this seems less of a problem than the effect of nested
encodings on user agents.

NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT-
TRANSFER-ENCODING: It may seem that the Content-Transfer-
Encoding could be inferred from the characteristics of the
Content-Type that is to be encoded, or, at the very least,
that certain Content-Transfer-Encodings could be mandated
for use with specific Content-Types. There are several
reasons why this is not the case. First, given the varying
types of transports used for mail, some encodings may be
appropriate for some Content-Type/transport combinations and
not for others. (For example, in an 8-bit transport, no
encoding would be required for text in certain character
sets, while such encodings are clearly required for 7-bit
SMTP.) Second, certain Content-Types may require different
types of transfer encoding under different circumstances.
For example, many PostScript bodies might consist entirely
of short lines of 7-bit data and hence require little or no
encoding. Other PostScript bodies (especially those using
Level 2 PostScript's binary encoding mechanism) may only be
reasonably represented using a binary transport encoding.
Finally, since Content-Type is intended to be an open-ended
specification mechanism, strict specification of an
association between Content-Types and encodings effectively
couples the specification of an application protocol with a
specific lower-level transport. This is not desirable since
the developers of a Content-Type should not have to be aware
of all the transports in use and what their limitations are.

NOTE ON TRANSLATING ENCODINGS: The quoted-printable and
base64 encodings are designed so that conversion between
them is possible. The only issue that arises in such a
conversion is the handling of line breaks. When converting
from quoted-printable to base64 a line break must be
converted into a CRLF sequence. Similarly, a CRLF sequence
in base64 data should be converted to a quoted-printable
line break, but ONLY when converting text data.

NOTE ON CANONICAL ENCODING MODEL: There was some
confusion, in earlier drafts of this memo, regarding the
model for when email data was to be converted to canonical
form and encoded, and in particular how this process would
affect the treatment of CRLFs, given that the representation
of newlines varies greatly from system to system. For this
reason, a canonical model for encoding is presented as
Appendix H.

5.1 Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data
that largely consists of octets that correspond to printable
characters in the ASCII character set. It encodes the data
in such a way that the resulting octets are unlikely to be
modified by mail transport. If the data being encoded are
mostly ASCII text, the encoded form of the data remains
largely recognizable by humans. A body which is entirely
ASCII may also be encoded in Quoted-Printable to ensure the
integrity of the data should the message pass through a
character-translating, and/or line-wrapping gateway.

In this encoding, octets are to be represented as
by the following rules

Rule #1: (General 8-bit representation) Any octet,
except those indicating a line break according to the
newline convention of the canonical form of the data
being encoded, may be represented by an "=" followed by
a two digit hexadecimal representation of the octet's
value. The digits of the hexadecimal alphabet, for this
purpose, are "0123456789ABCDEF". Uppercase letters must
be used when sending hexadecimal data, though a robust
implementation may choose to recognize lowercase
letters on receipt. Thus, for example, the value 12
(ASCII form feed) can be represented by "=0C", and the
value 61 (ASCII EQUAL SIGN) can be represented by
"=3D". Except when the following rules allow an
alternative encoding, this rule is mandatory.

Rule #2: (Literal representation) Octets with decimal
values of 33 through 60 inclusive, and 62 through 126,
inclusive, MAY be represented as the ASCII characters
which correspond to those octets (EXCLAMATION POINT
through LESS THAN, and GREATER THAN through TILDE,
respectively).

Rule #3: (White Space): Octets with values of 9 and 32
MAY be represented as ASCII TAB (HT) and SPACE
characters, respectively, but MUST NOT be so
represented at the end of an encoded line. Any TAB (HT)
or SPACE characters on an encoded line MUST thus be
followed on that line by a printable character. In
particular, an "=" at the end of an encoded line,
indicating a soft line break (see rule #5) may follow
one or more TAB (HT) or SPACE characters. It follows
that an octet with value 9 or 32 appearing at the end
of an encoded line must be represented according to
Rule #1. This rule is necessary because some MTAs
(Message Transport Agents, programs which transport
messages from one user to another, or perform a part of
such transfers) are known to pad lines of text with
SPACEs, and others are known to remove "white space"
characters from the end of a line. Therefore, when
decoding a Quoted-Printable body, any trailing white
space on a line must be deleted, as it will necessarily
have been added by intermediate transport agents.

Rule #4 (Line Breaks): A line break in a text body
part, independent of what its representation is
following the canonical representation of the data
being encoded, must be represented by a (RFC 822) line
break, which is a CRLF sequence, in the Quoted-
Printable encoding. If isolated CRs and LFs, or LF CR
and CR LF sequences are allowed to appear in binary
data according to the canonical form, they must be
represented using the "=0D", "=0A", "=0A=0D" and
"=0D=0A" notations respectively.

Note that many implementations may elect to encode the
local representation of various content types directly.
In particular, this may apply to plain text material on
systems that use newline conventions other than CRLF
delimiters. Such an implementation is permissible, but
the generation of line breaks must be generalized to
account for the case where alternate representations of
newline sequences are used.

Rule #5 (Soft Line Breaks): The Quoted-Printable
encoding REQUIRES that encoded lines be no more than 76
characters long. If longer lines are to be encoded with
the Quoted-Printable encoding, 'soft' line breaks must
be used. An equal sign as the last character on an
encoded line indicates such a non-significant ('soft')
line break in the encoded text. Thus if the "raw" form
of the line is a single unencoded line that says:

Now's the time for all folk to come to the aid of their country.

This can be represented, in the Quoted-Printable
encoding, as

Now's the time =
for all folk to come=
 to the aid of their country.

This provides a mechanism with which long lines are
encoded in such a way as to be restored by the user
agent. The 76 character limit does not count the
trailing CRLF, but counts all other characters,
including any equal signs.
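
The five rules above can be sketched in Python as follows.
This is a minimal illustration for text that is already in
canonical form with CRLF line breaks, not a reference
implementation; the function names qp_encode_line and
qp_encode_text are assumptions of this example. The sketch
breaks lines conservatively: it always keeps one column free
for the "=" of a soft line break, so it may break a line
slightly earlier than the 76 character limit requires.

```python
def qp_encode_line(line: bytes, limit: int = 76) -> list:
    """Encode one canonical line, given without its CRLF, into a list
    of encoded lines of at most `limit` characters each."""
    tokens = []
    for position, octet in enumerate(line):
        at_end = position == len(line) - 1
        if 33 <= octet <= 60 or 62 <= octet <= 126:
            tokens.append(chr(octet))        # Rule #2: literal
        elif octet in (9, 32) and not at_end:
            tokens.append(chr(octet))        # Rule #3: TAB/SPACE inside a line
        else:
            tokens.append("=%02X" % octet)   # Rule #1: "=" plus uppercase hex
    encoded, current = [], ""
    for token in tokens:
        # Rule #5: keep one column free for the "=" of a soft line break.
        if len(current) + len(token) > limit - 1:
            encoded.append(current + "=")
            current = ""
        current += token
    encoded.append(current)
    return encoded


def qp_encode_text(text: bytes) -> str:
    """Encode canonical text; Rule #4: line breaks stay as CRLF."""
    return "\r\n".join(
        "\r\n".join(qp_encode_line(line)) for line in text.split(b"\r\n")
    )
```

For instance, qp_encode_line(b"a=b ") yields the single
encoded line "a=3Db=20": the equal sign falls under Rule #1
and the trailing SPACE may not be sent literally.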

Since the hyphen character ("-") is represented as itself in
the Quoted-Printable encoding, care must be taken, when
encapsulating a quoted-printable encoded body in a multipart
entity, to ensure that the encapsulation boundary does not
appear anywhere in the encoded body. (A good strategy is to
choose a boundary that includes a character sequence such as
"=_" which can never appear in a quoted-printable body. See
the definition of multipart messages later in this
document.)
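
The boundary strategy suggested above might be sketched in
Python as follows. The function names and the use of the
"secrets" module are assumptions of this example. The check
assumes, following the definition of multipart messages
later in this document, that an encapsulation boundary
appears in a body as the boundary value preceded by two
hyphens.

```python
import secrets


def make_boundary() -> str:
    """Return a boundary value containing the sequence "=_"."""
    # In a quoted-printable body "=" is always followed by two
    # hexadecimal digits or by a line break, never by "_".
    return "=_" + secrets.token_hex(12)


def boundary_is_safe(boundary: str, encoded_body: str) -> bool:
    """True if the encapsulation boundary does not occur in the body."""
    return ("--" + boundary) not in encoded_body
```

A generator that cannot guarantee the "=_" property for some
body (for example, one that is not quoted-printable) could
fall back on boundary_is_safe() and pick a new value until
the check succeeds.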

NOTE: The quoted-printable encoding represents something of
a compromise between readability and reliability in
transport. Bodies encoded with the quoted-printable
encoding will work reliably over most mail gateways, but may
not work perfectly over a few gateways, notably those
involving translation into EBCDIC. (In theory, an EBCDIC
gateway could decode a quoted-printable body and re-encode
it using base64, but such gateways do not yet exist.) A
higher level of confidence is offered by the base64
Content-Transfer-Encoding. A way to get reasonably reliable
transport through EBCDIC gateways is to also quote the ASCII
characters

!"#$@[\]^`{|}~

according to rule #1. See Appendix B for more information.
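
A sketch of this more defensive quoting, in Python, is given
below. The names EBCDIC_UNSAFE and encode_octet and the
ebcdic_safe flag are assumptions of this example. For
brevity the sketch encodes TAB and SPACE with Rule #1 as
well, which is always permitted, rather than applying
Rule #3.

```python
# The characters listed above, which some EBCDIC gateways may not
# translate reliably.
EBCDIC_UNSAFE = b'!"#$@[\\]^`{|}~'


def encode_octet(octet: int, ebcdic_safe: bool = False) -> str:
    """Return the quoted-printable form of a single octet."""
    literal = 33 <= octet <= 60 or 62 <= octet <= 126   # Rule #2 range
    if literal and not (ebcdic_safe and octet in EBCDIC_UNSAFE):
        return chr(octet)
    return "=%02X" % octet                              # Rule #1
```

With ebcdic_safe set, encode_octet(ord("!"), True) returns
"=21", while letters and digits are still sent literally.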

Because quoted-printable data is generally assumed to be
line-oriented, it is to be expected that the breaks between
the lines of quoted-printable data may be altered in
transport, in the same manner that plain text mail has
always been altered in Internet mail when passing between
systems with differing newline conventions. If such
alterations are likely to constitute a corruption of the
data, it is probably more sensible to use the base64
encoding rather than the quoted-printable encoding.
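
The receiving side of the quoted-printable rules can be
sketched in Python as shown here. The function name
qp_decode is an assumption of this example, and the sketch
assumes that the encoded body arrives with CRLF line breaks.
Following Rule #1, it accepts lowercase hexadecimal digits
on receipt.

```python
import re

_HEX_ESCAPE = re.compile(rb"=([0-9A-Fa-f]{2})")


def qp_decode(encoded: bytes) -> bytes:
    """Decode a quoted-printable body whose lines end in CRLF."""
    out = []
    lines = encoded.split(b"\r\n")
    for index, line in enumerate(lines):
        line = line.rstrip(b" \t")       # Rule #3: drop trailing white space
        soft = line.endswith(b"=")       # Rule #5: soft line break
        if soft:
            line = line[:-1]
        # Rule #1: replace each "=XX" by the octet it stands for.
        out.append(_HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), line))
        if not soft and index < len(lines) - 1:
            out.append(b"\r\n")          # Rule #4: hard line break
    return b"".join(out)
```

Applied to the three encoded lines of the example under
Rule #5, this returns the original single line.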

5.2 Base64 Content-Transfer-Encoding

The Base64 Content-Transfer-Encoding is designed to
represent arbitrary sequences of octets in a form that is
not humanly readable. The encoding and decoding algorithms
are simple, but the encoded data are consistently only about
33 percent larger than the unencoded data. This encoding is
based on the one used in Privacy Enhanced Mail applications,
as defined in RFC 1113. The base64 encoding is adapted
from RFC 1113, with one change: base64 eliminates the "*"
mechanism for embedded clear text.

A 65-character subset of US-ASCII is used, enabling 6 bits
to be represented per printable character. (The extra 65th
character, "=", is used to signify a special processing
function.)

NOTE: This subset has the important property that it is
represented identically in all versions of ISO 646,
including US ASCII, and all characters in the subset are
also represented identically in all versions of EBCDIC.
Other popular encodings, such as the encoding used by the
UUENCODE utility and the base85 encoding specified as part
of Level 2 PostScript, do not share these properties, and
thus do not fulfill the portability requirements a binary
transport encoding for mail must meet.

The encoding process represents 24-bit groups of input bits
as output strings of 4 encoded characters. Proceeding from
left to right, a 24-bit input group is formed by
concatenating 3 8-bit input groups. These 24 bits are then
treated as 4 concatenated 6-bit groups, each of which is
translated into a single digit in the base64 alphabet. When
encoding a bit stream via the base64 encoding, the bit
stream must be presumed to be ordered with the most-
significant-bit first. That is, the first bit in the stream
will be the high-order bit in the first byte, and the eighth
bit will be the low-order bit in the first byte, and so on.
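
The grouping just described can be illustrated with a short
Python sketch. The names ALPHABET and encode_group are
assumptions of this example; the alphabet string is the one
given in Table 1.

```python
import string

# The 64 characters of Table 1, indexed by 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(three_octets: bytes) -> str:
    """Encode exactly 3 octets (24 bits) as 4 base64 characters."""
    if len(three_octets) != 3:
        raise ValueError("a full input group is 3 octets")
    first, second, third = three_octets
    # Concatenate the octets most-significant-bit first.
    group = (first << 16) | (second << 8) | third
    # Read the 24 bits as four 6-bit indices, left to right.
    indices = [(group >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)
```

For example, the three octets of the ASCII string "Man"
(77, 97, 110) form the 6-bit values 19, 22, 5 and 46, which
Table 1 maps to "TWFu".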

Each 6-bit group is used as an index into an array of 64
printable characters. The character referenced by the index
is placed in the output string. These characters, identified
in Table 1, below, are selected so as to be universally
representable, and the set excludes characters with
particular significance to SMTP (e.g., ".", "CR", "LF") and
to the encapsulation boundaries defined in this document
(e.g., "-").

Table 1: The Base64 Alphabet

Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A            17 R            34 i            51 z
    1 B            18 S            35 j            52 0
    2 C            19 T            36 k            53 1
    3 D            20 U            37 l            54 2
    4 E            21 V            38 m            55 3
    5 F            22 W            39 n            56 4
    6 G            23 X            40 o            57 5
    7 H            24 Y            41 p            58 6
    8 I            25 Z            42 q            59 7
    9 J            26 a            43 r            60 8
   10 K            27 b            44 s            61 9
   11 L            28 c            45 t            62 +
   12 M            29 d            46 u            63 /
   13 N            30 e            47 v
   14 O            31 f            48 w         (pad) =
   15 P            32 g            49 x
   16 Q            33 h            50 y

The output stream (encoded bytes) must be represented in
lines of no more than 76 characters each. All line breaks
or other characters not found in Table 1 must be ignored by
decoding software. In base64 data, characters other than
those in Table 1, line breaks, and other white space
probably indicate a transmission error, about which a
warning message or even a message rejection might be
appropriate under some circumstances.
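
Both requirements of the preceding paragraph can be sketched
in Python as follows. The names TABLE_1, wrap_76 and
decode_lenient are assumptions of this example, and the
standard "base64" module is used for the actual decoding.
The decoder ignores everything outside Table 1 but reports
the characters that are neither in Table 1 nor white space,
so that the caller can decide whether a warning is
appropriate.

```python
import base64
import string

# The 64 encoding characters of Table 1 plus the pad character.
TABLE_1 = set(string.ascii_letters + string.digits + "+/=")


def wrap_76(encoded: str) -> str:
    """Split a base64 string into CRLF-separated lines of <= 76 characters."""
    return "\r\n".join(encoded[i:i + 76] for i in range(0, len(encoded), 76))


def decode_lenient(encoded: str):
    """Return (decoded octets, list of suspicious characters)."""
    kept = "".join(c for c in encoded if c in TABLE_1)
    suspicious = [c for c in encoded if c not in TABLE_1 and not c.isspace()]
    return base64.b64decode(kept), suspicious
```

A non-empty second element of the result suggests that the
data was damaged in transit.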

Special processing is performed if fewer than 24 bits are
available at the end of the data being encoded. A full
encoding quantum is always completed at the end of a body.
When fewer than 24 input bits are available in an input
group, zero bits are added (on the right) to form an
integral number of 6-bit groups. Output character positions
which are not required to represent actual input data are
set to the character "=". Since all base64 input is an
integral number of octets, only the following cases can
arise: (1) the final quantum of encoding input is an
integral multiple of 24 bits; here, the final unit of
encoded output will be an integral multiple of 4 characters
with no "=" padding, (2) the final quantum of encoding input
is exactly 8 bits; here, the final unit of encoded output
will be two characters followed by two "=" padding
characters, or (3) the final quantum of encoding input is
exactly 16 bits; here, the final unit of encoded output will
be three characters followed by one "=" padding character.
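
The three cases can be illustrated with the following Python
sketch of the final encoding quantum. The names ALPHABET and
encode_final_quantum are assumptions of this example.

```python
import string

# The 64 characters of Table 1, indexed by 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_final_quantum(tail: bytes) -> str:
    """Encode the last 1, 2 or 3 octets of a body as 4 characters."""
    if not 1 <= len(tail) <= 3:
        raise ValueError("the final quantum holds 1 to 3 octets")
    # Zero bits are added on the right to fill the 24-bit group.
    bits = int.from_bytes(tail, "big") << (8 * (3 - len(tail)))
    chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
    # 8 bits need 2 characters, 16 bits need 3, 24 bits need all 4.
    used = len(tail) + 1
    return "".join(chars[:used]) + "=" * (4 - used)
```

Thus the single octet "M" becomes "TQ==", the two octets
"Ma" become "TWE=", and the three octets "Man" become "TWFu"
with no padding.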

Care must be taken to use the proper octets for line breaks
if base64 encoding is applied directly to text material that
has not been converted to canonical form. In particular,
text line breaks should be converted into CRLF sequences
prior to base64 encoding. The important thing to note is
that this may be done directly by the encoder rather than in
a prior canonicalization step in some implementations.
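
An encoder that performs this conversion itself might look
like the following Python sketch. The function name
base64_encode_text is an assumption of this example, as is
the choice to treat CRLF, a bare CR and a bare LF all as the
local line break; an implementation for a particular system
would recognize only that system's newline convention.

```python
import base64
import re


def base64_encode_text(local_text: str, charset: str = "us-ascii") -> bytes:
    """Base64-encode text after converting its line breaks to CRLF."""
    # Assumption: CRLF, bare CR and bare LF each mark one line break.
    canonical = re.sub(r"\r\n|\r|\n", "\r\n", local_text)
    # encodebytes() wraps its output at 76 characters per line.
    return base64.encodebytes(canonical.encode(charset))
```

Text that uses LF and text that already uses CRLF then
produce the same encoded body.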

NOTE: There is no need to worry about quoting apparent
encapsulation boundaries within base64-encoded parts of
multipart entities because no hyphen characters are used in
the base64 encoding.

6 Additional Optional Content- Header Fields

6.1 Optional Content-ID Header Field

In constructing a high-level user agent, it may be desirable
to allow one body to make reference to another.
Accordingly, bodies may be labeled using the "Content-ID"
header field, which is syntactically identical to the
"Message-ID" header field:

Content-ID := msg-id

Like the Message-ID values, Content-ID values must be
generated to be as unique as possible.
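
One possible way to generate such a value, sketched in
Python, is to reuse a Message-ID generator. The example uses
make_msgid() from the standard "email.utils" module; the
function name new_content_id and the domain "example.com"
are assumptions of this example.

```python
from email.utils import make_msgid


def new_content_id(domain: str = "example.com") -> str:
    """Return a Content-ID header line with a freshly generated msg-id."""
    # make_msgid() returns a value of the form "<unique-part@domain>".
    return "Content-ID: " + make_msgid(domain=domain)
```

Each call is expected to return a different value, since the
generated identifier combines the current time, the process
identifier and a random number.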

6.2 Optional Content-Description Header Field

The ability to associate some descriptive information with a
given body is often desirable. For example, it may be useful
to mark an "image" body as "a picture of the Space Shuttle
Endeavor." Such text may be placed in the Content-
Description header field.

Content-Description := *text

The description is presumed to be given in the US-ASCII
character set, although the mechanism specified in [RFC-
1342] may be used for non-US-ASCII Content-Description
values.

7 The Predefined Content-Type Values

This document defines seven initial Content-Type values and
an extension mechanism for private or experimental types.
Further standard types must be defined by new published
specifications. It is expected that most innovation in new
types of mail will take place as subtypes of the seven types
defined here. The most essential characteristics of the
seven content-types are summarized in Appendix G.

7.1 The Text Content-Type

The text Content-Type is intended for sending material which
is principally textual in form. It is the default Content-
Type. A "charset" parameter may be used to indicate the
character set of the body text. The primary subtype of text
is "plain". This indicates plain (unformatted) text. The
default Content-Type for Internet mail is "text/plain;
charset=us-ascii".

Beyond plain text, there are many formats for representing
what might be known as "extended text" -- text with embedded
formatting and presentation information. An interesting
characteristic of many such representations is that they are
to some extent readable even without the software that
interprets them. It is useful, then, to distinguish them,
at the highest level, from such unreadable data as images,
audio, or text represented in an unreadable form. In the
absence of appropriate interpretation software, it is
reasonable to show subtypes of text to the user, while it is
not reasonable to do so with most nontextual data.

Such formatted textual data should be represented using
subtypes of text. Plausible subtypes of text are typically
given by the common name of the representation format, e.g.,
"text/richtext".

7.1.1 The charset parameter

A critical parameter that may be specified in the Content-
Type field for text data is the character set. This is
specified with a "charset" parameter, as in:

Content-type: text/plain; charset=us-ascii

Unlike some other parameter values, the values of the
charset parameter are NOT case sensitive. The default
character set, which must be assumed in the absence of a
charset parameter, is US-ASCII.
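
A receiving agent's handling of the charset parameter might
be sketched in Python as follows. The function name
charset_of is an assumption of this example. The sketch
treats the parameter name as case-insensitive as well, and
splits the field value naively on ";", which is not adequate
for parameter values that are quoted strings containing a
semicolon.

```python
def charset_of(content_type: str = "") -> str:
    """Return the charset of a Content-Type field value, in upper case."""
    for parameter in content_type.split(";")[1:]:
        name, _, value = parameter.partition("=")
        if name.strip().lower() == "charset":
            # Charset values are not case sensitive; normalize them.
            return value.strip().strip('"').upper()
    # No charset parameter present: US-ASCII must be assumed.
    return "US-ASCII"
```

Comparing the normalized result against a list of known
character set names then works regardless of how the sender
capitalized the value.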

An initial list of predefined character set names can be
found at the end of this section. Additional character sets
may be registered with IANA as described in Appendix F,
although the standardization of their use requires the usual
IAB review and approval. Note that if the specified
character set includes 8-bit data, a Content-Transfer-
Encoding header field and a corresponding encoding on the
data are required in order to transmit the body via some
mail transfer protocols, such as SMTP.

The default character set, US-ASCII, has been the subject of
some confusion and ambiguity in the past. Not only were
there some ambiguities in the definition, there have been
wide variations in practice. In order to eliminate such
ambiguity and variations in the future, it is strongly
recommended that new user agents explicitly specify a
character set via the Content-Type header field. "US-ASCII"
does not indicate an arbitrary seven-bit character code, but
specifies that the body uses character coding that uses the
exact correspondence of codes to characters specified in
ASCII. National use variations of ISO 646 [ISO-646] are NOT
ASCII and their use in Internet mail is explicitly
discouraged. The omission of the ISO 646 character set is
deliberate in this regard. The character set name of "US-
ASCII" explicitly refers to ANSI X3.4-1986 [US-ASCII] only.
The character set name "ASCII" is reserved and must not be
used for any purpose.

NOTE: RFC 821 explicitly specifies "ASCII", and references
an earlier version of the American Standard. Insofar as one
of the purposes of specifying a Content-Type and character
set is to permit the receiver to unambiguously determine how
the sender intended the coded message to be interpreted,
assuming anything other than "strict ASCII" as the default
would risk unintentional and incompatible changes to the
semantics of messages now being transmitted. This also
implies that messages containing characters coded according
to national variations on ISO 646, or using code-switching
procedures (e.g., those of ISO 2022), as well as 8-bit or
multiple octet character encodings MUST use an appropriate
character set specification to be consistent with this
specification.

The complete US-ASCII character set is listed in [US-ASCII].
Note that the control characters including DEL (0-31, 127)
have no defined meaning apart from the combination CRLF
(ASCII values 13 and 10) indicating a new line. Two of the
characters have de facto meanings in wide use: FF (12) often
means "start subsequent text on the beginning of a new
page"; and TAB or HT (9) often (though not always) means
"move the cursor to the next available column after the
current position where the column number is a multiple of 8
(counting the first column as column 0)." Apart from this,
any use of the control characters or DEL in a body must be
part of a private agreement between the sender and
recipient. Such private agreements are discouraged and
should be replaced by the other capabilities of this
document.

NOTE: Beyond US-ASCII, an enormous proliferation of
character sets is possible. It is the opinion of the
working group that a large number of character sets is NOT a
good thing. We would prefer to specify a single character
set that can be used universally for representing all of the
world's languages in electronic mail. Unfortunately,
existing practice in several communities seems to point to
the continued use of multiple character sets in the near
future. For this reason, we define names for a small number
of character sets for which a strong constituent base
exists. It is our hope that ISO 10646 or some other
effort will eventually define a single world character set
which can then be specified for use in Internet mail, but in
the advance of that definition we cannot specify the use of
ISO 10646, Unicode, or any other character set whose
definition is, as of this writing, incomplete.

The defined charset values are:

US-ASCII -- as defined in [US-ASCII].

ISO-8859-X -- where "X" is to be replaced, as
necessary, for the parts of ISO-8859 [ISO-8859].
Note that the ISO 646 character sets
have deliberately been omitted in favor of
their 8859 replacements, which are the
designated character sets for Internet mail.
As of the publication of this document, the
legitimate values for "X" are the digits 1