As per Relevance of the word september, we have this rfc below:











Network Working Group                                      N. Borenstein
Request for Comments: 1521                                      Bellcore
Obsoletes: 1341                                                 N. Freed
Category: Standards Track                                       Innosoft
                                                          September 1993


MIME (Multipurpose Internet Mail Extensions) Part One:
Mechanisms for Specifying and Describing
the Format of Internet Message Bodies

Status of this Memo

This RFC specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" for the standardization state and status
of this protocol. Distribution of this memo is unlimited.

Abstract

STD 11, RFC 822 defines a message representation protocol which
specifies considerable detail about message headers, but which leaves
the message content, or message body, as flat ASCII text. This
document redefines the format of message bodies to allow multi-part
textual and non-textual message bodies to be represented and
exchanged without loss of information. This is based on earlier work
documented in RFC 934 and STD 11, RFC 1049, but extends and revises
that work. Because RFC 822 said so little about message bodies, this
document is largely orthogonal to (rather than a revision of) RFC
822.

In particular, this document is designed to provide facilities to
include multiple objects in a single message, to represent body text
in character sets other than US-ASCII, to represent formatted
multi-font text messages, to represent non-textual material such as
images and audio fragments, and generally to facilitate later
extensions defining new types of Internet mail for use by cooperating
mail agents.

This document does NOT extend Internet mail header fields to permit
anything other than US-ASCII text data. Such extensions are the
subject of a companion document [RFC-1522].

This document is a revision of RFC 1341. Significant differences
from RFC 1341 are summarized in Appendix H.





Borenstein & Freed [Page 1]

RFC 1521 MIME September 1993


Table of Contents

1. Introduction....................................... 3
2. Notations, Conventions, and Generic BNF Grammar.... 6
3. The MIME-Version Header Field...................... 7
4. The Content-Type Header Field...................... 9
5. The Content-Transfer-Encoding Header Field......... 13
5.1. Quoted-Printable Content-Transfer-Encoding......... 18
5.2. Base64 Content-Transfer-Encoding................... 21
6. Additional Content-Header Fields................... 23
6.1. Optional Content-ID Header Field................... 23
6.2. Optional Content-Description Header Field.......... 24
7. The Predefined Content-Type Values................. 24
7.1. The Text Content-Type.............................. 24
7.1.1. The charset parameter.............................. 25
7.1.2. The Text/plain subtype............................. 28
7.2. The Multipart Content-Type......................... 28
7.2.1. Multipart: The common syntax...................... 29
7.2.2. The Multipart/mixed (primary) subtype.............. 34
7.2.3. The Multipart/alternative subtype.................. 34
7.2.4. The Multipart/digest subtype....................... 36
7.2.5. The Multipart/parallel subtype..................... 37
7.2.6. Other Multipart subtypes........................... 37
7.3. The Message Content-Type........................... 38
7.3.1. The Message/rfc822 (primary) subtype............... 38
7.3.2. The Message/Partial subtype........................ 39
7.3.3. The Message/External-Body subtype.................. 42
7.3.3.1. The "ftp" and "tftp" access-types............... 44
7.3.3.2. The "anon-ftp" access-type...................... 45
7.3.3.3. The "local-file" and "afs" access-types......... 45
7.3.3.4. The "mail-server" access-type................... 45
7.3.3.5. Examples and Further Explanations............... 46
7.4. The Application Content-Type....................... 49
7.4.1. The Application/Octet-Stream (primary) subtype..... 50
7.4.2. The Application/PostScript subtype................. 50
7.4.3. Other Application subtypes......................... 53
7.5. The Image Content-Type............................. 53
7.6. The Audio Content-Type............................. 54
7.7. The Video Content-Type............................. 54
7.8. Experimental Content-Type Values................... 54
8. Summary............................................ 56
9. Security Considerations............................ 56
10. Authors' Addresses................................. 57
11. Acknowledgements................................... 58
Appendix A -- Minimal MIME-Conformance.................... 60
Appendix B -- General Guidelines For Sending Email Data... 63
Appendix C -- A Complex Multipart Example................. 66
Appendix D -- Collected Grammar........................... 68
Appendix E -- IANA Registration Procedures................ 72
E.1 Registration of New Content-type/subtype Values...... 72
E.2 Registration of New Access-type
for Message/external-body............................ 73
Appendix F -- Summary of the Seven Content-types.......... 74
Appendix G -- Canonical Encoding Model.................... 76
Appendix H -- Changes from RFC 1341....................... 78
References................................................ 80

1. Introduction

Since its publication in 1982, STD 11, RFC 822 [RFC-822] has defined
the standard format of textual mail messages on the Internet. Its
success has been such that the RFC 822 format has been adopted,
wholly or partially, well beyond the confines of the Internet and the
Internet SMTP transport defined by STD 10, RFC 821 [RFC-821]. As the
format has seen wider use, a number of limitations have proven
increasingly restrictive for the user community.

RFC 822 was intended to specify a format for text messages. As such,
non-text messages, such as multimedia messages that might include
audio or images, are simply not mentioned. Even in the case of text,
however, RFC 822 is inadequate for the needs of mail users whose
languages require the use of character sets richer than US ASCII
[US-ASCII]. Since RFC 822 does not specify mechanisms for mail
containing audio, video, Asian language text, or even text in most
European languages, additional specifications are needed.

One of the notable limitations of RFC 821/822 based mail systems is
the fact that they limit the contents of electronic mail messages to
relatively short lines of seven-bit ASCII. This forces users to
convert any non-textual data that they may wish to send into
seven-bit bytes representable as printable ASCII characters before
invoking a local mail UA (User Agent, a program with which human
users send and receive mail). Examples of such encodings currently
used in the Internet include pure hexadecimal, uuencode, the 3-in-4
base 64 scheme specified in RFC 1421, the Andrew Toolkit
Representation [ATK], and many others.

The limitations of RFC 822 mail become even more apparent as gateways
are designed to allow for the exchange of mail messages between RFC
822 hosts and X.400 hosts. X.400 [X400] specifies mechanisms for the
inclusion of non-textual body parts within electronic mail messages.
The current standards for the mapping of X.400 messages to RFC 822
messages specify either that X.400 non-textual body parts must be
converted to (not encoded in) an ASCII format, or that they must be
discarded, notifying the RFC 822 user that discarding has occurred.
This is clearly undesirable, as information that a user may wish to
receive is lost. Even though a user's UA may not have the capability
of dealing with the non-textual body part, the user might have some
mechanism external to the UA that can extract useful information from
the body part. Moreover, it does not allow for the fact that the
message may eventually be gatewayed back into an X.400 message
handling system (i.e., the X.400 message is "tunneled" through
Internet mail), where the non-textual information would definitely
become useful again.

This document describes several mechanisms that combine to solve most
of these problems without introducing any serious incompatibilities
with the existing world of RFC 822 mail. In particular, it
describes:

1. A MIME-Version header field, which uses a version number to
declare a message to be conformant with this specification and
allows mail processing agents to distinguish between such
messages and those generated by older or non-conformant software,
which is presumed to lack such a field.

2. A Content-Type header field, generalized from RFC 1049 [RFC-1049],
which can be used to specify the type and subtype of data in the
body of a message and to fully specify the native representation
(encoding) of such data.

2.a. A "text" Content-Type value, which can be used to represent
textual information in a number of character sets and
formatted text description languages in a standardized
manner.

2.b. A "multipart" Content-Type value, which can be used to
combine several body parts, possibly of differing types of
data, into a single message.

2.c. An "application" Content-Type value, which can be used to
transmit application data or binary data, and hence, among
other uses, to implement an electronic mail file transfer
service.

2.d. A "message" Content-Type value, for encapsulating another
mail message.

2.e. An "image" Content-Type value, for transmitting still image
(picture) data.

2.f. An "audio" Content-Type value, for transmitting audio or
voice data.

2.g. A "video" Content-Type value, for transmitting video or
moving image data, possibly with audio as part of the
composite video data format.

3. A Content-Transfer-Encoding header field, which can be used to
specify an auxiliary encoding that was applied to the data in
order to allow it to pass through mail transport mechanisms which
may have data or character set limitations.

4. Two additional header fields that can be used to further describe
the data in a message body, the Content-ID and Content-Description
header fields.

MIME has been carefully designed as an extensible mechanism, and it
is expected that the set of content-type/subtype pairs and their
associated parameters will grow significantly with time. Several
other MIME fields, notably including character set names, are likely
to have new values defined over time. In order to ensure that the
set of such values is developed in an orderly, well-specified, and
public manner, MIME defines a registration process which uses the
Internet Assigned Numbers Authority (IANA) as a central registry for
such values. Appendix E provides details about how IANA registration
is accomplished.

Finally, to specify and promote interoperability, Appendix A of this
document provides a basic applicability statement for a subset of the
above mechanisms that defines a minimal level of "conformance" with
this document.

HISTORICAL NOTE: Several of the mechanisms described in this
document may seem somewhat strange or even baroque at first
reading. It is important to note that compatibility with existing
standards AND robustness across existing practice were two of the
highest priorities of the working group that developed this
document. In particular, compatibility was always favored over
elegance.

MIME was first defined and published as RFCs 1341 and 1342 [RFC-1341]
[RFC-1342]. This document is a relatively minor updating of RFC
1341, and is intended to supersede it. The differences between this
document and RFC 1341 are summarized in Appendix H. Please refer to
the current edition of the "IAB Official Protocol Standards" for the
standardization state and status of this protocol. Several other RFC
documents will be of interest to the MIME implementor, in particular
[RFC-1343], [RFC-1344], and [RFC-1345].

2. Notations, Conventions, and Generic BNF Grammar

This document is being published in two versions, one as plain ASCII
text and one as PostScript (PostScript is a trademark of Adobe
Systems Incorporated.). While the text version is the official
specification, some will find the PostScript version easier to read.
The textual contents are identical. An Andrew-format copy of this
document is also available from the first author (Borenstein).

Although the mechanisms specified in this document are all described
in prose, most are also described formally in the modified BNF
notation of RFC 822. Implementors will need to be familiar with this
notation in order to understand this specification, and are referred
to RFC 822 for a complete explanation of the modified BNF notation.

Some of the modified BNF in this document makes reference to
syntactic entities that are defined in RFC 822 and not in this
document. A complete formal grammar, then, is obtained by combining
the collected grammar appendix of this document with that of RFC 822
plus the modifications to RFC 822 defined in RFC 1123, which
specifically changes the syntax for `return', `date' and `mailbox'.

The term CRLF, in this document, refers to the sequence of the two
ASCII characters CR (13) and LF (10) which, taken together, in this
order, denote a line break in RFC 822 mail.

The term "character set" is used in this document to refer to a
method used with one or more tables to convert encoded text to a
series of octets. This definition is intended to allow various kinds
of text encodings, from simple single-table mappings such as ASCII to
complex table switching methods such as those that use ISO 2022's
techniques. However, a MIME character set name must fully specify
the mapping to be performed.

The term "message", when not further qualified, means either the
(complete or "top-level") message being transferred on a network, or
a message encapsulated in a body of type "message".

The term "body part", in this document, means one of the parts of the
body of a multipart entity. A body part has a header and a body, so
it makes sense to speak about the body of a body part.

The term "entity", in this document, means either a message or a body
part. All kinds of entities share the property that they have a
header and a body.

The term "body", when not further qualified, means the body of an
entity, that is the body of either a message or of a body part.

NOTE: The previous four definitions are clearly circular. This is
unavoidable, since the overall structure of a MIME message is
indeed recursive.

In this document, all numeric and octet values are given in decimal
notation.

It must be noted that Content-Type values, subtypes, and parameter
names as defined in this document are case-insensitive. However,
parameter values are case-sensitive unless otherwise specified for
the specific parameter.

FORMATTING NOTE: This document has been carefully formatted for
ease of reading. The PostScript version of this document, in
particular, places notes like this one, which may be skipped by
the reader, in a smaller, italicized, font, and indents it as
well. In the text version, only the indentation is preserved, so
if you are reading the text version of this you might consider
using the PostScript version instead. However, all such notes will
be indented and preceded by "NOTE:" or some similar introduction,
even in the text version.

The primary purpose of these non-essential notes is to convey
information about the rationale of this document, or to place this
document in the proper historical or evolutionary context. Such
information may be skipped by those who are focused entirely on
building a conformant implementation, but may be of use to those
who wish to understand why this document is written as it is.

For ease of recognition, all BNF definitions have been placed in a
fixed-width font in the PostScript version of this document.

3. The MIME-Version Header Field

Since RFC 822 was published in 1982, there has really been only one
format standard for Internet messages, and there has been little
perceived need to declare the format standard in use. This document
is an independent document that complements RFC 822. Although the
extensions in this document have been defined in such a way as to be
compatible with RFC 822, there are still circumstances in which it
might be desirable for a mail-processing agent to know whether a
message was composed with the new standard in mind.

Therefore, this document defines a new header field, "MIME-Version",
which is to be used to declare the version of the Internet message
body format standard in use.

Messages composed in accordance with this document MUST include such
a header field, with the following verbatim text:

MIME-Version: 1.0

The presence of this header field is an assertion that the message
has been composed in compliance with this document.

Since it is possible that a future document might extend the message
format standard again, a formal BNF is given for the content of the
MIME-Version field:

version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

Thus, future format specifiers, which might replace or extend "1.0",
are constrained to be two integer fields, separated by a period. If
a message is received with a MIME-version value other than "1.0", it
cannot be assumed to conform with this specification.

Note that the MIME-Version header field is required at the top level
of a message. It is not required for each body part of a multipart
entity. It is required for the embedded headers of a body of type
"message" if and only if the embedded message is itself claimed to be
MIME-conformant.

It is not possible to fully specify how a mail reader that conforms
with MIME as defined in this document should treat a message that
might arrive in the future with some value of MIME-Version other than
"1.0". However, conformant software is encouraged to check the
version number and at least warn the user if an unrecognized
MIME-version is encountered.

It is also worth noting that version control for specific
content-types is not accomplished using the MIME-Version mechanism.
In particular, some formats (such as application/postscript) have
version numbering conventions that are internal to the document
format. Where such conventions exist, MIME does nothing to supersede
them. Where no such conventions exist, a MIME type might use a
"version" parameter in the content-type field if necessary.

NOTE TO IMPLEMENTORS: All header fields defined in this document,
including MIME-Version, Content-type, etc., are subject to the
general syntactic rules for header fields specified in RFC 822. In
particular, all can include comments, which means that the following
two MIME-Version fields are equivalent:

MIME-Version: 1.0
MIME-Version: 1.0 (Generated by GBD-killer 3.7)
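
The version grammar and the comment rule above can be illustrated with
a short sketch. This is not part of the specification: the function
names strip_comments and parse_mime_version are hypothetical, and the
comment handling is a simplified reading of the RFC 822 rules (nested
parentheses and backslash quoting only, with no special treatment of
quoted strings).

```python
import re

# Grammar from the text: 1*DIGIT "." 1*DIGIT, with optional white space
# allowed around the tokens (an assumption based on RFC 822 field syntax).
_VERSION_RE = re.compile(r"\s*([0-9]+)\s*\.\s*([0-9]+)\s*")


def strip_comments(value: str) -> str:
    """Drop parenthesized comments, honoring nesting and backslash quoting."""
    out = []
    depth = 0
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # A quoted pair is kept only when it sits outside a comment.
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def parse_mime_version(field_body: str):
    """Return (major, minor), or None if the value does not fit the BNF."""
    match = _VERSION_RE.fullmatch(strip_comments(field_body))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

With this sketch, both example fields above yield the same pair of
integers, and a caller could warn the user whenever the result is
anything other than (1, 0).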




4. The Content-Type Header Field

The purpose of the Content-Type field is to describe the data
contained in the body fully enough that the receiving user agent can
pick an appropriate agent or mechanism to present the data to the
user, or otherwise deal with the data in an appropriate manner.

HISTORICAL NOTE: The Content-Type header field was first defined in
RFC 1049. RFC 1049 Content-types used a simpler and less powerful
syntax, but one that is largely compatible with the mechanism given
here.

The Content-Type header field is used to specify the nature of the
data in the body of an entity, by giving type and subtype
identifiers, and by providing auxiliary information that may be
required for certain types. After the type and subtype names, the
remainder of the header field is simply a set of parameters,
specified in an attribute/value notation. The set of meaningful
parameters differs for the different types. In particular, there are
NO globally-meaningful parameters that apply to all content-types.
Global mechanisms are best addressed, in the MIME model, by the
definition of additional Content-* header fields. The ordering of
parameters is not significant. Among the defined parameters is a
"charset" parameter by which the character set used in the body may
be declared. Comments are allowed in accordance with RFC 822 rules
for structured header fields.

In general, the top-level Content-Type is used to declare the general
type of data, while the subtype specifies a specific format for that
type of data. Thus, a Content-Type of "image/xyz" is enough to tell
a user agent that the data is an image, even if the user agent has no
knowledge of the specific image format "xyz". Such information can
be used, for example, to decide whether or not to show a user the raw
data from an unrecognized subtype -- such an action might be
reasonable for unrecognized subtypes of text, but not for
unrecognized subtypes of image or audio. For this reason, registered
subtypes of audio, image, text, and video, should not contain
embedded information that is really of a different type. Such
compound types should be represented using the "multipart" or
"application" types.

Parameters are modifiers of the content-subtype, and do not
fundamentally affect the requirements of the host system. Although
most parameters make sense only with certain content-types, others
are "global" in the sense that they might apply to any subtype. For
example, the "boundary" parameter makes sense only for the
"multipart" content-type, but the "charset" parameter might make
sense with several content-types.



An initial set of seven Content-Types is defined by this document.
This set of top-level names is intended to be substantially complete.
It is expected that additions to the larger set of supported types
can generally be accomplished by the creation of new subtypes of
these initial types. In the future, more top-level types may be
defined only by an extension to this standard. If another primary
type is to be used for any reason, it must be given a name starting
with "X-" to indicate its non-standard status and to avoid a
potential conflict with a future official name.

In the Augmented BNF notation of RFC 822, a Content-Type header field
value is defined as follows:

content := "Content-Type" ":" type "/" subtype *(";" parameter)
           ; case-insensitive matching of type and subtype

type := "application" / "audio"
      / "image" / "message"
      / "multipart" / "text"
      / "video" / extension-token
      ; All values case-insensitive

extension-token := x-token / iana-token

iana-token := <a publicly-defined extension token,
              registered with IANA, as specified in
              appendix E>

x-token := <The two characters "X-" or "x-" followed, with
           no intervening white space, by any token>

subtype := token ; case-insensitive

parameter := attribute "=" value

attribute := token ; case-insensitive

value := token / quoted-string

token := 1*<any (ASCII) CHAR except SPACE, CTLs,
         or tspecials>

tspecials := "(" / ")" / "<" / ">" / "@"
           / "," / ";" / ":" / "\" / <">
           / "/" / "[" / "]" / "?" / "="
           ; Must be in quoted-string,
           ; to use within parameter values


Note that the definition of "tspecials" is the same as the RFC 822
definition of "specials" with the addition of the three characters
"/", "?", and "=", and the removal of ".".

Note also that a subtype specification is MANDATORY. There are no
default subtypes.

The type, subtype, and parameter names are not case sensitive. For
example, TEXT, Text, and TeXt are all equivalent. Parameter values
are normally case sensitive, but certain parameters are interpreted
to be case-insensitive, depending on the intended use. (For example,
multipart boundaries are case-sensitive, but the "access-type" for
message/External-body is not case-sensitive.)
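
As an illustration of the grammar and the case rules above, the
following sketch parses a Content-Type field body into a type, a
subtype, and a parameter dictionary. It is a simplified reading, not
a conformant parser: the name parse_content_type is hypothetical,
RFC 822 comments are not handled, and the token character class is
derived by hand from the tspecials list (printable US-ASCII minus
SPACE and tspecials).

```python
import re

# token: one or more printable US-ASCII characters that are neither
# SPACE nor one of the tspecials listed in the grammar.
_TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+"
# quoted-string: backslash may quote any single character.
_QUOTED = r'"(?:[^"\\\r]|\\.)*"'

_MEDIA_RE = re.compile(rf"\s*({_TOKEN})\s*/\s*({_TOKEN})\s*")
_PARAM_RE = re.compile(rf";\s*({_TOKEN})\s*=\s*({_TOKEN}|{_QUOTED})\s*")


def parse_content_type(field_body: str):
    """Return (type, subtype, params) for a Content-Type field body."""
    media = _MEDIA_RE.match(field_body)
    if media is None:
        # The subtype is MANDATORY, so a bare "text" is rejected here.
        raise ValueError("expected type/subtype")
    # Type and subtype are case-insensitive: normalize to lower case.
    major, minor = media.group(1).lower(), media.group(2).lower()
    params = {}
    pos = media.end()
    while pos < len(field_body):
        param = _PARAM_RE.match(field_body, pos)
        if param is None:
            raise ValueError(f"malformed parameter at offset {pos}")
        # Attribute names are case-insensitive; values keep their case.
        name, value = param.group(1).lower(), param.group(2)
        if value.startswith('"'):
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        params[name] = value
        pos = param.end()
    return major, minor, params
```

Parameter values are returned with their original case, leaving it to
the caller to fold case for parameters such as "access-type" that are
defined to be case-insensitive.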

Beyond this syntax, the only constraint on the definition of subtype
names is the desire that their uses must not conflict. That is, it
would be undesirable to have two different communities using
"Content-Type: application/foobar" to mean two different things. The
process of defining new content-subtypes, then, is not intended to be
a mechanism for imposing restrictions, but simply a mechanism for
publicizing the usages. There are, therefore, two acceptable
mechanisms for defining new Content-Type subtypes:

1. Private values (starting with "X-") may be
defined bilaterally between two cooperating
agents without outside registration or
standardization.

2. New standard values must be documented,
registered with, and approved by IANA, as
described in Appendix E. Where intended for
public use, the formats they refer to must
also be defined by a published specification,
and possibly offered for standardization.

The seven standard initial predefined Content-Types are detailed in
the bulk of this document. They are:

text -- textual information. The primary subtype,
"plain", indicates plain (unformatted) text. No
special software is required to get the full
meaning of the text, aside from support for the
indicated character set. Subtypes are to be used
for enriched text in forms where application
software may enhance the appearance of the text,
but such software must not be required in order to
get the general idea of the content. Possible
subtypes thus include any readable word processor
format. A very simple and portable subtype,
richtext, was defined in RFC 1341, with a future
revision expected.

multipart -- data consisting of multiple parts of
independent data types. Four initial subtypes
are defined, including the primary "mixed"
subtype, "alternative" for representing the same
data in multiple formats, "parallel" for parts
intended to be viewed simultaneously, and "digest"
for multipart entities in which each part is of
type "message".

message -- an encapsulated message. A body of
Content-Type "message" is itself all or part of a
fully formatted RFC 822 conformant message which
may contain its own different Content-Type header
field. The primary subtype is "rfc822". The
"partial" subtype is defined for partial messages,
to permit the fragmented transmission of bodies
that are thought to be too large to be passed
through mail transport facilities. Another
subtype, "External-body", is defined for
specifying large bodies by reference to an
external data source.

image -- image data. Image requires a display device
(such as a graphical display, a printer, or a FAX
machine) to view the information. Initial
subtypes are defined for two widely-used image
formats, jpeg and gif.

audio -- audio data, with initial subtype "basic".
Audio requires an audio output device (such as a
speaker or a telephone) to "display" the contents.

video -- video data. Video requires the capability to
display moving images, typically including
specialized hardware and software. The initial
subtype is "mpeg".

application -- some other kind of data, typically
either uninterpreted binary data or information to
be processed by a mail-based application. The
primary subtype, "octet-stream", is to be used in
the case of uninterpreted binary data, in which
case the simplest recommended action is to offer
to write the information into a file for the user.
An additional subtype, "PostScript", is defined
for transporting PostScript documents in bodies.
Other expected uses for "application" include
spreadsheets, data for mail-based scheduling
systems, and languages for "active"
(computational) email. (Note that active email
and other application data may entail several
security considerations, which are discussed later
in this memo, particularly in the context of
application/PostScript.)

Default RFC 822 messages are typed by this protocol as plain text in
the US-ASCII character set, which can be explicitly specified as
"Content-type: text/plain; charset=us-ascii". If no Content-Type is
specified, this default is assumed. In the presence of a
MIME-Version header field, a receiving User Agent can also assume
that plain US-ASCII text was the sender's intent. In the absence of
a MIME-Version specification, plain US-ASCII text must still be
assumed, but the sender's intent might have been otherwise.

RATIONALE: In the absence of any Content-Type header field or
MIME-Version header field, it is impossible to be certain that a
message is actually text in the US-ASCII character set, since it
might well be a message that, using the conventions that predate
this document, includes text in another character set or
non-textual data in a manner that cannot be automatically
recognized (e.g., a uuencoded compressed UNIX tar file). Although
there is no fully acceptable alternative to treating such untyped
messages as "text/plain; charset=us-ascii", implementors should
remain aware that if a message lacks both the MIME-Version and the
Content-Type header fields, it may in practice contain almost
anything.

It should be noted that the list of Content-Type values given here
may be augmented in time, via the mechanisms described above, and
that the set of subtypes is expected to grow substantially.

When a mail reader encounters mail with an unknown Content-type
value, it should generally treat it as equivalent to
"application/octet-stream", as described later in this document.
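
The default and fallback rules in this section might be sketched as
follows. This is a deliberately coarse illustration: the function
effective_content_type and the known_handlers set are hypothetical,
and the sketch ignores any finer-grained handling of unrecognized
subtypes that later sections may describe.

```python
# Default assumed when no Content-Type field is present.
DEFAULT_TYPE = ("text", "plain", {"charset": "us-ascii"})
# Treatment suggested for an unknown Content-Type value.
FALLBACK_TYPE = ("application", "octet-stream", {})


def effective_content_type(parsed, known_handlers):
    """Pick the content type a reader should act on.

    parsed is a (type, subtype, params) tuple with lower-case type and
    subtype, or None when the header field is absent. known_handlers
    is a set of (type, subtype) pairs the reader can present itself.
    """
    if parsed is None:
        major, minor, params = DEFAULT_TYPE
        return major, minor, dict(params)
    major, minor, _params = parsed
    if (major, minor) in known_handlers:
        return parsed
    major, minor, params = FALLBACK_TYPE
    return major, minor, dict(params)
```

A reader built this way would, for example, offer to save an
unrecognized body to a file rather than try to display it.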

5. The Content-Transfer-Encoding Header Field

Many Content-Types which could usefully be transported via email are
represented, in their "natural" format, as 8-bit character or binary
data. Such data cannot be transmitted over some transport protocols.
For example, RFC 821 restricts mail messages to 7-bit US-ASCII data
with lines no longer than 1000 characters.



It is necessary, therefore, to define a standard mechanism for
re-encoding such data into a 7-bit short-line format. This document
specifies that such encodings will be indicated by a new
"Content-Transfer-Encoding" header field. The
Content-Transfer-Encoding field is used to indicate the type of
transformation that has been used in order to represent the body in
an acceptable manner for transport.

Unlike Content-Types, a proliferation of Content-Transfer-Encoding
values is undesirable and unnecessary. However, establishing only a
single Content-Transfer-Encoding mechanism does not seem possible.
There is a tradeoff between the desire for a compact and efficient
encoding of largely-binary data and the desire for a readable
encoding of data that is mostly, but not entirely, 7-bit data. For
this reason, at least two encoding mechanisms are necessary: a
"readable" encoding and a "dense" encoding.

The Content-Transfer-Encoding field is designed to specify an
invertible mapping between the "native" representation of a type of
data and a representation that can be readily exchanged using 7 bit
mail transport protocols, such as those defined by RFC 821 (SMTP).
This field has not been defined by any previous standard. The field's
value is a single token specifying the type of encoding, as
enumerated below. Formally:

encoding := "Content-Transfer-Encoding" ":" mechanism

mechanism := "7bit" ; case-insensitive
           / "quoted-printable"
           / "base64"
           / "8bit"
           / "binary"
           / x-token

These values are not case sensitive. That is, Base64 and BASE64 and
bAsE64 are all equivalent. An encoding type of 7BIT requires that
the body is already in a seven-bit mail-ready representation. This
is the default value -- that is, "Content-Transfer-Encoding: 7BIT" is
assumed if the Content-Transfer-Encoding header field is not present.

The values "8bit", "7bit", and "binary" all mean that NO encoding has
been performed. However, they are potentially useful as indications
of the kind of data contained in the object, and therefore of the
kind of encoding that might need to be performed for transmission in
a given transport system. In particular:

"7bit" means that the data is all represented as short
lines of US-ASCII data.

"8bit" means that the lines are short, but there may be
non-ASCII characters (octets with the high-order
bit set).

"Binary" means that not only may non-ASCII characters
be present, but also that the lines are not
necessarily short enough for SMTP transport.

The difference between "8bit" (or any other conceivable bit-width
token) and the "binary" token is that "binary" does not require
adherence to any limits on line length or to the SMTP CRLF semantics,
while the bit-width tokens do require such adherence. If the body
contains data in any bit-width other than 7-bit, the appropriate
bit-width Content-Transfer-Encoding token must be used (e.g., "8bit"
for unencoded 8 bit wide data). If the body contains binary data,
the "binary" Content-Transfer-Encoding token must be used.
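
The distinction between the three unencoded labels can be sketched as
a small classifier. The function name classify_body is hypothetical,
and two details are assumptions rather than statements from this
section: the limit of 998 octets per line (1000 characters including
the CRLF), and the choice to treat a bare CR or LF as a violation of
the SMTP CRLF semantics.

```python
MAX_LINE = 998  # assumed: 1000 characters per line including the CRLF


def classify_body(data: bytes, max_line: int = MAX_LINE) -> str:
    """Return "7bit", "8bit", or "binary" for an unencoded body."""
    for line in data.split(b"\r\n"):
        # Overlong lines or stray CR/LF octets rule out both
        # bit-width labels, leaving only "binary".
        if len(line) > max_line or b"\r" in line or b"\n" in line:
            return "binary"
    # Lines are short; the label now depends on the high-order bit.
    if any(octet > 127 for octet in data):
        return "8bit"
    return "7bit"
```

A sending agent could use such a check to decide whether a body may
be labeled as is or needs one of the two real encodings before it is
handed to a 7-bit transport.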

NOTE: The distinction between the Content-Transfer-Encoding values
of "binary", "8bit", etc. may seem unimportant, in that all of
them really mean "none" -- that is, there has been no encoding of
the data for transport. However, clear labeling will be of
enormous value to gateways between future mail transport systems
with differing capabilities in transporting data that do not meet
the restrictions of RFC 821 transport.

Mail transport for unencoded 8-bit data is defined in RFC-1426
[RFC-1426]. As of the publication of this document, there are no
standardized Internet mail transports for which it is legitimate
to include unencoded binary data in mail bodies. Thus there are
no circumstances in which the "binary" Content-Transfer-Encoding
is actually legal on the Internet. However, in the event that
binary mail transport becomes a reality in Internet mail, or when
this document is used in conjunction with any other binary-capable
transport mechanism, binary bodies should be labeled as such using
this mechanism.

NOTE: The five values defined for the Content-Transfer-Encoding
field imply nothing about the Content-Type other than the
algorithm by which it was encoded or the transport system
requirements if unencoded.

Implementors may, if necessary, define new Content-Transfer-Encoding
values, but must use an x-token, which is a name prefixed by "X-" to
indicate its non-standard status, e.g., "Content-Transfer-Encoding:
x-my-new-encoding". However, unlike Content-Types and subtypes, the
creation of new Content-Transfer-Encoding values is explicitly and
strongly discouraged, as it seems likely to hinder interoperability
with little potential benefit. Their use is allowed only as the
result of an agreement between cooperating user agents.

If a Content-Transfer-Encoding header field appears as part of a
message header, it applies to the entire body of that message. If a
Content-Transfer-Encoding header field appears as part of a body
part's headers, it applies only to the body of that body part. If an
entity is of type "multipart" or "message", the Content-Transfer-
Encoding is not permitted to have any value other than a bit width
(e.g., "7bit", "8bit", etc.) or "binary".

It should be noted that email is character-oriented, so that the
mechanisms described here are mechanisms for encoding arbitrary octet
streams, not bit streams. If a bit stream is to be encoded via one
of these mechanisms, it must first be converted to an 8-bit byte
stream using the network standard bit order ("big-endian"), in which
the earlier bits in a stream become the higher-order bits in a byte.
A bit stream not ending at an 8-bit boundary must be padded with
zeroes. This document provides a mechanism for noting the addition
of such padding in the case of the application Content-Type, which
has a "padding" parameter.
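
The conversion from a bit stream to an octet stream might look like
the following sketch. The function name and the choice to return the
padding count alongside the data are illustrative assumptions.

```python
def pack_bits(bits):
    """Pack an iterable of 0/1 values into bytes, earliest bit first.

    Returns (data, padding), where padding is the number of zero
    bits appended to reach an 8-bit boundary.
    """
    bits = list(bits)
    padding = (-len(bits)) % 8
    bits.extend([0] * padding)          # pad with zeroes on the right
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            # Earlier bits become the higher-order bits of the byte.
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out), padding
```

The returned padding count is the kind of value that the "padding"
parameter mentioned above is meant to carry.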

The encoding mechanisms defined here explicitly encode all data in
ASCII. Thus, for example, suppose an entity has header fields such
as:

Content-Type: text/plain; charset=ISO-8859-1
Content-transfer-encoding: base64

This must be interpreted to mean that the body is a base64 ASCII
encoding of data that was originally in ISO-8859-1, and will be in
that character set again after decoding.
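
As an illustration of this layering, the sketch below uses Python's
standard base64 module; the helper names are hypothetical. The
character set applies to the octets before encoding and after
decoding, while the encoded body itself is pure ASCII.

```python
import base64


def encode_text_body(text: str, charset: str = "iso-8859-1") -> bytes:
    """Turn text into octets in `charset`, then into base64 ASCII."""
    raw = text.encode(charset)
    return base64.b64encode(raw)


def decode_text_body(body: bytes, charset: str = "iso-8859-1") -> str:
    """Undo the transfer encoding first, then interpret the charset."""
    return base64.b64decode(body).decode(charset)


body = encode_text_body("Gr\xfc\xdfe")
assert body.isascii()
assert decode_text_body(body) == "Gr\xfc\xdfe"
```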

The following sections will define the two standard encoding
mechanisms. The definition of new content-transfer-encodings is
explicitly discouraged and should only occur when absolutely
necessary. All content-transfer-encoding namespace except that
beginning with "X-" is explicitly reserved to the IANA for future
use. Private agreements about content-transfer-encodings are also
explicitly discouraged.

Certain Content-Transfer-Encoding values may only be used on certain
Content-Types. In particular, it is expressly forbidden to use any
encodings other than "7bit", "8bit", or "binary" with any Content-
Type that recursively includes other Content-Type fields, notably the
"multipart" and "message" Content-Types. All encodings that are
desired for bodies of type multipart or message must be done at the
innermost level, by encoding the actual body that needs to be
encoded.
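
A user agent could enforce this restriction with a check along the
following lines. This is a minimal sketch; the function name is
hypothetical, and it assumes both values are compared without regard
to case.

```python
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}


def encoding_allowed(content_type: str, transfer_encoding: str) -> bool:
    """Return False for an encoded multipart or message entity."""
    primary = content_type.split("/", 1)[0].strip().lower()
    token = transfer_encoding.strip().lower()
    if primary in ("multipart", "message"):
        # Composite types may only carry an identity encoding.
        return token in IDENTITY_ENCODINGS
    return True
```

For example, "multipart/mixed" with "base64" is rejected, while
"image/gif" with "base64" is accepted.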



NOTE ON ENCODING RESTRICTIONS: Though the prohibition against
using content-transfer-encodings on data of type multipart or
message may seem overly restrictive, it is necessary to prevent
nested encodings, in which data are passed through an encoding
algorithm multiple times, and must be decoded multiple times in
order to be properly viewed. Nested encodings add considerable
complexity to user agents: aside from the obvious efficiency
problems with such multiple encodings, they can obscure the basic
structure of a message. In particular, they can imply that
several decoding operations are necessary simply to find out what
types of objects a message contains. Banning nested encodings may
complicate the job of certain mail gateways, but this seems less
of a problem than the effect of nested encodings on user agents.

NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND
CONTENT-TRANSFER-ENCODING: It may seem that the
Content-Transfer-Encoding could be inferred from the
characteristics of the Content-Type that is to be encoded, or, at
the very least, that certain Content-Transfer-Encodings could be
mandated for use with specific Content-Types. There are several
reasons why this is not the case.
First, given the varying types of transports used for mail, some
encodings may be appropriate for some Content-Type/transport
combinations and not for others. (For example, in an 8-bit
transport, no encoding would be required for text in certain
character sets, while such encodings are clearly required for
7-bit SMTP.) Second, certain Content-Types may require different
types of transfer encoding under different circumstances. For
example, many PostScript bodies might consist entirely of short
lines of 7-bit data and hence require little or no encoding.
Other PostScript bodies (especially those using Level 2
PostScript's binary encoding mechanism) may only be reasonably
represented using a binary transport encoding. Finally, since
Content-Type is intended to be an open-ended specification
mechanism, strict specification of an association between
Content-Types and encodings effectively couples the specification
of an application protocol with a specific lower-level transport.
This is not desirable since the developers of a Content-Type
should not have to be aware of all the transports in use and what
their limitations are.

NOTE ON TRANSLATING ENCODINGS: The quoted-printable and base64
encodings are designed so that conversion between them is
possible. The only issue that arises in such a conversion is the
handling of line breaks. When converting from quoted-printable to
base64 a line break must be converted into a CRLF sequence.
Similarly, a CRLF sequence in base64 data must be converted to a
quoted-printable line break, but ONLY when converting text data.




NOTE ON CANONICAL ENCODING MODEL: There was some confusion, in
earlier drafts of this memo, regarding the model for when email
data was to be converted to canonical form and encoded, and in
particular how this process would affect the treatment of CRLFs,
given that the representation of newlines varies greatly from
system to system, and the relationship between content-transfer-
encodings and character sets. For this reason, a canonical model
for encoding is presented as Appendix G.

5.1. Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that
largely consists of octets that correspond to printable characters in
the ASCII character set. It encodes the data in such a way that the
resulting octets are unlikely to be modified by mail transport. If
the data being encoded are mostly ASCII text, the encoded form of the
data remains largely recognizable by humans. A body which is
entirely ASCII may also be encoded in Quoted-Printable to ensure the
integrity of the data should the message pass through a character-
translating, and/or line-wrapping gateway.

In this encoding, octets are to be represented as determined by the
following rules:

Rule #1: (General 8-bit representation) Any octet, except those
indicating a line break according to the newline convention of the
canonical (standard) form of the data being encoded, may be
represented by an "=" followed by a two digit hexadecimal
representation of the octet's value. The digits of the
hexadecimal alphabet, for this purpose, are "0123456789ABCDEF".
Uppercase letters must be used when sending hexadecimal data,
though a robust implementation may choose to recognize lowercase
letters on receipt. Thus, for example, the value 12 (ASCII form
feed) can be represented by "=0C", and the value 61 (ASCII EQUAL
SIGN) can be represented by "=3D". Except when the following
rules allow an alternative encoding, this rule is mandatory.

Rule #2: (Literal representation) Octets with decimal values of 33
through 60 inclusive, and 62 through 126, inclusive, MAY be
represented as the ASCII characters which correspond to those
octets (EXCLAMATION POINT through LESS THAN, and GREATER THAN
through TILDE, respectively).

Rule #3: (White Space): Octets with values of 9 and 32 MAY be
represented as ASCII TAB (HT) and SPACE characters, respectively,
but MUST NOT be so represented at the end of an encoded line. Any
TAB (HT) or SPACE characters on an encoded line MUST thus be
followed on that line by a printable character. In particular, an
"=" at the end of an encoded line, indicating a soft line break
(see rule #5) may follow one or more TAB (HT) or SPACE characters.
It follows that an octet with value 9 or 32 appearing at the end
of an encoded line must be represented according to Rule #1. This
rule is necessary because some MTAs (Message Transport Agents,
programs which transport messages from one user to another, or
perform a part of such transfers) are known to pad lines of text
with SPACEs, and others are known to remove "white space"
characters from the end of a line. Therefore, when decoding a
Quoted-Printable body, any trailing white space on a line must be
deleted, as it will necessarily have been added by intermediate
transport agents.

Rule #4 (Line Breaks): A line break in a text body, independent of
what its representation is following the canonical representation
of the data being encoded, must be represented by a (RFC 822) line
break, which is a CRLF sequence, in the Quoted-Printable encoding.
Since the canonical representation of types other than text does
not generally include the representation of line breaks, no hard
line breaks (i.e. line breaks that are intended to be meaningful
and to be displayed to the user) should occur in the
quoted-printable encoding of such types. Of course, occurrences of
"=0D", "=0A", "=0A=0D" and "=0D=0A" will eventually be encountered.
In general, however, base64 is preferred over quoted-printable for
binary data.

Note that many implementations may elect to encode the local
representation of various content types directly, as described in
Appendix G. In particular, this may apply to plain text material
on systems that use newline conventions other than CRLF
delimiters. Such an implementation is permissible, but the
generation of line breaks must be generalized to account for the
case where alternate representations of newline sequences are
used.

Rule #5 (Soft Line Breaks): The Quoted-Printable encoding REQUIRES
that encoded lines be no more than 76 characters long. If longer
lines are to be encoded with the Quoted-Printable encoding, 'soft'
line breaks must be used. An equal sign as the last character on an
encoded line indicates such a non-significant ('soft') line break
in the encoded text. Thus if the "raw" form of the line is a
single unencoded line that says:

Now's the time for all folk to come to the aid of
their country.

This can be represented, in the Quoted-Printable encoding, as

Now's the time =
for all folk to come =
to the aid of their country.

This provides a mechanism with which long lines are encoded in
such a way as to be restored by the user agent. The 76 character
limit does not count the trailing CRLF, but counts all other
characters, including any equal signs.
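
The five rules above can be combined into a small encoder. The
following is a sketch under stated assumptions, not a reference
implementation: the function names are hypothetical, the input is
assumed to be text already in canonical form with CRLF line breaks,
and every octet that Rules #2 and #3 do not cover is encoded by
Rule #1.

```python
def qp_encode_line(line: bytes, limit: int = 76) -> list:
    """Encode one line (without its CRLF) into one or more encoded
    lines, each at most `limit` characters long."""
    tokens = []
    for i, octet in enumerate(line):
        is_last = i == len(line) - 1
        if 33 <= octet <= 126 and octet != 61:
            tokens.append(chr(octet))        # Rule #2: literal
        elif octet in (9, 32) and not is_last:
            tokens.append(chr(octet))        # Rule #3: inner white space
        else:
            tokens.append("=%02X" % octet)   # Rule #1: "=" plus hex

    out, current = [], ""
    for token in tokens:
        # Keep one column free so a soft break "=" always fits.
        if len(current) + len(token) > limit - 1:
            out.append(current + "=")        # Rule #5: soft line break
            current = ""
        current += token
    out.append(current)
    return out


def qp_encode(body: bytes) -> bytes:
    """Encode canonical text; CRLF stays a hard line break (Rule #4)."""
    encoded = []
    for line in body.split(b"\r\n"):
        encoded.extend(qp_encode_line(line))
    return "\r\n".join(encoded).encode("ascii")
```

Because a token is never split across a soft line break, an "=XX"
triplet always stays on one encoded line.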

Since the hyphen character ("-") is represented as itself in the
Quoted-Printable encoding, care must be taken, when encapsulating a
quoted-printable encoded body in a multipart entity, to ensure that
the encapsulation boundary does not appear anywhere in the encoded
body. (A good strategy is to choose a boundary that includes a
character sequence such as "=_" which can never appear in a quoted-
printable body. See the definition of multipart messages later in
this document.)
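
One way to follow this strategy is sketched below. The function name
is hypothetical and the use of a random UUID is an assumption; any
unique suffix would do. Since "=" in a quoted-printable body is
always followed by two hexadecimal digits or by the end of the line,
the sequence "=_" cannot occur there.

```python
import uuid


def make_boundary() -> str:
    """Return a boundary string that starts with "=_"."""
    # 2 + 32 characters, well within typical boundary length limits.
    return "=_" + uuid.uuid4().hex
```

A boundary containing "=" would normally need to be quoted when it is
written as a Content-Type parameter value.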

NOTE: The quoted-printable encoding represents something of a
compromise between readability and reliability in transport.
Bodies encoded with the quoted-printable encoding will work
reliably over most mail gateways, but may not work perfectly over
a few gateways, notably those involving translation into EBCDIC.
(In theory, an EBCDIC gateway could decode a quoted-printable body
and re-encode it using base64, but such gateways do not yet
exist.) A higher level of confidence is offered by the base64
Content-Transfer-Encoding. A way to get reasonably reliable
transport through EBCDIC gateways is to also quote the ASCII
characters


!"#$@[\]^`{|}~

according to rule #1. See Appendix B for more information.

Because quoted-printable data is generally assumed to be line-
oriented, it is to be expected that the representation of the breaks
between the lines of quoted printable data may be altered in
transport, in the same manner that plain text mail has always been
altered in Internet mail when passing between systems with differing
newline conventions. If such alterations are likely to constitute a
corruption of the data, it is probably more sensible to use the
base64 encoding rather than the quoted-printable encoding.

WARNING TO IMPLEMENTORS: If binary data are encoded in quoted-
printable, care must be taken to encode CR and LF characters as "=0D"
and "=0A", respectively. In particular, a CRLF sequence in binary
data should be encoded as "=0D=0A". Otherwise, if CRLF were
represented as a hard line break, it might be incorrectly decoded on
platforms with different line break conventions.

For formalists, the syntax of quoted-printable data is described by
the following grammar:

quoted-printable := ([*(ptext / SPACE / TAB) ptext] ["="] CRLF)
; Maximum line length of 76 characters excluding CRLF

ptext := octet / <any ASCII character except "=", SPACE, or TAB>
; characters not listed as "mail-safe" in Appendix B
; are also not recommended.

octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
; octet must be used for characters > 127, =, SPACE, or TAB,
; and is recommended for any characters not listed in
; Appendix B as "mail-safe".
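
A per-line syntax check can be sketched with a regular expression.
This is an illustration only; the names are hypothetical. It follows
the prose of Rules #1 to #5 rather than transcribing the grammar
literally, because Rule #3 allows a soft-break "=" to follow white
space. It also assumes that literal characters are limited to the
printable range named in Rule #2, and it is strict about uppercase
hexadecimal digits.

```python
import re

# "=XX" triplets, literal printable characters other than "=",
# SPACE or TAB, then an optional soft-break "=" at the very end.
_QP_LINE = re.compile(r"(?:=[0-9A-F]{2}|[!-<>-~]|[ \t])*=?")


def is_valid_qp_line(line: str) -> bool:
    """Check one encoded line, given without its trailing CRLF."""
    if len(line) > 76:
        return False                     # Rule #5 length limit
    if line.endswith((" ", "\t")):
        return False                     # Rule #3: no trailing white space
    return _QP_LINE.fullmatch(line) is not None
```

A receiver that wants to be robust could relax the hexadecimal
pattern to accept lowercase digits as well.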

5.2. Base64 Content-Transfer-Encoding

The Base64 Content-Transfer-Encoding is designed to represent
arbitrary sequences of octets in a form that need not be humanly
readable. The encoding and decoding algorithms are simple, but the
encoded data are consistently only about 33 percent larger than the
unencoded data. This encoding is virtually identical to the one used
in Privacy Enhanced Mail (PEM) applications, as defined in RFC 1421.
The base64 encoding is adapted from RFC 1421, with one change: base64
eliminates the "*" mechanism for embedded clear text.

A 65-character subset of US-ASCII is used, enabling 6 bits to be
represented per printable character. (The extra 65th character, "=",
is used to signify a special processing function.)

NOTE: This subset has the important property that it is
represented identically in all versions of ISO 646, including US
ASCII, and all characters in the subset are also represented
identically in all versions of EBCDIC. Other popular encodings,
such as the encoding used by the uuencode utility and the base85
encoding specified as part of Level 2 PostScript, do not share
these properties, and thus do not fulfill the portability
requirements a binary transport encoding for mail must meet.

The encoding process represents 24-bit groups of input bits as output
strings of 4 encoded characters. Proceeding from left to right, a
24-bit input group is formed by concatenating 3 8-bit input groups.
These 24 bits are then treated as 4 concatenated 6-bit groups, each
of which is translated into a single digit in the base64 alphabet.
When encoding a bit stream via the base64 encoding, the bit stream
must be presumed to be ordered with the most-significant-bit first.
That is, the first bit in the stream will be the high-order bit in
the first byte, and the eighth bit will be the low-order bit in the
first byte, and so on.

Each 6-bit group is used as an index into an array of 64 printable
characters. The character referenced by the index is placed in the
output string. These characters, identified in Table 1, below, are
selected so as to be universally representable, and the set excludes
characters with particular significance to SMTP (e.g., ".", CR, LF)
and to the encapsulation boundaries defined in this document (e.g.,
"-").

Table 1: The Base64 Alphabet

Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A            17 R            34 i            51 z
    1 B            18 S            35 j            52 0
    2 C            19 T            36 k            53 1
    3 D            20 U            37 l            54 2
    4 E            21 V            38 m            55 3
    5 F            22 W            39 n            56 4
    6 G            23 X            40 o            57 5
    7 H            24 Y            41 p            58 6
    8 I            25 Z            42 q            59 7
    9 J            26 a            43 r            60 8
   10 K            27 b            44 s            61 9
   11 L            28 c            45 t            62 +
   12 M            29 d            46 u            63 /
   13 N            30 e            47 v
   14 O            31 f            48 w         (pad) =
   15 P            32 g            49 x
   16 Q            33 h            50 y

The output stream (encoded bytes) must be represented in lines of no
more than 76 characters each. All line breaks or other characters
not found in Table 1 must be ignored by decoding software. In base64
data, characters other than those in Table 1, line breaks, and other
white space probably indicate a transmission error, about which a
warning message or even a message rejection might be appropriate
under some circumstances.
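
Breaking the encoded stream into lines is straightforward, as the
following sketch shows. The function name is hypothetical, and CRLF
is assumed as the line break for mail transport.

```python
def wrap76(encoded: str, width: int = 76) -> str:
    """Split a base64 character stream into lines of at most
    `width` characters, joined by CRLF."""
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return "\r\n".join(lines)
```

Since decoders must ignore line breaks, the position of the breaks
carries no meaning; only the maximum line length matters.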

Special processing is performed if fewer than 24 bits are available
at the end of the data being encoded. A full encoding quantum is
always completed at the end of a body. When fewer than 24 input bits
are available in an input group, zero bits are added (on the right)
to form an integral number of 6-bit groups. Padding at the end of
the data is performed using the '=' character. Since all base64
input is an integral number of octets, only the following cases can
arise: (1) the final quantum of encoding input is an integral
multiple of 24 bits; here, the final unit of encoded output will be
an integral multiple of 4 characters with no "=" padding, (2) the
final quantum of encoding input is exactly 8 bits; here, the final
unit of encoded output will be two characters followed by two "="
padding characters, or (3) the final quantum of encoding input is
exactly 16 bits; here, the final unit of encoded output will be three
characters followed by one "=" padding character.
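
The encoding process and the three padding cases can be sketched as
follows. This is an illustrative implementation with hypothetical
names; it produces one unbroken string and leaves line wrapping to
the caller.

```python
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_encode(data: bytes) -> str:
    """Encode octets as base64 characters with "=" padding."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Form a 24-bit group; missing octets become zero bits
        # added on the right.
        group = int.from_bytes(chunk, "big") << (8 * (3 - len(chunk)))
        # Split into four 6-bit indexes, most significant first.
        chars = [B64_ALPHABET[(group >> shift) & 0x3F]
                 for shift in (18, 12, 6, 0)]
        if len(chunk) == 1:
            chars[2:] = ["=", "="]     # case (2): 8 bits of input
        elif len(chunk) == 2:
            chars[3:] = ["="]          # case (3): 16 bits of input
        out.extend(chars)
    return "".join(out)
```

An input whose length is a multiple of three falls under case (1)
and produces no padding at all.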

Because it is used only for padding at the end of the data, the
occurrence of any '=' characters may be taken as evidence that the
end of the data has been reached (without truncation in transit). No
such assurance is possible, however, when the number of octets
transmitted was a multiple of three.

Any characters outside of the base64 alphabet are to be ignored in
base64-encoded data. The same applies to any illegal sequence of
characters in the base64 encoding, such as "=====".
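
A tolerant decoder along these lines might be written as below. It is
a sketch resting on two assumptions that the text does not spell out:
the first "=" is treated as the end of the data, and a single
dangling character that cannot form a full octet is dropped. The
function name is hypothetical; the final step uses the standard
base64 module.

```python
import base64
import re

_NOT_ALPHABET = re.compile(rb"[^A-Za-z0-9+/]")


def b64_decode_lenient(data: bytes) -> bytes:
    """Decode base64, ignoring anything outside the alphabet."""
    # Assumption: everything from the first "=" onward is padding.
    payload = data.split(b"=", 1)[0]
    # Drop line breaks, white space, and any other foreign characters.
    payload = _NOT_ALPHABET.sub(b"", payload)
    # Assumption: one leftover character carries no complete octet.
    if len(payload) % 4 == 1:
        payload = payload[:-1]
    # Restore canonical padding before handing over to the stdlib.
    payload += b"=" * (-len(payload) % 4)
    return base64.b64decode(payload)
```

An implementation that prefers to warn about likely transmission
errors could count the characters it discards instead of dropping
them silently.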

Care must be taken to use the proper octets for line breaks if base64
encoding is applied directly to text material that has not been
converted to canonical form. In particular, text line breaks must be
converted into CRLF sequences prior to base64 encoding. The important
thing to note is that this may be done directly by the encoder rather
than in a prior canonicalization step in some implementations.
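
An encoder that performs this conversion itself could be sketched as
follows. The function name is hypothetical, and the local text is
assumed to use CR, LF, or CRLF as its line break.

```python
import base64
import re

_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def b64_encode_text(text: str, charset: str = "us-ascii") -> bytes:
    """Canonicalize line breaks to CRLF, then base64-encode."""
    canonical = _LINE_BREAK.sub("\r\n", text)
    return base64.b64encode(canonical.encode(charset))
```

Text that already uses CRLF passes through the substitution
unchanged, so the function is safe to apply to canonical input.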

NOTE: There is no need to worry about quoting apparent
encapsulation boundaries within base64-encoded parts of multipart
entities because no hyphen characters are used in the base64
encoding.

6. Additional Content-Header Fields

6.1. Optional Content-ID Header Field

In constructing a high-level user agent, it may be desirable to allow
one body to make reference to another. Accordingly, bodies may be
labeled using the "Content-ID" header field, which is syntactically
identical to the "Message-ID" header field:

id := "Content-ID" ":" msg-id

Like the Message-ID values, Content-ID values must be generated to be
world-unique.
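
Since the value has the same syntax as a Message-ID, a generator for
message identifiers can be reused. The sketch below relies on
make_msgid from Python's standard email.utils module; the wrapper
name and the "example.com" domain are placeholders.

```python
from email.utils import make_msgid


def make_content_id(domain: str = "example.com") -> str:
    """Return a msg-id style value, angle brackets included."""
    return make_msgid(domain=domain)


header_line = "Content-ID: " + make_content_id()
```

Uniqueness here depends on the generator and on the domain being one
that the sender actually controls.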

The Content-ID value may be used for uniquely identifying MIME
entities in several contexts, particularly for cacheing data
referenced by the message/external-body mechanism. Although the
Content-ID header is generally optional, its use is mandatory in
implementations which generate data of the optional MIME Content-type
"message/external-body". That is, each message/external-body entity
must have a Content-ID field to permit cacheing of such data.

It is also worth noting that the Content-ID value has special
semantics in the case of the multipart/alternative content-type.
This is explained in the section of this document dealing with
multipart/alternative.

6.2. Optional Content-Description Header Field

The ability to associate some descriptive information with a given
body is often desirable. For example, it may be useful to mark an
"image" body as "a picture of the Space Shuttle Endeavor." Such text
may be placed in the Content-Description header field.

description := "Content-Description" ":" *text

The description is presumed to be given in the US-ASCII character
set, although the mechanism specified in [RFC-1522] may be used for
non-US-ASCII Content-Description values.

7. The Predefined Content-Type Values

This document defines seven initial Content-Type values and an
extension mechanism for private or experimental types. Further
standard types must be defined by new published specifications. It
is expected that most innovation in new types of mail will take place
as subtypes of the seven types defined here. The most essential
characteristics of the seven content-types are summarized in Appendix
F.

7.1. The Text Content-Type

The text Content-Type is intended for sending material which is
principally textual in form. It is the default Content-Type. A
"charset" parameter may be used to indicate the character set of the
body text for some text subtypes, notably including the primary
subtype, "text/plain", which indicates plain (unformatted) text. The
default Content-Type for Internet mail is
"text/plain; charset=us-ascii".

Beyond plain text, there are many formats for representing what might
be known as "extended text" -- text with embedded formatting and
presentation information. An interesting characteristic of many such
representations is that they are to some extent readable even without
the software that interprets them. It is useful, then, to
distinguish them, at the highest level, from such unreadable data as
images, audio, or text represented in an unreadable form. In the
absence of appropriate interpretation software, it is reasonable to
show subtypes of text to the user, while it is not reasonable to do
so with most nontextual data.

Such formatted textual data should be represented using subtypes of
text. Plausible subtypes of text are typically given by the common
name of the representation format, e.g., "text/richtext" [RFC-1341].

7.1.1. The charset parameter

A critical parameter that may be specified in the Content-Type field
for text/plain data is the character set. This is specified with a
"charset" parameter, as in:

Content-type: text/plain; charset=us-ascii

Unlike some other parameter values, the values of the charset
parameter are NOT case sensitive. The default character set, which
must be assumed in the absence of a charset parameter, is US-ASCII.
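
Reading the charset with the default applied can be sketched with
Python's standard email package. The wrapper name is hypothetical;
get_content_charset returns the parameter value in lower case, which
suits a case-insensitive comparison.

```python
from email import message_from_string


def body_charset(header_block: str) -> str:
    """Return the charset of an entity, defaulting to us-ascii."""
    msg = message_from_string(header_block + "\n\n")
    return msg.get_content_charset(failobj="us-ascii")
```

For example, a header block containing only
"Content-Type: text/plain" yields "us-ascii".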

The specification for any future subtypes of "text" must specify
whether or not they will also utilize a "charset" parameter, and may
possibly restrict its values as well. When used with a particular
body, the semantics of the "charset" parameter should be identical to
those specified here for "text/plain", i.e., the body consists
entirely of characters in the given charset. In particular, definers
of future text subtypes should pay close attention to the
implications of multibyte character sets for their subtype
definitions.

This RFC specifies the definition of the charset parameter for the
purposes of MIME to be a unique mapping of a byte stream to glyphs, a
mapping which does not require external profiling information.

An initial list of predefined character set names can be found at the
end of this section. Additional character sets may be registered
with IANA, although the standardization of their use requires the
usual IESG [RFC-1340] review and approval. Note that if