As per Relevance of the word parameter, we have this RFC below:

Network Working Group                                          M. Murata
Request for Comments: 3023                 IBM Tokyo Research Laboratory
Obsoletes: 2376                                            S. St.Laurent
Updates: 2048                                               simonstl.com
Category: Standards Track                                        D. Kohn
                                                        Skymoon Ventures
                                                            January 2001

                            XML Media Types

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

This document standardizes five new media types -- text/xml,
application/xml, text/xml-external-parsed-entity,
application/xml-external-parsed-entity, and application/xml-dtd -- for
use in exchanging network entities that are related to the Extensible
Markup Language (XML). This document also standardizes a convention
(using the suffix '+xml') for naming media types outside of these five
types when those media types represent XML MIME (Multipurpose Internet
Mail Extensions) entities. XML MIME entities are currently exchanged via
the HyperText Transfer Protocol on the World Wide Web, are an
integral part of the WebDAV protocol for remote web authoring, and
are expected to have utility in many domains.

Major differences from RFC 2376 are (1) the addition of
text/xml-external-parsed-entity, application/xml-external-parsed-entity,
and application/xml-dtd, (2) the '+xml' suffix convention (which also
updates the RFC 2048 registration process), and (3) the discussion of
"utf-16le" and "utf-16be".
Murata, et al. Standards Track [Page 1]
RFC 3023 XML Media Types January 2001
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Notational Conventions . . . . . . . . . . . . . . . . . . . 4
3. XML Media Types . . . . . . . . . . . . . . . . . . . . . . 5
3.1 Text/xml Registration . . . . . . . . . . . . . . . . . . . 7
3.2 Application/xml Registration . . . . . . . . . . . . . . . . 9
3.3 Text/xml-external-parsed-entity Registration . . . . . . . . 11
3.4 Application/xml-external-parsed-entity Registration . . . . 12
3.5 Application/xml-dtd Registration . . . . . . . . . . . . . . 13
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4. The Byte Order Mark (BOM) and Conversions to/from the UTF-16
Charset . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5. Fragment Identifiers . . . . . . . . . . . . . . . . . . . . 15
6. The Base URI . . . . . . . . . . . . . . . . . . . . . . . . 15
7. A Naming Convention for XML-Based Media Types . . . . . . . 16
7.1 Referencing . . . . . . . . . . . . . . . . . . . . . . . . 18
8. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 18
8.1 Text/xml with UTF-8 Charset . . . . . . . . . . . . . . . . 19
8.2 Text/xml with UTF-16 Charset . . . . . . . . . . . . . . . . 19
8.3 Text/xml with UTF-16BE Charset . . . . . . . . . . . . . . . 19
8.4 Text/xml with ISO-2022-KR Charset . . . . . . . . . . . . . 20
8.5 Text/xml with Omitted Charset . . . . . . . . . . . . . . . 20
8.6 Application/xml with UTF-16 Charset . . . . . . . . . . . . 20
8.7 Application/xml with UTF-16BE Charset . . . . . . . . . . . 21
8.8 Application/xml with ISO-2022-KR Charset . . . . . . . . . . 21
8.9 Application/xml with Omitted Charset and UTF-16 XML
Entity . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
8.10 Application/xml with Omitted Charset and UTF-8 Entity . . . 22
8.11 Application/xml with Omitted Charset and Internal
Declaration . . . . . . . . . . . . . . . . . . . . . . . . 22
8.12 Text/xml-external-parsed-entity with UTF-8 Charset . . . . . 22
8.13 Application/xml-external-parsed-entity with UTF-16 Charset . 23
8.14 Application/xml-external-parsed-entity with UTF-16BE Charset 23
8.15 Application/xml-dtd . . . . . . . . . . . . . . . . . . . . 23
8.16 Application/mathml+xml . . . . . . . . . . . . . . . . . . . 24
8.17 Application/xslt+xml . . . . . . . . . . . . . . . . . . . . 24
8.18 Application/rdf+xml . . . . . . . . . . . . . . . . . . . . 24
8.19 Image/svg+xml . . . . . . . . . . . . . . . . . . . . . . . 24
8.20 INCONSISTENT EXAMPLE: Text/xml with UTF-8 Charset . . . . . 25
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . 25
10. Security Considerations . . . . . . . . . . . . . . . . . . 25
References . . . . . . . . . . . . . . . . . . . . . . . . . 27
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . 31
A. Why Use the '+xml' Suffix for XML-Based MIME Types? . . . . 32
A.1 Why not just use text/xml or application/xml and let the XML
processor dispatch to the correct application based on the
referenced DTD? . . . . . . . . . . . . . . . . . . . . . . 32
A.2 Why not create a new subtree (e.g., image/xml.svg) to
represent XML MIME types? . . . . . . . . . . . . . . . . . 32
A.3 Why not create a new top-level MIME type for XML-based media
types? . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
A.4 Why not just have the MIME processor 'sniff' the content to
determine whether it is XML? . . . . . . . . . . . . . . . . 33
A.5 Why not use a MIME parameter to specify that a media type
uses XML syntax? . . . . . . . . . . . . . . . . . . . . . . 33
A.6 How about labeling with parameters in the other direction
(e.g., application/xml; Content-Feature=iotp)? . . . . . . . 34
A.7 How about a new superclass MIME parameter that is defined to
apply to all MIME types (e.g., Content-Type:
application/iotp; $superclass=xml)? . . . . . . . . . . . . 34
A.8 What about adding a new parameter to the Content-Disposition
header or creating a new Content-Structure header to
indicate XML syntax? . . . . . . . . . . . . . . . . . . . . 35
A.9 How about a new Alternative-Content-Type header? . . . . . . 35
A.10 How about using a conneg tag instead (e.g., accept-features:
(syntax=xml))? . . . . . . . . . . . . . . . . . . . . . . . 35
A.11 How about a third-level content-type, such as text/xml/rdf? 35
A.12 Why use the plus ('+') character for the suffix '+xml'? . . 36
A.13 What is the semantic difference between application/foo and
application/foo+xml? . . . . . . . . . . . . . . . . . . . . 36
A.14 What happens when an even better markup language (e.g.,
EBML) is defined, or a new category of data? . . . . . . . . 36
A.15 Why must I use the '+xml' suffix for my new XML-based media
type? . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
B. Changes from RFC 2376 . . . . . . . . . . . . . . . . . . . 37
C. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 38
Full Copyright Statement . . . . . . . . . . . . . . . . . . 39

1. Introduction

The World Wide Web Consortium has issued Extensible Markup Language
(XML) 1.0 (Second Edition)[XML]. To enable the exchange of XML
network entities, this document standardizes five new media types --
text/xml, application/xml, text/xml-external-parsed-entity,
application/xml-external-parsed-entity, and application/xml-dtd -- as
well as a naming convention for identifying XML-based MIME media
types.

XML entities are currently exchanged on the World Wide Web, and XML
is also used for property values and parameter marshalling by the
WebDAV[RFC2518] protocol for remote web authoring. Thus, there is a
need for a media type to properly label the exchange of XML network
entities.

Although XML is a subset of the Standard Generalized Markup Language
(SGML) ISO 8879[SGML], which has been assigned the media types
text/sgml and application/sgml, there are several reasons why use of
text/sgml or application/sgml to label XML is inappropriate. First,
there exist many applications that can process XML, but that cannot
process SGML, due to SGML's larger feature set. Second, SGML
applications cannot always process XML entities, because XML uses
features of recent technical corrigenda to SGML. Third, the
definition of text/sgml and application/sgml in [RFC1874] includes
parameters for SGML bit combination transformation format
(SGML-bctf), and SGML boot attribute (SGML-boot). Since XML does not use
these parameters, it would be ambiguous if such parameters were given
for an XML MIME entity. For these reasons, the best approach for
labeling XML network entities is to provide new media types for XML.

Since XML is an integral part of the WebDAV Distributed Authoring
Protocol, and since World Wide Web Consortium Recommendations have
conventionally been assigned IETF tree media types, and since similar
media types (HTML, SGML) have been assigned IETF tree media types,
the XML media types also belong in the IETF media types tree.

Similarly, XML will be used as a foundation for other media types,
including types in every branch of the IETF media types tree. To
facilitate the processing of such types, media types based on XML,
but that are not identified using text/xml or application/xml, SHOULD
be named using a suffix of '+xml' as described in Section 7. This
will allow XML-based tools -- browsers, editors, search engines, and
other processors -- to work with all XML-based media types.

2. Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].

As defined in [RFC2781], the three charsets "utf-16", "utf-16le", and
"utf-16be" are used to label UTF-16 text. In this document, "the
UTF-16 family" refers to those three charsets. By contrast, the
phrases "utf-16" or UTF-16 in this document refer specifically to the
single charset "utf-16".

As sometimes happens between two communities, both MIME and XML have
defined the term entity, with different meanings. Section 2.4 of
[RFC2045] says:

   "The term 'entity' refers specifically to the MIME-defined header
   fields and contents of either a message or one of the parts in the
   body of a multipart entity."

Section 4 of [XML] says:

   "An XML document may consist of one or many storage units" called
   entities that "have content" and are normally "identified by
   name".

In this document, "XML MIME entity" is defined as the latter (an XML
entity) encapsulated in the former (a MIME entity).

3. XML Media Types

This document standardizes five media types related to XML MIME
entities: text/xml, application/xml, text/xml-external-parsed-entity,
application/xml-external-parsed-entity, and application/xml-dtd.
Registration information for these media types is described in the
sections below.

Within the XML specification, XML MIME entities can be classified
into four types. In the XML terminology, they are called "document
entities", "external DTD subsets", "external parsed entities", and
"external parameter entities". The media types text/xml and
application/xml MAY be used for "document entities", while
text/xml-external-parsed-entity or application/xml-external-parsed-entity
SHOULD be used for "external parsed entities". The media type
application/xml-dtd SHOULD be used for "external DTD subsets" or
"external parameter entities". application/xml and text/xml MUST NOT
be used for "external parameter entities" or "external DTD subsets",
and MUST NOT be used for "external parsed entities" unless they are
also well-formed "document entities" and are referenced as such.
Note that [RFC2376] (which this document obsoletes) allowed such
usage, although in practice it is likely to have been rare.

Neither external DTD subsets nor external parameter entities parse as
XML documents, and while some XML document entities may be used as
external parsed entities and vice versa, there are many cases where
the two are not interchangeable. XML also has unparsed entities,
internal parsed entities, and internal parameter entities, but they
are not XML MIME entities.

If an XML document -- that is, the unprocessed, source XML document
-- is readable by casual users, text/xml is preferable to
application/xml. MIME user agents (and web user agents) that do not
have explicit support for text/xml will treat it as text/plain, for
example, by displaying the XML MIME entity as plain text.
Application/xml is preferable when the XML MIME entity is unreadable
by casual users. Similarly, text/xml-external-parsed-entity is
preferable when an external parsed entity is readable by casual
users, but application/xml-external-parsed-entity is preferable when
a plain text display is inappropriate.

   NOTE: Users are in general not used to text containing tags such
   as <price>, and often find such tags quite disorienting or
   annoying. If one is not sure, the conservative principle would
   suggest using application/* instead of text/* so as not to put
   information in front of users that they will quite likely not
   understand.

The top-level media type "text" has some restrictions on MIME
entities and they are described in [RFC2045] and [RFC2046]. In
particular, the UTF-16 family, UCS-4, and UTF-32 are not allowed
(except over HTTP[RFC2616], which uses a MIME-like mechanism). Thus,
if an XML document or external parsed entity is encoded in such
character encoding schemes, it cannot be labeled as text/xml or
text/xml-external-parsed-entity (except for HTTP).
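
The restriction above can be sketched as a small check. This is only an
illustration in Python: the function name can_label_as_text and the
charset label spellings in TEXT_RESTRICTED (in particular
"iso-10646-ucs-4" for UCS-4) are assumptions made for the example and
are not defined by this document.

```python
# Charsets that the passage above excludes from the "text" top-level
# type outside HTTP. The label spellings are an assumption of this sketch.
TEXT_RESTRICTED = frozenset({
    "utf-16", "utf-16le", "utf-16be",  # the UTF-16 family
    "iso-10646-ucs-4",                 # UCS-4
    "utf-32",
})


def can_label_as_text(charset: str, over_http: bool) -> bool:
    """Return True if an entity in `charset` may be labelled text/xml or
    text/xml-external-parsed-entity on the given kind of transport."""
    if over_http:
        # HTTP uses a MIME-like mechanism that is exempt from the restriction.
        return True
    return charset.strip().lower() not in TEXT_RESTRICTED
```

When the check returns False, the application/* counterpart of the
media type is the natural alternative.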

Text/xml and application/xml behave differently when the charset
parameter is not explicitly specified. If the default charset (i.e.,
US-ASCII) for text/xml is inconvenient for some reason (e.g., bad web
servers), application/xml provides an alternative (see "Optional
parameters" of application/xml registration in Section 3.2). The
same rules apply to the distinction between
text/xml-external-parsed-entity and application/xml-external-parsed-entity.

XML provides a general framework for defining sequences of structured
data. In some cases, it may be desirable to define new media types
that use XML but define a specific application of XML, perhaps due to
domain-specific security considerations or runtime information.
Furthermore, such media types may allow UTF-8 or UTF-16 only and
prohibit other charsets. This document does not prohibit such media
types and in fact expects them to proliferate. However, developers
of such media types are STRONGLY RECOMMENDED to use this document as
a basis for their registration. In particular, the charset parameter
SHOULD be used in the same manner, as described in Section 7.1, in
order to enhance interoperability.

An XML document labeled as text/xml or application/xml might contain
namespace declarations, stylesheet-linking processing instructions
(PIs), schema information, or other declarations that might be used
to suggest how the document is to be processed. For example, a
document might have the XHTML namespace and a reference to a CSS
stylesheet. Such a document might be handled by applications that
would use this information to dispatch the document for appropriate
processing.

3.1 Text/xml Registration

MIME media type name: text

MIME subtype name: xml

Mandatory parameters: none

Optional parameters: charset

Although listed as an optional parameter, the use of the charset
parameter is STRONGLY RECOMMENDED, since this information can be
used by XML processors to determine authoritatively the character
encoding of the XML MIME entity. The charset parameter can also
be used to provide protocol-specific operations, such as
charset-based content negotiation in HTTP. "utf-8" [RFC2279] is the
recommended value, representing the UTF-8 charset. UTF-8 is
supported by all conforming processors of [XML].

If the XML MIME entity is transmitted via HTTP, which uses a
MIME-like mechanism that is exempt from the restrictions on the
text top-level type (see section 19.4.1 of [RFC2616]), "utf-16"
[RFC2781] is also recommended. UTF-16 is supported by all
conforming processors of [XML]. Since the handling of CR, LF and
NUL for text types in most MIME applications would cause undesired
transformations of individual octets in UTF-16 multi-octet
characters, gateways from HTTP to these MIME applications MUST
transform the XML MIME entity from text/xml; charset="utf-16" to
application/xml; charset="utf-16".
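
As a rough illustration of the gateway rule above, the following Python
sketch relabels a Content-Type value before it is handed to a MIME
application. The function name relabel_for_mime_gateway is hypothetical;
the standard library's email.message.Message is used only to parse the
header value, and only the single text/xml; charset="utf-16" case named
in the text is handled.

```python
from email.message import Message


def relabel_for_mime_gateway(content_type: str) -> str:
    """Return the Content-Type value an HTTP-to-MIME gateway would emit.

    Only text/xml with charset "utf-16" is rewritten; every other value
    is passed through unchanged.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()   # lower-cased "type/subtype"
    charset = msg.get_content_charset()   # lower-cased charset, or None
    if media_type == "text/xml" and charset == "utf-16":
        return 'application/xml; charset="utf-16"'
    return content_type
```

The entity body itself is not touched; only the label changes, so that
MIME text handling of CR, LF and NUL is not applied to UTF-16 octets.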

Conformant with [RFC2046], if a text/xml entity is received with
the charset parameter omitted, MIME processors and XML processors
MUST use the default charset value of "us-ascii"[ASCII]. In cases
where the XML MIME entity is transmitted via HTTP, the default
charset value is still "us-ascii". (Note: There is an
inconsistency between this specification and HTTP/1.1, which uses
ISO-8859-1[ISO8859] as the default for a historical reason. Since
XML is a new format, a new default should be chosen for better
I18N. US-ASCII was chosen, since it is the intersection of UTF-8
and ISO-8859-1 and since it is already used by MIME.)

There are several reasons that the charset parameter is
authoritative. First, some MIME processing engines do transcoding
of MIME bodies of the top-level media type "text" without
reference to any of the internal content. Thus, it is possible
that some agent might change text/xml; charset="iso-2022-jp" to
text/xml; charset="utf-8" without modifying the encoding
declaration of an XML document. Second, text/xml must be
compatible with text/plain, since MIME agents that do not
understand text/xml will fallback to handling it as text/plain.
If the charset parameter for text/xml were not authoritative, such
fallback would cause data corruption. Third, recent web servers
have been improved so that users can specify the charset
parameter. Fourth, [RFC2130] specifies that the recommended
specification scheme is the "charset" parameter.

Since the charset parameter is authoritative, the charset is not
always declared within an XML encoding declaration. Thus, special
care is needed when the recipient strips the MIME header and
provides persistent storage of the received XML MIME entity (e.g.,
in a file system). Unless the charset is UTF-8 or UTF-16, the
recipient SHOULD also persistently store information about the
charset, perhaps by embedding a correct XML encoding declaration
within the XML MIME entity.
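
One way to follow the storage advice above is to rewrite (or add) the
XML declaration before saving. The Python sketch below is an assumption
about how a recipient might do this, not a procedure defined by this
document: the name embed_charset is hypothetical, the regular expression
covers only an ordinary XML declaration at the very start of a document
entity, and the charset label must be one that Python's codecs recognise.

```python
import re

# Matches a leading XML declaration. The "tail" group keeps whatever
# follows the optional encoding pseudo-attribute (e.g. standalone="yes").
_XML_DECL = re.compile(
    r'^<\?xml\s+version\s*=\s*(["\'])(?P<version>.*?)\1'
    r'(?:\s+encoding\s*=\s*(["\']).*?\3)?'
    r'(?P<tail>[^?]*)\?>'
)


def embed_charset(body: bytes, charset: str) -> bytes:
    """Return `body` with an encoding declaration naming `charset`.

    `charset` is the authoritative value from the MIME header. Entities
    in UTF-8 or UTF-16 are returned unchanged, as the text permits.
    """
    if charset.lower() in ("utf-8", "utf-16"):
        return body
    text = body.decode(charset)
    match = _XML_DECL.match(text)
    if match:
        version, tail = match["version"], match["tail"]
        rest = text[match.end():]
    else:
        # No declaration present: add one, assuming XML 1.0.
        version, tail, rest = "1.0", "", text
    declaration = f'<?xml version="{version}" encoding="{charset}"{tail}?>'
    return (declaration + rest).encode(charset)
```

An existing but incorrect encoding declaration is replaced, since the
charset parameter, not the declaration, is authoritative for text/xml.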

Encoding considerations: This media type MAY be encoded as
appropriate for the charset and the capabilities of the underlying
MIME transport. For 7-bit transports, data in UTF-8 MUST be
encoded in quoted-printable or base64. For 8-bit clean transport
(e.g., 8BITMIME[RFC1652] ESMTP or NNTP[RFC0977]), UTF-8 does not
need to be encoded. Over HTTP[RFC2616], no content-transfer-encoding
is necessary and UTF-16 may also be used.

Security considerations: See Section 10.

Interoperability considerations: XML has proven to be interoperable
across WebDAV clients and servers, and for import and export from
multiple XML authoring tools. For maximum interoperability,
validating processors are recommended. Although non-validating
processors may be more efficient, they are not required to handle
all features of XML. For further information, see sub-section 2.9
"Standalone Document Declaration" and section 5 "Conformance" of
[XML].

Published specification: Extensible Markup Language (XML) 1.0 (Second
Edition)[XML].

Applications which use this media type: XML is device-, platform-,
and vendor-neutral and is supported by a wide range of Web user
agents, WebDAV[RFC2518] clients and servers, as well as XML
authoring tools.

Additional information:

Magic number(s): None.

Although no byte sequences can be counted on to always be
present, XML MIME entities in ASCII-compatible charsets
(including UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C
("<?xml"), and those in UTF-16 often begin with hexadecimal
FE FF 00 3C 00 3F 00 78 00 6D 00 6C or FF FE 3C 00 3F 00 78 00 6D
00 6C 00 (the Byte Order Mark (BOM) followed by "<?xml"). For
more information, see Appendix F of [XML].
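
The byte sequences above can be checked with a few lines of Python.
This is a heuristic sketch only (the text is explicit that no sequence
is guaranteed to be present); the function name sniff_xml_signature and
the returned labels are made up for the example.

```python
import codecs
from typing import Optional

_DECL = "<?xml"

# (signature bytes, descriptive label) pairs, UTF-16 forms first.
_SIGNATURES = (
    (codecs.BOM_UTF16_BE + _DECL.encode("utf-16-be"), "utf-16 (big-endian BOM)"),
    (codecs.BOM_UTF16_LE + _DECL.encode("utf-16-le"), "utf-16 (little-endian BOM)"),
    (_DECL.encode("ascii"), "ascii-compatible"),
)


def sniff_xml_signature(data: bytes) -> Optional[str]:
    """Return a label if `data` starts with one of the common XML byte
    sequences listed above, or None if none of them is present."""
    for signature, label in _SIGNATURES:
        if data.startswith(signature):
            return label
    return None
```

A None result does not mean the entity is not XML; an XML entity may
legitimately omit the XML declaration.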

File extension(s): .xml

Macintosh File Type Code(s): "TEXT"

Person and email address for further information:

MURATA Makoto (FAMILY Given)
Simon St.Laurent
Daniel Kohn

Intended usage: COMMON

Author/Change controller: The XML specification is a work product of
the World Wide Web Consortium's XML Working Group, and was edited
by:
Tim Bray
Jean Paoli microsoft.com
C. M. Sperberg-McQueen
Eve Maler

The W3C, and the W3C XML Core Working Group, have change control
over the XML specification.

3.2 Application/xml Registration

MIME media type name: application

MIME subtype name: xml

Mandatory parameters: none

Optional parameters: charset

Although listed as an optional parameter, the use of the charset
parameter is STRONGLY RECOMMENDED, since this information can be
used by XML processors to determine authoritatively the charset of
the XML MIME entity. The charset parameter can also be used to
provide protocol-specific operations, such as charset-based
content negotiation in HTTP.

"utf-8" [RFC2279] and "utf-16" [RFC2781] are the recommended
values, representing the UTF-8 and UTF-16 charsets, respectively.
These charsets are preferred since they are supported by all
conforming processors of [XML].

If an application/xml entity is received where the charset
parameter is omitted, no information is being provided about the
charset by the MIME Content-Type header. Conforming XML
processors MUST follow the requirements in section 4.3.3 of [XML]
that directly address this contingency. However, MIME processors
that are not XML processors SHOULD NOT assume a default charset if
the charset parameter is omitted from an application/xml entity.

There are several reasons that the charset parameter is
authoritative. First, recent web servers have been improved so
that users can specify the charset parameter. Second, [RFC2130]
specifies that the recommended specification scheme is the
"charset" parameter.

On the other hand, it has been argued that the charset parameter
should be omitted and the mechanism described in Appendix F of
[XML] (which is non-normative) should be solely relied on. This
approach would allow users to avoid configuration of the charset
parameter; an XML document stored in a file is likely to contain a
correct encoding declaration or BOM (if necessary), since the
operating system does not typically provide charset information
for files. If users would like to rely on the encoding
declaration or BOM and to hide charset information from protocols,
they may determine not to use the parameter.

Since the charset parameter is authoritative, the charset is not
always declared within an XML encoding declaration. Thus, special
care is needed when the recipient strips the MIME header and
provides persistent storage of the received XML MIME entity (e.g.,
in a file system). Unless the charset is UTF-8 or UTF-16, the
recipient SHOULD also persistently store information about the
charset, perhaps by embedding a correct XML encoding declaration
within the XML MIME entity.

Encoding considerations: This media type MAY be encoded as
appropriate for the charset and the capabilities of the underlying
MIME transport. For 7-bit transports, data in either UTF-8 or
UTF-16 MUST be encoded in quoted-printable or base64. For 8-bit
clean transport (e.g., 8BITMIME[RFC1652] ESMTP or NNTP[RFC0977]),
UTF-8 is not encoded, but the UTF-16 family MUST be encoded in
base64. For binary clean transports (e.g., HTTP[RFC2616]), no
content-transfer-encoding is necessary.
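
The encoding considerations above amount to a small decision table,
sketched here in Python. The function name, the transport labels
"7bit", "8bit" and "binary", and the treatment of "utf-16le" and
"utf-16be" on 7-bit transports the same way as "utf-16" are assumptions
of this sketch; charsets the passage does not mention are rejected
rather than guessed at.

```python
UTF16_FAMILY = frozenset({"utf-16", "utf-16le", "utf-16be"})


def acceptable_transfer_encodings(charset: str, transport: str) -> tuple:
    """Return the content-transfer-encodings usable for application/xml.

    An empty tuple means no content-transfer-encoding is needed.
    """
    charset = charset.strip().lower()
    if transport == "binary":
        # Binary clean transports such as HTTP need no encoding at all.
        return ()
    if charset != "utf-8" and charset not in UTF16_FAMILY:
        raise ValueError(f"charset not covered by this sketch: {charset!r}")
    if transport == "7bit":
        return ("quoted-printable", "base64")
    if transport == "8bit":
        # UTF-8 passes through unencoded; the UTF-16 family needs base64.
        return () if charset == "utf-8" else ("base64",)
    raise ValueError(f"unknown transport: {transport!r}")
```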
Security considerations: See Section 10.
Interoperability considerations: Same as Section 3.1.
Published specification: Same as Section 3.1.
Applications which use this media type: Same as Section 3.1.
Additional information: Same as Section 3.1.
Person and email address for further information: Same as Section
3.1.

Intended usage: COMMON

Author/Change controller: Same as Section 3.1.

3.3 Text/xml-external-parsed-entity Registration

MIME media type name: text

MIME subtype name: xml-external-parsed-entity

Mandatory parameters: none

Optional parameters: charset

The charset parameter of text/xml-external-parsed-entity is
handled the same as that of text/xml as described in Section 3.1.

Encoding considerations: Same as Section 3.1.

Security considerations: See Section 10.

Interoperability considerations: XML external parsed entities are as
interoperable as XML documents, though they have a less tightly
constrained structure and therefore need to be referenced by XML
documents for proper handling by XML processors. Similarly, XML
documents cannot be reliably used as external parsed entities
because external parsed entities are prohibited from having
standalone document declarations or DTDs. Identifying XML
external parsed entities with their own content type should
enhance interoperability of both XML documents and XML external
parsed entities.

Published specification: Same as Section 3.1.

Applications which use this media type: Same as Section 3.1.

Additional information:

Magic number(s): Same as Section 3.1.

File extension(s): .xml or .ent

Macintosh File Type Code(s): "TEXT"

Person and email address for further information: Same as Section
3.1.

Intended usage: COMMON

Author/Change controller: Same as Section 3.1.

3.4 Application/xml-external-parsed-entity Registration

MIME media type name: application

MIME subtype name: xml-external-parsed-entity

Mandatory parameters: none

Optional parameters: charset

The charset parameter of application/xml-external-parsed-entity is
handled the same as that of application/xml as described in
Section 3.2.

Encoding considerations: Same as Section 3.2.

Security considerations: See Section 10.

Interoperability considerations: Same as those for
text/xml-external-parsed-entity as described in Section 3.3.

Published specification: Same as text/xml as described in Section
3.1.

Applications which use this media type: Same as Section 3.1.

Additional information:

Magic number(s): Same as Section 3.1.

File extension(s): .xml or .ent

Macintosh File Type Code(s): "TEXT"

Person and email address for further information: Same as Section
3.1.

Intended usage: COMMON

Author/Change controller: Same as Section 3.1.

3.5 Application/xml-dtd Registration

MIME media type name: application

MIME subtype name: xml-dtd

Mandatory parameters: none

Optional parameters: charset

The charset parameter of application/xml-dtd is handled the same
as that of application/xml as described in Section 3.2.

Encoding considerations: Same as Section 3.2.

Security considerations: See Section 10.

Interoperability considerations: XML DTDs have proven to be
interoperable by DTD authoring tools and XML browsers, among
others.

Published specification: Same as text/xml as described in Section
3.1.

Applications which use this media type: DTD authoring tools handle
external DTD subsets as well as external parameter entities. XML
browsers may also access external DTD subsets and external
parameter entities.

Additional information:

Magic number(s): Same as Section 3.1.

File extension(s): .dtd or .mod

Macintosh File Type Code(s): "TEXT"

Person and email address for further information: Same as Section
3.1.

Intended usage: COMMON

Author/Change controller: Same as Section 3.1.

3.6 Summary

The following list applies to text/xml,
text/xml-external-parsed-entity, and XML-based media types under the
top-level type "text" that define the charset parameter according to
this specification:

o Charset parameter is strongly recommended.

o If the charset parameter is not specified, the default is
"us-ascii". The default of "iso-8859-1" in HTTP is explicitly
overridden.

o No error handling provisions.

o An encoding declaration, if present, is irrelevant, but when
saving a received resource as a file, the correct encoding
declaration SHOULD be inserted.

The next list applies to application/xml,
application/xml-external-parsed-entity, application/xml-dtd, and
XML-based media types under top-level types other than "text" that
define the charset parameter according to this specification:

o Charset parameter is strongly recommended, and if present, it
takes precedence.

o If the charset parameter is omitted, conforming XML processors
MUST follow the requirements in section 4.3.3 of [XML].
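
The two lists above can be condensed into one lookup, sketched here in
Python. The function name charset_from_content_type is hypothetical,
header parsing is delegated to the standard library's
email.message.Message, and the caller is assumed to have already
established that the media type is one of the XML types covered by this
specification.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the charset implied by a Content-Type value for an XML type.

    Returns the charset parameter (lower-cased) when present, "us-ascii"
    for text/* types without one, and None for other top-level types,
    meaning the XML processor must apply section 4.3.3 of [XML] itself.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    declared = msg.get_content_charset()
    if declared is not None:
        return declared  # an explicit charset parameter takes precedence
    if msg.get_content_maintype() == "text":
        return "us-ascii"  # default for text/*, even over HTTP
    return None
```

A None result is deliberately not turned into a default: for
application/* types the text asks non-XML MIME processors not to assume
one.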

4. The Byte Order Mark (BOM) and Conversions to/from the UTF-16 Charset

Section 4.3.3 of [XML] specifies that XML MIME entities in the
charset "utf-16" MUST begin with a byte order mark (BOM), which is a
hexadecimal octet sequence 0xFE 0xFF (or 0xFF 0xFE, depending on
endian). The XML Recommendation further states that the BOM is an
encoding signature, and is not part of either the markup or the
character data of the XML document.

Due to the presence of the BOM, applications that convert XML from
"utf-16" to a non-Unicode encoding MUST strip the BOM before
conversion. Similarly, when converting from another encoding into
"utf-16", the BOM MUST be added after conversion is complete.

In addition to the charset "utf-16", [RFC2781] introduces "utf-16le"
(little endian) and "utf-16be" (big endian) as well. The BOM is
prohibited for these charsets. When an XML MIME entity is encoded in
"utf-16le" or "utf-16be", it MUST NOT begin with the BOM but SHOULD
contain an encoding declaration. Conversion from "utf-16" to
"utf-16be" or "utf-16le" and conversion in the other direction MUST strip
or add the BOM, respectively.
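
The strip/add rule for the BOM can be illustrated with Python's
standard codecs. This is a minimal sketch: the two function names are
hypothetical, only the big-endian direction is shown, and the encoding
declaration inside the document (which the text says "utf-16be"
entities SHOULD contain) is not rewritten here.

```python
import codecs


def utf16_to_utf16be(data: bytes) -> bytes:
    """Convert a "utf-16" entity (with BOM) to "utf-16be" (no BOM)."""
    if not data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        raise ValueError('a "utf-16" entity must begin with a BOM')
    # Decoding as "utf-16" consumes the BOM and uses it to pick the byte
    # order; encoding as "utf-16-be" never writes a BOM.
    return data.decode("utf-16").encode("utf-16-be")


def utf16be_to_utf16(data: bytes) -> bytes:
    """Convert a "utf-16be" entity (no BOM) to "utf-16" by adding the BOM."""
    return codecs.BOM_UTF16_BE + data
```

A round trip through both functions yields a big-endian "utf-16"
entity regardless of the byte order of the input.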

5. Fragment Identifiers

Section 4.1 of [RFC2396] notes that the semantics of a fragment
identifier (the part of a URI after a "#") is a property of the data
resulting from a retrieval action, and that the format and
interpretation of fragment identifiers is dependent on the media type
of the retrieval result.

As of today, no established specifications define identifiers for XML
media types. However, a working draft published by W3C, namely "XML
Pointer Language (XPointer)", attempts to define fragment identifiers
for text/xml and application/xml. The current specification for
XPointer is available at http://www.w3.org/TR/xptr.

6. The Base URI

Section 5.1 of [RFC2396] specifies that the semantics of a relative
URI reference embedded in a MIME entity is dependent on the base URI.
The base URI is either (1) the base URI embedded in the MIME entity,
(2) the base URI of the encapsulating MIME entity, (3) the URI used
to retrieve the MIME entity, or (4) the application-dependent default
base URI, where (1) has the highest precedence. [RFC2396] further
specifies that the mechanism for embedding the base URI is dependent
on the media type.
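
The precedence among the four base URI sources can be written down
directly. The Python sketch below is illustrative only: the function
and parameter names are invented for the example, the sources are tried
in the numbered order given above, and relative references are resolved
with the standard library's urllib.parse.urljoin.

```python
from typing import Optional
from urllib.parse import urljoin


def select_base_uri(embedded: Optional[str] = None,
                    encapsulating: Optional[str] = None,
                    retrieval: Optional[str] = None,
                    default: Optional[str] = None) -> Optional[str]:
    """Pick the base URI, trying sources (1) to (4) in order."""
    for candidate in (embedded, encapsulating, retrieval, default):
        if candidate:
            return candidate
    return None


def resolve(reference: str, **sources: Optional[str]) -> str:
    """Resolve a relative URI reference against the selected base URI."""
    base = select_base_uri(**sources)
    return urljoin(base, reference) if base else reference
```

For example, resolve("c.xml", retrieval="http://example.org/a/b.xml")
falls back to the retrieval URI because no embedded or encapsulating
base URI is supplied.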

As of today, no established specifications define mechanisms for
embedding the base URI in XML MIME entities. However, a Proposed
Recommendation published by W3C, namely "XML Base", attempts to
define such a mechanism for text/xml, application/xml,
text/xml-external-parsed-entity, and application/xml-external-parsed-entity.
The current specification for XML Base is available at
http://www.w3.org/TR/xmlbase.

7. A Naming Convention for XML-Based Media Types

This document recommends the use of a naming convention (a suffix of
'+xml') for identifying XML-based MIME media types, whatever their
particular content may represent. This allows the use of generic XML
processors and technologies on a wide variety of different XML
document types at a minimum cost, using existing frameworks for media
type registration.

Although the use of a suffix was not considered as part of the
original MIME architecture, this choice is considered to provide the
most functionality with the least potential for interoperability
problems or lack of future extensibility. The alternatives to the
'+xml' suffix and the reason for its selection are described in
Appendix A.

As XML development continues, new XML document types are appearing
rapidly. Many of these XML document types would benefit from the
identification possibilities of a more specific MIME media type than
text/xml or application/xml can provide, and it is likely that many
new media types for XML-based document types will be registered in
the near and ongoing future.

While the benefits of specific MIME types for particular types of XML
documents are significant, all XML documents share common structures
and syntax that make possible common processing.

Some areas where 'generic' processing is useful include:

o Browsing - An XML browser can display any XML document with a
provided [CSS] or [XSLT] style sheet, whatever the vocabulary of
that document.

o Editing - Any XML editor can read, modify, and save any XML
document.

o Fragment identification - XPointers (work in progress) can work
with any XML document, whatever vocabulary it uses and whether or
not it uses XPointer for its own fragment identification.
o Hypertext linking - XLink (work in progress) hypertext linking is
designed to connect any XML documents, regardless of vocabulary.
o Searching - XML-oriented search engines, web crawlers, agents, and
query tools should be able to read XML documents and extract the
names and content of elements and attributes even if the tools are
ignorant of the particular vocabulary used for elements and
attributes.
o Storage - XML-oriented storage systems, which keep XML documents
internally in a parsed form, should similarly be able to process,
store, and recreate any XML document.
o Well-formedness and validity checking - An XML processor can
confirm that any XML document is well-formed and that it is valid
(i.e., conforms to its declared DTD or Schema).
When a new media type is introduced for an XML-based format, the name
of the media type SHOULD end with '+xml'. This convention will allow
applications that can process XML generically to detect that the MIME
entity is supposed to be an XML document, verify this assumption by
invoking some XML processor, and then process the XML document
accordingly. Applications may match for types that represent XML
MIME entities by comparing the subtype to the pattern '*/*+xml'. (Of
course, 4 of the 5 media types defined in this document -- text/xml,
application/xml, text/xml-external-parsed-entity, and
application/xml-external-parsed-entity -- also represent XML MIME
entities while not conforming to the '*/*+xml' pattern.)
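The matching rule described above can be sketched in Python as
follows. This is an illustrative sketch only and not part of the
specification: the names XML_BASE_TYPES and is_xml_media_type are
hypothetical, and parameters are stripped with simple string handling
rather than a full MIME header parser.

```python
# The four non-'+xml' media types that this document says also
# represent XML MIME entities (application/xml-dtd is excluded).
XML_BASE_TYPES = {
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
}


def is_xml_media_type(content_type: str) -> bool:
    """Return True if the Content-type value names an XML MIME entity."""
    # Drop parameters such as charset, then normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in XML_BASE_TYPES:
        return True
    top_level, sep, subtype = media_type.partition("/")
    # '*/*+xml': any top-level type, subtype ending in '+xml'
    # with at least one character before the suffix.
    return bool(sep) and bool(top_level) and (
        subtype.endswith("+xml") and len(subtype) > len("+xml")
    )
```

For instance, under these assumptions "image/svg+xml" and
'text/xml; charset="utf-8"' are accepted, while "application/xml-dtd"
and "text/html" are not.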
NOTE: Section 14.1 of HTTP[RFC2616] does not support Accept
headers of the form "Accept: */*+xml" and so this header MUST NOT
be used in this way. Instead, content negotiation[RFC2703] could
potentially be used if an XML-based MIME type were needed.
XML generic processing is not always appropriate for XML-based media
types. For example, authors of some such media types may wish that
the types remain entirely opaque except to applications that are
specifically designed to deal with that media type. By NOT following
the naming convention '+xml', such media types can avoid XML-generic
processing. Since generic processing will be useful in many cases,
however -- including in some situations that are difficult to predict
ahead of time -- those registering media types SHOULD use the '+xml'
convention unless they have a particularly compelling reason not to.
The registration process for these media types is described in
[RFC2048]. The registrar for the IETF tree will encourage new XML-
based media type registrations in the IETF tree to follow this
guideline. Registrars for other trees SHOULD follow this convention
in order to ensure maximum interoperability of their XML-based
documents. Similarly, media subtypes that do not represent XML MIME
entities MUST NOT be allowed to register with a '+xml' suffix.
7.1 Referencing
Registrations for new XML-based media types under the top-level type
"text" SHOULD, in specifying the charset parameter and encoding
considerations, define them as: "Same as [charset parameter /
encoding considerations] of text/xml as specified in RFC 3023."
Registrations for new XML-based media types under top-level types
other than "text" SHOULD, in specifying the charset parameter and
encoding considerations, define them as: "Same as [charset parameter
/ encoding considerations] of application/xml as specified in RFC
3023."
The use of the charset parameter is STRONGLY RECOMMENDED, since this
information can be used by XML processors to determine
authoritatively the charset of the XML MIME entity.
These registrations SHOULD specify that the XML-based media type
being registered has all of the security considerations described in
RFC 3023 plus any additional considerations specific to that media
type.
These registrations SHOULD also make reference to RFC 3023 in
specifying magic numbers, fragment identifiers, base URIs, and use of
the BOM.
These registrations MAY reference the text/xml registration in RFC
3023 in specifying interoperability considerations, if these
considerations are not overridden by issues specific to that media
type.
8. Examples
The examples below give the value of the MIME Content-type header and
the XML declaration (which includes the encoding declaration) inside
the XML MIME entity. For UTF-16 examples, the Byte Order Mark
character is denoted as "{BOM}", and the XML declaration is assumed
to come at the beginning of the XML MIME entity, immediately
following the BOM. Note that other MIME headers may be present, and
the XML MIME entity may contain other data in addition to the XML
declaration; the examples focus on the Content-type header and the
encoding declaration for clarity.
8.1 Text/xml with UTF-8 Charset
Content-type: text/xml; charset="utf-8"
<?xml version="1.0" encoding="utf-8"?>
This is the recommended charset value for use with text/xml. Since
the charset parameter is provided, MIME and XML processors MUST treat
the enclosed entity as UTF-8 encoded.
If sent using a 7-bit transport (e.g., SMTP[RFC0821]), the XML MIME
entity MUST use a content-transfer-encoding of either quoted-
printable or base64. For an 8-bit clean transport (e.g., 8BITMIME
ESMTP or NNTP), or a binary clean transport (e.g., HTTP), no
content-transfer-encoding is necessary.
8.2 Text/xml with UTF-16 Charset
Content-type: text/xml; charset="utf-16"
{BOM}<?xml version='1.0' encoding='utf-16'?>
or
{BOM}<?xml version='1.0'?>
This is possible only when the XML MIME entity is transmitted via
HTTP, which uses a MIME-like mechanism and is a binary-clean
protocol, hence does not perform CR and LF transformations and allows
NUL octets. As described in [RFC2781], the UTF-16 family MUST NOT be
used with media types under the top-level type "text" except over
HTTP (see section 19.4.1 of [RFC2616] for details).
Since HTTP is binary clean, no content-transfer-encoding is
necessary.
8.3 Text/xml with UTF-16BE Charset
Content-type: text/xml; charset="utf-16be"
<?xml version='1.0' encoding='utf-16be'?>
Observe that the BOM does not exist. This is again possible only
when the XML MIME entity is transmitted via HTTP.
8.4 Text/xml with ISO-2022-KR Charset
Content-type: text/xml; charset="iso-2022-kr"
<?xml version="1.0" encoding='iso-2022-kr'?>
This example shows text/xml with a Korean charset (e.g., Hangul)
encoded following the specification in [RFC1557]. Since the charset
parameter is provided, MIME and XML processors MUST treat the
enclosed entity as encoded per RFC 1557.
Since ISO-2022-KR has been defined to use only 7 bits of data, no
content-transfer-encoding is necessary with any transport.
8.5 Text/xml with Omitted Charset
Content-type: text/xml
{BOM}<?xml version="1.0" encoding="utf-16"?>
or
{BOM}<?xml version="1.0"?>
This example shows text/xml with the charset parameter omitted. In
this case, MIME and XML processors MUST assume the charset is "us-
ascii", the default charset value for text media types specified in
[RFC2046]. The default of "us-ascii" holds even if the text/xml
entity is transported using HTTP.
Omitting the charset parameter is NOT RECOMMENDED for text/xml. For
example, even if the contents of the XML MIME entity are UTF-16 or
UTF-8, or the XML MIME entity has an explicit encoding declaration,
XML and MIME processors MUST assume the charset is "us-ascii".
8.6 Application/xml with UTF-16 Charset
Content-type: application/xml; charset="utf-16"
{BOM}<?xml version="1.0" encoding="utf-16"?>
or
{BOM}<?xml version="1.0"?>
This is a recommended charset value for use with application/xml.
Since the charset parameter is provided, MIME and XML processors MUST
treat the enclosed entity as UTF-16 encoded.
If sent using a 7-bit transport (e.g., SMTP) or an 8-bit clean
transport (e.g., 8BITMIME ESMTP or NNTP), the XML MIME entity MUST be
encoded in quoted-printable or base64. For a binary clean transport
(e.g., HTTP), no content-transfer-encoding is necessary.
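The transport rules stated in Sections 8.1, 8.4, and 8.6 can be
summarized in a short Python sketch. It is illustrative only: the
function name needs_transfer_encoding and the transport labels
"7bit", "8bit", and "binary" are hypothetical, and only the charsets
that appear in these examples are covered.

```python
# Charsets grouped by the least capable transport that can carry them
# without a content-transfer-encoding, per the examples in this section.
SEVEN_BIT_CHARSETS = {"us-ascii", "iso-2022-kr"}
EIGHT_BIT_CHARSETS = {"utf-8"}
BINARY_CHARSETS = {"utf-16", "utf-16be", "utf-16le"}

# Hypothetical transport labels, ordered from least to most capable.
TRANSPORT_LEVEL = {"7bit": 0, "8bit": 1, "binary": 2}


def needs_transfer_encoding(charset: str, transport: str) -> bool:
    """Return True if quoted-printable or base64 is required."""
    name = charset.strip().lower()
    if name in SEVEN_BIT_CHARSETS:
        required = 0
    elif name in EIGHT_BIT_CHARSETS:
        required = 1
    elif name in BINARY_CHARSETS:
        required = 2
    else:
        # Assumption: anything else is outside the scope of this sketch.
        raise ValueError(f"charset not covered by this sketch: {charset!r}")
    return TRANSPORT_LEVEL[transport] < required
```

Under these assumptions, UTF-8 over SMTP ("7bit") and UTF-16 over
8BITMIME ESMTP ("8bit") both require a content-transfer-encoding,
while ISO-2022-KR never does.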
8.7 Application/xml with UTF-16BE Charset
Content-type: application/xml; charset="utf-16be"
<?xml version='1.0' encoding='utf-16be'?>
Observe that the BOM does not exist. Since the charset parameter is
provided, MIME and XML processors MUST treat the enclosed entity as
UTF-16BE encoded.
8.8 Application/xml with ISO-2022-KR Charset
Content-type: application/xml; charset="iso-2022-kr"
<?xml version="1.0" encoding="iso-2022-kr"?>
This example shows application/xml with a Korean charset (e.g.,
Hangul) encoded following the specification in [RFC1557]. Since the
charset parameter is provided, MIME and XML processors MUST treat the
enclosed entity as encoded per RFC 1557, independent of whether the
XML MIME entity has an internal encoding declaration (this example
does show such a declaration, which agrees with the charset
parameter).
Since ISO-2022-KR has been defined to use only 7 bits of data, no
content-transfer-encoding is necessary with any transport.
8.9 Application/xml with Omitted Charset and UTF-16 XML MIME Entity
Content-type: application/xml
{BOM}<?xml version='1.0' encoding="utf-16"?>
or
{BOM}<?xml version='1.0'?>
For this example, the XML MIME entity begins with a BOM. Since the
charset has been omitted, a conforming XML processor follows the
requirements of [XML], section 4.3.3. Specifically, the XML
processor reads the BOM, and thus knows deterministically that the
charset is UTF-16.
An XML-unaware MIME processor SHOULD make no assumptions about the
charset of the XML MIME entity.
8.10 Application/xml with Omitted Charset and UTF-8 Entity
Content-type: application/xml
<?xml version='1.0'?>
In this example, the charset parameter has been omitted, and there is
no BOM. Since there is no BOM, the XML processor follows the
requirements in section 4.3.3 of [XML], and optionally applies the
mechanism described in Appendix F (which is non-normative) of [XML]
to determine the charset encoding of UTF-8. The XML MIME entity does
not contain an encoding declaration, but since the encoding is UTF-8,
this is still a conforming XML MIME entity.
An XML-unaware MIME processor SHOULD make no assumptions about the
charset of the XML MIME entity.
8.11 Application/xml with Omitted Charset and Internal Encoding
Declaration
Content-type: application/xml
<?xml version='1.0' encoding="iso-10646-ucs-4"?>
In this example, the charset parameter has been omitted, and there is
no BOM. However, the XML MIME entity does have an encoding
declaration inside the XML MIME entity that specifies the entity's
charset. Following the requirements in section 4.3.3 of [XML], and
optionally applying the mechanism described in Appendix F (non-
normative) of [XML], the XML processor determines the charset of the
XML MIME entity (in this example, UCS-4).
An XML-unaware MIME processor SHOULD make no assumptions about the
charset of the XML MIME entity.
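The precedence rules illustrated in Sections 8.5 and 8.9 through 8.11
can be sketched in Python as follows. This is a simplified sketch
under stated assumptions, not a conforming implementation: the
function name effective_charset is hypothetical, the Content-type
value is parsed with the standard library's email.message.Message,
and the body inspection only handles ASCII-compatible encodings (the
full Appendix F autodetection of [XML], including UCS-4 byte
patterns, is deliberately omitted).

```python
import re
from email.message import Message

# Encoding declaration at the very start of an ASCII-compatible entity.
_ENCODING_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def effective_charset(content_type: str, body: bytes) -> str:
    """Pick the charset an XML-aware processor would use for the entity."""
    msg = Message()
    msg["Content-Type"] = content_type
    declared = msg.get_content_charset()
    if declared:
        # An explicit charset parameter is authoritative.
        return declared
    if msg.get_content_maintype() == "text":
        # text/* without charset defaults to us-ascii, whatever the body says.
        return "us-ascii"
    # Other top-level types: fall back to the rules of [XML], section 4.3.3.
    if body.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    match = _ENCODING_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: XML defaults to UTF-8.
    return "utf-8"
```

Note that for text/xml the body is never consulted, which is why
omitting the charset parameter there is NOT RECOMMENDED.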
8.12 Text/xml-external-parsed-entity with UTF-8 Charset
Content-type: text/xml-external-parsed-entity; charset="utf-8"
<?xml encoding="utf-8"?>
This is the recommended charset value for use with text/xml-
external-parsed-entity. Since the charset parameter is provided,
MIME and XML processors MUST treat the enclosed entity as UTF-8
encoded.
If sent using a 7-bit transport (e.g., SMTP), the XML MIME entity
MUST use a content-transfer-encoding of either quoted-printable or
base64. For an 8-bit clean transport (e.g., 8BITMIME ESMTP or NNTP),
or a binary clean transport (e.g., HTTP), no content-transfer-encoding
is necessary.
8.13 Application/xml-external-parsed-entity with UTF-16 Charset
Content-type: application/xml-external-parsed-entity;
charset="utf-16"
{BOM}<?xml encoding="utf-16"?>
or
{BOM}
This is a recommended charset value for use with application/xml-
external-parsed-entity. Since the charset parameter is provided,
MIME and XML processors MUST treat the enclosed entity as UTF-16
encoded.
If sent using a 7-bit transport (e.g., SMTP) or an 8-bit clean
transport (e.g., 8BITMIME ESMTP or NNTP), the XML MIME entity MUST be
encoded in quoted-printable or base64. For a binary clean transport
(e.g., HTTP), no content-transfer-encoding is necessary.
8.14 Application/xml-external-parsed-entity with UTF-16BE Charset
Content-type: application/xml-external-parsed-entity;
charset="utf-16be"
<?xml encoding="utf-16be"?>
Since the charset parameter is provided, MIME and XML processors MUST
treat the enclosed entity as UTF-16BE encoded.
8.15 Application/xml-dtd
Content-type: application/xml-dtd; charset="utf-8"
<?xml encoding="utf-8"?>
Charset "utf-8" is a recommended charset value for use with
application/xml-dtd. Since the charset parameter is provided, MIME
and XML processors MUST treat the enclosed entity as UTF-8 encoded.
8.16 Application/mathml+xml
Content-type: application/mathml+xml
<?xml version="1.0" ?>
MathML documents are XML documents whose content describes
mathematical information, as defined by [MathML]. As a format based
on XML, MathML documents SHOULD use the '+xml' suffix convention in
their MIME content-type identifier. However, no content type has yet
been registered for MathML and so this media type should not be used
until such registration has been completed.
8.17 Application/xslt+xml
Content-type: application/xslt+xml
<?xml version="1.0" ?>
Extensible Stylesheet Language (XSLT) documents are XML documents
whose content describes stylesheets for other XML documents, as
defined by [XSLT]. As a format based on XML, XSLT documents SHOULD
use the '+xml' suffix convention in their MIME content-type
identifier. However, no content type has yet been registered for
XSLT and so this media type should not be used until such
registration has been completed.
8.18 Application/rdf+xml
Content-type: application/rdf+xml
<?xml version="1.0" ?>
RDF documents identified using this MIME type are XML documents whose
content describes metadata, as defined by [RDF]. As a format based
on XML, RDF documents SHOULD use the '+xml' suffix convention in
their MIME content-type identifier. However, no content type has yet
been registered for RDF and so this media type should not be used
until such registration has been completed.
8.19 Image/svg+xml
Content-type: image/svg+xml
<?xml version="1.0" ?>
Scalable Vector Graphics (SVG) documents are XML documents whose
content describes graphical information, as defined by [SVG]. As a
format based on XML, SVG documents SHOULD use the '+xml' suffix
convention in their MIME content-type identifier. However, no
content type has yet been registered for SVG and so this media type
should not be used until such registration has been completed.
8.20 INCONSISTENT EXAMPLE: Text/xml with UTF-8 Charset
Content-type: text/xml; charset="utf-8"
<?xml version="1.0" encoding="iso-8859-1"?>
Since the charset parameter is provided in the Content-Type header,
MIME and XML processors MUST treat the enclosed entity as UTF-8
encoded. That is, the "iso-8859-1" encoding MUST be ignored.
Processors generating XML MIME entities MUST NOT label conflicting
charset information between the MIME Content-Type and the XML
declaration.
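A generating processor might guard against this kind of inconsistency
with a check along the following lines. This is a minimal Python
sketch, assuming the charset parameter value and the XML declaration
are already available as strings; the function name labels_conflict
is hypothetical, and charset names are compared case-insensitively
without resolving aliases.

```python
import re

# Encoding pseudo-attribute inside an XML or text declaration.
_DECL_ENCODING = re.compile(
    r"""^<\?xml[^>]*?\sencoding\s*=\s*["']([^"']+)["']"""
)


def labels_conflict(charset_param: str, xml_declaration: str) -> bool:
    """Return True if the MIME charset and the encoding declaration differ."""
    match = _DECL_ENCODING.match(xml_declaration)
    if match is None:
        # No encoding declaration, so there is nothing to contradict.
        return False
    declared = match.group(1).strip().lower()
    return declared != charset_param.strip().lower()
```

Applied to the example in this section, the check reports a conflict
between "utf-8" and "iso-8859-1"; a generator would then correct one
of the two labels before sending the entity.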
9. IANA Considerations
As described in Section 7, this document updates the [RFC2048]
registration process for XML-based MIME types.
10. Security Considerations
XML, as a subset of SGML, has all of the same security considerations
as specified in [RFC1874], and likely more, due to its expected
ubiquitous deployment.
To paraphrase section 3 of RFC 1874, XML MIME entities contain
information to be parsed and processed by the recipient's XML system.
These entities may contain and such systems may permit explicit
system level commands to be executed while processing the data. To
the extent that an XML system will execute arbitrary command strings,
recipients of XML MIME entities may be at risk. In general, it may be
possible to specify commands that perform unauthorized file
operations or make changes to the display processor's environment
that affect subsequent operations.
In general, any information stored outside of the direct control of
the user -- including CSS style sheets, XSL transformations, entity
declarations, and DTDs -- can be a source of insecurity, by either
obvious or subtle means. For example, a tiny "whiteout attack"
modification made to a "master" style sheet could make words in
critical locations disappear in user documents, without directly
modifying the user document or the stylesheet it references. Thus,
the security of any XML document is vitally dependent on all of the
documents recursively referenced by that document.
The entity lists and DTDs for XHTML 1.0[XHTML], for instance, are
likely to be a commonly used set of information. Many developers
will use and trust them, few of whom will know much about the level
of security on the W3C's servers, or on any similarly trusted
repository.
The simplest attack involves adding declarations that break
validation. Adding extraneous declarations to a list of character
entities can effectively "break the contract" used by documents. A
tiny change that produces a fatal error in a DTD could halt XML
processing on a large scale. Extraneous declarations are fairly
obvious, but more sophisticated tricks, like changing attributes from
being optional to required, can be difficult to track down. Perhaps
the most dangerous option available to crackers is redefining default
values for attributes: e.g., if developers have relied on defaulted
attributes for security, a relatively small change might expose
enormous quantities of information.
Apart from the structural possibilities, another option, "entity
spoofing," can be used to insert text into documents, vandalizing and
perhaps conveying an unintended message. Because XML 1.0 permits
multiple entity declarations, and the first declaration takes
precedence, it's possible to insert malicious content where an entity
is used, such as by inserting the full text of Winnie the Pooh in
every occurrence of &mdash;.
Use of the digital signatures work currently underway by the xmldsig
working group may eventually ameliorate the dangers of referencing
external documents not under one's own control.
Use of XML is expected to be varied and widespread. XML is under
scrutiny by a wide range of communities for use as a common syntax
for community-specific metadata. For example, the Dublin
Core[RFC2413] group is using XML for document metadata, and a new
effort has begun that is considering use of XML for medical
information. Other groups view XML as a mechanism for marshalling
parameters for remote procedure calls. More uses of XML will
undoubtedly arise.
Security considerations will vary by domain of use. For example, XML
medical records will have much more stringent privacy and security
considerations than XML library metadata. Similarly, use of XML as a
parameter marshalling syntax necessitates a case by case security
review.
XML may also have some of the same security concerns as plain text.
Like plain text, XML can contain escape sequences that, when
displayed, have the potential to change the display processor
environment in ways that adversely affect subsequent operations.
Possible effects include, but are not limited to, locking the
keyboard, changing display parameters so subsequent displayed text is
unreadable, or even changing display parameters to deliberately
obscure or distort subsequent displayed material so that its meaning
is lost or altered. Display processors SHOULD either filter such
material from displayed text or else make sure to reset all important
settings after a given display operation is complete.
Some terminal devices have keys whose output, when pressed, can be
changed by sending the display processor a character sequence. If
this is possible the display of a text object containing such
character sequences could reprogram keys to perform some illicit or
dangerous action when the key is subsequently pressed by the user.
In some cases not only can keys be programmed, they can be triggered
remotely, making it possible for a text display operation to directly
perform some unwanted action. As such, the ability to program keys
SHOULD be blocked either by filtering or by disabling the ability to
program keys entirely.
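One way a display processor might filter such material is sketched
below in Python. This is an illustration under stated assumptions,
not a complete defense: the function name sanitize_for_display is
hypothetical, and the sketch simply replaces C0 and C1 control
characters (other than tab, line feed, and carriage return) with
U+FFFD so that a terminal never receives the control character that
introduces an escape sequence.

```python
import re

# C0 controls except TAB, LF and CR, plus DEL and the C1 control range.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")


def sanitize_for_display(text: str) -> str:
    """Replace control characters that could drive a terminal."""
    return _CONTROL_CHARS.sub("\ufffd", text)
```

With the ESC character replaced, the remainder of a sequence such as
"[2J" is shown as ordinary text rather than being interpreted.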
Note that it is also possible to construct XML documents that make
use of what XML terms "entity references" (using the XML meaning of
the term "entity" as described in Section 2), to construct repeated
expansions of text. Recursive expansions are prohibited by [XML] and
XML processors are required to detect them. However, even non-
recursive expansions may cause problems with the finite computing
resources of computers, if they are performed many times.
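The cost of non-recursive expansion can be estimated without
performing it. The Python sketch below is illustrative only: the
function name expanded_length is hypothetical, entity declarations
are assumed to be given as a dictionary mapping each entity name to
its replacement text, and every referenced entity is assumed to be
declared in that dictionary.

```python
import re
from collections import Counter

# General entity reference such as &name;
_ENTITY_REF = re.compile(r"&([A-Za-z_][\w.\-]*);")


def expanded_length(name, entities, cache=None, stack=()):
    """Length of an entity's fully expanded text, computed without expanding."""
    if cache is None:
        cache = {}
    if name in cache:
        return cache[name]
    if name in stack:
        # Recursive expansions are prohibited by [XML].
        raise ValueError(f"recursive entity reference: {name}")
    text = entities[name]
    # Literal characters, then each distinct reference times its count.
    total = len(_ENTITY_REF.sub("", text))
    for ref, count in Counter(_ENTITY_REF.findall(text)).items():
        total += count * expanded_length(ref, entities, cache, stack + (name,))
    cache[name] = total
    return total
```

A processor could compare the result against a configured limit
before expanding. Ten levels of entities that each reference the
previous level ten times already expand a three-character string to
several billion characters.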
References
[ASCII] "US-ASCII. Coded Character Set -- 7-Bit American Standard
Code for Information Interchange", ANSI X3.4-1986, 1986.
[CSS] Bos, B., Lie, H.W., Lilley, C. and I. Jacobs, "Cascading
Style Sheets, level 2 (CSS2) Specification", World Wide
Web Consortium Recommendation REC-CSS2, May 1998.
[ISO8859] "ISO-8859. International Standard -- Information
Processing -- 8-bit Single-Byte Coded Graphic Character
Sets -- Part 1: Latin alphabet No. 1, ISO-8859-1:1987",
1987.
[MathML] Ion, P. and R. Miner, "Mathematical Markup Language
(MathML) 1.01", World Wide Web Consortium Recommendation
REC-MathML, July 1999.
[PNG] Boutell, T., "PNG (Portable Network Graphics)
Specification", World Wide Web Consortium Recommendation
REC-png, October 1996.
[RDF] Lassila, O. and R.R. Swick, "Resource Description
Framework (RDF) Model and Syntax Specification", World
Wide Web Consortium Recommendation REC-rdf-syntax,
February 1999.
[RFC0821] Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC
821, August 1982.
[RFC0977] Kantor, B. and P. Lapsley, "Network News Transfer
Protocol", RFC 977, February 1986.
[RFC1557] Choi, U., Chon, K. and H. Park, "Korean Character Encoding
for Internet Messages", RFC 1557, December 1993.
[RFC1652] Klensin, J., Freed, N., Rose, M., Stefferud, E. and D.
Crocker, "SMTP Service Extension for 8bit-MIMEtransport",
RFC 1652, July 1994.
[RFC1874] Levinson, E., "SGML Media Types", RFC 1874, December 1995.
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail