As per Relevance of the word structure, we have this rfc below:
Network Working Group M. Wildgrube
Request for Comments: 3072 March 2001
Category: Informational
Structured Data Exchange Format (SDXF)
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2001). All Rights Reserved.
IESG Note
This document specifies a data exchange format and, partially, an API
that can be used for creating and parsing such a format. The IESG
notes that the same problem space can be addressed using formats that
the IETF normally uses including ASN.1 and XML. The document reader
is strongly encouraged to carefully read section 13 before choosing
SDXF over ASN.1 or XML. Further, when storing text in SDXF, the user
is encouraged to use the datatype for UTF-8, specified in section 2.5.
Abstract
This specification describes an all-purpose interchange format for
use as a file format or for networking. Data is organized in chunks
which can be ordered in hierarchical structures. This format is
self-describing and CPU-independent.
Table of Contents
1. Introduction ................................................. 2
2. Description of the SDXF data format .......................... 3
3. Introduction to the SDXF functions ........................... 5
3.1 General remarks .............................................. 5
3.2 Writing a SDXF buffer ........................................ 5
3.3 Reading a SDXF buffer ........................................ 6
3.4 Example ...................................................... 6
4. Platform independence ........................................ 8
5. Compression .................................................. 9
6. Encryption ...................................................11
7. Arrays........................................................11
8. Description of the SDXF functions ............................12
Wildgrube Informational [Page 1]
RFC 3072 Structured Data Exchange Format March 2001
8.1 Introduction .................................................12
8.2 Basic definitions ............................................13
8.3 Definitions for C++ ..........................................15
8.4 Common Definitions ...........................................16
8.5 Special functions ............................................17
9. 'Support' of UTF-8 ...........................................19
10. Security Considerations .....................................19
11. Some general hints ..........................................20
12. IANA Considerations .........................................20
13. Discussion ..................................................21
13.1 SDXF vs. ASN.1 ..............................................21
13.2 SDXF vs. XML ................................................22
14. Author's Address ............................................24
15. Acknowledgements ............................................24
16. References ..................................................24
17. Full Copyright Statement ....................................26
1. Introduction
The purpose of the Structured Data eXchange Format (SDXF) is to
permit the interchange of an arbitrary structured data block with
different kinds of data (numerical, text, bitstrings). Because data
is normalized to an abstract, computer architecture independent
"network format", SDXF is usable as a network interchange data
format.
This data format is not limited to any application. The demand for
this format is that it is usable as a text format for word
processing, as a picture format, a sound format, for remote procedure
calls with complex parameters, suitable for document formats, for
interchanging business data, etc.
SDXF is self-describing: every program can unpack every SDXF data
block without knowing the meaning of the individual data elements.
Together with the description of the data format a set of functions
will be introduced. With the help of these functions one can create
and access the data elements of SDXF. The idea is that a programmer
should only use these functions instead of maintaining the structure
by himself on the level of bits and bytes. (In the language of
object-oriented programming these functions are methods of an object
which works as a handle for a given SDXF data block.)
SDXF is not limited to a specific platform. Along with a correct
preparation of the SDXF functions the SDXF data can be interchanged
(via network or data carrier) across the boundaries of different
architectures (specified by the character code like ASCII, ANSI or
EBCDIC and the byte order for binary data).
SDXF is also prepared to compress and encrypt parts or the whole
block of SDXF data.
2. Description of SDXF data format
2.1 First we introduce the term "chunk". A chunk is a data structure
with a fixed set of components. A chunk may be "elementary" or
"structured". The latter itself contains one or more other
chunks.
A chunk consists of a header and the data body (content):
+----------+-----+-------+-----------------------------------+
| Name     | Pos.| Length| Description                       |
+----------+-----+-------+-----------------------------------+
| chunk-ID |  1  |   2   | ID of the chunk (unsigned short)  |
| flags    |  3  |   1   | type and properties of this chunk |
| length   |  4  |   3   | length of the following data      |
| content  |  7  |   *)  | net data or a list of chunks      |
+----------+-----+-------+-----------------------------------+
*) as stated in "length". The total length of the chunk is length+6.
The chunk ID is a non-zero positive number.
or more visually:
+----+----+----+----+----+----+----+----+----+-...
| chunkID | fl |    length    | content
+----+----+----+----+----+----+----+----+----+-...
or in ASN.1 syntax:
chunk ::= SEQUENCE
{
   chunkID INTEGER (1..65535),
   flags   BIT STRING,
   length  OCTET STRING SIZE 3, -- or: INTEGER (0..16777215)
   content OCTET STRING
}
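The header layout above can be decoded with a few shifts. The
following is a minimal sketch in C, not part of the SDXF function set
of chapter 8: the names sdxf_header and sdxf_parse_header are
hypothetical, and short chunks (see 2.6), where the length field
carries data instead of a length, are not treated specially.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the 6-byte chunk header (illustrative names). */
typedef struct {
    uint16_t id;      /* chunk-ID, bytes 1-2, big endian          */
    uint8_t  flags;   /* flag byte, byte 3                        */
    uint32_t length;  /* content length, bytes 4-6, big endian    */
} sdxf_header;

/* Parse a header from buf. Returns 0 on success, -1 if fewer than
   6 bytes are available or the chunk-ID is zero (not allowed). */
int sdxf_parse_header(const uint8_t *buf, size_t avail, sdxf_header *out)
{
    if (avail < 6)
        return -1;
    out->id     = (uint16_t)((buf[0] << 8) | buf[1]);
    out->flags  = buf[2];
    out->length = ((uint32_t)buf[3] << 16) | ((uint32_t)buf[4] << 8) | buf[5];
    return out->id == 0 ? -1 : 0;
}
```

The total size of the chunk in the buffer is then length + 6, as
stated under the table.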
2.2 Structured chunk
A structured chunk is marked as such by the flag byte (see 2.5).
In contrast to an elementary chunk, its content consists of a list of
chunks (elementary or structured):
+----+-+---+-------+-------+-------+-----+-------+
| id |f|len| chunk | chunk | chunk | ... | chunk |
+----+-+---+-------+-------+-------+-----+-------+
With the help of this concept you can map any hierarchically
structured data into a SDXF chunk.
2.3 Some remarks about the internal representation of the chunk's
elements:
Binary values are always in high-order-first (big endian) format,
like the binary values in the IP header (network format). A length
of 300 (=256 + 32 + 12) is stored as
+----+----+----+----+----+----+----+----+----+--
|         |    | 00   01   2C   | content
+----+----+----+----+----+----+----+----+----+--
in hexadecimal notation.
This is also valid for the chunk-ID.
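As an illustration of the rule above, the following sketch stores a
24-bit length and a 16-bit chunk-ID in big-endian order regardless of
the byte order of the host. The helper names sdxf_put_u24 and
sdxf_put_u16 are hypothetical and only serve this example.

```c
#include <stdint.h>

/* Store a 24-bit value high-order byte first (network format). */
void sdxf_put_u24(uint8_t out[3], uint32_t value)
{
    out[0] = (uint8_t)((value >> 16) & 0xFF);
    out[1] = (uint8_t)((value >> 8) & 0xFF);
    out[2] = (uint8_t)(value & 0xFF);
}

/* Store a 16-bit value (for example a chunk-ID) the same way. */
void sdxf_put_u16(uint8_t out[2], uint16_t value)
{
    out[0] = (uint8_t)(value >> 8);
    out[1] = (uint8_t)(value & 0xFF);
}
```

Because only shifts and masks are used, the result does not depend on
how the host itself stores integers.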
2.4 Character values in the content portion are also subject to
adaptation: see chapter 4.
2.5 Meaning of the flag-bits: Let us represent the flag byte in this
manner:
+-+-+-+-+-+-+-+-+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
 | | | | | | | |
 | | | | | | | +-- reserved
 | | | | | | +---- array
 | | | | | +------ short chunk
 | | | | +-------- encrypted chunk
 | | | +---------- compressed chunk
 | | |
 +-+-+------------ data type (0..7)
data types are:
0 -- pending structure (chunk is inconsistent, see also 11.1)
1 -- structure
2 -- bit string
3 -- numeric
4 -- character
5 -- float (ANSI/IEEE 754-1985)
6 -- UTF-8
7 -- reserved
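The flag byte can be taken apart with masks. The sketch below makes
two assumptions that the text does not spell out: bit 0 in the diagram
is the most significant bit (the usual convention for such diagrams),
and the 'array' flag mentioned in 2.10 occupies bit 6. All names are
hypothetical. The validity check encodes the three rules of 2.10.

```c
#include <stdint.h>

enum {
    SDXF_FLAG_COMPRESSED = 0x10, /* bit 3 */
    SDXF_FLAG_ENCRYPTED  = 0x08, /* bit 4 */
    SDXF_FLAG_SHORT      = 0x04, /* bit 5 */
    SDXF_FLAG_ARRAY      = 0x02  /* bit 6 (assumed position) */
};

/* Data type 0..7 from bits 0-2, assumed to be the top three bits. */
unsigned sdxf_data_type(uint8_t flags)
{
    return flags >> 5;
}

/* Returns 1 if the combination is allowed by the rules in 2.10. */
int sdxf_flags_valid(uint8_t flags)
{
    unsigned type = sdxf_data_type(flags);
    int is_short = (flags & SDXF_FLAG_SHORT) != 0;
    int is_array = (flags & SDXF_FLAG_ARRAY) != 0;

    if (is_short && is_array)
        return 0;                     /* mutually exclusive          */
    if (is_short && (type == 1 || type == 5))
        return 0;                     /* no short structure or float */
    if (is_array && type == 1)
        return 0;                     /* no array of structures      */
    return 1;
}
```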
2.6 A short chunk has no data body. The 3 byte Length field is used
for data bytes instead. This saves space when there are many small
chunks.
2.7 Compressed and encrypted chunks are explained in chapter 5 and 6.
2.8 Arrays are explained in chapter 7.
2.9 Handling of UTF-8 is explained in chapter 9.
2.10 Not all combinations of bits are allowed or reasonable:
- the flags 'array' and 'short' are mutually exclusive
- 'short' is not applicable for data type 'structure' and 'float'.
- 'array' is not applicable for data type 'structure'.
3. Introduction to the SDXF functions
3.1 General remarks
The functionality of the SDXF concept is not bound to any
programming language, but of course the functions themselves must be
coded in a particular language. I discuss these functions in C and
C++, because these languages are by now available on almost
all platforms.
All these functions for reading and writing SDXF chunks use only one
parameter, a parameter structure. In C++ this parameter structure is
part of the "SDXF class" and the SDXF functions are methods of this
class.
An exact description of the interface is given in chapter 8.
3.2 Writing a SDXF buffer
To write SDXF chunks, the following functions are available:
init -- initialize the parameter structure
create -- create a new chunk
leave -- "close" a structured chunk
3.3 Reading a SDXF buffer
To read SDXF chunks, the following functions are available:
init -- initialize the parameter structure
enter -- "go into" a structured chunk
next -- "go to" the next chunk inside a structured chunk
extract -- extract the content of an elementary chunk into the
           user's data area
leave -- "go out" of a structured chunk
3.4 Example
3.4.1 Writing
For demonstration we use a reduced (outlined) C++ form of these
functions with polymorphic definitions:
void create (short chunkID);  // opens a new structure
void create (short chunkID, char *string);
        // creates a new chunk with dataType character, etc.
The sequence:
SDXF x(new);       // create the SDXF object "x" for a new chunk
                   // (includes the "init")
x.create (3301);   // opens a new structure
x.create (3302, "first chunk");
x.create (3303, "second chunk");
x.create (3304);   // opens a new structure
x.create (3305, "chunk in a structure");
x.create (3306, "next chunk in a structure");
x.leave ();        // closes the inner structure
x.create (3307, "third chunk");
x.leave ();        // closes the outer structure
creates a chunk which we can show graphically like this:
3301
 |
 +--- 3302 = "first chunk"
 |
 +--- 3303 = "second chunk"
 |
 +--- 3304
 |     |
 |     +--- 3305 = "chunk in a structure"
 |     |
 |     +--- 3306 = "next chunk in a structure"
 |
 +--- 3307 = "third chunk"
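To make the byte layout of such a nested chunk concrete, here is a
minimal sketch of a writer in plain C. It is not the SDX_obj interface
of chapter 8: the type "writer" and the functions w_open, w_text and
w_leave are hypothetical, the flag byte layout assumes that the data
type sits in the three most significant bits, and character data is
copied unchanged (that is, the host is assumed to use ASCII, so the
translation of chapter 4 is skipped).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SDXF_MAX_DEPTH 8

/* Hypothetical minimal writer state. */
typedef struct {
    uint8_t *buf;                   /* output buffer                    */
    size_t   cap, used;             /* capacity and bytes written       */
    size_t   open[SDXF_MAX_DEPTH];  /* header offsets of open structures */
    int      depth;                 /* number of open structures        */
} writer;

static int put_header(writer *w, uint16_t id, uint8_t flags, uint32_t len)
{
    if (w->cap - w->used < 6) return -1;
    uint8_t *p = w->buf + w->used;
    p[0] = (uint8_t)(id >> 8);   p[1] = (uint8_t)id;
    p[2] = flags;
    p[3] = (uint8_t)(len >> 16); p[4] = (uint8_t)(len >> 8);
    p[5] = (uint8_t)len;
    w->used += 6;
    return 0;
}

/* Open a structured chunk; its type stays 0 ("pending") until closed. */
int w_open(writer *w, uint16_t id)
{
    if (w->depth == SDXF_MAX_DEPTH) return -1;
    size_t at = w->used;
    if (put_header(w, id, 0x00, 0) != 0) return -1;
    w->open[w->depth++] = at;
    return 0;
}

/* Append an elementary chunk of data type 4 (character). */
int w_text(writer *w, uint16_t id, const char *s)
{
    size_t n = strlen(s);
    if (n > 0xFFFFFF || w->cap - w->used < 6 + n) return -1;
    put_header(w, id, 4u << 5, (uint32_t)n);
    memcpy(w->buf + w->used, s, n);
    w->used += n;
    return 0;
}

/* Close the innermost structure: patch its length, set type 1. */
int w_leave(writer *w)
{
    if (w->depth == 0) return -1;
    size_t at = w->open[--w->depth];
    uint32_t len = (uint32_t)(w->used - at - 6);
    w->buf[at + 2] = 1u << 5;
    w->buf[at + 3] = (uint8_t)(len >> 16);
    w->buf[at + 4] = (uint8_t)(len >> 8);
    w->buf[at + 5] = (uint8_t)len;
    return 0;
}
```

The length of a structured chunk is only known when it is closed,
which is why the sketch remembers the header offset on open and
patches the header on leave.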
3.4.2 Reading
A typical access to a structured SDXF chunk is a selection inside
a loop:
SDXF x(old);             // defines a SDXF object "x" for an old chunk
x.enter ();              // enters the structure
while (x.rc == 0)        // 0 == ok, rc will be set by the SDXF functions
{
  switch (x.chunkID)
  {
    case 3302:
      x.extract (data1, maxLength1);
      // extr. 1st chunk into data1
      break;
    case 3303:
      x.extract (data2, maxLength2);
      // extr. 2nd chunk into data2
      break;
    case 3304:           // we know this is a structure
      x.enter ();        // enters the inner structure
      while (x.rc == 0)  // inner loop
      {
        switch (x.chunkID)
        {
          case 3305:
            x.extract (data3, maxLength3);
            // extr. the chunk inside struct.
            break;
          case 3306:
            x.extract (data4, maxLength4);
            // extr. 2nd chunk inside struct.
            break;
        }
        x.next ();       // returns x.rc == 1 at end of chunk
      }                  // end-while
      break;
    case 3307:
      x.extract (data5, maxLength5);
      // extract last chunk into data5
      break;
    // default: none - ignore unknown chunks !!!
  }                      // end-switch
  x.next ();             // returns x.rc == 1 at end of chunk
}                        // end-while
4. Platform independence
Most computer platforms today have an 8-bits-in-a-byte
architecture, which enables data exchange between these platforms.
But there are two significant points in which platforms may be
different:
a) The representation of binary numerical values (the short and long
   int and floats).
b) The representation of characters (ASCII/ANSI vs. EBCDIC).
Point (a) is the phenomenon of "byte swapping": How is a short int
value 259 = 0x0103 = X'0103' stored at address 4402?
The two flavours are:
4402 4403
 01   03   the big-endian, and
 03   01   the little-endian.
Point (b) is represented by a table of the assignment of the 256
possible values of a byte to printable or control characters. (In
ASCII the letter "A" is assigned to value (or position) 0x41 = 65, in
EBCDIC it is 0xC1 = 193.)
The solution to these problems is to normalize the data.
We fix:
(a) The internal representation of binary numerals is 2's complement
    in big-endian order.
(b) The internal representation of characters is ISO 8859-1 (also
    known as Latin 1).
The fixing of point (b) should be regarded as a first attempt. In
some environments 8859-1 may not be the best choice; in a Greek
or Russian environment 8859-7 or 8859-5 are appropriate.
Nevertheless, in a specific group (or world) of applications, that is
to say all the applications which want to interchange data with a
defined protocol (via networking or diskette or something else), this
internal character table must be unique.
So a possibility to define a translation table (and its inverse)
should be given.
Important: You construct a SDXF chunk not for a specific addressee,
but you adapt your data into a normalized format (or network format).
This adaptation is not done by the programmer, it is done by the
create and extract functions. An administrator has to take care of
defining the correct translation tables.
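A pair of translation tables, as described above, can be thought of as
two 256-byte arrays where one is the inverse of the other. The sketch
below is an assumption about how such tables might be handled, not the
SDXF implementation; the function names build_inverse and translate
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Build the inverse of a 256-entry translation table. Returns 1 if
   to_net is a one-to-one mapping (an exact inverse exists), else 0. */
int build_inverse(const uint8_t to_net[256], uint8_t to_host[256])
{
    uint8_t seen[256] = {0};
    for (int i = 0; i < 256; i++) {
        uint8_t n = to_net[i];
        if (seen[n])
            return 0;          /* two host codes map to one net code */
        seen[n] = 1;
        to_host[n] = (uint8_t)i;
    }
    return 1;
}

/* Translate a buffer in place through a 256-entry table. */
void translate(uint8_t *data, size_t len, const uint8_t table[256])
{
    for (size_t i = 0; i < len; i++)
        data[i] = table[data[i]];
}
```

Checking that the table is one-to-one matters: otherwise data that was
translated to the network format could not be translated back without
loss.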
5. Compression
As stated in 2.5 there is a flag bit which declares that the
following data (elementary or structured) is compressed. This data
is not further interpretable until it is decompressed. Compression
is transparently done by the SDXF functions: "create" does the
compression for elementary chunks, "leave" for structured chunks,
"extract" does the decompression for elementary chunks, "enter" for
structured chunks.
Transparently means that the programmer only has to tell the SDXF
functions that he wants to compress the following chunk(s).
For choosing between different compression methods and for
controlling the decompressed (original) length, there is an
additional definition:
After the chunk header for a compressed chunk, a compression header
follows:
+-----------------------+---------------+---------------->
|      chunk header     | compr. header | compressed data
+---+---+---+---+---+---+---+---+---+---+---------------->
|chunkID|flg|  length   |md | orglength |
+---+---+---+---+---+---+---+---+---+---+---------------->
- 'orglength' is the original (decompressed) length of the data.
- 'md' is the "compression method": Two methods are described here:
  # method 01 for a simple (fast but not very effective) one, named
    "Run Length 1" or "Byte Run 1" algorithm. (Runs of
    consecutive identical characters are replaced by the number of
    these characters and the character itself.)
    More precisely:
    The compressed data consists of several sections of various
    length. Every section starts with a "counter" byte, a signed
    "tiny" (8 bit) integer, which contains a length information.
    If this byte contains the value "n",
    with n >= 0 (and n < 128), the next n+1 bytes will be taken
    unchanged;
    with n < 0 (and n > -128), the next byte will be replicated
    -n+1 times;
    n = -128 will be ignored.
    Trailing blanks will in general be cut. If these are
    necessary, they can be reconstructed while "extract"ing with
    the parameter field "filler" (see 8.2.1) set to the space
    character.
  # method 02 for the "deflate" algorithm which comes
    from the "zip" people.
    The authors are:
    Jean-loup Gailly (deflate routine),
    Mark Adler (inflate routine), and others.
    The deflate format is described in [DEFLATE].
The values for the compression method number are maintained by
IANA, see chap. 12.1.
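The counter-byte rules of method 01 translate directly into a small
decoder. The following is a sketch under the rules stated above; the
function name rl1_decode is hypothetical, and the restoring of cut
trailing blanks via "filler" is left out. In practice the output
capacity would be taken from 'orglength' in the compression header.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode "Run Length 1" data. Returns the number of bytes written,
   or -1 if the input is truncated or the output buffer is too small. */
long rl1_decode(const uint8_t *in, size_t in_len,
                uint8_t *out, size_t out_cap)
{
    size_t ip = 0, op = 0;

    while (ip < in_len) {
        int n = in[ip++];
        if (n >= 128)
            n -= 256;                 /* interpret as signed 8 bit   */
        if (n == -128)
            continue;                 /* ignored                     */
        if (n >= 0) {                 /* copy the next n+1 bytes     */
            size_t cnt = (size_t)n + 1;
            if (in_len - ip < cnt || out_cap - op < cnt)
                return -1;
            for (size_t i = 0; i < cnt; i++)
                out[op++] = in[ip++];
        } else {                      /* repeat next byte -n+1 times */
            size_t cnt = (size_t)(-n) + 1;
            if (ip >= in_len || out_cap - op < cnt)
                return -1;
            uint8_t b = in[ip++];
            for (size_t i = 0; i < cnt; i++)
                out[op++] = b;
        }
    }
    return (long)op;
}
```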
6. Encryption
As stated in 2.5 there is a flag bit which declares that the
following data (elementary or structured) is encrypted. This data is
not interpretable until it is decrypted. En/decryption is
transparently done by the SDXF functions: "create" does the
encryption for elementary chunks, "leave" for structured chunks,
"extract" does the decryption for elementary chunks, "enter" for
structured chunks. (Yes, it sounds very similar to chapter 5.)
More than one encryption method for a given range of applications is
not very reasonable. Some encryption algorithms work with block
algorithms. That means that the length of the data to encrypt must be
rounded up to the next multiple of this block length. This blocksize
(zero means non-blocking) is reported by the encryption interface
routine (addressed by the option field *encryptProc, see chapter 8.5)
with mode=3. If blocking is used, at least one byte is added; the
last byte of the lengthening data contains the number of added bytes
minus one. With this the decryption interface routine can calculate
the real data length.
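The padding rule for block algorithms can be sketched as follows. The
function names are hypothetical, and filling the added bytes other
than the last one with zero is an assumption; the text only fixes the
value of the last added byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Length after padding; blocksize 0 means a non-blocking algorithm.
   With blocking, between 1 and blocksize bytes are always added. */
size_t padded_length(size_t len, size_t blocksize)
{
    if (blocksize == 0)
        return len;
    return (len / blocksize + 1) * blocksize;
}

/* Pad buf (capacity cap) in place. Returns 0 and stores the new
   length in *out_len, or returns -1 if the padding does not fit. */
int pad_block(uint8_t *buf, size_t len, size_t cap, size_t blocksize,
              size_t *out_len)
{
    size_t total = padded_length(len, blocksize);
    if (total > cap || total - len > 256)
        return -1;
    for (size_t i = len; i < total; i++)
        buf[i] = 0;                       /* filler value is assumed */
    if (total > len)
        buf[total - 1] = (uint8_t)(total - len - 1);
    *out_len = total;
    return 0;
}

/* Recover the real data length from a decrypted, padded buffer. */
size_t unpadded_length(const uint8_t *buf, size_t total, size_t blocksize)
{
    if (blocksize == 0 || total == 0)
        return total;
    size_t added = (size_t)buf[total - 1] + 1;
    return added > total ? 0 : total - added;
}
```

Note that data whose length is already a multiple of the block size
still grows by one full block, because at least one byte must be
added to carry the count.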
If an application (or network connect handshaking protocol) needs to
negotiate an encryption method, it should use a method number
maintained by IANA, see chap. 12.2.
Even though the en/decryption is done transparently, an encryption key
(password) must be given to the SDXF functions. Encryption is done
after translating character data into the internal ("network")
format, decryption is done before translating from it.
If both encryption and compression are applied to the same chunk,
compression is done first - compression of well encrypted data (same
strings appear as different after encryption) tends to zero
compression rates.
7. Arrays
An array is a sequence of chunks with identical chunk-ID, length and
data type.
At first a hint: in principle a special definition in SDXF for such
an array is not really necessary:
It is not forbidden that there is more than one chunk with the same
chunk-ID within the same structured chunk.
Therefore with a sequence of SDX_next / SDX_extract calls one can
fill the destination array step by step.
If there are many occurrences of chunks with the same chunk-ID (and a
comparatively small length), the overhead of the chunk headers may be
significant.
Therefore the array flag is introduced. An array chunk has only one
chunk header for the complete sequence of elementary chunks. After
the chunk header for an array chunk, an array header follows:
This is a short integer (big endian!) which contains the number of
the array elements (CT). Every element has a fixed length (EL), so
the chunk length (CL) is CL = EL * CT + 2.
The data elements follow immediately after the array header.
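The relation CL = EL * CT + 2 can be used in both directions: to
compute the chunk length when writing, and to recover the element
length when reading. The following sketch uses hypothetical function
names and treats the element count as an unsigned 16-bit value, which
is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Content length of an array chunk: a 2-byte count followed by
   CT elements of EL bytes each. */
uint32_t array_chunk_length(uint16_t count, uint32_t elem_len)
{
    return elem_len * count + 2;
}

/* Split array content into count (CT) and element length (EL).
   Returns 0 on success, -1 if the content is malformed. */
int array_parse(const uint8_t *content, uint32_t cl,
                uint16_t *count, uint32_t *elem_len)
{
    if (cl < 2)
        return -1;
    *count = (uint16_t)((content[0] << 8) | content[1]);
    if (*count == 0) {
        *elem_len = 0;
        return cl == 2 ? 0 : -1;
    }
    if ((cl - 2) % *count != 0)
        return -1;                 /* CL does not fit EL * CT + 2 */
    *elem_len = (cl - 2) / *count;
    return 0;
}
```

For 10 elements of 4 bytes each this gives a chunk length of 42,
compared to 100 bytes for ten separate chunks with their own 6-byte
headers.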
The complete array will be constructed by SDX_create, the complete
array will be read by SDX_extract.
The parameter fields (see 8.2.1) 'dataLength' and 'count' are used
for the SDXF functions 'extract' and 'create':
Field 'dataLength' is the common length of the array elements,
'count' is the actual dimension of the array for 'create' (input).
For the 'extract' function 'count' acts both as an input and output
parameter:
input : the maximum dimension
output: the actual array dimension
(If output count is greater than input count, the 'data cutted'
warning is returned and the destination array is filled up to
the maximum dimension.)
8. Description of the SDXF functions
8.1 Introduction
Following the principles of Object Oriented Programming, not only the
description of the data is necessary, but also the functions which
manipulate data - the "methods".
For the programmer knowing the methods is more important than knowing
the data structure. The methods have to know the exact specification
of the data and guarantee the consistency of the data while creating
them.
A SDXF object is an instance of a parameter structure which acts as a
programming interface. In particular it points to an actual SDXF data
chunk, and, while processing this data, there is a pointer to the
actual inner chunk which will be the focus for the next operation.
The benefit of an exact interface description is the same as using,
for example, the standard C library functions: By using standard
interfaces your code remains platform independent.
8.2 Basic definitions
8.2.1 The SDXF Parameter structure
All SDXF access functions need only one parameter, a pointer to the
SDXF parameter structure.
First 3 prerequisite definitions:
typedef short int ChunkID;
typedef unsigned char Byte;
typedef struct
{
  ChunkID chunkID;
  Byte    flags;
  char    length [3];
  Byte    data;
} Chunk;
And now the parameter structure:
typedef struct
{
  ChunkID chunkID;       // name (ID) of Chunk
  Byte   *container;     // pointer to the whole Chunk
  long    bufferSize;    // size of container
  Chunk  *currChunk;     // pointer to actual Chunk
  long    dataLength;    // length of data in Chunk
  long    maxLength;     // max. length of Chunk for SDX_extract
  long    remainingSize; // rem. size in cont. after SDX_create
  long    value;         // for data type numeric
  double  fvalue;        // for data type float
  char   *function;      // name of the executed SDXF function
  Byte   *data;          // pointer to data
  Byte   *cryptkey;      // pointer to Crypt Key
  short   count;         // (max.) number of elements in an array
  short   dataType;      // Chunk data type / init open type
  short   ec;            // extended return-code
  short   rc;            // return-code
  short   level;         // level of hierarchy
  char    filler;        // filler char for SDX_extract
  Byte    encrypt;       // Indication if data to encrypt (0 / 1)
  Byte    compression;   // compression method
                         // (00=none, 01=RL1, 02=zip/deflate)
} SDX_obj, *SDX_handle;
Only the "public" fields of the parameter structure which act as
input and output for the SDXF functions are described here. A given
implementation may add some "private" fields to this structure.
8.2.2 Basic functions
All these functions work with a SDX_handle as the only formal
parameter. Every function returns as output ec and rc as a report of
success. For the values for ec, rc and dataType see chap. 8.4.
1. SDX_init : Initialize the parameter structure.
   input : container, dataType, bufferSize (for dataType =
           SDX_NEW only)
   output: currChunk, dataLength (for dataType = SDX_OLD only),
           ec, rc
   The other fields of the parameter structure will be
   initialized.
2. SDX_enter : Enter a structured chunk.
   You can access the first chunk inside this structured chunk.
   input : none
   output: currChunk, chunkID, dataLength, level, dataType,
           ec, rc
3. SDX_leave : Leave the actual entered structured chunk.
   input : none
   output: currChunk, chunkID, dataLength, level, dataType,
           ec, rc
4. SDX_next : Go to the next chunk inside a structured chunk.
   input : none
   output: currChunk, chunkID, dataLength, dataType, count, ec, rc
   At the end of a structured chunk SDX_next returns rc =
   SDX_RC_failed and ec = SDX_EC_eoc (end of chunk).
   The actual structured chunk is SDX_leave'd automatically.
5. SDX_extract : Extract data of the actual chunk.
   (If the actual chunk is structured, only a copy is done, otherwise
   the data is converted to host format.)
   input / output depends on the dataType:
   if dataType is structured, binary or char:
   input : data, maxLength, count, filler
   output: dataLength, count, ec, rc
   if dataType is numeric (float resp.):
   input : none
   output: value (fvalue resp.), ec, rc
6. SDX_select : Go to the (next) chunk with a given chunkID.
   input : chunkID
   output: currChunk, dataLength, dataType, ec, rc
7. SDX_create : Create a new chunk (at the end of the actual
                structured chunk).
   input : chunkID, dataLength, data, (f)value, dataType,
           compression, encrypt, count
   update: remainingSize,
   output: currChunk, dataLength, ec, rc
8. SDX_append : Append a complete chunk (at the end of the actual
                structured chunk).
   input : data, maxLength,
   update: remainingSize,
   output: chunkID, chunkLength, maxLength, dataType, ec, rc
8.3 Definitions for C++
This is the specification of the SDXF class in C++: (The type 'Byte'
is defined as "unsigned char" for bitstrings, as opposed to "signed
char" for character strings.)
class C_SDXF
{
  public:
  // constructors and destructor:
  C_SDXF ();
  C_SDXF (Byte *cont);                    // old
  C_SDXF (Byte *cont, long size);         // new
  C_SDXF (long size);                     // new
  ~C_SDXF ();
  // methods:
  void init (void);                       // old
  void init (Byte *cont);                 // old
  void init (Byte *cont, long size);      // new
  void init (long size);                  // new
  void enter (void);
  void leave (void);
  void next (void);
  long extract (Byte *data, long length); // chars, bits
  long extract (void);                    // numeric data
  void create (ChunkID);                  // structured
  void create (ChunkID, long value);      // numeric
  void create (ChunkID, double fvalue);   // float
  void create (ChunkID, Byte *data, long length); // binary
  void create (ChunkID, char *data);      // chars
  void set_compression (Byte compression_method);
  void set_encryption (Byte *encryption_key);
  // interface:
  ChunkID id;        // see 8.4.1
  short   dataType;  // see 8.4.2
  long    length;    // length of data or chunk
  long    value;
  double  fvalue;
  short   rc;        // the raw return code see 8.4.3
  short   ec;        // the extended return code see 8.4.4
  protected:
  // implementation dependent ...
};
8.4 Common Definitions
8.4.1 Definition of ChunkID
typedef short ChunkID;
8.4.2 Values for dataType
SDX_DT_inconsistent = 0
SDX_DT_structured = 1
SDX_DT_binary = 2
SDX_DT_numeric = 3
SDX_DT_char = 4
SDX_DT_float = 5
SDX_DT_UTF8 = 6
data types for SDX_init
SDX_OLD = 1
SDX_NEW = 2
8.4.3 Values for rc
SDX_RC_ok = 0
SDX_RC_failed = 1
SDX_RC_warning = 1
SDX_RC_illegalOperation = 2
SDX_RC_dataError = 3
SDX_RC_parameterError = 4
SDX_RC_programError = 5
SDX_RC_noMemory = 6
8.4.4 Values for ec
SDX_EC_ok = 0
SDX_EC_eoc = 1 // end of chunk
SDX_EC_notFound = 2
SDX_EC_dataCutted = 3
SDX_EC_overflow = 4
SDX_EC_wrongInitType = 5
SDX_EC_comprerr = 6 // compression error
SDX_EC_forbidden = 7
SDX_EC_unknown = 8
SDX_EC_levelOvflw = 9
SDX_EC_paramMissing = 10
SDX_EC_magicError = 11
SDX_EC_not_consistent = 12
SDX_EC_wrongDataType = 13
SDX_EC_noMemory = 14
SDX_EC_error = 99 // rc is
8.5 Special functions
Besides the basic definitions there is a global function
(SDX_getOptions) which returns a pointer to a global table of
options.
With the help of these options you can adapt the behaviour of SDXF.
In particular you can define an alternative pair of translation tables
or an alternative function which reads these tables from an external
resource (e.g. from disk).
Within this table of options there is also a pointer to the function
which is used for encryption / decryption: You can install your own
encryption algorithm by setting this pointer.
The options pointer is received by:
SDX_TOptions *opt = SDX_getOptions ();
With:
typedef struct
{
  Byte           *toHost;        // Trans tab net -> host
  Byte           *toNet;         // Trans tab host -> net
  int             maxlevel;      // highest possible level
  int             translation;   // translation net <-> host
                                 // is in effect=1 or not=0
  TEncryptProc   *encryptProc;   // alternate encryption routine
  TGetTablesProc *getTablesProc; // alternate routine defining
                                 // translation tables
  TcvtUTF8Proc   *convertUTF8;   // routine to convert to/from UTF-8
} SDX_TOptions;
typedef long TEncryptProc (
  int   mode,    // 1= to encrypt, 2= to decrypt, 3= encrypted length
  Byte *buffer,  // data to en/decrypt
  long  len,     // len: length of buffer
  char *passw);  // password
// returns length of en/de-crypted data
// (parameter buffer and passw are ignored for mode=3)
// returns blocksize for mode=3 and len=0.
// blocksize is zero for non-blocking algorithms
typedef int TGetTablesProc (Byte **toNet, Byte **toHost);
// toNet, toHost: pointer to output params. Both params
// point to translation tables of 256 Bytes
// returns success: 1 = ok, 0 = error
typedef int TcvtUTF8Proc
( int   mode,                       // 1 = to UTF-8, 2 = from UTF-8
  Byte *target, int *targetlength,  // output
  Byte *source, int  sourcelength); // input
// targetlength contains maximal size as input param
// returns success: 1 = ok, 0 = no conversion
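As an illustration of the conversion interface, the sketch below
shows what a routine of this shape might look like when the host
character set is ISO 8859-1. The function name latin1_utf8 is
hypothetical, and the sketch assumes that *targetlength receives the
number of bytes written on success; the text above only states that
it carries the maximal size as input.

```c
typedef unsigned char Byte;

/* Convert between ISO 8859-1 and UTF-8.
   mode 1 = to UTF-8, mode 2 = from UTF-8.
   Returns 1 = ok, 0 = no conversion. */
int latin1_utf8(int mode, Byte *target, int *targetlength,
                Byte *source, int sourcelength)
{
    int cap = *targetlength, out = 0;

    for (int i = 0; i < sourcelength; i++) {
        Byte c = source[i];
        if (mode == 1) {
            if (c < 0x80) {
                if (out + 1 > cap) return 0;
                target[out++] = c;
            } else {                   /* U+0080..U+00FF: two bytes */
                if (out + 2 > cap) return 0;
                target[out++] = (Byte)(0xC0 | (c >> 6));
                target[out++] = (Byte)(0x80 | (c & 0x3F));
            }
        } else if (mode == 2) {
            if (out + 1 > cap) return 0;
            if (c < 0x80) {
                target[out++] = c;
            } else if ((c == 0xC2 || c == 0xC3) && i + 1 < sourcelength
                       && (source[i + 1] & 0xC0) == 0x80) {
                target[out++] = (Byte)(((c & 0x03) << 6)
                                       | (source[i + 1] & 0x3F));
                i++;                   /* consumed continuation byte */
            } else {
                return 0;              /* outside Latin 1 or malformed */
            }
        } else {
            return 0;                  /* unknown mode */
        }
    }
    *targetlength = out;
    return 1;
}
```

Returning 0 for characters that cannot be represented matches the
"no conversion" result, in which case SDXF copies the data unchanged
(see chapter 9).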
9. 'Support' of UTF-8.
Many systems support [UTF-8] as a character format for transferring
data. The benefit is that no fixing of a specific character set for
an application is needed because the set of 'all' characters is used,
represented by the 'Universal Character Set' UCS-2 [UCS], a double
byte coding for characters.
SDXF does not really deal with UTF-8 by itself, there are many
possibilities to interpret a UTF-8 sequence: The application may
- reconstruct the UCS-2 sequence,
- accept only the pure ASCII characters and map non-ASCII to a
  special 'non-printable' character,
- target pure ASCII, with non-ASCII replaced in a sensible manner
  (French accented vowels replaced by vowels without accents, etc.),
- target a specific ANSI character set, with the non-ASCII chars
  mapped as far as possible and others replaced by a 'non-printable',
- etc.
But SDXF offers an interface for the 'extract' and 'create'
functions:
A function pointer may be specified in the options table to maintain
this possibility (see 8.5). Default for this pointer is NULL: No
further conversions are done by SDXF, the data is copied 'as is', it
is treated as a bit string as for data type 'binary'.
If this function is specified, it is used by the 'create' function
with the 'toUTF8' mode, and by the 'extract' function with the
'fromUTF8' mode. The invoking of these functions is done by SDXF
transparently.
If the function returns zero (no conversion) SDXF copies the data
without conversion.
10. Security Considerations
Any corruption of data in the chunk headers invalidates the complete
SDXF structure.
Any corruption of data in an encrypted or compressed SDXF structure
makes this chunk unusable. An integrity check after decryption or
decompression should be done by the "enter" function.
While using TCP/IP (more precisely: IP) as a transmission medium one
can rely on its CRC check on the transport layer.
11. Some general hints
1. A consistent construction of a SDXF structure is done if every
   "create" of a structured chunk is closed by a paired "leave".
   While a structured chunk is under construction, its data type is
   set to zero - that means: this chunk is inconsistent. The
   SDX_leave function sets the datatype to "structured".
2. While creating an elementary chunk a platform dependent
   transformation to a platform independent format of the data is
   performed - at the end of construction the content of the buffer
   is ready to transport to another site, without any further
   translation.
3. As you see, no data definition in your programming language is
   needed to construct a specific SDXF structure. The data is
   created dynamically by function calls.
4. With SDXF as a base you can define protocols for client / server
   applications. These protocols may be extended in a downward
   compatible manner by following two rules:
   Rule 1: Ignore unknown chunkIDs.
   Rule 2: The sequence of chunks should not be significant.
12. IANA Considerations
The compression and encryption algorithms for SDXF are not fixed, SDXF
is open for various algorithms. Therefore an agreement is necessary
to interpret the compression and encryption algorithm method
numbers. (Encryption methods are not a semantic part of SDXF, but
may be used for a connection protocol to negotiate the encryption
method to use.)
The following two items are registered by IANA:
12.1 COMPRESSION METHODS FOR SDXF
The compressed SDXF chunk starts with a "compression header". This
header contains the compression method as an unsigned 1-Byte integer
(1-255). These numbers are assigned by IANA and listed here:
compression
method      Description                       Hints
---------   -------------------------------   -------------
01          RUN-LENGTH algorithm              see chap. 5
02          DEFLATE (ZIP)                     see [DEFLATE]
03-239      IANA to assign
240-255     private or application specific
12.2 ENCRYPTION METHODS FOR SDXF
A unique encryption method is fixed or negotiated by handshaking.
For the latter a number for each encryption method is necessary.
These numbers are unsigned 1-Byte integers (1-255). These numbers
are assigned by IANA and listed here:
encryption
method      Description
---------   ------------------------------
01-239      IANA to assign
240-255     private or application specific
12.3 Hints for assigning a number:
Developers who want to register a compression or encryption method for
SDXF should contact IANA for a method number. The ASSIGNED NUMBERS
document should be referred to for a current list of METHOD numbers
and their corresponding protocols, see [IANA]. The new method SHOULD
be a standard published as an RFC or by an established standards
organization (such as OSI).
13. Discussion
There are already some standards for Internet data exchange; the IETF
prefers ASN.1 and XML for this purpose. So the reasons for
establishing a new data format should be discussed.
13.1 SDXF vs. ASN.1
The demand of ASN.1 (see [ASN.1]) is to serve program language
independent means to define data structures. The real data format
which is used to send the data is not defined by ASN.1 but usually
BER or PER (or some derivatives of them like CER and DER) are used in
this context, see [BER] and [PER].
The idea behind ASN.1 is this: on every platform on which an
application is to be developed, descriptions of the data structures
used are available in ASN.1 notation. From these notations the
language-dependent definitions are generated with the help of an
ASN.1 compiler.

This compiler also generates transformation functions that pack these
data structures into, and unpack them from, the BER (or other)
format.

A direct comparison between ASN.1 and SDXF is somewhat inappropriate:
the data format of SDXF is more closely related to BER (and its
relatives). Using ASN.1 to define data structures does not contradict
SDXF. However, SDXF does not require a complete data structure in
order to build the message to send, nor is a complete data structure
generated from the received message.
The main difference lies in how the message is built and
interpreted; I want to call this the "static" and the
"dynamic" concept:
o ASN.1 uses a "static" approach: the whole data structure
must exist before the message can be created.
o SDXF constructs and interprets the message in a "dynamic" way:
the message is packed and unpacked step by step by SDXF
functions.
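
The "dynamic" concept can be illustrated as follows. This is only a
sketch under stated assumptions: the functions msg_open, msg_add_text
and msg_close and the type tags are hypothetical and are not the SDXF
functions of chapter 8, and the 6-byte header (2-byte ID, 1-byte type
tag, 3-byte big-endian length) is a simplification; chapter 2 remains
the authoritative description of the chunk layout. The point is that a
structured chunk is opened before its content is known, and its length
is patched in when it is closed.

```c
#include <stddef.h>
#include <string.h>

#define HDR_LEN     6  /* assumed: 2-byte ID, 1-byte type tag, 3-byte length */
#define TYPE_STRUCT 1  /* hypothetical type tags for this sketch */
#define TYPE_CHAR   4

typedef struct {
    unsigned char buf[256];
    size_t used;
    int overflow;      /* set once the buffer is too small */
} message;

/* Write the 3-byte big-endian length field of the header at p. */
static void put_length(unsigned char *p, size_t len)
{
    p[3] = (unsigned char)((len >> 16) & 0xFF);
    p[4] = (unsigned char)((len >> 8) & 0xFF);
    p[5] = (unsigned char)(len & 0xFF);
}

static void put_header(unsigned char *p, unsigned id,
                       unsigned char type, size_t len)
{
    p[0] = (unsigned char)((id >> 8) & 0xFF);
    p[1] = (unsigned char)(id & 0xFF);
    p[2] = type;
    put_length(p, len);
}

/* Open a structured chunk; its length is not yet known and is set to 0.
   Returns the offset of the header so that msg_close can patch it. */
size_t msg_open(message *m, unsigned id)
{
    size_t at = m->used;
    if (m->overflow || m->used + HDR_LEN > sizeof m->buf) {
        m->overflow = 1;
        return at;
    }
    put_header(m->buf + at, id, TYPE_STRUCT, 0);
    m->used += HDR_LEN;
    return at;
}

/* Append an elementary character chunk. */
void msg_add_text(message *m, unsigned id, const char *text)
{
    size_t len = strlen(text);
    if (m->overflow || m->used + HDR_LEN + len > sizeof m->buf) {
        m->overflow = 1;
        return;
    }
    put_header(m->buf + m->used, id, TYPE_CHAR, len);
    memcpy(m->buf + m->used + HDR_LEN, text, len);
    m->used += HDR_LEN + len;
}

/* Close the structured chunk opened at offset 'at': everything written
   since then is its content, so the length can now be filled in. */
void msg_close(message *m, size_t at)
{
    if (m->overflow)
        return;
    put_length(m->buf + at, m->used - at - HDR_LEN);
}
```

A caller would zero a message, call msg_open for the outer chunk, add
elementary chunks one by one as the data becomes available, and finish
with msg_close. At no point does a complete description of the whole
message have to exist.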
The use of static structures may be appropriate for a range of
applications, but for complex tasks it is often impossible to define
the message as a whole. As an example, try to define an ASN.1
description for a complex structured text document which is presented
in XML: there are sections, paragraphs and text elements which
may recursively consist of sections with specific text attributes.
13.2 SDXF vs. XML
SDXF and XML are similar in that both can handle a recursive,
complex data stream. The main difference is the kind of
data each of them maintains:
o XML works with pure text data (though it should be noted that the
character representation is not standardized by XML). Moreover, an
XML document with all its tags is human-readable. Binary data such
as graphics is not included directly but may be referenced by an
external link as in HTML.
In XML there is no strong separation between informational and
control data: escape sequences (like "&lt;" and "&amp;") and the
<![CDATA[...]]> construction are used to distinguish between these
two types of data.
o SDXF maintains machine-readable data; it is not designed to be
readable by humans, nor is SDXF data meant to be edited with a text
editor (even less so if compression and encryption are used). With
the help of the SDXF functions you have quick and easy access to every
data element. The standard parser for an SDXF data structure is
always a simple template, the "while - switch - case ID -
enter/extract" pattern as outlined in chap. 3.4.2.
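
The shape of the "while - switch - case ID - enter/extract" pattern can
be sketched as follows. This is not the code of chap. 3.4.2 and does not
use the SDXF functions: the chunk struct is a simplified in-memory
stand-in for a parsed buffer, and the chunk IDs, the record type and the
function names are hypothetical. The sketch ignores chunk IDs it does
not know.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk IDs and data types, chosen for this sketch only. */
enum { ID_NAME = 1, ID_ADDRESS = 2, ID_CITY = 3 };
enum { TYPE_STRUCTURED, TYPE_CHARACTER };

/* Simplified in-memory stand-in for a chunk; not the SDXF wire format. */
typedef struct chunk {
    int id;
    int type;
    const char *text;              /* content of a character chunk */
    const struct chunk *children;  /* content of a structured chunk */
    size_t n_children;
} chunk;

typedef struct {
    char name[32];
    char city[32];
} record;

/* "extract": copy the text of an elementary chunk into a fixed field,
   truncating if necessary and always terminating the string. */
static void extract_text(const chunk *c, char *dst, size_t cap)
{
    const char *src = (c->text != NULL) ? c->text : "";
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';
}

/* Walk all chunks of one nesting level. */
void parse_level(const chunk *chunks, size_t n, record *out)
{
    size_t i = 0;

    while (i < n) {                      /* "while": next chunk */
        const chunk *cur = &chunks[i++];

        switch (cur->id) {               /* "switch - case ID" */
        case ID_NAME:
            extract_text(cur, out->name, sizeof out->name);
            break;
        case ID_CITY:
            extract_text(cur, out->city, sizeof out->city);
            break;
        case ID_ADDRESS:
            /* "enter": descend into the structured chunk */
            if (cur->type == TYPE_STRUCTURED)
                parse_level(cur->children, cur->n_children, out);
            break;
        default:
            break;                       /* unknown ID: ignored here */
        }
    }
}
```

Because every nesting level is handled by the same loop, supporting a
new chunk ID only means adding one more case label.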
Because of the completely different philosophies behind XML and SDXF
(and even ASN.1), a direct comparison may not be very meaningful, as
XML has its own right to exist next to ASN.1 (and even SDXF).
Nevertheless, it is possible to convert an XML data stream into an
SDXF structure. As a first step, every XML tag becomes an SDXF
chunk ID. An elementary sequence <tag> pure text </tag> can be
transformed into an elementary (non-structured) chunk with data type
"character". Tags with attributes and sequences with nested tags are
transformed into structured chunks. Because XML allows a tagged
sequence everywhere in a text stream, an artificial "elementary
text" tag must be introduced:

If <t> is the tag for text elements, the sequence

<t> this is a text <t bold> with </t> attributes </t>

is to be conceptually replaced by

<t> <et> this is a text </et> <t bold> <et> with </et> </t>
<et> attributes </et> </t>

(with "et" as the "elementary text" tag).
This results in the following SDXF structure:

ID_t
 |
 +-- ID_et = " this is a text "
 |
 +-- ID_t
 |    |
 |    +-- ID_value = "bold"
 |    |
 |    +-- ID_et = "with"
 |
 +-- ID_et = " attributes"

ID_t and ID_et may be represented by the same chunk ID, only
distinguished by the data type ("structured" for <t> and "character"
for <et>).
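
As an illustration, the structure above can be held in memory as a small
tree. The following is a minimal sketch and not an SDXF implementation:
the node type, its field names and the render function are hypothetical,
and the chunk IDs are kept as strings for readability, whereas real
chunk IDs are numeric.

```c
#include <stdio.h>

/* Hypothetical names; the two data types mirror "structured" and
   "character" as used in the text above. */
typedef enum { NODE_STRUCTURED, NODE_CHARACTER } node_type;

typedef struct node {
    const char *id;               /* chunk ID, as a label */
    node_type type;
    const char *text;             /* content of a character chunk */
    const struct node *children;  /* content of a structured chunk */
    size_t n_children;
} node;

/* The example structure: a text element with one bold sub-element. */
static const node bold_children[] = {
    { "ID_value", NODE_CHARACTER, "bold", NULL, 0 },
    { "ID_et",    NODE_CHARACTER, "with", NULL, 0 }
};
static const node top_children[] = {
    { "ID_et", NODE_CHARACTER,  " this is a text ", NULL, 0 },
    { "ID_t",  NODE_STRUCTURED, NULL, bold_children, 2 },
    { "ID_et", NODE_CHARACTER,  " attributes", NULL, 0 }
};
static const node example_tree = {
    "ID_t", NODE_STRUCTURED, NULL, top_children, 3
};

/* Append an indented outline of n to out, which has capacity cap and
   already holds 'used' bytes. Returns the new number of used bytes,
   or cap if the buffer is too small. */
size_t render(const node *n, int depth, char *out, size_t cap, size_t used)
{
    int w;
    size_t i;

    if (used >= cap)
        return cap;
    if (n->type == NODE_CHARACTER)
        w = snprintf(out + used, cap - used, "%*s%s = \"%s\"\n",
                     depth * 2, "", n->id, n->text);
    else
        w = snprintf(out + used, cap - used, "%*s%s\n",
                     depth * 2, "", n->id);
    if (w < 0 || (size_t)w >= cap - used)
        return cap;
    used += (size_t)w;

    if (n->type == NODE_STRUCTURED)
        for (i = 0; i < n->n_children; i++)
            used = render(&n->children[i], depth + 1, out, cap, used);
    return used;
}
```

Calling render(&example_tree, 0, buf, sizeof buf, 0) on a sufficiently
large buffer produces one line per chunk, indented by two spaces per
nesting level. Only the structured nodes carry children, which is the
distinction by data type described above.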
Binary data such as pictures can be embedded directly into an SDXF
structure instead of being referenced by an external link as in HTML.
14. Author's Address
Max Wildgrube
Schlossstrasse 120
60486
EMail: max@wildgrube.
15. Acknowledgements
I would like to thank Michael J. Slifcak (mslifcak@iss.net) for the
supporting discussions.
16. References
[ASN.1] Information processing systems - Open Systems
Interconnection, "Specification of Abstract Syntax Notation
One (ASN.1)", International Organization for
Standardization, International Standard 8824,
1987.
[BER] Information Processing Systems - Open Systems
Interconnection - "Specification of Basic Encoding Rules
for Abstract Notation One (ASN.1)", International
Organization for Standardization, International Standard
8825-1, December 1987.
[DEFLATE] Deutsch, P., "DEFLATE Compressed Data Format Specification
version 1.3", RFC 1951, May 1996.
[IANA] Internet Assigned Numbers Authority,
http://www.iana.org/numbers.
[PER] Information Processing Systems - Open Systems
Interconnection - "Specification of Packed Encoding Rules
for Abstract Syntax Notation One (ASN.1)", International
Organization for Standardization, International Standard
8825-2.
[UCS] ISO/IEC 10646-1:1993. International Standard -- Information
technology -- Universal Multiple-Octet Coded Character Set
(UCS).
[UTF8] Yergeau, F., "UTF-8, a transformation format of ISO 10646",
RFC 2279, January 1998.
17. Full Copyright Statement
Copyright (C) The Internet Society (2001). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the
Internet Society.