Comparison of data serialization formats

This is a comparison of data serialization formats: the various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.
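
As a minimal sketch of what converting an object to a sequence of bits can look like in practice, the following Python snippet serializes a small record to JSON text, encodes that text as UTF-8 bytes, and restores it. The record and its field names are made up for illustration; only the standard `json` module is used.

```python
import json

# Hypothetical in-memory object; the field names are illustrative only.
record = {"name": "Bob Peterson", "isMarried": True, "hobby": None, "height": 1.85}

# Serialize: object -> JSON text -> UTF-8 bytes (a sequence of bits).
payload = json.dumps(record, sort_keys=True).encode("utf-8")

# Deserialize: bytes -> JSON text -> an equal object.
restored = json.loads(payload.decode("utf-8"))

print(payload)
print(restored == record)
```

The same record could be written in any of the formats compared below; they differ in how the bytes are laid out and in what tooling surrounds them.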

Overview

Name Creator-maintainer Based on Standardized? Specification Binary? Human-readable? Supports references? [e] Schema-IDL? Standard APIs Supports zero-copy operations
Apache Avro Apache Software Foundation N/A Yes Apache Avro™ 1.7.5 Specification Yes No N/A Yes (built-in) N/A N/A
ASN.1 ISO, IEC, ITU-T N/A Yes ISO/IEC 8824; X.680 series of ITU-T Recommendations Yes
(BER, DER, PER, OER, or custom via ECN)
Yes
(XER, GSER, or custom via ECN)
Partial [f] Yes (built-in) N/A N/A
Bencode Bram Cohen (creator)
BitTorrent, Inc. (maintainer)
N/A Yes Part of BitTorrent protocol specification Partially
(numbers and delimiters are ASCII)
No No No No N/A
Binn Bernardo Ramos N/A Yes Binn Specification Yes No No No No Yes
Bond Microsoft N/A No Bond IDL Specification Yes Yes
(JSON,
XML)
No Yes No N/A
BSON MongoDB JSON Yes BSON Specification Yes No No No No N/A
Candle Markup Henry Luo XML, JSON, JavaFX Yes Candle Markup Reference No Yes Yes
(XPointer, XPath)
Yes
(Candle Pattern Reference)
Yes
(XQuery, XPath)
N/A
Cap’n Proto Kenton Varda N/A No Cap'n Proto Encoding Spec Yes No Yes Yes No Yes
Comma-separated values (CSV) RFC author:
Yakov Shafranovich
N/A Partial
(myriad informal variants used)
RFC 4180
(among others)
No Yes No No No No
D-Bus Message Protocol freedesktop.org N/A Yes D-Bus Specification Yes No No Partial
(Signature strings)
Yes
(see D-Bus)
N/A
FlatBuffers Google N/A N/A FlatBuffers GitHub page (specification) Yes No N/A Yes C++, with support for Java, C# and Go Yes
GVariant GLib D-Bus MP Yes GVariant Serialization Yes No No Yes
(Type strings)
No N/A
Fast Infoset ISO, IEC, ITU-T XML Yes ITU-T X.891 and ISO/IEC 24824-1:2007 Yes Yes
(XML)
Yes
(XPointer, XPath)
Yes
(XML schema)
Yes
(DOM, SAX, XQuery, XPath)
N/A
HOCON Typesafe Inc. JSON No "HOCON (Human-Optimized Config Object Notation)" No Yes Yes ? Yes
(native Java API for all JVM languages)
No
JSON Douglas Crockford JavaScript syntax Yes RFC 7159
(ancillary:
RFC 6901,
RFC 6902)
No, but see BSON, Smile, UBJSON Yes Yes
(JSON Pointer (RFC 6901);
alternately:
JSONPath, JPath, JSPON, json:select()), JSON-LD
Partial
(JSON Schema Proposal, Kwalify, Rx, Itemscript Schema), JSON-LD
Partial
(Clarinet, JSONQuery, JSONPath), JSON-LD
No
KMIP OASIS N/A Yes OASIS Yes (Tag, Type, Length, Value) Yes No No No N/A
MessagePack Sadayuki Furuhashi JSON (loosely) Yes MessagePack format specification Yes No No No No Yes
Netstrings Dan Bernstein N/A Yes netstrings.txt Yes Yes No No No Yes
OGDL Rolf Veen ? Yes Specification Yes
(Binary Specification)
Yes Yes
(Path Specification)
Yes
(Schema WD)
N/A
OPC-UA Binary OPC Foundation N/A Yes opcfoundation.org Yes No Yes No No N/A
PHP's serialize() & unserialize() PHP Group N/A Yes No Yes Yes Yes No Yes N/A
Data::Dumper format (Core Perl Module) Gurusamy Sarathy (ActiveState developer) Perl data types Yes No ? Yes No ? Yes N/A
Property list NeXT (creator)
Apple (maintainer)
? Partial Public DTD for XML format Yes [a] Yes [b] No ? Cocoa, CoreFoundation, OpenStep, GnuStep No
Protocol Buffers (protobuf) Google N/A Yes Developer Guide: Encoding Yes Partial [d] No Yes (built-in) C++, Java, Python No
ROOT CERN & FNAL N/A No N/A Yes Yes
(optional XML output for debugging)
Yes Yes
(C++ object persistency framework)
Yes
(Native C++ API, bindings for Python, Ruby, and others)
N/A
S-expressions Internet Draft author:
Ron Rivest
Lisp, Netstrings Partial
(largely de facto)
"S-Expressions" Internet Draft Yes
("Canonical representation")
Yes
("Advanced transport representation")
No No N/A
SCaViS jWork.ORG N/A Yes N/A Yes Yes
(XML, Java Serialization, ProtocolBuffers)
Yes Yes
(Java object persistency, XML, ProtocolBuffers)
Yes
(Native Java API, bindings for Jython, JRuby, Groovy and others)
N/A
Smile Tatu Saloranta JSON Yes Smile Format Specification Yes No No Partial
(JSON Schema Proposal, other JSON schemas/IDLs)
Partial
(via JSON APIs implemented with Smile backend, on Jackson, Python)
N/A
Structured Data eXchange Formats Max Wildgrube N/A Yes RFC 3072 Yes No No No N/A
Thrift Facebook (creator)
Apache (maintainer)
N/A No Original whitepaper Yes Partial [c] No Yes (built-in) N/A
UBJSON The Buzz Media, LLC JSON, BSON No Yes No No No No N/A
VelocyPack (VPack) ArangoDB N/A No VelocyPack (VPack) Version 1 Specification Yes No Partial [g] No Yes
(C++ API reference implementation)
Yes
eXternal Data Representation (XDR) Sun Microsystems (creator)
IETF (maintainer)
N/A Yes RFC 4506 Yes No Yes Yes Yes N/A
XML W3C SGML Yes W3C Recommendations:
1.0 (Fifth Edition)
1.1 (Second Edition)
Partial
(Binary XML)
Yes Yes
(XPointer, XPath)
Yes
(XML Schema, RELAX NG)
Yes
(DOM, SAX, XQuery, XPath)
No
XML-RPC Dave Winer[1] XML, SOAP[1] Yes XML-RPC Specification No Yes No No No No
YAML Clark Evans,
Ingy döt Net,
and Oren Ben-Kiki
C, Java, Perl, Python, Ruby, Email, HTML, MIME, URI, XML, SAX, SOAP, JSON[2] Yes Version 1.2 No Yes Yes Partial
(Kwalify, Rx, built-in language type-defs)
No No
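
The Bencode row above describes the format as only partially binary because its numbers and delimiters are ASCII. A minimal sketch of an encoder, assuming the usual rules from the BitTorrent protocol specification (integers as `i<digits>e`, strings as `<length>:<bytes>`, lists as `l...e`, dictionaries as `d...e` with keys in sorted order), might look like this. The function name `bencode` and the choice to encode Python text as UTF-8 are assumptions made for illustration.

```python
def bencode(value) -> bytes:
    """Encode ints, byte/text strings, lists and dicts as Bencode (sketch)."""
    if isinstance(value, bool):
        raise TypeError("Bencode has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is sent as UTF-8
    if isinstance(value, bytes):
        # Length prefix in ASCII decimal, then a colon, then the raw bytes.
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        pairs = []
        for key, item in value.items():
            if isinstance(key, str):
                key = key.encode("utf-8")
            if not isinstance(key, bytes):
                raise TypeError("Bencode dictionary keys must be strings")
            pairs.append((key, item))
        pairs.sort(key=lambda pair: pair[0])  # keys are emitted in sorted order
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode({"spam": ["a", "b"]}))
```

Only the string payloads can contain arbitrary bytes; everything else in the output is printable ASCII, which is what the "Partially" entry refers to.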

Syntax comparison of human-readable formats

Format Null Boolean true Boolean false Integer Floating-point String Array Associative array/Object
ASN.1
(XML Encoding Rules)
<foo /> <foo>true</foo> <foo>false</foo> <foo>685230</foo> <foo>6.8523015e+5</foo> <foo>A to Z</foo>
<SeqOfUnrelatedDatatypes>
    <isMarried>true</isMarried>
    <hobby />
    <velocity>-42.1e7</velocity>
    <bookname>A to Z</bookname>
    <bookname>We said, "no".</bookname>
</SeqOfUnrelatedDatatypes>
An object (the key is a field name):
<person>
    <isMarried>true</isMarried>
    <hobby />
    <height>1.85</height>
    <name>Bob Peterson</name>
</person>

A data mapping (the key is a data value):

<competition>
    <measurement>
        <name>John</name>
        <height>3.14</height>
    </measurement>
    <measurement>
        <name>Jane</name>
        <height>2.718</height>
    </measurement>
</competition>

[a]

Candle Markup (), "" true false 685230
-685230
6.8523015e+5 "A to Z"
"""
A
to
Z
"""
(true, (), -42.1e7, "A to Z")
_{%342=true A%20to%20Z=(1, 2, 3)}
or
_{
  _{key=42 value=true}
  _{key="A to Z" value=(1, 2, 3)}
}
CSV [b] null [a]
(or an empty element in the row) [a]
1 [a]
true [a]
0 [a]
false [a]
685230
-685230 [a]
6.8523015e+5 [a] A to Z
"We said, ""no""."
true,,-42.1e7,"A to Z"
42,1
A to Z,1,2,3
Netstrings [c] 0:, [a]
4:null, [a]
1:1, [a]
4:true, [a]
1:0, [a]
5:false, [a]
6:685230, [a] 9:6.8523e+5, [a] 6:A to Z, 29:4:true,0:,7:-42.1e7,6:A to Z,, 41:9:2:42,1:1,,25:6:A to Z,12:1:1,1:2,1:3,,,, [a]
JSON null true false 685230
-685230
6.8523015e+5 "A to Z" [true, null, -42.1e7, "A to Z"] {"42": true, "A to Z": [1, 2, 3]}
OGDL null [a] true [a] false [a] 685230 [a] 6.8523015e+5 [a] "A to Z"
'A to Z'
NoSpaces
true
null
-42.1e7
"A to Z"

(true, null, -42.1e7, "A to Z")

42
  true
"A to Z"
  1
  2
  3
42
  true
"A to Z", (1, 2, 3)
OpenDDL ref {null} bool {true} bool {false} int32 {685230}
int32 {0x74AE}
int32 {0b111010010101110}
float {6.8523015e+5} string {"A to Z"} Homogeneous array:
int32 {1, 2, 3, 4, 5}

Heterogeneous array:

array
{
    bool {true}
    ref {null}
    float {-42.1e7}
    string {"A to Z"}
}
dict
{
    value (key = "42") {bool {true}}
    value (key = "A to Z") {int32 {1, 2, 3}}
}
PHP's serialize() & unserialize() N; b:1; b:0; i:685230;
i:-685230;
d:685230.150000000023283064365386962890625;
d:INF;
d:-INF;
d:NAN;
s:6:"A to Z"; a:4:{i:0;b:1;i:1;N;i:2;d:-421000000;i:3;s:6:"A to Z";} Associative array:
a:2:{i:42;b:1;s:6:"A to Z";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}
Object:
O:8:"stdClass":2:{s:4:"John";d:3.140000000000000124344978758017532527446746826171875;s:4:"Jane";d:2.717999999999999971578290569595992565155029296875;}
Property list
(plain text format)[4]
N/A <*BY> <*BN> <*I685230> <*R6.8523015e+5> "A to Z" ( <*BY>, <*R-42.1e7>, "A to Z" )
{
    "42" = <*BY>;
    "A to Z" = ( <*I1>, <*I2>, <*I3> );
}
Property list
(XML format)[5][6]
N/A <true /> <false /> <integer>685230</integer> <real>6.8523015e+5</real> <string>A to Z</string>
<array>
    <true />
    <real>-42.1e7</real>
    <string>A to Z</string>
</array>
<dict>
    <key>42</key>
    <true />
    <key>A to Z</key>
    <array>
        <integer>1</integer>
        <integer>2</integer>
        <integer>3</integer>
    </array>
</dict>
S-expressions NIL
nil
T
#t [e]
true
NIL
#f [e]
false
685230 6.8523015e+5 abc
"abc"
#616263#
3:abc
{MzphYmM=}
|YWJj|
(T NIL -42.1e7 "A to Z") ((42 T) ("A to Z" (1 2 3)))
YAML ~
null
Null
NULL[7]
y
Y
yes
Yes
YES
on
On
ON
true
True
TRUE[8]
n
N
no
No
NO
off
Off
OFF
false
False
FALSE[8]
685230
+685_230
-685230
02472256
0x_0A_74_AE
0b1010_0111_0100_1010_1110
190:20:30[9]
6.8523015e+5
685.230_15e+03
685_230.15
190:20:30.15
.inf
-.inf
.Inf
.INF
.NaN
.nan
.NAN[10]
A to Z
"A to Z"
'A to Z'
[y, ~, -42.1e7, "A to Z"]
- y
-
- -42.1e7
- A to Z
{"John":3.14, "Jane":2.718}
42: y
A to Z: [1, 2, 3]
XML [d] <null /> [a] <boolean val="true"/> [a]

<true /> [a]

<boolean val="false"/> [a]

<false /> [a]

<integer>685230</integer> [a] <float>6.8523015e+5</float> [a] A to Z [a]
<array>
  <element type="boolean">true</element>
  <element type="null"/>
  <element type="float">-42.1e7</element>
  <element type="string">A to Z</element>
</array>
[a]
<associative-array>
  <entry>
    <key type="integer">42</key>
    <value type="boolean">true</value>
  </entry>
  <entry>
    <key type="string">A to Z</key>
    <value>
      <array>
        <element type="integer" val="1"/>
        <element type="integer" val="2"/>
        <element type="integer" val="3"/>
      </array>
    </value>
  </entry>
</associative-array>
XML-RPC <value><boolean>1</boolean></value> <value><boolean>0</boolean></value> <value><int>685230</int></value> <value><double>6.8523015e+5</double></value> <value><string>A to Z</string></value>
<value><array>
  <data>
  <value><boolean>1</boolean></value>
  <value><double>-42.1e7</double></value>
  <value><string>A to Z</string></value>
  </data>
  </array></value>
<value><struct>
  <member>
    <name>42</name>
    <value><boolean>1</boolean></value>
    </member>
  <member>
    <name>A to Z</name>
    <value>
      <array>
        <data>
          <value><int>1</int></value>
          <value><int>2</int></value>
          <value><int>3</int></value>
          </data>
        </array>
      </value>
    </member>
</struct></value>
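
The Netstrings entries in the table above all follow one rule: a payload is written as its byte length in ASCII decimal, a colon, the payload, and a trailing comma, and arrays are built by nesting netstrings inside a netstring. The following Python sketch reproduces the table's array example. The function names are hypothetical, and the spellings used for null (empty string) and booleans (`true`/`false`) are conventions taken from the table, not part of the netstring format itself. Numbers are passed as pre-formatted text so that the output matches the table byte for byte.

```python
def netstring(payload: bytes) -> bytes:
    """Wrap raw bytes as <length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def encode(value) -> bytes:
    """Encode a value using the conventions shown in the table (sketch)."""
    if value is None:
        return netstring(b"")  # table convention: empty string for null
    if isinstance(value, bool):
        return netstring(b"true" if value else b"false")
    if isinstance(value, str):
        return netstring(value.encode("utf-8"))
    if isinstance(value, (list, tuple)):
        # An array is a netstring whose payload is the concatenated items.
        return netstring(b"".join(encode(item) for item in value))
    raise TypeError(f"cannot encode {type(value).__name__}")


print(encode([True, None, "-42.1e7", "A to Z"]))
```

Because every length counts bytes of the already-encoded payload, a reader can skip over a nested array without parsing its contents.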

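The YAML row in the table above lists several spellings of the same integer under the YAML 1.1 integer type: decimal with optional underscores, octal with a leading `0`, hexadecimal, binary, and base-60 ("sexagesimal") with colons. The following Python sketch shows how those spellings can all resolve to 685230. The function name `parse_yaml11_int` is hypothetical, this is not the API of any YAML library, and input validation is deliberately minimal.

```python
def parse_yaml11_int(text: str) -> int:
    """Resolve a YAML 1.1 style integer literal to a Python int (sketch)."""
    digits = text.replace("_", "")  # underscores are visual separators only
    if not digits:
        raise ValueError("empty literal")
    sign = 1
    if digits[0] in "+-":
        sign = -1 if digits[0] == "-" else 1
        digits = digits[1:]
    if digits.startswith("0b"):
        value = int(digits[2:], 2)
    elif digits.startswith("0x"):
        value = int(digits[2:], 16)
    elif ":" in digits:
        # Base 60: each colon-separated group is one "digit".
        value = 0
        for group in digits.split(":"):
            value = value * 60 + int(group)
    elif digits.startswith("0") and len(digits) > 1:
        value = int(digits[1:], 8)  # leading zero means octal in YAML 1.1
    else:
        value = int(digits)
    return sign * value


for literal in ["685230", "+685_230", "02472256", "0x_0A_74_AE",
                "0b1010_0111_0100_1010_1110", "190:20:30"]:
    print(literal, "->", parse_yaml11_int(literal))
```

The base-60 case works out as 190 × 3600 + 20 × 60 + 30 = 685230, which is why `190:20:30` appears alongside the other forms.
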
Comparison of binary formats

Format Null Booleans Integer Floating-point String Array Associative array/Object
ASN.1
(BER, PER or OER encoding)
NULL type BOOLEAN:
  • BER: as 1 byte in binary form;
  • PER: as 1 bit;
  • OER: as 1 byte
INTEGER:
  • BER: variable-length big-endian binary representation (up to 2^(2^1024) bits);
  • PER Unaligned: a fixed number of bits if the integer type has a finite range; a variable number of bits otherwise;
  • PER Aligned: a fixed number of bits if the integer type has a finite range and the size of the range is less than 65536; a variable number of octets otherwise;
  • OER: one, two, or four octets (either signed or unsigned) if the integer type has a finite range that fits in that number of octets; a variable number of octets otherwise
REAL:

base-10 real values are represented as character strings in ISO 6093 format;

binary real values are represented in a binary format that includes the mantissa, the base (2, 8, or 16), and the exponent;

the special values NaN, -INF, +INF, and negative zero are also supported

Multiple valid types (VisibleString, PrintableString, GeneralString, UniversalString, UTF8String) data specifications SET OF (unordered) and SEQUENCE OF (guaranteed order) user definable type
Binn[11] \x00 True: \x01
False: \x02
big-endian 2's complement signed and unsigned 8/16/32/64 bits single: big-endian binary32
double: big-endian binary64
UTF-8 encoded, null terminated, preceded by int8 or int32 string length in bytes Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + list items Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + key/value pairs
BSON[12] Null type - 0 bytes for value True: one byte \x01
False: \x00
int32: 32-bit little-endian 2's complement or int64: 64-bit little-endian 2's complement double: little-endian binary64 UTF-8 encoded, preceded by int32 encoded string length in bytes BSON embedded document with numeric keys BSON embedded document
Concise Binary Object Representation (CBOR)[13] \xf6 True: \xf5
False: \xf4
Small positive number \x00-\x17 (0 <= N <= 23), small negative number \x20-\x37 (-24 <= N <= -1)

8bit: positive \x18\xhh, negative \x38\xhh
16bit: positive \x19<uint16_t>, negative \x39<uint16_t>
32bit: positive \x1A<uint32_t>, negative \x3A<uint32_t>
64bit: positive \x1B<uint64_t>, negative \x3B<uint64_t>
Negative number x encoded as ~x (binary inversion) or as (-x-1)
Byte order - Big-endian

Typecode (one byte) + IEEE half/single/double Typecode with length (like integer coding) and content.

Bytestring and UTF-8 have different typecode

Typecode with count (like integer coding) and items Typecode with pairs count (like integer coding) and pairs
MessagePack \xc0 True: \xc3
False: \xc2
Single byte "fixnum" (values -32..127)

or typecode (one byte) + big-endian (u)int8/16/32/64

Typecode (one byte) + IEEE single/double Typecode + up to 31 bytes
or
typecode + length as uint8/16/32 + bytes;
encoding is unspecified[14]
As "fixarray" (single-byte prefix + up to 15 array items)

or typecode (one byte) + 2-4 bytes length + array items

As "fixmap" (single-byte prefix + up to 15 key-value pairs)

or typecode (one byte) + 2-4 bytes length + key-value pairs

Netstrings 0:, True: 1:1,

False: 1:0,

OGDL Binary
Property list
(binary format)
Protocol Buffers[15] Variable encoding length signed 32-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 31)

Variable encoding length signed 64-bit: varint encoding of "ZigZag"-encoded (n << 1) XOR (n >> 63)
Constant encoding length 32-bit: 32 bits in little-endian 2's complement
Constant encoding length 64-bit: 64 bits in little-endian 2's complement

floats: little-endian binary32

doubles: little-endian binary64

UTF-8 encoded, preceded by varint-encoded integer length of string in bytes Repeated value with the same tag N/A
Sereal 0x25 True: 0x3b
False: 0x3a
Single byte POS/NEG (values -16..15)

or typecode (one byte) + "varint" encoded variable length integer or typecode (one byte) + "zigzag" encoded variable length integer

Typecode (one byte) + IEEE single/double/quad As "SHORT_BINARY" (single-byte prefix + up to 31 raw bytes)

or typecode (one byte, including boolean UTF8-encoding flag) + "varint" encoded length + raw bytes

As "ARRAYREF" (single-byte prefix + up to 15 array items)

or typecode (one byte) + "varint" encoded length + array items

As "HASHREF" (single-byte prefix + up to 15 key-value pairs)

or typecode (one byte) + "varint" encoded length + key-value pairs. Distinguishes hashmaps from objects / class instances.

Smile \x21 True: \x23
False: \x22
Single byte "small" (values -16..15 encoded using \xc0 - \xdf),

zigzag-encoded varints (1 - 11 databytes), or BigInteger

IEEE single/double, BigDecimal Length-prefixed "short" Strings (up to 64 bytes), marker-terminated "long" Strings and (optional) back-references Arbitrary-length heterogeneous arrays with end-marker Arbitrary-length key/value pairs with end-marker
Structured Data eXchange Formats (SDXF) big-endian signed 24bit or 32bit integer big-endian IEEE double either UTF-8 or ISO 8859-1 encoded list of elements with identical ID and size, preceded by array header with int16 length chunks can contain other chunks to arbitrary depth
Thrift
Transenc 0x82 True: 0x81
False: 0x80
Single byte integers in the range [-32;127]

Fixed-length integers of 8, 16, 32, and 64 bits.

Encoded as two's complement little-endian values.

Little-endian IEEE single/double precision numbers. UTF-8 encoded type-length-value string. Balanced brackets with an optional array count. Arrays can be nested. Balanced brackets with an optional object count. Objects can be nested.
VelocyPack[16] 0x00 none,
0x18 null
True: 0x1a
False: 0x19
signed integers, little-endian, 1 to 8 bytes, 2's complement: 0x20-0x27 + int;

unsigned integers, little-endian, 1 to 8 bytes: 0x28-0x2f + uint;
small integers 0, 1, ... 9: 0x30-0x39;
small negative integers -6, -5, ..., -1: 0x3a-0x3f;
UTC-date in milliseconds since the epoch, little-endian, 2's complement: 0x1c + uint64

double IEEE-754, little-endian: 0x1b + uint64 equivalent;

positive long packed BCD-encoded float: 0xc8-0xcf + 8 bytes;
negative long packed BCD-encoded float: 0xd0-0xd7 + 8 bytes

UTF-8 string, 0-126 bytes length: 0x40-0xbe + 0..126 bytes;

variable length UTF-8 string, little-endian, unsigned integer, not zero-terminated and may contain zero bytes: 0xbf + 8 bytes byte-length + string

empty array: 0x01;

array without index table, all sub items 1/2/4/8 bytes byte-length;
array with 1/2/4/8 byte index table offsets, byte-length and number of sub values;
compact array, no index table: 0x13

empty object: 0x0a;

object with 1/2/4/8 byte index table offsets, sorted by attribute name, 1/2/4/8 byte byte-length and number of sub values;
object with 1/2/4/8 byte index table offsets, not sorted by attribute name, 1/2/4/8 byte byte-length and number of sub values;
compact object, no index table: 0x14
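
The Protocol Buffers row above gives the "ZigZag" mapping for signed integers as (n << 1) XOR (n >> 31) for 32-bit values and (n << 1) XOR (n >> 63) for 64-bit values, followed by varint encoding. A minimal Python sketch of both steps is shown below. The function names are hypothetical; the masks are needed only because Python integers are unbounded, and the right shift is arithmetic (sign-preserving), as the formula assumes.

```python
def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def zigzag64(n: int) -> int:
    """Same mapping for signed 64-bit integers."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def varint(u: int) -> bytes:
    """Encode an unsigned integer 7 bits at a time, least significant group first."""
    if u < 0:
        raise ValueError("varint expects a non-negative integer")
    out = bytearray()
    while True:
        low7 = u & 0x7F
        u >>= 7
        if u:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


for n in (0, -1, 1, -2, 300):
    print(n, zigzag32(n), varint(zigzag32(n)).hex())
```

The point of the mapping is that values of small magnitude, whether positive or negative, become small unsigned numbers and therefore short varints.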

Any XML-based representation can be compressed, or generated directly, using EXI (Efficient XML Interchange), a "schema-informed" (as opposed to schema-required or schema-less) binary compression standard for XML.
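
Returning to the CBOR row of the table above, the integer rules can be sketched in a few lines of Python. The encoder below picks the shortest head for a value: a single byte for small numbers, or a type byte followed by a big-endian 8-, 16-, 32- or 64-bit argument, with a negative number x stored as (-x-1). The function name `cbor_int` is hypothetical and only integers are handled.

```python
import struct


def cbor_int(n: int) -> bytes:
    """Encode an integer as a CBOR unsigned (major type 0) or negative (major type 1) item."""
    major = 0 if n >= 0 else 1
    arg = n if n >= 0 else -1 - n  # negative x is stored as (-x - 1)
    head = major << 5              # major type lives in the top three bits
    if arg < 24:
        return bytes([head | arg])                      # value fits in the head byte
    if arg < 1 << 8:
        return bytes([head | 24]) + struct.pack(">B", arg)
    if arg < 1 << 16:
        return bytes([head | 25]) + struct.pack(">H", arg)
    if arg < 1 << 32:
        return bytes([head | 26]) + struct.pack(">I", arg)
    if arg < 1 << 64:
        return bytes([head | 27]) + struct.pack(">Q", arg)
    raise OverflowError("integer does not fit in 64 bits")


for n in (0, 23, 24, -1, -24, -25, 1000, -1000):
    print(n, cbor_int(n).hex())
```

The same "small value in the head byte, otherwise a sized argument" scheme is what the table means by "like integer coding" for string lengths and for array and map counts.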

References

  1. http://www.xml.com/pub/a/ws/2001/04/04/soap.html
  2. Ben-Kiki, Oren; Evans, Clark; Net, Ingy döt (2009-10-01). "YAML Ain’t Markup Language (YAML) Version 1.2". The Official YAML Web Site. Retrieved 2012-02-10.
  3. https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.text_format
  4. http://www.gnustep.org/resources/documentation/Developer/Base/Reference/NSPropertyList.html
  5. http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man5/plist.5.html
  6. http://developer.apple.com/mac/library/documentation/CoreFoundation/Conceptual/CFPropertyLists/Articles/XMLTags.html#//apple_ref/doc/uid/20001172-CJBEJBHH
  7. Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Null Language-Independent Type for YAML Version 1.1". YAML.org. Retrieved 2009-09-12.
  8. Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Boolean Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
  9. Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-02-11). "Integer Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
  10. Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Floating-Point Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
  11. https://github.com/liteserver/binn/blob/master/spec.md
  12. http://bsonspec.org
  13. RFC 7049
  14. https://github.com/msgpack/msgpack/blob/master/spec.md#formats-str
  15. https://developers.google.com/protocol-buffers/docs/encoding
  16. https://github.com/arangodb/velocypack/blob/master/VelocyPack.md

This article is issued from Wikipedia (version of Monday, March 28, 2016). The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply to the media files.