Fast Infoset
Fast Infoset (or FI) is an international standard that specifies a binary encoding format for the XML Information Set (XML Infoset) as an alternative to the XML document format. It aims to provide more efficient serialization than the text-based XML format.
FI can be thought of as a form of lossless compression for XML, comparable to gzip: the original formatting is lost, but no information is lost in the conversion from XML to FI and back to XML. Whereas general-purpose compression aims only to reduce size, FI aims to optimize both document size and processing performance.
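The distinction between formatting and information can be illustrated with a minimal sketch. It uses Python's standard `xml.etree.ElementTree` as a rough stand-in for the XML Infoset (an assumption made for illustration; ElementTree does not model every information item, and the helper `content_view` is hypothetical). Two documents that differ only in quoting, spacing inside tags, and empty-element syntax reduce to the same content:

```python
import xml.etree.ElementTree as ET

def content_view(elem):
    """Reduce an element to its name, attributes, text, and children."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        elem.text or "",
        elem.tail or "",
        tuple(content_view(child) for child in elem),
    )

# Same information, different surface formatting.
doc_a = "<order id='7'><item/></order>"
doc_b = '<order   id="7"  ><item></item></order>'

same_text = doc_a == doc_b
same_content = content_view(ET.fromstring(doc_a)) == content_view(ET.fromstring(doc_b))
print(same_text, same_content)
```

A binary infoset encoding only has to preserve what `content_view` captures here, which is why a round trip through FI may return XML text that differs from the input while carrying the same information.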
The Fast Infoset specification is defined by both the ITU-T and the ISO standards bodies. FI is officially named ITU-T Rec. X.891 and ISO/IEC 24824-1 (Fast Infoset), respectively. However, it is commonly referred to by the name Fast Infoset. The standard was published by ITU-T on May 14, 2005, and by ISO on May 4, 2007.
The Fast Infoset standard can be downloaded from the ITU website. There are no intellectual property restrictions on its implementation and use.
A common misconception is that FI requires ASN.1 tool support. Although the formal specification uses ASN.1 formalisms, it uses custom encoding rules via Encoding Control Notation (ECN). ASN.1 tools are not required by implementations.
An alternative is FleXPath.[1]
Structure
The underlying file format is defined with ASN.1 and uses tag/length/value blocks. Text values of attributes and elements are therefore stored with length prefixes rather than end delimiters, so there is no need to escape special characters. The equivalent of an end tag (a "terminator") is needed only at the end of a list of child elements, and binary data need not be base64 encoded.
Fast Infoset is a higher-level format built upon ASN.1 formalisms. Unlike traditional ASN.1, in which "tags" are just type names (e.g. String, Integer, or complex types), element and attribute names are stored within the octet stream. This means that it is possible to recover a conventional XML file from the binary stream without the need to reference any XML Schema. Fast Infoset does not attempt to convert an XML Schema directly into an ASN.1 definition. ASN.1 together with ECN is used to define the file format.
An index table is built for most strings, including element and attribute names and their values. This means that the text of a repeated name or value appears only once per document.
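The effect of length prefixes can be shown with a minimal sketch. The 4-byte big-endian prefix and the function names below are assumptions chosen for illustration; they are not the actual Fast Infoset octet layout, which is defined by X.891.

```python
import base64
import struct
from typing import Tuple
from xml.sax.saxutils import escape

def encode_text_xml(value: str) -> bytes:
    """Delimiter-based: markup characters inside the value must be escaped."""
    return f"<v>{escape(value)}</v>".encode("utf-8")

def encode_text_prefixed(value: str) -> bytes:
    """Length-prefixed: a 4-byte length, then the raw UTF-8 bytes, unescaped."""
    raw = value.encode("utf-8")
    return struct.pack(">I", len(raw)) + raw

def decode_text_prefixed(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Return the decoded string and the offset of the next field."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    end = start + length
    return buf[start:end].decode("utf-8"), end

text = "a < b && c > d"
xml_bytes = encode_text_xml(text)
prefixed = encode_text_prefixed(text)
decoded, next_offset = decode_text_prefixed(prefixed)

# Binary payloads: base64 inflates the data, a length prefix does not.
blob = bytes(range(256))
blob_base64 = base64.b64encode(blob)
blob_prefixed = struct.pack(">I", len(blob)) + blob
```

Because the decoder knows in advance how many bytes belong to the value, characters such as `<` and `&` never need escaping, and arbitrary bytes can be stored as they are.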
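The indexing idea can be sketched as follows. This is a simplified illustration that assumes a single table and hypothetical `("lit", ...)` and `("ref", ...)` tokens; it does not reproduce the actual vocabulary tables or index numbering of the standard.

```python
from typing import Dict, List, Tuple, Union

Token = Tuple[str, Union[str, int]]  # ("lit", text) or ("ref", index)

class StringTable:
    """Emit a string literally on first use and as an index afterwards."""

    def __init__(self) -> None:
        self._index: Dict[str, int] = {}
        self.entries: List[str] = []

    def encode(self, s: str) -> Token:
        if s in self._index:
            return ("ref", self._index[s])
        self._index[s] = len(self.entries)
        self.entries.append(s)
        return ("lit", s)

def decode(tokens: List[Token]) -> List[str]:
    """Rebuild the table while reading, so no table needs to be sent ahead."""
    entries: List[str] = []
    out: List[str] = []
    for kind, payload in tokens:
        if kind == "lit":
            entries.append(payload)
            out.append(payload)
        else:
            out.append(entries[payload])
    return out

names = ["item", "price", "item", "price", "item"]
table = StringTable()
tokens = [table.encode(n) for n in names]
restored = decode(tokens)
```

In this sketch the decoder rebuilds the same table from the literals it encounters, so each distinct string is transmitted once and every later occurrence costs only a small integer.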
Implementations
Reference implementation
A Java implementation of the FI specification is available as part of the GlassFish project. The library is open source and is distributed under the terms of the Apache License 2.0. Several projects use this implementation, including the reference implementation for JAX-WS used in GlassFish Metro. QtitanFastInfoset, a C++ implementation, is available under a commercial license as a component for the Digia Qt framework.
Performance
Because Fast Infoset documents are encoded compactly as part of the XML generation process, producing them is much faster than applying Zip-style compression algorithms to an XML stream, although the resulting files can be slightly larger.
SAX-type parsing performance of Fast Infoset is also much faster than parsing performance of XML 1.0, even without any Zip-style compression. Typical increases in parsing speed observed for the reference Java implementation are a factor of 10 over Java Xerces, and a factor of 4 over the Piccolo driver (one of the fastest Java-based XML parsers).[2][3][4]
Typical applications
Portable devices – Mobile devices typically have low-bandwidth data connections and slower CPUs. Fast Infoset uses less bandwidth than XML and is faster to process, which makes it well suited to such devices.
Persisting large volumes of data – When XML is persisted to a file or a database, the volume of data a system produces can grow quickly. This has several detrimental effects: access times rise because more data is read, CPU load rises because XML data takes more effort to process, and storage costs rise. Persisting XML data in Fast Infoset format can reduce the data volume by up to 80 percent.
Passing XML via the Internet – As soon as an application starts passing information over the Internet, one of the main bottlenecks is bandwidth. When sizeable chunks of data are sent, this bottleneck can seriously degrade the performance of client applications and limit the server's ability to process requests. Reducing the amount of data moving across the Internet reduces the time it takes a message to be sent or received and increases the number of transactions a server can process per hour.
References
- [1] Amer-Yahia, Sihem; Lakshmanan, Laks V. S.; Pandit, Shashank. "FleXPath: flexible structure and full-text querying for XML". Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. ACM, 2004.
- [2] "Fast Infoset performance reports". 2005-10-06. Retrieved 2007-10-11.
- [3] "Japex Report: ParsingPerformance". 2005-01-10. Retrieved 2007-10-11.
- [4] "Japex Report: SizePerformance". 2005-01-10. Retrieved 2007-10-11.
External links
- A heavy technical description on OTN
- FastInfoset.NET home page
- FI project home page
- Fast Infoset page at the ASN.1 site
- OSS Fast Infoset Tools page
- Free download of the Fast Infoset standard (ITU-T Rec. X.891) from the ITU Web site
- Free download of the Fast Infoset standard (ISO/IEC 24824-1:2007) from ISO Freely Available Standards