hOCR

hOCR is an open standard of data representation for formatted text obtained from optical character recognition (OCR). The definition encodes text, style, layout information, recognition confidence metrics and other information using Extensible Markup Language (XML) in the form of Hypertext Markup Language (HTML) or XHTML.[1]
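To illustrate how layout and confidence data travel inside ordinary HTML markup, the following Python sketch reads word-level results from a small hOCR fragment using only the standard library's `html.parser`. It assumes the common hOCR convention of marking each recognized word with the class `ocrx_word` and storing its bounding box and confidence as `bbox` and `x_wconf` properties in the element's `title` attribute, as Tesseract's hOCR output does. The sample fragment and the names `parse_title` and `HocrWordParser` are hypothetical, chosen for illustration; they are not part of any OCR library.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Split an hOCR title such as 'bbox 36 92 96 116; x_wconf 93' into {name: [values]}.

    Hypothetical helper for illustration only.
    """
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class HocrWordParser(HTMLParser):
    """Hypothetical parser: collects text, bounding box and confidence of each ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []       # finished words, in document order
        self._current = None  # word being read, or None when outside a word
        self._depth = 0       # open-tag depth inside the current word

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            # A nested tag such as <em> or <strong> inside the word element.
            self._depth += 1
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            props = parse_title(attrs.get("title") or "")
            bbox = props.get("bbox")
            conf = props.get("x_wconf")
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in bbox) if bbox else None,
                "conf": float(conf[0]) if conf else None,
            }
            self._depth = 1

    def handle_data(self, data):
        # Only text inside a word element is collected.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # The word element itself has closed: store the finished word.
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


# Hypothetical hOCR fragment: one page, one line, two words.
SAMPLE = """<div class='ocr_page' title='bbox 0 0 400 200'>
  <span class='ocr_line' title='bbox 36 92 250 116'>
    <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 93'>Hello</span>
    <span class='ocrx_word' title='bbox 110 92 250 116; x_wconf 71'><em>world</em></span>
  </span>
</div>"""

if __name__ == "__main__":
    parser = HocrWordParser()
    parser.feed(SAMPLE)
    parser.close()
    for word in parser.words:
        print(word["text"], word["bbox"], word["conf"])
    # prints:
    # Hello (36, 92, 96, 116) 93.0
    # world (110, 92, 250, 116) 71.0
```

Because hOCR is plain HTML, a generic HTML parser is enough to recover the recognized text together with its position on the page and the engine's confidence; no OCR-specific tooling is needed to consume the output. The sketch does not handle void elements (such as `<br>`) inside a word, which would unbalance its depth counter.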

Applications

Software that utilizes this format includes:

References

  1. Thomas Breuel, ed. (March 2010). "The hOCR Embedded OCR Workflow and Output Format".

This article is issued from Wikipedia - version of the Tuesday, March 08, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.