ALTO (XML)
ALTO (Analyzed Layout and Text Object) is an open XML Schema developed by the Library of Congress for describing the text and layout information produced by OCR. It is often used together with the Metadata Encoding and Transmission Standard (METS).
Structure
An ALTO file consists of three major sections as children of the root <alto> element:[1]
- The <Description> section contains metadata about the ALTO file itself, including processing information on how the file was created.
- The <Styles> section contains the text and paragraph styles with their individual descriptions:
  - <TextStyle> holds font descriptions
  - <ParagraphStyle> holds paragraph descriptions, e.g. alignment information
- The <Layout> section contains the content information and is subdivided into <Page> elements.
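The three-part structure described above can be checked with a short script. The following is a minimal sketch, assuming Python 3.9+ and only the standard library's xml.etree.ElementTree. The function names `local_name` and `alto_sections` and the sample string are hypothetical and do not belong to any ALTO tooling. Real ALTO files usually declare an XML namespace, so the sketch compares local tag names only. It is a structural check against the three sections named here, not validation against the ALTO XML Schema, which may treat some sections as optional.

```python
import xml.etree.ElementTree as ET

# The three major sections described above, in the order they appear under <alto>.
SECTIONS = ("Description", "Styles", "Layout")


def local_name(tag: str) -> str:
    """Strip a '{namespace-uri}' prefix so namespaced and plain tags compare equally."""
    return tag.rsplit("}", 1)[-1]


def alto_sections(xml_text: str) -> dict[str, bool]:
    """Report which of Description, Styles and Layout appear directly under <alto>.

    Only a structural check; it does not validate against the ALTO XSD.
    """
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "alto":
        raise ValueError(f"root element is {local_name(root.tag)!r}, expected 'alto'")
    # Collect the local names of the root's direct children only.
    present = {local_name(child.tag) for child in root}
    return {name: name in present for name in SECTIONS}


# Hypothetical minimal input that happens to lack a Styles section.
sample = "<alto><Description/><Layout><Page/></Layout></alto>"
print(alto_sections(sample))
# {'Description': True, 'Styles': False, 'Layout': True}
```

Working on local names keeps the sketch independent of which ALTO namespace version a file declares. A stricter check would validate the file against the official schema instead.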
<?xml version="1.0"?>
<alto>
  <Description>
    <MeasurementUnit/>
    <sourceImageInformation/>
    <Processing/>
  </Description>
  <Styles>
    <TextStyle/>
    <ParagraphStyle/>
  </Styles>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace/>
    </Page>
  </Layout>
</alto>
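In the skeleton, each <Page> is divided into four margin regions and a <PrintSpace>. The sketch below lists those regions for every page and collects the recognized words. It is a hypothetical helper built on xml.etree.ElementTree (Python 3.9+). It assumes the usual ALTO nesting, in which text sits in TextBlock, TextLine and String elements below the page, with each word in the String element's CONTENT attribute. The skeleton above omits those inner levels. The function `describe_pages` and the sample document are invented for illustration, and the sample leaves out attributes that real ALTO files carry, such as positions and sizes.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace-uri}' prefix, if present."""
    return tag.rsplit("}", 1)[-1]


def describe_pages(xml_text: str) -> list[dict]:
    """For each <Page> element, list its child regions and its words in document order.

    Assumes words are stored in the CONTENT attribute of <String> elements
    nested somewhere below the page (e.g. PrintSpace/TextBlock/TextLine/String).
    """
    root = ET.fromstring(xml_text)
    pages = []
    for element in root.iter():
        if local_name(element.tag) != "Page":
            continue
        # Direct children: the margins and the PrintSpace.
        regions = [local_name(child.tag) for child in element]
        # All String descendants, wherever they sit inside the page.
        words = [
            node.get("CONTENT", "")
            for node in element.iter()
            if local_name(node.tag) == "String"
        ]
        pages.append({"regions": regions, "words": words})
    return pages


# Hypothetical, heavily simplified input.
SAMPLE = """<alto>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Hello"/>
            <String CONTENT="ALTO"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

for page in describe_pages(SAMPLE):
    print(page["regions"])  # ['TopMargin', 'LeftMargin', 'RightMargin', 'BottomMargin', 'PrintSpace']
    print(" ".join(page["words"]))  # Hello ALTO
```

Because the helper gathers String elements from the whole page, words in the margins (for example, running heads or page numbers) would appear before those in the PrintSpace. Restricting the search to the PrintSpace element is a simple variation.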
See also
- Metadata Encoding and Transmission Standard (METS)
- Dublin Core, an ISO metadata standard
- Preservation Metadata: Implementation Strategies (PREMIS)
- Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
- hOCR
External links
- ALTO (Analyzed Layout and Text Object) standards on Library of Congress website
- More info about METS/ALTO by CCS GmbH
- METS ALTO Introduction by CCS GmbH
References
This article is issued from Wikipedia (version of Thursday, November 26, 2015). The text is available under the Creative Commons Attribution-ShareAlike license, but additional terms may apply for the media files.