Document conversion
Document conversion is the process of converting a document from one file format to another, so that it can be opened in more applications. Documents can be converted into:
- other source document formats
- consumer formats
- structured data
How it works
The conversion is usually done by the application the file was created with, though various third-party tools can also perform it. The raw bytes of most file formats can be inspected with a hex editor. Alternatively, conversions can be provided automatically by Web services that connect to a document storage or delivery system, such as a file directory or a document or content management application. These content transformation services can run on a local server, on the Web, or in the cloud. Conversion tools can also be combined with a delivery component that publishes the converted data to a database, a filesystem, or other systems.
Examples
- Conversion of markup languages using pandoc
- Converting .doc (Microsoft Word format) to .odt (OpenOffice.org format)
- Converting .ppt (Microsoft PowerPoint format) to .odp (OpenOffice.org format)
- Converting .shw (Corel Presentation format) to .ppt (Microsoft PowerPoint format)
- Converting .doc (Microsoft Word format) to .pdf (PDF format)
- Converting .doc (Microsoft Word format) to a Web site based on structured HTML (hypertext)
- Converting .doc (Microsoft Word format) to .swf (Flash format)
- Converting .doc (Microsoft Word format) to .mp3 (audio format)
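The pandoc example from the list above can be sketched as a small Python wrapper around the pandoc command-line tool. This is a minimal sketch, assuming pandoc is installed and on the PATH; the helper names `build_pandoc_command` and `convert` and the file names are hypothetical and chosen for illustration only.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(source, target, from_format=None, to_format=None):
    """Return the argument list for a pandoc conversion (nothing is executed)."""
    command = ["pandoc", str(source), "-o", str(target)]
    # -f / -t are optional: pandoc otherwise infers formats from file extensions.
    if from_format:
        command += ["-f", from_format]
    if to_format:
        command += ["-t", to_format]
    return command


def convert(source, target, from_format=None, to_format=None):
    """Run pandoc; raises if pandoc is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc executable not found on PATH")
    command = build_pandoc_command(source, target, from_format, to_format)
    subprocess.run(command, check=True)
    return Path(target)
```

A call such as `convert("notes.md", "notes.html")` would then convert a Markdown file to HTML, leaving format detection to pandoc. Building the command separately from running it keeps the argument handling easy to inspect without invoking the external tool.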
Paper documents conversion
Converting scanned paper documents to useful electronic formats is one of the most important applications of document conversion. Documents scanned to image formats have several limitations, such as large file size and no support for text search or content reuse. Conversion to more useful formats should be considered, such as:
- Searchable: PDF
- Archive: PDF/A – for long-term storage
- Compressed: MRC-PDF
- Editable: TXT, RTF, DOC, XLS, PPT
- Structured: XML, HTML
Content extraction from the document image is the task of optical character recognition (OCR) or intelligent character recognition (ICR) technologies. Modern OCR applications convert image files to different document formats, preserving not just the content but also the structure of the document (ADRT).
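To illustrate what keeping structure as well as content can mean for a structured target format such as XML, the following is a minimal sketch. It assumes an OCR engine has already produced a list of recognized blocks, each tagged with a role; the block roles, the sample text, and the function name `blocks_to_xml` are hypothetical and do not reflect the output of any particular product.

```python
import xml.etree.ElementTree as ET


def blocks_to_xml(blocks):
    """Serialize (role, text) pairs as child elements of a <document> root."""
    root = ET.Element("document")
    for role, text in blocks:
        # The role (e.g. "heading") becomes the element name, so the
        # logical structure survives alongside the recognized text.
        element = ET.SubElement(root, role)
        element.text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical OCR result for a scanned page.
recognized = [("heading", "Invoice 17"), ("paragraph", "Total due: 40 EUR")]
print(blocks_to_xml(recognized))
# prints <document><heading>Invoice 17</heading><paragraph>Total due: 40 EUR</paragraph></document>
```

A plain-text export of the same page would keep only the two strings, whereas the XML output also records which one was a heading.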
Paper documents conversion applications
Company | Product | Import formats | Export formats |
---|---|---|---|
Expervision | TypeReader 2008 | BMP, PCX, DCX, JPEG, PNG, TIFF, PDF | DOC, XLS, DOCX, XLSX, RTF, TXT, HTML, DBF, CSV, PDF, ASCII (Comma Delimited or Tab Delimited), WordPerfect, TypeReader Native Format, TypeReader Text Only |
ABBYY | FineReader 9.0 | BMP, PCX, DCX, JPEG, JPEG 2000, PNG, TIFF, PDF, GIF, XPS, DjVu | DOC, XLS, DOCX, XLSX, PPT, RTF, TXT, HTML, DBF, CSV, PDF/A, PDF, MRC-PDF, LIT, WordML |
ExactCODE | ExactScan Pro 2 | TIFF, JPEG, JPEG 2000, PNG, BMP, PCX, GIF, PDF | PDF, RTF, HTML, TXT |
I.R.I.S. Group | Readiris 12 | JPEG, BMP, TIFF, PDF, DjVu, JPEG 2000 | DOC, DOCX, XLS, XLSX, PDF, ODT, XPS, PDF/A, HTML, RTF, WPD |
Nuance Communications | OmniPage Professional 17 | TXT, TIFF, JPEG, BMP, PCX, GIF, PDF, MAX | DOC, DOCX, XML, XLS, XLSX, PPTX, PDF, RTF, HTML, XSN, XPS, WordML |
Special PDF conversion applications:
Company | Product | Convert PDF from (formats) | Convert PDF to (formats) |
---|---|---|---|
ABBYY | PDF Transformer 3.0 | DOC, XLS, DOCX, XLSX, PPT, RTF, PPTX, VSD, VSDX and any application via printing function | DOC, XLS, DOCX, XLSX, PPT, RTF, TXT, HTML, DBF, searchable PDF/A, searchable PDF |
Ascertia | PDF Sign&Seal | DOC, DOCX, XLS, XLSX, PPT, RTF - any file using File > Print | JPG files |
ExactCODE | OCRKit 2 | TIFF, JPEG, JPEG 2000, PNG, BMP, PCX, GIF, PDF | PDF, RTF, HTML, TXT |
Nitro PDF Software | Nitro PDF Professional | DOC, XLS, DOCX, XLSX, PPT, RTF and others | DOC, DOCX, RTF, image files |
Software Depot Online | Docsmartz PDF Converter Professional | - | DOC, RTF, image files, XLSX, Postscript, Text |
Software Depot Online | Docsmartz PDF Creator | DOC, XLS, DOCX, XLSX, PPT, RTF, PPTX, VSD, VSDX and any application via printing function or right click | - |
Nuance Communications | PDF Converter 6 | - | DOC, DOCX, XML, XLS, XLSX, PPTX, WDP, XPS, PDF, MRC-PDF |
Consumer format conversion applications:
Company | Product | Input formats | Output formats |
---|---|---|---|
Cobynsoft | Cobynsoft's Review | TXT, RTF, DOC, DOCX, ODT, ePUB | TXT, RTF, DOC, DOCX, ODT, ePUB, PDF |