Comparison of optical character recognition software
This comparison of optical character recognition software includes:
- OCR engines, which perform the actual character identification
- Layout analysis software, which divides scanned documents into zones suitable for OCR
- Graphical interfaces to one or more OCR engines
- Software development kits that are used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)
Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Output Formats | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tesseract | 1985 | 3.04 | 2015 | Apache | No | Yes | Yes | Yes | Yes | C++, C | Yes | 100+[1] | ? | Text, hOCR,[2] PDF, others with different user interfaces[3] or the API | Created by Hewlett-Packard; under further development by Google[4] |
Screenworm | 2013 | 1.0 | 2014 | Proprietary | No | No | Yes | No | No | Objective-C++ | No | 57 | ? | TXT | Product of Funchip. Uses the Tesseract OCR engine. |
ExperVision[5] TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 21 | 2618 | | Has a mobile and embedded-system version for iOS, Android, etc. |
AliusDoc AD-SCI[6] | 2005 | 2.1 | 2015 | Proprietary | No | Yes | No | No | No | VB.NET | For extensions | All ASCII-compatible languages | ? | XML, plain text, any other through SDK extensions | Minimal need for post-sale professional services. Works with structured, semi-structured, and unstructured documents. |
ABBYY FineReader | 1989 | 12 | 2014 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 198[7] | ? | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[8] | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.[9] |
Asprise OCR SDK | 1998 | 15 | 2015 | Proprietary | Yes | Yes | Yes | Yes | Yes | Java, C#, VB.NET, C/C++/Delphi | Yes | 20+[10] | ? | Plain text, searchable PDF, XML[11] | Java, C#, VB.NET, C/C++/Delphi SDKs for OCR and barcode recognition on Windows, Linux, Mac OS X and Unix.[12] |
Nicomsoft OCR SDK | 1999 | 5.5 | 2015 | Proprietary | No | Yes | No | Yes | No | C#, VB.NET, C++, Delphi, Java | Yes | 25+[13] | ? | Searchable PDF, Text, RTF | C#, VB.NET, C++, Delphi, Java OCR tool for Windows and Linux.[14] |
AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | VBScript | ? | ? | ? | | Works with structured, semi-structured, and unstructured documents. |
LEADTOOLS[15] | 1990[16] | 19.0 | 2014 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, .NET, Objective-C, Java, JavaScript | Yes | 56[17] | Any printed font | PDF, PDF/A, DOC, DOCX, XLS, XPS, RTF, HTML, ANSI Text, Unicode Text, CSV[18] | Supports Latin, Asian, Arabic, and MICR character sets.[15] For full page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[19] ICR (handwritten text recognition) is supported.[20] |
CuneiForm | 1996 | 12 | 2007 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT[21] | Enterprise-class system; preserves text formatting and recognizes complex tables of any structure. |
(a9t9)FreeOCR | 2015 | 1.022 | 2015 | GPL | Yes | Yes | No | No | No | C# | Yes | 23 | Any printed font | TXT | Windows desktop software, Windows Store application and online web app that converts scanned documents to editable text documents using OCR. |
SimpleOCR | 2002 | 3.5 | 2008 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | ||
Dynamsoft OCR SDK | 2003 | 8.2 | 2012 | Proprietary | Yes | Yes | No | No | No | C/C++ | Yes | 40+[22] | ? | PDF, TXT | Dynamsoft also provides image capture SDKs and version control tools. |
OmniPage | 1970s | 19 | 2013 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, C#[23] | Yes | 125[24] | Machine and handprinted fonts | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, PDF/A, searchable PDF, HTML, Text, XML, ePUB, MP3 | Product of Nuance Communications |
Microsoft Office OneNote 2007 | 2007 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | ||
FreeOCR | ? | 4.2 | 2012 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | [25] |
GOCR | 2000 | 0.50 | 2013 | GPL | Yes[26] | Yes | Yes | Yes | Yes | C | ? | ? | ? | ||
Ocrad | ? | 0.22[27] | 2013 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | ? | | Command line |
SmartScore | ? | ? | ? | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? | | For musical scores |
Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | Uses OmniPage |
Puma.NET | ? | ? | ? | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font | | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps the Puma COM server and provides a simplified API for .NET applications. |
ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | Scans, captures and classifies business documents such as invoices, forms and purchase orders, integrated with business processes. |
Scantron | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | Working with localized interfaces requires the corresponding language support. |
OCRFeeder | ? | 0.7.11 | 2009 | GPL | No | No | No | Yes | No | Python | ? | ? | ? | | Features a full user interface and a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad. |
OCRopus | ? | 0.6 | 2012 | Apache | No | No | No | Yes | No | Python | ? | ? | ? | hOCR, HTML, TXT[28] | Pluggable framework under active development, used for Google Books |
MathOCR | 2014 | 0.0.3 | 2015 | GPL | No | Yes | Yes | Yes | Yes | Java | ? | ? | ? | HTML, LaTeX | Features mathematical formula recognition and logical layout analysis; can use OCR engines like Tesseract or Ocrad as a back end. |
MeOCR | 2012 | 1.0.0 | 2012 | Free | No | Yes | No | No | No | C/C++/C# | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT | Windows application. Converts scanned documents to editable text documents using OCR and exports them to Microsoft Word with one click. Features a full user interface and also has a .NET Interface library[29] for developers. |
Yunmai OCR SDK | 2002 | 1.0 | 2013 | Proprietary | Yes | Yes | Yes | Yes | Yes | Java, C++, C, Object Pascal, Objective-C | Yes | 14 | Any printed font | TXT, PDF | Notable for Chinese character recognition.[30] |
References
1. Based on count of language training files for version 3.04. Available at the download page.
2. Usage explained in the Tesseract Readme and FAQ.
3. Such as ODF with OCRFeeder.
4. "GitHub - tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)". Retrieved 2016-03-08.
5. "OpenRTK – ExperVision OCR SDK | OCR Software, OCR SDK & Toolkit, OCR Service – ExperVision OCR". Expervision.com. Retrieved 2013-09-12.
6. "AliusDoc AD-SCI". AliusDoc.com. Retrieved 2015-10-16.
7. "ABBYY FineReader 11: Full Feature List". Finereader.abbyy.com. Retrieved 2013-09-12.
8. "ABBYY FineReader 11: Technical Specifications". Finereader.abbyy.com. Retrieved 2013-09-12.
9. "Top OCR Software". Ocrworld.com. 2010-03-30. Retrieved 2013-09-12.
10. "Asprise OCR SDK Features". asprise.com. Retrieved 2014-06-21.
11. "Asprise Java OCR Library Features". asprise.com. Retrieved 2014-06-21.
12. "Asprise Java, C#/VB.NET OCR API". asprise.com. 2015-11-19. Retrieved 2015-11-19.
13. "Nicomsoft OCR SDK Features". nicomsoft.com. Retrieved 2015-01-08.
14. "Nicomsoft OCR, C#/VB.NET OCR API". nicomsoft.com. 2015-01-08. Retrieved 2015-01-08.
15. "Ocr Sdk". Leadtools. Retrieved 2013-09-12.
16. "LEAD Technologies, Inc. Corporate Information". Leadtools.com. Retrieved 2013-09-12.
17. "Ocr Sdk". Leadtools. Retrieved 2013-09-12.
18. "OCR SDK Output Formats". Leadtools. Retrieved 2013-09-12.
19. "LEADTOOLS Recognition Imaging Developer Toolkit". Leadtools.com. Retrieved 2013-09-12.
20. "Icr Sdk". Leadtools. Retrieved 2013-09-12.
21. Debian manual page for Cuneiform for Linux version 1.1.0.
22. "OCR SDK Language Packages Download". Dynamsoft.com. Retrieved 2013-09-12.
23. "OmniPage CSDK - OCR Document Capture Toolkit | Document Imaging & OCR". Nuance. Retrieved 2013-09-12.
24. "OmniPage Standard Document Conversion". Nuance. Retrieved 2014-02-25.
25. "Free OCR Software - Optical Character Recognition Software for Windows import from PDF and Twain Scanners". Paperfile.net. Retrieved 2013-09-12.
26. "GOCR". Jocr.sourceforge.net. Retrieved 2013-09-12.
27. Diaz, Antonio (2013-07-12). "Version 0.22 of GNU Ocrad released" (Mailing list). info-gnu.
28. OCRopus includes the ocropus-hocr tool, which produces hOCR from the recognition results.
29. "MeOCR .NET Library".
30. "List of Yunmai OCR SDKs". yunmai.com. Retrieved 2015-07-12.
This article is issued from Wikipedia, version of Monday, April 4, 2016. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply to the media files.