Comparison of optical character recognition software

This comparison of optical character recognition software includes:

Name Founded year Latest stable version Release year License Online Windows Mac OS X Linux BSD Programming language SDK? Languages Fonts Output Formats Notes
Tesseract 1985 3.04 2015 Apache No Yes Yes Yes Yes C++, C Yes 100+[1] ? Text, hOCR,[2] PDF, others with different user interfaces[3] or the API Created by Hewlett-Packard; under further development by Google[4]
Screenworm 2013 1.0 2014 Proprietary No No Yes No No Objective-C++ No 57 ? TXT Product of Funchip. Uses the Tesseract OCR engine.
ExperVision[5] TypeReader & RTK 1987 7.1.170.1125 2010 Proprietary Yes Yes Yes Yes Yes C/C++ Yes 21 2618 Has a Mobile and Embedded System version for iOS/Android/etc.
AliusDoc AD-SCI[6] 2005 2.1 2015 Proprietary No Yes No No No VB.NET For Extensions All ASCII-compatible languages ? XML, PlainText, any other through SDK extensions Minimal need for post-sale Professional Services. Works with structured, semi-structured, and unstructured documents.
ABBYY FineReader 1989 12 2014 Proprietary Yes Yes Yes Yes Yes C/C++ Yes 198[7] ? DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[8] ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.[9]
Asprise OCR SDK 1998 15 2015 Proprietary Yes Yes Yes Yes Yes Java, C#, VB.NET, C/C++/Delphi Yes 20+[10] ? Plain text, searchable PDF, XML[11] Java, C#, VB.NET, C/C++/Delphi SDKs for OCR and Barcode recognition on Windows, Linux, Mac OS X and Unix.[12]
Nicomsoft OCR SDK 1999 5.5 2015 Proprietary No Yes No Yes No C#, VB.NET, C++, Delphi, Java Yes 25+[13] ? Searchable PDF, Text, RTF C#, VB.NET, C++, Delphi, Java OCR tool for Windows and Linux.[14]
AnyDoc Software 1989 ? ? Proprietary No Yes No No No VBScript ? ? ? Works with structured, semi-structured, and unstructured documents.
LEADTOOLS[15] 1990[16] 19.0 2014 Proprietary Yes Yes Yes Yes No C/C++, .NET, Objective-C, Java, JavaScript Yes 56[17] Any printed font PDF, PDF/A, DOC, DOCX, XLS, XPS, RTF, HTML, ANSI Text, Unicode Text, CSV[18] Supports Latin, Asian, Arabic, and MICR character sets.[15] For full page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[19] ICR (handwritten text recognition) is supported.[20]
CuneiForm 1996 12 2007 BSD variant No Yes Yes Yes Yes C/C++ Yes 28 Any printed font HTML, hOCR, native, RTF, TeX, TXT[21] Enterprise-class system, can save text formatting and recognizes complicated tables of any structure
(a9t9)FreeOCR 2015 1.022 2015 GPL Yes Yes No No No C# Yes 23 Any printed font TXT Windows desktop software, Windows Store application and online web app; converts scanned documents to editable text documents using OCR.
SimpleOCR 2002 3.5 2008 Proprietary No Yes No No No ? ? ? ?
Dynamsoft OCR SDK 2003 8.2 2012 Proprietary Yes Yes No No No C/C++ Yes 40+[22] ? PDF, TXT Dynamsoft also provides image capture SDKs and version control tools.
OmniPage 1970s 19 2013 Proprietary Yes Yes Yes Yes No C/C++, C#[23] Yes 125[24] Machine and handprinted fonts DOC/DOCX, XLS/XLSX, PPTX, RTF, PDF, PDF/A, Searchable PDF, HTML, Text, XML, ePUB, MP3 Product of Nuance Communications
Microsoft Office OneNote 2007 2007 ? 2007 Proprietary No Yes No No No ? ? ? ?
FreeOCR ? 4.2 August 2012 Proprietary No Yes No No No ? ? ? ? [25]
GOCR 2000 0.50 2013 GPL Yes[26] Yes Yes Yes Yes C ? ? ?
Ocrad ? 0.22[27] 2013 GPL Yes Yes Yes Yes Yes C++ Yes Latin alphabet ? Command line
SmartScore ? ? ? Proprietary No Yes Yes No No ? ? ? ? For musical scores
Microsoft Office Document Imaging ? Office 2007 2007 Proprietary No Yes No No No ? ? ? ? Uses OmniPage
Puma.NET ? ? ? BSD No Yes No No No C# Yes 28 Any printed font .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps Puma COM server and provides simplified API for .NET applications
ReadSoft ? ? ? Proprietary No Yes No No No ? ? ? ? Scan, capture and classify business documents such as invoices, forms and purchase orders integrated with business processes.
Scantron ? ? ? Proprietary No Yes No No No ? ? ? ? For working with localized interfaces, corresponding language support is required.
OCRFeeder ? 0.7.11 2009 GPL No No No Yes No Python ? ? ? Features a full user interface and has a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad.
OCRopus ? 0.6 2012 Apache No No No Yes No Python ? ? ? hOCR, HTML, TXT[28] Pluggable framework under active development, used for Google Books
MathOCR 2014 0.0.3 2015 GPL No Yes Yes Yes Yes Java ? ? ? HTML, LaTeX Features mathematical formula recognition and logical layout analysis, can use OCR engines like Tesseract or Ocrad as back-end.
MeOCR 2012 1.0.0 2012 Free No Yes No No No C/C++/C# Yes 28 Any printed font HTML, hOCR, native, RTF, TeX, TXT Windows application. Converts scanned documents to editable text documents using OCR and exports them to Microsoft Word with one click. Features a full user interface and also has a .NET Interface library[29] for developers.
Yunmai OCR SDK 2002 1.0 2013 Proprietary Yes Yes Yes Yes Yes Java, C++, C, Object Pascal, Objective-C Yes 14 Any printed font TXT, PDF Notable for Chinese character recognition.[30]

References

  1. Based on count of language training files for version 3.04. Available at the download page.
  2. Usage explained in the Tesseract Readme and FAQ
  3. Such as ODF with OCRFeeder
  4. "GitHub - tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)". Retrieved 2016-03-08.
  5. "OpenRTK – ExperVision OCR SDK | OCR Software, OCR SDK & Toolkit, OCR Service – ExperVision OCR". Expervision.com. Retrieved 2013-09-12.
  6. "AliusDoc AD-SCI". AliusDoc.com. Retrieved 2015-10-16.
  7. "ABBYY FineReader 11: Full Feature List". Finereader.abbyy.com. Retrieved 2013-09-12.
  8. "ABBYY FineReader 11: Technical Specifications". Finereader.abbyy.com. Retrieved 2013-09-12.
  9. "Top OCR Software". Ocrworld.com. 2010-03-30. Retrieved 2013-09-12.
  10. "Asprise OCR SDK Features". asprise.com. Retrieved 2014-06-21.
  11. "Asprise Java OCR Library Features". asprise.com. Retrieved 2014-06-21.
  12. "Asprise Java, C#/VB.NET OCR API". asprise.com. 2015-11-19. Retrieved 2015-11-19.
  13. "Nicomsoft OCR SDK Features". nicomsoft.com. Retrieved 2015-01-08.
  14. "Nicomsoft OCR, C#/VB.NET OCR API". nicomsoft.com. 2015-01-08. Retrieved 2015-01-08.
  15. "Ocr Sdk". Leadtools. Retrieved 2013-09-12.
  16. "LEAD Technologies, Inc. Corporate Information". Leadtools.com. Retrieved 2013-09-12.
  17. "Ocr Sdk". Leadtools. Retrieved 2013-09-12.
  18. "OCR SDK Output Formats". Leadtools. Retrieved 2013-09-12.
  19. "LEADTOOLS Recognition Imaging Developer Toolkit". Leadtools.com. Retrieved 2013-09-12.
  20. "Icr Sdk". Leadtools. Retrieved 2013-09-12.
  21. Debian manual page for Cuneiform for Linux version 1.1.0
  22. "OCR SDK Language Packages Download". Dynamsoft.com. Retrieved 2013-09-12.
  23. "OmniPage CSDK - OCR Document Capture Toolkit | Document Imaging & OCR". Nuance. Retrieved 2013-09-12.
  24. "OmniPage Standard Document Conversion". Nuance. Retrieved 2014-02-25.
  25. "Free OCR Software - Optical Character Recognition Software for Windows import from PDF and Twain Scanners". Paperfile.net. Retrieved 2013-09-12.
  26. "GOCR". Jocr.sourceforge.net. Retrieved 2013-09-12.
  27. Diaz, Antonio (2013-07-12). "Version 0.22 of GNU Ocrad released" (Mailing list). info-gnu.
  28. OCRopus includes the ocropus-hocr tool which produces hOCR from the recognition results.
  29. "MeOCR .NET Library".
  30. "List of Yunmai OCR SDKs". yunmai.com. Retrieved 2015-07-12.
This article is issued from Wikipedia, version of Monday, April 04, 2016. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply for the media files.