Disclaimer: This article is not the author's original work. It is reposted from: http://en.wikipedia.org/wiki/Comparison_of_optical_character_recognition_software
This comparison of optical character recognition software includes:
- OCR engines, which perform the actual character identification
- Layout analysis software, which divides scanned documents into zones suitable for OCR
- Graphical interfaces to one or more OCR engines
- Software development kits that are used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)
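To make the division of roles in the list above concrete, here is a minimal Python sketch of how the pieces might fit together. Every name in it (`Zone`, `analyze_layout`, `toy_engine`, `recognize_page`) is hypothetical and does not come from any product in the table. The "page" is a list of text lines standing in for image data, so the sketch only illustrates the flow from layout analysis to engine to SDK-style entry point, not real recognition.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Zone:
    """A region of a scanned page (hypothetical structure for illustration)."""
    x: int
    y: int
    width: int
    height: int
    pixels: str  # stand-in for the image data inside the zone


# An "engine" here is any callable that turns one zone into text.
Engine = Callable[[Zone], str]


def analyze_layout(page: List[str]) -> List[Zone]:
    """Toy layout analysis: treat each non-empty line of the page as one zone."""
    zones = []
    for row, line in enumerate(page):
        if line.strip():
            zones.append(Zone(x=0, y=row, width=len(line), height=1, pixels=line))
    return zones


def toy_engine(zone: Zone) -> str:
    """Toy engine: the 'pixels' are already characters, so only normalize spacing."""
    return " ".join(zone.pixels.split())


def recognize_page(page: List[str], engine: Engine) -> str:
    """SDK-style entry point: run layout analysis, then the engine on each zone."""
    return "\n".join(engine(zone) for zone in analyze_layout(page))


if __name__ == "__main__":
    page = ["  Invoice   No. 42 ", "", "Total:   10.00"]
    print(recognize_page(page, toy_engine))
```

Because the engine is passed in as a plain callable, a graphical interface or SDK built this way could swap one engine for another without touching the layout step, which is roughly how front ends that support several engines are described in the table.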
Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Output Formats | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tesseract | 1985 | 3.02 | Oct 2012 | Apache | No | Yes | Yes | Yes | Yes | C++, C | Yes | 35+[1] | ? | Text, hOCR,[2] others with different user interfaces[3] or the API | Created by Hewlett-Packard; under further development by Google.[4] It was one of the top 3 engines in the 1995 UNLV Accuracy test. |
ExperVision[5] TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 21 | 2618 | | Won the highest marks in the independent testing performed by UNLV for X consecutive years (in 1994).[6][citation needed] |
ABBYY FineReader | 1989 | 11 | 2011 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 198[9] | ? | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[10] | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.[11] |
AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | VBScript | ? | ? | ? | | Works with structured, semi-structured, and unstructured documents. |
Aquaforest OCR SDK | 2001 | 1.41 | 2013 | Proprietary | Yes[12] | Yes | No | No | No | C#, VB.NET, ASP.NET | Yes | 23 | OmniFont (Extended Module available, including support for over 100 languages)[13] | PDF, PDF/A, RTF, TXT | Aquaforest's[14] OCR SDK for .NET[15] enables developers to directly make use of the Aquaforest OCR engine in their own applications and create searchable PDFs, RTF or text files from TIFFs, Bitmaps and Image-Only PDFs. |
LEADTOOLS[16] | 1990[17] | 18.0 | 2013 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, .NET, Objective-C, Java, JavaScript | Yes | 56[18] | Any printed font | PDF, PDF/A, DOC, DOCX, XLS, XPS, RTF, HTML, ANSI Text, Unicode Text, CSV[19] | Supports Latin, Asian, Arabic, and MICR character sets.[16] For full page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[20] ICR (handwritten text recognition) is supported.[21] |
CuneiForm/OpenOCR | 1996 | 12 | 2007 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT[22] | Enterprise-class system; can preserve text formatting and recognizes complicated tables of any structure. |
Transym OCR | 2000 | 3.3 | 2011 | Proprietary | No | Yes | No | No | No | C#, C/C++, VB, VB.NET | Yes | 11 | ? | ||
Image to OCR Converter | 2010[23] | 1.2[24] | 2012 | Proprietary | No | Yes | No | No | No | C/C++, VB and .NET | Command Line | 40 | ? | Searchable PDF, Text-Only PDF, Word, HTML, Text[25] | It can read most image formats and PDF files, and can scan images from a scanner or camera.[26][27] |
SimpleOCR | 2002 | 3.5 | 2008 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | ||
Dynamsoft OCR SDK | 2003 | 8.2 | 2012 | Proprietary | Yes | Yes | No | No | No | C/C++ | Yes | 40+[28] | ? | PDF, TXT | Dynamsoft is a provider of image capture SDKs and version control tools. |
OmniPage | 2005 | 18 | 2011 | Proprietary | No | Yes | Yes | Yes | No | C/C++, C#[29] | Yes | ? | ? | | Product of Nuance Communications |
Microsoft Office OneNote 2007 | 2007 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | ||
FreeOCR | ? | 4.2 | August 2012 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | [30] | |
GOCR | ? | 0.49 | 2010 | GPL | Yes[31] | Yes | Yes | Yes | Yes | C | ? | ? | ? | ||
Ocrad | ? | 0.21[32] | 2011 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | ? | Command line | |
SmartScore | ? | ? | ? | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? | | For musical scores |
Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | Uses OmniPage[citation needed] |
Puma.NET | ? | ? | ? | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font | | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps Puma COM server and provides simplified API for .NET applications |
ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | Scans, captures and classifies business documents such as invoices, forms and purchase orders; integrates with business processes. |
Scantron Cognition | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | For working with localized interfaces, corresponding language support is required. |
OCRFeeder | ? | 0.7.11 | 2009 | GPL | No | No | No | Yes | No | Python | ? | ? | ? | | Features a full user interface and has a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad |
OCRopus | ? | 0.6 | 2012 | Apache | No | No | No | Yes | No | Python | ? | ? | ? | hOCR, HTML, TXT[33] | Pluggable framework under active development, used for Google Books |