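Tesseract OCR was originally written at HP and is now open source. It recognizes English letters and digits. Its recognition accuracy is said to rank third in the world. http://sourceforge.net/projects/tesseract-ocr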
Building on Linux:
./configure
make
make install
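Errors encountered:
The build fails with two kinds of errors.
The first kind is a type conversion bug: an invalid conversion from const char* to char*. It mostly occurs around the strXXXX family of functions. The fix is to cast the first argument with (char*).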
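The first kind of error can be reproduced outside Tesseract. The following is a minimal sketch, assuming a C++ standard library that provides the const-correct overloads of std::strchr. The helper find_dot is hypothetical and is not taken from the Tesseract sources.

```cpp
#include <cstring>

// Hypothetical helper (an assumption for illustration, not Tesseract code).
// Returns a mutable pointer to the first '.' in name, or a null pointer if
// there is none.
char *find_dot(const char *name) {
    // Without the cast, std::strchr(const char*, int) returns const char*,
    // and assigning that to char* is rejected by the compiler:
    // char *p = std::strchr(name, '.');

    // Casting the first argument selects the char* overload instead.
    char *p = std::strchr((char *)name, '.');
    return p;
}
```

The cast only silences the compiler. Writing through the returned pointer is safe only when the underlying buffer is actually writable.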
The second kind occurs where C++ code refers to declarations from C code. The fix is shown in the two patches below.
Patch 1 (cutil/globals.h):
# diff -C 3 ./cutil/globals.h~ ./cutil/globals.h
*** ./cutil/globals.h~ 2007-05-15 20:13:26.000000000 -0500
--- ./cutil/globals.h 2007-06-16 04:27:42.000000000 -0500
***************
*** 45,53 ****
extern int debugs[MAXPROC]; /*debug flags */
extern int plots[MAXPROC]; /*plot flags */
extern int corners[4]; /*corners of scan window */
extern int optind; /*option index */
extern char *optarg; /*option argument */
! /*image file name */
extern char imagefile[FILENAMESIZE];
/* main directory */
extern char directory[FILENAMESIZE];
--- 45,58 ----
extern int debugs[MAXPROC]; /*debug flags */
extern int plots[MAXPROC]; /*plot flags */
extern int corners[4]; /*corners of scan window */
+ #ifdef __cplusplus
+ extern "C" {
+ #endif
extern int optind; /*option index */
extern char *optarg; /*option argument */
! #ifdef __cplusplus
! }
! #endif /*image file name */
extern char imagefile[FILENAMESIZE];
/* main directory */
extern char directory[FILENAMESIZE];
Patch 2 (cutil/tordvars.h):
# diff -C 3 ./cutil/tordvars.h~ ./cutil/tordvars.h
*** ./cutil/tordvars.h~ 2007-05-16 16:33:53.000000000 -0500
--- ./cutil/tordvars.h 2007-06-16 04:25:43.000000000 -0500
***************
*** 39,44 ****
--- 39,46 ----
extern FILE *correct_fp; //correct text
extern FILE *matcher_fp;
+ extern "C"
+ {
extern int blob_skip; /* Skip to next selection */
extern int num_word_choices; /* How many words to keep */
extern int similarity_enable; /* Switch for Similarity */
***************
*** 50,55 ****
--- 52,58 ----
extern int show_bold; /* Use bold text */
extern int display_text; /* Show word text */
extern int display_blocks; /* Show word as boxes */
+ }
extern float overlap_threshold; /* Overlap Threshold */
extern float certainty_threshold; /* When to quit looking */
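Both patches apply the same idea: declarations that belong to C code are wrapped in extern "C" so the C++ compiler gives them C linkage. Presumably the build failed because the system headers already declare optind and optarg with C linkage, which conflicts with a plain C++ declaration. The sketch below shows the guarded form in a single file. The names demo_flag and demo_add are hypothetical, and the second block stands in for definitions that would normally live in a separate C source file.

```cpp
// Header part: the guard makes the same declarations usable from C and C++.
#ifdef __cplusplus
extern "C" {
#endif
extern int demo_flag;        /* hypothetical C variable */
int demo_add(int a, int b);  /* hypothetical C function */
#ifdef __cplusplus
}
#endif

// Stand-in for the C side. In a real project these definitions would be
// compiled as C in their own .c file.
extern "C" {
int demo_flag = 1;
int demo_add(int a, int b) { return a + b; }
}
```

The first patch uses the guarded form, which keeps the header valid when it is included from C. The second patch uses a bare extern "C" block, which works only when the header is compiled as C++.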
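Testing:
Run the sample image file: tesseract.exe phototest.tif abc batch
The output is written to abc.txt, and the recognition rate was, surprisingly, 100%. Images you make yourself will not necessarily score as high.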
Compiling Tesseract OCR 1.03