Tesseract training

 https://github.com/tesseract-ocr/tesseract

Download page: https://github.com/tesseract-ocr/tesseract/wiki

Windows installer (exe): http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.00dev.exe


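1. Download jTessBoxEditor (the JRE used here is JRE 7). In the TIFF/Box Generator tab, add the commonly used Chinese characters in the SimSun (宋体) font, set the output to zhong.chi_sim.exp0.tif, and click Generate. This produces two files: zhong.chi_sim.exp0.tif and zhong.chi_sim.exp0.box.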
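The file names matter because the Tesseract 3.x training tools expect the form <lang>.<fontname>.exp<N>.<ext>. Here the language is zhong and the font name is chi_sim. A minimal Python sketch for checking a name before training follows; the helper name parse_training_name is hypothetical and not part of Tesseract or jTessBoxEditor.

```python
import re
from typing import NamedTuple

# Tesseract 3.x training file names: <lang>.<fontname>.exp<N>.<ext>
_NAME_RE = re.compile(
    r"^(?P<lang>[^.\s]+)\.(?P<font>[^.\s]+)\.exp(?P<num>\d+)\.(?P<ext>tif|tiff|box|tr)$"
)


class TrainingName(NamedTuple):
    lang: str   # e.g. "zhong"
    font: str   # e.g. "chi_sim"; must match the entry in font_properties
    num: int    # the N in expN
    ext: str    # tif, tiff, box or tr


def parse_training_name(filename: str) -> TrainingName:
    """Split a training file name into its parts, or raise ValueError."""
    m = _NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not a <lang>.<font>.exp<N>.<ext> name: {filename!r}")
    return TrainingName(m["lang"], m["font"], int(m["num"]), m["ext"])
```

A name with a space instead of the first dot, such as "zhong chi_sim.exp0.tif", is rejected by this check.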

To combine several tif files, merge them into a single tif with jTessBoxEditor, then convert the result to a box file with this command:

tesseract.exe decs.exp0.tif decs.exp0 batch.nochop makebox  
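In that command the second argument is the output base name, which is the tif name without its extension. A small Python sketch that derives it and builds the argument list is shown here; the function name makebox_command is hypothetical, and tesseract.exe is assumed to be on the PATH.

```python
from pathlib import PureWindowsPath
from typing import List


def makebox_command(tif_path: str, tesseract_exe: str = "tesseract.exe") -> List[str]:
    """Build the argv for generating a box file from a merged tif."""
    tif = PureWindowsPath(tif_path)
    # Only the final ".tif" is stripped, so "decs.exp0.tif" becomes "decs.exp0".
    output_base = str(tif.with_suffix(""))
    return [tesseract_exe, str(tif), output_base, "batch.nochop", "makebox"]
```

The returned list can be passed to subprocess.run(argv, check=True) on a machine where Tesseract is installed.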

2. Create a file named font_properties with this content: chi_sim 0 0 0 0 0
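Each line of font_properties is a font name followed by five 0/1 flags, in the order italic, bold, fixed, serif, fraktur. The font name has to match the middle part of the training file name (chi_sim here). A minimal Python sketch for writing the file is given below; both function names are hypothetical.

```python
from pathlib import Path


def font_properties_line(font: str, italic: bool = False, bold: bool = False,
                         fixed: bool = False, serif: bool = False,
                         fraktur: bool = False) -> str:
    """Format one font_properties entry: <font> <italic> <bold> <fixed> <serif> <fraktur>."""
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([font] + [str(int(flag)) for flag in flags])


def write_font_properties(directory: str, font: str = "chi_sim") -> Path:
    """Write a single-font font_properties file into the training directory."""
    path = Path(directory) / "font_properties"
    path.write_text(font_properties_line(font) + "\n", encoding="ascii")
    return path
```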

3. Create a batch file named start.bat with this content:

rem Create the font_properties file in this directory before running this batch file

echo Run Tesseract for Training..
D:\app\Tesseract-OCR\tesseract.exe zhong.chi_sim.exp0.tif zhong.chi_sim.exp0 nobatch box.train

echo Compute the Character Set..
D:\app\Tesseract-OCR\unicharset_extractor.exe zhong.chi_sim.exp0.box
D:\app\Tesseract-OCR\mftraining.exe -F font_properties -U unicharset -O zhong.unicharset zhong.chi_sim.exp0.tr

echo Clustering..
D:\app\Tesseract-OCR\cntraining.exe zhong.chi_sim.exp0.tr

echo Rename Files..
rename normproto zhong.normproto
rename inttemp zhong.inttemp
rename pffmtable zhong.pffmtable
rename shapetable zhong.shapetable

echo Create Tessdata..
D:\app\Tesseract-OCR\combine_tessdata.exe zhong.

pause
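The batch file hard-codes the language (zhong), the font (chi_sim) and the install path. As a rough sketch, the same sequence can be generated in Python for any language and font; the function names below are hypothetical, and the tool names and arguments are copied from the batch file rather than taken from any other source.

```python
from pathlib import PureWindowsPath
from typing import List, Tuple


def training_commands(lang: str, font: str, tool_dir: str, exp: int = 0) -> List[List[str]]:
    """Return the four training commands from start.bat as argv lists, in order."""
    base = f"{lang}.{font}.exp{exp}"

    def tool(name: str) -> str:
        return str(PureWindowsPath(tool_dir) / name)

    return [
        [tool("tesseract.exe"), f"{base}.tif", base, "nobatch", "box.train"],
        [tool("unicharset_extractor.exe"), f"{base}.box"],
        [tool("mftraining.exe"), "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", f"{base}.tr"],
        [tool("cntraining.exe"), f"{base}.tr"],
    ]


def rename_plan(lang: str) -> List[Tuple[str, str]]:
    """Return (old, new) names that give each output file the language prefix."""
    outputs = ("normproto", "inttemp", "pffmtable", "shapetable")
    return [(name, f"{lang}.{name}") for name in outputs]


def combine_command(lang: str, tool_dir: str) -> List[str]:
    """Return the final command; the trailing dot in '<lang>.' is required."""
    return [str(PureWindowsPath(tool_dir) / "combine_tessdata.exe"), f"{lang}."]
```

Running each list with subprocess.run(argv, check=True) stops at the first failing step, whereas the batch file keeps going after an error.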

 

4. Run start.bat and wait for the command-line output. If the offsets for types 1, 3, 4, 5 and 13 are not -1, the training succeeded.

TessdataManager combined tesseract data files.
Offset for type 0 is -1
Offset for type 1 is 140
Offset for type 2 is -1
Offset for type 3 is 509098
Offset for type 4 is 42657207
Offset for type 5 is 42726936
Offset for type 6 is -1
Offset for type 7 is -1
Offset for type 8 is -1
Offset for type 9 is -1
Offset for type 10 is -1
Offset for type 11 is -1
Offset for type 12 is -1
Offset for type 13 is 43579530
Offset for type 14 is -1
Offset for type 15 is -1
Offset for type 16 is -1
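Reading seventeen offset lines by eye is easy to get wrong, so the check can be sketched in Python. The required types 1, 3, 4, 5 and 13 come from step 4; they appear to correspond to the unicharset, inttemp, pffmtable, normproto and shapetable files, though that mapping is an assumption here. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Types that step 4 says must not be -1.
REQUIRED_TYPES = (1, 3, 4, 5, 13)

_OFFSET_RE = re.compile(r"Offset for type\s+(\d+)\s+is\s+(-?\d+)")


def parse_offsets(output: str) -> Dict[int, int]:
    """Map each type number in the combine_tessdata output to its offset."""
    return {int(t): int(offset) for t, offset in _OFFSET_RE.findall(output)}


def missing_types(output: str, required: Iterable[int] = REQUIRED_TYPES) -> List[int]:
    """Return the required types whose offset is -1 or absent from the output."""
    offsets = parse_offsets(output)
    return [t for t in required if offsets.get(t, -1) == -1]
```

An empty list from missing_types means every required component was packed into the traineddata file.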



 

 

 

5. The run produces zhong.traineddata. Copy it into the tessdata folder of the Tesseract installation.
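The copy can also be scripted. This is a minimal sketch with a hypothetical function name; the tessdata location is an assumption (for the install path used in start.bat it would presumably be D:\app\Tesseract-OCR\tessdata).

```python
import shutil
from pathlib import Path


def install_traineddata(traineddata: str, tessdata_dir: str) -> Path:
    """Copy a .traineddata file into an existing tessdata directory."""
    src = Path(traineddata)
    dest_dir = Path(tessdata_dir)
    if src.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {src.name!r}")
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"tessdata directory not found: {dest_dir}")
    # copy2 returns the path of the new file inside dest_dir.
    return Path(shutil.copy2(src, dest_dir))
```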



 

6. Run the command tesseract.exe E:\temp\image\y.jpg E:\temp\image\y -l zhong. The recognition result is written to y.txt.
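Tesseract appends .txt to the output base name, which is why the result ends up in y.txt. A small Python sketch that runs the command and reads the result back is shown here; the function name recognize and its run parameter are hypothetical, and tesseract.exe is assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def recognize(image: str, output_base: str, lang: str = "zhong",
              tesseract_exe: str = "tesseract.exe", run=subprocess.run) -> str:
    """Run Tesseract on one image and return the recognized text."""
    # Same arguments as the command in step 6; check=True raises on failure.
    run([tesseract_exe, str(image), str(output_base), "-l", lang], check=True)
    # Tesseract writes its text output, encoded as UTF-8, to <output_base>.txt.
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

For example, recognize(r"E:\temp\image\y.jpg", r"E:\temp\image\y") would return the contents of E:\temp\image\y.txt. The run parameter exists only so the function can be exercised without Tesseract installed.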

 
