Tesseract 4.1.0 LSTM training errors (English and digits): jTessBoxEditor does not display boxes

4. Use the .tif and .box files to generate an .lstmf file for LSTM training

tesseract nml.num.exp0.tif nml.num.exp0 -l eng --psm 6 lstm.train
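With many .tif/.box pairs, the command above has to be repeated for each image. Below is a minimal Python sketch of that loop, assuming tesseract is on PATH and every .tif sits next to a .box file with the same base name; lstm_train_command and make_lstmf_files are hypothetical helper names, not part of Tesseract.

```python
import subprocess
from pathlib import Path


def lstm_train_command(tif: Path, lang: str = "eng", psm: int = 6) -> list[str]:
    """Build 'tesseract <tif> <base> -l <lang> --psm <psm> lstm.train' as an argv list."""
    base = tif.with_suffix("")  # nml.num.exp0.tif -> nml.num.exp0
    return ["tesseract", str(tif), str(base), "-l", lang, "--psm", str(psm), "lstm.train"]


def make_lstmf_files(folder: str, run: bool = True) -> list[list[str]]:
    """Run lstm.train for every .tif in folder that has a matching .box file."""
    commands = []
    for tif in sorted(Path(folder).glob("*.tif")):
        if not tif.with_suffix(".box").exists():
            print(f"skipping {tif.name}: no .box file")
            continue
        cmd = lstm_train_command(tif)
        commands.append(cmd)
        if run:
            # Tesseract writes <base>.lstmf next to the .tif
            subprocess.run(cmd, check=True)
    return commands
```

Calling make_lstmf_files(folder, run=False) only returns the commands, which is a cheap way to check which images will be processed before starting.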

5. Extract the .lstm file from an existing or officially downloaded .traineddata file

Download the .traineddata file for the language you need from https://github.com/tesseract-ocr/tessdata_best

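Note: you must use a .traineddata file downloaded from the link above. An .lstm file extracted from any other .traineddata file cannot be used for training (the models in tessdata and tessdata_fast are integer models, while fine-tuning needs the float models from tessdata_best).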

Copy the downloaded .traineddata file into the training folder and extract the LSTM model:

combine_tessdata -e eng.traineddata eng.lstm

Training commands

tesseract eng_my.font.exp0.tif eng_my.font.exp0 -l eng --psm 7 batch.nochop makebox

This command does not work with the latest Tesseract 4.1.0: the -l eng --psm 7 part is superfluous and causes jTessBoxEditor to show no recognition boxes. Use these instead:

tesseract eng_my.font.exp0.tif eng_my.font.exp0  batch.nochop makebox
tesseract  eng_my.font.exp0.tif eng_my.font.exp0  nobatch box.train

tesseract eng2_my.font.exp0.tif eng2_my.font.exp0 -l eng --psm 6 lstm.train
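For a single image the commands above run in a fixed order: makebox creates the .box file you correct in jTessBoxEditor, box.train writes a .tr file, and lstm.train produces the .lstmf file. A small sketch that prints that sequence for one base name; box_pipeline is a hypothetical helper and the flags are copied from the commands above (no -l/--psm for makebox, per the note).

```python
def box_pipeline(base: str, lang: str = "eng") -> list[list[str]]:
    """Return the per-image commands in order: makebox, box.train, lstm.train."""
    tif = f"{base}.tif"
    return [
        # step 1: initial boxes for manual correction (no -l / --psm)
        ["tesseract", tif, base, "batch.nochop", "makebox"],
        # step 2: writes <base>.tr from the corrected .box file
        ["tesseract", tif, base, "nobatch", "box.train"],
        # step 3: writes <base>.lstmf for lstmtraining
        ["tesseract", tif, base, "-l", lang, "--psm", "6", "lstm.train"],
    ]


if __name__ == "__main__":
    for cmd in box_pipeline("eng_my.font.exp0"):
        print(" ".join(cmd))
```

Run steps 2 and 3 only after the boxes have been fixed in jTessBoxEditor, since both read the .box file.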

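The commands below have a huuuge bug: the argument order is jumbled, so they simply won't run and throw all kinds of errors. The first one also writes --flag = “value” with spaces around = and typographic quotes, and cmd runs each line as a separate command unless the line ends with ^, so these need fixing as well.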

lstmtraining 
--traineddata = “C:\jTessBoxEditorFX\samples\trainedsrc\eng.traineddata”
--model_output = “C:\jTessBoxEditorFX\samples\trainingoutput”
--continue_from = “C:\jTessBoxEditorFX\samples\trainedsrc\eng.lstm”
--train_listfile = “C:\jTessBoxEditorFX\samples\trainedsrc\eng.training_files.txt”

--debug_interval -1 --max_iterations 2000 

lstmtraining --model_output="C:\jTessBoxEditorFX\samples\trainingoutput" --continue_from="C:\jTessBoxEditorFX\samples\trainedsrc\eng.lstm" 
--train_listfile="C:\jTessBoxEditorFX\samples\trainedsrc\eng.training_files.txt" --traineddata="C:\jTessBoxEditorFX\samples\trainedsrc\eng.traineddata" 
--debug_interval -1 --max_iterations 800
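All of these commands need eng.training_files.txt, the file passed to --train_listfile, which lists the .lstmf training files one per line. A minimal sketch that builds it from a folder of .lstmf files; the folder and output names are assumptions and write_training_list is a hypothetical helper.

```python
from pathlib import Path


def write_training_list(lstmf_dir: str, list_file: str) -> list[str]:
    """Write one absolute .lstmf path per line, as --train_listfile expects."""
    paths = [str(p.resolve()) for p in sorted(Path(lstmf_dir).glob("*.lstmf"))]
    if not paths:
        raise FileNotFoundError(f"no .lstmf files in {lstmf_dir}")
    Path(list_file).write_text("\n".join(paths) + "\n", encoding="utf-8")
    return paths
```

Absolute paths keep the list valid no matter which directory lstmtraining is started from.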

Run lstmtraining with no arguments to see its usage and argument order:

lstmtraining
USAGE: lstmtraining -v | --version | lstmtraining [.tr files ...]
  --debug_level  Level of Trainer debugging  (type:int default:0)
  --load_images  Load images with tr files  (type:int default:0)
  --debug_interval  How often to display the alignment.  (type:int default:0)
  --net_mode  Controls network behavior.  (type:int default:192)
  --perfect_sample_delay  How many imperfect samples between perfect ones.  (type:int default:0)
  --max_image_MB  Max memory to use for images.  (type:int default:6000)
  --append_index  Index in continue_from Network at which to attach the new network defined by net_spec  (type:int default:-1)
  --max_iterations  If set, exit after this many iterations  (type:int default:0)
  --clusterconfig_min_samples_fraction  Min number of samples per proto as % of total  (type:double default:0.625)
  --clusterconfig_max_illegal  Max percentage of samples in a cluster which have more than 1 feature in that cluster  (type:double default:0.05)
  --clusterconfig_independence  Desired independence between dimensions  (type:double default:1)
  --clusterconfig_confidence  Desired confidence in prototypes created  (type:double default:1e-06)
  --target_error_rate  Final error rate in percent.  (type:double default:0.01)
  --weight_range  Range of initial random weights.  (type:double default:0.1)
  --learning_rate  Weight factor for new deltas.  (type:double default:0.001)
  --momentum  Decay factor for repeating deltas.  (type:double default:0.5)
  --adam_beta  Decay factor for repeating deltas.  (type:double default:0.999)
  --stop_training  Just convert the training model to a runtime model.  (type:bool default:false)
  --convert_to_int  Convert the recognition model to an integer model.  (type:bool default:false)
  --sequential_training  Use the training files sequentially instead of round-robin.  (type:bool default:false)
  --debug_network  Get info on distribution of weight values  (type:bool default:false)
  --randomly_rotate  Train OSD and randomly turn training samples upside-down  (type:bool default:false)
  --configfile  File to load more configs from  (type:string default:)
  --D  Directory to write output files to  (type:string default:)
  --F  File listing font properties  (type:string default:font_properties)
  --X  File listing font xheights  (type:string default:)
  --U  File to load unicharset from  (type:string default:unicharset)
  --O  File to write unicharset to  (type:string default:)
  --output_trainer  File to write trainer to  (type:string default:)
  --test_ch  UTF8 test character string  (type:string default:)
  --net_spec  Network specification  (type:string default:)
  --continue_from  Existing model to extend  (type:string default:)
  --model_output  Basename for output models  (type:string default:lstmtrain)
  --train_listfile  File listing training files in lstmf training format.  (type:string default:)
  --eval_listfile  File listing eval files in lstmf training format.  (type:string default:)
  --traineddata  Combined Dawgs/Unicharset/Recoder for language model  (type:string default:)
  --old_traineddata  When changing the character set, this specifies the old character set that is to be replaced  (type:string default:)

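Rearranged according to the usage above (the ^ at the end of a line tells cmd that the command continues on the next line):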

lstmtraining ^
--debug_interval -1 --max_iterations 800 ^
--continue_from="C:\jTessBoxEditorFX\samples\trainedsrc\eng.lstm" ^
--model_output="C:\jTessBoxEditorFX\samples\trainingoutput" ^
--train_listfile="C:\jTessBoxEditorFX\samples\trainedsrc\eng.training_files.txt" ^
--traineddata="C:\jTessBoxEditorFX\samples\trainedsrc\eng.traineddata"
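Starting lstmtraining from a script avoids the quoting and line-break pitfalls entirely, because each argument is handed over as a separate list item and no shell parses it. A sketch in Python, assuming lstmtraining is on PATH; lstmtraining_command is a hypothetical helper and the flags are the ones used above.

```python
def lstmtraining_command(traineddata: str, continue_from: str, model_output: str,
                         train_listfile: str, max_iterations: int = 800,
                         debug_interval: int = -1) -> list[str]:
    """Build the lstmtraining argv, one --name=value item per flag (no spaces, no quotes)."""
    return [
        "lstmtraining",
        f"--debug_interval={debug_interval}",
        f"--max_iterations={max_iterations}",
        f"--continue_from={continue_from}",
        f"--model_output={model_output}",
        f"--train_listfile={train_listfile}",
        f"--traineddata={traineddata}",
    ]
```

Call it with the paths above written as raw strings (for example r"C:\jTessBoxEditorFX\samples\trainedsrc\eng.lstm") and pass the result to subprocess.run(cmd, check=True); paths containing spaces then need no extra quoting.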

This time it runs.

Combine the training result (convert the checkpoint into a .traineddata file):

lstmtraining --stop_training ^
--continue_from="C:\jTessBoxEditorFX\samples\trainingoutput_checkpoint" ^
--model_output="C:\jTessBoxEditorFX\samples\trainingoutput\eng2_my.traineddata" ^
--traineddata="C:\jTessBoxEditorFX\samples\trainedsrc\eng.traineddata"

No boxes in the image? How to add one

Select the whole image, then click Insert; a new box appears.

If two characters are so close together that they end up in one box, click Split to divide it into two.

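Adjusting boxes by hand is slow. To adjust X, select the X field (it turns blue), then select the box to adjust in the image below (it turns red). Now for the magic: hold down the Up or Down arrow key and the selected box moves very quickly.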


The whole image may be recognized as a single ~ symbol; be sure to delete that box.
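If many box files contain this whole-image ~ box, it can be removed in bulk before opening them in jTessBoxEditor. A sketch, assuming UTF-8 box files; drop_tilde_boxes is a hypothetical helper, and it also deletes any line whose character really is ~, so skip it if your training text contains tildes.

```python
from pathlib import Path


def drop_tilde_boxes(box_path: str) -> int:
    """Delete box lines whose character is '~' and return how many were removed."""
    path = Path(box_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    # the character is the first space-separated field of each box line
    kept = [ln for ln in lines if ln.split(" ", 1)[0] != "~"]
    path.write_text("\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")
    return len(lines) - len(kept)
```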


When typing a character, press Enter twice.


What if the whole image has no boxes at all?

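  1. Open the .box file in Notepad++, then copy all of its contents and paste them into a new .txt file.
  2. Add the missing boxes by hand, following the layout of the existing lines in the .box file. Each line holds:
     | Character | Left x | Bottom y | Right x | Top y | Page number in the .tif |
     |---|---|---|---|---|---|
     | B | 16 | 92 | 138 | 261 | 0 |
     That is, the line reads B 16 92 138 261 0. Coordinates are in pixels, measured from the bottom-left corner of the image.
  3. Find the lines belonging to the right image page and insert the new line there. Once it is added in the .txt file, copy everything back into the .box file, then reopen the .box file and proofread it.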
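The same edit can be scripted. A sketch, assuming the standard Tesseract box line layout (character, left, bottom, right, top, page); Box and insert_box are hypothetical names. It places the new box after the last box of its page, so move the line if the character belongs earlier in reading order.

```python
from dataclasses import dataclass


@dataclass
class Box:
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @classmethod
    def parse(cls, line: str) -> "Box":
        """Parse 'char left bottom right top page'."""
        char, *nums = line.split()
        left, bottom, right, top, page = map(int, nums)
        return cls(char, left, bottom, right, top, page)

    def format(self) -> str:
        return f"{self.char} {self.left} {self.bottom} {self.right} {self.top} {self.page}"


def insert_box(lines: list[str], new: Box) -> list[str]:
    """Insert new after the last existing box on the same or an earlier page."""
    pos = 0
    for i, line in enumerate(lines):
        if line.strip() and Box.parse(line).page <= new.page:
            pos = i + 1
    return lines[:pos] + [new.format()] + lines[pos:]
```

Reading the .box file with splitlines(), calling insert_box, and writing the result back with one line per box replaces the copy-and-paste round trip through the .txt file.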