Continuous Tesseract training in VB
My previous post covered capturing images. This time I wrote a VB program for dynamic, continuous Tesseract training.
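Drag a box around a character with the mouse, then type the corresponding character.
The text box on the right shows the matching box-file entries. The page number has been left out, because it is added dynamically later.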
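For readers unfamiliar with the box format: in Tesseract 3.x each box-file line reads "char left bottom right top page", with y measured from the bottom edge of the image. The Python sketch below is not the author's VB code, and box_entry and add_page_numbers are hypothetical names. It shows two steps. First, a mouse selection with a top-left origin becomes a page-less entry. Second, page numbers are appended once the order of the TIFs is known. The sketch treats the selection edges as pixel boundaries, so check the exact off-by-one convention against your Tesseract version.

```python
def box_entry(char, left, top, right, bottom, image_height):
    """Return "<char> <left> <bottom> <right> <top>" without a page number.

    left/top/right/bottom describe a mouse selection in pixels with the
    origin at the top-left corner (y grows downwards). Box files measure y
    from the bottom edge of the image, so both y values are flipped.
    """
    if left >= right or top >= bottom:
        raise ValueError("empty or inverted selection")
    return f"{char} {left} {image_height - bottom} {right} {image_height - top}"


def add_page_numbers(entries_per_image):
    """Append the 0-based page index; one inner list per TIF, in merge order."""
    lines = []
    for page, entries in enumerate(entries_per_image):
        lines.extend(f"{entry} {page}" for entry in entries)
    return lines
```

The page index has to follow the order in which the TIFs are later merged, which is presumably why it can only be filled in at that point.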
The file path is hard-coded as "d:\train\", and the file format is fixed as well (TIF).
If you need anything else, change it yourself.
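Clicking "Retrain" automatically merges the TIFs into a single file, trains a language pack named num, and copies it over the old one in the Tesseract data folder (that path is hard-coded too). To use it, run: tesseract <image file> <output file> -l num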
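The "Retrain" step presumably wraps the Tesseract 3.0x command-line training tools. The Python sketch below shows roughly what that sequence looks like and rests on several assumptions. The section shows neither the VB implementation nor the actual paths, so these are all guesses:

- the stem naming lang.fontname.expN;
- the presence of a font_properties file;
- the tessdata_dir parameter.

```python
import shutil
import subprocess
from pathlib import Path

# Files written by mftraining/cntraining that combine_tessdata expects to find
# as "<lang>.<name>"; shapetable only appears on Tesseract 3.02 and later.
MODEL_PARTS = ["inttemp", "pffmtable", "normproto", "shapetable"]


def training_commands(lang, stem):
    """Tesseract 3.0x training steps as argv lists (stem e.g. "num.font.exp0")."""
    tr_file = f"{stem}.tr"
    return [
        # Extract features from the merged TIF and its box file -> <stem>.tr
        ["tesseract", f"{stem}.tif", stem, "box.train"],
        # Collect the character set from the box file -> unicharset
        ["unicharset_extractor", f"{stem}.box"],
        # Shape features; assumes a font_properties file in the working directory
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", tr_file],
        # Character normalization features -> normproto
        ["cntraining", tr_file],
    ]


def retrain(workdir, lang, stem, tessdata_dir):
    """Run the steps, build <lang>.traineddata and overwrite the installed copy."""
    work = Path(workdir)
    for cmd in training_commands(lang, stem):
        subprocess.run(cmd, cwd=work, check=True)  # stop on the first failure
    for part in MODEL_PARTS:
        src = work / part
        if src.exists():
            src.replace(work / f"{lang}.{part}")
    subprocess.run(["combine_tessdata", f"{lang}."], cwd=work, check=True)
    target = Path(tessdata_dir) / f"{lang}.traineddata"
    shutil.copyfile(work / f"{lang}.traineddata", target)  # overwrites if present
    return target
```

As an example, retrain(r"d:\train", "num", "num.font.exp0", tessdata_dir) would end with num.traineddata installed, after which "-l num" picks it up. The tool flags differ between 3.00, 3.01 and 3.02, so compare them against the training documentation for your version.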
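I found merging TIFs rather interesting. My implementation is not very flexible, so anyone interested is welcome to dig into it further.
Attached is the frm source. There is also a cls (listed first), used mainly to hold TIF information while the TIFs are being merged.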
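The merge can be done without decoding any pixels. A classic TIFF is an 8-byte header followed by a chain of IFDs (image file directories) linked by next-IFD offsets. Files can therefore be concatenated, provided two things are done:

- every stored offset is shifted to its new position;
- the last IFD of one file is pointed at the first IFD of the next.

Dropping each file's 8-byte header matches the class's Data_length field, which is the total length minus 8.

The following Python sketch is an illustration under assumptions, not a port of the VB code. It has these limits:

- it handles only classic (non-BigTIFF) files that share one byte order;
- it relocates StripOffsets/TileOffsets and out-of-line values;
- it does not follow sub-IFDs such as EXIF.

```python
import struct

# Byte size of each TIFF 6.0 field type (1 BYTE ... 12 DOUBLE); unknown types raise KeyError.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}
OFFSET_TAGS = {273, 324}  # StripOffsets, TileOffsets: their values are file offsets


def byte_order(data):
    """Return the struct prefix for an II (little) or MM (big) classic TIFF."""
    endian = {b"II": "<", b"MM": ">"}.get(bytes(data[:2]))
    if endian is None or struct.unpack_from(endian + "H", data, 2)[0] != 42:
        raise ValueError("not a classic TIFF file")
    return endian


def ifd_offsets(data, e):
    """Follow the next-IFD chain starting at the offset stored in the header."""
    offsets = []
    off = struct.unpack_from(e + "I", data, 4)[0]
    while off:
        offsets.append(off)
        count = struct.unpack_from(e + "H", data, off)[0]
        off = struct.unpack_from(e + "I", data, off + 2 + 12 * count)[0]
    return offsets


def relocate_ifd(buf, ifd, delta, e):
    """Add delta to every offset stored in one IFD of buf (modified in place)."""
    count = struct.unpack_from(e + "H", buf, ifd)[0]
    for i in range(count):
        entry = ifd + 2 + 12 * i
        tag, typ, n = struct.unpack_from(e + "HHI", buf, entry)
        if TYPE_SIZES[typ] * n > 4:
            # Value too large to inline: the entry holds an offset to it.
            where = struct.unpack_from(e + "I", buf, entry + 8)[0]
            struct.pack_into(e + "I", buf, entry + 8, where + delta)
        else:
            where = entry + 8  # value stored inline in the entry itself
        if tag in OFFSET_TAGS:
            # SHORT offsets raise struct.error if the shifted value exceeds 65535.
            fmt, step = (e + "H", 2) if typ == 3 else (e + "I", 4)
            for k in range(n):
                value = struct.unpack_from(fmt, buf, where + step * k)[0]
                struct.pack_into(fmt, buf, where + step * k, value + delta)
    nxt = ifd + 2 + 12 * count
    value = struct.unpack_from(e + "I", buf, nxt)[0]
    if value:  # 0 marks the last IFD; merge_tiffs links it to the next file
        struct.pack_into(e + "I", buf, nxt, value + delta)


def merge_tiffs(files):
    """Join classic TIFFs (bytes) into one multi-page TIFF."""
    e = byte_order(files[0])
    out = bytearray(files[0][:8])  # keep the first file's 8-byte header
    link_at = None  # position in out of the previous file's final next-IFD pointer
    for data in files:
        if byte_order(data) != e:
            raise ValueError("all inputs must use the same byte order")
        if len(out) % 2:
            out.append(0)  # keep IFDs and values on word boundaries
        delta = len(out) - 8  # data[8:] lands at len(out): offset k moves to k + delta
        ifds = ifd_offsets(data, e)
        if not ifds:
            raise ValueError("TIFF without any IFD")
        buf = bytearray(data)
        for ifd in ifds:
            relocate_ifd(buf, ifd, delta, e)
        if link_at is not None:
            struct.pack_into(e + "I", out, link_at, ifds[0] + delta)
        last_count = struct.unpack_from(e + "H", data, ifds[-1])[0]
        link_at = ifds[-1] + 2 + 12 * last_count + delta
        out += buf[8:]
    return bytes(out)
```

Writing the result of merge_tiffs to one .tif gives a multi-page file whose page order matches the input list. This is also the order the box-file page numbers have to follow.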
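A caveat up front: the results after training were not good. That may be because my method is wrong, my samples are poor, or my understanding of Tesseract is not deep enough.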
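'IDH.cls
Option Explicit
Public desc$ 'II = reversed byte order (little-endian, low byte first; can be converted directly in the program), MM = normal order (big-endian, high byte first)
Public Data_Offset_Offiset_Offset As Long 'pointer to the offset of the data offset
Public Data_Offset_Offiset As Long 'offset of the data offset
Public Data_Offset_Length_Offiset As Long 'offset of the data count
Public Data_Offset_Length_Offiset_Offset As Long 'pointer to the offset of the data count
Public NextPage_Offset As Long 'offset of the next image (page)
Public Data_length As Long 'data length; only used for merging here, so it is simply the total length minus 8
'public Head_length as Long 'always 8
Public Tag_Offset As Long 'tag offset
Public Tag_length As Long 'tag length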
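The class fields appear to index into the TIFF structure:

- desc holds the II/MM byte-order mark;
- the commented-out Head_length is the fixed 8-byte header;
- NextPage_Offset corresponds to the 4-byte next-IFD pointer that ends each directory of 12-byte entries.

The Python sketch below reads those pieces; the function names are hypothetical. It also shows why II files are convenient on x86: their bytes are already in the CPU's little-endian order. That is what lets CopyMemory copy them straight into a Long.

```python
import struct


def read_tiff_header(data):
    """Parse the fixed 8-byte TIFF header: byte-order mark, 42, first IFD offset."""
    mark = bytes(data[:2])
    if mark == b"II":
        endian = "<"  # little-endian: low byte first (native order on x86)
    elif mark == b"MM":
        endian = ">"  # big-endian: high byte first
    else:
        raise ValueError("missing II/MM byte-order mark")
    magic, first_ifd = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    return mark.decode("ascii"), first_ifd


def read_ifd(data, endian, offset):
    """Return ([(tag, type, count, raw value/offset), ...], next IFD offset).

    An IFD is a 2-byte entry count, count * 12-byte entries, then a
    4-byte pointer to the next IFD (0 for the last page), so it spans
    2 + 12 * count + 4 bytes.
    """
    count = struct.unpack_from(endian + "H", data, offset)[0]
    entries = [struct.unpack_from(endian + "HHII", data, offset + 2 + 12 * i)
               for i in range(count)]
    next_ifd = struct.unpack_from(endian + "I", data, offset + 2 + 12 * count)[0]
    return entries, next_ifd
```

In MM files an inline SHORT value occupies the first two bytes of the value field. It therefore shows up in the high half of the last tuple element and needs shifting right by 16.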
VERSION 5.00
Begin VB.Form Form1
   Caption         =   "Form1"
   ClientHeight    =   5055
   ClientLeft      =   120
   ClientTop       =   450
   ClientWidth     =   7245
   LinkTopic       =   "Form1"
   ScaleHeight     =   337
   ScaleMode       =   3  'Pixel
   ScaleWidth      =   483
   StartUpPosition =   3  'Windows Default
   Begin VB.CheckBox Check1
      Caption         =   "Unadjusted only"
      Height          =   375
      Left            =   960
      TabIndex        =   5
      Top             =   4560
      Value           =   1  'Checked
      Width           =   1455
   End
   Begin VB.CommandButton Command2
      Caption         =   "Next"
      Height          =   495
      Left            =   3000
      TabIndex        =   4
      Top             =   4440
      Width           =   1215
   End
   Begin VB.CommandButton Command1
      Caption         =   "Retrain"
      Height          =   495
      Left            =   5280
      TabIndex        =   3
      Top             =   4440
      Width           =   1335
   End
   Begin VB.TextBox t
      Appearance      =   0  'Flat
      Height          =   3015
      Left            =   3960
      MultiLine       =   -1  'True
      TabIndex        =   2
      Top             =   240
      Width           =   2775
   End
   Begin VB.PictureBox p
      Appearance      =   0  'Flat
      AutoRedraw      =   -1  'True
      AutoSize        =   -1  'True
      BackColor       =   &H80000005&
      ForeColor       =   &H80000008&
      Height          =   3015
      Left            =   120
      ScaleHeight     =   199
      ScaleMode       =   3  'Pixel
      ScaleWidth      =   247
      TabIndex        =   0
      Top             =   240
      Width           =   3735
      Begin VB.Label l
         Appearance      =   0  'Flat
         BackColor       =   &H80000005&
         BackStyle       =   0  'Transparent
         BorderStyle     =   1  'Fixed Single
         ForeColor       =   &H80000008&
         Height          =   255
         Index           =   0
         Left            =   1080
         TabIndex        =   1
         Top             =   720
         Visible         =   0  'False
         Width           =   255
      End
   End
End
Attribute VB_Name = "Form1"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = False
Option Explicit
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As Long)