PS C:\Users\xyw\Desktop> tesseract --help-extra
1.Usage:
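Option | Description |
---|---|
--help \| --help-extra \| --help-psm \| --help-oem \| --version | Show help or version information |
--list-langs [--tessdata-dir PATH] | List the supported languages |
--print-parameters [options...] [configfile...] | Print the configuration parameters |
imagename\|imagelist\|stdin outputbase\|stdout [options...] [configfile...] | Run OCR on the input and write the result to outputbase or stdout |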
Usage:
D:\program\Tesseract-OCR\tesseract.exe --help | --help-extra | --help-psm | --help-oem | --version
D:\program\Tesseract-OCR\tesseract.exe --list-langs [--tessdata-dir PATH]
D:\program\Tesseract-OCR\tesseract.exe --print-parameters [options...] [configfile...]
D:\program\Tesseract-OCR\tesseract.exe imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]
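The last usage form is the one that actually performs OCR. As a minimal sketch, the command line can be assembled from Python like this. The function name `build_basic_command` and the file names `page.png` and `result` are hypothetical, and the executable is assumed to be reachable on `PATH` as `tesseract`.

```python
def build_basic_command(image, outputbase="stdout", exe="tesseract"):
    """Return the argv list for: tesseract imagename outputbase."""
    # "stdout" as the output base makes tesseract print the text
    # instead of writing it to a file.
    return [exe, str(image), str(outputbase)]


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    print(" ".join(build_basic_command("page.png", "result")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. With an output base of `result`, the recognized text is written to `result.txt`.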
2.OCR options
Option | Description |
---|---|
--tessdata-dir PATH | Path to the tessdata directory |
--user-words PATH | Path to the user words file |
--user-patterns PATH | Path to the user patterns file |
--dpi VALUE | DPI of the input image |
-l LANG[+LANG] | Language(s) used for OCR |
-c VAR=VALUE | Set a config variable (may be repeated) |
--psm NUM | Page segmentation mode |
--oem NUM | OCR engine mode |
OCR options:
--tessdata-dir PATH Specify the location of tessdata path.
--user-words PATH Specify the location of user words file.
--user-patterns PATH Specify the location of user patterns file.
--dpi VALUE Specify DPI for input image.
-l LANG[+LANG] Specify language(s) used for OCR.
-c VAR=VALUE Set value for config variables.
Multiple -c arguments are allowed.
--psm NUM Specify page segmentation mode.
--oem NUM Specify OCR Engine mode.
NOTE: These options must occur before any configfile.
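The note about ordering is easy to get wrong when a command line is built by hand. The sketch below assembles the arguments so that every option comes before any config file. The helper name `build_ocr_command` and its parameters are hypothetical; `tessedit_char_whitelist` and the `digits` config file appear only as example values and may not suit every installation.

```python
def build_ocr_command(image, outputbase, lang=None, psm=None, oem=None,
                      dpi=None, variables=None, configfiles=(),
                      exe="tesseract"):
    """Assemble a tesseract argv list with options ahead of config files."""
    cmd = [exe, str(image), str(outputbase)]
    if lang:
        # Several languages are joined with "+", matching -l LANG[+LANG].
        langs = [lang] if isinstance(lang, str) else list(lang)
        cmd += ["-l", "+".join(langs)]
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    # Each config variable gets its own -c argument.
    for name, value in (variables or {}).items():
        cmd += ["-c", f"{name}={value}"]
    # Config files go last, as the NOTE in the help text requires.
    cmd += [str(c) for c in configfiles]
    return cmd


if __name__ == "__main__":
    cmd = build_ocr_command(
        "page.png", "stdout",            # hypothetical input image
        lang=["eng", "chi_sim"], psm=6, oem=1,
        variables={"tessedit_char_whitelist": "0123456789"},
        configfiles=["digits"],
    )
    print(" ".join(cmd))
```

Keeping the config files in a separate parameter means the ordering rule is enforced by construction rather than by the caller's discipline.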
3.PSM mode settings
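Value | Description |
---|---|
0 | Orientation and script detection (OSD) only |
1 | Automatic page segmentation with OSD |
2 | Automatic page segmentation, but no OSD or OCR |
3 | Fully automatic page segmentation, but no OSD (default) |
4 | Assume a single column of text of variable sizes |
5 | Assume a single uniform block of vertically aligned text |
6 | Assume a single uniform block of text |
7 | Treat the image as a single text line |
8 | Treat the image as a single word |
9 | Treat the image as a single word in a circle |
10 | Treat the image as a single character |
11 | Sparse text: find as much text as possible in no particular order |
12 | Sparse text with OSD |
13 | Raw line: treat the image as a single text line, bypassing Tesseract-specific hacks |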
Page segmentation modes:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line,
bypassing hacks that are Tesseract-specific.
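Bare mode numbers are hard to read in scripts, so it can help to give them names. The sketch below wraps the fourteen modes listed above in an `IntEnum`. The member names and the `psm_args` helper are labels chosen for this example and are not part of the command-line interface.

```python
from enum import IntEnum


class PSM(IntEnum):
    """Page segmentation modes, numbered as in `tesseract --help-psm`."""
    OSD_ONLY = 0
    AUTO_OSD = 1
    AUTO_ONLY = 2
    AUTO = 3                      # default
    SINGLE_COLUMN = 4
    SINGLE_BLOCK_VERT_TEXT = 5
    SINGLE_BLOCK = 6
    SINGLE_LINE = 7
    SINGLE_WORD = 8
    CIRCLE_WORD = 9
    SINGLE_CHAR = 10
    SPARSE_TEXT = 11
    SPARSE_TEXT_OSD = 12
    RAW_LINE = 13


def psm_args(mode):
    """Validate a mode number or PSM member and return the CLI arguments."""
    mode = PSM(mode)  # raises ValueError for values outside 0..13
    return ["--psm", str(int(mode))]


if __name__ == "__main__":
    print(psm_args(PSM.SINGLE_LINE))
```

Converting through `PSM(mode)` rejects an unsupported number before the command is ever launched.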
4.OCR engine mode selection
Value | Description |
---|---|
0 | Legacy engine only |
1 | LSTM neural network engine only |
2 | Legacy + LSTM engines |
3 | Use whichever engine is available (default) |
OCR Engine modes:
0 Legacy engine only.
1 Neural nets LSTM engine only.
2 Legacy + LSTM engines.
3 Default, based on what is available.
5.Single options
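Option | Description |
---|---|
-h, --help | Brief help |
--help-extra | Extra help for advanced users |
--help-psm | Page segmentation modes |
--help-oem | OCR engine modes |
-v, --version | Version information |
--list-langs | Supported languages |
--print-parameters | Print Tesseract parameters |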
Single options:
-h, --help Show minimal help message.
--help-extra Show extra help for advanced users.
--help-psm Show page segmentation modes.
--help-oem Show OCR Engine modes.
-v, --version Show version information.
--list-langs List available languages for tesseract engine.
--print-parameters Print tesseract parameters.
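A script can use `--list-langs` to check whether a language is installed before running OCR. The sketch below assumes the output consists of a header line starting with "List of available languages" followed by one language code per line; that layout is an assumption and may differ between versions. The function names `parse_list_langs` and `list_langs` are hypothetical.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by --list-langs."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the assumed header line.
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def list_langs(exe="tesseract"):
    """Run `tesseract --list-langs` and return the parsed language codes."""
    result = subprocess.run([exe, "--list-langs"],
                            capture_output=True, text=True, check=True)
    return parse_list_langs(result.stdout)


if __name__ == "__main__":
    # Assumed sample text, not captured from a real run.
    sample = "List of available languages (2):\neng\nosd\n"
    print(parse_list_langs(sample))
```

If `list_langs` returns an empty list on a given installation, the listing may have been written to stderr instead; in that case parsing `result.stderr` as well is worth trying.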