Tesseract Usage Guide

PS C:\Users\xyw\Desktop> tesseract --help-extra

1. Usage

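Option                                                                    Description
--help | --help-extra | --help-psm | --help-oem | --version               Show help or version information
--list-langs [--tessdata-dir PATH]                                        List the supported languages
--print-parameters [options...] [configfile...]                           Print the Tesseract parameters
imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]  Run OCR on the input and write the result to outputbase or stdout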

Usage:
D:\program\Tesseract-OCR\tesseract.exe --help | --help-extra | --help-psm | --help-oem | --version
D:\program\Tesseract-OCR\tesseract.exe --list-langs [--tessdata-dir PATH]
D:\program\Tesseract-OCR\tesseract.exe --print-parameters [options...] [configfile...]
D:\program\Tesseract-OCR\tesseract.exe imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]
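
As a rough sketch of the last usage form, the Python snippet below builds the `tesseract imagename outputbase` command and runs it with `stdout` as the output base so the recognized text can be captured. The helper names `basic_command` and `run_ocr` and the file name `page.png` are illustrative assumptions, not part of Tesseract, and the code assumes the executable is reachable as `tesseract` on PATH.

```python
import shutil
import subprocess


def basic_command(image, outputbase="stdout", exe="tesseract"):
    """Return the argument list for: tesseract imagename outputbase."""
    return [exe, str(image), str(outputbase)]


def run_ocr(image, exe="tesseract"):
    """Run OCR on one image and return the recognized text.

    Assumes the tesseract executable is installed; pass a full path
    (for example the install location shown above) via `exe` if it
    is not on PATH.
    """
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} was not found")
    result = subprocess.run(
        basic_command(image, "stdout", exe),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call run_ocr("page.png") to execute it.
    print(basic_command("page.png"))
```

With an output base such as `out`, Tesseract writes the text to `out.txt` by default, whereas `stdout` prints it to the console.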

2. OCR options

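Option                Description
--tessdata-dir PATH   Path to the tessdata directory
--user-words PATH     Path to the user words file
--user-patterns PATH  Path to the user patterns file
--dpi VALUE           DPI of the input image
-l LANG[+LANG]        Language(s) used for OCR
-c VAR=VALUE          Set a config variable (may be given several times)
--psm NUM             Page segmentation mode
--oem NUM             OCR engine mode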

OCR options:
--tessdata-dir PATH Specify the location of tessdata path.
--user-words PATH Specify the location of user words file.
--user-patterns PATH Specify the location of user patterns file.
--dpi VALUE Specify DPI for input image.
-l LANG[+LANG] Specify language(s) used for OCR.
-c VAR=VALUE Set value for config variables.
  Multiple -c arguments are allowed.
--psm NUM Specify page segmentation mode.
--oem NUM Specify OCR Engine mode.
NOTE: These options must occur before any configfile.
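
The ordering rule in the NOTE is easy to get wrong when a command is assembled by hand, so here is a minimal Python sketch that always places the options before any configfile. The function name `build_command` and its keyword arguments are my own assumptions for illustration; only the flags themselves come from the help text above.

```python
def build_command(image, outputbase, *, lang=None, psm=None, oem=None,
                  dpi=None, tessdata_dir=None, config_vars=None,
                  configfiles=(), exe="tesseract"):
    """Build a tesseract argument list with options before configfiles."""
    cmd = [exe, str(image), str(outputbase)]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    if lang:
        # Accept a single code ("eng") or several (["eng", "chi_sim"]).
        codes = [lang] if isinstance(lang, str) else list(lang)
        cmd += ["-l", "+".join(codes)]
    # Multiple -c arguments are allowed, one per variable.
    for name, value in (config_vars or {}).items():
        cmd += ["-c", f"{name}={value}"]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    # Configfiles go last, after every option.
    cmd += [str(c) for c in configfiles]
    return cmd


if __name__ == "__main__":
    print(build_command("page.png", "out", lang=["eng", "chi_sim"],
                        psm=6, oem=1, configfiles=["tsv"]))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with paths that contain spaces.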

3. Page segmentation modes (--psm)

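Mode  Description
0     Orientation and script detection (OSD) only
1     Automatic page segmentation with OSD
2     Automatic page segmentation, but no OSD or OCR
3     Fully automatic page segmentation, but no OSD (default)
4     Assume a single column of text of variable sizes
5     Assume a single uniform block of vertically aligned text
6     Assume a single uniform block of text
7     Treat the image as a single text line
8     Treat the image as a single word
9     Treat the image as a single word in a circle
10    Treat the image as a single character
11    Sparse text: find as much text as possible, in no particular order
12    Sparse text with OSD
13    Raw line: treat the image as a single text line, bypassing Tesseract-specific hacks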

Page segmentation modes:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line,
   bypassing hacks that are Tesseract-specific.
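
Bare mode numbers are hard to read in scripts, so one option is to give them names. The sketch below is only an illustration: the enum `PSM`, its member names, and the helper `psm_args` are my own choices rather than anything Tesseract's command line defines. The numeric values are the ones listed above.

```python
from enum import IntEnum


class PSM(IntEnum):
    """Named page segmentation modes (names are illustrative)."""
    OSD_ONLY = 0
    AUTO_OSD = 1
    AUTO_ONLY = 2
    AUTO = 3              # the default
    SINGLE_COLUMN = 4
    SINGLE_BLOCK_VERT_TEXT = 5
    SINGLE_BLOCK = 6
    SINGLE_LINE = 7
    SINGLE_WORD = 8
    CIRCLE_WORD = 9
    SINGLE_CHAR = 10
    SPARSE_TEXT = 11
    SPARSE_TEXT_OSD = 12
    RAW_LINE = 13


def psm_args(mode):
    """Return ['--psm', 'N'], raising ValueError for unknown modes."""
    return ["--psm", str(int(PSM(mode)))]


if __name__ == "__main__":
    # A cropped image containing one line of text, e.g. a serial number.
    print(psm_args(PSM.SINGLE_LINE))
```

Validating the number up front means a typo such as `--psm 14` is caught in the script rather than surfacing later as a Tesseract error.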

4. OCR engine modes (--oem)

Mode  Description
0     Legacy engine only
1     Neural nets LSTM engine only
2     Legacy + LSTM engines
3     Default, based on what is available

OCR Engine modes:
0 Legacy engine only.
1 Neural nets LSTM engine only.
2 Legacy + LSTM engines.
3 Default, based on what is available.

5. Single options

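Option              Description
-h, --help          Show a brief help message
--help-extra        Show extra help for advanced users
--help-psm          Show the page segmentation modes
--help-oem          Show the OCR engine modes
-v, --version       Show version information
--list-langs        List the supported languages
--print-parameters  Print the Tesseract parameters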

Single options:
-h, --help Show minimal help message.
--help-extra Show extra help for advanced users.
--help-psm Show page segmentation modes.
--help-oem Show OCR Engine modes.
-v, --version Show version information.
--list-langs List available languages for tesseract engine.
--print-parameters Print tesseract parameters.
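
To check from a script whether a language is installed before passing it to `-l`, the output of `--list-langs` can be parsed. The sketch below assumes the output starts with a header line beginning "List of available languages" followed by one language code per line, which is what recent Tesseract versions print as far as I know; older versions may write to stderr instead, so the code falls back to it. The function names are hypothetical.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Drop the header line if present (assumed format, see above).
    if lines and lines[0].lower().startswith("list of available languages"):
        lines = lines[1:]
    return lines


def list_langs(exe="tesseract"):
    """Run `tesseract --list-langs` and return the language codes."""
    result = subprocess.run(
        [exe, "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Fall back to stderr in case this version prints the list there.
    return parse_list_langs(result.stdout or result.stderr)


if __name__ == "__main__":
    sample = "List of available languages (3):\nchi_sim\neng\nosd\n"
    print(parse_list_langs(sample))
```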

References:
https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#tsv-output-currently-available-in-305-dev-in-master-branch-on-github
