Tesseract Usage Guide

Latest recommended article published 2024-07-30 17:30:49

Claroja

Article link: https://blog.csdn.net/claroja/article/details/82980056

PS C:\Users\xyw\Desktop> tesseract --help-extra

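1. Usage

Option	Description
--help | --help-extra | --help-psm | --help-oem | --version	Show help or version information
--list-langs [--tessdata-dir PATH]	List the available languages
--print-parameters [options...] [configfile...]	Print Tesseract parameters
imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]	Run OCR on the input and write the result to outputbase or stdout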

Usage:
D:\program\Tesseract-OCR\tesseract.exe --help | --help-extra | --help-psm | --help-oem | --version
D:\program\Tesseract-OCR\tesseract.exe --list-langs [--tessdata-dir PATH]
D:\program\Tesseract-OCR\tesseract.exe --print-parameters [options...] [configfile...]
D:\program\Tesseract-OCR\tesseract.exe imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]
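
The last usage form (image in, text out) is the one used most often. Here is a minimal Python sketch of how it might be scripted. The file name `page.png` and the helper names `build_ocr_command` and `run_ocr` are made up for illustration, and the sketch assumes the `tesseract` executable is on the PATH.

```python
import shutil
import subprocess


def build_ocr_command(image, output_base="stdout", executable="tesseract"):
    """Return the argument list for: tesseract imagename outputbase."""
    return [executable, str(image), str(output_base)]


def run_ocr(image, executable="tesseract"):
    """Run OCR and return the recognized text.

    Returns None when the executable cannot be found on the PATH.
    """
    if shutil.which(executable) is None:
        return None
    result = subprocess.run(
        build_ocr_command(image, "stdout", executable),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# "page.png" is a hypothetical input image.
print(build_ocr_command("page.png"))
```

Passing `stdout` as the output base prints the text instead of writing `outputbase.txt`, which is convenient when the result is consumed by another program.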

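2. OCR options

Option	Description
--tessdata-dir PATH	Location of the tessdata directory
--user-words PATH	Location of the user words file
--user-patterns PATH	Location of the user patterns file
--dpi VALUE	DPI of the input image
-l LANG[+LANG]	Language(s) used for OCR
-c VAR=VALUE	Set a config variable (may be given multiple times)
--psm NUM	Page segmentation mode
--oem NUM	OCR engine mode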

OCR options:
--tessdata-dir PATH Specify the location of tessdata path.
--user-words PATH Specify the location of user words file.
--user-patterns PATH Specify the location of user patterns file.
--dpi VALUE Specify DPI for input image.
-l LANG[+LANG] Specify language(s) used for OCR.
-c VAR=VALUE Set value for config variables.
Multiple -c arguments are allowed.
--psm NUM Specify page segmentation mode.
--oem NUM Specify OCR Engine mode.
NOTE: These options must occur before any configfile.
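
The ordering rule in the note (options first, config files last) is easy to get wrong when a command line is assembled by hand. The following sketch builds the argument list in that order. The function name `build_command` and its keyword names are my own, and the values in the example (`eng`, `chi_sim`, the `tessedit_char_whitelist` variable, the `hocr` config file) are only illustrative and depend on what your installation provides.

```python
def build_command(image, output_base, *, tessdata_dir=None, dpi=None,
                  langs=(), variables=None, psm=None, oem=None,
                  configfiles=()):
    """Assemble a tesseract argument list with options before config files."""
    cmd = ["tesseract", str(image), str(output_base)]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    if langs:
        # Several languages are joined with "+", as in -l LANG[+LANG].
        cmd += ["-l", "+".join(langs)]
    for name, value in (variables or {}).items():
        # One -c argument per config variable; multiple -c are allowed.
        cmd += ["-c", f"{name}={value}"]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    # Config files must come after every option.
    cmd += [str(name) for name in configfiles]
    return cmd


# Hypothetical example values, shown only to illustrate the ordering.
print(build_command(
    "page.png", "out",
    langs=["eng", "chi_sim"],
    variables={"tessedit_char_whitelist": "0123456789"},
    psm=6,
    configfiles=["hocr"],
))
```

Returning a list rather than a single string means it can be handed to `subprocess.run` directly, without shell quoting.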

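3. Page segmentation modes (--psm)

Mode	Description
0	Orientation and script detection (OSD) only
1	Automatic page segmentation with OSD
2	Automatic page segmentation, but no OSD or OCR
3	Fully automatic page segmentation, but no OSD (default)
4	Assume a single column of text of variable sizes
5	Assume a single uniform block of vertically aligned text
6	Assume a single uniform block of text
7	Treat the image as a single text line
8	Treat the image as a single word
9	Treat the image as a single word in a circle
10	Treat the image as a single character
11	Sparse text: find as much text as possible in no particular order
12	Sparse text with OSD
13	Raw line: treat the image as a single text line, bypassing Tesseract-specific hacks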

Page segmentation modes:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line,
bypassing hacks that are Tesseract-specific.
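
Bare mode numbers are hard to read in scripts, so one option is to give them names. The sketch below maps the 14 modes listed above to an `IntEnum`. The member names and the helper `psm_args` are labels I chose for this example, not names taken from Tesseract itself.

```python
from enum import IntEnum


class PageSegMode(IntEnum):
    """Page segmentation modes 0-13, as printed by --help-psm."""
    OSD_ONLY = 0
    AUTO_WITH_OSD = 1
    AUTO_NO_OSD_NO_OCR = 2
    AUTO = 3                # default
    SINGLE_COLUMN = 4
    SINGLE_BLOCK_VERTICAL = 5
    SINGLE_BLOCK = 6
    SINGLE_LINE = 7
    SINGLE_WORD = 8
    CIRCLE_WORD = 9
    SINGLE_CHAR = 10
    SPARSE_TEXT = 11
    SPARSE_TEXT_WITH_OSD = 12
    RAW_LINE = 13


def psm_args(mode):
    """Return ["--psm", "<n>"]; raises ValueError for numbers outside 0-13."""
    mode = PageSegMode(mode)
    return ["--psm", str(mode.value)]


print(psm_args(PageSegMode.SINGLE_LINE))  # ['--psm', '7']
```

For example, a cropped image holding one line of text would use `PageSegMode.SINGLE_LINE`, and a single captcha-like character would use `PageSegMode.SINGLE_CHAR`.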

4. OCR engine modes (--oem)

Mode	Description
0	Legacy engine only
1	Neural nets LSTM engine only
2	Legacy + LSTM engines
3	Default, based on what is available

OCR Engine modes:
0 Legacy engine only.
1 Neural nets LSTM engine only.
2 Legacy + LSTM engines.
3 Default, based on what is available.
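
The engine mode can be validated in the same way before it reaches the command line. This is a small sketch: the dictionary simply repeats the four modes listed above, and `oem_args` is a hypothetical helper name.

```python
# The four engine modes, as printed by --help-oem.
OEM_MODES = {
    0: "Legacy engine only",
    1: "Neural nets LSTM engine only",
    2: "Legacy + LSTM engines",
    3: "Default, based on what is available",
}


def oem_args(mode=3):
    """Return ["--oem", "<n>"] after checking that the mode is 0-3."""
    if mode not in OEM_MODES:
        raise ValueError(f"unknown OCR engine mode: {mode!r}")
    return ["--oem", str(mode)]


print(oem_args(1), "->", OEM_MODES[1])
```

Whether a given mode actually works depends on the traineddata files that are installed, so this check only catches numbers that are out of range.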

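5. Single options

Option	Description
-h, --help	Show a minimal help message
--help-extra	Show extra help for advanced users
--help-psm	Show the page segmentation modes
--help-oem	Show the OCR engine modes
-v, --version	Show version information
--list-langs	List the available languages
--print-parameters	Print Tesseract parameters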

Single options:
-h, --help Show minimal help message.
--help-extra Show extra help for advanced users.
--help-psm Show page segmentation modes.
--help-oem Show OCR Engine modes.
-v, --version Show version information.
--list-langs List available languages for tesseract engine.
--print-parameters Print tesseract parameters.
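
These single options are handy for checking an installation from a script before running any OCR. Below is a rough sketch; the helper name `query_tesseract` is hypothetical. It returns stdout and stderr together because, as far as I know, which stream the text goes to can differ between Tesseract versions.

```python
import shutil
import subprocess

# The informational flags listed above (long forms only).
INFO_FLAGS = {
    "--help", "--help-extra", "--help-psm", "--help-oem",
    "--version", "--list-langs", "--print-parameters",
}


def query_tesseract(flag, executable="tesseract"):
    """Run tesseract with one informational flag and return its text output.

    Returns None when the executable is not found on the PATH.
    """
    if flag not in INFO_FLAGS:
        raise ValueError(f"not an informational flag: {flag!r}")
    if shutil.which(executable) is None:
        return None
    result = subprocess.run([executable, flag], capture_output=True, text=True)
    return result.stdout + result.stderr


output = query_tesseract("--version")
print(output if output is not None else "tesseract not found on PATH")
```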

References:
https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#tsv-output-currently-available-in-305-dev-in-master-branch-on-github