Key command-line options for Python's tesseract library

Latest recommended article published on 2021-12-21 20:27:32

weixin_39939276


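When calling tesseract, the three most important options are -l, --oem, and --psm.

The -l option sets the language of the text to recognize. Run tesseract --list-langs to see which language data files are installed.

Chinese support: download the Chinese language data from https://github.com/tesseract-ocr/tessdata and copy chi_sim.traineddata into the **\Tesseract-OCR\tessdata directory.

The --oem option sets the OCR engine mode, that is, which type of algorithm Tesseract uses. Run tesseract --help-oem to list the available engine modes. There are usually four, and the fourth (mode 3) is the default. Pass --oem 1 to use only the deep-learning LSTM engine.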
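To confirm that chi_sim.traineddata was picked up after copying it, the output of tesseract --list-langs can be checked from Python. The following is a minimal sketch using only the standard library. The helper names parse_list_langs and installed_languages are hypothetical, and the assumption that the output is a header line followed by one language code per line is based on typical Tesseract builds.

```python
import shutil
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.append(line)
    return langs


def installed_languages() -> list[str]:
    """Return installed language codes, or [] if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Assumption: some builds may print the list to stderr, so both streams
    # are read. This sketch does not filter out warning messages.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_languages()
    print("chi_sim installed:", "chi_sim" in langs)
```

If chi_sim is missing from the list, the traineddata file is probably not in the tessdata directory that this tesseract binary reads.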

OCR Engine modes:

0 Legacy engine only.

1 Neural nets LSTM engine only.

2 Legacy + LSTM engines.

3 Default, based on what is available.
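The four engine modes listed above can be given readable names in Python so that a config string does not rely on a bare number. This is only an illustrative sketch: the OcrEngineMode enum and the oem_flag helper are hypothetical names and are not part of pytesseract.

```python
from enum import IntEnum


class OcrEngineMode(IntEnum):
    """Engine modes as printed by `tesseract --help-oem`."""
    LEGACY_ONLY = 0
    LSTM_ONLY = 1
    LEGACY_AND_LSTM = 2
    DEFAULT = 3


def oem_flag(mode: int) -> str:
    """Return the command-line flag for a mode; raises ValueError if unknown."""
    return f"--oem {int(OcrEngineMode(mode))}"


if __name__ == "__main__":
    print(oem_flag(OcrEngineMode.LSTM_ONLY))
```

Passing a value outside 0 to 3 raises ValueError in Python, before tesseract is ever invoked.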

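The --psm option sets the automatic page segmentation mode that tesseract uses. Run tesseract --help-psm to list the modes. In my experience, modes 6 and 7 work well for small pieces of text; for large blocks of text, try the default mode 3.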

Page segmentation modes:

0 Orientation and script detection (OSD) only.

1 Automatic page segmentation with OSD.

2 Automatic page segmentation, but no OSD, or OCR.

3 Fully automatic page segmentation, but no OSD. (Default)

4 Assume a single column of text of variable sizes.

5 Assume a single uniform block of vertically aligned text.

6 Assume a single uniform block of text.

7 Treat the image as a single text line.

8 Treat the image as a single word.

9 Treat the image as a single word in a circle.

10 Treat the image as a single character.

11 Sparse text. Find as much text as possible in no particular order.

12 Sparse text with OSD.

13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
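The three options can be assembled into one config string with a small helper that rejects mode numbers outside the ranges listed above. This is a sketch under stated assumptions: build_config and suggest_psm are hypothetical helpers, and the layout names "line", "block", and "page" simply encode the rule of thumb given above (7 or 6 for small text, 3 for large blocks).

```python
def suggest_psm(layout: str) -> int:
    """Map a rough layout description to a page segmentation mode."""
    # Rule of thumb from the text: 7 for one line, 6 for a small uniform
    # block, 3 (the default) for a full page or large block of text.
    table = {"line": 7, "block": 6, "page": 3}
    if layout not in table:
        raise ValueError(f"unknown layout: {layout!r}")
    return table[layout]


def build_config(lang: str = "eng", oem: int = 3, psm: int = 3) -> str:
    """Build a config string such as '-l chi_sim --oem 1 --psm 7'."""
    if oem not in range(0, 4):
        raise ValueError("oem must be between 0 and 3")
    if psm not in range(0, 14):
        raise ValueError("psm must be between 0 and 13")
    return f"-l {lang} --oem {oem} --psm {psm}"


if __name__ == "__main__":
    print(build_config("chi_sim", oem=1, psm=suggest_psm("line")))
```

Validating in Python gives a clear error early, instead of a failure from the tesseract process.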

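Usage:

from PIL import Image
import pytesseract
img = Image.open('./img.png')
# Simplified Chinese, LSTM engine only, treat the image as a single text line
config = "-l chi_sim --oem 1 --psm 7"
text = pytesseract.image_to_string(img, config=config)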
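Since the best segmentation mode depends on the image, it can help to run the same image through several modes and compare the results. The sketch below assumes Pillow and pytesseract are installed and that the chi_sim data is available. compare_psm is a hypothetical helper, and its ocr parameter exists only so the OCR call can be swapped out (for example, in a test).

```python
def _tesseract_ocr(image_path: str, config: str) -> str:
    """Run Tesseract on one image file via pytesseract."""
    # Imported lazily so this module loads even without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, config=config)


def compare_psm(image_path, lang="chi_sim", modes=(3, 6, 7), ocr=_tesseract_ocr):
    """Return {psm: recognized text} for each page segmentation mode tried."""
    results = {}
    for psm in modes:
        # Same language and engine as the usage snippet; only --psm varies.
        config = f"-l {lang} --oem 1 --psm {psm}"
        results[psm] = ocr(image_path, config).strip()
    return results
```

Calling compare_psm('./img.png') returns a dictionary keyed by mode number, which makes it easy to see which mode reads a given image best.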
