Python's pytesseract Module: Implementing OCR

Latest recommended article published on 2024-06-05 20:06:36

虚坏叔叔


Article link: https://blog.csdn.net/biggbang/article/details/121392163

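When writing automated tests for PC desktop applications, some controls on the interface cannot be located, yet we still want to read the text they display. In that case, we can take a screenshot and extract the text from the image. Is there a Python tool that implements OCR for this? There is, and it is called pytesseract. Its official description is quoted below.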

Python-tesseract is an optical character recognition (OCR) tool for python. That is, it will recognize and “read” the text embedded in images.

Python-tesseract is a wrapper for Google’s Tesseract-OCR Engine. It is also useful as a stand-alone invocation script to tesseract, as it can read all image types supported by the Pillow and Leptonica imaging libraries, including jpeg, png, gif, bmp, tiff, and others. Additionally, if used as a script, Python-tesseract will print the recognized text instead of writing it to a file.
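
To make the screenshot-then-recognize idea from the introduction concrete, here is a minimal sketch. It assumes Tesseract, pytesseract, and pillow are already installed as described in the installation section, and that the platform supports Pillow's `ImageGrab`. The helper name `ocr_screen_region` and its parameters are hypothetical and do not come from the original post.

```python
from typing import Optional, Tuple


def ocr_screen_region(
    bbox: Optional[Tuple[int, int, int, int]] = None,
    lang: str = "chi_sim",
) -> str:
    """Capture the screen (or only the bbox region) and return the recognized text.

    bbox is (left, top, right, bottom) in screen pixels; None captures the whole screen.
    """
    # Imported lazily so that defining this helper does not require
    # a display or an installed Tesseract engine.
    from PIL import ImageGrab
    import pytesseract

    screenshot = ImageGrab.grab(bbox=bbox)
    return pytesseract.image_to_string(screenshot, lang=lang)
```

A call such as `ocr_screen_region((100, 200, 400, 260))` would recognize only that rectangle of the screen. The coordinates are placeholders; in a real test they would come from the known position of the window under test.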

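Installation

1. First, download and install the Tesseract installer from: https://digi.bib.uni-mannheim.de/tesseract/

2. After installation, add the Tesseract installation directory to the system PATH environment variable.

3. Install the required Python libraries. In practice, installing pytesseract on its own raised an error; it had to be installed together with pillow.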

pip install pillow
pip install pytesseract

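4. Download the language pack for the language you need to recognize from https://github.com/tesseract-ocr/tessdata . Taking Chinese as an example, download chi_sim.traineddata and place it in the tessdata directory under the Tesseract-OCR installation directory.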
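
Before running any OCR code, it can help to confirm that steps 1, 2, and 4 actually took effect. The following is a minimal sketch that uses only the standard library; the helper names `find_tesseract` and `has_language` are hypothetical, and the directories passed to them are assumptions that depend on where Tesseract was installed.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def find_tesseract(extra_dirs: Iterable[str] = ()) -> Optional[str]:
    """Return the full path of the tesseract executable, or None if it is not found.

    The system PATH is searched first, then each directory in extra_dirs.
    """
    found = shutil.which("tesseract")
    if found:
        return found
    for directory in extra_dirs:
        found = shutil.which("tesseract", path=directory)
        if found:
            return found
    return None


def has_language(tessdata_dir: str, lang: str) -> bool:
    """Check whether <lang>.traineddata exists in the given tessdata directory."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()
```

If `find_tesseract()` returns `None` even though Tesseract is installed, the PATH change from step 2 probably has not taken effect. As an alternative to editing PATH, pytesseract also lets you set `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable.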

Usage

As an example, suppose we want to extract the three characters “酌三巡” from an image.

Usage is very simple: just call the pytesseract.image_to_string() method.

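from PIL import Image
import pytesseract

# Open the image that contains the text to recognize
img = Image.open("demo.png")
# Run OCR with the Simplified Chinese language pack (chi_sim)
ocr_text = pytesseract.image_to_string(img, lang="chi_sim")
print("Extracted text:", ocr_text)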

Output:
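
The raw string returned by image_to_string() may not be exactly the text in the image: it can carry trailing newlines or a form-feed character, and for Chinese text Tesseract may insert spaces between characters. The sketch below shows one possible cleanup step. The helper `clean_ocr_text` is hypothetical, and the sample input is an assumed raw result, not output captured from the original post.

```python
def clean_ocr_text(raw: str, remove_inner_spaces: bool = True) -> str:
    """Normalize a raw OCR result.

    With remove_inner_spaces=True, all whitespace (spaces, newlines,
    form feeds) is removed, which suits short CJK phrases.
    Otherwise only leading and trailing whitespace is stripped.
    """
    if remove_inner_spaces:
        return "".join(raw.split())
    return raw.strip()


# Assumed raw result for the example image (hypothetical value)
print(clean_ocr_text("酌 三 巡\n\x0c"))  # prints: 酌三巡
```

Removing every space is only appropriate for short phrases in languages written without spaces. For English or multi-line text, pass `remove_inner_spaces=False` so that word and line boundaries are preserved.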

References

https://github.com/madmaze/pytesseract
https://github.com/tesseract-ocr/tesseract
