Image Text Recognition with Python + tesseract-ocr + Pillow

古墙白

Last modified: 2022-10-11 18:05:05

Tags: python, pillow, artificial intelligence

First published: 2022-10-08 16:22:40

Article link: https://blog.csdn.net/qq_43540385/article/details/127210824

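I looked through some material on text recognition. Most approaches combine optical character recognition with model training and are suited to production machine-learning work.
I only use Python to recognize English letters in images, so no training is involved.

Tools: python + tesseract-ocr + pillow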

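1. Install tesseract-ocr

Windows 10, 64-bit:
Download: https://digi.bib.uni-mannheim.de/tesseract/
Official installation guide: https://github.com/tesseract-ocr/tessdoc/blob/main/Installation.md
Language packs: https://tesseract-ocr.github.io/tessdoc/Data-Files
I only recognize English, and the installer already ships with it (eng English eng.traineddata), so I did not download any extra language pack.
Commonly used language packs: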

chi_sim	Chinese - Simplified	chi_sim.traineddata    
chi_tra	Chinese - Traditional	chi_tra.traineddata
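As a rough sketch of how an extra language pack would be used from Python once pytesseract (step 2) is installed: `image_to_string` accepts a `lang` argument, and Tesseract joins several languages with `+`. The helper names `lang_arg` and `ocr_with_langs` are made up for illustration, and the sketch assumes the matching `.traineddata` files have already been copied into the `tessdata` folder.

```python
def lang_arg(langs):
    """Join language codes the way Tesseract expects, e.g. 'eng+chi_sim'."""
    return "+".join(langs)


def ocr_with_langs(image_path, langs=("eng",)):
    """Hypothetical helper: OCR one image with the given language packs."""
    # Imported lazily so lang_arg can be used without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang_arg(langs))
```

For example, `ocr_with_langs('D:\\icon.png', ("eng", "chi_sim"))` would ask Tesseract to use both English and Simplified Chinese.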

Download the latest exe file.
Here I installed the most recent one:
tesseract-ocr-w64-setup-v5.2.0.20220712.exe 2022-07-12 14:26 54M

A standard installation is fine. Remember the install path: D:\Tesseract

2. Install the Python API, pytesseract

pip install pytesseract
# install pillow as well
pip install pillow
pip list
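Instead of scanning the `pip list` output by eye, the installed versions can also be checked from Python. This is a minimal sketch using the standard-library `importlib.metadata`; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages this article depends on.
for name in ("pytesseract", "pillow"):
    print(name, installed_version(name) or "not installed")
```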

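3. Configure environment variables

Both settings below point to the location of tesseract.exe.
For the cmd prompt: right-click My Computer / This PC -> Properties -> Advanced system settings -> Environment Variables -> Path -> Edit -> New, then paste in the install path D:\Tesseract.
For PyCharm: edit the script D:\Python310\Lib\site-packages\pytesseract\pytesseract.py in the Python installation.

# tesseract_cmd = 'tesseract'  # original code
tesseract_cmd = r'D:\Tesseract\tesseract.exe'  # changed to the absolute path of Tesseract; the r prefix keeps the backslashes from being treated as escapes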
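Editing the file under `site-packages` works, but the change is lost whenever pytesseract is reinstalled or upgraded. A possible alternative, sketched here, is to set `pytesseract.pytesseract.tesseract_cmd` from your own script. `find_tesseract` and `configure_pytesseract` are hypothetical helpers, and the fallback path assumes the `D:\Tesseract` install location used above.

```python
import os
import shutil

# Assumed install location from step 1; adjust to your own path.
FALLBACK = r"D:\Tesseract\tesseract.exe"


def find_tesseract(fallbacks=(FALLBACK,)):
    """Return tesseract from PATH, else the first existing fallback, else None."""
    found = shutil.which("tesseract")
    if found:
        return found
    for path in fallbacks:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the executable for the current process only."""
    import pytesseract  # imported here so find_tesseract works on its own

    cmd = find_tesseract()
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling `configure_pytesseract()` once at the top of a script, before any `image_to_string` call, should have the same effect as the manual edit without touching the installed package.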

4. Testing

4.1 From the cmd prompt:

C:\Users\signway>tesseract -v
tesseract v5.2.0.20220712
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX512BW
 Found AVX512F
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5
 Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0

C:\Users\signway>tesseract --list-langs
List of available languages in "D:\Tesseract/tessdata/" (2):
eng
osd

C:\Users\signway>tesseract D:\icon.png re
Estimating resolution as 488

Open re.txt to see the recognized text.
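The same command-line call can be driven from Python, which may be handy for processing many images. The sketch below wraps `tesseract IMAGE OUTPUTBASE -l LANG` with `subprocess`; `build_command` and `run_tesseract` are hypothetical names, and it assumes `tesseract` is on the Path as configured in step 3.

```python
import subprocess


def build_command(image_path, output_base, lang="eng"):
    """Build the argument list for `tesseract IMAGE OUTPUTBASE -l LANG`."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_tesseract(image_path, output_base, lang="eng"):
    """Run the CLI and return the text it wrote to OUTPUTBASE.txt."""
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_command(image_path, output_base, lang), check=True)
    with open(output_base + ".txt", encoding="utf-8") as f:
        return f.read()
```

`run_tesseract(r"D:\icon.png", "re")` would mirror the command shown above and return the contents of re.txt.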

4.2 From PyCharm:

from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open('D:\\icon.png'))
print(text)

D:\Python310\python.exe D:/pythonProject/tesseract_debug.py 
se
SIGNWAY


Process finished with exit code 0

Success.
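When the output contains stray characters, as the `se` line above may be, a common thing to try is cleaning the image with Pillow before handing it to Tesseract. The following is only a sketch: it converts the image to grayscale and applies a fixed threshold through `Image.point`. The helper names are hypothetical, the threshold of 128 is an arbitrary starting value, and whether this helps depends on the image.

```python
def binarize_table(threshold=128):
    """Lookup table: pixel values above threshold become 255 (white), others 0."""
    return [255 if value > threshold else 0 for value in range(256)]


def ocr_binarized(image_path, threshold=128):
    """Hypothetical helper: grayscale + fixed threshold, then OCR."""
    # Imported lazily so binarize_table can be used without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        gray = img.convert("L")  # 8-bit grayscale
        cleaned = gray.point(binarize_table(threshold))  # black/white only
    return pytesseract.image_to_string(cleaned)
```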
