Recognizing Image CAPTCHAs with pytesseract on Windows

yt1318519610

Article link: https://blog.csdn.net/yt1318519610/article/details/111799721

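This article explains how to install tesseract and tesserocr, including downloading the software and language packs from the listed sites and configuring the environment variable. It then covers installing pytesseract with pip and basic use of the tesseract command-line tool, such as listing the available languages and showing help. Finally, it shows how to use pytesseract in Python to recognize text in images, including specifying a language and retrieving bounding boxes and detailed data.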

tesseract must be installed before tesserocr, and the tesserocr and tesseract versions must match.

Reference blogs:

https://blog.csdn.net/qq_41895190/article/details/82696550

https://www.cnblogs.com/zhangxinqi/p/9297292.html#_label2

1. tesseract download page:

https://digi.bib.uni-mannheim.de/tesseract/

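2. If downloading the language packs is too slow, download the language-pack zip archive directly from GitHub, extract it, and copy the files in tessdata-master into the tessdata folder under the Tesseract install directory. tesseract-ocr language pack download: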

https://github.com/tesseract-ocr/tessdata
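
After copying the files, it can help to confirm which language models actually ended up in tessdata. The following is a minimal sketch using only the standard library. `list_traineddata` is a hypothetical helper (not part of Tesseract or pytesseract), and the directory in the usage line is an assumed install location that should be adjusted.

```python
from pathlib import Path


def list_traineddata(tessdata_dir):
    """Return the sorted language codes found as *.traineddata files."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        # Missing or mistyped directory: report no languages instead of raising.
        return []
    return sorted(p.stem for p in folder.glob("*.traineddata"))


if __name__ == "__main__":
    # Assumed install location; change it to your own Tesseract directory.
    print(list_traineddata(r"C:\Program Files (x86)\Tesseract-OCR\tessdata"))
```

If the list comes back empty, the files were probably copied one level too deep (for example into tessdata\tessdata-master).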

3. Configure the environment variable: add the Tesseract install directory, for example C:\Program Files (x86)\Tesseract-OCR, to PATH.
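
Once the directory has been added, a quick check from Python shows whether the executable is actually reachable. This is a small sketch built on the standard library's `shutil.which`; `find_tesseract` is a hypothetical helper name.

```python
import shutil


def find_tesseract():
    """Return the full path of the tesseract executable on PATH, or None."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        # A terminal opened before PATH was edited will not see the change.
        print("tesseract not found on PATH; open a new terminal or set tesseract_cmd")
    else:
        print("tesseract found at", path)
```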

4. Install pytesseract

pip install pytesseract

5. tesseract commands

# List the installed language packs
tesseract --list-langs

# Show help and version information
tesseract --help
tesseract --help-extra
tesseract --version
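
The same language check can be scripted. The sketch below runs `tesseract --list-langs` through `subprocess` and strips the header line. It assumes the first output line is a header such as "List of available languages (2):", and that older Tesseract builds may print the list to stderr instead of stdout. Both function names are hypothetical.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output.

    The first line is assumed to be a header, so it is skipped
    along with any blank lines.
    """
    lines = [line.strip() for line in output.splitlines()]
    return [line for line in lines[1:] if line]


def installed_languages():
    """Run the CLI and return its languages, or [] if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Fall back to stderr in case this build prints the list there.
    return parse_list_langs(result.stdout or result.stderr)


if __name__ == "__main__":
    print(installed_languages())
```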

6. Recognize a CAPTCHA with pytesseract

from PIL import Image
import pytesseract

# If the tesseract executable is not on PATH, set its path explicitly (replace with your own install path)
pytesseract.pytesseract.tesseract_cmd = r"C:\Users\01403087\AppData\Local\Tesseract-OCR\tesseract.exe"

# Print the text recognized in the image
print(pytesseract.image_to_string(Image.open('test.png')))

# Recognize the text with an explicit language; eng is English
print(pytesseract.image_to_string(Image.open('test.png'), lang='eng'))

# Get the bounding box of each recognized character
print(pytesseract.image_to_boxes(Image.open('test.png')))

# Get detailed data including bounding boxes, confidences, line numbers and page numbers
print(pytesseract.image_to_data(Image.open('test.png')))

# Get orientation and script detection (OSD) information; needs osd.traineddata in tessdata
print(pytesseract.image_to_osd(Image.open('test.png')))
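
The calls above run Tesseract with its default settings, which often struggle on noisy CAPTCHA images. The following sketch binarizes the image and restricts the character set before recognition. It is not the method used in the original post. The helper names, the threshold of 140 and the alphanumeric whitelist are assumptions to tune per CAPTCHA style, and whether `tessedit_char_whitelist` takes effect depends on the Tesseract version and engine.

```python
import re
import string


def build_captcha_config(whitelist, psm=7):
    """Build a Tesseract config string for a single-line CAPTCHA.

    --psm 7 treats the image as one text line;
    tessedit_char_whitelist limits output to the given characters.
    """
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def clean_captcha_text(raw):
    """Drop whitespace and any non-alphanumeric noise from the OCR result."""
    return re.sub(r"[^0-9A-Za-z]", "", raw)


def read_captcha(path, whitelist=string.digits + string.ascii_letters, threshold=140):
    """Grayscale and binarize the image, then run OCR on it."""
    # Imported here so the two helpers above work without the OCR dependencies.
    from PIL import Image
    import pytesseract

    gray = Image.open(path).convert("L")
    # Pixels brighter than the (assumed) threshold become white, the rest black.
    binary = gray.point(lambda p: 255 if p > threshold else 0)
    raw = pytesseract.image_to_string(
        binary, lang="eng", config=build_captcha_config(whitelist)
    )
    return clean_captcha_text(raw)
```

Calling `read_captcha('test.png')` returns the cleaned string. If the result is poor, adjusting the threshold is usually the first thing to try.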