Method 5: Recognizing image CAPTCHAs in Python with the third-party library pytesseract

平头哥-测试

Last modified: 2024-09-07 15:52:03

First published: 2024-05-15 21:33:06

Article link: https://blog.csdn.net/hyq413950612/article/details/138924129

I. Install Tesseract

1. Download page for the Windows installers (hosted by UB Mannheim): https://digi.bib.uni-mannheim.de/tesseract/

2. Baidu Netdisk download: https://pan.baidu.com/s/16RoJ19WynWOKI4Zpr0bKzA
Extraction code: 5hst

II. Configure the environment variable

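1. Edit Path under the system variables and add the Tesseract installation directory, for example D:\Program Files\Tesseract-OCR (use your own actual installation path).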
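
Whether the Path change took effect can be checked from Python itself. The following is a minimal sketch using only the standard library; the helper name `find_tesseract` is hypothetical, and a newly opened terminal is usually needed before an updated Path becomes visible.

```python
import shutil


def find_tesseract(command="tesseract"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(command)


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH; re-check the environment variable")
else:
    print("tesseract found at:", path)
```

If the function returns None, the directory added to Path probably does not contain tesseract.exe, or the terminal was opened before the change.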

III. Install the third-party Python libraries

# Pillow is a Python imaging library that pytesseract depends on
pip install pillow
pip install pytesseract
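
To confirm that both packages landed in the interpreter you are actually running, a quick check can help. This is a small sketch using only the standard library; the helper name `missing_modules` is hypothetical. Note that Pillow is installed as `pillow` but imported as `PIL`.

```python
import importlib.util


def missing_modules(names=("PIL", "pytesseract")):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("Pillow and pytesseract are both importable")
```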

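1. Locate pytesseract.py in the installed pytesseract package so that it can be pointed at tesseract.exe.

2. Edit pytesseract.py and set tesseract_cmd to the full path of the executable itself, not just its folder. Use a raw string so the backslashes are taken literally. (If pytesseract cannot find the tesseract command, running the code fails with TesseractNotFoundError.)
tesseract_cmd = r'D:\Program Files\Tesseract-OCR\tesseract.exe'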
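
A change made inside the installed package is lost whenever pytesseract is reinstalled or upgraded. As an alternative, the same setting can be applied from your own script through the module attribute `pytesseract.pytesseract.tesseract_cmd`. The sketch below assumes the install path used in this article; `DEFAULT_EXE`, `resolve_tesseract_cmd`, and `configure_pytesseract` are hypothetical names introduced for illustration.

```python
import os
import shutil

# Assumed install location taken from the step above; adjust to your machine.
DEFAULT_EXE = r"D:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(fallback=DEFAULT_EXE):
    """Prefer a tesseract found on PATH, then the fallback file, else None."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    if os.path.isfile(fallback):
        return fallback
    return None


def configure_pytesseract():
    """Point pytesseract at the resolved executable and return the path used."""
    import pytesseract  # imported here so the helper above works without it

    cmd = resolve_tesseract_cmd()
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling `configure_pytesseract()` once, before the first call to `pytesseract.image_to_string`, should be enough.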

IV. Code implementation

import requests
from PIL import Image
import pytesseract

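# CAPTCHA URL
url = "http://cloud.xxxx.com/checkCode?0.7337270680854053"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early if the download failed
image_bytes = response.content
print(len(image_bytes), "bytes downloaded")

# Write the image to a file
with open('test.png', 'wb') as f:
    f.write(image_bytes)

# Recognize the CAPTCHA
# Step 1: open the file with Pillow (imported as PIL; a third-party library, not a built-in one)
image = Image.open('test.png')

# Convert to grayscale
image = image.convert('L')

# Binarization threshold
threshold = 160

# table is a 256-entry lookup table, one entry per gray level:
# levels below the threshold map to 0, all other levels map to 1
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)

# Binarize the grayscale image according to table (built by the loop above)
image = image.point(table, '1')
image.show()

# Run OCR on the binarized image; strip() drops surrounding whitespace
result = pytesseract.image_to_string(image).strip()
print('Image content:', result)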
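
The raw OCR result often carries surrounding whitespace, and leftover noise may be read as punctuation. The following is a minimal post-processing sketch, assuming the CAPTCHA contains only ASCII letters and digits (an assumption about the target site, not something stated above). `clean_captcha_text`, `ALLOWED`, and `OCR_CONFIG` are hypothetical names. The config string uses Tesseract's `--psm 7` page segmentation mode (treat the image as a single text line) and the `tessedit_char_whitelist` variable; whether the whitelist is honored depends on the Tesseract version and engine in use.

```python
import re
import string

ALLOWED = string.ascii_letters + string.digits
OCR_CONFIG = "--psm 7 -c tessedit_char_whitelist=" + ALLOWED


def clean_captcha_text(raw, expected_length=None):
    """Drop everything except ASCII letters and digits from raw OCR output.

    Returns None when expected_length is given and the cleaned text
    does not have exactly that many characters.
    """
    text = re.sub(r"[^A-Za-z0-9]", "", raw)
    if expected_length is not None and len(text) != expected_length:
        return None
    return text


print(clean_captcha_text(" a3 K9\n\x0c"))
print(clean_captcha_text("a3K", expected_length=4))
```

With the binarized image from the code above, this could be used as `clean_captcha_text(pytesseract.image_to_string(image, config=OCR_CONFIG))`. A None result is a reasonable signal to request a fresh CAPTCHA and try again.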