Processing CAPTCHAs with PIL and Recognizing Digits with pytesseract


^o^Danny^o^


Category: CAPTCHA. Tags: computer vision, python, web scraping

Article link: https://blog.csdn.net/weixin_49911659/article/details/110192951


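Environment setup

You need Python, Pillow, and Tesseract. The code also imports pytesseract, the Python wrapper that calls the Tesseract OCR engine, so the engine itself must be installed separately.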
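Before running the code further down, it can help to confirm that each piece is actually present. The following is a minimal sketch using only the standard library. The helper name check_environment is made up for this example, and it assumes the Tesseract binary is on PATH under the name "tesseract".

```python
import importlib.util
import shutil


def check_environment():
    """Report which of the required pieces appear to be installed."""
    return {
        # Pillow is imported under the package name "PIL"
        "pillow": importlib.util.find_spec("PIL") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # Assumption: the Tesseract binary is on PATH as "tesseract"
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If "tesseract" is reported missing while the engine is installed somewhere else, pytesseract will most likely fail to find it as well.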

Code walkthrough

Call process_recognize(img_path) with the path of the CAPTCHA image; it returns the recognized digits as a string.
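When several CAPTCHA images need to be checked at once, a small wrapper keeps one bad image from stopping the whole run. This is a sketch, not part of the original code: recognize_many is a hypothetical helper, and it takes the recognizer as an argument so it can be used with process_recognize or any other function that maps a path to a string.

```python
def recognize_many(paths, recognize):
    """Run `recognize` on every path; return (results, failures).

    `recognize` is any callable that takes an image path and returns a
    string. Paths that raise an exception end up in `failures` together
    with the exception message.
    """
    results = {}
    failures = {}
    for path in paths:
        try:
            results[path] = recognize(path)
        except Exception as exc:  # for example, an unreadable file
            failures[path] = str(exc)
    return results, failures
```

A call might look like recognize_many(["a.png", "b.png"], process_recognize), where the file names are placeholders.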

Images

[image: CAPTCHA before processing]

[image: CAPTCHA after processing]

Code


from PIL import Image
import pytesseract
import re


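def process_recognize(img_path):
    """
    Preprocess a CAPTCHA image and recognize the digits in it.

    :param img_path: path of the saved CAPTCHA image
    :return: the recognized digits as a string, or '' if none were found
    """
    # Open the image
    image = Image.open(img_path)
    # Convert to grayscale
    imgry = image.convert('L')
    # Binarize the image using the threshold lookup table
    table = get_bin_table()
    binary = imgry.point(table, '1')

    # Collect all noise points
    noise_point_list = collect_noise_point(binary)
    # Set the pixel at each noise position to 1 (white)
    remove_noise_pixel(binary, noise_point_list)

    # Recognize only the digits in the image
    result = pytesseract.image_to_string(binary, config='digits')
    # search() finds digits anywhere in the output; the original match()
    # returned None when the output did not start with a digit
    match = re.search(r'[0-9]+', result)
    return match.group() if match else ''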


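def sum_9_region_new(img, x, y):
    '''Score pixel (x, y) for noise detection: 0 for white, otherwise a black-pixel count.'''
    cur_pixel = img.getpixel((x, y))  # value of the current pixel
    width = img.width
    height = img.height

    if cur_pixel == 1:  # white pixel: skip the neighborhood count
        return 0

    # This CAPTCHA has black dots all around its border, so any black pixel
    # in the border band can be removed
    if y < 3:  # top three rows (y = 0, 1, 2)
        return 1
    elif y > height - 3:  # bottom two rows
        return 1
    else:  # y is not in the border band
        if x < 3:  # left three columns (x = 0, 1, 2)
            return 1
        elif x == width - 1:  # rightmost column
            return 1
        else:  # the full 3x3 neighborhood is available
            total = img.getpixel((x - 1, y - 1)) \
                    + img.getpixel((x - 1, y)) \
                    + img.getpixel((x - 1, y + 1)) \
                    + img.getpixel((x, y - 1)) \
                    + cur_pixel \
                    + img.getpixel((x, y + 1)) \
                    + img.getpixel((x + 1, y - 1)) \
                    + img.getpixel((x + 1, y)) \
                    + img.getpixel((x + 1, y + 1))
            # White pixels are 1, so 9 - total is the number of black
            # pixels in the 3x3 block, the center included
            return 9 - total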


def collect_noise_point(img):
    '''Collect the positions of all noise points.'''
    noise_point_list = []
    for x in range(img.width):
        for y in range(img.height):
            res_9 = sum_9_region_new(img, x, y)
            # A score of 1 or 2 on a black pixel marks an isolated point
            if (0 < res_9 < 3) and img.getpixel((x, y)) == 0:
                pos = (x, y)
                noise_point_list.append(pos)
    return noise_point_list


def remove_noise_pixel(img, noise_point_list):
    '''Remove black noise from the binary image by whitening each listed position.'''
    for item in noise_point_list:
        img.putpixel((item[0], item[1]), 1)


# Binarize according to a threshold
# threshold: gray-level cutoff
def get_bin_table(threshold=182):
    """Build the grayscale-to-binary lookup table; 0 means black, 1 means white."""
    table = []
    for i in range(256):
        if i < threshold:
            table.append(0)
        else:
            table.append(1)
    return table
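To see what the isolated-point rule does without loading an image, the same 3x3 count can be tried on a small hand-written grid. This is an illustrative sketch in plain Python lists rather than the Pillow-based code above. The function names are made up for the example, and the border band handling from sum_9_region_new is left out, so only interior cells are examined.

```python
def count_black_3x3(grid, x, y):
    """Count black (0) cells in the 3x3 block centered on (x, y)."""
    return sum(
        1
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if grid[y + dy][x + dx] == 0
    )


def isolated_black_cells(grid):
    """Return interior (x, y) cells that are black with at most one black neighbor."""
    height, width = len(grid), len(grid[0])
    return [
        (x, y)
        for y in range(1, height - 1)
        for x in range(1, width - 1)
        if grid[y][x] == 0 and count_black_3x3(grid, x, y) < 3
    ]


# 0 = black, 1 = white. The cell at (1, 1) is a lone speck;
# the two-pixel-wide stroke in columns 3 and 4 stands in for a digit.
grid = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
print(isolated_black_cells(grid))  # prints [(1, 1)]
```

Only the speck is flagged. Note that with a stroke just one pixel wide, the tips of the stroke would also count fewer than three black cells and be flagged, which is one reason the rule needs checking against each CAPTCHA style.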

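Note

Each kind of CAPTCHA needs its own image preprocessing, and OpenCV, a library dedicated to image processing, is another option. For the images handled here, the main parameter to tune is threshold, the gray-level cutoff passed to get_bin_table.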
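Tuning the threshold by hand is what the code above does, with 182 hard-coded. As an alternative starting point, Otsu's method picks a cutoff from the image's gray-level histogram. The sketch below is not the author's approach; it is a plain-Python version of that well-known technique, and otsu_threshold is a name chosen for this example.

```python
def otsu_threshold(histogram):
    """Return the threshold t that maximizes between-class variance.

    `histogram` is a list of 256 pixel counts. Pixels with value < t form
    the dark class, matching the `i < threshold` test in get_bin_table.
    Returns 0 if the histogram holds only one gray level.
    """
    total = sum(histogram)
    total_sum = sum(i * count for i, count in enumerate(histogram))

    best_t, best_variance = 0, -1.0
    dark_count = 0  # number of pixels with value < t
    dark_sum = 0    # sum of their gray values
    for t in range(1, 256):
        dark_count += histogram[t - 1]
        dark_sum += (t - 1) * histogram[t - 1]
        light_count = total - dark_count
        if dark_count == 0 or light_count == 0:
            continue  # one class is empty, no split to evaluate
        mean_dark = dark_sum / dark_count
        mean_light = (total_sum - dark_sum) / light_count
        # Between-class variance, up to a constant factor
        variance = dark_count * light_count * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t
```

With Pillow, the histogram of the grayscale image is available as imgry.histogram(), so the result could be passed on as get_bin_table(otsu_threshold(imgry.histogram())). Whether the automatic value beats a hand-tuned one depends on the CAPTCHA, so treat it as a first guess.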
