Python CAPTCHA Recognition

Latest recommended article published on 2024-10-05 12:27:06

故厶

Tags: python, opencv, computer vision

Article link: https://blog.csdn.net/2301_76620728/article/details/129940678

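This article shows how to use Python's pytesseract library to recognize text in images, CAPTCHAs in particular. Noise is first removed through grayscale conversion and binarization, and pytesseract then extracts the text. It also shows how to build a simple UI that lets the user upload an image for CAPTCHA recognition.

Summary generated automatically by CSDN.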

I wrote this in my first year of university. This post is a tidied-up version of the original: https://blog.csdn.net/m0_68198946/article/details/124366487?spm=1001.2014.3001.5501

1. First, a look at how Python can recognize the text in an image:

This uses Python's pytesseract library.

图1.png

from PIL import Image
import pytesseract

image = Image.open('图1.png')
# Parse the image. lang='chi_sim' would recognize Simplified Chinese; the default is English.
# To recognize digits only, add config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'
content = pytesseract.image_to_string(image, config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789')
print(content)

Output:

1314521
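
The `config` string passed to `image_to_string` bundles three Tesseract options: `--psm 6` treats the image as a single uniform block of text, `--oem 3` selects the default OCR engine mode, and `tessedit_char_whitelist` restricts the output to the listed characters. As a rough sketch, a small helper can assemble that string. The name `build_tesseract_config` is hypothetical and not part of pytesseract.

```python
def build_tesseract_config(whitelist="0123456789", psm=6, oem=3):
    """Assemble a Tesseract config string (hypothetical helper, not a pytesseract API)."""
    # --psm: page segmentation mode, --oem: OCR engine mode,
    # tessedit_char_whitelist: the only characters Tesseract may output
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"


config = build_tesseract_config()
print(config)
```

For a CAPTCHA that sits on one line, `build_tesseract_config(psm=7)` (treat the image as a single text line) may be worth trying. Whether it helps depends on the image.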

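Clearly a CAPTCHA image cannot be recognized directly like this; the noise has to be removed first.

2. Approach:

The most frequent color in the image is obviously the background color, and the second most frequent is the color of the CAPTCHA text.

Binarize the image so that the CAPTCHA color becomes white and everything else becomes black.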
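
The idea can be tried on a tiny synthetic "image" before touching a real file. This is a minimal sketch in pure Python, assuming the image is already grayscale and given as rows of integers. The names `text_level` and `binarize` are made up for illustration, and gray level 0 is ignored, as in the post's own code.

```python
from collections import Counter


def text_level(pixels):
    """Return the second most common non-zero gray level in a flat list of pixels."""
    counts = Counter(p for p in pixels if p != 0)
    ranked = counts.most_common(2)
    if len(ranked) < 2:
        raise ValueError("need at least two distinct non-zero gray levels")
    return ranked[1][0]


def binarize(rows, level):
    """Map pixels equal to `level` to 1 (white) and all others to 0 (black)."""
    return [[1 if p == level else 0 for p in row] for row in rows]


# Background 200, "text" 90, one noise pixel (30) and one black pixel (0)
rows = [
    [200, 200, 200, 200, 200, 30],
    [200, 90, 90, 90, 200, 200],
    [200, 90, 200, 90, 200, 200],
    [0, 200, 200, 200, 200, 200],
]
level = text_level([p for row in rows for p in row])
mask = binarize(rows, level)
for row in mask:
    print(row)
```

Only the pixels with gray level 90 survive in `mask`; the noise pixel and the background both become 0. The approach relies on the text being drawn in one flat color, so anti-aliased or multi-colored CAPTCHAs would need a tolerance around the chosen level.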

Original image ➩ grayscale image ➩ image to be recognized

Recognition code

from collections import Counter

import numpy as np
import pytesseract
from PIL import Image

path = 'xxx.png'
image = Image.open(path)
img = image.convert('L')   # convert to grayscale
# Convert to a NumPy array once (not inside a loop) and count every gray level
counts = Counter(np.array(img).ravel().tolist())
counts.pop(0, None)  # ignore gray level 0
# The most common level is the background; the second most common is the CAPTCHA text
a = counts.most_common(2)[1][0]
# Lookup table: the text color maps to 1 (white), everything else to 0 (black)
table = [1 if i == a else 0 for i in range(256)]
photo = img.point(table, '1')  # binarize the image
photo.save('02.jpg')
image = Image.open('02.jpg')
# Digits only: config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'
content = pytesseract.image_to_string(image, config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789')
print(content)
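
Depending on the pytesseract and Tesseract versions, the string returned by `image_to_string` may contain spaces, newlines, or a trailing form feed around the digits. A small clean-up step, sketched here with a hypothetical helper `clean_digits`, keeps only the characters 0 to 9. The sample input is invented for the demonstration.

```python
import re


def clean_digits(raw):
    """Drop everything except the ASCII digits 0-9 from an OCR result."""
    return re.sub(r"[^0-9]", "", raw)


print(clean_digits("13 145\n21\n\x0c"))  # prints 1314521
```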

Designing a UI

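import tkinter as tk
import tkinter.filedialog

gu = tk.Tk()
gu.geometry("200x100")
gu.title("CAPTCHA Recognition")

def tu():
    kuang.delete(0, "end")  # clear the previously recognized digits
    ts = tkinter.filedialog.askopenfilename()  # path of the chosen image
    v.set("recognition result")  # placeholder for the OCR result of ts

# place() returns None, so the widgets are created and placed without keeping a name
tk.Button(text="Select", command=tu, bg='skyblue').place(x=130, y=35)
tk.Label(text="Result", font=("宋体")).place(x=8, y=10)
v = tk.StringVar()
kuang = tk.Entry(width=15, textvariable=v)
kuang.place(x=15, y=40)
gu.mainloop()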
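
The button callback has to cope with two awkward cases: the user cancelling the file dialog (`askopenfilename` then returns an empty value) and recognition failing on the chosen file. One way to handle both, sketched here, is a plain function that can be exercised without opening a window. `result_for_path` and its `recognize` parameter are hypothetical names; `recognize` stands for any function that turns an image path into text.

```python
def result_for_path(path, recognize):
    """Return the text to show in the result box for a chosen file path."""
    if not path:  # the dialog was cancelled
        return ""
    try:
        return recognize(path).strip()
    except Exception as exc:  # e.g. unreadable file or missing Tesseract binary
        return f"error: {exc}"


def fake_recognize(path):
    """Stand-in recognizer for the demo; a real one would call pytesseract."""
    return "1314521\n"


print(result_for_path("captcha.png", fake_recognize))  # prints 1314521
print(repr(result_for_path("", fake_recognize)))       # prints ''
```

Inside the callback this would be used as `v.set(result_for_path(ts, recognize))`, assuming a recognizer that returns its result as a string.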

Full code

import tkinter as tk
import tkinter.filedialog
from collections import Counter

import numpy as np
import pytesseract
from PIL import Image


def shi(path):
    """Binarize the CAPTCHA at `path` and return the text Tesseract reads from it."""
    image = Image.open(path)
    img = image.convert('L')  # convert to grayscale
    # Convert to a NumPy array once (not inside a loop) and count every gray level
    counts = Counter(np.array(img).ravel().tolist())
    counts.pop(0, None)  # ignore gray level 0
    # The most common level is the background; the second most common is the CAPTCHA text
    a = counts.most_common(2)[1][0]
    # Lookup table: the text color maps to 1 (white), everything else to 0 (black)
    table = [1 if i == a else 0 for i in range(256)]
    photo = img.point(table, '1')  # binarize the image
    photo.save('02.jpg')
    image = Image.open('02.jpg')
    # lang='chi_sim' would recognize Simplified Chinese; the default is English.
    # Digits only: config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'
    return pytesseract.image_to_string(image, config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789')


gu = tk.Tk()
gu.geometry("200x100")
gu.title("CAPTCHA Recognition")

def tu():
    kuang.delete(0, "end")  # clear the previously recognized digits
    ts = tkinter.filedialog.askopenfilename()  # path of the chosen image
    if not ts:  # the dialog was cancelled
        return
    v.set(shi(ts).strip())

# place() returns None, so the widgets are created and placed without keeping a name
tk.Button(text="Select", command=tu, bg='skyblue').place(x=130, y=35)
tk.Label(text="Result", font=("宋体")).place(x=8, y=10)
v = tk.StringVar()
kuang = tk.Entry(width=15, textvariable=v)
kuang.place(x=15, y=40)
gu.mainloop()