Python Web Scraping in Practice: Dealing with CAPTCHAs
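1. Three common ways to get past a CAPTCHA:
(1) Download the CAPTCHA image to your machine and type in the answer by hand.
(2) Use Tesseract, an optical character recognition (OCR) engine, through the pytesseract module: it recognizes CAPTCHAs automatically, but its accuracy is limited and it only handles simple CAPTCHAs.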
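Method (1) can be sketched roughly as follows. This is only a minimal illustration, not a complete login flow: the function names `save_captcha` and `solve_manually`, the file name `captcha.png`, and the use of `urllib.request` are all assumptions made for the example.

```python
import tempfile
import urllib.request
from pathlib import Path


def save_captcha(image_bytes, directory=None):
    """Write raw CAPTCHA bytes to disk and return the file path."""
    # Fall back to a fresh temporary directory when none is given.
    target_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    path = target_dir / "captcha.png"  # hypothetical file name
    path.write_bytes(image_bytes)
    return path


def solve_manually(captcha_url, ask=input):
    """Download the CAPTCHA, then ask a human to type what it shows."""
    with urllib.request.urlopen(captcha_url) as response:
        path = save_captcha(response.read())
    # The person opens the saved image and types the characters they see.
    return ask(f"Open {path} and type the CAPTCHA: ").strip()
```

A call might look like `code = solve_manually("https://example.com/captcha")`, where the URL is a placeholder. On a real site the image usually has to be fetched in the same session (same cookies) as the form it belongs to, otherwise the typed answer will not match.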
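Test code (pytesseract is only a wrapper, so the Tesseract engine itself must also be installed on the system):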
pip install pytesseract
pip install pillow
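With both packages installed, a recognition call might look like the sketch below. It assumes the Tesseract engine is on the PATH and that the CAPTCHA contains only ASCII letters and digits. The helper names `clean_ocr_text` and `recognize_captcha` are hypothetical. `pytesseract.image_to_string` is the library's text-extraction call.

```python
import re


def clean_ocr_text(raw):
    """Keep only ASCII letters and digits from raw OCR output."""
    # OCR output often carries spaces, newlines, or stray punctuation.
    return re.sub(r"[^0-9A-Za-z]", "", raw)


def recognize_captcha(image_path):
    """Run Tesseract on an image file and return the cleaned text."""
    # Imported here so clean_ocr_text works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        raw = pytesseract.image_to_string(img)
    return clean_ocr_text(raw)
```

For example, `recognize_captcha('captcha.png')` would return the recognized characters for a locally saved image (the file name is a placeholder). On noisy images the result is often wrong or empty, which is why the preprocessing steps that follow matter.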
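from PIL import Image

# Load the CAPTCHA image ('captcha.png' is a placeholder file name)
img = Image.open('captcha.png')
# Convert to grayscale
img = img.convert('L')
img.show()
# Binarize: pixels darker than the threshold become black, the rest white
threshold = 140
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)
out = img.point(table, '1')
out.show()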
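The same grayscale-and-threshold step can be wrapped in a small reusable function, which makes it easier to try different thresholds. This is a sketch under assumptions: the names `threshold_table` and `binarize` are made up for illustration, and the default of 140 is simply the value used above, not a universally good choice.

```python
def threshold_table(threshold=140):
    """Build a 256-entry lookup table: 0 below the threshold, 1 otherwise."""
    return [0 if value < threshold else 1 for value in range(256)]


def binarize(img, threshold=140):
    """Return a 1-bit copy of a Pillow image using a fixed threshold."""
    # convert('L') gives grayscale; point(..., '1') maps it to black/white.
    return img.convert("L").point(threshold_table(threshold), "1")


if __name__ == "__main__":
    from PIL import Image

    # 'captcha.png' is a placeholder file name.
    binarize(Image.open("captcha.png")).show()
```

A lower threshold keeps only the darkest strokes, while a higher one keeps more of the image, including noise. The right value depends on the CAPTCHA style and usually has to be found by trial.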
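from PIL import ImageEnhance
# Enhance the image: each enhance() call returns a new image for the next enhancer
img = img.convert('RGB')
# Remove color (a factor of 0 gives a black-and-white image)
enhancer = ImageEnhance.Color(img)
enhancer = enhancer.enhance(0)
# Double the brightness
enhancer = ImageEnhance.Brightness(enhancer)
enhancer = enhancer.enhance(2)
# Boost the contrast
enhancer = ImageEnhance.Contrast(enhancer)
enhancer = enhancer.enhance(8)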
enhancer = ImageEnhance.Sharpn