Cracking Website CAPTCHAs with a Convolutional Neural Network (CNN)
Project Background and Requirements
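When you access a website, it may present a CAPTCHA to verify that the visitor is a human rather than a program. A CAPTCHA typically appears in two situations:
- When logging in to the site
- When requests are made too frequently
To scrape data, the CAPTCHA has to be recognized and handled. The following approaches are available:
- Pause
- Cookie
- Manual recognition
- Program recognition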
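Pause
Pros: fairly simple to implement and broadly applicable.
Cons: a person has to be involved the whole time, so data scraping cannot be automated.
Cookie
Pros: fairly simple to implement and broadly applicable.
Cons: no human involvement is needed once the program has started, but this only handles CAPTCHAs shown at login, so it is quite limited.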
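As a rough illustration of the Cookie approach, the sketch below attaches the cookies of an already logged-in browser session to a request, so the login CAPTCHA is never triggered. The URL, the cookie value, and the `build_request` helper are hypothetical and only serve the example; a real crawler would copy the cookie string from its own logged-in session.

```python
import urllib.request


def build_request(url, cookie_header, user_agent="Mozilla/5.0"):
    """Build a request that carries the cookies of a logged-in session."""
    return urllib.request.Request(
        url,
        headers={"Cookie": cookie_header, "User-Agent": user_agent},
    )


# Hypothetical values, for illustration only.
request = build_request("https://example.com/data", "sessionid=abc123")

# Sending the request is left out here because it needs network access:
# html = urllib.request.urlopen(request).read()
```

Because the cookie only replaces the login step, a CAPTCHA triggered by frequent access would still have to be handled some other way.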
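Manual recognition
Pros: very simple to implement and very broadly applicable.
Cons: it costs money, and timeliness and stability are outside your control.
Program recognition
Pros: no human involvement is needed, and timeliness and stability are under your control.
Cons: complex to implement and not very general.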
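To make data scraping both automated and stable, this project uses a convolutional neural network (CNN) to recognize CAPTCHAs programmatically.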
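Generating CAPTCHAs
The captcha library is used here to generate simulated CAPTCHAs (in a real project, you would collect CAPTCHAs from the target website and train on those).
Installation
Installing captcha is straightforward. Run the following command: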
pip install captcha
captcha depends on the Pillow library. If Pillow is not already installed, it is installed together with captcha.
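from captcha.image import ImageCaptcha
from PIL import Image
image = ImageCaptcha()
# generate() renders the given text as a CAPTCHA and returns a BytesIO file-like object.
bio = image.generate("abcd")
# Pillow's Image.open() accepts a file-like object and returns a Pillow
# Image object.
# Image.open(bio)
# write() saves the CAPTCHA to a file. First argument: the CAPTCHA text. Second argument: the output file path.
image.write("abcd", "d:/test.png")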
# Using the string module
import string
# ascii_lowercase and ascii_uppercase hold all lowercase and all uppercase ASCII letters.
print(string.ascii_lowercase)
print(string.ascii_uppercase)
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
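The character sets from the string module can be combined with random sampling to produce the text for each CAPTCHA. The following is a minimal sketch, not the project's own generator: the `random_captcha_text` helper, the default length of 4, and the digits-only default character set are assumptions made for illustration.

```python
import random
import string


def random_captcha_text(length=4, charset=string.digits, rng=None):
    """Return a random string of `length` characters drawn from `charset`."""
    if rng is None:
        rng = random.Random()
    return "".join(rng.choice(charset) for _ in range(length))


# A seeded generator keeps the sample reproducible between runs.
rng = random.Random(0)
digit_labels = [random_captcha_text(rng=rng) for _ in range(3)]
letter_label = random_captcha_text(
    length=6, charset=string.ascii_lowercase + string.ascii_uppercase, rng=rng
)
print(digit_labels, letter_label)
```

Each generated string could then be passed as the text argument to the `generate()` or `write()` calls of `ImageCaptcha` to produce the matching image.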
Execution Steps
First, generate the CAPTCHA dataset, which contains a training set and a test set.
python datasets/gen_captcha.py -d --npi=4 -n 6
Then, train the model.
python cnn_n_char.py --data_dir images/char-4-epoch-6/
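Training a CNN on multi-character CAPTCHAs requires the text labels in numeric form. How cnn_n_char.py encodes them is not shown here; a common choice, sketched below purely as an assumption, is to one-hot encode each character position and concatenate the results. The `encode` and `decode` helpers and the digits-only `CHARSET` are hypothetical names introduced for the example.

```python
import string

CHARSET = string.digits  # assumption: CAPTCHAs contain digits only


def encode(text, charset=CHARSET):
    """One-hot encode `text` into a flat list of len(text) * len(charset) values."""
    size = len(charset)
    vector = [0] * (len(text) * size)
    for position, char in enumerate(text):
        vector[position * size + charset.index(char)] = 1
    return vector


def decode(vector, charset=CHARSET):
    """Turn a flat score vector back into text by taking the top class per position."""
    size = len(charset)
    chars = []
    for start in range(0, len(vector), size):
        block = vector[start:start + size]
        chars.append(charset[block.index(max(block))])
    return "".join(chars)


label = encode("0427")
print(len(label), decode(label))
```

With four digits per image, each label is a vector of 40 values, and `decode` can be applied to the network's output scores in the same way to read off the predicted text.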
Number of Training Steps
Training for 10,000 steps gives the following accuracy:
step 100, training accuracy = 12.50%, testing accuracy = 9.00%
step 200, training accuracy = 10.00%, testing accuracy = 12.50%
step 300, training accuracy = 15.50%, testing accuracy = 9.00%
step 400, training accuracy = 17.50%, testing accuracy = 18.50%
step 500, training accuracy = 20.50%, testing accuracy = 19.75%
step 600, training accuracy = 21.00%, testing accuracy = 23.00%
step 700, training accuracy = 21.00%, testing accuracy = 24.75%
step 800, training accuracy = 29.50%, testing accuracy = 27.00%
step 900, training accuracy = 29.50%, testing accuracy = 32.25%
step 1000, training accuracy = 30.00%, testing accuracy = 30.50%
step 1100, training accuracy = 32.50%, testing accuracy = 31.00%
step 1200, training accuracy = 38.50%, testing accuracy = 38.00%
step 1300, training accuracy = 38.00%, testing accuracy = 33.75%
step 1400, training accuracy = 37.00%, testing accuracy = 36.50%
step 1500, training accuracy = 39.50%, testing accuracy = 33.75%
step 1600, training accuracy = 43.00%, testing accuracy = 34.25%
step 1700, training accuracy = 38.50%, testing accuracy = 37.75%
step 1800, training accuracy = 40.00%, testing accuracy = 38.25%
step 1900, training accuracy = 44.00%, testing accuracy = 40.00%
step 2000, training accuracy = 48.00%, testing accuracy = 41.75%
step 2100, training accuracy = 44.00%, testing accuracy = 46.25%
step 2200, training accuracy = 39.00%, testing accuracy = 46.00%
step 2300, training accuracy = 48.00%, testing accuracy = 45.00%
step 2400, training accuracy = 49.00%, testing accuracy = 45.75%
step 2500, training accuracy = 51.00%, testing accuracy = 47.25%
step 2600, training accuracy = 50.00%, testing accuracy = 47.25%
step 2700, training accuracy = 49.50%, testing accuracy = 44.25%
step 2800, training accuracy = 45.50%, testing accuracy = 51.50%
step 2900, training accuracy = 42.50%, testing accuracy = 41.25%
step 3000, training accuracy = 50.50%, testing accuracy = 47.00%
step 3100, training accuracy = 54.00%, testing accuracy = 49.25%
step 3200, training accuracy = 51.00%, testing accuracy = 48.00%
step 3300, training accuracy = 53.00%, testing accuracy = 47.00%
step 3400, training accuracy = 50.00%, testing accuracy = 52.75%
step 3500, training accuracy = 60.00%, testing accuracy = 49.75%
step 3600, training accuracy = 53.50%, testing accuracy = 46.50%
step 3700, training accuracy = 60.50%, testing accuracy = 53.75%
step 3800, training accuracy = 53.00%, testing accuracy = 54.00%
step 3900, training accuracy = 56.50%, testing accuracy = 48.75%
step 4000, training accuracy = 58.00%, testing accuracy = 50.50%
step 4100, training accuracy = 54.00%, testing accuracy = 49.25%
step 4200, training accuracy = 53.50%, testing accuracy = 46.00%
step 4300, training accuracy = 55.00%, testing accuracy = 56.00%
step 4400, training accuracy = 66.00%, testing accuracy = 52.25%
step 4500, training accuracy = 57.50%, testing accuracy = 60.50%
step 4600, training accuracy = 60.50%, testing accuracy = 54.25%
step 4700, training accuracy = 59.00%, testing accuracy = 58.75%
step 4800, training accuracy = 60.50%, testing accuracy = 57.00%
step 4900, training accuracy = 63.00%, testing accuracy = 57.75%
step 5000, training accuracy = 59.50%, testing accuracy = 61.75%
step 5100, training accuracy = 62.50%, testing accuracy = 55.25%
step 5200, training accuracy = 60.00%, testing accuracy = 60.50%
step 5300, training accuracy = 66.50%, testing accuracy = 57.75%
step 5400, training accuracy = 64.50%, testing accuracy = 53.50%
step 5500, training accuracy = 65.00%, testing accuracy = 56.75%
step 5600, training accuracy = 63.50%, testing accuracy = 51.00%
step 5700, training accuracy = 68.50%, testing accuracy = 62.25%
step 5800, training accuracy = 64.00%, testing accuracy = 57.50%
step 5900, training accuracy = 69.50%, testing accuracy = 59.75%
step 6000, training accuracy = 64.00%, testing accuracy = 54.50%
step 6100, training accuracy = 61.50%, testing accuracy = 62.50%
step 6200, training accuracy = 70.00%, testing accuracy = 60.50%
step 6300, training accuracy = 71.00%, testing accuracy = 54.75%
step 6400, training accuracy = 65.50%, testing accuracy = 57.00%
step 6500, training accuracy = 71.00%, testing accuracy = 62.75%
step 6600, training accuracy = 68.00%, testing accuracy = 64.75%
step 6700, training accuracy = 65.50%, testing accuracy = 59.75%
step 6800, training accuracy = 71.00%, testing accuracy = 61.50%
step 6900, training accuracy = 68.00%, testing accuracy = 62.00%
step 7000, training accuracy = 69.50%, testing accuracy = 60.00%
step 7100, training accuracy = 71.00%, testing accuracy = 60.00%
step 7200, training accuracy = 76.00%, testing accuracy = 62.25%
step 7300, training accuracy = 76.00%, testing accuracy = 66.50%
step 7400, training accuracy = 72.00%, testing accuracy = 62.00%
step 7500, training accuracy = 73.50%, testing accuracy = 64.75%
step 7600, training accuracy = 68.00%, testing accuracy = 62.00%
step 7700, training accuracy = 69.00%, testing accuracy = 63.00%
step 7800, training accuracy = 77.50%, testing accuracy = 66.00%
step 7900, training accuracy = 82.50%, testing accuracy = 65.75%
step 8000, training accuracy = 78.50%, testing accuracy = 66.25%
step 8100, training accuracy = 77.50%, testing accuracy = 66.00%
step 8200, training accuracy = 80.50%, testing accuracy = 58.00%
step 8300, training accuracy = 78.00%, testing accuracy = 66.00%
step 8400, training accuracy = 77.50%, testing accuracy = 65.00%
step 8500, training accuracy = 81.50%, testing accuracy = 63.75%
step 8600, training accuracy = 76.00%, testing accuracy = 63.25%
step 8700, training accuracy = 74.00%, testing accuracy = 63.75%
step 8800, training accuracy = 83.00%, testing accuracy = 60.75%
step 8900, training accuracy = 79.00%, testing accuracy = 62.50%
step 9000, training accuracy = 76.00%, testing accuracy = 63.50%
step 9100, training accuracy = 77.00%, testing accuracy = 66.50%
step 9200, training accuracy = 84.50%, testing accuracy = 64.00%
step 9300, training accuracy = 84.50%, testing accuracy = 68.25%
step 9400, training accuracy = 80.50%, testing accuracy = 67.25%
step 9500, training accuracy = 82.50%, testing accuracy = 69.00%
step 9600, training accuracy = 80.50%, testing accuracy = 65.00%
step 9700, training accuracy = 90.00%, testing accuracy = 70.00%
step 9800, training accuracy = 87.00%, testing accuracy = 63.25%
step 9900, training accuracy = 87.00%, testing accuracy = 65.75%
step 10000, training accuracy = 85.00%, testing accuracy = 68.75%
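To train until a target accuracy is reached (one that the model can actually attain), you can run the training loop indefinitely and stop once the accuracy reaches the target.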
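The loop-until-target idea can be sketched as follows. This is an illustration rather than the project's training code: `train_until`, `train_step`, and `evaluate` are hypothetical names, the two callables stand in for one optimization step and one accuracy measurement, and the optional `max_steps` guard is an addition so the loop still ends if the target turns out to be unattainable.

```python
def train_until(train_step, evaluate, target_accuracy, eval_every=100, max_steps=None):
    """Call train_step() repeatedly until evaluate() reaches target_accuracy.

    train_step: hypothetical callable that runs one training step.
    evaluate:   hypothetical callable that returns accuracy in [0, 1].
    Returns (steps_run, last_measured_accuracy).
    """
    step = 0
    accuracy = 0.0
    while max_steps is None or step < max_steps:
        train_step()
        step += 1
        if step % eval_every == 0:
            accuracy = evaluate()
            print(f"step {step}, testing accuracy = {accuracy:.2%}")
            if accuracy >= target_accuracy:
                break
    return step, accuracy


# Stand-in model whose accuracy grows with the number of steps taken.
state = {"steps": 0}


def fake_train_step():
    state["steps"] += 1


def fake_evaluate():
    return state["steps"] / 1000


steps_run, final_accuracy = train_until(fake_train_step, fake_evaluate, 0.5)
```

In the log above the testing accuracy fluctuates between roughly 60% and 70% over the last few thousand steps, so a target well above that range might never be reached without the `max_steps` guard.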