Cracking Captchas with a Convolutional Neural Network (CNN)

Using a convolutional neural network (CNN) to crack website captchas

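Project background and requirements
When you access a website, it may display a captcha to check whether the visitor is a human or a bot. Captchas typically appear in two situations:

  1. When logging in to the site
  2. When the site is accessed too frequently

Because the goal is to crawl data, the captcha has to be handled. The possible approaches are: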

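  • Pausing (the crawler stops and waits for a person to enter the captcha)
  • Cookie (log in once, then reuse the session cookie)
  • Human recognition (a paid captcha-solving service)
  • Program recognition

Pausing

Pros: simple to implement and widely applicable.
Cons: a person has to be involved the whole time, so the crawl cannot be automated.

Cookie

Pros: simple to implement and widely applicable.
Cons: once the program is running no human is needed, but it only handles the captcha shown at login, which makes it quite limited.

Human recognition

Pros: very simple to implement and very widely applicable.
Cons: it costs money, and response time and stability are outside your control.

Program recognition

Pros: no human involvement, and response time and stability are under your control.
Cons: complex to implement, and it does not generalize well.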

To make crawling automated and stable, this project uses a convolutional neural network (CNN) to recognize captchas programmatically.

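Generating captchas

Here the captcha library is used to generate simulated captchas (in a real project, you would collect captchas from the target website and train on those).

Installation

Installing captcha is straightforward; just run: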

pip install captcha

captcha depends on the Pillow library; if Pillow is not already installed, it is installed along with captcha.

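import string

from captcha.image import ImageCaptcha
from PIL import Image

image = ImageCaptcha()
# generate() returns a BytesIO (file-like object) holding a captcha image for the given text.
bio = image.generate("abcd")
# Pillow's Image.open() can open a file-like object and returns a Pillow Image object.
img = Image.open(bio)
# Write a captcha image file. Argument 1: the captcha text. Argument 2: the output file path.
image.write("abcd", "d:/test.png")
# Using the string module:
# ascii_lowercase and ascii_uppercase contain all lowercase and all uppercase ASCII letters.
print(string.ascii_lowercase)
print(string.ascii_uppercase)

Output:
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ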
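Building on the string module and ImageCaptcha.write shown above, the sketch below draws random captcha texts so that every generated image comes with a known label. The function names, the "<label>_<index>.png" file-name scheme and the default of 4 digits are hypothetical choices for illustration, not the contents of the article's gen_captcha.py. The writer is passed in as a callable so the labeling logic can be tried without the captcha package installed.

```python
import random
import string
from typing import Callable, List, Optional, Tuple

# Assumption: digit-only captchas; swap in string.ascii_lowercase etc. as needed.
DIGITS = string.digits


def random_text(length: int, charset: str, rng: random.Random) -> str:
    """Draw `length` characters uniformly (with replacement) from `charset`."""
    return "".join(rng.choice(charset) for _ in range(length))


def make_samples(
    n: int,
    length: int = 4,
    charset: str = DIGITS,
    seed: int = 0,
    write: Optional[Callable[[str, str], None]] = None,
    out_dir: str = ".",
) -> List[Tuple[str, str]]:
    """Return (label, path) pairs; if `write` is given, call write(label, path) for each.

    The hypothetical "<label>_<index>.png" naming stores the label in the file
    name so it can be recovered at training time; the index keeps names unique
    when the same text is drawn twice.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    samples = []
    for i in range(n):
        text = random_text(length, charset, rng)
        path = f"{out_dir}/{text}_{i}.png"
        if write is not None:
            write(text, path)
        samples.append((text, path))
    return samples
```

To actually write images, pass ImageCaptcha().write as the writer, for example make_samples(1000, write=ImageCaptcha().write, out_dir="images/train"), after creating the images/train directory yourself (the image is saved straight to the given path, so missing folders are not created).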

Steps
First, generate the captcha dataset, which consists of a training set and a test set.

python datasets/gen_captcha.py -d --npi=4 -n 6
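This step has to produce separate training and test sets (the log below reports both a training and a testing accuracy). The internals of gen_captcha.py are not shown in this article, so the following is only a hypothetical sketch of one reproducible way to split generated samples; the 20% test ratio and the function name are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    items: Sequence[T], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle `items` with a fixed seed and return (train, test).

    Keeping the two sets disjoint is what makes the test accuracy a fair
    estimate of how the model does on captchas it has never seen.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]
```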

Then run the training:

python cnn_n_char.py --data_dir images/char-4-epoch-6/
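A CNN that reads all n characters of a captcha at once (as the script name cnn_n_char.py suggests) is commonly trained with one output group per character position. The sketch below encodes a label such as "0427" into a length × len(charset) one-hot matrix and decodes model scores back by taking the highest score at each position. Whether cnn_n_char.py uses exactly this layout is not shown here; the digit-only charset and the function names are assumptions.

```python
import string
from typing import List, Sequence

CHARSET = string.digits  # assumption: digit-only captchas


def encode(text: str, charset: str = CHARSET) -> List[List[float]]:
    """One row per character position, one column per symbol in `charset`."""
    rows = []
    for ch in text:
        idx = charset.index(ch)  # raises ValueError for characters outside charset
        row = [0.0] * len(charset)
        row[idx] = 1.0
        rows.append(row)
    return rows


def decode(scores: Sequence[Sequence[float]], charset: str = CHARSET) -> str:
    """Pick the highest-scoring symbol at each position (per-position argmax)."""
    return "".join(
        charset[max(range(len(row)), key=lambda i: row[i])] for row in scores
    )
```

Decoding works the same on real network outputs (logits or softmax probabilities), since only the position of the maximum in each row matters.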


Training results
After 10,000 training steps (logged every 100 steps), the accuracy was as follows:
step 100, training accuracy = 12.50%, testing accuracy = 9.00%
step 200, training accuracy = 10.00%, testing accuracy = 12.50%
step 300, training accuracy = 15.50%, testing accuracy = 9.00%
step 400, training accuracy = 17.50%, testing accuracy = 18.50%
step 500, training accuracy = 20.50%, testing accuracy = 19.75%
step 600, training accuracy = 21.00%, testing accuracy = 23.00%
step 700, training accuracy = 21.00%, testing accuracy = 24.75%
step 800, training accuracy = 29.50%, testing accuracy = 27.00%
step 900, training accuracy = 29.50%, testing accuracy = 32.25%
step 1000, training accuracy = 30.00%, testing accuracy = 30.50%
step 1100, training accuracy = 32.50%, testing accuracy = 31.00%
step 1200, training accuracy = 38.50%, testing accuracy = 38.00%
step 1300, training accuracy = 38.00%, testing accuracy = 33.75%
step 1400, training accuracy = 37.00%, testing accuracy = 36.50%
step 1500, training accuracy = 39.50%, testing accuracy = 33.75%
step 1600, training accuracy = 43.00%, testing accuracy = 34.25%
step 1700, training accuracy = 38.50%, testing accuracy = 37.75%
step 1800, training accuracy = 40.00%, testing accuracy = 38.25%
step 1900, training accuracy = 44.00%, testing accuracy = 40.00%
step 2000, training accuracy = 48.00%, testing accuracy = 41.75%
step 2100, training accuracy = 44.00%, testing accuracy = 46.25%
step 2200, training accuracy = 39.00%, testing accuracy = 46.00%
step 2300, training accuracy = 48.00%, testing accuracy = 45.00%
step 2400, training accuracy = 49.00%, testing accuracy = 45.75%
step 2500, training accuracy = 51.00%, testing accuracy = 47.25%
step 2600, training accuracy = 50.00%, testing accuracy = 47.25%
step 2700, training accuracy = 49.50%, testing accuracy = 44.25%
step 2800, training accuracy = 45.50%, testing accuracy = 51.50%
step 2900, training accuracy = 42.50%, testing accuracy = 41.25%
step 3000, training accuracy = 50.50%, testing accuracy = 47.00%
step 3100, training accuracy = 54.00%, testing accuracy = 49.25%
step 3200, training accuracy = 51.00%, testing accuracy = 48.00%
step 3300, training accuracy = 53.00%, testing accuracy = 47.00%
step 3400, training accuracy = 50.00%, testing accuracy = 52.75%
step 3500, training accuracy = 60.00%, testing accuracy = 49.75%
step 3600, training accuracy = 53.50%, testing accuracy = 46.50%
step 3700, training accuracy = 60.50%, testing accuracy = 53.75%
step 3800, training accuracy = 53.00%, testing accuracy = 54.00%
step 3900, training accuracy = 56.50%, testing accuracy = 48.75%
step 4000, training accuracy = 58.00%, testing accuracy = 50.50%
step 4100, training accuracy = 54.00%, testing accuracy = 49.25%
step 4200, training accuracy = 53.50%, testing accuracy = 46.00%
step 4300, training accuracy = 55.00%, testing accuracy = 56.00%
step 4400, training accuracy = 66.00%, testing accuracy = 52.25%
step 4500, training accuracy = 57.50%, testing accuracy = 60.50%
step 4600, training accuracy = 60.50%, testing accuracy = 54.25%
step 4700, training accuracy = 59.00%, testing accuracy = 58.75%
step 4800, training accuracy = 60.50%, testing accuracy = 57.00%
step 4900, training accuracy = 63.00%, testing accuracy = 57.75%
step 5000, training accuracy = 59.50%, testing accuracy = 61.75%
step 5100, training accuracy = 62.50%, testing accuracy = 55.25%
step 5200, training accuracy = 60.00%, testing accuracy = 60.50%
step 5300, training accuracy = 66.50%, testing accuracy = 57.75%
step 5400, training accuracy = 64.50%, testing accuracy = 53.50%
step 5500, training accuracy = 65.00%, testing accuracy = 56.75%
step 5600, training accuracy = 63.50%, testing accuracy = 51.00%
step 5700, training accuracy = 68.50%, testing accuracy = 62.25%
step 5800, training accuracy = 64.00%, testing accuracy = 57.50%
step 5900, training accuracy = 69.50%, testing accuracy = 59.75%
step 6000, training accuracy = 64.00%, testing accuracy = 54.50%
step 6100, training accuracy = 61.50%, testing accuracy = 62.50%
step 6200, training accuracy = 70.00%, testing accuracy = 60.50%
step 6300, training accuracy = 71.00%, testing accuracy = 54.75%
step 6400, training accuracy = 65.50%, testing accuracy = 57.00%
step 6500, training accuracy = 71.00%, testing accuracy = 62.75%
step 6600, training accuracy = 68.00%, testing accuracy = 64.75%
step 6700, training accuracy = 65.50%, testing accuracy = 59.75%
step 6800, training accuracy = 71.00%, testing accuracy = 61.50%
step 6900, training accuracy = 68.00%, testing accuracy = 62.00%
step 7000, training accuracy = 69.50%, testing accuracy = 60.00%
step 7100, training accuracy = 71.00%, testing accuracy = 60.00%
step 7200, training accuracy = 76.00%, testing accuracy = 62.25%
step 7300, training accuracy = 76.00%, testing accuracy = 66.50%
step 7400, training accuracy = 72.00%, testing accuracy = 62.00%
step 7500, training accuracy = 73.50%, testing accuracy = 64.75%
step 7600, training accuracy = 68.00%, testing accuracy = 62.00%
step 7700, training accuracy = 69.00%, testing accuracy = 63.00%
step 7800, training accuracy = 77.50%, testing accuracy = 66.00%
step 7900, training accuracy = 82.50%, testing accuracy = 65.75%
step 8000, training accuracy = 78.50%, testing accuracy = 66.25%
step 8100, training accuracy = 77.50%, testing accuracy = 66.00%
step 8200, training accuracy = 80.50%, testing accuracy = 58.00%
step 8300, training accuracy = 78.00%, testing accuracy = 66.00%
step 8400, training accuracy = 77.50%, testing accuracy = 65.00%
step 8500, training accuracy = 81.50%, testing accuracy = 63.75%
step 8600, training accuracy = 76.00%, testing accuracy = 63.25%
step 8700, training accuracy = 74.00%, testing accuracy = 63.75%
step 8800, training accuracy = 83.00%, testing accuracy = 60.75%
step 8900, training accuracy = 79.00%, testing accuracy = 62.50%
step 9000, training accuracy = 76.00%, testing accuracy = 63.50%
step 9100, training accuracy = 77.00%, testing accuracy = 66.50%
step 9200, training accuracy = 84.50%, testing accuracy = 64.00%
step 9300, training accuracy = 84.50%, testing accuracy = 68.25%
step 9400, training accuracy = 80.50%, testing accuracy = 67.25%
step 9500, training accuracy = 82.50%, testing accuracy = 69.00%
step 9600, training accuracy = 80.50%, testing accuracy = 65.00%
step 9700, training accuracy = 90.00%, testing accuracy = 70.00%
step 9800, training accuracy = 87.00%, testing accuracy = 63.25%
step 9900, training accuracy = 87.00%, testing accuracy = 65.75%
step 10000, training accuracy = 85.00%, testing accuracy = 68.75%
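The log reports a single accuracy per data set and does not say whether it counts individual characters or whole captchas. The two can differ a lot: if character errors were independent, 90% per-character accuracy on a 4-character captcha would give only about 0.9^4 ≈ 65.6% whole-captcha accuracy. The sketch below computes both from predicted and true strings; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def accuracies(predicted: Sequence[str], actual: Sequence[str]) -> Tuple[float, float]:
    """Return (per-character accuracy, whole-captcha accuracy) as fractions."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    char_hits = char_total = whole_hits = 0
    for p, a in zip(predicted, actual):
        if len(p) != len(a):
            raise ValueError(f"length mismatch: {p!r} vs {a!r}")
        char_hits += sum(pc == ac for pc, ac in zip(p, a))  # matching positions
        char_total += len(a)
        whole_hits += p == a  # a captcha counts only if every character is right
    return char_hits / char_total, whole_hits / len(actual)
```

For a crawler, the whole-captcha figure is the one that matters, because a single wrong character makes the submission fail.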

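If you want to train until a target accuracy is reached (one the model can realistically attain), you can run the training loop indefinitely and stop once the accuracy reaches the target. Use the testing accuracy for this check: in the log above the training accuracy runs well ahead of it (85.00% versus 68.75% at step 10000).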
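A minimal sketch of that stopping rule, assuming hypothetical train_step and evaluate callables that wrap one optimization step and a test-set evaluation of your model. With max_steps=None it loops indefinitely as described; a finite cap is a safeguard against a target the model never reaches.

```python
from typing import Callable, Optional, Tuple


def train_until(
    train_step: Callable[[], None],
    evaluate: Callable[[], float],
    target: float,
    eval_every: int = 100,
    max_steps: Optional[int] = None,
) -> Tuple[int, float]:
    """Train until evaluate() >= target (or max_steps is hit); return (step, last accuracy)."""
    step = 0
    acc = 0.0
    while max_steps is None or step < max_steps:
        train_step()
        step += 1
        if step % eval_every == 0:
            acc = evaluate()  # e.g. accuracy on the held-out test set
            print(f"step {step}, testing accuracy = {acc:.2%}")
            if acc >= target:
                break
    return step, acc
```

Evaluating only every eval_every steps keeps the cost of the check small compared with training, matching the every-100-steps cadence of the log.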