A while back I cracked a slider captcha, but that only used OpenCV to locate the gap and Selenium to simulate a human drag; no neural network was involved, so it didn't really show off what deep learning can do. As it happens, I recently received a request to crack a character-based captcha, so this time I decided to solve it with a neural network.
Character captchas should be familiar to everyone by now. Here is the kind I was given:
Luckily, each image file is named after the captcha it contains, which saved me 90% of the work (no hand-labeling of every image). These files are our training set: the network's input is the images, and its target output is the file names (minus the .jpg extension).
When it comes to image processing and recognition, the first network that comes to mind is the famous CNN (convolutional neural network), which has unique strengths and wide application in computer vision. If you are not familiar with CNNs, I suggest first working through a CNN that classifies the MNIST handwritten-digit dataset; I won't cover the basics here.
A quick word on the frameworks and packages used. The network itself is built with TensorFlow. For image handling I use OpenCV to read the images and run a series of operations on them, and NumPy for array processing. The visualization part uses matplotlib.pyplot.
import cv2
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
As the images show, the captchas consist of digits and lowercase English letters, so we build a dictionary that maps each character to an index, which the network's output will use.
import string
label_dict={}
characters = string.digits + string.ascii_lowercase
for i,x in enumerate(characters):
    label_dict[x]=i
Next we define some helper functions commonly used in convolutional networks: convolution, max pooling, and weight/bias initializers. If convolution and pooling are new to you, look them up first.
def conv2d(x,w):
    return tf.nn.conv2d(x,w,strides=[1,1,1,1],padding='SAME',name='conv2d')
def max_pool(x):
    return tf.nn.max_pool(x,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME',name='maxpool')
def weight_variable(shape):
    return tf.Variable(tf.truncated_normal(shape,stddev=0.1))
def bias_variable(shape):
    return tf.Variable(tf.constant(0.1,shape=shape))
1. Building the neural network
The architecture chosen here is input layer -> hidden (conv) layer 1 -> hidden (conv) layer 2 -> hidden (conv) layer 3 -> fully connected layer -> output layer.
x has shape [None, 25*60]: None is however many images are fed in per step, and each (25, 60) image array is flattened into a single row of shape (1, 25*60). The captcha is converted into a training label vector of dimension 4*36. For example, the captcha '2ax8' corresponds to the output

[0 0 1 0 0 0 … 0 0 0 0 0,
 0 0 0 0 0 0 … 1 0 0 0 0 0,
 0 0 0 0 0 0 … 0 0 1 0 0 0,
 0 0 0 0 0 0 0 1 … 0 0 0 0]

where the position of the 1 in each row indicates which letter or digit that row represents.
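To make the encoding concrete, here is a small NumPy sketch that builds the (4, 36) one-hot matrix for '2ax8' (the `encode` helper is my own name, not from the original code):

```python
import string
import numpy as np

characters = string.digits + string.ascii_lowercase  # 36 classes: 0-9 then a-z
label_dict = {c: i for i, c in enumerate(characters)}

def encode(captcha):
    """One-hot encode a 4-character captcha into a (4, 36) matrix."""
    onehot = np.zeros((4, 36))
    for row, ch in enumerate(captcha.lower()):
        onehot[row, label_dict[ch]] = 1
    return onehot

onehot = encode('2ax8')
# row 0 has its 1 in column 2 ('2'), row 1 in column 10 ('a'),
# row 2 in column 33 ('x'), row 3 in column 8 ('8')
print(onehot.argmax(axis=1))  # [ 2 10 33  8]
```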
xs=tf.placeholder(tf.float32,shape=[None,25*60],name='x')
ys=tf.placeholder(tf.float32,shape=[None,36*4],name='label-input')
keep_prob = tf.placeholder(tf.float32, name='keep-prob')
The convolutions use 5*5 kernels.
# first convolutional layer
w_conv1=weight_variable([5,5,1,32])
b_conv1=bias_variable([32])
x_image=tf.reshape(xs,[-1,25,60,1],name='x-input')
h_conv1=tf.nn.relu(conv2d(x_image,w_conv1)+b_conv1)
pool_1=max_pool(h_conv1)
pool_1=tf.nn.dropout(pool_1,keep_prob)
# second convolutional layer
w_conv2=weight_variable([5,5,32,64])
b_conv2=bias_variable([64])
h_conv2=tf.nn.relu(conv2d(pool_1,w_conv2)+b_conv2)
pool_2=max_pool(h_conv2)
pool_2=tf.nn.dropout(pool_2,keep_prob)
# third convolutional layer
w_conv3=weight_variable([5,5,64,64])
b_conv3=bias_variable([64])
h_conv3=tf.nn.relu(conv2d(pool_2,w_conv3)+b_conv3)
pool_3=max_pool(h_conv3)
pool_3=tf.nn.dropout(pool_3,keep_prob)
Next come the fully connected layer and the output layer.
# fully connected layer
w_fc_1=weight_variable([4*8*64,1024])
b_fc_1=bias_variable([1024])
pool_2_flat=tf.reshape(pool_3,[-1,4*8*64])
h_fc_1=tf.nn.relu(tf.matmul(pool_2_flat,w_fc_1)+b_fc_1)
h_fc_prob=tf.nn.dropout(h_fc_1,keep_prob)
# output layer
w_fc_2=weight_variable([1024,36*4])
b_fc_2=bias_variable([36*4])
output = tf.add(tf.matmul(h_fc_prob, w_fc_2),b_fc_2)
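As a sanity check on the 4*8*64 flatten size: SAME-padded 2×2/stride-2 pooling rounds each dimension up to ceil(d/2), so three poolings take the 25×60 input down to 4×8, with 64 channels from the last conv layer. A quick sketch of that arithmetic (the helper name is mine):

```python
import math

def pooled_shape(h, w, n_pools):
    """Apply n_pools rounds of SAME-padded 2x2/stride-2 pooling: each round is ceil(d/2)."""
    for _ in range(n_pools):
        h, w = math.ceil(h / 2), math.ceil(w / 2)
    return h, w

print(pooled_shape(25, 60, 3))  # (4, 8): 25→13→7→4 and 60→30→15→8
```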
Set up the loss function and optimizer so the network can learn via backpropagation.
# define the loss and the optimizer
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=ys, logits=output))
optimizer = tf.train.AdamOptimizer(0.001).minimize(loss)
predict = tf.reshape(output, [-1, 4, 36], name='predict')
labels = tf.reshape(ys, [-1, 4, 36], name='labels')
# compute the accuracy
predict_max_idx = tf.argmax(predict, axis=2, name='predict_max_idx')
labels_max_idx = tf.argmax(labels, axis=2, name='labels_max_idx')
predict_correct_vec = tf.equal(predict_max_idx, labels_max_idx)
accuracy = tf.reduce_mean(tf.cast(predict_correct_vec, tf.float32))
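Note that this accuracy is per character, not per whole captcha. The same argmax-and-compare computation can be sketched in plain NumPy (the index arrays here are made up for illustration):

```python
import numpy as np

# Two captchas, 4 characters each, as predicted and true class indices
predict_idx = np.array([[2, 10, 33, 8],
                        [5,  0, 12, 7]])
labels_idx  = np.array([[2, 10, 33, 8],
                        [5,  0, 11, 7]])  # one character is wrong

char_accuracy = (predict_idx == labels_idx).mean()
print(char_accuracy)  # 7 of 8 characters match → 0.875
```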
With that, the network is built, but we haven't processed the images yet; in other words, the network has no input data.
Next we read the images and generate the inputs and targets the network needs. Each image goes through grayscale conversion, binarization, Gaussian filtering, and a second binarization before it becomes our input data.
import os
imagepath='/home/frontsurf/zt/digit_yzm/ZTE/'
def get_x_data(filepath):
    data=[]
    image_name=[]
    for filename in os.listdir(filepath):
        image_name.append(filename)
        im=cv2.imread(filepath+filename)
        # convert to grayscale
        im_gray=cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
        # inverse binarization
        ret, im_inv = cv2.threshold(im_gray,127,255,cv2.THRESH_BINARY_INV)
        # Gaussian filter to remove speckle noise
        kernel = 1/16*np.array([[1,2,1], [2,4,2], [1,2,1]])
        im_blur = cv2.filter2D(im_inv,-1,kernel)
        # binarize again
        ret, im_res = cv2.threshold(im_blur,127,255,cv2.THRESH_BINARY)
        im_data=im_res.reshape(1,25*60)/255
        image_data=im_data.tolist()
        data.append(image_data)
    return data,image_name
data,image_name=get_x_data(imagepath)
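To see why blurring and re-thresholding removes speckle noise, here is a pure-NumPy sketch of the same pipeline on a synthetic image (no OpenCV needed; the helper names are mine). An isolated noise pixel survives the first binarization, but the 1/16 Gaussian kernel smears it below the threshold, so the second binarization drops it:

```python
import numpy as np

def binarize_inv(gray, thresh=127):
    """Like THRESH_BINARY_INV: pixels <= thresh become 255, others 0."""
    return np.where(gray > thresh, 0, 255).astype(np.uint8)

def gauss3x3(img):
    """3x3 Gaussian blur with the 1/16 [[1,2,1],[2,4,2],[1,2,1]] kernel, edge-padded."""
    k = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16
    padded = np.pad(img.astype(float), 1, mode='edge')
    out = np.zeros(img.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

gray = np.full((25, 60), 200, dtype=np.uint8)  # light background
gray[10, 30] = 0                               # one dark "noise" pixel
inv = binarize_inv(gray)                       # the speck becomes 255, rest 0
clean = np.where(gauss3x3(inv) > 127, 255, 0)  # second binarization
print(clean.sum())  # 0 — the isolated speck is gone
```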
The data we get back is a nested list holding only the input x; reshape it into the [None, 25*60] format the network expects:
m,k,n=np.array(data).shape
new_data=np.array(data).reshape(m,n)
The labels ys come from the image file names; we need to convert them into the [None, 36*4] format:
def get_y_data(imagename):
    label_data=[]
    for x in imagename:
        labels=np.zeros([4,36])
        result1,result2=x.split('.')
        for i,chara in enumerate(result1.lower()):
            col_index=label_dict[chara]
            labels[i,col_index]=1
        label_data.append(labels.reshape(1,-1).tolist())
    return label_data
label_data=get_y_data(image_name)
m,k,n=np.array(label_data).shape
new_label=np.array(label_data).reshape(m,n)
An image before preprocessing:
And after preprocessing:
Now we have the x and y we need. Next, split the whole dataset into a training set and a test set:
from sklearn.model_selection import train_test_split
train_x,test_x,train_y,test_y=train_test_split(new_data,new_label,test_size=0.2)
During training we don't feed all the data in at once; each batch draws 64 images at random. The helper below builds a list of slice boundaries for those batches; for batch size 64 it returns index=[0, 64, 128, …, m-1].
def get_next_batch(data,size):
    m,n=data.shape
    index=[]
    for i in range(m):
        if i%size==0:
            index.append(i)
    index.append(m-1)
    return index
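For example, with 200 training images and batch size 64, the boundaries come out as follows (a standalone sketch of the same logic; the function name is mine — note the trailing m-1 closes a final, shorter batch):

```python
def get_next_batch_index(m, size):
    """Slice boundaries for batches of `size` over m samples."""
    index = [i for i in range(m) if i % size == 0]
    index.append(m - 1)  # close the last, possibly shorter batch
    return index

print(get_next_batch_index(200, 64))  # [0, 64, 128, 192, 199]
```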
Then we train the model. After each full pass of the images through the network we compute the accuracy on the test set; once it exceeds 0.97, training stops and the parameters are saved.
# train the model
path = '/home/frontsurf/zt/digit_yzm/captcha/'
saver=tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        index=get_next_batch(train_x,64)
        order=np.random.permutation(train_x.shape[0])
        for i in range(len(index)-1):
            _,loss_=sess.run([optimizer,loss],feed_dict={xs:train_x[order[index[i]:index[i+1]],:],ys:train_y[order[index[i]:index[i+1]],:],keep_prob:0.75})
        acc=sess.run(accuracy,feed_dict={xs:test_x,ys:test_y,keep_prob:1.0})
        print('step=%d,iter=%d,accuracy=%f'%(step,i,acc))
        if acc>0.97:
            saver.save(sess,path+'digit_captcha.model',global_step=step)
            break
A good test-set accuracy alone still left me a little uneasy, so let's try a few captchas the model has never seen. Load the validation data to get its x and y:
validpath='/home/frontsurf/zt/digit_yzm/ZTE_10/'
validata,validname=get_x_data(validpath)
m,k,n=np.array(validata).shape
valid_data=np.array(validata).reshape(m,n)
labeldata=get_y_data(validname)
m,k,n=np.array(labeldata).shape
label_data=np.array(labeldata).reshape(m,n)
Restore the trained weights from the saved checkpoint and predict on the validation set:
saver=tf.train.import_meta_graph(path+'digit_captcha.model-21.meta')
graph=tf.get_default_graph()
input_holder=graph.get_tensor_by_name('x:0')
label_holder=graph.get_tensor_by_name('label-input:0')
keep_prob_holder=graph.get_tensor_by_name('keep-prob:0')
predict_max_idx=graph.get_tensor_by_name('predict_max_idx:0')
labels_max_idx=graph.get_tensor_by_name('labels_max_idx:0')
with tf.Session() as sess:
    saver.restore(sess,tf.train.latest_checkpoint(path))
    predict=sess.run(predict_max_idx,feed_dict={input_holder:valid_data,keep_prob_holder:1.0})
    labels=sess.run(labels_max_idx,feed_dict={label_holder:label_data})
The model outputs class indices, so we need to convert them back into digits and lowercase letters:
digit_character = dict(zip(label_dict.values(), label_dict.keys()))
predcharacter=[]
for i in range(len(predict)):
    yzm=''.join(str(digit_character[x]) for x in predict[i])
    predcharacter.append(yzm)
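The decoding step can be checked in isolation; with the same 36-character alphabet, a row of predicted indices maps straight back to its string:

```python
import string

characters = string.digits + string.ascii_lowercase
digit_character = dict(enumerate(characters))  # index -> character

def decode(index_row):
    """Turn a row of 4 class indices back into the captcha string."""
    return ''.join(digit_character[i] for i in index_row)

print(decode([2, 10, 33, 8]))  # '2ax8'
```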
Displaying the validation-set images:
from PIL import Image
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(30,10))
for i,x in enumerate(validname):
    plt.subplot(2,5,i+1)
    plt.title('the label is %s,the pred is %s'%(x.lower()[:-4],predcharacter[i]),fontsize=18,color='r')
    image=validpath+x
    img=Image.open(image)
    plt.imshow(img)
Each title shows the true captcha alongside the cracked one. Only ten images are shown, but they suggest the model's results can be trusted.
Finally, comments and corrections from the experts are very welcome!