python3 19. Multi-task learning with TensorFlow: cracking image CAPTCHAs (study notes)

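Preface

     This computer vision study-notes series collects the code I wrote while studying artificial intelligence (computer vision track). All code in the series is written in Python 3 and run under Anaconda. It assumes the required Python libraries are already installed, so installation is not covered. All code and materials for the series can be downloaded from my GitHub: https://github.com/mcyJacky/DeepLearning-CV. If you find a problem, corrections are welcome.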

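1. Generating the CAPTCHA images

     To apply multi-task learning to image CAPTCHAs, we first need CAPTCHA images. Here we use the captcha generation library, as follows: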

# CAPTCHA generation library
# pip install captcha
from captcha.image import ImageCaptcha
import os
import random
import sys

number = ['0','1','2','3','4','5','6','7','8','9']

# Output directory; each image's file name doubles as its label
output_dir = 'captcha/images/'

def random_captcha_text(char_set=number, captcha_size=4):
    # Characters of the CAPTCHA
    captcha_text = []
    for i in range(captcha_size):
        # Pick a character at random
        c = random.choice(char_set)
        # Append it to the list
        captcha_text.append(c)
    return captcha_text

# Generate a random CAPTCHA text and write the matching image
def gen_captcha_text_and_image():
    image = ImageCaptcha()
    # Get a random CAPTCHA text
    captcha_text = random_captcha_text()
    # Join the list into a string
    captcha_text = ''.join(captcha_text)
    # Render the CAPTCHA and write it to a file named after its text
    image.write(captcha_text, os.path.join(output_dir, captcha_text + '.jpg'))

# Fewer than 6000 files are produced, because repeated texts overwrite each other
num = 6000
if __name__ == '__main__':
    # Make sure the output directory exists before writing
    os.makedirs(output_dir, exist_ok=True)
    for i in range(num):
        gen_captcha_text_and_image()
        sys.stdout.write('\r>> Creating image %d/%d' % (i+1, num))
        sys.stdout.flush()
    sys.stdout.write('\n')
    sys.stdout.flush()

    print("Generation finished")

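     After running the program above, the CAPTCHA images are generated under './captcha/images/', in the style shown in Figure 1.1. Each image contains four digits drawn from 0-9, and the file name is the digit string itself, which also serves as the training label.

Figure 1.1 Generated CAPTCHA images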
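
     Because the file name is the label, drawing the same four digits twice overwrites the earlier file, so 6000 draws leave fewer than 6000 images. As a rough check, assuming each of the 10^4 possible texts is equally likely (which is how random.choice behaves here), the expected number of distinct files can be computed directly. The function name below is only illustrative:

```python
def expected_unique(num_draws, num_texts=10 ** 4):
    """Expected number of distinct texts after num_draws uniform random draws."""
    # A given text is missed by one draw with probability (1 - 1/num_texts),
    # so it is missed by every draw with that probability raised to num_draws.
    p_never_drawn = (1.0 - 1.0 / num_texts) ** num_draws
    return num_texts * (1.0 - p_never_drawn)

expected = expected_unique(6000)
print("expected distinct images: %.0f of 6000 draws" % expected)
```

     Under this assumption roughly 4500 images survive, so about a quarter of the draws are lost to name collisions.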

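2. Multi-task learning: the CAPTCHA recognition model

     Unlike the single-output training tasks in earlier chapters, a CAPTCHA consists of four digits, so the network has four outputs, one classification task per digit position; this is multi-task learning. For example, if the label of a CAPTCHA is 0782, it is made up of four labels. In one-hot encoding over the ten digit classes, these are: label0 (digit 0): 1000000000; label1 (digit 7): 0000000100; label2 (digit 8): 0000000010; label3 (digit 2): 0010000000.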
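
     A minimal NumPy sketch of this encoding (the helper name captcha_to_one_hot is made up for illustration). Note that the training listing in this section feeds integer class ids to sparse_softmax_cross_entropy, so the one-hot form is shown here only to make the four labels concrete:

```python
import numpy as np

def captcha_to_one_hot(text, num_classes=10):
    """Return an array of shape (len(text), num_classes), one one-hot row per digit."""
    digits = [int(ch) for ch in text]
    # Row d of the identity matrix is the one-hot vector for class d
    return np.eye(num_classes, dtype=np.int64)[digits]

one_hot = captcha_to_one_hot("0782")
for position, row in enumerate(one_hot):
    print("label%d: %s" % (position, "".join(str(v) for v in row)))
```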

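     Below we train the multi-task model on top of the classic AlexNet network (the training procedure is not exactly the same as AlexNet's). AlexNet won the ImageNet (ILSVRC) competition in 2012; its network structure is shown in Figure 2.1:

Figure 2.1 AlexNet network structure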
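
     The implementation in this section feeds 224x224 inputs through slim layers. As a sanity check on the layer sizes, here is a small sketch that traces the spatial size by hand, assuming slim's defaults (slim.conv2d uses stride 1 and 'SAME' padding unless told otherwise; slim.max_pool2d uses stride 2 and 'VALID' padding). The helper names are illustrative:

```python
import math

def same_conv(size, stride=1):
    # 'SAME' padding: output size = ceil(input / stride)
    return math.ceil(size / stride)

def valid_pool(size, kernel=3, stride=2):
    # 'VALID' padding: output size = floor((input - kernel) / stride) + 1
    return (size - kernel) // stride + 1

size = 224
size = same_conv(size, stride=4)   # conv 11x11, stride 4
size = valid_pool(size)            # max pool 3x3
size = same_conv(size)             # conv 5x5
size = valid_pool(size)            # max pool 3x3
size = same_conv(size)             # the three 3x3 convs keep the size
size = valid_pool(size)            # max pool 3x3
flat_features = size * size * 256  # 256 channels after the last conv
print(size, flat_features)
```

     Under these assumptions the flattened vector entering the 1024-unit fully connected layer has 6 x 6 x 256 features.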

     The implementation is as follows:

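# Written against the TensorFlow 1.x API (tf.contrib.slim, tf.placeholder, tf.Session)
import os
import tensorflow as tf
import numpy as np
import tensorflow.contrib.slim as slim

# Dataset path
dataset_dir = "./captcha/images/"
# Fraction of the data held out as the test set
num_test = 0.2
# Batch size
batch_size = 32
# Number of epochs
epochs = 100
# Number of classes (4 tasks, 10 classes each)
num_classes = 10
# Learning rate
lr = tf.Variable(0.001, dtype=tf.float32)
# Whether the model is in training mode
is_training = tf.placeholder(tf.bool)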

# Collect the paths and labels of all CAPTCHA images
def get_filenames_and_classes(dataset_dir):
    photo_filenames = []
    labels = []
    for filename in os.listdir(dataset_dir):
        # Full path of the file
        path = os.path.join(dataset_dir, filename)
        photo_filenames.append(path)
        # The first four characters of the file name are the label
        label = filename[0:4]
        num_labels = []
        for i in range(4):
            num_labels.append(int(label[i]))
        labels.append(num_labels)
    return photo_filenames, labels

# Get the image paths and labels
photo_filenames,labels = get_filenames_and_classes(dataset_dir)
photo_filenames = np.array(photo_filenames)
labels = np.array(labels)

# Shuffle the data (fixed seed for a reproducible split)
np.random.seed(10)
shuffle_indices = np.random.permutation(np.arange(len(photo_filenames)))
photo_filenames_shuffled = photo_filenames[shuffle_indices]
labels_shuffled = labels[shuffle_indices]


# Split into training and test sets
test_sample_index = -1 * int(num_test * float(len(photo_filenames)))
x_train, x_test = photo_filenames_shuffled[:test_sample_index], photo_filenames_shuffled[test_sample_index:]
y_train, y_test = labels_shuffled[:test_sample_index], labels_shuffled[test_sample_index:]

# Image processing function
def parse_function(filenames, labels=None):
    image = tf.read_file(filenames)
    # Decode the image
    image = tf.image.decode_jpeg(image, channels=3)
    # Resize the image
    image = tf.image.resize_images(image, [224, 224])
    # Scale pixel values to [-1, 1]
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    return image, labels

# Model training
# Define two placeholders
features_placeholder = tf.placeholder(photo_filenames_shuffled.dtype, [None])
labels_placeholder = tf.placeholder(labels_shuffled.dtype, [None, 4])

# Create the dataset object
dataset = tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))
# Process each element with parse_function
dataset = dataset.map(parse_function)
# One pass over the data per initialization (epochs are driven by the outer training loop)
dataset = dataset.repeat(1)
# Batch size
dataset = dataset.batch(batch_size)

# Create an initializable iterator
iterator = dataset.make_initializable_iterator()
# Get one batch of images and labels
data_batch, label_batch = iterator.get_next()

# Define the AlexNet-style model
def alexnet(inputs, is_training=True):
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                         activation_fn=tf.nn.relu,
                         weights_initializer=tf.glorot_uniform_initializer(),
                         biases_initializer=tf.constant_initializer(0)):
        
        net = slim.conv2d(inputs, 64, [11, 11], 4)
        net = slim.max_pool2d(net, [3, 3])
        net = slim.conv2d(net, 192, [5, 5])
        net = slim.max_pool2d(net, [3, 3])
        net = slim.conv2d(net, 384, [3, 3])
        net = slim.conv2d(net, 384, [3, 3])
        net = slim.conv2d(net, 256, [3, 3])
        net = slim.max_pool2d(net, [3, 3])
        
        # Flatten the feature maps
        net = slim.flatten(net)
        net = slim.fully_connected(net, 1024)
        net = slim.dropout(net, is_training=is_training)
        
        # Four output heads, one per digit position. They return raw logits
        # (activation_fn=None) because sparse_softmax_cross_entropy applies softmax
        # itself. The original post used activation_fn=tf.nn.softmax here, which
        # applies softmax twice during training.
        net0 = slim.fully_connected(net, num_classes, activation_fn=None)
        net1 = slim.fully_connected(net, num_classes, activation_fn=None)
        net2 = slim.fully_connected(net, num_classes, activation_fn=None)
        net3 = slim.fully_connected(net, num_classes, activation_fn=None)

    return net0,net1,net2,net3

# Define the session
with tf.Session() as sess:
    # Run the batch through the network
    logits0,logits1,logits2,logits3 = alexnet(data_batch, is_training)
    # Define the losses
    # sparse_softmax_cross_entropy: labels are integer class ids
    # softmax_cross_entropy: labels are one-hot encoded
    loss0 = tf.losses.sparse_softmax_cross_entropy(label_batch[:,0], logits0)
    loss1 = tf.losses.sparse_softmax_cross_entropy(label_batch[:,1], logits1)
    loss2 = tf.losses.sparse_softmax_cross_entropy(label_batch[:,2], logits2)
    loss3 = tf.losses.sparse_softmax_cross_entropy(label_batch[:,3], logits3)
    # Total loss: the mean of the four task losses
    total_loss = (loss0+loss1+loss2+loss3)/4.0
    # Minimize total_loss
    optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(total_loss) 
    
    # Per-digit accuracy
    correct0 = tf.nn.in_top_k(logits0, label_batch[:,0], 1) 
    accuracy0 = tf.reduce_mean(tf.cast(correct0, tf.float32))
    correct1 = tf.nn.in_top_k(logits1, label_batch[:,1], 1) 
    accuracy1 = tf.reduce_mean(tf.cast(correct1, tf.float32))
    correct2 = tf.nn.in_top_k(logits2, label_batch[:,2], 1) 
    accuracy2 = tf.reduce_mean(tf.cast(correct2, tf.float32))
    correct3 = tf.nn.in_top_k(logits3, label_batch[:,3], 1) 
    accuracy3 = tf.reduce_mean(tf.cast(correct3, tf.float32))
    # Overall accuracy: a sample counts only if all four digits are correct
    total_correct = tf.cast(correct0, tf.float32)*tf.cast(correct1, tf.float32)*tf.cast(correct2, tf.float32)*tf.cast(correct3, tf.float32)
    total_accuracy = tf.reduce_mean(tf.cast(total_correct, tf.float32))
  
    # Initialize all variables
    sess.run(tf.global_variables_initializer()) 
    # Define a saver for storing the model
    saver = tf.train.Saver()
    
    # Build the learning-rate decay op once, instead of adding a new op to the graph every time
    lr_decay = tf.assign(lr, lr / 3)
    # Metrics fetched for every test batch
    metrics = [total_loss, accuracy0, accuracy1, accuracy2, accuracy3, total_accuracy]

    # Train for `epochs` epochs
    for i in range(epochs):
        if i%30 == 0:
            # Divide the learning rate by 3 every 30 epochs (including epoch 0)
            sess.run(lr_decay)
        # Feed the training set to the iterator
        sess.run(iterator.initializer, feed_dict={features_placeholder: x_train,
                                                  labels_placeholder: y_train})
        # Train the model
        while True:
            try:
                sess.run(optimizer,feed_dict={is_training:True})
            except tf.errors.OutOfRangeError:
                # Leave the loop once all training data has been consumed
                break
        
        # Feed the test set to the iterator
        sess.run(iterator.initializer, feed_dict={features_placeholder: x_test,
                                                  labels_placeholder: y_test})
        # Per-batch results, kept in a plain Python list rather than in graph collections
        results = []
        while True:
            try:
                # Get the loss and accuracies for one batch
                results.append(sess.run(metrics, feed_dict={is_training:False}))
            except tf.errors.OutOfRangeError:
                # Average each metric over the test batches
                avg_loss,avg_acc0,avg_acc1,avg_acc2,avg_acc3,avg_total_acc = np.mean(results, axis=0)
                print('%d:loss=%.3f acc0=%.3f acc1=%.3f acc2=%.3f acc3=%.3f total_acc=%.3f' % 
                      (i,avg_loss,avg_acc0,avg_acc1,avg_acc2,avg_acc3,avg_total_acc))
                # Leave the loop once all test data has been consumed
                break
        
    # Save the model (make sure the output directory exists first)
    os.makedirs('models', exist_ok=True)
    saver.save(sess, 'models/model.ckpt', global_step = epochs)

# Partial output (from a run whose output heads applied softmax before the loss):
# 0:loss=2.303 acc0=0.104 acc1=0.101 acc2=0.094 acc3=0.112 total_acc=0.001
# 1:loss=2.303 acc0=0.111 acc1=0.101 acc2=0.099 acc3=0.102 total_acc=0.001
# 2:loss=2.241 acc0=0.224 acc1=0.179 acc2=0.187 acc3=0.230 total_acc=0.001
# 3:loss=2.181 acc0=0.317 acc1=0.226 acc2=0.245 acc3=0.306 total_acc=0.005
# 4:loss=2.127 acc0=0.400 acc1=0.283 acc2=0.256 acc3=0.366 total_acc=0.014
# ...
# 93:loss=1.509 acc0=0.975 acc1=0.948 acc2=0.922 acc3=0.965 total_acc=0.842
# 94:loss=1.510 acc0=0.976 acc1=0.948 acc2=0.919 acc3=0.966 total_acc=0.838
# 95:loss=1.509 acc0=0.976 acc1=0.948 acc2=0.919 acc3=0.965 total_acc=0.836
# 96:loss=1.509 acc0=0.976 acc1=0.947 acc2=0.922 acc3=0.963 total_acc=0.839
# 97:loss=1.509 acc0=0.976 acc1=0.948 acc2=0.918 acc3=0.965 total_acc=0.840
# 98:loss=1.509 acc0=0.977 acc1=0.948 acc2=0.921 acc3=0.964 total_acc=0.843
# 99:loss=1.508 acc0=0.978 acc1=0.949 acc2=0.923 acc3=0.965 total_acc=0.842
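
     The loss in this log levels off near 1.51 rather than approaching 0. One likely explanation: the log was recorded with output heads that already apply softmax, and tf.losses.sparse_softmax_cross_entropy applies softmax again to whatever it receives. The NumPy sketch below (not part of the original code) shows that even a perfect one-hot probability vector, when treated as logits, gives a loss of about 1.46:

```python
import numpy as np

def softmax(x):
    # Subtract the maximum for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy_from_logits(logits, label):
    # Sparse softmax cross-entropy: -log softmax(logits)[label]
    return -np.log(softmax(logits)[label])

# Best case for a softmax head: probability 1 on the true class (class 3 here)
perfect_probs = np.eye(10)[3]
floor = cross_entropy_from_logits(perfect_probs, 3)
# Worst informative case: a uniform distribution over the ten classes
uniform = cross_entropy_from_logits(np.full(10, 0.1), 3)
print("floor=%.3f uniform=%.3f" % (floor, uniform))
```

     The uniform case reproduces the 2.303 seen in the first epochs, and the floor is consistent with the plateau near 1.5; the accuracies are unaffected because the arg-max does not change.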

      Running the program above completes training and saves the result as checkpoint files. Next we use the trained result to test the CAPTCHA recognition model.
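
      The total_acc column counts a CAPTCHA as correct only if all four digits are right, which is why it trails the per-digit accuracies. A small NumPy illustration with made-up predictions (the arrays are hypothetical, not taken from the trained model):

```python
import numpy as np

# Four hypothetical CAPTCHAs: true digits and predicted digits
labels = np.array([[0, 7, 8, 2], [6, 3, 2, 1], [4, 4, 0, 9], [1, 5, 5, 3]])
preds = np.array([[0, 7, 8, 2], [6, 3, 2, 7], [4, 4, 0, 9], [1, 0, 5, 3]])

correct = (preds == labels)
# Accuracy of each digit position, like acc0..acc3 in the log
per_digit_acc = correct.mean(axis=0)
# All four positions must match; equivalent to multiplying the four 0/1 masks
total_acc = correct.all(axis=1).mean()
print(per_digit_acc, total_acc)
```

      For the last logged epoch, the product of the four per-digit accuracies is about 0.827, close to the logged total_acc of 0.842, which is what one would expect if errors on different positions were roughly independent.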

3. Testing the CAPTCHA recognition model

     Below we restore the model from the checkpoint files and test it on CAPTCHA images:

import os
import tensorflow as tf 
import numpy as np
import tensorflow.contrib.slim as slim
import matplotlib.pyplot as plt
from random import choice
from PIL import Image

# Dataset path
dataset_dir = "./captcha/images/"
# Input placeholder: one 224x224 RGB image
inputs = tf.placeholder(tf.float32,[1,224,224,3])
# Number of classes
num_classes = 10

# Collect the paths of all CAPTCHA images
def get_filenames(dataset_dir):
    photo_filenames = []
    for filename in os.listdir(dataset_dir):
        # Full path of the file
        path = os.path.join(dataset_dir, filename)
        photo_filenames.append(path)
    return photo_filenames

# Get the image paths
photo_filenames = get_filenames(dataset_dir)

# Define the AlexNet-style network (same layers as in training, so the checkpoint variables match)
def alexnet(inputs, is_training=True):
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                         activation_fn=tf.nn.relu,
                         weights_initializer=tf.glorot_uniform_initializer(),
                         biases_initializer=tf.constant_initializer(0)):
        
        net = slim.conv2d(inputs, 64, [11, 11], 4)
        net = slim.max_pool2d(net, [3, 3])
        net = slim.conv2d(net, 192, [5, 5])
        net = slim.max_pool2d(net, [3, 3])
        net = slim.conv2d(net, 384, [3, 3])
        net = slim.conv2d(net, 384, [3, 3])
        net = slim.conv2d(net, 256, [3, 3])
        net = slim.max_pool2d(net, [3, 3])
        
        # Flatten the feature maps
        net = slim.flatten(net)
        net = slim.fully_connected(net, 1024)
        net = slim.dropout(net, is_training=is_training)
        
        # Four output heads returning raw scores; the arg-max taken below is the
        # same with or without softmax (the original post used tf.nn.softmax here)
        net0 = slim.fully_connected(net, num_classes, activation_fn=None)
        net1 = slim.fully_connected(net, num_classes, activation_fn=None)
        net2 = slim.fully_connected(net, num_classes, activation_fn=None)
        net3 = slim.fully_connected(net, num_classes, activation_fn=None)

    return net0,net1,net2,net3

# Define the session
with tf.Session() as sess:

    # Run the input through the network (inference mode, dropout disabled)
    logits0,logits1,logits2,logits3 = alexnet(inputs, False)
    
    # Predicted digit for each position
    predict0 = tf.argmax(logits0, 1)    
    predict1 = tf.argmax(logits1, 1)  
    predict2 = tf.argmax(logits2, 1)  
    predict3 = tf.argmax(logits3, 1)  
  
    # Initialize all variables (restore then overwrites them with the trained values)
    sess.run(tf.global_variables_initializer()) 
    # Define a saver and load the model
    saver = tf.train.Saver()
    saver.restore(sess,'models/model.ckpt-100')
    
    for i in range(10):
        filename = choice(photo_filenames)
        # Read the image, forcing 3 channels to match the input placeholder
        image = Image.open(filename).convert('RGB')
        # Resize to the input size the model expects
        image = image.resize((224, 224))
        image = np.array(image)
        # Preprocess: scale pixel values to [-1, 1], as in training
        image_data = image/255.0
        image_data = image_data-0.5
        image_data = image_data*2
        # Reshape to a 4-D batch of one image
        image_data = image_data.reshape((1,224,224,3))
        # Get the predictions
        pre0,pre1,pre2,pre3 = sess.run([predict0,predict1,predict2,predict3], feed_dict={inputs:image_data})
        # True label: the first four characters of the file name
        label = os.path.basename(filename)[0:4]
        plt.imshow(image)
        plt.axis('off')
        plt.title('predict:'+ str(pre0[0])+str(pre1[0])+str(pre2[0])+str(pre3[0]) + '\n' + 'label:' + label)
        plt.show()

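      Part of the program's output is shown in Figure 3.1, where predict:6321 is the predicted result and label:6321 is the image's own label.

Figure 3.1 CAPTCHA prediction result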
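
      To measure accuracy over many images instead of inspecting ten plots by eye, the four head outputs can be decoded into a string and compared with the label taken from the file name. A sketch with hand-made scores standing in for the network outputs (decode_prediction and label_from_path are illustrative helpers, not part of the original code):

```python
import os
import numpy as np

def decode_prediction(head_outputs):
    """Join the arg-max digit of each head into one string."""
    return "".join(str(int(np.argmax(scores))) for scores in head_outputs)

def label_from_path(path):
    """The first four characters of the file name are the label."""
    return os.path.basename(path)[:4]

# Hypothetical scores for one image: each head favors one digit
heads = [np.eye(10)[d] * 0.9 + 0.01 for d in (6, 3, 2, 1)]
predicted = decode_prediction(heads)
label = label_from_path("./captcha/images/6321.jpg")
print(predicted, label, predicted == label)
```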

     
     
     
     


Reprint notice:
Copyright: free to reprint for non-commercial use, with attribution and a link to the source
Attribution: mcyJacky
Source: https://blog.csdn.net/mcyJacky
