TensorFlow: recognizing CAPTCHAs with a neural network (digits + lowercase letters)

Latest recommended article published on 2022-02-14 10:57:15

wyply115

Category: Artificial Intelligence. Tags: machine learning, tensorflow

Article link: https://blog.csdn.net/wyply115/article/details/86410650

Analysis

Sample data: suppose we are given 1000 images of size 60*180*3 (my computer is slow, so I kept the dataset small).
Data download link: https://download.csdn.net/download/wyply115/10913733

A single CAPTCHA looks like this: [image]
Recognition analysis
1. Recognition workflow analysis

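We reason backwards from the goal and then build forwards. Start with the target value. The goal is to recognize a CAPTCHA, which contains several digits and letters (in the training set, every CAPTCHA is a combination of four letters or digits). Take the CAPTCHA "w2cm": broken down, the question is how probable it is that the first character is "w", the second is "2", the third is "c", and the fourth is "m". A CAPTCHA is a single image, though, and we cannot reliably cut it into individual characters, so we recognize it as a whole.

The full set of target values is "abcdefghijklmnopqrstuvwxyz0123456789", 36 classes in total, so recognizing a CAPTCHA amounts to classifying each of its characters into one of these 36 classes. We therefore one-hot encode the 36 discrete classes. For one sample (four characters) the encoded target has shape [4, 36], so "w2cm" becomes something like [[0,0,0…1,0,0…0],[0,0,0…1,0,0…0],[0,0,0…1,0,0…0],[0,0,0…1,0,0…0]], and 1000 samples give [1000, 4, 36]. Because we treat the four characters as a whole, and the softmax computation works on 2-D matrices, the target is reshaped to [1000, 4*36]. The softmax cross-entropy compares the predicted values with the target values, so the output must have the same shape, [1000, 4*36].

Next, the input. Each image is 60*180*3, so 1000 samples have shape [1000, 60, 180, 3]. Since the output is [1000, 4*36], the input is reshaped to [1000, 60*180*3]. From input * weights + bias = output, the weights have shape [60*180*3, 4*36] and the bias has shape [4*36].
Finally, with the shapes of the input, weights, bias, output, and target all settled, we process the data and then simply follow the workflow. To make the computation convenient, we store the inputs and targets in a tfrecords file.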
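
The shapes worked out in this analysis can be checked with a short sketch. It uses plain Python lists rather than tensors, and the helper names `one_hot_captcha` and `flatten` are hypothetical, chosen only for illustration; the character set and the sizes come from the text above.

```python
LETTERS = "abcdefghijklmnopqrstuvwxyz0123456789"  # the 36 classes


def one_hot_captcha(text, letters=LETTERS):
    """Return a [len(text), len(letters)] nested list of 0.0 / 1.0 values."""
    rows = []
    for ch in text:
        row = [0.0] * len(letters)
        row[letters.index(ch)] = 1.0  # mark the class of this character
        rows.append(row)
    return rows


def flatten(rows):
    """Reshape [4, 36] into a flat vector of length 4 * 36."""
    return [value for row in rows for value in row]


target = one_hot_captcha("w2cm")
flat = flatten(target)

print(len(target), len(target[0]))          # 4 36
print(len(flat))                            # 144
print([row.index(1.0) for row in target])   # [22, 28, 2, 12]

# Shapes of the single fully connected layer for one image
input_size = 60 * 180 * 3
print(input_size)                           # 32400
print((input_size, len(flat)))              # weights shape: (32400, 144)
```

Each row contains exactly one 1, so the flattened target for one CAPTCHA contains four 1s among 144 entries.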

Step-by-step computation

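Based on the analysis above:
Step 1: read the image data to prepare the input values.
Step 2: get the target values, convert them to numbers, then one-hot encode them and shape them as required.
Store the data in tfrecords for convenient computation.
Step 3: define the model (mainly the hidden layers, i.e. convolution and pooling layers; they are not the focus here, so we skip convolution and pooling and go straight to a fully connected layer).
Step 4: the fully connected layer needs weights and a bias; their shapes were fixed in the analysis above, so we define them directly.
Step 5: compute softmax and the cross-entropy loss (a single API call in tensorflow, so they are computed together), then take the mean cross-entropy loss.
Step 6: minimize the loss with gradient descent.
Step 7: compute the accuracy.
Finally, collect the variables, save the model, and validate on test data.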
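
Step 5 can be made concrete with a small pure-Python sketch of the quantity a softmax cross-entropy computes for one sample, namely -sum(labels * log_softmax(logits)). This is an illustration under that assumed formula, not TensorFlow's implementation, and the function name `softmax_cross_entropy` is hypothetical. Note that the softmax here normalizes over all 144 outputs jointly, and that the flattened label contains four 1s; the TensorFlow documentation expects each label row to be a probability distribution, so treat this as a description of what the article's setup computes rather than a recommendation.

```python
import math


def softmax_cross_entropy(labels, logits):
    """Cross-entropy of one sample: -sum(labels * log_softmax(logits))."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_sum) for y, z in zip(labels, logits))


# Flattened one-hot target of "w2cm": four 1s among 4 * 36 = 144 entries
labels = [0.0] * 144
for pos, idx in enumerate([22, 28, 2, 12]):
    labels[pos * 36 + idx] = 1.0

# An untrained model that outputs identical scores everywhere
logits = [0.0] * 144

loss = softmax_cross_entropy(labels, logits)

# With uniform scores every output gets probability 1/144,
# so the loss equals 4 * log(144).
print(loss, 4 * math.log(144))

# The mean over a batch is what gradient descent then minimizes.
batch_losses = [loss, loss]
mean_loss = sum(batch_losses) / len(batch_losses)
```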

Code

Read the image data and target labels, and write them to tfrecords

import tensorflow as tf
import os

FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string("tfrecords_dir", "./data/tfrecords/captcha.tfrecords", "path of the CAPTCHA tfrecords file")
tf.app.flags.DEFINE_string("letter", "abcdefghijklmnopqrstuvwxyz0123456789", "set of characters that can appear in a CAPTCHA")


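def captcha_read(filelist):
    """
    Read the CAPTCHA images and convert them to tensors
    :param filelist: list of file paths (directory + file name)
    :return: image_batch, a tensor of shape [1000, 60, 180, 3]
    """

    # 1. Build the image file queue. shuffle=False keeps the images in the same
    #    order as filelist, so they stay aligned with the labels derived from it.
    file_queue = tf.train.string_input_producer(filelist, shuffle=False)

    # 2. Build a reader and read the file contents
    reader = tf.WholeFileReader()
    key, value = reader.read(file_queue)
    # 3. Decode the image
    image = tf.image.decode_jpeg(value)

    # Batching requires every sample to have a fully defined shape, and the shape
    # of the decoded image is not fixed yet, so set it to [60, 180, 3].
    image.set_shape([60, 180, 3])

    # 4. Batch the images: [1000, 60, 180, 3]
    image_batch = tf.train.batch([image], num_threads=1, capacity=1000, batch_size=1000)

    return image_batch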


def dealwithlabel(label_str):
    """
    Convert the target values to numbers
    :param label_str: target strings, e.g. ['2a2j', 'w2cm', ...]
    :return: integer tensor of shape [number of samples, 4]
    """
    # Build the index-to-character map {0: 'a', 1: 'b', ...}
    num_letter = dict(enumerate(list(FLAGS.letter)))

    # Invert it into a character-to-index map {'a': 0, 'b': 1, ...}
    letter_num = dict(zip(num_letter.values(), num_letter.keys()))

    # List of numeric labels, e.g. [[13, 25, 15, 15], [22, 10, 7, 10], ...]
    array = []

    # Process each label string, e.g. "2a2j"
    for string in label_str:

        letter_list = []  # e.g. [1, 2, 3, 4]

        # Look up the numeric index of every character in this CAPTCHA
        for letter in string:
            letter_list.append(letter_num[letter])

        array.append(letter_list)

    # Convert array to a tensor
    label = tf.constant(array)

    return label


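def write_to_tfrecords(image_batch, label_batch):
    """
    Write the image contents and labels to a tfrecords file
    :param image_batch: feature values
    :param label_batch: label values
    :return: None
    """
    # Cast the labels to uint8 so that each character index takes one byte
    label_batch = tf.cast(label_batch, tf.uint8)

    # Evaluate both tensors once. Calling eval() inside the loop would dequeue
    # a fresh batch of images on every iteration.
    images = image_batch.eval()
    labels = label_batch.eval()

    # Create the TFRecords writer
    writer = tf.python_io.TFRecordWriter(FLAGS.tfrecords_dir)

    # Build an Example protocol block for every image, serialize it, and write it
    for i in range(len(images)):
        # The i-th image, serialized to raw bytes
        image_string = images[i].tobytes()

        # The i-th label, serialized to raw bytes
        label_string = labels[i].tobytes()

        # Build the protocol block
        example = tf.train.Example(features=tf.train.Features(feature={
            "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_string])),
            "label": tf.train.Feature(bytes_list=tf.train.BytesList(value=[label_string]))
        }))

        writer.write(example.SerializeToString())

    # Close the file
    writer.close()

    return None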


if __name__ == '__main__':
    # 1. Build the list of file paths
    namelist = os.listdir("./data/captcha/")
    filelist = [os.path.join("./data/captcha/", name) for name in namelist]

    # 2. Derive the target labels from the file names (strip the .jpg suffix)
    labellist = [name[:-4] for name in namelist]

    # 3. Read the images into a tensor
    image_batch = captcha_read(filelist)

    with tf.Session() as sess:
        # Create a thread coordinator
        coord = tf.train.Coordinator()

        # Start the file-reading threads
        threads = tf.train.start_queue_runners(sess, coord=coord)

        # Convert the string labels to a numeric tensor
        label_batch = dealwithlabel(labellist)

        # Write the image data and labels to the tfrecords file
        write_to_tfrecords(image_batch, label_batch)

        # Stop and join the threads
        coord.request_stop()
        coord.join(threads)
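
The cast to tf.uint8 before serializing matters because the training script decodes the raw bytes with tf.decode_raw(..., tf.uint8): writer and reader have to agree on the dtype. The following is a minimal pure-Python sketch of that round trip, assuming one byte per character index. The helper names `label_to_bytes` and `bytes_to_label` are hypothetical, and `struct` is used only to show how large the same label would be as 32-bit integers.

```python
import struct

LETTERS = "abcdefghijklmnopqrstuvwxyz0123456789"


def label_to_bytes(text):
    """Serialize a CAPTCHA string as one unsigned byte per character index."""
    return bytes(LETTERS.index(ch) for ch in text)


def bytes_to_label(raw):
    """Decode the bytes back, reading one unsigned byte per character."""
    return "".join(LETTERS[b] for b in raw)


raw = label_to_bytes("w2cm")
print(list(raw))            # [22, 28, 2, 12]
print(len(raw))             # 4
print(bytes_to_label(raw))  # w2cm

# The same indices stored as 32-bit integers occupy 16 bytes, so decoding
# them as uint8 would yield 16 values instead of 4.
as_int32 = struct.pack("<4i", *raw)
print(len(as_int32))        # 16

# One raw uint8 image of shape [60, 180, 3] occupies this many bytes
print(60 * 180 * 3)         # 32400
```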

Define the model, compute softmax and cross-entropy, and compute the accuracy

import tensorflow as tf

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string("captcha_dir", "./data/tfrecords/captcha.tfrecords", "path of the CAPTCHA data")
tf.app.flags.DEFINE_integer("batch_size", 100, "number of samples per training batch")
tf.app.flags.DEFINE_integer("label_num", 4, "number of target values per sample")
tf.app.flags.DEFINE_integer("letter_num", 36, "number of possible characters for each target value")


# Helper that initializes the weights
def weight_variables(shape):
    w = tf.Variable(tf.random_normal(shape=shape, mean=0.0, stddev=1.0))
    return w


# Helper that initializes the bias
def bias_variables(shape):
    b = tf.Variable(tf.constant(0.0, shape=shape))
    return b


def read_and_decode():
    """
    Read the CAPTCHA data
    :return: image_batch, label_batch
    """
    # 1. Build the file queue
    file_queue = tf.train.string_input_producer([FLAGS.captcha_dir])

    # 2. Build the reader; it reads one sample at a time by default
    reader = tf.TFRecordReader()

    # Read the contents
    key, value = reader.read(file_queue)

    # The records are serialized Examples and need to be parsed
    features = tf.parse_single_example(value, features={
        "image": tf.FixedLenFeature([], tf.string),
        "label": tf.FixedLenFeature([], tf.string),
    })

    # Decode the raw byte strings
    # 1. Decode the image features
    image = tf.decode_raw(features["image"], tf.uint8)
    # 2. Decode the image's target values
    label = tf.decode_raw(features["label"], tf.uint8)

    # print(image, label)

    # Restore the shapes
    image_reshape = tf.reshape(image, [60, 180, 3])

    label_reshape = tf.reshape(label, [4])

    print(image_reshape, label_reshape)

    # Batch the data: each batch holds 100 samples, i.e. the samples of one training step
    image_batch, label_batch = tf.train.batch([image_reshape, label_reshape], batch_size=FLAGS.batch_size,
                                              num_threads=1, capacity=FLAGS.batch_size)

    print(image_batch, label_batch)
    return image_batch, label_batch


def fc_model(image):
    """
    Compute the predictions
    :param image: features of 100 images, [100, 60, 180, 3]
    :return: y_predict, the predictions, [100, 4 * 36]
    """
    with tf.variable_scope("model"):
        # Reshape the image data to two dimensions
        image_reshape = tf.reshape(image, [-1, 60 * 180 * 3])

        # 1. Initialize the weights randomly and the bias to zero
        # matrix [100, 60 * 180 * 3] * [60 * 180 * 3, 4 * 36] + [4 * 36] = [100, 4 * 36]
        weights = weight_variables([60 * 180 * 3, 4 * 36])
        bias = bias_variables([4 * 36])

        # Fully connected layer, output [100, 4 * 36]
        y_predict = tf.matmul(tf.cast(image_reshape, tf.float32), weights) + bias

    return y_predict


def predict_to_onehot(label):
    """
    Convert the target values read from the file to one-hot encoding
    :param label: [100, 4]      [[13, 25, 15, 15], [19, 23, 20, 16]......]
    :return: one-hot
    """
    # One-hot encode the labels for the cross-entropy loss and the accuracy, [100, 4, 36]
    label_onehot = tf.one_hot(label, depth=FLAGS.letter_num, on_value=1.0, axis=2)

    print(label_onehot)

    return label_onehot


def captcharec():
    """
    CAPTCHA recognition program
    :return:
    """
    # 1. Read the CAPTCHA data file; label_batch is [100, 4]
    image_batch, label_batch = read_and_decode()

    # 2. Feed the image features to the model and get the predictions
    # A single fully connected layer makes the prediction
    # matrix [100, 60 * 180 * 3] * [60 * 180 * 3, 4 * 36] + [4 * 36] = [100, 4 * 36]
    y_predict = fc_model(image_batch)

    #  [100, 4 * 36]
    print(y_predict)

    # 3. Convert the target values to one-hot encoding [100, 4, 36]
    y_true = predict_to_onehot(label_batch)

    # 4. Softmax and cross-entropy loss
    with tf.variable_scope("soft_cross"):
        # Mean cross-entropy loss, y_true [100, 4, 36]--->[100, 4*36]
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
            labels=tf.reshape(y_true, [FLAGS.batch_size, FLAGS.label_num * FLAGS.letter_num]),
            logits=y_predict))
    # 5. Minimize the loss with gradient descent
    with tf.variable_scope("optimizer"):
        train_op = tf.train.GradientDescentOptimizer(0.0003).minimize(loss)

    # 6. Compute the prediction accuracy of each batch (comparison on 3-D tensors)
    with tf.variable_scope("acc"):
        # Check at each of the 4 positions whether prediction and target agree    y_predict [100, 4 * 36]---->[100, 4, 36]
        equal_list = tf.equal(tf.argmax(y_true, 2),
                              tf.argmax(tf.reshape(y_predict, [FLAGS.batch_size, FLAGS.label_num, FLAGS.letter_num]),
                                        2))

        # equal_list has shape [100, 4], one boolean per character, so this is per-character accuracy
        accuracy = tf.reduce_mean(tf.cast(equal_list, tf.float32))

    # Op that initializes the variables
    init_op = tf.global_variables_initializer()

    # Open a session and train
    with tf.Session() as sess:
        sess.run(init_op)

        # Create the thread coordinator and start the threads (the data fed to the model is read from the file)
        coord = tf.train.Coordinator()

        # Start the threads that run the file-reading ops
        threads = tf.train.start_queue_runners(sess, coord=coord)

        # Train the recognizer
        for i in range(1000):
            # Run both ops in one call so that the accuracy is measured on the
            # batch used for this step instead of dequeuing another batch
            _, acc = sess.run([train_op, accuracy])

            print("Accuracy of batch %d: %f" % (i, acc))

        # Stop and join the threads
        coord.request_stop()

        coord.join(threads)

    return None


if __name__ == "__main__":
    captcharec()
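
To turn a row of the [100, 4 * 36] output back into a readable CAPTCHA, the row can be split into four chunks of 36 scores and the highest-scoring character taken from each chunk, which mirrors the reshape and argmax used for the accuracy. The sketch below works on plain Python lists under that assumption; `decode_prediction`, `char_accuracy`, `captcha_accuracy` and `fake_scores` are hypothetical helpers, not part of the article's code. It also contrasts the per-character accuracy computed above with the stricter whole-CAPTCHA accuracy.

```python
LETTERS = "abcdefghijklmnopqrstuvwxyz0123456789"


def decode_prediction(flat_scores, label_num=4, letter_num=36):
    """Split a flat score vector into label_num chunks and take the
    highest-scoring character of each chunk."""
    chars = []
    for pos in range(label_num):
        chunk = flat_scores[pos * letter_num:(pos + 1) * letter_num]
        chars.append(LETTERS[chunk.index(max(chunk))])
    return "".join(chars)


def char_accuracy(true_texts, pred_texts):
    """Fraction of individual characters that are predicted correctly."""
    pairs = [(t, p) for tt, pt in zip(true_texts, pred_texts) for t, p in zip(tt, pt)]
    return sum(t == p for t, p in pairs) / len(pairs)


def captcha_accuracy(true_texts, pred_texts):
    """Fraction of CAPTCHAs whose four characters are all correct."""
    return sum(t == p for t, p in zip(true_texts, pred_texts)) / len(true_texts)


def fake_scores(text):
    """Illustrative stand-in for model output: scores peaking at `text`."""
    scores = [0.0] * (len(text) * len(LETTERS))
    for pos, ch in enumerate(text):
        scores[pos * len(LETTERS) + LETTERS.index(ch)] = 1.0
    return scores


truth = ["w2cm", "2a2j"]
preds = [decode_prediction(fake_scores("w2cm")),
         decode_prediction(fake_scores("2a2k"))]

print(preds)                           # ['w2cm', '2a2k']
print(char_accuracy(truth, preds))     # 0.875
print(captcha_accuracy(truth, preds))  # 0.5
```

One wrong character out of eight still gives a per-character accuracy of 0.875, while only half of the CAPTCHAs are fully correct, so the two numbers should not be confused when reading the training log.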