I've been busy lately, and a few earlier posts never got polished. Today I finally have some time, so I'm tidying up the previous material to give you a more complete learning path.
As the title says, this post implements captcha recognition with TensorFlow. In earlier posts we generated the dataset and converted it into a tfrecord file; now we use that file for training and recognition.
The complete code has been open-sourced in the captcha recognition repository.
One more note before we start: there are two ways to recognize a captcha. Method one: convert the label into a single vector of length 40. For example, the captcha 0782 becomes the 40-dimensional vector 1000000000 0000000100 0000000010 0010000000 (four concatenated one-hot blocks), and training then works much like handwritten-digit recognition. Method two, the one we implement today, is multi-task learning: split the label into 4 separate labels. Taking 0782 again:
label0:1000000000
label1:0000000100
label2:0000000010
label3:0010000000
and train the four labels jointly with multi-task learning.
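To make the two encodings concrete, here is a minimal sketch (my illustration; names like vec40 are not from the project code) that builds both label forms for 0782:

import numpy as np

CHAR_SET_LEN = 10
text = '0782'

# Method 1: one 40-dim vector made of four concatenated one-hot blocks.
vec40 = np.zeros(4 * CHAR_SET_LEN, dtype=np.int32)
for i, ch in enumerate(text):
    vec40[i * CHAR_SET_LEN + int(ch)] = 1

# Method 2 (multi-task): four separate 10-dim one-hot labels.
labels = [np.eye(CHAR_SET_LEN, dtype=np.int32)[int(ch)] for ch in text]
print(vec40.reshape(4, CHAR_SET_LEN))  # row i is label i
print(labels[0])                       # [1 0 0 0 0 0 0 0 0 0]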
Multi-task learning is a form of joint learning: several tasks are learned in parallel and their results influence one another. In other words, multi-task learning solves multiple problems at the same time. Personalization is a classic multi-task problem, since it learns the preferences of many users simultaneously. For a simple comparison, take the widely used school data benchmark: it is a regression dataset for predicting student grades, with 15,362 students from 139 secondary schools, where each school can be viewed as one prediction task. Single-task learning would either ignore any relationship between the tasks and learn 139 separate regression functions, or pool all 139 schools' data and fit a single regression function. Multi-task learning instead emphasizes the connections between tasks: through joint learning it fits a different regression function for each of the 139 tasks, accounting both for the differences between tasks and for their relatedness. This is one of the core ideas of multi-task learning.
Multi-task learning can be trained by alternating training or by joint training. Since all four of our tasks share the same dataset, we use joint training.
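To illustrate the difference, here is a toy TensorFlow 1.x sketch with stand-in losses on a single shared variable (purely illustrative; nothing here is from the captcha network):

import tensorflow as tf

w = tf.Variable(1.0)
# four stand-in task losses sharing the parameter w
losses = [tf.square(w - t) for t in [0.0, 1.0, 2.0, 3.0]]

# joint training: one op minimizes the averaged loss of all tasks at once
joint_op = tf.train.AdamOptimizer(0.01).minimize(tf.add_n(losses) / 4.0)

# alternating training: one op per task, stepped in turn
alt_ops = [tf.train.AdamOptimizer(0.01).minimize(l) for l in losses]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(joint_op)            # one joint step updates all tasks together
    for i in range(4):
        sess.run(alt_ops[i % 4])  # alternating: one task per step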
Back to the task at hand: let's implement this multi-task model in code. Assuming the tfrecord file has been generated following the earlier steps, we use transfer learning with the alexnet_v2 model. The alexnet code, which lives under the slim/nets folder, needs to be modified:
copy nets into the current project directory.
The full modified code is as follows:
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains a model definition for AlexNet.

This work was first described in:
  ImageNet Classification with Deep Convolutional Neural Networks
  Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton

and later refined in:
  One weird trick for parallelizing convolutional neural networks
  Alex Krizhevsky, 2014

Here we provide the implementation proposed in "One weird trick" and not
"ImageNet Classification", as per the paper, the LRN layers have been removed.

Usage:
  with slim.arg_scope(alexnet.alexnet_v2_arg_scope()):
    outputs, end_points = alexnet.alexnet_v2(inputs)

@@alexnet_v2
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

slim = tf.contrib.slim
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)


def alexnet_v2_arg_scope(weight_decay=0.0005):
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      biases_initializer=tf.constant_initializer(0.1),
                      weights_regularizer=slim.l2_regularizer(weight_decay)):
    with slim.arg_scope([slim.conv2d], padding='SAME'):
      with slim.arg_scope([slim.max_pool2d], padding='VALID') as arg_sc:
        return arg_sc


def alexnet_v2(inputs,
               num_classes=1000,
               is_training=True,
               dropout_keep_prob=0.5,
               spatial_squeeze=True,
               scope='alexnet_v2',
               global_pool=False):
  """AlexNet version 2.

  Described in: http://arxiv.org/pdf/1404.5997v2.pdf
  Parameters from:
  github.com/akrizhevsky/cuda-convnet2/blob/master/layers/
  layers-imagenet-1gpu.cfg

  Note: All the fully_connected layers have been transformed to conv2d layers.
        To use in classification mode, resize input to 224x224 or set
        global_pool=True. To use in fully convolutional mode, set
        spatial_squeeze to false.
        The LRN layers have been removed and the initializers changed from
        random_normal_initializer to xavier_initializer.

  Args:
    inputs: a tensor of size [batch_size, height, width, channels].
    num_classes: the number of predicted classes per task head.
    is_training: whether or not the model is being trained.
    dropout_keep_prob: the probability that activations are kept in the dropout
      layers during training.
    spatial_squeeze: whether or not should squeeze the spatial dimensions of the
      logits. Useful to remove unnecessary dimensions for classification.
    scope: Optional scope for the variables.
    global_pool: Optional boolean flag. If True, the input to the classification
      layer is avgpooled to size 1x1, for any input size. (This is not part
      of the original AlexNet.)

  Returns:
    net0, net1, net2, net3: the logits of the four task heads, one per
      captcha character.
    end_points: a dict of tensors with intermediate activations.
  """
  with tf.variable_scope(scope, 'alexnet_v2', [inputs]) as sc:
    end_points_collection = sc.original_name_scope + '_end_points'
    # Collect outputs for conv2d, fully_connected and max_pool2d.
    with slim.arg_scope([slim.conv2d, slim.fully_connected, slim.max_pool2d],
                        outputs_collections=[end_points_collection]):
      net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID',
                        scope='conv1')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool1')
      net = slim.conv2d(net, 192, [5, 5], scope='conv2')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool2')
      net = slim.conv2d(net, 384, [3, 3], scope='conv3')
      net = slim.conv2d(net, 384, [3, 3], scope='conv4')
      net = slim.conv2d(net, 256, [3, 3], scope='conv5')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool5')

      # Use conv2d instead of fully_connected layers.
      with slim.arg_scope([slim.conv2d],
                          weights_initializer=trunc_normal(0.005),
                          biases_initializer=tf.constant_initializer(0.1)):
        net = slim.conv2d(net, 4096, [5, 5], padding='VALID',
                          scope='fc6')
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout6')
        net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
        # ++ added for multi-task learning: dropout before the task heads ++
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout7')
        # ++ four task heads, one per captcha character ++
        net0 = slim.conv2d(net, num_classes, [1, 1],
                           activation_fn=None,
                           normalizer_fn=None,
                           biases_initializer=tf.zeros_initializer(),
                           scope='fc8_0')
        net1 = slim.conv2d(net, num_classes, [1, 1],
                           activation_fn=None,
                           normalizer_fn=None,
                           biases_initializer=tf.zeros_initializer(),
                           scope='fc8_1')
        net2 = slim.conv2d(net, num_classes, [1, 1],
                           activation_fn=None,
                           normalizer_fn=None,
                           biases_initializer=tf.zeros_initializer(),
                           scope='fc8_2')
        net3 = slim.conv2d(net, num_classes, [1, 1],
                           activation_fn=None,
                           normalizer_fn=None,
                           biases_initializer=tf.zeros_initializer(),
                           scope='fc8_3')

      # Convert end_points_collection into a end_point dict.
      end_points = slim.utils.convert_collection_to_dict(
          end_points_collection)
      if global_pool:
        net = tf.reduce_mean(net, [1, 2], keep_dims=True, name='global_pool')
        end_points['global_pool'] = net
      # (The original single fc8 head has been removed; the four fc8_* heads
      # above replace it.)
      if spatial_squeeze:
        net0 = tf.squeeze(net0, [1, 2], name='fc8_0/squeezed')
        end_points[sc.name + '/fc8_0'] = net0
        net1 = tf.squeeze(net1, [1, 2], name='fc8_1/squeezed')
        end_points[sc.name + '/fc8_1'] = net1
        net2 = tf.squeeze(net2, [1, 2], name='fc8_2/squeezed')
        end_points[sc.name + '/fc8_2'] = net2
        net3 = tf.squeeze(net3, [1, 2], name='fc8_3/squeezed')
        end_points[sc.name + '/fc8_3'] = net3
      return net0, net1, net2, net3, end_points
alexnet_v2.default_image_size = 224
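Before wiring this into training, a quick sanity-check sketch (mine, assuming the modified file above is saved as nets/alexnet.py) shows the new call signature: the network now returns four sets of logits, one per character position.

import tensorflow as tf
from nets import alexnet  # the modified file shown above

slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [None, 224, 224, 1])
with slim.arg_scope(alexnet.alexnet_v2_arg_scope()):
    logits0, logits1, logits2, logits3, end_points = alexnet.alexnet_v2(
        inputs, num_classes=10, is_training=False)
print(logits0.shape)  # (?, 10): one 10-way classifier per captcha character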
Next comes training; the details are explained in the code comments:
# Captcha recognition (training)
import os
import tensorflow as tf
from PIL import Image
from nets import nets_factory
import numpy as np

# number of distinct characters
CHAR_SET_LEN = 10
# image height
IMAGE_HEIGHT = 60
# image width
IMAGE_WIDTH = 160
# batch size
BATCH_SIZE = 25
# path of the tfrecord file
TFRECORD_FILE = 'E:/SVN/Gavin/Learn/Python/pygame/captcha/train.tfrecords'

# placeholders
x = tf.placeholder(tf.float32, [None, 224, 224])
y0 = tf.placeholder(tf.float32, [None])
y1 = tf.placeholder(tf.float32, [None])
y2 = tf.placeholder(tf.float32, [None])
y3 = tf.placeholder(tf.float32, [None])

# learning rate (divided by 3 at step 0, so training effectively starts at 0.001)
lr = tf.Variable(0.003, dtype=tf.float32)

# read data from the tfrecord file
def read_and_decode(filename):
    # build a queue from the file name
    filename_queue = tf.train.string_input_producer([filename])
    # create a reader from the file queue
    reader = tf.TFRecordReader()
    # the reader pulls one serialized example from the queue
    _, serialized_example = reader.read(filename_queue)
    # parse the serialized example
    features = tf.parse_single_example(
        serialized_example,
        features={
            'image': tf.FixedLenFeature([], tf.string),
            'label0': tf.FixedLenFeature([], tf.int64),
            'label1': tf.FixedLenFeature([], tf.int64),
            'label2': tf.FixedLenFeature([], tf.int64),
            'label3': tf.FixedLenFeature([], tf.int64),
        }
    )
    img = features['image']
    # decode the raw image bytes
    image = tf.decode_raw(img, tf.uint8)
    # grayscale image, not yet preprocessed
    image = tf.reshape(image, [224, 224])
    # preprocess: scale pixel values to [-1, 1]
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    # fetch the labels
    label0 = tf.cast(features['label0'], tf.int32)
    label1 = tf.cast(features['label1'], tf.int32)
    label2 = tf.cast(features['label2'], tf.int32)
    label3 = tf.cast(features['label3'], tf.int32)
    return image, label0, label1, label2, label3

# get image data and labels
image, label0, label1, label2, label3 = read_and_decode(TFRECORD_FILE)
print(image, label0, label1, label2, label3)

# shuffle_batch shuffles the input while keeping each image in sync with its
# labels: it builds a RandomShuffleQueue, keeps feeding single [img, label]
# examples into it, and returns the result of
# RandomShuffleQueue.dequeue_many(), so every batch pairs images with the
# matching labels.
img_batch, label_batch0, label_batch1, label_batch2, label_batch3 = tf.train.shuffle_batch(
    [image, label0, label1, label2, label3],
    batch_size=BATCH_SIZE, capacity=5000,
    min_after_dequeue=1000, num_threads=1)

# define the network
train_network_fn = nets_factory.get_network_fn(
    'alexnet_v2',
    num_classes=CHAR_SET_LEN,
    weight_decay=0.0005,
    is_training=True
)

with tf.Session() as sess:
    X = tf.reshape(x, [BATCH_SIZE, 224, 224, 1])
    # feed the data through the network to get the four sets of logits
    logits0, logits1, logits2, logits3, end_points = train_network_fn(X)
    # convert the labels to one-hot form
    one_hot_labels0 = tf.one_hot(indices=tf.cast(y0, tf.int32), depth=CHAR_SET_LEN)
    one_hot_labels1 = tf.one_hot(indices=tf.cast(y1, tf.int32), depth=CHAR_SET_LEN)
    one_hot_labels2 = tf.one_hot(indices=tf.cast(y2, tf.int32), depth=CHAR_SET_LEN)
    one_hot_labels3 = tf.one_hot(indices=tf.cast(y3, tf.int32), depth=CHAR_SET_LEN)
    # per-task losses
    loss0 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits0,
                                                                   labels=one_hot_labels0))
    loss1 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits1,
                                                                   labels=one_hot_labels1))
    loss2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits2,
                                                                   labels=one_hot_labels2))
    loss3 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits3,
                                                                   labels=one_hot_labels3))
    # combined loss for joint training
    total_loss = (loss0 + loss1 + loss2 + loss3) / 4.0
    # optimize total_loss
    optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(total_loss)
    # per-task accuracy
    correct_prediction0 = tf.equal(tf.argmax(one_hot_labels0, 1), tf.argmax(logits0, 1))
    accuracy0 = tf.reduce_mean(tf.cast(correct_prediction0, tf.float32))
    correct_prediction1 = tf.equal(tf.argmax(one_hot_labels1, 1), tf.argmax(logits1, 1))
    accuracy1 = tf.reduce_mean(tf.cast(correct_prediction1, tf.float32))
    correct_prediction2 = tf.equal(tf.argmax(one_hot_labels2, 1), tf.argmax(logits2, 1))
    accuracy2 = tf.reduce_mean(tf.cast(correct_prediction2, tf.float32))
    correct_prediction3 = tf.equal(tf.argmax(one_hot_labels3, 1), tf.argmax(logits3, 1))
    accuracy3 = tf.reduce_mean(tf.cast(correct_prediction3, tf.float32))
    # saver for the model
    saver = tf.train.Saver()
    # initialize variables
    sess.run(tf.global_variables_initializer())
    # create a coordinator to manage threads
    coord = tf.train.Coordinator()
    # start the input queue runners
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for i in range(6001):
        # fetch one batch of images and labels
        b_image, b_label0, b_label1, b_label2, b_label3 = sess.run(
            [img_batch, label_batch0, label_batch1, label_batch2, label_batch3])
        # run one optimization step
        sess.run(optimizer, feed_dict={x: b_image, y0: b_label0, y1: b_label1,
                                       y2: b_label2, y3: b_label3})
        # every 20 iterations, report loss and accuracy
        if i % 20 == 0:
            # every 2000 iterations, decay the learning rate
            if i % 2000 == 0:
                sess.run(tf.assign(lr, lr / 3))
            acc0, acc1, acc2, acc3, loss_ = sess.run(
                [accuracy0, accuracy1, accuracy2, accuracy3, total_loss],
                feed_dict={x: b_image, y0: b_label0, y1: b_label1,
                           y2: b_label2, y3: b_label3})
            learning_rate = sess.run(lr)
            print("Iter:%d Loss:%.3f Accuracy: %.2f,%.2f,%.2f,%.2f Learning_rate:%.4f"
                  % (i, loss_, acc0, acc1, acc2, acc3, learning_rate))
        # save the model at the end
        if i == 6000:
            saver.save(sess, './captcha/crack_captcha.model', global_step=i)
            break
    # ask the other threads to stop
    coord.request_stop()
    # join returns only after all other threads have shut down
    coord.join(threads)
After a long training run, the output looks like this:
Tensor("Mul:0", shape=(224, 224), dtype=float32) Tensor("Cast_1:0", shape=(), dtype=int32) Tensor("Cast_2:0", shape=(), dtype=int32) Tensor("Cast_3:0", shape=(), dtype=int32) Tensor("Cast_4:0", shape=(), dtype=int32)
2018-03-12 14:28:06.847545: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
Iter:0 Loss:1584.626 Accuracy: 0.22,0.20,0.16,0.16 Learning_rate:0.0010
Iter:20 Loss:2.297 Accuracy: 0.10,0.12,0.12,0.14 Learning_rate:0.0010
Iter:40 Loss:2.288 Accuracy: 0.18,0.12,0.16,0.08 Learning_rate:0.0010
Iter:60 Loss:2.301 Accuracy: 0.12,0.14,0.06,0.14 Learning_rate:0.0010
Iter:80 Loss:2.299 Accuracy: 0.10,0.18,0.08,0.18 Learning_rate:0.0010
..........
You may be wondering how to train captchas that contain letters. It's straightforward: the 26 letters A-Z can be mapped to the numbers 10-35 (A:10, B:11, ..., Z:35), so digits plus letters give 10+26=36 characters in total. We again use one-hot encoding: each label is a 36-dimensional vector with a single 1, e.g. A: 000000000010000…..000. Everything else follows method two above.
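For illustration, a minimal mapping helper under that scheme (a hypothetical char_to_index, not from the post's code) could look like this:

# '0'-'9' map to 0-9; 'A'-'Z' map to 10-35, so CHAR_SET_LEN becomes 36.
def char_to_index(ch):
    if ch.isdigit():
        return int(ch)
    return ord(ch.upper()) - ord('A') + 10

print(char_to_index('0'), char_to_index('A'), char_to_index('Z'))  # 0 10 35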
Finally, we can evaluate the model on the test set; that will be covered in the next post….