Multi-Task Learning

    I have been busy lately and left some earlier posts unfinished. Today I finally have some time, so I am tidying up the earlier material to give you a more complete learning path.

    As the title says, this post covers captcha recognition with TensorFlow. In the previous posts we generated the dataset and converted it into a tfrecord file; now we will use that file for training and recognition.

  The complete code has been open-sourced in the captcha recognition repository.

   One more note before we start: there are two ways to recognize these captchas. The first is to turn the label into a single vector of length 40. For a captcha such as 0782, the label becomes the 40-dimensional vector 1000000000 0000000100 0000000010 0010000000 (four concatenated 10-way one-hot codes), and training then works much like handwritten-digit recognition. The second way, and the one we implement today, is multi-task learning: the label is split into four separate labels. Taking 0782 again, the four labels are

label0:1000000000 

label1:0000000100 

label2:0000000010 

label3:0010000000

and we train the four tasks jointly via multi-task learning. A minimal sketch of this label encoding is shown below.
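
For concreteness, here is a small sketch of the split-label encoding. The helper name `encode_captcha` and the use of NumPy are my own illustration, not code from the repository:

```python
import numpy as np

CHAR_SET_LEN = 10  # ten possible characters: the digits 0-9

def encode_captcha(text):
    """Split a 4-digit captcha string into four 10-way one-hot labels."""
    labels = np.zeros((4, CHAR_SET_LEN), dtype=np.float32)
    for i, ch in enumerate(text):
        labels[i, int(ch)] = 1.0  # one-hot position = the digit's value
    return labels  # shape (4, 10): one one-hot row per task

print(encode_captcha("0782"))
# row 0 -> 1000000000, row 1 -> 0000000100,
# row 2 -> 0000000010, row 3 -> 0010000000
```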


    Multi-task learning is a form of joint learning: several tasks are learned in parallel and their results influence one another. In other words, multi-task learning solves multiple problems at the same time. Personalization is a typical multi-task learning problem, since it learns the preferences of many users simultaneously. The familiar school data set makes a simple comparison: it is a regression data set for predicting student grades, with 15,362 students from 139 secondary schools, where each school can be viewed as one prediction task. Single-task learning either ignores the possible relationships between tasks and learns 139 separate regression functions, or throws the data of all 139 schools together and learns a single regression function. Multi-task learning, by contrast, emphasizes the connections between tasks: through joint learning it fits a different regression function for each of the 139 tasks at the same time, accounting both for the differences between tasks and for what they share. This is one of the core ideas of multi-task learning.

    Multi-task learning can be trained either by alternating between tasks or by joint training. Because all four of our tasks share the same data set, we use joint training; the sketch below contrasts the two schemes.
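
Roughly, the difference looks like this. This is a minimal sketch with made-up layer sizes and two toy tasks, not the captcha network itself:

```python
import tensorflow as tf

# Two tasks sharing one trunk (shapes here are arbitrary, for illustration).
x = tf.placeholder(tf.float32, [None, 8])
shared = tf.layers.dense(x, 16, activation=tf.nn.relu)  # shared trunk
logits_a = tf.layers.dense(shared, 10)  # head for task A
logits_b = tf.layers.dense(shared, 10)  # head for task B
labels_a = tf.placeholder(tf.float32, [None, 10])
labels_b = tf.placeholder(tf.float32, [None, 10])
loss_a = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=labels_a, logits=logits_a))
loss_b = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=labels_b, logits=logits_b))

# Joint training: one optimizer step minimizes the combined loss, so every
# batch updates the shared trunk and all heads together. This is what the
# captcha training script below does with its four losses.
joint_step = tf.train.AdamOptimizer(1e-3).minimize((loss_a + loss_b) / 2.0)

# Alternating training: a separate step per task, run in turn, typically
# used when the tasks come from different data sets.
step_a = tf.train.AdamOptimizer(1e-3).minimize(loss_a)
step_b = tf.train.AdamOptimizer(1e-3).minimize(loss_b)
```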

Back to the main thread: let's implement this multi-task learning in code. Assuming the tfrecord file has been generated following the earlier steps, we build on the slim alexnet_v2 model (transfer learning). Note that the alexnet code, which lives in the slim/nets folder, needs to be modified:


Copy the nets folder into the current project directory.


The complete code after modification is as follows:


```python
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains a model definition for AlexNet.

This work was first described in:
  ImageNet Classification with Deep Convolutional Neural Networks
  Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton

and later refined in:
  One weird trick for parallelizing convolutional neural networks
  Alex Krizhevsky, 2014

Here we provide the implementation proposed in "One weird trick" and not
"ImageNet Classification", as per the paper, the LRN layers have been removed.

Usage:
  with slim.arg_scope(alexnet.alexnet_v2_arg_scope()):
    outputs, end_points = alexnet.alexnet_v2(inputs)

@@alexnet_v2
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

slim = tf.contrib.slim
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)


def alexnet_v2_arg_scope(weight_decay=0.0005):
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      biases_initializer=tf.constant_initializer(0.1),
                      weights_regularizer=slim.l2_regularizer(weight_decay)):
    with slim.arg_scope([slim.conv2d], padding='SAME'):
      with slim.arg_scope([slim.max_pool2d], padding='VALID') as arg_sc:
        return arg_sc


def alexnet_v2(inputs,
               num_classes=1000,
               is_training=True,
               dropout_keep_prob=0.5,
               spatial_squeeze=True,
               scope='alexnet_v2',
               global_pool=False):
  """AlexNet version 2.

  Described in: http://arxiv.org/pdf/1404.5997v2.pdf
  Parameters from:
  github.com/akrizhevsky/cuda-convnet2/blob/master/layers/
  layers-imagenet-1gpu.cfg

  Note: All the fully_connected layers have been transformed to conv2d layers.
        To use in classification mode, resize input to 224x224 or set
        global_pool=True. To use in fully convolutional mode, set
        spatial_squeeze to false.
        The LRN layers have been removed and change the initializers from
        random_normal_initializer to xavier_initializer.

  Args:
    inputs: a tensor of size [batch_size, height, width, channels].
    num_classes: the number of predicted classes. If 0 or None, the logits
      layer is omitted and the input features to the logits layer are
      returned instead.
    is_training: whether or not the model is being trained.
    dropout_keep_prob: the probability that activations are kept in the
      dropout layers during training.
    spatial_squeeze: whether or not should squeeze the spatial dimensions of
      the logits. Useful to remove unnecessary dimensions for classification.
    scope: Optional scope for the variables.
    global_pool: Optional boolean flag. If True, the input to the
      classification layer is avgpooled to size 1x1, for any input size.
      (This is not part of the original AlexNet.)

  Returns:
    net0, net1, net2, net3: the logits of the four classification heads,
      one per captcha character.
    end_points: a dict of tensors with intermediate activations.
  """
  with tf.variable_scope(scope, 'alexnet_v2', [inputs]) as sc:
    end_points_collection = sc.original_name_scope + '_end_points'
    # Collect outputs for conv2d, fully_connected and max_pool2d.
    with slim.arg_scope([slim.conv2d, slim.fully_connected, slim.max_pool2d],
                        outputs_collections=[end_points_collection]):
      net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID',
                        scope='conv1')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool1')
      net = slim.conv2d(net, 192, [5, 5], scope='conv2')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool2')
      net = slim.conv2d(net, 384, [3, 3], scope='conv3')
      net = slim.conv2d(net, 384, [3, 3], scope='conv4')
      net = slim.conv2d(net, 256, [3, 3], scope='conv5')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool5')

      # Use conv2d instead of fully_connected layers.
      with slim.arg_scope([slim.conv2d],
                          weights_initializer=trunc_normal(0.005),
                          biases_initializer=tf.constant_initializer(0.1)):
        net = slim.conv2d(net, 4096, [5, 5], padding='VALID',
                          scope='fc6')
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout6')
        net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout7')

        # ++ modification for multi-task learning: the single fc8 head of
        # the stock AlexNet is replaced by four parallel heads, one per
        # captcha character. All heads share the trunk up to dropout7.
        net0 = slim.conv2d(net, num_classes, [1, 1],
                           activation_fn=None,
                           normalizer_fn=None,
                           biases_initializer=tf.zeros_initializer(),
                           scope='fc8_0')
        net1 = slim.conv2d(net, num_classes, [1, 1],
                           activation_fn=None,
                           normalizer_fn=None,
                           biases_initializer=tf.zeros_initializer(),
                           scope='fc8_1')
        net2 = slim.conv2d(net, num_classes, [1, 1],
                           activation_fn=None,
                           normalizer_fn=None,
                           biases_initializer=tf.zeros_initializer(),
                           scope='fc8_2')
        net3 = slim.conv2d(net, num_classes, [1, 1],
                           activation_fn=None,
                           normalizer_fn=None,
                           biases_initializer=tf.zeros_initializer(),
                           scope='fc8_3')

        # Convert end_points_collection into a end_point dict.
        end_points = slim.utils.convert_collection_to_dict(
            end_points_collection)
        if global_pool:
          net = tf.reduce_mean(net, [1, 2], keep_dims=True,
                               name='global_pool')
          end_points['global_pool'] = net
        if num_classes and spatial_squeeze:
          # Squeeze each head from [batch, 1, 1, num_classes] to
          # [batch, num_classes].
          net0 = tf.squeeze(net0, [1, 2], name='fc8_0/squeezed')
          end_points[sc.name + '/fc8_0'] = net0
          net1 = tf.squeeze(net1, [1, 2], name='fc8_1/squeezed')
          end_points[sc.name + '/fc8_1'] = net1
          net2 = tf.squeeze(net2, [1, 2], name='fc8_2/squeezed')
          end_points[sc.name + '/fc8_2'] = net2
          net3 = tf.squeeze(net3, [1, 2], name='fc8_3/squeezed')
          end_points[sc.name + '/fc8_3'] = net3
      return net0, net1, net2, net3, end_points


alexnet_v2.default_image_size = 224
```
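
As a quick sanity check, the modified network can be built on a dummy batch to confirm that each of the four heads comes out as a [batch, 10] logits tensor. This snippet is my own illustration and assumes the file above is saved as nets/alexnet.py:

```python
import tensorflow as tf
from nets import alexnet

slim = tf.contrib.slim

# Dummy grayscale batch matching the training script: 25 images, 224x224x1.
inputs = tf.placeholder(tf.float32, [25, 224, 224, 1])
with slim.arg_scope(alexnet.alexnet_v2_arg_scope()):
    net0, net1, net2, net3, end_points = alexnet.alexnet_v2(
        inputs, num_classes=10, is_training=False)

# Each head should be a (25, 10) logits tensor, one row per image.
print(net0.shape, net1.shape, net2.shape, net3.shape)
```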

Next we train the model. Detailed explanations are given in the code comments:

```python
# Captcha recognition (training)
import os
import tensorflow as tf
from PIL import Image
from nets import nets_factory
import numpy as np

# Number of distinct characters (digits 0-9)
CHAR_SET_LEN = 10
# Image height
IMAGE_HEIGHT = 60
# Image width
IMAGE_WIDTH = 160
# Batch size
BATCH_SIZE = 25
# Path of the tfrecord file
TFRECORD_FILE = 'E:/SVN/Gavin/Learn/Python/pygame/captcha/train.tfrecords'

# Placeholders
x = tf.placeholder(tf.float32, [None, 224, 224])
y0 = tf.placeholder(tf.float32, [None])
y1 = tf.placeholder(tf.float32, [None])
y2 = tf.placeholder(tf.float32, [None])
y3 = tf.placeholder(tf.float32, [None])

# Learning rate
lr = tf.Variable(0.003, dtype=tf.float32)

# Read data from the tfrecord file
def read_and_decode(filename):
    # Build a queue from the file name
    filename_queue = tf.train.string_input_producer([filename])
    # Create a reader from the file queue
    reader = tf.TFRecordReader()
    # The reader pulls one serialized example from the queue
    _, serialized_example = reader.read(filename_queue)
    # Parse the serialized example into its features
    features = tf.parse_single_example(
        serialized_example,
        features={
            'image': tf.FixedLenFeature([], tf.string),
            'label0': tf.FixedLenFeature([], tf.int64),
            'label1': tf.FixedLenFeature([], tf.int64),
            'label2': tf.FixedLenFeature([], tf.int64),
            'label3': tf.FixedLenFeature([], tf.int64),
        }
    )
    img = features['image']
    # Decode the raw image bytes
    image = tf.decode_raw(img, tf.uint8)
    # Grayscale image, not yet preprocessed
    image = tf.reshape(image, [224, 224])
    # Preprocess: scale the pixel values to [-1, 1]
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    # Get the labels
    label0 = tf.cast(features['label0'], tf.int32)
    label1 = tf.cast(features['label1'], tf.int32)
    label2 = tf.cast(features['label2'], tf.int32)
    label3 = tf.cast(features['label3'], tf.int32)
    return image, label0, label1, label2, label3


# Get the image data and labels
image, label0, label1, label2, label3 = read_and_decode(TFRECORD_FILE)
print(image, label0, label1, label2, label3)

# shuffle_batch shuffles the input randomly. It builds a RandomShuffleQueue,
# keeps feeding single [img, label] examples into it, and returns the result
# of RandomShuffleQueue.dequeue_many(). Batching the image together with its
# labels keeps each image aligned with its own labels.
img_batch, label_batch0, label_batch1, label_batch2, label_batch3 = \
    tf.train.shuffle_batch(
        [image, label0, label1, label2, label3],
        batch_size=BATCH_SIZE, capacity=5000,
        min_after_dequeue=1000, num_threads=1)

# Define the network structure
train_network_fn = nets_factory.get_network_fn(
    'alexnet_v2',
    num_classes=CHAR_SET_LEN,
    weight_decay=0.0005,
    is_training=True
)

with tf.Session() as sess:
    X = tf.reshape(x, [BATCH_SIZE, 224, 224, 1])
    # Feed the data through the network to get the four outputs
    logits0, logits1, logits2, logits3, end_points = train_network_fn(X)
    # Convert the labels to one-hot form
    one_hot_labels0 = tf.one_hot(indices=tf.cast(y0, tf.int32), depth=CHAR_SET_LEN)
    one_hot_labels1 = tf.one_hot(indices=tf.cast(y1, tf.int32), depth=CHAR_SET_LEN)
    one_hot_labels2 = tf.one_hot(indices=tf.cast(y2, tf.int32), depth=CHAR_SET_LEN)
    one_hot_labels3 = tf.one_hot(indices=tf.cast(y3, tf.int32), depth=CHAR_SET_LEN)

    # Per-task losses
    loss0 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        logits=logits0, labels=one_hot_labels0))
    loss1 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        logits=logits1, labels=one_hot_labels1))
    loss2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        logits=logits2, labels=one_hot_labels2))
    loss3 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        logits=logits3, labels=one_hot_labels3))

    # Total loss: joint training averages the four task losses
    total_loss = (loss0 + loss1 + loss2 + loss3) / 4.0
    # Optimize total_loss
    optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(total_loss)
    # Per-task accuracies
    correct_prediction0 = tf.equal(tf.argmax(one_hot_labels0, 1), tf.argmax(logits0, 1))
    accuracy0 = tf.reduce_mean(tf.cast(correct_prediction0, tf.float32))

    correct_prediction1 = tf.equal(tf.argmax(one_hot_labels1, 1), tf.argmax(logits1, 1))
    accuracy1 = tf.reduce_mean(tf.cast(correct_prediction1, tf.float32))

    correct_prediction2 = tf.equal(tf.argmax(one_hot_labels2, 1), tf.argmax(logits2, 1))
    accuracy2 = tf.reduce_mean(tf.cast(correct_prediction2, tf.float32))

    correct_prediction3 = tf.equal(tf.argmax(one_hot_labels3, 1), tf.argmax(logits3, 1))
    accuracy3 = tf.reduce_mean(tf.cast(correct_prediction3, tf.float32))

    # Saver for the model
    saver = tf.train.Saver()

    # Initialize the variables
    sess.run(tf.global_variables_initializer())
    # Coordinator to manage the reader threads
    coord = tf.train.Coordinator()
    # Start the queue runners
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for i in range(6001):
        # Fetch one batch of images and labels
        b_image, b_label0, b_label1, b_label2, b_label3 = sess.run(
            [img_batch, label_batch0, label_batch1, label_batch2, label_batch3])
        # One optimization step
        sess.run(optimizer, feed_dict={x: b_image, y0: b_label0, y1: b_label1,
                                       y2: b_label2, y3: b_label3})
        # Every 20 iterations, report the loss and accuracies
        if i % 20 == 0:
            # Every 2000 iterations, decay the learning rate
            if i % 2000 == 0:
                sess.run(tf.assign(lr, lr / 3))
            acc0, acc1, acc2, acc3, loss_ = sess.run(
                [accuracy0, accuracy1, accuracy2, accuracy3, total_loss],
                feed_dict={x: b_image, y0: b_label0, y1: b_label1,
                           y2: b_label2, y3: b_label3})

            learning_rate = sess.run(lr)
            print("Iter:%d Loss:%.3f Accuracy: %.2f,%.2f,%.2f,%.2f Learning_rate:%.4f"
                  % (i, loss_, acc0, acc1, acc2, acc3, learning_rate))
            # Save the model at the last iteration
            if i == 6000:
                saver.save(sess, './captcha/crack_captcha.model', global_step=i)
                break
    # Ask the reader threads to stop (after the training loop, not inside it)
    coord.request_stop()
    # coord.join returns only once all threads have shut down
    coord.join(threads)
```


After a long training run, the output looks like this:


Tensor("Mul:0", shape=(224, 224), dtype=float32) Tensor("Cast_1:0", shape=(), dtype=int32) Tensor("Cast_2:0", shape=(), dtype=int32) Tensor("Cast_3:0", shape=(), dtype=int32) Tensor("Cast_4:0", shape=(), dtype=int32) 
2018-03-12 14:28:06.847545: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
Iter:0 Loss:1584.626 Accuracy: 0.22,0.20,0.16,0.16 Learning_rate:0.0010
Iter:20 Loss:2.297 Accuracy: 0.10,0.12,0.12,0.14 Learning_rate:0.0010
Iter:40 Loss:2.288 Accuracy: 0.18,0.12,0.16,0.08 Learning_rate:0.0010
Iter:60 Loss:2.301 Accuracy: 0.12,0.14,0.06,0.14 Learning_rate:0.0010
Iter:80 Loss:2.299 Accuracy: 0.10,0.18,0.08,0.18 Learning_rate:0.0010

..........




Some readers may ask: how do we train captchas that also contain letters? It is actually simple. There are 26 letters A-Z, which we can map to the 26 numbers 10-35 (A:10, B:11, ..., Z:35). Digits plus letters then give 10 + 26 = 36 characters in total. We again use one-hot encoding, so each label is a 36-dimensional vector with exactly one 1 and the rest 0, e.g. A: 000000000010000…..000. Everything else stays the same as method two above.
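
As a quick sketch of this mapping (the helper names here are my own illustration, not from the repository):

```python
import numpy as np

# Digits 0-9 map to indices 0-9, letters A-Z map to 10-35: 36 classes total.
CHAR_SET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHAR_SET_LEN = len(CHAR_SET)  # 36

def char_to_index(ch):
    return CHAR_SET.index(ch.upper())

def char_to_one_hot(ch):
    vec = np.zeros(CHAR_SET_LEN, dtype=np.float32)
    vec[char_to_index(ch)] = 1.0
    return vec

print(char_to_index("A"))  # 10
print(char_to_index("Z"))  # 35
# Training is unchanged apart from setting CHAR_SET_LEN = 36 in the scripts above.
```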

Finally, we can evaluate the model on the test set. I will continue with that in the next post.






