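6.1 Introduction to Image Recognition and Classic Datasets
CIFAR is a highly influential image classification dataset. It comes in two versions, CIFAR-10 and CIFAR-100, both of which are subsets of the 80 million images collected by the Visual Dictionary project. All CIFAR images are 32×32 color images.
CIFAR-10 contains 60,000 images from 10 classes. Every image has the same fixed size and shows an object of exactly one class. Compared with MNIST, the main differences are that the images are in color rather than grayscale and that classification is harder.
Compared with real-world image recognition, both MNIST and CIFAR have two major limitations. First, real images have a far higher resolution than 32×32, and their resolution is not fixed. Second, the real world contains many more object categories than 10 or 100, and a single image often shows objects of more than one category.
ImageNet addresses both limitations to a large extent and is much closer to real-world image recognition.
ImageNet is a large image database built on WordNet. Nearly 15 million images are linked to roughly 20,000 WordNet noun synsets. Each synset associated with ImageNet represents one real-world entity and can be treated as one class in a classification problem. A single image may contain entities from several synsets.
The ILSVRC2012 image classification dataset is a subset of ImageNet. It contains 1.2 million images from 1,000 classes, and each image belongs to exactly one class. The images were crawled directly from the web, so file sizes range from a few kilobytes to several megabytes.
Top-N accuracy is the probability that the correct answer is among the first N answers returned by the recognition algorithm. Many academic papers on image classification compare methods by top-N accuracy, with N usually set to 3 or 5.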
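The definition of top-N accuracy can be made concrete with a small pure-Python sketch. The function name `top_n_accuracy` and the toy scores are illustrative assumptions, not part of the original notes.

```python
def top_n_accuracy(scores, labels, n=5):
    """Fraction of examples whose true label is among the n highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the n largest scores in this row.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:n]
        if label in top:
            hits += 1
    return hits / len(labels)


# Toy example (hypothetical data): two images, three classes.
scores = [
    [0.1, 0.6, 0.3],  # true class 2: missed at top-1, found at top-2
    [0.7, 0.2, 0.1],  # true class 0: found at top-1
]
labels = [2, 0]
top1 = top_n_accuracy(scores, labels, n=1)
top2 = top_n_accuracy(scores, labels, n=2)
```

Top-N accuracy can only stay the same or grow as N grows, which is why a top-5 figure is always at least as high as the top-1 figure for the same model.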
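6.2 Introduction to Convolutional Neural Networks
In a fully connected network, every pair of nodes in adjacent layers is connected by an edge, so the nodes of each fully connected layer are usually drawn as a column, which makes the connections easy to show. In a convolutional network, only some of the nodes in adjacent layers are connected, and the nodes of each convolution layer are usually organized as a three-dimensional matrix. Although the two look very different, their overall architecture is very similar, and the inputs, outputs, and training procedure are essentially the same. The only difference lies in how adjacent layers are connected.
The biggest problem with using a fully connected network on images is that the fully connected layers have too many parameters. More parameters slow down computation and also make overfitting much more likely. A main motivation for convolutional networks is to reduce the number of parameters.
In the first few layers of a convolutional network, each node is connected to only some of the nodes in the previous layer.
The five kinds of layers in a convolutional network:
1. Input layer: the pixel matrix of an image, length × width × depth (color channels).
2. Convolution layer: the input of each node is only a small patch of the previous layer, commonly 3×3 or 5×5. The convolution layer analyzes each small patch in more depth to obtain more abstract features. In general, the depth of the node matrix increases after a convolution layer.
3. Pooling layer: leaves the depth of the three-dimensional matrix unchanged but shrinks its length and width. Pooling can be thought of as converting a high-resolution image into a lower-resolution one. It further reduces the number of nodes feeding the final fully connected layers, and therefore the number of parameters in the whole network.
4. Fully connected layer: after several rounds of convolution and pooling, the information in the image can be considered abstracted into features with higher information content. Once the convolution and pooling layers have extracted the features automatically, fully connected layers are still needed to perform the classification.
5. Softmax layer: used for classification problems. It yields the probability distribution of the current example over the classes.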
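As a small illustration of the last item in the list above, the softmax function turns raw scores (logits) into a probability distribution. This is a minimal pure-Python sketch; the function name and sample logits are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing without changing the result.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # hypothetical logits for three classes
```

The ordering of the probabilities follows the ordering of the logits, so the predicted class is the same whether it is read from the logits or from the softmax output.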
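6.3 Common Structures in Convolutional Neural Networks
The most important part of a convolution layer is the filter, also called the kernel. A filter converts a sub-matrix of nodes in the current layer into a unit node matrix in the next layer, that is, a node matrix whose length and width are both 1 and whose depth is unrestricted.
The length and width of the sub-matrix handled by a filter are specified by hand, and this size is called the filter size. Common sizes are 3×3 and 5×5. Because the depth handled by the filter always equals the depth of the current layer's node matrix, only two dimensions need to be specified for the filter size even though the node matrix is three-dimensional.
The other setting that must be specified by hand is the depth of the resulting unit node matrix, called the filter depth.
(Local view) The forward pass of a filter computes the nodes of the unit node matrix on the right from the nodes of the small matrix on the left. As in a fully connected layer, each output node is a weighted sum of the inputs plus a bias term. See Figure 6-8.
(Global view) The forward pass of a convolution layer moves a filter from the top-left corner of the current layer to the bottom-right corner and computes the corresponding unit matrix at each position. See Figure 6-10.
Each time the filter moves, one value is computed (k values when the filter depth is k). Assembling these values into a new matrix completes the forward pass of the convolution layer.
When the filter size is not 1×1, the matrix produced by the forward pass is smaller than the current layer's matrix.
To avoid this change in size, the border of the current layer's matrix can be padded with zeros. See Figure 6-11.
Besides zero padding, the size of the result matrix can also be adjusted through the stride with which the filter moves. Figure 6-12 shows the forward pass of a convolution layer with a stride of 2 and zero padding.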
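The sliding-filter forward pass described above can be sketched in plain Python for the simplest case: one square filter of depth 1 applied to a single-channel image. The function name and the explicit `pad` argument (number of zero rings added around the image) are assumptions made for illustration; this does not reproduce TensorFlow's exact SAME padding rule for strides greater than 1.

```python
def conv2d_single(image, kernel, bias=0.0, stride=1, pad=0):
    """Slide one square 2-D filter over a 2-D image and return the output matrix."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    # Surround the image with `pad` rings of zeros.
    padded = [[0.0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for r in range(h):
        for c in range(w):
            padded[r + pad][c + pad] = image[r][c]
    out = []
    for top in range(0, h + 2 * pad - k + 1, stride):
        row = []
        for left in range(0, w + 2 * pad - k + 1, stride):
            # Weighted sum of the patch under the filter, plus the bias.
            total = bias
            for i in range(k):
                for j in range(k):
                    total += padded[top + i][left + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
# No padding: a 2x2 filter shrinks the 3x3 image to 2x2.
valid = conv2d_single(image, [[1, 0], [0, 1]])
# One ring of zeros with a 3x3 filter keeps the 3x3 size.
same_size = conv2d_single(image, [[0, 0, 0], [0, 1, 0], [0, 0, 0]], pad=1)
# A stride of 2 skips every other position.
strided = conv2d_single(image, [[1]], stride=2)
```

The same filter weights are reused at every position, which is the parameter sharing discussed later in this section.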
Determining the output size:
With zero padding (results are rounded up):
out_length = ceil(in_length / stride_length)
out_width = ceil(in_width / stride_width)
Without zero padding (results are rounded up):
out_length = ceil((in_length - filter_length + 1) / stride_length)
out_width = ceil((in_width - filter_width + 1) / stride_width)
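The two formulas can be checked with a few lines of Python. The helper name `conv_output_size` is an assumption for illustration; the example sizes are taken from the listings in these notes (a 32×32 input with a 5×5 filter, and the 299×299 Inception-v3 input with a 3×3 filter and stride 2).

```python
import math


def conv_output_size(in_size, filter_size, stride, zero_padding):
    """Output length (or width) of a convolution or pooling layer along one dimension."""
    if zero_padding:
        return math.ceil(in_size / stride)
    return math.ceil((in_size - filter_size + 1) / stride)


with_padding = conv_output_size(32, 5, 1, zero_padding=True)
without_padding = conv_output_size(32, 5, 1, zero_padding=False)
inception_stem = conv_output_size(299, 3, 2, zero_padding=False)
```

Because length and width are handled independently, the same helper is simply called once per dimension.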
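A very important property of convolutional networks is that, within a convolution layer, the same filter parameters are used at every position. This makes the response to image content independent of where it appears. Taking MNIST handwritten digit recognition as an example, the class of the image is the same whether the digit "1" appears in the top-left or the bottom-right corner. Sharing the filter parameters within each convolution layer also reduces the number of parameters in the network dramatically. Taking CIFAR-10 as an example, the input layer has dimensions 32×32×3. If the convolution layer uses a 5×5 filter with depth 16, the layer has 5×5×3×16 + 16 = 1216 parameters (it can be pictured as a fully connected layer with a 5×5×3 input and 16 outputs). Moreover, the number of parameters depends only on the filter size, the filter depth, and the depth of the current layer's node matrix, not on the image size, which lets convolutional networks scale well to larger images.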
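The parameter count above can be reproduced directly. The helper names and the 500-node fully connected layer used for comparison are illustrative assumptions.

```python
def conv_param_count(filter_length, filter_width, in_depth, filter_depth):
    """Weights plus one bias per output channel of a convolution layer."""
    return filter_length * filter_width * in_depth * filter_depth + filter_depth


def fc_param_count(in_nodes, out_nodes):
    """Weights plus one bias per output node of a fully connected layer."""
    return in_nodes * out_nodes + out_nodes


conv_params = conv_param_count(5, 5, 3, 16)
# Hypothetical comparison: the same 32x32x3 input fed into 500 fully connected nodes.
fc_params = fc_param_count(32 * 32 * 3, 500)
```

Note that the image size appears only in the fully connected count, never in the convolution count.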
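Forward pass of a convolution layer in TensorFlow:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name='x-input')
# The shape is: filter size (two values), depth of the current layer, filter depth.
filter_weight = tf.get_variable(
    'weights', shape=[5, 5, 3, 16], initializer=tf.truncated_normal_initializer(stddev=0.1)
)
biases = tf.get_variable(
    'biases', shape=[16], initializer=tf.constant_initializer(0.1)  # shape equals the filter depth
)
# 1st argument: the node matrix of the current layer, a 4-D tensor. The first dimension is
#   the input batch and the last three describe one node matrix (length x width x depth).
# 2nd argument: the weights of the convolution layer, i.e. the filter.
# 3rd argument: the strides along each dimension, an array of length 4. The first and fourth
#   values must be 1, because the stride only applies to the length and width.
# 4th argument: padding, either 'SAME' or 'VALID'.
conv = tf.nn.conv2d(
    x, filter_weight, strides=[1, 1, 1, 1], padding='SAME'
)
# print(conv.shape)  # (?, 32, 32, 16): the depth becomes 16. By the formulas above, the length
#                    # and width are 32 with zero padding and 28 without.

# tf.nn.bias_add adds the bias per output channel: every position in the output matrix
# receives the same bias term for its channel.
# In Figure 6-13, for example, the next layer is 2x2 but there is only one bias value
# (the depth is 1), and each of the 2x2 values needs that bias added.
bias = tf.nn.bias_add(conv, biases)
actived_conv = tf.nn.relu(bias)

# Keep the three 4-D shapes apart: the input, the weights, and the strides.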
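6.3.2 Pooling Layer
A pooling layer mainly reduces the size of the matrix and thereby the number of parameters in the final fully connected layers. Pooling both speeds up computation and helps prevent overfitting.
The forward pass of a pooling layer is also carried out by moving a filter-like structure. The computation inside the pooling filter is not a weighted sum of nodes, however, but a simpler maximum or average. A pooling layer that takes the maximum is called a max pooling layer, and one that takes the average is called an average pooling layer.
As with a convolution filter, the size of the pooling filter, whether to use zero padding, and the stride must all be set by hand. The filters of the two layers move in a similar way. The only difference is that a convolution filter spans the whole depth, whereas a pooling filter affects the nodes of one depth slice only. A pooling filter therefore moves along the depth as well as along the length and width.
Convolution layer: with an input depth of 3, the three per-channel results are added together.
Pooling layer: with a depth of 2, each channel is processed separately.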
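The channel-by-channel behavior of pooling can be sketched in plain Python. The function names, the list-of-channels layout, and the toy data are assumptions for illustration; no padding is applied.

```python
def max_pool_channel(channel, size=2, stride=2):
    """Max pooling over one depth slice (a 2-D list), without padding."""
    h, w = len(channel), len(channel[0])
    return [
        [
            max(channel[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, w - size + 1, stride)
        ]
        for r in range(0, h - size + 1, stride)
    ]


def max_pool_volume(volume, size=2, stride=2):
    """Apply max pooling to every depth slice independently; the depth is unchanged."""
    return [max_pool_channel(channel, size, stride) for channel in volume]


# Hypothetical 4x4 input with depth 2, stored as a list of channels.
volume = [
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    [[0, -1, 0, -2], [-3, 0, -4, 0], [5, 0, 0, 6], [0, 0, 0, 0]],
]
pooled = max_pool_volume(volume)
```

Unlike the convolution sketch, nothing is summed across channels here, which is why the output keeps the same depth as the input.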
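Forward pass of a pooling layer in TensorFlow:

# actived_conv is the output of the convolution listing above.
# 1st argument: the node matrix of the current layer (a 4-D tensor).
# 2nd argument: the filter size, an array of length 4. The first and fourth values must be 1,
#   meaning the filter cannot span different input examples or different depths of the node
#   matrix. The most common values are [1, 2, 2, 1] and [1, 3, 3, 1]. This differs from the
#   convolution layer.
# 3rd argument: the strides, an array of length 4. The first and fourth values must be 1,
#   meaning the pooling layer cannot reduce the depth or the number of input examples.
# 4th argument: padding.
pool = tf.nn.max_pool(actived_conv, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')

The biggest difference between the two layers shows up in how the filter is specified: [5, 5, 3, 16] for the convolution layer (length, width, input depth, filter depth) versus [1, 3, 3, 1] for the pooling layer (batch, length, width, depth).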
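6.4 Classic Convolutional Network Models
6.4.1 The LeNet-5 Model
LeNet-5 was the first convolutional neural network applied successfully to digit recognition.
The input layer of LeNet-5 is a three-dimensional matrix (length × width × depth).
The number of parameters is far smaller than the number of connections. The open question in these notes was why the connection count of a convolution layer includes a "+1": each output node is connected to every input node under the filter and additionally to one bias term, so the number of connections per output node is filter_length × filter_width × input_depth + 1.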
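The gap between parameters and connections can be worked through numerically. The layer dimensions below (a 32×32×1 input, 5×5 filters, depth 6, no padding, stride 1, hence a 28×28×6 output) are an assumption based on the commonly described first layer of LeNet-5; they are not stated in these notes.

```python
def conv_layer_counts(in_size, in_depth, filter_size, filter_depth):
    """Return (parameters, connections) for a stride-1 convolution layer without padding."""
    out_size = in_size - filter_size + 1
    weights_per_output = filter_size * filter_size * in_depth
    # Parameters are shared across positions: one filter plus one bias per output channel.
    parameters = weights_per_output * filter_depth + filter_depth
    # Connections are not shared: every output node has its own links, and the
    # "+ 1" is the link to the bias term.
    connections = out_size * out_size * filter_depth * (weights_per_output + 1)
    return parameters, connections


params, connections = conv_layer_counts(in_size=32, in_depth=1, filter_size=5, filter_depth=6)
```

The ratio between the two numbers equals the number of output positions (28 × 28 in this assumed layer), since each parameter is reused once per position.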
Only the weights of the fully connected layers are regularized.
ReLU and dropout are not applied to the last layer.
# mnist_inference.py

import tensorflow as tf

IMAGE_SIZE = 28
NUM_CHANNELS = 1  # grayscale
NUM_LABELS = 10

# Filter size and depth of the first convolution layer
CONV1_SIZE = 5
CONV1_DEEP = 32
# Filter size and depth of the second convolution layer
CONV2_SIZE = 5
CONV2_DEEP = 64
# Number of nodes in the fully connected layer
FC_SIZE = 512


def get_weight_variable(shape, regularizer):
    weights = tf.get_variable('weight', shape, initializer=tf.truncated_normal_initializer(stddev=0.1))
    if regularizer:
        tf.add_to_collection('losses', regularizer(weights))
    return weights


def inference(input_tensor, train, regularizer):
    with tf.variable_scope('layer1-conv1'):
        # Input 28x28x1, filter size 5x5, depth 32, stride 1, output 28x28x32
        conv1_weights = get_weight_variable([CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP], None)
        conv1_biases = tf.get_variable('bias', [CONV1_DEEP], initializer=tf.constant_initializer(0.0))
        conv1 = tf.nn.conv2d(input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))

    with tf.name_scope('layer2-pool1'):
        # Output 14x14x32
        pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    with tf.variable_scope('layer3-conv2'):
        # Filter size 5x5, depth 64, output 14x14x64
        conv2_weights = get_weight_variable([CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP], None)
        conv2_biases = tf.get_variable('bias', [CONV2_DEEP], initializer=tf.constant_initializer(0.0))
        conv2 = tf.nn.conv2d(pool1, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))

    with tf.name_scope('layer4-pool2'):
        # Output 7x7x64
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # A fully connected layer takes a feature vector as input, so the 3-D matrix
    # has to be flattened into a 1-D vector.
    pool_shape = pool2.get_shape().as_list()  # includes the batch dimension
    nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]  # 3136
    reshaped = tf.reshape(pool2, [pool_shape[0], nodes])

    # During training, dropout randomly sets the output of some nodes to 0. It can further
    # improve the robustness of the model and prevent overfitting, and is used only in training.
    with tf.variable_scope('layer5-fc1'):
        # Only the weights of the fully connected layers are regularized.
        fc1_weights = get_weight_variable([nodes, FC_SIZE], regularizer)
        fc1_biases = tf.get_variable('bias', shape=[FC_SIZE], initializer=tf.constant_initializer(0.1))
        fc1 = tf.nn.relu(tf.matmul(reshaped, fc1_weights) + fc1_biases)
        if train:
            fc1 = tf.nn.dropout(fc1, 0.5)

    with tf.variable_scope('layer6-fc2'):
        fc2_weights = get_weight_variable([FC_SIZE, NUM_LABELS], regularizer)
        fc2_biases = tf.get_variable('bias', shape=[NUM_LABELS], initializer=tf.constant_initializer(0.1))
        logit = tf.matmul(fc1, fc2_weights) + fc2_biases

    # ReLU and dropout are not applied to the last layer. The cross entropy is computed
    # later with sparse_softmax_cross_entropy_with_logits.
    return logit


# mnist_train.py

#!coding:utf8
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_inference
import os
import numpy as np

BATCH_SIZE = 100

LEARNING_RATE_BASE = 0.8  # probably too large for this network; a value near 0.01 is worth trying (assumption)
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001  # lambda
TRAINING_STEPS = 30000
MOVING_AVERAGE_DACAY = 0.99

MODEL_SAVE_PATH = '/home/yangxl/files/save_model'
MODEL_NAME = 'conv2d.ckpt'


def train(mnist):
    # The reshape between the pooling layer and the fully connected layer needs a known
    # batch size, so shape[0] cannot be None.
    x = tf.placeholder(tf.float32, [BATCH_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.NUM_CHANNELS], 'x-input')
    y_ = tf.placeholder(tf.float32, [BATCH_SIZE, mnist_inference.NUM_LABELS], 'y-input')

    # Regularization
    from tensorflow.contrib.layers import l2_regularizer
    regularizer = l2_regularizer(REGULARIZATION_RATE)

    y = mnist_inference.inference(x, True, regularizer)

    global_step = tf.Variable(0, trainable=False)

    # Moving averages
    variables_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DACAY, global_step)
    variables_averages_op = variables_averages.apply(tf.trainable_variables())
    # Mutually exclusive classes.
    # The labels are one-hot vectors of length 10, while this function expects the index of
    # the correct class, so tf.argmax is used to obtain the class number.
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))

    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, mnist.train.num_examples / BATCH_SIZE,
                                               LEARNING_RATE_DECAY, staircase=True)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step)
    with tf.control_dependencies([train_step, variables_averages_op]):
        train_op = tf.no_op(name='train')

    saver = tf.train.Saver()

    with tf.Session() as sess:
        tf.global_variables_initializer().run()

        for i in range(TRAINING_STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)  # xs.shape=(100, 784)
            reshaped_xs = np.reshape(xs, [BATCH_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.NUM_CHANNELS])
            _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: reshaped_xs, y_: ys})

            if i % 1000 == 0:
                print('after %d training steps, loss on training batch is %g ' % (i, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)


def main(argv=None):
    mnist = input_data.read_data_sets('/home/yangxl/files/mnist', one_hot=True)
    import time
    # print('start...', int(time.time()))
    train(mnist)
    # print(int(time.time()))


if __name__ == '__main__':
    tf.app.run()


# mnist_eval.py

#!coding:utf8
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_inference
import mnist_train
import time
import numpy as np

# Load the latest model every 60 seconds and measure its accuracy on the test data.
EVAL_INTERVAL_SECS = 60


def evaluate(mnist):
    x = tf.placeholder(tf.float32, [mnist.test.num_examples, mnist_inference.IMAGE_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.NUM_CHANNELS], 'x-input')
    y_ = tf.placeholder(tf.float32, [mnist.test.num_examples, mnist_inference.NUM_LABELS], 'y-input')

    y = mnist_inference.inference(x, False, None)

    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Moving averages
    variables_averages = tf.train.ExponentialMovingAverage(mnist_train.MOVING_AVERAGE_DACAY)
    variables_to_restore = variables_averages.variables_to_restore()

    # The moving-average variables must have been saved during training to be loadable here.
    saver = tf.train.Saver(variables_to_restore)

    while True:
        with tf.Session() as sess:
            reshape_x = np.reshape(mnist.test.images, [-1, 28, 28, 1])
            validate_feed = {x: reshape_x, y_: mnist.test.labels}

            # Find the file name of the latest model in the directory through the checkpoint file.
            ckpt = tf.train.get_checkpoint_state(mnist_train.MODEL_SAVE_PATH)
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)

                global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                accuracy_score = sess.run(accuracy, feed_dict=validate_feed)
                print('after %s training steps, validation accuracy = %g ' % (global_step, accuracy_score))
            else:
                print('No checkpoint file found')
                return
        time.sleep(EVAL_INTERVAL_SECS)


def main(argv=None):
    mnist = input_data.read_data_sets('/home/yangxl/files/mnist', one_hot=True)
    print('start...')
    evaluate(mnist)


if __name__ == '__main__':
    tf.app.run()
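Something is off when this code is run: the loss does not converge and the accuracy stays around 0.117, whereas it should be about 99.4%. A likely cause (not verified here) is the base learning rate of 0.8, which is very large for a convolutional network trained with plain gradient descent; a much smaller value such as 0.01 is worth trying.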
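To see how the learning rate evolves in the training script, the staircase schedule of tf.train.exponential_decay can be sketched in plain Python as base × decay^floor(step / decay_steps). The function name is hypothetical, and the figure of 55,000 training examples (giving 550 steps per decay) is an assumption about what the MNIST loader used above returns.

```python
def staircase_learning_rate(base, decay_rate, global_step, decay_steps):
    """Learning rate under exponential decay with staircase=True."""
    return base * decay_rate ** (global_step // decay_steps)


DECAY_STEPS = 55000 // 100  # assumed training-set size divided by BATCH_SIZE

lr_start = staircase_learning_rate(0.8, 0.99, 0, DECAY_STEPS)
lr_after_one_epoch = staircase_learning_rate(0.8, 0.99, DECAY_STEPS, DECAY_STEPS)
lr_at_end = staircase_learning_rate(0.8, 0.99, 30000, DECAY_STEPS)  # about 0.46
```

Under these assumptions the rate is still well above 0.4 after all 30,000 steps, so the schedule alone does not bring a base rate of 0.8 down to a small value.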
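No single convolutional architecture solves every problem. LeNet-5, for example, cannot handle a large image dataset such as ImageNet well.
The following regular-expression-style formula summarizes some classic convolutional architectures for image classification: input layer --> (convolution layer+ --> pooling layer?)+ --> fully connected layer+
Most convolutional networks use at most three consecutive convolution layers.
After several rounds of convolution and pooling, a convolutional network usually passes through one or two fully connected layers before the output.
As for filter depth, most convolutional networks increase it layer by layer.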
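Since the architecture pattern above is written as a regular expression, it can be checked mechanically. In this sketch each layer type is encoded as one letter; the encoding, the layer names, and the function name are assumptions made for illustration.

```python
import re

# I = input, C = convolution, P = pooling, F = fully connected.
LAYER_CODES = {'input': 'I', 'conv': 'C', 'pool': 'P', 'fc': 'F'}
CLASSIC_PATTERN = re.compile(r'^I(C+P?)+F+$')


def is_classic_architecture(layers):
    """True if the layer sequence matches input -> (conv+ -> pool?)+ -> fc+."""
    encoded = ''.join(LAYER_CODES[name] for name in layers)
    return bool(CLASSIC_PATTERN.match(encoded))


# The network built in the mnist_inference.py listing: conv, pool, conv, pool, fc, fc.
lenet_like = ['input', 'conv', 'pool', 'conv', 'pool', 'fc', 'fc']
# A sequence that pools before any convolution does not fit the pattern.
pool_first = ['input', 'pool', 'conv', 'fc']

lenet_matches = is_classic_architecture(lenet_like)
pool_first_matches = is_classic_architecture(pool_first)
```

Architectures with parallel branches, such as the Inception modules in the next section, cannot be written as a flat sequence like this and fall outside the pattern.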
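6.4.2 The Inception-v3 Model
In LeNet-5, the convolution layers are connected in series. In Inception-v3, the Inception module combines different convolution layers in parallel.
Section 6.4.1 mentioned that a convolution layer can use filters with a side length of 1, 3, or 5. How should one choose among them? The Inception model offers one answer: use filters of all these sizes at the same time and concatenate the resulting matrices.
Although the filter sizes differ, if every filter uses zero padding and a stride of 1, the length and width of each result matrix equal those of the input matrix. The result matrices produced by the different filters can therefore be concatenated into one deeper matrix.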
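The shape bookkeeping behind this concatenation is simple enough to sketch. The function name is hypothetical; the branch depths 64, 64, 96, and 32 are those of the Mixed_5b module in the Inception-v3 listing in these notes.

```python
def inception_output_shape(length, width, branch_depths):
    """Shape after concatenating parallel branches along the depth axis.

    Assumes every branch uses zero padding and stride 1, so each branch keeps
    the input length and width and only the depths differ.
    """
    return (length, width, sum(branch_depths))


mixed_5b_shape = inception_output_shape(35, 35, [64, 64, 96, 32])
```

The result agrees with the 35 × 35 × 256 shape noted for that module in the listing.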
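The Inception-v3 model has 46 layers in total (counting the layers inside the boxes in the figure) and consists of 11 Inception modules (the boxes in the figure). It contains 96 convolution layers.
Code of the Inception-v3 model, using the slim library: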
import tensorflow as tf
import tensorflow.contrib.slim as slim

trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)


def inception_v3_base(inputs,
                      final_endpoint='Mixed_7c',
                      min_depth=16,
                      depth_multiplier=1.0,
                      scope=None):
    end_points = {}

    if depth_multiplier <= 0:
        raise ValueError('depth_multiplier is not greater than zero.')
    depth = lambda d: max(int(d * depth_multiplier), min_depth)

    with tf.variable_scope(scope, 'InceptionV3', [inputs]):
        # arg_scope sets default argument values.
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                            stride=1,
                            padding='VALID'):
            # 299 × 299 × 3
            end_point = 'Conv2d_1a_3x3'  # letters, digits, and underscores only; "x" stands for ×
            # No zero padding
            net = slim.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 149 × 149 × 32
            end_point = 'Conv2d_2a_3x3'
            # No zero padding, stride 1
            net = slim.conv2d(net, depth(32), [3, 3], scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 147 × 147 × 32
            end_point = 'Conv2d_2b_3x3'
            net = slim.conv2d(net, depth(64), [3, 3], padding='SAME', scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 147 × 147 × 64
            end_point = 'MaxPool_3a_3x3'
            net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 73 × 73 × 64
            end_point = 'Conv2d_3b_1x1'
            net = slim.conv2d(net, depth(80), [1, 1], scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 73 × 73 × 80
            end_point = 'Conv2d_4a_3x3'
            net = slim.conv2d(net, depth(192), [3, 3], scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 71 × 71 × 192
            end_point = 'MaxPool_5a_3x3'
            net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 35 × 35 × 192

        # Inception blocks
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                            stride=1,
                            padding='SAME'):
            # mixed: 35 × 35 × 256
            end_point = 'Mixed_5b'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv2d_0b_1x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_1x1')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(32), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_1: 35 × 35 × 288
            end_point = 'Mixed_5c'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0b_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv_1_0c_1x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_2: 35 × 35 × 288
            end_point = 'Mixed_5d'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv2d_0b_5x5')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_3: 17 × 17 × 768
            end_point = 'Mixed_6a'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(384), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_1x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3')
                net = tf.concat([branch_0, branch_1, branch_2], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_4: 17 x 17 x 768.
            end_point = 'Mixed_6b'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
                    # The output size is unchanged even though the filter's length and width differ.
                    branch_1 = slim.conv2d(branch_1, depth(128), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(128), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(128), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(128), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_5: 17 x 17 x 768.
            end_point = 'Mixed_6c'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(160), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_6: 17 x 17 x 768.
            end_point = 'Mixed_6d'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(160), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_7: 17 x 17 x 768.
            end_point = 'Mixed_6e'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(192), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(192), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_8: 8 x 8 x 1280.
            end_point = 'Mixed_7a'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_0 = slim.conv2d(branch_0, depth(320), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_3x3')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(192), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                    branch_1 = slim.conv2d(branch_1, depth(192), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_3x3')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3')
                net = tf.concat([branch_0, branch_1, branch_2], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_9: 8 x 8 x 2048.
            end_point = 'Mixed_7b'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = tf.concat(
                        [
                            slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'),
                            slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0b_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = tf.concat(
                        [
                            slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'),
                            slim.conv2d(branch_2, depth(384), [3, 1], scope='Conv2d_0d_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_10: 8 x 8 x 2048.
            end_point = 'Mixed_7c'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = tf.concat(
                        [
                            slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'),
                            slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0c_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = tf.concat(
                        [
                            slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'),
                            slim.conv2d(branch_2, depth(384), [3, 1], scope='Conv2d_0d_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
        raise ValueError('Unknown final endpoint %s' % final_endpoint)


# Question from the notes: in the source file this function is defined after its caller,
# yet the call still works. Why? Python looks the name up when the call runs, not when the
# caller is defined, so it only has to exist by the time inception_v3() is invoked.
def _reduced_kernel_size_for_small_input(input_tensor, kernel_size):
    '''
    Define kernel size which is automatically reduced for small input.

    If the shape of the input images is unknown at graph construction time this
    function assumes that the input images are large enough.

    '''
    shape = input_tensor.get_shape().as_list()  # [?, 5, 5, 128]
    if shape[1] is None or shape[2] is None:
        kernel_size_out = kernel_size
    else:
        kernel_size_out = [min(shape[1], kernel_size[0]), min(shape[2], kernel_size[1])]
    return kernel_size_out


def inception_v3(inputs,
                 num_classes=1000,
                 is_training=True,
                 dropout_keep_prob=0.8,
                 min_depth=16,
                 depth_multiplier=1.0,
                 prediction_fn=slim.softmax,
                 spatial_squeeze=True,
                 reuse=None,
                 scope='InceptionV3'):
    if depth_multiplier <= 0:
        raise ValueError('depth_multiplier is not greater than zero.')
    depth = lambda d: max(int(d * depth_multiplier), min_depth)

    with tf.variable_scope(scope, 'InceptionV3', [inputs, num_classes], reuse=reuse) as scope:
        with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training):
            net, end_points = inception_v3_base(inputs, scope=scope, min_depth=min_depth, depth_multiplier=depth_multiplier)
            # Auxiliary Head logits
            # Question from the notes: what is this part for? It is an auxiliary classifier
            # attached to an intermediate layer (Mixed_6e), producing a second set of logits.
            with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                                stride=1,
                                padding='SAME'):
                aux_logits = end_points['Mixed_6e']  # mixed_7: 17 x 17 x 768
                with tf.variable_scope('AuxLogits'):
                    # 5 × 5 × 768
                    aux_logits = slim.avg_pool2d(aux_logits, [5, 5], stride=3, padding='VALID', scope='AvgPool_1a_5x5')
                    # 5 × 5 × 128
                    aux_logits = slim.conv2d(aux_logits, depth(128), [1, 1], scope='Conv2d_1b_1x1')

                    # shape of feature map before the final layer.
                    kernel_size = _reduced_kernel_size_for_small_input(aux_logits, [5, 5])
                    # 1 × 1 × 768: the input size equals the filter size, so the
                    # output-size formula gives 1 × 1.
                    aux_logits = slim.conv2d(aux_logits, depth(768), kernel_size, padding='VALID', weights_initializer=trunc_normal(0.01), scope='Conv2d_2a_{}x{}'.format(*kernel_size))
                    # 1 × 1 × 1000
                    aux_logits = slim.conv2d(aux_logits, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, weights_initializer=trunc_normal(0.001), scope='Conv2d_2b_1x1')
                    if spatial_squeeze:
                        # (?, 1000)
                        aux_logits = tf.squeeze(aux_logits, [1, 2], name='SpatialSqueeze')
                    end_points['AuxLogits'] = aux_logits

            # final pooling and prediction
            with tf.variable_scope('Logits'):
                kernel_size = _reduced_kernel_size_for_small_input(net, [8, 8])
                # 1 × 1 × 2048
                net = slim.avg_pool2d(net, kernel_size, padding='VALID', scope='AvgPool_1a_{}x{}'.format(*kernel_size))
                # 1 × 1 × 2048
                # Note from the notes: there is a dropout here, applied before the final
                # classification layer.
                net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
                end_points['PreLogits'] = net
                # 1 × 1 × 1000. The result must be assigned; the original listing discarded it
                # and squeezed `net` instead.
                logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='Conv2d_1c_1x1')
                if spatial_squeeze:
                    # (?, 1000)
                    logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
                end_points['Logits'] = logits
                end_points['Predictions'] = slim.softmax(logits, scope='Predictions')

    return logits, end_points


# Used when defining the model for transfer learning.
def inception_v3_arg_scope(weight_decay=0.00004,
                           batch_norm_var_collection='moving_vars',
                           batch_norm_decay=0.9997,
                           batch_norm_epsilon=0.001,
                           updates_collections=tf.GraphKeys.UPDATE_OPS,
                           use_fused_batchnorm=True):
    """Defines the default InceptionV3 arg scope.
    Returns:
        An `arg_scope` to use for the inception v3 model.
    """
    batch_norm_params = {
        # Decay for the moving averages.
        'decay': batch_norm_decay,
        # epsilon to prevent 0s in variance.
        'epsilon': batch_norm_epsilon,
        # collection containing update_ops.
        'updates_collections': updates_collections,
        # Use fused batch norm if possible.
        'fused': use_fused_batchnorm,
        # collection containing the moving mean and moving variance.
        'variables_collections': {
            'beta': None,
            'gamma': None,
            'moving_mean': [batch_norm_var_collection],
            'moving_variance': [batch_norm_var_collection],
        }
    }

    # Set weight_decay for weights in Conv and FC layers.
    with slim.arg_scope([slim.conv2d, slim.fully_connected], weights_regularizer=slim.l2_regularizer(weight_decay)):
        with slim.arg_scope(
                [slim.conv2d],
                weights_initializer=slim.variance_scaling_initializer(),
                activation_fn=tf.nn.relu,
                normalizer_fn=slim.batch_norm,
                normalizer_params=batch_norm_params) as sc:
            return sc


inputs = tf.placeholder(tf.float32, shape=[None, 299, 299, 3], name='X')
# inception_v3_base(inputs)
inception_v3(inputs)
The input layer is a 299 × 299 × 3 three-dimensional matrix.
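The shape comments in the code above (for example 1 × 1 × 768 and 1 × 1 × 2048) follow from the usual output-size formulas for VALID and SAME padding. The sketch below is only an illustration: the helper name `output_size` is hypothetical, and the 5 × 5 and 8 × 8 feature-map sizes assume the 299 × 299 input stated above.

```python
import math


def output_size(input_size, filter_size, stride, padding):
    """Output length along one spatial dimension of a conv or pooling layer."""
    if padding == 'VALID':
        # No padding: the filter has to fit entirely inside the input.
        return math.ceil((input_size - filter_size + 1) / stride)
    if padding == 'SAME':
        # Zero padding: the output length depends only on the stride.
        return math.ceil(input_size / stride)
    raise ValueError('padding must be "VALID" or "SAME"')


# 5 x 5 filter on a 5 x 5 feature map (auxiliary head, Conv2d_2a)
aux_size = output_size(5, 5, 1, 'VALID')
# 8 x 8 average pooling on an 8 x 8 feature map (final pooling layer)
pool_size = output_size(8, 8, 1, 'VALID')
print(aux_size, pool_size)
```

When the filter is as large as its input and the padding is VALID, the output collapses to a single position, which is why both layers end up with a 1 × 1 spatial size.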
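6.5 Transfer Learning with Convolutional Neural Networks
Transfer learning means taking a model trained on one problem and, with simple adjustments, making it suitable for a new problem.
An Inception-v3 model trained on the ImageNet dataset can be used to solve a new image classification problem. The parameters of all convolutional layers in the trained Inception-v3 model are kept, and only the final fully connected layer is replaced. The network layer just before this final fully connected layer is called the bottleneck layer. The bottleneck refers to a single layer.
Generally speaking, when enough data is available, transfer learning performs worse than training from scratch. Its advantage is that it needs far less training time and far fewer training samples.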
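Keeping the convolutional parameters while replacing only the final layer amounts to splitting the model's variables into two groups by scope prefix. The sketch below shows that idea in plain Python with strings in place of real TensorFlow variables. The function name `split_by_scope` is hypothetical, and the variable names are illustrative examples of Inception-v3 scopes.

```python
def split_by_scope(variable_names, retrain_scopes):
    """Partition variable names into (restore, retrain) lists by scope prefix."""
    restore, retrain = [], []
    for name in variable_names:
        if any(name.startswith(scope) for scope in retrain_scopes):
            retrain.append(name)   # final layers: re-initialized and trained
        else:
            restore.append(name)   # convolutional layers: loaded from the checkpoint
    return restore, retrain


# Illustrative names only; a real model has many more variables.
names = [
    'InceptionV3/Conv2d_1a_3x3/weights',
    'InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/weights',
    'InceptionV3/Logits/Conv2d_1c_1x1/weights',
    'InceptionV3/AuxLogits/Conv2d_2b_1x1/weights',
]
restore, retrain = split_by_scope(names, ['InceptionV3/Logits', 'InceptionV3/AuxLogits'])
print(len(restore), 'to restore,', len(retrain), 'to retrain')
```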
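Data preprocessing for transfer learning
Example preprocessing script. It needed a machine with 2 cores and 8 GB of memory to run.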
import os
import glob
import tensorflow as tf
import numpy as np

INPUT_DATA = '/home/yangxl/flower_photos'  # input directory
OUTPUT_DATA = '/home/yangxl/flower_processed_data.npy'  # output file

VALIDATION_PERCENTAGE = 10
TEST_PERCENTAGE = 10

def create_image_lists(sess, testing_percentage, validation_percentage):
    sub_dirs = [x[0] for x in os.walk(INPUT_DATA)]  # the root directory followed by its subdirectories
    # print(sub_dirs)
    is_root_dir = True

    # Initialize the datasets
    training_images = []
    training_labels = []
    testing_images = []
    testing_labels = []
    validation_images = []
    validation_labels = []
    current_labels = 0

    # Walk through all subdirectories
    for sub_dir in sub_dirs:
        if is_root_dir:  # skip the first entry, which is the root directory itself
            is_root_dir = False
            continue

        # Collect all image files in this subdirectory
        extensions = ['jpg', 'jpeg', 'JPG', 'JPEG']
        file_list = []
        dir_name = os.path.basename(sub_dir)  # the part after the last '/'
        print(dir_name)
        for extension in extensions:
            file_glob = os.path.join(INPUT_DATA, dir_name, '*.' + extension)
            # glob.glob returns the list of paths matching the pattern
            file_list.extend(glob.glob(file_glob))
        if not file_list:
            continue

        # Process the image data
        for file_name in file_list:
            image_raw_data = tf.gfile.GFile(file_name, 'rb').read()  # raw bytes
            image = tf.image.decode_jpeg(image_raw_data)  # tensor, dtype=uint8, e.g. 333×500×3, channel values 0~255
            if image.dtype != tf.float32:
                image = tf.image.convert_image_dtype(image, dtype=tf.float32)  # channel values 0~1
            image = tf.image.resize_images(image, [299, 299])
            image_value = sess.run(image)  # numpy.ndarray

            # Randomly assign the image to one of the datasets
            chance = np.random.randint(100)
            if chance < validation_percentage:
                validation_images.append(image_value)
                validation_labels.append(current_labels)
            elif chance < validation_percentage + testing_percentage:
                testing_images.append(image_value)
                testing_labels.append(current_labels)
            else:
                training_images.append(image_value)
                training_labels.append(current_labels)
        current_labels += 1

    # Shuffle the training data for better training results. Restoring the same
    # random state before the second shuffle keeps training_images and
    # training_labels aligned.
    state = np.random.get_state()
    np.random.shuffle(training_images)
    np.random.set_state(state)
    np.random.shuffle(training_labels)

    print("it's time to return")
    # dtype=object: the six lists have different lengths, and recent NumPy
    # versions refuse to build such a ragged array implicitly.
    return np.asarray([training_images, training_labels,
                       validation_images, validation_labels,
                       testing_images, testing_labels], dtype=object)

def main():
    with tf.Session() as sess:
        processed_data = create_image_lists(sess, TEST_PERCENTAGE, VALIDATION_PERCENTAGE)
        # Save the processed data in NumPy format
        np.save(OUTPUT_DATA, processed_data)

if __name__ == '__main__':
    main()
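The two data-handling ideas in the script, the percentage-based random split and the shuffle that keeps images and labels aligned, can be tried without TensorFlow. The sketch below uses the standard `random` module instead of NumPy and shuffles one index permutation instead of replaying the random state. The names `split_samples` and `shuffle_in_unison` are hypothetical, and the sample data is made up for illustration.

```python
import random


def split_samples(samples, validation_percentage, testing_percentage, rng):
    """Randomly assign each sample to the training, validation, or test list."""
    train, validation, test = [], [], []
    for sample in samples:
        chance = rng.randrange(100)  # uniform integer in [0, 100)
        if chance < validation_percentage:
            validation.append(sample)
        elif chance < validation_percentage + testing_percentage:
            test.append(sample)
        else:
            train.append(sample)
    return train, validation, test


def shuffle_in_unison(images, labels, rng):
    """Shuffle two equally long lists with the same permutation."""
    order = list(range(len(images)))
    rng.shuffle(order)
    return [images[i] for i in order], [labels[i] for i in order]


rng = random.Random(0)
# Fake samples: the label can be recomputed from the name, which makes it
# easy to check that pairs stay aligned after shuffling.
samples = [('img_%d' % i, i % 5) for i in range(1000)]
train, validation, test = split_samples(samples, 10, 10, rng)
images, labels = shuffle_in_unison([s[0] for s in train], [s[1] for s in train], rng)
print(len(train), len(validation), len(test))
```

The split is probabilistic, so the validation and test sets each hold roughly 10% of the samples, not exactly 10%.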
A more convenient way to compute the cross-entropy loss:
tf.losses.softmax_cross_entropy(tf.one_hot(labels, N_CLASSES), logits, weights=1.0)
train_step = tf.train.RMSPropOptimizer(LEARNING_RATE).minimize(tf.losses.get_total_loss())
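To make clear what this loss measures, here is a minimal pure-Python sketch of softmax cross-entropy for one example whose label is a class index (the equivalent of a one-hot vector). It only illustrates the per-example arithmetic. It does not reproduce how `tf.losses.softmax_cross_entropy` reduces over a batch or registers the loss in a collection, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(label, logits):
    """Cross-entropy between a one-hot label (given as a class index) and softmax(logits)."""
    return -math.log(softmax(logits)[label])


# Five classes, as in the flower dataset. Equal scores give a loss of log(5).
loss_uniform = softmax_cross_entropy(0, [0.0, 0.0, 0.0, 0.0, 0.0])
# A confident, correct prediction gives a loss close to 0.
loss_confident = softmax_cross_entropy(2, [0.0, 0.0, 10.0, 0.0, 0.0])
print(loss_uniform, loss_confident)
```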
Transfer learning example:
# coding: utf-8

import tensorflow as tf
import numpy as np
import tensorflow.contrib.slim as slim

# Load the Inception-v3 model definition
import tensorflow.contrib.slim.python.slim.nets.inception_v3 as inception_v3

INPUT_DATA = '/home/yangxl/files/flower_processed_data.npy'

TRAIN_FILE = '/home/yangxl/files/save_model'
CKPT_FILE = '/home/yangxl/files/inception_v3.ckpt'

LEARNING_RATE = 0.0001
STEPS = 300
BATCH = 32
N_CLASSES = 5  # 5 kinds of flowers

CHECKPOINT_EXCLUDE_SCOPES = 'InceptionV3/Logits,InceptionV3/AuxLogits'
TRAINABLE_SCOPES = 'InceptionV3/Logits,InceptionV3/AuxLogits'

# Collect all variables that should be loaded from the trained model
def get_tuned_variables():
    exclusions = [scope.strip() for scope in CHECKPOINT_EXCLUDE_SCOPES.split(',')]
    variables_to_restore = []

    # Filter the variables
    for var in slim.get_model_variables():  # variables exist only after the Inception-v3 model is defined
        excluded = False
        for exclusion in exclusions:
            if var.op.name.startswith(exclusion):
                excluded = True
                break
        if not excluded:
            variables_to_restore.append(var)
    return variables_to_restore

# Collect the list of all variables that need to be trained
def get_trainable_variables():
    scopes = [scope.strip() for scope in TRAINABLE_SCOPES.split(',')]
    variables_to_train = []
    for scope in scopes:
        # scope is matched against variable names as a regular expression
        variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope)
        # extend, not append, so that the result is one flat list of variables
        variables_to_train.extend(variables)
    return variables_to_train

def main(arg=None):
    # allow_pickle=True is needed because the file stores an object array
    processed_data = np.load(INPUT_DATA, allow_pickle=True)
    training_images = processed_data[0]
    n_training_example = len(training_images)
    training_labels = processed_data[1]
    validation_images = processed_data[2]
    validation_labels = processed_data[3]
    testing_images = processed_data[4]
    testing_labels = processed_data[5]
    print('%d training examples, %d validation examples and %d testing examples.'
          % (n_training_example, len(validation_labels), len(testing_labels)))

    images = tf.placeholder(tf.float32, [None, 299, 299, 3], name='input_images')
    labels = tf.placeholder(tf.int64, [None], name='labels')  # 5 kinds of flowers

    # Define the Inception-v3 model. Google provides only the parameter values,
    # so the Inception-v3 structure has to be defined in this code.
    with slim.arg_scope(inception_v3.inception_v3_arg_scope()):
        # inception_v3.inception_v3_arg_scope() is a dict with two keys. The dicts
        # returned by the nested arg_scope calls are merged together.
        # Functions called inside inception_v3.inception_v3 may use the arguments in this dict.
        logits, _ = inception_v3.inception_v3(images, num_classes=N_CLASSES)

    # Get the variables that need to be trained
    trainable_variables = get_trainable_variables()
    # print('==', len(trainable_variables), trainable_variables)
    '''
    [<tf.Variable 'InceptionV3/Logits/Conv2d_1c_1x1/weights:0' shape=(1, 1, 2048, 5) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/Logits/Conv2d_1c_1x1/biases:0' shape=(5,) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_1b_1x1/weights:0' shape=(1, 1, 768, 128) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_1b_1x1/BatchNorm/beta:0' shape=(128,) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2a_5x5/weights:0' shape=(5, 5, 128, 768) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2a_5x5/BatchNorm/beta:0' shape=(768,) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2b_1x1/weights:0' shape=(1, 1, 768, 5) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2b_1x1/biases:0' shape=(5,) dtype=float32_ref>]
    '''
    # Define the loss function. The regularization losses were already added to
    # the loss collection when the model was defined.
    tf.losses.softmax_cross_entropy(tf.one_hot(labels, N_CLASSES), logits, weights=1.0)

    # Define the training step. var_list restricts the update to the variables
    # selected above; without it, every trainable variable in the network would be updated.
    train_step = tf.train.RMSPropOptimizer(LEARNING_RATE).minimize(
        tf.losses.get_total_loss(), var_list=trainable_variables)

    # Compute the accuracy
    with tf.name_scope('evaluation'):
        correct_prediction = tf.equal(tf.argmax(logits, 1), labels)
        evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Define the function that loads the model. It returns a callback; calling
    # callback(sess) loads the variables listed by get_tuned_variables() into the current graph.
    load_fn = slim.assign_from_checkpoint_fn(CKPT_FILE, get_tuned_variables(), ignore_missing_vars=True)

    # Define the saver for the newly trained model
    saver = tf.train.Saver()

    with tf.Session() as sess:
        # Initialize the variables that are not loaded from the checkpoint. This must
        # happen before loading the model; otherwise initialization would overwrite
        # the variables that were already loaded.
        tf.global_variables_initializer().run()
        # Load the trained model
        print('Loading tuned variables from %s' % CKPT_FILE)
        load_fn(sess)

        start = 0
        end = BATCH
        for i in range(STEPS):
            sess.run(train_step, feed_dict={
                images: training_images[start: end],
                labels: training_labels[start: end]
            })

            # Log progress
            if i % 30 == 0 or i + 1 == STEPS:
                saver.save(sess, TRAIN_FILE, global_step=i)
                validation_accuracy = sess.run(evaluation_step, feed_dict={
                    images: validation_images, labels: validation_labels
                })
                print('Step %d: Validation accuracy = %.1f%%' % (i, validation_accuracy * 100.0))

            start = end
            if start == n_training_example:
                start = 0
            end = start + BATCH
            if end > n_training_example:
                end = n_training_example
        test_accuracy = sess.run(evaluation_step, feed_dict={
            images: testing_images, labels: testing_labels
        })
        print('Final test accuracy = %.1f%%' % (test_accuracy * 100.0))

if __name__ == '__main__':
    tf.app.run()
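The `start`/`end` bookkeeping in the training loop walks through the training set batch by batch and wraps around at the end, with a shorter last batch when the set size is not a multiple of the batch size. The sketch below isolates that logic as a generator so it can be checked on small numbers. The name `batch_ranges` is hypothetical.

```python
def batch_ranges(n_examples, batch_size, steps):
    """Yield (start, end) slice bounds for `steps` consecutive batches, wrapping around."""
    start, end = 0, min(batch_size, n_examples)
    for _ in range(steps):
        yield start, end
        start = end
        if start == n_examples:  # reached the end of the data: start over
            start = 0
        end = min(start + batch_size, n_examples)


# 5 examples in batches of 2: the third batch is short, the fourth wraps around.
ranges = list(batch_ranges(n_examples=5, batch_size=2, steps=4))
print(ranges)  # [(0, 2), (2, 4), (4, 5), (0, 2)]
```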
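Execution notes:
The script ran for 12 hours, yet the TIME+ column in top showed only a little over 300 minutes. TIME+ reports the CPU time the process has consumed, not wall-clock time, so the gap suggests the process spent most of those 12 hours waiting instead of computing.
During the run, the load average was quite high while the process's CPU and MEM usage stayed low. A possible explanation, not verified here, is that the system was busy moving pages between memory and swap. On Linux the load average also counts tasks blocked in uninterruptible I/O wait.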
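The difference between elapsed time and CPU time can be reproduced on a small scale. In the sketch below a sleeping call stands in for a process that is waiting (for example on I/O). This is an assumption made for illustration, not a measurement of the training run, and the helper name `measure` is hypothetical.

```python
import time


def measure(work):
    """Return (wall-clock seconds, CPU seconds) spent while running work()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


# Sleeping consumes wall-clock time but almost no CPU time.
wall, cpu = measure(lambda: time.sleep(0.2))
print('wall-clock: %.3f s, CPU: %.3f s' % (wall, cpu))
```

A process that mostly waits shows the same pattern in top: the elapsed time keeps growing while TIME+ barely moves.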