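1. Introduction to TF-Slim
TF-Slim is a lightweight TensorFlow library for building, training, and evaluating complex models. It can be mixed freely with other TensorFlow APIs.
TF-Slim makes building, training, and evaluating models simpler. Even so, I kept running into problems when using it myself, so I decided to read the network source code to deepen my understanding, and I am sharing my notes here. If I have misunderstood anything, please point it out.
2. AlexNet network structure
AlexNet has five convolutional layers, three pooling layers, and three fully connected layers. With the structure in mind, let's look at the code!
3. The AlexNet code in TF-Slim
3.1 Importing the packages the model needs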
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from tensorflow.contrib import layers
from tensorflow.contrib.framework.python.ops import arg_scope
from tensorflow.contrib.layers.python.layers import layers as layers_lib
from tensorflow.contrib.layers.python.layers import regularizers
from tensorflow.contrib.layers.python.layers import utils
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import init_ops
from tensorflow.python.ops import nn_ops
from tensorflow.python.ops import variable_scope
3.2 The AlexNet network function
# Helper used for the fc6/fc7/fc8 weights below: a truncated-normal
# initializer with mean 0.0 and the given stddev. It is not defined in the
# listing as quoted, so it is added here to make the function self-contained.
trunc_normal = lambda stddev: init_ops.truncated_normal_initializer(0.0, stddev)


def alexnet_v2(inputs,
               num_classes=1000,
               is_training=True,
               dropout_keep_prob=0.5,
               spatial_squeeze=True,
               scope='alexnet_v2'):
  """AlexNet version 2.

  Described in: http://arxiv.org/pdf/1404.5997v2.pdf
  Parameters from:
  github.com/akrizhevsky/cuda-convnet2/blob/master/layers/
  layers-imagenet-1gpu.cfg

  Note: All the fully_connected layers have been transformed to conv2d layers.
        To use in classification mode, resize input to 224x224. To use in fully
        convolutional mode, set spatial_squeeze to false.
        The LRN layers have been removed and change the initializers from
        random_normal_initializer to xavier_initializer.

  Args:
    inputs: a tensor of size [batch_size, height, width, channels].
    num_classes: number of predicted classes.
    is_training: whether or not the model is being trained.
    dropout_keep_prob: the probability that activations are kept in the dropout
      layers during training.
    spatial_squeeze: whether or not should squeeze the spatial dimensions of the
      outputs. Useful to remove unnecessary dimensions for classification.
    scope: Optional scope for the variables.

  Returns:
    the last op containing the log predictions and end_points dict.
  """
  with variable_scope.variable_scope(scope, 'alexnet_v2', [inputs]) as sc:
    end_points_collection = sc.original_name_scope + '_end_points'
    # Collect outputs for conv2d, fully_connected and max_pool2d.
    with arg_scope(
        [layers.conv2d, layers_lib.fully_connected, layers_lib.max_pool2d],
        outputs_collections=[end_points_collection]):
      net = layers.conv2d(
          inputs, 64, [11, 11], 4, padding='VALID', scope='conv1')
      net = layers_lib.max_pool2d(net, [3, 3], 2, scope='pool1')
      net = layers.conv2d(net, 192, [5, 5], scope='conv2')
      net = layers_lib.max_pool2d(net, [3, 3], 2, scope='pool2')
      net = layers.conv2d(net, 384, [3, 3], scope='conv3')
      net = layers.conv2d(net, 384, [3, 3], scope='conv4')
      net = layers.conv2d(net, 256, [3, 3], scope='conv5')
      net = layers_lib.max_pool2d(net, [3, 3], 2, scope='pool5')

      # Use conv2d instead of fully_connected layers.
      with arg_scope(
          [layers.conv2d],
          weights_initializer=trunc_normal(0.005),
          biases_initializer=init_ops.constant_initializer(0.1)):
        net = layers.conv2d(net, 4096, [5, 5], padding='VALID', scope='fc6')
        net = layers_lib.dropout(
            net, dropout_keep_prob, is_training=is_training, scope='dropout6')
        net = layers.conv2d(net, 4096, [1, 1], scope='fc7')
        net = layers_lib.dropout(
            net, dropout_keep_prob, is_training=is_training, scope='dropout7')
        net = layers.conv2d(
            net,
            num_classes, [1, 1],
            activation_fn=None,
            normalizer_fn=None,
            biases_initializer=init_ops.zeros_initializer(),
            scope='fc8')

      # Convert end_points_collection into a end_point dict.
      end_points = utils.convert_collection_to_dict(end_points_collection)
      if spatial_squeeze:
        net = array_ops.squeeze(net, [1, 2], name='fc8/squeezed')
        end_points[sc.name + '/fc8'] = net
      return net, end_points
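First, the arguments passed to the function.
inputs: a batch of images as a tensor of shape [batch_size, height, width, channels]. In classification mode each image should be resized to 224x224, giving [batch_size, 224, 224, channels].
num_classes: the number of classes, which sets the size of the final FC layer's output. With the default of 1000, a batch size of 64, and spatial_squeeze=True, the returned tensor has shape [64, 1000].
is_training=True: flag for training mode. It controls the dropout applied after fc6 and fc7. When True, dropout is active. When False, the two calls below simply return their input, so dropout does nothing.
net = layers_lib.dropout(
    net, dropout_keep_prob, is_training=is_training, scope='dropout6')
net = layers_lib.dropout(
    net, dropout_keep_prob, is_training=is_training, scope='dropout7')
dropout_keep_prob: the probability that each activation is kept during dropout. The default is 0.5.
spatial_squeeze: flag for squeezing the spatial dimensions. For image classification the return value should have shape [batch_size, num_classes], but fc8 outputs [batch_size, 1, 1, num_classes], so dimensions 1 and 2 have to be dropped. The code below does exactly that:
if spatial_squeeze:
  net = array_ops.squeeze(net, [1, 2], name='fc8/squeezed')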
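To make the interaction between is_training and dropout_keep_prob concrete, here is a minimal pure-Python sketch of inverted dropout on a flat list of numbers. The name dropout_sketch and its rng argument are hypothetical and exist only for this illustration. This is not the TF-Slim implementation, just the same idea: in training mode each value is kept with probability keep_prob and the survivors are scaled by 1 / keep_prob, and outside training the input is returned unchanged.

```python
import random


def dropout_sketch(values, keep_prob, is_training, rng=None):
    """Toy inverted dropout over a list of floats (illustration only)."""
    if not is_training:
        # Inference mode: the input passes through unchanged.
        return list(values)
    rng = rng or random.Random()
    # Training mode: keep each value with probability keep_prob and scale
    # the survivors by 1 / keep_prob so the expected value stays the same.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
eval_out = dropout_sketch(activations, keep_prob=0.5, is_training=False)
train_out = dropout_sketch(activations, keep_prob=0.5, is_training=True,
                           rng=random.Random(0))
print(eval_out)   # identical to the input
print(train_out)  # each entry is either 0.0 or the input divided by 0.5
```

This is why is_training should be False at evaluation time: predictions are then deterministic and use every activation.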
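The first layer (conv1) convolves the input with 64 kernels of size [11, 11], a stride of 4, and padding mode 'VALID'. The other padding mode is 'SAME'. The default activation function is ReLU.
For the details of these operations, see: https://www.cnblogs.com/White-xzx/p/9497029.html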
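If the input is $W \times W$, the kernel is $F \times F$, and the stride is $S$, then with 'VALID' padding (no padding is added) the output size along each spatial dimension, rounded up, is:
$$\left\lceil \frac{W - F + 1}{S} \right\rceil$$
With 'SAME' padding the output size along each spatial dimension, rounded up, is:
$$\left\lceil \frac{W}{S} \right\rceil$$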
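As a worked example of the two formulas, the sketch below traces the spatial size of a 224x224 input through every layer of alexnet_v2. The helper names conv_out_size and trace_sizes are hypothetical. The kernel sizes and strides are read off the code above, and the padding modes assume the layer defaults ('SAME' for conv2d, 'VALID' for max_pool2d) wherever the code does not pass padding explicitly and no outer arg_scope overrides them.

```python
import math


def conv_out_size(size, kernel, stride, padding):
    """Output size along one spatial dimension for a conv or pooling layer."""
    if padding == 'VALID':
        return math.ceil((size - kernel + 1) / stride)
    if padding == 'SAME':
        return math.ceil(size / stride)
    raise ValueError('unknown padding: %r' % padding)


# (name, kernel, stride, padding) for each layer of alexnet_v2.
LAYERS = [
    ('conv1', 11, 4, 'VALID'),
    ('pool1', 3, 2, 'VALID'),
    ('conv2', 5, 1, 'SAME'),
    ('pool2', 3, 2, 'VALID'),
    ('conv3', 3, 1, 'SAME'),
    ('conv4', 3, 1, 'SAME'),
    ('conv5', 3, 1, 'SAME'),
    ('pool5', 3, 2, 'VALID'),
    ('fc6', 5, 1, 'VALID'),
    ('fc7', 1, 1, 'SAME'),
    ('fc8', 1, 1, 'SAME'),
]


def trace_sizes(input_size):
    """Return {layer name: spatial size after that layer}."""
    sizes = {}
    size = input_size
    for name, kernel, stride, padding in LAYERS:
        size = conv_out_size(size, kernel, stride, padding)
        sizes[name] = size
    return sizes


sizes = trace_sizes(224)
for name, _, _, _ in LAYERS:
    print(name, sizes[name])  # conv1 54, pool1 26, pool2 12, pool5 5, fc6 1

# With batch size 64 and 1000 classes, fc8 yields [64, 1, 1, 1000].
fc8_shape = [64, sizes['fc8'], sizes['fc8'], 1000]
# Dropping axes 1 and 2 (both of size 1) mirrors squeeze(net, [1, 2]).
squeezed_shape = [d for i, d in enumerate(fc8_shape) if i not in (1, 2)]
print(squeezed_shape)  # [64, 1000]
```

The trace shows why fc6 uses a [5, 5] kernel with 'VALID' padding: pool5 leaves a 5x5 map, so that convolution covers the whole map and acts like a fully connected layer.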
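FC layers: convolutions are used in place of fully connected layers, in the same way as the layers above. The net that is finally returned is the [batch_size, num_classes] output we need.
It can then be used to compute the loss.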
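The post does not say which loss is used, so the following is only a sketch under the assumption of a standard softmax cross-entropy classification loss. Because fc8 is built with activation_fn=None, the returned net holds unnormalized scores (logits). The function softmax_cross_entropy below is a hypothetical pure-Python helper that shows the arithmetic for one example. In a real TensorFlow 1.x graph you would more likely call a built-in such as tf.losses.softmax_cross_entropy.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy for one example from raw logits and an integer label."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    # -log(softmax(logits)[label]) == log_sum_exp - logits[label]
    return log_sum_exp - logits[label]


# Toy batch: 2 examples, 3 classes (stand-in for a [batch_size, num_classes] net).
batch_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
batch_labels = [0, 2]
losses = [softmax_cross_entropy(l, y)
          for l, y in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)
print(losses)
print(mean_loss)
```

The second example has uniform logits, so its loss is log(3), the value for a classifier that has learned nothing about three classes.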