[Source Code Walkthrough] TensorFlow Slim AlexNet

Latest recommended article published 2019-11-12 16:44:44

puchapu

Category: source code. Tags: source code walkthrough

Article link: https://blog.csdn.net/puchapu/article/details/90049384

Code source: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/slim/python/slim/nets/alexnet.py

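1. Introduction to TF-Slim

TF-Slim is a lightweight library in TensorFlow for building, training, and evaluating complex models. TF-Slim modules can be mixed freely with other TensorFlow APIs.
Slim makes model construction, training, and evaluation simpler. Even so, I ran into quite a few problems when using it myself, so I decided to read the network source code to deepen my understanding, and I am sharing my notes here. If I have misunderstood anything, please point it out.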

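2. AlexNet network structure

[Figure: AlexNet network structure]
AlexNet contains five convolutional layers, three pooling layers, and three fully connected layers. With the network structure in mind, let's look at the code.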

3. The AlexNet code in TF-Slim

3.1 Importing the packages the model needs

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from tensorflow.contrib import layers
from tensorflow.contrib.framework.python.ops import arg_scope
from tensorflow.contrib.layers.python.layers import layers as layers_lib
from tensorflow.contrib.layers.python.layers import regularizers
from tensorflow.contrib.layers.python.layers import utils
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import init_ops
from tensorflow.python.ops import nn_ops
from tensorflow.python.ops import variable_scope

# Helper defined at module level in alexnet.py; alexnet_v2 uses it as the
# weights initializer for the fc6-fc8 layers.
trunc_normal = lambda stddev: init_ops.truncated_normal_initializer(0.0, stddev)

3.2 The AlexNet network function

def alexnet_v2(inputs,
               num_classes=1000,
               is_training=True,
               dropout_keep_prob=0.5,
               spatial_squeeze=True,
               scope='alexnet_v2'):
  """AlexNet version 2.
  Described in: http://arxiv.org/pdf/1404.5997v2.pdf
  Parameters from:
  github.com/akrizhevsky/cuda-convnet2/blob/master/layers/
  layers-imagenet-1gpu.cfg
  Note: All the fully_connected layers have been transformed to conv2d layers.
        To use in classification mode, resize input to 224x224. To use in fully
        convolutional mode, set spatial_squeeze to false.
        The LRN layers have been removed and change the initializers from
        random_normal_initializer to xavier_initializer.
  Args:
    inputs: a tensor of size [batch_size, height, width, channels].
    num_classes: number of predicted classes.
    is_training: whether or not the model is being trained.
    dropout_keep_prob: the probability that activations are kept in the dropout
      layers during training.
    spatial_squeeze: whether or not should squeeze the spatial dimensions of the
      outputs. Useful to remove unnecessary dimensions for classification.
    scope: Optional scope for the variables.
  Returns:
    the last op containing the log predictions and end_points dict.
  """
  with variable_scope.variable_scope(scope, 'alexnet_v2', [inputs]) as sc:
    end_points_collection = sc.original_name_scope + '_end_points'
    # Collect outputs for conv2d, fully_connected and max_pool2d.
    with arg_scope(
        [layers.conv2d, layers_lib.fully_connected, layers_lib.max_pool2d],
        outputs_collections=[end_points_collection]):
      net = layers.conv2d(
          inputs, 64, [11, 11], 4, padding='VALID', scope='conv1')
      net = layers_lib.max_pool2d(net, [3, 3], 2, scope='pool1')
      net = layers.conv2d(net, 192, [5, 5], scope='conv2')
      net = layers_lib.max_pool2d(net, [3, 3], 2, scope='pool2')
      net = layers.conv2d(net, 384, [3, 3], scope='conv3')
      net = layers.conv2d(net, 384, [3, 3], scope='conv4')
      net = layers.conv2d(net, 256, [3, 3], scope='conv5')
      net = layers_lib.max_pool2d(net, [3, 3], 2, scope='pool5')

      # Use conv2d instead of fully_connected layers.
      with arg_scope(
          [layers.conv2d],
          weights_initializer=trunc_normal(0.005),
          biases_initializer=init_ops.constant_initializer(0.1)):
        net = layers.conv2d(net, 4096, [5, 5], padding='VALID', scope='fc6')
        net = layers_lib.dropout(
            net, dropout_keep_prob, is_training=is_training, scope='dropout6')
        net = layers.conv2d(net, 4096, [1, 1], scope='fc7')
        net = layers_lib.dropout(
            net, dropout_keep_prob, is_training=is_training, scope='dropout7')
        net = layers.conv2d(
            net,
            num_classes, [1, 1],
            activation_fn=None,
            normalizer_fn=None,
            biases_initializer=init_ops.zeros_initializer(),
            scope='fc8')

      # Convert end_points_collection into a end_point dict.
      end_points = utils.convert_collection_to_dict(end_points_collection)
      if spatial_squeeze:
        net = array_ops.squeeze(net, [1, 2], name='fc8/squeezed')
        end_points[sc.name + '/fc8'] = net
      return net, end_points

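First, look at the arguments passed to the function.
inputs: a batch tensor of shape [batch_size, height, width, channels]. With the default configuration each image should be resized to 224x224, giving [batch_size, 224, 224, channels].
num_classes: the number of classes, which determines the size of the final FC output (with the default of 1000 and a batch size of 64, the returned shape is [64, 1000]).
is_training (default True): flag indicating training mode. It controls the dropout applied after fc6 and fc7. If True (training mode), dropout is active. If False (non-training mode), both calls below return their input unchanged, i.e. dropout is disabled.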

net = layers_lib.dropout(
    net, dropout_keep_prob, is_training=is_training, scope='dropout6')
net = layers_lib.dropout(
    net, dropout_keep_prob, is_training=is_training, scope='dropout7')

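dropout_keep_prob: the probability that each unit is kept during dropout. The default is 0.5.
spatial_squeeze: flag indicating whether to squeeze the spatial dimensions. For image classification the returned value should have shape [batch_size, num_classes], but fc8 outputs [batch_size, 1, 1, num_classes], so dimensions 1 and 2 of the output have to be dropped. The code below does exactly that: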

if spatial_squeeze:
    net = array_ops.squeeze(net, [1, 2], name='fc8/squeezed')
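The roles of is_training and dropout_keep_prob described above can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the library's implementation: the function name `dropout_sketch` and the `rng` parameter are hypothetical, and it assumes the usual "inverted dropout" behaviour in which kept units are scaled by 1/keep_prob.

```python
import random


def dropout_sketch(values, keep_prob, is_training, rng=None):
    """Illustrative inverted dropout on a flat list of activations."""
    if not is_training:
        # Non-training mode: the input is returned unchanged.
        return list(values)
    rng = rng or random.Random()
    out = []
    for v in values:
        if rng.random() < keep_prob:
            # Kept units are scaled so the expected activation is unchanged.
            out.append(v / keep_prob)
        else:
            # Dropped units are zeroed.
            out.append(0.0)
    return out


activations = [1.0, 2.0, 3.0, 4.0]
eval_out = dropout_sketch(activations, keep_prob=0.5, is_training=False)
train_out = dropout_sketch(activations, keep_prob=0.5, is_training=True,
                           rng=random.Random(0))
print(eval_out)
print(train_out)
```

With is_training=False the list comes back as-is; with is_training=True each element is either zeroed or doubled (for keep_prob=0.5).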

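Now for the first layer, conv1: it convolves the input with 64 kernels of size [11, 11], with stride 4 and padding mode 'VALID'. The other padding mode is 'SAME'. The default activation function is ReLU.
For details of the operation see: https://www.cnblogs.com/White-xzx/p/9497029.html
If the input is ${W\times W}$, the kernel is ${F\times F}$, and the stride is $S$, then
with 'VALID' padding the output side length is (rounded up):
$\lceil (W - F + 1) / S \rceil$
and with 'SAME' padding the output side length is (rounded up):
$\lceil W / S \rceil$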
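To see how the two formulas play out, the sketch below applies them layer by layer to a 224x224 input. The helper names and the `LAYERS` table are hypothetical and written only for this illustration. It assumes the layer defaults in the code above: stride 1 and 'SAME' padding for conv2d unless stated otherwise, and 'VALID' padding for max_pool2d.

```python
import math


def output_size(size, kernel, stride, padding):
    """Spatial output size for one dimension, using the formulas above."""
    if padding == 'VALID':
        return math.ceil((size - kernel + 1) / stride)
    if padding == 'SAME':
        return math.ceil(size / stride)
    raise ValueError('unknown padding: %r' % padding)


# (scope name, kernel size, stride, padding), read off the alexnet_v2 body.
LAYERS = [
    ('conv1', 11, 4, 'VALID'),
    ('pool1', 3, 2, 'VALID'),
    ('conv2', 5, 1, 'SAME'),
    ('pool2', 3, 2, 'VALID'),
    ('conv3', 3, 1, 'SAME'),
    ('conv4', 3, 1, 'SAME'),
    ('conv5', 3, 1, 'SAME'),
    ('pool5', 3, 2, 'VALID'),
    ('fc6', 5, 1, 'VALID'),
    ('fc7', 1, 1, 'SAME'),
    ('fc8', 1, 1, 'SAME'),
]


def trace_sizes(input_size):
    """Return {scope name: spatial size} for a square input."""
    sizes = {}
    size = input_size
    for name, kernel, stride, padding in LAYERS:
        size = output_size(size, kernel, stride, padding)
        sizes[name] = size
    return sizes


sizes = trace_sizes(224)
for name, _, _, _ in LAYERS:
    print(name, sizes[name])

# fc8 has shape [batch_size, 1, 1, num_classes]; dropping axes 1 and 2
# gives the [batch_size, num_classes] shape that spatial_squeeze produces.
fc8_shape = [64, sizes['fc8'], sizes['fc8'], 1000]
squeezed_shape = [d for axis, d in enumerate(fc8_shape) if axis not in (1, 2)]
print(squeezed_shape)
```

Under these assumptions pool5 produces a 5x5 map, which is why fc6 uses a [5, 5] kernel with 'VALID' padding: it collapses the map to 1x1, and the squeeze on axes 1 and 2 then works.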
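For the FC layers, convolutions are used in place of fully connected operations, in the same way as the convolution above. The net that is finally returned is the [batch_size, num_classes] tensor we need (when spatial_squeeze is True).
It can then be used to compute the loss.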
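Why a convolution can stand in for a fully connected layer can be checked on a toy example. The sketch below is pure Python with hypothetical helper names; it ignores bias and activation and uses a 3x3x2 input instead of the real 5x5x256 feature map. A 'VALID' convolution whose kernel covers the whole input yields a 1x1 output equal to the dot product a dense layer would compute on the flattened input.

```python
def conv2d_valid(image, kernel):
    """Stride-1 'VALID' convolution of an H x W x C input with one filter."""
    h, w, c = len(image), len(image[0]), len(image[0][0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    for ch in range(c):
                        total += image[i + di][j + dj][ch] * kernel[di][dj][ch]
            row.append(total)
        out.append(row)
    return out


def flatten(tensor):
    """Flatten an H x W x C nested list in row-major order."""
    return [v for row in tensor for pixel in row for v in pixel]


def dense(inputs, weights):
    """A single fully connected unit without bias: a plain dot product."""
    return sum(x * w for x, w in zip(inputs, weights))


H, W, C = 3, 3, 2
image = [[[float(i * W * C + j * C + ch) for ch in range(C)]
          for j in range(W)] for i in range(H)]
kernel = [[[float((i + 2 * j + ch) % 3 - 1) for ch in range(C)]
           for j in range(W)] for i in range(H)]

conv_out = conv2d_valid(image, kernel)               # 1 x 1 spatial output
dense_out = dense(flatten(image), flatten(kernel))   # same weights, flattened
print(conv_out, dense_out)
```

The same reasoning applies to fc6: its [5, 5] kernel spans the entire pool5 output, so each of its 4096 filters behaves like one fully connected unit, while fc7 and fc8 use [1, 1] kernels on the resulting 1x1 map.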