[Source Code Walkthrough] TensorFlow-Slim AlexNet

Source: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/slim/python/slim/nets/alexnet.py

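1. Introduction to TF-Slim

TF-Slim is a lightweight library in TensorFlow for building, training, and evaluating complex models. TF-Slim modules can be mixed freely with other TensorFlow APIs.
Slim makes building, training, and evaluating models simpler. Even so, I ran into quite a few problems when using it myself, so I decided to read the network source code to deepen my understanding, and I am sharing my notes here. If I have misunderstood anything, please point it out.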

2. AlexNet network structure

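AlexNet consists of five convolutional layers, three pooling layers, and three fully connected layers. With the network structure in mind, let's look at the code.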

3. The AlexNet code in TF-Slim

(1) Import the packages the model needs
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from tensorflow.contrib import layers
from tensorflow.contrib.framework.python.ops import arg_scope
from tensorflow.contrib.layers.python.layers import layers as layers_lib
from tensorflow.contrib.layers.python.layers import regularizers
from tensorflow.contrib.layers.python.layers import utils
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import init_ops
from tensorflow.python.ops import nn_ops
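from tensorflow.python.ops import variable_scope

# Module-level helper from the original alexnet.py; alexnet_v2 below uses it
# to initialize the fc6/fc7 weights.
trunc_normal = lambda stddev: init_ops.truncated_normal_initializer(0.0, stddev)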
(2) The AlexNet network function
def alexnet_v2(inputs,
               num_classes=1000,
               is_training=True,
               dropout_keep_prob=0.5,
               spatial_squeeze=True,
               scope='alexnet_v2'):
  """AlexNet version 2.
  Described in: http://arxiv.org/pdf/1404.5997v2.pdf
  Parameters from:
  github.com/akrizhevsky/cuda-convnet2/blob/master/layers/
  layers-imagenet-1gpu.cfg
  Note: All the fully_connected layers have been transformed to conv2d layers.
        To use in classification mode, resize input to 224x224. To use in fully
        convolutional mode, set spatial_squeeze to false.
        The LRN layers have been removed and change the initializers from
        random_normal_initializer to xavier_initializer.
  Args:
    inputs: a tensor of size [batch_size, height, width, channels].
    num_classes: number of predicted classes.
    is_training: whether or not the model is being trained.
    dropout_keep_prob: the probability that activations are kept in the dropout
      layers during training.
    spatial_squeeze: whether or not should squeeze the spatial dimensions of the
      outputs. Useful to remove unnecessary dimensions for classification.
    scope: Optional scope for the variables.
  Returns:
    the last op containing the log predictions and end_points dict.
  """
  with variable_scope.variable_scope(scope, 'alexnet_v2', [inputs]) as sc:
    end_points_collection = sc.original_name_scope + '_end_points'
    # Collect outputs for conv2d, fully_connected and max_pool2d.
    with arg_scope(
        [layers.conv2d, layers_lib.fully_connected, layers_lib.max_pool2d],
        outputs_collections=[end_points_collection]):
      net = layers.conv2d(
          inputs, 64, [11, 11], 4, padding='VALID', scope='conv1')
      net = layers_lib.max_pool2d(net, [3, 3], 2, scope='pool1')
      net = layers.conv2d(net, 192, [5, 5], scope='conv2')
      net = layers_lib.max_pool2d(net, [3, 3], 2, scope='pool2')
      net = layers.conv2d(net, 384, [3, 3], scope='conv3')
      net = layers.conv2d(net, 384, [3, 3], scope='conv4')
      net = layers.conv2d(net, 256, [3, 3], scope='conv5')
      net = layers_lib.max_pool2d(net, [3, 3], 2, scope='pool5')

      # Use conv2d instead of fully_connected layers.
      with arg_scope(
          [layers.conv2d],
          weights_initializer=trunc_normal(0.005),
          biases_initializer=init_ops.constant_initializer(0.1)):
        net = layers.conv2d(net, 4096, [5, 5], padding='VALID', scope='fc6')
        net = layers_lib.dropout(
            net, dropout_keep_prob, is_training=is_training, scope='dropout6')
        net = layers.conv2d(net, 4096, [1, 1], scope='fc7')
        net = layers_lib.dropout(
            net, dropout_keep_prob, is_training=is_training, scope='dropout7')
        net = layers.conv2d(
            net,
            num_classes, [1, 1],
            activation_fn=None,
            normalizer_fn=None,
            biases_initializer=init_ops.zeros_initializer(),
            scope='fc8')

      # Convert end_points_collection into a end_point dict.
      end_points = utils.convert_collection_to_dict(end_points_collection)
      if spatial_squeeze:
        net = array_ops.squeeze(net, [1, 2], name='fc8/squeezed')
        end_points[sc.name + '/fc8'] = net
      return net, end_points

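First, let's look at the arguments passed to the function.
inputs: a batch tensor of shape [batch_size, height, width, channels]. In the default classification mode, each image should be resized so that the batch has shape [batch_size, 224, 224, channels].
num_classes: the number of classes, which determines the size of the output of the last FC layer (with the default of 1000 and a batch size of 64, the returned tensor has shape [64, 1000]).
is_training=True: flag indicating whether the model is in training mode. It affects the dropout layers that follow fc6 and fc7. If True (training mode), dropout is active. If False (non-training mode), the two calls below return their input unchanged, so dropout has no effect.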

net = layers_lib.dropout(
    net, dropout_keep_prob, is_training=is_training, scope='dropout6')
net = layers_lib.dropout(
    net, dropout_keep_prob, is_training=is_training, scope='dropout7') 

dropout_keep_prob: the probability that each activation is kept during dropout. The default is 0.5.
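To make the interplay of is_training and dropout_keep_prob concrete, here is a minimal pure-Python sketch of inverted dropout. It is not the TF-Slim implementation; the function name `dropout`, the flat-list input, and the `rng` parameter are assumptions made for illustration. The rescaling of kept values by 1/keep_prob mirrors what TensorFlow's dropout does, as far as I understand it.

```python
import random


def dropout(values, keep_prob, is_training, rng=random):
    """Illustrative inverted dropout over a flat list of activations."""
    if not is_training:
        # Non-training mode: the layer passes its input through unchanged.
        return list(values)
    out = []
    for v in values:
        if rng.random() < keep_prob:
            # Kept activations are rescaled so the expected value is preserved.
            out.append(v / keep_prob)
        else:
            # Dropped activations are zeroed.
            out.append(0.0)
    return out
```

With is_training=False the call returns a copy of its input; with is_training=True and keep_prob=0.5 every element comes back either as 0.0 or as twice its original value.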
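spatial_squeeze: flag indicating whether to squeeze the spatial dimensions. In image classification the returned value needs to have shape [batch_size, num_classes], whereas the output of fc8 has shape [batch_size, 1, 1, num_classes]. Dimensions 1 and 2 of the output therefore have to be dropped, which is what the following code does: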

if spatial_squeeze:
    net = array_ops.squeeze(net, [1, 2], name='fc8/squeezed')
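The effect of the squeeze can be sketched on shapes alone. The helper `squeeze_shape` below is hypothetical (it is not a TensorFlow function); it assumes, as I understand `squeeze` to behave, that every axis named explicitly must have size 1.

```python
def squeeze_shape(shape, axes):
    """Drop the given axes from a shape; each of them must have size 1."""
    for axis in axes:
        if shape[axis] != 1:
            raise ValueError(
                'cannot squeeze axis %d of size %d' % (axis, shape[axis]))
    return tuple(dim for i, dim in enumerate(shape) if i not in axes)


# Classification mode: fc8 output [64, 1, 1, 1000] becomes [64, 1000].
classification_shape = squeeze_shape((64, 1, 1, 1000), [1, 2])
```

If the input is larger than 224x224, the spatial dimensions of fc8 are larger than 1 and this squeeze would fail, which is presumably why the docstring asks for spatial_squeeze to be set to false in fully convolutional mode.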

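The first layer, conv1, convolves the input with 64 kernels of size [11, 11], using a stride of 4 and 'VALID' padding. The other padding mode is 'SAME'. The default activation function is ReLU.
For the details of the operation see: https://www.cnblogs.com/White-xzx/p/9497029.html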
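If the input image has size $W \times W$, the kernel has size $F \times F$, and the stride is $S$, then:
With 'VALID' padding, the output size along each side (rounded up) is:
$$\left\lceil \frac{W - F + 1}{S} \right\rceil$$
With 'SAME' padding, the output size along each side (rounded up) is:
$$\left\lceil \frac{W}{S} \right\rceil$$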
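As a check on these two formulas, the sketch below applies them layer by layer to alexnet_v2. The helper names and the `ALEXNET_V2_LAYERS` table are my own, read off the code listing. It assumes that convolutions without an explicit padding argument use 'SAME' and that pooling layers use 'VALID', which I believe are the TF-Slim defaults.

```python
import math


def conv_output_size(input_size, kernel_size, stride, padding):
    """Output length along one spatial dimension for the given padding mode."""
    if padding == 'VALID':
        return math.ceil((input_size - kernel_size + 1) / stride)
    if padding == 'SAME':
        return math.ceil(input_size / stride)
    raise ValueError('unknown padding: %r' % (padding,))


# (layer name, kernel size, stride, padding), transcribed from alexnet_v2.
# Paddings not written in the listing are assumed defaults (see lead-in).
ALEXNET_V2_LAYERS = [
    ('conv1', 11, 4, 'VALID'),
    ('pool1', 3, 2, 'VALID'),
    ('conv2', 5, 1, 'SAME'),
    ('pool2', 3, 2, 'VALID'),
    ('conv3', 3, 1, 'SAME'),
    ('conv4', 3, 1, 'SAME'),
    ('conv5', 3, 1, 'SAME'),
    ('pool5', 3, 2, 'VALID'),
    ('fc6', 5, 1, 'VALID'),
    ('fc7', 1, 1, 'SAME'),
    ('fc8', 1, 1, 'SAME'),
]


def trace_spatial_sizes(input_size, layer_specs=ALEXNET_V2_LAYERS):
    """Return [(layer name, spatial size after that layer), ...]."""
    sizes = []
    size = input_size
    for name, kernel, stride, padding in layer_specs:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes
```

For a 224x224 input, `trace_spatial_sizes(224)` gives a side length of 54 after conv1, 26 after pool1, 12 after pool2, 5 after pool5, and 1 after fc6. That is why the 5x5 'VALID' kernel of fc6 covers the whole feature map.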
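FC layers: convolutions are used in place of fully connected layers, in the same way as the convolution above. The net that is finally returned holds the [batch_size, num_classes] output scores we need.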
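Why a convolution can stand in for a fully connected layer can be seen in a small pure-Python sketch. The functions `conv2d_valid` and `fully_connected` are illustrative helpers, not TF-Slim APIs, and they handle a single image and a single output channel only. When the kernel has the same spatial size as the input, a 'VALID' convolution yields one value that equals the dot product of the flattened input and the flattened weights.

```python
def conv2d_valid(image, kernel):
    """Stride-1 'VALID' convolution (cross-correlation) with one output channel.

    image:  nested list indexed [height][width][channels]
    kernel: nested list indexed [k_height][k_width][channels]
    Returns a nested list indexed [out_height][out_width].
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    channels = len(kernel[0][0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    for c in range(channels):
                        total += image[i + di][j + dj][c] * kernel[di][dj][c]
            row.append(total)
        out.append(row)
    return out


def fully_connected(image, kernel):
    """Flatten both operands and take one dot product (one output unit)."""
    flat_x = [v for row in image for pixel in row for v in pixel]
    flat_w = [v for row in kernel for pixel in row for v in pixel]
    return sum(x * w for x, w in zip(flat_x, flat_w))
```

On a larger input the same kernel slides over the feature map and produces one score per position, which is the fully convolutional mode mentioned in the docstring.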
The returned net can then be used to compute the loss.
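Because fc8 is created with activation_fn=None, the returned net contains unnormalized scores (logits). The article does not say which loss is used. One common choice for classification is softmax cross-entropy, sketched below in pure Python for a single example. The function name and the integer-label convention are assumptions for illustration.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy loss of one example from raw logits and an integer label."""
    # Subtract the maximum before exponentiating for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits))
    # -log(softmax(logits)[label]) = logsumexp(logits) - logits[label]
    return log_sum_exp - logits[label]
```

In a real training script this would be applied to every row of the [batch_size, num_classes] output and averaged, typically through the framework's own loss function rather than hand-written code.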
