Line-by-Line Analysis of the ResNet Code in TensorFlow, and How to Adjust the Spatial Scale of the Output Features When Finetuning

Since it was proposed in 2015, the residual network (ResNet) has made major contributions in both academia and industry. Many models see significant gains after adopting ResNet as their baseline, so a great deal of experimental work involves finetuning a pretrained ResNet. In this post, I will walk through the ResNet code line by line and explain how to adjust the spatial scale of the features ResNet outputs.

The ResNet code analyzed in this post is based on the slim module (tf.contrib.slim) of TensorFlow.
Code link: https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models
I will use ResNet V2 50 as the running example; the other ResNet variants are implemented almost identically, so the analysis carries over directly. Let us now begin.
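
Before diving in, here is a minimal finetuning sketch showing how the model discussed below is typically invoked. It assumes TensorFlow 1.x with tf.contrib.slim and the models/research/slim repository on the Python path; the checkpoint filename 'resnet_v2_50.ckpt' is a placeholder for whichever pretrained checkpoint you downloaded.

import tensorflow as tf
from nets import resnet_v2

slim = tf.contrib.slim

images = tf.placeholder(tf.float32, [None, 224, 224, 3])
with slim.arg_scope(resnet_v2.resnet_arg_scope()):
    # num_classes=None returns the features before the logit layer,
    # the usual starting point when finetuning a new task head.
    net, end_points = resnet_v2.resnet_v2_50(images, num_classes=None,
                                             is_training=True)

# Restore the pretrained weights (the checkpoint path is a placeholder).
init_fn = slim.assign_from_checkpoint_fn('resnet_v2_50.ckpt',
                                         slim.get_variables_to_restore())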

ResNet Code Analysis

The ResNet V2 50 source consists of two files: resnet_v2.py, which is the main file, and resnet_utils.py, which contains helper functions and operations. Following my usual convention, the fully annotated source comes first:

First, the source of resnet_v2.py:

# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains definitions for the preactivation form of Residual Networks.
Residual networks (ResNets) were originally proposed in:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
    Deep Residual Learning for Image Recognition. arXiv:1512.03385
The full preactivation 'v2' ResNet variant implemented in this module was
introduced by:
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
    Identity Mappings in Deep Residual Networks. arXiv: 1603.05027
The key difference of the full preactivation 'v2' variant compared to the
'v1' variant in [1] is the use of batch normalization before every weight layer.
Typical use:
   from tensorflow.contrib.slim.nets import resnet_v2
ResNet-101 for image classification into 1000 classes:
   # inputs has shape [batch, 224, 224, 3]
   with slim.arg_scope(resnet_v2.resnet_arg_scope()):
      net, end_points = resnet_v2.resnet_v2_101(inputs, 1000, is_training=False)
ResNet-101 for semantic segmentation into 21 classes:
   # inputs has shape [batch, 513, 513, 3]
   with slim.arg_scope(resnet_v2.resnet_arg_scope()):
      net, end_points = resnet_v2.resnet_v2_101(inputs,
                                                21,
                                                is_training=False,
                                                global_pool=False,
                                                output_stride=16)
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from nets import resnet_utils

slim = tf.contrib.slim
resnet_arg_scope = resnet_utils.resnet_arg_scope

# Builds a residual unit (bottleneck block)
@slim.add_arg_scope
def bottleneck(inputs, depth, depth_bottleneck, stride, rate=1,
               outputs_collections=None, scope=None):
  """Bottleneck residual unit variant with BN before convolutions.
  This is the full preactivation residual unit variant proposed in [2]. See
  Fig. 1(b) of [2] for its definition. Note that we use here the bottleneck
  variant which has an extra bottleneck layer.
  When putting together two consecutive ResNet blocks that use this unit, one
  should use stride = 2 in the last unit of the first block.
  Args:
    inputs: A tensor of size [batch, height, width, channels].
    depth: The depth of the ResNet unit output.
    depth_bottleneck: The depth of the bottleneck layers.
    stride: The ResNet unit's stride. Determines the amount of downsampling of
      the units output compared to its input.
    rate: An integer, rate for atrous convolution.
    outputs_collections: Collection to add the ResNet unit output.
    scope: Optional variable_scope.
  Returns:
    The ResNet unit's output.
  """
  with tf.variable_scope(scope, 'bottleneck_v2', [inputs]) as sc: # open a variable scope, default name 'bottleneck_v2'
    depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4) # number of input channels
    preact = slim.batch_norm(inputs, activation_fn=tf.nn.relu, scope='preact') # pre-activation: batch norm + ReLU
    if depth == depth_in: # if the input depth already equals the requested output depth,
      shortcut = resnet_utils.subsample(inputs, stride, 'shortcut') # just subsample the input for the shortcut branch
    else: # otherwise use a 1x1 convolution to project the input to depth channels (this forms the shortcut branch of the unit)
      shortcut = slim.conv2d(preact, depth, [1, 1], stride=stride,
                             normalizer_fn=None, activation_fn=None,
                             scope='shortcut')
    # The 1-3-1 convolution stack below forms the residual branch. Note that the
    # middle 3x3 convolution outputs depth_bottleneck channels -- the hallmark of
    # the bottleneck design.
    residual = slim.conv2d(preact, depth_bottleneck, [1, 1], stride=1,
                           scope='conv1')
    residual = resnet_utils.conv2d_same(residual, depth_bottleneck, 3, stride,
                                        rate=rate, scope='conv2')
    residual = slim.conv2d(residual, depth, [1, 1], stride=1,
                           normalizer_fn=None, activation_fn=None,
                           scope='conv3')
    # Add the two branches to obtain the unit's output
    output = shortcut + residual

    return slim.utils.collect_named_outputs(outputs_collections,
                                            sc.name,
                                            output) # register the output under sc.name in the collection

# resnet_v2: takes 10 arguments
# inputs: input tensor of shape [batch, height_in, width_in, channels]
# blocks: the residual groups (blocks) to build
# num_classes: number of output classes; set for image classification
# is_training: whether the batch norm layers should update their moving means and variances
# global_pool: whether to average the features over the height and width dimensions at the end of the network
# output_stride: a crucial parameter -- the requested ratio of input to output spatial resolution
# include_root_block: whether to apply an extra convolution and max pooling before the residual groups (usually True)
# spatial_squeeze: whether to squeeze out the height and width dimensions of the output; mostly used for image classification
# reuse: whether to reuse variables
# scope: prefix for the variable scopes inside the function
def resnet_v2(inputs,
              blocks,
              num_classes=None,
              is_training=True,
              global_pool=True,
              output_stride=None,
              include_root_block=True,
              spatial_squeeze=True,
              reuse=None,
              scope=None):
  """Generator for v2 (preactivation) ResNet models.
  This function generates a family of ResNet v2 models. See the resnet_v2_*()
  methods for specific model instantiations, obtained by selecting different
  block instantiations that produce ResNets of various depths.
  Training for image classification on Imagenet is usually done with [224, 224]
  inputs, resulting in [7, 7] feature maps at the output of the last ResNet
  block for the ResNets defined in [1] that have nominal stride equal to 32.
  However, for dense prediction tasks we advise that one uses inputs with
  spatial dimensions that are multiples of 32 plus 1, e.g., [321, 321]. In
  this case the feature maps at the ResNet output will have spatial shape
  [(height - 1) / output_stride + 1, (width - 1) / output_stride + 1]
  and corners exactly aligned with the input image corners, which greatly
  facilitates alignment of the features to the image. Using as input [225, 225]
  images results in [8, 8] feature maps at the output of the last ResNet block.
  For dense prediction tasks, the ResNet needs to run in fully-convolutional
  (FCN) mode and global_pool needs to be set to False. The ResNets in [1, 2] all
  have nominal stride equal to 32 and a good choice in FCN mode is to use
  output_stride=16 in order to increase the density of the computed features at
  small computational and memory overhead, cf. http://arxiv.org/abs/1606.00915.
  Args:
    inputs: A tensor of size [batch, height_in, width_in, channels].
    blocks: A list of length equal to the number of ResNet blocks. Each element
      is a resnet_utils.Block object describing the units in the block.
    num_classes: Number of predicted classes for classification tasks.
      If 0 or None, we return the features before the logit layer.
    is_training: whether batch_norm layers are in training mode.
    global_pool: If True, we perform global average pooling before computing the
      logits. Set to True for image classification, False for dense prediction.
    output_stride: If None, then the output will be computed at the nominal
      network stride. If output_stride is not None, it specifies the requested
      ratio of input to output spatial resolution.
    include_root_block: If True, include the initial convolution followed by
      max-pooling, if False excludes it. If excluded, `inputs` should be the
      results of an activation-less convolution.
    spatial_squeeze: if True, logits is of shape [B, C], if false logits is
        of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
        To use this parameter, the input images must be smaller than 300x300
        pixels, in which case the output logit layer does not contain spatial
        information and can be removed.
    reuse: whether or not the network and its variables should be reused. To be
      able to reuse 'scope' must be given.
    scope: Optional variable_scope.
  Returns:
    net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
      If global_pool is False, then height_out and width_out are reduced by a
      factor of output_stride compared to the respective height_in and width_in,
      else both height_out and width_out equal one. If num_classes is 0 or None,
      then net is the output of the last ResNet block, potentially after global
      average pooling. If num_classes is a non-zero integer, net contains the
      pre-softmax activations.
    end_points: A dictionary from components of the network to the corresponding
      activation.
  Raises:
    ValueError: If the target output_stride is not valid.
  """
  with tf.variable_scope(scope, 'resnet_v2', [inputs], reuse=reuse) as sc: # open a variable scope, default name 'resnet_v2'
    end_points_collection = sc.original_name_scope + '_end_points' # name of the end-points collection
    with slim.arg_scope([slim.conv2d, bottleneck,
                         resnet_utils.stack_blocks_dense],
                        outputs_collections=end_points_collection): # set default arguments for the functions used below
      with slim.arg_scope([slim.batch_norm], is_training=is_training): # set is_training for the batch norm layers
        net = inputs # start with the input
        if include_root_block: # if a root block should precede the residual groups
          if output_stride is not None: # if output_stride is specified,
            if output_stride % 4 != 0: # it must be divisible by 4
              raise ValueError('The output_stride needs to be a multiple of 4.')
            output_stride /= 4 # the root conv and max pool below already shrink the resolution by a factor of 4
          # We do not include batch normalization or activation functions in
          # conv1 because the first ResNet unit will perform these. Cf.
          # Appendix of [2].
          with slim.arg_scope([slim.conv2d],
                              activation_fn=None, normalizer_fn=None): # defaults for slim.conv2d
            net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1') # 7x7 convolution with stride 2
          net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1') # 3x3 max pooling with stride 2
        net = resnet_utils.stack_blocks_dense(net, blocks, output_stride) # run the residual groups
        # This is needed because the pre-activation variant does not have batch
        # normalization or activation functions in the residual unit output. See
        # Appendix of [2].
        net = slim.batch_norm(net, activation_fn=tf.nn.relu, scope='postnorm') # one final batch norm + ReLU
        # Convert end_points_collection into a dictionary of end_points.
        end_points = slim.utils.convert_collection_to_dict(
            end_points_collection) # turn the collection into a dict of end points

        if global_pool: # if global average pooling is requested
          # Global average pooling.
          net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True) # average over the height and width dimensions
          end_points['global_pool'] = net # record it in end_points
        if num_classes: # if a classifier head is requested
          net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                            normalizer_fn=None, scope='logits') # 1x1 convolution producing num_classes channels
          end_points[sc.name + '/logits'] = net # record the logits in end_points
          if spatial_squeeze: # if the spatial dimensions should be squeezed out
            net = tf.squeeze(net, [1, 2], name='SpatialSqueeze') # drop the height and width dimensions (they must both be 1; see the spatial_squeeze note in the docstring)
            end_points[sc.name + '/spatial_squeeze'] = net # record the squeezed logits
          end_points['predictions'] = slim.softmax(net, scope='predictions') # record the softmax predictions
        return net, end_points # return net and end_points
resnet_v2.default_image_size = 224

# Helper that builds a residual group, using the bottleneck function for each unit
def resnet_v2_block(scope, base_depth, num_units, stride):
  """Helper function for creating a resnet_v2 bottleneck block.
  Args:
    scope: The scope of the block.
    base_depth: The depth of the bottleneck layer for each unit.
    num_units: The number of units in the block.
    stride: The stride of the block, implemented as a stride in the last unit.
      All other units have stride=1.
  Returns:
    A resnet_v2 bottleneck block.
  """
  # Build the group via the named tuple defined in resnet_utils.py. It takes three
  # fields: scope (the block's name), unit_fn (the function implementing a unit,
  # here bottleneck), and args (the argument dicts passed to unit_fn).
  # Each bottleneck call receives three arguments: depth, the channel count of the
  # first and last convolutions of the unit; depth_bottleneck, the channel count
  # of the middle convolution; and stride, the downsampling factor of the unit.
  # A group contains num_units units; only the last unit changes the spatial
  # resolution -- all the others use stride 1.
  return resnet_utils.Block(scope, bottleneck, [{
      'depth': base_depth * 4,
      'depth_bottleneck': base_depth,
      'stride': 1
  }] * (num_units - 1) + [{
      'depth': base_depth * 4,
      'depth_bottleneck': base_depth,
      'stride': stride
  }])
resnet_v2.default_image_size = 224

# Entry point: takes 8 arguments
def resnet_v2_50(inputs,
                 num_classes=None,
                 is_training=True,
                 global_pool=True,
                 output_stride=None,
                 spatial_squeeze=True,
                 reuse=None,
                 scope='resnet_v2_50'):
  """ResNet-50 model of [1]. See resnet_v2() for arg and return description."""
  # Build the four residual groups; each takes a name, a base channel count,
  # a number of units, and a downsampling stride.
  blocks = [
      resnet_v2_block('block1', base_depth=64, num_units=3, stride=2),
      resnet_v2_block('block2', base_depth=128, num_units=4, stride=2),
      resnet_v2_block('block3', base_depth=256, num_units=6, stride=2),
      resnet_v2_block('block4', base_depth=512, num_units=3, stride=1),
  ]
  # With the groups constructed, hand everything to resnet_v2 for the actual computation.
  return resnet_v2(inputs, blocks, num_classes, is_training=is_training,
                   global_pool=global_pool, output_stride=output_stride,
                   include_root_block=True, spatial_squeeze=spatial_squeeze,
                   reuse=reuse, scope=scope)
resnet_v2_50.default_image_size = resnet_v2.default_image_size


def resnet_v2_101(inputs,
                  num_classes=None,
                  is_training=True,
                  global_pool=True,
                  output_stride=None,
                  spatial_squeeze=True,
                  reuse=None,
                  scope='resnet_v2_101'):
  """ResNet-101 model of [1]. See resnet_v2() for arg and return description."""
  blocks = [
      resnet_v2_block('block1', base_depth=64, num_units=3, stride=2),
      resnet_v2_block('block2', base_depth=128, num_units=4, stride=2),
      resnet_v2_block('block3', base_depth=256, num_units=23, stride=2),
      resnet_v2_block('block4', base_depth=512, num_units=3, stride=1),
  ]
  return resnet_v2(inputs, blocks, num_classes, is_training=is_training,
                   global_pool=global_pool, output_stride=output_stride,
                   include_root_block=True, spatial_squeeze=spatial_squeeze,
                   reuse=reuse, scope=scope)
resnet_v2_101.default_image_size = resnet_v2.default_image_size


def resnet_v2_152(inputs,
                  num_classes=None,
                  is_training=True,
                  global_pool=True,
                  output_stride=None,
                  spatial_squeeze=True,
                  reuse=None,
                  scope='resnet_v2_152'):
  """ResNet-152 model of [1]. See resnet_v2() for arg and return description."""
  blocks = [
      resnet_v2_block('block1', base_depth=64, num_units=3, stride=2),
      resnet_v2_block('block2', base_depth=128, num_units=8, stride=2),
      resnet_v2_block('block3', base_depth=256, num_units=36, stride=2),
      resnet_v2_block('block4', base_depth=512, num_units=3, stride=1),
  ]
  return resnet_v2(inputs, blocks, num_classes, is_training=is_training,
                   global_pool=global_pool, output_stride=output_stride,
                   include_root_block=True, spatial_squeeze=spatial_squeeze,
                   reuse=reuse, scope=scope)
resnet_v2_152.default_image_size = resnet_v2.default_image_size


def resnet_v2_200(inputs,
                  num_classes=None,
                  is_training=True,
                  global_pool=True,
                  output_stride=None,
                  spatial_squeeze=True,
                  reuse=None,
                  scope='resnet_v2_200'):
  """ResNet-200 model of [2]. See resnet_v2() for arg and return description."""
  blocks = [
      resnet_v2_block('block1', base_depth=64, num_units=3, stride=2),
      resnet_v2_block('block2', base_depth=128, num_units=24, stride=2),
      resnet_v2_block('block3', base_depth=256, num_units=36, stride=2),
      resnet_v2_block('block4', base_depth=512, num_units=3, stride=1),
  ]
  return resnet_v2(inputs, blocks, num_classes, is_training=is_training,
                   global_pool=global_pool, output_stride=output_stride,
                   include_root_block=True, spatial_squeeze=spatial_squeeze,
                   reuse=reuse, scope=scope)
resnet_v2_200.default_image_size = resnet_v2.default_image_size

Next, the source of resnet_utils.py:

# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains building blocks for various versions of Residual Networks.
Residual networks (ResNets) were proposed in:
  Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
  Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2015
More variants were introduced in:
  Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
  Identity Mappings in Deep Residual Networks. arXiv: 1603.05027, 2016
We can obtain different ResNet variants by changing the network depth, width,
and form of residual unit. This module implements the infrastructure for
building them. Concrete ResNet units and full ResNet networks are implemented in
the accompanying resnet_v1.py and resnet_v2.py modules.
Compared to https://github.com/KaimingHe/deep-residual-networks, in the current
implementation we subsample the output activations in the last residual unit of
each block, instead of subsampling the input activations in the first residual
unit of each block. The two implementations give identical results but our
implementation is more memory efficient.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import tensorflow as tf

slim = tf.contrib.slim

# Block: a named tuple describing a ResNet block
class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
    """
    A named tuple describing a ResNet block
    """

def subsample(inputs, factor, scope=None):
    if factor == 1: # if the factor is 1, return the input unchanged
        return inputs
    else: # otherwise shrink the input by max pooling with a 1x1 window and stride=factor
        return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope)

# conv2d_same: a 'SAME'-style convolution that pads the input explicitly, so the
# output size is the input size reduced by exactly the stride factor.
def conv2d_same(inputs, num_outputs, kernel_size, stride, rate=1, scope=None):
    if stride == 1:
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=1, rate=rate,
                       padding='SAME', scope=scope)
    else:
        kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
        pad_total = kernel_size_effective - 1
        pad_beg = pad_total // 2
        pad_end = pad_total - pad_beg
        inputs = tf.pad(inputs,
                    [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride,
                       rate=rate, padding='VALID', scope=scope)


# stack_blocks_dense chains the residual units together and adapts them to output_stride
@slim.add_arg_scope
def stack_blocks_dense(net, blocks, output_stride=None,
                       store_non_strided_activations=False,
                       outputs_collections=None):
    # The current_stride variable keeps track of the effective stride of the
    # activations. This allows us to invoke atrous convolution whenever applying
    # the next residual unit would result in the activations having stride larger
    # than the target output_stride.
    current_stride = 1 # tracks the effective stride so far, and tells us when to switch to atrous convolution to honor output_stride

    # The atrous convolution rate parameter.
    rate = 1 # dilation rate for atrous convolution

    for block in blocks: # process the residual groups one by one
        with tf.variable_scope(block.scope, 'block', [net]) as sc: # open a variable scope, default name 'block'
            block_stride = 1 # start with a block stride of 1
            for i, unit in enumerate(block.args): # process the units of the group one by one
                if store_non_strided_activations and i == len(block.args) - 1: # on the group's last unit, if non-strided activations are requested
                    # Move stride from the block's last unit to the end of the block.
                    block_stride = unit.get('stride', 1) # remember the last unit's stride in block_stride
                    unit = dict(unit, stride=1) # and run the unit itself with stride 1

                with tf.variable_scope('unit_%d' % (i + 1), values=[net]): # scope each unit as 'unit_(i+1)'
                    # If we have reached the target output_stride, then we need to employ
                    # atrous convolution with stride=1 and multiply the atrous rate by the
                    # current unit's stride for use in subsequent layers.
                    if output_stride is not None and current_stride == output_stride: # the target stride has been reached
                        net = block.unit_fn(net, rate=rate, **dict(unit, stride=1)) # run the unit with stride 1 and dilation `rate`, so the resolution is preserved
                        rate *= unit.get('stride', 1) # a unit that would have had stride 2 doubles the rate instead

                    else:
                        net = block.unit_fn(net, rate=1, **unit) # target not reached yet (or output_stride is None): run the unit normally
                        current_stride *= unit.get('stride', 1) # a stride-2 unit doubles current_stride
                        if output_stride is not None and current_stride > output_stride: # overshooting the target is an error
                            raise ValueError('The target output_stride cannot be reached.')

            # Collect activations at the block's end before performing subsampling.
            net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net) # register net under sc.name in the collection

            # Subsampling of the block's output activations.
            if output_stride is not None and current_stride == output_stride: # the target stride has been reached
                rate *= block_stride # grow the atrous rate by block_stride instead of subsampling
            else:
                net = subsample(net, block_stride) # otherwise subsample the group's output by block_stride
                current_stride *= block_stride # and scale current_stride accordingly (if block_stride is not 1)
                if output_stride is not None and current_stride > output_stride: # overshooting the target is an error
                    raise ValueError('The target output_stride cannot be reached.')

    if output_stride is not None and current_stride != output_stride: # the target stride was never reached
        raise ValueError('The target output_stride cannot be reached.')

    return net


def resnet_arg_scope(weight_decay=0.0001,
                     batch_norm_decay=0.997,
                     batch_norm_epsilon=1e-5,
                     batch_norm_scale=True,
                     activation_fn=tf.nn.relu,
                     use_batch_norm=True):
    batch_norm_params = {
        'decay': batch_norm_decay,
        'epsilon': batch_norm_epsilon,
        'scale': batch_norm_scale,
        'updates_collections': tf.GraphKeys.UPDATE_OPS,
        'fused': None,  # Use fused batch norm if possible.
    }

    with slim.arg_scope(
        [slim.conv2d],
        weights_regularizer=slim.l2_regularizer(weight_decay),
        weights_initializer=slim.variance_scaling_initializer(),
        activation_fn=activation_fn,
        normalizer_fn=slim.batch_norm if use_batch_norm else None,
        normalizer_params=batch_norm_params):
        with slim.arg_scope([slim.batch_norm], **batch_norm_params):
            # The following implies padding='SAME' for pool1, which makes feature
            # alignment easier for dense prediction tasks. This is also used in
            # https://github.com/facebook/fb.resnet.torch. However the accompanying
            # code of 'Deep Residual Learning for Image Recognition' uses
            # padding='VALID' for pool1. You can switch to that choice by setting
            # slim.arg_scope([slim.max_pool2d], padding='VALID').
            with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
                return arg_sc

In resnet_v2.py, the entry point is the resnet_v2_50 function at line 243, which does two things:

  1. Construct the residual groups. This is handled by the resnet_v2_block function at line 217 of resnet_v2.py. Four groups are built, consisting of (3, 4, 6, 3) units respectively; in each group, only the last unit changes the spatial resolution (see the stride parameter). resnet_v2_block builds each group as an instance of the Block named tuple defined at line 42 of resnet_utils.py, whose unit_fn field points at the bottleneck function defined at line 53 of resnet_v2.py. The classic 1-3-1 structure of a residual unit can be seen in detail inside bottleneck. (A sketch that inspects the resulting named tuple follows this list.)
    In short, group construction flows resnet_v2_block -> resnet_utils.Block -> bottleneck, and this step fixes the structure of the groups.
  2. Run the forward pass of the network. This is handled by the resnet_v2 function, defined at line 108 of resnet_v2.py, which takes 10 arguments (see the line-by-line commentary above).
    Inside resnet_v2 (again, see the commentary): at line 180, if include_root_block is True, a 7x7 convolution and a max pooling precede the residual groups (lines 190 and 191). The groups are then run by the stack_blocks_dense function from resnet_utils.py, followed by one batch normalization at line 192.
    If ResNet is used purely as a feature extractor, the forward pass can stop here, with global_pool set to False and num_classes left as None.
    If global_pool is True, the features are averaged over the height and width dimensions. If num_classes is set, a 1x1 convolution at line 206 outputs num_classes channels; if spatial_squeeze is also True, line 210 squeezes away the height and width dimensions.

Together, these two steps complete the forward pass of the residual network.
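
To make step 1 concrete, here is a small sketch (same slim setup as the snippet at the top of the post) that builds block1 of ResNet V2 50 and prints the argument dicts the named tuple carries; the expected output is shown in the comments:

from nets.resnet_v2 import resnet_v2_block

block1 = resnet_v2_block('block1', base_depth=64, num_units=3, stride=2)
print(block1.scope)    # 'block1'
print(block1.unit_fn)  # <function bottleneck ...>
for args in block1.args:
    print(args)        # (dict key order may vary)
# {'depth': 256, 'depth_bottleneck': 64, 'stride': 1}
# {'depth': 256, 'depth_bottleneck': 64, 'stride': 1}
# {'depth': 256, 'depth_bottleneck': 64, 'stride': 2}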

How to Adjust the Spatial Scale of the Output Features When Finetuning

The residual network code analyzed in this post is mainly meant for finetuning, i.e. continuing to train on top of pretrained parameters. When finetuning, how do we flexibly obtain features at different spatial scales while staying aligned with the pretrained parameters?

This is exactly what the output_stride parameter of the resnet_v2 function is for.

In resnet_v2, output_stride sets the ratio between the input and output spatial resolution. The code requires it to be a multiple of 4, and given the stride-2 stages of these networks the reachable values are the powers of two from 4 to 32; 8 and 16 are the usual choices in practice. Setting output_stride to 8, for example, means the output features' height and width are 1/8 of the input's. Let us look at how this parameter does its work.
First, consider the default settings from the source:

def resnet_v2(inputs,
              blocks,
              num_classes=None,
              is_training=True,
              global_pool=True,
              output_stride=None,
              include_root_block=True,
              spatial_squeeze=True,
              reuse=None,
              scope=None)

When output_stride keeps its default value of None, no particular input-to-output ratio is requested. In that case, resnet_v2_50 reduces the spatial resolution by a factor of 32: the 7x7 convolution and the max pooling before the residual groups each halve the resolution, and the first three residual groups halve it once more each, for an overall factor of 32. In some domains (object detection, semantic segmentation), such a coarse feature map hurts localization accuracy, which is why output_stride is usually set explicitly.
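
This is easy to verify with a quick sketch (same assumptions as the snippet at the top of the post): a 224x224 input should produce a 7x7 feature map when global pooling is turned off.

import tensorflow as tf
from nets import resnet_v2

slim = tf.contrib.slim

images = tf.placeholder(tf.float32, [1, 224, 224, 3])
with slim.arg_scope(resnet_v2.resnet_arg_scope()):
    net, _ = resnet_v2.resnet_v2_50(images, num_classes=None,
                                    is_training=False, global_pool=False)
print(net.get_shape())  # (1, 7, 7, 2048), i.e. 224 / 32 = 7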

How does the feature scale evolve once output_stride is set? Take output_stride=8. Inside resnet_v2, the root convolution and max pooling already shrink the features by a factor of 4, so the residual groups may only shrink them by a further factor of 2. Enforcing this is entirely the job of the stack_blocks_dense function shown above in resnet_utils.py.

stack_blocks_dense monitors the effective stride through the current_stride variable. As soon as current_stride reaches output_stride, every convolution that would have been strided is turned into a stride-1 atrous convolution, so the feature resolution stops shrinking. The dilation factor of these convolutions is the rate parameter, which grows as the forward pass proceeds.

With output_stride=8, the downsampling budget left after the root block is a single factor of 2, and block1's last unit consumes it, bringing the effective stride to 8. From that point on: block2's units still run with rate 1 (its last unit, which would have had stride 2, runs with stride 1 and doubles the rate to 2); block3's units then run as rate-2 atrous convolutions (its last unit doubles the rate to 4); and block4's units run as rate-4 atrous convolutions. Since block4's stride is 1, the rate stays at 4 to the end.

This is how output_stride controls the spatial scale of the output. In essence, once the target stride is reached, the remaining convolutions are all converted into stride-1 atrous convolutions with dilation rate `rate`, so the spatial size of the features no longer shrinks!
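
To confirm the trace above without building a graph, the bookkeeping of stack_blocks_dense can be replayed in a few lines of plain Python. The unit strides below are taken from resnet_v2_50's block definitions, and the target passed in is output_stride / 4 because the root block already accounts for a factor of 4 (store_non_strided_activations is ignored here for brevity):

block_unit_strides = [
    [1, 1, 2],           # block1: 3 units, last one strided
    [1, 1, 1, 2],        # block2: 4 units, last one strided
    [1, 1, 1, 1, 1, 2],  # block3: 6 units, last one strided
    [1, 1, 1],           # block4: 3 units, stride 1
]
output_stride = 8 // 4   # resnet_v2 divides by 4 for the root conv + pool
current_stride, rate = 1, 1
for b, units in enumerate(block_unit_strides, start=1):
    for u, stride in enumerate(units, start=1):
        if current_stride == output_stride:
            # target reached: stride-1 atrous convolution at the current rate
            print('block%d/unit_%d: atrous, rate=%d' % (b, u, rate))
            rate *= stride
        else:
            # target not reached yet: run the unit normally
            print('block%d/unit_%d: normal, stride=%d' % (b, u, stride))
            current_stride *= stride
# Output: block1 runs normally (its last unit downsamples, reaching stride 8
# overall); block2 runs at rate 1, block3 at rate 2, block4 at rate 4.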

That brings this post to a close. I sincerely hope it helps with your study and work.

More posts are on the way; your support and encouragement are my greatest motivation!

written by jiong
"We are all running hard; we are all chasing our dreams!"
