Some Notes on Understanding ResNet (Network Structure, the Building Block, and the "Bottleneck" Building Block)

C小C

Article link: https://blog.csdn.net/C_chuxin/article/details/82946163

[Date] 2018.10.05

[Title] Some Notes on Understanding ResNet (Network Structure, the Building Block, and the "Bottleneck" Building Block)

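Overview

This article records my understanding of the ResNet network structure, the building block, and the "bottleneck" building block. It covers how the ResNet architectures are assembled and how a building block is converted into the corresponding "bottleneck" building block. Residual learning itself has already been explained in detail by many other bloggers, so it is not repeated here.

I. ResNet Network Structure

1.1 Description in the original paper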

Plain Network. Our plain baselines (Fig. 3, middle) are mainly inspired by the philosophy of VGG nets [41] (Fig. 3, left). The convolutional layers mostly have 3×3 filters and follow two simple design rules: (i) for the same output feature map size, the layers have the same number of filters; and (ii) if the feature map size is halved, the number of filters is doubled so as to preserve the time complexity per layer. We perform downsampling directly by convolutional layers that have a stride of 2. The network ends with a global average pooling layer and a 1000-way fully-connected layer with softmax. The total number of weighted layers is 34 in Fig. 3 (middle).

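Key points of the passage: the plain network (Fig. 3, middle) follows the design philosophy of VGG nets (Fig. 3, left). Its convolutional layers mostly use 3×3 filters and obey two rules: (i) layers with the same output feature map size have the same number of filters; (ii) when the feature map size is halved, the number of filters is doubled so that the time complexity per layer stays the same. Downsampling is done directly by convolutional layers with a stride of 2. The network ends with a global average pooling layer and a 1000-way fully connected layer with softmax, giving 34 weighted layers in total.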
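
Rule (ii) can be checked with a few lines of arithmetic. The sketch below is only an illustration: it counts multiplications for a single 3×3 convolution whose input and output channel counts are equal, ignores biases, and assumes the stage sizes (56, 28, 14, 7) and filter counts (64, 128, 256, 512) of the 34-layer network in Table 1 of the paper. The function name `conv3x3_mults` is made up for this example.

```python
def conv3x3_mults(size, channels):
    """Multiplications of one 3x3 conv with `channels` inputs and outputs on a size x size map."""
    return 3 * 3 * channels * channels * size * size


# (feature map size, number of filters) per stage; values assumed from Table 1 of the paper
stages = [(56, 64), (28, 128), (14, 256), (7, 512)]
costs = [conv3x3_mults(size, ch) for size, ch in stages]

for (size, ch), cost in zip(stages, costs):
    print(f"{size}x{size} map, {ch} filters: {cost:,} multiplications")
```

Halving the side length divides the number of output positions by 4, while doubling the filters multiplies the channel product by 4, so every stage prints the same count.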

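Figure 3. Example network architectures for ImageNet. Left: the VGG-19 model (19.6 billion FLOPs) as a reference. Middle: a plain network with 34 parameter layers (3.6 billion FLOPs). Right: a residual network with 34 parameter layers (3.6 billion FLOPs). The dotted shortcuts increase dimensions. Table 1 shows more details and other variants.

Table 1. Architectures for ImageNet. Building blocks are shown in brackets (see also Fig. 5), together with the number of blocks stacked. Downsampling is performed by conv3_1, conv4_1, and conv5_1 with a stride of 2.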

Residual Network. Based on the above plain network, we insert shortcut connections (Fig. 3, right) which turn the network into its counterpart residual version. The identity shortcuts (Eqn.(1)) can be directly used when the input and output are of the same dimensions (solid line shortcuts in Fig.3). When the dimensions increase(dotted line shortcuts in Fig. 3), we consider two options: (A) The shortcut still performs identity mapping, with extra zero entries padded for increasing dimensions. This option introduces no extra parameter; (B) The projection shortcut in Eqn.(2) is used to match dimensions (done by 1×1 convolutions). For both options, when the shortcuts go across feature maps of two sizes, they are performed with a stride of 2.

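Key points of the passage: shortcut connections are inserted into the plain network (Fig. 3, right) to turn it into its residual counterpart. When the input and output have the same dimensions (solid-line shortcuts in Fig. 3), the identity shortcut of Eqn. (1) is used directly. When the dimensions increase (dotted-line shortcuts in Fig. 3), there are two options: (A) the shortcut still performs identity mapping and the extra dimensions are padded with zeros, which adds no parameters; (B) the projection shortcut of Eqn. (2), implemented with 1×1 convolutions, is used to match dimensions. In both options, a shortcut that crosses feature maps of two different sizes is applied with a stride of 2.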
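
To make the difference between options (A) and (B) concrete, here is a small shape-and-parameter sketch. It only does bookkeeping and performs no actual convolution; the helper names `shortcut_shape` and `shortcut_params` are invented for this example, and biases are ignored.

```python
def shortcut_shape(in_shape, out_channels, stride):
    """Shape (C, H, W) after the shortcut; every `stride`-th position is kept."""
    _, h, w = in_shape
    return (out_channels, (h - 1) // stride + 1, (w - 1) // stride + 1)


def shortcut_params(in_channels, out_channels, option):
    """Extra weights introduced by the shortcut (biases ignored)."""
    if option == "A":  # identity plus zero-padded channels: nothing to learn
        return 0
    if option == "B":  # 1x1 projection: one weight per (input, output) channel pair
        return 1 * 1 * in_channels * out_channels
    raise ValueError("option must be 'A' or 'B'")


# Shortcut entering conv3.x: 64 channels at 56x56 become 128 channels at 28x28
print(shortcut_shape((64, 56, 56), 128, stride=2))
print(shortcut_params(64, 128, "A"), shortcut_params(64, 128, "B"))
```

Both options produce the same output shape; they differ only in whether the extra channels are zeros (A) or learned by a 1×1 convolution (B).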

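1.2 Understanding the network structure

(1) The structures of the ResNet variants are listed in Table 1. The shallower networks (ResNet-18/34) are built from the building block shown in the figure below; for example, the red box in Table 1 indicates that three building blocks make up conv2.x. The deeper networks (ResNet-50/101/152) are built from "bottleneck" building blocks; for example, the blue box in Table 1 indicates that three "bottleneck" building blocks make up conv2.x.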
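
The network names can be recovered from the block counts. The sketch below assumes the per-stage block counts listed in Table 1 of the paper and counts only weighted layers: the first 7×7 convolution, the convolutions inside the blocks (2 per building block, 3 per "bottleneck" block), and the final fully connected layer. The dictionary `configs` and the function `weighted_layers` are names chosen for this example.

```python
# depth -> (blocks in conv2.x .. conv5.x, conv layers per block), assumed from Table 1 of the paper
configs = {
    18: ([2, 2, 2, 2], 2),
    34: ([3, 4, 6, 3], 2),
    50: ([3, 4, 6, 3], 3),
    101: ([3, 4, 23, 3], 3),
    152: ([3, 8, 36, 3], 3),
}


def weighted_layers(blocks, convs_per_block):
    """conv1 + all convolutions inside the blocks + the final fully connected layer."""
    return 1 + convs_per_block * sum(blocks) + 1


for depth, (blocks, convs_per_block) in configs.items():
    print(f"ResNet-{depth}: {weighted_layers(blocks, convs_per_block)} weighted layers")
```

Projection shortcuts are not counted here, which matches how the paper arrives at the depth in each network's name.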

Convolutional implementation:

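(2) The convolutional stages (conv2.x/conv3.x/conv4.x/conv5.x) fall into two types. The first is conv2.x: it begins with a max pooling layer of stride 2 that halves the feature map size (presumably with zero padding), followed by three building blocks, each of which has the same input and output size. The second is conv3.x/conv4.x/conv5.x: here the halving is done by the first convolution (conv3_1, conv4_1, and conv5_1) using a stride of 2, so in the first building block of the stage the input and output feature maps differ in size, while the remaining building blocks have the same input and output size. Taking conv3.x as an example, the first building block has a 56x56 input and a 28x28 output, and every remaining building block has a 28x28 input and output.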
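
The sizes in (2) can be traced with the usual output-size formula floor((n + 2p - k) / s) + 1. The sketch below assumes a 224x224 input and the padding values commonly used in implementations (3 for the 7x7 convolution, 1 for the 3x3 pooling and the 3x3 convolutions); the paper's text does not state the padding, so treat these as assumptions. `out_size` and `trace_sizes` are names made up for this example.

```python
def out_size(n, kernel, stride, padding):
    """Output side length of a conv or pooling layer on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_sizes(input_size=224, pool_padding=1):
    """Feature map side length after each stage (padding values are assumptions)."""
    sizes = {}
    n = out_size(input_size, kernel=7, stride=2, padding=3)      # conv1: 7x7, stride 2
    sizes["conv1"] = n
    n = out_size(n, kernel=3, stride=2, padding=pool_padding)    # 3x3 max pool, stride 2
    sizes["conv2.x"] = n
    for stage in ("conv3.x", "conv4.x", "conv5.x"):
        n = out_size(n, kernel=3, stride=2, padding=1)           # first conv of the stage, stride 2
        sizes[stage] = n
    return sizes


print(trace_sizes())
print(trace_sizes(pool_padding=0)["conv2.x"])  # without padding the pooled map is not 56
```

With padding 1 on the pooling layer the trace gives 112, 56, 28, 14, 7, which matches Table 1; with no padding the pooling output would be 55, which supports the guess that zero padding is applied.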

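(3) Regarding the point in (2) that the halving is done by the first convolution (conv3_1, conv4_1, and conv5_1) with a stride of 2: my understanding is that only the part marked by the red box in Fig. 5 uses a stride-2 convolution, while the other convolutions keep a stride of 1. I have not verified this yet and will confirm it after reading an actual code implementation.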
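
One stride placement that is consistent with the sizes in (2) can be sketched as follows. This is a hypothetical reading, not a confirmed description of any particular implementation: it assumes that the first 3x3 convolution of the first block in a stage carries the stride of 2, the second 3x3 convolution uses stride 1, and the shortcut uses a 1x1 window with the same stride, all with the padding values noted in the comments.

```python
def out_size(n, kernel, stride, padding):
    """Output side length of a conv layer on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def basic_block_sizes(n, first_stride):
    """Side lengths inside one building block, assuming only the first conv is strided."""
    after_first = out_size(n, kernel=3, stride=first_stride, padding=1)  # first 3x3 conv
    after_second = out_size(after_first, kernel=3, stride=1, padding=1)  # second 3x3 conv
    shortcut = out_size(n, kernel=1, stride=first_stride, padding=0)     # shortcut branch
    return after_first, after_second, shortcut


print(basic_block_sizes(56, first_stride=2))  # first block of conv3.x
print(basic_block_sizes(28, first_stride=1))  # remaining blocks of conv3.x
```

Under this assumption the main branch and the shortcut end at the same size in both cases, so the element-wise addition at the end of the block is well defined.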

II. The "Bottleneck" Building Block

2.1 Description in the original paper

Deeper Bottleneck Architectures. Next we describe our deeper nets for ImageNet. Because of concerns on the training time that we can afford, we modify the building block as a bottleneck design. For each residual function F, we use a stack of 3 layers instead of 2 (Fig. 5). The three layers are 1×1, 3×3, and 1×1 convolutions, where the 1×1 layers are responsible for reducing and then increasing (restoring) dimensions, leaving the 3×3 layer a bottleneck with smaller input/output dimensions. Fig. 5 shows an example, where both designs have similar time complexity.

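Key points of the passage: to keep the training time affordable, the deeper ImageNet networks replace the building block with a bottleneck design. Each residual function F becomes a stack of three layers instead of two (Fig. 5): 1×1, 3×3, and 1×1 convolutions. The 1×1 layers first reduce and then restore the dimensions, so the 3×3 layer is a bottleneck with smaller input/output dimensions. In the example of Fig. 5, the two designs have similar time complexity.

Figure 5. A deeper residual function F for ImageNet. Left: a building block (on 56×56 feature maps) as in Fig. 3 for ResNet-34. Right: a "bottleneck" building block for ResNet-50/101/152.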

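2.2 Understanding the "bottleneck" building block

(1) Why are the two designs said to have similar time complexity? Take Fig. 5 as an example. It corresponds to conv2.x, where the output feature map is 56x56, and we count the multiplications in the convolutions. For the left block the count is (3x3x64x56x56x64)x2. For the right block it is 1x1x256x56x56x64 + 3x3x64x56x56x64 + 1x1x64x56x56x256. One 3x3x64x56x56x64 term appears on both sides. For the remainder, since 256 = 64x4, the left side is 3x3x(64x56x56x64) and the right side is (4+4)x(64x56x56x64), a small difference of 9 versus 8 (18 versus 17 units of 64x56x56x64 overall). This is why the two designs have similar time complexity.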
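
The arithmetic in (1) can be reproduced directly. The sketch below counts multiplications only, ignores biases and the shortcut, and uses the channel counts of Fig. 5 (a 64-d building block versus a 256-d "bottleneck" block on 56x56 feature maps). The function name `conv_mults` is chosen for this example.

```python
def conv_mults(kernel, c_in, c_out, size):
    """Multiplications of a kernel x kernel conv producing a size x size output map."""
    return kernel * kernel * c_in * c_out * size * size


size = 56
basic = 2 * conv_mults(3, 64, 64, size)
bottleneck = (
    conv_mults(1, 256, 64, size)    # 1x1 conv, reduce 256 -> 64
    + conv_mults(3, 64, 64, size)   # 3x3 conv on 64 channels
    + conv_mults(1, 64, 256, size)  # 1x1 conv, restore 64 -> 256
)
unit = 64 * 64 * size * size        # the common factor 64x56x56x64

print(f"building block: {basic:,} = {basic // unit} units")
print(f"bottleneck:     {bottleneck:,} = {bottleneck // unit} units")
```

The two totals are 231,211,008 and 218,365,952 multiplications, that is, 18 and 17 units, so the "bottleneck" block costs about 94% of the building block while handling four times as many input and output channels.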

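(2) How is a building block converted into the corresponding "bottleneck" building block? From the calculation in (1), keeping the time complexity similar only requires making the output dimension 4 times the original. For example, if the original building block is {(3x3, d), (3x3, d)}, the corresponding "bottleneck" building block is {(1x1, d), (3x3, d), (1x1, 4xd)}.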
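
The conversion rule in (2) is mechanical, so it can be written as a small helper and checked for any d. In the sketch below, a block is a list of (kernel size, output channels) pairs; `to_bottleneck`, `block_cost`, and `cost_ratio` are names invented for this example, and the "bottleneck" block is assumed to take a 4xd-channel input, as in Fig. 5 (right).

```python
from fractions import Fraction


def to_bottleneck(d):
    """{(3x3, d), (3x3, d)} -> {(1x1, d), (3x3, d), (1x1, 4xd)}."""
    return [(1, d), (3, d), (1, 4 * d)]


def block_cost(layers, c_in):
    """Multiplications per output position: sum of k*k*c_in*c_out over the layers."""
    total = 0
    for kernel, c_out in layers:
        total += kernel * kernel * c_in * c_out
        c_in = c_out  # the next layer consumes this layer's output channels
    return total


def cost_ratio(d):
    """Cost of the bottleneck block relative to the building block for width d."""
    basic = block_cost([(3, d), (3, d)], c_in=d)
    bottleneck = block_cost(to_bottleneck(d), c_in=4 * d)
    return Fraction(bottleneck, basic)


for d in (64, 128, 256, 512):
    print(d, to_bottleneck(d), cost_ratio(d))
```

For every d the ratio is 17/18, because the building block costs 18 d^2 multiplications per position and the "bottleneck" block costs 4 d^2 + 9 d^2 + 4 d^2 = 17 d^2.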