ResNet Study Notes

Latest recommended article published 2024-07-31 11:07:57

Xxy_

Category: Deep Learning Translations

Article link: https://blog.csdn.net/xxy0118/article/details/78324256

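Structure of this post:

My reading notes

1. ResNet building blocks

2. ResNet architecture for the CIFAR-10 experiments

Other resources

1. Tutorial talk at ICML 2016 by Dr. Kaiming He, the author of ResNet

2. Bottleneck

3. Papers critical of ResNet

Paper translation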

---------------------------------------------------------------------------------------------------------------------

My reading notes

1. ResNet building blocks

What follows is my own understanding. If anything is wrong, corrections are very welcome!

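The figure shows a few building blocks excerpted from ResNet-34. The authors' description of the two shortcut options, identity shortcut and projection shortcut, confused me for a long time. I kept wondering: how can the input and output still have matching sizes after passing through the 3×3 convolution layers?

Here is how I understand it:

First, for convenience, I split the building blocks into two kinds:

a. The first kind of building block (BB1) is the one drawn with a solid-line shortcut in the figure above. Its input and output have the same spatial dimensions and the same number of feature maps;

b. The second kind of building block (BB2) is the one drawn with a dotted-line shortcut in the figure above. Its output has half the spatial size of its input and twice as many feature maps (that is, the "/2" operation is applied).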

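With the two kinds of building block distinguished, let's read the paper closely together:

1. “The identity shortcuts (Eqn.(1)) can be directly used when the input and output are of the same dimensions (solid line shortcuts in Fig. 3).”

What? For BB1, I am supposed to just add them directly? The input has already gone through two 3×3 convolutions, so how can the two be added if their sizes differ! Well, after looking through some references, I think the authors left the intermediate step implicit and I simply did not notice it. As I figure it, the intermediate step is zero padding around the input, so that each 3×3 convolution preserves the spatial size: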
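The size-preserving effect of zero padding can be checked with the standard convolution output-size formula, out = floor((in + 2·padding − kernel) / stride) + 1. The sketch below is only a shape calculation, not the authors' code. The helper name `conv_out_size` and the 56-pixel map width are assumptions chosen for illustration, and the stride-2 placement for BB2 follows the "/2" label in Fig. 3 of the paper.

```python
def conv_out_size(in_size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one axis (standard formula)."""
    return (in_size + 2 * padding - kernel) // stride + 1

size = 56  # hypothetical feature-map width, chosen for illustration

# BB1: two 3x3 convolutions, stride 1, one ring of zeros around the input
bb1 = conv_out_size(conv_out_size(size, 3, 1, 1), 3, 1, 1)

# Without padding, each 3x3 convolution would shrink the map by 2
no_pad = conv_out_size(conv_out_size(size, 3), 3)

# BB2: the first 3x3 convolution uses stride 2, the second uses stride 1
bb2 = conv_out_size(conv_out_size(size, 3, 2, 1), 3, 1, 1)

print(bb1, no_pad, bb2)  # 56 52 28
```

With a padding of 1 the BB1 output keeps the input size, so the identity shortcut can be added element-wise; the stride-2 case halves the size, which is exactly the BB2 situation.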

Good, now the shortcut really “can be directly used”. On with the paper:

2. “When the dimensions increase (dotted line shortcuts in Fig. 3), we consider two options: (A) The shortcut still performs identity mapping, with extra zero entries padded for increasing dimensions. This option introduces no extra parameter; (B) The projection shortcut in Eqn.(2) is used to match dimensions (done by 1×1 convolutions). For both options, when the shortcuts go across feature maps of two sizes, they are performed with a stride of 2.”

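For BB2, the authors offer two options. (A) Zero padding, as with BB1, except that here the zeros fill the newly added feature maps (the paper's “extra zero entries padded for increasing dimensions”), so a lot of zeros get padded. This is also why the “Residual Networks” part of the experiments in Section 4.1 concludes that “B is slightly better than A. We argue that this is because the zero-padded dimensions in A indeed have no residual learning.” (page 6, middle of the right column). (B) Use the projection shortcut of Eqn. (2): Ws is applied to the input as a 1×1 convolution with stride 2, so that the shortcut output and the block output have the same dimensions, and the two can then simply be added. The process is shown in the figure below: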
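To make the two options concrete, here is a minimal pure-Python sketch that operates on a feature map stored as nested lists indexed [channel][row][column]. It illustrates the shape bookkeeping only and is not the authors' implementation. The function names, the toy input, and the weights in `ws` are all hypothetical, and the 1×1 convolution is written without a bias term.

```python
def shape(x):
    """Return (channels, height, width) of a nested-list feature map."""
    return len(x), len(x[0]), len(x[0][0])

def subsample(x, stride):
    """Keep every stride-th row and column of each channel."""
    return [[row[::stride] for row in ch[::stride]] for ch in x]

def shortcut_option_a(x, out_channels, stride=2):
    """Option A: strided identity plus all-zero extra channels (no parameters)."""
    sub = subsample(x, stride)
    _, h, w = shape(sub)
    extra = out_channels - len(sub)
    zeros = [[[0.0] * w for _ in range(h)] for _ in range(extra)]
    return sub + zeros

def shortcut_option_b(x, ws, stride=2):
    """Option B: 1x1 convolution with the given stride.

    ws[o][i] is the weight from input channel i to output channel o.
    """
    sub = subsample(x, stride)
    c_in, h, w = shape(sub)
    return [
        [[sum(w_o[i] * sub[i][r][c] for i in range(c_in)) for c in range(w)]
         for r in range(h)]
        for w_o in ws
    ]

# Hypothetical toy input: 2 channels of 4x4
x = [[[float(ch * 16 + r * 4 + c) for c in range(4)] for r in range(4)]
     for ch in range(2)]
# Hypothetical projection weights: 4 output channels, 2 input channels
ws = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

a = shortcut_option_a(x, out_channels=4)
b = shortcut_option_b(x, ws)
print(shape(x), shape(a), shape(b))  # (2, 4, 4) (4, 2, 2) (4, 2, 2)
```

Both shortcuts end up with twice the channels at half the spatial size, so either can be added to the BB2 output. In option A the two extra channels are all zeros and carry nothing from the input, which is what the quoted remark about "no residual learning" in the zero-padded dimensions refers to.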

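2. ResNet architecture for the CIFAR-10 experiments

For my level of comprehension, the authors' description of the ResNet-based network used in the CIFAR-10 experiments was far too confusing! It took me real effort to work it out. Taking the 20-layer ResNet with n=3 as an example, the detailed structure is shown in the table below: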

Layer name	Output map size	20-layer ResNet
Conv1	32×32	{3×3,16}
Conv2_x	32×32	{3×3,16; 3×3,16}×3
Conv3_x	16×16	{3×3,32; 3×3,32}×3
Conv4_x	8×8	{3×3,64; 3×3,64}×3
InnerProduct	1×1	Global average pooling, 10-d fc
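The table follows the paper's CIFAR-10 recipe, in which each of the three stages holds 2n convolution layers (n two-layer building blocks), giving 6n+2 weighted layers in total. The sketch below regenerates the rows for any n and checks the depth. The function names `cifar_resnet_spec` and `depth` are made up for illustration, and the tuple layout is my own choice.

```python
def cifar_resnet_spec(n):
    """Rows of (layer name, output map size, filters, weighted layers)."""
    rows = [("Conv1", 32, 16, 1)]
    stages = [("Conv2_x", 32, 16), ("Conv3_x", 16, 32), ("Conv4_x", 8, 64)]
    for name, size, filters in stages:
        # n building blocks per stage, two 3x3 convolutions per block
        rows.append((name, size, filters, 2 * n))
    # Global average pooling has no weights; the 10-d fc layer counts as one
    rows.append(("InnerProduct", 1, 10, 1))
    return rows

def depth(n):
    """Total number of weighted layers, which should equal 6n + 2."""
    return sum(layers for _, _, _, layers in cifar_resnet_spec(n))

for name, size, filters, layers in cifar_resnet_spec(3):
    print(f"{name}\t{size}x{size}\t{filters} filters\t{layers} layer(s)")

print(depth(3))  # 20
```

For n=3 this gives 1 + 3×6 + 1 = 20 layers, matching the table; larger n values scale the three middle rows only.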