This post works through the original ResNet paper (https://arxiv.org/pdf/1512.03385.pdf), reproduces it in TensorFlow, and closes with the modifications that later researchers have built on top of ResNet.
1.Motivation
Degradation Problem
The paper starts from the observation that as layers are stacked, accuracy saturates and then degrades rapidly, which suggests that not all systems are similarly easy to optimize.
2.Solution
To address this degradation problem, the paper introduces a deep residual learning framework.
Instead of hoping the stacked layers directly fit a desired underlying mapping H(x), the paper lets them fit the residual mapping F(x) = H(x) − x. The original mapping is then recast as F(x) + x.
The paper hypothesizes that it is easier to optimize the residual mapping than the original, unreferenced mapping.
Shortcut connections [2, 34, 49] are those skipping one or more layers. In our case, the shortcut connections simply perform identity mapping, and their outputs are added to the outputs of the stacked layers (Fig. 2).
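A minimal sketch of the basic two-layer residual block in tf.keras (TF 2.x), assuming the post-activation arrangement (conv → BN → ReLU) of the original paper; the helper name `residual_block` is my own:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Basic residual block with an identity shortcut (Fig. 2 of the paper).

    The stacked layers fit F(x); the block outputs F(x) + x.
    """
    shortcut = x  # identity mapping: adds no extra parameters

    # Two 3x3 conv layers fit the residual function F(x)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)

    # Element-wise addition of the shortcut, then the final ReLU
    y = layers.Add()([y, shortcut])
    return layers.ReLU()(y)
```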
3.Advantages
- Easy to optimize, but the counterpart “plain” nets (that simply stack layers) exhibit higher training error when the depth increases;
- Easily enjoy accuracy gains from greatly increased depth, producing results substantially better than previous networks.
4.Network
When the dimensions increase (dotted-line shortcuts in Fig. 3), the paper considers two options:
- (A) The shortcut still performs identity mapping, with extra zero entries padded for the increased dimensions. This option introduces no extra parameters;
- (B) The projection shortcut in Eqn.(2) is used to match dimensions (done by 1×1 convolutions).

For both options, when the shortcuts go across feature maps of two sizes, they are performed with a stride of 2.
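A tf.keras sketch of both dimension-matching options; the function name and the `option` flag are my own, and option A assumes the channel count doubles across the stage (as it does between ResNet stages):

```python
import tensorflow as tf
from tensorflow.keras import layers

def downsample_block(x, filters, option="B"):
    """Residual block whose shortcut crosses feature maps of two sizes.

    Both the residual branch and the shortcut use stride 2 so spatial
    dimensions match before the addition.
    """
    # Residual branch F(x): first conv downsamples with stride 2
    y = layers.Conv2D(filters, 3, strides=2, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)

    if option == "A":
        # Option A: identity shortcut, subsampled with stride 2, with the
        # channel dimension zero-padded. No extra parameters. Assumes
        # filters is larger than the input channel count by an even amount.
        in_channels = x.shape[-1]
        shortcut = layers.MaxPooling2D(pool_size=1, strides=2)(x)
        pad = (filters - in_channels) // 2
        shortcut = tf.pad(shortcut, [[0, 0], [0, 0], [0, 0], [pad, pad]])
    else:
        # Option B: projection shortcut of Eqn.(2), a 1x1 conv with stride 2
        shortcut = layers.Conv2D(filters, 1, strides=2, use_bias=False)(x)
        shortcut = layers.BatchNormalization()(shortcut)

    return layers.ReLU()(layers.Add()([y, shortcut]))
```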