Multi-branch convolutional networks
These are related works I came across while reading the Selective Kernel Networks paper.
I had never encountered or studied them before, so I am recording today what I learned.
Highway networks
R. K. Srivastava, K. Greff, and J. Schmidhuber. Highway networks. arXiv preprint arXiv:1505.00387, 2015.
What is a Highway Network mainly about? Reading the paper, its core contribution is a gating function.
For the input x of a convolutional layer, let T(x) be the gating function and H(x) the transform applied by the layer.
The Highway Network output is then defined as y = T(x) · H(x) + (1 − T(x)) · x.
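To make the gating concrete, here is a minimal PyTorch sketch of one highway layer implementing the formula above (the fully connected form from the paper; the class name and the negative gate-bias initialization are my own choices):

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """One highway layer: y = T(x) * H(x) + (1 - T(x)) * x."""

    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)  # H(x), the candidate transform
        self.gate = nn.Linear(dim, dim)       # T(x), squashed into (0, 1)
        # Bias the gate towards carrying the input early in training,
        # as recommended in the Highway Networks paper.
        nn.init.constant_(self.gate.bias, -1.0)

    def forward(self, x):
        h = torch.relu(self.transform(x))     # H(x)
        t = torch.sigmoid(self.gate(x))       # T(x)
        return t * h + (1.0 - t) * x          # gated blend of transform and carry
```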
This idea is similar to ResNet's shortcut connections.
Note, however, that Highway Networks were proposed earlier, in 2015.
Strictly speaking, ResNet's design drew on the Highway Networks idea, replacing the learned gates with a plain, parameter-free identity shortcut.
ResNet
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
The famous ResNet needs no further introduction; here I will just include a figure of its residual block.
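As a reminder of what the later sections build on, here is a minimal sketch of the basic residual block y = ReLU(x + F(x)) (assuming input and output shapes match, so the shortcut needs no projection):

```python
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Basic residual block: y = ReLU(x + F(x)), with F = conv-BN-ReLU-conv-BN."""

    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return F.relu(x + self.residual(x))  # identity shortcut plus residual branch
```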
Multi-residual networks
M. Abdi and S. Nahavandi. Multi-residual networks. arXiv preprint arXiv:1609.05672, 2016.
What exactly are multi-residual networks? Let us see how the paper describes them.
In this article, we take one step toward understanding the learning behavior of deep residual networks, and supporting the observation that deep residual networks behave like ensembles. We propose a new convolutional neural network architecture which builds upon the success of residual networks by explicitly exploiting the interpretation of very deep networks as an ensemble. The proposed multi-residual network increases the number of residual functions in the residual blocks. Our architecture generates models that are wider, rather than deeper, which significantly improves accuracy. We show that our model achieves an error rate of 3.73% and 19.45% on CIFAR-10 and CIFAR-100 respectively, that outperforms almost all of the existing models. We also demonstrate that our model outperforms very deep residual networks by 0.22% (top-1 error) on the full ImageNet 2012 classification dataset. Additionally, inspired by the parallel structure of multi-residual networks, a model parallelism technique has been investigated. The model parallelism method distributes the computation of residual blocks among the processors, yielding up to 15% computational complexity improvement.
Multi-residual networks work on width rather than depth: instead of stacking more blocks, each residual block contains several residual functions in parallel.
Widening ResNet in this way can likewise improve accuracy.
Doing so clearly increases the number of parameters per block.
The paper includes a parameter comparison:
a 14-layer multi-resnet with k = 10 residual functions per block has roughly the same parameter count as ResNet-110, while its top-1 error on CIFAR-10 is 0.21% higher than ResNet-110's.
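A minimal sketch of the idea, assuming each residual function F_i has the same conv-BN-ReLU-conv-BN layout as a basic ResNet branch (the k branches are summed together with the identity shortcut):

```python
import torch.nn as nn

def residual_function(channels):
    # One residual branch F_i(x); the exact layer layout here is an assumption.
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
    )

class MultiResidualBlock(nn.Module):
    """Multi-residual block: y = x + F_1(x) + ... + F_k(x)."""

    def __init__(self, channels, k=10):
        super().__init__()
        self.branches = nn.ModuleList([residual_function(channels) for _ in range(k)])

    def forward(self, x):
        return x + sum(branch(x) for branch in self.branches)
```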
Multilevel ResNets
K. Zhang, M. Sun, X. Han, X. Yuan, L. Guo, and T. Liu. Residual networks of residual networks: Multilevel residual networks. IEEE Transactions on Circuits and Systems for Video Technology, 2017.
A residual-networks family with hundreds or even thousands of layers dominates major image recognition tasks, but building a network by simply stacking residual blocks inevitably limits its optimization ability. This paper proposes a novel residual-network architecture, Residual networks of Residual networks (RoR), to dig the optimization ability of residual networks. RoR substitutes optimizing residual mapping of residual mapping for optimizing original residual mapping. In particular, RoR adds level-wise shortcut connections upon original residual networks to promote the learning capability of residual networks. More importantly, RoR can be applied to various kinds of residual networks (ResNets, Pre-ResNets and WRN) and significantly boost their performance. Our experiments demonstrate the effectiveness and versatility of RoR, where it achieves the best performance in all residual-network-like structures. Our RoR-3-WRN58-4+SD models achieve new state-of-the-art results on CIFAR-10, CIFAR-100 and SVHN, with test errors 3.77%, 19.73% and 1.59%, respectively. RoR-3 models also achieve state-of-the-art results compared to ResNets on ImageNet data set.
Multilevel ResNets (RoR) add multiple level-wise shortcut connections on top of the original ResNet to enrich its structure.
Looking at the top-1 error,
adding these extra shortcuts improves the result by only about 0.2%–0.4%.
However, since only shortcut connections are added, the model's parameter count is essentially unchanged.
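A rough sketch of a level-wise shortcut, assuming all blocks in the group keep the same feature-map shape so the extra shortcut can be a plain identity (in RoR, shortcuts that cross a change of resolution use 1x1 convolutions instead):

```python
import torch.nn as nn

class LevelShortcutGroup(nn.Module):
    """Wraps a stack of residual blocks with one extra, group-level identity shortcut (RoR style)."""

    def __init__(self, blocks):
        super().__init__()
        self.blocks = nn.Sequential(*blocks)  # each inner block already has its own shortcut

    def forward(self, x):
        # The group-level shortcut is simply added on top of the stacked blocks;
        # one level up, a network-level shortcut can wrap several such groups in the same way.
        return x + self.blocks(x)
```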
Identity mappings in deep residual networks
K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In ECCV, 2016.
This paper proposes a new ResNet unit that is easier to train and achieves better accuracy.
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. A series of ablation experiments support the importance of these identity mappings. This motivates us to propose a new residual unit, which makes training easier and improves generalization. We report improved results using a 1001-layer ResNet on CIFAR-10 (4.62% error) and CIFAR-100, and a 200-layer ResNet on ImageNet.
The unit proposed in this paper is called the full pre-activation residual unit.
Essentially, BN and ReLU are moved in front of each weight layer (BN → ReLU → conv inside the residual branch),
hence the name full pre-activation.
In the figure, the gray straight line is the shortcut,
and "weight" refers to a weight layer, i.e., a convolution.
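A minimal sketch of a full pre-activation unit as described above (two 3x3 weight layers, each preceded by BN and ReLU, with a pure identity shortcut):

```python
import torch.nn as nn

class PreActBlock(nn.Module):
    """Full pre-activation unit: y = x + conv(ReLU(BN(conv(ReLU(BN(x))))))."""

    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )

    def forward(self, x):
        # Nothing is applied after the addition, so the shortcut path stays a clean identity.
        return x + self.residual(x)
```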
Wide residual networks
S. Zagoruyko and N. Komodakis. Wide residual networks. In BMVC, 2016.
Core idea: reduce depth and increase width. For example, WRN-28-10 (28 layers, widening factor 10) is considerably more accurate than a plain ResNet, but its parameter count explodes.
So how do wide residual networks actually work?
Every convolutional layer's number of output channels is multiplied by a widening factor k.
The kernel size is unchanged, but since both the input and output channels of each block grow by k, the parameter count grows roughly by a factor of k².
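A sketch of one widened block under that description (pre-activation layout as in the WRN paper; a 1x1 projection shortcut is used whenever the widened output no longer matches the input):

```python
import torch.nn as nn

class WideBasicBlock(nn.Module):
    """WRN block: kernel size stays 3x3, channel counts are multiplied by the widening factor k."""

    def __init__(self, in_channels, base_channels, k=10):
        super().__init__()
        width = base_channels * k  # widened channel count
        self.residual = nn.Sequential(
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, width, 3, padding=1, bias=False),
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1, bias=False),
        )
        self.shortcut = (nn.Identity() if in_channels == width
                         else nn.Conv2d(in_channels, width, 1, bias=False))

    def forward(self, x):
        return self.residual(x) + self.shortcut(x)
```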
InceptionNets
[1] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
[2] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[3] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR, 2016.
[4] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, 2017.
This blogger's post explains these networks very well, but for a detailed understanding you still need to read the original papers.
https://www.cnblogs.com/vincent1997/p/10920036.html
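Since this whole section is about multi-branch designs, here is a minimal sketch of the original Inception (GoogLeNet) module from reference [1]: four parallel branches (1x1, 1x1→3x3, 1x1→5x5, and max-pool→1x1) whose outputs are concatenated along the channel dimension; the branch widths are constructor parameters rather than fixed values:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Inception-v1 style multi-branch block; outputs of all branches are concatenated."""

    def __init__(self, in_ch, c1, c3r, c3, c5r, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, 1)                                  # 1x1 branch
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, c3r, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c3r, c3, 3, padding=1))          # 1x1 reduce, then 3x3
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c5r, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c5r, c5, 5, padding=2))          # 1x1 reduce, then 5x5
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, pool_proj, 1))            # pool, then 1x1 projection

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
```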