Multi-branch convolutional networks
These notes cover the related work cited in the Selective Kernel Networks paper, which I am currently studying.
Highway networks
R. K. Srivastava, K. Greff, and J. Schmidhuber. Highway networks. arXiv preprint arXiv:1505.00387, 2015.
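What is the highway network paper mainly about? Having read it, its main contribution is a gating function.
The output of a highway layer is defined as y = T(x) * H(x) + (1 - T(x)) * x, where H is the layer's transform and T is the transform gate.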
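As a rough illustration of the gating formula, here is a minimal pure-Python sketch of a single highway layer. The names `highway_layer`, `affine`, `W_H`, `b_H`, `W_T`, and `b_T` are my own, and using tanh for H is an assumed choice; only the sigmoid gate and the combination rule come from the formula above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, x):
    """Return W @ x + b for a list-of-rows matrix W and vectors b, x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def highway_layer(x, W_H, b_H, W_T, b_T):
    """Compute y = T(x) * H(x) + (1 - T(x)) * x element-wise."""
    # Transform H(x); tanh is an assumed non-linearity for this sketch.
    h = [math.tanh(z) for z in affine(W_H, b_H, x)]
    # Transform gate T(x), squashed into (0, 1) by a sigmoid.
    t = [sigmoid(z) for z in affine(W_T, b_T, x)]
    # Gate-weighted mix of the transformed signal and the untouched input.
    return [t_i * h_i + (1.0 - t_i) * x_i for t_i, h_i, x_i in zip(t, h, x)]
```

With a strongly negative gate bias the gate is nearly closed and the layer passes x through almost unchanged; with a strongly positive bias it outputs roughly H(x).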
Highway networks were proposed in 2015, however.
More precisely, it was ResNet that drew on the ideas of highway networks.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
multi-residual networks
M. Abdi and S. Nahavandi. Multi-residual networks. arXiv preprint arXiv:1609.05672, 2016.
In this article, we take one step toward understanding the learning behavior of deep residual networks, and supporting the observation that deep residual networks behave like ensembles. We propose a new convolutional neural network architecture which builds upon the success of residual networks by explicitly exploiting the interpretation of very deep networks as an ensemble. The proposed multi-residual network increases the number of residual functions in the residual blocks. Our architecture generates models that are wider, rather than deeper, which significantly improves accuracy. We show that our model achieves an error rate of 3.73% and 19.45% on CIFAR-10 and CIFAR-100 respectively, that outperforms almost all of the existing models. We also demonstrate that our model outperforms very deep residual networks by 0.22% (top-1 error) on the full ImageNet 2012 classification dataset. Additionally, inspired by the parallel structure of multi-residual networks, a model parallelism technique has been investigated. The model parallelism method distributes the computation of residual blocks among the processors, yielding up to 15% computational complexity improvement.
A 14-layer multi-ResNet with k = 10 matches ResNet-110 in parameter count, while its top-1 error on CIFAR-10 is 0.21% higher than that of ResNet-110.
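The idea of "increasing the number of residual functions in the residual blocks" can be sketched as below. This is a simplified reading, assuming the k residual functions all see the same input and their outputs are summed with an identity shortcut; `multi_residual_block` and `residual_fns` are hypothetical names, and plain Python callables stand in for the convolutional branches.

```python
def multi_residual_block(x, residual_fns):
    """Return x + f_1(x) + ... + f_k(x) for a list of k residual functions."""
    out = list(x)  # identity shortcut
    for f in residual_fns:
        # Every branch reads the same block input x, so the branches are parallel.
        out = [o + r for o, r in zip(out, f(x))]
    return out

# Usage: k = 2 toy residual functions applied to a 2-element input.
branches = [lambda v: [2 * a for a in v], lambda v: [3 * a for a in v]]
result = multi_residual_block([1, 2], branches)
```

Because the branches do not depend on one another, they can be computed independently, which is what makes the block wider rather than deeper.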
Multilevel ResNets
K. Zhang, M. Sun, X. Han, X. Yuan, L. Guo, and T. Liu. Residual networks of residual networks: Multilevel residual networks. Transactions on Circuits and Systems for Video Technology, 2017
A residual-networks family with hundreds or even thousands of layers dominates major image recognition tasks, but building a network by simply stacking residual blocks inevitably limits its optimization ability. This paper proposes a novel residual-network architecture, Residual networks of Residual networks (RoR), to dig the optimization ability of residual networks. RoR substitutes optimizing residual mapping of residual mapping for optimizing original residual mapping. In particular, RoR adds level-wise shortcut connections upon original residual networks to promote the learning capability of residual networks. More importantly, RoR can be applied to various kinds of residual networks (ResNets, Pre-ResNets and WRN) and significantly boost their performance. Our experiments demonstrate the effectiveness and versatility of RoR, where it achieves the best performance in all residual-network-like structures. Our RoR-3-WRN58-4+SD models achieve new state-of-the-art results on CIFAR-10, CIFAR-100 and SVHN, with test errors 3.77%, 19.73% and 1.59%, respectively. RoR-3 models also achieve state-of-the-art results compared to ResNets on ImageNet data set.
Multilevel ResNets add multiple extra shortcuts on top of the original ResNet to enrich its connectivity.
In terms of top-1 error, the Multilevel ResNets with these extra shortcuts improve by no more than 0.2%-0.4%.
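To make the "level-wise shortcut" idea concrete, here is a small sketch of a three-level arrangement: the usual shortcut inside each residual block, one extra shortcut around each group of blocks, and one root shortcut around the whole stack. It assumes identity shortcuts and unchanged feature dimensions throughout, which is a simplification; `ror_forward`, `residual_block`, and `groups` are illustrative names, not taken from the paper.

```python
def add(u, v):
    return [a + b for a, b in zip(u, v)]

def residual_block(x, f):
    """Ordinary residual block: x + f(x)."""
    return add(x, f(x))

def ror_forward(x, groups):
    """Apply groups of residual functions with three levels of shortcuts."""
    root_in = x
    for group in groups:
        group_in = x
        for f in group:
            x = residual_block(x, f)   # innermost level: per-block shortcut
        x = add(x, group_in)           # middle level: shortcut around the group
    return add(x, root_in)             # root level: shortcut around everything

# Usage: two groups whose residual functions all output zeros.
zero = lambda v: [0.0 for _ in v]
out = ror_forward([1.0], [[zero, zero], [zero]])
```

Even when every residual function outputs zero, the input still reaches the output through all three shortcut levels, which is the extra signal path these added shortcuts provide.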
The following paper proposes a new ResNet unit that is easier to train and reaches better accuracy.
K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In ECCV, 2016.
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. A series of ablation experiments support the importance of these identity mappings. This motivates us to propose a new residual unit, which makes training easier and improves generalization. We report improved results using a 1001-layer ResNet on CIFAR-10 (4.62% error) and CIFAR-100, and a 200-layer ResNet on ImageNet.
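The unit proposed in this paper is called the full pre-activation ResNet unit.
In the paper's figure of the full pre-activation unit, the gray straight line is the shortcut.
"weight" in that figure denotes a weight layer.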
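The ordering inside a full pre-activation unit (normalization and ReLU before each weight layer, and nothing applied after the addition) can be sketched as follows. This is only an illustration of the ordering: `normalize` is a per-vector stand-in for batch normalization, dense matrices `W1` and `W2` stand in for the convolutional weight layers, and all names are hypothetical.

```python
import math

def relu(v):
    return [max(0.0, a) for a in v]

def normalize(v, eps=1e-5):
    """Stand-in for BN: scale one vector to zero mean and unit variance."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]

def linear(W, v):
    """Stand-in for a weight layer: matrix-vector product."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def pre_activation_unit(x, W1, W2):
    """BN -> ReLU -> weight -> BN -> ReLU -> weight, then add the shortcut."""
    out = linear(W1, relu(normalize(x)))
    out = linear(W2, relu(normalize(out)))
    # Identity shortcut; no activation after the addition, so x passes through as-is.
    return [a + b for a, b in zip(x, out)]
```

If the second weight layer is all zeros, the unit returns x exactly, negative entries included, which shows that the shortcut path is a pure identity mapping.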
Wide residual networks
S. Zagoruyko and N. Komodakis. Wide residual networks. In BMVC, 2016.
So what exactly is the idea behind wide residual networks?
[1] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
[2] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[3] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR, 2016.
[4] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In AAAI, 2017.