Paper Notes | Deep Residual Learning for Image Recognition

Authors

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Abstract

Residual networks are easier to optimize and gain accuracy from considerably increased depth, while still having lower complexity than VGG nets.

1 Introduction

We denote the desired underlying mapping as H(x) and let the stacked nonlinear layers fit the residual F(x) := H(x) - x, so the original mapping becomes F(x) + x. We hypothesize that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping. In the extreme, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping by a stack of nonlinear layers.
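As a concrete illustration of this formulation, here is a minimal sketch of a two-layer residual block in PyTorch (my own illustration with assumed channel sizes and layer names, not the authors' original implementation); batch normalization follows each convolution, and the second ReLU is applied after the addition:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two-layer residual block: output = ReLU(F(x) + x).

    F(x) is conv3x3 -> BN -> ReLU -> conv3x3 -> BN; the identity
    shortcut assumes input and output shapes already match.
    """
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                      # the shortcut carries x unchanged
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))   # F(x): the residual to be learned
        out = out + identity              # H(x) = F(x) + x
        return self.relu(out)

# quick shape check
x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```

If the optimal mapping for this block were the identity, the solver could simply drive the weights of conv1/conv2 toward zero.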

2 Related Work

2.1 Residual Representations

F. Perronnin and C. Dance. Fisher kernels on visual vocabularies for image categorization. In CVPR, 2007.
H. Jegou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, and C. Schmid. Aggregating local image descriptors into compact codes. TPAMI, 2012.
W. L. Briggs, S. F. McCormick, et al. A Multigrid Tutorial. SIAM, 2000.

2.2 Shortcut Connections

#highway
R. K. Srivastava, K. Greff, and J. Schmidhuber. Highway networks. arXiv:1505.00387, 2015.
R. K. Srivastava, K. Greff, and J. Schmidhuber. Training very deep networks. arXiv:1507.06228, 2015.
#LSTM
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.

3 Deep Residual Learning

3.1 Residual Learning

The degradation problem suggests that the solvers might have difficulties in approximating identity mappings by multiple nonlinear layers.

3.2 Identity Mapping by Shortcuts

When the dimensions of x and F(x) do not match, we can perform a linear projection W_s by the shortcut connections to match the dimensions: y = F(x, {W_i}) + W_s x.
In these experiments, F has two or three layers (more are possible); with only a single layer, however, the block becomes similar to a plain linear layer, y = W_1 x + x, for which no advantage was observed.
Plain network: the convolutional layers mostly have 3x3 filters and follow two simple design rules: 1) for the same output feature map size, the layers have the same number of filters; 2) if the feature map size is halved, the number of filters is doubled so as to preserve the time complexity per layer.
Residual network: when the dimensions increase (dotted-line shortcuts), there are two options: 1) identity shortcuts with extra zero entries padded for the increased dimensions (parameter-free); 2) projection shortcuts with 1x1 convolutions (slightly better). Both options are sketched below.
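A rough sketch of the two dimension-matching options (a PyTorch illustration of my own; the function names, the exact subsampling, and the BN on the projection path are assumptions, not the paper's reference code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def shortcut_zero_pad(x, out_channels, stride=2):
    # Option 1 (parameter-free): subsample spatially and pad the
    # extra feature channels with zeros.
    x = x[:, :, ::stride, ::stride]           # spatial subsampling
    extra = out_channels - x.size(1)           # number of channels to add
    return F.pad(x, (0, 0, 0, 0, 0, extra))    # zero-pad the channel dimension

def shortcut_projection(in_channels, out_channels, stride=2):
    # Option 2: a strided 1x1 convolution projects x to the new shape.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )

x = torch.randn(1, 64, 56, 56)
print(shortcut_zero_pad(x, 128).shape)        # torch.Size([1, 128, 28, 28])
print(shortcut_projection(64, 128)(x).shape)  # torch.Size([1, 128, 28, 28])
```

Either output can then be added to the residual branch F(x), which performs its own stride-2 downsampling.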

4 Experiments

BN ensures that forward-propagated signals have non-zero variances.

S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.

ResNet eases optimization by providing faster convergence at the early stage.
Shortcuts: identity or projection? Projection shortcuts are marginally better than identity shortcuts, but identity shortcuts are important for not increasing the complexity of the bottleneck architectures.
Deep bottleneck architectures: for ResNet-50/101/152, each residual function F is a stack of three layers, 1x1, 3x3, and 1x1 convolutions, where the 1x1 layers first reduce and then restore the dimensions, leaving the 3x3 layer a bottleneck with smaller input/output dimensions.
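A minimal sketch of such a bottleneck block (assuming the 256 -> 64 -> 64 -> 256 configuration described in the paper; an illustration, not the reference implementation):

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck residual block: 1x1 (reduce) -> 3x3 -> 1x1 (restore),
    e.g. 256 -> 64 -> 64 -> 256 channels, with an identity shortcut."""
    def __init__(self, channels=256, reduced=64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, reduced, kernel_size=1, bias=False),
            nn.BatchNorm2d(reduced), nn.ReLU(inplace=True),
            nn.Conv2d(reduced, reduced, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(reduced), nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.branch(x) + x)   # F(x) + x, then ReLU

x = torch.randn(1, 256, 56, 56)
print(Bottleneck()(x).shape)  # torch.Size([1, 256, 56, 56])
```

Because the costly 3x3 convolution operates on the reduced number of channels, the block stays cheap even at large depth, which is why identity (rather than projection) shortcuts matter here.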
Analysis of layer responses: ResNets generally have smaller layer responses than their plain counterparts, supporting the hypothesis that the residual functions are generally closer to zero than the non-residual functions.

Object Detection Improvements

box refinement:

S. Gidaris and N. Komodakis. Object detection via a multi-region &
semantic segmentation-aware cnn model. In ICCV, 2015.

global context: the full image is treated as a single RoI, a global feature is pooled over it (SPP implemented as "RoI" pooling on the entire image), and this global feature is concatenated with each per-region feature for classification and regression.

Conclusions

Deep residual learning for image recognition is a deep-learning approach to image recognition. It builds deep neural networks out of residual blocks in order to address problems such as vanishing and exploding gradients when training deep networks.

In a conventional deep network, as the number of layers grows, the gradients flowing backward through the network can become very small or very large, which makes training difficult. Deep residual learning uses residual connections to tackle this issue: in a residual block, the input feature map is connected directly to the output, so the network only has to learn the residual between input and output. Even as the depth increases, the gradients stay relatively stable and training converges faster; the shortcut connections also help the network capture fine details and features at multiple scales.

To apply deep residual learning to image recognition, many residual blocks are stacked to increase the network's depth, allowing it to extract better features and learn richer representations. Trained on large-scale image data, deep residual networks reach, and on some image recognition tasks even surpass, human-level accuracy.

In short, deep residual learning uses residual connections to alleviate vanishing and exploding gradients; by increasing depth and learning residual functions, it achieves breakthrough results on image recognition tasks.