Paper Notes | Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

1 Introduction

In this work we study the combination of the two most recent ideas: residual connections and Inception-v3. We replace the filter concatenation stage of the Inception architecture with residual connections.
Besides this straightforward integration, we also designed a new version named Inception-v4.

However, the use of residual connections seems to improve the training speed greatly.

3 Architectural Choices

3.1 Inception-v4

[Figures: overall schema of the Inception-v4 network and its component modules]
Convolutions marked with V are valid padded, meaning that the input patch of each unit is fully contained in the previous layer, and the grid size of the output activation map is reduced accordingly. Convolutions without the V marker are same padded, so their output grid matches the size of their input.
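A quick PyTorch illustration of the difference (the 192/224 channel counts and 35×35 grid are illustrative, not taken from a specific layer of the paper):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 192, 35, 35)   # example 35x35 activation map

# 'V' (valid) padding: no zero padding, so the output grid shrinks
conv_valid = nn.Conv2d(192, 224, kernel_size=3, padding=0)
print(conv_valid(x).shape)        # torch.Size([1, 224, 33, 33])

# same padding: the output grid matches the input grid
conv_same = nn.Conv2d(192, 224, kernel_size=3, padding=1)
print(conv_same(x).shape)         # torch.Size([1, 224, 35, 35])
```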

3.2 Residual Inception blocks

Inception-ResNet-v1 and Inception-ResNet-v2
Inception-ResNet-v1 roughly matches the computational cost of Inception-v3, while Inception-ResNet-v2 matches the raw cost of the newly introduced Inception-v4; in practice, however, Inception-v4 proved to be significantly slower, probably due to its larger number of layers.
[Figures: Inception-ResNet-v1 and Inception-ResNet-v2 module diagrams]
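A minimal PyTorch sketch of the residual Inception idea: the parallel branches are still concatenated, but a 1×1 filter-expansion convolution (without activation) restores the input depth so the result can be added to the shortcut. The branch widths below are illustrative, not the paper's exact numbers:

```python
import torch
import torch.nn as nn

class InceptionResNetBlock(nn.Module):
    """Sketch of a residual Inception block (Inception-ResNet-A style)."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # Parallel Inception-style branches (widths are illustrative)
        self.branch0 = nn.Sequential(nn.Conv2d(channels, 32, 1), nn.ReLU())
        self.branch1 = nn.Sequential(
            nn.Conv2d(channels, 32, 1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, 32, 1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        # 1x1 filter-expansion conv (no activation) restores the input depth
        # so the concatenated branch output can be added to the shortcut
        self.expand = nn.Conv2d(3 * 32, channels, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        branches = torch.cat(
            [self.branch0(x), self.branch1(x), self.branch2(x)], dim=1)
        return self.relu(x + self.expand(branches))
```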

3.3 Scaling of the Residuals

We also found that if the number of filters exceeded 1000, the residual variants started to exhibit instabilities and the network simply "died" early in training, meaning that the last layer before the average pooling started to produce only zeros after a few tens of thousands of iterations. This could not be prevented, either by lowering the learning rate or by adding an extra batch normalization to this layer.
We found that scaling down the residuals before adding them to the previous layer's activations seemed to stabilize the training. In general we picked scaling factors between 0.1 and 0.3 to scale the residuals before they are added to the accumulated layer activations:
[Figure: schema for scaling the residual before it is added to the shortcut activation]
Even where the scaling was not strictly necessary, it never seemed to harm the final accuracy, but it helped to stabilize the training.
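A hedged sketch of this scaling trick in PyTorch; the wrapper name and the 0.2 default are illustrative, the paper only states that factors between 0.1 and 0.3 were used:

```python
import torch
import torch.nn as nn

class ScaledResidual(nn.Module):
    """Adds a residual branch to the shortcut after scaling it down."""

    def __init__(self, branch: nn.Module, scale: float = 0.2):
        super().__init__()
        self.branch = branch      # e.g. an Inception-ResNet block's branch stack
        self.scale = scale        # the paper suggests a factor in [0.1, 0.3]

    def forward(self, x):
        # Scaling the residual keeps very wide (>1000-filter) variants stable
        return x + self.scale * self.branch(x)

# Usage: wrap any shape-preserving branch
block = ScaledResidual(nn.Conv2d(256, 256, 3, padding=1), scale=0.1)
out = block(torch.randn(1, 256, 35, 35))
```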

A similar instability was observed by He et al. for very deep residual networks; they suggested a two-phase training scheme: first a warm-up phase with a lower learning rate, followed by a second phase with a high learning rate.
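A toy sketch of such a two-phase schedule, with a stand-in model, a hypothetical switch point, and illustrative learning rates (none of these values come from the paper):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)          # stand-in for a very deep residual network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)  # phase 1: low warm-up rate
warmup_steps = 100                # hypothetical switch point

for step in range(1000):
    if step == warmup_steps:
        for group in optimizer.param_groups:   # phase 2: raise the learning rate
            group["lr"] = 0.01
    x = torch.randn(32, 10)
    y = torch.randint(0, 2, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```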

4 Results

[Figure: experimental results on ImageNet]

5 Conclusion

  1. Inception-ResNet-v1: a hybrid Inception version with a computational cost similar to Inception-v3.
  2. Inception-ResNet-v2: costlier, but with significantly improved recognition performance.
  3. Inception-v4: roughly the same recognition performance as Inception-ResNet-v2.
  4. Residual connections lead to dramatically improved training speed for the Inception architecture.
