ImageNet Evolution Paper Notes (4)

Deep Residual Learning for Image Recognition

Degradation problem: with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly. [This work removes the side effect of adding depth (the degradation problem), so that network performance can be improved simply by making the network deeper.]

Deep Residual Learning

Identity Mapping by Shortcuts
We explicitly let these layers approximate a residual function F(x) := H(x) − x. The original function thus becomes F(x) + x.
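To make the formulation concrete, here is a minimal PyTorch sketch of a residual block, assuming the basic two-layer 3×3 design with matching dimensions; the class name and layer layout are illustrative, not the paper's released code:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual formulation: output = F(x) + x, where the two stacked
    3x3 conv layers approximate the residual F(x) = H(x) - x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # F(x)
        return self.relu(residual + x)  # H(x) = F(x) + x
```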
In the residual network diagrams, shortcut connections with matching dimensions are drawn as solid lines, mismatched ones as dashed lines. When dimensions do not match, there are two options for the identity mapping: (A) increase the dimensionality (channels) directly by zero padding, adding no parameters; (B) multiply by a projection matrix W to map x into the new space, implemented as a 1x1 convolution whose number of filters sets the new channel count. This second option adds parameters; a sketch of it follows.
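A minimal sketch of option B, assuming the common PyTorch idiom of a 1×1 convolution followed by BN as the projection; the helper name is hypothetical:

```python
import torch.nn as nn

def projection_shortcut(in_channels, out_channels, stride=1):
    """Option B: project x with a 1x1 convolution (the projection matrix W);
    setting the number of 1x1 filters to out_channels changes the channel
    count, at the cost of extra parameters. A stride of 2 also handles
    spatial downsampling when the block reduces resolution."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1,
                  stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )
```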
Network Architectures
1. All convolution kernels are 3x3; convolutions with stride 2 replace pooling for downsampling; Batch Normalization is used [an intermediate normalization layer]. A sketch of this basic unit follows the list.
2. Max pooling, fully connected layers, and Dropout are removed.
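As referenced above, a sketch of the basic conv–BN–ReLU unit with stride-2 downsampling in place of pooling; assuming PyTorch, and the helper name is illustrative:

```python
import torch.nn as nn

def conv_bn_relu(in_channels, out_channels, stride=1):
    """Basic unit of the architecture: 3x3 convolution, then BN, then ReLU.
    Passing stride=2 downsamples the feature map instead of using pooling."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3,
                  stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_channels),  # BN right after the convolution
        nn.ReLU(inplace=True),         # activation applied after BN
    )
```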
Deeper networks: the residual mapping is restructured with a bottleneck design [original: 3x3x256x256 -> 3x3x256x256; bottleneck: 1x1x256x64 -> 3x3x64x64 -> 1x1x64x256], reducing computation while preserving the block's input/output dimensions.
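A minimal sketch of the bottleneck block under the 256/64 channel numbers quoted above (assuming PyTorch; names are illustrative):

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck residual block: 1x1 reduces 256 -> 64 channels, the 3x3
    conv operates at 64 channels, and 1x1 restores 64 -> 256, which is far
    cheaper than two 3x3x256x256 layers."""
    def __init__(self, channels=256, bottleneck=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # identity shortcut, dims match
```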

Training parameters

1. The image is resized with its shorter side randomly sampled in [256, 480] for scale augmentation. A 224×224 crop is randomly sampled from the image or its horizontal flip, with the per-pixel mean subtracted.
2. Standard color augmentation is used; batch normalization (BN) is adopted right after each convolution and before activation.
3. SGD with a mini-batch size of 256. The learning rate starts from 0.1 and is divided by 10 when the error plateaus; the models are trained for up to 60 × 10^4 iterations.
4. A weight decay of 0.0001 and a momentum of 0.9 are used. (A configuration sketch of these settings follows the list.)
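A sketch of these training settings assuming PyTorch/torchvision. The custom resize class is hypothetical, the model is a stand-in, and the per-pixel mean values are the usual ImageNet channel means rather than numbers from the paper:

```python
import random
import torch
import torchvision
from torchvision import transforms

class RandomShorterSideResize:
    """Scale augmentation: resize so the shorter side is random in [256, 480]."""
    def __init__(self, lo=256, hi=480):
        self.lo, self.hi = lo, hi

    def __call__(self, img):
        return transforms.functional.resize(img, random.randint(self.lo, self.hi))

train_transform = transforms.Compose([
    RandomShorterSideResize(),
    transforms.RandomCrop(224),         # random 224x224 crop
    transforms.RandomHorizontalFlip(),  # ... or its horizontal flip
    transforms.ToTensor(),
    # Per-pixel mean subtraction; these are the usual ImageNet channel means.
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[1.0, 1.0, 1.0]),
])

model = torchvision.models.resnet34()   # stand-in model for the sketch
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# Divide the learning rate by 10 whenever the (validation) error plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)
```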
Testing
1. Adopt the standard 10-crop testing (a sketch follows the list).
2. Average the scores at multiple scales (images are resized such that the shorter side is in {224, 256, 384, 480, 640}).
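A sketch of 10-crop testing at a single scale, using torchvision's TenCrop and assuming a trained `model`; the multi-scale averaging above would wrap this in a loop over the listed shorter-side sizes:

```python
import torch
from torchvision import transforms

# Standard 10-crop testing: four corner crops + center crop, plus their
# horizontal flips; the class scores of all 10 crops are averaged.
ten_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.ToTensor()(c) for c in crops])),
])

def predict_10crop(model, img):
    model.eval()
    crops = ten_crop(img)                     # (10, 3, 224, 224)
    with torch.no_grad():
        scores = model(crops).softmax(dim=1)  # (10, num_classes)
    return scores.mean(dim=0)                 # averaged prediction
```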
