[ResNeXt] Aggregated Residual Transformations for Deep Neural Networks

bryant_meng


Article link: https://blog.csdn.net/bryant_meng/article/details/86605287

CVPR-2017

Torch code: https://github.com/facebookresearch/ResNeXt
Caffe code: https://github.com/soeaver/caffe-model/tree/master/cls/resnext
Caffe model visualization tool: http://ethereon.github.io/netscope/#/editor

Contents

1 Background and Motivation
2 Innovations / Contributions
3 Advantages
4 Related work
5 Method
- 5.1 split-transform-aggregate (modules)
- 5.2 Architecture
6 Experiments
7 Conclusion

1 Background and Motivation

Research on visual recognition is undergoing a transition from “feature engineering” to “network engineering”.

The key lesson from the Inception family is the split-transform-merge strategy:

split: 1x1 convolutions
transform: 3x3 and 5x5 convolutions (the parallel paths)
merge: concatenation

The authors note that the Inception family's carefully designed topologies are able to achieve compelling accuracy with low theoretical complexity (haha).

However, Inception has too many hyperparameters to design:

the filter numbers and sizes are tailored for each individual transformation (i.e., the design of the branches inside an Inception module)
the modules are customized stage-by-stage

As a result, it is in general unclear how to adapt the Inception architectures to new datasets / tasks.

Building on Inception, the authors adopt VGG / ResNet's strategy of repeating layers and use the idea of grouped convolution (the split-transform-merge strategy) to propose ResNeXt. The resulting design is much more regular!

2 Innovations / Contributions

1) Proposes the ResNeXt architecture
2) Brings group convolution to prominence (split-transform-aggregate)

3 Advantages

ILSVRC 2016 classification task (2nd place)
better results than its ResNet counterpart (ImageNet-5K set, COCO detection set)

The ImageNet-5K set has 5000 classes. Keep in mind that ResNet was a blockbuster: when it came out, it swept nearly every vision competition leaderboard. ResNeXt does even better!

increasing cardinality is a more effective way of gaining accuracy than going deeper or wider

4 Related work

Multi-branch convolutional networks
Inception family, ResNet (two-branch), Deep Neural Decision Forests
Grouped convolutions
First appeared in AlexNet. To the best of our knowledge, there has been little evidence on exploiting grouped convolutions to improve accuracy. (In other words, hardly anyone had used grouped convolution to improve classification accuracy.)
Compressing convolutional networks
These methods [6, 18, 21, 16] have shown elegant compromise of accuracy with lower complexity and smaller model sizes.
Ensembling
Viewing ResNeXt as an ensemble is imprecise, because all the paths are trained jointly rather than independently.

5 Method

Note that in this post, width refers to the number of channels (within a group), and depth refers to the number of layers.

5.1 split-transform-aggregate (modules)

1) Origin of the idea

Start from a conventional fully connected (inner-product) neuron.
$X = [x_1, x_2, ..., x_D]$ is a D-channel input vector

Step 1, split: $X$ is sliced into low-dimensional embeddings $x_i$
Step 2, transform: $w_i x_i$
Step 3, aggregate: $\sum_{i=1}^{D}$

Putting it together: $\sum_{i=1}^{D}w_{i}x_{i}$
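The three steps can be spelled out in a few lines of plain Python. This is only an illustrative sketch: the function name `inner_product` and the sample numbers are made up for the example.

```python
def inner_product(x, w):
    """Compute sum_i w_i * x_i as explicit split / transform / aggregate steps."""
    assert len(x) == len(w)
    # split: slice the D-channel vector into D one-dimensional embeddings x_i
    pieces = list(x)
    # transform: scale each embedding by its own weight w_i
    transformed = [w_i * x_i for w_i, x_i in zip(w, pieces)]
    # aggregate: sum the transformed embeddings
    return sum(transformed)


x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]
print(inner_product(x, w))  # 0.5 - 2.0 + 6.0 = 4.5
```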

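2) Transplanting the idea
The same pattern carries over to 2D convolution (split-transform-aggregate).
The authors generalize the fully connected formulation and define the aggregated transformation as:

$\mathcal{F}(x) = \sum_{i=1}^{C} \tau_i(x)$

$C$ denotes the cardinality, i.e., the number of groups. $\tau_i$ should project $x$ into an (optionally low-dimensional) embedding and then transform it. (For an intuition about low-dimensional embeddings, see the post "深度学习中 Embedding层两大作用的个人理解", i.e., a personal take on the two main roles of the Embedding layer in deep learning.)

With a residual connection added, this becomes:

$y = x + \sum_{i=1}^{C} \tau_i(x)$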
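A minimal sketch of an aggregated transformation with an identity shortcut, assuming each $\tau_i$ is a tiny two-layer bottleneck on a vector (project to d dimensions, ReLU, expand back to D). The helper names, toy sizes, and random weights are all hypothetical; real ResNeXt blocks use convolutions on feature maps.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, a) for a in v]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def make_branch(D, d, rng):
    """One hypothetical tau_i: weights for D -> d and d -> D."""
    return rand_matrix(d, D, rng), rand_matrix(D, d, rng)


def tau(branch, x):
    reduce_w, expand_w = branch
    return matvec(expand_w, relu(matvec(reduce_w, x)))


def aggregated_residual(x, branches):
    """y = x + sum over all branches of tau_i(x)."""
    y = list(x)  # identity shortcut
    for branch in branches:
        y = [a + b for a, b in zip(y, tau(branch, x))]
    return y


rng = random.Random(0)
D, d, C = 8, 2, 4  # toy sizes: input dim, branch width, cardinality
branches = [make_branch(D, d, rng) for _ in range(C)]
x = [rng.uniform(-1, 1) for _ in range(D)]
y = aggregated_residual(x, branches)
print(len(y))  # 8: the output keeps the input dimensionality
```

With an empty list of branches the function returns the input unchanged, which is the identity shortcut on its own.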

Parameters (roughly equal)

left (ResNet bottleneck block): $\approx 70k$
right (ResNeXt block, cardinality = 32, width = 4): $\approx 70k$
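The two "about 70k" figures can be reproduced with a quick count. This sketch assumes the usual 256-d input/output, a 64-channel ResNet bottleneck, and ignores biases and BatchNorm parameters; the function names are made up for illustration.

```python
def resnet_bottleneck_params(c_io=256, mid=64):
    # 1x1 reduce, 3x3, 1x1 expand (biases and BatchNorm ignored)
    return c_io * mid + 3 * 3 * mid * mid + mid * c_io


def resnext_block_params(c_io=256, cardinality=32, width=4):
    # each of the C paths: 1x1 reduce to d, 3x3 on d channels, 1x1 expand
    per_path = c_io * width + 3 * 3 * width * width + width * c_io
    return cardinality * per_path


print(resnet_bottleneck_params())  # 69632
print(resnext_block_params())      # 70144
```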

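3) Three equivalent forms

We have trained all three forms and obtained the same results. (Form (c) is chosen because it is more succinct and faster.)
[Figure 3 of the paper: equivalent building blocks (a), (b), and (c)]
Parameters
(a) $256 \times 4 \times 32 + 4 \times 3 \times 3 \times 4 \times 32 + 4 \times 256 \times 32 = 70144$
(b) $256 \times 4 \times 32 + 4 \times 3 \times 3 \times 4 \times 32 + 128 \times 256 = 70144$
(c) $256 \times 128 + \frac{128}{32} \times 3 \times 3 \times \frac{128}{32} \times 32 + 128 \times 256 = 70144$

All three forms have three layers, and every layer operates at the same resolution, so equal parameter counts also mean equal computation. Form (c) looks much simpler than (a) and (b), and the authors use it for all subsequent designs.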
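The three counts can be checked with one small helper for (grouped) convolution weights. This is a sketch: `conv_params` is a hypothetical helper, and biases are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weights of a k x k convolution with the given number of groups (no bias)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * (c_out // groups) * k * k * groups


C, d, D = 32, 4, 256  # cardinality, width per group, block input/output channels

# (a) C independent paths, each 1x1 -> 3x3 -> 1x1, outputs summed
form_a = C * (conv_params(D, d, 1) + conv_params(d, d, 3) + conv_params(d, D, 1))
# (b) C paths of 1x1 -> 3x3, concatenated, then a single 1x1
form_b = C * (conv_params(D, d, 1) + conv_params(d, d, 3)) + conv_params(C * d, D, 1)
# (c) single 1x1, grouped 3x3 with C groups, single 1x1
form_c = (conv_params(D, C * d, 1)
          + conv_params(C * d, C * d, 3, groups=C)
          + conv_params(C * d, D, 1))

print(form_a, form_b, form_c)  # 70144 70144 70144
```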

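4) Where the reformulation helps (and where it does not)
The structure in Figure 3 (c) requires depth $\geq 3$. Why?
[Figure from the paper: a depth-2 aggregated block (left) and an ordinary two-layer block (right)]
Parameters

left: $64 \times 3 \times 3 \times 4 \times 32 + 4 \times 3 \times 3 \times 64 \times 32 = 147456$
right: $64 \times 3 \times 3 \times 128 + 128 \times 3 \times 3 \times 64 = 147456$

As the counts show, with depth = 2 the block is equivalent to two ordinary convolution layers, so the reformulation makes no sense!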
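The depth-2 equivalence can also be checked numerically. The sketch below uses plain linear layers on a vector as a stand-in for the 3x3 convolutions (an assumption made to keep the example short), with toy sizes and random weights: summing C two-layer branches gives the same output as one dense two-layer block that is C times wider.

```python
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, a) for a in v]


rng = random.Random(0)
D, d, C = 6, 2, 3  # toy sizes; the counts in the text use 64, 4, 32

# one (reduce, expand) weight pair per branch
reduce_ws = [[[rng.uniform(-1, 1) for _ in range(D)] for _ in range(d)] for _ in range(C)]
expand_ws = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(D)] for _ in range(C)]
x = [rng.uniform(-1, 1) for _ in range(D)]

# left: C separate depth-2 branches, outputs summed
branch_sum = [0.0] * D
for r, e in zip(reduce_ws, expand_ws):
    out = matvec(e, relu(matvec(r, x)))
    branch_sum = [a + b for a, b in zip(branch_sum, out)]

# right: stack the reduce matrices row-wise -> (C*d) x D,
# and concatenate the expand matrices column-wise -> D x (C*d)
wide_reduce = [row for r in reduce_ws for row in r]
wide_expand = [[w for e in expand_ws for w in e[j]] for j in range(D)]
dense = matvec(wide_expand, relu(matvec(wide_reduce, x)))

# the two outputs agree up to floating-point rounding
print(max(abs(a - b) for a, b in zip(branch_sum, dense)))
```

With three or more layers the middle layer of each branch only sees its own d channels, which a dense layer of the same width does not replicate, so the grouped form is no longer a trivial rewrite.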

5.2 Architecture

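Two design rules:

1) Blocks that produce feature maps of the same resolution share the same hyperparameters.
2) Each time the resolution is halved, the number of channels is doubled.

The second rule is familiar from the ResNet paper, and the explanation here is the same: The second rule ensures that the computational complexity, in terms of FLOPs (floating-point operations, in # of multiply-adds), is roughly the same for all blocks.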
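1) Understanding cardinality and width

$C$ is the cardinality, i.e., the number of groups.
$d = 4$ means $\text{width} = 4$, i.e., each group has 4 channels.

$C \times d = \text{filters}$ (see Table 2). Note that filters here refers only to the filters of the first bottleneck block (in the example above, $32 \times 4 = 128$), because the filters of the other bottleneck blocks can be derived from 128 together with the two design rules.

Be careful to keep this straight, or you will get confused: if $C = 32$ stays fixed and $d$ also stayed at 4, then 256, 512, and 1024 could not be explained. This puzzled me for a long time!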
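A short sketch of how the later widths follow from the first one, assuming C stays at 32 and the per-group width d doubles each time the resolution is halved (design rule 2). The function name and the choice of four stages are illustrative.

```python
def stage_widths(cardinality=32, base_width=4, num_stages=4):
    """Grouped-conv channels per stage when d doubles at each downsampling."""
    rows = []
    for stage in range(num_stages):
        d = base_width * 2 ** stage
        rows.append((d, cardinality * d))
    return rows


for d, filters in stage_widths():
    print(f"d = {d:2d}, C * d = {filters}")
# d =  4, C * d = 128
# d =  8, C * d = 256
# d = 16, C * d = 512
# d = 32, C * d = 1024
```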


2) Details of the residual connection (shortcut)

ResNeXt uses the ResNet-B shortcut.

	same resolution	downsampling
ResNet-A	identity	zero padding
ResNet-B	identity	conv (stride = 2)
ResNet-C	conv (stride = 1)	conv (stride = 2)

A convolution used this way can also be called a linear projection (mapping).

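3) Which convolution in the bottleneck block uses stride = 2?
A bottleneck block is like a sandwich cookie: a $1 \times 1$ convolution on each side and a group convolution in the middle. When the resolution drops, which layer uses stride = 2? Looking at the code at https://github.com/soeaver/caffe-model/tree/master/cls/resnext , stride = 2 is applied in the group convolution layer, and within each stack of repeated bottleneck blocks, the first bottleneck block is responsible for downsampling!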
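The stride placement described above can be written down as a small layer-spec sketch. The functions `bottleneck_spec` and `stage_spec`, the dictionary layout, and the example channel counts are all hypothetical; this mirrors the description, not the actual Caffe prototxt.

```python
def bottleneck_spec(c_in, group_width, c_out, cardinality=32, downsample=False):
    """Layer list for one hypothetical ResNeXt bottleneck block (form (c))."""
    stride = 2 if downsample else 1
    return [
        {"kernel": 1, "in": c_in, "out": group_width, "groups": 1, "stride": 1},
        # the stride-2 step sits in the grouped 3x3 convolution
        {"kernel": 3, "in": group_width, "out": group_width,
         "groups": cardinality, "stride": stride},
        {"kernel": 1, "in": group_width, "out": c_out, "groups": 1, "stride": 1},
    ]


def stage_spec(c_in, group_width, c_out, num_blocks, downsample):
    """Only the first block of a stage downsamples; the rest keep the resolution."""
    blocks = []
    for i in range(num_blocks):
        blocks.append(bottleneck_spec(
            c_in if i == 0 else c_out, group_width, c_out,
            downsample=(downsample and i == 0)))
    return blocks


# illustrative stage: 256 channels in, grouped width 256, 512 channels out, 4 blocks
stage = stage_spec(c_in=256, group_width=256, c_out=512, num_blocks=4, downsample=True)
print([block[1]["stride"] for block in stage])  # [2, 1, 1, 1]
```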

6 Experiments

left (ResNet bottleneck block): $\approx 70k$ parameters
right (ResNeXt block, C = 32, d = 4): $\approx 70k$ parameters

The experiments are organized along two axes: C (cardinality) and d (width).

6.1 Experiment on ImageNet 1K

6.1.1 Cardinality vs. Width

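Comparison against ResNet only (ImageNet 1K)
C and d are chosen so that complexity is preserved. Under that constraint, a larger C is more effective (though there is no need to keep adding groups, since accuracy saturates).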
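To make "preserved complexity" concrete, the sketch below searches, for each cardinality, for the width d whose block parameter count is closest to that of the 1x64d block. The selection rule (closest parameter count), the 256-d input/output, and the function names are assumptions for illustration, not necessarily the authors' exact procedure.

```python
def block_params(cardinality, width, c_io=256):
    # C paths of 1x1 reduce, 3x3, 1x1 expand (biases and BatchNorm ignored)
    return cardinality * (c_io * width + 3 * 3 * width * width + width * c_io)


def matched_width(cardinality, target):
    # pick the width whose parameter count is closest to the target
    return min(range(1, 257), key=lambda d: abs(block_params(cardinality, d) - target))


target = block_params(1, 64)  # 1x64d block: 69632 parameters
for C in (1, 2, 4, 8, 32):
    d = matched_width(C, target)
    print(C, d, C * d, block_params(C, d))
# 1 64 64 69632
# 2 40 80 69760
# 4 24 96 69888
# 8 14 112 71456
# 32 4 128 70144
```

Each row prints the cardinality, the matched width d, the grouped-convolution width C * d, and the resulting parameter count.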

6.1.2 Increasing Cardinality vs. Deeper / Wider

Cardinality: more groups

Deeper: more layers in the network

Wider: more channels per group

Conclusion: increasing cardinality C shows much better results than going deeper or wider.
Note: ResNeXt-101 even outperforms ResNet-200 (at half the complexity), which indirectly shows that cardinality is a more effective dimension than depth and width.

6.1.3 Residual connection

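[Results table: residual connections]

6.1.4 Comparisons with state-of-the-art results

[Results table: comparison with state-of-the-art models]

6.2 Experiments on ImageNet5K

[Results: ImageNet-5K]

6.3 Experiments on CIFAR-10

[Results: CIFAR-10]

6.4 Experiments on COCO object detection

[Results table: COCO object detection]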

7 Conclusion

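ResNeXt is stronger than ResNet, and its design is more regular and easier than the Inception family's. To the authors' knowledge, it is the first work to use group convolution to improve accuracy! Two things to get straight: the relationship between cardinality and width, and what the authors mean by a low-dimensional embedding.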
