【SENet】《Squeeze-and-Excitation Networks》

bryant_meng

Article link: https://blog.csdn.net/bryant_meng/article/details/88041204

CVPR-2018

Caffe code: https://github.com/hujie-frank/SENet
Caffe model visualization tool: http://ethereon.github.io/netscope/#/editor

1 Background and Motivation

CNNs extract informative features by fusing spatial and channel-wise information together within local receptive fields.

To strengthen the representational power of CNNs:

Several earlier methods enhance spatial encoding (spatial correlations, spatial attention), for example the multiple receptive fields concatenated in Inception
The authors instead focus on the channel relationship

The authors therefore design the Squeeze-and-Excitation block, which emphasises informative features and suppresses less useful ones (channel-wise).

2 Advantages

First place in the ILSVRC 2017 classification competition
Reduced the top-5 error to 2.251%

The development of new CNN architectures is a challenging engineering task, typically involving the selection of many new hyperparameters and layer configurations.

3 Related work

Deep architecture
Attention and gating mechanisms

4 Method

The role the SE block performs at different depths adapts to the needs of the network:

In the early layers, it learns to excite informative features in a class-agnostic manner
In later layers, the SE block becomes increasingly specialised (responding in a highly class-specific manner)

The SE block can stand on its own (stack SE blocks to build a network), or it can be mixed into existing designs as a plug-and-play module, i.e. as a drop-in replacement for the original block at any depth in the architecture (e.g. the bottleneck block of ResNet or ResNeXt).

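$F_{tr}$ is a convolution operator, $F_{tr}: X \rightarrow U$, with $X \in \mathbb{R}^{H' \times W' \times C'}$ and $U \in \mathbb{R}^{H \times W \times C}$. For each output channel it computes

$$u_c = v_c * X = \sum_{s=1}^{C'} v_c^s * x^s$$

The figure I drew helps in reading this formula.

$X$ is the input feature map, $v_c$ (lowercase $c$, in red in the figure; the black one is uppercase) is one particular filter, and $u_c$ (again lowercase $c$) is the output channel that this filter produces!

where $V = [v_1,v_2,...,v_C]$ (uppercase $C$)
and $U = [u_1,u_2,...,u_C]$ (uppercase $C$)

The superscript $s$ denotes a single 2D spatial kernel: $v_c^s$ is the slice of $v_c$ that acts on input channel $x^s$. This is the part of my figure where $X$ and $v_c$ (lowercase $c$) are split apart channel by channel, and it makes the formula much clearer. As for how this familiar formula leads to thinking about per-channel feature importance, and from there to the SE block, I cannot fully appreciate that yet!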
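As a rough illustration of the formula for $u_c$, the sketch below computes one output channel with plain Python lists. The function names `conv2d_valid` and `output_channel` are my own, stride 1 and no padding are assumed, and, as in most deep-learning libraries, the `*` operation is implemented as cross-correlation. The final sum over input channels is the place where channel relationships get mixed into $u_c$ implicitly.

```python
def conv2d_valid(x, k):
    """2D valid cross-correlation of one channel x (H x W) with kernel k (kh x kw)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [sum(k[a][b] * x[i + a][j + b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def output_channel(x, v_c):
    """u_c = sum_s v_c^s * x^s: one output channel built from all input channels.

    x   : list of C' input channels, each H' x W'
    v_c : list of C' spatial kernels, one per input channel
    """
    per_channel = [conv2d_valid(x_s, v_s) for x_s, v_s in zip(x, v_c)]
    # Sum the C' intermediate maps element by element.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*per_channel)]


# Toy input: C' = 2 channels of size 2 x 2, one filter with two 2 x 2 kernels.
x = [[[1, 2], [3, 4]],
     [[1, 1], [1, 1]]]
v_c = [[[1, 0], [0, 1]],
       [[2, 2], [2, 2]]]
u_c = output_channel(x, v_c)
```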

4.1 Squeeze: Global Information Embedding

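$$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$$

$u_c$ is channel $c$ of the feature map $U$. The formula above applies global average pooling to that channel, so $z_c$ is a scalar: one cell of the channel descriptor $z$ shown in the figure.

Of course, global average pooling is only one way of aggregating global information; more sophisticated aggregation strategies could be employed here as well.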
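The squeeze step is small enough to sketch directly. This is a minimal, dependency-free version under my own assumptions: the feature map is a nested list with layout channels x height x width, and `squeeze` is a hypothetical name. A real implementation would use the global pooling operator of a tensor library.

```python
def squeeze(u):
    """Global average pooling: map C channels of size H x W to C scalars.

    u is a list of C channels, each a list of H rows with W values.
    Returns the channel descriptor z as a list of length C.
    """
    z = []
    for channel in u:
        height, width = len(channel), len(channel[0])
        z.append(sum(sum(row) for row in channel) / (height * width))
    return z


# Two channels of size 2 x 2.
u = [[[1, 2], [3, 4]],
     [[0, 0], [0, 8]]]
z = squeeze(u)  # one scalar per channel
```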

4.2 Excitation: Adaptive Recalibration

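Two design criteria:

flexible (able to learn non-linear interactions between channels)
non-mutually-exclusive (several channels may be emphasised at once, which rules out a one-hot activation)

The authors implement this by employing a simple gating mechanism with a sigmoid activation. More concretely, it consists of two fully connected (FC) layers: the first reduces the dimension from $C$ to $C/r$ and uses ReLU as its activation function, and the second restores the original dimension and uses a sigmoid (borrowing the gating idea from LSTMs).

The formula is:

$$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2 \delta(W_1 z))$$

$\delta$ is ReLU and $\sigma$ is the sigmoid, with $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$.

The gates are then applied to $U$:

$$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$$

In other words, each channel of $U$ is multiplied by the corresponding entry of $s$, which amounts to weighting the feature maps channel by channel! This is the $F_{scale}$ part of the figure.

The final output is

$$\tilde{X} = [\tilde{x}_1, \tilde{x}_2, ..., \tilde{x}_C]$$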

To summarise:

$F_{sq}$: global average pooling
$F_{ex}$: two fully connected layers
$F_{scale}$: multiply the feature map ($U$) by the channel weights (the output of $F_{ex}$)
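The excitation and scale steps can be sketched in a few lines, starting from a channel descriptor $z$ that has already been pooled. This is an illustration under my own assumptions, not the authors' Caffe code: `excite` and `scale` are hypothetical names, the weights are toy values, and biases are left out.

```python
import math


def excite(z, w1, w2):
    """Gate values s = sigmoid(W2 @ relu(W1 @ z)), one per channel.

    z  : channel descriptor, length C
    w1 : (C/r) x C weight matrix as nested lists (dimension reduction)
    w2 : C x (C/r) weight matrix as nested lists (dimension restoration)
    Biases are omitted in this sketch.
    """
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return [1.0 / (1.0 + math.exp(-t)) for t in logits]


def scale(u, s):
    """Multiply every value of channel c by its gate s[c]."""
    return [[[gate * v for v in row] for row in channel]
            for channel, gate in zip(u, s)]


# Hypothetical toy weights for C = 2 channels and r = 2 (so C/r = 1).
z = [2.5, 2.0]
w1 = [[1.0, -1.0]]
w2 = [[0.0], [2.0]]
s = excite(z, w1, w2)

u = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 0.0], [0.0, 8.0]]]
x_tilde = scale(u, s)
```

Because the last activation is a sigmoid rather than a softmax, every gate lies in (0, 1) independently of the others, which is how the non-mutually-exclusive criterion is met.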

4.3 Exemplars: SE-Inception and SE-ResNet

SE-Inception:
$F_{tr}$ is taken to be an entire Inception module. For the theory and practice of Inception, see sections "1.1 Classification / Object Detection" and "4.1 [Keras] Classification in CIFAR-10 (series)" of https://blog.csdn.net/bryant_meng/article/details/78597190

[Figure: the original Inception module (left) and the SE-Inception module (right)]

SE-ResNet:

$F_{tr}$ is taken to be the non-identity branch of a residual module.
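A minimal sketch of where the recalibration sits in SE-ResNet, assuming the gates $s$ have already been computed from the residual branch output $u$: the scaling is applied to the non-identity branch only, before it is added to the identity branch. `se_residual_merge` is a hypothetical helper name, and the ReLU that normally follows the addition in ResNet is left out for brevity.

```python
def se_residual_merge(x, u, s):
    """Return x + s_c * u_c for every channel c.

    x : identity branch, C channels of size H x W (nested lists)
    u : output of the non-identity (residual) branch, same shape as x
    s : per-channel gates, length C
    """
    return [
        [[xv + gate * uv for xv, uv in zip(x_row, u_row)]
         for x_row, u_row in zip(x_ch, u_ch)]
        for x_ch, u_ch, gate in zip(x, u, s)
    ]


# One channel of size 1 x 2, gate 0.5: the residual is halved before the add.
out = se_residual_merge([[[1.0, 1.0]]], [[[2.0, 4.0]]], [0.5])
```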

4.4 Model and Computational Complexity

trade-off between model complexity and performance

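| Setting | ResNet-50 | SE-ResNet-50 |
| --- | --- | --- |
| GPU: training on one mini-batch of 256 images, 8 Titan X | 190 ms | 209 ms |
| CPU: inference | 164 ms | 167 ms |

The authors attribute part of the GPU overhead to global pooling and small inner-product operations being less optimised in existing GPU libraries.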

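The extra parameters all come from the two FC layers of the gating mechanism:

$$\frac{2}{r} \sum_{s=1}^{S} N_s \cdot C_s^2$$

$s$ indexes the stage ($S$ stages in total)
$r$ is the reduction ratio
$N_s$ is the number of repeated blocks in stage $s$
$C_s$ is the dimension of the output channels of stage $s$, i.e. the number of channels

The count follows directly from the block structure: one block in a stage adds $\frac{C}{r} \cdot C + \frac{C}{r} \cdot C = \frac{2C^2}{r}$ parameters (ignoring biases). As we all know, the later the stage, the larger $C$, so the later stages add the most. The authors' experiments show that removing the SE blocks from the last stage costs little accuracy while the overhead grows noticeably less!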
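To make the count concrete, here is a small worked example. The stage layout is my assumption of the standard ResNet-50 configuration (3, 4, 6, 3 bottleneck blocks with 256, 512, 1024, 2048 output channels), `se_extra_params` is a hypothetical helper, and biases are ignored.

```python
def se_extra_params(blocks_per_stage, channels_per_stage, r=16):
    """Extra FC weights added by SE blocks: (2 / r) * sum_s N_s * C_s**2.

    Biases are ignored. Integer division assumes every C_s is a multiple of r.
    """
    total = sum(n * c * c for n, c in zip(blocks_per_stage, channels_per_stage))
    return 2 * total // r


# Assumed ResNet-50 layout: 4 stages of bottleneck blocks.
blocks = [3, 4, 6, 3]
channels = [256, 512, 1024, 2048]

all_stages = se_extra_params(blocks, channels)              # 2,514,944 weights
without_last = se_extra_params(blocks[:-1], channels[:-1])  # 942,080 weights
share_of_last_stage = 1 - without_last / all_stages         # about 0.63
```

Under these assumptions the last stage alone accounts for well over half of the added weights, which fits the observation that it is the most economical place to drop the SE blocks.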

5 Experiments

Datasets

ImageNet 2012
COCO
Places365-Challenge

$r$: the reduction ratio is set to 16

5.1 ImageNet Classification

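Look at the numbers in parentheses in the SENet columns: adding SE blocks improves every backbone! The training and test loss curves tell the same story.

Next, the results on lightweight networks.

The numbers in parentheses in the SENet columns of Table 2 and Table 3 say it all: SE blocks are effective and general, and can be used in combination with a wide range of architectures (residual or non-residual).

Finally, the head-to-head comparison with the state of the art.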

5.2 Scene Classification

The scene classification results provide evidence that SE blocks can perform well on different datasets.

5.3 Object Detection on COCO

The detector is built on Faster R-CNN, and the gains are very strong.

5.4 Analysis and Interpretation

1）Reduction ratio

$r$: the authors set the reduction ratio to 16 as a trade-off between model complexity and performance.

2）The role of Excitation

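As I read it, this experiment looks at the activations produced by the excitation step (the sigmoid gate outputs) in different SE blocks: 5 classes, 50 samples per class, with average activations for fifty uniformly sampled channels.

The authors report three findings:

lower layer features are typically more general (e.g. (a), where the curves of different classes nearly coincide, which indicates shared features)
higher layer features have greater specificity (e.g. (c) and (d), where different classes activate different channels to different degrees)
In (e), the activations are saturated at 1, so the block behaves almost like an identity mapping. Adding an SE block there matters little, and omitting it saves a good deal of overhead; see the analysis in Section 4.4 of this post!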

Summary

SE uses a gating mechanism and is broadly applicable. The feature analysis in Figure 5 is especially important, as is the reduction ratio (in the two FC layers) as a trade-off between complexity and performance!
