【MobileNet V2】《MobileNetV2：Inverted Residuals and Linear Bottlenecks》

bryant_meng

Article link: https://blog.csdn.net/bryant_meng/article/details/89334743

CVPR-2018

Caffe implementation: https://github.com/shicai/MobileNet-Caffe/blob/master/mobilenet_v2_deploy.prototxt
Caffe model visualization tool: http://ethereon.github.io/netscope/#/editor

1 Background and Motivation

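Neural networks surpass humans on image recognition tasks, but the computational resources they need are beyond what mobile and embedded devices can offer. The authors design a new network architecture that greatly reduces the required computational resources while retaining accuracy.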

2 Advantages / Contributions

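Classification: compared with the state of the art, the slower variant is more accurate and the faster variant is slightly less accurate (Table 4)
Object Detection: better than YOLOv2 (Table 6), with 20× less computation and 10× fewer parameters than YOLOv2
Segmentation: accuracy is average, but the computational cost drops considerably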

3 Innovations

Inverted residual with linear bottleneck (separates the network expressiveness from its capacity; the paper leaves this for future work)
Linear Bottlenecks

4 Related work

Tuning deep neural architectures (AlexNet, VGGNet, GoogLeNet, ResNet)
algorithmic architecture exploration
- hyperparameter optimization
- pruning
- changing the connectivity structure of the internal convolutional blocks, e.g. ShuffleNet
genetic algorithms and reinforcement learning for architecture search
(However, one drawback is that the resulting networks end up very complex.)

Ha, so this is how to "diss" NASNet.

4 Method

4.1 Depthwise Separable Convolutions

For a detailed analysis, refer to the following article:

Depthwise separable convolution = depthwise convolution + pointwise convolution
(Figure omitted; image source: https://zhuanlan.zhihu.com/p/28749411)

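Relative to a standard convolution, the parameter count (or computational cost) of a depthwise separable convolution is
$\frac{1}{d_j} + \frac{1}{k^2}$

$d_j$ is the output dimension (number of output channels)
$k$ is the kernel size of the depthwise convolution

$d_j$ is usually much larger than $k^2$, so the ratio is dominated by the $\frac{1}{k^2}$ term.

MobileNetV2 uses $k = 3$, so the computational cost is 8 to 9 times smaller than that of standard convolutions.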
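
As a quick numeric check of the ratio above, here is a minimal sketch that counts multiply-adds for a standard convolution and for a depthwise separable one. The function names and the example layer shape (56 × 56 feature map, 64 input channels, 128 output channels) are my own illustrative assumptions, not values from the paper.

```python
def standard_conv_madds(h, w, d_in, d_out, k):
    """Multiply-adds of a standard k x k convolution on an h x w feature map."""
    return h * w * d_in * d_out * k * k


def separable_conv_madds(h, w, d_in, d_out, k):
    """Multiply-adds of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * d_in * k * k
    pointwise = h * w * d_in * d_out
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
h, w, d_in, d_out, k = 56, 56, 64, 128, 3
ratio = separable_conv_madds(h, w, d_in, d_out, k) / standard_conv_madds(h, w, d_in, d_out, k)
print(f"cost ratio: {ratio:.4f}, reduction: {1 / ratio:.2f}x")
```

For this shape the ratio equals $\frac{1}{128} + \frac{1}{9}$, which lands inside the 8 to 9 times range quoted above.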

4.2 Linear Bottlenecks

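I did not fully follow the authors' reasoning in this subsection, but the two conclusions at the end are understandable.

Background reading: "Could someone briefly introduce the basic idea of manifold learning?"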

It has been long assumed that manifolds of interest in neural networks could be embedded in low-dimensional subspaces.

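1) A good way to realize the sentence above is the width multiplier $\alpha$ from 【MobileNet】《MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications》, i.e. reducing the number of feature map channels. The authors call this the width multiplier approach here. Ha, what a formal name.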

Width multiplier approach allows one to reduce the dimensionality of the activation space until the manifold of interest spans this entire space.

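In other words, the number of channels can be reduced (low-dimensional manifold) without losing accuracy.

2) On closer thought, though, the non-linear activation means the low-dimensional representation is being mapped back toward a high-dimensional one. With ReLU, for example, a 1D line embedded in $\mathbb{R}^n$ generally becomes a piecewise linear curve with $n$ joints.

The mapping relation is: partial input space → all output dimensions

3) "In other words, deep networks only have the power of a linear classifier on the non-zero volume part of the output domain." (for ReLU) Once this sentence is understood, the picture becomes intuitive: ReLU acts linearly only on the part where the output is non-zero.

After the analysis, the authors summarize two points: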

If the manifold of interest remains non-zero volume after ReLU transformation, it corresponds to a linear transformation.
(if the output values are non-zero, ReLU is just a linear transformation)
ReLU is capable of preserving complete information about the input manifold, but only if the input manifold lies in a low-dimensional subspace of the input space.

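In practice, this means the final pointwise convolution uses a linear activation. In a low-dimensional space, a linear mapping preserves the features, whereas a non-linear mapping destroys them. (See "MobileNetV2 reading notes".)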
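
The two conclusions can be illustrated with a toy example. This is only a sketch under my own assumptions: the two input points and the hand-built expansion and projection matrices are made up for illustration and are not taken from the paper. Applying ReLU directly in a 2-channel space collapses two distinct points, while expanding to 4 channels first, applying ReLU there, and projecting back linearly recovers both points.

```python
def relu(vec):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in vec]


def matvec(rows, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(r * v for r, v in zip(row, vec)) for row in rows]


# Two distinct points in a 2-channel (low-dimensional) space.
a = [-1.0, 2.0]
b = [-3.0, 2.0]

# Case 1: ReLU applied directly in the low-dimensional space loses the difference.
collapsed = relu(a) == relu(b)

# Case 2: expand 2 -> 4 channels as (x, -x), apply ReLU, then project back linearly.
expand = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
project = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]


def expand_relu_project(vec):
    """Expansion, ReLU in the wide space, then a linear projection (no activation)."""
    return matvec(project, relu(matvec(expand, vec)))


print("ReLU in 2-D collapses a and b:", collapsed)
print("recovered a:", expand_relu_project(a))
print("recovered b:", expand_relu_project(b))
```

The wide intermediate layer is where the non-linearity is safe to apply; the narrow output stays linear, which mirrors the expansion followed by a linear bottleneck.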

4.3 Inverted residuals

The authors place shortcuts directly between the bottlenecks (rather than between the expansion layers), based on the following intuition: the bottlenecks actually contain all the necessary information, while an expansion layer acts merely as an implementation detail that accompanies a non-linear transformation of the tensor.

ReLU6 is defined as $f(x) = \min(6, \max(0, x))$, which caps activations at 6. It is chosen for its robustness when used with low-precision computation.
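
A minimal sketch of the ReLU6 formula in plain Python (the function name `relu6` is my own choice):

```python
def relu6(x):
    """ReLU6: clamp x to the range [0, 6]."""
    return min(6.0, max(0.0, x))


print([relu6(v) for v in (-2.0, 3.5, 10.0)])  # [0.0, 3.5, 6.0]
```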

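Form (a) corresponds to the part I boxed in red (figure omitted).
The specific form of (b) is as follows:

Parameter count

With $k$ input channels and $k'$ output channels, the number of parameters is $k \cdot t \cdot k + 3 \cdot 3 \cdot t \cdot k + t \cdot k \cdot k' = t \cdot k (k + 3 \cdot 3 + k')$

$t$: expansion factor

In the paper's notation (this is the parameter count; multiply by $hw$ to get the computational cost):

$d' \cdot t \cdot d' + k \cdot k \cdot t \cdot d' + t \cdot d' \cdot d'' = t \cdot d' (d' + k \cdot k + d'')$

$d'$: input channels
$d''$: output channels
$k$: kernel size of the depthwise convolution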
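
The formula can be checked term by term with a short sketch. The function names and the example sizes are my own assumptions. Biases and batch-norm parameters are ignored, and the multiply-add count assumes stride 1, so every layer runs at the same $h \times w$ resolution.

```python
def inverted_residual_params(d_in, d_out, t, k=3):
    """Weights of one block: 1x1 expansion, k x k depthwise, 1x1 linear projection."""
    expand = d_in * t * d_in        # d' -> t * d'
    depthwise = k * k * t * d_in    # one k x k filter per expanded channel
    project = t * d_in * d_out      # t * d' -> d''
    return expand + depthwise + project


def inverted_residual_madds(h, w, d_in, d_out, t, k=3):
    """Multiply-adds at stride 1: every weight is used once per spatial position."""
    return h * w * inverted_residual_params(d_in, d_out, t, k)


# Hypothetical block: 32 -> 32 channels, expansion factor 6, on a 14 x 14 feature map.
print(inverted_residual_params(32, 32, 6))       # 14016
print(inverted_residual_madds(14, 14, 32, 32, 6))
```

The first value matches the closed form $t \cdot d'(d' + k \cdot k + d'') = 6 \cdot 32 \cdot (32 + 9 + 32)$.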

4.4 Information flow interpretation

t: expansion factor

0: identity
<1: traditional bottleneck
>1: MobileNetV2 (works better than <1)

I cannot fully grasp the authors' claim that their bottleneck design separates capacity from expressiveness, whereas a traditional bottleneck entangles the two.

capacity: natural separation between the input/output domains of the building blocks (bottleneck layers; encoded by the bottleneck inputs)
expressiveness: the layer transformation (the non-linear power; encoded by the expansion layers)

4.5 Model Architecture

Inverted Residuals

t: expansion factor
c: number of output channels
n: number of repetitions
s: stride
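
To make the (t, c, n, s) notation concrete, here is a hedged sketch that unrolls the bottleneck stages into individual blocks. The stage rows are transcribed from the architecture table in the paper as I remember it (the table image is not reproduced in these notes), so treat them as an assumption. The shortcut rule (stride 1 and equal input/output channels) follows common implementations, and `expand_stages` is a hypothetical helper name. Within each stage only the first block uses stride s; the rest use stride 1.

```python
# (t, c, n, s) per bottleneck stage; assumed to match the paper's architecture table.
BOTTLENECK_STAGES = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def expand_stages(stages, in_channels=32, resolution=112):
    """Unroll stages into blocks; defaults assume a 224 x 224 input after the
    first stride-2 convolution with 32 output channels."""
    blocks = []
    for t, c, n, s in stages:
        for i in range(n):
            stride = s if i == 0 else 1  # only the first block of a stage downsamples
            resolution //= stride
            blocks.append({
                "expansion": t,
                "in_channels": in_channels,
                "out_channels": c,
                "stride": stride,
                "resolution": resolution,
                # Residual shortcut only when input and output shapes match.
                "shortcut": stride == 1 and in_channels == c,
            })
            in_channels = c
    return blocks


blocks = expand_stages(BOTTLENECK_STAGES)
for block in blocks:
    print(block)
```

Under these assumptions the network has 17 bottleneck blocks and ends at a 7 × 7 feature map with 320 channels before the final 1 × 1 convolution.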

Caffe implementation: https://github.com/shicai/MobileNet-Caffe/blob/master/mobilenet_v2_deploy.prototxt
Caffe model visualization tool: http://ethereon.github.io/netscope/#/editor

4.6 Trade-off hyper parameters

These are the width multiplier $\alpha$ and the resolution multiplier $\rho$ from 【MobileNet】《MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications》

width multiplier 1, input 224 × 224 (300M multiply-adds, 3.4M parameters)
width multiplier 0.35 to 1.4, input resolution 96 to 224 (7M to 585M multiply-adds, 1.7M to 6.9M parameters)

A small implementation detail: the width multiplier is not applied to the very last convolutional layer. This improves performance for smaller models.
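
A rough sketch of how a width multiplier could be applied to the channel counts. Several things here are my assumptions rather than statements from these notes: the channel list, the rounding to a multiple of 8 (a convention used by common implementations), the helper names, and the reading that the last layer is exempt only when the multiplier is below one, which is how I recall the paper phrasing it.

```python
# Assumed channel widths: first conv followed by the seven bottleneck stages.
STAGE_CHANNELS = [32, 16, 24, 32, 64, 96, 160, 320]
LAST_CHANNELS = 1280  # assumed width of the very last 1x1 convolution


def scale_channels(channels, multiplier, divisor=8):
    """Scale a channel count and round to the nearest multiple of `divisor`."""
    scaled = channels * multiplier
    return max(divisor, int(scaled + divisor / 2) // divisor * divisor)


def scaled_widths(multiplier):
    """Return (stage widths, last layer width) for a given width multiplier."""
    widths = [scale_channels(c, multiplier) for c in STAGE_CHANNELS]
    # The very last convolution keeps its full width for multipliers below one.
    if multiplier < 1.0:
        last = LAST_CHANNELS
    else:
        last = scale_channels(LAST_CHANNELS, multiplier)
    return widths, last


print(scaled_widths(0.5))
print(scaled_widths(1.0))
```

Keeping the last layer wide means a small model still feeds the classifier a high-dimensional feature vector, which fits the remark that this helps smaller models.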

5 Experiments

ImageNet classification
COCO object detection
VOC image segmentation

Evaluation metrics

multiply-adds (MAdds)
actual latency
the number of parameters

5.1 ImageNet Classification

(Results tables omitted.) In "MobileNetV2 (1.4)", 1.4 is the width multiplier.

5.2 Object Detection

SSDLite：We replace all the regular convolutions with separable convolutions (depthwise followed by 1 × 1 projection) in SSD prediction layers.

5.3 Ablation study

5.3.1 Inverted residual connections

The results show that the authors' block design (shortcuts between bottlenecks) performs better.

5.3.2 Importance of linear bottlenecks

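Keeping the bottleneck linear (ReLU6 elsewhere, linear activation in the bottleneck) works better than using ReLU6 everywhere, which supports the claim that non-linearity destroys information in low-dimensional space.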

7 Conclusion (own)

The origin of the linear bottleneck (Section 3.2 of the paper): I do not understand it thoroughly.
Section 3.4 of the paper, Information flow interpretation: I do not thoroughly understand the separation of capacity and expressiveness. (The paper's conclusion also mentions this as future work.)
Section 5.1, Memory efficient inference: I did not get it.
"This structure maintains a compact representation at the input and the output while expanding to a higher-dimensional feature space internally to increase the expressiveness of nonlinear per-channel transformations." (a summary from the later MobileNetV3 paper)

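8 Supplement (excerpts)

8.1 figure 1 and figure 2

The CVPR version of the paper shows two figures (Figure 1 and Figure 2) that are never cited in the main text, and I could not make sense of them. The article "An Analysis of the Lightweight Networks ShuffleNet and MobileNet v1/v2" gives a very good explanation, though.

(Screenshots omitted.)

Diagonal hatching marks layers that use a linear activation. The fainter (non-blue) feature maps belong to the next bottleneck.

The two passages above were screenshots from "An Analysis of the Lightweight Networks ShuffleNet and MobileNet v1/v2". Truly enlightening, excellent work.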

8.2 Differences between v1 and v2

【MobileNet】《MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications》
(Figure omitted; image source: https://zhuanlan.zhihu.com/p/33075914)

Excerpted from: https://zhuanlan.zhihu.com/p/33075914. An excellent summary.

8.3 Differences between v2 and ResNet

(Figure omitted.)
a 1x1 expansion + depthwise separable convolution

Excerpted from: https://zhuanlan.zhihu.com/p/33075914
