Paper Close Reading: MobileNetV2: Inverted Residuals and Linear Bottlenecks

Author: 深度不估计，目标不检测，语义不分割

Article link: https://blog.csdn.net/qq_34782826/article/details/100030846

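This article takes a close look at the structure and principles of MobileNetV2, focusing on its inverted residual structure and linear bottleneck design, which aim to improve model accuracy and efficiency on mobile devices. By analyzing depthwise separable convolutions, the behavior of ReLU, and the flow of information, it explains the intuition and theoretical reasoning behind the design.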

Paper: https://arxiv.org/pdf/1801.04381.pdf

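The model structure is simple. The point is to understand the motivation behind the design, and to note down some general facts about convolutions along the way. Things I already know well are not recorded here; see the original paper for details.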

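Abstract:

1. SSDLite (object detection) and Mobile DeepLabv3 (semantic segmentation) are introduced as applications of the model.

2. MobileNetV2 is built on an inverted residual structure (the opposite of the ResNet block design), with shortcut connections between the bottleneck layers.

3. The intermediate layer of the inverted residual structure uses depthwise convolution.

4. Removing the non-linearities in the narrow layers is important for maintaining representational power, and doing so improves performance. The authors argue for this design intuition.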

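Introduction:

Contribution of the paper:

the inverted residual with linear bottleneck

1. The compressed, low-dimensional input features are first expanded to a high dimension.
2. A depthwise convolution is applied.
3. The features are then projected back to a low-dimensional representation with a linear convolution.

This module suits mobile devices well because it never fully materializes the large intermediate tensors (a benefit that comes from the depthwise convolution), which significantly reduces the memory footprint during inference.

The proposed block resembles a ResNet block, which also consists of three operations: ① 1×1 channel reduction, ② 3×3 convolution, ③ 1×1 channel expansion. The inverted residual reverses the order of the channel changes: it expands first and reduces last.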
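The difference in channel widths between the two block types can be made concrete with a small sketch. The helper names, the ResNet reduction factor of 4, and the expansion factor of 6 are illustrative assumptions, not values taken from these notes.

```python
def resnet_bottleneck_widths(channels, reduction=4):
    """Channel widths through a ResNet-style bottleneck: wide -> narrow -> narrow -> wide."""
    narrow = channels // reduction  # assumed reduction factor
    return [channels, narrow, narrow, channels]


def inverted_residual_widths(channels, expansion=6):
    """Channel widths through an inverted residual: narrow -> wide -> wide -> narrow."""
    wide = channels * expansion  # assumed expansion factor
    return [channels, wide, wide, channels]


print(resnet_bottleneck_widths(256))  # [256, 64, 64, 256]
print(inverted_residual_widths(24))   # [24, 144, 144, 24]
```

In the ResNet case the shortcut connects the wide ends of the block; in the inverted case it connects the narrow bottlenecks.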

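Related Work

A large body of work also focuses on changing the connectivity structure of the internal convolutional blocks, as in ShuffleNet, or on introducing sparsity.

Other research directions are covered in the paper.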

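Preliminaries, discussion and intuition:

Depthwise Separable Convolutions:

Standard convolution:

Input: $h_i \times w_i \times d_i$; kernel: $k \times k \times d_i \times d_j$. Assuming "same" padding, the output is $h_i \times w_i \times d_j$, and the total computational cost is $((k \cdot k \cdot d_i) \cdot h_i \cdot w_i) \cdot d_j$.

Cost of the depthwise separable convolution (depthwise followed by 1×1 pointwise): $(k \cdot k \cdot h_i \cdot w_i) \cdot d_i + (1 \cdot 1 \cdot h_i \cdot w_i \cdot d_i) \cdot d_j = h_i \cdot w_i \cdot d_i \cdot (k^2 + d_j)$

The ratio of the depthwise separable cost to the standard convolution cost is $\frac{k^2 + d_j}{k^2 \cdot d_j} = \frac{1}{d_j} + \frac{1}{k^2}$, which is roughly a $k^2$-fold reduction when $d_j$ is large. MobileNetV2 uses $k = 3$, so the cost is 8 to 9 times smaller.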
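The two cost formulas can be checked numerically. This is a minimal sketch; the function names and the layer dimensions below are hypothetical, chosen only for illustration.

```python
def standard_conv_cost(h, w, d_in, d_out, k):
    """Multiply-adds of a standard k x k convolution with 'same' padding."""
    return h * w * d_in * d_out * k * k


def depthwise_separable_cost(h, w, d_in, d_out, k):
    """Multiply-adds of a k x k depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = h * w * d_in * k * k   # one k x k filter per input channel
    pointwise = h * w * d_in * d_out   # 1x1 convolution mixing the channels
    return depthwise + pointwise


# Hypothetical layer: 56x56 feature map, 64 -> 128 channels, 3x3 kernel.
h, w, d_in, d_out, k = 56, 56, 64, 128, 3
std = standard_conv_cost(h, w, d_in, d_out, k)
sep = depthwise_separable_cost(h, w, d_in, d_out, k)
print(f"standard: {std}, separable: {sep}, reduction: {std / sep:.2f}x")
```

For $k = 3$ the reduction factor is $9 d_j / (9 + d_j)$, which approaches 9 as the number of output channels grows.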

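Linear Bottlenecks: this part is hard to follow

Suppose a layer outputs a feature map of size $h_i \times w_i \times d_i$. The set of layer activations forms a "manifold of interest". It is widely believed that the manifold of interest can be embedded in a low-dimensional subspace, which is why MobileNetV1 uses the width multiplier parameter to reduce dimensionality. Following this intuition, the width multiplier lets one reduce the dimensionality of the activation space until the manifold of interest spans the entire space. The intuition breaks down, however, once the non-linearity is taken into account. Applied to a line in 1D space, ReLU produces a "ray" (ReLU is linear only on the positive half-axis); in n-dimensional space it generally produces a piecewise linear curve. Applying ReLU after a layer that has been narrowed in this way would therefore lose a considerable amount of information, and to limit that loss the paper introduces the linear bottleneck.

To summarize, we have highlighted two properties that are indicative of the requirement that the manifold of interest should lie in a low-dimensional subspace of the higher-dimensional activation space:

1. If the manifold of interest remains non-zero volume after ReLU transformation, it corresponds to a linear transformation. In other words, on the non-zero part of its output ReLU acts as a linear transformation, as the shape of the ReLU curve shows.

2. ReLU is capable of preserving complete information about the input manifold, but only if the input manifold lies in a low-dimensional subspace of the input space.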
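The second property can be illustrated with a toy experiment, loosely in the spirit of the paper's low-dimensional manifold illustration. Everything here is an assumption made for the sketch: points on a 2D unit circle stand in for the manifold, a random Gaussian matrix stands in for the layer, and a point is counted as lost when fewer than two channels survive ReLU, since a 2D point needs at least two linear measurements to be recovered. That criterion is a simplification and only gives a lower bound on the information loss.

```python
import math
import random


def lost_fraction(n_channels, n_points=360, seed=0):
    """Fraction of points on the 2D unit circle that cannot be recovered after
    a random linear map to n_channels dimensions followed by ReLU."""
    rng = random.Random(seed)
    # Random linear embedding: one 2D weight row per output channel.
    rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_channels)]
    lost = 0
    for i in range(n_points):
        angle = 2 * math.pi * i / n_points
        x0, x1 = math.cos(angle), math.sin(angle)
        # Channels that survive ReLU have a strictly positive pre-activation.
        active = sum(1 for a, b in rows if a * x0 + b * x1 > 0)
        # Fewer than two surviving measurements cannot pin down a 2D point.
        if active < 2:
            lost += 1
    return lost / n_points


for n in (2, 3, 5, 15, 30):
    print(n, lost_fraction(n))
```

With only two output channels, at least about half of the circle is collapsed by ReLU; with many channels, every point keeps enough active channels to be recoverable. This matches the argument for placing ReLU in the expanded, high-dimensional part of the block and keeping the narrow bottleneck linear.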

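Inverted residuals:

The motivation for the shortcut is the same as in ResNet: to improve the ability of gradients to propagate across many layers. The inverted design is also more memory efficient. The last layer of the block uses no non-linear activation.

Figures 2 and 3 in the paper are easy to follow.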
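The block described in this section can be written down as a small layer specification. This is a sketch under stated assumptions: the function name and tuple layout are hypothetical, ReLU6 is the activation I believe the paper uses in the expanded layers, and the shortcut rule (stride 1 with matching input and output channels) follows common implementations rather than anything stated in these notes.

```python
def inverted_residual_spec(d_in, d_out, stride=1, t=6, k=3):
    """Describe an inverted residual block as (name, in_ch, out_ch, stride, activation)."""
    hidden = d_in * t  # width of the expanded representation
    layers = [
        ("1x1 conv (expand)", d_in, hidden, 1, "ReLU6"),
        (f"{k}x{k} depthwise conv", hidden, hidden, stride, "ReLU6"),
        # Linear bottleneck: the projection has no non-linearity.
        ("1x1 conv (project)", hidden, d_out, 1, "linear"),
    ]
    # Assumed rule: add the input back only when the shapes match.
    use_shortcut = stride == 1 and d_in == d_out
    return layers, use_shortcut


layers, use_shortcut = inverted_residual_spec(24, 24)
for layer in layers:
    print(layer)
print("shortcut:", use_shortcut)
```

The shortcut joins the two narrow ends of the block, so the tensors that must be kept alive for the addition are the small ones.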

Running time and parameter count for bottleneck convolution:

Block size: $h \times w$; expansion factor: $t$; kernel size: $k$; input channels: $d'$; output channels: $d''$
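With these symbols, the multiply-add count of one bottleneck block can be derived by summing its three convolutions, which gives $h \cdot w \cdot d' \cdot t \cdot (d' + k^2 + d'')$. The sketch below assumes stride 1, so the spatial size stays $h \times w$ throughout; the function name and the example dimensions are hypothetical.

```python
def bottleneck_mult_adds(h, w, d_in, d_out, t, k=3):
    """Multiply-adds of one inverted residual bottleneck block (stride 1 assumed)."""
    hidden = t * d_in
    expand = h * w * d_in * hidden      # 1x1 expansion: d' -> t * d'
    depthwise = h * w * hidden * k * k  # k x k depthwise on t * d' channels
    project = h * w * hidden * d_out    # 1x1 linear projection: t * d' -> d''
    return expand + depthwise + project


# Hypothetical block: 14x14 feature map, 64 -> 64 channels, t = 6, k = 3.
total = bottleneck_mult_adds(14, 14, 64, 64, t=6, k=3)
closed_form = 14 * 14 * 64 * 6 * (64 + 3 * 3 + 64)
print(total, total == closed_form)
```

The depthwise term contributes only the $k^2$ part of the sum, so for typical channel counts the two 1×1 convolutions dominate the cost.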