[MnasNet] MnasNet: Platform-Aware Neural Architecture Search for Mobile

bryant_meng

Article link: https://blog.csdn.net/bryant_meng/article/details/122457666

CVPR-2019

1 Background and Motivation

The authors aim to design a new mobile model that runs faster on resource-constrained platforms.

2 Related Work

Compressing existing networks: quantization, pruning, NetAdapt, etc.; these methods do not focus on learning novel compositions of CNN operations
Hand-crafted design: usually takes significant human effort
NAS: based on various learning algorithms, e.g. reinforcement learning / evolutionary search / differentiable search

3 Advantages / Contributions

MnasNet is found by NAS, with two main innovations:

incorporate model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency (not accuracy alone)
a novel factorized hierarchical search space that encourages layer diversity throughout the network (unlike NASNet, which searches at the cell level, the search here is at the block level)

achieve new state-of-the-art results on both ImageNet classification and COCO object detection under typical mobile inference latency constraints

4 Method

4.1 Problem Formulation

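Objective function used by previous methods:

$$\underset{m}{\text{maximize}} \quad ACC(m) \qquad \text{subject to} \quad LAT(m) \le T$$

where $m$ is the model, $ACC(m)$ is its accuracy, $LAT(m)$ is its inference latency, and $T$ is the target latency.

This objective maximizes accuracy alone; latency enters only as a hard constraint, so there is no trade-off between speed and accuracy.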

The authors are more interested in finding multiple Pareto-optimal solutions in a single architecture search (a trade-off between speed and accuracy).

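They design the following objective function:

$$\underset{m}{\text{maximize}} \quad ACC(m) \times \left[\frac{LAT(m)}{T}\right]^{w}, \qquad w = \begin{cases} \alpha, & \text{if } LAT(m) \le T \\ \beta, & \text{otherwise} \end{cases}$$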

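Depending on the values of $\alpha$ and $\beta$, there is a hard version ($\alpha = 0$, $\beta = -1$) and a soft version ($\alpha = \beta = -0.07$):

[Figure: objective value versus latency for the hard and soft versions; the horizontal axis is latency and the vertical axis is the objective]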
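
As a rough sketch of how the objective $ACC(m) \times [LAT(m)/T]^{w}$ behaves, with $w = \alpha$ when $LAT(m) \le T$ and $w = \beta$ otherwise, the Python below evaluates it at a few latencies. The function name `mnas_reward` and the sample numbers (accuracy 0.5, target 80 ms) are illustrative assumptions, and the hard setting $\alpha = 0$, $\beta = -1$ is quoted from the paper as I recall it.

```python
def mnas_reward(acc, lat_ms, target_ms, alpha, beta):
    """ACC(m) * (LAT(m) / T) ** w, with w = alpha if LAT(m) <= T else beta."""
    w = alpha if lat_ms <= target_ms else beta
    return acc * (lat_ms / target_ms) ** w


# Assumed settings: hard uses alpha = 0, beta = -1; soft uses alpha = beta = -0.07.
HARD = {"alpha": 0.0, "beta": -1.0}
SOFT = {"alpha": -0.07, "beta": -0.07}

# Illustrative values only: accuracy 0.5 and a target latency of 80 ms.
for lat in (40.0, 80.0, 160.0):
    hard = mnas_reward(0.5, lat, 80.0, **HARD)
    soft = mnas_reward(0.5, lat, 80.0, **SOFT)
    print(f"latency={lat:5.0f} ms  hard={hard:.4f}  soft={soft:.4f}")
```

In the hard setting a model faster than the target earns no bonus and a slower one is penalized sharply, while the soft setting changes smoothly on both sides of the target.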

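The $-0.07$ in the soft version comes from the following observation:

we empirically observed doubling the latency usually brings about 5% relative accuracy gain

Take two models: M1 with latency $l$ and accuracy $a$, and M2 with latency $2l$ and 5% higher accuracy. They should receive a similar reward:

$$Reward(M2) = a \cdot (1 + 5\%) \cdot (2l/T)^{\beta} \approx Reward(M1) = a \cdot (l/T)^{\beta}$$

Cancelling $a \cdot (l/T)^{\beta}$ from both sides gives $1.05 \cdot 2^{\beta} \approx 1$, so $\beta \approx -\log_2 1.05 \approx -0.07$.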
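
As a quick numeric check of this value, the sketch below solves $1.05 \cdot 2^{\beta} = 1$ for $\beta$ and confirms that the two rewards then match. The accuracy, latency, and target values are arbitrary placeholders, not numbers from the paper.

```python
import math

# Doubling latency buys about 5% relative accuracy, so 1.05 * 2**beta = 1.
beta = -math.log2(1.05)
print(f"beta = {beta:.4f}")

# Arbitrary placeholder values for accuracy a, latency l (ms) and target T (ms).
a, l, T = 0.70, 50.0, 75.0
reward_m1 = a * (l / T) ** beta
reward_m2 = a * 1.05 * (2 * l / T) ** beta
print(f"Reward(M1) = {reward_m1:.6f}, Reward(M2) = {reward_m2:.6f}")
```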

4.2 Factorized Hierarchical Search Space

[Figure: factorized hierarchical search space]
allowing different layer architectures in different blocks

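The N layers within a block are identical; the operations inside a layer are as follows:

[Figure: operations inside a layer]
The search uses MobileNetV2 as a reference:

Number of layers per block: {0, +1, -1} relative to MobileNetV2

Filter size (number of filters) per layer: {0.75, 1.0, 1.25} times that of MobileNetV2

One of the resulting architectures:

[Figure: one of the searched MnasNet architectures]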

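The size of the search space is as follows:

Assume $B$ blocks, and each block has a sub search space of size $S$ with average $N$ layers per block.

The search space size is then $S^B$.

If every layer were searched independently, it would be $S^{B \cdot N}$.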
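
To get a feel for the gap between $S^B$ and $S^{B \cdot N}$, here is a small sketch. The values S = 432, B = 5, N = 3 are, as far as I recall, the typical case quoted in the paper; treat them as illustrative. The helper name `search_space_sizes` is made up for this example.

```python
def search_space_sizes(S, B, N):
    """Return (factorized size S**B, flat per-layer size S**(B*N))."""
    factorized = S ** B
    per_layer = S ** (B * N)
    return factorized, per_layer


# Assumed typical case: S = 432 choices per block, B = 5 blocks, N = 3 layers per block.
factorized, per_layer = search_space_sizes(S=432, B=5, N=3)
print(f"factorized: {factorized:.2e}")
print(f"per-layer:  {per_layer:.2e}")
```

With these numbers the factorized space is on the order of $10^{13}$ and the per-layer space on the order of $10^{39}$.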

4.3 Search Algorithm

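[Figure: search algorithm overview]
The search is a sample-eval-update loop that maximizes the expected reward:

$$J = E_{P(a_{1:T};\theta)}[R(m)]$$

where $m$ is a model sampled by the controller through actions $a_{1:T}$ under parameters $\theta$ (this $T$ counts actions and is not the target latency), and the reward value $R(m)$ is the objective function from Section 4.1.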
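
The sample-eval-update loop can be sketched on a toy problem. Everything below is an assumption made for illustration: the three per-block options with their accuracy and latency numbers, the 60 ms target, and the use of a plain REINFORCE update with a moving-average baseline in place of the controller training used in the paper. The reward is $ACC(m) \times (LAT(m)/T)^{\beta}$ with $\beta = -0.07$.

```python
import math
import random

# Hypothetical per-block options: (accuracy contribution, latency in ms).
OPTIONS = [(0.10, 10.0), (0.14, 20.0), (0.16, 40.0)]
NUM_BLOCKS = 3
TARGET_MS = 60.0
BETA = -0.07  # soft constraint, alpha = beta


def evaluate(arch):
    """Toy stand-in for training a model and measuring its latency."""
    acc = 0.3 + sum(OPTIONS[c][0] for c in arch)
    lat = sum(OPTIONS[c][1] for c in arch)
    return acc, lat


def reward(arch):
    acc, lat = evaluate(arch)
    return acc * (lat / TARGET_MS) ** BETA


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def search(steps=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    # Controller: one independent categorical distribution per block.
    logits = [[0.0] * len(OPTIONS) for _ in range(NUM_BLOCKS)]
    baseline = 0.0
    best_reward, best_arch = float("-inf"), None
    for _ in range(steps):
        # Sample: draw one option per block.
        probs = [softmax(row) for row in logits]
        arch = tuple(rng.choices(range(len(OPTIONS)), weights=p)[0] for p in probs)
        # Eval: compute the reward of the sampled architecture.
        r = reward(arch)
        if r > best_reward:
            best_reward, best_arch = r, arch
        # Update: REINFORCE step, gradient of log-probability times advantage.
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        for b, choice in enumerate(arch):
            for k in range(len(OPTIONS)):
                indicator = 1.0 if k == choice else 0.0
                logits[b][k] += lr * advantage * (indicator - probs[b][k])
    return best_reward, best_arch


best_reward, best_arch = search()
print(best_arch, round(best_reward, 4))
```

In a real search the `evaluate` step is the expensive part (training on a proxy task and measuring latency on a device), which is why the loop samples one model at a time and reuses the reward to steer later samples.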

5 Experiments

5.1 Datasets

directly perform our architecture search on the ImageNet training set but with fewer training steps (5 epochs)

This differs from NASNet, which searches on CIFAR-10.

5.2 Results

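1) ImageNet Classification Performance
[Figure: ImageNet classification results]
With T = 75 ms, a single search yields multiple models: A1 / A2 / A3.

Compared with MobileNetV2, MnasNet introduces the SE module, so the effect of SE is examined separately:
[Figure: effect of the SE module]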

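2) Model Scaling Performance

[Table: model scaling results]
Here the depth multiplier scales the number of channels; MnasNet leads MobileNetV2 across the board.

The authors can also control the model size flexibly by changing the value of T during NAS; the table above shows that this works better than cutting channels from a larger model.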

3) COCO Object Detection Performance

[Figure: COCO object detection results]
Not much to comment on here; joking aside, there is a modest improvement.

5.3 Ablation Study and Discussion

1) Soft vs. Hard Latency Constraint
[Figure: soft vs. hard latency constraint results]

The hard version focuses more on faster models to avoid the latency penalty (this can also be seen from the objective function).

The soft version tries to search for models across a wider latency range.

2) Disentangling Search Space and Reward

[Figure: search space and reward ablation]
This ablation separates the contributions of the two innovations.

3) Layer Diversity

[Figure: layer diversity results]

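6 Conclusion (own)

The search builds on MobileNetV2
Pareto-optimal: Pareto optimality (definition from Baidu Baike)

Pareto optimality, also known as Pareto efficiency, is an ideal state of resource allocation. Given a fixed group of people and a fixed pool of allocatable resources, a change from one allocation to another that makes at least one person better off without making anyone worse off is a Pareto improvement (Pareto optimization).
A Pareto-optimal state is one in which no further Pareto improvement is possible; in other words, Pareto improvement is the path and method for reaching Pareto optimality. Pareto optimality is the "ideal kingdom" of fairness and efficiency. The concept was proposed by Pareto.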
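
In the setting of this paper the two quantities being traded off are latency and accuracy: a model is Pareto-optimal if no other model is at least as fast and at least as accurate while being strictly better on one of the two. A minimal sketch of that filter follows; the model names and numbers are hypothetical and the helper names are made up for this example.

```python
def dominates(a, b):
    """True if model a is no slower and no less accurate than b, and strictly better on one."""
    _, lat_a, acc_a = a
    _, lat_b, acc_b = b
    return lat_a <= lat_b and acc_a >= acc_b and (lat_a < lat_b or acc_a > acc_b)


def pareto_front(models):
    """Keep the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Hypothetical (name, latency in ms, top-1 accuracy) triples.
models = [("A", 50.0, 0.70), ("B", 75.0, 0.75), ("C", 80.0, 0.74), ("D", 100.0, 0.76)]
front = pareto_front(models)
print([name for name, _, _ in front])
```

Here model C is dropped because B is both faster and more accurate; A, B and D each represent a different point on the speed/accuracy trade-off.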