Paper Reading: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

1. Paper Overview

The starting point of this paper is to consider the size and the efficiency of a classification model together: the goal is to scale the model up while keeping it efficient (i.e., fast at inference). The authors point out that conventional approaches scale a model along a single dimension, such as input resolution, network depth, or network width (the number of feature-map channels). In this paper, resolution, depth, and width are considered jointly, on the argument that the three are related: for example, when the resolution is increased, the depth and width should be increased as well. The authors propose a compound coefficient φ that ties the three together. They first validate the compound scaling idea on MobileNet and ResNet, where it improves accuracy, and then use NAS to search for a baseline network; gradually increasing the compound coefficient φ on top of this baseline yields the EfficientNet family.

EfficientNet has also drawn a fair amount of criticism. First, transfer learning with it is difficult: its peculiar hyperparameters make it look like "overfitting" to ImageNet. Second, its very low FLOPs come with relatively high inference latency; for example, the B3 variant has less than half the FLOPs of ResNet50, yet its inference time is roughly twice that of ResNet50.

Reference link: FLOPs与模型推理速度 (FLOPs vs. model inference speed)

The figure from the paper showing EfficientNets topping the ImageNet leaderboard:
[figure]

Illustration of the compound scaling method:
[figure]

In this paper, we want to study and rethink the process of scaling up ConvNets. In particular, we investigate the central question: is there a principled method to scale up ConvNets that can achieve better accuracy and efficiency? Our empirical study shows that it is critical to balance all dimensions of network width/depth/resolution, and surprisingly such balance can be achieved by simply scaling each of them with constant ratio. Based on this observation, we propose a simple yet effective compound scaling method. Unlike conventional practice that arbitrary scales these factors, our method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients.


2. Why the compound scaling method makes sense

Intuitively, the compound scaling method makes sense because if the input image is bigger, then the network needs more layers to increase the receptive field and more channels to capture more fine-grained patterns on the bigger image. In fact, previous theoretical (Raghu et al., 2017; Lu et al., 2018) and empirical results (Zagoruyko & Komodakis, 2016) both show that there exists certain relationship between network width and depth, but to our best knowledge, we are the first to empirically quantify the relationship among all three dimensions of network width, depth, and resolution.


Observation 1 – Scaling up any dimension of network width, depth, or resolution improves accuracy, but the accuracy gain diminishes for bigger models.


Observation 2 – In order to pursue better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling.

3. Determining α, β, γ given a compute budget (the compound scaling procedure)

Main idea: a small grid search based on Eqs. (2) and (3).

[Equations (2) and (3) from the paper]
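For reference, the compound scaling rule from the paper (Eq. 3) ties the three dimensions to a single coefficient φ: depth d = α^φ, width w = β^φ, resolution r = γ^φ, subject to α · β² · γ² ≈ 2 with α ≥ 1, β ≥ 1, γ ≥ 1. Since FLOPS grow linearly with depth but quadratically with width and resolution, the constraint means that each unit increase of φ roughly doubles the total FLOPS (about 2^φ overall).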

Scaling up based on Eqs. (2) and (3):
[Figure: the two-step scaling procedure from the paper]
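A minimal Python sketch of the two-step procedure, just to make the flow concrete. The search grid, the tolerance, and the `train_and_eval` helper are hypothetical placeholders, not the paper's actual code; only the constraint α · β² · γ² ≈ 2 and the final values α=1.2, β=1.1, γ=1.15 come from the paper.

```python
import itertools

def compound_scaling_search(train_and_eval, flops_budget_ratio=2.0):
    """Step 1: fix phi = 1 and grid-search alpha, beta, gamma
    under the constraint alpha * beta**2 * gamma**2 ~= flops_budget_ratio."""
    candidates = []
    grid = [1.0, 1.1, 1.2, 1.3, 1.4]                 # hypothetical search grid
    for alpha, beta, gamma in itertools.product(grid, repeat=3):
        if abs(alpha * beta**2 * gamma**2 - flops_budget_ratio) > 0.1:
            continue                                  # keep only ~2x-FLOPS configs
        acc = train_and_eval(depth=alpha, width=beta, resolution=gamma)
        candidates.append((acc, alpha, beta, gamma))
    return max(candidates)                            # best (acc, alpha, beta, gamma)

def scale_model(alpha, beta, gamma, phi):
    """Step 2: fix alpha, beta, gamma and scale up the baseline with phi."""
    return {
        "depth_multiplier": alpha ** phi,
        "width_multiplier": beta ** phi,
        "resolution_multiplier": gamma ** phi,
    }

# With the values reported in the paper (alpha=1.2, beta=1.1, gamma=1.15),
# increasing phi = 1, 2, ... from EfficientNet-B0 yields B1, B2, ...
print(scale_model(1.2, 1.1, 1.15, phi=2))
```

Step 1 is cheap because the grid search is done once on the small baseline; step 2 reuses the same α, β, γ for every larger model instead of re-searching at each size.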

4. Where EfficientNet-B0 comes from and its main building block

Main idea: neural architecture search (NAS).

Inspired by (Tan et al., 2019), we develop our baseline network by leveraging a multi-objective neural architecture search that optimizes both accuracy and FLOPS. Specifically, we use the same search space as (Tan et al., 2019), and use ACC(m) × [FLOPS(m)/T]^w as the optimization goal, where ACC(m) and FLOPS(m) denote the accuracy and FLOPS of model m, T is the target FLOPS and w = -0.07 is a hyperparameter for controlling the trade-off between accuracy and FLOPS. Unlike (Tan et al., 2019; Cai et al., 2019), here we optimize FLOPS rather than latency since we are not targeting any specific hardware device. Our search produces an efficient network, which we name EfficientNet-B0. Since we use the same search space as (Tan et al., 2019), the architecture is similar to MnasNet, except our EfficientNet-B0 is slightly bigger due to the larger FLOPS target (our FLOPS target is 400M). Table 1 shows the architecture of EfficientNet-B0. Its main building block is mobile inverted bottleneck MBConv (Sandler et al., 2018; Tan et al., 2019), to which we also add squeeze-and-excitation optimization (Hu et al., 2018).
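A tiny sketch of the search objective quoted above. Only the formula ACC(m) × (FLOPS(m)/T)^w with w = -0.07 and T = 400M comes from the paper; the example accuracy and FLOPS values below are made up for illustration.

```python
def search_reward(acc, flops, target_flops=400e6, w=-0.07):
    """Multi-objective reward: ACC(m) * (FLOPS(m) / T) ** w.
    With w = -0.07, models above the FLOPS target T are penalized
    and models below it are mildly rewarded."""
    return acc * (flops / target_flops) ** w

# Two hypothetical candidates with equal accuracy but different cost:
print(search_reward(acc=0.76, flops=400e6))   # exactly on target -> 0.76
print(search_reward(acc=0.76, flops=800e6))   # 2x the target     -> ~0.72
```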

The architecture of EfficientNet-B0:

[Table 1: EfficientNet-B0 architecture]

Mobile inverted bottleneck (MBConv): this is the inverted residual block from MobileNetV2, i.e., it is built on depthwise separable convolutions.
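To make the block structure concrete, here is a minimal PyTorch-style sketch of an MBConv block with squeeze-and-excitation (1x1 expansion -> depthwise conv -> SE -> linear 1x1 projection). It is a simplified illustration, not the official implementation: stride is fixed to 1, drop-connect is omitted, and the expansion and SE ratios are just the commonly used defaults.

```python
import torch
import torch.nn as nn

class MBConv(nn.Module):
    """Mobile inverted bottleneck (MobileNetV2-style) with squeeze-and-excitation.
    Simplified sketch: stride 1 only, Swish/SiLU activation, no drop-connect."""
    def __init__(self, in_ch, out_ch, expand_ratio=6, kernel_size=3, se_ratio=0.25):
        super().__init__()
        mid_ch = in_ch * expand_ratio
        se_ch = max(1, int(in_ch * se_ratio))
        self.use_residual = in_ch == out_ch

        self.expand = nn.Sequential(                     # 1x1 expansion conv
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.SiLU())
        self.depthwise = nn.Sequential(                  # depthwise conv
            nn.Conv2d(mid_ch, mid_ch, kernel_size, padding=kernel_size // 2,
                      groups=mid_ch, bias=False),
            nn.BatchNorm2d(mid_ch), nn.SiLU())
        self.se = nn.Sequential(                         # squeeze-and-excitation
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(mid_ch, se_ch, 1), nn.SiLU(),
            nn.Conv2d(se_ch, mid_ch, 1), nn.Sigmoid())
        self.project = nn.Sequential(                    # 1x1 linear projection
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))

    def forward(self, x):
        out = self.depthwise(self.expand(x))
        out = out * self.se(out)                         # channel-wise re-weighting
        out = self.project(out)
        return x + out if self.use_residual else out

x = torch.randn(1, 16, 112, 112)
print(MBConv(16, 16)(x).shape)   # torch.Size([1, 16, 112, 112])
```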

5. Differences between MobileNet and EfficientNet

(1) MobileNet, whose overall width is close to or slightly smaller than ResNet50's, has far fewer FLOPs and also far less memory traffic (because it has fewer parameters), so on any hardware platform it is much faster than ResNet50.
(2) EfficientNet, in order to compensate for the accuracy loss of depthwise convolutions, forcibly makes the whole network wider from B0 onward. Although the reported FLOPs are very low, GPU inference speed is poor. (All the experiments in the EfficientNet paper were run on GPU.)
(3) I roughly measured EfficientNet-B3's CPU inference time on a server: EfficientNet-B3 runs at roughly 13-14 QPS, versus roughly 18-19 QPS for ResNet50 (a minimal timing sketch is given after this list).
(4) Depthwise convolutions succeed precisely when the model as a whole is small and thin; EfficientNet forcibly builds a big, fat depthwise network.
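As a rough illustration of how such a QPS comparison can be run, here is a minimal timing sketch. The use of torchvision (assuming a version that ships `efficientnet_b3`), the single-image batch, and the iteration counts are assumptions; this is not the exact setup behind the numbers in item (3).

```python
import time
import torch
from torchvision import models

def measure_qps(model, input_size, n_iters=50):
    """Single-image CPU throughput (queries per second), eval mode, no grad."""
    model.eval()
    x = torch.randn(1, 3, input_size, input_size)
    with torch.no_grad():
        for _ in range(5):                  # warm-up iterations
            model(x)
        start = time.perf_counter()
        for _ in range(n_iters):
            model(x)
    return n_iters / (time.perf_counter() - start)

# EfficientNet-B3 uses 300x300 inputs, ResNet50 uses 224x224.
print("resnet50 QPS:", measure_qps(models.resnet50(), 224))
print("efficientnet_b3 QPS:", measure_qps(models.efficientnet_b3(), 300))
```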

Reference link: the comments under the article FLOPs与模型推理速度 (FLOPs vs. model inference speed)

6. EfficientNet results

[Result tables from the paper]

7. EfficientNet activation visualization

In order to further understand why our compound scaling method is better than others, Figure 7 compares the class activation map (Zhou et al., 2016) for a few representative models with different scaling methods. All these models are scaled from the same baseline, and their statistics are shown in Table 7. Images are randomly picked from ImageNet validation set. As shown in the figure, the model with compound scaling tends to focus on more relevant regions with more object details, while other models are either lack of object details or unable to capture all objects in the images.

[Figure 7: class activation maps for models scaled with different methods]
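For context, here is a minimal sketch of how a class activation map (Zhou et al., 2016) can be computed for a network that ends with global average pooling and a linear classifier. The tensors `features` and `fc_weights` and the chosen class index are hypothetical stand-ins for a real network's last-conv activations and final-layer weights.

```python
import torch
import torch.nn.functional as F

def class_activation_map(features, fc_weights, class_idx, out_size=(224, 224)):
    """CAM: weight the final conv feature maps by the classifier weights of the
    target class, sum over channels, then upsample to the input resolution.
    features:   (C, H, W) activations of the last conv layer
    fc_weights: (num_classes, C) weights of the final linear layer"""
    cam = torch.einsum('c,chw->hw', fc_weights[class_idx], features)
    cam = F.relu(cam)                               # keep positive evidence only
    cam = cam / (cam.max() + 1e-8)                  # normalize to [0, 1]
    return F.interpolate(cam[None, None], size=out_size,
                         mode='bilinear', align_corners=False)[0, 0]

# Toy usage with random tensors standing in for a real network's outputs.
cam = class_activation_map(torch.rand(1280, 7, 7), torch.rand(1000, 1280), class_idx=0)
print(cam.shape)   # torch.Size([224, 224])
```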

References

1. 卷积神经网络学习路线(二十二)| Google Brain EfficientNet (CNN Learning Roadmap #22 | Google Brain EfficientNet)
