EfficientNet

【Google Research, Brain Team 2019】

Reference: the excellent blog post "EfficientNet网络详解" (a detailed walkthrough of the EfficientNet network)

Abstract

  • The paper systematically studies model scaling and identifies that carefully balancing network depth, width, and resolution can lead to better performance.
  • A simple yet highly effective compound coefficient uniformly scales all dimensions of depth / width / resolution.
  • Neural architecture search is used to design a new, relatively simple baseline network, which is then scaled up to obtain the EfficientNet family.

[Figure: model size vs. ImageNet accuracy for EfficientNets and prior ConvNets]

1. Introduction

1.1 Model Scaling

[Figure: (a) baseline network; (b) width scaling; (c) depth scaling; (d) resolution scaling; (e) compound scaling]

  • width: increase the number of channels in the network
  • depth: increase the number of layers in the network
  • resolution: increase the resolution of the network input
  • compound scaling: increase the network's width, depth, and resolution at the same time (see the sketch after this list)
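
As a rough sketch (the names and baseline values below are hypothetical, not from the paper), the three dimensions can be viewed as three multipliers applied to one baseline configuration:

```python
import math

# Hypothetical baseline: (output_channels, num_layers) per stage.
BASELINE_STAGES = [(16, 1), (24, 2), (40, 2), (80, 3)]
BASELINE_RESOLUTION = 224

def scale_config(width_mult=1.0, depth_mult=1.0, resolution_mult=1.0):
    """width -> multiply channels, depth -> multiply layer counts,
    resolution -> multiply the input image size."""
    stages = [(int(round(c * width_mult)), math.ceil(l * depth_mult))
              for c, l in BASELINE_STAGES]
    resolution = int(round(BASELINE_RESOLUTION * resolution_mult))
    return stages, resolution

# Compound scaling moves all three dimensions at once:
print(scale_config(width_mult=1.1, depth_mult=1.2, resolution_mult=1.15))
```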

2. Related Work

  • ConvNet Accuracy: GoogLeNet, SENet, GPipe
  • ConvNet Efficiency: model compression, SqueezeNets, MobileNets, ShuffleNets, neural architecture search

【NAS was originally used to design efficient mobile-size ConvNets; here it is applied to large networks for the first time】

Recently, neural architecture search becomes increasingly popular in designing efficient mobile-size ConvNets (Tan et al., 2019; Cai et al., 2019). However, it is unclear how to apply these techniques for larger models that have much larger design space and much more expensive tuning cost.
  • Model Scaling: ResNet scales depth; WideResNet and MobileNets scale width

3. Compound Model Scaling

3.1 Problem Formulation

  • General formulation of a ConvNet:

Stage $i$ repeats the same layer architecture $\mathcal{F}_i$ for $L_i$ times, so the whole network can be written as (Eq. 1 of the paper):

$$\mathcal{N} = \bigodot_{i=1 \ldots s} \mathcal{F}_i^{L_i}\big(X_{\langle H_i, W_i, C_i \rangle}\big)$$

where $\langle H_i, W_i, C_i \rangle$ is the shape of the input tensor of stage $i$.

  • Fix the layer architectures $\mathcal{F}_i$ and search over each layer's $L_i, C_i, H_i, W_i$:

    By fixing F_i, model scaling simplifies the design problem for new resource constraints, but it still remains a large design space to explore different L_i, C_i, H_i, W_i for each layer.

To reduce the design space further, all layers are scaled uniformly with constant ratios, which turns model scaling into an optimization problem over a depth coefficient $d$, width coefficient $w$, and resolution coefficient $r$ (Eq. 2 of the paper):

$$\max_{d,w,r} \; Accuracy\big(\mathcal{N}(d,w,r)\big)$$

$$\text{s.t.}\; \mathcal{N}(d,w,r) = \bigodot_{i=1 \ldots s} \hat{\mathcal{F}}_i^{\,d \cdot \hat{L}_i}\big(X_{\langle r \cdot \hat{H}_i,\; r \cdot \hat{W}_i,\; w \cdot \hat{C}_i \rangle}\big)$$

$$\text{Memory}(\mathcal{N}) \le \text{target\_memory}, \qquad \text{FLOPS}(\mathcal{N}) \le \text{target\_flops}$$
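
As an illustrative sketch only (helper names are hypothetical, and the paper does not brute-force this search — Section 3.3 replaces it with the compound coefficient), Eq. 2 can be read as: among candidate (d, w, r) triples that fit the resource budget, keep the most accurate one:

```python
# base_flops ~ 0.39e9 is EfficientNet-B0's reported FLOPS.
def flops_of(d: float, w: float, r: float, base_flops: float = 0.39e9) -> float:
    # FLOPS grow ~linearly with depth and ~quadratically with width/resolution.
    return base_flops * d * (w ** 2) * (r ** 2)

def search(candidates, target_flops, evaluate):
    """Return the (d, w, r) with the best accuracy under the FLOPS budget.

    `evaluate(d, w, r)` is assumed to train/evaluate the scaled network
    and return its validation accuracy.
    """
    best, best_acc = None, -1.0
    for d, w, r in candidates:
        if flops_of(d, w, r) > target_flops:
            continue  # violates the resource constraint in Eq. 2
        acc = evaluate(d, w, r)
        if acc > best_acc:
            best, best_acc = (d, w, r), acc
    return best, best_acc
```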

3.2 Scaling Dimensions

  • Depth: a deeper ConvNet can capture richer and more complex features and generalizes well to new tasks, but very deep networks are difficult to train because of the vanishing gradient problem. Although techniques such as skip connections (He et al., 2016) and batch normalization (Ioffe & Szegedy, 2015) alleviate the training problem, the accuracy gain of very deep networks diminishes.
  • Width: wider networks tend to capture more fine-grained features and are easier to train, but extremely wide yet shallow networks have difficulty capturing higher-level features.
  • Resolution: with higher-resolution inputs a ConvNet can potentially capture more fine-grained patterns, but the accuracy gain diminishes for very high resolutions, and large inputs also increase the computational cost.

[Figure: scaling up a baseline with different width (w), depth (d), and resolution (r) coefficients — accuracy improves but quickly saturates as any single dimension grows]

3.3 Compound Scaling

  • Intuitively, for higher-resolution images we should increase network depth, so that the larger receptive fields can help capture similar features that span more pixels in bigger images. Correspondingly, we should also increase network width when resolution is higher, in order to capture the more fine-grained patterns in high-resolution images. These intuitions suggest that we need to coordinate and balance the different scaling dimensions rather than scale along a single dimension.

[Figure: width scaling for baselines with different depth and resolution settings — the best accuracy/FLOPS trade-off comes from balancing all three dimensions]

  • Compound scaling method: a single compound coefficient $\phi$ scales all three dimensions together (Eq. 3 of the paper):

$$\text{depth: } d = \alpha^{\phi}, \qquad \text{width: } w = \beta^{\phi}, \qquad \text{resolution: } r = \gamma^{\phi}$$

$$\text{s.t.}\; \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \qquad \alpha \ge 1,\; \beta \ge 1,\; \gamma \ge 1$$

Since FLOPS scale with $d$, $w^{2}$, and $r^{2}$, the constraint means the total FLOPS grow by roughly $2^{\phi}$.
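
A minimal sketch of Eq. 3 in Python (the α, β, γ values are the ones the paper reports from its grid search on EfficientNet-B0; the function name is mine):

```python
# Eq. 3: one compound coefficient phi controls all three dimensions.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # grid-search constants from the paper

def compound_scale(phi: float):
    d = ALPHA ** phi              # depth multiplier
    w = BETA ** phi               # width multiplier
    r = GAMMA ** phi              # resolution multiplier
    flops_factor = d * w ** 2 * r ** 2   # ~= 2 ** phi by the constraint
    return d, w, r, flops_factor

for phi in range(4):
    print(phi, compound_scale(phi))
```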

4. EfficientNet Architecture

4.1 EfficientNet-B0 baseline network

Table 1 of the paper defines the EfficientNet-B0 baseline (the resolution column is the input resolution of each stage):

| Stage | Operator | Resolution | #Channels | #Layers |
|---|---|---|---|---|
| 1 | Conv3x3 | 224×224 | 32 | 1 |
| 2 | MBConv1, k3x3 | 112×112 | 16 | 1 |
| 3 | MBConv6, k3x3 | 112×112 | 24 | 2 |
| 4 | MBConv6, k5x5 | 56×56 | 40 | 2 |
| 5 | MBConv6, k3x3 | 28×28 | 80 | 3 |
| 6 | MBConv6, k5x5 | 14×14 | 112 | 3 |
| 7 | MBConv6, k5x5 | 14×14 | 192 | 4 |
| 8 | MBConv6, k3x3 | 7×7 | 320 | 1 |
| 9 | Conv1x1 & Pooling & FC | 7×7 | 1280 | 1 |
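
The same stages expressed as data, in a sketch that mirrors (but is not) the official implementation's block arguments; the tuple format is my own:

```python
# EfficientNet-B0 stages 2-8 as
# (expand_ratio, kernel, stride, out_channels, repeats), following Table 1.
B0_STAGES = [
    (1, 3, 1,  16, 1),   # MBConv1, k3x3
    (6, 3, 2,  24, 2),   # MBConv6, k3x3
    (6, 5, 2,  40, 2),   # MBConv6, k5x5
    (6, 3, 2,  80, 3),   # MBConv6, k3x3
    (6, 5, 1, 112, 3),   # MBConv6, k5x5
    (6, 5, 2, 192, 4),   # MBConv6, k5x5
    (6, 3, 1, 320, 1),   # MBConv6, k3x3
]
```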

4.2 MBconv


The MBConv block consists of:

  • a 1×1 pointwise convolution that expands the channel dimension (followed by BN and Swish)
  • a k×k depthwise convolution (followed by BN and Swish); in EfficientNet-B0, k is 3×3 or 5×5 depending on the stage
  • an SE (squeeze-and-excitation) module
  • a 1×1 pointwise convolution that projects the channels back down (followed by BN only)
  • a Dropout layer
  • a shortcut connection, which exists only when the input feature map of the MBConv block has the same shape as its output (see the PyTorch sketch below)
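
A minimal PyTorch sketch of this block (my own simplification: Swish is `nn.SiLU`, the SE reduction ratio and plain `Dropout2d` in place of the official stochastic-depth drop are assumptions):

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """SE module: global-pool -> 1x1 reduce -> 1x1 expand -> sigmoid gate."""
    def __init__(self, channels: int, reduced: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, reduced, 1), nn.SiLU(),
            nn.Conv2d(reduced, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))

class MBConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel, stride, expand_ratio, drop_rate=0.2):
        super().__init__()
        mid = in_ch * expand_ratio
        layers = []
        if expand_ratio != 1:  # 1x1 expansion conv (absent in MBConv1)
            layers += [nn.Conv2d(in_ch, mid, 1, bias=False),
                       nn.BatchNorm2d(mid), nn.SiLU()]
        layers += [  # kxk depthwise conv
            nn.Conv2d(mid, mid, kernel, stride, kernel // 2, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.SiLU(),
            SqueezeExcite(mid, max(1, in_ch // 4)),
            # 1x1 projection conv: BN but no activation
            nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        ]
        self.block = nn.Sequential(*layers)
        self.use_shortcut = (stride == 1 and in_ch == out_ch)
        self.dropout = nn.Dropout2d(drop_rate)

    def forward(self, x):
        out = self.block(x)
        if self.use_shortcut:  # shortcut only when input/output shapes match
            out = x + self.dropout(out)
        return out
```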

4.3 Compound Scaling Method

The paper applies Eq. 3 to the B0 baseline in two steps:

  • STEP 1: fix φ = 1 (assuming twice as many resources are available) and do a small grid search over α, β, γ under the constraint α·β²·γ² ≈ 2; for EfficientNet-B0 the best values found are α = 1.2, β = 1.1, γ = 1.15.
  • STEP 2: fix α, β, γ as constants and scale up the baseline network with different φ to obtain EfficientNet-B1 through B7.
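
A sketch of how a scaled variant can be derived from B0. The channel-rounding rule mirrors the `round_filters` logic of the official implementation (treat the exact details as an assumption), and note that the released B1–B7 models pin explicit width/depth coefficients and input resolutions per variant rather than computing them from φ at run time:

```python
import math

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # grid-search constants from the paper

def round_filters(filters: int, width_mult: float, divisor: int = 8) -> int:
    """Scale channels by width_mult and round to a multiple of `divisor`."""
    filters *= width_mult
    new = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    if new < 0.9 * filters:  # never round down by more than 10%
        new += divisor
    return int(new)

def round_repeats(repeats: int, depth_mult: float) -> int:
    """Scale layer counts by depth_mult, rounding up."""
    return int(math.ceil(depth_mult * repeats))

def scale_b0(phi: float, base_resolution: int = 224):
    d, w, r = ALPHA ** phi, BETA ** phi, GAMMA ** phi
    # Example: the B0 stage with 40 channels and 2 layers
    print("channels 40 ->", round_filters(40, w))
    print("layers   2  ->", round_repeats(2, d))
    print("input %d -> %d" % (base_resolution, int(round(base_resolution * r))))

scale_b0(phi=1.0)  # roughly EfficientNet-B1
```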

5. Experiments

5.1 EfficientNet Performance Results on ImageNet

[Table: ImageNet results of EfficientNet-B0–B7 compared with existing ConvNets of similar accuracy (parameters, FLOPS)]

5.2 Scaling Up MobileNets and ResNet

[Table: scaling up MobileNets and ResNet with the compound method vs. single-dimension scaling]

5.3 Class Activation Map (CAM)

[Figure: class activation maps for models scaled up from the same baseline with different methods]

  • In order to further understand why the compound scaling method is better than the others, the paper compares class activation maps of models scaled up from the same baseline with different methods.
  • Images are randomly picked from the ImageNet validation set. As shown in the figure, the model with compound scaling tends to focus on more relevant regions with more object details, while the other models either lack object details or fail to capture all the objects in the image.
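
For reference, a minimal sketch of how a class activation map can be computed (the generic CAM recipe, not the paper's code; it assumes a classifier whose head is global average pooling followed by a single linear layer):

```python
import torch

@torch.no_grad()
def class_activation_map(features: torch.Tensor,
                         fc_weight: torch.Tensor,
                         class_idx: int) -> torch.Tensor:
    """CAM = class-specific weighted sum of the last conv feature maps.

    features : (C, H, W) activations before global average pooling
    fc_weight: (num_classes, C) weight of the final linear classifier
    """
    cam = torch.einsum('c,chw->hw', fc_weight[class_idx], features)
    cam -= cam.min()
    cam /= cam.max().clamp(min=1e-8)  # normalize to [0, 1] for visualization
    return cam
```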

5.4 Scaling Up EfficientNet-B0 with Different Methods

[Figure: ImageNet accuracy of EfficientNet-B0 scaled up with different single-dimension methods vs. the compound method]
