EfficientNet: A Culmination of Efficiency

moonstar000

Last modified 2022-05-12 17:59:51

Tags: deep learning, CNN, computer vision

First published 2022-05-11 20:19:30

Article link: https://blog.csdn.net/moonstar000/article/details/124626624

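Image Classification with EfficientNet

Introduction to EfficientNet

In recent years hardware has kept improving and compute has become more plentiful. Against this background, a common way to raise a model's accuracy is to scale an existing model along three dimensions: depth, width, and resolution. Extensive experiments show that accuracy does increase as the scaling ratio of any single dimension (depth, width, or resolution) grows. The same experiments also reveal two phenomena worth noting. First, once the scaling ratio reaches a certain level, accuracy stops improving as the ratio increases further. Second, the three scaling dimensions are not independent of one another; they are correlated.

Based on these two phenomena, EfficientNet proposes an algorithm that balances the scaling ratios of depth, width, and resolution to make model scaling more efficient. In addition, the authors obtain a strong baseline model through neural architecture search (NAS) and apply a number of training techniques, which together account for the "efficient" in EfficientNet.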

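Algorithm Analysis

The EfficientNet paper achieves efficient model scaling in two steps: it first sets up a suitable optimization objective, and then specifies how the scaling coefficients for depth, width, and resolution relate to one another. We analyze the algorithm along the same two lines.

Optimization Objective

Each execution unit in a model (a convolution layer, a BN layer, and so on) can be viewed as a function: $\mathit{Y}_{i} = \mathcal{F}_{i}(\mathit{X_{\langle H_{i},W_{i}, C_{i} \rangle}})$ , where $\mathcal{F}_{i}$ is the execution unit, $\mathit{Y}_{i}$ is the output tensor, $\mathit{X}$ is the input tensor, and $H_{i},W_{i}, C_{i}$ are the spatial resolution (height and width) and the number of channels of the input tensor. A convolutional network can then be viewed as the composition of its execution units: $\mathcal{N} = \mathcal{F}_k \odot ... \odot \mathcal{F}_2 \odot \mathcal{F}_1(\mathit{X}_{\langle H_1,W_1, C_1 \rangle}) = \bigodot_{i=1...k} \mathcal{F}_i(\mathit{X}_{\langle H_{i},W_{i}, C_{i} \rangle}) \tag{1}$

Execution units in a model are often heavily repeated. If we write a run of adjacent identical execution units as a single unit repeated $L_j$ times: $\mathcal{F}_i(\mathit{X}_{\langle H_{i},W_{i}, C_{i} \rangle}) = \mathcal{F}^{L_j}_j(\mathit{X}_{\langle H_{i},W_{i}, C_{i} \rangle}) \tag{2}$ then the network function becomes: $\mathcal{N} = \bigodot_{i=1...s} \mathcal{F}^{L_i}_i(\mathit{X}_{\langle H_{i},W_{i}, C_{i} \rangle}) \tag{3}$

The paper considers only scaling along depth, width, and resolution, so the baseline model's execution units $\hat{\mathcal{F}}_i$ , the depth $\hat L_i$ and channel count $\hat C_i$ of each unit, and the input resolution $\hat H_i,\hat W_i$ are all held constant. The model output is then a function only of the depth scaling coefficient $d$ , the width scaling coefficient $w$ , and the resolution scaling coefficient $r$ , as given by the second line of Equation (4).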

$\max\limits_{d, w, r} \quad Accuracy(\mathcal{N}(d, w, r)) \\ ~\\ s.t. \quad \mathcal{N}(d, w, r) = \bigodot\limits_{i=1...s} \hat{\mathcal{F}}^{d \cdot \hat{L}_{i} }_{i}(X_{\langle r \cdot \hat{H}_{i},r \cdot \hat{W}_{i}, w \cdot \hat{C}_{i} \rangle}) \\ ~\\ Memory(\mathcal{N}) \le target\_memory \\ ~\\ FLOPS(\mathcal{N}) \le target\_flops \tag{4}$

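The final objective is clearly to maximize prediction accuracy, and the paper pursues it by adjusting the scaling ratios of depth, width, and resolution, i.e. $\max\limits_{d, w, r} Accuracy(\mathcal{N}(d, w, r))$ . Besides the model function described above, the optimization problem is constrained by a FLOPS budget and a memory budget. This gives the overall objective of the paper: under given FLOPS and memory budgets, keep the structure of the baseline execution units and the baseline depth, width, and input resolution fixed, and maximize prediction accuracy by adjusting the depth, width, and resolution scaling coefficients.

Model Scaling Algorithm

The scaling algorithm is motivated by the growing availability of compute, so the authors make the relationship between the scaling ratios and FLOPS a central concern of the design.

To work out how FLOPS depends on the scaling coefficients, we first hold everything else fixed and assume that FLOPS is a function only of the depth scaling coefficient $d$ , the width scaling coefficient $w$ , and the resolution scaling coefficient $r$ : $\textup{FLOPS}_i = d\mathcal{G}_i(w, r)$ , where $\mathcal{G}_i$ is the FLOPS of execution unit $i$ , $d$ is the number of copies of that unit (the depth scaling coefficient), $w$ is the multiplier on the number of channels of the input tensor, and $r$ is the multiplier on the height and width of the input tensor. The effect of model scaling is shown in the figure below.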
Figure 1: Model scaling illustration

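The total FLOPS of the model is the sum over all execution units, $\textup{FLOPS} = \mathcal{G}(d, w, r) = \sum\limits_{i=1...k}d\mathcal{G}_{i}(w, r) \tag{5}$ For the baseline model ( $d = w = r = 1$ ), $\hat{\textup{FLOPS}}_{baseline} = \sum\limits_{i=1...k}\hat{\mathcal{G}}_{i}(1, 1)$ .

From Equation (5), when each execution unit of the baseline model is repeated $d$ times, the FLOPS of the scaled model also grows by a factor of $d$ :
$\textup{FLOPS}_d = \sum\limits_{i=1...k}d\hat{\mathcal{G}}_{i}(1, 1) = d\sum\limits_{i=1...k}\hat{\mathcal{G}}_{i}(1, 1) = d\hat{\textup{FLOPS}}_{baseline} \tag{6}$

Convolution layers are the dominant execution units, so we approximate the model's FLOPS by the approximate FLOPS of its convolution layers: $\mathcal{G}_{conv_i}(w, r) \thickapprox r \hat X_{w_i} \times r \hat X_{h_i} \times w \hat X_{c_i} \times \hat{\mathit{Kernel}}_{w_i} \times \hat{\mathit{Kernel}}_{h_i} \times w \hat{\mathit{Kernel}}_{n_i} = (w^2 \cdot r^2) \hat{\mathcal{G}}_{conv_i}(1, 1) \quad \tag{7}$ Here $\hat X_{w_i}, \hat X_{h_i}, \hat X_{c_i}$ are the width, height, and channel count of the baseline input tensor, and $\hat{\mathit{Kernel}}_{w_i}, \hat{\mathit{Kernel}}_{h_i}, \hat{\mathit{Kernel}}_{n_i}$ are the width, height, and number of the convolution kernels.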

From Equation (7), scaling only the width (with $d = r = 1$ ) gives $\textup{FLOPS}_w = \sum\limits_{i=1...k}\mathcal{G}_{i}(w, 1) \thickapprox w^2\sum\limits_{i=1...k}\hat{\mathcal{G}}_{i}(1, 1) = w^2\hat{\textup{FLOPS}}_{baseline} \tag{8}$

Likewise, scaling only the resolution (with $d = w = 1$ ) gives $\textup{FLOPS}_r = \sum\limits_{i=1...k}\mathcal{G}_{i}(1, r) \thickapprox r^2\sum\limits_{i=1...k}\hat{\mathcal{G}}_{i}(1, 1) = r^2\hat{\textup{FLOPS}}_{baseline} \tag{9}$

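Combining the equations above, we obtain $\textup{FLOPS} = \mathcal{G}(d, w, r) = \sum\limits_{i=1...k}d\mathcal{G}_{i}(w, r) \thickapprox (d \cdot w^2 \cdot r^2) \hat{\textup{FLOPS}}_{baseline} \tag{10}$

With the relationship between FLOPS and each scaling coefficient worked out, the compound scaling method proposed in the paper becomes easy to follow.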
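
The scaling law in Equation (10) can be checked numerically. The sketch below is only an illustration, not code from the paper: `conv_flops` and `scaled_flops` are hypothetical helpers, the two layer shapes are made up, and FLOPS is counted with the same multiplication-only approximation as Equation (7). Integer scaling factors are used so that the arithmetic is exact.

```python
def conv_flops(h: int, w: int, c_in: int, k: int, c_out: int) -> int:
    """Approximate FLOPS of one k x k convolution (multiplications only)."""
    return h * w * c_in * k * k * c_out


def scaled_flops(layers, d: int, w: int, r: int) -> int:
    """FLOPS after repeating every layer d times, multiplying the channel
    counts by w and the spatial size by r."""
    total = 0
    for height, width, c_in, k, c_out in layers:
        total += d * conv_flops(r * height, r * width, w * c_in, k, w * c_out)
    return total


# Hypothetical baseline: (height, width, in_channels, kernel_size, out_channels).
baseline_layers = [(56, 56, 16, 3, 24), (28, 28, 24, 5, 40)]
base = scaled_flops(baseline_layers, d=1, w=1, r=1)

for d, w, r in [(2, 1, 1), (1, 2, 1), (1, 1, 2), (2, 2, 2)]:
    ratio = scaled_flops(baseline_layers, d, w, r) // base
    print(f"d={d}, w={w}, r={r}: measured ratio={ratio}, d*w^2*r^2={d * w**2 * r**2}")
```

Doubling the depth doubles the count, while doubling the width or the resolution quadruples it, which is why width and resolution appear squared in Equation (10).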

$\tag{11} \textup{depth}: d = \alpha^{\phi} \\ \textup{width}: w = \beta^{\phi} \\ \textup{resolution}: r = \gamma^{\phi}\\ ~ \\ s.t. \quad \alpha \cdot \beta^{2} \cdot \gamma^{2} \thickapprox 2 \\ \alpha \ge 1, \beta \ge 1, \gamma \ge 1$

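First, as Equation (11) shows, the authors use a compound coefficient $\phi$ to control scaling along depth, width, and resolution uniformly. The scaling coefficients are written as powers because $\textup{FLOPS}$ is proportional to a product of $d, w, r$ , and the power form makes the computation convenient:
$\textup{FLOPS} \thickapprox (d \cdot w^2 \cdot r^2) \hat{\textup{FLOPS}}_{baseline} = (\alpha \cdot \beta^2 \cdot \gamma^2)^\phi \hat{\textup{FLOPS}}_{baseline} \tag{12}$ For the same reason, the authors set $\alpha \cdot \beta^{2} \cdot \gamma^{2} \thickapprox 2$ , which gives $\textup{FLOPS} \thickapprox 2^\phi \hat{\textup{FLOPS}}_{baseline} \tag{13}$ $\textup{FLOPS}$ is then a function of $\phi$ alone, which greatly simplifies the relationship between FLOPS and the scaling parameters.

Second, the authors also require the scaling coefficients for depth, width, and resolution to be no smaller than 1. This requirement comes from their observations of prior work.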
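
To make Equations (11) to (13) concrete, the following sketch evaluates them for a few values of $\phi$. It uses the values $\alpha=1.2, \beta=1.1, \gamma=1.15$ that the paper reports; the function names `compound_scale` and `flops_ratio` are hypothetical and exist only for this illustration.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # values reported in the paper


def compound_scale(phi: float):
    """Return (d, w, r) from Equation (11) for a compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_ratio(phi: float) -> float:
    """Approximate FLOPS relative to the baseline, following Equation (12)."""
    d, w, r = compound_scale(phi)
    return d * w ** 2 * r ** 2


for phi in range(0, 4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: d={d:.3f}, w={w:.3f}, r={r:.3f}, "
          f"FLOPS ratio={flops_ratio(phi):.3f}, 2^phi={2 ** phi}")
```

With these values $\alpha \cdot \beta^2 \cdot \gamma^2$ is about 1.92 rather than exactly 2, so the estimated FLOPS grow slightly more slowly than $2^\phi$.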

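Increasing depth helps the model understand more complex features, increasing the number of channels helps it capture fine details, and increasing the input resolution helps it recognize finer-grained patterns. Each of these measures can raise accuracy, and the authors' experiments confirm this, as shown in Figure 2.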

Figure 2

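When the input resolution increases, the model needs more depth to understand the higher-level features in the image and more width to capture its details.
When the model width increases, the model needs more depth to understand the higher-level features of the details it captures.

In summary, all three scaling coefficients should be raised above 1 together. In later experiments the authors also find that a model scaled uniformly along all three dimensions does capture more detail and more image-pattern information than a model scaled along a single dimension, as shown in Figure 4.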

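Figure 4

Finally, the authors obtain the best scaled models in two steps:

1. Fix $\phi$ ( $\phi=1$ ), scale the baseline model according to Equations (4) and (11), and search for the best values of $\alpha, \beta, \gamma$ . The result is $\alpha=1.2, \beta=1.1, \gamma=1.15$ .
2. Fix $\alpha, \beta, \gamma$ , try different values of $\phi$ on top of the baseline model, and evaluate each model's performance and accuracy. This selection yields the EfficientNet_B1-B7 models.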
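
The search space of the first step can be sketched as follows. This is an assumption-laden illustration rather than the authors' procedure: the grid step of 0.05, the range 1.0 to 2.0, and the tolerance of 0.1 around the target value 2 are all made up here, and `candidate_coefficients` is a hypothetical helper. In a real search each candidate would also be trained and evaluated, which this sketch does not do.

```python
from itertools import product


def candidate_coefficients(step: int = 5, low: int = 100, high: int = 200,
                           target: float = 2.0, tol: float = 0.1):
    """Enumerate (alpha, beta, gamma) on a grid given in hundredths and keep
    the triples whose alpha * beta^2 * gamma^2 lies within tol of target."""
    grid = [v / 100 for v in range(low, high + 1, step)]
    return [
        (a, b, g)
        for a, b, g in product(grid, repeat=3)
        if abs(a * b ** 2 * g ** 2 - target) <= tol
    ]


candidates = candidate_coefficients()
print(len(candidates), "candidates satisfy the FLOPS constraint")
print("paper's choice is among them:", (1.2, 1.1, 1.15) in candidates)
```

The constraint alone leaves many feasible triples, so the accuracy of each scaled model is what decides between them.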

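Model Architecture

Below we use the MindSpore Vision toolkit to walk through the structure of EfficientNet. All of the modules involved are available as APIs in the toolkit. The complete code is available at: https://gitee.com/mindspore/vision/blob/master/mindvision/classification/models/efficientnet.py

MBConv Block

The basic building block of EfficientNet is MBConv. Unlike MobileNet, however, the authors add a squeeze-and-excitation module to it and use a different activation function. Its structure is shown in Figure 3.

Figure 3

The main object of scaling in the paper is the MBConv block. To make it easy to compute the parameters of a scaled MBConv block, we write the MBConvConfig class.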

from typing import Optional, Union, Callable, List, Any
import math

def make_divisible(v: float,
                   divisor: int,
                   min_value: Optional[int] = None
                   ) -> int:
    if not min_value:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class MBConvConfig:
    def __init__(
            self,
            expand_ratio: float,
            kernel_size: int,
            stride: int,
            in_chs: int,
            out_chs: int,
            num_layers: int,
            width_cnf: float,
            depth_cnf: float,
    ) -> None:
        self.expand_ratio = expand_ratio
        self.kernel_size = kernel_size
        self.stride = stride
        self.input_channels = self.adjust_channels(in_chs, width_cnf)
        self.out_channels = self.adjust_channels(out_chs, width_cnf)
        self.num_layers = self.adjust_depth(num_layers, depth_cnf)

    @staticmethod
    def adjust_channels(channels: int, width_cnf: float, min_value: Optional[int] = None) -> int:
        """Calculate the width of MBConv."""
        # Channel count = baseline channels x width coefficient, rounded to the
        # nearest multiple of 8, e.g. 32 x 1.8 = 57.6 => 56 (8 x 7).
        return make_divisible(channels * width_cnf, 8, min_value)

    @staticmethod
    def adjust_depth(num_layers: int, depth_cnf: float) -> int:
        """Calculate the depth of MBConv."""
        # Depth = baseline depth x depth coefficient, rounded up,
        # e.g. 4 x 2.6 = 10.4 => 11.
        return int(math.ceil(num_layers * depth_cnf))
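
As a usage illustration of the two rounding rules above, the sketch below scales the seven MBConv stages of the baseline with a width coefficient of 1.8 and a depth coefficient of 2.6. It repeats the rounding logic in a compact standalone form so that it runs without MindSpore; `scale_stages`, `BASE_CHANNELS`, and `BASE_LAYERS` are hypothetical names, and the baseline numbers are the MBConv output channels and layer counts listed in Table 1.

```python
import math


def make_divisible(v: float, divisor: int = 8) -> int:
    """Round v to the nearest multiple of divisor without dropping more than 10%."""
    new_v = max(divisor, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


# Output channels and layer counts of the seven baseline MBConv stages.
BASE_CHANNELS = [16, 24, 40, 80, 112, 192, 320]
BASE_LAYERS = [1, 2, 2, 3, 3, 4, 1]


def scale_stages(width_cnf: float, depth_cnf: float):
    """Apply the width and depth rounding rules to every stage."""
    channels = [make_divisible(c * width_cnf) for c in BASE_CHANNELS]
    layers = [int(math.ceil(n * depth_cnf)) for n in BASE_LAYERS]
    return channels, layers


channels, layers = scale_stages(width_cnf=1.8, depth_cnf=2.6)
print("channels:", channels)
print("layers:", layers, "total blocks:", sum(layers))
```

Rounding channels to multiples of 8 keeps the layer widths hardware-friendly, and rounding the depth up ensures that no stage ends up shallower than its scaled target.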

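ConvNormActivation Block

ConvNormActivation is the most basic module in a convolutional network. It consists of a convolution layer (Conv or Depthwise Conv), a normalization layer (BN), and an activation function. The small modules in Figure 3 that fit this pattern are Conv+BN+Swish, Depthwise Conv+BN+Swish, and Conv+BN.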

from mindspore import nn

class ConvNormActivation(nn.Cell):
    """
    Convolution/Depthwise fused with normalization and activation blocks definition.
    """
    def __init__(self,
                 in_planes: int,
                 out_planes: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int = 1,
                 norm: Optional[nn.Cell] = nn.BatchNorm2d,
                 activation: Optional[nn.Cell] = nn.ReLU
                 ) -> None:
        super(ConvNormActivation, self).__init__()
        padding = (kernel_size - 1) // 2
        # Build the convolution layer.
        layers = [
            nn.Conv2d(
                in_planes,
                out_planes,
                kernel_size,
                stride,
                pad_mode='pad',
                padding=padding,
                group=groups
            )
        ]
        # Append the normalization layer if one is given.
        if norm:
            layers.append(norm(out_planes))
        # Append the activation function if one is given.
        if activation:
            layers.append(activation())

        self.features = nn.SequentialCell(layers)

    def construct(self, x):
        output = self.features(x)
        return output
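
The choice `padding = (kernel_size - 1) // 2` in the block above keeps the spatial size unchanged at stride 1 and halves it at stride 2 for the kernel sizes used here. The sketch below checks this with the standard convolution output-size formula; `conv_output_size` is a hypothetical helper written for this illustration, not a toolkit function.

```python
def conv_output_size(size: int, kernel_size: int, stride: int) -> int:
    """Output height or width of a convolution padded by (kernel_size - 1) // 2."""
    padding = (kernel_size - 1) // 2
    return (size + 2 * padding - kernel_size) // stride + 1


for k in (1, 3, 5):
    same = conv_output_size(224, k, stride=1)
    halved = conv_output_size(224, k, stride=2)
    print(f"kernel {k}x{k}: stride 1 -> {same}, stride 2 -> {halved}")
```

This is why the resolution column of the baseline table only changes between stages whose first block uses stride 2.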

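Squeeze-and-Excitation Block

The SE module computes a weight for each feature channel through a global average pooling layer, a reducing convolution plus activation, and an expanding convolution plus activation. It then multiplies each channel of the original input by its weight, which recalibrates the original features.

from mindspore import Tensor
from mindspore.ops import operations as P

from mindvision.classification.engine.ops.swish import Swish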

class SqueezeExcite(nn.Cell):
    def __init__(self,
                 in_chs: int,
                 reduce_chs: int,
                 act_fn: Union[str, nn.Cell] = Swish,
                 gate_fn: Union[str, nn.Cell] = "sigmoid"
                 ) -> None:
        super(SqueezeExcite, self).__init__()
        self.act_fn = nn.get_activation(act_fn) if isinstance(act_fn, str) else act_fn()
        self.gate_fn = nn.get_activation(gate_fn) if isinstance(gate_fn, str) else gate_fn()
        reduce_chs = reduce_chs or in_chs
        self.conv_reduce = nn.Conv2d(in_channels=in_chs,
                                     out_channels=reduce_chs,
                                     kernel_size=1,
                                     has_bias=True,
                                     pad_mode='pad'
                                     )
        self.conv_expand = nn.Conv2d(in_channels=reduce_chs,
                                     out_channels=in_chs,
                                     kernel_size=1,
                                     has_bias=True,
                                     pad_mode='pad'
                                     )
        self.avg_global_pool = P.ReduceMean(keep_dims=True)

    def construct(self, x) -> Tensor:
        """Squeeze-excite construct."""
        x_se = self.avg_global_pool(x, (2, 3))
        x_se = self.conv_reduce(x_se)
        x_se = self.act_fn(x_se)
        x_se = self.conv_expand(x_se)
        x_se = self.gate_fn(x_se)
        x = x * x_se
        return x
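
The data flow of the SE block can also be traced without any framework. The following sketch works on a single sample stored as nested Python lists of shape [C][H][W] and treats the two 1x1 convolutions as small matrix products; all function and parameter names are hypothetical, and the all-zero weights are chosen only so that the result is easy to verify by hand.

```python
import math


def swish(v: float) -> float:
    return v / (1.0 + math.exp(-v))


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def squeeze_excite(x, w_reduce, b_reduce, w_expand, b_expand):
    """x: [C][H][W]; w_reduce: [R][C]; w_expand: [C][R]; biases are flat lists."""
    # Squeeze: global average pooling gives one value per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Reduce with a 1x1 convolution followed by Swish.
    reduced = [swish(sum(w * p for w, p in zip(ws, pooled)) + b)
               for ws, b in zip(w_reduce, b_reduce)]
    # Expand with a 1x1 convolution followed by a sigmoid gate per channel.
    gates = [sigmoid(sum(w * r for w, r in zip(ws, reduced)) + b)
             for ws, b in zip(w_expand, b_expand)]
    # Excite: rescale every channel of the input by its gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]


x = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]  # 2 channels, 2x2
out = squeeze_excite(x, w_reduce=[[0.0, 0.0]], b_reduce=[0.0],
                     w_expand=[[0.0], [0.0]], b_expand=[0.0, 0.0])
print(out)  # every gate is sigmoid(0) = 0.5, so each value is halved
```

With trained weights the gates differ per channel, which is what lets the block emphasize some channels and suppress others.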

Stochastic Depth

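The stochastic depth module randomly masks the output of MBConv. In the code below the mask has shape (N, 1, 1, 1), so the mask is drawn once per sample: each sample's output is zeroed with a probability of 20% (a Bernoulli distribution with $p = 0.8$ of being kept).

import mindspore.nn.probability.distribution as msd
from mindspore import Tensor, dtype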

class DropConnect(nn.Cell):
    def __init__(self,
                 keep_prob: float = 0.
                 ):
        super(DropConnect, self).__init__()
        self.drop_rate = keep_prob
        
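        # Bernoulli distribution that generates the random mask. The keep probability
        # is hard-coded to 0.8; keep_prob above only switches the layer on or off.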
        self.bernoulli = msd.Bernoulli(probs=0.8, dtype=dtype.int32)

    def construct(self, x: Tensor):
        if not self.training or self.drop_rate == 0.:
            return x
            
        # Draw one 0/1 value per sample (mask shape (N, 1, ..., 1)) and apply it to the input.
        return x * self.bernoulli.sample((x.shape[0],) + (1,) * (x.ndim-1))
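
The per-sample masking can be illustrated in plain Python. This is a sketch under simplifying assumptions: each sample is a flat list of values, `drop_connect` is a hypothetical helper, and the random generator is seeded only to make the run repeatable. Like the class above, it does not rescale the kept samples by 1 / keep_prob.

```python
import random


def drop_connect(batch, keep_prob: float, rng: random.Random):
    """Zero each sample's whole feature list with probability 1 - keep_prob.

    One Bernoulli draw is made per sample, mirroring a mask of shape (N, 1, 1, 1).
    """
    masks = [1 if rng.random() < keep_prob else 0 for _ in batch]
    masked = [[v * m for v in sample] for sample, m in zip(batch, masks)]
    return masked, masks


rng = random.Random(0)
batch = [[1.0, 2.0, 3.0] for _ in range(1000)]
out, masks = drop_connect(batch, keep_prob=0.8, rng=rng)
print("fraction of samples kept:", sum(masks) / len(masks))
```

Because the mask is shared by all values of a sample, a dropped sample skips the residual branch entirely and relies on the shortcut connection alone.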

Chaining the small modules above gives the MBConv block.

from mindvision.check_param import Validator, Rel

class MBConv(nn.Cell):
    def __init__(
            self,
            cnf: MBConvConfig,
            keep_prob: float,
            norm: Optional[nn.Cell] = None,
            se_layer: Callable[..., nn.Cell] = SqueezeExcite,
    ) -> None:
        super().__init__()

        Validator.check_int_range(cnf.stride, 1, 2, Rel.INC_BOTH, "stride")

        self.shortcut = cnf.stride == 1 and cnf.input_channels == cnf.out_channels

        layers: List[nn.Cell] = []
        activation = Swish

        # expand conv: the out_channels is cnf.expand_ratio times of the in_channels.
        expanded_channels = cnf.adjust_channels(cnf.input_channels, cnf.expand_ratio)
        if expanded_channels != cnf.input_channels:
            layers.append(
                ConvNormActivation(
                    cnf.input_channels,
                    expanded_channels,
                    kernel_size=1,
                    norm=norm,
                    activation=activation,
                )
            )

        # depthwise conv: splits the filter into groups.
        layers.append(
            ConvNormActivation(
                expanded_channels,
                expanded_channels,
                kernel_size=cnf.kernel_size,
                stride=cnf.stride,
                groups=expanded_channels,
                norm=norm,
                activation=activation,
            )
        )

        # squeeze and excitation
        squeeze_channels = max(1, cnf.input_channels // 4)
        layers.append(se_layer(expanded_channels, squeeze_channels, Swish, "sigmoid"))

        # project
        layers.append(
            ConvNormActivation(
                expanded_channels, cnf.out_channels, kernel_size=1, norm=norm, activation=None
            )
        )

        self.block = nn.SequentialCell(layers)
        self.dropout = DropConnect(keep_prob)
        self.out_channels = cnf.out_channels

    def construct(self, x) -> Tensor:
        """MBConv construct."""
        result = self.block(x)
        if self.shortcut:
            result = self.dropout(result)
            result += x
        return result
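
To see which branches of the constructor above are taken for a given block, the sketch below recomputes the derived quantities in plain Python. `mbconv_plan` is a hypothetical helper, and the three example blocks correspond to the first MBConv stage and the two blocks of the second MBConv stage of the baseline.

```python
def make_divisible(v: float, divisor: int = 8) -> int:
    """Round v to the nearest multiple of divisor without dropping more than 10%."""
    new_v = max(divisor, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


def mbconv_plan(in_chs: int, out_chs: int, expand_ratio: float, stride: int) -> dict:
    """Derived settings of one MBConv block, following the constructor logic."""
    expanded = make_divisible(in_chs * expand_ratio)
    return {
        "has_expand_conv": expanded != in_chs,       # skipped when expand_ratio is 1
        "expanded_channels": expanded,
        "squeeze_channels": max(1, in_chs // 4),     # SE bottleneck width
        "shortcut": stride == 1 and in_chs == out_chs,
    }


print(mbconv_plan(32, 16, expand_ratio=1, stride=1))
print(mbconv_plan(16, 24, expand_ratio=6, stride=2))
print(mbconv_plan(24, 24, expand_ratio=6, stride=1))
```

Only the last of the three blocks keeps both the resolution and the channel count, so it is the only one that uses the residual shortcut and stochastic depth.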

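Baseline Model Architecture

Built on the MBConv block, the main body of EfficientNet uses parameters that the authors found to be best through NAS, as shown in Table 1.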

Stage	Operator $\hat{\mathcal{F}}_i$	Resolution $\hat H_i \times \hat W_i$	#Channels $\hat C_i$	#Layers $\hat L_i$
1	Conv3x3	224 x 224	32	1
2	MBConv1, k3x3	112 x 112	16	1
3	MBConv6, k3x3	112 x 112	24	2
4	MBConv6, k5x5	56 x 56	40	2
5	MBConv6, k3x3	28 x 28	80	3
6	MBConv6, k5x5	14 x 14	112	3
7	MBConv6, k5x5	14 x 14	192	4
8	MBConv6, k3x3	7 x 7	320	1
9	Conv1x1 & Pooling & FC	7 x 7	1280	1

Table 1

Based on the parameters in Table 1, we build the main body of EfficientNet, as shown in the code below.

import copy
from functools import partial

import mindspore.nn as nn
from mindspore import Tensor
from mindspore.ops import operations as P

class EfficientNet(nn.Cell):

    def __init__(
            self,
            width_mult: float = 1,
            depth_mult: float = 1,
            inverted_residual_setting: Optional[List[MBConvConfig]] = None,
            keep_prob: float = 0.2,
            block: Optional[nn.Cell] = None,
            norm_layer: Optional[nn.Cell] = None,
    ) -> None:
        super(EfficientNet, self).__init__()

        if block is None:
            block = MBConv

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
            if width_mult >= 1.6:
                norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.99)

        layers: List[nn.Cell] = []

        bneck_conf = partial(MBConvConfig, width_cnf=width_mult, depth_cnf=depth_mult)

        if not inverted_residual_setting:
            inverted_residual_setting = [
                bneck_conf(1, 3, 1, 32, 16, 1),
                bneck_conf(6, 3, 2, 16, 24, 2),
                bneck_conf(6, 5, 2, 24, 40, 2),
                bneck_conf(6, 3, 2, 40, 80, 3),
                bneck_conf(6, 5, 1, 80, 112, 3),
                bneck_conf(6, 5, 2, 112, 192, 4),
                bneck_conf(6, 3, 1, 192, 320, 1),
            ]

        # building first layer
        firstconv_output_channels = inverted_residual_setting[0].input_channels
        layers.append(
            ConvNormActivation(
                3, firstconv_output_channels, kernel_size=3, stride=2, norm=norm_layer, activation=Swish
            )
        )

        # building MBConv blocks
        total_stage_blocks = sum(cnf.num_layers for cnf in inverted_residual_setting)
        stage_block_id = 0

        # cnf is the settings of block
        for cnf in inverted_residual_setting:
            stage: List[nn.Cell] = []

            # cnf.num_layers is the num of the same block
            for _ in range(cnf.num_layers):
                # copy to avoid modifications. shallow copy is enough
                block_cnf = copy.copy(cnf)

                # overwrite info if not the first conv in the stage
                if stage:
                    block_cnf.input_channels = block_cnf.out_channels
                    block_cnf.stride = 1

                # adjust dropout rate of blocks based on the depth of the stage block
                sd_prob = keep_prob * float(stage_block_id) / total_stage_blocks

                stage.append(block(block_cnf, sd_prob, norm_layer))
                stage_block_id += 1

            layers.append(nn.SequentialCell(stage))

        # building last several layers
        lastconv_input_channels = inverted_residual_setting[-1].out_channels
        lastconv_output_channels = 4 * lastconv_input_channels
        layers.append(
            ConvNormActivation(
                lastconv_input_channels,
                lastconv_output_channels,
                kernel_size=1,
                norm=norm_layer,
                activation=Swish,
            )
        )

        self.features = nn.SequentialCell(layers)
        self.avgpool = P.AdaptiveAvgPool2D(1)

    def construct(self, x) -> Tensor:
        """Efficientnet construct."""
        x = self.features(x)

        x = self.avgpool(x)
        x = P.Flatten()(x)

        return x
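
The value `sd_prob` computed in the block-building loop above grows linearly with the block index. The sketch below reproduces that schedule for the baseline layer counts; `stochastic_depth_schedule` is a hypothetical helper, and it keeps the article's parameter name `keep_prob` even though the value acts as a rate that increases with depth.

```python
def stochastic_depth_schedule(layers_per_stage, keep_prob: float = 0.2):
    """Return sd_prob for every block, in the order the blocks are built."""
    total_stage_blocks = sum(layers_per_stage)
    return [keep_prob * float(block_id) / total_stage_blocks
            for block_id in range(total_stage_blocks)]


schedule = stochastic_depth_schedule([1, 2, 2, 3, 3, 4, 1])
print(len(schedule), "blocks")
print([round(p, 4) for p in schedule])
```

The first block always receives 0, which disables the layer there, and the last of the 16 baseline blocks receives 0.2 x 15 / 16 = 0.1875.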

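EfficientNet_B0-B7 Architectures

The scaling parameters of the EfficientNet family, the stochastic depth probability of the MBConv blocks, and the final dropout rate of each model are listed in Table 2.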

Model	Image_Size	Width_Coefficient	Depth_Coefficient	Dropout_Rate	Stochastic_Depth
EfficientNet_b0	224	1.0	1.0	0.2	0.2
EfficientNet_b1	240	1.0	1.1	0.2	0.2
EfficientNet_b2	260	1.1	1.2	0.3	0.2
EfficientNet_b3	300	1.2	1.4	0.3	0.2
EfficientNet_b4	380	1.4	1.8	0.4	0.2
EfficientNet_b5	456	1.6	2.2	0.4	0.2
EfficientNet_b6	528	1.8	2.6	0.5	0.2
EfficientNet_b7	600	2.0	3.1	0.5	0.2

Table 2

From each model's parameters, the authors build the EfficientNet_B0-B7 models. This example shows only EfficientNet_B0, in the code below.

from mindvision.classification.models.head import DenseHead
from mindvision.classification.utils.model_urls import model_urls
from mindvision.utils.load_pretrained_model import LoadPretrainedModel
from mindvision.classification.models.classifiers import BaseClassifier

def _efficientnet(arch: str,
                  width_mult: float,
                  depth_mult: float,
                  dropout: float,
                  input_channel: int,
                  num_classes: int,
                  pretrained: bool,
                  **kwargs: Any,
                  ) -> EfficientNet:
    """EfficientNet architecture."""

    backbone = EfficientNet(width_mult, depth_mult, **kwargs)
    head = DenseHead(input_channel, num_classes, keep_prob=1 - dropout)
    model = BaseClassifier(backbone, head=head)

    if pretrained:
        # Download the pre-trained checkpoint file from url, and load
        # checkpoint file.
        LoadPretrainedModel(model, model_urls[arch]).run()
    return model

def efficientnet_b0(num_classes: int = 1000,
                    pretrained: bool = False,
                    ) -> EfficientNet:
    return _efficientnet("efficientnet_b0", 1.0, 1.0, 0.2, 1280, num_classes, pretrained)
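
The other variants are not shown in this article, but the arguments they would need can be inferred from Table 2 and from the backbone code. The sketch below is such an inference, not a copy of the toolkit's definitions: it assumes that the head input width follows the `4 * lastconv_input_channels` rule of the backbone, and `VARIANTS` and `head_input_channels` are hypothetical names.

```python
def make_divisible(v: float, divisor: int = 8) -> int:
    """Round v to the nearest multiple of divisor without dropping more than 10%."""
    new_v = max(divisor, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


# (width coefficient, depth coefficient, dropout rate) taken from Table 2.
VARIANTS = {
    "efficientnet_b0": (1.0, 1.0, 0.2),
    "efficientnet_b1": (1.0, 1.1, 0.2),
    "efficientnet_b2": (1.1, 1.2, 0.3),
    "efficientnet_b3": (1.2, 1.4, 0.3),
    "efficientnet_b4": (1.4, 1.8, 0.4),
    "efficientnet_b5": (1.6, 2.2, 0.4),
    "efficientnet_b6": (1.8, 2.6, 0.5),
    "efficientnet_b7": (2.0, 3.1, 0.5),
}


def head_input_channels(width_cnf: float) -> int:
    """Width of the final 1x1 convolution: 4 x the scaled width of the last stage."""
    return 4 * make_divisible(320 * width_cnf)


for name, (width, depth, dropout) in VARIANTS.items():
    print(name, width, depth, dropout, head_input_channels(width))
```

For EfficientNet_B0 this reproduces the value 1280 that is passed to `_efficientnet` above.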

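Training and Inference

This example uses the GPU build of MindSpore and runs training and validation on a single GPU.

First import the relevant modules, configure the hyperparameters, and load the dataset. All of this code is available as APIs in the Vision toolkit; for details see https://gitee.com/mindspore/vision .

The dataset can be downloaded from http://image-net.org/ .

Define the dataset path before loading, and make sure your dataset directory has the following structure.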

.ImageNet/
    ├── ILSVRC2012_devkit_t12.tar.gz
    ├── train/
    ├── val/
    └── efficientnet_infer.png

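Model Training

Before training, set up the loss function, optimizer, and callbacks with the parameters given in the paper. The MindSpore Vision toolkit provides the corresponding interfaces, as shown in the code below.

# Import the required modules.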
import mindspore.nn as nn
from mindspore import context
from mindspore.train import Model

from mindvision.engine.callback import LossMonitor
from mindvision.engine.loss import CrossEntropySmooth
from mindvision.classification.dataset import ImageNet
from mindvision.classification.models import efficientnet_b0
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig

context.set_context(mode=context.GRAPH_MODE, device_target='GPU')

dataset_path = './ImageNet/'
ckpt_save_dir = './CheckPoints/'
resize = 224  # input size of EfficientNet_B0 (Table 2)
batch_size = 16
epoch_size = 300

dataset_train = ImageNet(dataset_path,
                         split="train",
                         shuffle=True,
                         resize=resize,
                         batch_size=batch_size,
                         repeat_num=1,
                         num_parallel_workers=1).run()

step_size = dataset_train.get_dataset_size()

network = efficientnet_b0(num_classes=1000, pretrained=True)

# Cosine-decay learning rate schedule.
lr = nn.cosine_decay_lr(max_lr=0.256, min_lr=0.0,
                        total_step=epoch_size * step_size, step_per_epoch=step_size,
                        decay_epoch=epoch_size)

# Define optimizer.

network_opt = nn.RMSProp(network.trainable_params(),
                         learning_rate=lr,
                         momentum=0.9,
                         decay=0.9,
                         )

# Define loss function.
network_loss = CrossEntropySmooth(
    sparse=True, reduction="mean", smooth_factor=0.1, classes_num=1000
)

# Set the checkpoint config for the network.
ckpt_config = CheckpointConfig(
    save_checkpoint_steps=step_size, keep_checkpoint_max=10
)
ckpt_callback = ModelCheckpoint(prefix='efficientnet', directory=ckpt_save_dir, config=ckpt_config)
# Init the model.
model = Model(network, loss_fn=network_loss, optimizer=network_opt, metrics={'acc'})

# Begin to train.
model.train(epoch_size,
            dataset_train,
            callbacks=[ckpt_callback, LossMonitor(lr)],
            dataset_sink_mode=False)

Epoch:[ 1/ 300],    step:[  664/ 5004],    loss:[1.997/2.292],    time:611.926,   lr:0.00006.
Epoch:[ 1/ 300],    step:[  665/ 5004],    loss:[2.396/2.292],    time:565.492,   lr:0.00006.
Epoch:[ 1/ 300],    step:[  666/ 5004],    loss:[2.425/2.293],    time:573.950,   lr:0.00006.
Epoch:[ 1/ 300],    step:[  667/ 5004],    loss:[2.404/2.293],    time:588.116,   lr:0.00006.
Epoch:[ 1/ 300],    step:[  668/ 5004],    loss:[2.065/2.292],    time:565.874,   lr:0.00006.
...
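
The learning-rate schedule used in the training script can be sketched in plain Python. The formula below is my understanding of what `nn.cosine_decay_lr` computes (one value per step, with the cosine evaluated once per epoch) and should be checked against the MindSpore documentation. The function is a standalone re-implementation for illustration, and the value of 10 steps per epoch is made up to keep the example small.

```python
import math


def cosine_decay_lr(min_lr: float, max_lr: float, total_step: int,
                    step_per_epoch: int, decay_epoch: int):
    """Return one learning rate per training step, decayed per epoch."""
    lrs = []
    for step in range(total_step):
        epoch = min(step // step_per_epoch, decay_epoch)
        cosine = 1 + math.cos(math.pi * epoch / decay_epoch)
        lrs.append(min_lr + 0.5 * (max_lr - min_lr) * cosine)
    return lrs


lrs = cosine_decay_lr(min_lr=0.0, max_lr=0.256, total_step=300 * 10,
                      step_per_epoch=10, decay_epoch=300)
print("first epoch:", lrs[0])
print("middle epoch:", round(lrs[150 * 10], 6))
print("last epoch:", round(lrs[-1], 8))
```

The rate starts at `max_lr`, passes through the midpoint between `max_lr` and `min_lr` halfway through training, and approaches `min_lr` in the final epochs.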

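Model Validation

Validation is similar to training. The difference is that validation needs no optimizer but does need evaluation metrics.

To load the ImageNet validation set, just set the split parameter of the interface to "val", as shown in the code below.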

import mindspore.nn as nn
from mindspore import context
from mindspore.train import Model

from mindvision.engine.loss import CrossEntropySmooth
from mindvision.classification.dataset import ImageNet
from mindvision.classification.models import efficientnet_b0

context.set_context(mode=context.GRAPH_MODE, device_target='GPU')

dataset_path = './ImageNet/'
resize = 224
batch_size = 16

dataset_eval = ImageNet(dataset_path,
                        split="val",
                        num_parallel_workers=8,
                        resize=resize,
                        batch_size=batch_size).run()
                        
network = efficientnet_b0(1000, pretrained=True)
network.set_train(False)

# Define loss function.
network_loss = CrossEntropySmooth(sparse=True, reduction="mean", smooth_factor=0.1,
                                  classes_num=1000)

# Define eval metrics.
eval_metrics = {'Top_1_Accuracy': nn.Top1CategoricalAccuracy(),
                'Top_5_Accuracy': nn.Top5CategoricalAccuracy()}

# Init the model.
model = Model(network, network_loss, metrics=eval_metrics)

# Begin to eval.
result = model.eval(dataset_eval)
print(result)

{'Top_1_Accuracy': 0.7739076504481434, 'Top_5_Accuracy': 0.9343990076824584}
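
The two metrics reported above can be illustrated with a small standalone function. This is a generic sketch of top-k accuracy, not the implementation behind `nn.Top1CategoricalAccuracy` or `nn.Top5CategoricalAccuracy`; the scores and labels are made-up toy data with three classes.

```python
def top_k_accuracy(scores, labels, k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    correct = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            correct += 1
    return correct / len(labels)


scores = [
    [0.1, 0.7, 0.2],  # predicted class 1, true class 1
    [0.5, 0.3, 0.2],  # predicted class 0, true class 1 (second choice)
    [0.2, 0.3, 0.5],  # predicted class 2, true class 0 (last choice)
]
labels = [1, 1, 0]

print("top-1:", top_k_accuracy(scores, labels, k=1))
print("top-2:", top_k_accuracy(scores, labels, k=2))
```

Top-5 accuracy on ImageNet follows the same rule with k = 5, which is why it is always at least as high as Top-1.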

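The figure below compares the Top-1 accuracy of EfficientNet_B0-B7 in the MindSpore Vision toolkit with the TensorFlow results, and also shows the Top-5 accuracy of EfficientNet_B0-B7:
Figure: accuracy comparison

Model Inference

Inference is straightforward: read the image to be classified through the ImageNet dataset interface, load the pretrained network, and run the image through the Model.predict method, as shown in the code below.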

import numpy as np
import mindspore.nn as nn
from mindspore import context, Tensor
from mindspore.train import Model

from mindvision.dataset.download import read_dataset
from mindvision.classification.dataset import ImageNet
from mindvision.classification.models import efficientnet_b0
from mindvision.classification.utils.image import show_result

context.set_context(mode=context.GRAPH_MODE, device_target="GPU")

data_path = './ImageNet/efficientnet_infer.png'
resize = 224
batch_size = 16

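# Data pipeline. Keep the ImageNet wrapper in `dataset`, because the loop
# below reads its index2label mapping.
dataset = ImageNet(data_path,
                   split="infer",
                   num_parallel_workers=8,
                   resize=resize,
                   batch_size=batch_size)
dataset_infer = dataset.run()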

network = efficientnet_b0(1000, pretrained=True)

network.set_train(False)

# Init the model.
model = Model(network)

# Begin to infer
image_list, _ = read_dataset(data_path)
for data in dataset_infer.create_dict_iterator(output_numpy=True):
    image = data["image"]
    image = Tensor(image)
    prob = model.predict(image)
    label = np.argmax(prob.asnumpy(), axis=1)
    for i, v in enumerate(label):
        predict = dataset.index2label[v]
        output = {v: predict}
        print(output)
        show_result(img=image_list[i], result=output, out_file=image_list[i])

{282: 'tiger cat'}

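The image after inference is shown below:
Figure: inference result

Summary

This example explained and derived the model scaling algorithm proposed in the EfficientNet paper in detail, covering its core questions such as the optimization objective and the correlation between the scaling coefficients. Using the MindSpore Vision toolkit, it also walked through the main modules and overall structure of EfficientNet, and completed training, validation, and inference of the EfficientNet_B0 model on ImageNet. For the complete source code, see the MindSpore Vision toolkit.

References

[1] Tan M, Le Q. EfficientNet: Rethinking model scaling for convolutional neural networks[C]//International Conference on Machine Learning. PMLR, 2019: 6105-6114.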