EfficientNetV2 Image Classification with PyTorch

Super.Bear

Last modified: 2023-10-11 19:40:31

First published: 2023-10-09 22:41:56

Article link: https://blog.csdn.net/qq_53144843/article/details/133500225

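Contents

Preface
I. EfficientNet V2
II. Network Implementation
- 1. Building the EfficientNetV2 network
- 2. Training and testing the model
III. Image Classification
Closing remarks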

💂 Author homepage: 风间琉璃
🤟 Copyright: original article by 【风间琉璃】, first published on CSDN; please contact the author before reposting

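Preface

In April 2021, Google proposed an improved version of EfficientNet in the paper "EfficientNetV2: Smaller Models and Faster Training". As the title suggests, compared with V1 the V2 models have fewer parameters and train faster.

I. EfficientNet V2

The main contributions of EfficientNet V2 are:
$\star$ A new, more efficient family of networks, EfficientNetV2, built by introducing the Fused-MBConv block.

$\star$ An improved progressive learning method that adjusts the regularization strength dynamically according to the training image size, which improves both training speed and accuracy.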

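1. Network overview

Building on EfficientNet V1, the authors add Fused-MBConv to the search space and introduce adaptive regularization strength for progressive learning. Together, these two changes let EfficientNetV2 reach SOTA results on several benchmark datasets while training faster. For example, EfficientNetV2 reaches 87.3% top-1 accuracy while training 5x to 11x faster.

[Figure: EfficientNetV2 compared with other SOTA models in training speed, parameter count, and accuracy]
The figure shows that EfficientNetV2 not only reaches state-of-the-art (SOTA) accuracy but also trains faster with fewer parameters. EfficientNetV2-XL (21k) reaches 87.3% top-1 accuracy on ImageNet ILSVRC2012. In EfficientNetV1 the authors focused on accuracy, parameter count, and FLOPs (a low theoretical computation cost does not guarantee fast inference); in EfficientNetV2 they additionally focus on training speed.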

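2. Drawbacks of EfficientNetV1

To improve EfficientNet, the first step is to analyze its problems. The paper identifies three training bottlenecks:

🥇 Training is very slow when the training images are large

EfficientNet uses a lot of memory with large input images. Since the total GPU/TPU memory is fixed, the batch size has to be reduced, which slows training considerably.
[Table: training throughput and accuracy for different image sizes and batch sizes on a Tesla V100]
According to the table, on a Tesla V100 the network can be trained with batch_size=24 at an image size of 380x380, but runs out of memory (OOM) with batch_size=24 at 512x512. Moreover, increasing img_size does not improve accuracy. A natural remedy is therefore to reduce the training image size moderately.

Training with smaller images requires less computation and allows a larger batch size, which speeds up training (by up to 2.2x); at the same time, it also gives slightly higher accuracy.

The paper goes further and proposes a more advanced training technique, progressive learning, which speeds up training by progressively adjusting the image size and the regularization strength.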

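🥈 Depthwise convolutions are slow in the shallow layers of the network

EfficientNet makes heavy use of depthwise convolutions. Compared with regular convolutions they have fewer parameters and FLOPs, but they often cannot make full use of modern accelerators (GPU/TPU). The theoretical computation cost is low, yet in practice they are not as fast as expected.

Google proposed the Fused-MBConv block to make better use of mobile and server accelerators. It replaces the expansion conv1x1 and the depthwise conv3x3 in MBConv with a single regular conv3x3, as shown in the figure below.
[Figure: structure of MBConv and Fused-MBConv]
Tests on Edge TPU showed that although MBConv has fewer parameters and less computation, Fused-MBConv runs faster because regular convolutions make better use of the TPU.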
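
To make the parameter trade-off concrete, here is a minimal sketch that counts only the convolution weights of the two blocks (BN and SE parameters are ignored). The function names and the 24 -> 48 channel example are illustrative assumptions, not taken from the paper.

```python
def mbconv_weights(in_c, out_c, expand_ratio, k=3):
    """Conv weights in MBConv: 1x1 expand + kxk depthwise + 1x1 project."""
    mid_c = in_c * expand_ratio
    expand = in_c * mid_c          # 1x1 point-wise expansion
    depthwise = mid_c * k * k      # one kxk filter per expanded channel
    project = mid_c * out_c        # 1x1 point-wise projection
    return expand + depthwise + project


def fused_mbconv_weights(in_c, out_c, expand_ratio, k=3):
    """Conv weights in Fused-MBConv: regular kxk expand conv + 1x1 project."""
    mid_c = in_c * expand_ratio
    expand = in_c * mid_c * k * k  # one regular kxk conv replaces expand + depthwise
    project = mid_c * out_c
    return expand + project


# Hypothetical block: 24 -> 48 channels, expansion ratio 4, 3x3 kernel
print(mbconv_weights(24, 48, 4))        # 7776
print(fused_mbconv_weights(24, 48, 4))  # 25344
```

The fused block has more weights here, which matches the observation above: its speed advantage comes from better accelerator utilization, not from a smaller theoretical cost.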

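The authors ran experiments on EfficientNet-B4 and found that replacing the MBConv blocks in the shallow stages with Fused-MBConv noticeably speeds up training, as shown in the table below.
[Table: EfficientNet-B4 with MBConv gradually replaced by Fused-MBConv]
Here the MBConv blocks of EfficientNet-B4 are gradually replaced with Fused-MBConv. Replacing stages 1 to 3 speeds up training at the cost of a small increase in parameters and FLOPs; replacing every stage increases parameters and FLOPs substantially and actually slows training down. The best results therefore come from a suitable mix of MBConv and Fused-MBConv, so the authors use NAS to search for the best combination.

🥉 Scaling every stage equally is sub-optimal

EfficientNetV1 applies a compound scaling rule that scales the depth and width of every stage equally. For example, with a depth coefficient of 2, the number of layers in every stage is doubled.

However, the stages do not contribute equally to training speed and parameter count, so scaling all of them equally gives sub-optimal results. In addition, because large images lead to heavy computation and slow training in EfficientNet, the authors slightly modify the scaling rule, cap the maximum image size, and scale the model with a non-uniform strategy.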
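
As a small illustration of the uniform depth scaling described above, the sketch below multiplies every stage's layer count by the same coefficient and rounds up. The baseline layer counts are an illustrative assumption, and the function name is hypothetical; the non-uniform rule used for V2 is not reproduced here because the text does not specify it.

```python
import math


def scale_depth_uniform(layers_per_stage, depth_coefficient):
    """Scale the number of layers in every stage by the same coefficient."""
    return [math.ceil(n * depth_coefficient) for n in layers_per_stage]


# Illustrative baseline layer counts per stage (an assumption for this example)
base_layers = [1, 2, 2, 3, 3, 4, 1]
print(scale_depth_uniform(base_layers, 2))  # [2, 4, 4, 6, 6, 8, 2]
```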

3. NAS Search

Based on the three problems above, EfficientNetV2 is designed with training-aware NAS to improve training speed. The search jointly optimizes accuracy, parameter efficiency, and training efficiency, and the candidate convolution blocks are MBConv and Fused-MBConv.

EfficientNet is used as the backbone, and the design space consists of:
$\star$ convolutional operation type: {MBConv, Fused-MBConv}
$\star$ number of layers
$\star$ kernel size: {3x3, 5x5}
$\star$ expansion ratio (of the first expand conv1x1 in MBConv or the first expand conv3x3 in Fused-MBConv): {1, 4, 6}
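
The discrete part of this design space can be enumerated directly. The sketch below is only an illustration: it lists the per-stage combinations of operation type, kernel size, and expansion ratio, and leaves out the number of layers because its range is not given in the text. All names are hypothetical.

```python
from itertools import product

OPS = ("MBConv", "Fused-MBConv")
KERNELS = (3, 5)
EXPANSIONS = (1, 4, 6)


def per_stage_choices():
    """All (operation, kernel size, expansion ratio) combinations for one stage."""
    return list(product(OPS, KERNELS, EXPANSIONS))


choices = per_stage_choices()
print(len(choices))  # 12
print(choices[0])    # ('MBConv', 3, 1)
```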

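The EfficientNetV2-S model found by the search is shown in the table below.
[Table: EfficientNetV2-S architecture]
The EfficientNet-B0 architecture is shown in the table below.
[Table: EfficientNet-B0 architecture]

Compared with V1, V2 uses Fused-MBConv in stages 1 to 3. The MBConv blocks in the early stages use a smaller expansion ratio, whereas almost every stage of V1 uses an expansion ratio of 6. V1 uses 5x5 kernels in some stages, while V2 uses only 3x3 kernels and adds more layers to compensate for the smaller receptive field. V2 also drops the last stride-1 stage of V1. These differences give EfficientNetV2 fewer parameters and lower memory consumption.

Scaling up EfficientNetV2-S yields two larger models, EfficientNetV2-M and EfficientNetV2-L. Compared with V1, the V2 scaling rule adds two constraints: the maximum image size is capped at 480, and more layers are added to the later stages. Both larger models use an input size of 480.

The figure below shows the top-1 accuracy and training step time of the EfficientNetV2 models on ImageNet. Training here uses a fixed image size that is about 30% smaller than the inference size. EffNet (reprod) in the figure uses the same training strategy and is clearly better than the baseline in both training speed and accuracy, and EfficientNetV2 improves further on both.
[Figure: ImageNet top-1 accuracy versus training step time]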

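4. Progressive Learning

Besides optimizing the model design, the paper proposes a progressive learning strategy to further speed up EfficientNetV2 training: the image size is increased progressively during training, and stronger regularization is applied as the images grow. The regularization used in training includes data augmentation and dropout.

In V1 the authors studied how training image size, network depth, and network width affect accuracy. The training image size has a large effect on training efficiency. Many earlier works tried dynamic image sizes, starting with small images and enlarging them later to speed up training, but this usually lowered accuracy.

The authors hypothesize that the accuracy drop comes from unbalanced regularization: when training with different image sizes, the regularization strength should change accordingly, whereas earlier work kept it fixed.

To test this hypothesis, the authors sampled models from the search space described above and trained them with different image sizes and different data augmentation strengths. With small training images, weaker augmentation gives better results; with large training images, stronger augmentation gives better results.

Using different regularization for different input sizes is intuitive. Early in training, the network is trained with smaller images and weak regularization so that it can learn simple representations easily and quickly. The image size is then increased gradually, and stronger regularization makes learning progressively harder.

As shown in the table below, RandAug magnitude=5 works best at Size=128, and RandAug magnitude=15 works best at Size=300:
[Table: ImageNet top-1 accuracy for different image sizes and RandAug magnitudes]
The authors divide training into 4 stages of about 87 epochs each. Early stages use small images with weak regularization, and later stages use larger images with stronger regularization. With each new stage, the image size and augmentation strength increase linearly.
[Figure: progressive learning, with image size and regularization strength increasing during training]
As the figure shows, early training uses a small image size and weak regularization so that the network quickly learns simple representations. The image size is then increased step by step while stronger regularization is added.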

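[Table: progressive training settings for each EfficientNetV2 model]
The table lists, for each model, the range of input image sizes and regularization strengths used during training (the minimum and maximum of each). The maximum training image size is about 20% smaller than the inference size: 380 vs. 480. The authors study three types of regularization: Dropout, RandAugment, and Mixup.

The authors formalize progressive learning as a procedure that sets the image size and regularization strength for each training stage, using linear interpolation between the initial and final values:
[Figure: progressive learning procedure]
Suppose training has $N$ steps in total, the target (final) image size is $S_e$, and the final regularization strengths are $\phi_e = \{\phi_e^k\}$, where $k$ indexes the type of regularization (Dropout, RandAugment, or Mixup). The initial image size is $S_0$ and the initial regularization strengths are $\phi_0 = \{\phi_0^k\}$.

Training is divided into $M$ stages. In stage $i$ ($1 \leq i \leq M$), the model is trained with image size $S_i$ and regularization strengths $\phi_i = \{\phi_i^k\}$, both of which increase from stage to stage by linear interpolation.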
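
The linear interpolation can be sketched in a few lines. This is a minimal illustration, not the authors' code: stages are indexed from 0 to M-1 so that the first stage uses the initial values and the last stage uses the final values, and the numbers in the example (image size 128 to 300, RandAugment magnitude 5 to 15, dropout 0.1 to 0.3) are assumed values chosen for illustration.

```python
def progressive_schedule(s0, se, phi0, phie, num_stages):
    """Linearly interpolate image size and regularization strengths per stage.

    s0, se: initial and final image size.
    phi0, phie: dicts mapping a regularization name to its initial/final strength.
    Returns a list of (image_size, strengths) tuples, one per stage.
    """
    schedule = []
    for i in range(num_stages):
        # Fraction of the way from the initial to the final setting
        t = i / (num_stages - 1) if num_stages > 1 else 1.0
        size = round(s0 + (se - s0) * t)
        phi = {k: phi0[k] + (phie[k] - phi0[k]) * t for k in phi0}
        schedule.append((size, phi))
    return schedule


# Assumed example settings, for illustration only
stages = progressive_schedule(
    s0=128, se=300,
    phi0={"randaug": 5, "dropout": 0.1},
    phie={"randaug": 15, "dropout": 0.3},
    num_stages=4,
)
for size, phi in stages:
    print(size, {k: round(v, 3) for k, v in phi.items()})
```

With these inputs the image sizes for the four stages are 128, 185, 243, and 300.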

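5. EfficientNetV2 network architecture

The table below shows the architecture of EfficientNetV2-S.
[Table: EfficientNetV2-S architecture]
EfficientNetV2-S is divided into Stage 0 to Stage 7 (EfficientNetV1 has Stage 1 to Stage 9). Operator is the block used in each stage:

$\star$ Conv3x3: a regular 3x3 convolution followed by BN and a SiLU activation

$\star$ Fused-MBConv: the 1 or 4 after the block name is the expansion ratio, and k3x3 means the kernel_size is 3x3, as shown in the figure below.
[Figure: structure of the Fused-MBConv block]
When the expansion ratio is 1, the block has no expand conv. There is also no SE here (the figure in the original paper includes SE).

A shortcut connection exists only when stride=1 and the input and output channels are equal, and the Dropout layer exists only when there is a shortcut. This Dropout layer is Stochastic Depth: it randomly drops the entire main branch of a block and keeps only the shortcut branch, which reduces the effective depth of the network.

$\star$ MBConv: the same as in EfficientNetV1, as shown in the figure below.
[Figure: structure of the MBConv block]
The 4 or 6 after the block name is the expansion ratio.

$\star$ SE0.25 means an SE module is used, and 0.25 means the first fully connected layer of the SE module has 1/4 as many nodes as the input feature map of the MBConv block has channels.

$\star$ stride is the stride of the first block in each stage; the remaining blocks use stride=1. Note that a shortcut connection exists only when stride=1 and the input and output channels are equal.

$\star$ channels is the number of channels of the feature map output by the stage.

$\star$ Layers is the number of times the Operator is repeated in each stage.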
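
The Stochastic Depth behaviour described for the shortcut branch can be illustrated without any framework. The sketch below handles a single sample represented as plain Python lists; the function name and arguments are hypothetical, and the rescaling by 1/keep_prob follows the usual convention of keeping the expected value of the main branch unchanged.

```python
import random


def drop_path_sample(main_out, shortcut, drop_prob, training, rng=random):
    """Combine one sample's main-branch output with its shortcut input."""
    if drop_prob == 0.0 or not training:
        # Inference, or no drop: plain residual addition
        return [m + s for m, s in zip(main_out, shortcut)]
    keep_prob = 1.0 - drop_prob
    if rng.random() < keep_prob:
        # Keep the main branch, rescaled so its expected value is unchanged
        return [m / keep_prob + s for m, s in zip(main_out, shortcut)]
    # Drop the whole main branch: only the shortcut remains
    return list(shortcut)


main_branch, shortcut = [1.0, 2.0], [10.0, 20.0]
print(drop_path_sample(main_branch, shortcut, 0.5, training=False))  # [11.0, 22.0]
```

During training with drop_prob=0.5, each call returns either the shortcut alone or the shortcut plus the rescaled main branch.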

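Architectural differences between EfficientNetV1 and EfficientNetV2:

① Besides MBConv, EfficientNetV2 also uses Fused-MBConv, mainly in the shallow layers.

② EfficientNetV2 uses smaller expansion ratios, such as 4, whereas EfficientNetV1 mostly uses 6. This reduces memory access overhead.

③ EfficientNetV2 prefers the smaller 3x3 kernel_size, whereas EfficientNetV1 uses many 5x5 kernels. In the table above every kernel_size is 3x3. Because a 3x3 kernel has a smaller receptive field than a 5x5 kernel, more layers have to be stacked to enlarge the receptive field.

④ The last stride-1 stage of EfficientNetV1 is removed, probably because it has too many parameters and too much memory access overhead.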
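
Point ③ can be checked with a short calculation. Assuming stride-1 convolutions without dilation, each kxk layer adds k-1 to the receptive field, so two stacked 3x3 layers cover the same 5x5 region as one 5x5 layer. The helper below is a hypothetical illustration.

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1, undilated convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


print(receptive_field([5]))     # 5
print(receptive_field([3]))     # 3
print(receptive_field([3, 3]))  # 5
```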

II. Network Implementation

1. Building the EfficientNetV2 network

from collections import OrderedDict
from functools import partial
from typing import Callable, Optional

import torch.nn as nn
import torch
from torch import Tensor


# Randomly drop parts of the network during training to reduce overfitting
def drop_path(x, drop_prob: float = 0., training: bool = False):
    """
    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    "Deep Networks with Stochastic Depth", https://arxiv.org/pdf/1603.09382.pdf

    This function is taken from the rwightman.
    It can be seen here:
    https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/layers/drop.py#L140
    """
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize
    output = x.div(keep_prob) * random_tensor
    return output


class DropPath(nn.Module):
    """
    Drop paths (Stochastic Depth) per sample  (when applied in main path of residual blocks).
    "Deep Networks with Stochastic Depth", https://arxiv.org/pdf/1603.09382.pdf
    """
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)


# Pipeline: Conv --> BN --> Activation
class ConvBNAct(nn.Module):
    def __init__(self,
                 in_planes: int,
                 out_planes: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int = 1,
                 norm_layer: Optional[Callable[..., nn.Module]] = None,
                 activation_layer: Optional[Callable[..., nn.Module]] = None):
        super(ConvBNAct, self).__init__()

        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.SiLU  # alias Swish  (torch>=1.7)

        self.conv = nn.Conv2d(in_channels=in_planes,
                              out_channels=out_planes,
                              kernel_size=kernel_size,
                              stride=stride,
                              padding=padding,
                              groups=groups,
                              bias=False)

        self.bn = norm_layer(out_planes)
        self.act = activation_layer()

    def forward(self, x):
        result = self.conv(x)
        result = self.bn(result)
        result = self.act(result)

        return result

# Squeeze-and-Excitation (SE) module
class SqueezeExcite(nn.Module):
    def __init__(self,
                 input_c: int,   # block input channel
                 expand_c: int,  # block expand channel
                 se_ratio: float = 0.25):
        super(SqueezeExcite, self).__init__()
        squeeze_c = int(input_c * se_ratio)
        self.conv_reduce = nn.Conv2d(expand_c, squeeze_c, 1)
        self.act1 = nn.SiLU()  # alias Swish
        self.conv_expand = nn.Conv2d(squeeze_c, expand_c, 1)
        self.act2 = nn.Sigmoid()

    def forward(self, x: Tensor) -> Tensor:
        scale = x.mean((2, 3), keepdim=True)  # mean over height and width
        scale = self.conv_reduce(scale)
        scale = self.act1(scale)
        scale = self.conv_expand(scale)
        scale = self.act2(scale)
        return scale * x

# MBConv block
class MBConv(nn.Module):
    def __init__(self,
                 kernel_size: int,
                 input_c: int,
                 out_c: int,
                 expand_ratio: int,
                 stride: int,
                 se_ratio: float,
                 drop_rate: float,
                 norm_layer: Callable[..., nn.Module]):
        super(MBConv, self).__init__()

        if stride not in [1, 2]:
            raise ValueError("illegal stride value.")

        # A shortcut exists only when stride is 1 and input/output channels match
        self.has_shortcut = (stride == 1 and input_c == out_c)

        activation_layer = nn.SiLU  # alias Swish
        expanded_c = input_c * expand_ratio

        # In EfficientNetV2, MBConv never uses expansion=1, so the point-wise expansion conv always exists
        assert expand_ratio != 1
        # Point-wise expansion
        self.expand_conv = ConvBNAct(input_c,
                                     expanded_c,
                                     kernel_size=1,
                                     norm_layer=norm_layer,
                                     activation_layer=activation_layer)

        # Depth-wise convolution
        self.dwconv = ConvBNAct(expanded_c,
                                expanded_c,
                                kernel_size=kernel_size,
                                stride=stride,
                                groups=expanded_c,
                                norm_layer=norm_layer,
                                activation_layer=activation_layer)

        self.se = SqueezeExcite(input_c, expanded_c, se_ratio) if se_ratio > 0 else nn.Identity()

        # Point-wise linear projection
        self.project_conv = ConvBNAct(expanded_c,
                                      out_planes=out_c,
                                      kernel_size=1,
                                      norm_layer=norm_layer,
                                      activation_layer=nn.Identity)  # no activation here, so pass Identity

        self.out_channels = out_c

        # The dropout (DropPath) layer is used only when there is a shortcut connection
        self.drop_rate = drop_rate
        if self.has_shortcut and drop_rate > 0:
            self.dropout = DropPath(drop_rate)

    def forward(self, x: Tensor) -> Tensor:
        result = self.expand_conv(x)
        result = self.dwconv(result)
        result = self.se(result)
        result = self.project_conv(result)

        # Add the shortcut branch if present
        if self.has_shortcut:
            if self.drop_rate > 0:
                result = self.dropout(result)
            result += x

        return result


# FusedMBConv block
class FusedMBConv(nn.Module):
    def __init__(self,
                 kernel_size: int,
                 input_c: int,
                 out_c: int,
                 expand_ratio: int,
                 stride: int,
                 se_ratio: float,
                 drop_rate: float,
                 norm_layer: Callable[..., nn.Module]):
        super(FusedMBConv, self).__init__()

        assert stride in [1, 2]
        assert se_ratio == 0

        self.has_shortcut = stride == 1 and input_c == out_c
        self.drop_rate = drop_rate

        self.has_expansion = expand_ratio != 1

        activation_layer = nn.SiLU  # alias Swish
        expanded_c = input_c * expand_ratio

        # The expand conv exists only when the expand ratio is not 1
        if self.has_expansion:
            # Expansion convolution
            self.expand_conv = ConvBNAct(input_c,
                                         expanded_c,
                                         kernel_size=kernel_size,
                                         stride=stride,
                                         norm_layer=norm_layer,
                                         activation_layer=activation_layer)

            self.project_conv = ConvBNAct(expanded_c,
                                          out_c,
                                          kernel_size=1,
                                          norm_layer=norm_layer,
                                          activation_layer=nn.Identity)  # no activation here
        else:
            # When the expand ratio is 1, there is only project_conv
            self.project_conv = ConvBNAct(input_c,
                                          out_c,
                                          kernel_size=kernel_size,
                                          stride=stride,
                                          norm_layer=norm_layer,
                                          activation_layer=activation_layer)  # note: this one has an activation

        self.out_channels = out_c

        # The dropout (DropPath) layer is used only when there is a shortcut connection
        self.drop_rate = drop_rate
        if self.has_shortcut and drop_rate > 0:
            self.dropout = DropPath(drop_rate)

    def forward(self, x: Tensor) -> Tensor:
        if self.has_expansion:  # expand ratio != 1
            result = self.expand_conv(x)
            result = self.project_conv(result)
        else:    # expand ratio == 1
            result = self.project_conv(x)

        if self.has_shortcut:
            if self.drop_rate > 0:
                result = self.dropout(result)

            result += x

        return result


class EfficientNetV2(nn.Module):
    def __init__(self,
                 model_cnf: list,
                 num_classes: int = 1000,
                 num_features: int = 1280,
                 dropout_rate: float = 0.2,
                 drop_connect_rate: float = 0.2):
        super(EfficientNetV2, self).__init__()

        for cnf in model_cnf:
            assert len(cnf) == 8

        # Bind default arguments for the BN layers
        norm_layer = partial(nn.BatchNorm2d, eps=1e-3, momentum=0.1)

        stem_filter_num = model_cnf[0][4]

        # First conv layer; ConvBNAct uses SiLU by default, so no activation is passed
        self.stem = ConvBNAct(3,
                              stem_filter_num,
                              kernel_size=3,
                              stride=2,
                              norm_layer=norm_layer)  # activation defaults to SiLU

        total_blocks = sum([i[0] for i in model_cnf])
        block_id = 0
        blocks = []
        for cnf in model_cnf:
            repeats = cnf[0]
            # operator flag: 1 = MBConv, 0 = FusedMBConv
            op = FusedMBConv if cnf[-2] == 0 else MBConv
            for i in range(repeats):
                blocks.append(op(kernel_size=cnf[1],
                                 input_c=cnf[4] if i == 0 else cnf[5],
                                 out_c=cnf[5],
                                 expand_ratio=cnf[3],
                                 stride=cnf[2] if i == 0 else 1,
                                 se_ratio=cnf[-1],
                                 drop_rate=drop_connect_rate * block_id / total_blocks,  # grows linearly from 0 towards drop_connect_rate
                                 norm_layer=norm_layer))
                block_id += 1
        self.blocks = nn.Sequential(*blocks)

        head_input_c = model_cnf[-1][-3]
        head = OrderedDict()

        head.update({"project_conv": ConvBNAct(head_input_c,
                                               num_features,
                                               kernel_size=1,
                                               norm_layer=norm_layer)})  # activation defaults to SiLU

        head.update({"avgpool": nn.AdaptiveAvgPool2d(1)})
        head.update({"flatten": nn.Flatten()})

        if dropout_rate > 0:
            head.update({"dropout": nn.Dropout(p=dropout_rate, inplace=True)})
        head.update({"classifier": nn.Linear(num_features, num_classes)})

        self.head = nn.Sequential(head)

        # initial weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x: Tensor) -> Tensor:
        x = self.stem(x)
        x = self.blocks(x)
        x = self.head(x)

        return x


def efficientnetv2_s(num_classes: int = 1000):
    """
    EfficientNetV2
    https://arxiv.org/abs/2104.00298
    """
    # train_size: 300, eval_size: 384

    # repeat, kernel, stride, expansion, in_c, out_c, operator, se_ratio
    model_config = [[2, 3, 1, 1, 24, 24, 0, 0],  # operator column (second to last): 0 = Fused-MBConv, 1 = MBConv
                    [4, 3, 2, 4, 24, 48, 0, 0],
                    [4, 3, 2, 4, 48, 64, 0, 0],
                    [6, 3, 2, 4, 64, 128, 1, 0.25],
                    [9, 3, 1, 6, 128, 160, 1, 0.25],
                    [15, 3, 2, 6, 160, 256, 1, 0.25]]

    model = EfficientNetV2(model_cnf=model_config,
                           num_classes=num_classes,
                           dropout_rate=0.2)
    return model


def efficientnetv2_m(num_classes: int = 1000):
    """
    EfficientNetV2
    https://arxiv.org/abs/2104.00298
    """
    # train_size: 384, eval_size: 480

    # repeat, kernel, stride, expansion, in_c, out_c, operator, se_ratio
    model_config = [[3, 3, 1, 1, 24, 24, 0, 0],
                    [5, 3, 2, 4, 24, 48, 0, 0],
                    [5, 3, 2, 4, 48, 80, 0, 0],
                    [7, 3, 2, 4, 80, 160, 1, 0.25],
                    [14, 3, 1, 6, 160, 176, 1, 0.25],
                    [18, 3, 2, 6, 176, 304, 1, 0.25],
                    [5, 3, 1, 6, 304, 512, 1, 0.25]]

    model = EfficientNetV2(model_cnf=model_config,
                           num_classes=num_classes,
                           dropout_rate=0.3)
    return model


def efficientnetv2_l(num_classes: int = 1000):
    """
    EfficientNetV2
    https://arxiv.org/abs/2104.00298
    """
    # train_size: 384, eval_size: 480

    # repeat, kernel, stride, expansion, in_c, out_c, operator, se_ratio
    model_config = [[4, 3, 1, 1, 32, 32, 0, 0],
                    [7, 3, 2, 4, 32, 64, 0, 0],
                    [7, 3, 2, 4, 64, 96, 0, 0],
                    [10, 3, 2, 4, 96, 192, 1, 0.25],
                    [19, 3, 1, 6, 192, 224, 1, 0.25],
                    [25, 3, 2, 6, 224, 384, 1, 0.25],
                    [7, 3, 1, 6, 384, 640, 1, 0.25]]

    model = EfficientNetV2(model_cnf=model_config,
                           num_classes=num_classes,
                           dropout_rate=0.4)
    return model
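
The per-block drop rate passed to each block above grows linearly with the block index. The sketch below reproduces that arithmetic in isolation for the repeat counts of the efficientnetv2_s config; the helper name is hypothetical.

```python
def block_drop_rates(repeats_per_stage, drop_connect_rate=0.2):
    """Drop rate of every block, using drop_connect_rate * block_id / total_blocks."""
    total_blocks = sum(repeats_per_stage)
    return [drop_connect_rate * block_id / total_blocks
            for block_id in range(total_blocks)]


# Repeat counts taken from the efficientnetv2_s model_config above
rates = block_drop_rates([2, 4, 4, 6, 9, 15])
print(len(rates))           # 40
print(rates[0])             # 0.0
print(round(rates[-1], 3))  # 0.195
```

Note that the last block gets slightly less than drop_connect_rate because block_id stops at total_blocks - 1.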

2. Training and testing the model


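import argparse
import math
import os

import torch
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

from model import efficientnetv2_s as create_model   # import efficientnetv2_s under the name create_model
from my_dataset import MyDataSet
from utils import read_split_data, train_one_epoch, evaluate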

def main(args):
    # Use the device given by --device if CUDA is available, otherwise fall back to the CPU
    device = torch.device(args.device if torch.cuda.is_available() else "cpu")

    print(args)
    print('Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/')
    # Log training metrics for visualization in TensorBoard
    tb_writer = SummaryWriter()
    # Create a directory for saving model weights
    if os.path.exists("./weights") is False:
        os.makedirs("./weights")

    # Get the image paths and labels of the training and validation sets
    train_images_path, train_images_label, val_images_path, val_images_label = read_split_data(args.data_path)

    # Each model variant uses a different input image size
    img_size = {"s": [300, 384],  # train_size, val_size
                "m": [384, 480],
                "l": [384, 480]}

    num_model = "s"

    # Data preprocessing / augmentation
    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(img_size[num_model][0]),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]),
        "val": transforms.Compose([transforms.Resize(img_size[num_model][1]),
                                   transforms.CenterCrop(img_size[num_model][1]),
                                   transforms.ToTensor(),
                                   transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])}

    # Instantiate the training dataset
    train_dataset = MyDataSet(images_path=train_images_path,
                              images_class=train_images_label,
                              transform=data_transform["train"])

    # Instantiate the validation dataset
    val_dataset = MyDataSet(images_path=val_images_path,
                            images_class=val_images_label,
                            transform=data_transform["val"])

    batch_size = args.batch_size
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))

    # Build the data loaders: batch size, shuffling, number of worker processes (num_workers),
    # and the function used to merge samples into a batch (collate_fn)
    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size,
                                               shuffle=True,
                                               pin_memory=True,
                                               num_workers=nw,
                                               collate_fn=train_dataset.collate_fn)

    val_loader = torch.utils.data.DataLoader(val_dataset,
                                             batch_size=batch_size,
                                             shuffle=False,
                                             pin_memory=True,
                                             num_workers=nw,
                                             collate_fn=val_dataset.collate_fn)

    # Create the model and load pre-trained weights if a path is given
    model = create_model(num_classes=args.num_classes).to(device)
    if args.weights != "":
        if os.path.exists(args.weights):
            # Load the weights file
            weights_dict = torch.load(args.weights, map_location=device)
            # Keep only the pre-trained entries whose name exists in the current model
            # and whose tensor has the same number of elements
            model_state = model.state_dict()
            load_weights_dict = {k: v for k, v in weights_dict.items()
                                 if k in model_state and model_state[k].numel() == v.numel()}
            # strict=False allows a partial load; keys missing from load_weights_dict keep their initial values
            print(model.load_state_dict(load_weights_dict, strict=False))
        else:
            raise FileNotFoundError("not found weights file: {}".format(args.weights))

        # Optionally freeze weights
        if args.freeze_layers:
            for name, para in model.named_parameters():
                # Freeze everything except the head
                if "head" not in name:
                    para.requires_grad_(False)
                else:
                    print("training {}".format(name))

    # Collect all parameters that require gradient updates
    pg = [p for p in model.parameters() if p.requires_grad]
    optimizer = optim.SGD(pg, lr=args.lr, momentum=0.9, weight_decay=4E-5)
    # Scheduler https://arxiv.org/pdf/1812.01187.pdf
    # Cosine learning-rate schedule: the multiplier decays from 1 to args.lrf over training
    lf = lambda x: ((1 + math.cos(x * math.pi / args.epochs)) / 2) * (1 - args.lrf) + args.lrf  # cosine
    # LambdaLR multiplies the base learning rate by lf(epoch)
    scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)

    best_acc = 0.0
    for epoch in range(args.epochs):
        # train
        mean_loss = train_one_epoch(model=model,
                                    optimizer=optimizer,
                                    data_loader=train_loader,
                                    device=device,
                                    epoch=epoch)

        scheduler.step()

        # validate
        acc = evaluate(model=model,
                       data_loader=val_loader,
                       device=device)

        print("[epoch {}] accuracy: {}".format(epoch, round(acc, 3)))
        tags = ["loss", "accuracy", "learning_rate"]
        tb_writer.add_scalar(tags[0], mean_loss, epoch)
        tb_writer.add_scalar(tags[1], acc, epoch)
        tb_writer.add_scalar(tags[2], optimizer.param_groups[0]["lr"], epoch)

        # Save the weights whenever the validation accuracy improves
        if round(acc, 3) > best_acc:
            best_acc = round(acc, 3)
            torch.save(model.state_dict(), "./weights/model-{}.pth".format(epoch))



if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--num_classes', type=int, default=5)
    parser.add_argument('--epochs', type=int, default=100)
    parser.add_argument('--batch-size', type=int, default=16)
    parser.add_argument('--lr', type=float, default=0.01)
    parser.add_argument('--lrf', type=float, default=0.1)

    # Root directory of the dataset
    # https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
    parser.add_argument('--data-path', type=str,
                        default=r"F:/NN/Learn_Pytorch/flower_photos")


    # https://pan.baidu.com/s/1uZX36rvrfEss-JGj4yfzbQ  password: 5gu1
    parser.add_argument('--weights', type=str, default='./pre_efficientnetv2-s.pth',
                        help='initial weights path')
    # type=bool would treat any non-empty string (even "False") as True, so use a flag instead
    parser.add_argument('--freeze-layers', action='store_true')
    parser.add_argument('--device', default='cuda:0', help='device id (i.e. 0 or 0,1 or cpu)')

    opt = parser.parse_args()

    main(opt)

This run starts from pre-trained weights and fine-tunes on a custom dataset. After 100 epochs the accuracy reaches about 98%.
[Figure: training results]
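
The cosine schedule used in the training script can be inspected on its own. The sketch below evaluates the same formula as the lf lambda with the script's defaults (epochs=100, lrf=0.1); the standalone function name is hypothetical.

```python
import math


def cosine_lr_factor(epoch, epochs=100, lrf=0.1):
    """Multiplier applied to the base learning rate at a given epoch."""
    return ((1 + math.cos(epoch * math.pi / epochs)) / 2) * (1 - lrf) + lrf


for epoch in (0, 50, 100):
    print(epoch, round(cosine_lr_factor(epoch), 3))
# 0 1.0
# 50 0.55
# 100 0.1
```

With a base learning rate of 0.01, the effective learning rate therefore decays from 0.01 to 0.001 over training.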

III. Image Classification

The flower dataset is used here. Download link: https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz


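import json
import os

import torch
from PIL import Image, ImageDraw
from torchvision import transforms

from model import efficientnetv2_s as create_model


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    # Same preprocessing as the validation transform used in training (EfficientNetV2-S: eval size 384)
    data_transform = transforms.Compose([
        transforms.Resize(384),
        transforms.CenterCrop(384),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    ])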

    # Load the image
    img_path = 'diasy.jpg'
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    image = Image.open(img_path)

    # image.show()
    # [N, C, H, W]
    img = data_transform(image)
    # Add a batch dimension
    img = torch.unsqueeze(img, dim=0)

    # Load the class labels
    json_path = 'class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)
    with open(json_path, 'r') as f:
        # json.load() reads the JSON file into a Python dict
        class_indict = json.load(f)

    # create model
    model = create_model(num_classes=5).to(device)
    # load model weights
    model_weight_path = "./weights/model-44.pth"
    model.load_state_dict(torch.load(model_weight_path, map_location=device))


    model.eval()
    with torch.no_grad():
        # Run the model on the input image
        output = torch.squeeze(model(img.to(device))).cpu()
        # Apply softmax to turn the outputs into class probabilities
        predict = torch.softmax(output, dim=0)
        # Index of the class with the highest probability
        predict_cla = torch.argmax(predict).numpy()

    res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)], predict[predict_cla].numpy())
    draw = ImageDraw.Draw(image)
    # Top-left corner of the text
    position = (10, 10)
    # fill sets the text color
    draw.text(position, res, fill='red')
    image.show()
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)], predict[i].numpy()))


if __name__ == '__main__':
    main()
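
The post-processing step of the prediction script (softmax followed by argmax and a label lookup) can be sketched without PyTorch. The logits and the class_indict mapping below are made-up values for illustration; the real mapping comes from class_indices.json.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, class_indict):
    """Return the most likely class name and its probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return class_indict[str(idx)], probs[idx]


# Hypothetical label mapping and model outputs
class_indict = {"0": "daisy", "1": "dandelion", "2": "roses",
                "3": "sunflowers", "4": "tulips"}
label, prob = predict_label([0.1, 2.0, 0.3, -1.0, 0.5], class_indict)
print("class: {}   prob: {:.3}".format(label, prob))
```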