Classic CNN Models (12): EfficientNetV1 (PyTorch, with Detailed Comments)

自在极意功登峰造极

Published 2024-07-28 11:01:04

Article link: https://blog.csdn.net/qq_51872445/article/details/140747145

I. Introduction to the EfficientNet V1 Neural Network

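EfficientNet is a family of efficient convolutional network architectures proposed by Google. Its core idea is to balance accuracy against compute by scaling three dimensions of the network together: width (number of channels), depth (number of layers), and input resolution. EfficientNet does this with the "compound scaling method", which scales all three dimensions by fixed ratios controlled by a single compound coefficient instead of tuning each one independently. Given a larger compute budget, the same rule yields a correspondingly larger model, so one baseline can be scaled to fit different resource constraints.

The paper reports that EfficientNet-B7 reached 84.3% top-1 accuracy on ImageNet, the best result at the time, while being 8.4x smaller and 6.1x faster at inference than GPipe, the previous most accurate model.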

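The main features of EfficientNet are:

Compound scaling:
- EfficientNet scales network depth, width, and input resolution together using a single compound coefficient ϕ. The per-dimension base ratios are fixed once by a small grid search on the baseline network, and larger models are then obtained by increasing ϕ to match the available compute.
MobileNetV2-style inverted residual blocks:
- EfficientNet uses inverted residual blocks similar to those in MobileNetV2 (MBConv) as its basic building block. Each block combines a depthwise convolution (a grouped convolution with one group per channel), a linear bottleneck, and a Squeeze-and-Excitation attention module.
AutoML:
- The baseline network EfficientNet-B0 was found with neural architecture search (NAS) that optimizes for both accuracy and FLOPs. The scaling coefficients were then determined by grid search on that baseline.
Mixed-precision training:
- Mixed-precision training, which mixes half-precision (FP16) and single-precision (FP32) data types to speed up training and save GPU memory, can be applied to EfficientNet as to most modern networks. It is a training-time option rather than part of the architecture, and the code in this article does not use it.
Gradient clipping:
- Gradient clipping is a general training technique that guards against exploding gradients. Like mixed precision, it is not specific to EfficientNet, and it is not used in the code in this article.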

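EfficientNet comes in several variants, from EfficientNet-B0 to EfficientNet-B7, with increasing model size and accuracy. All of them are produced by compound scaling: B0 is the baseline, and the other variants scale B0 up by increasing width, depth, and input image resolution.

EfficientNet's strength is that it reaches high accuracy with a markedly smaller model and lower compute cost, which is valuable in resource-constrained settings such as mobile devices. This combination of accuracy and efficiency has made it a popular choice for computer vision tasks.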

II. EfficientNet V1 Network Details

EfficientNet V1 balances compute cost and accuracy by scaling the network's width, depth, and input resolution together.

1. How to Improve Network Accuracy

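The figure shows different ways of scaling a model. From left to right:

( a ) the baseline network;
( b ) width scaling (more channels in each feature map);
( c ) depth scaling (more layers);
( d ) resolution scaling (a larger input height and width, which enlarges every subsequent feature map accordingly);
( e ) the compound scaling method proposed in the paper, which scales all three dimensions uniformly with fixed ratios.
[Figure: model scaling methods (a) to (e)]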

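To improve accuracy, one usually considers increasing the network's width, its depth, or the input image resolution. How does each of these affect performance?

Increasing depth lets the network capture richer and more complex features (high-level semantics) that also generalize well to new tasks. However, deeper networks are harder to train because of the vanishing-gradient problem.

Increasing width lets the network capture more fine-grained features and makes it easier to train. However, a very wide but shallow network struggles to capture high-level features. For example, a single 3×3 convolutional layer cannot produce abstract high-level semantics even if it has 10000 output channels.

Increasing input resolution can capture finer patterns, because a higher-resolution image shows more detail and so improves discriminative power. However, the accuracy gain diminishes at very high resolutions, and larger images increase the network's computation (note: the computation, not the parameter count).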
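
The difference between computation and parameter count can be checked with simple arithmetic. The sketch below is only an illustration and is not part of the original project: `conv_params` and `conv_macs` are hypothetical helper names, bias terms are ignored, and the convolution is assumed to be stride 1 with "same" padding.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Number of weights in a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def conv_macs(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """Multiply-accumulates for a stride-1, same-padded convolution on an h x w map."""
    return conv_params(k, c_in, c_out) * h * w


params = conv_params(3, 32, 64)
macs_224 = conv_macs(3, 32, 64, 224, 224)
macs_448 = conv_macs(3, 32, 64, 448, 448)

print(params)               # the weight count does not depend on resolution
print(macs_448 / macs_224)  # doubling height and width quadruples the work
```

The weight count depends only on the kernel size and channel counts, while the work grows with the area of the feature map.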

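The experiments show that scaling any single dimension saturates at around 80% accuracy, while scaling all three together keeps improving accuracy beyond 80%. In addition, at the same FLOPs, increasing network depth and input resolution together gives the best result.

EfficientNetV1 is designed to maximize accuracy under a limited compute budget. To this end, the authors used Neural Architecture Search (NAS) to obtain the baseline network and then scaled its width, depth, and input resolution with the compound scaling method. Changing these three dimensions changes both the model's complexity and its accuracy.

Width: the number of channels in each layer. Increasing width increases the model's representational capacity but also its computation.
Depth: the number of layers. Deeper networks can usually capture more complex patterns but are more prone to vanishing or exploding gradients.
Resolution: the size of the input image. Higher resolution captures more detail but requires more compute.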

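In EfficientNetV1, the authors formulate scaling as an optimization problem: maximize accuracy under a given resource constraint. They introduce three constants α, β, and γ that scale depth, width, and resolution respectively, and a single compound coefficient ϕ, so that depth d = α^ϕ, width w = β^ϕ, and resolution r = γ^ϕ, subject to α · β² · γ² ≈ 2 with α, β, γ ≥ 1. Because FLOPs grow roughly in proportion to d · w² · r², this constraint means total FLOPs increase by about 2^ϕ.

In practice, the authors first fixed ϕ = 1 and ran a small grid search on the baseline network EfficientNet-B0, which gave the best coefficients α = 1.2, β = 1.1, γ = 1.15. With these coefficients fixed, they then varied ϕ to produce a series of progressively larger models, EfficientNet-B1 to EfficientNet-B7. Each step up trades more computation for higher accuracy.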
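
The scaling rule can be sketched numerically. The snippet below is an illustration under the paper's convention (α for depth, β for width, γ for resolution); the function names `compound_scale` and `flops_factor` are hypothetical and do not appear in the project code.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases quoted in the text


def compound_scale(phi: float):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_factor(phi: float) -> float:
    """Approximate FLOPs growth: FLOPs scale with depth * width^2 * resolution^2."""
    d, w, r = compound_scale(phi)
    return d * w ** 2 * r ** 2


for phi in range(0, 4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPs x{flops_factor(phi):.2f}")
```

With these constants α · β² · γ² is about 1.92, so each unit of ϕ roughly doubles the FLOPs. Note that the width and depth coefficients hard-coded for B1 to B7 in model.py are the released per-variant values and do not follow these powers exactly.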

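Note that although these coefficients work well for EfficientNet-B0, the best scaling coefficients for other baseline networks may differ. Each architecture has its own characteristics, so the coefficients need to be searched again when the baseline changes.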

2. MBConv

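MBConv is the mobile inverted bottleneck convolution block introduced in MobileNetV2. The version used in EfficientNet additionally contains a Squeeze-and-Excitation (SE) module, as in MobileNetV3, so it combines an SE block with a depthwise convolution block. The block is designed to improve accuracy and flexibility while staying computationally efficient.

[Figure: MBConv block structure]

The first 1x1 convolution expands the channels: its number of kernels is n times the number of channels of the input feature map.
When n = 1, this first expansion 1x1 convolution is omitted, so the MBConv blocks in Stage 2 have no expansion layer (similar to MobileNetV3).
The shortcut connection exists only when the input and output feature maps of the MBConv block have the same shape.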

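The main components of the MBConv block are:

Expansion 1x1 convolution (pointwise convolution): expands the number of channels of the input feature map by a factor of n, giving the following depthwise convolution a higher-dimensional space to work in. When n = 1 this layer is omitted, as in Stage 2 of EfficientNet-B0.
Depthwise convolution: the core of MBConv. Each channel is convolved independently with its own k x k kernel (k = 3 or 5). Together with the 1x1 projection convolution that follows, it forms a depthwise separable convolution, which needs far less computation than a standard convolution.
Squeeze-and-Excitation (SE) module: a channel attention mechanism that dynamically reweights the channels of the feature map. It applies global average pooling followed by two fully connected layers with activations, which strengthens the model's ability to select informative features.
Projection 1x1 convolution (pointwise convolution): reduces the number of channels to the desired output channel count, with no activation afterwards (a linear bottleneck). This lowers computation while keeping the important feature information.
Dropout layer: a regularization technique that prevents overfitting by randomly dropping part of the activations during training, which makes the model more robust. The code in this article uses nn.Dropout2d, which zeroes entire channels.

MBConv also supports a shortcut (residual) connection, which exists only when the input and output feature maps have the same shape. The shortcut speeds up convergence and mitigates the vanishing-gradient problem.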
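
To make the saving from a depthwise separable convolution concrete, the sketch below compares weight counts for one layer. It is an illustration only: the helper names are hypothetical, bias and BatchNorm parameters are ignored, and the 96-channel example size is an arbitrary choice.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1x1 convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # the 1x1 convolution mixes channels
    return depthwise + pointwise


k, c_in, c_out = 3, 96, 96
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(std / sep, 2))
```

For this layer size the separable form needs roughly one eighth of the weights, and the same ratio applies to the multiply-accumulate count at a fixed resolution.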

3. SE Module

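The SE module, shown in the figure, consists of a global average pooling layer followed by two fully connected layers. The first fully connected layer has 1/4 as many nodes as the number of channels of the feature map entering the MBConv block, and uses the Swish activation. The second has as many nodes as the number of channels output by the depthwise convolution, and uses the Sigmoid activation.
[Figure: SE module structure]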
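
The channel rule for the two SE layers can be worked through for one block. This is a sketch with hypothetical helper names (`se_channels`, `se_params`); it assumes the two fully connected layers are implemented as 1x1 convolutions with bias, as in the SqueezeExcitation class in model.py.

```python
def se_channels(input_c: int, expand_c: int, squeeze_factor: int = 4):
    """Node counts of the two SE layers."""
    squeeze_c = input_c // squeeze_factor  # based on the MBConv block's input channels
    return squeeze_c, expand_c             # second layer matches the depthwise output


def se_params(input_c: int, expand_c: int, squeeze_factor: int = 4) -> int:
    """Weights plus biases of the two 1x1 convolutions."""
    squeeze_c, out_c = se_channels(input_c, expand_c, squeeze_factor)
    fc1 = expand_c * squeeze_c + squeeze_c
    fc2 = squeeze_c * out_c + out_c
    return fc1 + fc2


# First block of Stage 3 in EfficientNet-B0: 16 input channels, expanded 6x to 96.
print(se_channels(16, 96))  # (4, 96)
print(se_params(16, 96))    # 868
```

Because the squeeze width is tied to the block's input channels rather than the expanded channels, the SE module stays very small compared with the convolutions around it.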

III. EfficientNet V1 Network Structure

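The figure below shows the architecture of EfficientNet-B0. The network is built from mobile inverted bottleneck blocks (MBConv), which combine linear bottlenecks with pointwise convolutions. Its defining feature is the scaling method, which balances network width, depth, and input resolution.

[Figure: EfficientNet-B0 architecture table]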

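The basic structure of EfficientNet-B0 is as follows (the resolution given for each stage is that of its input feature map):

Stage 1: a 3x3 convolution with stride 2 takes the 224x224 input image and outputs 32 channels.
Stage 2: MBConv1, k3x3, 1 layer. Input resolution 112x112, stride 1, output channels reduced to 16.
Stage 3: MBConv6, k3x3, 2 layers. Input resolution 112x112, stride 2, output channels 24.
Stage 4: MBConv6, k5x5, 2 layers. Input resolution 56x56, stride 2, output channels 40.
Stage 5: MBConv6, k3x3, 3 layers. Input resolution 28x28, stride 2, output channels 80.
Stage 6: MBConv6, k5x5, 3 layers. Input resolution 14x14, stride 1, output channels 112.
Stage 7: MBConv6, k5x5, 4 layers. Input resolution 14x14, stride 2, output channels 192.
Stage 8: MBConv6, k3x3, 1 layer. Input resolution 7x7, stride 1, output channels 320.
Stage 9: Conv1x1 & Pooling & FC. A 1x1 convolution expands the 7x7 feature map to 1280 channels, followed by global average pooling and a fully connected layer that produces the final classification.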

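Each stage has a specific number of layers (#Layers) built from the given operator, such as MBConv6 or Conv1x1. Operators differ in their parameters, such as kernel size (k3x3 or k5x5) and number of output channels. In a stage with several layers, only the first layer uses the stage's stride; the remaining layers use stride 1. The overall design aims for efficient use of compute together with high accuracy.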
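
The feature-map sizes in the stage list can be reproduced from the kernel sizes and strides alone. The sketch below assumes the padding rule `(kernel - 1) // 2` used by ConvBNActivation in model.py and the per-stage strides from default_cnf; `conv_out_size`, `STAGES`, and `trace_resolution` are hypothetical names introduced for this illustration.

```python
def conv_out_size(size: int, kernel: int, stride: int) -> int:
    """Output size of a convolution with padding (kernel - 1) // 2."""
    padding = (kernel - 1) // 2
    return (size + 2 * padding - kernel) // stride + 1


# (operator, kernel size, stride of the first layer) for stages 1 to 8
STAGES = [("stem conv", 3, 2), ("MBConv1", 3, 1), ("MBConv6", 3, 2),
          ("MBConv6", 5, 2), ("MBConv6", 3, 2), ("MBConv6", 5, 1),
          ("MBConv6", 5, 2), ("MBConv6", 3, 1)]


def trace_resolution(input_size: int = 224):
    """Return the output resolution of each stage for a square input."""
    sizes, size = [], input_size
    for _, kernel, stride in STAGES:
        # Only the first layer of a stage can downsample; later layers use stride 1.
        size = conv_out_size(size, kernel, stride)
        sizes.append(size)
    return sizes


print(trace_resolution(224))  # [112, 112, 56, 28, 14, 14, 7, 7]
```

Each value is the output of one stage and therefore the input resolution of the next, which matches the 112, 112, 56, 28, 14, 14, 7, 7 sequence of the published architecture table for stages 2 to 9.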

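IV. EfficientNet V1 Code Implementation

Development environment: this project was built with Python 3.6.13 and PyTorch 1.10.2 and runs in a CPU environment.

model.py: defines the network model
train.py: loads the dataset and trains, computes loss and accuracy, and saves the trained network parameters
predict.py: runs classification on your own images
utils.py: helper script
my_dataset.py: helper script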

model.py

import math
import copy
from functools import partial
from collections import OrderedDict
from typing import Optional, Callable

import torch
import torch.nn as nn
from torch import Tensor
from torch.nn import functional as F

def _make_divisible(ch, divisor=8, min_ch=None):
    """
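    Round the given channel count to the nearest multiple of divisor (8 by default),
    without going more than 10% below the original value.
    :param ch: requested number of channels
    :param divisor: the result is a multiple of this value
    :param min_ch: lower bound for the result (defaults to divisor)
    :return: the adjusted number of channels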
    """
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch


class ConvBNActivation(nn.Sequential):
    def __init__(self,
                 in_planes: int,
                 out_planes: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int =1,
                 norm_layer: Optional[Callable[..., nn.Module]] = None,
                 activation_layer: Optional[Callable[..., nn.Module]] = None):
        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.SiLU  # Swish

        super(ConvBNActivation, self).__init__(nn.Conv2d(
            in_channels=in_planes,
            out_channels=out_planes,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=False),
            norm_layer(out_planes),
            activation_layer())

class SqueezeExcitation(nn.Module):
    def __init__(self,
                 input_c: int,   # block input channel
                 expand_c: int,  # block expand channel
                 squeeze_factor: int = 4):
        super(SqueezeExcitation, self).__init__()
        squeeze_c = input_c // squeeze_factor
        self.fc1 = nn.Conv2d(expand_c, squeeze_c, 1)
        self.ac1 = nn.SiLU()  # alias Swish
        self.fc2 = nn.Conv2d(squeeze_c, expand_c, 1)
        self.ac2 = nn.Sigmoid()

    def forward(self, x: Tensor) -> Tensor:
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))
        scale = self.fc1(scale)
        scale = self.ac1(scale)
        scale = self.fc2(scale)
        scale = self.ac2(scale)
        return scale * x


class InvertedResidualConfig:
    # kernel_size, in_channel, out_channel, exp_ratio, stride, use_SE, dropout_ratio
    def __init__(self,
                 kernel: int,          # 3 or 5
                 input_c: int,
                 out_c: int,
                 expanded_ratio: int,  # 1 or 6
                 stride: int,          # 1 or 2
                 use_se: bool,         # True
                 drop_rate: float,
                 index: str,           # 1a, 2a, 2b, ...
                 width_coefficient: float):
        self.input_c = self.adjust_channels(input_c, width_coefficient)
        self.kernel = kernel
        self.expanded_c = self.input_c * expanded_ratio
        self.out_c = self.adjust_channels(out_c, width_coefficient)
        self.use_se = use_se
        self.stride = stride
        self.drop_rate = drop_rate
        self.index = index

    @staticmethod
    def adjust_channels(channels: int, width_coefficient: float):
        return _make_divisible(channels * width_coefficient, 8)


class InvertedResidual(nn.Module):
    def __init__(self,
                 cnf: InvertedResidualConfig,
                 norm_layer: Callable[..., nn.Module]):
        super(InvertedResidual, self).__init__()

        # validate the stride
        if cnf.stride not in [1, 2]:
            raise ValueError("illegal stride value")

        # use a shortcut connection only when input and output shapes match
        self.use_res_connect = (cnf.stride == 1 and cnf.input_c == cnf.out_c)

        # collect the layers in an ordered dict
        layers = OrderedDict()
        activation_layer = nn.SiLU

        # expand
        if cnf.expanded_c != cnf.input_c:
            layers.update({"expand_conv": ConvBNActivation(cnf.input_c,
                                                           cnf.expanded_c,
                                                           kernel_size=1,
                                                           norm_layer=norm_layer,
                                                           activation_layer=activation_layer)})

        # depthwise
        layers.update({"dwconv": ConvBNActivation(cnf.expanded_c,
                                                  cnf.expanded_c,
                                                  kernel_size=cnf.kernel,
                                                  stride=cnf.stride,
                                                  groups=cnf.expanded_c,
                                                  norm_layer=norm_layer,
                                                  activation_layer=activation_layer)})

        if cnf.use_se:
            layers.update({"se": SqueezeExcitation(cnf.input_c,
                                                   cnf.expanded_c)})

        # project
        layers.update({"project_conv": ConvBNActivation(cnf.expanded_c,
                                                        cnf.out_c,
                                                        kernel_size=1,
                                                        norm_layer=norm_layer,
                                                        # nn.Identity: no activation (linear bottleneck)
                                                        activation_layer=nn.Identity)})

        self.block = nn.Sequential(layers)
        self.out_channels = cnf.out_c
        # stride=2 -> True  stride=1 -> False
        self.is_stride = cnf.stride > 1

        if cnf.drop_rate > 0:
            self.dropout = nn.Dropout2d(p=cnf.drop_rate, inplace=True)
        else:
            self.dropout = nn.Identity()

    def forward(self, x: Tensor) -> Tensor:
        result = self.block(x)
        result = self.dropout(result)
        if self.use_res_connect:
            result += x

        return result


class EfficientNet(nn.Module):
    def __init__(self,
                 width_coefficient: float,
                 depth_coefficient: float,
                 num_classes: int = 1000,
                 dropout_rate: float = 0.2,
                 drop_connect_rate: float = 0.2,
                 block: Optional[Callable[..., nn.Module]] = None,
                 norm_layer: Optional[Callable[..., nn.Module]] = None
                 ):
        super(EfficientNet, self).__init__()

        # kernel_size, in_channel, out_channel, exp_ratio, strides, use_SE, drop_connect_rate, repeats
        default_cnf = [[3, 32, 16, 1, 1, True, drop_connect_rate, 1],
                       [3, 16, 24, 6, 2, True, drop_connect_rate, 2],
                       [5, 24, 40, 6, 2, True, drop_connect_rate, 2],
                       [3, 40, 80, 6, 2, True, drop_connect_rate, 3],
                       [5, 80, 112, 6, 1, True, drop_connect_rate, 3],
                       [5, 112, 192, 6, 2, True, drop_connect_rate, 4],
                       [3, 192, 320, 6, 1, True, drop_connect_rate, 1]]

        def round_repeats(repeats):
            """Round number of repeats based on depth multiplier."""
            return int(math.ceil(depth_coefficient * repeats))

        if block is None:
            block = InvertedResidual

        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps=1e-3, momentum=0.1)

        adjust_channels = partial(InvertedResidualConfig.adjust_channels,
                                  width_coefficient=width_coefficient)

        # build inverted_residual_setting
        bneck_conf = partial(InvertedResidualConfig,
                             width_coefficient=width_coefficient)

        b = 0
        num_blocks = float(sum(round_repeats(i[-1]) for i in default_cnf))
        inverted_residual_setting = []
        for stage, args in enumerate(default_cnf):
            cnf = copy.copy(args)
            for i in range(round_repeats(cnf.pop(-1))):
                if i > 0:
                    # strides equal 1 except first cnf
                    cnf[-3] = 1  # strides
                    cnf[1] = cnf[2]  # input_channel equal output_channel

                cnf[-1] = args[-2] * b / num_blocks  # update dropout ratio
                index = str(stage + 1) + chr(i + 97)  # 1a, 2a, 2b, ...
                inverted_residual_setting.append(bneck_conf(*cnf, index))
                b += 1

        # create layers
        layers = OrderedDict()

        # first conv
        layers.update({"stem_conv": ConvBNActivation(in_planes=3,
                                                     out_planes=adjust_channels(32),
                                                     kernel_size=3,
                                                     stride=2,
                                                     norm_layer=norm_layer)})

        # building inverted residual blocks
        for cnf in inverted_residual_setting:
            layers.update({cnf.index: block(cnf, norm_layer)})

        # build top
        last_conv_input_c = inverted_residual_setting[-1].out_c
        last_conv_output_c = adjust_channels(1280)
        layers.update({"top": ConvBNActivation(in_planes=last_conv_input_c,
                                               out_planes=last_conv_output_c,
                                               kernel_size=1,
                                               norm_layer=norm_layer)})

        self.features = nn.Sequential(layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)

        classifier = []
        if dropout_rate > 0:
            classifier.append(nn.Dropout(p=dropout_rate, inplace=True))
        classifier.append(nn.Linear(last_conv_output_c, num_classes))
        self.classifier = nn.Sequential(*classifier)

        # initial weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x: Tensor) -> Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)

        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)


def efficientnet_b0(num_classes=1000):
    # input image size 224x224
    return EfficientNet(width_coefficient=1.0,
                        depth_coefficient=1.0,
                        dropout_rate=0.2,
                        num_classes=num_classes)


def efficientnet_b1(num_classes=1000):
    # input image size 240x240
    return EfficientNet(width_coefficient=1.0,
                        depth_coefficient=1.1,
                        dropout_rate=0.2,
                        num_classes=num_classes)


def efficientnet_b2(num_classes=1000):
    # input image size 260x260
    return EfficientNet(width_coefficient=1.1,
                        depth_coefficient=1.2,
                        dropout_rate=0.3,
                        num_classes=num_classes)


def efficientnet_b3(num_classes=1000):
    # input image size 300x300
    return EfficientNet(width_coefficient=1.2,
                        depth_coefficient=1.4,
                        dropout_rate=0.3,
                        num_classes=num_classes)


def efficientnet_b4(num_classes=1000):
    # input image size 380x380
    return EfficientNet(width_coefficient=1.4,
                        depth_coefficient=1.8,
                        dropout_rate=0.4,
                        num_classes=num_classes)


def efficientnet_b5(num_classes=1000):
    # input image size 456x456
    return EfficientNet(width_coefficient=1.6,
                        depth_coefficient=2.2,
                        dropout_rate=0.4,
                        num_classes=num_classes)


def efficientnet_b6(num_classes=1000):
    # input image size 528x528
    return EfficientNet(width_coefficient=1.8,
                        depth_coefficient=2.6,
                        dropout_rate=0.5,
                        num_classes=num_classes)


def efficientnet_b7(num_classes=1000):
    # input image size 600x600
    return EfficientNet(width_coefficient=2.0,
                        depth_coefficient=3.1,
                        dropout_rate=0.5,
                        num_classes=num_classes)
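
How the width and depth coefficients turn into concrete channel counts, block counts, and per-block drop rates can be checked in isolation. The sketch below re-implements the three rules from model.py in plain Python; `make_divisible`, `scaled_repeats`, and `drop_rates` are stand-alone copies written for this illustration, not functions exported by the project.

```python
import math


def make_divisible(ch: float, divisor: int = 8) -> int:
    """Same rounding rule as _make_divisible in model.py (with min_ch = divisor)."""
    new_ch = max(divisor, int(ch + divisor / 2) // divisor * divisor)
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch


def scaled_repeats(depth_coefficient: float, repeats=(1, 2, 2, 3, 3, 4, 1)):
    """Per-stage block counts after depth scaling (rounded up)."""
    return [int(math.ceil(depth_coefficient * r)) for r in repeats]


def drop_rates(num_blocks: int, drop_connect_rate: float = 0.2):
    """Per-block drop rate, growing linearly with the block index."""
    return [drop_connect_rate * b / num_blocks for b in range(num_blocks)]


print(make_divisible(32 * 1.0), make_divisible(32 * 1.2), make_divisible(1280 * 2.0))
print(scaled_repeats(1.0), sum(scaled_repeats(1.0)))  # B0 depth
print(scaled_repeats(1.1), sum(scaled_repeats(1.1)))  # B1 depth
print(drop_rates(16)[0], drop_rates(16)[-1])
```

With these rules B0 has 16 MBConv blocks and B1 has 23, the stem width grows from 32 to 40 channels at a width coefficient of 1.2, and the first block is never dropped while the last one approaches, but does not reach, drop_connect_rate.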

train.py

import os
import math
import argparse

import torch
import torch.optim as optim
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms
import torch.optim.lr_scheduler as lr_scheduler

from model import efficientnet_b0 as create_model
from my_dataset import MyDataSet
from utils import read_split_data, train_one_epoch, evaluate


def main(args):
    device = torch.device(args.device if torch.cuda.is_available() else "cpu")

    print(args)
    print('Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/')
    tb_writer = SummaryWriter()
    if os.path.exists("./weights") is False:
        os.makedirs("./weights")

    train_images_path, train_images_label, val_images_path, val_images_label = read_split_data(args.data_path)

    img_size = {"B0": 224,
                "B1": 240,
                "B2": 260,
                "B3": 300,
                "B4": 380,
                "B5": 456,
                "B6": 528,
                "B7": 600}
    num_model = "B0"

    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(img_size[num_model]),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
        "val": transforms.Compose([transforms.Resize(img_size[num_model]),
                                   transforms.CenterCrop(img_size[num_model]),
                                   transforms.ToTensor(),
                                   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}

    # instantiate the training dataset
    train_dataset = MyDataSet(images_path=train_images_path,
                              images_class=train_images_label,
                              transform=data_transform["train"])

    # instantiate the validation dataset
    val_dataset = MyDataSet(images_path=val_images_path,
                            images_class=val_images_label,
                            transform=data_transform["val"])

    batch_size = args.batch_size
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))
    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size,
                                               shuffle=True,
                                               pin_memory=True,
                                               num_workers=nw,
                                               collate_fn=train_dataset.collate_fn)

    val_loader = torch.utils.data.DataLoader(val_dataset,
                                             batch_size=batch_size,
                                             shuffle=False,
                                             pin_memory=True,
                                             num_workers=nw,
                                             collate_fn=val_dataset.collate_fn)

    # load pretrained weights if provided
    model = create_model(num_classes=args.num_classes).to(device)
    if args.weights != "":
        if os.path.exists(args.weights):
            weights_dict = torch.load(args.weights, map_location=device)
            model_state = model.state_dict()
            # keep only weights whose name exists in the model and whose size matches
            # (this skips the classifier when num_classes differs)
            load_weights_dict = {k: v for k, v in weights_dict.items()
                                 if k in model_state and model_state[k].numel() == v.numel()}
            print(model.load_state_dict(load_weights_dict, strict=False))
        else:
            raise FileNotFoundError("not found weights file: {}".format(args.weights))

    # optionally freeze weights
    if args.freeze_layers:
        for name, para in model.named_parameters():
            # freeze everything except the last conv layer and the fully connected layer
            if ("features.top" not in name) and ("classifier" not in name):
                para.requires_grad_(False)
            else:
                print("training {}".format(name))

    pg = [p for p in model.parameters() if p.requires_grad]
    optimizer = optim.SGD(pg, lr=args.lr, momentum=0.9, weight_decay=1E-4)
    # Scheduler https://arxiv.org/pdf/1812.01187.pdf
    lf = lambda x: ((1 + math.cos(x * math.pi / args.epochs)) / 2) * (1 - args.lrf) + args.lrf  # cosine
    scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)

    for epoch in range(args.epochs):
        # train
        mean_loss = train_one_epoch(model=model,
                                    optimizer=optimizer,
                                    data_loader=train_loader,
                                    device=device,
                                    epoch=epoch)

        scheduler.step()

        # validate
        acc = evaluate(model=model,
                       data_loader=val_loader,
                       device=device)
        print("[epoch {}] accuracy: {}".format(epoch, round(acc, 3)))
        tags = ["loss", "accuracy", "learning_rate"]
        tb_writer.add_scalar(tags[0], mean_loss, epoch)
        tb_writer.add_scalar(tags[1], acc, epoch)
        tb_writer.add_scalar(tags[2], optimizer.param_groups[0]["lr"], epoch)

        torch.save(model.state_dict(), "./weights/model-{}.pth".format(epoch))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--num_classes', type=int, default=5)
    parser.add_argument('--epochs', type=int, default=1)
    parser.add_argument('--batch-size', type=int, default=16)
    parser.add_argument('--lr', type=float, default=0.01)
    parser.add_argument('--lrf', type=float, default=0.01)

    # root directory of the dataset
    # https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
    parser.add_argument('--data-path', type=str,
                        default="E:/code/PyCharm_Projects/deep_learning/data_set/flower_data/flower_photos")

    # download model weights
    # link: https://pan.baidu.com/s/1ouX0UmjCsmSx3ZrqXbowjw  password: 090i
    parser.add_argument('--weights', type=str, default='./efficientnetb0.pth',
                        help='initial weights path')
    # type=bool would treat any non-empty string (even "False") as True, so use a flag
    parser.add_argument('--freeze-layers', action='store_true')
    parser.add_argument('--device', default='cuda:0', help='device id (i.e. 0 or 0,1 or cpu)')

    opt = parser.parse_args()

    main(opt)
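
The learning-rate schedule passed to LambdaLR in train.py can be inspected without running any training. The sketch below evaluates the same cosine expression as the `lf` lambda; the function name `cosine_lr_factor` and the choice of 30 epochs are assumptions made for this illustration.

```python
import math


def cosine_lr_factor(epoch: int, epochs: int, lrf: float) -> float:
    """Multiplier applied to the initial learning rate at a given epoch."""
    return ((1 + math.cos(epoch * math.pi / epochs)) / 2) * (1 - lrf) + lrf


epochs, lr, lrf = 30, 0.01, 0.01
for epoch in (0, 15, 30):
    factor = cosine_lr_factor(epoch, epochs, lrf)
    print(f"epoch {epoch:2d}: factor {factor:.3f}, lr {lr * factor:.6f}")
```

The factor starts at 1, passes through roughly the midpoint between 1 and lrf halfway through training, and ends at lrf, so the final learning rate is lr * lrf.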

predict.py

import os
import json

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model import efficientnet_b0 as create_model


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    img_size = {"B0": 224,
                "B1": 240,
                "B2": 260,
                "B3": 300,
                "B4": 380,
                "B5": 456,
                "B6": 528,
                "B7": 600}
    num_model = "B0"

    data_transform = transforms.Compose(
        [transforms.Resize(img_size[num_model]),
         transforms.CenterCrop(img_size[num_model]),
         transforms.ToTensor(),
         transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

    # load image
    img_path = "郁金香.png"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    # PNG files may be RGBA or palette images; convert to 3-channel RGB for Normalize
    img = Image.open(img_path).convert("RGB")
    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)

    with open(json_path, "r") as f:
        class_indict = json.load(f)

    # create model
    model = create_model(num_classes=5).to(device)
    # load model weights
    model_weight_path = "./weights/model-0.pth"
    model.load_state_dict(torch.load(model_weight_path, map_location=device))
    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)],
                                                  predict[i].numpy()))
    plt.show()


if __name__ == '__main__':
    main()
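
The last step of predict.py, turning the model's raw outputs into a class name and probability, can be sketched in plain Python. The logits below are made-up numbers, and the class map is an assumed example in the format of class_indices.json (the five folder names of the flower_photos dataset, sorted); the `softmax` helper stands in for torch.softmax.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits and class map, in the format of class_indices.json.
logits = [0.5, 2.0, -1.0, 0.1, 3.2]
class_indict = {"0": "daisy", "1": "dandelion", "2": "roses",
                "3": "sunflowers", "4": "tulips"}

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
print("class: {}   prob: {:.3}".format(class_indict[str(best)], probs[best]))
```

The keys of the class map are strings because JSON object keys are always strings, which is why predict.py looks up `class_indict[str(predict_cla)]`.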

utils.py

import os
import sys
import json
import pickle
import random

import torch
from tqdm import tqdm

import matplotlib.pyplot as plt


def read_split_data(root: str, val_rate: float = 0.2):
    random.seed(0)  # make the random split reproducible
    assert os.path.exists(root), "dataset root: {} does not exist.".format(root)

    # walk the root folder; each sub-folder corresponds to one class
    flower_class = [cla for cla in os.listdir(root) if os.path.isdir(os.path.join(root, cla))]
    # sort so the order is the same on every platform
    flower_class.sort()
    # map each class name to a numeric index
    class_indices = dict((k, v) for v, k in enumerate(flower_class))
    json_str = json.dumps(dict((val, key) for key, val in class_indices.items()), indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    train_images_path = []  # paths of all training images
    train_images_label = []  # class index of each training image
    val_images_path = []  # paths of all validation images
    val_images_label = []  # class index of each validation image
    every_class_num = []  # number of samples in each class
    supported = [".jpg", ".JPG", ".png", ".PNG"]  # supported file extensions
    # iterate over the files in each class folder
    for cla in flower_class:
        cla_path = os.path.join(root, cla)
        # collect the paths of all files with a supported extension
        images = [os.path.join(root, cla, i) for i in os.listdir(cla_path)
                  if os.path.splitext(i)[-1] in supported]
        # sort so the order is the same on every platform
        images.sort()
        # index of this class
        image_class = class_indices[cla]
        # record the number of samples in this class
        every_class_num.append(len(images))
        # randomly sample validation images at the given rate
        val_path = random.sample(images, k=int(len(images) * val_rate))

        for img_path in images:
            if img_path in val_path:  # sampled for validation
                val_images_path.append(img_path)
                val_images_label.append(image_class)
            else:  # otherwise it goes to the training set
                train_images_path.append(img_path)
                train_images_label.append(image_class)

    print("{} images were found in the dataset.".format(sum(every_class_num)))
    print("{} images for training.".format(len(train_images_path)))
    print("{} images for validation.".format(len(val_images_path)))
    assert len(train_images_path) > 0, "number of training images must be greater than 0."
    assert len(val_images_path) > 0, "number of validation images must be greater than 0."

    plot_image = False
    if plot_image:
        # bar chart of the number of images per class
        plt.bar(range(len(flower_class)), every_class_num, align='center')
        # replace the x ticks 0,1,2,3,4 with the class names
        plt.xticks(range(len(flower_class)), flower_class)
        # add a value label above each bar
        for i, v in enumerate(every_class_num):
            plt.text(x=i, y=v + 5, s=str(v), ha='center')
        # x-axis label
        plt.xlabel('image class')
        # y-axis label
        plt.ylabel('number of images')
        # chart title
        plt.title('flower class distribution')
        plt.show()

    return train_images_path, train_images_label, val_images_path, val_images_label


def plot_data_loader_image(data_loader):
    batch_size = data_loader.batch_size
    plot_num = min(batch_size, 4)

    json_path = './class_indices.json'
    assert os.path.exists(json_path), json_path + " does not exist."
    # use a context manager so the file is closed after reading
    with open(json_path, 'r') as json_file:
        class_indices = json.load(json_file)

    for data in data_loader:
        images, labels = data
        for i in range(plot_num):
            # [C, H, W] -> [H, W, C]
            img = images[i].numpy().transpose(1, 2, 0)
            # undo the Normalize transform
            img = (img * [0.229, 0.224, 0.225] + [0.485, 0.456, 0.406]) * 255
            label = labels[i].item()
            plt.subplot(1, plot_num, i+1)
            plt.xlabel(class_indices[str(label)])
            plt.xticks([])  # hide x-axis ticks
            plt.yticks([])  # hide y-axis ticks
            plt.imshow(img.astype('uint8'))
        plt.show()


def write_pickle(list_info: list, file_name: str):
    with open(file_name, 'wb') as f:
        pickle.dump(list_info, f)


def read_pickle(file_name: str) -> list:
    with open(file_name, 'rb') as f:
        info_list = pickle.load(f)
        return info_list


def train_one_epoch(model, optimizer, data_loader, device, epoch):
    model.train()
    loss_function = torch.nn.CrossEntropyLoss()
    mean_loss = torch.zeros(1).to(device)
    optimizer.zero_grad()

    data_loader = tqdm(data_loader, file=sys.stdout)

    for step, data in enumerate(data_loader):
        images, labels = data

        pred = model(images.to(device))

        loss = loss_function(pred, labels.to(device))
        loss.backward()
        mean_loss = (mean_loss * step + loss.detach()) / (step + 1)  # update mean losses

        data_loader.desc = "[epoch {}] mean loss {}".format(epoch, round(mean_loss.item(), 3))

        if not torch.isfinite(loss):
            print('WARNING: non-finite loss, ending training ', loss)
            sys.exit(1)

        optimizer.step()
        optimizer.zero_grad()

    return mean_loss.item()


@torch.no_grad()
def evaluate(model, data_loader, device):
    model.eval()

    # total number of validation samples
    total_num = len(data_loader.dataset)

    # number of correctly predicted samples
    sum_num = torch.zeros(1).to(device)

    data_loader = tqdm(data_loader, file=sys.stdout)

    for step, data in enumerate(data_loader):
        images, labels = data
        pred = model(images.to(device))
        pred = torch.max(pred, dim=1)[1]
        sum_num += torch.eq(pred, labels.to(device)).sum()

    return sum_num.item() / total_num
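
The loss reported by train_one_epoch is an incrementally updated mean rather than a sum divided at the end. The sketch below applies the same update rule to a list of made-up per-batch losses; `running_mean` is a hypothetical helper used only for this check.

```python
def running_mean(values):
    """Update the mean one value at a time, as train_one_epoch does."""
    mean = 0.0
    for step, value in enumerate(values):
        mean = (mean * step + value) / (step + 1)
    return mean


losses = [1.6, 1.2, 0.9, 0.7]  # hypothetical per-batch losses
print(running_mean(losses), sum(losses) / len(losses))
```

Both expressions give the same value up to floating-point rounding, so the progress bar always shows the average loss over the batches seen so far in the current epoch.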

my_dataset.py

from PIL import Image
import torch
from torch.utils.data import Dataset


class MyDataSet(Dataset):
    """Custom dataset."""

    def __init__(self, images_path: list, images_class: list, transform=None):
        self.images_path = images_path
        self.images_class = images_class
        self.transform = transform

    def __len__(self):
        return len(self.images_path)

    def __getitem__(self, item):
        img = Image.open(self.images_path[item])
        # 'RGB' is a color image, 'L' is a grayscale image
        if img.mode != 'RGB':
            raise ValueError("image: {} isn't RGB mode.".format(self.images_path[item]))
        label = self.images_class[item]

        if self.transform is not None:
            img = self.transform(img)

        return img, label

    @staticmethod
    def collate_fn(batch):
        # the official default_collate implementation is a useful reference:
        # https://github.com/pytorch/pytorch/blob/67b7e751e6b5931a9f45274653f4f653a4e6cdf6/torch/utils/data/_utils/collate.py
        images, labels = tuple(zip(*batch))

        images = torch.stack(images, dim=0)
        labels = torch.as_tensor(labels)
        return images, labels
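
The `zip(*batch)` idiom in collate_fn transposes a list of (image, label) pairs into one tuple of images and one tuple of labels. The sketch below shows just that step in plain Python, with strings standing in for image tensors; the sample names are made up.

```python
# Each dataset item is an (image, label) pair; strings stand in for image tensors.
batch = [("img_a", 0), ("img_b", 3), ("img_c", 1)]

images, labels = tuple(zip(*batch))
print(images)  # ('img_a', 'img_b', 'img_c')
print(labels)  # (0, 3, 1)
```

In the real collate_fn the images tuple is then stacked into a single [N, C, H, W] tensor with torch.stack and the labels are converted with torch.as_tensor.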