YOLOv5改进（十）-- 轻量化模型MobileNetv4

wisdom_zhe

于 2025-04-24 17:06:18 发布

阅读量429

点赞数 3

分类专栏：目标检测文章标签： YOLO 深度学习人工智能目标检测

本文链接：https://blog.csdn.net/qq_44231797/article/details/147473518

版权

目标检测专栏收录该内容

28 篇文章

订阅专栏

文章目录

一、原理
二、YOLOv5 添加 MobileNetV4 模块
三、目标检测系列文章

一、原理

在这里插入图片描述

论文地址：MobileNetV4 - Universal Models for the Mobile Ecosystem

官方代码: MobileNetV4代码仓库

**MobileNetV4（MNv4）**是最新一代的MobileNet，其核心原理在于实现了跨各种移动设备的高效性能,旨在优化各种移动设备的性能。它引入了通用倒瓶颈（UIB）搜索块，这是一种统一且灵活的结构，结合了倒瓶颈、ConvNext、前馈网络（FFN）和新的额外深度可分离卷积（ExtraDW）变体。与此同时，MobileNetV4还推出了专为移动加速器优化的Mobile MQA注意力块，提高了39%的推理速度。通过改进的神经架构搜索（NAS）方法，MobileNetV4显著提高了搜索效率，创建了在CPU、DSP、GPU以及专用加速器（如Apple Neural Engine和Google Pixel EdgeTPU）上表现优异的模型。此外，MobileNetV4还采用了一种新的蒸馏技术，进一步提高了模型的准确性。整体上，MobileNetV4通过整合UIB、Mobile MQA和改进的NAS方法，成功打造出一系列在移动设备上表现最优的模型，兼顾计算效率和精度，实现了在多种硬件平台上的帕累托最优性能。

以下是其架构背后的关键原则和组件：

1）通用倒置瓶颈 (UIB)

倒置瓶颈块：MobileNetV4 以倒置瓶颈 (IB) 块的成功为基础，这是之前 MobileNets 的核心功能。
灵活性和统一性：UIB 结合了多种微架构，包括倒置瓶颈、ConvNext、前馈网络 (FFN) 和创新的额外深度 (ExtraDW) 变体。这种统一使其能够有效地处理不同类型的操作。
可选深度卷积：UIB 包括可选深度卷积，可增强其混合空间和通道信息的计算效率和灵活性。

2）移动 MQA（移动量化注意力）

效率：此注意力模块专门针对移动加速器进行了优化，与传统的多头注意力 (MHSA) 相比，推理速度显著提高了 39%。
移动设备上的性能：移动 MQA 模块经过量身定制，可充分利用移动硬件的功能，确保更快的处理速度和更低的延迟。

3）神经架构搜索 (NAS)

优化的搜索方案：MobileNetV4 引入了一种改进的 NAS 流程，其中包括一个两阶段方法：粗粒度搜索和细粒度搜索。此方法提高了搜索过程的效率和有效性，从而可以创建高性能模型。
硬件感知：NAS 方法考虑了各种移动硬件平台的限制和功能，确保生成的模型针对各种设备进行了优化，从 CPU 到专用加速器，如 Apple Neural Engine 和 Google Pixel EdgeTPU。

4）蒸馏技术

准确度增强：MobileNetV4 采用了一种新颖的蒸馏技术，该技术使用具有不同增强的混合数据集和平衡的类内数据。该技术提高了模型的泛化能力和准确性。
性能：通过这种蒸馏技术增强的 MNv4-Hybrid-Large 模型在 ImageNet-1K 上实现了令人印象深刻的 87% 的准确率，在 Pixel 8 EdgeTPU 上的运行时间仅为 3.8 毫秒。

5）帕累托最优

性能平衡：MobileNetV4 模型旨在在各种硬件平台上实现帕累托最优，这意味着它们在计算效率和准确性之间提供了最佳平衡。
独立于硬件的效率：这些模型经过测试，证明在不同类型的硬件上表现良好，从 CPU 和 GPU 到 DSP 和专用加速器，使其用途广泛且适用性广泛。

6）设计考虑

运算强度：MobileNetV4 的设计考虑了不同硬件的运算强度，平衡了计算负载和内存带宽，以最大限度地提高性能。
层优化：MobileNetV4 的初始层设计为计算密集型，以提高模型容量和准确性，而最终层则专注于即使在高 RP（脊点）硬件上也能保持准确性。

总之，MobileNetV4 集成了先进的技术和创新，创建了一系列在各种移动设备上都高效准确的模型。其设计原则强调灵活性、硬件感知优化以及计算效率和模型性能之间的平衡。

二、YOLOv5 添加 MobileNetV4 模块

MobileNetV4处理图像的主要步骤可以概括为以下几个关键阶段：

1）图像预处理

在输入模型之前，图像会经过一些预处理步骤，这通常包括：

**缩放：**将图像调整到指定的输入大小（如224x224）。
**归一化：**对图像像素值进行归一化处理，使其值在特定范围内（如0到1或-1到1）。

2）初始卷积层

图像首先通过一个标准卷积层，该层具有较大的卷积核（如3x3或5x5），用于捕捉低级别的特征，例如边缘和纹理。这一层通常是计算密集型的，以便在早期捕获更多的信息。

3）倒瓶颈块（Inverted Bottleneck Block）

在MobileNetV4中，倒瓶颈块（IB块）是核心组件之一。每个倒瓶颈块包含以下子步骤：

**扩展卷积：**首先使用1x1卷积扩展特征图的通道数。
**深度可分离卷积：**然后使用深度可分离卷积（Depthwise Separable Convolution）在空间维度上进行卷积操作。
**压缩卷积：**最后通过1x1卷积将通道数压缩回去。

4）通用倒瓶颈块（Universal Inverted Bottleneck, UIB）

MobileNetV4引入了UIB，这种块可以灵活地调整，以适应不同的计算需求：

**标准倒瓶颈：**与传统倒瓶颈类似，但增加了更多的灵活性。
**ConvNext变体：**结合了现代卷积架构的优点。
**前馈网络（FFN）：**适用于需要更多非线性变换的场景。
**额外深度可分离卷积（ExtraDW）：**用于提高计算效率和特征提取能力。

5）Mobile MQA注意力块

这个模块是优化后的注意力机制，专门设计用于移动设备：

**注意力计算：**对特征图进行加权操作，以增强重要特征并抑制不重要的部分。
**快速推理：**该模块在硬件加速器上实现了快速的推理速度，提高了整体的处理效率。

6）全局平均池化

在最后的卷积层之后，模型会对特征图进行全局平均池化（Global Average Pooling），将特征图转化为固定长度的向量。这一步骤通过取每个特征图的平均值，将空间维度消除，仅保留通道维度的信息。

7）全连接层和分类层

最后，通过全连接层（Fully Connected Layer）和Softmax分类层，将特征向量转换为最终的分类结果。

通过上述步骤，MobileNetV4能够高效地处理和分类输入图像，特别适用于资源受限的移动设备上，同时保证了较高的精度和性能。

在这里插入图片描述

2.1 MobileNetV4的代码实现

在yolov5/models/下新建mobilenetv4.py，并粘贴下面的代码

"""
Creates a MobileNetV4 Model as defined in:
Danfeng Qin, Chas Leichner, Manolis Delakis, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, Andrew Howard. (2024).
MobileNetV4 - Universal Models for the Mobile Ecosystem
arXiv preprint arXiv:2404.10518.
"""
 
from typing import Any, Callable, Dict, List, Mapping, Optional, Tuple, Union
 
import torch
import torch.nn as nn
 
 
MNV4ConvSmall_BLOCK_SPECS = {
    "conv0": {
        "block_name": "convbn",
        "num_blocks": 1,
        "block_specs": [
            [3, 32, 3, 2]
        ]
    },
    "layer1": {
        "block_name": "convbn",
        "num_blocks": 2,
        "block_specs": [
            [32, 32, 3, 2],
            [32, 32, 1, 1]
        ]
    },
    "layer2": {
        "block_name": "convbn",
        "num_blocks": 2,
        "block_specs": [
            [32, 96, 3, 2],
            [96, 64, 1, 1]
        ]
    },
    "layer3": {
        "block_name": "uib",
        "num_blocks": 6,
        "block_specs": [
            [64, 96, 5, 5, True, 2, 3],
            [96, 96, 0, 3, True, 1, 2],
            [96, 96, 0, 3, True, 1, 2],
            [96, 96, 0, 3, True, 1, 2],
            [96, 96, 0, 3, True, 1, 2],
            [96, 96, 3, 0, True, 1, 4],
        ]
    },
    "layer4": {
        "block_name": "uib",
        "num_blocks": 6,
        "block_specs": [
            [96,  128, 3, 3, True, 2, 6],
            [128, 128, 5, 5, True, 1, 4],
            [128, 128, 0, 5, True, 1, 4],
            [128, 128, 0, 5, True, 1, 3],
            [128, 128, 0, 3, True, 1, 4],
            [128, 128, 0, 3, True, 1, 4],
        ]
    },  
    "layer5": {
        "block_name": "convbn",
        "num_blocks": 2,
        "block_specs": [
            [128, 960, 1, 1],
            [960, 1280, 1, 1]
        ]
    }
}
 
MNV4ConvMedium_BLOCK_SPECS = {
    "conv0": {
        "block_name": "convbn",
        "num_blocks": 1,
        "block_specs": [
            [3, 32, 3, 2]
        ]
    },
    "layer1": {
        "block_name": "fused_ib",
        "num_blocks": 1,
        "block_specs": [
            [32, 48, 2, 4.0, True]
        ]
    },
    "layer2": {
        "block_name": "uib",
        "num_blocks": 2,
        "block_specs": [
            [48, 80, 3, 5, True, 2, 4],
            [80, 80, 3, 3, True, 1, 2]
        ]
    },
    "layer3": {
        "block_name": "uib",
        "num_blocks": 8,
        "block_specs": [
            [80,  160, 3, 5, True, 2, 6],
            [160, 160, 3, 3, True, 1, 4],
            [160, 160, 3, 3, True, 1, 4],
            [160, 160, 3, 5, True, 1, 4],
            [160, 160, 3, 3, True, 1, 4],
            [160, 160, 3, 0, True, 1, 4],
            [160, 160, 0, 0, True, 1, 2],
            [160, 160, 3, 0, True, 1, 4]
        ]
    },
    "layer4": {
        "block_name": "uib",
        "num_blocks": 11,
        "block_specs": [
            [160, 256, 5, 5, True, 2, 6],
            [256, 256, 5, 5, True, 1, 4],
            [256, 256, 3, 5, True, 1, 4],
            [256, 256, 3, 5, True, 1, 4],
            [256, 256, 0, 0, True, 1, 4],
            [256, 256, 3, 0, True, 1, 4],
            [256, 256, 3, 5, True, 1, 2],
            [256, 256, 5, 5, True, 1, 4],
            [256, 256, 0, 0, True, 1, 4],
            [256, 256, 0, 0, True, 1, 4],
            [256, 256, 5, 0, True, 1, 2]
        ]
    },  
    "layer5": {
        "block_name": "convbn",
        "num_blocks": 2,
        "block_specs": [
            [256, 960, 1, 1],
            [960, 1280, 1, 1]
        ]
    }
}
 
MNV4ConvLarge_BLOCK_SPECS = {
    "conv0": {
        "block_name": "convbn",
        "num_blocks": 1,
        "block_specs": [
            [3, 24, 3, 2]
        ]
    },
    "layer1": {
        "block_name": "fused_ib",
        "num_blocks": 1,
        "block_specs": [
            [24, 48, 2, 4.0, True]
        ]
    },
    "layer2": {
        "block_name": "uib",
        "num_blocks": 2,
        "block_specs": [
            [48, 96, 3, 5, True, 2, 4],
            [96, 96, 3, 3, True, 1, 4]
        ]
    },
    "layer3": {
        "block_name": "uib",
        "num_blocks": 11,
        "block_specs": [
            [96,  192, 3, 5, True, 2, 4],
            [192, 192, 3, 3, True, 1, 4],
            [192, 192, 3, 3, True, 1, 4],
            [192, 192, 3, 3, True, 1, 4],
            [192, 192, 3, 5, True, 1, 4],
            [192, 192, 5, 3, True, 1, 4],
            [192, 192, 5, 3, True, 1, 4],
            [192, 192, 5, 3, True, 1, 4],
            [192, 192, 5, 3, True, 1, 4],
            [192, 192, 5, 3, True, 1, 4],
            [192, 192, 3, 0, True, 1, 4]
        ]
    },
    "layer4": {
        "block_name": "uib",
        "num_blocks": 13,
        "block_specs": [
            [192, 512, 5, 5, True, 2, 4],
            [512, 512, 5, 5, True, 1, 4],
            [512, 512, 5, 5, True, 1, 4],
            [512, 512, 5, 5, True, 1, 4],
            [512, 512, 5, 0, True, 1, 4],
            [512, 512, 5, 3, True, 1, 4],
            [512, 512, 5, 0, True, 1, 4],
            [512, 512, 5, 0, True, 1, 4],
            [512, 512, 5, 3, True, 1, 4],
            [512, 512, 5, 5, True, 1, 4],
            [512, 512, 5, 0, True, 1, 4],
            [512, 512, 5, 0, True, 1, 4],
            [512, 512, 5, 0, True, 1, 4]
        ]
    },  
    "layer5": {
        "block_name": "convbn",
        "num_blocks": 2,
        "block_specs": [
            [512, 960, 1, 1],
            [960, 1280, 1, 1]
        ]
    }
}
 
MNV4HybridConvMedium_BLOCK_SPECS = {
 
}
 
MNV4HybridConvLarge_BLOCK_SPECS = {
 
}
 
MODEL_SPECS = {
    "MobileNetV4ConvSmall": MNV4ConvSmall_BLOCK_SPECS,
    "MobileNetV4ConvMedium": MNV4ConvMedium_BLOCK_SPECS,
    "MobileNetV4ConvLarge": MNV4ConvLarge_BLOCK_SPECS,
    "MobileNetV4HybridMedium": MNV4HybridConvMedium_BLOCK_SPECS,
    "MobileNetV4HybridLarge": MNV4HybridConvLarge_BLOCK_SPECS,
}
 
def make_divisible(
        value: float,
        divisor: int,
        min_value: Optional[float] = None,
        round_down_protect: bool = True,
    ) -> int:
    """
    This function is copied from here 
    "https://github.com/tensorflow/models/blob/master/official/vision/modeling/layers/nn_layers.py"
    
    This is to ensure that all layers have channels that are divisible by 8.
    Args:
        value: A `float` of original value.
        divisor: An `int` of the divisor that need to be checked upon.
        min_value: A `float` of  minimum value threshold.
        round_down_protect: A `bool` indicating whether round down more than 10%
        will be allowed.
    Returns:
        The adjusted value in `int` that is divisible against divisor.
    """
    if min_value is None:
        min_value = divisor
    new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if round_down_protect and new_value < 0.9 * value:
        new_value += divisor
    return int(new_value)
 
def conv_2d(inp, oup, kernel_size=3, stride=1, groups=1, bias=False, norm=True, act=True):
    conv = nn.Sequential()
    padding = (kernel_size - 1) // 2
    conv.add_module('conv', nn.Conv2d(inp, oup, kernel_size, stride, padding, bias=bias, groups=groups))
    if norm:
        conv.add_module('BatchNorm2d', nn.BatchNorm2d(oup))
    if act:
        conv.add_module('Activation', nn.ReLU6())
    return conv
 
class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio, act=False):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]
        hidden_dim = int(round(inp * expand_ratio))
        self.block = nn.Sequential()
        if expand_ratio != 1:
            self.block.add_module('exp_1x1', conv_2d(inp, hidden_dim, kernel_size=1, stride=1))
        self.block.add_module('conv_3x3', conv_2d(hidden_dim, hidden_dim, kernel_size=3, stride=stride, groups=hidden_dim))
        self.block.add_module('red_1x1', conv_2d(hidden_dim, oup, kernel_size=1, stride=1, act=act))
        self.use_res_connect = self.stride == 1 and inp == oup
 
    def forward(self, x):
        if self.use_res_connect:
            return x + self.block(x)
        else:
            return self.block(x)
 
class UniversalInvertedBottleneckBlock(nn.Module):
    def __init__(self, 
            inp, 
            oup, 
            start_dw_kernel_size, 
            middle_dw_kernel_size, 
            middle_dw_downsample,
            stride,
            expand_ratio
        ):
        super().__init__()
        # Starting depthwise conv.
        self.start_dw_kernel_size = start_dw_kernel_size
        if self.start_dw_kernel_size:            
            stride_ = stride if not middle_dw_downsample else 1
            self._start_dw_ = conv_2d(inp, inp, kernel_size=start_dw_kernel_size, stride=stride_, groups=inp, act=False)
        # Expansion with 1x1 convs.
        expand_filters = make_divisible(inp * expand_ratio, 8)
        self._expand_conv = conv_2d(inp, expand_filters, kernel_size=1)
        # Middle depthwise conv.
        self.middle_dw_kernel_size = middle_dw_kernel_size
        if self.middle_dw_kernel_size:
            stride_ = stride if middle_dw_downsample else 1
            self._middle_dw = conv_2d(expand_filters, expand_filters, kernel_size=middle_dw_kernel_size, stride=stride_, groups=expand_filters)
        # Projection with 1x1 convs.
        self._proj_conv = conv_2d(expand_filters, oup, kernel_size=1, stride=1, act=False)
        
        # Ending depthwise conv.
        # this not used
        # _end_dw_kernel_size = 0
        # self._end_dw = conv_2d(oup, oup, kernel_size=_end_dw_kernel_size, stride=stride, groups=inp, act=False)
        
    def forward(self, x):
        if self.start_dw_kernel_size:
            x = self._start_dw_(x)
            # print("_start_dw_", x.shape)
        x = self._expand_conv(x)
        # print("_expand_conv", x.shape)
        if self.middle_dw_kernel_size:
            x = self._middle_dw(x)
            # print("_middle_dw", x.shape)
        x = self._proj_conv(x)
        # print("_proj_conv", x.shape)
        return x
 
def build_blocks(layer_spec):
    if not layer_spec.get('block_name'):
        return nn.Sequential()
    block_names = layer_spec['block_name']
    layers = nn.Sequential()
    if block_names == "convbn":
        schema_ = ['inp', 'oup', 'kernel_size', 'stride']
        args = {}
        for i in range(layer_spec['num_blocks']):
            args = dict(zip(schema_, layer_spec['block_specs'][i]))
            layers.add_module(f"convbn_{i}", conv_2d(**args))
    elif block_names == "uib":
        schema_ =  ['inp', 'oup', 'start_dw_kernel_size', 'middle_dw_kernel_size', 'middle_dw_downsample', 'stride', 'expand_ratio']
        args = {}
        for i in range(layer_spec['num_blocks']):
            args = dict(zip(schema_, layer_spec['block_specs'][i]))
            layers.add_module(f"uib_{i}", UniversalInvertedBottleneckBlock(**args))
    elif block_names == "fused_ib":
        schema_ = ['inp', 'oup', 'stride', 'expand_ratio', 'act']
        args = {}
        for i in range(layer_spec['num_blocks']):
            args = dict(zip(schema_, layer_spec['block_specs'][i]))
            layers.add_module(f"fused_ib_{i}", InvertedResidual(**args))
    else:
        raise NotImplementedError
    return layers
 
 
class MobileNetV4(nn.Module):
    def __init__(self, model):
        # MobileNetV4ConvSmall  MobileNetV4ConvMedium  MobileNetV4ConvLarge
        # MobileNetV4HybridMedium  MobileNetV4HybridLarge
        """Params to initiate MobilenNetV4
        Args:
            model : support 5 types of models as indicated in 
            "https://github.com/tensorflow/models/blob/master/official/vision/modeling/backbones/mobilenet.py"        
        """
        super().__init__()
        assert model in MODEL_SPECS.keys()
        self.model = model
        self.spec = MODEL_SPECS[self.model]
       
        # conv0
        self.conv0 = build_blocks(self.spec['conv0'])
        # layer1
        self.layer1 = build_blocks(self.spec['layer1'])
        # layer2
        self.layer2 = build_blocks(self.spec['layer2'])
        # layer3
        self.layer3 = build_blocks(self.spec['layer3'])
        # layer4
        self.layer4 = build_blocks(self.spec['layer4'])
        # layer5   
        self.layer5 = build_blocks(self.spec['layer5'])
        self.features = nn.ModuleList([self.conv0, self.layer1, self.layer2, self.layer3, self.layer4, self.layer5])     
        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]
        
    def forward(self, x):
        input_size = x.size(2)
        scale = [4, 8, 16, 32]
        features = [None, None, None, None]
        for f in self.features:
            x = f(x)
            if input_size // x.size(2) in scale:
                features[scale.index(input_size // x.size(2))] = x
        return features
 
def MobileNetV4ConvSmall():
    model = MobileNetV4('MobileNetV4ConvSmall')
    return model
 
def MobileNetV4ConvMedium():
    model = MobileNetV4('MobileNetV4ConvMedium')
    return model
 
def MobileNetV4ConvLarge():
    model = MobileNetV4('MobileNetV4ConvLarge')
    return model
 
def MobileNetV4HybridMedium():
    model = MobileNetV4('MobileNetV4HybridMedium')
    return model
 
def MobileNetV4HybridLarge():
    model = MobileNetV4('MobileNetV4HybridLarge')
    return model

2.2 新增 yaml文件

在下/yolov5/models下新建文件 **yolov5_MobileNetv4.yaml**并将下面代码复制进去

# YOLOv5 🚀 by Ultralytics, AGPL-3.0 license
 
# Parameters
nc: 80  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32
 
# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, MobileNetV4ConvSmall, []],#4
   [-1, 1, SPPF, [1024, 5]],  # 5
  ]
 
# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 3], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 9
 
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 2], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 13 (P3/8-small)
 
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 16 (P4/16-medium)
 
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 5], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 19 (P5/32-large)
 
   [[13, 16, 19], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

2.3 注册模块

替换yolo.py的parse_model函数

def parse_model(d, ch):  # model_dict, input_channels(3)
    # Parse a YOLOv5 model.yaml dictionary
    LOGGER.info(f"\n{'':>3}{'from':>18}{'n':>3}{'params':>10}  {'module':<40}{'arguments':<30}")
    anchors, nc, gd, gw, act = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple'], d.get('activation')
    if act:
        Conv.default_act = eval(act)  # redefine default activation, i.e. Conv.default_act = nn.SiLU()
        LOGGER.info(f"{colorstr('activation:')} {act}")  # print
    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors
    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)
 
    is_backbone = False
    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args
        try:
            t = m
            m = eval(m) if isinstance(m, str) else m  # eval strings
        except:
            pass
        for j, a in enumerate(args):
            with contextlib.suppress(NameError):
                try:
                    args[j] = eval(a) if isinstance(a, str) else a  # eval strings
                except:
                    args[j] = a
 
        n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain
        if m in {
                Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,
                BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x}:
            c1, c2 = ch[f], args[0]
            if c2 != no:  # if not output
                c2 = make_divisible(c2 * gw, 8)
 
            args = [c1, c2, *args[1:]]
            if m in {BottleneckCSP, C3, C3TR, C3Ghost, C3x}:
                args.insert(2, n)  # number of repeats
                n = 1
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum(ch[x] for x in f)
        # TODO: channel, gw, gd
        elif m in {Detect, Segment}:
            args.append([ch[x] for x in f])
            if isinstance(args[1], int):  # number of anchors
                args[1] = [list(range(args[1] * 2))] * len(f)
            if m is Segment:
                args[3] = make_divisible(args[3] * gw, 8)
        elif m is Contract:
            c2 = ch[f] * args[0] ** 2
        elif m is Expand:
            c2 = ch[f] // args[0] ** 2
        elif isinstance(m, str): 
            t = m
            m = timm.create_model(m, pretrained=args[0], features_only=True)
            c2 = m.feature_info.channels()
        elif m in {MobileNetV4ConvSmall}:
            m = m(*args)
            c2 = m.channel
        else:
            c2 = ch[f]
        if isinstance(c2, list):
            is_backbone = True
            m_ = m
            m_.backbone = True
        else:
            m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
            t = str(m)[8:-2].replace('__main__.', '')  # module type
        np = sum(x.numel() for x in m_.parameters())  # number params
        m_.i, m_.f, m_.type, m_.np = i + 4 if is_backbone else i, f, t, np  # attach index, 'from' index, type, number params
        LOGGER.info(f'{i:>3}{str(f):>18}{n_:>3}{np:10.0f}  {t:<40}{str(args):<30}')  # print
        save.extend(x % (i + 4 if is_backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        if i == 0:
            ch = []
        if isinstance(c2, list):
            ch.extend(c2)
            for _ in range(5 - len(ch)):
                ch.insert(0, 0)
        else:
            ch.append(c2)
    return nn.Sequential(*layers), sorted(save)

2.4 替换函数

替换yolo.py的_forward_once函数

    def _forward_once(self, x, profile=False, visualize=False):
        y, dt = [], []  # outputs
        for m in self.model:
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            if hasattr(m, 'backbone'):
                x = m(x)
                for _ in range(5 - len(x)):
                    x.insert(0, None)
                for i_idx, i in enumerate(x):
                    if i_idx in self.save:
                        y.append(i)
                    else:
                        y.append(None)
                x = x[-1]
            else:
                x = m(x)  # run
                y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
        return x

2.5 执行程序

自行替换 --data 数据集，cfg/classification.yaml
自行替换 --weights 模型权重， weights/yolov5s.pt

nohup python train.py --batch-size 16 --epochs 200 --cfg models/yolov5_mobilenetv4.yaml --data cfg/classification.yaml --weights weights/yolov5s.pt --device 0 > myout.file 2>&1 &