【YOLOv8】YOLOv8 Improvement Series (4): Replacing the Bottleneck in C2f with the FasterBlock from FasterNet

HABuo

Last modified: 2025-03-24 23:06:21

First published: 2025-03-08 11:23:16

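Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.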

Article link: https://blog.csdn.net/qq_53706413/article/details/146107582

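[YOLOv8 Improvement Series]:

YOLOv8 Improvement Series (2): Replacing the Backbone with FasterNet

YOLOv8 Improvement Series (3): Replacing the Backbone with ConvNeXt V2

YOLOv8 Improvement Series (4): Replacing the Bottleneck in C2f with the FasterBlock from FasterNet

YOLOv8 Improvement Series (5): Replacing the Backbone with EfficientFormerV2

YOLOv8 Improvement Series (6): Replacing the Backbone with VanillaNet

YOLOv8 Improvement Series (7): Replacing the Backbone with LSKNet

YOLOv8 Improvement Series (8): Replacing the Backbone with Swin Transformer

YOLOv8 Improvement Series (9): Replacing the Backbone with RepViT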

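Table of Contents

💯 I. Introduction to FasterNet

2.1. The Problem with DWConv

2.2. Partial Convolution (PConv)

3. FasterNet Architecture

💯 II. How to Add It

Step ①: Locate block.py

Step ②: Declare the modules

(1) Locate block.py

(2) Locate __init__.py

(3) Locate tasks.py

Step ③: Modify the yolov8.yaml file

Step ④: Verify that the module was added successfully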

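💯 I. Introduction to FasterNet

Paper title: "Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks"
Paper URL: https://arxiv.org/pdf/2303.03667

1. Overview

The paper proposes a new neural network architecture, FasterNet, which aims for faster networks by raising the achieved floating-point operations per second (FLOPS) without sacrificing accuracy. Revisiting popular convolution operators, the authors find that operators such as depthwise convolution (DWConv) do reduce the number of floating-point operations (FLOPs), but their frequent memory access leads to low FLOPS. To address this, they propose partial convolution (PConv), which improves computational efficiency by cutting redundant computation and memory access at the same time. Built on PConv, FasterNet runs substantially faster on a variety of devices while maintaining high accuracy on a range of vision tasks.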

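2. PConv

2.1. The Problem with DWConv

Memory access: DWConv reduces FLOPs, but its frequent memory access results in low FLOPS.
Computational complexity: to compensate for the accuracy drop, DWConv-based networks usually increase the network width, which further increases memory access.

2.2. Partial Convolution (PConv)

Design: PConv applies a convolution to only a subset of the input channels and leaves the remaining channels untouched, which reduces both redundant computation and memory access.
Advantage: compared with a regular convolution, PConv has lower FLOPs; compared with DWConv/GConv, PConv achieves higher FLOPS and makes better use of the device's compute capability.
Implementation: PConv exploits the redundancy between feature-map channels by convolving only part of them, and a pointwise convolution (PWConv) then aggregates information across all channels. Together they form a T-shaped receptive field that concentrates on the center position.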
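
To make the FLOPs comparison concrete, here is a small back-of-the-envelope calculation. It is only a sketch under stated assumptions: it uses the cost model from the FasterNet paper for a k×k convolution on an h×w feature map with c channels (regular convolution: h·w·k²·c², DWConv: h·w·k²·c, PConv over c_p channels: h·w·k²·c_p²), ignores bias and normalization, and picks a hypothetical 56×56 feature map with 64 channels and a partial ratio of 1/4.

```python
def conv_flops(h, w, k, c_in, c_out):
    """Multiply-accumulate count of a dense k x k convolution (no bias)."""
    return h * w * k * k * c_in * c_out


def dwconv_flops(h, w, k, c):
    """Depthwise convolution: one k x k filter per channel."""
    return h * w * k * k * c


def pconv_flops(h, w, k, c, n_div=4):
    """Partial convolution: a dense k x k conv on only c // n_div channels."""
    c_p = c // n_div
    return conv_flops(h, w, k, c_p, c_p)


# Hypothetical feature-map size, chosen only for illustration.
h, w, k, c = 56, 56, 3, 64
regular = conv_flops(h, w, k, c, c)
depthwise = dwconv_flops(h, w, k, c)
partial = pconv_flops(h, w, k, c)
ratio = partial / regular

print(f"regular conv: {regular:,}")
print(f"DWConv:       {depthwise:,}")
print(f"PConv (1/4):  {partial:,}")
print(f"PConv / regular = {ratio}")
```

With a partial ratio of 1/4, PConv needs 1/16 of the FLOPs of a regular convolution. Under the same cost model its memory access scales with the number of convolved channels, so it drops to roughly 1/4. Note that PConv's FLOPs are not necessarily lower than DWConv's; the claimed advantage over DWConv is in achieved FLOPS, not in the operation count.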

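3. FasterNet Architecture

Structure: FasterNet is built on PConv and consists of four hierarchical stages. Each stage is preceded by an embedding layer or a merging layer that performs spatial downsampling and channel expansion, followed by several FasterNet blocks.
Features:
- Hardware friendly: the design is simple and works well on many kinds of devices (GPU, CPU, ARM processors).
- Efficient computation: combining PConv with PWConv gives efficient feature extraction and information aggregation.
- Multiple variants: FasterNet comes in sizes from Tiny to Large to fit different compute budgets.
Implementation details:
- Embedding layer: a 4×4 convolution used for spatial downsampling.
- Merging layer: a 2×2 convolution used for downsampling and channel expansion between stages.
- FasterNet block: each block contains one PConv layer followed by two PWConv layers, forming an inverted residual block.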
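
The downsampling schedule can be checked with the standard convolution output-size formula. The sketch below rests on a few assumptions that are not stated in the text above: stride equals kernel size for both layers (4 for the embedding layer, 2 for each merging layer), the input is a hypothetical 224×224 image, and the width starts at a hypothetical 40 channels and doubles at every merging layer.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_shapes(input_size=224, base_width=40, num_stages=4):
    """Return (channels, height, width) at the input of each stage."""
    shapes = []
    size = conv_out(input_size, kernel=4, stride=4)  # embedding layer
    width = base_width
    shapes.append((width, size, size))
    for _ in range(num_stages - 1):
        size = conv_out(size, kernel=2, stride=2)  # merging layer
        width *= 2  # assumed channel doubling per stage
        shapes.append((width, size, size))
    return shapes


shapes = stage_shapes()
for stage, shape in enumerate(shapes, start=1):
    print(f"stage {stage}: channels={shape[0]}, resolution={shape[1]}x{shape[2]}")
```

Under these assumptions the four stages run at 1/4, 1/8, 1/16 and 1/32 of the input resolution, which is the usual feature pyramid a detector such as YOLOv8 expects.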

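4. FasterBlock

FasterBlock is the core building block of FasterNet. It combines partial convolution (PConv) with pointwise convolution (PWConv) for efficient feature extraction and information aggregation. Its design and function are described below.
The basic structure of a FasterBlock:
- PConv layer: a partial convolution that is applied to only a subset of the input channels while the other channels are left unchanged.
- Two PWConv layers: pointwise convolutions that transform features and aggregate information along the channel dimension.
- Residual connection: added at the end of the block to help gradient flow and improve training stability.
PConv layer:
- Partial convolution: PConv convolves only part of the input channels, which reduces redundant computation and memory access. Concretely, it selects a fraction of the channels (for example 1/4) for the convolution and passes the rest through unchanged.
- Computational efficiency: the computational cost drops significantly, and the reduced memory access raises the achieved FLOPS.
PWConv layers:
- Pointwise convolution: a PWConv is a 1×1 convolution that transforms features and aggregates information along the channel dimension. It spreads the spatial features extracted by PConv across all channels.
- Two PWConv layers: the first expands the number of channels and the second reduces it again, forming an inverted residual block.
Residual connection:
- The input feature map is added to the output feature map at the end of the block, which helps gradient flow and improves training stability.
The function of each part:
Feature extraction: the PConv layer extracts spatial features from part of the channels, reducing redundant computation and memory access.
Information aggregation: the first PWConv layer expands the channels and the second reduces them, forming an inverted residual block that aggregates information across channels.
Residual connection: the shortcut helps gradient flow and improves training stability.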
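
The channel handling of PConv can be illustrated without any deep-learning framework. The toy below is only an analogy, not the actual layer: each "channel" is a plain number, and the hypothetical `op` argument stands in for the 3×3 convolution. It shows that only the first 1/n_div of the channels are transformed while the rest pass through untouched.

```python
def partial_apply(channels, op, n_div=4):
    """Apply `op` to the first len(channels) // n_div entries, keep the rest."""
    n_conv = len(channels) // n_div
    touched = [op(c) for c in channels[:n_conv]]  # "convolved" channels
    untouched = channels[n_conv:]                 # identity path
    return touched + untouched


# Eight toy channels; with n_div=4 only the first two are transformed.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = partial_apply(x, op=lambda c: c * 10)
print(y)
```

In the real block, the two PWConv layers that follow are what mix the transformed and untouched channels together.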

💯 II. How to Add It

Step ①: Locate block.py

Once you have found the file, copy the code below directly to the end of it:

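# Appended to the end of block.py, so torch, nn, Conv, C3 and C2f are already in scope there.
from timm.models.layers import DropPath  # requires timm; newer timm releases also expose this as timm.layers.DropPath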
class Partial_conv3(nn.Module):
    def __init__(self, dim, n_div=4, forward='split_cat'):
        super().__init__()
        self.dim_conv3 = dim // n_div
        self.dim_untouched = dim - self.dim_conv3
        self.partial_conv3 = nn.Conv2d(self.dim_conv3, self.dim_conv3, 3, 1, 1, bias=False)

        if forward == 'slicing':
            self.forward = self.forward_slicing
        elif forward == 'split_cat':
            self.forward = self.forward_split_cat
        else:
            raise NotImplementedError

    def forward_slicing(self, x):
        # only for inference
        x = x.clone()   # !!! Keep the original input intact for the residual connection later
        x[:, :self.dim_conv3, :, :] = self.partial_conv3(x[:, :self.dim_conv3, :, :])
        return x

    def forward_split_cat(self, x):
        # for training/inference
        x1, x2 = torch.split(x, [self.dim_conv3, self.dim_untouched], dim=1)
        x1 = self.partial_conv3(x1)
        x = torch.cat((x1, x2), 1)
        return x

class Faster_Block(nn.Module):
    def __init__(self,
                 inc,
                 dim,
                 n_div=4,
                 mlp_ratio=2,
                 drop_path=0.1,
                 layer_scale_init_value=0.0,
                 pconv_fw_type='split_cat'
                 ):
        super().__init__()
        self.dim = dim
        self.mlp_ratio = mlp_ratio
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.n_div = n_div

        mlp_hidden_dim = int(dim * mlp_ratio)

        mlp_layer = [
            Conv(dim, mlp_hidden_dim, 1),
            nn.Conv2d(mlp_hidden_dim, dim, 1, bias=False)
        ]

        self.mlp = nn.Sequential(*mlp_layer)

        self.spatial_mixing = Partial_conv3(
            dim,
            n_div,
            pconv_fw_type
        )
        
        self.adjust_channel = None
        if inc != dim:
            self.adjust_channel = Conv(inc, dim, 1)

        if layer_scale_init_value > 0:
            self.layer_scale = nn.Parameter(layer_scale_init_value * torch.ones((dim)), requires_grad=True)
            self.forward = self.forward_layer_scale  # otherwise the default forward() below is used

    def forward(self, x):
        if self.adjust_channel is not None:
            x = self.adjust_channel(x)
        shortcut = x
        x = self.spatial_mixing(x)
        x = shortcut + self.drop_path(self.mlp(x))
        return x

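    def forward_layer_scale(self, x):
        if self.adjust_channel is not None:
            x = self.adjust_channel(x)  # same channel alignment as in forward()
        shortcut = x
        x = self.spatial_mixing(x)
        x = shortcut + self.drop_path(
            self.layer_scale.unsqueeze(-1).unsqueeze(-1) * self.mlp(x))
        return x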

class C3_Faster(C3):
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        self.m = nn.Sequential(*(Faster_Block(c_, c_) for _ in range(n)))

class C2f_Faster(C2f):
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(Faster_Block(self.c, self.c) for _ in range(n))
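
A rough parameter count shows why this swap makes the module lighter. The sketch below counts convolution weights only (BatchNorm parameters are ignored) and assumes that the stock C2f Bottleneck consists of two 3×3 convolutions with equal input and output width, while `Faster_Block` uses the defaults from the code above (`n_div=4`, `mlp_ratio=2`). The width of 64 channels is a hypothetical example, and both helper functions exist only for this illustration.

```python
def bottleneck_conv_params(c, k=3):
    """Two dense k x k convolutions, c -> c each (assumed C2f Bottleneck layout)."""
    return 2 * k * k * c * c


def faster_block_conv_params(c, n_div=4, mlp_ratio=2, k=3):
    """One partial k x k conv plus an expand/reduce pair of 1 x 1 convs."""
    c_p = c // n_div
    hidden = int(c * mlp_ratio)
    pconv = k * k * c_p * c_p   # 3x3 conv on a quarter of the channels
    pw_expand = c * hidden      # first 1x1 conv
    pw_reduce = hidden * c      # second 1x1 conv
    return pconv + pw_expand + pw_reduce


c = 64  # hypothetical block width
old = bottleneck_conv_params(c)
new = faster_block_conv_params(c)
print(f"Bottleneck:   {old}")
print(f"Faster_Block: {new}")
print(f"ratio:        {new / old:.3f}")
```

Under these assumptions a `Faster_Block` carries roughly a quarter of the convolution weights of the Bottleneck it replaces. Whether that translates into a real speed-up depends on the hardware and on the rest of the network.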

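Step ②: Declare the modules

(1) Locate block.py

(2) Locate __init__.py

(3) Locate tasks.py

In each of these files the new classes have to be made visible. Typically this means adding their names to `__all__` in block.py, importing and re-exporting them in `__init__.py`, and importing them in tasks.py.

Modify the parse_model function.

You can paste the code below directly over the corresponding function: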

def parse_model(d, ch, verbose=True):  # model_dict, input_channels(3)
    """
    Parse a YOLO model.yaml dictionary into a PyTorch model.

    Args:
        d (dict): Model dictionary.
        ch (int): Input channels.
        verbose (bool): Whether to print model details.

    Returns:
        (tuple): Tuple containing the PyTorch model and sorted list of output layers.
    """
    import ast

    # Args
    max_channels = float("inf")
    nc, act, scales = (d.get(x) for x in ("nc", "activation", "scales"))
    depth, width, kpt_shape = (d.get(x, 1.0) for x in ("depth_multiple", "width_multiple", "kpt_shape"))
    if scales:
        scale = d.get("scale")
        if not scale:
            scale = tuple(scales.keys())[0]
            LOGGER.warning(f"WARNING ⚠️ no model scale passed. Assuming scale='{scale}'.")
        if len(scales[scale]) == 3:
            depth, width, max_channels = scales[scale]
        elif len(scales[scale]) == 4:
            depth, width, max_channels, threshold = scales[scale]

    if act:
        Conv.default_act = eval(act)  # redefine default activation, i.e. Conv.default_act = nn.SiLU()
        if verbose:
            LOGGER.info(f"{colorstr('activation:')} {act}")  # print

    if verbose:
        LOGGER.info(f"\n{'':>3}{'from':>20}{'n':>3}{'params':>10}  {'module':<60}{'arguments':<50}")
    ch = [ch]
    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    is_backbone = False
    for i, (f, n, m, args) in enumerate(d["backbone"] + d["head"]):  # from, number, module, args
        try:
            if m == 'node_mode':
                m = d[m]
                if len(args) > 0:
                    if args[0] == 'head_channel':
                        args[0] = int(d[args[0]])
            t = m
            m = getattr(torch.nn, m[3:]) if 'nn.' in m else globals()[m]  # get module
        except Exception:
            pass  # unresolved names stay as strings and are handled by the timm branch below
        for j, a in enumerate(args):
            if isinstance(a, str):
                with contextlib.suppress(ValueError):
                    try:
                        args[j] = locals()[a] if a in locals() else ast.literal_eval(a)
                    except:
                        args[j] = a
        n = n_ = max(round(n * depth), 1) if n > 1 else n  # depth gain
        if m in {
            Classify, Conv, ConvTranspose, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, Focus,
            BottleneckCSP, C1, C2, C2f, ELAN1, AConv, SPPELAN, C2fAttn, C3, C3TR,
            C3Ghost, nn.Conv2d, nn.ConvTranspose2d, DWConvTranspose2d, C3x, RepC3, PSA, SCDown, C2fCIB,
            C2f_Faster
        }:
            if args[0] == 'head_channel':
                args[0] = d[args[0]]
            c1, c2 = ch[f], args[0]
            if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
                c2 = make_divisible(min(c2, max_channels) * width, 8)
            if m is C2fAttn:
                args[1] = make_divisible(min(args[1], max_channels // 2) * width, 8)  # embed channels
                args[2] = int(
                    max(round(min(args[2], max_channels // 2 // 32)) * width, 1) if args[2] > 1 else args[2]
                )  # num heads

            args = [c1, c2, *args[1:]]
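            # keep the stock repeat-aware modules here so that unmodified YAML files still build as before
            if m in {BottleneckCSP, C1, C2, C2f, C2fAttn, C3, C3TR, C3Ghost, C3x, RepC3, C2fCIB, C2f_Faster}: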
                args.insert(2, n)  # number of repeats
                n = 1

        elif m in {AIFI}:
            args = [ch[f], *args]
            c2 = args[0]
        elif m in (HGStem, HGBlock):
            c1, cm, c2 = ch[f], args[0], args[1]
            if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
                c2 = make_divisible(min(c2, max_channels) * width, 8)
                cm = make_divisible(min(cm, max_channels) * width, 8)
            args = [c1, cm, c2, *args[2:]]
            if m is HGBlock:
                args.insert(4, n)  # number of repeats
                n = 1
        elif m is ResNetLayer:
            c2 = args[1] if args[3] else args[1] * 4
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum(ch[x] for x in f)
        elif m in frozenset({Detect, WorldDetect, Segment, Pose, OBB, ImagePoolingAttn, v10Detect}):
            args.append([ch[x] for x in f])
        elif m is RTDETRDecoder:  # special case, channels arg must be passed in index 1
            args.insert(1, [ch[x] for x in f])
        elif m is CBLinear:
            c2 = make_divisible(min(args[0][-1], max_channels) * width, 8)
            c1 = ch[f]
            args = [c1, [make_divisible(min(c2_, max_channels) * width, 8) for c2_ in args[0]], *args[1:]]
        elif m is CBFuse:
            c2 = ch[f[-1]]
        elif isinstance(m, str):
            t = m
            if len(args) == 2:
                m = timm.create_model(m, pretrained=args[0], pretrained_cfg_overlay={'file': args[1]},
                                      features_only=True)
            elif len(args) == 1:
                m = timm.create_model(m, pretrained=args[0], features_only=True)
            c2 = m.feature_info.channels()
        # elif m in {SwinTransformer_Tiny
        #            }:
        #     m = m(*args)
        #     c2 = m.channel
        else:
            c2 = ch[f]


        if isinstance(c2, list):
            is_backbone = True
            m_ = m
            m_.backbone = True
        else:
            m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
            t = str(m)[8:-2].replace('__main__.', '')  # module type
        m.np = sum(x.numel() for x in m_.parameters())  # number params
        m_.i, m_.f, m_.type = i + 4 if is_backbone else i, f, t  # attach index, 'from' index, type
        if verbose:
            LOGGER.info(f"{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f}  {t:<60}{str(args):<50}")  # print
        save.extend(x % (i + 4 if is_backbone else i) for x in ([f] if isinstance(f, int) else f) if
                    x != -1)  # append to savelist
        layers.append(m_)
        if i == 0:
            ch = []
        if isinstance(c2, list):
            ch.extend(c2)
            for _ in range(5 - len(ch)):
                ch.insert(0, 0)
        else:
            ch.append(c2)
    return nn.Sequential(*layers), sorted(save)

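The change that matters for this article: `C2f_Faster` is added to the set of modules whose output channels are scaled, and it receives the repeat count `n` as its third argument.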
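
To see what this does to a single YAML entry, the following standalone sketch mirrors the scaling arithmetic of the function above. `make_divisible` here is a simplified stand-in for the Ultralytics helper of the same name (assumed to round up to a multiple of the divisor), and `scale_entry` is a hypothetical helper written only for this illustration.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor (simplified stand-in)."""
    return math.ceil(x / divisor) * divisor


def scale_entry(c1, n, args, depth, width, max_channels):
    """Turn a YAML entry's (repeats, args) into constructor arguments."""
    n = max(round(n * depth), 1) if n > 1 else n           # depth gain
    c2 = make_divisible(min(args[0], max_channels) * width, 8)
    new_args = [c1, c2, *args[1:]]
    new_args.insert(2, n)                                   # number of repeats
    return new_args


# Backbone entry [-1, 3, C2f_Faster, [128, True]] at scale n
# (depth 0.33, width 0.25); the previous Conv outputs 128 * 0.25 = 32 channels.
first = scale_entry(c1=32, n=3, args=[128, True], depth=0.33, width=0.25, max_channels=1024)
# Backbone entry [-1, 6, C2f_Faster, [256, True]] at the same scale.
second = scale_entry(c1=64, n=6, args=[256, True], depth=0.33, width=0.25, max_channels=1024)
print(first)
print(second)
```

The resulting lists correspond to the positional arguments `(c1, c2, n, shortcut)` of `C2f_Faster`, so the repeats happen inside one module instead of stacking several copies in an `nn.Sequential`.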

Step ③: Modify the yolov8.yaml file

Create yolov8-C2f-Faster.yaml alongside the existing YOLOv8 model configuration files, with the following content:

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f_Faster, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, C2f_Faster, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, C2f_Faster, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32
  - [-1, 3, C2f_Faster, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]  # 9

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f_Faster, [512]]  # 12

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f_Faster, [256]]  # 15 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f_Faster, [512]]  # 18 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f_Faster, [1024]]  # 21 (P5/32-large)

  - [[15, 18, 21], 1, Detect, [nc]]  # Detect(P3, P4, P5)
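
A quick way to sanity-check this YAML before training is to trace the channel counts by hand. The sketch below does so for scale `n` (width multiplier 0.25) with a hand-copied, simplified version of the layer table. `trace_channels` is a hypothetical helper, not part of Ultralytics, and it only understands the module names used in this file.

```python
import math

# (from, module, out_channels) copied from the YAML above; Detect is omitted.
LAYERS = [
    (-1, "Conv", 64), (-1, "Conv", 128), (-1, "C2f_Faster", 128),
    (-1, "Conv", 256), (-1, "C2f_Faster", 256),
    (-1, "Conv", 512), (-1, "C2f_Faster", 512),
    (-1, "Conv", 1024), (-1, "C2f_Faster", 1024), (-1, "SPPF", 1024),
    (-1, "nn.Upsample", None), ([-1, 6], "Concat", None), (-1, "C2f_Faster", 512),
    (-1, "nn.Upsample", None), ([-1, 4], "Concat", None), (-1, "C2f_Faster", 256),
    (-1, "Conv", 256), ([-1, 12], "Concat", None), (-1, "C2f_Faster", 512),
    (-1, "Conv", 512), ([-1, 9], "Concat", None), (-1, "C2f_Faster", 1024),
]


def trace_channels(layers, width=0.25, max_channels=1024):
    """Return the output channel count of every layer in order."""
    out = []
    for src, module, c2 in layers:
        if module == "Concat":
            ch = sum(out[j] for j in src)  # -1 refers to the previous layer
        elif module == "nn.Upsample":
            ch = out[src]                  # upsampling keeps the channel count
        else:
            ch = math.ceil(min(c2, max_channels) * width / 8) * 8
        out.append(ch)
    return out


channels = trace_channels(LAYERS)
detect_inputs = [channels[i] for i in (15, 18, 21)]
print("Concat outputs:", channels[11], channels[14], channels[17], channels[20])
print("Detect inputs (P3, P4, P5):", detect_inputs)
```

If the traced Detect inputs match what the model summary reports at startup, the `from` indices in the head still point at the intended layers.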

Step ④: Verify that the module was added successfully

Point the configuration file in train.py at the new YAML file and run it.
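
A minimal sketch of such a training script is shown below, assuming the `ultralytics` package from the modified checkout is importable. The YAML path, the dataset name `coco128.yaml` and the hyperparameters are placeholder assumptions, and `train_c2f_faster` is a hypothetical wrapper; adjust all of them to your own setup.

```python
def train_c2f_faster(cfg="yolov8-C2f-Faster.yaml", data="coco128.yaml", epochs=1, imgsz=640):
    """Build the model from the custom YAML and start a short training run."""
    # Imported lazily so this helper can be defined even where ultralytics is absent.
    from ultralytics import YOLO

    model = YOLO(cfg)  # builds the network through parse_model
    model.info()       # prints a model summary
    return model.train(data=data, epochs=epochs, imgsz=imgsz)
```

Call `train_c2f_faster()` from train.py. If the layer table logged while the model is built lists `C2f_Faster` entries, the module was registered correctly. Since the YAML defines `scales` and no scale is passed, the parse_model code above falls back to the first entry (`n`) and logs a warning.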
