【YOLOv8】YOLOv8改进系列(4)----替换C2f之FasterNet中的FasterBlock替换C2f中的Bottleneck

主页:HABUO🍁主页:HABUO

🍁YOLOv8入门+改进专栏🍁

🍁如果再也不能见到你,祝你早安,午安,晚安🍁


  【YOLOv8改进系列】: 

 

YOLOv8改进系列(2)----替换主干网络之FasterNet

YOLOv8改进系列(3)----替换主干网络之ConvNeXt V2

YOLOv8改进系列(4)----替换C2f之FasterNet中的FasterBlock替换C2f中的Bottleneck 

YOLOv8改进系列(5)----替换主干网络之EfficientFormerV2 

YOLOv8改进系列(6)----替换主干网络之VanillaNet

YOLOv8改进系列(7)----替换主干网络之LSKNet

YOLOv8改进系列(8)----替换主干网络之Swin Transformer 

YOLOv8改进系列(9)----替换主干网络之RepViT


目录

💯一、FasterNet介绍

1. 简介

2. PConv

2.1. DWConv的问题

2.2. 部分卷积(PConv)

3. FasterNet架构

4. FasterBlock

💯二、具体添加方法 

第①步:定位到block.py

第②步:进行声明 

(1)定位到block.py

​(2)定位到_init_.py

(3)定位到tasks.py 

第③步:yolov8.yaml文件修改   

第④步:验证是否加入成功  


💯一、FasterNet介绍

1. 简介

论文提出了一种新的神经网络架构 FasterNet,旨在通过提高浮点运算每秒(FLOPS)来实现更快的网络速度,同时不牺牲准确性。通过重新审视流行的卷积操作,发现深度可分离卷积(DWConv)等操作虽然减少了浮点运算(FLOPs),但频繁的内存访问导致了低效的FLOPS。为此,作者提出了一种新的部分卷积(PConv),通过减少冗余计算和内存访问,提高了计算效率。基于PConv,FasterNet在多种设备上实现了显著更高的运行速度,并在各种视觉任务上保持了高准确性。

2. PConv

2.1. DWConv的问题

  • 内存访问:DWConv虽然减少了FLOPs,但频繁的内存访问导致了低效的FLOPS。

  • 计算复杂度:为了补偿精度损失,DWConv通常需要增加网络宽度,这进一步增加了内存访问。

2.2. 部分卷积(PConv)

  • 设计:PConv通过仅在部分输入通道上应用卷积,同时保持其他通道不变,从而减少冗余计算和内存访问。

  • 优势:相比常规卷积,PConv的FLOPs更低,而相比DWConv/GConv,PConv的FLOPS更高,能更有效地利用设备的计算能力。

  • 实现:PConv通过利用特征图中的冗余信息,仅对部分通道进行卷积,然后通过逐点卷积(PWConv)聚合信息,形成T形的接受野,集中处理中心位置。


3. FasterNet架构

  • 结构:FasterNet基于PConv构建,包含四个层次化的阶段,每个阶段由嵌入层或合并层进行空间下采样和通道扩展,随后是多个FasterNet块。

  • 特点

    • 硬件友好:设计简洁,适用于多种设备(GPU、CPU、ARM处理器)。

    • 高效计算:通过PConv和PWConv的组合,实现了高效的特征提取和信息聚合。

    • 多种变体:提供了从Tiny到Large不同规模的FasterNet变体,以适应不同的计算预算。

  • 具体实现

    • 嵌入层:使用4×4卷积进行空间下采样。

    • 合并层:使用2×2卷积进行通道扩展。

    • FasterNet块:每个块包含一个PConv层,后接两个PWConv层,形成倒残差块结构。


4. FasterBlock

  • FasterBlock 是 FasterNet 的核心构建模块,它结合了部分卷积(PConv)和逐点卷积(PWConv)来实现高效的特征提取和信息聚合。以下是 FasterBlock 的详细设计和功能:

  • FasterBlock 的基本结构如下:

    • PConv 层:部分卷积层,仅在部分输入通道上应用卷积,同时保持其他通道不变。

    • 两个 PWConv 层:逐点卷积层,用于在通道维度上进行特征变换和信息聚合。

    • 残差连接:在块的末尾使用残差连接,以帮助梯度流动和提高训练稳定性。

  • PConv 层

    • 部分卷积:PConv 通过仅对部分输入通道进行卷积操作,减少了冗余计算和内存访问。具体来说,PConv 选择一部分通道(例如,1/4 的通道)进行卷积,而其他通道保持不变。

    • 计算效率:PConv 的计算复杂度显著降低,同时通过减少内存访问提高了 FLOPS。

  • PWConv 层

    • 逐点卷积:PWConv 是 1×1 卷积,用于在通道维度上进行特征变换和信息聚合。它可以帮助将 PConv 提取的特征信息更好地整合到所有通道中。

    • 两个 PWConv 层:第一个 PWConv 层用于扩展通道数,第二个 PWConv 层用于减少通道数,形成倒残差块结构。

  • 残差连接

    • 残差连接:在块的末尾使用残差连接,将输入特征图与输出特征图相加,以帮助梯度流动和提高训练稳定性。

  • 特征提取:PConv 层通过部分卷积提取空间特征,减少冗余计算和内存访问。

  • 信息聚合:第一个 PWConv 层扩展通道数,第二个 PWConv 层减少通道数,形成倒残差块结构,帮助更好地聚合信息。

  • 残差连接:残差连接帮助梯度流动,提高训练稳定性。


💯二、具体添加方法 

第①步:定位到block.py

找到之后,将下面代码直接复制到末尾:

from timm.models.layers import DropPath
class Partial_conv3(nn.Module):
    def __init__(self, dim, n_div=4, forward='split_cat'):
        super().__init__()
        self.dim_conv3 = dim // n_div
        self.dim_untouched = dim - self.dim_conv3
        self.partial_conv3 = nn.Conv2d(self.dim_conv3, self.dim_conv3, 3, 1, 1, bias=False)

        if forward == 'slicing':
            self.forward = self.forward_slicing
        elif forward == 'split_cat':
            self.forward = self.forward_split_cat
        else:
            raise NotImplementedError

    def forward_slicing(self, x):
        # only for inference
        x = x.clone()   # !!! Keep the original input intact for the residual connection later
        x[:, :self.dim_conv3, :, :] = self.partial_conv3(x[:, :self.dim_conv3, :, :])
        return x

    def forward_split_cat(self, x):
        # for training/inference
        x1, x2 = torch.split(x, [self.dim_conv3, self.dim_untouched], dim=1)
        x1 = self.partial_conv3(x1)
        x = torch.cat((x1, x2), 1)
        return x

class Faster_Block(nn.Module):
    def __init__(self,
                 inc,
                 dim,
                 n_div=4,
                 mlp_ratio=2,
                 drop_path=0.1,
                 layer_scale_init_value=0.0,
                 pconv_fw_type='split_cat'
                 ):
        super().__init__()
        self.dim = dim
        self.mlp_ratio = mlp_ratio
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.n_div = n_div

        mlp_hidden_dim = int(dim * mlp_ratio)

        mlp_layer = [
            Conv(dim, mlp_hidden_dim, 1),
            nn.Conv2d(mlp_hidden_dim, dim, 1, bias=False)
        ]

        self.mlp = nn.Sequential(*mlp_layer)

        self.spatial_mixing = Partial_conv3(
            dim,
            n_div,
            pconv_fw_type
        )
        
        self.adjust_channel = None
        if inc != dim:
            self.adjust_channel = Conv(inc, dim, 1)

        if layer_scale_init_value > 0:
            self.layer_scale = nn.Parameter(layer_scale_init_value * torch.ones((dim)), requires_grad=True)
            self.forward = self.forward_layer_scale
        else:
            self.forward = self.forward

    def forward(self, x):
        if self.adjust_channel is not None:
            x = self.adjust_channel(x)
        shortcut = x
        x = self.spatial_mixing(x)
        x = shortcut + self.drop_path(self.mlp(x))
        return x

    def forward_layer_scale(self, x):
        shortcut = x
        x = self.spatial_mixing(x)
        x = shortcut + self.drop_path(
            self.layer_scale.unsqueeze(-1).unsqueeze(-1) * self.mlp(x))
        return x

class C3_Faster(C3):
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        self.m = nn.Sequential(*(Faster_Block(c_, c_) for _ in range(n)))

class C2f_Faster(C2f):
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(Faster_Block(self.c, self.c) for _ in range(n))

第②步:进行声明 

(1)定位到block.py

​(2)定位到_init_.py

(3)定位到tasks.py 

修改parse_model函数

可以直接把下面的代码粘贴到对应的位置中

def parse_model(d, ch, verbose=True):  # model_dict, input_channels(3)
    """
    Parse a YOLO model.yaml dictionary into a PyTorch model.

    Args:
        d (dict): Model dictionary.
        ch (int): Input channels.
        verbose (bool): Whether to print model details.

    Returns:
        (tuple): Tuple containing the PyTorch model and sorted list of output layers.
    """
    import ast

    # Args
    max_channels = float("inf")
    nc, act, scales = (d.get(x) for x in ("nc", "activation", "scales"))
    depth, width, kpt_shape = (d.get(x, 1.0) for x in ("depth_multiple", "width_multiple", "kpt_shape"))
    if scales:
        scale = d.get("scale")
        if not scale:
            scale = tuple(scales.keys())[0]
            LOGGER.warning(f"WARNING ⚠️ no model scale passed. Assuming scale='{scale}'.")
        if len(scales[scale]) == 3:
            depth, width, max_channels = scales[scale]
        elif len(scales[scale]) == 4:
            depth, width, max_channels, threshold = scales[scale]

    if act:
        Conv.default_act = eval(act)  # redefine default activation, i.e. Conv.default_act = nn.SiLU()
        if verbose:
            LOGGER.info(f"{colorstr('activation:')} {act}")  # print

    if verbose:
        LOGGER.info(f"\n{'':>3}{'from':>20}{'n':>3}{'params':>10}  {'module':<60}{'arguments':<50}")
    ch = [ch]
    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    is_backbone = False
    for i, (f, n, m, args) in enumerate(d["backbone"] + d["head"]):  # from, number, module, args
        try:
            if m == 'node_mode':
                m = d[m]
                if len(args) > 0:
                    if args[0] == 'head_channel':
                        args[0] = int(d[args[0]])
            t = m
            m = getattr(torch.nn, m[3:]) if 'nn.' in m else globals()[m]  # get module
        except:
            pass
        for j, a in enumerate(args):
            if isinstance(a, str):
                with contextlib.suppress(ValueError):
                    try:
                        args[j] = locals()[a] if a in locals() else ast.literal_eval(a)
                    except:
                        args[j] = a
        n = n_ = max(round(n * depth), 1) if n > 1 else n  # depth gain
        if m in {
            Classify, Conv, ConvTranspose, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, Focus,
            BottleneckCSP, C1, C2, C2f, ELAN1, AConv, SPPELAN, C2fAttn, C3, C3TR,
            C3Ghost, nn.Conv2d, nn.ConvTranspose2d, DWConvTranspose2d, C3x, RepC3, PSA, SCDown, C2fCIB,
            C2f_Faster
        }:
            if args[0] == 'head_channel':
                args[0] = d[args[0]]
            c1, c2 = ch[f], args[0]
            if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
                c2 = make_divisible(min(c2, max_channels) * width, 8)
            if m is C2fAttn:
                args[1] = make_divisible(min(args[1], max_channels // 2) * width, 8)  # embed channels
                args[2] = int(
                    max(round(min(args[2], max_channels // 2 // 32)) * width, 1) if args[2] > 1 else args[2]
                )  # num heads

            args = [c1, c2, *args[1:]]
            if m in {C2f_Faster}:
                args.insert(2, n)  # number of repeats
                n = 1

        elif m in {AIFI}:
            args = [ch[f], *args]
            c2 = args[0]
        elif m in (HGStem, HGBlock):
            c1, cm, c2 = ch[f], args[0], args[1]
            if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
                c2 = make_divisible(min(c2, max_channels) * width, 8)
                cm = make_divisible(min(cm, max_channels) * width, 8)
            args = [c1, cm, c2, *args[2:]]
            if m in (HGBlock):
                args.insert(4, n)  # number of repeats
                n = 1
        elif m is ResNetLayer:
            c2 = args[1] if args[3] else args[1] * 4
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum(ch[x] for x in f)
        elif m in frozenset({Detect, WorldDetect, Segment, Pose, OBB, ImagePoolingAttn, v10Detect}):
            args.append([ch[x] for x in f])
        elif m is RTDETRDecoder:  # special case, channels arg must be passed in index 1
            args.insert(1, [ch[x] for x in f])
        elif m is CBLinear:
            c2 = make_divisible(min(args[0][-1], max_channels) * width, 8)
            c1 = ch[f]
            args = [c1, [make_divisible(min(c2_, max_channels) * width, 8) for c2_ in args[0]], *args[1:]]
        elif m is CBFuse:
            c2 = ch[f[-1]]
        elif isinstance(m, str):
            t = m
            if len(args) == 2:
                m = timm.create_model(m, pretrained=args[0], pretrained_cfg_overlay={'file': args[1]},
                                      features_only=True)
            elif len(args) == 1:
                m = timm.create_model(m, pretrained=args[0], features_only=True)
            c2 = m.feature_info.channels()
        # elif m in {SwinTransformer_Tiny
        #            }:
        #     m = m(*args)
        #     c2 = m.channel
        else:
            c2 = ch[f]


        if isinstance(c2, list):
            is_backbone = True
            m_ = m
            m_.backbone = True
        else:
            m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
            t = str(m)[8:-2].replace('__main__.', '')  # module type
        m.np = sum(x.numel() for x in m_.parameters())  # number params
        m_.i, m_.f, m_.type = i + 4 if is_backbone else i, f, t  # attach index, 'from' index, type
        if verbose:
            LOGGER.info(f"{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f}  {t:<60}{str(args):<50}")  # print
        save.extend(x % (i + 4 if is_backbone else i) for x in ([f] if isinstance(f, int) else f) if
                    x != -1)  # append to savelist
        layers.append(m_)
        if i == 0:
            ch = []
        if isinstance(c2, list):
            ch.extend(c2)
            for _ in range(5 - len(ch)):
                ch.insert(0, 0)
        else:
            ch.append(c2)
    return nn.Sequential(*layers), sorted(save)

 具体改进差别如下图所示:

 

第③步:yolov8.yaml文件修改   

在下述文件夹中创立yolov8-C2f-Faster.yaml 

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f_Faster, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, C2f_Faster, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, C2f_Faster, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32
  - [-1, 3, C2f_Faster, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]  # 9

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f_Faster, [512]]  # 12

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f_Faster, [256]]  # 15 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f_Faster, [512]]  # 18 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f_Faster, [1024]]  # 21 (P5/32-large)

  - [[15, 18, 21], 1, Detect, [nc]]  # Detect(P3, P4, P5)

第④步:验证是否加入成功   

将train.py中的配置文件进行修改,并运行  


🏋不是每一粒种子都能开花,但播下种子就比荒芜的旷野强百倍🏋

🍁YOLOv8入门+改进专栏🍁


 【YOLOv8改进系列】: 

 

YOLOv8改进系列(2)----替换主干网络之FasterNet

YOLOv8改进系列(3)----替换主干网络之ConvNeXt V2

YOLOv8改进系列(4)----替换C2f之FasterNet中的FasterBlock替换C2f中的Bottleneck 

YOLOv8改进系列(5)----替换主干网络之EfficientFormerV2 

YOLOv8改进系列(6)----替换主干网络之VanillaNet

YOLOv8改进系列(7)----替换主干网络之LSKNet

YOLOv8改进系列(8)----替换主干网络之Swin Transformer 

YOLOv8改进系列(9)----替换主干网络之RepViT


评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值