YOLOv5 Improvements | Attention Mechanisms | A Lightweight and Efficient Inverted Residual Attention Mechanism

kay_545

Published 2024-08-05 11:32:14


Article link: https://blog.csdn.net/m0_67647321/article/details/140922140


💡💡💡 All programs in this column have been tested and run successfully 💡💡💡

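Column index: "YOLOv5 Getting Started + Improvements" column introduction and index | 60+ articles so far, covering improvements to detection heads, loss functions, backbones, necks, NMS, and more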

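This article introduces work on modern, efficient, lightweight models for dense prediction that trade off parameters, FLOPs, and performance. The Inverted Residual Block (IRB) is the basic building block of lightweight CNNs, but attention-based research has no widely recognized counterpart. The paper rethinks lightweight building blocks from a unified view of the efficient IRB and the effective components of the Transformer, extends the CNN-based IRB to attention-based models, and abstracts a one-residual Meta Mobile Block (MMB) for lightweight model design. Following simple but effective design criteria, it derives a modern Inverted Residual Mobile Block (iRMB) and builds a ResNet-like Efficient MOdel (EMO) from iRMB alone for downstream tasks. After covering the main principles, this article walks step by step through adding and modifying the module code. The complete modified code is linked at the end so you can run it directly, and beginners should be able to follow along.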

Column address: YOLOv5 Improvements + Getting Started (continuously updated with effective improvement methods)

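1. Principles

Paper: Rethinking Mobile Block for Efficient Attention-based Models

Official code: official code repository

The key concept in the paper is the Inverted Residual Mobile Block (iRMB), a modern adaptation of the Inverted Residual Block (IRB) used in lightweight CNNs, designed to make attention-based models more efficient. Below is a simplified explanation of the main principles behind iRMB, without the detailed formulas: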

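Key principles of the Inverted Residual Mobile Block (iRMB):

Computational efficiency:

Depthwise convolution (DW-Conv): compared with a standard convolution, this greatly reduces the number of parameters and the computational cost, because it operates on each input channel separately.
Expanded Window Multi-Head Self-Attention (EW-MHSA): an enhanced form of MHSA that captures dependencies between distant elements more effectively, which is essential for attention-based models.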
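
To make the parameter saving of depthwise convolution concrete, here is a minimal sketch that counts the weights of a standard 3x3 convolution and of a depthwise 3x3 convolution followed by a 1x1 pointwise convolution. The helper name `conv_params` and the 256-channel setting are illustrative assumptions, and biases and normalization parameters are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a 2D convolution without bias."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k


c, k = 256, 3  # illustrative channel count and kernel size

standard = conv_params(c, c, k)             # dense 3x3 convolution
depthwise = conv_params(c, c, k, groups=c)  # one 3x3 filter per channel
pointwise = conv_params(c, c, 1)            # 1x1 convolution mixes channels
separable = depthwise + pointwise

print(standard, depthwise, pointwise, separable)  # 589824 2304 65536 67840
print(f"reduction: {standard / separable:.1f}x")
```

In this setting almost all remaining weights sit in the 1x1 convolution, which is why iRMB can afford a spatial operator on every block.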

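Simplicity and uniformity:

Simple design: iRMB avoids complex structures and operators, so it is easy to implement and optimize for different applications.
Unified core module: by using as few core modules as possible, iRMB reduces overall model complexity, which simplifies deployment and speeds up computation.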

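Versatility and performance:

Meta Mobile Block (MMB): a general building block that can be instantiated as different modules, including IRB, MHSA, and the feed-forward network (FFN). It aims for consistent, efficient performance across tasks.
ResNet-like architecture (EMO): the Efficient MOdel (EMO) built from iRMB uses a ResNet-like 4-stage architecture that handles both short-range and long-range dependencies, which improves overall model performance.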

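Results:

Benchmark performance: according to the paper's experiments, models built from iRMB (for example EMO-1M, EMO-2M, and EMO-5M) outperform many state-of-the-art lightweight models in accuracy, efficiency (measured in FLOPs), and speed.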

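Summary of iRMB's advantages:

Higher efficiency: combining DW-Conv with the improved MHSA gives iRMB better computational efficiency.
Simpler design: a simple, unified design makes the model easy to implement and deploy.
Better performance: iRMB-based models perform well across benchmarks and offer a better trade-off between parameters, efficiency, and accuracy.

These principles make the Inverted Residual Mobile Block (iRMB) a strong basis for lightweight, efficient attention-based models for dense prediction.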

2. Adding iRMB to YOLOv5

2.1 iRMB implementation

Key step 1: add the following code to yolov5/models/common.py

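# torch and torch.nn (as nn) are already imported at the top of YOLOv5's common.py
import math
import torch.nn.functional as F
from functools import partial
from einops import rearrange  # third-party: pip install einops
# third-party: pip install timm (private module path, may differ between timm versions)
from timm.models._efficientnet_blocks import SqueezeExcite
from timm.models.layers import DropPath

# Module-level default for in-place activations, used by iRMB below
inplace = True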

class LayerNorm2d(nn.Module):

    def __init__(self, normalized_shape, eps=1e-6, elementwise_affine=True):
        super().__init__()
        self.norm = nn.LayerNorm(normalized_shape, eps, elementwise_affine)

    def forward(self, x):
        x = rearrange(x, 'b c h w -> b h w c').contiguous()
        x = self.norm(x)
        x = rearrange(x, 'b h w c -> b c h w').contiguous()
        return x


def get_norm(norm_layer='in_1d'):
    eps = 1e-6
    norm_dict = {
        'none': nn.Identity,
        'in_1d': partial(nn.InstanceNorm1d, eps=eps),
        'in_2d': partial(nn.InstanceNorm2d, eps=eps),
        'in_3d': partial(nn.InstanceNorm3d, eps=eps),
        'bn_1d': partial(nn.BatchNorm1d, eps=eps),
        'bn_2d': partial(nn.BatchNorm2d, eps=eps),
        # 'bn_2d': partial(nn.SyncBatchNorm, eps=eps),
        'bn_3d': partial(nn.BatchNorm3d, eps=eps),
        'gn': partial(nn.GroupNorm, eps=eps),
        'ln_1d': partial(nn.LayerNorm, eps=eps),
        'ln_2d': partial(LayerNorm2d, eps=eps),
    }
    return norm_dict[norm_layer]


def get_act(act_layer='relu'):
    act_dict = {
        'none': nn.Identity,
        'relu': nn.ReLU,
        'relu6': nn.ReLU6,
        'silu': nn.SiLU
    }
    return act_dict[act_layer]


class ConvNormAct(nn.Module):

    def __init__(self, dim_in, dim_out, kernel_size, stride=1, dilation=1, groups=1, bias=False,
                 skip=False, norm_layer='bn_2d', act_layer='relu', inplace=True, drop_path_rate=0.):
        super(ConvNormAct, self).__init__()
        self.has_skip = skip and dim_in == dim_out
        padding = math.ceil((kernel_size - stride) / 2)
        self.conv = nn.Conv2d(dim_in, dim_out, kernel_size, stride, padding, dilation, groups, bias)
        self.norm = get_norm(norm_layer)(dim_out)
        self.act = get_act(act_layer)(inplace=inplace)
        self.drop_path = DropPath(drop_path_rate) if drop_path_rate else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.conv(x)
        x = self.norm(x)
        x = self.act(x)
        if self.has_skip:
            x = self.drop_path(x) + shortcut
        return x



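class iRMB(nn.Module):
    """Inverted Residual Mobile Block: optional window attention, depthwise conv, then a 1x1 projection."""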

    def __init__(self, dim_in, norm_in=True, has_skip=True, exp_ratio=1.0, norm_layer='bn_2d',
                 act_layer='relu', v_proj=True, dw_ks=3, stride=1, dilation=1, se_ratio=0.0, dim_head=8, window_size=7,
                 attn_s=True, qkv_bias=False, attn_drop=0., drop=0., drop_path=0., v_group=False, attn_pre=False):
        super().__init__()
        dim_out = dim_in
        self.norm = get_norm(norm_layer)(dim_in) if norm_in else nn.Identity()
        dim_mid = int(dim_in * exp_ratio)
        self.has_skip = (dim_in == dim_out and stride == 1) and has_skip
        self.attn_s = attn_s
        if self.attn_s:
            assert dim_in % dim_head == 0, 'dim_in should be divisible by dim_head'
            self.dim_head = dim_head
            self.window_size = window_size
            self.num_head = dim_in // dim_head
            self.scale = self.dim_head ** -0.5
            self.attn_pre = attn_pre
            self.qk = ConvNormAct(dim_in, int(dim_in * 2), kernel_size=1, bias=qkv_bias, norm_layer='none',
                                  act_layer='none')
            self.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, groups=self.num_head if v_group else 1, bias=qkv_bias,
                                 norm_layer='none', act_layer=act_layer, inplace=inplace)
            self.attn_drop = nn.Dropout(attn_drop)
        else:
            if v_proj:
                self.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, bias=qkv_bias, norm_layer='none',
                                     act_layer=act_layer, inplace=inplace)
            else:
                self.v = nn.Identity()
        self.conv_local = ConvNormAct(dim_mid, dim_mid, kernel_size=dw_ks, stride=stride, dilation=dilation,
                                      groups=dim_mid, norm_layer='bn_2d', act_layer='silu', inplace=inplace)
        self.se = SqueezeExcite(dim_mid, rd_ratio=se_ratio, act_layer=get_act(act_layer)) if se_ratio > 0.0 else nn.Identity()

        self.proj_drop = nn.Dropout(drop)
        self.proj = ConvNormAct(dim_mid, dim_out, kernel_size=1, norm_layer='none', act_layer='none', inplace=inplace)
        self.drop_path = DropPath(drop_path) if drop_path else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.norm(x)
        B, C, H, W = x.shape
        if self.attn_s:
            # padding
            if self.window_size <= 0:
                window_size_W, window_size_H = W, H
            else:
                window_size_W, window_size_H = self.window_size, self.window_size
            pad_l, pad_t = 0, 0
            pad_r = (window_size_W - W % window_size_W) % window_size_W
            pad_b = (window_size_H - H % window_size_H) % window_size_H
            x = F.pad(x, (pad_l, pad_r, pad_t, pad_b, 0, 0,))
            n1, n2 = (H + pad_b) // window_size_H, (W + pad_r) // window_size_W
            x = rearrange(x, 'b c (h1 n1) (w1 n2) -> (b n1 n2) c h1 w1', n1=n1, n2=n2).contiguous()
            # attention
            b, c, h, w = x.shape
            qk = self.qk(x)
            qk = rearrange(qk, 'b (qk heads dim_head) h w -> qk b heads (h w) dim_head', qk=2, heads=self.num_head,
                           dim_head=self.dim_head).contiguous()
            q, k = qk[0], qk[1]
            attn_spa = (q @ k.transpose(-2, -1)) * self.scale
            attn_spa = attn_spa.softmax(dim=-1)
            attn_spa = self.attn_drop(attn_spa)
            if self.attn_pre:
                x = rearrange(x, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()
                x_spa = attn_spa @ x
                x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h,
                                  w=w).contiguous()
                x_spa = self.v(x_spa)
            else:
                v = self.v(x)
                v = rearrange(v, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()
                x_spa = attn_spa @ v
                x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h,
                                  w=w).contiguous()
            # unpadding
            x = rearrange(x_spa, '(b n1 n2) c h1 w1 -> b c (h1 n1) (w1 n2)', n1=n1, n2=n2).contiguous()
            if pad_r > 0 or pad_b > 0:
                x = x[:, :, :H, :W].contiguous()
        else:
            x = self.v(x)

        x = x + self.se(self.conv_local(x)) if self.has_skip else self.se(self.conv_local(x))

        x = self.proj_drop(x)
        x = self.proj(x)

        x = (shortcut + self.drop_path(x)) if self.has_skip else x
        return x

class Bottleneck(nn.Module):
    """Standard bottleneck followed by an iRMB block."""
    # Note: if this definition comes after YOLOv5's own Bottleneck in common.py, it replaces that
    # class for the whole module, so every C3 block also gains an iRMB (and cv1 becomes 3x3 by default).

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a bottleneck module with given input/output channels, shortcut option, group, kernels, and
        expansion.
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2
        self.iRMB = iRMB(c2)  # attention block applied to the bottleneck output

    def forward(self, x):
        """Applies cv1, cv2 and iRMB, adding the input back when the shortcut is enabled."""
        y = self.iRMB(self.cv2(self.cv1(x)))
        return x + y if self.add else y



class C2f_iRMB(nn.Module):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initialize CSP bottleneck layer with two convolutions with arguments ch_in, ch_out, number, shortcut, groups,
        expansion.
        """
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

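In terms of data flow, the iRMB code above processes a feature map in a fixed sequence of steps. The outline below follows iRMB.forward with its default arguments.

Main flow of iRMB

Input feature map:

The block takes a feature map of shape (B, C, H, W). In the yaml files in this article it receives the output of a C3 layer rather than the raw image.

Normalization:

The input first passes through a normalization layer (BatchNorm2d by default, norm_layer='bn_2d').

1x1 convolutions for Q, K and V:

Q/K projection: a 1x1 convolution produces 2 × C channels, which are split into queries and keys.
V projection: a second 1x1 convolution followed by an activation produces the values with dim_mid = C × exp_ratio channels. With the default exp_ratio=1.0 the channel count is unchanged; a larger ratio expands it, as in a classic inverted residual block.

Windowed multi-head self-attention (MHSA):

The feature map is padded so that H and W are multiples of window_size (7 by default) and split into groups of window_size × window_size positions. Within each group, attention weights are computed per head as softmax(Q Kᵀ × scale) and applied to V.
This step captures longer-range dependencies and context that a 3x3 convolution cannot see.

Depthwise convolution (DW-Conv):

3x3 depthwise convolution: the attention output then goes through a depthwise convolution with BatchNorm and SiLU. Unlike a standard convolution that mixes all channels, a depthwise convolution applies a single filter to each channel, which greatly reduces computation and parameters.
The depthwise convolution captures local spatial features, and its output is added back to its input when the skip connection is enabled.

Pointwise convolution:

1x1 convolution: a final 1x1 convolution (proj) mixes information across channels and maps dim_mid back to the block's output width.

Feed-forward network (FFN):

The code above has no separate FFN module. The channel mixing that an FFN would provide comes from the 1x1 V projection and the final 1x1 projection around the depthwise convolution.

Residual connections:

When has_skip is true, the block input is added to the block output (through DropPath if enabled). This preserves information from earlier layers and improves gradient flow during training.

Output feature map:

The output has the same shape as the input, so iRMB can be inserted after an existing layer without changing channel counts. The feature map can then feed downstream tasks such as image classification, object detection, or segmentation.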
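
The padding arithmetic at the start of iRMB.forward can be checked in isolation. The sketch below copies those formulas into a standalone function (the name `window_layout` is made up for illustration) and evaluates them for 80, 40 and 20 pixel feature maps, which is what P3, P4 and P5 would be under the assumption of a 640x640 input.

```python
def window_layout(h: int, w: int, window_size: int = 7):
    """Return (pad_b, pad_r, n1, n2) using the same formulas as iRMB.forward."""
    ws_h, ws_w = (h, w) if window_size <= 0 else (window_size, window_size)
    pad_r = (ws_w - w % ws_w) % ws_w  # columns added on the right
    pad_b = (ws_h - h % ws_h) % ws_h  # rows added at the bottom
    n1, n2 = (h + pad_b) // ws_h, (w + pad_r) // ws_w  # number of groups per axis
    return pad_b, pad_r, n1, n2


for size in (80, 40, 20):
    print(size, window_layout(size, size))
# 80 (4, 4, 12, 12)
# 40 (2, 2, 6, 6)
# 20 (1, 1, 3, 3)
```

Each group holds 7 × 7 = 49 positions, so the attention matrix per head is 49 × 49 regardless of the feature map size; the padded rows and columns are cropped again after attention.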

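Summary of the iRMB flow:

Efficient convolutions: combining 1x1 and depthwise convolutions reduces parameter count and computational cost.
Stronger feature extraction: multi-head self-attention captures long-range dependencies and important context.
Residual connections: these preserve information and stabilize training.
Channel mixing: the 1x1 projections play the role of an FFN and refine the features for downstream tasks.

By following this structure, iRMB balances efficiency and accuracy, which makes it suitable for applications that need lightweight, high-performing models.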
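
The attention step in iRMB.forward computes softmax(q kᵀ × scale) and applies the result to v. As a minimal single-head sketch of that computation, the plain-Python version below uses small lists in place of tensors; the function names are illustrative and it omits heads, windows and dropout.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Single-head scaled dot-product attention on lists of row vectors."""
    scale = len(q[0]) ** -0.5  # same role as self.scale = dim_head ** -0.5
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Toy example: zero queries give uniform weights, so each output row is the mean of v.
q = [[0.0, 0.0], [0.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[2.0, 0.0], [0.0, 2.0]]
result = attention(q, k, v)
print(result)  # [[1.0, 1.0], [1.0, 1.0]]
```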

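2.2 Adding the yaml file

Key step 2: create a new file yolov5_iRMB.yaml under yolov5/models and copy the following code into it

Object detection yaml file; you can try placing iRMB at different positions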

# Ultralytics YOLOv5 🚀, AGPL-3.0 license

# Parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
anchors:
  - [10, 13, 16, 30, 33, 23] # P3/8
  - [30, 61, 62, 45, 59, 119] # P4/16
  - [116, 90, 156, 198, 373, 326] # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [
    [-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
    [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
    [-1, 3, C3, [128]],
    [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
    [-1, 6, C3, [256]],
    [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
    [-1, 9, C3, [512]],
    [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
    [-1, 3, C3, [1024]],
    [-1, 1, SPPF, [1024, 5]], # 9
  ]

# YOLOv5 v6.0 head
head: [
    [-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],
    [[-1, 6], 1, Concat, [1]], # cat backbone P4
    [-1, 3, C3, [512, False]], # 13

    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],
    [[-1, 4], 1, Concat, [1]], # cat backbone P3
    [-1, 3, C3, [256, False]], # 17 (P3/8-small)
    [-1, 1, iRMB, []], # 18

    [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]], # cat head P4
    [-1, 3, C3, [512, False]], # 21 (P4/16-medium)
    [-1, 1, iRMB, []], # 22

    [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]], # cat head P5
    [-1, 3, C3, [1024, False]], # 25 (P5/32-large)
    [-1, 1, iRMB, []], # 26

    [[18, 22, 26], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
  ]
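
Layer indices in the yaml count from 0 across backbone and head, so the `from` list of Detect has to point at the three iRMB layers. The sketch below checks this with the module names from the yaml above written out as plain strings; it does not import YOLOv5.

```python
# Module names in file order, copied from the detection yaml above.
backbone = ["Conv", "Conv", "C3", "Conv", "C3", "Conv", "C3", "Conv", "C3", "SPPF"]
head = [
    "Conv", "nn.Upsample", "Concat", "C3",
    "Conv", "nn.Upsample", "Concat", "C3", "iRMB",
    "Conv", "Concat", "C3", "iRMB",
    "Conv", "Concat", "C3", "iRMB",
    "Detect",
]

layers = backbone + head
irmb_idx = [i for i, name in enumerate(layers) if name == "iRMB"]
detect_from = [18, 22, 26]  # the `from` field of the Detect line

print(irmb_idx)  # [18, 22, 26]
assert irmb_idx == detect_from, "Detect should read from the iRMB outputs"
```

If you move or remove an iRMB line, every later index shifts, so the Detect list needs to be updated to match. The Concat references to layers 14 and 10 stay valid here because those layers come before the first inserted iRMB.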

Segmentation yaml file (YOLOv5's Segment head performs instance segmentation)

# Ultralytics YOLOv5 🚀, AGPL-3.0 license

# Parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
anchors:
  - [10, 13, 16, 30, 33, 23] # P3/8
  - [30, 61, 62, 45, 59, 119] # P4/16
  - [116, 90, 156, 198, 373, 326] # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [
    [-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
    [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
    [-1, 3, C3, [128]],
    [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
    [-1, 6, C3, [256]],
    [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
    [-1, 9, C3, [512]],
    [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
    [-1, 3, C3, [1024]],
    [-1, 1, SPPF, [1024, 5]], # 9
  ]

# YOLOv5 v6.0 head
head: [
    [-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],
    [[-1, 6], 1, Concat, [1]], # cat backbone P4
    [-1, 3, C3, [512, False]], # 13

    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],
    [[-1, 4], 1, Concat, [1]], # cat backbone P3
    [-1, 3, C3, [256, False]], # 17 (P3/8-small)
    [-1, 1, iRMB, []], # 18

    [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]], # cat head P4
    [-1, 3, C3, [512, False]], # 21 (P4/16-medium)
    [-1, 1, iRMB, []], # 22

    [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]], # cat head P5
    [-1, 3, C3, [1024, False]], # 25 (P5/32-large)
    [-1, 1, iRMB, []], # 26

    [[18, 22, 26], 1, Segment, [nc, anchors, 32, 256]], # Segment(P3, P4, P5)
  ]

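Note: the yaml files above use depth_multiple 1.0 and width_multiple 1.0, which is the YOLOv5l scale. To add the module to YOLOv5n/s/m/x instead, you only need to set the corresponding depth_multiple and width_multiple.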

# YOLOv5n
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.25  # layer channel multiple
 
# YOLOv5s
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
 
# YOLOv5l 
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple
 
# YOLOv5m
depth_multiple: 0.67  # model depth multiple
width_multiple: 0.75  # layer channel multiple
 
# YOLOv5x
depth_multiple: 1.33  # model depth multiple
width_multiple: 1.25  # layer channel multiple
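
To see what these multiples do, here is a simplified sketch of the scaling rule that parse_model applies to each layer: repeat counts are multiplied by depth_multiple and rounded, and channel counts are multiplied by width_multiple and rounded up to a multiple of 8. This is an approximation written for illustration (the function `scale_layer` is not part of YOLOv5), so check it against the yolo.py in your own checkout.

```python
import math


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(n: int, c_out: int, depth_multiple: float, width_multiple: float):
    """Scale a layer's repeat count and output channels (simplified)."""
    n_scaled = max(round(n * depth_multiple), 1) if n > 1 else n
    c_scaled = make_divisible(c_out * width_multiple, 8)
    return n_scaled, c_scaled


# Example layer: C3 with number=9 and 512 output channels.
scales = {"n": (0.33, 0.25), "s": (0.33, 0.50), "m": (0.67, 0.75), "l": (1.0, 1.0), "x": (1.33, 1.25)}
for name, (gd, gw) in scales.items():
    print(name, scale_layer(9, 512, gd, gw))
# n (3, 128)
# s (3, 256)
# m (6, 384)
# l (9, 512)
# x (12, 640)
```

Because channel counts end up as multiples of 8, they also satisfy the divisibility check in iRMB, which requires dim_in to be divisible by dim_head (8 by default).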

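2.3 Registering the module

Key step 3: register iRMB in the parse_model function of yolo.py. Depending on your YOLOv5 version, you may also need to import iRMB from models.common at the top of yolo.py.

        elif m is iRMB:
            c2 = ch[f]  # iRMB keeps the channel count unchanged
            args = [ch[f], ch[f]]  # the second value lands on norm_in (truthy, so normalization stays on)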

2.4 Running the program

In train.py, set the cfg argument to the path of yolov5_iRMB.yaml

An absolute path is recommended so the file is always found
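
As an alternative to editing the default inside train.py, the same setting can be passed on the command line. The sketch below builds such a command with an absolute path to the yaml. The repository location, dataset file and epoch count are placeholder assumptions; adjust them to your setup.

```python
from pathlib import Path

# Hypothetical location of the cloned repository; adjust to your machine.
repo = Path("yolov5").resolve()
cfg = repo / "models" / "yolov5_iRMB.yaml"

cmd = [
    "python", str(repo / "train.py"),
    "--cfg", str(cfg),         # absolute path to the new model yaml
    "--weights", "",           # empty string: train from scratch
    "--data", "coco128.yaml",  # example dataset config, replace with your own
    "--epochs", "100",
    "--img", "640",
]
print(cmd)

# To actually launch training:
# import subprocess
# subprocess.run(cmd, check=True, cwd=repo)
```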

🚀 Run the program; output like the following means the module was added successfully 🚀

                 from  n    params  module                                  arguments
  0                -1  1      7040  models.common.Conv                      [3, 64, 6, 2, 2]
  1                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  2                -1  3    306880  models.common.C3                        [128, 128, 3]
  3                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  4                -1  6   2307840  models.common.C3                        [256, 256, 6]
  5                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  6                -1  9  13541632  models.common.C3                        [512, 512, 9]
  7                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 2]
  8                -1  3  19428864  models.common.C3                        [1024, 1024, 3]
  9                -1  1   2624512  models.common.SPPF                      [1024, 1024, 5]
 10                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  3   5126912  models.common.C3                        [1024, 512, 3, False]
 14                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  3   1285504  models.common.C3                        [512, 256, 3, False]
 18                -1  1    265472  models.common.iRMB                      [256, 256]
 19                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 20          [-1, 14]  1         0  models.common.Concat                    [1]
 21                -1  3   4864768  models.common.C3                        [512, 512, 3, False]
 22                -1  1   1055232  models.common.iRMB                      [512, 512]
 23                -1  1   2360320  models.common.Conv                      [512, 512, 3, 2]
 24          [-1, 10]  1         0  models.common.Concat                    [1]
 25                -1  3  19428864  models.common.C3                        [1024, 1024, 3, False]        
 26                -1  1   4207616  models.common.iRMB                      [1024, 1024]
 27      [18, 22, 26]  1    457725  Detect                                  [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [256, 512, 1024]]
YOLOv5_iRMB summary: 1304 layers, 84787133 parameters, 84787133 gradients, 660.7 GFLOPs
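
The params column for the three iRMB layers in this log can be reproduced by hand from the code in section 2.1. The sketch below adds up the weights of each sub-layer for iRMB(c) with its default arguments (the function name `irmb_params` is illustrative); the sum simplifies to 4c² + 13c.

```python
def irmb_params(c: int, dw_ks: int = 3) -> int:
    """Parameter count of iRMB(c) with the default arguments from section 2.1."""
    norm = 2 * c            # BatchNorm2d weight + bias on the block input
    qk = c * 2 * c          # 1x1 conv producing Q and K, no bias
    v = c * c               # 1x1 conv producing V (exp_ratio=1.0), no bias
    dw = c * dw_ks * dw_ks  # depthwise conv, one filter per channel
    dw_bn = 2 * c           # BatchNorm2d after the depthwise conv
    proj = c * c            # final 1x1 projection, no bias
    return norm + qk + v + dw + dw_bn + proj


for c in (256, 512, 1024):
    print(c, irmb_params(c))
# 256 265472
# 512 1055232
# 1024 4207616
```

These values match layers 18, 22 and 26 in the log, which is a quick way to confirm that the module was built with the expected channel counts.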

3. Complete code

https://pan.baidu.com/s/1-5-_AfvIVjIg66wpwATVUA?pwd=puym

Extraction code: puym

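4. GFLOPs

For how GFLOPs are calculated, see: 100 Interview Questions for Algorithm Engineers | Convolution Basics: Convolution

GFLOPs before the change

GFLOPs after the change

~~I don't have a GPU available right now; I'll fill this in once I do. If you need the numbers, please measure them yourself.~~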

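5. Going further

You can combine this with changes to the loss function or convolution modules for further improvement

YOLOv5 Improvements | Loss Functions | EIoU, SIoU, WIoU, DIoU, FocuSIoU and other loss functions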

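6. Summary

The Inverted Residual Mobile Block (iRMB) is a building block designed for efficient, high-performing image models. It combines depthwise convolution, pointwise convolution, and multi-head self-attention. Its 1x1 pointwise convolutions project and optionally expand the feature dimension, while a 3x3 depthwise convolution that runs independently on each channel captures local spatial features at low computational and parameter cost. Multi-head self-attention (MHSA) adds long-range dependencies and context. Residual connections throughout the block improve gradient flow and preserve information from earlier layers. Together these give iRMB a good balance of efficiency and accuracy, which makes it a good fit for lightweight, high-performing image tasks.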