空间金字塔池化改进 SPP / SPPF / SimSPPF / ASPP / RFB / SPPCSPC / SPPFCSPC

本文深入探讨了空间金字塔池化(SPP)及其多种变体,包括SPPF、SimSPPF、ASPP、RFB等,分析它们的原理和在目标检测中的应用。SPP技术旨在解决图像失真和特征重复提取问题,提高候选框生成速度。文章还介绍了针对SPP的优化结构SPPCSPC和SPPFCSPC,并提供了参数量对比和改进方法。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >


感谢大家一路以来的支持和陪伴!为了回馈大家对我们的关注和信任,我特别赠送 50 算力券,您可以免费体验 4090 显卡 24 小时的强劲性能:点击领取。希望在您的项目或学习中助您一臂之力!


我最近在哔哩哔哩上更新了视频版的讲解,有需要的同学可以关注一下~ 我的哔哩哔哩主页



1 原理

1.1 SPP(Spatial Pyramid Pooling)

SPP模块是何凯明大神在2015年的论文《Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition》中被提出。

SPP全程为空间金字塔池化结构,主要是为了解决两个问题:

  1. 有效避免了对图像区域裁剪、缩放操作导致的图像失真等问题;
  2. 解决了卷积神经网络对图相关重复特征提取的问题,大大提高了产生候选框的速度,且节省了计算成本。

在这里插入图片描述

请添加图片描述

class SPP(nn.Module):
    # Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))

1.2 SPPF(Spatial Pyramid Pooling - Fast)

这个是YOLOv5作者Glenn Jocher基于SPP提出的,速度较SPP快很多,所以叫SPP-Fast

请添加图片描述

class SPPF(nn.Module):
    # Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher
    def __init__(self, c1, c2, k=5):  # equivalent to SPP(k=(5, 9, 13))
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * 4, c2, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            y1 = self.m(x)
            y2 = self.m(y1)
            return self.cv2(torch.cat((x, y1, y2, self.m(y2)), 1))

1.3 SimSPPF(Simplified SPPF)

美团YOLOv6提出的模块,感觉和SPPF只差了一个激活函数,简单测试了一下,单个ConvBNReLU速度要比ConvBNSiLU18%
请添加图片描述

class SimConv(nn.Module):
    '''Normal Conv with ReLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size, stride, groups=1, bias=False):
        super().__init__()
        padding = kernel_size // 2
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=bias,
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))

class SimSPPF(nn.Module):
    '''Simplified SPPF with ReLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size=5):
        super().__init__()
        c_ = in_channels // 2  # hidden channels
        self.cv1 = SimConv(in_channels, c_, 1, 1)
        self.cv2 = SimConv(c_ * 4, out_channels, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=kernel_size, stride=1, padding=kernel_size // 2)

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            y1 = self.m(x)
            y2 = self.m(y1)
            return self.cv2(torch.cat([x, y1, y2, self.m(y2)], 1))

1.4 ASPP(Atrous Spatial Pyramid Pooling)

受到SPP的启发,语义分割模型DeepLabv2中提出了ASPP模块(空洞空间卷积池化金字塔),该模块使用具有不同采样率的多个并行空洞卷积层。为每个采样率提取的特征在单独的分支中进一步处理,并融合以生成最终结果。该模块通过不同的空洞率构建不同感受野的卷积核,用来获取多尺度物体信息,具体结构比较简单如下图所示:

请添加图片描述

ASPP是在DeepLab中提出来的,在后续的DeepLab版本中对其做了改进,如加入BN层、加入深度可分离卷积等,但基本的思路还是没变。

# without BN version
class ASPP(nn.Module):
    def __init__(self, in_channel=512, out_channel=256):
        super(ASPP, self).__init__()
        self.mean = nn.AdaptiveAvgPool2d((1, 1))  # (1,1)means ouput_dim
        self.conv = nn.Conv2d(in_channel,out_channel, 1, 1)
        self.atrous_block1 = nn.Conv2d(in_channel, out_channel, 1, 1)
        self.atrous_block6 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=6, dilation=6)
        self.atrous_block12 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=12, dilation=12)
        self.atrous_block18 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=18, dilation=18)
        self.conv_1x1_output = nn.Conv2d(out_channel * 5, out_channel, 1, 1)

    def forward(self, x):
        size = x.shape[2:]

        image_features = self.mean(x)
        image_features = self.conv(image_features)
        image_features = F.upsample(image_features, size=size, mode='bilinear')

        atrous_block1 = self.atrous_block1(x)
        atrous_block6 = self.atrous_block6(x)
        atrous_block12 = self.atrous_block12(x)
        atrous_block18 = self.atrous_block18(x)

        net = self.conv_1x1_output(torch.cat([image_features, atrous_block1, atrous_block6,
                                              atrous_block12, atrous_block18], dim=1))
        return net

1.5 RFB(Receptive Field Block)

RFB模块是在《ECCV2018:Receptive Field Block Net for Accurate and Fast Object Detection》一文中提出的,该文的出发点是模拟人类视觉的感受野从而加强网络的特征提取能力,在结构上RFB借鉴了Inception的思想,主要是在Inception的基础上加入了空洞卷积,从而有效增大了感受野
在这里插入图片描述
请添加图片描述

RFBRFB-s的架构。RFB-s用于在浅层人类视网膜主题图中模拟较小的pRF,使用具有较小内核的更多分支。

class BasicConv(nn.Module):

    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0, dilation=1, groups=1, relu=True, bn=True):
        super(BasicConv, self).__init__()
        self.out_channels = out_planes
        if bn:
            self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=groups, bias=False)
            self.bn = nn.BatchNorm2d(out_planes, eps=1e-5, momentum=0.01, affine=True)
            self.relu = nn.ReLU(inplace=True) if relu else None
        else:
            self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=groups, bias=True)
            self.bn = None
            self.relu = nn.ReLU(inplace=True) if relu else None

    def forward(self, x):
        x = self.conv(x)
        if self.bn is not None:
            x = self.bn(x)
        if self.relu is not None:
            x = self.relu(x)
        return x


class BasicRFB(nn.Module):

    def __init__(self, in_planes, out_planes, stride=1, scale=0.1, map_reduce=8, vision=1, groups=1):
        super(BasicRFB, self).__init__()
        self.scale = scale
        self.out_channels = out_planes
        inter_planes = in_planes // map_reduce

        self.branch0 = nn.Sequential(
            BasicConv(in_planes, inter_planes, kernel_size=1, stride=1, groups=groups, relu=False),
            BasicConv(inter_planes, 2 * inter_planes, kernel_size=(3, 3), stride=stride, padding=(1, 1), groups=groups),
            BasicConv(2 * inter_planes, 2 * inter_planes, kernel_size=3, stride=1, padding=vision, dilation=vision, relu=False, groups=groups)
        )
        self.branch1 = nn.Sequential(
            BasicConv(in_planes, inter_planes, kernel_size=1, stride=1, groups=groups, relu=False),
            BasicConv(inter_planes, 2 * inter_planes, kernel_size=(3, 3), stride=stride, padding=(1, 1), groups=groups),
            BasicConv(2 * inter_planes, 2 * inter_planes, kernel_size=3, stride=1, padding=vision + 2, dilation=vision + 2, relu=False, groups=groups)
        )
        self.branch2 = nn.Sequential(
            BasicConv(in_planes, inter_planes, kernel_size=1, stride=1, groups=groups, relu=False),
            BasicConv(inter_planes, (inter_planes // 2) * 3, kernel_size=3, stride=1, padding=1, groups=groups),
            BasicConv((inter_planes // 2) * 3, 2 * inter_planes, kernel_size=3, stride=stride, padding=1, groups=groups),
            BasicConv(2 * inter_planes, 2 * inter_planes, kernel_size=3, stride=1, padding=vision + 4, dilation=vision + 4, relu=False, groups=groups)
        )

        self.ConvLinear = BasicConv(6 * inter_planes, out_planes, kernel_size=1, stride=1, relu=False)
        self.shortcut = BasicConv(in_planes, out_planes, kernel_size=1, stride=stride, relu=False)
        self.relu = nn.ReLU(inplace=False)

    def forward(self, x):
        x0 = self.branch0(x)
        x1 = self.branch1(x)
        x2 = self.branch2(x)

        out = torch.cat((x0, x1, x2), 1)
        out = self.ConvLinear(out)
        short = self.shortcut(x)
        out = out * self.scale + short
        out = self.relu(out)

        return out



1.6 SPPCSPC

该模块是YOLOv7中使用的SPP结构,表现优于SPPF,但参数量和计算量提升了很多

请添加图片描述

class SPPCSPC(nn.Module):
    # CSP https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
        super(SPPCSPC, self).__init__()
        c_ = int(2 * c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(c_, c_, 3, 1)
        self.cv4 = Conv(c_, c_, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
        self.cv5 = Conv(4 * c_, c_, 1, 1)
        self.cv6 = Conv(c_, c_, 3, 1)
        self.cv7 = Conv(2 * c_, c2, 1, 1)

    def forward(self, x):
        x1 = self.cv4(self.cv3(self.cv1(x)))
        y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
        y2 = self.cv2(x)
        return self.cv7(torch.cat((y1, y2), dim=1))
#分组SPPCSPC 分组后参数量和计算量与原本差距不大,不知道效果怎么样
class SPPCSPC_group(nn.Module):
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
        super(SPPCSPC_group, self).__init__()
        c_ = int(2 * c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1, g=4)
        self.cv2 = Conv(c1, c_, 1, 1, g=4)
        self.cv3 = Conv(c_, c_, 3, 1, g=4)
        self.cv4 = Conv(c_, c_, 1, 1, g=4)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
        self.cv5 = Conv(4 * c_, c_, 1, 1, g=4)
        self.cv6 = Conv(c_, c_, 3, 1, g=4)
        self.cv7 = Conv(2 * c_, c2, 1, 1, g=4)

    def forward(self, x):
        x1 = self.cv4(self.cv3(self.cv1(x)))
        y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
        y2 = self.cv2(x)
        return self.cv7(torch.cat((y1, y2), dim=1))

1.7 SPPFCSPC🍀

我借鉴了SPPF的思想将SPPCSPC优化了一下,得到了SPPFCSPC,在保持感受野不变的情况下获得速度提升;我把这个模块给v7作者看了,并没有得到否定,详细回答可以看4 Issue

目前这个结构被YOLOv6 3.0版本使用了,效果很不错,大家可以看一下YOLOv6 3.0的论文,里面有详细的实验结果。

请添加图片描述

class SPPFCSPC(nn.Module):
    
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=5):
        super(SPPFCSPC, self).__init__()
        c_ = int(2 * c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(c_, c_, 3, 1)
        self.cv4 = Conv(c_, c_, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.cv5 = Conv(4 * c_, c_, 1, 1)
        self.cv6 = Conv(c_, c_, 3, 1)
        self.cv7 = Conv(2 * c_, c2, 1, 1)

    def forward(self, x):
        x1 = self.cv4(self.cv3(self.cv1(x)))
        x2 = self.m(x1)
        x3 = self.m(x2)
        y1 = self.cv6(self.cv5(torch.cat((x1,x2,x3, self.m(x3)),1)))
        y2 = self.cv2(x)
        return self.cv7(torch.cat((y1, y2), dim=1))

2 参数量对比

这里我在yolov5s.yaml中使用各个模型替换SPP模块

模型参数量(parameters)计算量(GFLOPs)
SPP722588516.5
SPPF723538916.5
SimSPPF723538916.5
ASPP1548572523.1
BasicRFB789542117.1
SPPCSPC1366354921.7
SPPFCSPC🍀1366354921.7
分组SPPCSPC835513317.4

3 YOLOv5 改进方式

第一步;各个代码放入common.py
第二步;yolo.py中加入类名
第三步;修改配置文件
yolov5配置文件如下:

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
   # [-1, 1, ASPP, [512]],  # 9
   # [-1, 1, SPP, [1024]],
   # [-1, 1, SimSPPF, [1024]],
   # [-1, 1, BasicRFB, [1024]],
   # [-1, 1, SPPCSPC, [1024]],
   # [-1, 1, SPPFCSPC, [1024]], # 🍀
  ]

4 Issue

Q:Why use SPPCSPC instead of SPPFCSPC? /
yolov5’s SPPF is much faster than SPP.
Why not try to replace SPPCSPC with SPPFCSPC?
请添加图片描述
A:
Max pooling uses very few computation, if you programming well, above one could run three max pool layers in parallel, while below one must process three max pool layers sequentially.
By the way, you could replace SPPCSPC by SPPFCSPC at inference time if your hardware is friendly to SPPFCSPC.

感兴趣的可以试一下


本人更多YOLOv5实战内容导航🍀🌟🚀

  1. 手把手带你调参Yolo v5 (v6.2)(推理)🌟强烈推荐

  2. 手把手带你调参Yolo v5 (v6.2)(训练)🚀

  3. 手把手带你调参Yolo v5 (v6.2)(验证)

  4. 如何快速使用自己的数据集训练Yolov5模型

  5. 手把手带你Yolov5 (v6.2)添加注意力机制(一)(并附上30多种顶会Attention原理图)🌟强烈推荐🍀新增8种

  6. 手把手带你Yolov5 (v6.2)添加注意力机制(二)(在C3模块中加入注意力机制)

  7. Yolov5如何更换激活函数?

  8. Yolov5如何更换BiFPN?

  9. Yolov5 (v6.2)数据增强方式解析

  10. Yolov5更换上采样方式( 最近邻 / 双线性 / 双立方 / 三线性 / 转置卷积)

  11. Yolov5如何更换EIOU / alpha IOU / SIoU?

  12. Yolov5更换主干网络之《旷视轻量化卷积神经网络ShuffleNetv2》

  13. YOLOv5应用轻量级通用上采样算子CARAFE

  14. 空间金字塔池化改进 SPP / SPPF / SimSPPF / ASPP / RFB / SPPCSPC / SPPFCSPC🚀

  15. 用于低分辨率图像和小物体的模块SPD-Conv

  16. GSConv+Slim-neck 减轻模型的复杂度同时提升精度🍀

  17. 头部解耦 | 将YOLOX解耦头添加到YOLOv5 | 涨点杀器🍀

  18. Stand-Alone Self-Attention | 搭建纯注意力FPN+PAN结构🍀

  19. YOLOv5模型剪枝实战🚀

  20. YOLOv5知识蒸馏实战🚀

  21. YOLOv7知识蒸馏实战🚀

  22. 改进YOLOv5 | 引入密集连接卷积网络DenseNet思想 | 搭建密集连接模块🍀


请添加图片描述

有问题欢迎大家指正,如果感觉有帮助的话请点赞支持下👍📖🌟


更新日志:2022年8月16日上午9:33分前在图片中增加感受野标注🍀

更新日志:2022年8月29日晚上11点40分在文中增加了SimSPPF模块,并测试了速度

更新日志:2022年8月30日修正了SPPCSPC的结构图

更新日志:2022年8月30日增加了SPPFCSPC的结构

更新日志:2023年5月19日修复了RFB的小错误

更新日志:2023年7月23日修复了RFB的Bug


参考文献:增强感受野SPP、ASPP、RFB、PPM

### simSPPF 的概述 `simSPPF` 是一种针对 YOLOv5 模型的优化模块,旨在通过简化空间金字塔池化层 (Spatial Pyramid Pooling, SPP) 来提升模型性能并减少计算复杂度[^1]。传统的 SPP 层能够增强网络的感受野,从而提高检测精度,但由于其复杂的结构可能导致推理速度变慢。而 `simSPPF` 则是对原始 SPP 结构的一种近似实现,在保持较高精度的同时显著加速了推理过程。 #### 实现细节 `simSPPF` 可以看作是一个简化的版本,它仅保留了一个最大池化操作,并将其重复多次来模拟多尺度特征融合的效果。这种设计不仅减少了参数量,还降低了运行时间开销。具体来说: - 原始 SPP 使用多个不同大小的最大池化核组合而成; - 而 `simSPPF` 将这些不同的池化核替换为单一尺寸的最大池化核,并对该结果进行堆叠处理。 以下是基于 PyTorch 的简单代码示例展示如何定义和使用该组件: ```python import torch.nn as nn class SimSPPF(nn.Module): def __init__(self, c1, k=5): # input channels, kernel size super(SimSPPF, self).__init__() c_ = c1 // 2 # hidden channels self.cv1 = nn.Conv2d(c1, c_, 1, stride=1) self.cv2 = nn.Conv2d(c_*4, c1, 1, stride=1) self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k//2) def forward(self, x): x = self.cv1(x) y1 = self.m(x) y2 = self.m(y1) y3 = self.m(y2) return self.cv2(torch.cat([x,y1,y2,y3], dim=1)) ``` 此段脚本展示了如何构建一个基本形式下的 `SimSPPF` 类及其前向传播逻辑[^2]。 #### 应用场景与优势分析 当考虑将 `simSPPF` 整合到实际项目当中时,可以预期获得以下几个方面的改进效果: 1. **更快的速度**: 对于资源受限设备上的实时应用尤为重要; 2. **更少内存占用**: 减少了不必要的中间变量存储需求; 3. **易于移植性**: 更简单的架构便于后续跨平台迁移工作开展. 不过需要注意的是,尽管上述优点明显存在,但在某些特定条件下可能会牺牲部分准确性作为代价。因此建议开发者们依据自身业务特点权衡取舍后再做决定。 ---
评论 339
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

迪菲赫尔曼

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值