Infrared small-target detection is a key task in computer vision, challenged by tiny targets and cluttered backgrounds. This article introduces C2f_PPA (C2f fused with the PPA module), which can noticeably improve detection performance; the PPA module extracts multi-scale features. After explaining the underlying principle, the article walks step by step through adding and wiring up the module code, and the complete modified code is provided at the end so you can run it in one go. Even beginners can follow along and get hands-on practice improving the YOLO family of detectors.
1. Principle
Paper: HCF-Net: Hierarchical Context Fusion Network for Infrared Small Object Detection
Official code: see the official code repository
The PPA (Parallelized Patch-Aware Attention) module is one of the key components of HCF-Net for infrared small-target detection. It improves detection accuracy through a multi-branch feature-extraction strategy combined with attention mechanisms. Its main ideas are:
- Multi-branch feature extraction: PPA captures features at different scales and levels through a parallel local branch, a global branch, and a serial convolution branch. This multi-branch strategy extracts multi-scale information effectively, which is particularly useful for infrared small targets and strengthens localization of tiny objects.
- Feature fusion with attention: after the branch features are extracted, PPA adaptively enhances them with channel attention and spatial attention. Channel attention selects the most discriminative feature channels, while spatial attention highlights the target region. Together, these attention mechanisms make small targets stand out against complex backgrounds.
The overall workflow:
- The input features are first adjusted by a pointwise convolution and then split into local and global feature branches, where non-overlapping patches aggregate features at different scales.
- Convolution layers process these features further, and the result is fused and enhanced by attention along the channel and spatial dimensions.
By preserving the key local and global information, the module reduces the loss of important details that small infrared targets suffer during repeated downsampling, improving detection accuracy and robustness. The condensed forward pass below makes this data flow concrete.
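For reference, this is the PPA forward pass condensed from the full implementation in section 2.1 (an excerpt, not standalone code; the `self.*` layers are defined there):

```python
# Condensed PPA.forward (excerpted from the implementation in section 2.1)
def forward(self, x):
    x_skip = self.skip(x)        # pointwise conv aligns channels
    x_lga2 = self.lga2(x_skip)   # local-global attention, patch size 2
    x_lga4 = self.lga4(x_skip)   # local-global attention, patch size 4
    x1 = self.c1(x)              # serial 3x3 conv branch
    x2 = self.c2(x1)
    x3 = self.c3(x2)
    x = x1 + x2 + x3 + x_skip + x_lga2 + x_lga4   # fuse all branches
    x = self.sa(self.cn(x))      # ECA channel attention, then spatial attention
    return self.silu(self.bn1(self.drop(x)))
```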
2. Adding C2f_PPA to the YOLOv8 network
2.1 C2f_PPA code implementation
Key step 1: paste the code below into /ultralytics/ultralytics/nn/modules/block.py, and add "C2f_PPA" to the __all__ list of that file.
import math

# Note: torch, nn and F are already imported at the top of block.py, and
# Conv, C2f, C3 are defined/imported there; the imports below only matter
# if you test this code standalone.
import torch
import torch.nn as nn
import torch.nn.functional as F
class SpatialAttentionModule(nn.Module):
def __init__(self):
super(SpatialAttentionModule, self).__init__()
self.conv2d = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=7, stride=1, padding=3)
self.sigmoid = nn.Sigmoid()
def forward(self, x):
avgout = torch.mean(x, dim=1, keepdim=True)
maxout, _ = torch.max(x, dim=1, keepdim=True)
out = torch.cat([avgout, maxout], dim=1)
out = self.sigmoid(self.conv2d(out))
return out * x
class LocalGlobalAttention(nn.Module):
def __init__(self, output_dim, patch_size):
super().__init__()
self.output_dim = output_dim
self.patch_size = patch_size
self.mlp1 = nn.Linear(patch_size*patch_size, output_dim // 2)
self.norm = nn.LayerNorm(output_dim // 2)
self.mlp2 = nn.Linear(output_dim // 2, output_dim)
self.conv = nn.Conv2d(output_dim, output_dim, kernel_size=1)
self.prompt = torch.nn.parameter.Parameter(torch.randn(output_dim, requires_grad=True))
self.top_down_transform = torch.nn.parameter.Parameter(torch.eye(output_dim), requires_grad=True)
def forward(self, x):
x = x.permute(0, 2, 3, 1)
B, H, W, C = x.shape
P = self.patch_size
# Local branch
        local_patches = x.unfold(1, P, P).unfold(2, P, P)  # (B, H/P, W/P, C, P, P)
local_patches = local_patches.reshape(B, -1, P*P, C) # (B, H/P*W/P, P*P, C)
local_patches = local_patches.mean(dim=-1) # (B, H/P*W/P, P*P)
        local_patches = self.mlp1(local_patches)  # (B, H/P*W/P, output_dim // 2)
        local_patches = self.norm(local_patches)  # (B, H/P*W/P, output_dim // 2)
local_patches = self.mlp2(local_patches) # (B, H/P*W/P, output_dim)
local_attention = F.softmax(local_patches, dim=-1) # (B, H/P*W/P, output_dim)
local_out = local_patches * local_attention # (B, H/P*W/P, output_dim)
cos_sim = F.normalize(local_out, dim=-1) @ F.normalize(self.prompt[None, ..., None], dim=1) # B, N, 1
mask = cos_sim.clamp(0, 1)
local_out = local_out * mask
local_out = local_out @ self.top_down_transform
# Restore shapes
local_out = local_out.reshape(B, H // P, W // P, self.output_dim) # (B, H/P, W/P, output_dim)
local_out = local_out.permute(0, 3, 1, 2)
local_out = F.interpolate(local_out, size=(H, W), mode='bilinear', align_corners=False)
output = self.conv(local_out)
return output
class ECA(nn.Module):
    def __init__(self, in_channel, gamma=2, b=1):
        super(ECA, self).__init__()
        # adaptive 1D kernel size, as in the ECA paper
        k = int(abs((math.log(in_channel, 2) + b) / gamma))
        kernel_size = k if k % 2 else k + 1
        padding = kernel_size // 2
        self.pool = nn.AdaptiveAvgPool2d(output_size=1)
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels=1, out_channels=1, kernel_size=kernel_size, padding=padding, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        out = self.pool(x)
        out = out.view(x.size(0), 1, x.size(1))
        out = self.conv(out)
        out = out.view(x.size(0), x.size(1), 1, 1)
        return out * x
class PPA(nn.Module):
def __init__(self, in_features, filters) -> None:
super().__init__()
self.skip = Conv(in_features, filters, act=False)
self.c1 = Conv(filters, filters, 3)
self.c2 = Conv(filters, filters, 3)
self.c3 = Conv(filters, filters, 3)
self.sa = SpatialAttentionModule()
self.cn = ECA(filters)
self.lga2 = LocalGlobalAttention(filters, 2)
self.lga4 = LocalGlobalAttention(filters, 4)
self.drop = nn.Dropout2d(0.1)
self.bn1 = nn.BatchNorm2d(filters)
self.silu = nn.SiLU()
def forward(self, x):
x_skip = self.skip(x)
x_lga2 = self.lga2(x_skip)
x_lga4 = self.lga4(x_skip)
x1 = self.c1(x)
x2 = self.c2(x1)
x3 = self.c3(x2)
x = x1 + x2 + x3 + x_skip + x_lga2 + x_lga4
x = self.cn(x)
x = self.sa(x)
x = self.drop(x)
x = self.bn1(x)
x = self.silu(x)
return x
class Bag(nn.Module):
def __init__(self):
super(Bag, self).__init__()
def forward(self, p, i, d):
edge_att = torch.sigmoid(d)
return edge_att * p + (1 - edge_att) * i
class DASI(nn.Module):
def __init__(self, in_features, out_features) -> None:
super().__init__()
self.bag = Bag()
self.tail_conv = nn.Conv2d(out_features, out_features, 1)
self.conv = nn.Conv2d(out_features // 2, out_features // 4, 1)
self.bns = nn.BatchNorm2d(out_features)
self.skips = nn.Conv2d(in_features[1], out_features, 1)
self.skips_2 = nn.Conv2d(in_features[0], out_features, 1)
self.skips_3 = nn.Conv2d(in_features[2], out_features, kernel_size=3, stride=2, dilation=2, padding=2)
self.silu = nn.SiLU()
def forward(self, x_list):
# x_high, x, x_low = x_list
x_low, x, x_high = x_list
        if x_high is not None:
x_high = self.skips_3(x_high)
x_high = torch.chunk(x_high, 4, dim=1)
        if x_low is not None:
x_low = self.skips_2(x_low)
x_low = F.interpolate(x_low, size=[x.size(2), x.size(3)], mode='bilinear', align_corners=True)
x_low = torch.chunk(x_low, 4, dim=1)
x = self.skips(x)
x_skip = x
x = torch.chunk(x, 4, dim=1)
        if x_high is None:
x0 = self.conv(torch.cat((x[0], x_low[0]), dim=1))
x1 = self.conv(torch.cat((x[1], x_low[1]), dim=1))
x2 = self.conv(torch.cat((x[2], x_low[2]), dim=1))
x3 = self.conv(torch.cat((x[3], x_low[3]), dim=1))
        elif x_low is None:
            x0 = self.conv(torch.cat((x[0], x_high[0]), dim=1))
            x1 = self.conv(torch.cat((x[1], x_high[1]), dim=1))
            x2 = self.conv(torch.cat((x[2], x_high[2]), dim=1))
            x3 = self.conv(torch.cat((x[3], x_high[3]), dim=1))
else:
x0 = self.bag(x_low[0], x_high[0], x[0])
x1 = self.bag(x_low[1], x_high[1], x[1])
x2 = self.bag(x_low[2], x_high[2], x[2])
x3 = self.bag(x_low[3], x_high[3], x[3])
x = torch.cat((x0, x1, x2, x3), dim=1)
x = self.tail_conv(x)
x += x_skip
x = self.bns(x)
x = self.silu(x)
return x
class C3_PPA(C3):
def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
super().__init__(c1, c2, n, shortcut, g, e)
c_ = int(c2 * e) # hidden channels
self.m = nn.Sequential(*(PPA(c_, c_) for _ in range(n)))
class C2f_PPA(C2f):
def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
super().__init__(c1, c2, n, shortcut, g, e)
self.m = nn.ModuleList(PPA(self.c, self.c) for _ in range(n))
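Once pasted, a quick shape smoke test can confirm the module builds and runs (a minimal sketch; it assumes step 1 succeeded and ultralytics is installed from source):

```python
import torch
from ultralytics.nn.modules.block import C2f_PPA  # available after step 1

m = C2f_PPA(64, 64, n=1).eval()
# H and W must be divisible by both LocalGlobalAttention patch sizes (2 and 4)
x = torch.randn(1, 64, 32, 32)
with torch.no_grad():
    y = m(x)
print(y.shape)  # expected: torch.Size([1, 64, 32, 32])
```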
2.2 Explanation of the C2f_PPA module code
C2f_PPA is a custom module that inherits from the C2f class and plugs the PPA (Parallelized Patch-Aware Attention) module into the network. Let's walk through the main code:
class C2f_PPA(C2f):
def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
super().__init__(c1, c2, n, shortcut, g, e)
self.m = nn.ModuleList(PPA(self.c, self.c) for _ in range(n))
1. Inheritance and initialization
C2f_PPA inherits from the C2f class and calls the parent constructor (super().__init__(c1, c2, n, shortcut, g, e)) in its __init__ method, which sets up the base network layers. The constructor arguments are:
- c1: number of input channels.
- c2: number of output channels.
- n: number of PPA blocks inside the C2f_PPA module.
- shortcut: whether to use shortcut connections.
- g: number of groups for grouped convolution.
- e: expansion ratio used to compute the hidden channel width.
2. Using the PPA module
self.m = nn.ModuleList(PPA(self.c, self.c) for _ in range(n))
This defines an nn.ModuleList holding n PPA modules. Each PPA has self.c input and output channels, where self.c is set by the parent C2f constructor (self.c = int(c2 * e), the hidden channel width). To see why each block keeps self.c channels, look at C2f's forward pass below.
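For context, this is C2f's forward pass abridged from the ultralytics source (exact docstrings vary by version). Each module in self.m consumes the previous branch output y[-1], so every PPA must map self.c channels to self.c channels:

```python
# ultralytics C2f.forward (abridged): each block in self.m feeds on the
# previous branch output, and all branches are concatenated at the end.
def forward(self, x):
    y = list(self.cv1(x).chunk(2, 1))    # split into two self.c-channel branches
    y.extend(m(y[-1]) for m in self.m)   # each PPA: self.c -> self.c
    return self.cv2(torch.cat(y, 1))     # fuse (2 + n) * self.c channels
```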
3. Role and characteristics
The defining feature of C2f_PPA is the embedded PPA blocks: their multi-branch feature extraction and attention mechanisms enrich the representation of the input features. Within the network, these blocks capture multi-scale context and emphasize key features through attention, improving performance on complex tasks such as object detection and segmentation.
In short, C2f_PPA combines the C2f structure with the strengths of the PPA module, giving the network stronger feature learning and multi-scale feature fusion capability.
2.3 Modify the __init__.py file
Key step 2: modify the __init__.py file in the modules folder. First import the class, then declare it in the __all__ list, as sketched below.
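A minimal sketch of the two edits (the surrounding entries in your __init__.py will differ by version):

```python
# ultralytics/ultralytics/nn/modules/__init__.py
from .block import C2f_PPA  # import the new class from block.py

__all__ = (
    # ... keep the existing entries ...
    "C2f_PPA",  # declare it so it can be resolved by name
)
```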
2.4 Add the YAML file
Key step 3: create a new file yolov8_C2f_PPA.yaml under /ultralytics/ultralytics/cfg/models/v8 and paste in the content below.
- OD (object detection)
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
# [depth, width, max_channels]
n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
# YOLOv8.0n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 3, C2f_PPA, [128, True]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 6, C2f_PPA, [256, True]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 6, C2f_PPA, [512, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 3, C2f_PPA, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 9
# YOLOv8.0n head
head:
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 3, C2f, [512]] # 12
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 3, C2f, [256]] # 15 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 12], 1, Concat, [1]] # cat head P4
- [-1, 3, C2f, [512]] # 18 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 9], 1, Concat, [1]] # cat head P5
- [-1, 3, C2f, [1024]] # 21 (P5/32-large)
- [[15, 18, 21], 1, Detect, [nc]] # Detect(P3, P4, P5)
- Seg (segmentation)
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8-seg segmentation model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/segment
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
# [depth, width, max_channels]
n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
# YOLOv8.0n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 3, C2f_PPA, [128, True]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 6, C2f_PPA, [256, True]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 6, C2f_PPA, [512, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 3, C2f_PPA, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 9
# YOLOv8.0n head
head:
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 3, C2f, [512]] # 12
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 3, C2f, [256]] # 15 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 12], 1, Concat, [1]] # cat head P4
- [-1, 3, C2f, [512]] # 18 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 9], 1, Concat, [1]] # cat head P5
- [-1, 3, C2f, [1024]] # 21 (P5/32-large)
- [[15, 18, 21], 1, Segment, [nc, 32, 256]] # Segment(P3, P4, P5)
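If you want to train the segmentation variant, a hedged usage sketch (the -seg filename is our own convention, not required; ultralytics infers the task from the Segment head):

```python
from ultralytics import YOLO

# hypothetical filename for the segmentation config above
model = YOLO("ultralytics/cfg/models/v8/yolov8_C2f_PPA-seg.yaml")
model.train(data="coco128-seg.yaml", epochs=100, imgsz=640)  # coco128-seg ships with ultralytics
```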
Note: this article only adds the module on top of the base yolov8 config. To build the yolov8n/s/m/l/x variants, you only need to set the corresponding depth_multiple and width_multiple (values listed below). If this is unclear, see: yolov8 yaml file explained.
# YOLOv8n
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.25 # layer channel multiple
max_channels: 1024 # max_channels
# YOLOv8s
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50 # layer channel multiple
max_channels: 1024 # max_channels
# YOLOv8m
depth_multiple: 0.67 # model depth multiple
width_multiple: 0.75 # layer channel multiple
max_channels: 768 # max_channels
# YOLOv8l
depth_multiple: 1.00 # model depth multiple
width_multiple: 1.00 # layer channel multiple
max_channels: 512 # max_channels
# YOLOv8x
depth_multiple: 1.00 # model depth multiple
width_multiple: 1.25 # layer channel multiple
max_channels: 512 # max_channels
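In practice you usually don't need to copy these values by hand: ultralytics infers the scale letter from the yaml filename. A hedged sketch (the renamed file is our assumption):

```python
from ultralytics import YOLO

# Copying yolov8_C2f_PPA.yaml to yolov8s_C2f_PPA.yaml lets ultralytics pick the
# "s" entry from the scales: block automatically (the scale letter is parsed
# from the file name).
model = YOLO("ultralytics/cfg/models/v8/yolov8s_C2f_PPA.yaml")
```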
2.5 Register the module
Key step 4: register the module in the parse_model function in ultralytics/nn/tasks.py, as sketched below.
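A hedged sketch of what the registration looks like (the real tuples in parse_model list many more modules; simply append C2f_PPA wherever C2f already appears):

```python
# ultralytics/ultralytics/nn/tasks.py
from ultralytics.nn.modules import C2f_PPA  # add to the existing modules import

# Inside parse_model(), append C2f_PPA to the branch that computes channels
# for C2f-style blocks:
#     if m in (Classify, Conv, ..., C2f, C2f_PPA, C3, ...):
#         c1, c2 = ch[f], args[0]
#         ...
# and to the branch that moves the repeat count n into args:
#     if m in (BottleneckCSP, C1, C2, C2f, C2f_PPA, C3, ...):
#         args.insert(2, n)
#         n = 1
```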
2.6 Run the program
In train.py, set the model parameter to the path of yolov8_C2f_PPA.yaml.
An absolute path is recommended, so the file is always found.
from ultralytics import YOLO
import warnings
warnings.filterwarnings('ignore')
from pathlib import Path

if __name__ == '__main__':
    # Load the model from the modified config (use an absolute path if in doubt)
    model = YOLO("ultralytics/cfg/models/v8/yolov8_C2f_PPA.yaml")
    # Train the model
    results = model.train(data=r"path/to/your/dataset.yaml",
                          epochs=100, batch=16, imgsz=640, workers=4,
                          name=Path(model.cfg).stem)
🚀 Run the script; if you see output like the following, the module was added successfully 🚀
from n params module arguments
0 -1 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2]
1 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2]
2 -1 1 11702 ultralytics.nn.modules.block.C2f_PPA [32, 32, 1, True]
3 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2]
4 -1 2 82188 ultralytics.nn.modules.block.C2f_PPA [64, 64, 2, True]
5 -1 1 73984 ultralytics.nn.modules.conv.Conv [64, 128, 3, 2]
6 -1 2 323916 ultralytics.nn.modules.block.C2f_PPA [128, 128, 2, True]
7 -1 1 295424 ultralytics.nn.modules.conv.Conv [128, 256, 3, 2]
8 -1 1 709352 ultralytics.nn.modules.block.C2f_PPA [256, 256, 1, True]
9 -1 1 164608 ultralytics.nn.modules.block.SPPF [256, 256, 5]
10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
11 [-1, 6] 1 0 ultralytics.nn.modules.conv.Concat [1]
12 -1 1 148224 ultralytics.nn.modules.block.C2f [384, 128, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 4] 1 0 ultralytics.nn.modules.conv.Concat [1]
15 -1 1 37248 ultralytics.nn.modules.block.C2f [192, 64, 1]
16 -1 1 36992 ultralytics.nn.modules.conv.Conv [64, 64, 3, 2]
17 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
18 -1 1 123648 ultralytics.nn.modules.block.C2f [192, 128, 1]
19 -1 1 147712 ultralytics.nn.modules.conv.Conv [128, 128, 3, 2]
20 [-1, 9] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 1 493056 ultralytics.nn.modules.block.C2f [384, 256, 1]
22 [15, 18, 21] 1 897664 ultralytics.nn.modules.head.Detect [80, [64, 128, 256]]
yolov8_C2f_PPA summary: 393 layers, 3569414 parameters, 3569398 gradients, 9.9 GFLOPs
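After training finishes, a hedged inference sketch (the run directory follows the name= argument passed above; adjust if yours differs):

```python
from ultralytics import YOLO

# Weights are saved under runs/detect/<name>/weights/ by default
model = YOLO("runs/detect/yolov8_C2f_PPA/weights/best.pt")
results = model.predict("path/to/image.jpg", imgsz=640)  # hypothetical test image
```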
3. Full code
https://pan.baidu.com/s/1jjdstRTh0HjG7X54CH84Mg?pwd=th26
Extraction code: th26
4. GFLOPs
For how GFLOPs are computed, see the article "百面算法工程师 | 卷积基础知识——Convolution" (convolution basics).
GFLOPs of the baseline YOLOv8n:
GFLOPs after the modification:
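To reproduce the comparison yourself, a minimal sketch (YOLO's info() method prints layers, parameters, and GFLOPs):

```python
from ultralytics import YOLO

YOLO("yolov8n.yaml").info()                                    # baseline YOLOv8n
YOLO("ultralytics/cfg/models/v8/yolov8_C2f_PPA.yaml").info()   # with C2f_PPA
```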
5. Going further
C2f_PPA can be combined with other attention mechanisms, loss functions, and similar improvements to push detection performance further.
6. Summary
The PPA (Parallelized Patch-Aware Attention) module captures multi-scale target features through a parallel multi-branch extraction strategy: a local branch and a global branch extract local and global features at different scales, and a serial convolution branch processes them further to strengthen fine details. The module then performs adaptive feature fusion and enhancement with channel and spatial attention, selecting the most discriminative channels and highlighting the target region, which raises detection accuracy for infrared small targets against cluttered backgrounds. Through this combination of multi-branch feature extraction and attention, PPA reduces information loss, preserves key features, and markedly improves the accuracy and robustness of small-target detection.