Although deep-learning-based solutions have achieved impressive reconstruction performance in image super-resolution (SR), these models are typically large and architecturally complex, making them incompatible with many low-power devices that have compute and memory constraints. To overcome these challenges, we propose a spatially-adaptive feature modulation (SAFM) mechanism for efficient SR design. Specifically, the SAFM layer uses independent computations to learn multi-scale feature representations and aggregates these features for dynamic spatial modulation. Since SAFM prioritizes exploiting non-local feature dependencies, we further introduce a convolutional channel mixer (CCM) to encode local contextual information and mix channels simultaneously. Because upsampling the feature maps in the YOLO detection neck loses feature information, this article replaces the neck's upsample() with a SAFMN module, which also strengthens the model's multi-scale features.
Code: YOLOv8_improve/YOLOv11.md at master · tgf123/YOLOv8_improve
Video walkthrough: YOLOv11 model improvements explained, how to modify YOLOv11 (bilibili)


1. SAFMN (Spatially-Adaptive Feature Modulation Network) Architecture
SAFMN (Spatially-Adaptive Feature Modulation Network) is a deep-learning architecture for image super-resolution (SR). Although deep-learning-based SR methods achieve strong reconstruction quality, the models are often large and complex, which makes them unsuitable for low-power devices. SAFMN is designed to address exactly this problem and provide an efficient SR solution.
1. Multi-scale feature representation:
The key component of SAFMN is the spatially-adaptive feature modulation (SAFM) layer, which learns multi-scale feature representations through independent computations. Concretely, the model may contain multiple MFGU (multi-feature generation unit) modules that cooperate to process features at different scales; a minimal sketch of the resulting modulation flow follows this list.
2. Dynamic spatial modulation:
The SAFM layer aggregates these multi-scale features and performs dynamic spatial modulation. This modulation helps the network exploit feature information more effectively and improves super-resolution quality.
3. Non-local feature dependencies and local contextual information:
Because SAFM prioritizes non-local feature dependencies, the model also introduces a convolutional channel mixer (CCM). CCM encodes local contextual information and mixes channels at the same time, which keeps model complexity low without sacrificing performance.
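To make the modulation step concrete, here is a minimal, illustrative sketch (variable names are my own; the complete SimpleSAFM module in section 3 is the actual implementation): one channel group is pooled to a coarse grid to capture non-local context, filtered by a depthwise convolution, upsampled back, and used as a spatial gate on itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

chunk_dim = 16
dwconv = nn.Conv2d(chunk_dim, chunk_dim, 3, 1, 1, groups=chunk_dim, bias=False)

x0 = torch.randn(1, chunk_dim, 32, 32)                # one channel group after the split
h, w = x0.shape[-2:]
coarse = F.adaptive_max_pool2d(x0, (h // 8, w // 8))  # non-local context at 1/8 scale
gate = F.interpolate(dwconv(coarse), size=(h, w), mode='bilinear')
x2 = F.gelu(gate) * x0                                # dynamic spatial modulation
print(x2.shape)                                       # torch.Size([1, 16, 32, 32])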
2. Combining YOLOv11 with SAFMN
1. This article replaces the upsample() in the neck with a SAFMN module to strengthen the model's multi-scale features; the change amounts to swapping one line in the head of the model YAML, as shown below.
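In the stock YOLOv11 head the corresponding step is a parameter-free nn.Upsample; the improved configuration in section 4 swaps that line for SAFMNPP (both lines are taken from the configs in this article):
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]   # original: 2x nearest-neighbor upsampling
- [-1, 1, SAFMNPP, [512]]                      # improved: learned 2x upsampling with SAFM blocks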
3. SAFMN Code
import torch
import torch.nn as nn
import torch.nn.functional as F

# Paper: https://openaccess.thecvf.com/content/ICCV2023/papers/Sun_Spatially-Adaptive_Feature_Modulation_for_Efficient_Image_Super-Resolution_ICCV_2023_paper.pdf

class SimpleSAFM(nn.Module):
    """Spatially-adaptive feature modulation: one channel group is pooled,
    filtered, upsampled, and used to gate itself; the rest passes through."""
    def __init__(self, dim, ratio=4):
        super().__init__()
        self.dim = dim
        self.chunk_dim = dim // ratio  # channels in the modulated group

        self.proj = nn.Conv2d(dim, dim, 3, 1, 1, bias=False)
        self.dwconv = nn.Conv2d(self.chunk_dim, self.chunk_dim, 3, 1, 1,
                                groups=self.chunk_dim, bias=False)  # depthwise conv
        self.out = nn.Conv2d(dim, dim, 1, 1, 0, bias=False)
        self.act = nn.GELU()

    def forward(self, x):
        h, w = x.size()[-2:]
        x0, x1 = self.proj(x).split([self.chunk_dim, self.dim - self.chunk_dim], dim=1)

        # Coarse branch: pool to 1/8 resolution, filter, restore input size.
        x2 = F.adaptive_max_pool2d(x0, (h // 8, w // 8))
        x2 = self.dwconv(x2)
        x2 = F.interpolate(x2, size=(h, w), mode='bilinear')
        x2 = self.act(x2) * x0  # dynamic spatial modulation of the first group

        x = torch.cat([x1, x2], dim=1)
        x = self.out(self.act(x))
        return x


class CCM(nn.Module):
    """Convolutional channel mixer: encodes local context and mixes channels."""
    def __init__(self, dim, ffn_scale, use_se=False):
        super().__init__()
        self.use_se = use_se
        hidden_dim = int(dim * ffn_scale)

        self.conv1 = nn.Conv2d(dim, hidden_dim, 3, 1, 1, bias=False)
        self.conv2 = nn.Conv2d(hidden_dim, dim, 1, 1, 0, bias=False)
        self.act = nn.GELU()

    def forward(self, x):
        x = self.act(self.conv1(x))
        x = self.conv2(x)
        return x


class AttBlock(nn.Module):
    """SimpleSAFM followed by CCM, with a residual connection."""
    def __init__(self, dim, ffn_scale, use_se=False):
        super().__init__()
        self.conv1 = SimpleSAFM(dim, ratio=3)
        self.conv2 = CCM(dim, ffn_scale, use_se)

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        return out + x


class SAFMNPP(nn.Module):
    """SAFMN++: stacked AttBlocks plus a PixelShuffle upsampler. The output
    keeps the input channel count, upscaled spatially by `upscaling_factor`."""
    def __init__(self, input_dim, dim, n_blocks=6, ffn_scale=1.5, use_se=False, upscaling_factor=2):
        super().__init__()
        self.scale = upscaling_factor

        self.to_feat = nn.Conv2d(input_dim, dim, 3, 1, 1, bias=False)
        self.feats = nn.Sequential(*[AttBlock(dim, ffn_scale, use_se) for _ in range(n_blocks)])
        self.to_img = nn.Sequential(
            nn.Conv2d(dim, input_dim * upscaling_factor ** 2, 3, 1, 1, bias=False),
            nn.PixelShuffle(upscaling_factor)
        )

    def forward(self, x):
        # Bilinear skip connection at the target resolution.
        res = F.interpolate(x, scale_factor=self.scale, mode='bilinear', align_corners=False)
        x = self.to_feat(x)
        x = self.feats(x)
        return self.to_img(x) + res


if __name__ == '__main__':
    ############# Test Model Complexity #############
    # from fvcore.nn import flop_count_table, FlopCountAnalysis, ActivationCountAnalysis
    x = torch.randn(1, 256, 8, 8)
    model = SAFMNPP(256, dim=256, n_blocks=6, ffn_scale=1.5, upscaling_factor=2)
    print(model)
    # print(flop_count_table(FlopCountAnalysis(model, x), activations=ActivationCountAnalysis(model, x)))
    output = model(x)
    print(output.shape)  # torch.Size([1, 256, 16, 16])
4. Integrating SAFMN into YOLOv11
Step 1: Copy the core code from section 3 into the D:\model\yolov11\ultralytics\change_model directory.
Step 2: Import the SAFMN module in task.py.
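A minimal sketch of the import, assuming the file from Step 1 was saved as SAFMN.py inside change_model (adjust the module path to match your actual file name):
from ultralytics.change_model.SAFMN import SAFMNPP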
Step 3: Register the module in the model-parsing section of task.py, as sketched below.
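A hedged sketch of the registration branch inside parse_model in task.py; the surrounding code varies between ultralytics versions, and the one essential detail is that SAFMNPP's PixelShuffle output keeps the input channel count, so the output channels c2 must equal the input channels c1:
        elif m is SAFMNPP:
            c1 = ch[f]           # channels coming from the previous layer
            c2 = c1              # PixelShuffle restores input_dim channels
            args = [c1, *args]   # YAML args [512] become SAFMNPP(c1, 512)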
Step 4: Copy the model configuration below into your YOLOv11 YAML file.
The first improved configuration file:
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
# [depth, width, max_channels]
n: [ 0.50, 0.25, 1024 ] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
s: [ 0.50, 0.50, 1024 ] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
m: [ 0.50, 1.00, 512 ] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
l: [ 1.00, 1.00, 512 ] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
x: [ 1.00, 1.50, 512 ] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs
# YOLO11n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 2, C3k2, [256, False, 0.25]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 2, C3k2, [512, False, 0.25]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 2, C3k2, [512, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 2, C3k2, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 9
- [-1, 2, C2PSA, [1024]] # 10
# YOLO11n head
head:
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 2, C3k2, [512, False]] # 13
- [-1, 1, SAFMNPP, [512]]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 13], 1, Concat, [1]] # cat head P4
- [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 10], 1, Concat, [1]] # cat head P5
- [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)
- [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
Step 5: Run the following script to confirm the model builds and trains successfully.
from ultralytics.models import NAS, RTDETR, SAM, YOLO, FastSAM, YOLOWorld

if __name__ == "__main__":
    # Train a model built from the modified YOLOv11 YAML file
    model = YOLO(r"D:\model\yolov11\ultralytics\cfg\models\11\yolo11_SAFM.yaml")  # build a new model from YAML
    model.train(data=r'D:\model\yolov11\ultralytics\cfg\datasets\VOC_my.yaml',
                epochs=300, imgsz=640, batch=64
                # , close_mosaic=10
                )