YOLO11 | Three releases in one year, each one different | Key improvements and network architecture diagram [first on the web]

kay_545

Last modified: 2024-11-08 20:10:53

First published: 2024-09-30 11:20:14

Article link: https://blog.csdn.net/m0_67647321/article/details/142649922

On September 27, 2024, Ultralytics held a nine-hour live stream as the "launch event" for YOLO11.

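YOLO11 is the latest version of the Ultralytics YOLO series of real-time object detectors, redefining what is possible with cutting-edge accuracy, speed, and efficiency. Building on the significant advances of earlier YOLO versions, YOLO11 brings major improvements to both the architecture and the training methods, making it a versatile choice for a wide range of computer vision tasks.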

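Key features of YOLO11:

Enhanced feature extraction: YOLO11 uses an improved backbone and neck architecture that strengthens feature extraction, enabling more precise object detection and better performance on complex tasks.
Optimized for efficiency and speed: YOLO11 introduces refined architectural designs and optimized training pipelines that deliver faster processing while keeping a good balance between accuracy and performance.
Higher accuracy with fewer parameters: thanks to advances in model design, YOLO11m achieves a higher mean Average Precision (mAP) on the COCO dataset while using 22% fewer parameters than YOLOv8m, making it more computationally efficient without compromising accuracy.
Adaptability across environments: YOLO11 can be deployed seamlessly in a variety of environments, including edge devices, cloud platforms, and systems with NVIDIA GPU support, ensuring maximum flexibility.
Broad range of supported tasks: whether for object detection, instance segmentation, image classification, pose estimation, or oriented object detection (OBB), YOLO11 is designed to address a wide range of computer vision challenges.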
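As a quick sanity check of the 22% figure, the arithmetic below uses the YOLO11m parameter count from the detection table further down (20.1M) and the YOLOv8m count from Ultralytics' published YOLOv8 table (25.9M). The YOLOv8m number does not appear in this post, so treat it as an assumption to verify.

```python
# Parameter counts in millions (M).
YOLOV8M_PARAMS = 25.9  # assumption: value from Ultralytics' YOLOv8 detection table
YOLO11M_PARAMS = 20.1  # from the YOLO11 detection table in this post


def relative_reduction(old: float, new: float) -> float:
    """Fraction by which `new` is smaller than `old`."""
    return (old - new) / old


reduction = relative_reduction(YOLOV8M_PARAMS, YOLO11M_PARAMS)
print(f"YOLO11m uses {reduction:.1%} fewer parameters than YOLOv8m")
```

The result is about 22.4%, which rounds to the 22% quoted above.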

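Supported tasks and modes

Building on the versatile model family introduced in YOLOv8, YOLO11 offers enhanced support for a wide range of computer vision tasks: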

Model	Filenames	Task	Inference	Validation	Training	Export
YOLO11	yolo11n.pt, yolo11s.pt, yolo11m.pt, yolo11l.pt, yolo11x.pt	Detection	✅	✅	✅	✅
YOLO11-seg	yolo11n-seg.pt, yolo11s-seg.pt, yolo11m-seg.pt, yolo11l-seg.pt, yolo11x-seg.pt	Instance Segmentation	✅	✅	✅	✅
YOLO11-pose	yolo11n-pose.pt, yolo11s-pose.pt, yolo11m-pose.pt, yolo11l-pose.pt, yolo11x-pose.pt	Pose/Keypoints	✅	✅	✅	✅
YOLO11-obb	yolo11n-obb.pt, yolo11s-obb.pt, yolo11m-obb.pt, yolo11l-obb.pt, yolo11x-obb.pt	Oriented Detection	✅	✅	✅	✅
YOLO11-cls	yolo11n-cls.pt, yolo11s-cls.pt, yolo11m-cls.pt, yolo11l-cls.pt, yolo11x-cls.pt	Classification	✅	✅	✅	✅

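The table above summarizes the YOLO11 model variants, showing which task each one targets and its compatibility with the inference, validation, training, and export modes. This flexibility makes YOLO11 suitable for a broad range of computer vision applications, from real-time detection to complex segmentation tasks.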
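The filenames in the table follow one pattern: yolo11, then the scale letter, then an optional task suffix, then .pt. A small helper that builds and validates such names might look like the sketch below; yolo11_weights is hypothetical and not part of the Ultralytics API, and the task keys are Ultralytics' task names. The returned string is what you would pass to YOLO(...).

```python
# Hypothetical helper: builds YOLO11 weight filenames following the naming pattern in the table above.
SCALES = ("n", "s", "m", "l", "x")
TASK_SUFFIX = {
    "detect": "",
    "segment": "-seg",
    "pose": "-pose",
    "obb": "-obb",
    "classify": "-cls",
}


def yolo11_weights(scale: str, task: str = "detect") -> str:
    """Return e.g. 'yolo11n-seg.pt' for scale='n', task='segment'."""
    if scale not in SCALES:
        raise ValueError(f"unknown scale {scale!r}, expected one of {SCALES}")
    if task not in TASK_SUFFIX:
        raise ValueError(f"unknown task {task!r}, expected one of {tuple(TASK_SUFFIX)}")
    return f"yolo11{scale}{TASK_SUFFIX[task]}.pt"


for task in TASK_SUFFIX:
    print(task, yolo11_weights("n", task))
```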

Performance metrics

Object detection

Model	size (pixels)	mAPval 50-95	Speed CPU ONNX (ms)	Speed T4 TensorRT10 (ms)	params (M)	FLOPs (B)
YOLO11n	640	39.5	56.12 ± 0.82 ms	1.55 ± 0.01 ms	2.6	6.5
YOLO11s	640	47.0	90.01 ± 1.17 ms	2.46 ± 0.00 ms	9.4	21.5
YOLO11m	640	51.5	183.20 ± 2.04 ms	4.70 ± 0.06 ms	20.1	68.0
YOLO11l	640	53.4	238.64 ± 1.39 ms	6.16 ± 0.08 ms	25.3	86.9
YOLO11x	640	54.7	462.78 ± 6.66 ms	11.31 ± 0.24 ms	56.9	194.9

Instance segmentation

Model	size (pixels)	mAPbox 50-95	mAPmask 50-95	Speed CPU ONNX (ms)	Speed T4 TensorRT10 (ms)	params (M)	FLOPs (B)
YOLO11n-seg	640	38.9	32.0	65.90 ± 1.14 ms	1.84 ± 0.00 ms	2.9	10.4
YOLO11s-seg	640	46.6	37.8	117.56 ± 4.89 ms	2.94 ± 0.01 ms	10.1	35.5
YOLO11m-seg	640	51.5	41.5	281.63 ± 1.16 ms	6.31 ± 0.09 ms	22.4	123.3
YOLO11l-seg	640	53.4	42.9	344.16 ± 3.17 ms	7.78 ± 0.16 ms	27.6	142.2
YOLO11x-seg	640	54.7	43.8	664.50 ± 3.24 ms	15.75 ± 0.67 ms	62.1	319.0

Classification

Model	size (pixels)	acc top1	acc top5	Speed CPU ONNX (ms)	Speed T4 TensorRT10 (ms)	params (M)	FLOPs (B) at 640
YOLO11n-cls	224	70.0	89.4	5.03 ± 0.32 ms	1.10 ± 0.01 ms	1.6	3.3
YOLO11s-cls	224	75.4	92.7	7.89 ± 0.18 ms	1.34 ± 0.01 ms	5.5	12.1
YOLO11m-cls	224	77.3	93.9	17.17 ± 0.40 ms	1.95 ± 0.00 ms	10.4	39.3
YOLO11l-cls	224	78.3	94.3	23.17 ± 0.29 ms	2.76 ± 0.00 ms	12.9	49.4

Oriented object detection (OBB)

Model	size (pixels)	mAPtest 50	Speed CPU ONNX (ms)	Speed T4 TensorRT10 (ms)	params (M)	FLOPs (B)
YOLO11n-obb	1024	78.4	117.56 ± 0.80 ms	4.43 ± 0.01 ms	2.7	17.2
YOLO11s-obb	1024	79.5	219.41 ± 4.00 ms	5.13 ± 0.02 ms	9.7	57.5
YOLO11m-obb	1024	80.9	562.81 ± 2.87 ms	10.07 ± 0.38 ms	20.9	183.5
YOLO11l-obb	1024	81.0	712.49 ± 4.98 ms	13.46 ± 0.55 ms	26.2	232.0
YOLO11x-obb	1024	81.3	1408.63 ± 7.67 ms	28.59 ± 0.96 ms	58.8	520.2

Pose

Model	size (pixels)	mAPpose 50-95	mAPpose 50	Speed CPU ONNX (ms)	Speed T4 TensorRT10 (ms)	params (M)	FLOPs (B)
YOLO11n-pose	640	50.0	81.0	52.40 ± 0.51 ms	1.72 ± 0.01 ms	2.9	7.6
YOLO11s-pose	640	58.9	86.3	90.54 ± 0.59 ms	2.57 ± 0.00 ms	9.9	23.2
YOLO11m-pose	640	64.9	89.4	187.28 ± 0.77 ms	4.94 ± 0.05 ms	20.9	71.7
YOLO11l-pose	640	66.1	89.9	247.69 ± 1.10 ms	6.42 ± 0.13 ms	26.2	90.7
YOLO11x-pose	640	69.5	91.1	487.97 ± 13.91 ms	12.06 ± 0.20 ms	58.8	203.3

Simple YOLO11 training and inference example

The following example uses the YOLO11 Detect model for object detection.

from ultralytics import YOLO

# Load a model
model = YOLO("yolo11n.pt")

# Train the model
train_results = model.train(
    data="coco8.yaml",  # path to dataset YAML
    epochs=100,  # number of training epochs
    imgsz=640,  # training image size
    device="cpu",  # device to run on, i.e. device=0 or device=0,1,2,3 or device=cpu
)

# Evaluate model performance on the validation set
metrics = model.val()

# Perform object detection on an image
results = model("path/to/image.jpg")
results[0].show()

# Export the model to ONNX format
path = model.export(format="onnx")  # return path to exported model

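Edge deployment support

YOLO11 is designed to adapt to a wide range of environments, including edge devices. Its optimized architecture and efficient processing make it suitable for deployment on edge devices, cloud platforms, and systems with NVIDIA GPU support. This flexibility means YOLO11 can serve applications ranging from real-time detection on mobile devices to complex segmentation tasks in the cloud. For more details on deployment options, see the Ultralytics Export documentation.

YOLO11 yaml file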

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
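Two mechanisms in this yaml are easy to check by hand. First, each scale line is [depth, width, max_channels]: as we understand the Ultralytics parser (parse_model in ultralytics/nn/tasks.py), depth multiplies the repeats column when it is greater than 1, and output channels are capped at max_channels, multiplied by width, and rounded up to a multiple of 8. Second, the from column together with the stride-2 Conv and nearest-upsample layers fixes the resolution of every layer. The sketch below re-implements both rules in isolation; scale_layer and layer_strides are hypothetical helpers, not Ultralytics functions, and they ignore special cases the real parser handles. It also assumes, as the yaml shows, that every Conv uses stride 2.

```python
import math

# [depth, width, max_channels] copied from the scales block above.
SCALES = {
    "n": (0.50, 0.25, 1024),
    "s": (0.50, 0.50, 1024),
    "m": (0.50, 1.00, 512),
    "l": (1.00, 1.00, 512),
    "x": (1.00, 1.50, 512),
}


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(repeats: int, channels: int, scale: str) -> tuple:
    """Return (scaled repeats, scaled output channels) for one yaml layer."""
    depth, width, max_channels = SCALES[scale]
    n = max(round(repeats * depth), 1) if repeats > 1 else repeats
    c = make_divisible(min(channels, max_channels) * width, 8)
    return n, c


# (from, module) for layers 0-22 of the detection yaml; args omitted.
LAYERS = [
    (-1, "Conv"), (-1, "Conv"), (-1, "C3k2"), (-1, "Conv"), (-1, "C3k2"),  # 0-4
    (-1, "Conv"), (-1, "C3k2"), (-1, "Conv"), (-1, "C3k2"),  # 5-8
    (-1, "SPPF"), (-1, "C2PSA"),  # 9-10
    (-1, "Upsample"), ([-1, 6], "Concat"), (-1, "C3k2"),  # 11-13
    (-1, "Upsample"), ([-1, 4], "Concat"), (-1, "C3k2"),  # 14-16
    (-1, "Conv"), ([-1, 13], "Concat"), (-1, "C3k2"),  # 17-19
    (-1, "Conv"), ([-1, 10], "Concat"), (-1, "C3k2"),  # 20-22
]
DOWNSAMPLE = {"Conv"}  # every Conv in this yaml has stride 2
UPSAMPLE = {"Upsample"}  # nn.Upsample with scale_factor 2


def layer_strides(layers):
    """Return the stride (input pixels per feature-map cell) of each layer's output."""
    strides = []
    for i, (frm, module) in enumerate(layers):
        sources = frm if isinstance(frm, list) else [frm]
        # Negative indices are relative to the current layer; -1 at layer 0 is the input image.
        idxs = [i + src if src < 0 else src for src in sources]
        ins = [1 if j < 0 else strides[j] for j in idxs]
        if len(set(ins)) != 1:
            raise ValueError(f"layer {i}: {module} inputs have mismatched strides {ins}")
        stride = ins[0]
        if module in DOWNSAMPLE:
            stride *= 2
        elif module in UPSAMPLE:
            stride //= 2
        strides.append(stride)
    return strides


# Backbone layer 8, [-1, 2, C3k2, [1024, True]], at every scale.
for s in SCALES:
    print(s, scale_layer(2, 1024, s))

strides = layer_strides(LAYERS)
print("Detect input strides:", [strides[i] for i in (16, 19, 22)])
```

For backbone layer 8 this gives 256 output channels at scale n and 768 at scale x, and the Detect inputs 16, 19 and 22 come out at strides 8, 16 and 32, matching the P3/8, P4/16 and P5/32 comments.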

Differences between the YOLO11 and YOLOv8 yaml files
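The difference can be summarized by comparing the two module sequences. The sketch below hard-codes the detection-model layer lists: the YOLO11 list comes from the yaml above, while the YOLOv8 list reflects our understanding of Ultralytics' yolov8.yaml (C2f blocks, SPPF as the last backbone layer, Detect as layer 22), so verify it against your installed version.

```python
from collections import Counter

# Module names by layer index (detection models, args omitted).
# Assumption: YOLOV8 mirrors Ultralytics' yolov8.yaml.
YOLOV8 = ["Conv", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "Conv", "C2f", "SPPF",
          "Upsample", "Concat", "C2f", "Upsample", "Concat", "C2f",
          "Conv", "Concat", "C2f", "Conv", "Concat", "C2f", "Detect"]
YOLO11 = ["Conv", "Conv", "C3k2", "Conv", "C3k2", "Conv", "C3k2", "Conv", "C3k2", "SPPF", "C2PSA",
          "Upsample", "Concat", "C3k2", "Upsample", "Concat", "C3k2",
          "Conv", "Concat", "C3k2", "Conv", "Concat", "C3k2", "Detect"]

v8, v11 = Counter(YOLOV8), Counter(YOLO11)
print("only in YOLOv8:", dict(v8 - v11))
print("only in YOLO11:", dict(v11 - v8))
print("layers:", len(YOLOV8), "->", len(YOLO11))
```

Under that assumption, YOLO11 swaps all eight C2f blocks for C3k2 and adds one C2PSA after SPPF. Because C2PSA becomes layer 10, every head index is one higher than in YOLOv8: the P5 concat refers to layer 10 instead of 9, and Detect reads layers 16, 19 and 22 instead of 15, 18 and 21.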

Improved module code

C3k2

import torch.nn as nn
from ultralytics.nn.modules.block import C2f, C3k, Bottleneck

class C3k2(C2f):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3k(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck(self.c, self.c, shortcut, g) for _ in range(n)
        )

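C3k2 is a faster implementation of the CSP (Cross Stage Partial) bottleneck with two convolutions.

Class inheritance:

C3k2 inherits from C2f, the CSP bottleneck block used in YOLOv8. It reuses the convolutions and the forward pass of C2f and only replaces the list of inner blocks (self.m).

Constructor (__init__):

c1: number of input channels.
c2: number of output channels.
n: number of inner blocks (default 1).
c3k: boolean flag that selects C3k blocks or regular Bottleneck blocks.
e: expansion ratio that controls the hidden channel width (default 0.5).
g: number of groups for grouped convolution (default 1).
shortcut: boolean that determines whether residual shortcut connections are used (default True).

Initialization:

super().__init__(c1, c2, n, shortcut, g, e) calls the constructor of the parent class C2f, which sets up the hidden width self.c and the 1x1 convolutions cv1 and cv2. The default list of Bottleneck blocks that C2f also builds is then replaced.

Module list (self.m):

nn.ModuleList stores either C3k or Bottleneck modules, depending on the value of c3k.
If c3k is True, C3k modules are created with the following arguments:
self.c: the number of channels (from C2f), used for both input and output.
2: the number of Bottleneck blocks stacked inside each C3k block (the n argument of C3k).
shortcut and g: passed through from the C3k2 constructor.
If c3k is False, standard Bottleneck modules are created instead.

for _ in range(n) means that n such blocks are created.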

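Summary:

C3k2 implements the CSP bottleneck architecture and, depending on the c3k flag, uses either C3k blocks (each stacking two Bottlenecks) or standard Bottleneck blocks as its inner blocks.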
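To make the channel bookkeeping concrete, the sketch below traces the widths inside C3k2 using the C2f layout it inherits: cv1 produces 2*c channels that are split into two halves, each inner block keeps c channels, and cv2 fuses the concatenation of both halves plus one output per inner block. The C2f details are our reading of the Ultralytics source, and c3k2_channel_flow is a hypothetical helper. The example plugs in backbone layer 2 of the yaml, C3k2 [256, False, 0.25], after n-scale compound scaling (c1 = 32, c2 = 64, n = 1, e = 0.25).

```python
def c3k2_channel_flow(c1: int, c2: int, n: int = 1, e: float = 0.5) -> dict:
    """Trace channel widths through C3k2, assuming the C2f layout it inherits."""
    c = int(c2 * e)  # hidden width self.c, set by C2f.__init__
    return {
        "cv1": (c1, 2 * c),  # 1x1 conv; its output is split into two halves of c
        "inner_block": (c, c),  # each C3k / Bottleneck maps c -> c
        "concat": (2 + n) * c,  # both halves + one output per inner block
        "cv2": ((2 + n) * c, c2),  # 1x1 conv fusing the concatenation
    }


# Backbone layer 2 of yolo11n: C3k2 [256, False, 0.25] after width 0.25 / depth 0.50 scaling.
flow = c3k2_channel_flow(c1=32, c2=64, n=1, e=0.25)
for name, shape in flow.items():
    print(name, shape)
```

Because every inner block's output is kept and concatenated, the width entering cv2 grows linearly with n, which is the CSP-style gradient path that C2f introduced and C3k2 keeps.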
C2PSA

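import torch
import torch.nn as nn
from ultralytics.nn.modules.block import PSABlock
from ultralytics.nn.modules.conv import Conv

class C2PSA(nn.Module):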
    """
    C2PSA module with attention mechanism for enhanced feature extraction and processing.

    This module implements a convolutional block with attention mechanisms to enhance feature extraction and processing
    capabilities. It includes a series of PSABlock modules for self-attention and feed-forward operations.

    Attributes:
        c (int): Number of hidden channels.
        cv1 (Conv): 1x1 convolution layer to reduce the number of input channels to 2*c.
        cv2 (Conv): 1x1 convolution layer that maps the 2*c hidden channels back to c1 output channels.
        m (nn.Sequential): Sequential container of PSABlock modules for attention and feed-forward operations.

    Methods:
        forward: Performs a forward pass through the C2PSA module, applying attention and feed-forward operations.

    Notes:
        This module essentially is the same as PSA module, but refactored to allow stacking more PSABlock modules.

    Examples:
        >>> c2psa = C2PSA(c1=256, c2=256, n=3, e=0.5)
        >>> input_tensor = torch.randn(1, 256, 64, 64)
        >>> output_tensor = c2psa(input_tensor)
    """

    def __init__(self, c1, c2, n=1, e=0.5):
        """Initializes the C2PSA module with specified input/output channels, number of layers, and expansion ratio."""
        super().__init__()
        assert c1 == c2
        self.c = int(c1 * e)
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv(2 * self.c, c1, 1)

        self.m = nn.Sequential(*(PSABlock(self.c, attn_ratio=0.5, num_heads=self.c // 64) for _ in range(n)))

    def forward(self, x):
        """Processes the input tensor 'x' through a series of PSA blocks and returns the transformed tensor."""
        a, b = self.cv1(x).split((self.c, self.c), dim=1)
        b = self.m(b)
        return self.cv2(torch.cat((a, b), 1))

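The C2PSA module is a custom neural network layer that adds an attention mechanism to enhance feature extraction and processing.

Class overview

Purpose:
The C2PSA module introduces a convolutional block that uses attention to improve feature extraction and processing.
It uses a series of PSABlock modules, which the Ultralytics source describes as Position-Sensitive Attention blocks, and the architecture is designed to allow stacking multiple PSABlock layers.

Constructor (__init__):

Parameters:
c1: number of input channels (must equal c2).
c2: number of output channels (must equal c1).
n: number of PSABlock modules to stack (default 1).
e: expansion ratio used to compute the number of hidden channels (default 0.5).
Attributes:
self.c: number of hidden channels, computed as int(c1 * e).
self.cv1: a 1x1 convolution that maps the input from c1 to 2 * self.c channels, preparing the split into two halves.
self.cv2: another 1x1 convolution that maps the channel dimension back to c1 after processing.
self.m: a sequence of PSABlock modules. Each PSABlock operates on self.c channels with self.c // 64 attention heads and applies attention followed by a feed-forward operation.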
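In forward, only the half b passes through the PSABlock stack, while the half a bypasses it and is concatenated back before cv2, so attention runs on just int(c1 * e) channels. The sketch below reproduces that split, attend, concatenate structure in plain PyTorch so it runs without Ultralytics installed. It is not the Ultralytics implementation: ConvBNAct is a stand-in for Conv, and SimplePSABlock is a hypothetical simplification that swaps the real PSABlock attention for torch.nn.MultiheadAttention over flattened spatial positions, followed by a 1x1-conv feed-forward, each with a residual connection.

```python
import torch
import torch.nn as nn


class ConvBNAct(nn.Module):
    """Conv2d + BatchNorm2d + SiLU; a stand-in for Ultralytics' Conv."""

    def __init__(self, c1: int, c2: int, k: int = 1, s: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.conv(x)))


class SimplePSABlock(nn.Module):
    """Hypothetical simplified PSABlock: residual self-attention, then a residual 1x1-conv FFN."""

    def __init__(self, c: int, num_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(c, num_heads, batch_first=True)
        self.ffn = nn.Sequential(ConvBNAct(c, 2 * c), nn.Conv2d(2 * c, c, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C): one token per spatial position
        attended, _ = self.attn(tokens, tokens, tokens, need_weights=False)
        x = x + attended.transpose(1, 2).reshape(b, c, h, w)
        return x + self.ffn(x)


class C2PSASketch(nn.Module):
    """Same split/attend/concat layout as C2PSA, built from the simplified parts above."""

    def __init__(self, c1: int, c2: int, n: int = 1, e: float = 0.5):
        super().__init__()
        assert c1 == c2, "C2PSA keeps the channel count unchanged"
        self.c = int(c1 * e)
        self.cv1 = ConvBNAct(c1, 2 * self.c)
        self.cv2 = ConvBNAct(2 * self.c, c1)
        heads = max(self.c // 64, 1)  # guard against 0 heads; added in this sketch only
        self.m = nn.Sequential(*(SimplePSABlock(self.c, heads) for _ in range(n)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = self.cv1(x).split((self.c, self.c), dim=1)  # a bypasses the attention blocks
        return self.cv2(torch.cat((a, self.m(b)), dim=1))


block = C2PSASketch(256, 256, n=1).eval()
out = block(torch.randn(1, 256, 20, 20))
print(out.shape)
```

With c1 = 256 and e = 0.5 the attention runs on 128 channels with 2 heads, mirroring the self.c // 64 rule; the output keeps the input shape, which is what lets C2PSA sit right after SPPF without changing the rest of the yaml.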