CV method: YOLOv12 (v11 evolved!)

WiSirius

Last modified 2025-03-25 15:53:29

First published 2025-03-25 15:52:42

Article link: https://blog.csdn.net/WiSirius/article/details/146214305

Table of Contents

Preface
1. Area Attention
2. Residual Efficient Layer Aggregation Networks (R-ELAN)
3. Architectural Improvements
4. Model Variants
5. Results
Summary

Preface

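YOLOv12 is here, hot on the heels of v11 (the release pace really is fast). Unlike v11, however, it is not an Ultralytics release: it was developed by an academic team (see the paper below) on top of the Ultralytics codebase.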

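As before, the main changes are on the model side:

1) a simple yet effective area attention mechanism (area-attention)
2) an efficient aggregation network, R-ELAN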

paper: https://arxiv.org/pdf/2502.12524
github: https://github.com/sunsmarterjie/yolov12

1. Area Attention

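Attention is inherently slower than convolutional neural networks (CNNs). First, the computational complexity of self-attention grows quadratically with the input sequence length; second, most attention-based vision Transformers pile up additional computational cost through their complex designs. To reduce the cost of attention, YOLOv12 proposes area attention: a feature map of resolution (H, W) is divided into l segments of size (H/l, W) or (H, W/l), and attention is computed only within each segment. The default is l = 4, which shrinks the receptive field to 1/4 of the original while still leaving it large.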
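A quick cost count shows where the saving comes from. It assumes n = HW tokens and a single head of dimension d (multiply by the number of heads for multi-head attention), and it counts only the multiply-adds of the two n-by-n products in attention (QK^T and the weighted sum with V), ignoring projections and the softmax:

```latex
\begin{aligned}
C_{\text{global}} &= \underbrace{n^2 d}_{QK^\top} + \underbrace{n^2 d}_{\text{weights}\cdot V} = 2n^2 d, \qquad n = HW \\
C_{\text{area}} &= l \cdot 2\left(\frac{n}{l}\right)^{2} d = \frac{2n^2 d}{l} \\
l = 4:\quad C_{\text{area}} &= \frac{1}{2}\, n^2 d = \frac{1}{4}\, C_{\text{global}}
\end{aligned}
```

Each of the l areas pays the quadratic price only on its own n/l tokens, so the total drops by a factor of l.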
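The partitioning itself can be sketched in a few lines of plain Python. This is a minimal illustration, not the YOLOv12 implementation: it assumes a single head with Q = K = V equal to the raw tokens (no learned projections, no FlashAttention), and it splits along H into (H/l, W) strips by cutting the row-major flattened tokens into l equal chunks. All function names here are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention with Q = K = V = tokens.

    Simplification (assumption): no learned q/k/v projections, no multi-head split.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        # one score per key token, so total work grows with len(tokens) ** 2
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def area_attention(feature_map, area):
    """Area attention over an H x W grid of d-dim vectors (hypothetical helper).

    The grid is flattened row-major (N = H * W tokens) and cut into `area`
    contiguous chunks; each chunk is H // area full rows, i.e. one (H/area, W)
    strip. Attention runs inside each strip only.
    """
    H, W = len(feature_map), len(feature_map[0])
    if H % area != 0:
        raise ValueError("H must be divisible by area")
    tokens = [vec for row in feature_map for vec in row]
    per_area = (H // area) * W
    out = []
    for a in range(area):
        strip = tokens[a * per_area:(a + 1) * per_area]
        out.extend(self_attention(strip))
    return out  # N x d, same token order as the input


def score_count(n_tokens, area):
    """Number of query-key dot products: area * (N / area)^2 = N^2 / area."""
    return area * (n_tokens // area) ** 2


# Usage: a deterministic 8 x 4 feature map with d = 3
H, W, D = 8, 4, 3
fmap = [[[math.sin(r * 1.3 + c * 0.7 + k) for k in range(D)] for c in range(W)] for r in range(H)]

global_out = area_attention(fmap, area=1)  # area=1 is ordinary global attention
area_out = area_attention(fmap, area=4)    # default l = 4 from the text
print(score_count(H * W, 1), score_count(H * W, 4))  # 1024 256
```

With area=1 the function reduces to ordinary global attention, so the two calls differ only in how far each token can look, while the number of dot products drops from N² to N²/l.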


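import torch
import torch.nn as nn
from ultralytics.nn.modules.conv import Conv  # Ultralytics Conv block: Conv2d + BatchNorm2d + activation


class AAttn(nn.Module):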
    """
    Area-attention module with the requirement of flash attention.
    Attributes:
        dim (int): Number of hidden channels;
        num_heads (int): Number of heads into which the attention mechanism is divided;
        area (int, optional): Number of areas the feature map is divided. Defaults to 1.
    Methods:
        forward: Performs a forward process of input tensor and outputs a tensor after the execution of the area attention mechanism.
    Examples:
        >>> import torch
        >>> from ultralytics.nn.modules import AAttn
        >>> model = AAttn(dim=64, num_heads=2, area=4)
        >>> x = torch.randn(2, 64, 128, 128)
        >>> output = model(x)
        >>> print(output.shape)
    
    Notes: 
        recommend that dim//num_heads be a multiple of 32 or 64.
    """
 
    def __init__(self, dim, num_heads, area=1):
        """Initializes the area-attention module, a simple yet efficient attention module for YOLO."""
        super().__init__()
        self.area = area
 
        self.num_heads = num_heads
        self.head_dim = head_dim = dim // num_heads
        all_head_dim = head_dim * self.num_heads
 
        self.qkv = Conv(dim, all_head_dim