Object Detection Algorithms: RT-DETR

Background

Real-Time Detection Transformer (RT-DETR) is a transformer-based object detector built for real-time inference. Released by Baidu in 2023, it balances the two goals of speed and accuracy: it runs faster than YOLO while keeping accuracy no lower than YOLO models. Its improvements target two areas, the encoder and the query selection, preserving the model's accuracy while increasing its inference speed.
Paper: https://arxiv.org/pdf/2304.08069
Code: https://github.com/lyuwenyu/RT-DETR

Model Architecture

(Figure: overall architecture of RT-DETR)
The model architecture is shown in the figure above. The input image passes through the backbone for feature extraction, yielding three feature maps $S_3$, $S_4$, $S_5$. These are fed into the Efficient Hybrid Encoder: AIFI is applied to $S_5$ to obtain $F_5$, and CCFF then fuses $S_3$, $S_4$, and $F_5$ into the encoder output. Uncertainty-minimal Query Selection picks the initial queries, which are fed together with the encoder output into the decoder, and the decoder produces the final detections.
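To make that data flow concrete, below is a minimal PyTorch-style sketch of the pipeline. Everything heavy is a placeholder: the "backbone" is just three strided convolutions producing stride-8/16/32 maps, the hybrid encoder and decoder are `nn.Identity`, and a plain top-k over classification scores stands in for uncertainty-minimal query selection; the 256-dimensional features and 300 queries are assumed values for illustration, not taken from the official code.

```python
import torch
import torch.nn as nn

class RTDETRSketch(nn.Module):
    """Structural sketch of the RT-DETR data flow.
    Heavy blocks are placeholders; only the wiring between them is meaningful."""

    def __init__(self, d_model=256, num_queries=300, num_classes=80):
        super().__init__()
        # "backbone" placeholder: strided convs emitting S3 (stride 8), S4 (16), S5 (32)
        self.stem = nn.Conv2d(3, d_model, kernel_size=8, stride=8)
        self.down4 = nn.Conv2d(d_model, d_model, kernel_size=2, stride=2)
        self.down5 = nn.Conv2d(d_model, d_model, kernel_size=2, stride=2)
        self.hybrid_encoder = nn.Identity()  # placeholder for AIFI on S5 + CCFF fusion
        self.decoder = nn.Identity()         # placeholder for the Transformer decoder
        self.score_head = nn.Linear(d_model, num_classes)  # scores each encoder token
        self.num_queries = num_queries

    def forward(self, image):                       # image: (B, 3, H, W)
        s3 = self.stem(image)                       # (B, C, H/8,  W/8)
        s4 = self.down4(s3)                         # (B, C, H/16, W/16)
        s5 = self.down5(s4)                         # (B, C, H/32, W/32)
        # flatten the three scales into one token sequence for the (placeholder) encoder
        memory = torch.cat([f.flatten(2).transpose(1, 2) for f in (s3, s4, s5)], dim=1)
        memory = self.hybrid_encoder(memory)        # (B, N, C)
        # plain top-k over class scores stands in for uncertainty-minimal query selection
        scores = self.score_head(memory).max(-1).values
        topk = scores.topk(self.num_queries, dim=1).indices
        queries = torch.gather(memory, 1,
                               topk.unsqueeze(-1).expand(-1, -1, memory.size(-1)))
        return self.decoder(queries)                # (B, num_queries, C)

out = RTDETRSketch()(torch.randn(1, 3, 640, 640))
print(out.shape)   # torch.Size([1, 300, 256])
```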

Efficient Hybrid Encoder

The authors analyze intra-scale interaction of the feature maps and argue that the high-level feature already carries rich semantic information about objects, so applying intra-scale interaction to the lower-level features adds little; experiments confirm this. The starting point is to shorten the sequence fed into AIFI: self-attention cost grows quadratically with sequence length, and the high-level feature map has far fewer tokens than the lower-level ones, so it is cheap to process. Once experiments show that intra-scale interaction on low-level features is unnecessary, that part of the computation can simply be dropped.
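As a rough illustration of the gap (assuming a 640×640 input and backbone strides of 8/16/32):

$$
|S_3| = 80 \times 80 = 6400,\qquad |S_5| = 20 \times 20 = 400,\qquad
\frac{|S_3|^2}{|S_5|^2} = \left(\frac{6400}{400}\right)^2 = 256,
$$

so running self-attention over $S_3$ alone would cost roughly 256 times as much as running it over $S_5$.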
The whole Efficient Hybrid Encoder module can be expressed as

$$
\begin{aligned}
Q &= K = V = \mathrm{Flatten}(S_5)\\
F_5 &= \mathrm{Reshape}(\mathrm{AIFI}(Q, K, V))\\
O &= \mathrm{CCFF}(\{S_3, S_4, F_5\})
\end{aligned}
$$
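A small sketch of these three steps, assuming 256-channel feature maps from a 640×640 input: a standard `nn.TransformerEncoderLayer` stands in for the AIFI block, and a resize-and-concatenate placeholder replaces the real convolutional CCFF fusion.

```python
import torch
import torch.nn as nn

class AIFI(nn.Module):
    """Intra-scale interaction on S5 only; a stock TransformerEncoderLayer
    is used here as a stand-in for the actual AIFI block."""
    def __init__(self, d_model=256, nhead=8):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=1024, batch_first=True)

    def forward(self, s5):                            # s5: (B, C, H, W)
        b, c, h, w = s5.shape
        q = s5.flatten(2).transpose(1, 2)             # Q = K = V = Flatten(S5): (B, H*W, C)
        f5 = self.layer(q)                            # AIFI(Q, K, V)
        return f5.transpose(1, 2).reshape(b, c, h, w) # Reshape back to (B, C, H, W)

def ccff(s3, s4, f5):
    """Placeholder for CCFF: simply upsamples and concatenates the three maps."""
    size = s3.shape[-2:]
    up = lambda x: nn.functional.interpolate(x, size=size, mode="nearest")
    return torch.cat([s3, up(s4), up(f5)], dim=1)

s3, s4, s5 = (torch.randn(1, 256, s, s) for s in (80, 40, 20))
o = ccff(s3, s4, AIFI()(s5))
print(o.shape)   # torch.Size([1, 768, 80, 80])
```

The point to notice is that self-attention is applied only to the 400 tokens of $S_5$; $S_3$ and $S_4$ bypass it entirely and only enter at the fusion step.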
