[Object Detection] DETR - Deformable DETR - DINO

Article link: https://blog.csdn.net/weixin_45863274/article/details/141819696

1. BaseInfo


Title	End-to-End Object Detection with Transformers
Address	https://arxiv.org/pdf/2005.12872
Journal/Time	ECCV2020
Author	Facebook
Code	https://github.com/facebookresearch/detr


Title	Deformable DETR: Deformable Transformers for End-to-End Object Detection
Address	https://arxiv.org/pdf/2010.04159
Journal/Time	ICLR 2021 (arXiv 2020)
Author	SenseTime, USTC, CUHK
Code	https://github.com/fundamentalvision/Deformable-DETR


Title	DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Address	https://arxiv.org/pdf/2203.03605
Journal/Time	ICLR 2023
Author	HKUST, Tsinghua University
Code	https://github.com/IDEA-Research/DINO

2. Creative Q&A

3. Concrete

3.1. DETR: an end-to-end network, weak at small-object detection

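A CNN backbone extracts features; the feature map is then flattened into a sequence of image tokens via input embedding + positional encoding.
Training stage: the model outputs 100 predicted boxes, the Hungarian algorithm matches them one-to-one with the ground-truth boxes, and the loss is computed on that matching.
Test stage: among the predicted boxes, keep only those whose class confidence is greater than 0.7.
The object queries are learnable. Only the last feature map of the backbone is used.
In the standard Transformer, positional encoding is added only to the input; in DETR it is added to k and q in the attention layers.
The classification loss is cross-entropy (CE); successfully matched predictions additionally get a box loss made of L1 and GIoU.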
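
The matching and loss steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the code of the official repository: the helper names (`giou`, `match_cost`, `match`, `detr_loss`) are hypothetical, boxes are assumed to be normalized (cx, cy, w, h) tuples, the weights 1 / 5 / 2 are illustrative, and the optimal assignment is found by brute force over permutations instead of an actual Hungarian solver (it yields the same minimum-cost matching, but only scales to a handful of boxes). Down-weighting of the "no object" class and normalization by the number of boxes are left out for brevity.

```python
import math
from itertools import permutations

def to_xyxy(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def giou(a, b):
    """Generalized IoU of two (cx, cy, w, h) boxes with positive area."""
    ax1, ay1, ax2, ay2 = to_xyxy(a)
    bx1, by1, bx2, by2 = to_xyxy(b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Area of the smallest box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (hull - union) / hull

def l1(a, b):
    """L1 distance between two boxes."""
    return sum(abs(x - y) for x, y in zip(a, b))

def match_cost(pred, gt, w_class=1.0, w_l1=5.0, w_giou=2.0):
    """Cost of assigning one prediction (probs, box) to one ground truth (label, box)."""
    probs, box = pred
    label, gt_box = gt
    return -w_class * probs[label] + w_l1 * l1(box, gt_box) - w_giou * giou(box, gt_box)

def match(preds, gts):
    """Return, for each ground truth, the index of its assigned prediction.

    Brute-force stand-in for the Hungarian algorithm; assumes len(preds) >= len(gts).
    """
    best = min(
        permutations(range(len(preds)), len(gts)),
        key=lambda perm: sum(match_cost(preds[p], g) for p, g in zip(perm, gts)),
    )
    return list(best)

def detr_loss(preds, gts, no_object, w_l1=5.0, w_giou=2.0):
    """CE for every query; L1 + (1 - GIoU) only for matched queries."""
    target = dict(zip(match(preds, gts), gts))
    total = 0.0
    for i, (probs, box) in enumerate(preds):
        if i in target:
            label, gt_box = target[i]
            total += -math.log(probs[label])
            total += w_l1 * l1(box, gt_box) + w_giou * (1.0 - giou(box, gt_box))
        else:
            # Unmatched queries are supervised towards the "no object" class.
            total += -math.log(probs[no_object])
    return total

# Toy example: 3 queries, 2 ground-truth objects, class index 2 = "no object".
preds = [
    ([0.1, 0.8, 0.1], (0.5, 0.5, 0.2, 0.2)),
    ([0.7, 0.1, 0.2], (0.2, 0.2, 0.1, 0.1)),
    ([0.05, 0.05, 0.9], (0.8, 0.8, 0.1, 0.1)),
]
gts = [(0, (0.2, 0.2, 0.1, 0.1)), (1, (0.5, 0.5, 0.2, 0.2))]
print(match(preds, gts))  # -> [1, 0]
print(detr_loss(preds, gts, no_object=2))
```

In the toy example the first ground truth is assigned to query 1 and the second to query 0; query 2 stays unmatched and only contributes a "no object" classification term. For realistic sizes (100 queries), a proper assignment solver would be needed instead of the permutation search.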

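
The test-time filtering described above (keep predictions whose class confidence exceeds 0.7) can be sketched as follows. This is a minimal sketch under assumptions rather than the reference implementation: `filter_predictions` is a hypothetical helper, each query is assumed to output raw logits whose last entry stands for the "no object" class, and the confidence is taken as the largest softmax probability among the real classes.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def filter_predictions(logits_per_query, boxes, threshold=0.7):
    """Keep queries whose best real-class probability exceeds the threshold.

    Assumes the last logit of every query is the "no object" class.
    Returns a list of (class_index, score, box) tuples.
    """
    kept = []
    for logits, box in zip(logits_per_query, boxes):
        probs = softmax(logits)[:-1]  # drop the "no object" probability
        score = max(probs)
        if score > threshold:
            kept.append((probs.index(score), score, box))
    return kept

# Toy example: 3 queries, 2 real classes + "no object".
logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 4.0], [1.0, 0.9, 0.0]]
boxes = [(0.5, 0.5, 0.2, 0.2), (0.1, 0.1, 0.1, 0.1), (0.7, 0.3, 0.2, 0.4)]
for cls, score, box in filter_predictions(logits, boxes):
    print(cls, round(score, 3), box)
```

Only the first query survives here: the second one is dominated by "no object", and the third is too uncertain between the two real classes. Because training uses one-to-one matching, this simple threshold is all the post-processing the sketch needs; no NMS step is involved.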