Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection (Paper Reading Notes)

Abstract

In this paper, we present an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. The key solution of open-set object detection is introducing language to a closed-set detector for open-set concept generalization. To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion. While previous works mainly evaluate open-set object detection on novel categories, we propose to also perform evaluations on referring expression comprehension for objects specified with attributes. Grounding DINO performs remarkably well on all three settings, including benchmarks on COCO, LVIS, ODinW, and RefCOCO/+/g. Grounding DINO achieves a 52.5 AP on the COCO detection zero-shot transfer benchmark, i.e., without any training data from COCO. After fine-tuning with COCO data, Grounding DINO reaches 63.0 AP. It sets a new record on the ODinW zero-shot benchmark with a mean 26.1 AP. Code will be available at https://github.com/IDEA-Research/GroundingDINO.

Summary

Proposed innovations

  • Extends the closed-set detector DINO to open-set detection by performing vision-language modality fusion at multiple stages. The fused components include a feature enhancer, a language-guided query selection module, and a cross-modality decoder (a conceptual sketch of the query-selection step follows below).
  • 3
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
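Language-guided query selection can be pictured as scoring every image token by how strongly it responds to any token of the text prompt, then taking the top-k tokens to initialize the decoder queries. The following is a conceptual PyTorch sketch of that idea, not the paper's actual implementation; the function name is hypothetical, and the default of 900 queries follows the number of queries described in the paper.

```python
import torch

def language_guided_query_selection(image_feats: torch.Tensor,
                                    text_feats: torch.Tensor,
                                    num_queries: int = 900) -> torch.Tensor:
    """Select image tokens to initialize decoder queries.

    image_feats: (num_image_tokens, d) image features after cross-modality fusion
    text_feats:  (num_text_tokens, d) text features after cross-modality fusion
    Returns the indices of the image tokens most similar to the prompt.
    """
    # Similarity of every image token to every text token
    logits = image_feats @ text_feats.T        # (num_image_tokens, num_text_tokens)
    # Score each image token by its best-matching text token
    scores = logits.max(dim=-1).values         # (num_image_tokens,)
    # Keep the highest-scoring tokens as positional anchors for decoder queries
    k = min(num_queries, scores.numel())
    return scores.topk(k).indices
```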
Grounding DINO marries the Transformer-based detector DINO with grounded pre-training for open-set object detection. Its input is an image together with a text prompt (category names or a referring expression), and its output is a set of [object box, noun phrase] pairs: the image and text are encoded, fused across modalities, and decoded into boxes, each of which is matched to the phrase in the prompt that it grounds.

Below is a minimal example of running open-set detection with Grounding DINO. It is a sketch that assumes the inference helpers shipped with the official repo (https://github.com/IDEA-Research/GroundingDINO); the config path, checkpoint path, image file, and score thresholds are placeholders to adapt to your setup.

```python
import cv2
from groundingdino.util.inference import load_model, load_image, predict, annotate

# Load a pre-trained Grounding DINO model (Swin-T variant); both paths are
# placeholders pointing at the repo's config and a downloaded checkpoint.
model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "weights/groundingdino_swint_ogc.pth",
)

# Input: an image and a free-form text prompt
image_source, image = load_image("example.jpg")
text_prompt = "a person riding a bike"

# Predict boxes for phrases in the prompt that score above the thresholds
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=text_prompt,
    box_threshold=0.35,
    text_threshold=0.25,
)

# Output: [object box, noun phrase] pairs
for box, phrase in zip(boxes, phrases):
    print([box.tolist(), phrase])

# Optionally draw the detections on the image
annotated = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("annotated.jpg", annotated)
```
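Note that in the repo's inference helpers, `predict` returns `boxes` as (cx, cy, w, h) tensors normalized to the image size, `logits` as per-box confidence scores, and `phrases` as the spans of the prompt that each box grounds, which matches the [object box, noun phrase] output format described above; `annotate` converts the boxes to pixel coordinates before drawing.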