DiffusionDet: A Generative-Model Approach to Object Detection

Liu Huijie

Last modified 2023-02-28 15:02:32

Tags: object detection, deep learning, computer vision

First published 2023-02-28 14:58:49

Article link: https://blog.csdn.net/qq_51485151/article/details/129193700

0 Preface

Paper: DiffusionDet: Diffusion Model for Object Detection
Code: https://github.com/ShoufaChen/DiffusionDet

1 Abstract

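The authors propose DiffusionDet. During training, the forward (diffusion) process adds noise to the ground-truth boxes, and the model learns the reverse process, i.e. denoising. At inference, the model progressively refines a set of randomly generated boxes into the final output.
The authors report two findings:

1. Random boxes, although drastically different from pre-defined anchors or learned queries, are also effective object candidates.
2. Object detection can be solved with a generative model.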

2 Motivation

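How can object detection be done without heuristic object priors or learnable queries?
Conventional image diffusion models denoise a noisy image into a clean, semantically meaningful one. For object detection, the analogous question is whether an image covered with many random boxes (the analogue of added noise) can have the redundant boxes removed (the analogue of denoising), leaving an image with only the correct boxes (the analogue of the clean, semantically meaningful image).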

3 Model

Figure from the original paper.

3.1 Object Detection

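The learning objective of object detection is a set of input-target pairs $(\mathbf{x},\mathbf{b},\mathbf{c})$, where $\mathbf{x}$ is the input image, and $\mathbf{b}$ and $\mathbf{c}$ are the bounding boxes and category labels of the objects in $\mathbf{x}$. The $i$-th box is written $\mathbf{b}^i = (c_x^i,c_y^i,w^i,h^i)$, where $(c_x^i,c_y^i)$ is the center of the bounding box and $(w^i,h^i)$ are its width and height.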
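To make the box parameterization concrete, here is a small NumPy sketch that converts boxes between the center format $(c_x, c_y, w, h)$ used above and the corner format $(x_1, y_1, x_2, y_2)$. The function names are illustrative and are not taken from the DiffusionDet code base.

```python
import numpy as np

def cxcywh_to_xyxy(boxes):
    """Convert [N, 4] boxes from (cx, cy, w, h) to (x1, y1, x2, y2)."""
    boxes = np.asarray(boxes, dtype=np.float64)
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

def xyxy_to_cxcywh(boxes):
    """Convert [N, 4] boxes from (x1, y1, x2, y2) to (cx, cy, w, h)."""
    boxes = np.asarray(boxes, dtype=np.float64)
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return np.stack([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1], axis=1)

# One box centered at (0.5, 0.5) with width 0.2 and height 0.4,
# in coordinates normalized to the image size.
b = np.array([[0.5, 0.5, 0.2, 0.4]])
corners = cxcywh_to_xyxy(b)
round_trip = xyxy_to_cxcywh(corners)
```

A set of $N$ boxes in either format is simply an array of shape $N\times4$, which is the shape the diffusion process operates on.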

3.2 Diffusion model

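Diffusion models are not covered in detail here; readers unfamiliar with them can consult introductory articles. Only the parts specific to DiffusionDet are annotated below.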

$L_{train} = \frac{1}{2}\lVert f_\theta(\mathbf{z}_t,t)-\mathbf{z}_0\rVert^2$

In DiffusionDet, $\mathbf{z}_0=\mathbf{b}$ with $\mathbf{b}\in\mathbb{R}^{N\times4}$, i.e. the data sample is a set of $N$ bounding boxes.
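For reference, the loss above is paired with the standard forward noising process of diffusion models. In the notation of this section it can be written as follows, where $\bar{\alpha}_t$ (the cumulative noise schedule) and $\boldsymbol{\epsilon}$ are symbols not introduced in the text above:

$$q(\mathbf{z}_t \mid \mathbf{z}_0) = \mathcal{N}\left(\mathbf{z}_t \mid \sqrt{\bar{\alpha}_t}\,\mathbf{z}_0,\ (1-\bar{\alpha}_t)\mathbf{I}\right), \qquad \mathbf{z}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{z}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon},\quad \boldsymbol{\epsilon}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$$

With $\mathbf{z}_0=\mathbf{b}$, a noisy box set $\mathbf{z}_t$ is a scaled copy of the ground-truth boxes plus Gaussian noise, and $f_\theta$ is trained to recover $\mathbf{z}_0$ from $\mathbf{z}_t$ and $t$.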

3.3 Architecture

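A major pain point of diffusion models is that their iterative computation is expensive. Applying $f_\theta(z_t,t)$ directly to the raw image at every step would be too costly, so the authors split the model into an encoder-decoder architecture: the image encoder runs once per image, and only the detection decoder runs at every refinement step.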

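Encoder
Backbone: either a CNN such as ResNet or a Transformer-based model such as Swin.
Detection decoder
Borrowed from Sparse R-CNN: it takes a set of proposal boxes, crops RoI features from the encoder's feature map, and feeds them to a detection head that outputs box regression and classification results.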
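The cost argument behind the encoder-decoder split can be illustrated with a control-flow sketch. Both `image_encoder` and `detection_decoder` below are toy stand-ins invented for this example (they are not the real networks); the point is only that the encoder is called once per image while the decoder is called once per refinement step.

```python
import numpy as np

calls = {"encoder": 0, "decoder": 0}

def image_encoder(image):
    """Toy stand-in for the backbone: called once per image."""
    calls["encoder"] += 1
    return image.mean(axis=-1)  # fake feature map of shape [H, W]

def detection_decoder(boxes, feats, t):
    """Toy stand-in for the decoder: called at every refinement step."""
    calls["decoder"] += 1
    # Placeholder "refinement": pull every box halfway toward a fixed box.
    target = np.array([0.5, 0.5, 0.5, 0.5])
    return 0.5 * boxes + 0.5 * target

def detect(image, num_boxes=4, steps=3, seed=0):
    rng = np.random.default_rng(seed)
    feats = image_encoder(image)                 # expensive part, run once
    boxes = rng.standard_normal((num_boxes, 4))  # start from random boxes
    for t in reversed(range(steps)):
        boxes = detection_decoder(boxes, feats, t)  # cheap part, run per step
    return boxes

out = detect(np.zeros((8, 8, 3)))
```

Under this structure, adding refinement steps only adds decoder calls; the backbone cost stays fixed.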

3.4 Training

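Ground truth padding.
Extra boxes are padded onto the original ground-truth boxes so that the total number of boxes equals a fixed number $N_{train}$.
Box corruption.
Gaussian noise is added to the padded ground-truth boxes, with the noise scale set by a schedule over the time step $t$.
Training losses.
The detection decoder takes the $N_{train}$ corrupted boxes as input and predicts a category and box coordinates for each of them; a set prediction loss is applied to these $N_{train}$ predictions.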
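The ground-truth padding and box corruption steps can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the official implementation: the padding boxes are drawn from a Gaussian and clipped, the noise follows a cosine schedule, and the values `scale=2.0` and 1000 time steps are illustrative choices. The loss computation is omitted because it needs the trained decoder.

```python
import numpy as np

def cosine_alpha_cumprod(num_steps, s=0.008):
    """Cumulative product of alphas under a cosine noise schedule."""
    x = np.linspace(0, num_steps, num_steps + 1)
    f = np.cos(((x / num_steps) + s) / (1 + s) * np.pi / 2) ** 2
    f = f / f[0]
    betas = np.clip(1 - f[1:] / f[:-1], 0, 0.999)
    return np.cumprod(1 - betas)

def pad_boxes(gt_boxes, n_train, rng):
    """Pad (cx, cy, w, h) boxes in [0, 1] up to exactly n_train boxes."""
    gt_boxes = np.asarray(gt_boxes, dtype=np.float64)
    num_gt = len(gt_boxes)
    if num_gt >= n_train:
        return gt_boxes[:n_train]  # assumption: simply truncate
    # Assumption: the extra boxes are random Gaussian boxes, clipped to be valid.
    extra = rng.normal(0.5, 1.0 / 6.0, size=(n_train - num_gt, 4))
    extra = np.clip(extra, 1e-4, 1.0)
    return np.concatenate([gt_boxes, extra], axis=0)

def corrupt_boxes(boxes, t, alpha_cumprod, rng, scale=2.0):
    """Forward diffusion: scale the boxes to [-scale, scale], then add noise."""
    z0 = (boxes * 2 - 1) * scale
    eps = rng.standard_normal(z0.shape)
    a = alpha_cumprod[t]
    return np.sqrt(a) * z0 + np.sqrt(1 - a) * eps

rng = np.random.default_rng(0)
alpha_cumprod = cosine_alpha_cumprod(1000)
gt = np.array([[0.5, 0.5, 0.2, 0.4], [0.3, 0.6, 0.1, 0.1]])
padded = pad_boxes(gt, n_train=8, rng=rng)
noisy = corrupt_boxes(padded, t=500, alpha_cumprod=alpha_cumprod, rng=rng)
```

In a full training step, `noisy` would be passed to the detection decoder together with the image features and `t`, and the predictions would be matched against the original ground-truth boxes by the set prediction loss.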

3.5 Inference

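Sampling step.
The boxes from the previous step (random boxes at the first step) are sent to the detection decoder, which predicts categories and box coordinates; DDIM is then used to estimate the boxes for the next step.
Box renewal.
The boxes predicted at each step fall into two types: desired and undesired predictions. Desired predictions are kept. Undesired ones are arbitrarily located, but they are arbitrary predictions, not the Gaussian-noised boxes produced by the diffusion process, so passing them to the next step does not match what the model saw during training.
To address this, the authors propose box renewal: (1) remove the undesired boxes, i.e. those with scores lower than a particular threshold; (2) concatenate the remaining boxes with new boxes sampled from a Gaussian distribution.
Once-for-all.
Once the model is trained, the number of boxes and the number of sampling steps can be changed freely at inference.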
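The sampling step and box renewal can be sketched as two small NumPy functions. This is a simplified illustration, not the official code: `ddim_step` is the deterministic DDIM update (the $\eta = 0$ case), `box_renewal` follows the two-part rule described above, and all names and the example threshold are hypothetical.

```python
import numpy as np

def ddim_step(z_t, z0_pred, a_t, a_prev):
    """Deterministic DDIM update (eta = 0).

    z_t:     current noisy boxes, shape [N, 4]
    z0_pred: boxes predicted by the decoder at this step, shape [N, 4]
    a_t:     cumulative alpha at the current step
    a_prev:  cumulative alpha at the next (less noisy) step
    """
    eps_pred = (z_t - np.sqrt(a_t) * z0_pred) / np.sqrt(1.0 - a_t)
    return np.sqrt(a_prev) * z0_pred + np.sqrt(1.0 - a_prev) * eps_pred

def box_renewal(boxes, scores, threshold, rng):
    """Drop boxes scoring below threshold and refill with Gaussian boxes."""
    boxes = np.asarray(boxes, dtype=np.float64)
    keep = np.asarray(scores) >= threshold
    kept = boxes[keep]
    # Refill so that the total number of boxes stays the same.
    fresh = rng.standard_normal((len(boxes) - len(kept), 4))
    return np.concatenate([kept, fresh], axis=0), keep

rng = np.random.default_rng(0)
boxes = np.arange(12, dtype=np.float64).reshape(3, 4)
scores = np.array([0.9, 0.1, 0.6])
renewed, keep = box_renewal(boxes, scores, threshold=0.5, rng=rng)

z_t = rng.standard_normal((3, 4))
z0_pred = rng.standard_normal((3, 4))
z_same = ddim_step(z_t, z0_pred, a_t=0.5, a_prev=0.5)   # no progress: stays at z_t
z_clean = ddim_step(z_t, z0_pred, a_t=0.5, a_prev=1.0)  # fully denoised: z0_pred
```

Because neither function depends on a fixed number of boxes or steps, the same trained decoder can be run with different values of both, which is what the once-for-all property refers to.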

4 Properties

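DiffusionDet can achieve better accuracy by using more boxes and/or more refining steps, at the cost of higher latency.

Dynamic boxes. Increasing the number of boxes improves accuracy but also increases the computational cost.
Progressive refinement. Increasing the number of iterations improves accuracy but also increases the computational cost.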

5 Conclusion

DiffusionDet is the first work to apply a diffusion model to object detection. Its noise-to-box pipeline has several appealing properties, including dynamic boxes and progressive refinement, which make it possible to use the same network parameters to obtain the desired speed-accuracy trade-off without re-training the model.
