1. Motivation
-
Improvements in sampling strategies follow two trends:
(1) From static to dynamic.
(2) From sample-wise to instance-wise.
-
These sampling strategies may have a few limitations:
(1) Static rules (e.g. center-region and anchor-based assignment) are neither learnable nor prediction-aware, so they are not always the best choice for eccentric objects.
(2) Some dynamic rules, such as PAA, can suffer from noisy samples and from per-sample quality rules, because they do not jointly formulate a quality distribution over the spatial dimensions, as shown in Fig. 1(b).
(3) Owing to the dense prediction paradigm, they sample uniformly over regular grids of the image, which makes it difficult to assemble enough high-quality and diverse samples.
2. Contribution
- Our main contribution is an instance-wise quality distribution, extracted from the regional feature of the ground truth to approximate each prediction's quality. It guides noise-robust sampling and is a prediction-aware strategy.
- In addition, we formulate an assignment and resampling strategy based on this distribution. It adapts to the semantic pattern and scale of each instance while training with sufficient, high-quality samples.
- We achieve state-of-the-art results on the COCO dataset without bells and whistles. Our method yields a 2.8 AP improvement, from 38.7 AP to 41.1 AP, on the single-stage method FCOS. A ResNeXt-101-DCN based IQDet reaches 51.6 AP without introducing any additional overhead.
3. Method
3.1 Formulation of Quality Distribution Encoder
The paper proposes a new subnet that learns the distribution, named the Quality Distribution Encoder (QDE).
Concretely, it first extracts the information of the GT according to the GT location. This step is implemented with a RoIAlign layer whose input RoI is the GT box; the authors argue that the regional feature extracted from the GT is aligned, in the spatial dimensions, with how the distribution is formulated.
-
To effectively encode the instance-wise feature, we first extract the feature of an object according to the GT location. This is realized by applying the RoIAlign layer to each pyramid feature, where the input RoI is the ground-truth box.
-
Specifically, the motivation for using the GT feature is that the regional feature of the GT is properly aligned with the distribution assignment in the spatial dimensions.
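The GT-box pooling step described above can be sketched in plain Python. This is a deliberately simplified illustration and not the paper's implementation: it handles one single-channel feature map, takes one bilinear sample at the center of each bin, and treats `feat[r][c]` as the value at coordinates `(c, r)`. The names `bilinear`, `roi_align_single`, `stride`, and `out_size` are hypothetical; in practice the same call would be repeated for each pyramid level with that level's stride.

```python
import math

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D feature map (list of rows) at (y, x)."""
    h, w = len(feat), len(feat[0])
    # Clamp the sampling point to the valid coordinate range.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy

def roi_align_single(feat, box, stride, out_size=2):
    """Pool a GT box (x1, y1, x2, y2 in image pixels) to out_size x out_size.

    Assumption: one sample per bin, taken at the bin center.
    """
    # Map the box from image coordinates to feature-map coordinates.
    x1, y1, x2, y2 = (v / stride for v in box)
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    return [
        [bilinear(feat, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
         for j in range(out_size)]
        for i in range(out_size)
    ]

# Toy 4x4 feature map (a linear ramp) at a hypothetical stride of 8.
feat = [[float(4 * r + c) for c in range(4)] for r in range(4)]
gt_box = (0.0, 0.0, 16.0, 16.0)
pooled = roi_align_single(feat, gt_box, stride=8)
```

Because the RoI is the GT box itself, the pooled grid covers exactly the region over which the quality distribution is later defined, which is the spatial alignment the authors point to.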
Because an unknown distribution is hard to learn, the basic idea is to use an encoder to map it to a known distribution, such as a Gaussian mixture model (GMM).
- It can form smooth approximations to arbitrarily shaped distributions.
- The individual components may model some underlying set of hidden classes.
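For each GT, the probability density function of the quality distribution is given by Equation 1. It can be understood as the weighted combination of all K GMM components of that GT. In that expression, $\pi$, $\mu$, and $\sigma$ are all predicted by the network, while $\vec d$ can presumably be computed directly in the image from the predicted center to the GT center.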
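Equation 1 itself is not reproduced in these notes. As a sketch only, assuming the standard mixture form with scalar mixing weights and axis-aligned Gaussian components (the paper's exact notation and normalization may differ), the density described here can be written as:

$$
P(\vec d \mid \theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(\vec d \mid \vec\mu_k, \vec\sigma_k\right)
$$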
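- $K$ and $\theta$ denote the number of components and the encoder parameters, respectively.
- $\vec \pi$ denotes the mixing coefficients along the x and y spatial dimensions in image $I$.
- $\vec d$ is measured along the x and y directions, from a location sampled inside the object to the GT center …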
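To make the role of these symbols concrete, here is a minimal sketch that evaluates a mixture of K axis-aligned 2-D Gaussians at an offset from the GT center. It rests on several assumptions that are not in the notes: the mixing weights are treated as scalars, the components are normalized Gaussians, and the offset is left in raw pixels. The parameter values are hand-picked stand-ins for what the encoder would predict, and the helper names `gaussian_2d`, `mixture_quality`, and `offset` are hypothetical.

```python
import math

def gaussian_2d(d, mu, sigma):
    """Axis-aligned 2-D Gaussian density at offset d = (dx, dy)."""
    value = 1.0
    for di, mi, si in zip(d, mu, sigma):
        value *= math.exp(-0.5 * ((di - mi) / si) ** 2) / (si * math.sqrt(2 * math.pi))
    return value

def mixture_quality(d, pis, mus, sigmas):
    """Weighted sum of the K component densities evaluated at offset d."""
    return sum(p * gaussian_2d(d, m, s) for p, m, s in zip(pis, mus, sigmas))

def offset(point, gt_box):
    """Offset (dx, dy) of a location from the center of a GT box (x1, y1, x2, y2)."""
    cx = (gt_box[0] + gt_box[2]) / 2
    cy = (gt_box[1] + gt_box[3]) / 2
    return (point[0] - cx, point[1] - cy)

# Hypothetical encoder outputs for one GT with K = 2 components.
pis = [0.6, 0.4]
mus = [(0.0, 0.0), (10.0, 0.0)]
sigmas = [(5.0, 5.0), (5.0, 5.0)]

gt_box = (0.0, 0.0, 40.0, 40.0)
q_center = mixture_quality(offset((20.0, 20.0), gt_box), pis, mus, sigmas)
q_corner = mixture_quality(offset((40.0, 40.0), gt_box), pis, mus, sigmas)
```

With these toy parameters a location at the box center scores higher than one at the corner, and the second component shifts part of the mass to the right of the center, which is the kind of non-centered shape a single Gaussian could not express.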
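This work proposes IQDet, an instance-wise quality distribution method that approximates the quality of predicted boxes from regional features extracted from the ground-truth box. A quality distribution encoder learns the instance-wise quality and guides a robust sampling strategy. Experiments show that IQDet achieves state-of-the-art results on the COCO dataset and significantly improves detection performance.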