1. Motivation
Improvements in sampling strategies fall into two trends.
(1) From Static to Dynamic.
(2) From Sample-wise to Instance-wise.
These sampling strategies might have a few limitations.
(1) Static rules (e.g., center-region and anchor-based sampling) are not learnable or prediction-aware, so they may not be the best choice for eccentric objects.
(2) Some dynamic rules, such as PAA, may suffer from noisy samples and per-sample quality rules, without jointly formulating a quality distribution over the spatial dimensions, as shown in Fig. 1(b).
(3) They sample uniformly over regular grids of the image owing to the dense prediction paradigm, which makes it difficult to assemble enough high-quality and diverse samples.
2. Contribution
- Our main contribution is an instance-wise quality distribution, extracted from the regional feature of the ground truth to approximate each prediction's quality. It guides sampling in a noise-robust, prediction-aware way.
- Besides, we formulate an assignment and resampling strategy according to this distribution. It adapts to the semantic pattern and scale of each instance and simultaneously trains with sufficient, high-quality samples.
- We achieve state-of-the-art results on the COCO dataset without bells and whistles. Our method brings a 2.8 AP improvement, from 38.7 AP to 41.1 AP, over the single-stage detector FCOS. A ResNeXt-101-DCN based IQDet yields 51.6 AP, achieving state-of-the-art performance without introducing any additional overhead.
3. Method
3.1 Formulation of Quality Distribution Encoder
This paper proposes a new distribution-learning subnet, named the Quality Distribution Encoder (QDE).
Concretely, the GT information is first extracted according to the GT location. This is implemented with a RoIAlign layer whose input RoI is the GT box; the authors argue that the regional feature extracted from the GT is spatially aligned with the distribution being formulated.
To effectively encode the instance-wise feature, we first extract the feature of each object according to its GT location, which is realized by applying the RoIAlign layer to each pyramid feature, with the ground-truth box as the input RoI.
Specifically, the motivation for using the GT feature is that the regional feature extracted from the GT properly aligns with the distribution assignment in the spatial dimensions.
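The GT-feature extraction step above can be sketched as follows. This is a minimal NumPy re-implementation of RoIAlign-style pooling (one bilinear sampling point per output bin), not the authors' code; the function names, the 7×7 output size, and the single-sample-per-bin choice are illustrative assumptions.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    # feat: (C, H, W); bilinearly interpolate at fractional location (y, x).
    C, H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    y0, x0 = max(y0, 0), max(x0, 0)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[:, y0, x0]
            + (1 - wy) * wx * feat[:, y0, x1]
            + wy * (1 - wx) * feat[:, y1, x0]
            + wy * wx * feat[:, y1, x1])

def roi_align(feat, box, out_size=7, spatial_scale=1.0):
    # feat: (C, H, W) feature map; box: GT box (x1, y1, x2, y2) in image
    # coordinates, mapped to feature coordinates by spatial_scale.
    x1, y1, x2, y2 = [c * spatial_scale for c in box]
    C = feat.shape[0]
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.empty((C, out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # One sampling point at each bin center (sampling_ratio = 1).
            out[:, i, j] = bilinear_sample(feat,
                                           y1 + (i + 0.5) * bin_h,
                                           x1 + (j + 0.5) * bin_w)
    return out
```

In IQDet this operation would be applied to each pyramid level, with the GT box as the input RoI, so the pooled feature stays spatially aligned with the object.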
Since the unknown distribution is hard to learn directly, the basic idea is to use the encoder to map it onto a known distribution family, e.g., a Gaussian mixture model (GMM).
- It can form smooth approximations to arbitrarily shaped distributions.
- The individual component may model some underlying set of hidden classes.
For each GT, the probability density of its quality distribution is given by Eq. 1 and can be understood as the weighted combination of the GT's K GMM components:

$$Q(\vec d\,) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\vec d \mid \vec\mu_k, \vec\sigma_k),$$

where $\pi$, $\mu$, $\sigma$ are all predicted by the network, and $\vec d$ can be computed directly as the offset from each predicted center to the GT center in the image.
- $K$ and $\theta$ denote the component number and the encoder parameters, respectively.
- $\vec\pi$ denotes the mixing coefficients along the x and y spatial dimensions of image $I$.
- $\vec d$ denotes the offsets along x and y from locations sampled inside the object to the GT center.
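As a concrete illustration of the mixture density in Eq. 1, the sketch below evaluates a GMM with axis-aligned (diagonal-covariance) 2-D Gaussian components. This is a generic GMM evaluation under that diagonal assumption, not the paper's implementation; in IQDet the parameters $\pi$, $\mu$, $\sigma$ would be predicted by the encoder rather than hand-set.

```python
import numpy as np

def quality_density(d, pi, mu, sigma):
    # d: (..., 2) offsets (x, y) from a sampled location to the GT center.
    # pi: (K,) mixing coefficients summing to 1.
    # mu, sigma: (K, 2) per-component mean and std along x and y.
    d = np.asarray(d, dtype=float)[..., None, :]   # (..., 1, 2)
    z = (d - mu) / sigma                           # (..., K, 2)
    # Diagonal 2-D Gaussian density of each component.
    comp = np.exp(-0.5 * (z ** 2).sum(-1)) / (2 * np.pi * sigma.prod(-1))
    return (pi * comp).sum(-1)                     # weighted mixture
```

For a single standard component, the density at the GT center ($\vec d = 0$) is $1/(2\pi)$; the function also accepts a batch of offsets, so the quality of all candidate locations inside an object can be scored in one call.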