1. Contribution
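Few-shot learning has been studied extensively for classification tasks; by comparison, few-shot object detection (FSOD) has received much less attention.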
-
Detecting rare objects from a few examples is an emerging problem.
-
However, much of this work has focused on basic image classification tasks. In contrast, few-shot object detection has received far less attention.
Several problems in the existing evaluation protocols hinder model comparison.
-
Several issues with the existing evaluation protocols prevent consistent model comparisons.
-
In this work, we propose improved methods to evaluate few-shot object detection.
-
We adopt a two-stage training scheme for fine-tuning, as shown in Figure 1.
-
We find that this two-stage fine-tuning approach (TFA) outperforms all previous state-of-the-art meta-learning based methods by 2 to 20 points on the existing PASCAL VOC and COCO benchmarks.
-
We sample different groups of few-shot training examples for multiple runs of the experiments to obtain a stable accuracy estimation and quantitatively analyze the variances of different evaluation metrics.
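The multi-run evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' protocol: `sample_few_shot`, `summarize_runs`, and the `annotations` layout (class name mapped to a list of example ids, each id treated as one annotated instance) are hypothetical names introduced here.

```python
import random
import statistics


def sample_few_shot(annotations, k, seed):
    """Draw k examples per class using a seed-specific random generator.

    `annotations` is an assumed layout: class name -> list of example ids.
    """
    rng = random.Random(seed)
    # Sort by class name so the draw order does not depend on dict ordering.
    return {cls: rng.sample(ids, k) for cls, ids in sorted(annotations.items())}


def summarize_runs(scores):
    """Return the mean and sample standard deviation of per-run scores."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std
```

Each seed yields a different K-shot training group; training and evaluating once per seed and passing the resulting scores to `summarize_runs` gives both an averaged accuracy and a measure of its spread across runs.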
2. Related Work
Meta-Learning
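Meta-learning acquires meta-knowledge that helps a model adapt more quickly to new tasks with only a few labeled samples. The main approaches either learn to fine-tune from a good parameter initialization, or generate weights when adapting to novel tasks.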
-
The goal of meta-learning is to acquire task-level meta knowledge that can help the model quickly adapt to new tasks and environments with very few labeled examples.
-
Some learn to fine-tune and aim to obtain a good parameter initialization that can adapt to new tasks with a few stochastic gradient updates.
-
Another popular line of research on meta-learning is to use parameter generation during adaptation to novel tasks.
Metric-Learning
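Metric learning models a distance metric between two input images to estimate their similarity, and then generalizes to few-shot tasks. The main approaches use cosine similarity.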
-
Intuitively, if the model can construct distance metrics to estimate the similarity between two input images, it may generalize to novel categories with few labeled instances.
-
Some adopt a cosine similarity based classifier to reduce the intra-class variance on the few-shot classification task.
-
However, we focus on the instance-level distance measurement rather than on the image level.
3. Method
- The goal is to optimize the detection accuracy measured by average precision (AP) of the novel classes as well as the base classes.
3.1 Two-stage fine-tuning approach
- The key component of our method is to separate the feature representation learning and the box predictor learning into two stages.
Few-shot fine-tuning
- We assign randomly initialized weights to the box prediction networks for the novel classes and fine-tune only the box classification and regression networks, namely
Cosine similarity for box classifier
$w_j \in R^{d \times 1}$ and $F(x)_i \in R^{1 \times d}$; multiplying the two and dividing by the product of their norms gives a scalar.
This paper uses cosine similarity to compute the score $s_{i,j}$ of a given object with respect to class $j$.
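The score described above (inner product of the feature and the class weight, divided by the product of their norms) can be sketched with NumPy. This is an illustrative sketch under stated assumptions: the function name `cosine_scores`, the stacking of features as rows and class weights as columns, and the small `eps` guard against division by zero are choices made here, and any extra scaling factor applied to the score is left out.

```python
import numpy as np


def cosine_scores(features, weights, eps=1e-12):
    """Cosine similarity between every feature row and every class weight column.

    features: array of shape (n, d), row i is F(x)_i.
    weights:  array of shape (d, c), column j is w_j.
    Returns an (n, c) array whose entry [i, j] is the score s_{i,j}.
    """
    f_norm = np.linalg.norm(features, axis=1, keepdims=True)  # shape (n, 1)
    w_norm = np.linalg.norm(weights, axis=0, keepdims=True)   # shape (1, c)
    # Broadcasting f_norm * w_norm yields the (n, c) matrix of norm products.
    return (features @ weights) / np.maximum(f_norm * w_norm, eps)
```

Because both vectors are normalized, rescaling a feature or a class weight leaves the score unchanged, so the scores always lie in the range from -1 to 1.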