1. Motivation
This work builds on the fine-tuning based approach.
- In this work, we observe and address the essential weakness of the fine-tuning based approach – constantly mislabeling novel instances as confusable categories, and improve the few-shot detection performance to the new state-of-the-art (SOTA)
2. Contribution
The introduction of contrastive learning.
- We present Few-Shot object detection via Contrastive proposals Encoding (FSCE), a simple yet effective fine-tune based approach for few-shot object detection
The paper strengthens the RoI head with a contrastive branch, which measures the similarity between proposal encodings.
- When transferring the base detector to few-shot novel data, we augment the primary Region-of-Interest (RoI) head with a contrastive branch; the contrastive branch measures the similarity between object proposal encodings
- This branch is trained with a contrastive proposal encoding (CPE) loss.
3. Methods
3.1 Preliminary
Rethinking the two-stage fine-tuning approach
The authors point out that freezing the RPN, FPN, and RoI feature extractor is not very reasonable, because the features would then carry semantics of the base classes only.
Although unfreezing the RPN and RoI head in the TFA baseline degrades performance, this paper shows that jointly fine-tuning the RoI feature extractor and the box predictor actually works better.
- In baseline TFA, unfreezing RPN and RoI feature extractor leads to degraded results for novel classes.
- We propose a stronger baseline which adapts much better to novel data with jointly fine-tuned feature extractors and box predictors
FSCE is an improvement over TFA and thus also a transfer-learning based method. In the second (fine-tuning) stage, FSCE freezes only the backbone feature extractor, while the RoI feature extractor is supervised by a contrastive objective.
- However it is counter-intuitive that Feature Pyramid Network, RPN, especially the RoI feature extractor which contain semantic information learned from base classes only, could be transferred directly to novel classes without any form of training.
- The backbone feature extractor is frozen during fine-tuning while the RoI feature extractor is supervised by a contrastive objective.
Figure 4 shows that in TFA's fine-tuning stage, the RPN produces far fewer positive anchors than in base training.
The authors' idea is therefore to rescue these proposals with low objectness scores, which would otherwise fail to survive the RPN's NMS and be discarded.
Beyond that, re-balancing the fraction of foreground proposals is critical to keep background proposals from dominating the gradients in the fine-tuning stage.
- Our insight is to rescue the low objectness positive anchors that are suppressed.
- Besides, re-balancing the foreground proposals fraction is also critical to prevent the diffusive yet easy backgrounds from dominating the gradient descent for novel instances in fine-tuning
The improvement is therefore to unfreeze the RPN and RoI layers and adopt two new settings: keep more proposals after RPN NMS, and halve the number of proposals sampled in the RoI head for loss computation; the half discarded in the fine-tuning stage contains only background (leaving a roughly 1:1 split, 128 foreground and 128 background).
- double the maximum number of proposals kept after NMS, this brings more foreground proposals for novel instances
- halving the number of sampled proposals in RoI head used for loss computation, as in fine-tuning stage the discarded half contains only backgrounds (standard RoI batch size is 512, and the number of foreground proposals are far less than half of it)
# detectron2 default
MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512
# halved in the FSCE fine-tuning configs
MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256
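Both settings together, as a minimal sketch of detectron2-style config overrides (the doubled post-NMS value assumes the Faster R-CNN + FPN baseline default of 1000 kept proposals; the exact values in the FSCE repo may differ):

```python
from detectron2.config import get_cfg

cfg = get_cfg()
# Setting 1: double the proposals kept after RPN NMS at train time
# (the Faster R-CNN + FPN baseline keeps 1000 by default).
cfg.MODEL.RPN.POST_NMS_TOPK_TRAIN = 2000
# Setting 2: halve the RoI batch sampled for loss computation, so the
# easy backgrounds no longer dominate the gradient for novel instances.
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256
```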
The experiments that produce the stronger baseline are as follows:
Here, "Fine-tune FPN" means the FPN layers are not frozen, and the RPN / RoI "refinement" entries refer to the two modifications above. Overall, the improvement over the original TFA baseline is substantial.
3.2. Contrastive object proposal encoding
In the RoI head's two-branch structure, this paper introduces a parallel contrastive branch. Since the RoI head applies a ReLU non-linearity, the features are truncated at zero, and the similarity between two proposal embeddings cannot be measured faithfully. The contrastive branch therefore uses an MLP head to encode each RoI feature into a 128-dimensional contrastive feature, making proposals of the same class more similar and proposals of different classes more separable (enforced by a contrastive loss).
- we introduce a contrastive branch to the primary RoI head, parallel to the classification and regression branches.
- Therefore, the contrastive branch applies a 1-layer multi-layer-perceptron (MLP) head with negligible cost to encode the RoI feature to contrastive feature
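A minimal sketch of such a head, assuming 1024-d pooled RoI features and the 128-d default contrastive dimension mentioned above; the class name and exact layout are illustrative, not the repo's code:

```python
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveHead(nn.Module):
    """Illustrative 1-layer MLP head: maps pooled RoI features to a
    normalized 128-d embedding used for measuring proposal similarity."""
    def __init__(self, in_dim=1024, feat_dim=128):
        super().__init__()
        self.fc = nn.Linear(in_dim, feat_dim)

    def forward(self, roi_feats):      # roi_feats: (N, in_dim)
        z = self.fc(roi_feats)         # (N, 128) contrastive features
        return F.normalize(z, dim=1)   # unit norm, so dot product = cosine
```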
3.3. Contrastive Proposal Encoding (CPE) Loss
- Proposal consistency control
The input to this branch is {z_i, u_i, y_i}, where z_i is the contrastive feature of the i-th proposal, u_i is its IoU with the matched ground-truth box, and y_i is its class label.
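A hedged sketch of the CPE loss: a supervised-contrastive objective over the normalized proposal embeddings, with proposal consistency control implemented as an IoU cut-off plus re-weighting. The values of tau and iou_thresh, and the choice g(u_i) = u_i, are illustrative assumptions, not necessarily the paper's exact settings:

```python
import torch
import torch.nn.functional as F

def cpe_loss(z, iou, labels, tau=0.2, iou_thresh=0.7):
    """z: (N, D) contrastive features; iou: (N,) IoU of each proposal
    with its matched ground truth; labels: (N,) class labels."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                            # (N, N) similarity logits
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # positives: other proposals sharing the same class label
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    num_pos = pos_mask.sum(dim=1).clamp(min=1)
    loss_i = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / num_pos
    # proposal consistency control: keep only high-IoU proposals and
    # re-weight each kept one by g(u_i) = u_i (an illustrative choice)
    w = iou * (iou >= iou_thresh).float()
    return (w * loss_i).sum() / w.sum().clamp(min=1e-6)
```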