1. Motivation
This work builds on the fine-tuning based approach.
- In this work, we observe and address the essential weakness of the fine-tuning based approach – constantly mislabeling novel instances as confusable categories, and improve the few-shot detection performance to the new state-of-the-art (SOTA)
2. Contribution
The introduction of contrastive learning.
- We present Few-Shot object detection via Contrastive proposals Encoding (FSCE), a simple yet effective fine-tune based approach for few-shot object detection
The paper strengthens the RoI head with a contrastive branch, which measures the similarity between proposal encodings.
- When transferring the base detector to few-shot novel data, we augment the primary Region-of-Interest (RoI) head with a contrastive branch; the contrastive branch measures the similarity between object proposal encodings
- This branch is trained with a contrastive proposal encoding (CPE) loss.
3. Methods
3.1 Preliminary
Rethinking the two-stage fine-tuning approach
The authors point out that freezing the RPN, FPN, and RoI feature extractor is not very reasonable, because the features would then carry semantics of the base classes only.
Although unfreezing the RPN and RoI head in the TFA baseline degrades performance, this paper shows that jointly fine-tuning the RoI feature extractor and the box predictor actually works better.
- In baseline TFA, unfreezing RPN and RoI feature extractor leads to degraded results for novel classes.
- We propose a stronger baseline which adapts much better to novel data with jointly fine-tuned feature extractors and box predictors
FSCE is an improvement over TFA and thus also a transfer-learning based method. In the second (fine-tuning) stage, FSCE freezes only the backbone feature extractor, while the RoI feature extractor is supervised by a contrastive objective.
- However it is counter-intuitive that Feature Pyramid Network, RPN, especially the RoI feature extractor which contain semantic information learned from base classes only, could be transferred directly to novel classes without any form of training.
- The backbone feature extractor is frozen during fine-tuning while the RoI feature extractor is supervised by a contrastive objective.
Figure 4 shows that in TFA's fine-tuning stage, the RPN produces far fewer positive anchors than in base training.
The authors' idea is therefore to rescue these proposals with low objectness scores, which would otherwise fail to survive the RPN's NMS and be discarded.
Beyond that, re-balancing the fraction of foreground proposals is critical to keep background proposals from dominating the gradients in the fine-tuning stage.
- Our insight is to rescue the low objectness positive anchors that are suppressed.
- Besides, re-balancing the foreground proposals fraction is also critical to prevent the diffusive yet easy backgrounds from dominating the gradient descent for novel instances in fine-tuning
The improvement is therefore to unfreeze the RPN and RoI layers and adopt two new settings: keep more proposals after RPN NMS, and halve the number of proposals sampled in the RoI head for loss computation; the half discarded in the fine-tuning stage contains only background (leaving a roughly 1:1 split, 128 foreground and 128 background).
- double the maximum number of proposals kept after NMS, this brings more foreground proposals for novel instances
- halving the number of sampled proposals in RoI head used for loss computation, as in fine-tuning stage the discarded half contains only backgrounds (standard RoI batch size is 512, and the number of foreground proposals are far less than half of it)
# detectron2 default
MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512
# halved in the FSCE fine-tuning configs
MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256
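Both settings together, as a minimal sketch of detectron2-style config overrides (the doubled post-NMS value assumes the Faster R-CNN + FPN baseline default of 1000 kept proposals; the exact values in the FSCE repo may differ):

```python
from detectron2.config import get_cfg

cfg = get_cfg()
# Setting 1: double the proposals kept after RPN NMS at train time
# (the Faster R-CNN + FPN baseline keeps 1000 by default).
cfg.MODEL.RPN.POST_NMS_TOPK_TRAIN = 2000
# Setting 2: halve the RoI batch sampled for loss computation, so the
# easy backgrounds no longer dominate the gradient for novel instances.
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256
```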
The experiments that produce the stronger baseline are as follows:
Here, "Fine-tune FPN" means the FPN layers are not frozen, and the RPN / RoI "refinement" entries refer to the two modifications above. Overall, the improvement over the original TFA baseline is substantial.
3.2. Contrastive object proposal encoding
In the RoI head's two-branch structure, this paper introduces a parallel contrastive branch. Since the RoI head applies a ReLU non-linearity, the features are truncated at zero, and the similarity between two proposal embeddings cannot be measured faithfully. The contrastive branch therefore uses an MLP head to encode each RoI feature into a 128-dimensional contrastive feature, making proposals of the same class more similar and proposals of different classes more separable (enforced by a contrastive loss).
- we introduce a contrastive branch to the primary RoI head, parallel to the classification and regression branches.
- Therefore, the contrastive branch applies a 1-layer multi-layer-perceptron (MLP) head with negligible cost to encode the RoI feature to contrastive feature
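A minimal sketch of such a head, assuming 1024-d pooled RoI features and the 128-d default contrastive dimension mentioned above; the class name and exact layout are illustrative, not the repo's code:

```python
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveHead(nn.Module):
    """Illustrative 1-layer MLP head: maps pooled RoI features to a
    normalized 128-d embedding used for measuring proposal similarity."""
    def __init__(self, in_dim=1024, feat_dim=128):
        super().__init__()
        self.fc = nn.Linear(in_dim, feat_dim)

    def forward(self, roi_feats):      # roi_feats: (N, in_dim)
        z = self.fc(roi_feats)         # (N, 128) contrastive features
        return F.normalize(z, dim=1)   # unit norm, so dot product = cosine
```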
3.3. Contrastive Proposal Encoding (CPE) Loss
- Proposal consistency control
The input to this branch is {z_i, u_i, y_i}, where z_i is the contrastive feature of the i-th proposal, u_i is its IoU with the matched ground-truth box, and y_i is its class label.
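A hedged sketch of the CPE loss: a supervised-contrastive objective over the normalized proposal embeddings, with proposal consistency control implemented as an IoU cut-off plus re-weighting. The values of tau and iou_thresh, and the choice g(u_i) = u_i, are illustrative assumptions, not necessarily the paper's exact settings:

```python
import torch
import torch.nn.functional as F

def cpe_loss(z, iou, labels, tau=0.2, iou_thresh=0.7):
    """z: (N, D) contrastive features; iou: (N,) IoU of each proposal
    with its matched ground truth; labels: (N,) class labels."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                            # (N, N) similarity logits
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # positives: other proposals sharing the same class label
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    num_pos = pos_mask.sum(dim=1).clamp(min=1)
    loss_i = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / num_pos
    # proposal consistency control: keep only high-IoU proposals and
    # re-weight each kept one by g(u_i) = u_i (an illustrative choice)
    w = iou * (iou >= iou_thresh).float()
    return (w * loss_i).sum() / w.sum().clamp(min=1e-6)
```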