【R-FCN-3000】《R-FCN-3000 at 30fps: Decoupling Detection and Classification》

bryant_meng

Article link: https://blog.csdn.net/bryant_meng/article/details/79357903

CVPR-2018

1 Background and Motivation

Objectness is a generic concept and a universal objectness detector can be learned.

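This paper builds on R-FCN and, like YOLO9000, extends it into an architecture that can recognize 3000 classes, keeping accuracy at a reasonable level without sacrificing speed.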

【R-FCN】《R-FCN: Object Detection via Region-based Fully Convolutional Networks》（NIPS-2016）

1.1 Background

Current object detection systems

Good performance on benchmark datasets
(R-CNN, Fast R-CNN, Faster R-CNN, Deformable Convolutional Networks, Mask R-CNN)
- PASCAL VOC: mAP improved from 33% to 88%
- COCO: mAP improved from 37% to 73% (at 50% overlap)
Poor fit for real-life object detection (e.g. YOLO-9000 gets there only at the cost of accuracy), which needs
- speed
- thousands of classes

1.2 Motivation

Many object classes are visually similar and share parts.
Decouple objectness detection and classification of the detected object so that the computational requirements for localization remain constant as the number of classes increases.

1.3 Notion

fine-grained¹

decoupling ²

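1) Coupling

Coupling is the phenomenon in which two or more systems, or two forms of motion, influence each other through interaction to the point of becoming joined together.
In software engineering, the degree of coupling between objects is their dependence on one another. The higher the coupling, the higher the maintenance cost, so objects should be designed to minimize coupling between classes and components.
Types: there is coupling between software and hardware, and coupling between software modules. Coupling measures how interrelated the modules of a program structure are. It depends on the complexity of the interfaces between modules, the way modules are invoked, and what information passes through the interface.

2) Decoupling

Decoupling literally means removing a coupling relationship.
In software engineering, reducing coupling can be understood as decoupling. Wherever modules depend on each other there is coupling; absolute zero coupling is impossible in theory, but existing techniques can bring it to a minimum.
Core design idea: reduce code coupling as much as possible, and apply decoupling techniques wherever coupling is found. Keep the data model, business logic, and view layers loosely coupled, with dependencies minimized so that one change does not ripple through everything. The rule is that the code for feature A should not be written inside the code for feature B; if the two need to interact, they can do so through an interface, through messages, or even through a framework, but never by writing directly into each other.
Observer pattern: the observer pattern exists to decouple. It keeps the logic of the observer and the observed from tangling, leaving them independent of each other. Take the night mode in NetEase News: when the user switches to night mode, the observed object notifies all observers that "the setting has changed, everyone put on the overlay". When a QQ message push arrives, a notification must pop up in the notification bar and a red dot must appear on the desktop icon, which is again observers and the observed working together.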
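
To make the night-mode example concrete, here is a minimal Python sketch of the observer pattern. The `Settings` class, its method names, and the two "views" are hypothetical and only illustrate the idea; they are not taken from any real app.

```python
from typing import Callable, List


class Settings:
    """Observed object: it only knows that callbacks exist, not what they do."""

    def __init__(self) -> None:
        self._observers: List[Callable[[bool], None]] = []
        self.night_mode = False

    def subscribe(self, callback: Callable[[bool], None]) -> None:
        # Observers register themselves; Settings never imports their code.
        self._observers.append(callback)

    def set_night_mode(self, enabled: bool) -> None:
        self.night_mode = enabled
        for notify in self._observers:
            notify(enabled)  # "the setting has changed"


# Two hypothetical views that react independently to the same notification.
log: List[str] = []
settings = Settings()
settings.subscribe(lambda on: log.append(f"article view: overlay {'on' if on else 'off'}"))
settings.subscribe(lambda on: log.append(f"comment view: overlay {'on' if on else 'off'}"))
settings.set_night_mode(True)
```

Adding a third view needs no change to `Settings`, which is the decoupling the paragraph above describes.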

2 Innovation

modification of the R-FCN architecture

3 Advantages

It outperforms YOLO-9000 by 18% while processing 30 images per second.
Zero-shot results (on unseen classes) are decent.

4 Method

4.1 Weakly Supervised vs. Supervised?

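The gap between the two is large.
Weakly supervised: classification data is used to do object detection.
Take classifying dogs: recognizing the body alone is enough to tell a dog apart from other classes, so features of the legs and tail are often unimportant for classification, yet they matter a lot for object detection (the bounding box).

Object detection over 3000 classes:

supervised: no data (nothing like COCO or VOC at this scale)
weakly supervised: poor performance

As a compromise, the authors use the ImageNet dataset.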

A potential downside of using ImageNet for training object detectors is the loss of variation in scale and context around objects available in detection datasets, but we do have access to the bounding boxes of the objects.

Better than nothing.

4.2 Superclass Discovery

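ResNet-101
per-class average of the 2048-dimensional features + K-means

$\left \{ x_j:j\in \left \{ 1,2,...,C \right \} \right \}$

$C$ is the number of object classes; K-means groups them into $K$ super-classes

$x_j$ is the final-layer ResNet-101 feature averaged over the images of class $j$; $x_j^i$ is its $i$-th component, with $i$ running from 1 to 2048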
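
A minimal NumPy sketch of this step, assuming the per-image 2048-dimensional features have already been extracted. The random features, the function names, and the plain Lloyd's K-means below are illustrative stand-ins, not the authors' code.

```python
import numpy as np


def class_means(image_features, image_labels, num_classes):
    """Average the per-image features of each class to get one x_j per class."""
    return np.stack([image_features[image_labels == j].mean(axis=0)
                     for j in range(num_classes)])


def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns a cluster index in [0, k) for each row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=np.int64)
    for it in range(iters):
        # Squared distances via ||a||^2 - 2ab + ||b||^2, shape (num_classes, k).
        dists = ((x ** 2).sum(axis=1, keepdims=True)
                 - 2.0 * x @ centers.T
                 + (centers ** 2).sum(axis=1)[None, :])
        new_labels = dists.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels


# Toy data: 12 classes, 5 images each, random stand-ins for ResNet-101 features.
rng = np.random.default_rng(1)
num_classes, images_per_class, dim = 12, 5, 2048
image_labels = np.repeat(np.arange(num_classes), images_per_class)
image_features = rng.normal(size=(num_classes * images_per_class, dim))

x = class_means(image_features, image_labels, num_classes)  # one x_j per class
superclass_of_class = kmeans(x, k=3)                        # super-class id per class
```

The resulting `superclass_of_class` array is the class-to-super-class lookup that the rest of the pipeline relies on.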

4.3 Architecture

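[Image: R-FCN-3000 architecture]

Upper branch

Super-classes: position-sensitive filters
classification (K super-classes + 1 background) + regression (box offsets)

Lower branch

Fine-grained: without position-sensitive filters
classification (C classes)

Finally, the two branches are merged:
super-class score × fine-grained class score (the clustering tells us in advance which fine-grained classes belong to which super-class)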
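
The final multiplication can be sketched in a few lines of NumPy. The function names, the toy logits, and the convention that index 0 of the super-class output is background are assumptions made for illustration.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def combine_scores(superclass_logits, fine_logits, superclass_of_class):
    """Score of fine class c = P(super-class of c) * P(c).

    superclass_logits:   (K + 1,), index 0 is background (assumed layout)
    fine_logits:         (C,)
    superclass_of_class: (C,) ints in [0, K), known in advance from the clustering
    """
    p_super = softmax(superclass_logits)
    p_fine = softmax(fine_logits)
    return p_super[1 + superclass_of_class] * p_fine


# Toy RoI: K = 2 super-classes, C = 4 fine classes (two per super-class).
superclass_of_class = np.array([0, 0, 1, 1])
superclass_logits = np.array([0.0, 2.0, 0.0])    # favours super-class 0
fine_logits = np.array([1.0, 0.0, 0.0, 0.0])     # favours fine class 0
scores = combine_scores(superclass_logits, fine_logits, superclass_of_class)
```

Only the lookup `superclass_of_class` ties the two branches together, so the detection branch costs the same no matter how many fine classes there are.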

4.4 Label Assignment

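k indexes the super-classes, 0…K
c indexes the fine-grained classes, 0…C

Each super-class is made up of many fine-grained classes.

Detection (K+1 classes: the K-means clusters + background)
- positive RoI: assigned to super-class $k_i$ if its overlap with a ground-truth box of $k_i$ is greater than 0.5
- background: otherwise
Classification (C classes)
- trained only on positive RoIs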
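
The assignment rule can be sketched as follows. The box format `(x1, y1, x2, y2)`, the use of label 0 for background, and the `-1` marker for RoIs skipped by the classification branch are assumptions for this illustration.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def assign_labels(rois, gt_boxes, gt_classes, superclass_of_class, thresh=0.5):
    """Return (super_labels, fine_labels), one entry per RoI.

    super_labels: 0 = background, 1..K = super-class (assumed layout)
    fine_labels:  fine class index, or -1 if the RoI is not positive
    """
    super_labels = np.zeros(len(rois), dtype=np.int64)
    fine_labels = np.full(len(rois), -1, dtype=np.int64)
    for i, roi in enumerate(rois):
        overlaps = iou(roi, gt_boxes)
        best = overlaps.argmax()
        if overlaps[best] > thresh:
            fine_labels[i] = gt_classes[best]
            super_labels[i] = 1 + superclass_of_class[gt_classes[best]]
    return super_labels, fine_labels


# One ground-truth box of fine class 2, which belongs to super-class 1.
superclass_of_class = np.array([0, 0, 1, 1])
gt_boxes = np.array([[0.0, 0.0, 10.0, 10.0]])
gt_classes = np.array([2])
rois = np.array([[0.0, 0.0, 10.0, 9.0],       # IoU 0.9 with the ground truth
                 [20.0, 20.0, 30.0, 30.0]])   # no overlap
super_labels, fine_labels = assign_labels(rois, gt_boxes, gt_classes, superclass_of_class)
```

The first RoI becomes a positive for super-class 1 (label 2 after the background offset) and fine class 2; the second is background and is skipped by the classification branch.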

4.5 Loss Function

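See the Fast R-CNN loss for reference: 【Fast RCNN】《Fast-RCNN》

Smooth L1 (localization) + L (super-class classification) + 0.05 × L (fine-grained classification; softmax output vs. GT)

The 0.05 weight is there because the fine-grained classification branch is trained only on positive RoIs, which are few compared with the total number of proposals; down-weighting its loss balances it against the others so that its gradients do not dominate training.

In fact, the structure of the loss has not changed.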
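
For reference, the two building blocks of this loss can be written out in NumPy. This is only a sketch: the function names are made up, and the weights are passed in explicitly so that the 0.05 factor can be attached to whichever term is being down-weighted.

```python
import numpy as np


def smooth_l1(pred, target):
    """Smooth L1 as in Fast R-CNN: 0.5 * d^2 if |d| < 1, else |d| - 0.5 (summed)."""
    d = np.abs(pred - target)
    return float(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum())


def softmax_cross_entropy(logits, label):
    """Negative log-probability of the ground-truth label under a softmax."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])


def total_loss(loc_loss, cls_loss, loc_weight, cls_weight):
    """Weighted sum of the two terms; the weights are supplied by the caller."""
    return loc_weight * loc_loss + cls_weight * cls_loss


# Toy values for one positive RoI.
loc = smooth_l1(np.array([0.5, 2.0]), np.array([0.0, 0.0]))   # 0.125 + 1.5
cls = softmax_cross_entropy(np.array([0.0, 0.0]), label=0)    # log(2)
# Placeholder weights: replace one of them with 0.05 for the down-weighted term.
total = total_loss(loc, cls, loc_weight=1.0, cls_weight=1.0)
```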

5 Experiments

5.1 Dataset

Trained for 7 epochs
Training is performed on 2 Nvidia P6000 GPUs
Image resolution: 375x500
Three anchor scales (64, 128, 256) and three aspect ratios (1:2, 1:1, 2:1) for the anchor boxes
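
Three scales times three aspect ratios give nine anchors per location. A minimal sketch of how such anchors are commonly generated follows; treating the ratio as height / width and keeping the area equal to scale² are assumed conventions, not details confirmed by the paper.

```python
import numpy as np


def make_anchors(scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return (len(scales) * len(ratios), 4) anchors as (x1, y1, x2, y2).

    Anchors are centered at the origin; each has area scale**2 and
    height / width equal to the ratio (assumed convention).
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)


anchors = make_anchors()
widths = anchors[:, 2] - anchors[:, 0]
heights = anchors[:, 3] - anchors[:, 1]
```

Shifting these nine boxes to every feature-map location yields the full anchor set used by the region proposal stage.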

5.2 Comparison with Weakly Supervised Detectors

[Image: comparison with weakly supervised detectors]

5.3 Speed and Performance

As the number of super-classes (clusters) grows, performance improves and runtime increases.
All the speed results are on a P6000 GPU.

NMS is performed for a group of visually similar classes together, instead of each class separately.
(I don't really understand how this NMS is applied.)
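
One plausible reading of that sentence, sketched in NumPy: detections from all classes in a group are pooled and a single greedy NMS pass is run per group rather than per class. The grouping array, the 0.3 threshold, and the function names are assumptions; the paper's exact procedure may differ.

```python
import numpy as np


def nms(boxes, scores, iou_thresh=0.3):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        overlap = inter / (area_i + area_r - inter)
        order = rest[overlap <= iou_thresh]  # drop boxes that overlap too much
    return keep


def grouped_nms(boxes, scores, class_ids, group_of_class, iou_thresh=0.3):
    """Run one NMS pass per group of classes instead of one per class."""
    groups = group_of_class[class_ids]
    keep = []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        kept = nms(boxes[idx], scores[idx], iou_thresh)
        keep.extend(idx[kept].tolist())
    return sorted(keep)


# Classes 0 and 1 share a group (visually similar); class 2 is in its own group.
group_of_class = np.array([0, 0, 1])
boxes = np.array([[0.0, 0.0, 10.0, 10.0],
                  [1.0, 1.0, 11.0, 11.0],
                  [0.0, 0.0, 10.0, 10.0]])
scores = np.array([0.9, 0.8, 0.7])
class_ids = np.array([0, 1, 2])
kept = grouped_nms(boxes, scores, class_ids, group_of_class)
```

Per-class NMS would keep all three boxes here, since each belongs to a different class; grouping lets the class-0 box suppress the heavily overlapping class-1 box.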

6 Discussion

6.1 Impact of Number of Classes and Clusters

[Image: panels (a) and (b)]

In (a) and (b), the performance gap between 5 clusters and class-specific detection is not that large.

In light of these observations, we can conclude that more crucial to R-FCN
is learning an objectness measure instead of class-specific objectness.

6.2 Are Position-Sensitive Filters Per Class Necessary?

PASCAL VOC
baseline: deformable R-FCN detector (ResNet-50)
After decoupling, performance does not drop much.

[Image: design details]
