RCNN paper notes

Rich feature hierarchies for accurate object detection and semantic segmentation

RCNN object detection

Model structure

The overall system consists of three modules:
* The first generates category-independent region proposals. These proposals define the set of candidate detections available to the detector.
* The second is a large convolutional neural network that extracts a fixed-length feature vector from each region.
* The third is a set of class-specific linear SVMs.

Region proposals

Region proposals are generated with selective search.

Feature extraction

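A 4096-dimensional feature vector is extracted from each region proposal using Caffe.
To bring each region to the required 227×227 input size, we warp all pixels in a tight bounding box around it to that size, regardless of the region's original size or aspect ratio.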
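The warping step can be sketched as below. This is a minimal illustration in plain Python, not the Caffe preprocessing itself: the function name `warp_region`, the list-of-rows image layout, the exclusive `(x1, y1, x2, y2)` box convention, and nearest-neighbour sampling are all assumptions made for brevity, and any context padding around the box is ignored.

```python
def warp_region(image, box, out_size=227):
    """Crop `box` from `image` and stretch it to out_size x out_size.

    image: list of rows, each row a list of pixel values (hypothetical layout).
    box:   (x1, y1, x2, y2) with x2 and y2 exclusive.
    The aspect ratio is deliberately not preserved: width and height are
    scaled independently, which is what "warp" means here.
    """
    x1, y1, x2, y2 = box
    crop_w, crop_h = x2 - x1, y2 - y1
    warped = []
    for row in range(out_size):
        # Nearest-neighbour lookup: map the output row back to a source row.
        src_y = y1 + row * crop_h // out_size
        line = []
        for col in range(out_size):
            src_x = x1 + col * crop_w // out_size
            line.append(image[src_y][src_x])
        warped.append(line)
    return warped
```

Calling `warp_region(image, box)` on any box yields a 227 by 227 grid, so every proposal produces an input of the same shape for the CNN.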

Test-time detection

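At test time, selective search extracts about 2000 region proposals from the image. Each proposal is warped and passed through the CNN to extract features. Then, for each class, the SVM trained for that class scores each extracted feature vector. Given all scored regions in an image, we apply a greedy non-maximum suppression (for each class independently) that rejects a region if it has an intersection-over-union (IoU) overlap with a higher scoring selected region larger than a learned threshold.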
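The greedy non-maximum suppression described above can be sketched as follows for a single class. The helper names `iou` and `greedy_nms` and the corner-format boxes `(x1, y1, x2, y2)` are assumptions for illustration; the threshold is passed in because the notes only say it is learned.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, threshold):
    """Return the indices of the boxes kept for one class, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Reject box i if it overlaps an already selected, higher scoring
        # box by more than the threshold.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep
```

In a full detector this would be run once per class, on that class's SVM scores only.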

Training

Supervised pre-training

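The CNN is first pre-trained on a large auxiliary dataset (ILSVRC 2012 classification) using image-level labels only.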

Domain-specific fine-tuning

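To adapt the CNN to the new task (detection) and the new domain (warped proposal windows), we continue training it with SGD. The classification layer has N+1 outputs: N object classes plus 1 for background.

We treat all region proposals with >= 0.5 IoU overlap with a ground-truth box as positives and the rest as negatives.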
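The labeling rule for fine-tuning can be sketched as below. The function name `finetune_label`, the corner-format boxes, and the choice to match each proposal to the ground-truth box with the highest overlap are assumptions; label 0 stands for background, matching the N+1 output layout.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def finetune_label(proposal, gt_boxes, gt_classes, pos_iou=0.5):
    """Return a class index in 1..N for a positive, or 0 for background.

    gt_classes[k] is the class index (1..N) of gt_boxes[k].
    """
    best_iou, best_class = 0.0, 0
    for box, cls in zip(gt_boxes, gt_classes):
        overlap = iou(proposal, box)
        if overlap > best_iou:
            best_iou, best_class = overlap, cls
    # Positive only when the best overlap reaches the 0.5 threshold.
    return best_class if best_iou >= pos_iou else 0
```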

In each SGD iteration, we uniformly sample 32 positive windows (over all classes) and 96 background windows to construct a mini-batch of size 128.

We bias the sampling towards positive windows because they are extremely rare compared to background.
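A mini-batch built this way is 25% positive (32 of 128), far above the natural share of positives among proposals. A minimal sketch of the sampling, assuming the positive and background windows have already been collected into two lists (the function name `sample_minibatch` and the fallback to sampling with replacement for small pools are assumptions):

```python
import random


def sample_minibatch(positives, backgrounds, n_pos=32, n_bg=96, rng=random):
    """Draw n_pos positive and n_bg background windows uniformly at random."""

    def draw(pool, n):
        # Without replacement when the pool is large enough; otherwise
        # fall back to sampling with replacement (an assumption here).
        if len(pool) >= n:
            return rng.sample(pool, n)
        return rng.choices(pool, k=n)

    batch = draw(positives, n_pos) + draw(backgrounds, n_bg)
    rng.shuffle(batch)  # mix positives and backgrounds within the batch
    return batch
```

Passing `rng=random.Random(0)` makes the draw reproducible.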

Object category classifiers

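A region that tightly encloses an object is clearly a positive example, and a region that contains none of the object is clearly a negative. But how should a region that only partially overlaps an object be labeled?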
We resolve this issue with an IoU overlap threshold, below which regions are defined as negatives. The overlap threshold, 0.3, was selected by a grid search over {0, 0.1, …, 0.5} on a validation set.
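The thresholding rule can be sketched for one class as below. The notes only state that regions below the 0.3 threshold are negatives; treating the ground-truth boxes themselves as the positives and ignoring everything in between follows my reading of the paper and should be taken as an assumption, as should the function name `svm_label` and the +1 / -1 / 0 encoding.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def svm_label(region, class_gt_boxes, is_ground_truth=False, neg_iou=0.3):
    """Return +1 (positive), -1 (negative) or 0 (ignored) for one class."""
    if is_ground_truth:
        return 1  # assumed: only ground-truth boxes serve as positives
    best = max((iou(region, g) for g in class_gt_boxes), default=0.0)
    # Below the threshold the region is a negative for this class;
    # otherwise it is left out of SVM training (assumed grey zone).
    return -1 if best < neg_iou else 0
```

The threshold differs from the 0.5 used in fine-tuning, so the same proposal can be a fine-tuning positive and still be ignored when training the SVM.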

Since the training data is too large to fit in memory, we adopt the standard hard negative mining method.
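A generic hard negative mining loop might look like the sketch below. It is not the paper's code: `train` and `score` are hypothetical callbacks standing in for the SVM solver and its decision function, negatives are assumed to arrive in batches (for example one batch per image), and the margin of -1 is an assumed cut-off for "hard" examples.

```python
def hard_negative_mining(positives, negative_batches, train, score, margin=-1.0):
    """Train while keeping only the hard negatives in memory.

    train(positives, negatives) -> model   (hypothetical callback)
    score(model, example) -> float         (hypothetical callback)
    """
    cache = []    # negatives kept in memory
    model = None
    for batch in negative_batches:
        if model is None:
            hard = list(batch)  # bootstrap with the first non-empty batch
        else:
            # Keep only negatives the current model scores above the margin,
            # i.e. the ones it gets wrong or nearly wrong.
            hard = [x for x in batch if score(model, x) > margin]
        if hard:
            cache.extend(hard)
            model = train(positives, cache)  # retrain on the grown cache
    return model, cache
```

Easy negatives are scored once and discarded, so memory use grows with the number of hard examples instead of the full negative set.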

Visualization, ablation and modes of error

Visualizing learned features

We propose a simple (and complementary) non-parametric method that directly shows what the network learned.

For a chosen unit (feature) in the network, we compute its activations on a large set of held-out region proposals, sort the proposals from highest to lowest activation, perform non-maximum suppression, and then display the top-scoring regions.

Bounding-box regression

After each proposal has been scored by the SVM, a class-specific bounding-box regressor predicts a new bounding box for the detection.
We regress from features computed by the CNN, rather than from geometric features computed on the inferred DPM part locations.
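The regression targets can be illustrated with the parameterization below, which is how I recall it from the paper's appendix; treat the exact form as an assumption. Boxes are given as (center x, center y, width, height), and the function names `regression_targets` and `apply_regression` are hypothetical. Translation is scale-invariant and width and height are regressed in log space.

```python
import math


def regression_targets(proposal, ground_truth):
    """Targets (tx, ty, tw, th) that map a proposal onto its ground-truth box."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return ((gx - px) / pw,        # shift in x, relative to proposal width
            (gy - py) / ph,        # shift in y, relative to proposal height
            math.log(gw / pw),     # log-space width scaling
            math.log(gh / ph))     # log-space height scaling


def apply_regression(proposal, deltas):
    """Invert the parameterization: turn predicted deltas into a new box."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (pw * dx + px,
            ph * dy + py,
            pw * math.exp(dw),
            ph * math.exp(dh))
```

At test time a learned regressor would predict the deltas from the CNN features of the proposal, and `apply_regression` would then produce the refined box.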
