Fast R-CNN

论文链接:Fast R-CNN

对于R-CNN的复杂度高分析

Complexity arises because detection requires the accurate localization of objects, creating two primary challenges. First, numerous candidate object locations (often called “proposals”) must be processed. Second, these candidates provide only rough localization that must be refined to achieve precise localization. Solutions to these problems often compromise speed, accuracy, or simplicity.

R-CNN慢的原因
R-CNN is slow because it performs a ConvNet forward pass for each object proposal, without sharing computation. Spatial pyramid pooling networks (SPPnets) were proposed to speed up R-CNN by sharing computation.

SPPnet的缺点

SPPnet also has notable drawbacks. Like R-CNN, training is a multi-stage pipeline that involves extracting features, fine-tuning a network with log loss, training SVMs, and finally fitting bounding-box regressors. Features are also written to disk.

作者认为 Fast R-CNN 的优点

1. Higher detection quality (mAP) than R-CNN, SPPnet
2. Training is single-stage, using a multi-task loss
3. Training can update all network layers
4. No disk storage is required for feature caching

Fast R-CNN框架图

 

图2

图2 来源于 https://www.jiqizhixin.com/articles/2017-09-18-7

A Fast R-CNN network takes as input an entire image and a set of object proposals.

The network first processes the whole image with several convolutional (conv) and max pooling layers to produce a conv feature map.

Then, for each object proposal a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map.

Each feature vector is fed into a sequence of fully connected (fc) layers that finally branch into two sibling output layers: one that produces softmax probability estimates over K object classes plus a catch-all “background” class

and another layer that outputs four real-valued numbers for each of the K object classes.

Each set of 4 values encodes refined bounding-box positions for one of the K classes.

R-CNN 框架图

https://www.jiqizhixin.com/articles/2017-09-18-7

直接承接 R-CNN 的是 Fast R-CNN。Fast R-CNN 在很多方面与 R-CNN 类似,但是,凭借两项主要的增强手段,其检测速度较 R-CNN 有所提高:

  1. 在推荐区域之前,先对图像执行特征提取工作,通过这种办法,后面只用对整个图像使用一个 CNN(之前的 R-CNN 网络需要在 2000 个重叠的区域上分别运行 2000 个 CNN)。
  2. 将支持向量机替换成了一个 softmax 层,这种变化并没有创建新的模型,而是将神经网络进行了扩展以用于预测工作。

https://zhuanlan.zhihu.com/p/34142321

文章最后的讨论也有一定的借鉴意义:

  • multi-loss traing相比单独训练classification确有提升
  • multi-scale相比single-scale精度略有提升,但带来的时间开销更大。一定程度上说明CNN结构可以内在地学习尺度不变性
  • 在更多的数据(VOC)上训练后,精度是有进一步提升的
  • Softmax分类器比"one vs rest"型的SVM表现略好,引入了类间的竞争
  • 更多的Proposal并不一定带来精度的提升

 

 

 

 

 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值