Fast R-CNN

努力奋斗-不断进化

Categories: Image Processing, Object Detection

Article link: https://blog.csdn.net/zhuoyuezai/article/details/81666020

Paper link: Fast R-CNN

Why object detection is complex

Complexity arises because detection requires the accurate localization of objects, creating two primary challenges. First, numerous candidate object locations (often called “proposals”) must be processed. Second, these candidates provide only rough localization that must be refined to achieve precise localization. Solutions to these problems often compromise speed, accuracy, or simplicity.

Why R-CNN is slow
R-CNN is slow because it performs a ConvNet forward pass for each object proposal, without sharing computation. Spatial pyramid pooling networks (SPPnets) were proposed to speed up R-CNN by sharing computation.
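The difference between per-proposal computation and shared computation can be sketched with a toy counter. This is only an illustration under stated assumptions: `backbone` is a hypothetical stand-in for a ConvNet forward pass (here it just returns its input), and the proposals are assumed to be given already in feature-map coordinates, whereas a real network would scale image coordinates by the network stride.

```python
# Toy illustration only: `backbone` is a hypothetical stand-in for a ConvNet.
calls = {"backbone": 0}

def backbone(image):
    """Count the call and return the input unchanged as its 'feature map'."""
    calls["backbone"] += 1
    return image

def crop(grid, box):
    """Crop a 2-D list with box = (x1, y1, x2, y2), inclusive coordinates."""
    x1, y1, x2, y2 = box
    return [row[x1:x2 + 1] for row in grid[y1:y2 + 1]]

def per_proposal_features(image, proposals):
    """R-CNN style: one forward pass for every cropped proposal."""
    return [backbone(crop(image, box)) for box in proposals]

def shared_features(image, proposals):
    """SPPnet / Fast R-CNN style: one forward pass, then crop the shared map."""
    feature_map = backbone(image)
    return [crop(feature_map, box) for box in proposals]

image = [[x + 8 * y for x in range(8)] for y in range(8)]
proposals = [(0, 0, 3, 3), (2, 2, 7, 7), (4, 0, 7, 3)]

calls["backbone"] = 0
features_a = per_proposal_features(image, proposals)
passes_per_proposal = calls["backbone"]  # grows with the number of proposals

calls["backbone"] = 0
features_b = shared_features(image, proposals)
passes_shared = calls["backbone"]        # stays at one
```

With thousands of proposals per image, the first count grows linearly while the second stays constant, which is the saving the quoted passage describes.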

Drawbacks of SPPnet

SPPnet also has notable drawbacks. Like R-CNN, training is a multi-stage pipeline that involves extracting features, fine-tuning a network with log loss, training SVMs, and finally fitting bounding-box regressors. Features are also written to disk.

Advantages of Fast R-CNN according to the authors

1. Higher detection quality (mAP) than R-CNN, SPPnet
2. Training is single-stage, using a multi-task loss
3. Training can update all network layers
4. No disk storage is required for feature caching
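As a rough sketch of the multi-task loss mentioned in point 2: the paper combines a log loss on the true class with a smooth L1 localization loss that is applied only to non-background RoIs. The function names, the argument layout, and the convention that class index 0 means background are assumptions made for this illustration, not code from the paper.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def multi_task_loss(probs, true_class, pred_box, target_box, lam=1.0):
    """Classification log loss plus a localization loss for foreground RoIs.

    probs: softmax probabilities over K + 1 classes (index 0 = background,
           an assumption of this sketch).
    pred_box, target_box: 4-tuples of box regression values for the true class.
    lam: weight balancing the two terms.
    """
    cls_loss = -math.log(probs[true_class])
    if true_class == 0:
        # Background RoIs have no ground-truth box, so only the class term counts.
        return cls_loss
    loc_loss = sum(smooth_l1(t - v) for t, v in zip(pred_box, target_box))
    return cls_loss + lam * loc_loss

# Hypothetical values for one foreground RoI.
example_loss = multi_task_loss([0.25, 0.75], 1, (0.5, 0.0, 0.0, 2.0), (0.0, 0.0, 0.0, 0.0))
```

Because both terms are computed from the same forward pass, one backward pass can update the classifier, the box regressor, and the shared layers together, which is what makes single-stage training possible.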

Fast R-CNN architecture diagram

Figure 2. Source: https://www.jiqizhixin.com/articles/2017-09-18-7

A Fast R-CNN network takes as input an entire image and a set of object proposals.

The network first processes the whole image with several convolutional (conv) and max pooling layers to produce a conv feature map.

Then, for each object proposal a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map.
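A minimal sketch of RoI max pooling on a single-channel feature map, in plain Python, may make the "fixed-length" part concrete. The function name, the inclusive `(x1, y1, x2, y2)` convention, and the floor/ceil binning are assumptions of this sketch (one common choice); real implementations work on multi-channel tensors and first map image coordinates onto the feature map.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one RoI into a fixed out_h x out_w grid.

    feature_map: list of rows (H x W), a single channel.
    roi: (x1, y1, x2, y2) in feature-map coordinates, inclusive.
    """
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Floor for the bin start, ceil for the (exclusive) bin end,
        # so every bin is non-empty and the bins cover the whole RoI.
        row_start = y1 + (i * roi_h) // out_h
        row_end = y1 + ((i + 1) * roi_h + out_h - 1) // out_h
        pooled_row = []
        for j in range(out_w):
            col_start = x1 + (j * roi_w) // out_w
            col_end = x1 + ((j + 1) * roi_w + out_w - 1) // out_w
            window = [
                feature_map[r][c]
                for r in range(row_start, row_end)
                for c in range(col_start, col_end)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled

# A 4 x 4 feature map holding the values 0..15.
fmap = [[4 * r + c for c in range(4)] for r in range(4)]
pooled_full = roi_max_pool(fmap, (0, 0, 3, 3))   # 4 x 4 RoI
pooled_small = roi_max_pool(fmap, (1, 0, 3, 2))  # 3 x 3 RoI
```

Both RoIs produce a 2 x 2 output even though their sizes differ, so the fully connected layers that follow always receive an input of the same length.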

Each feature vector is fed into a sequence of fully connected (fc) layers that finally branch into two sibling output layers: one that produces softmax probability estimates over K object classes plus a catch-all “background” class, and another layer that outputs four real-valued numbers for each of the K object classes.

Each set of 4 values encodes refined bounding-box positions for one of the K classes.
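The shapes of the two sibling outputs can be sketched as follows. This is an illustration only: the logits and box values are made-up numbers, the helper names are hypothetical, and treating index 0 as the background class is an assumption of the sketch.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def split_box_outputs(values, num_classes):
    """Group a flat list of 4 * K numbers into K per-class 4-tuples."""
    if len(values) != 4 * num_classes:
        raise ValueError("expected 4 values per object class")
    return [tuple(values[4 * k:4 * k + 4]) for k in range(num_classes)]

K = 3                                          # number of object classes
cls_logits = [2.0, 0.5, 0.1, -1.0]             # K + 1 scores, index 0 = background
box_values = [float(i) for i in range(4 * K)]  # 4 numbers per object class

probs = softmax(cls_logits)                    # K + 1 probabilities
boxes = split_box_outputs(box_values, K)       # K groups of 4 values
best_class = max(range(K + 1), key=lambda c: probs[c])
```

Under this indexing, if `best_class` is an object class `c >= 1`, the matching box values are `boxes[c - 1]`; the background class has no box.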

https://www.jiqizhixin.com/articles/2017-09-18-7

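Fast R-CNN is the direct successor to R-CNN. It resembles R-CNN in many respects, but two main enhancements make its detection faster:

1. Feature extraction runs on the whole image first, before the proposed regions are handled, so the CNN runs only once over the entire image (R-CNN ran a CNN separately on each of roughly 2000 overlapping regions).
2. The support vector machine is replaced by a softmax layer. Rather than creating a new model, this extends the neural network itself to make the predictions.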

https://zhuanlan.zhihu.com/p/34142321

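The discussion at the end of the paper is also instructive:

1. Multi-task training does improve on training for classification alone.
2. Multi-scale gives slightly higher accuracy than single-scale, but at a greater time cost. To some degree this suggests that the CNN can learn scale invariance on its own.
3. Training on more data (VOC) further improves accuracy.
4. The softmax classifier performs slightly better than one-vs-rest SVMs, because it introduces competition between classes.
5. More proposals do not necessarily improve accuracy.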
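The "competition between classes" in the softmax point can be seen in a small sketch. It is an illustration only: the scores are made up, and the independent logistic scores are a simple stand-in for one-vs-rest scoring, not an actual SVM.

```python
import math

def softmax(scores):
    """Numerically stable softmax: the outputs sum to 1 across classes."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def independent_scores(scores):
    """Per-class logistic scores, each computed without looking at the others."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

before = [1.0, 1.0, 0.0]
after = [3.0, 1.0, 0.0]  # only the raw score of class 0 changes

# How much the score of class 1 drops when class 0 becomes more confident.
softmax_drop = softmax(before)[1] - softmax(after)[1]
independent_drop = independent_scores(before)[1] - independent_scores(after)[1]
```

Under softmax, raising the score of class 0 lowers the probability of class 1, so `softmax_drop` is positive, while the independently scored class 1 does not move at all.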
