Fast R-CNN
Note: Fast R-CNN optimizes R-CNN to address its slow speed.
Introduction
R-CNN:
1,Training is a multi-stage pipeline.【R-CNN performs training in separate stages: extract proposals + ConvNet feature extraction + SVM classification + bounding-box regression】
2,Training is expensive in space and time【During training, R-CNN takes the output of the network's last fc layer as features to train the SVMs and the bounding-box regressors; these features must be cached to disk, which costs both time and storage.】
3,Test-time detection is slow
SPPnet:
SPPnet can only update the fully-connected layers that follow the spatial pyramid pooling layer. The authors hypothesize that this limitation prevents very deep networks, like VGG16, from reaching their full potential.
Fast R-CNN:
1. Higher detection quality (mAP) than R-CNN
2. Training is single-stage, using a multi-task loss【enables end-to-end, single-stage training】
3. All network layers can be updated during training
4. No disk storage is required for feature caching【no offline feature files need to be stored】
Fast R-CNN training
Fast R-CNN architectures have several convolutional (conv) and max pooling layers, followed by a region of interest (RoI) pooling layer, and then several fully-connected (fc) layers.
The SVM classifiers are removed; a softmax layer performs classification instead.
1,To localize potential objects in an image, Fast R-CNN uses sparse proposals, such as those produced by selective search, about 2000 proposals per image.
2,At training and test time, the inputs to the network per image are just the image itself plus the corresponding proposal locations. A RoI pooling layer is inserted between the network's convolutional layers and fully-connected layers; for each proposal it extracts activations of the same fixed dimension to feed the subsequent fully-connected layers, eliminating redundant per-proposal computation.
3,The network's last layer places softmax and bbox regression in parallel, so the network can simultaneously output the object class and fine-tune the proposal's location. R-CNN's four steps (extract proposals + ConvNet feature extraction + SVM classification + bounding-box regression) thus collapse into two: extract proposals + one convolutional network, making the pipeline much simpler. Non-maximum suppression is still applied at the end :-)
The RoI pooling layer
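The idea above can be sketched in code: RoI pooling divides each proposal window into a fixed H' × W' grid of sub-windows and max-pools each one, so every proposal yields the same output size. A minimal sketch in numpy, assuming a single-channel feature map and proposals already in feature-map coordinates (real implementations also handle the image-to-feature-map stride via a `spatial_scale` and operate per channel):

```python
import numpy as np

def roi_pool(feature_map, roi, out_h=7, out_w=7):
    """Max-pool one RoI into a fixed out_h x out_w grid.

    feature_map: 2-D array (one channel of the conv feature map)
    roi: (x1, y1, x2, y2) in feature-map coordinates, inclusive
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2 + 1, x1:x2 + 1]
    h, w = region.shape
    # Split the region into a roughly equal out_h x out_w grid of sub-windows.
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    out = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Guard against empty sub-windows when the RoI is small.
            sub = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                         xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = sub.max()  # max pooling within the sub-window
    return out
```

Because every proposal, whatever its size, comes out as the same H' × W' grid, all proposals can share the conv features of one image and still feed fixed-size fc layers.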
Using pre-trained networks
When a pre-trained network initializes Fast R-CNN, it undergoes three transformations:
1,the last max pooling layer is replaced by a RoI pooling layer that is configured by setting H' and W' to be compatible with the net's first fully-connected layer (e.g., H' = W' = 7 for VGG16)
2,the network's final fully-connected layer and softmax (which were trained for 1000-way ImageNet classification) are replaced with the two sibling layers described earlier (a fully-connected layer and softmax over K + 1 categories, and category-specific bounding-box regressors)
3,the network is modified to take two data inputs: a batch of N images and a list of R RoIs. The batch size and number of RoIs can change.
Back-propagation through RoI pooling layers
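In back-propagation through RoI pooling, each output unit is a max over a sub-window, so the gradient flows only to the input that was the argmax of that sub-window; when several RoIs overlap, the gradients arriving at the same input accumulate. A minimal sketch of the routing for a single pooling sub-window (simplified, assuming the forward pass's argmax is recomputed rather than cached):

```python
import numpy as np

def max_pool_backward(region, grad_out):
    """Route the gradient for one pooling sub-window: only the argmax
    input receives grad_out; every other input gets zero."""
    grad_in = np.zeros_like(region, dtype=float)
    idx = np.unravel_index(np.argmax(region), region.shape)
    grad_in[idx] = grad_out
    return grad_in
```

In a full layer, this routing is applied per sub-window per RoI, summing contributions into the shared conv feature map.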
Multi-task loss.
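The paper's multi-task loss is L(p, u, t^u, v) = L_cls(p, u) + λ[u ≥ 1] L_loc(t^u, v): log loss on the true class u, plus a smooth L1 loss on the box offsets that is switched off for background RoIs (u = 0). A minimal pure-Python sketch (the variable names mirror the paper's notation; λ = 1 as in the paper):

```python
import math

def smooth_l1(x):
    """Smooth L1 from the paper: 0.5 x^2 if |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def multitask_loss(p, u, t_u, v, lam=1.0):
    """p: predicted probabilities over K+1 classes; u: true class (0 = background);
    t_u, v: predicted and target box offsets (tx, ty, tw, th)."""
    l_cls = -math.log(p[u])  # log loss on the true class
    l_loc = sum(smooth_l1(ti - vi) for ti, vi in zip(t_u, v))
    # The Iverson bracket [u >= 1] disables the box loss for background RoIs.
    return l_cls + (lam * l_loc if u >= 1 else 0.0)
```

Training both heads against this single objective is what makes the pipeline single-stage.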
Training
Truncated SVD
At test time, the authors found that a large fraction of the network's time is spent in the fully-connected layers. For a u×v weight matrix, they therefore apply a truncated SVD approximation, similar in spirit to PCA, to compress the layer.
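The compression replaces the single fc layer W ≈ U_t Σ_t V_t^T with two smaller fc layers, cutting the parameter count from uv to t(u + v) for rank t. A minimal numpy sketch (the helper name `truncate_fc` is made up for illustration):

```python
import numpy as np

def truncate_fc(W, t):
    """Split a u x v fc weight matrix W into two factors of rank t,
    following the paper's factorization W ~ U_t @ diag(S_t) @ Vt_t."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = np.diag(S[:t]) @ Vt[:t]  # first (smaller) fc layer, shape (t, v)
    W2 = U[:, :t]                 # second fc layer, shape (u, t)
    return W1, W2

# Usage: one large matmul W @ x becomes two small ones, W2 @ (W1 @ x).
```

With t much smaller than min(u, v), the two matmuls together are far cheaper than the original, at a small cost in accuracy.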