Object Detection: Notes on Fast R-CNN (3)

Fast R-CNN

Note: Fast R-CNN is an optimization aimed at the slow speed of R-CNN.

Introduction

R-CNN
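1. Training is a multi-stage pipeline. [R-CNN training runs as separate steps: extracting proposals + extracting features with a ConvNet + SVM classification + bounding box regression.]
2. Training is expensive in space and time. [During training, R-CNN takes the output of the network's last fc layer as the feature used to train the SVMs and the bounding box regressors. These features have to be written to disk, which costs both time and storage.]
3. Test-time detection is slow.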
SPPnet
The fine-tuning algorithm of SPPnet can only update the fully-connected layers that follow spatial pyramid pooling. The Fast R-CNN paper hypothesizes that this limitation will prevent very deep networks, like VGG16, from reaching their full potential.
Fast R-CNN
1. Higher detection quality (mAP) than R-CNN
2. Training is single-stage, using a multi-task loss [this gives end-to-end, single-stage training]
3. All network layers can be updated during training
4. No disk storage is required for feature caching [no feature files need to be stored offline]

Fast R-CNN training

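Fast R-CNN uses architectures that have several convolutional (conv) and max pooling layers, followed by a region of interest (RoI) pooling layer, and then several fully-connected (fc) layers.
The SVM classifiers are removed; a softmax layer does the classification instead.
1. To localize potential objects in an image, sparse proposals are used, for example the output of selective search, which yields about 2000 proposals per image.
2. At both training and test time, the only inputs to the network for each image are the image itself and the locations of its proposals. An RoI pooling layer is inserted between the convolutional layers and the fully-connected layers. For every proposal it extracts activations of the same dimension and passes them to the following fully-connected layers, so the convolutional features are computed once per image rather than once per proposal.
3. The last layer of the network places softmax and bbox regression side by side, so the network outputs the object class and refines the proposal location at the same time. The R-CNN pipeline of extracting proposals + ConvNet feature extraction + SVM classification + bounding box regression is thereby reduced to two steps, extracting proposals + a convolutional network, which makes the system simpler. Non-maximum suppression is still applied at the end :-)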
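The non-maximum suppression step mentioned in item 3 can be sketched as follows. This is a minimal greedy version for a single class, not the implementation used in the paper. The function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout with continuous coordinates, and the threshold of 0.3 are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than iou_threshold
    (the threshold value is an assumption)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping detections and one separate detection.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of the surviving boxes
```

In a detector this would typically be run independently for each object class on the boxes scored for that class.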

The RoI pooling layer

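As described earlier in these notes, the RoI pooling layer turns each proposal into activations of the same dimension. A minimal sketch of that idea for one channel follows: the RoI is split into an `out_h` x `out_w` grid of sub-windows and each sub-window is max-pooled. The function name `roi_max_pool`, the list-of-rows feature map, the exclusive end coordinates, and the exact rounding of sub-window boundaries are assumptions, not details taken from the paper.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one RoI of a single-channel feature map to out_h x out_w.

    feature_map: list of rows; roi: (r0, c0, r1, c1) in feature-map cells,
    with r1 and c1 exclusive (an assumed convention).
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        # Row range of sub-window i; neighbouring windows may share a row
        # when h is not divisible by out_h.
        rs = r0 + (i * h) // out_h
        re = r0 + ceil_div((i + 1) * h, out_h)
        row = []
        for j in range(out_w):
            cs = c0 + (j * w) // out_w
            ce = c0 + ceil_div((j + 1) * w, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        pooled.append(row)
    return pooled


fm = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]
whole = roi_max_pool(fm, (0, 0, 4, 6), 2, 2)  # 4 x 6 RoI -> 2 x 2
small = roi_max_pool(fm, (1, 1, 4, 4), 2, 2)  # 3 x 3 RoI -> 2 x 2
```

Both RoIs produce a 2 x 2 output even though their sizes differ, which is what lets the fully-connected layers accept proposals of any shape.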

Using pre-trained networks

When a pre-trained network initializes Fast R-CNN, it undergoes three transformations:
1. The last max pooling layer is replaced by a RoI pooling layer that is configured by setting H' and W' to be compatible with the net's first fully-connected layer (e.g., H' = W' = 7 for VGG16).
2. The network's final fully-connected layer and softmax (which were trained for 1000-way ImageNet classification) are replaced with the two sibling layers described earlier (a fully-connected layer and softmax over K + 1 categories, and bounding-box regressors).
3. The network is modified to take two data inputs: a batch of N images and a list of R RoIs. The batch size and number of RoIs can change.

Back-propagation through RoI pooling layers

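For reference, the backward pass through RoI max pooling can be written compactly as below. The notation is an assumption following the usual presentation: x_i is an input activation, y_{rj} is the j-th output of the r-th RoI, and R(r, j) is the set of input indices in the sub-window that y_{rj} pools over. The gradient of an output flows back only to the input that was selected as the maximum, and an input shared by several RoIs accumulates the gradients from all of them.

```latex
\[
\frac{\partial L}{\partial x_i}
  = \sum_{r} \sum_{j} \bigl[\, i = i^{*}(r, j) \,\bigr]
    \frac{\partial L}{\partial y_{rj}},
\qquad
i^{*}(r, j) = \operatorname*{argmax}_{i' \in \mathcal{R}(r, j)} x_{i'}
\]
```

The bracket is an indicator that equals 1 when x_i was the argmax for output y_{rj} and 0 otherwise.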

Multi-task loss.

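A minimal sketch of the multi-task loss for a single RoI, assuming the usual form L = L_cls(p, u) + lambda * [u >= 1] * L_loc(t^u, v), where L_cls is the log loss of the true class and L_loc sums a smooth L1 term over the four box offsets. The function names, the plain-list inputs, and the default `lam=1.0` are illustrative assumptions.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear otherwise."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def multi_task_loss(p, u, t_u, v, lam=1.0):
    """Classification loss plus box regression loss for one RoI.

    p:   softmax probabilities over K + 1 classes (index 0 = background)
    u:   ground-truth class index
    t_u: predicted offsets (tx, ty, tw, th) for class u
    v:   ground-truth regression targets, same layout as t_u
    lam: weight of the localization term (assumed default)
    """
    loss_cls = -math.log(p[u])
    if u == 0:
        # Background RoIs have no ground-truth box, so no regression term.
        return loss_cls
    loss_loc = sum(smooth_l1(t - g) for t, g in zip(t_u, v))
    return loss_cls + lam * loss_loc


bg_loss = multi_task_loss([0.5, 0.25, 0.25], 0, (0, 0, 0, 0), (0, 0, 0, 0))
fg_loss = multi_task_loss([0.5, 0.25, 0.25], 1,
                          (0.5, 0.0, 2.0, 0.0), (0.0, 0.0, 0.0, 0.0))
```

The indicator on u >= 1 is what allows one network to train both sibling outputs jointly: background RoIs contribute only to the classification term.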

Training

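One part of training that lends itself to a short sketch is how proposals are labelled before mini-batch sampling. The thresholds below follow the Fast R-CNN paper as I understand it (foreground at IoU >= 0.5 with a ground-truth box, background at maximum IoU in [0.1, 0.5)); they are not stated in these notes. The function names and the `(x1, y1, x2, y2)` box layout are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_roi(proposal, gt_boxes, fg_thresh=0.5, bg_lo=0.1):
    """Return "fg", "bg", or None (not used) for one proposal,
    based on its best overlap with any ground-truth box."""
    best = max((iou(proposal, g) for g in gt_boxes), default=0.0)
    if best >= fg_thresh:
        return "fg"
    if best >= bg_lo:
        return "bg"
    return None


gt = [(0, 0, 10, 10)]
labels = [
    label_roi((0, 0, 10, 8), gt),      # IoU 0.8
    label_roi((0, 0, 10, 2), gt),      # IoU 0.2
    label_roi((20, 20, 30, 30), gt),   # IoU 0.0
]
```

A mini-batch would then be drawn from the labelled proposals of a small number of images, with a fixed share taken from the "fg" group; the sampling itself is left out of this sketch.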

Truncated SVD

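At test time the authors found that the network spends a large share of its time in the fully-connected layers. For a weight matrix of size u × v, they apply a truncated SVD approximation, similar in spirit to PCA.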
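The approximation can be written out as follows. The symbols U, Sigma_t, V and the rank t are assumed names for the factors of the u × v weight matrix W, and the sizes in the worked line are illustrative only.

```latex
\[
W \approx U \, \Sigma_t \, V^{T},
\qquad
U \in \mathbb{R}^{u \times t},\quad
\Sigma_t \in \mathbb{R}^{t \times t},\quad
V \in \mathbb{R}^{v \times t}
\]
\[
\text{parameters: } uv \;\longrightarrow\; t(u + v),
\qquad
\text{e.g. } u = v = 4096,\ t = 256:\quad
16{,}777{,}216 \;\longrightarrow\; 2{,}097{,}152
\]
```

In practice the single fc layer with weights W is replaced by two fc layers with no non-linearity in between: the first uses Sigma_t V^T and the second uses U. The saving is large whenever t is much smaller than min(u, v).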

Fast R-CNN overview

