Fast R-CNN
Note: Fast R-CNN optimizes R-CNN to address its slow speed.
Introduction
R-CNN:
1,Training is a multi-stage pipeline.【R-CNN performs training in separate stages: extract proposals + ConvNet feature extraction + SVM classification + bounding-box regression】
2,Training is expensive in space and time【During training, R-CNN takes the output of the network's last fc layer as features to train the SVMs and the bounding-box regressors; these features must be cached to disk, which costs both time and storage.】
3,Test-time detection is slow
SPPnet:
SPPnet can only update the fully-connected layers that follow the spatial pyramid pooling layer. The authors hypothesize that this limitation prevents very deep networks, like VGG16, from reaching their full potential.
Fast R-CNN:
1. Higher detection quality (mAP) than R-CNN
2. Training is single-stage, using a multi-task loss【enables end-to-end, single-stage training】
3. All network layers can be updated during training
4. No disk storage is required for feature caching【no offline feature files need to be stored】
Fast R-CNN training
Fast R-CNN architectures have several convolutional (conv) and max pooling layers, followed by a region of interest (RoI) pooling layer, and then several fully-connected (fc) layers.
The SVM classifiers are removed; a softmax layer performs classification instead.
1,To localize potential objects in an image, Fast R-CNN uses sparse proposals, such as those produced by selective search, about 2000 proposals per image.
2,At training and test time, the inputs to the network per image are just the image itself plus the corresponding proposal locations. A RoI pooling layer is inserted between the network's convolutional layers and fully-connected layers; for each proposal it extracts activations of the same fixed dimension to feed the subsequent fully-connected layers, eliminating redundant per-proposal computation.
3,The network's last layer places softmax and bbox regression in parallel, so the network can simultaneously output the object class and fine-tune the proposal's location. R-CNN's four steps (extract proposals + ConvNet feature extraction + SVM classification + bounding-box regression) thus collapse into two: extract proposals + one convolutional network, making the pipeline much simpler. Non-maximum suppression is still applied at the end :-)
The RoI pooling layer
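The idea above can be sketched in code: RoI pooling divides each proposal window into a fixed H' × W' grid of sub-windows and max-pools each one, so every proposal yields the same output size. A minimal sketch in numpy, assuming a single-channel feature map and proposals already in feature-map coordinates (real implementations also handle the image-to-feature-map stride via a `spatial_scale` and operate per channel):

```python
import numpy as np

def roi_pool(feature_map, roi, out_h=7, out_w=7):
    """Max-pool one RoI into a fixed out_h x out_w grid.

    feature_map: 2-D array (one channel of the conv feature map)
    roi: (x1, y1, x2, y2) in feature-map coordinates, inclusive
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2 + 1, x1:x2 + 1]
    h, w = region.shape
    # Split the region into a roughly equal out_h x out_w grid of sub-windows.
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    out = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Guard against empty sub-windows when the RoI is small.
            sub = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                         xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = sub.max()  # max pooling within the sub-window
    return out
```

Because every proposal, whatever its size, comes out as the same H' × W' grid, all proposals can share the conv features of one image and still feed fixed-size fc layers.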
Using pre-trained networks
When a pre-trained network initializes Fast R-CNN, it undergoes three transformations:
1,the last max pooling layer is replaced by a RoI pooling layer that is configured by setting H' and W' to be compatible with the net's first fully-connected layer (e.g., H' = W' = 7 for VGG16)
2,the network's final fully-connected layer and softmax (which were trained for 1000-way ImageNet classification) are replaced with the two sibling layers described earlier (a fully-connected layer and softmax over K + 1 categories, and category-specific bounding-box regressors)
3,the network is modified to take two data inputs: a batch of N images and a list of R RoIs. The batch size and number of RoIs can change.
Back-propagation through RoI pooling layers
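In back-propagation through RoI pooling, each output unit is a max over a sub-window, so the gradient flows only to the input that was the argmax of that sub-window; when several RoIs overlap, the gradients arriving at the same input accumulate. A minimal sketch of the routing for a single pooling sub-window (simplified, assuming the forward pass's argmax is recomputed rather than cached):

```python
import numpy as np

def max_pool_backward(region, grad_out):
    """Route the gradient for one pooling sub-window: only the argmax
    input receives grad_out; every other input gets zero."""
    grad_in = np.zeros_like(region, dtype=float)
    idx = np.unravel_index(np.argmax(region), region.shape)
    grad_in[idx] = grad_out
    return grad_in
```

In a full layer, this routing is applied per sub-window per RoI, summing contributions into the shared conv feature map.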
Multi-task loss.
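The paper's multi-task loss is L(p, u, t^u, v) = L_cls(p, u) + λ[u ≥ 1] L_loc(t^u, v): log loss on the true class u, plus a smooth L1 loss on the box offsets that is switched off for background RoIs (u = 0). A minimal pure-Python sketch (the variable names mirror the paper's notation; λ = 1 as in the paper):

```python
import math

def smooth_l1(x):
    """Smooth L1 from the paper: 0.5 x^2 if |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def multitask_loss(p, u, t_u, v, lam=1.0):
    """p: predicted probabilities over K+1 classes; u: true class (0 = background);
    t_u, v: predicted and target box offsets (tx, ty, tw, th)."""
    l_cls = -math.log(p[u])  # log loss on the true class
    l_loc = sum(smooth_l1(ti - vi) for ti, vi in zip(t_u, v))
    # The Iverson bracket [u >= 1] disables the box loss for background RoIs.
    return l_cls + (lam * l_loc if u >= 1 else 0.0)
```

Training both heads against this single objective is what makes the pipeline single-stage.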
Training
Truncated SVD
At test time, the authors found that a large fraction of the network's time is spent in the fully-connected layers. For a u×v weight matrix, they therefore apply a truncated SVD approximation, similar in spirit to PCA, to compress the layer.
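The compression replaces the single fc layer W ≈ U_t Σ_t V_t^T with two smaller fc layers, cutting the parameter count from uv to t(u + v) for rank t. A minimal numpy sketch (the helper name `truncate_fc` is made up for illustration):

```python
import numpy as np

def truncate_fc(W, t):
    """Split a u x v fc weight matrix W into two factors of rank t,
    following the paper's factorization W ~ U_t @ diag(S_t) @ Vt_t."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = np.diag(S[:t]) @ Vt[:t]  # first (smaller) fc layer, shape (t, v)
    W2 = U[:, :t]                 # second fc layer, shape (u, t)
    return W1, W2

# Usage: one large matmul W @ x becomes two small ones, W2 @ (W1 @ x).
```

With t much smaller than min(u, v), the two matmuls together are far cheaper than the original, at a small cost in accuracy.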