Rich feature hierarchies for accurate object detection and semantic segmentation
Summary: R-CNN extracts a fixed-length feature vector from each region proposal using a CNN, then classifies each region with category-specific linear SVMs. It uses supervised pre-training on a large auxiliary dataset (ImageNet ILSVRC) followed by domain-specific fine-tuning.
Object detection with R-CNN
1, The first module generates category-independent region proposals.
2, The second module is a large convolutional neural network that extracts a fixed-length feature vector from each region.
3, The third module is a set of class-specific linear SVMs.
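At test time, the third module scores every proposal against every class SVM. The paper points out that this is a single matrix product: a 2000×4096 feature matrix times a 4096×N SVM weight matrix. The pure-Python sketch below shows that computation on toy sizes. The function name `svm_scores` and the small example values are hypothetical, chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def svm_scores(features: Matrix, weights: Matrix, biases: List[float]) -> Matrix:
    """Score every proposal with every class-specific linear SVM.

    features: P x D matrix, one D-dim CNN feature vector per proposal
    weights:  D x C matrix, column c holds the SVM weights for class c
    biases:   length-C list, one bias per class
    Returns a P x C matrix with scores[i][c] = features[i] . weights[:, c] + biases[c].
    """
    num_classes = len(biases)
    scores: Matrix = []
    for f in features:
        row = []
        for c in range(num_classes):
            dot = sum(f[d] * weights[d][c] for d in range(len(f)))
            row.append(dot + biases[c])
        scores.append(row)
    return scores


# Toy example: 3 proposals, 2-dim features, 2 classes
# (R-CNN itself would use roughly 2000 proposals, D = 4096 and C = 20).
scores = svm_scores(
    features=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    weights=[[2.0, -1.0], [0.5, 3.0]],
    biases=[0.0, -1.0],
)
```

With real sizes, the result is the 2000×20 score matrix that per-class non-maximum suppression later operates on.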
Region proposals: selective search is used to extract about 2000 region proposals per image.
Feature extraction: a 4096-dimensional feature vector is extracted from each region proposal. Each proposal is first warped to 227 × 227 regardless of its size or aspect ratio. Features are then computed by forward-propagating the mean-subtracted 227 × 227 RGB image through five convolutional layers and two fully connected layers.
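Before the forward pass, each proposal is turned into a fixed-size, mean-subtracted input. The sketch below assumes a few simplifications: nearest-neighbour resampling and a single per-channel mean. The helper names `warp_region` and `subtract_mean` are hypothetical, and the CNN itself is not shown.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # (R, G, B)
Image = List[List[Pixel]]            # indexed as image[y][x]
Box = Tuple[int, int, int, int]      # (x1, y1, x2, y2), inclusive pixel coordinates


def warp_region(image: Image, box: Box, out_size: int = 227) -> Image:
    """Crop `box` from `image` and stretch it to out_size x out_size.

    The aspect ratio is NOT preserved (anisotropic warp). Nearest-neighbour
    sampling is an assumption made for simplicity.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1 + 1, y2 - y1 + 1
    warped: Image = []
    for r in range(out_size):
        src_y = y1 + min(h - 1, r * h // out_size)
        warped.append([image[src_y][x1 + min(w - 1, c * w // out_size)]
                       for c in range(out_size)])
    return warped


def subtract_mean(image: Image, mean: Pixel) -> Image:
    """Subtract a per-channel mean (an assumed simplification) from every pixel."""
    return [[tuple(p[k] - mean[k] for k in range(3)) for p in row] for row in image]


# Toy 4x4 image whose pixel at (x, y) is (x, y, 10).
img = [[(float(x), float(y), 10.0) for x in range(4)] for y in range(4)]
cnn_input = subtract_mean(warp_region(img, (1, 1, 2, 2)), (1.0, 1.0, 10.0))
```

The resulting 227 × 227 × 3 array is what would be fed to the five convolutional and two fully connected layers.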
Test-time detection
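1, We dilate the tight bounding box so that, at the warped size, there are exactly p pixels of warped image context around the original box (we use p = 16). [That is, each proposal box is enlarged to take in surrounding image context, and the enlarged region is then warped directly to 227×227. After warping, the original box occupies the central 195×195 pixels, with a 16-pixel context border on each side. Where the enlarged box extends beyond the image, the missing pixels are filled with the image mean.]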
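The dilation amount follows from simple arithmetic. With p = 16 and a 227-pixel output, the original box must map onto the central 227 − 2·16 = 195 pixels. The horizontal padding in original-image units is therefore p · w / 195, where w is the box width, and the same rule applies vertically. Below is a minimal sketch using continuous box coordinates, which is an assumption. The function name `dilate_for_context` is hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), continuous coordinates


def dilate_for_context(box: Box, out_size: int = 227, p: int = 16) -> Box:
    """Enlarge `box` so that, once the enlarged box is warped to
    out_size x out_size, the original box fills the central
    (out_size - 2p) pixels and exactly p pixels of context surround it."""
    x1, y1, x2, y2 = box
    inner = out_size - 2 * p          # 227 - 32 = 195 warped pixels for the box itself
    pad_x = p * (x2 - x1) / inner     # p warped pixels expressed in original-image units
    pad_y = p * (y2 - y1) / inner
    # Parts of the returned box may fall outside the image. Per the paper, those
    # pixels are filled with the image mean; that filling is not done here.
    return (x1 - pad_x, y1 - pad_y, x2 + pad_x, y2 + pad_y)


# A 195 x 390 box receives 16 px of horizontal and 32 px of vertical padding,
# so it becomes (-16.0, -32.0, 211.0, 422.0).
dilated = dilate_for_context((0.0, 0.0, 195.0, 390.0))
```

A quick check: the enlarged width is w · 227 / 195, so warping it to 227 pixels scales by 195 / w. The padding p · w / 195 therefore becomes exactly p warped pixels.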
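2, Given all scored regions in an image, greedy non-maximum suppression is applied independently for each class: a region is rejected if it has an intersection-over-union (IoU) overlap with a higher-scoring selected region that is larger than a learned threshold. [That is, NMS is run separately on each column (each class) of the 2000×20 score matrix to remove overlapping proposals, keeping the highest-scoring proposals for that class.]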
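The greedy per-class NMS described above can be sketched as follows. The helper names are hypothetical. The threshold is a plain argument because the paper learns its value; the 0.3 used in the example is illustrative only.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], threshold: float) -> List[int]:
    """Greedy NMS for one class: visit boxes from highest to lowest score and
    reject a box if its IoU with any already-selected box exceeds `threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


def per_class_nms(boxes: Sequence[Box], score_matrix: Sequence[Sequence[float]],
                  threshold: float) -> List[List[int]]:
    """Run NMS independently on each column (class) of a proposals x classes matrix."""
    num_classes = len(score_matrix[0])
    return [nms(boxes, [row[c] for row in score_matrix], threshold)
            for c in range(num_classes)]


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
kept = per_class_nms(boxes, [[0.9, 0.1], [0.8, 0.95], [0.7, 0.2]], threshold=0.3)
```

In this example, boxes 0 and 1 overlap with IoU 0.81. For each class, only the higher-scoring of the two survives, while the distant box 2 is always kept.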
Training
Domain-specific fine-tuning: the CNN's ImageNet-specific 1000-way classification layer is replaced with a randomly initialized 21-way classification layer (for the 20 VOC classes plus background). The rest of the CNN architecture is unchanged. All region proposals with ≥ 0.5 IoU overlap with a ground-truth box are treated as positives for that box's class, and the rest as negatives. SGD starts at a learning rate of 0.001 (1/10 of the initial pre-training rate), which allows fine-tuning to make progress without clobbering the initialization. In each SGD iteration, 32 positive windows (over all classes) and 96 background windows are sampled uniformly to construct a mini-batch of size 128.
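The labeling rule and the 32 + 96 mini-batch composition can be sketched as below. Several details are assumptions rather than statements from the paper: each proposal takes the class of its best-overlapping ground-truth box, background is class 0, and windows are drawn with replacement. All function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
BACKGROUND = 0                           # assumed: 0 = background, 1..20 = VOC classes


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals: Sequence[Box], gt_boxes: Sequence[Box],
                    gt_classes: Sequence[int], pos_iou: float = 0.5) -> List[int]:
    """Give each proposal the class of its best-overlapping ground-truth box
    if that IoU is >= pos_iou; otherwise label it background."""
    labels = []
    for prop in proposals:
        best_iou, best_cls = 0.0, BACKGROUND
        for gt, cls in zip(gt_boxes, gt_classes):
            overlap = iou(prop, gt)
            if overlap > best_iou:
                best_iou, best_cls = overlap, cls
        labels.append(best_cls if best_iou >= pos_iou else BACKGROUND)
    return labels


def sample_minibatch(labels: Sequence[int], num_pos: int = 32, num_neg: int = 96,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Return proposal indices for one mini-batch: num_pos positives (any class)
    followed by num_neg background windows, each drawn uniformly with replacement."""
    if rng is None:
        rng = random.Random()
    pos = [i for i, lab in enumerate(labels) if lab != BACKGROUND]
    neg = [i for i, lab in enumerate(labels) if lab == BACKGROUND]
    batch = rng.choices(pos, k=num_pos) if pos else []
    batch += rng.choices(neg, k=num_neg) if neg else []
    return batch


labels = label_proposals([(0, 0, 10, 10), (0, 0, 10, 9), (50, 50, 60, 60)],
                         gt_boxes=[(0, 0, 10, 10)], gt_classes=[7])
batch = sample_minibatch(labels, rng=random.Random(0))  # 128 indices
```

Positives are deliberately oversampled: they make up a quarter of each batch, even though background windows greatly outnumber them among the roughly 2000 proposals per image.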
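Related concepts
1, mAP (mean Average Precision): AP is computed separately for each class, and the per-class APs are then averaged. AP is the area under the precision-recall curve, where precision = TP/(TP+FP) and recall = TP/(TP+FN).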
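The recipe above (sort detections, accumulate precision and recall, take the area under the curve, then average over classes) can be sketched as follows. This version interpolates precision and integrates over every recall step. That is one common variant and an assumption here: the PASCAL VOC 2007 devkit instead averages the interpolated precision at 11 fixed recall points. The function names are hypothetical.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], is_tp: Sequence[bool], num_gt: int) -> float:
    """AP for one class.

    scores: confidence of each detection of this class
    is_tp:  True if the detection matched a ground-truth box (TP), else False (FP)
    num_gt: number of ground-truth objects of this class (= TP + FN)
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for i in order:                          # lower the score threshold one detection at a time
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))    # TP / (TP + FP)
        recalls.append(tp / num_gt)          # TP / (TP + FN)
    # Interpolate: replace each precision with the maximum precision at any later
    # point, i.e. at any recall >= the current recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the resulting step-shaped precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """mAP: the unweighted mean of the per-class APs."""
    return sum(per_class_ap) / len(per_class_ap)


# Four detections for one class, two ground-truth objects.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], num_gt=2)
```

Here, recall reaches 0.5 at precision 1 and then 1.0 at interpolated precision 2/3, so AP = 0.5 · 1 + 0.5 · 2/3 = 5/6.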
2, IoU = area(A ∩ B) / area(A ∪ B): the area of overlap between two boxes A and B, divided by the area of their union.
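For axis-aligned boxes, the IoU formula reduces to a few min/max operations. The sketch below assumes boxes given as continuous (x1, y1, x2, y2) corners. Some implementations instead use inclusive pixel coordinates and add 1 to widths and heights. The function name `iou` is illustrative.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h                  # area(A ∩ B)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter            # area(A ∪ B)
    return inter / union if union > 0 else 0.0


overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

For these two 2×2 squares offset by one unit, the intersection is 1 and the union is 4 + 4 − 1 = 7, so the IoU is 1/7.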
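After step 4 of the test procedure, we obtain a 2000×20 matrix giving each proposal's score for each object class. At this point, the situation shown in the figure arises: the same car is enclosed by several overlapping proposal boxes. Non-maximum suppression is therefore needed to discard the lower-scoring candidates and reduce the overlapping boxes.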
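Drawbacks
1, Training is very slow (84 hours).
2, Testing is slow.
3, Training is a complex multi-stage pipeline (CNN fine-tuning, then SVM training, then bounding-box regression).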