# Basic Concepts

### What Is IoU

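The bounding boxes for the training and testing sets are hand-labeled, which is why we call them the “ground truth”. Intersection over Union (IoU) measures how well a predicted bounding box matches a ground-truth box: the area of their overlap divided by the area of their union. Your goal is to take the training images and bounding boxes, construct an object detector, and then evaluate its performance on the testing set. An IoU score > 0.5 is normally considered a “good” prediction.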
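As a minimal sketch of the IoU computation, assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates (the function name and box format are illustrative choices, not taken from the original notes):

```python
def intersection_over_union(box_a, box_b):
    """Return the IoU of two boxes given as (x1, y1, x2, y2) corners."""
    # Corners of the overlapping rectangle, if there is one
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Clamp at zero so that disjoint boxes give an empty intersection
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes are treated as having no overlap
    return inter / union if union > 0 else 0.0
```

For example, `intersection_over_union((0, 0, 10, 10), (5, 0, 15, 10))` has an overlap of 50 and a union of 150, so the IoU is about 0.33, below the 0.5 threshold.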

### mAP

In the context of machine learning, precision typically refers to accuracy, but in the context of object detection, IoU is our precision. However, we need to define a method to compute accuracy per class and across all classes in the dataset. To accomplish this goal, we need mean Average Precision (mAP).

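To compute average precision for a single class, we determine the IoU of all data points for that class. Once we have the IoUs, we sum them and divide by the total number of class labels for that specific class, yielding the average precision. To compute the mean average precision, we compute the average IoU for each of the N classes, then take the average of these N averages, hence the term mean average precision. In other words, mAP here is the IoU averaged first within each class and then across classes. Note that this is a simplified description: benchmarks such as PASCAL VOC and COCO compute average precision as the area under the precision-recall curve, with IoU serving as the threshold that decides whether a detection counts as correct.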
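The per-class averaging described above can be sketched as follows. This follows the simplified IoU-averaging view in these notes, not a precision-recall-based AP; the function name and the dictionary input format are hypothetical.

```python
def mean_average_iou(ious_by_class):
    """Average IoU within each class, then average those class means.

    ious_by_class maps a class name to the list of IoU scores for that class.
    Returns (per_class_means, overall_mean).
    """
    per_class = {
        name: sum(scores) / len(scores)
        for name, scores in ious_by_class.items()
        if scores  # skip classes that have no labeled examples
    }
    if not per_class:
        raise ValueError("no IoU scores were provided")

    overall = sum(per_class.values()) / len(per_class)
    return per_class, overall
```

Averaging per class first means a class with few examples carries the same weight in the final score as a class with many.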

0.5 is a baseline value. Most object detection algorithms should score above 0.5; otherwise the detector is of little practical use.

# Faster R-CNN

### RPN

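The Region Proposal Network (RPN) identifies regions of an image that potentially contain an object. At this stage we do not know what a given candidate region contains; we only identify it as a potential candidate.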

### Anchor

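Traditionally there are two ways to locate candidate regions: (1) sliding windows combined with an image pyramid, and (2) a region proposal algorithm. However, our goal is to generate region proposals with an end-to-end deep learning method, so we need a new way to find our ROIs.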

### Region of Interest (ROI) Pooling

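The ROI pooling layer extracts features for each of the N candidate regions. The ROI pooling module then resizes each set of features to a fixed $7 \times 7 \times D$ (where $D$ is the depth of the feature map), so that the features are ready for the fully connected layers that follow.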

The candidate regions that the ROI pooling layer operates on are supplied by the RPN module.
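To make the fixed-size output concrete, here is a minimal NumPy sketch of max-based ROI pooling for a single region. It assumes an `(H, W, D)` feature map and a non-empty ROI given as end-exclusive `(x1, y1, x2, y2)` feature-map coordinates; the function name and the binning scheme are illustrative, and real implementations differ in how they quantize bin boundaries.

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=7):
    """Max-pool one ROI of an (H, W, D) feature map to (output_size, output_size, D)."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2, :]
    h, w, d = region.shape

    # Bin edges that split the region into an output_size x output_size grid
    ys = np.linspace(0, h, output_size + 1).astype(int)
    xs = np.linspace(0, w, output_size + 1).astype(int)

    pooled = np.zeros((output_size, output_size, d), dtype=feature_map.dtype)
    for i in range(output_size):
        for j in range(output_size):
            # Make sure every bin covers at least one cell, even for small ROIs
            y_lo, y_hi = ys[i], max(ys[i + 1], ys[i] + 1)
            x_lo, x_hi = xs[j], max(xs[j + 1], xs[j] + 1)
            pooled[i, j] = region[y_lo:y_hi, x_lo:x_hi].max(axis=(0, 1))
    return pooled
```

Whatever the size of the input region, the output has shape `(7, 7, D)`, which is what lets regions of different sizes feed the same fully connected layers.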

### Region-based Convolutional Neural Network

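The class-label output has size N+1 because there are N object classes plus one background class. The bounding-box output has size 4*N because each class has four coordinate offsets ($\Delta x_{\text{center}}$, $\Delta y_{\text{center}}$, $\Delta \text{width}$, $\Delta \text{height}$).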
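The two output shapes can be illustrated with a small NumPy sketch that picks, for each ROI, the four offsets belonging to its top-scoring class. The function name, the convention that column 0 is the background class, and the class-major layout of the 4*N offsets are all assumptions made for illustration.

```python
import numpy as np

def select_box_deltas(class_scores, box_deltas):
    """Pick the 4 offsets of the top-scoring class for each ROI.

    class_scores: (R, N + 1) array; column 0 is background (assumed convention).
    box_deltas:   (R, 4 * N) array; 4 offsets per foreground class.
    Returns (labels, deltas); deltas is (R, 4) and all zeros for background ROIs.
    """
    num_rois, num_outputs = class_scores.shape
    num_classes = num_outputs - 1
    assert box_deltas.shape == (num_rois, 4 * num_classes)

    labels = class_scores.argmax(axis=1)
    per_class = box_deltas.reshape(num_rois, num_classes, 4)

    deltas = np.zeros((num_rois, 4), dtype=box_deltas.dtype)
    fg = np.flatnonzero(labels > 0)  # indices of ROIs not labeled background
    # Foreground label k corresponds to offset group k - 1
    deltas[fg] = per_class[fg, labels[fg] - 1]
    return labels, deltas
```

Background needs no offsets, which is why the box output has 4*N entries rather than 4*(N+1).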

Two loss functions are used: (1) categorical cross-entropy for classification, and (2) smooth L1 loss for bounding box regression. We use categorical cross-entropy rather than binary cross-entropy here because we are computing probabilities for each of our N classes, versus the binary case (background vs. foreground) in the RPN module.
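Both losses can be sketched in a few lines of NumPy. The function names are illustrative, and the `beta` parameter (the point where smooth L1 switches from quadratic to linear) is an assumption; with `beta=1.0` it reduces to the commonly used form.

```python
import numpy as np

def categorical_cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits is (R, N + 1), labels is (R,) ints."""
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1: quadratic where |diff| < beta, linear elsewhere."""
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return loss.mean()
```

The linear tail of smooth L1 keeps a few badly placed boxes from dominating the gradient, while the quadratic region keeps it smooth near zero.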

# Choice of Training Method

In nearly all situations you’ll find that jointly training the entire network end-to-end by minimizing the weighted sum of the four loss functions (classification and bounding box regression, for both the RPN and the R-CNN module) not only takes less time but also obtains higher accuracy.
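Written out, the joint objective could look like the following, where the weights $\lambda_1, \dots, \lambda_4$ and the loss symbols are notation assumed here for illustration, not taken from the original notes:

$$
L_{\text{total}} = \lambda_1 L_{\text{cls}}^{\text{RPN}} + \lambda_2 L_{\text{reg}}^{\text{RPN}} + \lambda_3 L_{\text{cls}}^{\text{RCNN}} + \lambda_4 L_{\text{reg}}^{\text{RCNN}}
$$

Here $L_{\text{cls}}^{\text{RPN}}$ is the binary foreground vs. background loss of the RPN, $L_{\text{cls}}^{\text{RCNN}}$ is the categorical cross-entropy over the N+1 classes, and the two $L_{\text{reg}}$ terms are the bounding box regression losses.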