Sorting out the ideas behind object detection (there are probably mistakes in here; to be corrected)

Latest recommended article published on 2024-06-28 13:02:40

yueyuecsdn

Category: Object Detection

Article link: https://blog.csdn.net/yueyuecsdn/article/details/107997885

An explanation of NMS (non-maximum suppression):

For a prediction bounding box B, the model calculates the predicted probability for each category. Assume the largest predicted probability is p; the category corresponding to this probability is the predicted category of B. We also refer to p as the confidence level of prediction bounding box B. On the same image, we sort the prediction bounding boxes whose predicted category is not background by confidence level from high to low, and obtain the list L. Select the prediction bounding box B1 with the highest confidence level from L as a baseline and remove from L all other prediction bounding boxes whose IoU with B1 is greater than a certain threshold. The threshold here is a preset hyper-parameter. At this point, L retains the prediction bounding box with the highest confidence level and has dropped the other prediction bounding boxes similar to it. Next, select the prediction bounding box B2 with the second highest confidence level from L as a baseline, and remove from L all other prediction bounding boxes whose IoU with B2 is greater than the threshold. Repeat this process until every prediction bounding box left in L has been used as a baseline. At this time, the IoU of any pair of prediction bounding boxes in L is no greater than the threshold. Finally, output all prediction bounding boxes in the list L.
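The procedure above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the implementation of any particular framework: the names `iou` and `nms`, the corner box format `(x1, y1, x2, y2)`, and the default threshold of 0.5 are assumptions made for the example, and background boxes are assumed to have been filtered out beforehand.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return the indices of the kept boxes, highest confidence first."""
    # The list L: box indices sorted by confidence, from high to low.
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while remaining:
        base = remaining.pop(0)  # the current baseline box
        keep.append(base)
        # Remove every other box whose IoU with the baseline exceeds the threshold.
        remaining = [i for i in remaining
                     if iou(boxes[base], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (20.0, 20.0, 30.0, 30.0)]
scores = [0.8, 0.9, 0.7]
print(nms(boxes, scores, iou_threshold=0.5))  # [1, 2]
```

The first two boxes overlap with an IoU of about 0.68, so only the more confident of the two survives; the third box overlaps nothing and is kept. Note that this sketch, like the description, puts all classes into one list; many detectors run NMS per class instead.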

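In detection, Faster R-CNN and the anchor-based versions of YOLO all use anchors. Typically the image passes through several layers to produce a feature map, and every pixel of that feature map is assigned a fixed set of anchors, which are of course mapped back onto the original image. These candidate boxes are still a brute-force enumeration; the difference is that the enumeration happens over feature-map positions instead of crudely sliding a window over the original image. At this point the feature layers have fulfilled one of their major roles. For training, picture a single image covered with many anchor boxes, plus the GT (ground-truth) boxes we annotated. A GT box overlaps some of the preset anchors and misses others, so we can measure the overlap, usually as IoU rather than raw area. The larger the overlap, the more likely we consider the anchor to belong to that GT box's class, so that anchor is kept as a positive sample. Strictly speaking, the kept anchors are not yet region proposals: in Faster R-CNN, the region proposals are the boxes the RPN outputs after scoring and refining the anchors. Each anchor can be traced back to the feature-map pixel it belongs to, so does that pixel play some important role later on?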
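To make the "enumerate anchors on the feature map, then map them back to the image" idea concrete, here is a rough sketch. The function names `make_anchors` and `anchor_to_cell`, the stride of 16, the single size of 32 pixels, and the three aspect ratios are all assumptions chosen for illustration; real detectors use their own settings.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def make_anchors(feat_h: int, feat_w: int, stride: int,
                 sizes: Tuple[float, ...] = (32.0,),
                 ratios: Tuple[float, ...] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Enumerate anchors for every feature-map pixel, in image coordinates."""
    anchors: List[Box] = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map pixel projected back onto the image.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for size in sizes:
                for ratio in ratios:  # ratio = width / height (assumed convention)
                    w = size * ratio ** 0.5
                    h = size / ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def anchor_to_cell(index: int, feat_w: int, anchors_per_cell: int) -> Tuple[int, int]:
    """Recover the (row, col) feature-map pixel that anchor `index` belongs to."""
    cell = index // anchors_per_cell
    return cell // feat_w, cell % feat_w


anchors = make_anchors(feat_h=2, feat_w=3, stride=16)
print(len(anchors))                                         # 18 = 2 * 3 cells * 3 anchors
print(anchor_to_cell(10, feat_w=3, anchors_per_cell=3))     # (1, 0)
```

Because the anchors are generated in a fixed order, the index of an anchor is enough to find its feature-map pixel again, which is how a detection head can associate the features at that location with the anchors centred there.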

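Back to the point. During training, each GT box is surrounded by many anchor boxes, and we want to pick out the ones that overlap it strongly: set an IoU threshold and keep the anchors above it as positives. In most detectors the anchors below the threshold are not literally deleted; they are usually treated as background or ignored. The loss function then pushes the class and coordinates predicted for each kept anchor toward those of its GT box. So does the anchor box get shifted in the last step? As far as I understand, the anchor itself stays fixed: the network predicts offsets relative to it, and applying those offsets gives the predicted box. What training builds is a functional relationship between the original image and the boxes regressed from the kept anchors, and that function is the model. The role of the GT boxes is to help establish this mapping between anchors and the image; put as a math problem, it is a mapping from the image matrix to a matrix of box coordinates.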
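The thresholding and the "does the anchor move?" question can be illustrated with a small sketch. It assumes the centre/size offset parameterization popularized by the R-CNN family, and the thresholds 0.7 and 0.3 follow the values commonly quoted for Faster R-CNN's RPN; the helper names `label_anchor`, `encode`, and `decode` are made up for this example.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Offsets = Tuple[float, float, float, float]  # (tx, ty, tw, th)


def label_anchor(iou_value: float, pos_thr: float = 0.7, neg_thr: float = 0.3) -> int:
    """1 = positive, 0 = background, -1 = ignored during training (assumed scheme)."""
    if iou_value >= pos_thr:
        return 1
    if iou_value < neg_thr:
        return 0
    return -1


def to_center(box: Box) -> Tuple[float, float, float, float]:
    """Convert corner format to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def encode(anchor: Box, gt: Box) -> Offsets:
    """Regression target: the offsets that would carry `anchor` onto `gt`."""
    ax, ay, aw, ah = to_center(anchor)
    gx, gy, gw, gh = to_center(gt)
    return (gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah)


def decode(anchor: Box, offsets: Offsets) -> Box:
    """Apply predicted offsets to a fixed anchor to obtain the predicted box."""
    ax, ay, aw, ah = to_center(anchor)
    tx, ty, tw, th = offsets
    cx, cy = ax + tx * aw, ay + ty * ah
    w, h = aw * math.exp(tw), ah * math.exp(th)
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


anchor = (0.0, 0.0, 10.0, 10.0)
gt = (2.0, 2.0, 14.0, 12.0)
target = encode(anchor, gt)       # what the regression branch is trained to output
restored = decode(anchor, target) # approximately equal to gt
```

In this formulation the loss compares the network's predicted offsets with `target`, so the anchor is never edited; a perfect prediction decodes back to the GT box.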

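At test time we load the model, that is, we load the learned mapping, so feeding in an image yields predicted boxes as output. Mappings differ in quality, though, which is why some models output boxes with the wrong label or with a regressed box that is far off. Even when the model is trained well enough, it still produces many redundant boxes, and that is where NMS comes in. The NMS steps are as follows: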