-
Advantages:
-
A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes
-
YOLO is extremely fast: the base network runs at 45 frames per second
-
Since detection is framed as a regression problem, no complex pipeline is needed
-
-
YOLO sees the entire image during training and test time so it implicitly encodes contextual information about classes as well as their appearance
-
Fast R-CNN, a top detection method, mistakes background patches in an image for objects because it can’t see the larger context
-
-
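YOLO learns generalizable representations of objects, so it is less likely to break down when applied to new domains or unexpected inputs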
-
-
Steps:
-
Our system divides the input image into an S × S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object.
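A minimal sketch of the "responsible cell" rule above. The function name, pixel-coordinate convention, and the 448 × 448 image size in the example are assumptions made for illustration only.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the S x S grid cell that contains the
    object center (cx, cy), given in pixels."""
    # Clamp so a center lying exactly on the right/bottom edge stays in the grid.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


# Object centered at (100, 300) in a 448 x 448 image (assumed size).
print(responsible_cell(100, 300, 448, 448))  # -> (4, 1)
```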
-
Each grid cell predicts two bounding boxes, and each box has 5 parameters (x, y, w, h, confidence). The remaining 20 values are class scores, one per class: each is the conditional probability Pr(Class_i | Object) that the object belongs to class i
-
How to compute the confidence of each bounding box:
-
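This confidence encodes two things: how confident the model is that the predicted box contains an object, and how accurate the predicted box is. Formally, confidence = Pr(Object) × IOU(pred, truth)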
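As a rough sketch of the confidence target used during training: when the cell holds no object the target is 0, otherwise it equals the IoU between the predicted and ground-truth boxes. The corner-format boxes `(x_min, y_min, x_max, y_max)` and the function names are assumptions for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def confidence_target(pred_box, truth_box, object_present):
    """Pr(Object) * IoU(pred, truth): 0 if the cell has no object,
    otherwise the IoU itself (Pr(Object) = 1)."""
    if not object_present:
        return 0.0
    return iou(pred_box, truth_box)
```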
-
-
-
Next, multiply each class score by the bounding box confidence to get class-specific confidence scores for each bounding box
-
Overall, the network output is a 7 × 7 × 30 tensor, i.e. S × S × (B × 5 + C) with S = 7, B = 2, C = 20.
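The multiplication step can be sketched for a single grid cell as follows. The memory layout (two boxes as `x, y, w, h, confidence`, followed by 20 class probabilities) is an assumption for illustration; real implementations may order the 30 values differently.

```python
def class_specific_scores(cell, B=2, C=20):
    """cell: the B*5 + C values predicted for one grid cell.
    Assumed layout: B boxes as (x, y, w, h, confidence), then C class probs.
    Returns B lists of C scores: confidence * Pr(Class_i | Object)."""
    assert len(cell) == B * 5 + C
    class_probs = cell[B * 5:]
    scores = []
    for b in range(B):
        conf = cell[b * 5 + 4]  # 5th value of each box is its confidence
        scores.append([conf * p for p in class_probs])
    return scores


# Hypothetical cell: box 1 is confident (0.8), box 2 is not (0.1),
# and class index 11 has conditional probability 0.9.
cell = [0.5, 0.5, 0.2, 0.3, 0.8,
        0.4, 0.6, 0.5, 0.5, 0.1] + [0.0] * 20
cell[10 + 11] = 0.9
scores = class_specific_scores(cell)
print(len(scores), len(scores[0]))  # -> 2 20
```

Across the whole 7 × 7 grid this yields 7 × 7 × 2 = 98 boxes, each with 20 class-specific scores.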
-
-
-
Loss design:
-
YOLO treats object detection as a regression problem and uses a squared-error loss, but applies different weights to different parts of the loss
-
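Of the 30 output values per cell, 8 relate to box position and 20 relate to class, so they should not all carry the same importance. Localization error therefore gets a larger weight (λ_coord = 5), the confidence error of boxes that contain no object gets a smaller weight (λ_noobj = 0.5), and all other weights are set to 1; squared error is then applied to each term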
-
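In practice, however, a small box is more sensitive to coordinate errors than a large box, so the network predicts the square roots of the box width and height rather than the width and height directly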
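A small numeric illustration of why the square root helps. The widths are hypothetical values normalized to [0, 1], and the same absolute error (0.05) is applied to a small and a large box.

```python
import math


def wh_error(pred, truth, use_sqrt):
    """Squared error on a width (or height), optionally on square roots."""
    if use_sqrt:
        return (math.sqrt(pred) - math.sqrt(truth)) ** 2
    return (pred - truth) ** 2


small = (0.10, 0.05)  # (predicted, true) width of a small box
large = (0.90, 0.85)  # (predicted, true) width of a large box

# Without the square root both boxes are penalized (almost exactly) equally.
raw_small = wh_error(*small, use_sqrt=False)
raw_large = wh_error(*large, use_sqrt=False)

# With the square root the small box is penalized noticeably more.
sqrt_small = wh_error(*small, use_sqrt=True)
sqrt_large = wh_error(*large, use_sqrt=True)
print(sqrt_small > sqrt_large)  # -> True
```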
-
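In the loss function, the first term is the error of the bounding box center coordinates, where $\mathbb{1}_{ij}^{\text{obj}}$ indicates that cell $i$ contains an object and that the $j$-th bounding box in that cell is responsible for predicting it. The second term is the error of the box height and width. The third term is the confidence error for boxes that contain an object. The fourth term is the confidence error for boxes that do not contain an object. The last term is the classification error for cells that contain an object, where $\mathbb{1}_{i}^{\text{obj}}$ indicates that cell $i$ contains an object.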
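For reference, the five terms described above correspond to the loss as written in the YOLO paper (reproduced here from the paper, with $\lambda_{\text{coord}} = 5$ and $\lambda_{\text{noobj}} = 0.5$; hatted symbols are the paired values for the same cell):

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
         + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
    \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```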
-
-
Network prediction:
-
Non-maximum suppression (NMS)
-
First, among all detection boxes, find the one with the highest confidence;
-
then compute its IoU with each of the remaining boxes, and discard any remaining box whose IoU exceeds a threshold (the overlap is too high);
-
then repeat this process on the boxes that are left until every detection box has been processed.
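The three NMS steps above can be sketched in plain Python. The corner-format boxes and the 0.5 IoU threshold are assumptions for this example, not values taken from these notes.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept, highest score first."""
    # Candidate indices sorted by descending confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # step 1: highest-confidence remaining box
        keep.append(best)
        # step 2: drop remaining boxes that overlap it too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
        # step 3: loop repeats on whatever is left
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]
```

In a multi-class detector this procedure is typically run separately for each class.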
-
-
-
Disadvantages:
-
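Each cell predicts multiple bounding boxes but only one set of class probabilities. During training, if a cell does contain an object, the box with the highest IoU with the ground truth is made responsible for predicting that object. But if a cell actually contains several objects, YOLO can only pick one of them to train on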
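To make the limitation concrete, the sketch below finds grid cells that contain the centers of more than one ground-truth object, since only one object in such a cell can be used for training. The function name, pixel-coordinate centers, and the 448 × 448 image size are hypothetical.

```python
from collections import defaultdict


def cells_with_multiple_objects(centers, img_w, img_h, S=7):
    """Map each (row, col) cell to the indices of the objects whose centers
    fall in it, keeping only cells that hold more than one object."""
    by_cell = defaultdict(list)
    for idx, (cx, cy) in enumerate(centers):
        col = min(int(cx / img_w * S), S - 1)
        row = min(int(cy / img_h * S), S - 1)
        by_cell[(row, col)].append(idx)
    return {cell: objs for cell, objs in by_cell.items() if len(objs) > 1}


# Two nearby objects share a cell; the third sits elsewhere.
centers = [(100, 300), (110, 310), (400, 50)]
print(cells_with_multiple_objects(centers, 448, 448))  # -> {(4, 1): [0, 1]}
```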
-
-