I recently revisited the YOLO series papers and the YOLOv3 source code, and noticed a few things I had missed before, so here is a short summary of this recent pass.
- The PyTorch source for v3 that I read is this version:
ayooshkathuria/YOLO_v3_tutorial_from_scratch
- And this is an accompanying explanation of that source:
How to implement a YOLO (v3) object detector from scratch in PyTorch
Compared with the v1 and v2 papers, the v3 paper is honestly a bit dizzying if you don't also read the source.
For the code, I roughly break it into several parts to understand: parsing the cfg file, building the model, designing the inputs and outputs, the loss function, and NMS. Note that this version does not include the loss part.
- One detail worth noting: if a convolutional layer is followed by a BN layer, the convolutional layer can omit its bias.
Without the bias, the BN layer computes:

$$\frac{x_i-\bar{x}}{\sqrt[2]{D(x)}}$$
With the bias, the BN layer computes:

$$\frac{x_i+b-(\bar{x}+b)}{\sqrt[2]{D(x)}}$$

The bias $b$ cancels in the numerator, so it has no effect on the output and only adds useless parameters.
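The cancellation can also be checked numerically: a conv bias shifts every output of a channel by the same constant, and BN's mean subtraction removes exactly that constant. A small sketch with a toy input (the shapes and seed are arbitrary, chosen just for the demo):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3, 8, 8)  # toy input batch

# Two convs with identical weights; only the bias differs
conv_nobias = nn.Conv2d(3, 6, 3, bias=False)
conv_bias = nn.Conv2d(3, 6, 3, bias=True)
conv_bias.weight.data.copy_(conv_nobias.weight.data)

# affine=False gives pure normalization: (x - mean) / sqrt(var + eps)
bn = nn.BatchNorm2d(6, affine=False)
bn.train()

out_nobias = bn(conv_nobias(x))
out_bias = bn(conv_bias(x))

# The per-channel bias is removed by the mean subtraction,
# so the two outputs match up to float error
print(torch.allclose(out_nobias, out_bias, atol=1e-5))  # True
```

This is why the cfg parser in the repo creates `nn.Conv2d(..., bias=False)` whenever the block has `batch_normalize=1`.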
- The rest is fairly straightforward, so this time I'll just record the part related to NMS. The details of the NMS algorithm itself I have written about before.
```python
def write_results(prediction, confidence, num_classes, nms_conf=0.4):
    # Zero out all predictions whose objectness score is below the
    # confidence threshold (only scores > confidence survive)
    conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2)
    prediction = prediction * conf_mask

    # Convert (center_x, center_y, w, h) to corner coordinates
    # (top-left x1, y1 and bottom-right x2, y2)
    box_corner = prediction.new(prediction.shape)
    box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
    box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
    box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
    box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
    prediction[:, :, :4] = box_corner[:, :, :4]

    batch_size = prediction.size(0)
    write = False

    # Process the images in the batch one by one
    for ind in range(batch_size):
        image_pred = prediction[ind]  # image Tensor
        # torch.max returns two values: the maxima and their indices
        max_conf, max_conf_score = torch.max(image_pred[:, 5:5 + num_classes], 1)
        max_conf = max_conf.float().unsqueeze(1)
        max_conf_score = max_conf_score.float().unsqueeze(1)
        seq = (image_pred[:, :5], max_conf, max_conf_score)
        image_pred = torch.cat(seq, 1)

        # Drop the rows that were zeroed out above
        non_zero_ind = torch.nonzero(image_pred[:, 4])
        try:
            image_pred_ = image_pred[non_zero_ind.squeeze(), :].view(-1, 7)
        except:
            continue
        if image_pred_.shape[0] == 0:
            continue

        # Get the various classes detected in the image,
        # then run NMS separately for each class
        img_classes = unique(image_pred_[:, -1])  # -1 index holds the class index

        for cls in img_classes:
            # Get the detections of this particular class
            cls_mask = image_pred_ * (image_pred_[:, -1] == cls).float().unsqueeze(1)
            class_mask_ind = torch.nonzero(cls_mask[:, -2]).squeeze()
            image_pred_class = image_pred_[class_mask_ind].view(-1, 7)

            # Sort the detections so that the entry with the maximum objectness
            # confidence is at the top, then loop starting from it
            conf_sort_index = torch.sort(image_pred_class[:, 4], descending=True)[1]
            image_pred_class = image_pred_class[conf_sort_index]
            idx = image_pred_class.size(0)  # number of detections

            for i in range(idx):
                # Get the IoUs of all boxes that come after the one
                # we are looking at in the loop
                try:
                    ious = bbox_iou(image_pred_class[i].unsqueeze(0),
                                    image_pred_class[i + 1:])
                except ValueError:
                    break
                except IndexError:
                    break

                # Zero out all detections with IoU > threshold, then remove them
                iou_mask = (ious < nms_conf).float().unsqueeze(1)
                image_pred_class[i + 1:] *= iou_mask

                # Remove the non-zero entries
                non_zero_ind = torch.nonzero(image_pred_class[:, 4]).squeeze()
                image_pred_class = image_pred_class[non_zero_ind].view(-1, 7)

            # Repeat the batch index for every surviving detection of class cls
            batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)
            seq = batch_ind, image_pred_class

            # Finally, concatenate all outputs together
            if not write:
                output = torch.cat(seq, 1)
                write = True
            else:
                out = torch.cat(seq, 1)
                output = torch.cat((output, out))

    # output only exists if at least one detection survived
    try:
        return output
    except:
        return 0
```
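The function above depends on the helpers `bbox_iou` and `unique`, which are defined elsewhere in the tutorial repo. To see the suppression loop in isolation, here is a self-contained sketch of greedy single-class NMS with a minimal IoU helper; these are my own simplified versions for illustration, not the tutorial's exact code:

```python
import torch

def bbox_iou(box1, box2):
    # IoU between one box (1, 4) and N boxes (N, 4),
    # in corner format (x1, y1, x2, y2)
    x1 = torch.max(box1[:, 0], box2[:, 0])
    y1 = torch.max(box1[:, 1], box2[:, 1])
    x2 = torch.min(box1[:, 2], box2[:, 2])
    y2 = torch.min(box1[:, 3], box2[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area1 = (box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1])
    area2 = (box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1])
    return inter / (area1 + area2 - inter)

def nms_single_class(boxes, scores, nms_conf=0.4):
    # Greedy NMS: keep the highest-scoring box, drop boxes whose
    # IoU with it exceeds nms_conf, repeat on the remainder
    order = torch.sort(scores, descending=True)[1]
    boxes, scores = boxes[order], scores[order]
    keep = []
    while boxes.size(0) > 0:
        keep.append((boxes[0], scores[0]))
        if boxes.size(0) == 1:
            break
        mask = bbox_iou(boxes[0:1], boxes[1:]) < nms_conf
        boxes, scores = boxes[1:][mask], scores[1:][mask]
    return keep

# Two heavily overlapping boxes plus one far away:
# the lower-scoring overlap is suppressed, so two boxes survive
boxes = torch.tensor([[0., 0., 10., 10.],
                      [1., 1., 11., 11.],
                      [50., 50., 60., 60.]])
scores = torch.tensor([0.9, 0.8, 0.7])
kept = nms_single_class(boxes, scores)
print(len(kept))  # 2
```

The `write_results` version does the same thing in place by multiplying suppressed rows to zero and filtering with `torch.nonzero`, rather than slicing with a boolean mask.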