PyTorch Deep Learning Framework YOLOv3 Object Detection Learning Notes (Part 4): Confidence Thresholding and Non-Maximum Suppression

Preparation

We create a function called write_results in util.py to obtain the true detections.

def write_results(prediction, confidence, num_classes, nms_conf = 0.4):

The function takes as input prediction, confidence (the objectness score threshold), num_classes (80 in our case), and nms_conf (the NMS IoU threshold).

Object Confidence Thresholding

The prediction tensor contains information about B × 10647 bounding boxes. For every bounding box whose objectness score is below the threshold, we set the values of all of its attributes (the entire row representing the box) to zero.

    conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
    prediction = prediction*conf_mask
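
The comparison prediction[:,:,4] > confidence yields a mask of shape (B, 10647); unsqueeze(2) turns it into (B, 10647, 1) so the multiplication broadcasts across all 85 attributes of each box. A quick shape check, with a random tensor standing in for the real detector output:

    import torch

    prediction = torch.rand(1, 10647, 85)   # dummy stand-in for the detector output
    confidence = 0.5

    conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2)
    print(conf_mask.shape)                  # torch.Size([1, 10647, 1])
    print((prediction * conf_mask).shape)   # torch.Size([1, 10647, 85])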

Performing Non-Maximum Suppression

The bounding box attributes we have are the center coordinates together with the height and width. However, it is easier to compute the IoU (Intersection over Union) of two boxes using the coordinates of a pair of diagonal corners of each box, so we convert our boxes to (top-left x, top-left y, bottom-right x, bottom-right y) form.

    box_corner = prediction.new(prediction.shape)
    box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
    box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
    box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2) 
    box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
    prediction[:,:,:4] = box_corner[:,:,:4]
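
For example (illustrative values only), a box with center (100, 100), width 60, and height 40 becomes the corner pair (70, 80) and (130, 120):

    import torch

    # one dummy box row: (center_x, center_y, width, height, objectness)
    pred = torch.tensor([[[100., 100., 60., 40., 0.9]]])

    box_corner = pred.new(pred.shape)
    box_corner[:, :, 0] = pred[:, :, 0] - pred[:, :, 2] / 2   # top-left x
    box_corner[:, :, 1] = pred[:, :, 1] - pred[:, :, 3] / 2   # top-left y
    box_corner[:, :, 2] = pred[:, :, 0] + pred[:, :, 2] / 2   # bottom-right x
    box_corner[:, :, 3] = pred[:, :, 1] + pred[:, :, 3] / 2   # bottom-right y
    print(box_corner[0, 0, :4])   # tensor([ 70.,  80., 130., 120.])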

The number of true detections may differ from image to image. For example, in a batch of size 3, images 1, 2, and 3 might have 5, 2, and 4 true detections respectively. Therefore, confidence thresholding and NMS have to be done for one image at a time. This means we cannot vectorize the operations involved, and must loop over the first dimension of prediction (which indexes the images in the batch).

    batch_size = prediction.size(0)

    write = False

    for ind in range(batch_size):
        image_pred = prediction[ind]          #image Tensor
        #confidence thresholding 
        #NMS

As mentioned before, the write flag is used to indicate that we have not yet initialized output, the tensor we will use to collect true detections across the entire batch.
Once inside the loop: each bounding box row has 85 attributes, but we only care about the class score with the maximum value. So we remove the 80 class scores from each row and instead append the maximum class score and the index of that class.

        max_conf, max_conf_score = torch.max(image_pred[:,5:5+ num_classes], 1)
        max_conf = max_conf.float().unsqueeze(1)
        max_conf_score = max_conf_score.float().unsqueeze(1)
        seq = (image_pred[:,:5], max_conf, max_conf_score)
        image_pred = torch.cat(seq, 1)
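
torch.max along dimension 1 returns both the maximum values and their indices, which is what supplies the class score (max_conf) and the class index (max_conf_score) here. A small standalone illustration with toy values:

    import torch

    scores = torch.tensor([[0.1, 0.7, 0.2],
                           [0.4, 0.3, 0.9]])
    values, indices = torch.max(scores, 1)
    print(values)    # tensor([0.7000, 0.9000]) -> highest class score in each row
    print(indices)   # tensor([1, 2])           -> index of that class in each row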

Earlier we zeroed out the bounding box rows whose objectness score was below the threshold; now we get rid of those rows entirely.


        non_zero_ind =  (torch.nonzero(image_pred[:,4]))
        try:
            image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
        except:
            continue
        
        #For PyTorch 0.4 compatibility
        #Since the above code will not raise an exception for no detection 
        #as scalars are supported in PyTorch 0.4
        if image_pred_.shape[0] == 0:
            continue 

The continue statement in the loop over images skips any image that has no detections.
Next, get the classes detected in the image:

        #Get the various classes detected in the image
        img_classes = unique(image_pred_[:,-1]) # -1 index holds the class index

Since there can be multiple true detections of the same class, we use a function called unique to get the classes present in a given image.

def unique(tensor):
    #Take the unique values via NumPy, then copy them back into a tensor of the
    #same type and device as the input
    tensor_np = tensor.cpu().numpy()
    unique_np = np.unique(tensor_np)
    unique_tensor = torch.from_numpy(unique_np)
    
    tensor_res = tensor.new(unique_tensor.shape)
    tensor_res.copy_(unique_tensor)
    return tensor_res
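
For instance, calling it on a tensor of repeated class indices collapses them to one entry per class (this snippet assumes import torch and import numpy as np at the top of util.py, which the function requires anyway):

    classes = torch.tensor([0., 16., 16., 0., 2.])
    print(unique(classes))   # tensor([ 0.,  2., 16.])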

Then we perform NMS class by class.

        for cls in img_classes:
            #perform NMS

Now we carry out the NMS for this class.
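
The loop below uses image_pred_class (the detections of the current class, sorted so that the box with the highest objectness confidence comes first) and idx (the number of such detections). A minimal sketch of how these could be set up inside the for cls in img_classes: loop, reconstructed from how the variables are used below, so treat the exact lines as an assumption rather than the original code:

#get the detections belonging to the class cls
cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
image_pred_class = image_pred_[class_mask_ind].view(-1,7)

#sort so that the detection with the highest objectness confidence comes first
conf_sort_index = torch.sort(image_pred_class[:,4], descending=True)[1]
image_pred_class = image_pred_class[conf_sort_index]
idx = image_pred_class.size(0)   #number of detections of this class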

for i in range(idx):
    #Get the IOUs of all boxes that come after the one we are looking at 
    #in the loop
    try:
        ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
    except ValueError:
        break

    except IndexError:
        break

    #Zero out all the detections that have IoU > treshhold
    iou_mask = (ious < nms_conf).float().unsqueeze(1)
    image_pred_class[i+1:] *= iou_mask       

    #Keep only the non-zero entries (drop the boxes that were just suppressed)
    non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
    image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

Here we use a function bbox_iou. The first input is the bounding box row indexed by i in the loop.
The second input is a tensor of multiple rows of bounding boxes. The output of bbox_iou is a tensor containing the IoU of the bounding box in the first input with each of the bounding boxes in the second input.
If two bounding boxes of the same class have an IoU larger than the threshold, the one with the lower class confidence is eliminated; since the boxes are already sorted, the one with the higher confidence is the one we keep.
In the body of the loop, the following line gives the IoU of the box indexed by i with all of the boxes after it.

ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])

In every iteration, if any bounding box after the one indexed by i has an IoU with it larger than the threshold nms_conf, that box is eliminated.


iou_mask = (ious < nms_conf).float().unsqueeze(1)
image_pred_class[i+1:] *= iou_mask       

#Keep only the non-zero entries (drop the boxes that were just suppressed)
non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
image_pred_class = image_pred_class[non_zero_ind]    
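
To make the masking trick concrete, here is a toy example with made-up IoU values (not part of the function itself):

    import torch

    # IoUs of the three lower-ranked detections with the current box
    ious = torch.tensor([0.6, 0.1, 0.5])
    nms_conf = 0.4

    # boxes whose IoU exceeds the threshold get a mask of 0, the others 1
    iou_mask = (ious < nms_conf).float().unsqueeze(1)
    print(iou_mask)   # tensor([[0.], [1.], [0.]])

    # multiplying the detection rows by this mask zeroes out the suppressed boxes,
    # so their objectness score (column 4) becomes 0 and torch.nonzero drops them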

Computing the IoU

Here is the function bbox_iou:

def bbox_iou(box1, box2):
    """
    Returns the IoU of two bounding boxes 
    
    
    """
    #Get the coordinates of bounding boxes
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[:,0], box2[:,1], box2[:,2], box2[:,3]
    
    #get the coordinates of the intersection rectangle
    inter_rect_x1 =  torch.max(b1_x1, b2_x1)
    inter_rect_y1 =  torch.max(b1_y1, b2_y1)
    inter_rect_x2 =  torch.min(b1_x2, b2_x2)
    inter_rect_y2 =  torch.min(b1_y2, b2_y2)
    
    #Intersection area
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(inter_rect_y2 - inter_rect_y1 + 1, min=0)
 
    #Union Area
    b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1)
    
    iou = inter_area / (b1_area + b2_area - inter_area)
    
    return iou
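
A quick sanity check with two 10 × 10 boxes that overlap in a 5 × 5 region (note that the function counts pixels inclusively, hence the +1 terms); run this after bbox_iou is defined:

    import torch

    box1 = torch.tensor([[0., 0., 9., 9.]])     # area 10*10 = 100
    box2 = torch.tensor([[5., 5., 14., 14.]])   # area 10*10 = 100, overlap 5*5 = 25
    print(bbox_iou(box1, box2))                 # tensor([0.1429]) = 25 / (100 + 100 - 25)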

Writing the Predictions

The function write_results outputs a tensor of size D × 8. Here D is the number of true detections across all of the images, each represented by a row. Each detection has 8 attributes: the index of the image in the batch it belongs to, the 4 corner coordinates, the objectness score, the score of the class with maximum confidence, and the index of that class.
Just as before, we do not initialize our output tensor unless we have a detection to assign to it. Once it has been initialized, we concatenate subsequent detections to it. We use the write flag to indicate whether or not the tensor has been initialized, and at the end of the loop over classes we add the resulting detections to output.

            batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)      
            #Repeat the batch_id for as many detections of the class cls in the image
            seq = batch_ind, image_pred_class

            if not write:
                output = torch.cat(seq,1)
                write = True
            else:
                out = torch.cat(seq,1)
                output = torch.cat((output,out))

At the end of the function, we check whether output has been initialized at all. If it has not, there was not a single detection in any image of the batch, and we return 0.

    try:
        return output
    except:
        return 0
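
A sketch of how write_results might be called once the detector output is available (prediction here stands for the (B, 10647, 85) tensor produced in the previous part, and the threshold values are just examples):

    output = write_results(prediction, confidence=0.5, num_classes=80, nms_conf=0.4)

    if type(output) == int:      # write_results returns 0 when nothing was detected
        print("no detections in this batch")
    else:
        print(output.shape)      # (D, 8): batch index, 4 corners, objectness,
                                 # class score, class index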