PyTorch Implementation of YOLOv3: A Detailed Code Walkthrough (Part 3)

小小小绿叶

Article link: https://blog.csdn.net/litt1e/article/details/89435660

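In the previous parts we completed the overall YOLOv3 network, that is, a model that outputs multiple detections for a given input image. Specifically, the output is a tensor of shape B x 10647 x 85, where B is the number of images in a batch, 10647 is the number of bounding boxes predicted per image, and 85 is the number of attributes per bounding box.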
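To see where 10647 and 85 come from, here is a small arithmetic sketch. It assumes a 416 x 416 input, detection at strides 32, 16 and 8, and 3 anchor boxes per grid cell. These are the usual YOLOv3 settings, but they are assumptions here because this part of the series does not restate them. The function names are made up for illustration.

```python
# Assumed configuration (not restated in this part of the series).
INPUT_SIZE = 416          # input images are resized to 416 x 416
STRIDES = (32, 16, 8)     # the three detection scales
ANCHORS_PER_CELL = 3      # anchor boxes predicted per grid cell
NUM_CLASSES = 80          # number of classes, as stated in the article


def boxes_per_image(input_size=INPUT_SIZE, strides=STRIDES, anchors=ANCHORS_PER_CELL):
    """Total number of boxes predicted for one image over all scales."""
    total = 0
    for stride in strides:
        grid = input_size // stride        # 13, 26 and 52 cells per side
        total += grid * grid * anchors     # 507, 2028 and 8112 boxes
    return total


def attrs_per_box(num_classes=NUM_CLASSES):
    """4 box coordinates + 1 objectness score + one score per class."""
    return 4 + 1 + num_classes


print(boxes_per_image())  # 10647
print(attrs_per_box())    # 85
```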

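With the network in place, the next step is to see how YOLOv3 turns this raw output into final detections. We first describe the inference procedure as a whole and then go through it in the code.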
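We get a tensor of shape B x 10647 x 85. Taking B = 1 gives 10647 x 85, that is, 10647 bounding boxes. First, we set a threshold and zero out every box whose confidence is below it. Then we sort the remaining boxes in descending order by the fifth attribute (confidence). YOLOv3 has 80 classes, so after sorting we run non-maximum suppression (NMS) for each class, using the class scores, to remove redundant boxes.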

How exactly is NMS done?
[Figure: NMS example for the dog class]
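We take the tensor processed above and sort the boxes of each class by score in descending order. For example, for the dog class in the figure above, we call the box with the highest score bbox_max and compare it with every lower-scoring box (call it bbox_cur). If their IoU is greater than 0.5, we set the score of bbox_cur to zero. The next surviving box then takes the role of bbox_max, and this repeats until all boxes have been processed. That is the rough procedure.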
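The procedure can be sketched in plain Python, without tensors. This is a minimal illustration and not the write_results code discussed in this article: the helper names iou and greedy_nms are hypothetical, boxes are assumed to be (x1, y1, x2, y2) corners, and the IoU here uses plain continuous areas.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corners."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Return the indices of the boxes kept for a single class."""
    # Highest score first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # plays the role of bbox_max
        keep.append(best)
        # Drop every remaining box (bbox_cur) that overlaps bbox_max too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)  # [0, 2]: box 1 overlaps box 0 with IoU of about 0.68 and is suppressed
```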
Now let's see how this looks in the code.

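Confidence Thresholding and Non-Maximum Suppression

Our output must go through objectness score thresholding and non-maximum suppression (NMS) to obtain what the rest of this article calls the "true" detections. This is done by a function named write_results in util.py.

def write_results(prediction, confidence, num_classes, nms = True, nms_conf = 0.4):

The function takes the predictions, confidence (the objectness score threshold), num_classes (80 in our case), nms (whether to run NMS at all) and nms_conf (the NMS IoU threshold).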

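Object Confidence Thresholding

    conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
    # unsqueeze(2) adds a size-1 dimension at index 2: [B, 10647] -> [B, 10647, 1]
    prediction = prediction*conf_mask
    # zero out every row whose objectness score is not above confidence

Our prediction tensor holds information about B x 10647 bounding boxes. For each bounding box whose objectness score is below the threshold, we set every attribute (the entire row representing that box) to zero.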
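Here is a tiny, self-contained check of that masking step. The tensor is made-up toy data with B = 1, two boxes and only two class scores instead of 80, so the shapes are easy to read.

```python
import torch

# Toy prediction: 1 image, 2 boxes, 4 coordinates + objectness + 2 class scores.
prediction = torch.tensor([[[50., 50., 20., 20., 0.9, 0.1, 0.8],
                            [60., 60., 10., 10., 0.2, 0.7, 0.3]]])
confidence = 0.5

# [1, 2] boolean mask -> float -> [1, 2, 1], so it broadcasts over all attributes.
conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2)
masked = prediction * conf_mask

print(conf_mask.shape)  # torch.Size([1, 2, 1])
print(masked[0, 1])     # the second box (objectness 0.2) is now a row of zeros
```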

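Performing Non-Maximum Suppression

    box_a = prediction.new(prediction.shape)
    box_a[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
    box_a[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
    box_a[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2)
    box_a[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
    prediction[:,:,:4] = box_a[:,:,:4]
    # copy channels 0-3 of box_a into prediction, i.e. (x1, y1, x2, y2): top-left and bottom-right corners

prediction[:,:,0] is the center x coordinate and prediction[:,:,2] is the width (likewise, index 1 is the center y and index 3 is the height). The top-left corner is therefore (x - w/2, y - h/2) and the bottom-right corner is (x + w/2, y + h/2). We convert to corner coordinates because they make it easier to compute the IoU.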
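For a single box the conversion is just four subtractions and additions. The helper below is a hypothetical plain-Python version of the same arithmetic, included only to make the numbers concrete.

```python
def center_to_corners(cx, cy, w, h):
    """Convert (center x, center y, width, height) to (x1, y1, x2, y2)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A 20 x 10 box centered at (50, 40).
corners = center_to_corners(50, 40, 20, 10)
print(corners)  # (40.0, 35.0, 60.0, 45.0)
```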

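    batch_size = prediction.size(0)
    write = False    # becomes True once output has been initialised

    for ind in range(batch_size):
        # select the image from the batch
        image_pred = prediction[ind]
        # prediction: [batch_size, 10647, 85], image_pred: [10647, 85]

We use the index ind to process the detections of one image at a time.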

        # Get the class having maximum score, and the index of that class
        # Get rid of the num_classes class scores
        # Add the class index and the class score of the class having maximum score
        max_conf, max_conf_score = torch.max(image_pred[:,5:5+ num_classes], 1)
        # despite the names, max_conf is the highest class score and max_conf_score is the index of that class
        max_conf = max_conf.float().unsqueeze(1)  # reshape to [10647, 1]
        max_conf_score = max_conf_score.float().unsqueeze(1)
        seq = (image_pred[:,:5], max_conf, max_conf_score)
        # seq is a tuple of three tensors: [10647, 5], [10647, 1], [10647, 1]
        image_pred = torch.cat(seq, 1)
        # image_pred is now [10647, 7]: {4 coordinates, objectness, max class score, class index}

We only care about the class with the highest score. So we remove the 80 class scores from each row and append, in their place, the score of the highest-scoring class and the index of that class. As a result, image_pred has shape [10647, 7]: the four coordinates, the objectness score, the maximum class score and the index of that class.
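The same reduction on a toy tensor may help. The data below is invented, with 3 classes instead of 80, and the variable names max_score and max_index are chosen here for readability rather than taken from util.py.

```python
import torch

num_classes = 3
# Two boxes: 4 coordinates, objectness, then 3 class scores.
image_pred = torch.tensor([[10., 10., 50., 50., 0.9, 0.1, 0.7, 0.2],
                           [20., 20., 60., 60., 0.8, 0.6, 0.3, 0.1]])

# torch.max along dim 1 returns (values, indices) per row.
max_score, max_index = torch.max(image_pred[:, 5:5 + num_classes], 1)

reduced = torch.cat((image_pred[:, :5],
                     max_score.float().unsqueeze(1),
                     max_index.float().unsqueeze(1)), 1)

print(reduced.shape)   # torch.Size([2, 7])
print(reduced[:, 5:])  # best class score and its index: (0.7, 1) and (0.6, 0)
```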

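        # Get rid of the zero entries
        non_zero_ind =  (torch.nonzero(image_pred[:,4]))
        # indices of the rows that passed the confidence threshold

        image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
        # image_pred_ keeps only the rows above the threshold, with the 80 class scores replaced by the max score and its index

We use torch.nonzero to get the indices of the rows that passed the confidence threshold (their objectness is non-zero). image_pred_ is what remains after the rows below the threshold have been removed.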

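        # Get the various classes detected in the image
        try:
            img_classes = unique(image_pred_[:,-1])  # the last column of image_pred_ is the class index
        except:
            continue

Now let's get the classes detected in the image. image_pred_[:,-1] holds the index of the highest-scoring class for each box, and unique removes the duplicate indices.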
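The two steps, dropping the zeroed rows and collecting the distinct classes, can be tried on invented data as follows. The definition of the unique helper is not shown in this article, so the sketch uses torch.unique as a stand-in; treat that substitution as an assumption rather than as what util.py does.

```python
import torch

# Rows: 4 coordinates, objectness, max class score, class index.
# The second row was zeroed by the confidence threshold.
image_pred = torch.tensor([[1., 1., 5., 5., 0.9, 0.8, 16.],
                           [0., 0., 0., 0., 0.0, 0.0, 0.],
                           [2., 2., 6., 6., 0.7, 0.6, 1.],
                           [3., 3., 7., 7., 0.6, 0.9, 16.]])

# Indices of rows whose objectness is non-zero, shape [3, 1].
non_zero_ind = torch.nonzero(image_pred[:, 4])
image_pred_ = image_pred[non_zero_ind.squeeze(), :].view(-1, 7)

# Stand-in for the unique helper: distinct class indices, sorted.
img_classes = torch.unique(image_pred_[:, -1])

print(image_pred_.shape)  # torch.Size([3, 7])
print(img_classes)        # tensor([ 1., 16.])
```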

        for cls in img_classes:

Then we perform NMS class by class.

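            # get the detections with one particular class
            cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
            # zero out the rows of image_pred_ that do not belong to cls
            class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
            # indices of the rows that are still non-zero, i.e. the rows of class cls

            image_pred_class = image_pred_[class_mask_ind].view(-1,7)
            # the rows of image_pred_ whose best class is cls (one class, e.g. dog, can have several boxes)
            # so image_pred_class may have several rows; NMS then removes the redundant ones

            # sort the detections such that the entry with the maximum objectness
            # confidence is at the top
            conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
            image_pred_class = image_pred_class[conf_sort_index]
            # the boxes are now in descending order of confidence
            idx = image_pred_class.size(0)

Inside the loop, we first discard the boxes that do not belong to the current class and then sort the remaining ones by confidence in descending order.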
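As a quick check of the selection and sorting logic, here is the same sequence of calls on three invented detections, two of class 16 and one of class 1.

```python
import torch

# Rows: 4 coordinates, objectness, max class score, class index.
image_pred_ = torch.tensor([[1., 1., 5., 5., 0.6, 0.8, 16.],
                            [2., 2., 6., 6., 0.7, 0.6, 1.],
                            [3., 3., 7., 7., 0.9, 0.9, 16.]])
cls = 16.0

# Zero out the rows of other classes, then find the surviving rows
# through their (non-zero) class score in column -2.
cls_mask = image_pred_ * (image_pred_[:, -1] == cls).float().unsqueeze(1)
class_mask_ind = torch.nonzero(cls_mask[:, -2]).squeeze()
image_pred_class = image_pred_[class_mask_ind].view(-1, 7)

# Sort by objectness (column 4), highest first.
conf_sort_index = torch.sort(image_pred_class[:, 4], descending=True)[1]
image_pred_class = image_pred_class[conf_sort_index]

print(image_pred_class[:, 4])  # objectness 0.9 first, then 0.6
```

Note that the selection relies on the class score in column -2 being non-zero for every box that passed the threshold.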

            # if nms has to be done
            if nms:
                # For each detection
                for i in range(idx):
                    # Get the IOUs of all boxes that come after the one we are looking at
                    # in the loop. image_pred_class shrinks as boxes are removed, so i can
                    # run past its end; in that case we stop.
                    try:
                        ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
                    except ValueError:
                        break

                    except IndexError:
                        break

                    # Zero out all the detections whose IoU is not below the threshold
                    iou_mask = (ious < nms_conf).float().unsqueeze(1)
                    image_pred_class[i+1:] *= iou_mask

                    # Keep only the non-zero entries (drop the suppressed boxes)
                    non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
                    image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

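Here we use the function bbox_iou. Its first input is a single bounding box row, indexed by the loop variable i. Its second input is a tensor made up of multiple bounding box rows. The output of bbox_iou is a tensor containing the IoU of the box given by the first input with each of the boxes in the second input.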
If two bounding boxes of the same class have an IoU above the threshold, we remove the one with the lower objectness confidence. The boxes have already been sorted so that those with higher confidence come first.

ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])

In each iteration, any box with an index greater than i whose IoU with box i is at least the threshold nms_conf is removed.

# Zero out all the detections whose IoU is not below the threshold
iou_mask = (ious < nms_conf).float().unsqueeze(1)
image_pred_class[i+1:] *= iou_mask

# Keep only the non-zero entries (drop the suppressed boxes)
non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

Computing the IoU

def bbox_iou(box1, box2):
    """
    Returns the IoU between box1 (a single box) and each box in box2.
    Boxes are given as (x1, y1, x2, y2) corner coordinates.
    """
    # Get the coordinates of bounding boxes
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[:,0], box2[:,1], box2[:,2], box2[:,3]

    # Get the coordinates of the intersection rectangle
    inter_rect_x1 =  torch.max(b1_x1, b2_x1)
    inter_rect_y1 =  torch.max(b1_y1, b2_y1)
    inter_rect_x2 =  torch.min(b1_x2, b2_x2)
    inter_rect_y2 =  torch.min(b1_y2, b2_y2)

    # Intersection area; width and height are clamped at 0 so that
    # boxes that do not overlap contribute no area
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(inter_rect_y2 - inter_rect_y1 + 1, min=0)

    # Union area
    b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1)

    iou = inter_area / (b1_area + b2_area - inter_area)

    return iou

That is the bbox_iou function. The intersection width and height are clamped at zero so that two boxes that do not overlap get an IoU of 0.
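A small worked example in plain Python may make the arithmetic concrete. It follows the same +1 pixel convention as bbox_iou but works on single boxes given as tuples, and the function names are hypothetical. It also illustrates why clamping the intersection width and height at zero matters: without a clamp, two boxes that do not overlap at all produce a positive "intersection", because two negative lengths multiply to a positive number.

```python
def box_area(b):
    """Area of an (x1, y1, x2, y2) box under the +1 pixel convention."""
    return (b[2] - b[0] + 1) * (b[3] - b[1] + 1)


def iou_pixels(b1, b2, clamp=True):
    """IoU of two boxes; clamp=False reproduces the unclamped arithmetic."""
    w = min(b1[2], b2[2]) - max(b1[0], b2[0]) + 1
    h = min(b1[3], b2[3]) - max(b1[1], b2[1]) + 1
    if clamp:
        w, h = max(w, 0), max(h, 0)
    inter = w * h
    return inter / (box_area(b1) + box_area(b2) - inter)


a = (0, 0, 9, 9)        # 10 x 10 box
b = (5, 5, 14, 14)      # overlaps a in a 5 x 5 region
c = (20, 20, 29, 29)    # does not touch a at all

print(iou_pixels(a, b))               # 25 / 175, about 0.143
print(iou_pixels(a, c))               # 0.0
print(iou_pixels(a, c, clamp=False))  # 1.0, which is clearly wrong
```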

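Writing the Predictions

write_results outputs a tensor of shape D x 8, where D is the number of "true" detections across all images, each represented by one row. Each detection has 8 attributes: the index of the image in the batch that the detection belongs to, the 4 corner coordinates, the objectness score, the score of the class with maximum confidence, and the index of that class.

            batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)
            # Repeat the batch_id for as many detections of the class cls in the image
            seq = batch_ind, image_pred_class

            if not write:
                output = torch.cat(seq,1)
                write = True
            else:
                out = torch.cat(seq,1)
                output = torch.cat((output,out))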
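To see how the D x 8 output grows, here is a small standalone sketch on invented detections. It is a variation rather than the code above: the hypothetical helper append_detections uses None instead of the write flag to mark "no output yet", and new_full instead of new(...).fill_(...) to build the batch index column.

```python
import torch


def append_detections(output, image_pred_class, ind):
    """Prepend the batch index ind to each row and append the rows to output."""
    batch_ind = image_pred_class.new_full((image_pred_class.size(0), 1), ind)
    out = torch.cat((batch_ind, image_pred_class), 1)   # [n, 7] -> [n, 8]
    return out if output is None else torch.cat((output, out))


# One surviving detection in image 0, two in image 1 (7 attributes each).
dets_img0 = torch.tensor([[1., 1., 5., 5., 0.9, 0.8, 16.]])
dets_img1 = torch.tensor([[2., 2., 6., 6., 0.7, 0.6, 1.],
                          [3., 3., 7., 7., 0.6, 0.9, 1.]])

output = append_detections(None, dets_img0, 0)
output = append_detections(output, dets_img1, 1)

print(output.shape)  # torch.Size([3, 8])
print(output[:, 0])  # batch indices: 0, 1, 1
```

The first column is what later lets us map each detection back to the image it came from.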