Faceboxes PyTorch Code Walkthrough (2): box_utils.py (Part 2)

Faded浩

Tags: python, algorithms, deep learning, computer vision, pytorch

Article link: https://blog.csdn.net/qq_36396844/article/details/106858569

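This continues the previous post's walkthrough of box_utils.py.
I have to say, for a beginner, reading other people's code is a real headache. Every unfamiliar function has to be looked up, one after another, until its usage makes sense. The process is a bit painful, but when the purpose of some function or class suddenly clicks, that moment of clarity feels great.
Ahem, a bit of encouragement. Whatever you are working on, don't give up easily once you have committed to it. Keep going even when you feel lost. One day the fog will lift, and you will see the blue sky and the brightest sun!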

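The match() function

Purpose: it does several things, listed below

1. Compute the IoU (Jaccard overlap) between every ground-truth box and every prior box (anchor) in one image.
2. For each ground-truth box, find the index of the prior with the largest IoU (best_prior_idx) and the IoU value itself.
3. For each prior, find the index of the ground-truth box with the largest IoU (best_truth_idx) and the IoU value itself.
4. For every prior that is the best match of some ground-truth box (with an IoU of at least 0.2), set its overlap to 2, so that it can never fall below the threshold.
5. Gather, for every anchor, the coordinates of its best ground-truth box into a tensor matches of shape [num_priors,4], which holds the ground-truth coordinates assigned to each prior.
6. Mark the anchors whose best IoU is below threshold as background.
7. Encode matches relative to the priors and the variances, which converts the boxes into the space of the model's location outputs.
8. Store the results in loc_t[idx] and conf_t[idx]; the function fills these tensors in place.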

def match(threshold, truths, priors, variances, labels, loc_t, conf_t, idx):
    """Match each prior box with the ground truth box of the highest jaccard
    overlap, encode the bounding boxes, then return the matched indices
    corresponding to both confidence and location preds.
    Args:
        threshold: (float) The overlap threshold used when matching boxes (IoU threshold).
        truths: (tensor) Ground truth boxes, Shape: [num_obj, 4].
        priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
        variances: (list[float]) Scaling factors used by encode() for the
            center offsets and the width/height terms, e.g. [0.1, 0.2].
        labels: (tensor) All the class labels for the image, Shape: [num_obj].
        loc_t: (tensor) Tensor to be filled w/ encoded location targets.
        conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
        idx: (int) current batch index
    Return:
        The matched indices corresponding to 1)location and 2)confidence preds.
    """
    # jaccard index
    overlaps = jaccard(            # IoU between every ground truth and every prior
        truths,                    # result shape (A, B): A = num_objects, B = num_priors
        point_form(priors)
    )
    # (Bipartite Matching)
    # [1,num_objects] best prior for each ground truth
    best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
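    # torch.max(input, dim, keepdim) returns two values: the maxima and their indices.
    # dim=1 reduces over the priors: for each ground truth, take its best-overlapping prior.
    # Both results have shape (A, 1).
    valid_gt_idx = best_prior_overlap[:, 0] >= 0.2  # bool mask: ground truths whose best IoU is at least 0.2
    best_prior_idx_filter = best_prior_idx[valid_gt_idx, :]  # keep the best-prior indices of valid ground truths only
    if best_prior_idx_filter.shape[0] <= 0:  # no valid ground truth: zero the loc and conf targets of this image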
        loc_t[idx] = 0
        conf_t[idx] = 0
        return

    # [1,num_priors] best ground truth for each prior
    best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)  # dim=0: best ground truth for each anchor, shape (1, B)
    best_truth_idx.squeeze_(0)  # drop dim 0 (size 1) in place, shape (B)
    best_truth_overlap.squeeze_(0)  # shape (B)
    best_prior_idx.squeeze_(1)  # drop dim 1 (size 1) in place, shape (A)
    best_prior_idx_filter.squeeze_(1)  # shape (number of valid ground truths)
    best_prior_overlap.squeeze_(1)  # shape (A)
    # The best prior of each valid ground truth must stay matched, so its overlap
    # is set to 2, which is above any possible threshold.
    # index_fill_(dim, index, val) writes val at the positions listed in index along dim.
    best_truth_overlap.index_fill_(0, best_prior_idx_filter, 2)  # ensure best prior
    # TODO refactor: index  best_prior_idx with long tensor
    # ensure every gt matches with its prior of max overlap
    # That is, the prior that is best for ground truth j must point back to j.
    # best_prior_idx tells which entries of best_truth_idx to overwrite.
    # The loop runs num_objects times.
    for j in range(best_prior_idx.size(0)):
        best_truth_idx[best_prior_idx[j]] = j
    # gather the matched ground truth box for every prior
    matches = truths[best_truth_idx]          # Shape: [num_priors,4]
    # gather the class label of the matched ground truth for every prior
    conf = labels[best_truth_idx]          # Shape: [num_priors]
    conf[best_truth_overlap < threshold] = 0  # label as background
    # encode the bounding boxes
    loc = encode(matches, priors, variances)
    # store the matched loc and conf in loc_t and conf_t
    loc_t[idx] = loc    # [num_priors,4] encoded offsets to learn
    conf_t[idx] = conf  # [num_priors] top class label for each prior
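
To see the two-way matching in isolation, here is a minimal sketch in plain Python (no PyTorch needed). It assumes a hand-written IoU matrix with 2 ground-truth boxes (rows) and 4 priors (columns). The function name `match_indices`, the default thresholds and the numbers are made up for illustration; the sketch mirrors the steps of match() above but is not the repository's code, and it skips the early return for the "no valid ground truth" case.

```python
def match_indices(overlaps, threshold=0.35, min_gt_iou=0.2):
    """Toy matching on an IoU matrix given as a list of rows:
    overlaps[g][p] = IoU(ground truth g, prior p)."""
    num_gt, num_priors = len(overlaps), len(overlaps[0])

    # Best prior for each ground truth (row-wise argmax).
    best_prior_idx = [max(range(num_priors), key=lambda p: overlaps[g][p])
                      for g in range(num_gt)]
    best_prior_overlap = [overlaps[g][best_prior_idx[g]] for g in range(num_gt)]

    # Best ground truth for each prior (column-wise argmax).
    best_truth_idx = [max(range(num_gt), key=lambda g: overlaps[g][p])
                      for p in range(num_priors)]
    best_truth_overlap = [overlaps[best_truth_idx[p]][p] for p in range(num_priors)]

    # The best prior of every valid ground truth gets overlap 2,
    # so it survives the threshold test below.
    for g in range(num_gt):
        if best_prior_overlap[g] >= min_gt_iou:
            best_truth_overlap[best_prior_idx[g]] = 2.0

    # Every ground truth claims its best prior.
    for g in range(num_gt):
        best_truth_idx[best_prior_idx[g]] = g

    # 1 = object, 0 = background (a single foreground class is assumed here).
    conf = [1 if best_truth_overlap[p] >= threshold else 0
            for p in range(num_priors)]
    return best_truth_idx, conf


# Hypothetical IoU values: 2 ground truths x 4 priors.
overlaps = [[0.1, 0.6, 0.4, 0.0],
            [0.0, 0.1, 0.2, 0.3]]
truth_idx, conf = match_indices(overlaps)
print(truth_idx)  # [0, 0, 0, 1]
print(conf)       # [0, 1, 1, 1]
```

Prior 3 only reaches an IoU of 0.3 with ground truth 1, which is below the assumed threshold of 0.35, yet it is still labeled as an object because it is that ground truth's best prior. Prior 0 has no such protection and becomes background.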

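The decode() function

Purpose: decoding, i.e. converting the network's predicted offsets (which are relative to each prior) back into box coordinates.

encode is the opposite of decode: it maps the ground-truth boxes (given in image coordinates) into the SSD output space, where they serve as the supervision targets for SSD, which makes the loss easier to compute. That raises the question of matching: how are ground-truth boxes paired with prior boxes (anchors), i.e. which anchors are responsible for detecting a given ground-truth box? This is what match() above decides.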

def decode(loc, priors, variances):
    """
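    Decode the network output into (top-left, bottom-right) corner coordinates
    using the anchors and the variances used in SSD.
    The decoded coordinates are in the same units as the priors
    (scale by the image size if the priors are normalized).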
    Args:
        loc (tensor): location predictions for loc layers,
            Shape: [num_priors,4]
        priors (tensor): Prior boxes in center-offset form.
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        decoded bounding box predictions
    """
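    # variances are typically [0.1, 0.2]
    boxes = torch.cat((
        priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],  # decode the center coordinates
        priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])), 1)  # decode the width and height
    boxes[:, :2] -= boxes[:, 2:] / 2  # center -> top-left corner
    boxes[:, 2:] += boxes[:, :2]  # width/height -> bottom-right corner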
    return boxes
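
Written as formulas, the decoding above can be summarized as follows. The symbols are notation introduced here for illustration, not names from the code: a prior is $(c_x, c_y, w, h)$, the network output for that prior is $(l_x, l_y, l_w, l_h)$, and the variances are $(v_0, v_1)$.

```latex
\begin{aligned}
\hat{c}_x &= c_x + v_0\, l_x\, w, &\quad \hat{c}_y &= c_y + v_0\, l_y\, h,\\
\hat{w} &= w\, e^{v_1 l_w}, &\quad \hat{h} &= h\, e^{v_1 l_h},\\
x_{\min} &= \hat{c}_x - \hat{w}/2, &\quad y_{\min} &= \hat{c}_y - \hat{h}/2,\\
x_{\max} &= x_{\min} + \hat{w}, &\quad y_{\max} &= y_{\min} + \hat{h}.
\end{aligned}
```

The first two rows correspond to the torch.cat call, and the last two rows to the two in-place updates that turn (center, size) into (top-left, bottom-right).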

The encode() function

Purpose: the inverse of decode

def encode(matched, priors, variances):
    """
    Map the ground-truth boxes into the encoded space.
    Encode the variances from the priorbox layers into the ground truth boxes
    we have matched (based on jaccard overlap) with the prior boxes.
    Args:
        matched: (tensor) Coords of ground truth for each prior in point-form
            Shape: [num_priors, 4].
        priors: (tensor) Prior boxes in center-offset form
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        encoded boxes (tensor), Shape: [num_priors, 4]
    """

    # dist b/t match center and prior's center
    g_cxcy = (matched[:, :2] + matched[:, 2:])/2 - priors[:, :2]
    # encode variance
    g_cxcy /= (variances[0] * priors[:, 2:])
    # match wh / prior wh
    g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
    g_wh = torch.log(g_wh) / variances[1]
    # return target for smooth_l1_loss
    return torch.cat([g_cxcy, g_wh], 1)  # [num_priors,4]
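
A quick way to convince yourself that encode and decode are inverses is a round trip on a single box. The sketch below is plain Python with hypothetical helper names (`encode_box`, `decode_box`) and made-up coordinates; it assumes the variances [0.1, 0.2] mentioned in the decode() comment and follows the same arithmetic as the tensor versions above, one box at a time.

```python
import math

VARIANCES = (0.1, 0.2)  # assumed values, as in the comment inside decode()


def encode_box(matched, prior, variances=VARIANCES):
    """matched: (xmin, ymin, xmax, ymax); prior: (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = matched
    cx, cy, w, h = prior
    # Offset of the box center from the prior center, scaled by prior size.
    g_cx = ((xmin + xmax) / 2 - cx) / (variances[0] * w)
    g_cy = ((ymin + ymax) / 2 - cy) / (variances[0] * h)
    # Log ratio of box size to prior size.
    g_w = math.log((xmax - xmin) / w) / variances[1]
    g_h = math.log((ymax - ymin) / h) / variances[1]
    return (g_cx, g_cy, g_w, g_h)


def decode_box(loc, prior, variances=VARIANCES):
    """Inverse of encode_box: returns (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = prior
    box_cx = cx + loc[0] * variances[0] * w
    box_cy = cy + loc[1] * variances[0] * h
    box_w = w * math.exp(loc[2] * variances[1])
    box_h = h * math.exp(loc[3] * variances[1])
    xmin, ymin = box_cx - box_w / 2, box_cy - box_h / 2
    return (xmin, ymin, xmin + box_w, ymin + box_h)


prior = (0.5, 0.5, 0.2, 0.2)      # hypothetical prior in center-offset form
gt = (0.42, 0.40, 0.66, 0.70)     # hypothetical ground truth in point form
loc = encode_box(gt, prior)
restored = decode_box(loc, prior)
print(loc)
print(restored)
```

Up to floating-point rounding, `restored` equals `gt`, and the first two encoded values are the center offsets divided by 0.1 times the prior size.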

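The nms() function

Purpose: NMS (non-maximum suppression)

How it works, by example: suppose there are 6 rectangular boxes, sorted by the classifier's probability that each contains a vehicle. In ascending order the probabilities are A<B<C<D<E<F.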

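(1) Start from F, the box with the highest probability, and check whether the IoU of each of A, B, C, D, E with F exceeds a chosen threshold.

(2) Suppose the overlap of B and D with F exceeds the threshold: discard B and D, and mark F as the first box we keep.

(3) From the remaining boxes A, C, E, pick E, the one with the highest probability, then check the overlap of A and C with E. Discard any box whose overlap exceeds the threshold, and mark E as the second box we keep.

(4) Repeat this process until all the boxes to keep have been found.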
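
The four steps above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the tensor implementation that follows: the names `iou` and `greedy_nms` and the sample boxes are hypothetical, boxes are (xmin, ymin, xmax, ymax) tuples with positive area, and there is no top_k limit.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, overlap=0.5):
    """Return the indices of the kept boxes, highest score first."""
    # Candidate indices sorted by ascending score, so the best is last.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    keep = []
    while order:
        i = order.pop()          # current highest-scoring candidate
        keep.append(i)
        # Drop every remaining candidate that overlaps box i too much.
        order = [j for j in order if iou(boxes[i], boxes[j]) <= overlap]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (21, 21, 31, 31)]
scores = [0.9, 0.8, 0.7, 0.6]
kept = greedy_nms(boxes, scores)
print(kept)  # [0, 2]
```

Boxes 1 and 3 each overlap a higher-scoring neighbor with an IoU of 81/119, about 0.68, so they are suppressed, while boxes 0 and 2 do not overlap each other and are both kept.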

def nms(boxes, scores, overlap=0.5, top_k=200):  # non-maximum suppression
    """Apply non-maximum suppression at test time to avoid detecting too many
    overlapping bounding boxes for a given object.
    Args:
        boxes: (tensor) The location preds for the img, Shape: [num_priors,4].
        scores: (tensor) The class pred scores for the img, Shape:[num_priors].
        overlap: (float) The overlap thresh for suppressing unnecessary boxes.
        top_k: (int) The Maximum number of box preds to consider.
    Return:
        The indices of the kept boxes with respect to num_priors.
    """

    keep = torch.Tensor(scores.size(0)).fill_(0).long()  # shape (num_priors)
    if boxes.numel() == 0:  # numel() returns the number of elements
        return keep, 0  # same (keep, count) pair as the normal return at the end
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    area = torch.mul(x2 - x1, y2 - y1)  # element-wise product: area of each box, shape (num_priors)
    v, idx = scores.sort(0)  # sort in ascending order, scores shape: (num_priors)
    # I = I[v >= 0.01]
    idx = idx[-top_k:]  # indices of the top-k largest vals, shape (at most top_k)
    xx1 = boxes.new()   # .new() creates an empty tensor with the same dtype and device as boxes
    yy1 = boxes.new()
    xx2 = boxes.new()
    yy2 = boxes.new()
    w = boxes.new()
    h = boxes.new()

    # keep = torch.Tensor()
    count = 0
    while idx.numel() > 0:
        i = idx[-1]  # index of current largest val among the remaining candidates
        # keep.append(i)
        keep[count] = i  # record the kept index
        count += 1
        if idx.size(0) == 1:  # only one candidate left: leave the loop
            break
        idx = idx[:-1]  # remove kept element from view
        # load bboxes of next highest vals
        # copy x1, y1, x2, y2 of the remaining boxes into xx1, yy1, xx2, yy2
        torch.index_select(x1, 0, idx, out=xx1)  # index_select(input, dim, index): pick the entries
                                                 # of input listed in index along dimension dim
        torch.index_select(y1, 0, idx, out=yy1)
        torch.index_select(x2, 0, idx, out=xx2)
        torch.index_select(y2, 0, idx, out=yy2)
        # store element-wise max with next highest score
        # corner coordinates of the intersection with box i
        xx1 = torch.clamp(xx1, min=x1[i])  # torch.clamp limits every element to the range [min, max]
        yy1 = torch.clamp(yy1, min=y1[i])
        xx2 = torch.clamp(xx2, max=x2[i])
        yy2 = torch.clamp(yy2, max=y2[i])
        w.resize_as_(xx2)
        h.resize_as_(yy2)
        # width and height of the intersection
        w = xx2 - xx1
        h = yy2 - yy1
        # check sizes of xx1 and xx2.. after each iteration
        w = torch.clamp(w, min=0.0)
        h = torch.clamp(h, min=0.0)
        inter = w*h  # intersection of each remaining box with the current box
        # IoU = i / (area(a) + area(b) - i)
        rem_areas = torch.index_select(area, 0, idx)  # load remaining areas
        union = (rem_areas - inter) + area[i]  # union of each remaining box with the current box
        IoU = inter/union  # store result in iou
        # keep only elements with an IoU <= overlap
        idx = idx[IoU.le(overlap)]  # keep the candidates whose IoU with the current box is at most overlap
    return keep, count