Object Detection Notes No.3: The Loss Function in the SSD Framework


I was fortunate to take part in the Datawhale December study group. These notes follow 动手学CV-Pytorch, Chapter 3 (Object Detection), Section 3.5, with some additions of my own.
Some terminology from the paper first: prior boxes (default boxes / prior bboxes), ground truth boxes, and prediction boxes. The loss function in this framework involves three main pieces: the matching strategy, the design of the loss itself, and hard negative mining. The source code wraps all three inside a single class; below I pull the code apart, as I understand it, into those three parts.

Matching Strategy

The first principle: starting from the ground truth boxes, find for each ground truth box the prior bbox with the maximum jaccard overlap with it (jaccard overlap is just IoU). This guarantees that every ground truth box is matched with at least one prior bbox. Conversely, a prior bbox that ends up matched with no ground truth can only be matched with the background, i.e. it becomes a negative sample.
The second principle: starting from the prior bboxes, try to pair each remaining unmatched prior bbox with any ground truth box: whenever their jaccard overlap exceeds a threshold (typically 0.5), that prior bbox is also matched with the ground truth. This means one ground truth may be matched with multiple prior boxes, which is fine. The reverse is not allowed, because a prior bbox can only match a single ground truth: if several ground truths have IoU above the threshold with the same prior bbox, the prior bbox is matched only with the ground truth of highest IoU.
Note: the second principle must be applied after the first.

A more colloquial way to put it: a ground truth box really does enjoy a "harem of three thousand" and can be matched with multiple prior boxes, while a prior box can be matched with only one ground truth box.

The matching part of the reference code:

        # For each image
        for i in range(batch_size):
            n_objects = boxes[i].size(0)

            overlap = find_jaccard_overlap(boxes[i], self.priors_xy)  # (n_objects, 441)

            # For each prior, find the object that has the maximum overlap
            overlap_for_each_prior, object_for_each_prior = overlap.max(dim=0)  # (441)

            # We don't want a situation where an object is not represented in our positive (non-background) priors -
            # 1. An object might not be the best object for all priors, and is therefore not in object_for_each_prior.
            # 2. All priors with the object may be assigned as background based on the threshold (0.5).

            # To remedy this -
            # First, find the prior that has the maximum overlap for each object (i.e. the index of the prior that best fits each object)
            _, prior_for_each_object = overlap.max(dim=1)  # (N_o)

            # Then, assign each object to the corresponding maximum-overlap-prior. (This fixes 1.)
            object_for_each_prior[prior_for_each_object] = torch.LongTensor(range(n_objects)).to(device)  # force: the best prior for object j is assigned object j, even if another object had higher overlap with that prior

            # To ensure these priors qualify, artificially give them an overlap of greater than 0.5. (This fixes 2.)
            overlap_for_each_prior[prior_for_each_object] = 1.
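
The snippet above relies on find_jaccard_overlap from the tutorial's utility code. For reference, here is a minimal sketch of what such a function computes, assuming boxes in boundary format (x_min, y_min, x_max, y_max); this is my own reconstruction for illustration, not the tutorial's exact implementation:

    import torch

    def find_jaccard_overlap(set_1, set_2):
        """IoU of every box in set_1 (n1, 4) against every box in set_2 (n2, 4); returns (n1, n2)."""
        # pairwise intersection rectangle via broadcasting
        lower = torch.max(set_1[:, None, :2], set_2[None, :, :2])  # (n1, n2, 2)
        upper = torch.min(set_1[:, None, 2:], set_2[None, :, 2:])  # (n1, n2, 2)
        wh = (upper - lower).clamp(min=0)                          # (n1, n2, 2), zero if no overlap
        intersection = wh[..., 0] * wh[..., 1]                     # (n1, n2)
        # areas and union
        areas_1 = (set_1[:, 2] - set_1[:, 0]) * (set_1[:, 3] - set_1[:, 1])  # (n1)
        areas_2 = (set_2[:, 2] - set_2[:, 0]) * (set_2[:, 3] - set_2[:, 1])  # (n2)
        union = areas_1[:, None] + areas_2[None, :] - intersection           # (n1, n2)
        return intersection / union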

Loss Function Design

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
Object detection involves both a classification problem and a box-regression problem, and the total loss is a weighted sum of the two. The subscript conf denotes the confidence loss, loc the localization loss, and N is the number of matched prior boxes.
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k} \, \mathrm{smooth}_{L1}\left(l_i^{m} - \hat{g}_j^{m}\right)$$

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$
Localization loss: the box information fed into smooth_L1( ) is neither (x1, y1, x2, y2) nor (cx, cy, w, h), but the encoded offsets (gcx, gcy, gw, gh) given by the formulas beneath the loss above. In other words, the localization loss fits the transformation between the ground truth box and the prediction box: both the network's localization outputs and the regression targets are box data after this series of encoding transformations.
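
A minimal sketch of this encoding (the function name encode_offsets is mine; boxes are assumed to be in (cx, cy, w, h) format, and real implementations often additionally divide by empirical "variance" factors):

    import torch

    def encode_offsets(gt_cxcy, priors_cxcy):
        """Encode ground truth boxes as (gcx, gcy, gw, gh) offsets w.r.t. prior boxes; both inputs are (n, 4)."""
        g_cxcy = (gt_cxcy[:, :2] - priors_cxcy[:, :2]) / priors_cxcy[:, 2:]  # (gcx, gcy): center offsets scaled by prior size
        g_wh = torch.log(gt_cxcy[:, 2:] / priors_cxcy[:, 2:])                # (gw, gh): log of size ratios
        return torch.cat([g_cxcy, g_wh], dim=1)                              # (n, 4)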
$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_i^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right), \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}$$
Confidence loss: this is the loss for the classification problem. The superscript p denotes the class, i indexes the i-th prior box, j the j-th ground truth box, and x is the matching indicator (my own term for it): x_{ij}^p = 1 exactly when prior i is matched to ground truth j of class p, and 0 otherwise.
A further note on smooth_L1( ): its two key properties are (1) when the gap between prediction and target is large, the gradient does not blow up, and (2) when the gap is small, the gradient decays smoothly instead of staying constant, so training does not oscillate around the optimum. Looking at the function, for |x| < 1 it is quadratic, while for |x| ≥ 1 it is linear.
$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^{2} & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$

   from torch import nn
   smooth_l1 = nn.L1Loss()  # called directly; note PyTorch also provides nn.SmoothL1Loss, which implements the piecewise definition above
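
To connect the code back to the piecewise formula, here is a hand-rolled elementwise version (for illustration only; nn.SmoothL1Loss is the built-in equivalent):

    import torch

    def smooth_l1_manual(diff):
        """Elementwise smooth L1: 0.5 * x^2 where |x| < 1, |x| - 0.5 otherwise."""
        abs_diff = diff.abs()
        return torch.where(abs_diff < 1, 0.5 * diff ** 2, abs_diff - 0.5)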

Below is the loss-computation part of the source code. I have pulled the class apart piece by piece; going back to the full program on GitHub afterwards should make it easier to follow. The part of the confidence loss that involves hard negative mining is covered in the next section.

        # LOCALIZATION LOSS

        # Localization loss is computed only over positive (non-background) priors
        loc_loss = self.smooth_l1(predicted_locs[positive_priors], true_locs[positive_priors])  # (), scalar

        # Note: indexing with a torch.uint8 (byte) tensor flattens the tensor when indexing is across multiple dimensions (N & 441)
        # So, if predicted_locs has the shape (N, 441, 4), predicted_locs[positive_priors] will have (total positives, 4)

        # CONFIDENCE LOSS

        # Confidence loss is computed over positive priors and the most difficult (hardest) negative priors in each image
        # That is, FOR EACH IMAGE,
        # we will take the hardest (neg_pos_ratio * n_positives) negative priors, i.e where there is maximum loss
        # This is called Hard Negative Mining - it concentrates on hardest negatives in each image, and also minimizes pos/neg imbalance

        # Number of positive and hard-negative priors per image
        n_positives = positive_priors.sum(dim=1)  # (N)
        n_hard_negatives = self.neg_pos_ratio * n_positives  # (N)

        # First, find the loss for all priors
        conf_loss_all = self.cross_entropy(predicted_scores.view(-1, n_classes), true_classes.view(-1))  # (N * 441)
        conf_loss_all = conf_loss_all.view(batch_size, n_priors)  # (N, 441)

        # We already know which priors are positive
        conf_loss_pos = conf_loss_all[positive_priors]  # (sum(n_positives))

        # Next, find which priors are hard-negative
        # To do this, sort ONLY negative priors in each image in order of decreasing loss and take top n_hard_negatives
        conf_loss_neg = conf_loss_all.clone()  # (N, 441)
        conf_loss_neg[positive_priors] = 0.  # (N, 441), positive priors are ignored (never in top n_hard_negatives)



Hard Negative Mining

In general the number of negative prior bboxes vastly exceeds the number of positive ones; training on all of them directly would make the network pay too much attention to negatives and predict poorly. To keep positives and negatives roughly balanced, we use SSD's hard negative mining strategy: rank the negative prior bboxes by their confidence loss and keep only those with the highest loss for training, holding the ratio at positive:negative = 1:3. For example, if an image has 10 positive priors, only its 30 hardest negatives contribute to the confidence loss.

        conf_loss_neg, _ = conf_loss_neg.sort(dim=1, descending=True)  # (N, 441), sorted by decreasing hardness
        hardness_ranks = torch.LongTensor(range(n_priors)).unsqueeze(0).expand_as(conf_loss_neg).to(device)  # (N, 441), each row is 0, 1, ..., 440: the rank of each sorted prior
        hard_negatives = hardness_ranks < n_hard_negatives.unsqueeze(1)  # (N, 441), True for the first n_hard_negatives[i] ranks in row i
        conf_loss_hard_neg = conf_loss_neg[hard_negatives]  # (sum(n_hard_negatives))
        # As in the paper, averaged over positive priors only, although computed over both positive and hard-negative priors
        conf_loss = (conf_loss_hard_neg.sum() + conf_loss_pos.sum()) / n_positives.sum().float()  # (), scalar

        # return TOTAL LOSS
        return conf_loss + self.alpha * loc_loss

Structure of the Source Program

class MultiBoxLoss(nn.Module):
    def __init__(self, priors_cxcy, threshold=0.5, neg_pos_ratio=3, alpha=1.):
        super(MultiBoxLoss, self).__init__()
        # initialize the prior boxes and hyper-parameters
        pass
    def forward(self, predicted_locs, predicted_scores, boxes, labels):
        # matching + hard negative mining + loss computation all happen in here
        return conf_loss + self.alpha * loc_loss
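
For reference, a rough sketch of how this class is driven during training (shapes follow the tutorial's 441-prior, 21-class setup; priors_cxcy, boxes, and labels are placeholders I made up for illustration):

    # priors_cxcy comes from the detector model; boxes/labels come from the dataloader
    criterion = MultiBoxLoss(priors_cxcy=priors_cxcy, threshold=0.5, neg_pos_ratio=3, alpha=1.)
    predicted_locs = torch.randn(8, 441, 4)     # (batch, n_priors, 4), encoded offsets
    predicted_scores = torch.randn(8, 441, 21)  # (batch, n_priors, n_classes)
    # boxes: list of (n_objects_i, 4) tensors; labels: list of (n_objects_i,) tensors
    loss = criterion(predicted_locs, predicted_scores, boxes, labels)
    loss.backward()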

Links:
SSD paper.
动手学CV-Pytorch, Chapter 3 Object Detection, Section 3.5.
