Object Detection Notes No.3: The Loss Function in the SSD Framework


I was fortunate to take part in the Datawhale December study group. These notes follow 动手学CV-Pytorch, Chapter 3 (Object Detection), Section 3.5, with some additions of my own.
Some terminology from the paper first: prior boxes (default boxes / prior bboxes), ground truth boxes, and prediction boxes. The loss function in this framework involves three main pieces: the matching strategy, the design of the loss itself, and hard negative mining. The source code wraps all three inside a single class; below I pull the code apart, as I understand it, into those three parts.

Matching Strategy

The first principle: starting from the ground truth boxes, find for each ground truth box the prior bbox with the maximum jaccard overlap with it (jaccard overlap is just IoU). This guarantees that every ground truth box is matched with at least one prior bbox. Conversely, a prior bbox that ends up matched with no ground truth can only be matched with the background, i.e. it becomes a negative sample.
The second principle: starting from the prior bboxes, try to pair each remaining unmatched prior bbox with any ground truth box: whenever their jaccard overlap exceeds a threshold (typically 0.5), that prior bbox is also matched with the ground truth. This means one ground truth may be matched with multiple prior boxes, which is fine. The reverse is not allowed, because a prior bbox can only match a single ground truth: if several ground truths have IoU above the threshold with the same prior bbox, the prior bbox is matched only with the ground truth of highest IoU.
Note: the second principle must be applied after the first.

A more colloquial way to put it: a ground truth box really does enjoy a "harem of three thousand" and can be matched with multiple prior boxes, while a prior box can be matched with only one ground truth box.

The matching part of the reference code:

        # For each image
        for i in range(batch_size):
            n_objects = boxes[i].size(0)

            overlap = find_jaccard_overlap(boxes[i], self.priors_xy)  # (n_objects, 441)

            # For each prior, find the object that has the maximum overlap
            overlap_for_each_prior, object_for_each_prior = overlap.max(dim=0)  # (441)

            # We don't want a situation where an object is not represented in our positive (non-background) priors -
            # 1. An object might not be the best object for all priors, and is therefore not in object_for_each_prior.
            # 2. All priors with the object may be assigned as background based on the threshold (0.5).

            # To remedy this -
            # First, find the prior that has the maximum overlap for each object (i.e. the index of the prior that best fits each object)
            _, prior_for_each_object = overlap.max(dim=1)  # (N_o)

            # Then, assign each object to the corresponding maximum-overlap-prior. (This fixes 1.)
            object_for_each_prior[prior_for_each_object] = torch.LongTensor(range(n_objects)).to(device)  # force: the best prior for object j is assigned object j, even if another object had higher overlap with that prior

            # To ensure these priors qualify, artificially give them an overlap of greater than 0.5. (This fixes 2.)
            overlap_for_each_prior[prior_for_each_object] = 1.
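
The snippet above relies on find_jaccard_overlap from the tutorial's utility code. For reference, here is a minimal sketch of what such a function computes, assuming boxes in boundary format (x_min, y_min, x_max, y_max); this is my own reconstruction for illustration, not the tutorial's exact implementation:

    import torch

    def find_jaccard_overlap(set_1, set_2):
        """IoU of every box in set_1 (n1, 4) against every box in set_2 (n2, 4); returns (n1, n2)."""
        # pairwise intersection rectangle via broadcasting
        lower = torch.max(set_1[:, None, :2], set_2[None, :, :2])  # (n1, n2, 2)
        upper = torch.min(set_1[:, None, 2:], set_2[None, :, 2:])  # (n1, n2, 2)
        wh = (upper - lower).clamp(min=0)                          # (n1, n2, 2), zero if no overlap
        intersection = wh[..., 0] * wh[..., 1]                     # (n1, n2)
        # areas and union
        areas_1 = (set_1[:, 2] - set_1[:, 0]) * (set_1[:, 3] - set_1[:, 1])  # (n1)
        areas_2 = (set_2[:, 2] - set_2[:, 0]) * (set_2[:, 3] - set_2[:, 1])  # (n2)
        union = areas_1[:, None] + areas_2[None, :] - intersection           # (n1, n2)
        return intersection / union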

Loss Function Design

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
Object detection involves both a classification problem and a box-regression problem, and the total loss is a weighted sum of the two. The subscript conf denotes the confidence loss, loc the localization loss, and N is the number of matched prior boxes.
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k} \, \mathrm{smooth}_{L1}\left(l_i^{m} - \hat{g}_j^{m}\right)$$

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$
Localization loss: the box information fed into smooth_L1( ) is neither (x1, y1, x2, y2) nor (cx, cy, w, h), but the encoded offsets (gcx, gcy, gw, gh) given by the formulas beneath the loss above. In other words, the localization loss fits the transformation between the ground truth box and the prediction box: both the network's localization outputs and the regression targets are box data after this series of encoding transformations.
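
A minimal sketch of this encoding (the function name encode_offsets is mine; boxes are assumed to be in (cx, cy, w, h) format, and real implementations often additionally divide by empirical "variance" factors):

    import torch

    def encode_offsets(gt_cxcy, priors_cxcy):
        """Encode ground truth boxes as (gcx, gcy, gw, gh) offsets w.r.t. prior boxes; both inputs are (n, 4)."""
        g_cxcy = (gt_cxcy[:, :2] - priors_cxcy[:, :2]) / priors_cxcy[:, 2:]  # (gcx, gcy): center offsets scaled by prior size
        g_wh = torch.log(gt_cxcy[:, 2:] / priors_cxcy[:, 2:])                # (gw, gh): log of size ratios
        return torch.cat([g_cxcy, g_wh], dim=1)                              # (n, 4)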
$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_i^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right), \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}$$
Confidence loss: this is the loss for the classification problem. The superscript p denotes the class, i indexes the i-th prior box, j the j-th ground truth box, and x is the matching indicator (my own term for it): x_{ij}^p = 1 exactly when prior i is matched to ground truth j of class p, and 0 otherwise.
A further note on smooth_L1( ): its two key properties are (1) when the gap between prediction and target is large, the gradient does not blow up, and (2) when the gap is small, the gradient decays smoothly instead of staying constant, so training does not oscillate around the optimum. Looking at the function, for |x| < 1 it is quadratic, while for |x| ≥ 1 it is linear.
$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^{2} & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$

   from torch import nn
   smooth_l1 = nn.L1Loss()  # called directly; note PyTorch also provides nn.SmoothL1Loss, which implements the piecewise definition above
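
To connect the code back to the piecewise formula, here is a hand-rolled elementwise version (for illustration only; nn.SmoothL1Loss is the built-in equivalent):

    import torch

    def smooth_l1_manual(diff):
        """Elementwise smooth L1: 0.5 * x^2 where |x| < 1, |x| - 0.5 otherwise."""
        abs_diff = diff.abs()
        return torch.where(abs_diff < 1, 0.5 * diff ** 2, abs_diff - 0.5)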

Below is the loss-computation part of the source code. I have pulled the class apart piece by piece; going back to the full program on GitHub afterwards should make it easier to follow. The part of the confidence loss that involves hard negative mining is covered in the next section.

        # LOCALIZATION LOSS

        # Localization loss is computed only over positive (non-background) priors
        loc_loss = self.smooth_l1(predicted_locs[positive_priors], true_locs[positive_priors])  # (), scalar

        # Note: indexing with a torch.uint8 (byte) tensor flattens the tensor when indexing is across multiple dimensions (N & 441)
        # So, if predicted_locs has the shape (N, 441, 4), predicted_locs[positive_priors] will have (total positives, 4)

        # CONFIDENCE LOSS

        # Confidence loss is computed over positive priors and the most difficult (hardest) negative priors in each image
        # That is, FOR EACH IMAGE,
        # we will take the hardest (neg_pos_ratio * n_positives) negative priors, i.e where there is maximum loss
        # This is called Hard Negative Mining - it concentrates on hardest negatives in each image, and also minimizes pos/neg imbalance

        # Number of positive and hard-negative priors per image
        n_positives = positive_priors.sum(dim=1)  # (N)
        n_hard_negatives = self.neg_pos_ratio * n_positives  # (N)

        # First, find the loss for all priors
        conf_loss_all = self.cross_entropy(predicted_scores.view(-1, n_classes), true_classes.view(-1))  # (N * 441)
        conf_loss_all = conf_loss_all.view(batch_size, n_priors)  # (N, 441)

        # We already know which priors are positive
        conf_loss_pos = conf_loss_all[positive_priors]  # (sum(n_positives))

        # Next, find which priors are hard-negative
        # To do this, sort ONLY negative priors in each image in order of decreasing loss and take top n_hard_negatives
        conf_loss_neg = conf_loss_all.clone()  # (N, 441)
        conf_loss_neg[positive_priors] = 0.  # (N, 441), positive priors are ignored (never in top n_hard_negatives)



Hard Negative Mining

In general the number of negative prior bboxes vastly exceeds the number of positive ones; training on all of them directly would make the network pay too much attention to negatives and predict poorly. To keep positives and negatives roughly balanced, we use SSD's hard negative mining strategy: rank the negative prior bboxes by their confidence loss and keep only those with the highest loss for training, holding the ratio at positive:negative = 1:3. For example, if an image has 10 positive priors, only its 30 hardest negatives contribute to the confidence loss.

        conf_loss_neg, _ = conf_loss_neg.sort(dim=1, descending=True)  # (N, 441), sorted by decreasing hardness
        hardness_ranks = torch.LongTensor(range(n_priors)).unsqueeze(0).expand_as(conf_loss_neg).to(device)  # (N, 441), each row is 0, 1, ..., 440: the rank of each sorted prior
        hard_negatives = hardness_ranks < n_hard_negatives.unsqueeze(1)  # (N, 441), True for the first n_hard_negatives[i] ranks in row i
        conf_loss_hard_neg = conf_loss_neg[hard_negatives]  # (sum(n_hard_negatives))
        # As in the paper, averaged over positive priors only, although computed over both positive and hard-negative priors
        conf_loss = (conf_loss_hard_neg.sum() + conf_loss_pos.sum()) / n_positives.sum().float()  # (), scalar

        # return TOTAL LOSS
        return conf_loss + self.alpha * loc_loss

Structure of the Source Program

class MultiBoxLoss(nn.Module):
    def __init__(self, priors_cxcy, threshold=0.5, neg_pos_ratio=3, alpha=1.):
        super(MultiBoxLoss, self).__init__()
        # initialize the prior boxes and hyper-parameters
        pass
    def forward(self, predicted_locs, predicted_scores, boxes, labels):
        # matching + hard negative mining + loss computation all happen in here
        return conf_loss + self.alpha * loc_loss
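
For reference, a rough sketch of how this class is driven during training (shapes follow the tutorial's 441-prior, 21-class setup; priors_cxcy, boxes, and labels are placeholders I made up for illustration):

    # priors_cxcy comes from the detector model; boxes/labels come from the dataloader
    criterion = MultiBoxLoss(priors_cxcy=priors_cxcy, threshold=0.5, neg_pos_ratio=3, alpha=1.)
    predicted_locs = torch.randn(8, 441, 4)     # (batch, n_priors, 4), encoded offsets
    predicted_scores = torch.randn(8, 441, 21)  # (batch, n_priors, n_classes)
    # boxes: list of (n_objects_i, 4) tensors; labels: list of (n_objects_i,) tensors
    loss = criterion(predicted_locs, predicted_scores, boxes, labels)
    loss.backward()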

Links:
SSD paper.
动手学CV-Pytorch, Chapter 3 Object Detection, Section 3.5.
