目标检测入门系列Task03-损失函数

文章主要内容来自于开源组织DataWhale的:动手学CV-Pytorch

一、匹配策略

我们分配了许多prior bboxes,我们要想让其预测类别和目标框信息,我们先要知道每个prior bbox和哪个目标对应,从而才能判断预测的是否准确,从而将训练进行下去。

不同方法 ground truth boxes 与 prior bboxes 的匹配策略大致都是类似的,但是细节会有所不同。这里我们采用SSD中的匹配策略,有两个原则:

  • 从ground truth box出发,寻找与每一个ground truth box有最大的jaccard overlap的prior bbox,这样就能保证每一个groundtruth box一定与一个prior bbox对应起来(jaccard overlap就是IOU)。 反之,若一个prior bbox没有与任何ground truth进行匹配,那么该prior bbox只能与背景匹配,就是负样本。
    在这里插入图片描述
    一个图片中ground truth是非常少的,而prior bbox却很多,如果仅按第一个原则匹配,很多prior bbox会是负样本,正负样本极其不平衡,所以需要第二个原则。
  • 从prior bbox出发,对剩余的还没有配对的prior bbox与任意一个ground truth box尝试配对,只要两者之间的jaccard overlap大于阈值(一般是0.5),那么该prior bbox也与这个ground truth进行匹配。这意味着某个ground truth可能与多个Prior box匹配,这是可以的。但是反过来却不可以,因为一个prior bbox只能匹配一个ground truth,如果多个ground truth与某个prior bbox的 IOU 大于阈值,那么prior bbox只与IOU最大的那个ground truth进行匹配。

注意:第二个原则一定在第一个原则之后进行,仔细考虑一下这种情况,如果某个ground truth所对应最大IOU的prior bbox小于阈值,并且所匹配的prior bbox却与另外一个ground truth的IOU大于阈值,那么该prior bbox应该匹配谁,答案应该是前者,首先要确保每个ground truth一定有一个prior bbox与之匹配。

用一个示例来说明上述的匹配原则:
在这里插入图片描述
图像中有7个红色的框代表先验框,黄色的是ground truths,在这幅图像中有三个真实的目标。按照前面列出的步骤将生成以下匹配项:
在这里插入图片描述

二、损失函数

将总体的目标损失函数定义为 定位损失(loc)和置信度损失(conf)的加权和:
在这里插入图片描述
其中N是匹配到GT(Ground Truth)的prior bbox数量,如果N=0,则将损失设为0;而 α 参数用于调整confidence loss和location loss之间的比例,默认 α=1。

confidence loss是在多类别置信度( c )上的softmax loss,公式如下:
在这里插入图片描述
其中i指代搜索框序号,j指代真实框序号,p指代类别序号,p=0表示背景。其中 x i j p = { 1 , 0 } x^{p}_{ij}=\left\{1,0\right\} xijp={1,0},中取1表示第i个prior bbox匹配到第 j 个GT box,而这个GT box的类别为 p 。 C i p C^{p}_{i} Cip表示第i个搜索框对应类别p的预测概率。此处有一点需要关注,公式前半部分是正样本(Pos)的损失,即分类为某个类别的损失(不包括背景),后半部分是负样本(Neg)的损失,也就是类别为背景的损失。

而location loss(位置回归)是典型的smooth L1 loss
在这里插入图片描述
其中,l为预测框,g为ground truth。(cx,xy)为补偿(regress to offsets)后的默认框d的中心,(w,h)为默认框的宽和高。更详细的解释看-看下图:
在这里插入图片描述

三、Hard negative mining

值得注意的是,一般情况下negative prior bboxes数量 >> positive prior bboxes数量,直接训练会导致网络过于重视负样本,预测效果很差。为了保证正负样本尽量平衡,我们这里使用SSD使用的在线难例挖掘策略(hard negative mining),即依据confidience loss对属于负样本的prior bbox进行排序,只挑选其中confidience loss高的bbox进行训练,将正负样本的比例控制在positive:negative=1:3。其核心作用就是只选择负样本中容易被分错类的困难负样本来进行网络训练,来保证正负样本的平衡和训练的有效性。

举个例子:假设在这 441 个 prior bbox 里,经过匹配后得到正样本先验框P个,负样本先验框 441−P 个。将负样本prior bbox按照prediction loss从大到小顺序排列后选择最高的M个prior bbox。这个M需要根据我们设定的正负样本的比例确定,比如我们约定正负样本比例为1:3时。我们就取M=3P,这M个loss最大的负样本难例将会被作为真正参与计算loss的prior bboxes,其余的负样本将不会参与分类损失的loss计算。

四、代码

代码在model.py里的MultiBoxLoss函数里:

class MultiBoxLoss(nn.Module):
    """
    The loss function for object detection.
    对于Loss的计算,完全遵循SSD的定义,即 MultiBox Loss
    This is a combination of:
    (1) a localization loss for the predicted locations of the boxes.
    (2) a confidence loss for the predicted class scores.
    """

    def __init__(self, priors_cxcy, threshold=0.5, neg_pos_ratio=3, alpha=1.):
        super(MultiBoxLoss, self).__init__()
        self.priors_cxcy = priors_cxcy
        self.priors_xy = cxcy_to_xy(priors_cxcy)
        self.threshold = threshold
        self.neg_pos_ratio = neg_pos_ratio
        self.alpha = alpha

        self.smooth_l1 = nn.L1Loss()
        self.cross_entropy = nn.CrossEntropyLoss(reduce=False)

    def forward(self, predicted_locs, predicted_scores, boxes, labels):
        """
        Forward propagation.
        :param predicted_locs: predicted locations/boxes w.r.t the 441 prior boxes, a tensor of dimensions (N, 441, 4)
        :param predicted_scores: class scores for each of the encoded locations/boxes, a tensor of dimensions (N, 441, n_classes)
        :param boxes: true  object bounding boxes in boundary coordinates, a list of N tensors
        :param labels: true object labels, a list of N tensors
        :return: multibox loss, a scalar
        """
        batch_size = predicted_locs.size(0)
        n_priors = self.priors_cxcy.size(0)
        n_classes = predicted_scores.size(2)

        assert n_priors == predicted_locs.size(1) == predicted_scores.size(1)

        true_locs = torch.zeros((batch_size, n_priors, 4), dtype=torch.float).to(device)  # (N, 441, 4)
        true_classes = torch.zeros((batch_size, n_priors), dtype=torch.long).to(device)  # (N, 441)

        # For each image
        for i in range(batch_size):
            n_objects = boxes[i].size(0)

            overlap = find_jaccard_overlap(boxes[i], self.priors_xy)  # (n_objects, 441)

            # For each prior, find the object that has the maximum overlap
            overlap_for_each_prior, object_for_each_prior = overlap.max(dim=0)  # (441)

            # We don't want a situation where an object is not represented in our positive (non-background) priors -
            # 1. An object might not be the best object for all priors, and is therefore not in object_for_each_prior.
            # 2. All priors with the object may be assigned as background based on the threshold (0.5).

            # To remedy this -
            # First, find the prior that has the maximum overlap for each object.
            _, prior_for_each_object = overlap.max(dim=1)  # (N_o)

            # Then, assign each object to the corresponding maximum-overlap-prior. (This fixes 1.)
            object_for_each_prior[prior_for_each_object] = torch.LongTensor(range(n_objects)).to(device)

            # To ensure these priors qualify, artificially give them an overlap of greater than 0.5. (This fixes 2.)
            overlap_for_each_prior[prior_for_each_object] = 1.

            # Labels for each prior
            label_for_each_prior = labels[i][object_for_each_prior]  # (441)
            # Set priors whose overlaps with objects are less than the threshold to be background (no object)
            label_for_each_prior[overlap_for_each_prior < self.threshold] = 0  # (441)

            # Store
            true_classes[i] = label_for_each_prior

            # Encode center-size object coordinates into the form we regressed predicted boxes to
            true_locs[i] = cxcy_to_gcxgcy(xy_to_cxcy(boxes[i][object_for_each_prior]), self.priors_cxcy)  # (441, 4)

        # Identify priors that are positive (object/non-background)
        positive_priors = true_classes != 0  # (N, 441)

        # LOCALIZATION LOSS

        # Localization loss is computed only over positive (non-background) priors
        loc_loss = self.smooth_l1(predicted_locs[positive_priors], true_locs[positive_priors])  # (), scalar

        # Note: indexing with a torch.uint8 (byte) tensor flattens the tensor when indexing is across multiple dimensions (N & 441)
        # So, if predicted_locs has the shape (N, 441, 4), predicted_locs[positive_priors] will have (total positives, 4)

        # CONFIDENCE LOSS

        # Confidence loss is computed over positive priors and the most difficult (hardest) negative priors in each image
        # That is, FOR EACH IMAGE,
        # we will take the hardest (neg_pos_ratio * n_positives) negative priors, i.e where there is maximum loss
        # This is called Hard Negative Mining - it concentrates on hardest negatives in each image, and also minimizes pos/neg imbalance

        # Number of positive and hard-negative priors per image
        n_positives = positive_priors.sum(dim=1)  # (N)
        n_hard_negatives = self.neg_pos_ratio * n_positives  # (N)

        # First, find the loss for all priors
        conf_loss_all = self.cross_entropy(predicted_scores.view(-1, n_classes), true_classes.view(-1))  # (N * 441)
        conf_loss_all = conf_loss_all.view(batch_size, n_priors)  # (N, 441)

        # We already know which priors are positive
        conf_loss_pos = conf_loss_all[positive_priors]  # (sum(n_positives))

        # Next, find which priors are hard-negative
        # To do this, sort ONLY negative priors in each image in order of decreasing loss and take top n_hard_negatives
        conf_loss_neg = conf_loss_all.clone()  # (N, 441)
        conf_loss_neg[positive_priors] = 0.  # (N, 441), positive priors are ignored (never in top n_hard_negatives)
        conf_loss_neg, _ = conf_loss_neg.sort(dim=1, descending=True)  # (N, 441), sorted by decreasing hardness
        hardness_ranks = torch.LongTensor(range(n_priors)).unsqueeze(0).expand_as(conf_loss_neg).to(device)  # (N, 441)
        hard_negatives = hardness_ranks < n_hard_negatives.unsqueeze(1)  # (N, 441)
        conf_loss_hard_neg = conf_loss_neg[hard_negatives]  # (sum(n_hard_negatives))

        # As in the paper, averaged over positive priors only, although computed over both positive and hard-negative priors
        conf_loss = (conf_loss_hard_neg.sum() + conf_loss_pos.sum()) / n_positives.sum().float()  # (), scalar

        # return TOTAL LOSS
        return conf_loss + self.alpha * loc_loss

代码的分析留在最后训练时,对整个代码的流程进行分析。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
需要学习Windows系统YOLOv4的同学请前往《Windows版YOLOv4目标检测实战:原理与源码解析》,课程链接 https://edu.csdn.net/course/detail/29865【为什么要学习这门课】 Linux创始人Linus Torvalds有一句名言:Talk is cheap. Show me the code. 冗谈不够,放码过来!  代码阅读是从基础到提高的必由之路。尤其对深度学习,许多框架隐藏了神经网络底层的实现,只能在上层调包使用,对其内部原理很难认识清晰,不利于进一步优化和创新。YOLOv4是最近推出的基于深度学习的端到端实时目标检测方法。YOLOv4的实现darknet是使用C语言开发的轻型开源深度学习框架,依赖少,可移植性好,可以作为很好的代码阅读案例,让我们深入探究其实现原理。【课程内容与收获】 本课程将解析YOLOv4的实现原理和源码,具体内容包括:- YOLOv4目标检测原理- 神经网络及darknet的C语言实现,尤其是反向传播的梯度求解和误差计算- 代码阅读工具及方法- 深度学习计算的利器:BLAS和GEMM- GPU的CUDA编程方法及在darknet的应用- YOLOv4的程序流程- YOLOv4各层及关键技术的源码解析本课程将提供注释后的darknet的源码程序文件。【相关课程】 除本课程《YOLOv4目标检测:原理与源码解析》外,本人推出了有关YOLOv4目标检测系列课程,包括:《YOLOv4目标检测实战:训练自己的数据集》《YOLOv4-tiny目标检测实战:训练自己的数据集》《YOLOv4目标检测实战:人脸口罩佩戴检测》《YOLOv4目标检测实战:中国交通标志识别》建议先学习一门YOLOv4实战课程,对YOLOv4的使用方法了解以后再学习本课程。【YOLOv4网络模型架构图】 下图由白勇老师绘制  

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值