1. Introduction
This post mainly documents the loss-function part of RetinaFace.
The code is available here:
RetinaFace code repository
The main scripts involved are:
multibox_loss.py
box_utils.py
You are also welcome to read the previous post, Retinaface Code Notes (Part 1), which should give you an overall grasp of this one.
2. Main Content
We start with multibox_loss.py (listed below). Its loss function is essentially the same as SSD's, with a landmark loss added on top. First, a series of parameters is initialized; the loss itself is then computed in forward(). There, we take the predicted loc, conf, and landm outputs, and initialize three target tensors, loc_t, conf_t, and landm_t, to be filled in by the match() function (match() lives in box_utils.py, also listed below).
Now for the match function. overlaps stores the IoU between the ground truths and the default boxes; its shape is [truths.shape[0], priors.shape[0]], meaning each row holds the IoU of one truth against all priors. The two max calls that follow do different jobs: one finds the best default box for each truth, the other the best truth for each default box. squeeze removes any dimension of size 1; see Eg1 in the appendix (below). index_fill_ then raises the overlap of each truth's best prior, so that these matches cannot be filtered out later for having an IoU below the threshold; its usage is shown in Eg2. Further down, a for loop guarantees that every ground truth is matched to a prior box. encode produces the regression targets we ultimately need, using the formulas in Eg3 (encode_landm is analogous). Finally, the resulting targets are written into loc_t, conf_t, and landm_t, which effectively serve as match's return values. (If this is still unclear, there is a test case for this function at the bottom of the listing; drop it into box_utils.py and step through it with a debugger.)
Back in multibox_loss.py, pos1 selects the priors whose matched label in conf_t is greater than 0 (the positive samples), and these are used to compute the landmark loss with the Smooth L1 function. The loc loss is computed the same way, using the SSD localization formulas from Eg3.
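As a quick reminder (not from the original post), Smooth L1 is quadratic near zero and linear beyond; a minimal sketch, checked against F.smooth_l1_loss with PyTorch's default beta of 1:

import torch
import torch.nn.functional as F

def smooth_l1(diff):
    # 0.5 * x^2 if |x| < 1, else |x| - 0.5
    abs_diff = diff.abs()
    return torch.where(abs_diff < 1, 0.5 * abs_diff ** 2, abs_diff - 0.5)

p = torch.tensor([0.5, 2.0, -3.0])  # toy predictions
t = torch.zeros(3)                  # toy targets
print(smooth_l1(p - t).sum())                   # tensor(4.1250)
print(F.smooth_l1_loss(p, t, reduction='sum'))  # tensor(4.1250)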
conf_data has shape [batch, num_priors, num_classes], while cross_entropy expects a 2-D input of shape [N, C], so we reshape it into batch_conf with shape [batch*num_priors, num_classes]. The per-prior classification loss is

loss(x, class) = -log( exp(x[class]) / Σ_j exp(x[j]) ) = -x[class] + log( Σ_j exp(x[j]) )

gather is what picks out the x[class] term; its meaning is shown in Eg4. Next, the losses of the positive samples are set to 0, and two consecutive sorts (see Eg5) convert the losses into ranks, where a larger loss gets a smaller rank. The top self.negpos_ratio * num_pos entries are then kept as negative samples for the loss: num_neg holds the per-image quota (clamped to at most pos.size(1)-1), and neg is the resulting boolean mask over the priors.
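The log(Σ_j exp(x[j])) term comes from the log_sum_exp helper, which is stubbed out in the box_utils.py listing below. A minimal sketch of the usual numerically stable, max-shifted implementation found in ssd.pytorch-style code (verify against the repo's own version):

import torch

def log_sum_exp(x):
    # log(sum_j exp(x_j)) = x_max + log(sum_j exp(x_j - x_max));
    # shifting by the max avoids overflow in exp()
    x_max = x.data.max()
    return torch.log(torch.sum(torch.exp(x - x_max), 1, keepdim=True)) + x_max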
Further down, the confidence loss over the positives plus the mined negatives is computed much like the loc and landm losses above, except with the cross-entropy formula.
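To confirm that F.cross_entropy really computes the formula above, here is a quick sanity check with made-up logits:

import torch
import torch.nn.functional as F

x = torch.tensor([[0.2, 1.5]])  # one prior, two classes (made-up logits)
cls = torch.tensor([1])         # its matched label
print(F.cross_entropy(x, cls))                  # tensor(0.2410)
print(-x[0, 1] + torch.logsumexp(x[0], dim=0))  # tensor(0.2410)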
That wraps up MultiBoxLoss. The hard parts of this code are the selection of positive and negative samples, and the construction of the final targets.
Appendix:
Eg1:
In[1]: a
Out[1]:tensor([[4, 5, 3, 7, 7],
[8, 4, 5, 2, 6]])
In[2]: value, idx = a.max(1, keepdim=True)
In[3]: value, idx
Out[3]: (tensor([[7],
                 [8]]),
         tensor([[4],
                 [0]]))
In[4]: value.shape, idx.shape
Out[4]: (torch.Size([2, 1]), torch.Size([2, 1]))
In[5]: value.squeeze(1), idx.squeeze(1)
Out[5]: (tensor([7, 8]), tensor([4, 0]))
#squeeze removes dimensions of size 1. In match, best_truth_idx and
#best_truth_overlap have size 1 in dim 0, while best_prior_idx and best_prior_overlap have size 1 in dim 1.
#That is exactly what squeeze is used for there.
Eg2:
index_fill_(dim, index, val) → Tensor
In [1]: a=torch.arange(0,16).view(4,4)
In [2]: a
Out[2]:
tensor([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [3]: index = torch.tensor([0,1,2])
In [4]: a.index_fill(0, index, 100)
Out[4]:
tensor([[100, 100, 100, 100],
[100, 100, 100, 100],
[100, 100, 100, 100],
[ 12, 13, 14, 15]])
In [5]: a.index_fill(1, index, 100)
Out[5]:
tensor([[100, 100, 100, 3],
[100, 100, 100, 7],
[100, 100, 100, 11],
[100, 100, 100, 15]])
#With dim==0, index lists the row numbers to fill (whole rows are set); with dim==1 it lists the column numbers. So pay attention to index's format: it must let the call locate the positions. val is the value used for filling. (index_fill_ is the in-place variant used in match.)
Eg4:
torch.gather(input, dim, index, out=None) → Tensor
In[1]: b = torch.Tensor([[1,2,3],[4,5,6]])
In[2]: b
Out[2]: tensor([[1., 2., 3.],
                [4., 5., 6.]])
In[3]: index1 = torch.LongTensor([[0,1],[2,0]])
In[4]: index2 = torch.LongTensor([[0,1,1],[0,0,0]])
In[5]: index1, index2
Out[5]: (tensor([[0, 1],
                 [2, 0]]),
         tensor([[0, 1, 1],
                 [0, 0, 0]]))
In[6]: torch.gather(b, dim=1, index=index1)
Out[6]: tensor([[1., 2.],
                [6., 4.]])
In[7]: torch.gather(b, dim=0, index=index2)
Out[7]: tensor([[1., 5., 6.],
                [1., 2., 3.]])
This is quite similar to index_fill above, except that here you are given the positions and the corresponding values are picked out of input: for dim=1, out[i][j] = input[i][index[i][j]].
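This is exactly how the loss code uses it: batch_conf.gather(1, conf_t.view(-1, 1)) selects, for each prior, the logit of its matched class, i.e., the x[class] term. A tiny illustration with made-up numbers:

import torch

batch_conf = torch.tensor([[0.2, 1.5],
                           [2.0, 0.1]])  # [num_priors, num_classes]
conf_t = torch.tensor([1, 0])            # matched class of each prior
print(batch_conf.gather(1, conf_t.view(-1, 1)))  # -> [[1.5], [2.0]]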
Eg5:
b = torch.randint(low=1, high=10, size=(2,5))
b
Out[1]:
tensor([[4, 9, 7, 8, 5],
[3, 5, 9, 1, 4]])
# First sort: loss_idx holds the indices that would sort each row in descending order
_, loss_idx = b.sort(dim=1, descending=True)
loss_idx
Out[2]:
tensor([[1, 3, 2, 4, 0],
[2, 1, 4, 0, 3]])
# Second sort: turns indices into ranks. For each element of the original tensor, idx_rank gives its position in the descending order along dim
_, idx_rank = loss_idx.sort(dim=1)
idx_rank
Out[3]:
tensor([[4, 0, 2, 1, 3],
[3, 1, 0, 4, 2]])
# Concretely, element 9 in the first row of the original tensor is the largest in its
# row (looking along dim=1), so its rank is 0; element 4 is the smallest in that row,
# so its rank is 4. Ranks are 0-based, and descending because of the first sort.
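Continuing this example the way the loss code does: once idx_rank is available, negatives are chosen by comparing ranks against a per-row quota (the num_neg values here are made up):

num_neg = torch.tensor([[2], [3]])  # hypothetical per-row negative quota
neg = idx_rank < num_neg.expand_as(idx_rank)
# neg:
# tensor([[False,  True, False,  True, False],
#         [False,  True,  True, False,  True]])
# row 0 keeps its 2 largest entries (9 and 8); row 1 its 3 largest (9, 5 and 4)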
Eg3: (compared with the formulas in the SSD paper, the code adds the balancing parameters variances)
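The original figure is not reproduced here, but the encoding can be read directly off encode() below. For a matched ground-truth box g and a prior d, both in center-size form, with variance = [0.1, 0.2]:

g_cx_hat = (g_cx - d_cx) / (d_w * variance[0])
g_cy_hat = (g_cy - d_cy) / (d_h * variance[0])
g_w_hat  = log(g_w / d_w) / variance[1]
g_h_hat  = log(g_h / d_h) / variance[1]

The SSD paper's version is the same with both variances set to 1.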
multibox_loss.py:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from utils.box_utils import match, log_sum_exp
from data import cfg_mnet
GPU = cfg_mnet['gpu_train']
class MultiBoxLoss(nn.Module):
"""SSD Weighted Loss Function
Compute Targets:
1) Produce Confidence Target Indices by matching ground truth boxes
with (default) 'priorboxes' that have jaccard index > threshold parameter
(default threshold: 0.5).
2) Produce localization target by 'encoding' variance into offsets of ground
truth boxes and their matched 'priorboxes'.
3) Hard negative mining to filter the excessive number of negative examples
that comes with using a large number of default bounding boxes.
(default negative:positive ratio 3:1)
Objective Loss:
L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N
Where, Lconf is the CrossEntropy Loss and Lloc is the SmoothL1 Loss
weighted by α which is set to 1 by cross val.
Args:
c: class confidences,
l: predicted boxes,
g: ground truth boxes
N: number of matched default boxes
See: https://arxiv.org/pdf/1512.02325.pdf for more details.
"""
def __init__(self, num_classes, overlap_thresh, prior_for_matching, bkg_label, neg_mining, neg_pos, neg_overlap, encode_target):
super(MultiBoxLoss, self).__init__()
self.num_classes = num_classes
self.threshold = overlap_thresh
self.background_label = bkg_label
self.encode_target = encode_target
self.use_prior_for_matching = prior_for_matching
self.do_neg_mining = neg_mining
self.negpos_ratio = neg_pos
self.neg_overlap = neg_overlap
self.variance = [0.1, 0.2]
def forward(self, predictions, priors, targets):
"""Multibox Loss
Args:
predictions (tuple): A tuple containing loc preds, conf preds,
and prior boxes from SSD net.
conf shape: torch.size(batch_size,num_priors,num_classes)
loc shape: torch.size(batch_size,num_priors,4)
priors shape: torch.size(num_priors,4)
ground_truth (tensor): Ground truth boxes and labels for a batch,
shape: [batch_size,num_objs,5] (last idx is the label).
"""
loc_data, conf_data, landm_data = predictions
priors = priors
num = loc_data.size(0)
num_priors = (priors.size(0))
# match priors (default boxes) and ground truth boxes
loc_t = torch.Tensor(num, num_priors, 4)
landm_t = torch.Tensor(num, num_priors, 10)
conf_t = torch.LongTensor(num, num_priors)
for idx in range(num):
truths = targets[idx][:, :4].data
labels = targets[idx][:, -1].data
landms = targets[idx][:, 4:14].data
defaults = priors.data
match(self.threshold, truths, defaults, self.variance, labels, landms, loc_t, conf_t, landm_t, idx)
if GPU:
loc_t = loc_t.cuda()
conf_t = conf_t.cuda()
landm_t = landm_t.cuda()
zeros = torch.tensor(0).cuda()
# landm Loss (Smooth L1)
# Shape: [batch,num_priors,10]
        # pos1 has the same shape as conf_t: True (1) where conf_t > 0, False (0) otherwise
pos1 = conf_t > zeros
num_pos_landm = pos1.long().sum(1, keepdim=True)
N1 = max(num_pos_landm.data.sum().float(), 1)
pos_idx1 = pos1.unsqueeze(pos1.dim()).expand_as(landm_data)
landm_p = landm_data[pos_idx1].view(-1, 10)
landm_t = landm_t[pos_idx1].view(-1, 10)
loss_landm = F.smooth_l1_loss(landm_p, landm_t, reduction='sum')
pos = conf_t != zeros
conf_t[pos] = 1
# Localization Loss (Smooth L1)
# Shape: [batch,num_priors,4]
pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data)
loc_p = loc_data[pos_idx].view(-1, 4)
loc_t = loc_t[pos_idx].view(-1, 4)
loss_l = F.smooth_l1_loss(loc_p, loc_t, reduction='sum')
# Compute max conf across batch for hard negative mining
batch_conf = conf_data.view(-1, self.num_classes)
loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))
# Hard Negative Mining
loss_c[pos.view(-1, 1)] = 0 # filter out pos boxes for now
loss_c = loss_c.view(num, -1)
_, loss_idx = loss_c.sort(1, descending=True)
_, idx_rank = loss_idx.sort(1)
num_pos = pos.long().sum(1, keepdim=True)
num_neg = torch.clamp(self.negpos_ratio*num_pos, max=pos.size(1)-1)
neg = idx_rank < num_neg.expand_as(idx_rank)
# Confidence Loss Including Positive and Negative Examples
pos_idx = pos.unsqueeze(2).expand_as(conf_data)
neg_idx = neg.unsqueeze(2).expand_as(conf_data)
conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1,self.num_classes)
targets_weighted = conf_t[(pos+neg).gt(0)]
loss_c = F.cross_entropy(conf_p, targets_weighted, reduction='sum')
# Sum of losses: L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N
N = max(num_pos.data.sum().float(), 1)
loss_l /= N
loss_c /= N
loss_landm /= N1
return loss_l, loss_c, loss_landm
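For context, here is a minimal sketch of how this loss is typically constructed and combined during training. The argument values mirror the repo's train.py as I remember it (neg_pos=7, thresholds 0.35, loc_weight=2.0 in cfg_mnet); treat them as assumptions and verify against the source:

# hypothetical training-side usage
criterion = MultiBoxLoss(num_classes=2, overlap_thresh=0.35, prior_for_matching=True,
                         bkg_label=0, neg_mining=True, neg_pos=7,
                         neg_overlap=0.35, encode_target=False)
out = net(images)  # (loc, conf, landm) predictions
loss_l, loss_c, loss_landm = criterion(out, priors, targets)
loss = cfg['loc_weight'] * loss_l + loss_c + loss_landm
loss.backward()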
box_utils.py:
import torch
import numpy as np
def point_form(boxes):...
def center_size(boxes):...
def intersect(box_a, box_b):...
def jaccard(box_a, box_b):...
def matrix_iou(a, b):...
def matrix_iof(a, b):...
def match(threshold, truths, priors, variances, labels, landms, loc_t, conf_t, landm_t, idx):
"""Match each prior box with the ground truth box of the highest jaccard
overlap, encode the bounding boxes, then return the matched indices
corresponding to both confidence and location preds.
Args:
        threshold: (float) The overlap threshold used when matching boxes.
truths: (tensor) Ground truth boxes, Shape: [num_obj, 4].
priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
variances: (tensor) Variances corresponding to each prior coord,
Shape: [num_priors, 4].
labels: (tensor) All the class labels for the image, Shape: [num_obj].
landms: (tensor) Ground truth landms, Shape [num_obj, 10].
        loc_t: (tensor) Tensor to be filled w/ encoded location targets.
        conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
        landm_t: (tensor) Tensor to be filled w/ encoded landm targets.
idx: (int) current batch index
Return:
The matched indices corresponding to 1)location 2)confidence 3)landm preds.
"""
# jaccard index
overlaps = jaccard(
truths,
point_form(priors)
)
# (Bipartite Matching)
# [1,num_objects] best prior for each ground truth
best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
# ignore hard gt
valid_gt_idx = best_prior_overlap[:, 0] >= 0.2
best_prior_idx_filter = best_prior_idx[valid_gt_idx, :]
if best_prior_idx_filter.shape[0] <= 0:
loc_t[idx] = 0
conf_t[idx] = 0
return
# [1,num_priors] best ground truth for each prior
best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
best_truth_idx.squeeze_(0)
best_truth_overlap.squeeze_(0)
best_prior_idx.squeeze_(1)
best_prior_idx_filter.squeeze_(1)
best_prior_overlap.squeeze_(1)
best_truth_overlap.index_fill_(0, best_prior_idx_filter, 2) # ensure best prior
# TODO refactor: index best_prior_idx with long tensor
# ensure every gt matches with its prior of max overlap
    for j in range(best_prior_idx.size(0)):  # decide which ground-truth box this anchor predicts
best_truth_idx[best_prior_idx[j]] = j
    matches = truths[best_truth_idx]  # Shape: [num_priors,4], the bbox matched to each anchor
    conf = labels[best_truth_idx]  # Shape: [num_priors], the label matched to each anchor
    conf[best_truth_overlap < threshold] = 0  # label as background: everything with overlap < threshold (0.35 here) becomes a negative sample
loc = encode(matches, priors, variances)
matches_landm = landms[best_truth_idx]
landm = encode_landm(matches_landm, priors, variances)
loc_t[idx] = loc # [num_priors,4] encoded offsets to learn
conf_t[idx] = conf # [num_priors] top class label for each prior
landm_t[idx] = landm
def encode(matched, priors, variances):
"""Encode the variances from the priorbox layers into the ground truth boxes
we have matched (based on jaccard overlap) with the prior boxes.
Args:
matched: (tensor) Coords of ground truth for each prior in point-form
Shape: [num_priors, 4].
priors: (tensor) Prior boxes in center-offset form
Shape: [num_priors,4].
variances: (list[float]) Variances of priorboxes
Return:
encoded boxes (tensor), Shape: [num_priors, 4]
"""
# dist b/t match center and prior's center
g_cxcy = (matched[:, :2] + matched[:, 2:])/2 - priors[:, :2]
# encode variance
g_cxcy /= (variances[0] * priors[:, 2:])
# match wh / prior wh
g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
g_wh = torch.log(g_wh) / variances[1]
# return target for smooth_l1_loss
return torch.cat([g_cxcy, g_wh], 1) # [num_priors,4]
def encode_landm(matched, priors, variances):
"""Encode the variances from the priorbox layers into the ground truth boxes
we have matched (based on jaccard overlap) with the prior boxes.
Args:
matched: (tensor) Coords of ground truth for each prior in point-form
Shape: [num_priors, 10].
priors: (tensor) Prior boxes in center-offset form
Shape: [num_priors,4].
variances: (list[float]) Variances of priorboxes
Return:
encoded landm (tensor), Shape: [num_priors, 10]
"""
# dist b/t match center and prior's center
matched = torch.reshape(matched, (matched.size(0), 5, 2))
priors_cx = priors[:, 0].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2)
priors_cy = priors[:, 1].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2)
priors_w = priors[:, 2].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2)
priors_h = priors[:, 3].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2)
priors = torch.cat([priors_cx, priors_cy, priors_w, priors_h], dim=2)
g_cxcy = matched[:, :, :2] - priors[:, :, :2]
# encode variance
g_cxcy /= (variances[0] * priors[:, :, 2:])
# g_cxcy /= priors[:, :, 2:]
g_cxcy = g_cxcy.reshape(g_cxcy.size(0), -1)
# return target for smooth_l1_loss
return g_cxcy
def decode(loc, priors, variances):...
def decode_landm(pre, priors, variances):...
def log_sum_exp(x):...
def nms(boxes, scores, overlap=0.5, top_k=200):...
# if __name__ == "__main__":
# num = 2
# num_priors = 4
# threshold = 0.5
# truths = torch.tensor([[0.5,0.5,1.5,1.5],[1.,1.,1.8,1.8]])
# priors = torch.tensor([[0.8,0.8,1.2,1.2],[1.8,1.8,0.5,0.5],[1.5,1.5,0.8,0.8],[3.,3.,1.,1.]])
# variances = [0.1, 0.2]
# labels = torch.tensor([1, 1])  # one label per ground-truth box, Shape: [num_obj]=2 to match the two truths
# landms =torch.randint(0,4,(num,10))
# landms = landms.float()
# loc_t = torch.Tensor(num, num_priors, 4)
# landm_t = torch.Tensor(num, num_priors, 10)
# conf_t = torch.LongTensor(num, num_priors)
# idx = 1
# match(threshold, truths, priors, variances, labels, landms, loc_t, conf_t, landm_t, idx)
# t = loc_t,conf_t,landm_t
# print(t)
3. Closing
That is all for this walkthrough; below are some blogs I referenced.