RetinaFace Code Notes (5): Loss Function

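1. Opening Notes

This post covers the loss function part of RetinaFace.

The code is available here:
RetinaFace code repository

The scripts involved are:
multibox_loss.py

box_utils.py

You are also welcome to read the earlier post, RetinaFace Code Notes (1), which should help you get an overall picture before going through this one.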

2. Main Content

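We start with multibox_loss.py (listed below). Its loss function is similar to SSD's, with a landmark loss added on top. __init__ stores a set of parameters, and the loss itself is computed in forward(). forward() first unpacks the predicted loc, conf and landm tensors, then allocates three target tensors, loc_t, conf_t and landm_t, which are filled in by the match() function (match() lives in box_utils.py, also listed below).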
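Now for match(). overlaps stores the IoU between the ground-truth boxes and the default boxes, with shape [truths.shape[0], priors.shape[0]], so each row holds the IoU of one truth against all priors. The two max calls that follow do different jobs: one finds the best default box for each truth, the other finds the best truth for each default box. squeeze removes a dimension whose size is 1; see Eg1 in the appendix (below). index_fill_ sets the overlap of each ground truth's best prior to 2, so that this prior is not filtered out later just because its IoU is below the threshold; for the usage of index_fill_ see Eg2 in the appendix. Next comes a for loop, which makes sure every ground truth is matched to a prior box. encode produces the regression targets we ultimately need; the formulas are in Eg3 of the appendix (encode_landm is analogous). Finally, the targets are written into loc_t, conf_t and landm_t, which can be regarded as the return values of match. (If this is still unclear, there is a commented-out test case for this function at the end of the box_utils.py listing below; put it into box_utils.py and step through it in a debugger.)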
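The matching steps described above can be sketched in plain Python, without PyTorch, to make the order of operations easier to follow. This is only an illustration under stated assumptions: iou and match_sketch are hypothetical helper names, the priors are given directly in corner form (the real code converts them with point_form), and the "ignore hard gt" filter at 0.2 is left out.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_sketch(truths, priors, labels, threshold):
    """Return (best_truth_idx, conf) per prior, following the steps of match()."""
    # overlaps[t][p]: IoU of truth t and prior p, shape [num_truths, num_priors]
    overlaps = [[iou(t, p) for p in priors] for t in truths]

    # Best prior for each truth (argmax along each row).
    best_prior_idx = [max(range(len(priors)), key=lambda p: row[p]) for row in overlaps]

    # Best truth for each prior (max and argmax along each column).
    best_truth_idx, best_truth_overlap = [], []
    for p in range(len(priors)):
        t_best = max(range(len(truths)), key=lambda t: overlaps[t][p])
        best_truth_idx.append(t_best)
        best_truth_overlap.append(overlaps[t_best][p])

    # Force-match every truth to its best prior.
    for t, p in enumerate(best_prior_idx):
        best_truth_overlap[p] = 2.0  # role of index_fill_(0, ..., 2): always passes the threshold
        best_truth_idx[p] = t        # role of the for loop in match()

    # Priors whose best overlap is below the threshold become background (label 0).
    conf = [labels[t] if ov >= threshold else 0
            for t, ov in zip(best_truth_idx, best_truth_overlap)]
    return best_truth_idx, conf


truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
priors = [(0, 0, 10, 10), (1, 1, 11, 11), (24, 24, 34, 34), (50, 50, 60, 60)]
matched_idx, conf = match_sketch(truths, priors, labels=[1, 1], threshold=0.35)
print(matched_idx, conf)
```

In this toy input the third prior overlaps the second truth with an IoU of only about 0.22, below the 0.35 threshold, yet it stays positive because it is that truth's best prior. This is exactly the case the fill value of 2 is meant to protect.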
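Back in multibox_loss.py, pos1 selects the priors whose label in conf_t is greater than 0; these are used to compute the landmark loss, which uses the Smooth L1 function. The loc loss is computed in the same way, except that its mask pos is conf_t != 0. The formula is the SSD localization loss referred to in Eg3.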
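As a rough illustration of this masked Smooth L1 step, here is a plain-Python sketch. The function names are hypothetical, and the element-wise definition assumes the default transition point of 1.0 used by F.smooth_l1_loss; the sum over positive priors followed by division by max(num_pos, 1) mirrors reduction='sum' and the N1 normalisation in forward().

```python
def smooth_l1(pred, target):
    """Element-wise Smooth L1: quadratic near zero, linear further away."""
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5


def masked_smooth_l1(preds, targets, labels):
    """Sum Smooth L1 over priors with label > 0, then divide by their count."""
    total = 0.0
    num_pos = 0
    for p_row, t_row, label in zip(preds, targets, labels):
        if label > 0:  # same role as pos1 = conf_t > 0
            num_pos += 1
            total += sum(smooth_l1(p, t) for p, t in zip(p_row, t_row))
    return total / max(num_pos, 1)  # avoid division by zero when nothing matched


preds = [[0.5, 2.0], [9.0, 9.0], [0.0, 0.0]]
targets = [[0.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
labels = [1, 0, 1]  # the middle prior is background and is ignored
loss = masked_smooth_l1(preds, targets, labels)
print(loss)
```

The background prior has a large prediction error, but it contributes nothing, because only positive priors carry regression targets.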
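conf_data has shape [batch, num_priors, num_classes], while cross_entropy expects a 2-D input of shape [N, C], so it is reshaped into batch_conf with shape [batch*num_priors, num_classes]. The per-prior loss is then $\mathrm{loss}(x, class) = -\log\frac{\exp(x[class])}{\sum_j \exp(x[j])} = -x[class] + \log\sum_j \exp(x[j])$. gather (see Eg4 in the appendix) computes the x[class] term. Next, the loss of the positive samples is set to 0, and two sorts (see Eg5) give each prior its rank by loss, where a larger loss gets a smaller rank. The top self.negpos_ratio*num_pos negatives are then kept for the loss. The result is the negative-sample mask neg.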
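The identity above can be checked numerically with a small plain-Python sketch. The body of log_sum_exp is elided in the listing below, so the max-subtraction used here for numerical stability is an assumption about a reasonable implementation, not a copy of the repository code; per_prior_ce is a hypothetical name.

```python
import math


def log_sum_exp(xs):
    """log(sum(exp(x))) computed stably by subtracting the maximum first."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def per_prior_ce(logits, cls):
    """Cross entropy of one prior: log_sum_exp(x) - x[class]."""
    return log_sum_exp(logits) - logits[cls]  # logits[cls] is what gather picks out


logits = [2.0, 0.0]  # scores for [background, face] of one prior
cls = 0
direct = -math.log(math.exp(logits[cls]) / sum(math.exp(v) for v in logits))
print(per_prior_ce(logits, cls), direct)
```

Both expressions give the same value, which is why the code can rank priors by log_sum_exp(batch_conf) minus the gathered score without calling cross_entropy at this stage.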
From here on the code mirrors the loc and landm losses above, except that it uses the cross_entropy loss.
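That completes the notes on MultiBoxLoss. The hard parts of the code are the selection of positive and negative samples and the construction of the final labels.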
Appendix

Eg1:

In[1]: a
Out[1]:tensor([[4, 5, 3, 7, 7],
        	   [8, 4, 5, 2, 6]])
        
In[2]: value,idx = a.max(1,keepdim=True)
	   value,idx:
Out[2]:(tensor([[7],  tensor([[4],
         [8]]),        [0]])  )
         
In[3]:value.shape,idx.shape
Out[3]:torch.Size([2, 1]),torch.Size([2, 1])  

In[4]:value.squeeze(1),idx.squeeze(1)
Out[4]:(tensor([7, 8]), tensor([4, 0]))
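# squeeze removes dimensions of size 1. In match,
# best_truth_idx and best_truth_overlap have size 1 in dimension 0, while best_prior_idx and best_prior_overlap have size 1 in dimension 1.
# That is why they are squeezed along dimension 0 and dimension 1 respectively.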

Eg2:
index_fill_(dim, index, val) → Tensor

In [1]: a=torch.arange(0,16).view(4,4)

In [2]: a
Out[2]:
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]])

In [3]: index = torch.tensor([0,1,2])
Out[3]: tensor([0, 1, 2])

In [52]: a.index_fill(0,index,100)
Out[52]:
tensor([[100, 100, 100, 100],
        [100, 100, 100, 100],
        [100, 100, 100, 100],
        [ 12,  13,  14,  15]])

In [55]: a.index_fill(1,index,100)
Out[55]:
tensor([[100, 100, 100,   3],
        [100, 100, 100,   7],
        [100, 100, 100,  11],
        [100, 100, 100,  15]])
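# index_fill(dim, index, val) fills every slice selected by index along dimension dim with val.
# With dim=0 the entries of index are row numbers, so rows 0, 1 and 2 are filled completely;
# with dim=1 they are column numbers, so columns 0, 1 and 2 are filled. index must be a 1-D tensor of integer indices.
# The examples use the out-of-place index_fill; the index_fill_ used in match modifies the tensor in place.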

Eg4:
torch.gather(input, dim, index, out=None) → Tensor

In[1]:b = torch.Tensor([[1,2,3],[4,5,6]])
	  b
Out[1]:tensor([[1., 2., 3.],
        	   [4., 5., 6.]])

In[2]:index1 = torch.LongTensor([[0,1],[2,0]])
	  index2 = torch.LongTensor([[0,1,1],[0,0,0]])
	  index1,index2
Out[2]:(tensor([[0, 1],
        		[2, 0]]),
        tensor([[0, 1, 1],
        		[0, 0, 0]]) )
In[3]: torch.gather(b, dim=1, index=index1)
	   torch.gather(b, dim=0, index=index2)
Out[3]:tensor([[1., 2.],
        	   [6., 4.]])
       tensor([[1., 5., 6.],
        	   [1., 2., 3.]])
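This is closely related to index_fill above, except that here the index gives positions to read, and the values at those positions are picked out of input.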

Eg5:

b = torch.randint(low=1, high=10, size=(2,5))
b
Out[1]: 
tensor([[4, 9, 7, 8, 5],
        [3, 5, 9, 1, 4]])
# First sort: returns the indices of the elements in descending order
_, loss_idx = b.sort(dim=1, descending=True)
loss_idx
Out[2]: 
tensor([[1, 3, 2, 4, 0],
        [2, 1, 4, 0, 3]])
# Second sort: gives the position of each original element along dim, so the indices become ranks
_, idx_rank = loss_idx.sort(dim=1)
idx_rank
Out[3]: 
tensor([[4, 0, 2, 1, 3],
        [3, 1, 0, 4, 2]])
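# Concretely, take the element 9 in the first row of the original tensor: it is the largest in that row (i.e. along dim=1),
# so its rank is 0; the element 4 in the first row is the smallest in that row, so its rank is 4.
# These ranks are 0-based, and they come out this way because the first sort was in descending order.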
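To connect the double sort to hard negative mining, here is a plain-Python sketch for a single image. hard_negative_mask is a hypothetical name, and Python's stable sorted stands in for torch's sort, so tie-breaking may differ from PyTorch; the ratio of 3 follows the 3:1 default mentioned in the MultiBoxLoss docstring.

```python
def hard_negative_mask(losses, is_pos, negpos_ratio=3):
    """Return a boolean mask marking the hardest negatives of one image."""
    # Zero the loss of positives so they sink to the bottom of the ranking.
    masked = [0.0 if p else l for l, p in zip(losses, is_pos)]

    # First sort: prior indices ordered by descending loss.
    order = sorted(range(len(masked)), key=lambda i: masked[i], reverse=True)

    # Second sort (done here by inversion): rank of every prior, 0 = largest loss.
    rank = [0] * len(masked)
    for r, i in enumerate(order):
        rank[i] = r

    # Keep at most negpos_ratio negatives per positive, capped at num_priors - 1.
    num_pos = sum(is_pos)
    num_neg = min(negpos_ratio * num_pos, len(masked) - 1)
    return [r < num_neg for r in rank]


losses = [0.9, 0.1, 2.0, 0.5, 0.3, 1.2]
is_pos = [False, False, True, False, False, False]
neg = hard_negative_mask(losses, is_pos)
print(neg)
```

With one positive and a ratio of 3, the three background priors with the largest classification loss are kept and the easy ones are dropped.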

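Eg3: (compared with the formulas in the paper, the code adds the balancing parameter variances)
[Image: localization formulas from the SSD paper]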
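The regression targets can also be read directly off encode() and encode_landm() in the box_utils.py listing below. As a sketch in notation chosen here for illustration (not the paper's): a matched ground-truth box is $(x_{min}, y_{min}, x_{max}, y_{max})$, a prior is $(p_{cx}, p_{cy}, p_w, p_h)$, $(l^k_x, l^k_y)$ is the k-th landmark, and $v_0$, $v_1$ stand for variances[0] and variances[1].

```latex
\begin{aligned}
t_{cx} &= \frac{(x_{min} + x_{max})/2 - p_{cx}}{v_0\, p_w}, &
t_{cy} &= \frac{(y_{min} + y_{max})/2 - p_{cy}}{v_0\, p_h}, \\
t_{w} &= \frac{1}{v_1}\log\frac{x_{max} - x_{min}}{p_w}, &
t_{h} &= \frac{1}{v_1}\log\frac{y_{max} - y_{min}}{p_h}, \\
t^{k}_{x} &= \frac{l^{k}_{x} - p_{cx}}{v_0\, p_w}, &
t^{k}_{y} &= \frac{l^{k}_{y} - p_{cy}}{v_0\, p_h}, \quad k = 1, \dots, 5.
\end{aligned}
```

Setting $v_0 = v_1 = 1$ removes the variance scaling. With the values [0.1, 0.2] set in MultiBoxLoss, the center and landmark offsets are scaled up by a factor of 10 and the log-size terms by a factor of 5.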

multibox_loss.py

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from utils.box_utils import match, log_sum_exp
from data import cfg_mnet
GPU = cfg_mnet['gpu_train']

class MultiBoxLoss(nn.Module):
    """SSD Weighted Loss Function
    Compute Targets:
        1) Produce Confidence Target Indices by matching  ground truth boxes
           with (default) 'priorboxes' that have jaccard index > threshold parameter
           (default threshold: 0.5).
        2) Produce localization target by 'encoding' variance into offsets of ground
           truth boxes and their matched  'priorboxes'.
        3) Hard negative mining to filter the excessive number of negative examples
           that comes with using a large number of default bounding boxes.
           (default negative:positive ratio 3:1)
    Objective Loss:
        L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N
        Where, Lconf is the CrossEntropy Loss and Lloc is the SmoothL1 Loss
        weighted by α which is set to 1 by cross val.
        Args:
            c: class confidences,
            l: predicted boxes,
            g: ground truth boxes
            N: number of matched default boxes
        See: https://arxiv.org/pdf/1512.02325.pdf for more details.
    """

    def __init__(self, num_classes, overlap_thresh, prior_for_matching, bkg_label, neg_mining, neg_pos, neg_overlap, encode_target):
        super(MultiBoxLoss, self).__init__()
        self.num_classes = num_classes
        self.threshold = overlap_thresh
        self.background_label = bkg_label
        self.encode_target = encode_target
        self.use_prior_for_matching = prior_for_matching
        self.do_neg_mining = neg_mining
        self.negpos_ratio = neg_pos
        self.neg_overlap = neg_overlap
        self.variance = [0.1, 0.2]

    def forward(self, predictions, priors, targets):
        """Multibox Loss
        Args:
            predictions (tuple): A tuple containing loc preds, conf preds,
            and prior boxes from SSD net.
                conf shape: torch.size(batch_size,num_priors,num_classes)
                loc shape: torch.size(batch_size,num_priors,4)
                priors shape: torch.size(num_priors,4)

            ground_truth (tensor): Ground truth boxes and labels for a batch,
                shape: [batch_size,num_objs,5] (last idx is the label).
        """

        loc_data, conf_data, landm_data = predictions
        priors = priors
        num = loc_data.size(0)
        num_priors = (priors.size(0))

        # match priors (default boxes) and ground truth boxes
        loc_t = torch.Tensor(num, num_priors, 4)
        landm_t = torch.Tensor(num, num_priors, 10)
        conf_t = torch.LongTensor(num, num_priors)
        for idx in range(num):
            truths = targets[idx][:, :4].data
            labels = targets[idx][:, -1].data
            landms = targets[idx][:, 4:14].data
            defaults = priors.data
            match(self.threshold, truths, defaults, self.variance, labels, landms, loc_t, conf_t, landm_t, idx)
        if GPU:
            loc_t = loc_t.cuda()
            conf_t = conf_t.cuda()
            landm_t = landm_t.cuda()

        zeros = torch.tensor(0, device=conf_t.device)  # same device as conf_t, so this also works when GPU is False
        # landm Loss (Smooth L1)
        # Shape: [batch,num_priors,10]
        # Boolean tensor with the same shape as conf_t: True where the condition holds, False otherwise
        pos1 = conf_t > zeros
        num_pos_landm = pos1.long().sum(1, keepdim=True)
        N1 = max(num_pos_landm.data.sum().float(), 1)
        pos_idx1 = pos1.unsqueeze(pos1.dim()).expand_as(landm_data)
        
        landm_p = landm_data[pos_idx1].view(-1, 10)
        landm_t = landm_t[pos_idx1].view(-1, 10)
        loss_landm = F.smooth_l1_loss(landm_p, landm_t, reduction='sum')


        pos = conf_t != zeros
        conf_t[pos] = 1

        # Localization Loss (Smooth L1)
        # Shape: [batch,num_priors,4]
        pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data)
        loc_p = loc_data[pos_idx].view(-1, 4)
        loc_t = loc_t[pos_idx].view(-1, 4)
        loss_l = F.smooth_l1_loss(loc_p, loc_t, reduction='sum')

        # Compute max conf across batch for hard negative mining
        batch_conf = conf_data.view(-1, self.num_classes)
        loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))

        # Hard Negative Mining
        loss_c[pos.view(-1, 1)] = 0 # filter out pos boxes for now
        loss_c = loss_c.view(num, -1)
        _, loss_idx = loss_c.sort(1, descending=True)
        _, idx_rank = loss_idx.sort(1)
        num_pos = pos.long().sum(1, keepdim=True)
        num_neg = torch.clamp(self.negpos_ratio*num_pos, max=pos.size(1)-1)
        neg = idx_rank < num_neg.expand_as(idx_rank)

        # Confidence Loss Including Positive and Negative Examples
        pos_idx = pos.unsqueeze(2).expand_as(conf_data)
        neg_idx = neg.unsqueeze(2).expand_as(conf_data)
        conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1,self.num_classes)
        targets_weighted = conf_t[(pos+neg).gt(0)]
        loss_c = F.cross_entropy(conf_p, targets_weighted, reduction='sum')

        # Sum of losses: L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N
        N = max(num_pos.data.sum().float(), 1)
        loss_l /= N
        loss_c /= N
        loss_landm /= N1

        return loss_l, loss_c, loss_landm

box_utils.py

import torch
import numpy as np


def point_form(boxes):...

def center_size(boxes):...
    
def intersect(box_a, box_b):...
    
def jaccard(box_a, box_b):...
   
def matrix_iou(a, b):...

def matrix_iof(a, b):...

def match(threshold, truths, priors, variances, labels, landms, loc_t, conf_t, landm_t, idx):
    """Match each prior box with the ground truth box of the highest jaccard
    overlap, encode the bounding boxes, then return the matched indices
    corresponding to both confidence and location preds.
    Args:
        threshold: (float) The overlap threshold used when matching boxes.
        truths: (tensor) Ground truth boxes, Shape: [num_obj, 4].
        priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
        variances: (tensor) Variances corresponding to each prior coord,
            Shape: [num_priors, 4].
        labels: (tensor) All the class labels for the image, Shape: [num_obj].
        landms: (tensor) Ground truth landms, Shape [num_obj, 10].
        loc_t: (tensor) Tensor to be filled w/ encoded location targets.
        conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
        landm_t: (tensor) Tensor to be filled w/ encoded landm targets.
        idx: (int) current batch index
    Return:
        The matched indices corresponding to 1)location 2)confidence 3)landm preds.
    """
    # jaccard index
    overlaps = jaccard(
        truths,
        point_form(priors)
    )
    # (Bipartite Matching)
    # [1,num_objects] best prior for each ground truth
    best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)

    # ignore hard gt
    valid_gt_idx = best_prior_overlap[:, 0] >= 0.2
    best_prior_idx_filter = best_prior_idx[valid_gt_idx, :]
    if best_prior_idx_filter.shape[0] <= 0:
        loc_t[idx] = 0
        conf_t[idx] = 0
        return

    # [1,num_priors] best ground truth for each prior
    best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
    best_truth_idx.squeeze_(0)
    best_truth_overlap.squeeze_(0)
    best_prior_idx.squeeze_(1)
    best_prior_idx_filter.squeeze_(1)
    best_prior_overlap.squeeze_(1)
    best_truth_overlap.index_fill_(0, best_prior_idx_filter, 2)  # ensure best prior
    # TODO refactor: index  best_prior_idx with long tensor
    # ensure every gt matches with its prior of max overlap
    for j in range(best_prior_idx.size(0)):     # assign ground truth j to its best-matching anchor
        best_truth_idx[best_prior_idx[j]] = j
    matches = truths[best_truth_idx]            # Shape: [num_priors,4]   the bbox matched to each anchor
    conf = labels[best_truth_idx]               # Shape: [num_priors]     the label matched to each anchor
    conf[best_truth_overlap < threshold] = 0    # label as background: overlap < threshold (0.35 here) becomes a negative sample
    loc = encode(matches, priors, variances)

    matches_landm = landms[best_truth_idx]
    landm = encode_landm(matches_landm, priors, variances)
    loc_t[idx] = loc    # [num_priors,4] encoded offsets to learn
    conf_t[idx] = conf  # [num_priors] top class label for each prior
    landm_t[idx] = landm


def encode(matched, priors, variances):
    """Encode the variances from the priorbox layers into the ground truth boxes
    we have matched (based on jaccard overlap) with the prior boxes.
    Args:
        matched: (tensor) Coords of ground truth for each prior in point-form
            Shape: [num_priors, 4].
        priors: (tensor) Prior boxes in center-offset form
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        encoded boxes (tensor), Shape: [num_priors, 4]
    """

    # dist b/t match center and prior's center
    g_cxcy = (matched[:, :2] + matched[:, 2:])/2 - priors[:, :2]
    # encode variance
    g_cxcy /= (variances[0] * priors[:, 2:])
    # match wh / prior wh
    g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
    g_wh = torch.log(g_wh) / variances[1]
    # return target for smooth_l1_loss
    return torch.cat([g_cxcy, g_wh], 1)  # [num_priors,4]

def encode_landm(matched, priors, variances):
    """Encode the variances from the priorbox layers into the ground truth boxes
    we have matched (based on jaccard overlap) with the prior boxes.
    Args:
        matched: (tensor) Coords of ground truth for each prior in point-form
            Shape: [num_priors, 10].
        priors: (tensor) Prior boxes in center-offset form
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        encoded landm (tensor), Shape: [num_priors, 10]
    """

    # dist b/t match center and prior's center
    matched = torch.reshape(matched, (matched.size(0), 5, 2))
    priors_cx = priors[:, 0].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2)
    priors_cy = priors[:, 1].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2)
    priors_w = priors[:, 2].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2)
    priors_h = priors[:, 3].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2)
    priors = torch.cat([priors_cx, priors_cy, priors_w, priors_h], dim=2)
    g_cxcy = matched[:, :, :2] - priors[:, :, :2]
    # encode variance
    g_cxcy /= (variances[0] * priors[:, :, 2:])
    # g_cxcy /= priors[:, :, 2:]
    g_cxcy = g_cxcy.reshape(g_cxcy.size(0), -1)
    # return target for smooth_l1_loss
    return g_cxcy


def decode(loc, priors, variances):...

def decode_landm(pre, priors, variances):...
   
def log_sum_exp(x):...
  
def nms(boxes, scores, overlap=0.5, top_k=200):...


# if __name__ == "__main__":
#     num = 2
#     num_priors = 4
#     threshold = 0.5
#     truths = torch.tensor([[0.5,0.5,1.5,1.5],[1.,1.,1.8,1.8]])
#     priors = torch.tensor([[0.8,0.8,1.2,1.2],[1.8,1.8,0.5,0.5],[1.5,1.5,0.8,0.8],[3.,3.,1.,1.]])
#     variances = [0.1, 0.2]
#     labels = torch.tensor([[1],[0],[1],[0]]).squeeze(1)
#     landms =torch.randint(0,4,(num,10))
#     landms = landms.float()
#     loc_t = torch.Tensor(num, num_priors, 4)
#     landm_t = torch.Tensor(num, num_priors, 10)
#     conf_t = torch.LongTensor(num, num_priors)
#     idx = 1
#     match(threshold, truths, priors, variances, labels, landms, loc_t, conf_t, landm_t, idx)
#     t = loc_t,conf_t,landm_t
#     print(t)

3. Closing

That wraps up this part. Below are some of the blog posts I referred to.

ps:
Reference 1
Reference 2
Reference 3
