1. Introduction
This post mainly documents the loss-function part of RetinaFace.
The code is available here:
RetinaFace code repository
The main scripts involved are:
multibox_loss.py
box_utils.py
You are also welcome to read the previous post, Retinaface Code Notes (Part 1), which should give you an overall grasp of this one.
2. Main Content
We start with multibox_loss.py (listed below). Its loss function is essentially the same as SSD's, with a landmark loss added on top. First, a series of parameters is initialized; the loss itself is then computed in forward(). There, we take the predicted loc, conf, and landm outputs, and initialize three target tensors, loc_t, conf_t, and landm_t, to be filled in by the match() function (match() lives in box_utils.py, also listed below).
Now for the match function. overlaps stores the IoU between the ground truths and the default boxes; its shape is [truths.shape[0], priors.shape[0]], meaning each row holds the IoU of one truth against all priors. The two max calls that follow do different jobs: one finds the best default box for each truth, the other the best truth for each default box. squeeze removes any dimension of size 1; see Eg1 in the appendix (below). index_fill_ then raises the overlap of each truth's best prior, so that these matches cannot be filtered out later for having an IoU below the threshold; its usage is shown in Eg2. Further down, a for loop guarantees that every ground truth is matched to a prior box. encode produces the regression targets we ultimately need, using the formulas in Eg3 (encode_landm is analogous). Finally, the resulting targets are written into loc_t, conf_t, and landm_t, which effectively serve as match's return values. (If this is still unclear, there is a test case for this function at the bottom of the listing; drop it into box_utils.py and step through it with a debugger.)
Back in multibox_loss.py, pos1 selects the priors whose matched label in conf_t is greater than 0 (the positive samples), and these are used to compute the landmark loss with the Smooth L1 function. The loc loss is computed the same way, using the SSD localization formulas from Eg3.
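As a quick reminder (not from the original post), Smooth L1 is quadratic near zero and linear beyond; a minimal sketch, checked against F.smooth_l1_loss with PyTorch's default beta of 1:

import torch
import torch.nn.functional as F

def smooth_l1(diff):
    # 0.5 * x^2 if |x| < 1, else |x| - 0.5
    abs_diff = diff.abs()
    return torch.where(abs_diff < 1, 0.5 * abs_diff ** 2, abs_diff - 0.5)

p = torch.tensor([0.5, 2.0, -3.0])  # toy predictions
t = torch.zeros(3)                  # toy targets
print(smooth_l1(p - t).sum())                   # tensor(4.1250)
print(F.smooth_l1_loss(p, t, reduction='sum'))  # tensor(4.1250)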
conf_data has shape [batch, num_priors, num_classes], while cross_entropy expects a 2-D input of shape [N, C], so we reshape it into batch_conf with shape [batch*num_priors, num_classes]. The per-prior classification loss is

loss(x, class) = -log( exp(x[class]) / Σ_j exp(x[j]) ) = -x[class] + log( Σ_j exp(x[j]) )

gather is what picks out the x[class] term; its meaning is shown in Eg4. Next, the losses of the positive samples are set to 0, and two consecutive sorts (see Eg5) convert the losses into ranks, where a larger loss gets a smaller rank. The top self.negpos_ratio * num_pos entries are then kept as negative samples for the loss: num_neg holds the per-image quota (clamped to at most pos.size(1)-1), and neg is the resulting boolean mask over the priors.
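The log(Σ_j exp(x[j])) term comes from the log_sum_exp helper, which is stubbed out in the box_utils.py listing below. A minimal sketch of the usual numerically stable, max-shifted implementation found in ssd.pytorch-style code (verify against the repo's own version):

import torch

def log_sum_exp(x):
    # log(sum_j exp(x_j)) = x_max + log(sum_j exp(x_j - x_max));
    # shifting by the max avoids overflow in exp()
    x_max = x.data.max()
    return torch.log(torch.sum(torch.exp(x - x_max), 1, keepdim=True)) + x_max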
Further down, the confidence loss over the positives plus the mined negatives is computed much like the loc and landm losses above, except with the cross-entropy formula.
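To confirm that F.cross_entropy really computes the formula above, here is a quick sanity check with made-up logits:

import torch
import torch.nn.functional as F

x = torch.tensor([[0.2, 1.5]])  # one prior, two classes (made-up logits)
cls = torch.tensor([1])         # its matched label
print(F.cross_entropy(x, cls))                  # tensor(0.2410)
print(-x[0, 1] + torch.logsumexp(x[0], dim=0))  # tensor(0.2410)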
That wraps up MultiBoxLoss. The hard parts of this code are the selection of positive and negative samples, and the construction of the final targets.
Appendix:
Eg1:
In[1]: a
Out[1]:tensor([[4, 5, 3, 7, 7],
[8, 4, 5, 2, 6]])
In[2]: value, idx = a.max(1, keepdim=True)
In[3]: value, idx
Out[3]: (tensor([[7],
                 [8]]),
         tensor([[4],
                 [0]]))
In[4]: value.shape, idx.shape
Out[4]: (torch.Size([2, 1]), torch.Size([2, 1]))
In[5]: value.squeeze(1), idx.squeeze(1)
Out[5]: (tensor([7, 8]), tensor([4, 0]))
#squeeze removes dimensions of size 1. In match, best_truth_idx and
#best_truth_overlap have size 1 in dim 0, while best_prior_idx and best_prior_overlap have size 1 in dim 1.
#That is exactly what squeeze is used for there.
Eg2:
index_fill_(dim, index, val) → Tensor
In [1]: a=torch.arange(0,16).view(4,4)
In [2]: a
Out[2]:
tensor([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [3]: index = torch.tensor([0,1,2])
In [4]: a.index_fill(0, index, 100)
Out[4]:
tensor([[100, 100, 100, 100],
[100, 100, 100, 100],
[100, 100, 100, 100],
[ 12, 13, 14, 15]])
In [5]: a.index_fill(1, index, 100)
Out[5]:
tensor([[100, 100, 100, 3],
[100, 100, 100, 7],
[100, 100, 100, 11],
[100, 100, 100, 15]])
#With dim==0, index lists the row numbers to fill (whole rows are set); with dim==1 it lists the column numbers. So pay attention to index's format: it must let the call locate the positions. val is the value used for filling. (index_fill_ is the in-place variant used in match.)
Eg4:
torch.gather(input, dim, index, out=None) → Tensor
In[1]: b = torch.Tensor([[1,2,3],[4,5,6]])
In[2]: b
Out[2]: tensor([[1., 2., 3.],
                [4., 5., 6.]])
In[3]: index1 = torch.LongTensor([[0,1],[2,0]])
In[4]: index2 = torch.LongTensor([[0,1,1],[0,0,0]])
In[5]: index1, index2
Out[5]: (tensor([[0, 1],
                 [2, 0]]),
         tensor([[0, 1, 1],
                 [0, 0, 0]]))
In[6]: torch.gather(b, dim=1, index=index1)
Out[6]: tensor([[1., 2.],
                [6., 4.]])
In[7]: torch.gather(b, dim=0, index=index2)
Out[7]: tensor([[1., 5., 6.],
                [1., 2., 3.]])
This is quite similar to index_fill above, except that here you are given the positions and the corresponding values are picked out of input: for dim=1, out[i][j] = input[i][index[i][j]].
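This is exactly how the loss code uses it: batch_conf.gather(1, conf_t.view(-1, 1)) selects, for each prior, the logit of its matched class, i.e., the x[class] term. A tiny illustration with made-up numbers:

import torch

batch_conf = torch.tensor([[0.2, 1.5],
                           [2.0, 0.1]])  # [num_priors, num_classes]
conf_t = torch.tensor([1, 0])            # matched class of each prior
print(batch_conf.gather(1, conf_t.view(-1, 1)))  # -> [[1.5], [2.0]]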
Eg5:
b = torch.randint(low=1, high=10, size=(2,5))
b
Out[1]:
tensor([[4, 9, 7, 8, 5],
[3, 5, 9, 1, 4]])
# First sort: loss_idx holds the indices that would sort each row in descending order
_, loss_idx = b.sort(dim=1, descending=True)
loss_idx
Out[2]:
tensor([[1, 3, 2, 4, 0],
[2, 1, 4, 0, 3]])
# Second sort: turns indices into ranks. For each element of the original tensor, idx_rank gives its position in the descending order along dim
_, idx_rank = loss_idx.sort(dim=1)
idx_rank
Out[3]:
tensor([[4, 0, 2, 1, 3],
[3, 1, 0, 4, 2]])
# Concretely, element 9 in the first row of the original tensor is the largest in its
# row (looking along dim=1), so its rank is 0; element 4 is the smallest in that row,
# so its rank is 4. Ranks are 0-based, and descending because of the first sort.
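Continuing this example the way the loss code does: once idx_rank is available, negatives are chosen by comparing ranks against a per-row quota (the num_neg values here are made up):

num_neg = torch.tensor([[2], [3]])  # hypothetical per-row negative quota
neg = idx_rank < num_neg.expand_as(idx_rank)
# neg:
# tensor([[False,  True, False,  True, False],
#         [False,  True,  True, False,  True]])
# row 0 keeps its 2 largest entries (9 and 8); row 1 its 3 largest (9, 5 and 4)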
Eg3: (compared with the formulas in the SSD paper, the code adds the balancing parameters variances)
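The original figure is not reproduced here, but the encoding can be read directly off encode() below. For a matched ground-truth box g and a prior d, both in center-size form, with variance = [0.1, 0.2]:

g_cx_hat = (g_cx - d_cx) / (d_w * variance[0])
g_cy_hat = (g_cy - d_cy) / (d_h * variance[0])
g_w_hat  = log(g_w / d_w) / variance[1]
g_h_hat  = log(g_h / d_h) / variance[1]

The SSD paper's version is the same with both variances set to 1.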
multibox_loss.py:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from utils.box_utils import match, log_sum_exp
from data import cfg_mnet
GPU = cfg_mnet['gpu_train']
class MultiBoxLoss(nn.Module):
"""SSD Weighted Loss Function
Compute Targets:
1) Produce Confidence Target Indices by matching ground truth boxes
with (default) 'priorboxes' that have jaccard index > threshold parameter
(default threshold: 0.5).
2) Produce localization target by 'encoding' variance into offsets of ground
truth boxes and their matched 'priorboxes'.
3) Hard negative mining to filter the excessive number of negative examples
that comes with using a large number of default bounding boxes.
(default negative:positive ratio 3:1)
Objective Loss:
L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N
Where, Lconf is the CrossEntropy Loss and Lloc is the SmoothL1 Loss
weighted by α which is set to 1 by cross val.
Args:
c: class confidences,
l: predicted boxes,
g: ground truth boxes
N: number of matched default boxes
See: https://arxiv.org/pdf/1512.02325.pdf for more details.
"""
def __init__(self, num_classes, overlap_thresh, prior_for_matching, bkg_label, neg_mining, neg_pos, neg_overlap, encode_target):
super(MultiBoxLoss, self).__init__()
self.num_classes = num_classes
self.threshold = overlap_thresh
self.background_label = bkg_label
self.encode_target = encode_target
self.use_prior_for_matching = prior_for_matching
self.do_neg_mining = neg_mining
self.negpos_ratio = neg_pos
self.neg_overlap = neg_overlap
self.variance = [0.1, 0.2]
def forward(self, predictions, priors, targets):
"""Multibox Loss
Args:
predictions (tuple): A tuple containing loc preds, conf preds,
and prior boxes from SSD net.
conf shape: torch.size(batch_size,num_priors,num_classes)
loc shape: torch.size(batch_size,num_priors,4)
priors shape: torch.size(num_priors,4)
ground_truth (tensor): Ground truth boxes and labels for a batch,
shape: [batch_size,num_objs,5] (last idx is the label).
"""
loc_data, conf_data, landm_data = predictions
priors = priors
num = loc_data.size(0)
num_priors = (priors.size(0))
# match priors (default boxes) and ground truth boxes
loc_t = torch.Tensor(num, num_priors, 4)
landm_t = torch.Tensor(num, num_priors, 10)
conf_t = torch.LongTensor(num, num_priors)
for idx in range(num):
truths = targets[idx][:, :4].data
labels = targets[idx][:, -1].data
landms = targets[idx][:, 4:14].data
defaults = priors.data
match(self.threshold, truths, defaults, self.variance, labels, landms, loc_t, conf_t, landm_t, idx)
if GPU:
loc_t = loc_t.cuda()
conf_t = conf_t.cuda()
landm_t = landm_t.cuda()
zeros = torch.tensor(0).cuda()
# landm Loss (Smooth L1)
# Shape: [batch,num_priors,10]
        # pos1 has the same shape as conf_t: True (1) where conf_t > 0, False (0) otherwise
pos1 = conf_t > zeros
num_pos_landm = pos1.long().sum(1, keepdim=True)
N1 = max(num_pos_landm.data.sum().float(), 1)
pos_idx1 = pos1.unsqueeze(pos1.dim()).expand_as(landm_data)
landm_p = landm_data[pos_idx1].view(-1, 10)
landm_t = landm_t[pos_idx1].view(-1, 10)
loss_landm = F.smooth_l1_loss(landm_p, landm_t, reduction='sum')
pos = conf_t != zeros
conf_t[pos] = 1
# Localization Loss (Smooth L1)
# Shape: [batch,num_priors,4]
pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data)
loc_p = loc_data[pos_idx].view(-1, 4)
loc_t = loc_t[pos_idx].view(-1, 4)
loss_l = F.smooth_l1_loss(loc_p, loc_t, reduction='sum')
# Compute max conf across batch for hard negative mining
batch_conf = conf_data.view(-1, self.num_classes)
loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))
# Hard Negative Mining
loss_c[pos.view(-1, 1)] = 0 # filter out pos boxes for now
loss_c = loss_c.view(num, -1)
_, loss_idx = loss_c.sort(1, descending=True)
_, idx_rank = loss_idx.sort(1)
num_pos = pos.long().sum(1, keepdim=True)
num_neg = torch.clamp(self.negpos_ratio*num_pos, max=pos.size(1)-1)
neg = idx_rank < num_neg.expand_as(idx_rank)
# Confidence Loss Including Positive and Negative Examples
pos_idx = pos.unsqueeze(2).expand_as(conf_data)
neg_idx = neg.unsqueeze(2).expand_as(conf_data)
conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1,self.num_classes)
targets_weighted = conf_t[(pos+neg).gt(0)]
loss_c = F.cross_entropy(conf_p, targets_weighted, reduction='sum')
# Sum of losses: L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N
N = max(num_pos.data.sum().float(), 1)
loss_l /= N
loss_c /= N
loss_landm /= N1
return loss_l, loss_c, loss_landm
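For context, here is a minimal sketch of how this loss is typically constructed and combined during training. The argument values mirror the repo's train.py as I remember it (neg_pos=7, thresholds 0.35, loc_weight=2.0 in cfg_mnet); treat them as assumptions and verify against the source:

# hypothetical training-side usage
criterion = MultiBoxLoss(num_classes=2, overlap_thresh=0.35, prior_for_matching=True,
                         bkg_label=0, neg_mining=True, neg_pos=7,
                         neg_overlap=0.35, encode_target=False)
out = net(images)  # (loc, conf, landm) predictions
loss_l, loss_c, loss_landm = criterion(out, priors, targets)
loss = cfg['loc_weight'] * loss_l + loss_c + loss_landm
loss.backward()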
box_utils.py:
import torch
import numpy as np
def point_form(boxes):...
def center_size(boxes):...
def intersect(box_a, box_b):...
def jaccard(box_a, box_b):...
def matrix_iou(a, b):...
def matrix_iof(a, b):...
def match(threshold, truths, priors, variances, labels, landms, loc_t, conf_t, landm_t, idx):
"""Match each prior box with the ground truth box of the highest jaccard
overlap, encode the bounding boxes, then return the matched indices
corresponding to both confidence and location preds.
Args:
        threshold: (float) The overlap threshold used when matching boxes.
truths: (tensor) Ground truth boxes, Shape: [num_obj, 4].
priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
variances: (tensor) Variances corresponding to each prior coord,
Shape: [num_priors, 4].
labels: (tensor) All the class labels for the image, Shape: [num_obj].
landms: (tensor) Ground truth landms, Shape [num_obj, 10].
        loc_t: (tensor) Tensor to be filled w/ encoded location targets.
        conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
        landm_t: (tensor) Tensor to be filled w/ encoded landm targets.
idx: (int) current batch index
Return:
The matched indices corresponding to 1)location 2)confidence 3)landm preds.
"""
# jaccard index
overlaps = jaccard(
truths,
point_form(priors)
)
# (Bipartite Matching)
# [1,num_objects] best prior for each ground truth
best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
# ignore hard gt
valid_gt_idx = best_prior_overlap[:, 0] >= 0.2
best_prior_idx_filter = best_prior_idx[valid_gt_idx, :]
if best_prior_idx_filter.shape[0] <= 0:
loc_t[idx] = 0
conf_t[idx] = 0
return
# [1,num_priors] best ground truth for each prior
best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
best_truth_idx.squeeze_(0)
best_truth_overlap.squeeze_(0)
best_prior_idx.squeeze_(1)
best_prior_idx_filter.squeeze_(1)
best_prior_overlap.squeeze_(1)
best_truth_overlap.index_fill_(0, best_prior_idx_filter, 2) # ensure best prior
# TODO refactor: index best_prior_idx with long tensor
# ensure every gt matches with its prior of max overlap
    for j in range(best_prior_idx.size(0)):  # decide which ground-truth box this anchor predicts
best_truth_idx[best_prior_idx[j]] = j
    matches = truths[best_truth_idx]  # Shape: [num_priors,4], the bbox matched to each anchor
    conf = labels[best_truth_idx]  # Shape: [num_priors], the label matched to each anchor
    conf[best_truth_overlap < threshold] = 0  # label as background: everything with overlap < threshold (0.35 here) becomes a negative sample
loc = encode(matches, priors, variances)
matches_landm = landms[best_truth_idx]
landm = encode_landm(matches_landm, priors, variances)
loc_t[idx] = loc # [num_priors,4] encoded offsets to learn
conf_t[idx] = conf # [num_priors] top class label for each prior
landm_t[idx] = landm
def encode(matched, priors, variances):
"""Encode the variances from the priorbox layers into the ground truth boxes
we have matched (based on jaccard overlap) with the prior boxes.
Args:
matched: (tensor) Coords of ground truth for each prior in point-form
Shape: [num_priors, 4].
priors: (tensor) Prior boxes in center-offset form
Shape: [num_priors,4].
variances: (list[float]) Variances of priorboxes
Return:
encoded boxes (tensor), Shape: [num_priors, 4]
"""
# dist b/t match center and prior's center
g_cxcy = (matched[:, :2] + matched[:, 2:])/2 - priors[:, :2]
# encode variance
g_cxcy /= (variances[0] * priors[:, 2:])
# match wh / prior wh
g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
g_wh = torch.log(g_wh) / variances[1]
# return target for smooth_l1_loss
return torch.cat([g_cxcy, g_wh], 1) # [num_priors,4]
def encode_landm(matched, priors, variances):
"""Encode the variances from the priorbox layers into the ground truth boxes
we have matched (based on jaccard overlap) with the prior boxes.
Args:
matched: (tensor) Coords of ground truth for each prior in point-form
Shape: [num_priors, 10].
priors: (tensor) Prior boxes in center-offset form
Shape: [num_priors,4].
variances: (list[float]) Variances of priorboxes
Return:
encoded landm (tensor), Shape: [num_priors, 10]
"""
# dist b/t match center and prior's center
matched = torch.reshape(matched, (matched.size(0), 5, 2))
priors_cx = priors[:, 0].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2)
priors_cy = priors[:, 1].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2)
priors_w = priors[:, 2].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2)
priors_h = priors[:, 3].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2)
priors = torch.cat([priors_cx, priors_cy, priors_w, priors_h], dim=2)
g_cxcy = matched[:, :, :2] - priors[:, :, :2]
# encode variance
g_cxcy /= (variances[0] * priors[:, :, 2:])
# g_cxcy /= priors[:, :, 2:]
g_cxcy = g_cxcy.reshape(g_cxcy.size(0), -1)
# return target for smooth_l1_loss
return g_cxcy
def decode(loc, priors, variances):...
def decode_landm(pre, priors, variances):...
def log_sum_exp(x):...
def nms(boxes, scores, overlap=0.5, top_k=200):...
# if __name__ == "__main__":
# num = 2
# num_priors = 4
# threshold = 0.5
# truths = torch.tensor([[0.5,0.5,1.5,1.5],[1.,1.,1.8,1.8]])
# priors = torch.tensor([[0.8,0.8,1.2,1.2],[1.8,1.8,0.5,0.5],[1.5,1.5,0.8,0.8],[3.,3.,1.,1.]])
# variances = [0.1, 0.2]
# labels = torch.tensor([1, 1])  # one label per ground-truth box, Shape: [num_obj]=2 to match the two truths
# landms =torch.randint(0,4,(num,10))
# landms = landms.float()
# loc_t = torch.Tensor(num, num_priors, 4)
# landm_t = torch.Tensor(num, num_priors, 10)
# conf_t = torch.LongTensor(num, num_priors)
# idx = 1
# match(threshold, truths, priors, variances, labels, landms, loc_t, conf_t, landm_t, idx)
# t = loc_t,conf_t,landm_t
# print(t)
3. Closing
That is all for this walkthrough; below are some blogs I referenced.