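An earlier post of mine, 深度学习 损失函数综述 (Deep Learning: A Survey of Loss Functions), briefly introduced a number of loss functions, including some based on IoU. This post focuses on the IoU loss and its improved variants, with code for each.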
IoU Loss
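IoU is the intersection over union of the predicted box $Y_{pred}$ and the ground-truth box $Y_{true}$. It reflects how well the predicted box matches the ground truth, and it is scale-invariant. In regression tasks, IoU is the most direct indicator of the distance between the predicted box and the ground truth (gt): as a distance, $1 - IoU$ satisfies non-negativity, identity, symmetry, and the triangle inequality. Paper: UnitBox: An Advanced Object Detection Network.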
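As a quick check of the scale-invariance and symmetry properties mentioned above, the sketch below computes IoU for one pair of boxes, then again after multiplying every coordinate by 10, and once more with the arguments swapped. The helper name `iou_xyxy` and the `(xmin, ymin, xmax, ymax)` box layout are assumptions made for this illustration only.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Width and height of the intersection, clamped at zero for disjoint boxes.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


box_a = (0.0, 0.0, 2.0, 2.0)
box_b = (1.0, 1.0, 3.0, 3.0)
scaled_a = tuple(10 * v for v in box_a)
scaled_b = tuple(10 * v for v in box_b)

iou_small = iou_xyxy(box_a, box_b)        # intersection 1, union 7
iou_large = iou_xyxy(scaled_a, scaled_b)  # intersection 100, union 700
iou_swapped = iou_xyxy(box_b, box_a)      # same boxes, arguments swapped
print(iou_small, iou_large, iou_swapped)
```

All three values agree, since scaling multiplies the intersection and the union by the same factor.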
Code implementation
Reference code:
import numpy as np

def Iou(box1, box2, wh=False):
    # wh=False: boxes are (xmin, ymin, xmax, ymax)
    # wh=True:  boxes are (center_x, center_y, width, height)
    if not wh:
        xmin1, ymin1, xmax1, ymax1 = box1
        xmin2, ymin2, xmax2, ymax2 = box2
    else:
        # Convert center/size to corners; kept as floats so normalized coordinates are not truncated
        xmin1, ymin1 = box1[0]-box1[2]/2.0, box1[1]-box1[3]/2.0
        xmax1, ymax1 = box1[0]+box1[2]/2.0, box1[1]+box1[3]/2.0
        xmin2, ymin2 = box2[0]-box2[2]/2.0, box2[1]-box2[3]/2.0
        xmax2, ymax2 = box2[0]+box2[2]/2.0, box2[1]+box2[3]/2.0
    # Top-left and bottom-right corners of the intersection rectangle
    xx1 = np.max([xmin1, xmin2])
    yy1 = np.max([ymin1, ymin2])
    xx2 = np.min([xmax1, xmax2])
    yy2 = np.min([ymax1, ymax2])
    # Areas of the two boxes
    area1 = (xmax1-xmin1) * (ymax1-ymin1)
    area2 = (xmax2-xmin2) * (ymax2-ymin2)
    inter_area = (np.max([0, xx2-xx1])) * (np.max([0, yy2-yy1]))  # intersection area (0 if disjoint)
    iou = inter_area / (area1+area2-inter_area+1e-6)  # intersection over union
    return iou
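from tensorflow.keras import backend

def Iou_score(smooth=1e-5, threshold=0.5):
    def _Iou_score(y_true, y_pred):
        # Binarize the predictions at the threshold
        y_pred = backend.greater(y_pred, threshold)
        y_pred = backend.cast(y_pred, backend.floatx())
        # y_true[..., :-1] drops the last label channel (y_pred is assumed to have one channel fewer)
        intersection = backend.sum(y_true[..., :-1] * y_pred, axis=[0, 1, 2])
        union = backend.sum(y_true[..., :-1] + y_pred, axis=[0, 1, 2]) - intersection
        # smooth avoids division by zero when a class is absent
        score = (intersection + smooth) / (union + smooth)
        return score
    return _Iou_score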
GIoU
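As a loss function, IoU has the following drawbacks:
- If the predicted box and the ground-truth box do not intersect, IoU = 0, which says nothing about how far apart the two boxes are. The loss is then constant, so its gradient is zero during backpropagation and the network cannot learn.
- IoU does not precisely reflect how the two boxes overlap: different relative placements can produce the same IoU, so the loss value does not directly show the quality of the regression.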
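The first drawback can be seen numerically. In the sketch below (the helper name and the `(xmin, ymin, xmax, ymax)` layout are assumptions for illustration), a predicted box that does not overlap the ground truth gets IoU 0 whether it is close or far away, and nudging it slightly toward the target does not change the value, which is what a zero gradient looks like in finite differences.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


gt = (0.0, 0.0, 1.0, 1.0)
near = (2.0, 0.0, 3.0, 1.0)    # one unit away from the ground truth
far = (50.0, 0.0, 51.0, 1.0)   # far away from the ground truth

iou_near = iou_xyxy(near, gt)
iou_far = iou_xyxy(far, gt)

# Finite-difference check: shift the near box 0.01 toward the ground truth.
step = 0.01
nudged = (near[0] - step, near[1], near[2] - step, near[3])
finite_diff = (iou_xyxy(nudged, gt) - iou_near) / step
print(iou_near, iou_far, finite_diff)
```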
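To address this, the paper Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression proposes GIoU. As shown in the figure below, C is the area of the smallest enclosing region, that is, the smallest box that contains both the predicted box and the ground-truth box. GIoU subtracts from IoU the fraction of C that is covered by neither box: $GIoU = IoU - \frac{|C \setminus (A \cup B)|}{|C|}$, where $A$ and $B$ are the two boxes.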
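GIoU has the following properties:
- Like IoU, GIoU can serve as a distance measure. Used as a loss, $L_{GIoU} = 1 - GIoU$, which meets the basic requirements of a loss function.
- GIoU is insensitive to scale.
- GIoU is a lower bound of IoU. As the two boxes approach perfect overlap, IoU = GIoU = 1.
- IoU takes values in [0, 1], whereas GIoU has a symmetric range of [-1, 1]. It reaches its maximum of 1 when the two boxes coincide and approaches its minimum of -1 when they are disjoint and infinitely far apart, which makes GIoU a very good distance measure.
- Whereas IoU looks only at the overlapping region, GIoU also takes the non-overlapping regions into account, so it better reflects how the two boxes overlap.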
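The range and lower-bound properties listed above can be checked on a few hand-picked boxes. This is a minimal sketch, not the paper's reference code; the helper name `giou_xyxy` and the `(xmin, ymin, xmax, ymax)` layout are assumptions for illustration.

```python
def giou_xyxy(a, b):
    """Return (iou, giou) for two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Area of the smallest axis-aligned box enclosing both inputs (C in the text).
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    iou = inter / union
    return iou, iou - (enclose - union) / enclose


gt = (0.0, 0.0, 1.0, 1.0)
cases = {
    "identical": giou_xyxy(gt, gt),
    "partial overlap": giou_xyxy((0.5, 0.5, 1.5, 1.5), gt),
    "disjoint, near": giou_xyxy((2.0, 0.0, 3.0, 1.0), gt),
    "disjoint, far": giou_xyxy((1000.0, 0.0, 1001.0, 1.0), gt),
}
for name, (iou, giou) in cases.items():
    # The last column is the loss L_GIoU = 1 - GIoU.
    print(f"{name:16s} IoU={iou:.4f} GIoU={giou:.4f} loss={1.0 - giou:.4f}")
```

Unlike IoU, the two disjoint cases get different GIoU values, so the loss still distinguishes a near miss from a distant one.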
Code implementation
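def Giou(rec1, rec2):
    # Each rectangle is (left, right, top, bottom), with top > bottom (y axis pointing up)
    x1, x2, y1, y2 = rec1
    x3, x4, y3, y4 = rec2
    # Area of the smallest box enclosing both rectangles
    area_C = (max(x1,x2,x3,x4)-min(x1,x2,x3,x4))*(max(y1,y2,y3,y4)-min(y1,y2,y3,y4))
    area_1 = (x2-x1)*(y1-y2)
    area_2 = (x4-x3)*(y3-y4)
    sum_area = area_1 + area_2
    w1 = x2 - x1  # width of the first rectangle
    w2 = x4 - x3  # width of the second rectangle
    h1 = y1 - y2  # height of the first rectangle
    h2 = y3 - y4  # height of the second rectangle
    # Width and height of the overlap, clamped at 0 so disjoint boxes give no intersection
    W = max(0, min(x1,x2,x3,x4)+w1+w2-max(x1,x2,x3,x4))
    H = max(0, min(y1,y2,y3,y4)+h1+h2-max(y1,y2,y3,y4))
    Area = W*H  # intersection area
    add_area = sum_area - Area  # union area of the two rectangles
    # IoU is computed here directly; Iou() above expects (xmin, ymin, xmax, ymax) instead
    iou = Area / add_area
    end_area = (area_C - add_area)/area_C  # share of the enclosing box covered by neither rectangle
    giou = iou - end_area
    return giou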
DIoU (Distance-IoU)
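DIoU takes the distance between the target and the anchor, the overlap ratio, and the scale all into account, which makes bounding-box regression more stable. See the paper: Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression.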
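It is computed as:
$$DIoU = IoU - \frac{\rho^2(b, b^{gt})}{c^2}$$
where $b$ and $b^{gt}$ are the center points of the predicted box and the ground-truth box, $\rho$ is the Euclidean distance between the two centers, and $c$ is the diagonal length of the smallest enclosing region that contains both the predicted box and the ground-truth box.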
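To make the formula concrete, here is a small worked example in plain Python. It is a sketch under the assumption that boxes are given as `(xmin, ymin, xmax, ymax)`; the helper name `diou_xyxy` is hypothetical.

```python
def diou_xyxy(a, b):
    """DIoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union
    # rho^2: squared Euclidean distance between the two box centers.
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
         + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # c^2: squared diagonal of the smallest box enclosing both inputs.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
       + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - rho2 / c2


pred = (0.0, 0.0, 2.0, 2.0)
gt = (1.0, 1.0, 3.0, 3.0)
# IoU = 1/7, rho^2 = 2, c^2 = 18, so DIoU = 1/7 - 2/18.
diou_overlap = diou_xyxy(pred, gt)

# For disjoint boxes IoU is 0, but the distance term still separates near from far.
unit = (0.0, 0.0, 1.0, 1.0)
diou_near = diou_xyxy((2.0, 0.0, 3.0, 1.0), unit)
diou_far = diou_xyxy((50.0, 0.0, 51.0, 1.0), unit)
print(diou_overlap, diou_near, diou_far)
```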
Code implementation
import torch

def Diou(bboxes1, bboxes2, eps=1e-7):
    # bboxes1, bboxes2: tensors of shape (N, 4) holding matched pairs of boxes,
    # in (xmin, ymin, xmax, ymax) order -> [:, 0], [:, 1], [:, 2], [:, 3].
    # The computation is element-wise, so the result has shape (N,).
    if bboxes1.shape[0] == 0 or bboxes2.shape[0] == 0:
        return torch.zeros((0,))
    w1 = bboxes1[:, 2] - bboxes1[:, 0]
    h1 = bboxes1[:, 3] - bboxes1[:, 1]
    w2 = bboxes2[:, 2] - bboxes2[:, 0]
    h2 = bboxes2[:, 3] - bboxes2[:, 1]
    area1 = w1 * h1
    area2 = w2 * h2
    center_x1 = (bboxes1[:, 2] + bboxes1[:, 0]) / 2
    center_y1 = (bboxes1[:, 3] + bboxes1[:, 1]) / 2
    center_x2 = (bboxes2[:, 2] + bboxes2[:, 0]) / 2
    center_y2 = (bboxes2[:, 3] + bboxes2[:, 1]) / 2
    # Corners of the intersection and of the smallest enclosing box
    inter_max_xy = torch.min(bboxes1[:, 2:], bboxes2[:, 2:])
    inter_min_xy = torch.max(bboxes1[:, :2], bboxes2[:, :2])
    out_max_xy = torch.max(bboxes1[:, 2:], bboxes2[:, 2:])
    out_min_xy = torch.min(bboxes1[:, :2], bboxes2[:, :2])
    inter = torch.clamp((inter_max_xy - inter_min_xy), min=0)
    inter_area = inter[:, 0] * inter[:, 1]
    # Squared distance between the two centers (rho^2)
    inter_diag = (center_x2 - center_x1)**2 + (center_y2 - center_y1)**2
    outer = torch.clamp((out_max_xy - out_min_xy), min=0)
    # Squared diagonal of the enclosing box (c^2)
    outer_diag = (outer[:, 0] ** 2) + (outer[:, 1] ** 2)
    union = area1 + area2 - inter_area
    # eps guards against division by zero for degenerate boxes
    dious = inter_area / (union + eps) - inter_diag / (outer_diag + eps)
    dious = torch.clamp(dious, min=-1.0, max=1.0)
    return dious
CIoU
CIoU builds on DIoU by also taking the aspect ratio into account, which speeds up convergence. The CIoU loss is $L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$, where $\alpha$ is a weighting function, $\alpha = \frac{v}{(1 - IoU) + v}$, and $v$ measures the similarity of the aspect ratios, defined as $v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$.
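The individual terms of the loss are easier to follow on a single pair of boxes. The sketch below evaluates them in plain Python; the helper name `ciou_terms`, the `(xmin, ymin, xmax, ymax)` layout, and the guard that sets $\alpha = 0$ when $v = 0$ are assumptions made for this illustration.

```python
import math


def ciou_terms(pred, gt):
    """Return (iou, distance_term, v, alpha, loss) for (xmin, ymin, xmax, ymax) boxes."""
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    iou = inter / (w * h + w_gt * h_gt - inter)
    # rho^2 / c^2: squared center distance over squared enclosing-box diagonal.
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    c2 = (max(pred[2], gt[2]) - min(pred[0], gt[0])) ** 2 \
       + (max(pred[3], gt[3]) - min(pred[1], gt[1])) ** 2
    distance_term = rho2 / c2
    # v: aspect-ratio mismatch; alpha: its weight.
    v = 4.0 / math.pi ** 2 * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    alpha = v / ((1.0 - iou) + v) if v > 0 else 0.0  # avoid 0/0 for identical boxes
    loss = 1.0 - iou + distance_term + alpha * v
    return iou, distance_term, v, alpha, loss


# A 4x2 prediction against a 2x2 ground truth: IoU = 0.5 and rho^2/c^2 = 1/20.
iou, distance_term, v, alpha, loss = ciou_terms((0.0, 0.0, 4.0, 2.0), (0.0, 0.0, 2.0, 2.0))
print(iou, distance_term, v, alpha, loss)

# Identical boxes give zero loss.
perfect_loss = ciou_terms((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))[4]
print(perfect_loss)
```

Because the two boxes have different aspect ratios, $v > 0$ and the loss is slightly larger than the DIoU loss for the same pair.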
Code implementation
import math
import tensorflow as tf
from tensorflow.keras import backend as K

def box_ciou(b1, b2):
    """
    Inputs:
    ----------
    b1: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh
    b2: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh
    Returns:
    -------
    ciou: tensor, shape=(batch, feat_w, feat_h, anchor_num, 1)
    """
    # Top-left and bottom-right corners of the predicted boxes
    b1_xy = b1[..., :2]
    b1_wh = b1[..., 2:4]
    b1_wh_half = b1_wh/2.
    b1_mins = b1_xy - b1_wh_half
    b1_maxes = b1_xy + b1_wh_half
    # Top-left and bottom-right corners of the ground-truth boxes
    b2_xy = b2[..., :2]
    b2_wh = b2[..., 2:4]
    b2_wh_half = b2_wh/2.
    b2_mins = b2_xy - b2_wh_half
    b2_maxes = b2_xy + b2_wh_half
    # IoU between each ground-truth box and predicted box
    intersect_mins = K.maximum(b1_mins, b2_mins)
    intersect_maxes = K.minimum(b1_maxes, b2_maxes)
    intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.)
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    b1_area = b1_wh[..., 0] * b1_wh[..., 1]
    b2_area = b2_wh[..., 0] * b2_wh[..., 1]
    union_area = b1_area + b2_area - intersect_area
    iou = intersect_area / K.maximum(union_area, K.epsilon())
    # Squared distance between the box centers (rho^2)
    center_distance = K.sum(K.square(b1_xy - b2_xy), axis=-1)
    # Top-left and bottom-right corners of the smallest box enclosing both boxes
    enclose_mins = K.minimum(b1_mins, b2_mins)
    enclose_maxes = K.maximum(b1_maxes, b2_maxes)
    enclose_wh = K.maximum(enclose_maxes - enclose_mins, 0.0)
    # Squared diagonal of the enclosing box (c^2)
    enclose_diagonal = K.sum(K.square(enclose_wh), axis=-1)
    # DIoU part: IoU minus the normalized center distance
    ciou = iou - 1.0 * (center_distance) / K.maximum(enclose_diagonal, K.epsilon())
    # Aspect-ratio term v; atan2(w, h) equals arctan(w / h) for positive h
    atan_b1 = tf.math.atan2(b1_wh[..., 0], K.maximum(b1_wh[..., 1], K.epsilon()))
    atan_b2 = tf.math.atan2(b2_wh[..., 0], K.maximum(b2_wh[..., 1], K.epsilon()))
    v = 4*K.square(atan_b1 - atan_b2) / (math.pi * math.pi)
    alpha = v / K.maximum((1.0 - iou + v), K.epsilon())
    # This is CIoU itself; the loss in the formula above is 1 - ciou
    ciou = ciou - alpha * v
    ciou = K.expand_dims(ciou, -1)
    # Replace any NaN produced by degenerate boxes with 0
    ciou = tf.where(tf.math.is_nan(ciou), tf.zeros_like(ciou), ciou)
    return ciou