1. Background
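- Existing IoU-based bounding box regression methods mainly speed up convergence by adding new loss terms. They ignore the limitations of the IoU loss term itself and cannot adapt to different detectors and detection tasks, so they generalize poorly.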
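- By analyzing the bounding box regression process, the Inner-IoU paper finds that distinguishing between different regression samples and computing the loss with auxiliary bounding boxes of different scales effectively accelerates bounding box regression. For high-IoU samples, a smaller auxiliary box speeds up convergence, while a larger auxiliary box suits low-IoU samples.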
This article replaces YOLOv11's default CIoU loss with `inner_IoU`, `inner_GIoU`, `inner_DIoU`, `inner_CIoU`, `inner_EIoU`, and `inner_SIoU`.
2. Principle
Inner-IoU: More Effective Intersection over Union Loss with Auxiliary Bounding Box
2.1 How Inner-IoU is computed
- Definitions:
  - The ground-truth (GT) box and the anchor are denoted $B^{gt}$ and $B$, respectively.
  - The center of the GT box, which is also the center of the inner GT box, is $(x_c^{gt}, y_c^{gt})$; the center of the anchor, which is also the center of the inner anchor, is $(x_c, y_c)$.
  - The width and height of the GT box are $w^{gt}$ and $h^{gt}$; the width and height of the anchor are $w$ and $h$.
  - A scale factor $ratio$ is introduced.
- The coordinates of the auxiliary (inner) bounding boxes are:
  - $b_l^{gt} = x_c^{gt} - \frac{w^{gt} \cdot ratio}{2}$, $b_r^{gt} = x_c^{gt} + \frac{w^{gt} \cdot ratio}{2}$
  - $b_t^{gt} = y_c^{gt} - \frac{h^{gt} \cdot ratio}{2}$, $b_b^{gt} = y_c^{gt} + \frac{h^{gt} \cdot ratio}{2}$
  - $b_l = x_c - \frac{w \cdot ratio}{2}$, $b_r = x_c + \frac{w \cdot ratio}{2}$
  - $b_t = y_c - \frac{h \cdot ratio}{2}$, $b_b = y_c + \frac{h \cdot ratio}{2}$
- The intersection over union of the auxiliary boxes is:
  - $inter = (\min(b_r^{gt}, b_r) - \max(b_l^{gt}, b_l)) \cdot (\min(b_b^{gt}, b_b) - \max(b_t^{gt}, b_t))$, where each factor is clipped at 0 when the auxiliary boxes do not overlap
  - $union = (w^{gt} \cdot h^{gt}) \cdot ratio^2 + (w \cdot h) \cdot ratio^2 - inter$
  - $IoU^{inner} = \frac{inter}{union}$
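The formulas above translate almost line by line into code. The following is a minimal plain-Python sketch for a single pair of boxes, each given as $(x_c, y_c, w, h)$; the function names `inner_box` and `inner_iou` are illustrative only and are not part of the paper's code or of ultralytics.

```python
def inner_box(xc, yc, w, h, ratio):
    """Return the auxiliary box (b_l, b_t, b_r, b_b): the box scaled by `ratio` about its center."""
    return xc - w * ratio / 2, yc - h * ratio / 2, xc + w * ratio / 2, yc + h * ratio / 2


def inner_iou(gt, anchor, ratio):
    """IoU^inner between a GT box and an anchor, each given as (x_c, y_c, w, h)."""
    g_l, g_t, g_r, g_b = inner_box(*gt, ratio)
    a_l, a_t, a_r, a_b = inner_box(*anchor, ratio)
    # inter: overlap of the two auxiliary boxes, each side clipped at 0 when they do not intersect
    inter = max(0.0, min(g_r, a_r) - max(g_l, a_l)) * max(0.0, min(g_b, a_b) - max(g_t, a_t))
    # union: each auxiliary area is the original area scaled by ratio**2
    union = gt[2] * gt[3] * ratio ** 2 + anchor[2] * anchor[3] * ratio ** 2 - inter
    return inter / union


gt, anchor = (50.0, 50.0, 10.0, 10.0), (52.0, 50.0, 10.0, 10.0)
for r in (0.8, 1.0, 1.2):
    print(r, inner_iou(gt, anchor, r))
```

With `ratio = 1` the auxiliary boxes coincide with the original boxes, so `inner_iou` reduces to the ordinary IoU.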
The Inner-IoU loss is $L_{\text{Inner-IoU}} = 1 - IoU^{inner}$.
- Applying Inner-IoU to existing IoU-based bounding box regression losses gives:
  - $L_{\text{Inner-GIoU}} = L_{GIoU} + IoU - IoU^{inner}$
  - $L_{\text{Inner-DIoU}} = L_{DIoU} + IoU - IoU^{inner}$
  - $L_{\text{Inner-CIoU}} = L_{CIoU} + IoU - IoU^{inner}$
  - $L_{\text{Inner-EIoU}} = L_{EIoU} + IoU - IoU^{inner}$
  - $L_{\text{Inner-SIoU}} = L_{SIoU} + IoU - IoU^{inner}$
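Expanding two of these definitions shows what the substitution actually does. Using the standard GIoU and CIoU losses, with $C$ the smallest box enclosing $B$ and $B^{gt}$, $\rho(b, b^{gt})$ the distance between the box centers, $c$ the diagonal length of $C$, and $\alpha v$ the CIoU aspect-ratio term:

$$L_{\text{Inner-GIoU}} = \Big(1 - IoU + \frac{|C \setminus (B \cup B^{gt})|}{|C|}\Big) + IoU - IoU^{inner} = 1 - IoU^{inner} + \frac{|C \setminus (B \cup B^{gt})|}{|C|}$$

$$L_{\text{Inner-CIoU}} = \Big(1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v\Big) + IoU - IoU^{inner} = 1 - IoU^{inner} + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$

In other words, only the IoU term is replaced by $IoU^{inner}$; the penalty terms are still computed on the original boxes. This matches the implementation in section 3.1, which returns the inner IoU minus the usual GIoU/DIoU/CIoU/EIoU/SIoU penalty.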
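According to the paper, the scale factor `ratio` in the Inner-IoU loss is usually tuned within [0.5, 1.5].
For high-IoU samples, setting `ratio` below 1 computes the loss on smaller auxiliary boxes, which speeds up their regression. In the paper's simulation experiments, for example, `ratio` is set to 0.8 to accelerate the regression of high-IoU samples.
For low-IoU samples, setting `ratio` above 1 computes the loss on larger auxiliary boxes, which speeds up their regression. In the low-IoU simulation scenario, for example, `ratio` is set to 1.2.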
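A small worked example makes the effect of `ratio` concrete. Assume, purely for illustration (this setup is not from the paper), that the GT box and the prediction have the same size $w \times h$ and that their centers differ only by a horizontal offset $d$, with $0 \le d \le ratio \cdot w$. Substituting into the formulas of section 2.1:

$$inter = (ratio \cdot w - d) \cdot ratio \cdot h, \qquad union = 2\, ratio^2\, w h - inter = ratio \cdot h \,(ratio \cdot w + d)$$

$$IoU^{inner} = \frac{ratio \cdot w - d}{ratio \cdot w + d}$$

For $w = 10$ and $d = 2$: $ratio = 1$ gives $8/12 \approx 0.667$ (the ordinary IoU), $ratio = 0.8$ gives $6/10 = 0.6$, and $ratio = 1.2$ gives $10/14 \approx 0.714$. A smaller ratio therefore penalizes the same small offset more strongly. For $d = 11$ the ordinary IoU is 0 because the boxes no longer overlap, whereas $ratio = 1.2$ still gives $1/23 \approx 0.043$, so the loss keeps responding to $d$ over a wider range.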
2.2 Advantages
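- Compared with the IoU loss, when `ratio` < 1 the auxiliary boxes are smaller than the actual boxes. The effective regression range is then smaller than that of the IoU loss, but the absolute value of the gradient is larger than the IoU loss gradient, which accelerates convergence of high-IoU samples.
- When `ratio` > 1, the larger auxiliary boxes enlarge the effective regression range, which improves the regression of low-IoU samples.
- The paper reports simulation and comparison experiments in which the method outperforms existing methods in both detection performance and generalization, with good results on datasets of different pixel sizes.
- It performs well not only on general detection tasks but also on tasks with very small objects, which supports its generalization ability.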
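The gradient claim in the first bullet can be checked numerically. The sketch below is a toy setup chosen for illustration, not taken from the paper: two square boxes of side 10 whose centers differ by a horizontal offset `d`, with PyTorch autograd computing $\partial IoU^{inner} / \partial d$ for several ratios. For this setup the closed form is $-2a/(a + d)^2$ with $a = ratio \cdot w$, so a smaller ratio gives a larger gradient magnitude near $d = 0$. The helper names are hypothetical.

```python
import torch


def inner_iou(pred, gt, ratio):
    """IoU^inner of two boxes given as (x_c, y_c, w, h) tensors of shape (4,)."""
    # Corners of the ratio-scaled auxiliary boxes: (left, top) and (right, bottom)
    p_lt, p_rb = pred[:2] - pred[2:] * ratio / 2, pred[:2] + pred[2:] * ratio / 2
    g_lt, g_rb = gt[:2] - gt[2:] * ratio / 2, gt[:2] + gt[2:] * ratio / 2
    # Overlap width and height, clipped at 0 when the auxiliary boxes do not intersect
    wh = (torch.minimum(p_rb, g_rb) - torch.maximum(p_lt, g_lt)).clamp(min=0)
    inter = wh[0] * wh[1]
    union = (pred[2] * pred[3] + gt[2] * gt[3]) * ratio ** 2 - inter
    return inter / union


def grad_wrt_offset(d, ratio, size=10.0):
    """dIoU^inner/dd for two size x size boxes whose centers differ by d along x."""
    offset = torch.tensor(d, requires_grad=True)
    zero, side = torch.tensor(0.0), torch.tensor(size)
    pred = torch.stack([offset, zero, side, side])
    gt = torch.stack([zero, zero, side, side])
    inner_iou(pred, gt, ratio).backward()
    return offset.grad.item()


for r in (0.8, 1.0, 1.2):
    print(f"ratio={r}: dIoU/dd = {grad_wrt_offset(1.0, r):.4f}")
```

For `d = 1` the autograd values should agree with the closed form, roughly $-0.198$, $-0.165$ and $-0.142$ for ratios 0.8, 1.0 and 1.2.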
Paper: https://arxiv.org/abs/2311.02877
Code: https://github.com/malagoutou/Inner-IoU
3. Steps to add Inner-IoU
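3.1 Modify ultralytics/utils/metrics.py
The file to edit here is `ultralytics/utils/metrics.py`. It defines the IoU computations (such as `bbox_iou`) that the model's losses use, so adding a new IoU-based loss only requires putting its code into this file.
Add the Inner-IoU code to `metrics.py` as follows: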
```python
import math

import torch


def get_inner_iou(box1, box2, xywh=True, eps=1e-7, ratio=0.7):
    """Return the IoU of auxiliary boxes obtained by scaling box1 and box2 by `ratio` about their centers."""
    if xywh:  # boxes are (x_center, y_center, w, h)
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
    else:  # boxes are (x1, y1, x2, y2): recover centers and sizes
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
        w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
        x1, y1 = (b1_x1 + b1_x2) / 2, (b1_y1 + b1_y2) / 2
        x2, y2 = (b2_x1 + b2_x2) / 2, (b2_y1 + b2_y2) / 2
    # Half-width and half-height of the auxiliary (inner) boxes
    w1_, h1_, w2_, h2_ = w1 * ratio / 2, h1 * ratio / 2, w2 * ratio / 2, h2 * ratio / 2
    inner_b1_x1, inner_b1_x2, inner_b1_y1, inner_b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
    inner_b2_x1, inner_b2_x2, inner_b2_y1, inner_b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    # Intersection area of the inner boxes
    inter = (inner_b1_x2.minimum(inner_b2_x2) - inner_b1_x1.maximum(inner_b2_x1)).clamp_(0) * \
            (inner_b1_y2.minimum(inner_b2_y2) - inner_b1_y1.maximum(inner_b2_y1)).clamp_(0)
    # Union area of the inner boxes: each box area is scaled by ratio**2
    union = w1 * h1 * ratio * ratio + w2 * h2 * ratio * ratio - inter + eps
    return inter / union


def bbox_inner_iou(box1, box2, xywh=True, GIoU=False, DIoU=False, CIoU=False, EIoU=False, SIoU=False, eps=1e-7, ratio=0.7):
    """
    Calculate Inner-IoU, optionally combined with GIoU, DIoU, CIoU, EIoU or SIoU penalties, between box1 and box2.

    Args:
        box1 (torch.Tensor): Bounding boxes with shape (..., 4), broadcast against box2.
        box2 (torch.Tensor): Bounding boxes with shape (..., 4).
        xywh (bool, optional): If True, input boxes are in (x, y, w, h) format. If False, input boxes are in
                               (x1, y1, x2, y2) format. Defaults to True.
        GIoU (bool, optional): If True, calculate Inner-GIoU. Defaults to False.
        DIoU (bool, optional): If True, calculate Inner-DIoU. Defaults to False.
        CIoU (bool, optional): If True, calculate Inner-CIoU. Defaults to False.
        EIoU (bool, optional): If True, calculate Inner-EIoU. Defaults to False.
        SIoU (bool, optional): If True, calculate Inner-SIoU (SCYLLA-IoU). Defaults to False.
        eps (float, optional): A small value to avoid division by zero. Defaults to 1e-7.
        ratio (float, optional): Scale factor of the auxiliary boxes. Defaults to 0.7.

    Returns:
        (torch.Tensor): Inner-IoU, or Inner-GIoU/DIoU/CIoU/EIoU/SIoU values depending on the specified flags.
    """
    # Get the coordinates of bounding boxes
    if xywh:  # transform from xywh to xyxy
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    else:  # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
        w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps

    # IoU of the ratio-scaled auxiliary boxes; it replaces the plain IoU term in every variant below
    inner_iou = get_inner_iou(box1, box2, xywh=xywh, eps=eps, ratio=ratio)

    # Intersection area of the original boxes
    inter = (b1_x2.minimum(b2_x2) - b1_x1.maximum(b2_x1)).clamp_(0) * \
            (b1_y2.minimum(b2_y2) - b1_y1.maximum(b2_y1)).clamp_(0)
    # Union area of the original boxes
    union = w1 * h1 + w2 * h2 - inter + eps
    # IoU of the original boxes (used by the CIoU alpha term)
    iou = inter / union
    if CIoU or DIoU or GIoU or EIoU or SIoU:
        cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1)  # convex (smallest enclosing box) width
        ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1)  # convex height
        if CIoU or DIoU or EIoU or SIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
            c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared
            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center dist ** 2
            if CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
                v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)
                with torch.no_grad():
                    alpha = v / (v - iou + (1 + eps))
                return inner_iou - (rho2 / c2 + v * alpha)  # Inner-CIoU
            elif EIoU:
                rho_w2 = ((b2_x2 - b2_x1) - (b1_x2 - b1_x1)) ** 2
                rho_h2 = ((b2_y2 - b2_y1) - (b1_y2 - b1_y1)) ** 2
                cw2 = cw ** 2 + eps
                ch2 = ch ** 2 + eps
                return inner_iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2)  # Inner-EIoU
            elif SIoU:
                # SIoU Loss https://arxiv.org/pdf/2205.12740.pdf
                s_cw = (b2_x1 + b2_x2 - b1_x1 - b1_x2) * 0.5 + eps
                s_ch = (b2_y1 + b2_y2 - b1_y1 - b1_y2) * 0.5 + eps
                sigma = torch.pow(s_cw ** 2 + s_ch ** 2, 0.5)
                sin_alpha_1 = torch.abs(s_cw) / sigma
                sin_alpha_2 = torch.abs(s_ch) / sigma
                threshold = pow(2, 0.5) / 2
                sin_alpha = torch.where(sin_alpha_1 > threshold, sin_alpha_2, sin_alpha_1)
                angle_cost = torch.cos(torch.arcsin(sin_alpha) * 2 - math.pi / 2)
                rho_x = (s_cw / cw) ** 2
                rho_y = (s_ch / ch) ** 2
                gamma = angle_cost - 2
                distance_cost = 2 - torch.exp(gamma * rho_x) - torch.exp(gamma * rho_y)
                omiga_w = torch.abs(w1 - w2) / torch.max(w1, w2)
                omiga_h = torch.abs(h1 - h2) / torch.max(h1, h2)
                shape_cost = torch.pow(1 - torch.exp(-1 * omiga_w), 4) + torch.pow(1 - torch.exp(-1 * omiga_h), 4)
                return inner_iou - 0.5 * (distance_cost + shape_cost) + eps  # Inner-SIoU
            return inner_iou - rho2 / c2  # Inner-DIoU
        c_area = cw * ch + eps  # convex area
        return inner_iou - (c_area - union) / c_area  # Inner-GIoU https://arxiv.org/pdf/1902.09630.pdf
    return inner_iou  # Inner-IoU
```
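3.2 Modify ultralytics/utils/loss.py
`utils/loss.py` computes the various training losses.
In `ultralytics/utils/loss.py`, add `bbox_inner_iou` to the imports, then change the following line inside `BboxLoss` so that the model calls the `bbox_inner_iou` loss. Pick one of the lines below according to the variant you want.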
3.2.1 Inner_CIoU
```python
iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, CIoU=True)
```
3.2.2 Inner_GIoU
```python
iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, GIoU=True)
```
3.2.3 Inner_DIoU
```python
iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, DIoU=True)
```
3.2.4 Inner_EIoU
```python
iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, EIoU=True)
```
3.2.5 Inner_SIoU
```python
iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, SIoU=True)
```
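For context, the value returned by `bbox_inner_iou` becomes the box-regression loss as `1 - value`, weighted per foreground anchor. The sketch below mirrors the weighting that recent ultralytics versions apply in `BboxLoss` (`weight = target_scores.sum(-1)[fg_mask].unsqueeze(-1)` and `((1.0 - iou) * weight).sum() / target_scores_sum`); the helper name `weighted_iou_loss` is hypothetical, and the exact code should be checked against the `BboxLoss` in your installed version.

```python
import torch


def weighted_iou_loss(iou, target_scores, fg_mask):
    """Hypothetical helper: turn per-box (Inner-)IoU values into a scalar box loss.

    iou:           (n_fg, 1) values returned by bbox_inner_iou for the foreground anchors
    target_scores: (batch, n_anchors, n_classes) soft targets from the label assigner
    fg_mask:       (batch, n_anchors) boolean mask of foreground anchors
    """
    # Each foreground anchor is weighted by the sum of its target class scores
    weight = target_scores.sum(-1)[fg_mask].unsqueeze(-1)
    # Normalizer, kept at least 1 to avoid dividing by a tiny number
    target_scores_sum = max(target_scores.sum().item(), 1.0)
    # 1 - (IoU^inner - penalty) = 1 - IoU^inner + penalty, i.e. the L_Inner-* losses from section 2.1
    return ((1.0 - iou) * weight).sum() / target_scores_sum


target_scores = torch.tensor([[[0.0, 0.9], [0.0, 0.0], [0.5, 0.0]]])
fg_mask = torch.tensor([[True, False, True]])
iou = torch.tensor([[0.8], [0.6]])  # stand-in for bbox_inner_iou output on the 2 foreground boxes
print(weighted_iou_loss(iou, target_scores, fg_mask))
```

Because the loss is `1 - iou`, switching between the variants in 3.2.1 to 3.2.5 only changes which penalty is subtracted inside `bbox_inner_iou`; the weighting stays the same.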
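3.3 Modify ultralytics/utils/tal.py
`tal.py` implements the task-aligned label assigner, which uses the IoU between predicted and ground-truth boxes when matching them.
In `ultralytics/utils/tal.py`, add `bbox_inner_iou` to the imports, then modify the code in the `iou_calculation` function so that label assignment also calls `bbox_inner_iou`.
Only Inner_CIoU is shown here as an example: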
4. Screenshot of a successful run
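5. Summary
To address the weak generalization and slow convergence of existing IoU losses across detection tasks, Inner-IoU introduces a scale factor `ratio` that controls the size of the auxiliary bounding boxes, and computes the loss on auxiliary boxes of different scales to accelerate bounding box regression.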