Object Detection - Generalized Focal Loss: cost-free accuracy gains for one-stage detectors (an improved Focal Loss that outperforms RetinaNet, FCOS, ATSS, etc.)

西笑生

Article link: https://blog.csdn.net/flyfish1986/article/details/110143467

flyfish

Paper: "Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection"
Source code and pretrained models released by the authors
MMDetection implementation

What is the problem?

FCOS v1
[Figure]
FCOS v2

FCOS v2 improves on FCOS v1 in the following respects

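FCOS v1 (FCOS: Fully Convolutional One-Stage Object Detection)
FCOS v2 (FCOS: A Simple and Strong Anchor-free Object Detector)
In the bounding box regression module, the supervision signal consists of only four values, which can be encoded as
x, y, w, h (center coordinates plus width and height),
x1, y1, x2, y2 (top-left and bottom-right corner coordinates), or
t, b, l, r (distances from the sampling point to the top, bottom, left and right sides of the box).
The loss function is usually an Ln-norm loss such as L1, L2 or Smooth L1, or an IoU-based loss.
FCOS uses the t, b, l, r encoding. v1 uses IoU loss; v2 switches to GIoU loss.
The regression loss is weighted: in v1 all positive points have equal weight, while in v2 the closer a positive point is to the center of the GT bbox, the larger its weight.
In v1 the positive samples are the points inside the GT bbox (bbox is short for bounding box).
In v2 the positive samples are restricted to a smaller center box inside the GT bbox, whose size is determined by radius * stride.
The center-ness label is computed from t, b, l, r in v1 and from IoU in v2.
In v1 the center-ness branch sits alongside the classification branch.
In v2 the center-ness branch sits alongside the regression branch.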
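As a minimal sketch of the (l, t, r, b) encoding and the v1 center-ness label discussed above: the formula sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b)) is the one given in the FCOS paper, while the function names `ltrb_targets` and `centerness` and the sample box are hypothetical, chosen only for illustration.

```python
import math

def ltrb_targets(px, py, box):
    """Distances from a sampling point (px, py) to the four sides of a GT box.

    box is (x1, y1, x2, y2): top-left and bottom-right corners.
    """
    x1, y1, x2, y2 = box
    left, top = px - x1, py - y1
    right, bottom = x2 - px, y2 - py
    return left, top, right, bottom

def centerness(left, top, right, bottom):
    """Center-ness label: 1 at the box center, decaying towards 0 at the border."""
    return math.sqrt(
        (min(left, right) / max(left, right)) * (min(top, bottom) / max(top, bottom))
    )

box = (10.0, 20.0, 110.0, 80.0)              # made-up GT box
center = ltrb_targets(60.0, 50.0, box)       # point at the box center
off_center = ltrb_targets(20.0, 50.0, box)   # point close to the left side
print(centerness(*center), centerness(*off_center))
```

The point at the center gets the highest label, and the label shrinks as the point moves towards a side of the box.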

The three basic elements of a one-stage anchor-free detector:

classification
localization (bounding box localization)
quality estimation

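Problems identified by the authors

[Figure]
1. The classification score and the quality estimate (center-ness) are trained independently, but at inference they are multiplied together and the product is used as the ranking score for NMS. The two are unrelated during training yet coupled during inference, so there is a gap between training and testing.
2. In the regression branch, the quality estimate is supervised only on positive samples. Many negative samples receive no supervision signal for it during training, so its output on them at inference is undefined behavior: some negatives can end up with uncontrollably high quality scores.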
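A toy illustration of the two problems above. The scores are invented purely for illustration (they are not measured results), and `nms_score` is a hypothetical helper name; the only thing taken from the text is that the inference-time ranking score is the product of the classification score and the quality estimate.

```python
# Hypothetical scores for two candidate boxes; the numbers are invented
# purely to illustrate the ranking problem.
candidates = {
    "true_positive": {"cls": 0.50, "quality": 0.50},
    # A background location: its quality output was never supervised,
    # so it can take an arbitrarily high value.
    "background": {"cls": 0.30, "quality": 0.95},
}

def nms_score(c):
    """Inference-time ranking score: classification score times quality estimate."""
    return c["cls"] * c["quality"]

# Sort candidate names by descending ranking score, as NMS would see them.
ranking = sorted(candidates, key=lambda name: nms_score(candidates[name]), reverse=True)
print(ranking)
```

With these numbers the background box is ranked ahead of the true positive, even though its classification score is lower.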
Improvement
[Figure]

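The authors' solution

Introducing QFL (Quality Focal Loss)

Focal Loss serves the classification branch of one-stage detectors and supports only discrete labels of 0 or 1. The authors, however, need a joint representation that covers both the classification score and the quality estimate, so that training and testing stay consistent. With this joint classification-quality representation, the label becomes a continuous value between 0 and 1. QFL (Quality Focal Loss) is the extension of Focal Loss to continuous labels: it keeps Focal Loss's ability to balance positive/negative and easy/hard samples while also supporting supervision with continuous values.
The cls score and the quality score are one and the same variable.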

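Introducing DFL (Distribution Focal Loss)

A bbox prediction has only four output values, and regressing each of them directly is equivalent to optimizing a Dirac delta distribution (explained below). The Dirac delta distribution is too rigid and provides no estimate of uncertainty.
A Gaussian distribution raises the number of predicted values from 4 to 8: a mean and a variance for each side of the bbox, where the variance expresses the degree of uncertainty.
With a KL-divergence loss, a Gaussian distribution can be fitted to the Dirac delta target. The distribution that real data follows can be arbitrary, however, so the authors model it with a general distribution instead. Since the true distribution is usually not far from the annotated location, they add one more loss that encourages the network to focus quickly on values near the annotated location and make their probabilities as large as possible. That loss is Distribution Focal Loss.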

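Wikipedia's description of the Dirac delta
The Dirac delta can be loosely thought of as a function on the real line that is zero everywhere except at the origin, where it is infinite. Note that the Dirac delta is not a function; this is only a heuristic description. It is not a function in the traditional sense, because no function defined on the real numbers has these properties. The Dirac delta can be rigorously defined as a distribution or as a measure.
$\delta(x)=\left\{\begin{array}{ll} +\infty, & x=0 \\ 0, & x \neq 0 \end{array}\right.$
and it is constrained to satisfy
$\int_{-\infty}^{\infty} \delta(x) d x=1$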
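Code implementation

from sympy import DiracDelta, Symbol, integrate

x = Symbol('x')
# Sifting property: integrating x * delta(x - 1) over an interval that
# contains 1 picks out the value of x at x = 1.
print(integrate(x * DiracDelta(x - 1), (x, 0, 5)))

Graphical representation of the Dirac delta distribution
[Figure]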

How is Generalized Focal Loss derived?

Cross entropy -> Focal Loss -> Generalized Focal Loss

1. Cross-entropy loss

Cross-entropy loss for binary classification:
$\text{CE}(p, y) = \begin{cases}-\log(p) \quad &\text{if}\quad y = 1\\ -\log(1-p) &\text{otherwise}\end{cases}$
Rewriting it with
$p_{\mathrm{t}}=\left\{\begin{array}{ll} p & \text { if } y=1 \\ 1-p & \text { otherwise } \end{array}\right.$
gives
$\text{CE}(p, y) = -\log(p_t)$

Weighted cross-entropy loss
A parameter $\alpha_t$ balances the two classes; this serves as the baseline for comparison. $\text{CE}(p_t) = -\alpha_t\log(p_t)$

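2. Focal Loss

Focal Loss adds an adaptively adjusted weight to the cross entropy. $\gamma=2$ gives the best improvement. $\text{FL}(p_t) = -(1-p_t)^\gamma\log(p_t)$

The focal loss variant with the balancing factor $\alpha$
$\text{FL}(p_t) = -{\alpha}_t(1-p_t)^\gamma\log(p_t)$
$\alpha$ is set to 0.25, which gives positive samples a smaller weight than negative samples, since most negatives are easy to classify and are already down-weighted by the modulating factor.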
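To make the effect of the modulating factor concrete, here is a small scalar sketch of the formulas above using only the standard library. The helper names (`p_t`, `cross_entropy`, `focal_loss`) and the two sample predictions are illustrative assumptions, not code from the paper.

```python
import math

def p_t(p, y):
    """Probability assigned to the true class: p if y == 1, else 1 - p."""
    return p if y == 1 else 1.0 - p

def cross_entropy(p, y):
    """CE(p, y) = -log(p_t)."""
    return -math.log(p_t(p, y))

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    alpha_t = alpha if y == 1 else 1.0 - alpha
    pt = p_t(p, y)
    return -alpha_t * (1.0 - pt) ** gamma * math.log(pt)

easy_neg = (0.05, 0)  # background predicted with p = 0.05, so p_t = 0.95
hard_pos = (0.10, 1)  # object predicted with p = 0.10, so p_t = 0.10

for name, (p, y) in {"easy negative": easy_neg, "hard positive": hard_pos}.items():
    print(name, cross_entropy(p, y), focal_loss(p, y))
```

The easy negative keeps a small but non-negligible cross-entropy loss, while its focal loss is scaled by (1 - 0.95)^2 and nearly vanishes; the hard positive is reduced far less, so it dominates the total loss.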

3. Quality Focal Loss (QFL)

QFL modifies the adaptively weighted Focal Loss (ignoring ${\alpha}_t$):
$\text{FL}(p_t) = -(1-p_t)^\gamma\log(p_t)$
Modification steps
Focal Loss supports only discrete labels, so it is extended to apply the same idea to the continuous labels that combine classification and localization quality.
First, the cross-entropy part $-\log(p_t)$ is expanded into its complete form $-((1-y)\log(1-\sigma) + y\log(\sigma))$.
Second, the scaling factor $(1-p_t)^{\gamma}$ is generalized to the absolute difference between the prediction $\sigma$ and the continuous label $y$, namely $|y-\sigma|^{\beta}$.
Combining the two gives QFL:
$\mathbf{Q F L}(\sigma)=-|y-\sigma|^{\beta}((1-y) \log (1-\sigma)+y \log (\sigma))$
$\sigma=y$ is the global minimum of QFL.
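The claim about the global minimum can be checked numerically with a scalar version of the formula. This is only a sketch: the function name `qfl`, the soft label 0.7 and the grid search are assumptions made for illustration.

```python
import math

def qfl(sigma, y, beta=2.0):
    """Scalar Quality Focal Loss for a sigmoid output sigma and a soft label y in [0, 1]."""
    bce = -((1.0 - y) * math.log(1.0 - sigma) + y * math.log(sigma))
    return abs(y - sigma) ** beta * bce

y = 0.7  # e.g. the IoU between a predicted box and its ground truth
grid = [i / 100 for i in range(1, 100)]  # sigma in (0, 1), avoiding log(0)
best_sigma = min(grid, key=lambda s: qfl(s, y))
print(best_sigma, qfl(best_sigma, y))
```

The loss is zero exactly where the prediction equals the soft label. With a hard label y = 1 the expression reduces to $(1-\sigma)^{\beta}(-\log\sigma)$, which is Focal Loss without $\alpha_t$.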

QFL code implementation

import torch
import torch.nn.functional as F


def quality_focal_loss(pred, target, beta=2.0):
    r"""Quality Focal Loss (QFL) is from `Generalized Focal Loss: Learning
    Qualified and Distributed Bounding Boxes for Dense Object Detection
    <https://arxiv.org/abs/2006.04388>`_.

    Args:
        pred (torch.Tensor): Predicted joint representation of classification
            and quality (IoU) estimation with shape (N, C), C is the number of
            classes.
        target (tuple([torch.Tensor])): Target category label with shape (N,)
            and target quality label with shape (N,).
        beta (float): The beta parameter for calculating the modulating factor.
            Defaults to 2.0.

    Returns:
        torch.Tensor: Loss tensor with shape (N,).
    """
    assert len(target) == 2, """target for QFL must be a tuple of two elements,
        including category label and quality label, respectively"""
    # label denotes the category id, score denotes the quality score
    label, score = target

    # negatives are supervised by 0 quality score
    pred_sigmoid = pred.sigmoid()
    scale_factor = pred_sigmoid
    zerolabel = scale_factor.new_zeros(pred.shape)
    loss = F.binary_cross_entropy_with_logits(
        pred, zerolabel, reduction='none') * scale_factor.pow(beta)

    # FG cat_id: [0, num_classes -1], BG cat_id: num_classes
    bg_class_ind = pred.size(1)
    pos = torch.nonzero((label >= 0) & (label < bg_class_ind), as_tuple=False).squeeze(1)
    pos_label = label[pos].long()
    # positives are supervised by bbox quality (IoU) score
    scale_factor = score[pos] - pred_sigmoid[pos, pos_label]
    loss[pos, pos_label] = F.binary_cross_entropy_with_logits(
        pred[pos, pos_label], score[pos],
        reduction='none') * scale_factor.abs().pow(beta)

    loss = loss.sum(dim=1, keepdim=False)
    return loss

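The regression target is the distance from the current location to the object boundary. The conventional approach models the regression target y as a Dirac delta distribution $\delta(x-y)$, which satisfies
$\int^{+\infty}_{-\infty}\delta(x-y)dx=1$
so the label y can be recovered in integral form:
$y=\int_{-\infty}^{+\infty} \delta(x-y) x \mathrm{~d} x$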

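4. Distribution Focal Loss (DFL)

Instead of the Dirac delta or a Gaussian, the authors use a general distribution P(x) with no other prior.

Given the range $[y_0, y_n]$ of the label y, the predicted value $\hat{y}$ can be written as:
$\hat{y}=\int_{-\infty}^{+\infty} P(x) x \mathrm{~d} x=\int_{y_{0}}^{y_{n}} P(x) x \mathrm{~d} x$
To be compatible with neural networks, the integral over the continuous domain $[y_0, y_n]$ is converted into a sum over the discrete set $\{y_0, y_1, \cdots, y_i, y_{i+1}, \cdots, y_{n-1}, y_n \}$ with interval $\Delta=1$:
$\hat{y}=\sum_{i=0}^{n} P(y_i)\, y_i, \quad \sum_{i=0}^{n} P(y_i)=1$

P(x) can be obtained with a softmax operation $\mathcal{S}(\cdot)$, with $P(y_i)$ denoted $\mathcal{S}_i$. The predicted value $\hat{y}$ can then be trained with the usual losses, such as Smooth L1, IoU loss or GIoU loss.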
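A minimal standard-library sketch of this decoding step, assuming bins $y_i = i$ with spacing 1. The function names and the logits are made up for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into a discrete probability distribution that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_distance(logits):
    """Decode one box side: y_hat = sum_i S_i * y_i with y_i = i (spacing 1)."""
    probs = softmax(logits)
    return sum(p * i for i, p in enumerate(probs))

# n = 4, so there are n + 1 = 5 bins {0, 1, 2, 3, 4}; the logits are made up.
peaked = [-10.0, -10.0, 10.0, -10.0, -10.0]  # nearly a Dirac delta at 2
flat = [0.0, 0.0, 0.0, 0.0, 0.0]             # uniform: maximally uncertain
print(expected_distance(peaked), expected_distance(flat))
```

Both sets of logits decode to a distance of about 2, although one distribution is sharply peaked and the other is flat.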

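The same integral value y can be produced by many different distributions, which reduces learning efficiency. The probability mass should concentrate near the regression target y because, as noted earlier, the true distribution is usually not far from the annotated location. The authors therefore add one more loss that encourages the network to focus quickly on values near the annotated location and make their probabilities as large as possible.
DFL forces the network to raise the probabilities of the two values closest to y, $y_i$ and $y_{i+1}$ (with $y_i \le y \le y_{i+1}$), so it acts on the shape of the distribution and makes learning a reasonable distribution more efficient. Regression is computed only on positive samples and has no positive/negative imbalance, so DFL needs only the cross-entropy part. $P(x)$ in DFL is a discretized probability distribution that sums to 1, so it can be implemented with a softmax layer of n+1 units.
$\mathbf{D F L}\left(\mathcal{S}_{i}, \mathcal{S}_{i+1}\right)=-\left(\left(y_{i+1}-y\right) \log \left(\mathcal{S}_{i}\right)+\left(y-y_{i}\right) \log \left(\mathcal{S}_{i+1}\right)\right)$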
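The paper states that DFL is minimized at $\mathcal{S}_i = \frac{y_{i+1}-y}{y_{i+1}-y_i}$ and $\mathcal{S}_{i+1} = \frac{y-y_i}{y_{i+1}-y_i}$, in which case the decoded value equals the label. The scalar sketch below checks this numerically; the function name `dfl`, the label 2.3 and the grid search are assumptions made for illustration, and all probability mass is assumed to sit on the two neighbouring bins.

```python
import math

def dfl(s_left, s_right, y, y_left, y_right):
    """Scalar DFL for the two bins y_left <= y <= y_right that bracket the label y."""
    return -((y_right - y) * math.log(s_left) + (y - y_left) * math.log(s_right))

y, y_left, y_right = 2.3, 2, 3
# Search over ways of splitting the probability mass between the two bins.
grid = [i / 100 for i in range(1, 100)]
best_left = min(grid, key=lambda s: dfl(s, 1.0 - s, y, y_left, y_right))
# Decode the box side as the expectation over the two bins.
decoded = best_left * y_left + (1.0 - best_left) * y_right
print(best_left, decoded)
```

The best split puts about 0.7 of the mass on bin 2 and 0.3 on bin 3, and the expectation recovers the label 2.3.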

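DFL code implementation

import torch.nn.functional as F
# MMDetection decorator that adds weight and reduction handling to a loss
from mmdet.models.losses.utils import weighted_loss


@weighted_loss
def distribution_focal_loss(pred, label):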
    r"""Distribution Focal Loss (DFL) is from `Generalized Focal Loss: Learning
    Qualified and Distributed Bounding Boxes for Dense Object Detection
    <https://arxiv.org/abs/2006.04388>`_.

    Args:
        pred (torch.Tensor): Predicted general distribution of bounding boxes
            (before softmax) with shape (N, n+1), n is the max value of the
            integral set `{0, ..., n}` in paper.
        label (torch.Tensor): Target distance label for bounding boxes with
            shape (N,).

    Returns:
        torch.Tensor: Loss tensor with shape (N,).
    """
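    # Indices of the two integer bins that bracket the continuous label.
    # label must lie in [0, n) so that dis_right <= n is a valid class index.
    dis_left = label.long()
    dis_right = dis_left + 1
    # Linear interpolation weights: the closer bin gets the larger weight.
    weight_left = dis_right.float() - label
    weight_right = label - dis_left.float()
    # Weighted sum of two cross-entropy terms, i.e. the DFL formula.
    loss = F.cross_entropy(pred, dis_left, reduction='none') * weight_left \
        + F.cross_entropy(pred, dis_right, reduction='none') * weight_right
    return loss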