Understanding Focal Loss

被自己蠢哭了

Article link: https://blog.csdn.net/jicong44/article/details/84573432

First, the paper: https://arxiv.org/abs/1708.02002

CE

Start from the ordinary binary cross entropy (CE):

$CE(p,y)=\left\{\begin{matrix} -\log(p) & \text{if } y=1 \\ -\log(1-p) & \text{otherwise} \end{matrix}\right.$

For convenience, define $p_t$ (the more accurate the prediction, the larger $p_t$):

$p_t=\left\{\begin{matrix} p & \text{if } y=1 \\ 1-p & \text{otherwise} \end{matrix}\right.$

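With this, $CE(p, y)$ can be written compactly as $CE(p_t)=-\log(p_t)$.
This formula clearly runs into trouble when the classes are imbalanced. For example, when positives and negatives are heavily imbalanced (positive : negative = 1 : 3000), the vast majority of the loss comes from negative samples and the positive samples are drowned out.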
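
To make the imbalance concrete, here is a minimal NumPy sketch of $p_t$ and $CE(p_t)$. The batch (one hard positive, 3000 easy negatives) and the function names are made up for illustration and do not come from the paper.

```python
import numpy as np

def p_t(p, y):
    """Probability assigned to the true class: p if y == 1, else 1 - p."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y)
    return np.where(y == 1, p, 1.0 - p)

def cross_entropy(p, y):
    """Per-sample binary cross entropy, CE(p_t) = -log(p_t)."""
    return -np.log(p_t(p, y))

# Hypothetical batch: 1 poorly predicted positive, 3000 well predicted negatives.
p = np.concatenate([[0.1], np.full(3000, 0.05)])       # predicted P(y = 1)
y = np.concatenate([[1], np.zeros(3000, dtype=int)])   # ground-truth labels

loss = cross_entropy(p, y)
neg_share = loss[y == 0].sum() / loss.sum()
print(f"share of the total loss coming from negatives: {neg_share:.3f}")
```

Even though each negative is already classified well, their sheer number means they account for almost all of the summed loss.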

$\alpha$ CE

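One approach is to put a weighting factor $\alpha$ in front, which gives:
$CE(p_t)=-\alpha_t\log(p_t)$
For a positive sample $\alpha_t=\alpha$, and for a negative sample $\alpha_t=1-\alpha$. In practice $\alpha$ can be set from the inverse class frequency (for the positive class, roughly the proportion of negative samples) or treated as a hyperparameter. This alleviates class imbalance to some extent.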
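
The $\alpha$-balanced variant changes only the per-sample weight. A small sketch, assuming NumPy and a hypothetical function name; the default `alpha=0.75` is an arbitrary illustrative value, not a recommendation:

```python
import numpy as np

def alpha_balanced_ce(p, y, alpha=0.75):
    """Per-sample alpha-balanced CE: -alpha_t * log(p_t)."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y)
    pt = np.where(y == 1, p, 1.0 - p)                # p_t
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)   # alpha for positives, 1 - alpha for negatives
    return -alpha_t * np.log(pt)

# One positive and one negative, both predicted with p_t = 0.5:
# the positive is weighted by alpha, the negative by 1 - alpha.
pos_loss, neg_loss = alpha_balanced_ce([0.5, 0.5], [1, 0], alpha=0.75)
print(pos_loss / neg_loss)
```

Note that the weight depends only on the label, not on how well the sample is predicted, which is the gap Focal Loss targets.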

FL

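Focal Loss is compared against the $\alpha$-balanced $CE(p_t)$ above as its baseline. The key problem Focal Loss addresses is the split between hard and easy samples, whereas the $\alpha$-balanced $CE(p_t)$ only distinguishes positive from negative samples.
The Focal Loss formula:
$FL(p_t)=-(1-p_t)^\gamma\log(p_t)$
The more accurate the prediction (an easy sample), the larger $p_t$ and the smaller $1-p_t$, so the smaller its contribution to the total loss. For example, with $\gamma=2$ a sample with $p_t=0.9$ has its loss scaled by $(1-0.9)^2=0.01$.
In practice a weighting factor $\alpha_t$ is added as well:
$FL(p_t)=-\alpha_t(1-p_t)^\gamma\log(p_t)$
$\alpha_t$ plays the same role as above, compensating for class imbalance. The experiments in the paper found that $\alpha=0.25$, $\gamma=2$ works best. (Oddly, $\alpha=0.25$ reduces the weight of positive samples and increases the weight of negative samples; perhaps this is because FL already suppresses the negatives too strongly? This seems consistent with the paper's remark that, as easy negatives are down-weighted, less emphasis needs to be placed on the positives.)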
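
Before looking at a framework implementation, the formula can be checked with a few lines of NumPy. This is only a sketch: the function name and the toy batch (one hard positive, 3000 easy negatives) are assumptions made for illustration.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Per-sample focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y)
    pt = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - pt) ** gamma * np.log(pt)

# Hypothetical batch: one hard positive (p = 0.1) and 3000 easy negatives (p = 0.05).
p = np.concatenate([[0.1], np.full(3000, 0.05)])
y = np.concatenate([[1], np.zeros(3000, dtype=int)])

def negative_share(loss):
    """Fraction of the summed loss contributed by negative samples."""
    return loss[y == 0].sum() / loss.sum()

# gamma = 0 and alpha = 0.5 reduce FL to plain CE up to a constant factor.
share_ce = negative_share(focal_loss(p, y, alpha=0.5, gamma=0.0))
share_fl = negative_share(focal_loss(p, y, alpha=0.25, gamma=2.0))
print(f"negatives' share of the loss: CE {share_ce:.3f}, FL {share_fl:.3f}")
```

On this toy batch the easy negatives dominate the plain CE sum, while under FL the single hard positive carries most of the loss, even with $\alpha=0.25$ favouring the negatives.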
Below is a Keras implementation of FL:

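import keras
from keras_retinanet import backend  # assumed import path: keras-retinanet's backend wrapper (where, gather_nd)


def focal(alpha=0.25, gamma=2.0):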
    """ Create a functor for computing the focal loss.

    Args
        alpha: Scale the focal weight with alpha.
        gamma: Take the power of the focal weight with gamma.

    Returns
        A functor that computes the focal loss using the alpha and gamma.
    """
    def _focal(y_true, y_pred):
        """ Compute the focal loss given the target tensor and the predicted tensor.

        As defined in https://arxiv.org/abs/1708.02002

        Args
            y_true: Tensor of target data from the generator with shape (B, N, num_classes).
            y_pred: Tensor of predicted data from the network with shape (B, N, num_classes).

        Returns
            The focal loss of y_pred w.r.t. y_true.
        """
        labels         = y_true[:, :, :-1]  # ground-truth labels
        anchor_state   = y_true[:, :, -1]   # anchor state: -1 for ignore, 0 for background, 1 for object
        classification = y_pred             # predictions, same shape as labels

        # filter out "ignore" anchors
        indices        = backend.where(keras.backend.not_equal(anchor_state, -1))
        labels         = backend.gather_nd(labels, indices)
        classification = backend.gather_nd(classification, indices)

        # compute the focal loss
        # tensor with the same shape as labels, filled with alpha
        alpha_factor = keras.backend.ones_like(labels) * alpha
        # where(cond, a, b): take a where labels == 1, otherwise b, which gives alpha_t
        alpha_factor = backend.where(keras.backend.equal(labels, 1), alpha_factor, 1 - alpha_factor)
        # 1 - p_t: (1 - p) for positive labels, p for negative labels
        focal_weight = backend.where(keras.backend.equal(labels, 1), 1 - classification, classification)
        # full focal weight: alpha_t * (1 - p_t) ** gamma
        focal_weight = alpha_factor * focal_weight ** gamma

        cls_loss = focal_weight * keras.backend.binary_crossentropy(labels, classification)

        # compute the normalizer: the number of positive anchors
        normalizer = backend.where(keras.backend.equal(anchor_state, 1))
        normalizer = keras.backend.cast(keras.backend.shape(normalizer)[0], keras.backend.floatx())
        normalizer = keras.backend.maximum(1.0, normalizer)

        return keras.backend.sum(cls_loss) / normalizer

    return _focal
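
The two details that are easy to miss in the Keras code above are the filtering of "ignore" anchors and the normalization by the number of positive anchors. The following NumPy re-implementation is a sketch for checking that logic on small arrays; the function name `focal_numpy` and the clipping constant `eps` are assumptions (the clipping stands in for what the Keras cross entropy does internally), and it is not the library's code.

```python
import numpy as np

def focal_numpy(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """NumPy sketch of the loss above.

    y_true: (B, N, num_classes + 1), last channel is the anchor state.
    y_pred: (B, N, num_classes), per-class probabilities.
    """
    labels = y_true[:, :, :-1]
    anchor_state = y_true[:, :, -1]

    # drop anchors marked as "ignore" (-1)
    keep = anchor_state != -1
    labels = labels[keep]                        # (num_kept, num_classes)
    pred = np.clip(y_pred[keep], eps, 1 - eps)   # avoid log(0)

    alpha_factor = np.where(labels == 1, alpha, 1 - alpha)
    focal_weight = alpha_factor * np.where(labels == 1, 1 - pred, pred) ** gamma
    bce = -(labels * np.log(pred) + (1 - labels) * np.log(1 - pred))

    # normalize by the number of positive anchors (at least 1)
    normalizer = max(1.0, float(np.sum(anchor_state == 1)))
    return np.sum(focal_weight * bce) / normalizer

# One image, three anchors, two classes: object, background, ignored.
y_true = np.array([[[1, 0, 1], [0, 0, 0], [0, 1, -1]]], dtype=float)
y_pred = np.array([[[0.8, 0.1], [0.2, 0.3], [0.5, 0.5]]])
loss = focal_numpy(y_true, y_pred)

# Changing the prediction of the ignored anchor must not change the loss.
y_pred_changed = y_pred.copy()
y_pred_changed[0, 2] = [0.99, 0.01]
loss_changed = focal_numpy(y_true, y_pred_changed)
print(loss, loss_changed)
```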

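Network structure

Backbone: DenseNet121

Source: /home/tf/anaconda3/lib/python3.6/site-packages/keras_applications/densenet.py

For DenseNet121, blocks == [6, 12, 24, 16], meaning there are four dense blocks containing 6, 12, 24 and 16 conv blocks respectively, and each conv block contains two conv layers. That gives 2 × (6 + 12 + 24 + 16) = 116 layers.

There is one conv layer between each pair of adjacent dense blocks, so four blocks give three of them. Adding the initial conv layer and the final fully connected layer gives 116 + 3 + 1 + 1 = 121.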
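
The layer count can be reproduced with a few lines of Python. The helper name is hypothetical; the block configurations for DenseNet169 and DenseNet201 are the ones used by `keras_applications` and are included only as a cross-check of the same counting rule.

```python
def densenet_depth(blocks):
    """Count layers: 2 convs per conv block, 1 conv per transition,
    plus the initial conv and the final fully connected layer."""
    conv_in_blocks = 2 * sum(blocks)
    transitions = len(blocks) - 1
    return conv_in_blocks + transitions + 1 + 1

for name, blocks in [("DenseNet121", [6, 12, 24, 16]),
                     ("DenseNet169", [6, 12, 32, 32]),
                     ("DenseNet201", [6, 12, 48, 32])]:
    print(name, densenet_depth(blocks))
```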

Here the feature maps from the last three of these blocks are selected for multi-scale feature fusion.

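Fusion

Through upsampling, downsampling and fusion, five feature maps $p_3,p_4,p_5,p_6,p_7$ are obtained, and the classification and regression subnetworks are applied to each of the five.
Note that every classification subnetwork output is finally reshaped to (-1, num_class) and every regression subnetwork output to (-1, 4). In both cases the -1 is the row dimension, and each row corresponds to one anchor.
Finally, all classification and regression results are concatenated.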
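
The reshape and concatenate step can be illustrated with plain NumPy arrays standing in for the subnetwork outputs. The feature map sizes, the class count and the 9 anchors per location are assumptions chosen for this sketch, not values read from the network.

```python
import numpy as np

num_classes = 3      # hypothetical number of classes
num_anchors = 9      # anchors per spatial location (3 ratios x 3 scales)

# Hypothetical spatial sizes (height, width) of the five pyramid levels.
feature_shapes = {"P3": (8, 8), "P4": (4, 4), "P5": (2, 2), "P6": (1, 1), "P7": (1, 1)}

cls_outputs, reg_outputs = [], []
for name, (h, w) in feature_shapes.items():
    # stand-ins for the raw subnetwork outputs on one image
    cls_map = np.zeros((h, w, num_anchors * num_classes))
    reg_map = np.zeros((h, w, num_anchors * 4))
    # one row per anchor
    cls_outputs.append(cls_map.reshape(-1, num_classes))
    reg_outputs.append(reg_map.reshape(-1, 4))

classification = np.concatenate(cls_outputs, axis=0)
regression = np.concatenate(reg_outputs, axis=0)
print(classification.shape, regression.shape)
```

Both results have the same number of rows, one per anchor over all levels, which is what lets the loss pair each class prediction with its box regression.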

generate_anchors

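/home/tf/keras-retinanet/keras_retinanet/utils/anchors.py
Taking scales = [1, 1.26, 1.59], ratios = [0.5, 1, 2] and base_size = 16 as an example, the anchors are generated as follows (the printed values below correspond to the unrounded scales $2^0, 2^{1/3}, 2^{2/3}$):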

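import numpy as np

# AnchorParameters (the default ratios and scales) is defined in the same module, anchors.py


def generate_anchors(base_size=16, ratios=None, scales=None):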
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales w.r.t. a reference window.
    """

    if ratios is None:
        ratios = AnchorParameters.default.ratios

    if scales is None:
        scales = AnchorParameters.default.scales

    num_anchors = len(ratios) * len(scales)

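    # initialize output anchors
    # start with an all-zero array of shape (9, 4)
    anchors = np.zeros((num_anchors, 4))

    # scale base_size
    # fill columns 2 and 3 (w, h) from scales; each string below shows the result of the statement that follows it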
    """
    >>> anchors
array([[ 0.        ,  0.        , 16.        , 16.        ],
       [ 0.        ,  0.        , 20.15873718, 20.15873718],
       [ 0.        ,  0.        , 25.39841652, 25.39841652],
       [ 0.        ,  0.        , 16.        , 16.        ],
       [ 0.        ,  0.        , 20.15873718, 20.15873718],
       [ 0.        ,  0.        , 25.39841652, 25.39841652],
       [ 0.        ,  0.        , 16.        , 16.        ],
       [ 0.        ,  0.        , 20.15873718, 20.15873718],
       [ 0.        ,  0.        , 25.39841652, 25.39841652]])

    """
    anchors[:, 2:] = base_size * np.tile(scales, (2, len(ratios))).T

    # compute areas of anchors
    """
    >>> areas
array([256.        , 406.3746848 , 645.07956168, 256.        ,
       406.3746848 , 645.07956168, 256.        , 406.3746848 ,
       645.07956168])
    """
    areas = anchors[:, 2] * anchors[:, 3]

    # correct for ratios
    """>>> anchors
array([[ 0.        ,  0.        , 22.627417  , 16.        ],
       [ 0.        ,  0.        , 28.50875952, 20.15873718],
       [ 0.        ,  0.        , 35.9187851 , 25.39841652],
       [ 0.        ,  0.        , 16.        , 16.        ],
       [ 0.        ,  0.        , 20.15873718, 20.15873718],
       [ 0.        ,  0.        , 25.39841652, 25.39841652],
       [ 0.        ,  0.        , 11.3137085 , 16.        ],
       [ 0.        ,  0.        , 14.25437976, 20.15873718],
       [ 0.        ,  0.        , 17.95939255, 25.39841652]])

    """
    anchors[:, 2] = np.sqrt(areas / np.repeat(ratios, len(scales)))
    """>>> anchors
array([[ 0.        ,  0.        , 22.627417  , 11.3137085 ],
       [ 0.        ,  0.        , 28.50875952, 14.25437976],
       [ 0.        ,  0.        , 35.9187851 , 17.95939255],
       [ 0.        ,  0.        , 16.        , 16.        ],
       [ 0.        ,  0.        , 20.15873718, 20.15873718],
       [ 0.        ,  0.        , 25.39841652, 25.39841652],
       [ 0.        ,  0.        , 11.3137085 , 22.627417  ],
       [ 0.        ,  0.        , 14.25437976, 28.50875952],
       [ 0.        ,  0.        , 17.95939255, 35.9187851 ]])

    """
    anchors[:, 3] = anchors[:, 2] * np.repeat(ratios, len(scales))

    # transform from (x_ctr, y_ctr, w, h) -> (x1, y1, x2, y2)
    anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T
    anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T

    return anchors
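
The same computation can be written compactly to check the printed intermediate values. This is a standalone sketch with hard-coded defaults rather than `AnchorParameters`; the function name `anchor_boxes` is hypothetical.

```python
import numpy as np

def anchor_boxes(base_size=16,
                 ratios=(0.5, 1.0, 2.0),
                 scales=(2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3))):
    """Return (boxes, w, h) for len(ratios) * len(scales) anchors centred at the origin."""
    ratios = np.asarray(ratios, dtype=float)
    scales = np.asarray(scales, dtype=float)

    side = base_size * np.tile(scales, len(ratios))   # square side before applying ratios
    r = np.repeat(ratios, len(scales))                # ratio h / w for each anchor

    w = np.sqrt(side ** 2 / r)   # keep the area fixed: w * h == side ** 2
    h = w * r

    # (x_ctr, y_ctr, w, h) with centre (0, 0) -> (x1, y1, x2, y2)
    boxes = np.stack([-w / 2, -h / 2, w / 2, h / 2], axis=1)
    return boxes, w, h

boxes, w, h = anchor_boxes()
print(np.round(boxes, 3))
```

Each row is a box centred at the origin, so the anchors for a given feature map location are obtained by shifting these nine boxes to that location.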