Loss: Focal Loss

1. Introduction

  • Focal Loss paper: https://arxiv.org/pdf/1708.02002.pdf
  • Focal Loss is a modification of Cross Entropy that addresses class imbalance: it reduces the relative loss for well-classified examples ($p_t > 0.5$), putting more focus on hard, misclassified examples. It introduces two user-chosen hyperparameters, $\alpha$ and $\gamma$.

2. Principle and Formula Derivation

$$p_i=\mathrm{sigmoid}(x_i)=\frac{1}{1+e^{-x_i}}\qquad(1)$$

$$p_t=\begin{cases} p_i & \text{if } y_i=1 \\ 1-p_i & \text{otherwise} \end{cases}\qquad(2)$$

$$CE(p_t) = -\log(p_t) = -\big(y\log(p_i) + (1-y)\log(1-p_i)\big)\qquad(3)$$

$$FL(p_t) = \alpha(1-p_t)^{\gamma}\, CE(p_t) = -\alpha(1-p_t)^{\gamma}\log(p_t) = -\big(\alpha(1-p_i)^{\gamma}\,y\log(p_i) + \alpha\, p_i^{\gamma}(1-y)\log(1-p_i)\big)\qquad(4)$$

  • Backward pass (gradient), with $y\in\{1,-1\}$:

$$\frac{\partial FL}{\partial x_i}=\frac{\partial FL}{\partial p_t}\times \frac{\partial p_t}{\partial x_i}\qquad(5)$$

$$\frac{\partial FL}{\partial p_t} = -\alpha\left[-\gamma(1-p_t)^{\gamma-1}\log(p_t)+\frac{(1-p_t)^{\gamma}}{p_t}\right]\qquad(6)$$

$$\frac{\partial p_t}{\partial x_i} = \frac{\partial p_t}{\partial p_i}\times\frac{\partial p_i}{\partial x_i}=y\,(1-p_i)\,p_i = y\, p_t\,(1-p_t)\qquad(7)$$

$$\frac{\partial FL}{\partial x_i}=\frac{\partial FL}{\partial p_t}\times \frac{\partial p_t}{\partial x_i}=y\,\alpha\,(1-p_t)^{\gamma}\big[\gamma\, p_t\log(p_t)+p_t-1\big]\qquad(8)$$

3. darknet yolov3-spp focal loss code (added by me)

    if (focal_loss) {
        // Focal Loss: write the gradient of -alpha*(1-pt)^gamma*log(pt), Eq. (8), into delta
        float alpha = 0.5f;    // balancing factor, 0.25 or 0.5
        float gamma = 2.0f;    // focusing parameter; the paper recommends 2
        int n;
        if (delta[index + stride*class_id]) {
            int index_classes = index + stride*class_id;
            float pt = output[index_classes];
            pt = fmaxf(pt, 0.000000000000001F);   // clamp to avoid log(0)
            float grad = powf(1 - pt, gamma) * (gamma * pt * logf(pt) + pt - 1);
            delta[index_classes] = -1 * alpha * grad;
            if (avg_cat) *avg_cat += output[index_classes];
            return;
        }
        for (n = 0; n < classes; ++n) {
            int index_classes = index + stride*n;
            // pt is the probability of the ground truth: p for the target
            // class, 1-p for every other class
            float pt = (n == class_id) ? output[index_classes] : (1 - output[index_classes]);
            pt = fmaxf(pt, 0.000000000000001F);
            float grad = (n == class_id)
                       ?  powf(1 - pt, gamma) * (gamma * pt * logf(pt) + pt - 1)
                       : -powf(1 - pt, gamma) * (gamma * pt * logf(pt) + pt - 1);
            delta[index_classes] = -1 * alpha * grad;
            if (n == class_id && avg_cat) *avg_cat += output[index_classes];
        }
    }

4. Summary

  • The paper reports that $\gamma=2$ works best. In practice, Focal Loss does not always improve classification performance.