Caffe Loss 层 - SigmoidCrossEntropyLoss 推导与Python实现


[Original article - Caffe custom sigmoid cross entropy loss layer].

A very clear write-up; the notes below follow it.

1. Sigmoid Cross Entropy Loss Derivation

The Sigmoid Cross Entropy Loss is defined as:

$$L = t\ln(P) + (1-t)\ln(1-P)$$

where,

  • $t$ - the target, or label;
  • $P$ - the sigmoid score, $P = \frac{1}{1+e^{-x}}$

Then:

$$L = t\ln\left(\frac{1}{1+e^{-x}}\right) + (1-t)\ln\left(1 - \frac{1}{1+e^{-x}}\right)$$

Expanding step by step:

$$L = t\ln\left(\frac{1}{1+e^{-x}}\right) + (1-t)\ln\left(\frac{e^{-x}}{1+e^{-x}}\right)$$

$$L = t\ln\left(\frac{1}{1+e^{-x}}\right) + \ln\left(\frac{e^{-x}}{1+e^{-x}}\right) - t\ln\left(\frac{e^{-x}}{1+e^{-x}}\right)$$

$$L = t\left[\ln 1 - \ln(1+e^{-x})\right] + \left[\ln(e^{-x}) - \ln(1+e^{-x})\right] - t\left[\ln(e^{-x}) - \ln(1+e^{-x})\right]$$

$$L = -t\ln(1+e^{-x}) + \ln(e^{-x}) - \ln(1+e^{-x}) - t\ln(e^{-x}) + t\ln(1+e^{-x})$$

Combining like terms:

$$L = \ln(e^{-x}) - \ln(1+e^{-x}) - t\ln(e^{-x})$$

$$L = -x\ln(e) - \ln(1+e^{-x}) + tx\ln(e)$$

$$L = -x - \ln(1+e^{-x}) + xt$$

That is:

$$L = xt - x - \ln(1+e^{-x})$$ <1>

[Figure: behavior of $e^{-x}$ (left) and $e^{x}$ (right); image not available]

$e^{-x}$ decreases as $x$ increases; when $x$ is a large negative number, $e^{-x}$ becomes extremely large and can easily overflow. In other words, the implementation must avoid evaluating $e^{-x}$ for negative $x$.
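As a quick check (an illustrative NumPy snippet, not part of the original derivation), the overflow is easy to reproduce, along with the stable form derived below:

import numpy as np

x = np.float32(-100.0)                # a large negative score
print(np.exp(-x))                     # e^{100} overflows float32 -> inf (RuntimeWarning)
print(np.log1p(np.exp(-np.abs(x))))   # stable form derived below: ~3.7e-44, no overflow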

Therefore, to avoid overflow, the loss function $L$ is modified: when $x < 0$, the loss is rewritten in terms of $e^{x}$:

The original loss function: $L = xt - x - \ln(1+e^{-x})$ <1>

This gives: $L = xt - x + \ln\left(\frac{1}{1+e^{-x}}\right)$

Multiplying the numerator and denominator of the last term by $e^{x}$:

$$L = xt - x + \ln\left(\frac{1 \cdot e^{x}}{(1+e^{-x}) \cdot e^{x}}\right)$$

$$L = xt - x + \ln\left(\frac{e^{x}}{1+e^{x}}\right)$$

$$L = xt - x + \left[\ln(e^{x}) - \ln(1+e^{x})\right]$$

$$L = xt - x + x\ln(e) - \ln(1+e^{x})$$

Thus:

$$L = xt - \ln(1+e^{x})$$ <2>

Combining <1> and <2>, the final loss function is:

$$L = xt - x - \ln(1+e^{-x}), \quad (x > 0)$$

$$L = xt - 0 - \ln(1+e^{x}), \quad (x < 0)$$

Merging the two cases into one:

$$L = xt - \max(x, 0) - \ln(1+e^{-|x|}), \quad \text{for all } x$$
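The unified form maps directly to vectorized code. Below is a minimal NumPy sketch (the function name sigmoid_cross_entropy is my own, not from the original post) that evaluates the loss for any $x$ without overflow and agrees with form <1> wherever the naive expression is representable:

import numpy as np

def sigmoid_cross_entropy(x, t):
    # L = xt - max(x, 0) - ln(1 + e^{-|x|}), stable for any sign of x
    return x * t - np.maximum(x, 0) - np.log1p(np.exp(-np.abs(x)))

x = np.array([-3.0, 0.5, 4.0])
t = np.array([0.0, 1.0, 1.0])
naive = x * t - x - np.log1p(np.exp(-x))   # form <1>, unsafe for very negative x
print(np.allclose(sigmoid_cross_entropy(x, t), naive))  # True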

2. Sigmoid Cross Entropy Loss Gradient

For $x > 0$: $L = xt - x - \ln(1+e^{-x})$

Then:

$$\frac{\partial L}{\partial x} = \frac{\partial\left(xt - x - \ln(1+e^{-x})\right)}{\partial x}$$

$$\frac{\partial L}{\partial x} = \frac{\partial (xt)}{\partial x} - \frac{\partial x}{\partial x} - \frac{\partial \ln(1+e^{-x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - 1 - \frac{1}{1+e^{-x}} \cdot \frac{\partial (1+e^{-x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - 1 - \frac{1}{1+e^{-x}} \cdot \frac{\partial (e^{-x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - 1 + \frac{e^{-x}}{1+e^{-x}}$$

which gives:

$$\frac{\partial L}{\partial x} = t - \frac{1}{1+e^{-x}}$$

The second term is the sigmoid function $P = \frac{1}{1+e^{-x}}$, so:

$$\frac{\partial L}{\partial x} = t - P$$

For $x < 0$: $L = xt - \ln(1+e^{x})$

$$\frac{\partial L}{\partial x} = \frac{\partial\left(xt - \ln(1+e^{x})\right)}{\partial x}$$

$$\frac{\partial L}{\partial x} = \frac{\partial (xt)}{\partial x} - \frac{\partial \ln(1+e^{x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - \frac{1}{1+e^{x}} \cdot \frac{\partial (e^{x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - \frac{e^{x}}{1+e^{x}}$$

$$\frac{\partial L}{\partial x} = t - \frac{e^{x} \cdot e^{-x}}{(1+e^{x}) \cdot e^{-x}}$$

$$\frac{\partial L}{\partial x} = t - \frac{1}{1+e^{-x}}$$

Again, the second term is the sigmoid function $P = \frac{1}{1+e^{-x}}$, so:

$$\frac{\partial L}{\partial x} = t - P$$

As can be seen, the derivative is the same for both $x > 0$ and $x < 0$: in each case it is the difference between the target value and the sigmoid value.
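A finite-difference check (again an illustrative sketch, not from the original post) confirms that $t - P$ is the derivative of the stable loss from section 1:

import numpy as np

def loss(x, t):
    # L = xt - max(x, 0) - ln(1 + e^{-|x|})
    return x * t - np.maximum(x, 0) - np.log1p(np.exp(-np.abs(x)))

x, t, eps = 0.7, 1.0, 1e-6
numeric = (loss(x + eps, t) - loss(x - eps, t)) / (2 * eps)
analytic = t - 1.0 / (1.0 + np.exp(-x))   # t - P
print(numeric, analytic)                  # the two values agree to ~1e-9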

3. Custom Caffe Loss Layer in Python

Caffe officially provides a demo of a custom EuclideanLossLayer written in Python.

Here, following the derivation above, we create a Python-based Caffe SigmoidCrossEntropyLossLayer.
Caffe's built-in version is implemented in C++ - SigmoidCrossEntropyLossLayer; see Caffe Loss Layer - SigmoidCrossEntropyLossLayer.

Assume $Labels \in \{0, 1\}$.

3.1 SigmoidCrossEntropyLossLayer Implementation

import caffe
import numpy as np
import scipy.special


class CustomSigmoidCrossEntropyLossLayer(caffe.Layer):

    def setup(self, bottom, top):
        # check for all inputs
        if len(bottom) != 2:
            raise Exception("Need two inputs (scores and labels) to compute sigmoid cross-entropy loss.")

    def reshape(self, bottom, top):
        # check that input dimensions match between the scores and labels
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # the gradient has the same shape as the score input
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # the layer outputs a single scalar loss
        top[0].reshape(1)

    def forward(self, bottom, top):
        score = bottom[0].data
        label = bottom[1].data

        # numerically stable loss, summed over the batch:
        # loss = sum(max(x, 0) - x * t + ln(1 + e^{-|x|}))
        first_term = np.maximum(score, 0)
        second_term = -1 * score * label
        third_term = np.log(1 + np.exp(-1 * np.absolute(score)))

        top[0].data[...] = np.sum(first_term + second_term + third_term)

        # gradient of the summed loss w.r.t. the scores: P - t
        sig = scipy.special.expit(score)
        self.diff = sig - label
        # abort if the loss has diverged to NaN
        if np.isnan(top[0].data[0]):
            exit()

    def backward(self, top, propagate_down, bottom):
        bottom[0].diff[...] = self.diff
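Note one difference from the built-in layer: the forward pass above sums the per-element losses over the whole batch, whereas Caffe's C++ SigmoidCrossEntropyLossLayer normalizes the summed loss (by the batch size in older versions) and scales the gradient to match. If you need loss values comparable to the built-in layer, divide both top[0].data and self.diff by bottom[0].num.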

3.2 prototxt Definition

layer {
  type: 'Python'
  name: 'loss'
  top: 'loss_opt'
  bottom: 'score'
  bottom: 'label'
  python_param {
    # the module name -- usually the filename -- that needs to be in $PYTHONPATH
    module: 'loss_layers'
    # the layer name -- the class name in the module
    layer: 'CustomSigmoidCrossEntropyLossLayer'
  }
  include {
        phase: TRAIN
  }
  # set loss weight so Caffe knows this is a loss layer.
  # since PythonLayer inherits directly from Layer, this isn't automatically
  # known to Caffe
  loss_weight: 1
}
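To smoke-test the layer outside a full training run, one option (an illustrative sketch; the file name test_loss.prototxt and the blob shapes are assumptions, and that prototxt must declare the score and label input blobs alongside the layer above) is to build a net containing only the inputs and this loss layer:

import numpy as np
import caffe

# net defined by a prototxt containing input blobs 'score'/'label' and the loss layer
net = caffe.Net('test_loss.prototxt', caffe.TRAIN)
net.blobs['score'].data[...] = np.random.randn(*net.blobs['score'].data.shape)
net.blobs['label'].data[...] = np.random.randint(0, 2, net.blobs['label'].data.shape)
net.forward()
print(net.blobs['loss_opt'].data)   # the scalar loss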

[1] - Caffe Loss Layer - SigmoidCrossEntropyLossLayer
