Caffe Loss Layer - SigmoidCrossEntropyLoss Derivation and Python Implementation
[Original article - Caffe custom sigmoid cross entropy loss layer].
A clear introduction, worth studying.
1. Sigmoid Cross Entropy Loss Derivation
The Sigmoid Cross Entropy Loss is defined as:

$$L = t \ln(P) + (1 - t) \ln(1 - P)$$

where,
- $t$ - the target, or label;
- $P$ - the Sigmoid score, $P = \frac{1}{1 + e^{-x}}$
Then:

$$L = t \ln\left(\frac{1}{1 + e^{-x}}\right) + (1 - t) \ln\left(1 - \frac{1}{1 + e^{-x}}\right)$$

Expanding step by step:

$$L = t \ln\left(\frac{1}{1 + e^{-x}}\right) + (1 - t) \ln\left(\frac{e^{-x}}{1 + e^{-x}}\right)$$

$$L = t \ln\left(\frac{1}{1 + e^{-x}}\right) + \ln\left(\frac{e^{-x}}{1 + e^{-x}}\right) - t \ln\left(\frac{e^{-x}}{1 + e^{-x}}\right)$$

$$L = t \left[\ln 1 - \ln(1 + e^{-x})\right] + \left[\ln(e^{-x}) - \ln(1 + e^{-x})\right] - t \left[\ln(e^{-x}) - \ln(1 + e^{-x})\right]$$

$$L = \left[-t \ln(1 + e^{-x})\right] + \ln(e^{-x}) - \ln(1 + e^{-x}) - t \ln(e^{-x}) + \left[t \ln(1 + e^{-x})\right]$$

Collecting terms:

$$L = \ln(e^{-x}) - \ln(1 + e^{-x}) - t \ln(e^{-x})$$

$$L = -x \ln(e) - \ln(1 + e^{-x}) + t x \ln(e)$$

$$L = -x - \ln(1 + e^{-x}) + x t$$

That is:

$$L = x t - x - \ln(1 + e^{-x})$$ <1>
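As a quick numerical sanity check (a small numpy sketch added here for illustration; loss_definition and loss_form1 are ad-hoc helper names), formula <1> can be compared against the definition above for moderate values of $x$:

import numpy as np

def loss_definition(x, t):
    # L = t*ln(P) + (1-t)*ln(1-P) with P = sigmoid(x)
    p = 1.0 / (1.0 + np.exp(-x))
    return t * np.log(p) + (1.0 - t) * np.log(1.0 - p)

def loss_form1(x, t):
    # formula <1>: L = x*t - x - ln(1 + e^{-x})
    return x * t - x - np.log(1.0 + np.exp(-x))

x = np.linspace(-10, 10, 101)
for t in (0.0, 1.0):
    assert np.allclose(loss_definition(x, t), loss_form1(x, t))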
The behavior of $e^{-x}$ (left) and $e^{x}$ (right) as functions of $x$:
$e^{-x}$ decreases as $x$ increases; when $x$ is a large negative value, $e^{-x}$ becomes very large and easily overflows. In other words, the implementation must avoid evaluating the exponential in this regime.
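For example, a tiny numpy snippet (my own illustration) makes the overflow visible:

import numpy as np

x = np.float32(-100.0)
# exp(-x) = exp(100) exceeds the float32 range and overflows to inf,
# so the naive expression ln(1 + e^{-x}) is unusable for very negative x
print(np.exp(-x))            # inf, with an overflow RuntimeWarning
print(np.log1p(np.exp(-x)))  # inf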
Therefore, to avoid the overflow, the loss function $L$ is rewritten: when $x < 0$, the loss is expressed in terms of $e^{x}$ instead:
The original loss function:

$$L = x t - x - \ln(1 + e^{-x})$$ <1>

can be written as:

$$L = x t - x + \ln\left(\frac{1}{1 + e^{-x}}\right)$$

Multiplying the numerator and denominator of the last term by $e^{x}$:

$$L = x t - x + \ln\left(\frac{1 \cdot e^{x}}{(1 + e^{-x}) \cdot e^{x}}\right)$$

$$L = x t - x + \ln\left(\frac{e^{x}}{1 + e^{x}}\right)$$

$$L = x t - x + \left[\ln(e^{x}) - \ln(1 + e^{x})\right]$$

$$L = x t - x + x \ln(e) - \ln(1 + e^{x})$$

which gives:

$$L = x t - \ln(1 + e^{x})$$ <2>
Combining <1> and <2>, the final loss function is:

$$L = x t - x - \ln(1 + e^{-x}), \quad (x > 0)$$

$$L = x t - 0 - \ln(1 + e^{x}), \quad (x < 0)$$

Merging the two cases into one expression:

$$L = x t - \max(x, 0) - \ln(1 + e^{-|x|}), \quad \text{for all } x$$
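A short numpy sketch (added as an illustration; naive_loss and stable_loss are ad-hoc names) showing that the merged expression matches the naive formula where both are representable, while remaining finite for very negative scores:

import numpy as np

def naive_loss(x, t):
    # L = x*t - x - ln(1 + e^{-x}); overflows for large negative x
    return x * t - x - np.log(1.0 + np.exp(-x))

def stable_loss(x, t):
    # L = x*t - max(x, 0) - ln(1 + e^{-|x|}); safe for all x
    return x * t - np.maximum(x, 0) - np.log1p(np.exp(-np.abs(x)))

x = np.array([-5.0, -0.5, 0.5, 5.0])
t = np.array([0.0, 1.0, 1.0, 0.0])
assert np.allclose(naive_loss(x, t), stable_loss(x, t))

# for very negative scores the stable form is still finite
print(np.isfinite(stable_loss(np.array([-1000.0]), np.array([1.0]))))  # [ True]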
2. Sigmoid Cross Entropy Loss Gradient
For $x > 0$, $L = x t - x - \ln(1 + e^{-x})$,
so:

$$\frac{\partial L}{\partial x} = \frac{\partial \left(x t - x - \ln(1 + e^{-x})\right)}{\partial x}$$

$$\frac{\partial L}{\partial x} = \frac{\partial (x t)}{\partial x} - \frac{\partial x}{\partial x} - \frac{\partial \ln(1 + e^{-x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - 1 - \frac{1}{1 + e^{-x}} \cdot \frac{\partial (1 + e^{-x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - 1 - \frac{1}{1 + e^{-x}} \cdot \frac{\partial (e^{-x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - 1 + \frac{e^{-x}}{1 + e^{-x}}$$

hence:

$$\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^{-x}}$$

The second term is the Sigmoid function $P = \frac{1}{1 + e^{-x}}$, so:

$$\frac{\partial L}{\partial x} = t - P$$
For $x < 0$, $L = x t - \ln(1 + e^{x})$,

$$\frac{\partial L}{\partial x} = \frac{\partial \left(x t - \ln(1 + e^{x})\right)}{\partial x}$$

$$\frac{\partial L}{\partial x} = \frac{\partial (x t)}{\partial x} - \frac{\partial \ln(1 + e^{x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^{x}} \cdot \frac{\partial (e^{x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - \frac{e^{x}}{1 + e^{x}}$$

$$\frac{\partial L}{\partial x} = t - \frac{e^{x} \cdot e^{-x}}{(1 + e^{x}) \cdot e^{-x}}$$

$$\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^{-x}}$$

The second term is again the Sigmoid function $P = \frac{1}{1 + e^{-x}}$, so:

$$\frac{\partial L}{\partial x} = t - P$$

As can be seen, the derivative is the same for $x > 0$ and $x < 0$: the difference between the target value and the Sigmoid value.
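This can be verified with a quick finite-difference check (a standalone numpy/scipy sketch, not from the original post):

import numpy as np
from scipy.special import expit  # numerically stable sigmoid

def stable_loss(x, t):
    # L = x*t - max(x, 0) - ln(1 + e^{-|x|})
    return x * t - np.maximum(x, 0) - np.log1p(np.exp(-np.abs(x)))

x, t, eps = 1.7, 1.0, 1e-6
numerical_grad = (stable_loss(x + eps, t) - stable_loss(x - eps, t)) / (2 * eps)
analytical_grad = t - expit(x)  # dL/dx = t - P
assert np.isclose(numerical_grad, analytical_grad)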
3. Custom Caffe Loss Layer in Python
Caffe officially provides a demo of a Python-based EuclideanLossLayer.
Here, based on the derivation above, we build a Python-based Caffe SigmoidCrossEntropyLossLayer.
Caffe itself ships with a C++ implementation - SigmoidCrossEntropyLossLayer; see Caffe Loss Layer - SigmoidCrossEntropyLossLayer.
Labels are assumed to be $\in \{0, 1\}$.
3.1 SigmoidCrossEntropyLossLayer Implementation
import caffe
import numpy as np
import scipy.special


class CustomSigmoidCrossEntropyLossLayer(caffe.Layer):
    def setup(self, bottom, top):
        # check for all inputs
        if len(bottom) != 2:
            raise Exception("Need two inputs (scores and labels) to compute sigmoid cross entropy loss.")

    def reshape(self, bottom, top):
        # check that input dimensions match between the scores and labels
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # the difference has the same shape as either input
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # the layer output is a scalar loss (summed over the blob)
        top[0].reshape(1)

    def forward(self, bottom, top):
        score = bottom[0].data
        label = bottom[1].data
        # numerically stable loss: max(x, 0) - x*t + ln(1 + e^{-|x|}),
        # i.e. the negative of formula <1>/<2>, so the loss is non-negative
        first_term = np.maximum(score, 0)
        second_term = -1 * score * label
        third_term = np.log(1 + np.exp(-1 * np.absolute(score)))
        top[0].data[...] = np.sum(first_term + second_term + third_term)
        # gradient w.r.t. the scores: P - t (the sign is flipped accordingly)
        sig = scipy.special.expit(score)
        self.diff = sig - label
        if np.isnan(top[0].data).any():
            exit()

    def backward(self, top, propagate_down, bottom):
        bottom[0].diff[...] = self.diff
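As a sanity check of the forward pass above (a numpy-only sketch, independent of Caffe), the summed value computed in forward() should equal the standard binary cross entropy $-[t \ln P + (1 - t) \ln(1 - P)]$ summed over the blob, i.e. the negative of formula <1>:

import numpy as np
from scipy.special import expit

score = np.array([-3.0, -0.2, 0.7, 4.1], dtype=np.float32)
label = np.array([0.0, 1.0, 1.0, 0.0], dtype=np.float32)

# loss as computed in forward()
loss = np.sum(np.maximum(score, 0) - score * label + np.log(1 + np.exp(-np.abs(score))))

# reference: standard binary cross entropy -[t*ln(P) + (1-t)*ln(1-P)]
p = expit(score)
reference = -np.sum(label * np.log(p) + (1 - label) * np.log(1 - p))
assert np.isclose(loss, reference)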
3.2 prototxt Definition
layer {
  type: 'Python'
  name: 'loss'
  top: 'loss_opt'
  bottom: 'score'
  bottom: 'label'
  python_param {
    # the module name -- usually the filename -- that needs to be in $PYTHONPATH
    module: 'loss_layers'
    # the layer name -- the class name in the module
    layer: 'CustomSigmoidCrossEntropyLossLayer'
  }
  include {
    phase: TRAIN
  }
  # set loss weight so Caffe knows this is a loss layer.
  # since PythonLayer inherits directly from Layer, this isn't automatically
  # known to Caffe
  loss_weight: 1
}
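To actually use the layer, Caffe has to be built with WITH_PYTHON_LAYER := 1, and the directory containing loss_layers.py must be on $PYTHONPATH. A minimal, hypothetical usage sketch (train.prototxt stands in for a full training prototxt that includes the layer definition above together with data and score layers):

import caffe

# assumes loss_layers.py is importable, e.g.:
# import sys; sys.path.insert(0, '/path/to/layer/dir')
caffe.set_mode_cpu()
net = caffe.Net('train.prototxt', caffe.TRAIN)

# one forward/backward pass; the Python layer's forward() and backward() are called here
net.forward()
net.backward()
print(net.blobs['loss_opt'].data)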