Caffe Loss Layer - SigmoidCrossEntropyLoss Derivation and Python Implementation
[Original article - Caffe custom sigmoid cross entropy loss layer].
A clear introduction, worth studying.
1. Sigmoid Cross Entropy Loss Derivation
The Sigmoid Cross Entropy Loss is defined as:

$$L = t \ln(P) + (1 - t) \ln(1 - P)$$

where,
- $t$ - the target, or label;
- $P$ - the Sigmoid score, $P = \frac{1}{1 + e^{-x}}$
Then:

$$L = t \ln\left(\frac{1}{1 + e^{-x}}\right) + (1 - t) \ln\left(1 - \frac{1}{1 + e^{-x}}\right)$$

Expanding step by step:

$$L = t \ln\left(\frac{1}{1 + e^{-x}}\right) + (1 - t) \ln\left(\frac{e^{-x}}{1 + e^{-x}}\right)$$

$$L = t \ln\left(\frac{1}{1 + e^{-x}}\right) + \ln\left(\frac{e^{-x}}{1 + e^{-x}}\right) - t \ln\left(\frac{e^{-x}}{1 + e^{-x}}\right)$$

$$L = t \left[\ln 1 - \ln(1 + e^{-x})\right] + \left[\ln(e^{-x}) - \ln(1 + e^{-x})\right] - t \left[\ln(e^{-x}) - \ln(1 + e^{-x})\right]$$

$$L = \left[-t \ln(1 + e^{-x})\right] + \ln(e^{-x}) - \ln(1 + e^{-x}) - t \ln(e^{-x}) + \left[t \ln(1 + e^{-x})\right]$$

Collecting terms:

$$L = \ln(e^{-x}) - \ln(1 + e^{-x}) - t \ln(e^{-x})$$

$$L = -x \ln(e) - \ln(1 + e^{-x}) + t x \ln(e)$$

$$L = -x - \ln(1 + e^{-x}) + x t$$

That is:

$$L = x t - x - \ln(1 + e^{-x})$$ <1>
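As a quick numerical sanity check (a small numpy sketch added here for illustration; loss_definition and loss_form1 are ad-hoc helper names), formula <1> can be compared against the definition above for moderate values of $x$:

import numpy as np

def loss_definition(x, t):
    # L = t*ln(P) + (1-t)*ln(1-P) with P = sigmoid(x)
    p = 1.0 / (1.0 + np.exp(-x))
    return t * np.log(p) + (1.0 - t) * np.log(1.0 - p)

def loss_form1(x, t):
    # formula <1>: L = x*t - x - ln(1 + e^{-x})
    return x * t - x - np.log(1.0 + np.exp(-x))

x = np.linspace(-10, 10, 101)
for t in (0.0, 1.0):
    assert np.allclose(loss_definition(x, t), loss_form1(x, t))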
The behavior of $e^{-x}$ (left) and $e^{x}$ (right) as functions of $x$:
$e^{-x}$ decreases as $x$ increases; when $x$ is a large negative value, $e^{-x}$ becomes very large and easily overflows. In other words, the implementation must avoid evaluating the exponential in this regime.
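For example, a tiny numpy snippet (my own illustration) makes the overflow visible:

import numpy as np

x = np.float32(-100.0)
# exp(-x) = exp(100) exceeds the float32 range and overflows to inf,
# so the naive expression ln(1 + e^{-x}) is unusable for very negative x
print(np.exp(-x))            # inf, with an overflow RuntimeWarning
print(np.log1p(np.exp(-x)))  # inf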
Therefore, to avoid the overflow, the loss function $L$ is rewritten: when $x < 0$, the loss is expressed in terms of $e^{x}$ instead:
The original loss function:

$$L = x t - x - \ln(1 + e^{-x})$$ <1>

can be written as:

$$L = x t - x + \ln\left(\frac{1}{1 + e^{-x}}\right)$$

Multiplying the numerator and denominator of the last term by $e^{x}$:

$$L = x t - x + \ln\left(\frac{1 \cdot e^{x}}{(1 + e^{-x}) \cdot e^{x}}\right)$$

$$L = x t - x + \ln\left(\frac{e^{x}}{1 + e^{x}}\right)$$

$$L = x t - x + \left[\ln(e^{x}) - \ln(1 + e^{x})\right]$$

$$L = x t - x + x \ln(e) - \ln(1 + e^{x})$$

which gives:

$$L = x t - \ln(1 + e^{x})$$ <2>
Combining <1> and <2>, the final loss function is:

$$L = x t - x - \ln(1 + e^{-x}), \quad (x > 0)$$

$$L = x t - 0 - \ln(1 + e^{x}), \quad (x < 0)$$

Merging the two cases into one expression:

$$L = x t - \max(x, 0) - \ln(1 + e^{-|x|}), \quad \text{for all } x$$
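A short numpy sketch (added as an illustration; naive_loss and stable_loss are ad-hoc names) showing that the merged expression matches the naive formula where both are representable, while remaining finite for very negative scores:

import numpy as np

def naive_loss(x, t):
    # L = x*t - x - ln(1 + e^{-x}); overflows for large negative x
    return x * t - x - np.log(1.0 + np.exp(-x))

def stable_loss(x, t):
    # L = x*t - max(x, 0) - ln(1 + e^{-|x|}); safe for all x
    return x * t - np.maximum(x, 0) - np.log1p(np.exp(-np.abs(x)))

x = np.array([-5.0, -0.5, 0.5, 5.0])
t = np.array([0.0, 1.0, 1.0, 0.0])
assert np.allclose(naive_loss(x, t), stable_loss(x, t))

# for very negative scores the stable form is still finite
print(np.isfinite(stable_loss(np.array([-1000.0]), np.array([1.0]))))  # [ True]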
2. Sigmoid Cross Entropy Loss Gradient
For $x > 0$, $L = x t - x - \ln(1 + e^{-x})$,
so:

$$\frac{\partial L}{\partial x} = \frac{\partial \left(x t - x - \ln(1 + e^{-x})\right)}{\partial x}$$

$$\frac{\partial L}{\partial x} = \frac{\partial (x t)}{\partial x} - \frac{\partial x}{\partial x} - \frac{\partial \ln(1 + e^{-x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - 1 - \frac{1}{1 + e^{-x}} \cdot \frac{\partial (1 + e^{-x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - 1 - \frac{1}{1 + e^{-x}} \cdot \frac{\partial (e^{-x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - 1 + \frac{e^{-x}}{1 + e^{-x}}$$

hence:

$$\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^{-x}}$$

The second term is the Sigmoid function $P = \frac{1}{1 + e^{-x}}$, so:

$$\frac{\partial L}{\partial x} = t - P$$
For $x < 0$, $L = x t - \ln(1 + e^{x})$,

$$\frac{\partial L}{\partial x} = \frac{\partial \left(x t - \ln(1 + e^{x})\right)}{\partial x}$$

$$\frac{\partial L}{\partial x} = \frac{\partial (x t)}{\partial x} - \frac{\partial \ln(1 + e^{x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^{x}} \cdot \frac{\partial (e^{x})}{\partial x}$$

$$\frac{\partial L}{\partial x} = t - \frac{e^{x}}{1 + e^{x}}$$

$$\frac{\partial L}{\partial x} = t - \frac{e^{x} \cdot e^{-x}}{(1 + e^{x}) \cdot e^{-x}}$$

$$\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^{-x}}$$

The second term is again the Sigmoid function $P = \frac{1}{1 + e^{-x}}$, so:

$$\frac{\partial L}{\partial x} = t - P$$

As can be seen, the derivative is the same for $x > 0$ and $x < 0$: the difference between the target value and the Sigmoid value.
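This can be verified with a quick finite-difference check (a standalone numpy/scipy sketch, not from the original post):

import numpy as np
from scipy.special import expit  # numerically stable sigmoid

def stable_loss(x, t):
    # L = x*t - max(x, 0) - ln(1 + e^{-|x|})
    return x * t - np.maximum(x, 0) - np.log1p(np.exp(-np.abs(x)))

x, t, eps = 1.7, 1.0, 1e-6
numerical_grad = (stable_loss(x + eps, t) - stable_loss(x - eps, t)) / (2 * eps)
analytical_grad = t - expit(x)  # dL/dx = t - P
assert np.isclose(numerical_grad, analytical_grad)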
3. Custom Caffe Loss Layer in Python
Caffe officially provides a demo of a Python-based EuclideanLossLayer.
Here, based on the derivation above, we build a Python-based Caffe SigmoidCrossEntropyLossLayer.
Caffe itself ships with a C++ implementation - SigmoidCrossEntropyLossLayer; see Caffe Loss Layer - SigmoidCrossEntropyLossLayer.
Labels are assumed to be $\in \{0, 1\}$.
3.1 SigmoidCrossEntropyLossLayer Implementation
import caffe
import numpy as np
import scipy.special


class CustomSigmoidCrossEntropyLossLayer(caffe.Layer):
    def setup(self, bottom, top):
        # check for all inputs
        if len(bottom) != 2:
            raise Exception("Need two inputs (scores and labels) to compute sigmoid cross entropy loss.")

    def reshape(self, bottom, top):
        # check that input dimensions match between the scores and labels
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # the difference has the same shape as either input
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # the layer output is a scalar loss (summed over the blob)
        top[0].reshape(1)

    def forward(self, bottom, top):
        score = bottom[0].data
        label = bottom[1].data
        # numerically stable loss: max(x, 0) - x*t + ln(1 + e^{-|x|}),
        # i.e. the negative of formula <1>/<2>, so the loss is non-negative
        first_term = np.maximum(score, 0)
        second_term = -1 * score * label
        third_term = np.log(1 + np.exp(-1 * np.absolute(score)))
        top[0].data[...] = np.sum(first_term + second_term + third_term)
        # gradient w.r.t. the scores: P - t (the sign is flipped accordingly)
        sig = scipy.special.expit(score)
        self.diff = sig - label
        if np.isnan(top[0].data).any():
            exit()

    def backward(self, top, propagate_down, bottom):
        bottom[0].diff[...] = self.diff
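As a sanity check of the forward pass above (a numpy-only sketch, independent of Caffe), the summed value computed in forward() should equal the standard binary cross entropy $-[t \ln P + (1 - t) \ln(1 - P)]$ summed over the blob, i.e. the negative of formula <1>:

import numpy as np
from scipy.special import expit

score = np.array([-3.0, -0.2, 0.7, 4.1], dtype=np.float32)
label = np.array([0.0, 1.0, 1.0, 0.0], dtype=np.float32)

# loss as computed in forward()
loss = np.sum(np.maximum(score, 0) - score * label + np.log(1 + np.exp(-np.abs(score))))

# reference: standard binary cross entropy -[t*ln(P) + (1-t)*ln(1-P)]
p = expit(score)
reference = -np.sum(label * np.log(p) + (1 - label) * np.log(1 - p))
assert np.isclose(loss, reference)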
3.2 prototxt Definition
layer {
  type: 'Python'
  name: 'loss'
  top: 'loss_opt'
  bottom: 'score'
  bottom: 'label'
  python_param {
    # the module name -- usually the filename -- that needs to be in $PYTHONPATH
    module: 'loss_layers'
    # the layer name -- the class name in the module
    layer: 'CustomSigmoidCrossEntropyLossLayer'
  }
  include {
    phase: TRAIN
  }
  # set loss weight so Caffe knows this is a loss layer.
  # since PythonLayer inherits directly from Layer, this isn't automatically
  # known to Caffe
  loss_weight: 1
}
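To actually use the layer, Caffe has to be built with WITH_PYTHON_LAYER := 1, and the directory containing loss_layers.py must be on $PYTHONPATH. A minimal, hypothetical usage sketch (train.prototxt stands in for a full training prototxt that includes the layer definition above together with data and score layers):

import caffe

# assumes loss_layers.py is importable, e.g.:
# import sys; sys.path.insert(0, '/path/to/layer/dir')
caffe.set_mode_cpu()
net = caffe.Net('train.prototxt', caffe.TRAIN)

# one forward/backward pass; the Python layer's forward() and backward() are called here
net.forward()
net.backward()
print(net.blobs['loss_opt'].data)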