Today we cover the four cross-entropy loss functions built into TensorFlow:
- tf.nn.sigmoid_cross_entropy_with_logits
- tf.nn.softmax_cross_entropy_with_logits_v2
- tf.nn.sparse_softmax_cross_entropy_with_logits
- tf.nn.weighted_cross_entropy_with_logits
1. tf.nn.sigmoid_cross_entropy_with_logits
A note up front: this loss function requires logits and labels to be float32 or float64, so do not define labels as an int type when using it!
tf.nn.sigmoid_cross_entropy_with_logits(
_sentinel=None,
labels=None,
logits=None,
name=None
)
What scenarios is this loss function used for?
Measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive. For instance, one could perform multilabel classification where a picture can contain both an elephant and a dog at the same time.
This loss function measures the probability error for classes that are independent of each other but not mutually exclusive. (Note: logits are the unnormalized probabilities, i.e., the raw output of the network's output layer, because the loss function applies the sigmoid/softmax normalization itself; see the linked blog post for details.)
For brevity, let x = logits, z = labels. The logistic loss is
\begin{aligned}
  &z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x)) \\
= &z * -log(1 / (1 + exp(-x))) + (1 - z) * -log(exp(-x) / (1 + exp(-x))) \\
= &z * log(1 + exp(-x)) + (1 - z) * (-log(exp(-x)) + log(1 + exp(-x))) \\
= &z * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x))) \\
= &(1 - z) * x + log(1 + exp(-x)) \\
= &x - x * z + log(1 + exp(-x))
\end{aligned}
For x < 0, to avoid overflow in exp(-x), we reformulate the above:
\begin{aligned}
  &x - x * z + log(1 + exp(-x)) \\
= &log(exp(x)) - x * z + log(1 + exp(-x)) \\
= &-x * z + log(1 + exp(x))
\end{aligned}
PS: So what exactly is overflow?
Definition: overflow (or underflow) occurs when the number of bits provided by a variable's data type cannot accommodate a given value.
Consider an example. Suppose a short int variable, which occupies 2 bytes of memory, stores the bit pattern 0111 1111 1111 1111.
This is the binary representation of 32,767, the largest value this data type can hold. We will not go into the details of how negative numbers are stored; it is enough to know that a short int can store both positive and negative numbers. A number whose high-order (leftmost) bit is 0 is interpreted as positive, and a number whose high-order bit is 1 is interpreted as negative.
If we add 1 to the number stored above, the variable takes on the bit pattern 1000 0000 0000 0000.
But this is not 32,768. Instead it is interpreted as a negative number, which is not the expected result. The binary 1 has "carried" into the high-order bit position; this is what is called overflow.
Similarly, when an integer variable holds the most negative value its type allows and 1 is subtracted from it, the 1 in the high-order bit becomes 0 and the result is interpreted as a positive number. This is another example of overflow.
Besides overflow, floating-point values can also underflow. This can happen when a value gets too close to zero: such a small number needs more digits of precision to represent than the variable can hold.
In short, overflow means the bits of a variable's data type cannot store the given value!
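If you want to see this wrap-around behavior for yourself, here is a minimal NumPy sketch (np.int16 is chosen here simply to mirror the 2-byte short int above):
import numpy as np

x = np.int16(32767)                     # largest value a 16-bit signed integer can hold
print(np.binary_repr(x, width=16))      # 0111111111111111
# Adding 1 carries into the sign bit, so the value wraps around to -32768
print(x + np.int16(1))                  # -32768 (NumPy may also emit an overflow RuntimeWarning)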
When x < 0 and very far from zero, exp(-x) can become extremely large, causing overflow!
Hence, to ensure stability and avoid overflow, the implementation uses this equivalent formulation.
max(x, 0) - x * z + log(1 + exp(-abs(x)))
Now let's look at a concrete example using this loss function:
import numpy as np
import tensorflow as tf

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

labels = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
logits = np.array([[11., 8., 7.], [10., 14., 3.], [1., 2., 4.]])

# Compute the loss by hand, following the API's internal formula
# Single-label case: each image has exactly one class label
y_pred = sigmoid(logits)
prob_error1 = -labels * np.log(y_pred) - (1 - labels) * np.log(1 - y_pred)
print(".................................................................")
print("----------single-label loss: \n", prob_error1)

# Multi-label case: one image can carry several class labels
labels1 = np.array([[0., 1., 0.], [1., 1., 0.], [0., 0., 1.]])
logits1 = np.array([[1., 8., 7.], [10., 14., 3.], [1., 2., 4.]])
y_pred1 = sigmoid(logits1)
prob_error2 = -labels1 * np.log(y_pred1) - (1 - labels1) * np.log(1 - y_pred1)
print(".................................................................")
print("----------multi-label loss: \n", prob_error2)

with tf.Session() as sess:
    # Call the API directly; logits are the unnormalized probabilities
    print("***********************************************************************")
    print("----------single-label loss: \n",
          sess.run(tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)))
    print("***********************************************************************")
    print("----------multi-label loss: \n",
          sess.run(tf.nn.sigmoid_cross_entropy_with_logits(labels=labels1, logits=logits1)))
What do you observe in the results? Also try to trigger overflow when x < 0 and very far from zero, and then use the reformulated expression above to fix it.
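As a reference for that exercise, here is a minimal NumPy sketch (the helper names naive_loss and stable_loss are just for illustration): the naive formula x - x*z + log(1 + exp(-x)) overflows for a very negative logit, while the reformulated expression stays finite:
import numpy as np

def naive_loss(x, z):
    # Direct transcription of x - x*z + log(1 + exp(-x)); exp(-x) overflows for very negative x
    return x - x * z + np.log(1 + np.exp(-x))

def stable_loss(x, z):
    # Equivalent formulation used for stability; exp(-abs(x)) never exceeds 1
    return np.maximum(x, 0) - x * z + np.log(1 + np.exp(-np.abs(x)))

x = np.array([-1000.0])   # a very negative logit
z = np.array([1.0])
print(naive_loss(x, z))   # [inf] plus an overflow RuntimeWarning, since exp(1000) overflows float64
print(stable_loss(x, z))  # [1000.], the correct loss value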
2. tf.nn.softmax_cross_entropy_with_logits_v2
tf.nn.softmax_cross_entropy_with_logits_v2(
labels,
logits,
axis=None,
name=None,
dim=None
)
What scenarios is this loss function used for?
Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.
NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.
This loss function is only suitable for single-label binary or multi-class classification, i.e., each image has exactly one class label, whereas with tf.nn.sigmoid_cross_entropy_with_logits an image can carry multiple class labels. Also, a "valid probability distribution" means the classes are mutually exclusive, but the corresponding probabilities need not be hard 0/1 values (soft labels are allowed).
Note:
- tf.nn.sparse_softmax_cross_entropy_with_logits requires that each label specify one and only one class.
- This op applies softmax to the logits internally (which is more efficient), so its input must be unnormalized logits. Do not feed it the output of softmax, otherwise the results will be incorrect.
- tf.nn.softmax_cross_entropy_with_logits backpropagates only into logits; tf.nn.softmax_cross_entropy_with_logits_v2 backpropagates into both logits and labels. To block backpropagation into labels, pass the labels tensor through tf.stop_gradient before handing it to this function (see the sketch below).
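Here is a minimal usage sketch (TF 1.x session style, reusing the labels/logits arrays from the example above) that checks the op against a manual softmax + cross-entropy computation and shows how to stop gradients into labels:
import numpy as np
import tensorflow as tf

# One-hot labels: each row is a valid probability distribution with exactly one class
labels = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
logits = np.array([[11., 8., 7.], [10., 14., 3.], [1., 2., 4.]])

# Manual computation: softmax over the unnormalized logits, then cross entropy per row
softmax = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
manual_loss = -np.sum(labels * np.log(softmax), axis=1)
print("----------manual loss: \n", manual_loss)

with tf.Session() as sess:
    # Pass raw logits; the op applies softmax internally
    api_loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits)
    print("----------API loss: \n", sess.run(api_loss))
    # To block gradients from flowing into labels, wrap them with tf.stop_gradient first
    api_loss_ng = tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=tf.stop_gradient(labels), logits=logits)
    print("----------API loss (labels stopped): \n", sess.run(api_loss_ng))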
3. tf.nn.sparse_softmax_cross_entropy_with_logits
tf.nn.sparse_softmax_cross_entropy_with_logits(
_sentinel=None,
labels=None,
logits=None,
name=None
)
What scenarios is this loss function used for?
Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.
NOTE: For this operation, the probability of a given label is considered exclusive. That is, soft classes are not allowed, and the labels vector must provide a single specific index for the true class for each row of logits (each minibatch entry). For soft softmax classification with a probability distribution for each entry, see softmax_cross_entropy_with_logits_v2.
This loss function is essentially the same as tf.nn.softmax_cross_entropy_with_logits_v2; the difference is that for tf.nn.sparse_softmax_cross_entropy_with_logits the label of each example must be exclusive, i.e., the labels vector gives a single index marking the true class (see the sketch below).
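Here is a minimal sketch (TF 1.x session style, values for illustration only) contrasting the sparse variant, which takes integer class indices, with the dense v2 variant, which takes one-hot rows; both should print the same per-example losses:
import numpy as np
import tensorflow as tf

logits = np.array([[11., 8., 7.], [10., 14., 3.], [1., 2., 4.]])
# Sparse labels: one integer class index per example (must be int32/int64)
sparse_labels = np.array([0, 1, 2], dtype=np.int64)
# Equivalent one-hot labels for the dense v2 variant
onehot_labels = np.eye(3)[sparse_labels]

with tf.Session() as sess:
    sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=sparse_labels, logits=logits)
    dense_loss = tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=onehot_labels, logits=logits)
    print("----------sparse loss: \n", sess.run(sparse_loss))
    print("----------dense  loss: \n", sess.run(dense_loss))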
4. tf.nn.weighted_cross_entropy_with_logits
tf.nn.weighted_cross_entropy_with_logits(
targets,
logits,
pos_weight,
name=None
)
What scenarios is this loss function used for?
This is like sigmoid_cross_entropy_with_logits() except that pos_weight, allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error. The usual cross-entropy cost is defined as: labels * -log(sigmoid(logits)) + (1 - labels) * -log(1 - sigmoid(logits)) .
A value pos_weight > 1 decreases the false negative count, hence increasing the recall. Conversely setting pos_weight < 1 decreases the false positive count and increases the precision.
The usual cross-entropy cost is defined as:
targets * -log(sigmoid(logits)) + (1 - targets) * -log(1 - sigmoid(logits))
pos_weight is introduced as a multiplicative coefficient on the positive-target term of the loss expression:
targets * -log(sigmoid(logits)) * pos_weight + (1 - targets) * -log(1 - sigmoid(logits))
In fact this loss function is similar to sigmoid_cross_entropy_with_logits(), so it is likewise used for binary (multi-label) classification. The difference is the extra weight parameter that scales the contribution of the positive-sample loss; clearly this is aimed at the case of imbalanced positive and negative samples.
For brevity, let x = logits, z = labels, q = pos_weight. The loss is:
\begin{aligned}
  &qz * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x)) \\
= &qz * -log(1 / (1 + exp(-x))) + (1 - z) * -log(exp(-x) / (1 + exp(-x))) \\
= &qz * log(1 + exp(-x)) + (1 - z) * (-log(exp(-x)) + log(1 + exp(-x))) \\
= &qz * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x))) \\
= &(1 - z) * x + (qz + 1 - z) * log(1 + exp(-x)) \\
= &(1 - z) * x + (1 + (q - 1) * z) * log(1 + exp(-x))
\end{aligned}
Setting l = (1 + (q - 1) * z), to ensure stability and avoid overflow, the implementation uses:
(1 - z) * x + l * (log(1 + exp(-abs(x))) + max(-x, 0))
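To wrap up, here is a minimal sketch (TF 1.x session style; the arrays and the pos_weight value are just for illustration, and the TF 1.x keyword targets= is assumed) comparing a manual computation of the weighted loss with the API call:
import numpy as np
import tensorflow as tf

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

labels = np.array([[0., 1., 0.], [1., 1., 0.], [0., 0., 1.]])
logits = np.array([[1., 8., 7.], [10., 14., 3.], [1., 2., 4.]])
pos_weight = 2.0   # a weight > 1 penalizes false negatives more heavily, boosting recall

# Manual computation: only the positive-target term is scaled by pos_weight
y_pred = sigmoid(logits)
manual_loss = -pos_weight * labels * np.log(y_pred) - (1 - labels) * np.log(1 - y_pred)
print("----------manual weighted loss: \n", manual_loss)

with tf.Session() as sess:
    api_loss = tf.nn.weighted_cross_entropy_with_logits(
        targets=labels, logits=logits, pos_weight=pos_weight)
    print("----------API weighted loss: \n", sess.run(api_loss))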
Reference: the API documentation on the official TensorFlow website.