TensorFlow: A Closer Look at Loss Functions (Categorical Cross-Entropy Loss, Binary Cross-Entropy Loss, and More)

XerCis

Article link: https://blog.csdn.net/lly1122334/article/details/118934108

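Introduction

The many loss functions used in deep learning are easy to confuse; grouping them by task makes them easier to understand.

A loss function, also called a cost function or objective function, measures the difference between the true values and the predicted values. Training minimizes this difference.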

Installation

pip install tensorflow-gpu==2.3.0
pip install tensorflow-addons

First Steps

Keras takes the loss function when the model is compiled. You can pass either a loss instance or its string alias.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.Dense(64, kernel_initializer='uniform', input_shape=(10,)))
model.add(layers.Activation('softmax'))

loss_function = keras.losses.SparseCategoricalCrossentropy()  # the model already applies softmax, so from_logits stays False
model.compile(loss=loss_function, optimizer='adam')  # pass an instance

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')  # or pass the string alias

Cross Entropy

$CE=-\sum_i^C{t_i\log \left( s_i \right)}$

$t_i$ and $s_i$ are the ground-truth value and the predicted score for each class $Class_i$ among the $C$ classes
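
As a quick check of the formula, here is a minimal pure-Python sketch. The helper name `cross_entropy` is illustrative and not a TensorFlow API. It assumes `t` is a one-hot (or soft) target and `s` is a probability vector, and it skips terms with $t_i = 0$ so that $\log(0)$ is never evaluated.

```python
import math


def cross_entropy(t, s):
    """CE = -sum_i t_i * log(s_i); terms with t_i == 0 contribute nothing."""
    return -sum(t_i * math.log(s_i) for t_i, s_i in zip(t, s) if t_i > 0)


t = [0.0, 0.0, 1.0]  # ground truth (one-hot)
s = [0.1, 0.8, 0.1]  # predicted probabilities
print(cross_entropy(t, s))  # about 2.3026, i.e. -log(0.1)
```

With a one-hot target only the true class contributes, so the loss reduces to the negative log of the probability assigned to that class.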

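Binary Classification

Binary classification losses are used for problems with exactly two classes, for example:

Predicting fraud in transactions: a transaction is either fraudulent or not.
Cat vs. dog classification: an image is either a cat or a dog.

Activation function: sigmoid

Binary Cross Entropy

BinaryCrossentropy computes the cross-entropy loss between the true values and the predictions. In binary classification the activation function is sigmoid, which restricts the output to between 0 and 1.

The reduction parameter accepts:

sum_over_batch_size: returns the mean of the per-sample losses in the batch
sum: returns the sum of the per-sample losses in the batch
none: returns the full array of per-sample losses

Predictions lie in the range $\left[ 0,1 \right]$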

import tensorflow as tf

y_true = [[0., 1.], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
y_pred = [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.8, 0.2]]

bce = tf.keras.losses.BinaryCrossentropy(reduction='sum_over_batch_size')
print(bce(y_true, y_pred).numpy())  # 0.839445
bce = tf.keras.losses.BinaryCrossentropy(reduction='sum')
print(bce(y_true, y_pred).numpy())  # 3.35778
bce = tf.keras.losses.BinaryCrossentropy(reduction='none')
print(bce(y_true, y_pred).numpy())  # [0.9162905  0.5919184  0.79465103 1.0549198 ]
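
To see where these three numbers come from, the following sketch recomputes them in plain Python. It is an illustration rather than the Keras implementation: `bce_per_sample` is a hypothetical helper, and the small epsilon clipping that Keras applies to predictions is omitted because it does not affect these values.

```python
import math


def bce_per_sample(y_true, y_pred):
    """Mean binary cross-entropy over the last axis of one sample."""
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


y_true = [[0., 1.], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
y_pred = [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.8, 0.2]]

per_sample = [bce_per_sample(t, p) for t, p in zip(y_true, y_pred)]
print(per_sample)                         # corresponds to reduction='none'
print(sum(per_sample))                    # corresponds to reduction='sum'
print(sum(per_sample) / len(per_sample))  # corresponds to reduction='sum_over_batch_size'
```

The per-sample values are computed first; the reduction only decides whether they are returned as they are, summed, or averaged.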

If the model does not apply the sigmoid activation, its predictions are raw logits in the range $\left( -\infty ,+\infty \right)$. In that case pass from_logits=True.

import tensorflow as tf

# 1-D input: four values, averaged into a single loss
y_true = [0, 1, 0, 0]  # true values
y_pred = [-18.6, 0.51, 2.94, -12.8]  # predictions (logits)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)  # use from_logits=True when predictions are raw logits
print(bce(y_true, y_pred).numpy())  # 0.865458

# 2-D input: a batch of 2 samples with 2 values each
y_true = [[0, 1], [0, 0]]
y_pred = [[-18.6, 0.51], [2.94, -12.8]]
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True, reduction='sum_over_batch_size')
print(bce(y_true, y_pred).numpy())  # 0.865458
print(bce(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy())  # 0.2436386, with sample weights
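
The logits case can be reproduced by hand as well. The sketch below uses a common numerically stable rewriting of sigmoid followed by cross-entropy. The helper `bce_from_logit` is hypothetical, and whether TensorFlow uses exactly this expression internally is an assumption. The sample weights multiply the per-sample losses before the division by the batch size.

```python
import math


def bce_from_logit(y, x):
    """Stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]."""
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


y_true = [[0, 1], [0, 0]]
y_pred = [[-18.6, 0.51], [2.94, -12.8]]

per_sample = [sum(bce_from_logit(y, x) for y, x in zip(ys, xs)) / len(xs)
              for ys, xs in zip(y_true, y_pred)]
mean_loss = sum(per_sample) / len(per_sample)
weights = [0.8, 0.2]
weighted_loss = sum(w * l for w, l in zip(weights, per_sample)) / len(per_sample)
print(mean_loss)      # about 0.8655
print(weighted_loss)  # about 0.2436
```

Working on logits directly avoids computing log(sigmoid(x)) for very negative x, where the sigmoid underflows toward 0.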

Multiclass Classification

Activation function: softmax

Categorical Crossentropy

CategoricalCrossentropy computes the cross-entropy loss between the true values and the predictions. Labels are given in one-hot encoded (one_hot) form.

import tensorflow as tf

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]

cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_true, y_pred).numpy())  # 1.1769392
print(cce(y_true, y_pred, sample_weight=tf.constant([0.3, 0.7])).numpy())  # 0.8135988, with sample weights

cce = tf.keras.losses.CategoricalCrossentropy(reduction='sum')
print(cce(y_true, y_pred).numpy())  # 2.3538785
cce = tf.keras.losses.CategoricalCrossentropy(reduction='none')
print(cce(y_true, y_pred).numpy())  # [0.05129331 2.3025851 ]
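
The effect of sample_weight is easy to misread, so here is a small plain-Python sketch of the arithmetic (`categorical_ce` is an illustrative helper, not a Keras function). It suggests that each per-sample loss is multiplied by its weight and the result is divided by the batch size, not by the sum of the weights.

```python
import math

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]


def categorical_ce(t, s):
    """-sum_i t_i * log(s_i), skipping classes whose target is 0."""
    return -sum(t_i * math.log(s_i) for t_i, s_i in zip(t, s) if t_i > 0)


per_sample = [categorical_ce(t, s) for t, s in zip(y_true, y_pred)]
mean_loss = sum(per_sample) / len(per_sample)
weights = [0.3, 0.7]
weighted_loss = sum(w * l for w, l in zip(weights, per_sample)) / len(per_sample)
print(per_sample)     # about [0.0513, 2.3026]
print(mean_loss)      # about 1.1769
print(weighted_loss)  # about 0.8136
```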

Sparse Categorical Crossentropy

SparseCategoricalCrossentropy computes the cross-entropy loss between the true values and the predictions. Labels are given as integers (int).

import tensorflow as tf

y_true = [1, 2]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]

scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce(y_true, y_pred).numpy())  # 1.1769392
print(scce(y_true, y_pred, sample_weight=tf.constant([0.3, 0.7])).numpy())  # 0.8135988, with sample weights

scce = tf.keras.losses.SparseCategoricalCrossentropy(reduction='sum')
print(scce(y_true, y_pred).numpy())  # 2.3538785
scce = tf.keras.losses.SparseCategoricalCrossentropy(reduction='none')
print(scce(y_true, y_pred).numpy())  # [0.05129331 2.3025851 ]
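
The sparse and one-hot variants give the same numbers because an integer label simply indexes the class that a one-hot vector would mark with 1. A minimal sketch of that equivalence, in plain Python with illustrative variable names:

```python
import math

y_true = [1, 2]  # integer class indices
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]

# Sparse form: pick the predicted probability of the true class directly.
sparse_losses = [-math.log(probs[label]) for label, probs in zip(y_true, y_pred)]

# Dense form: expand each label to a one-hot vector first.
num_classes = len(y_pred[0])
one_hot = [[1 if i == label else 0 for i in range(num_classes)] for label in y_true]
dense_losses = [-sum(t * math.log(p) for t, p in zip(ts, ps) if t)
                for ts, ps in zip(one_hot, y_pred)]

print(one_hot)        # [[0, 1, 0], [0, 0, 1]]
print(sparse_losses)  # about [0.0513, 2.3026]
print(dense_losses)   # same values as sparse_losses
```

The integer form avoids materializing the one-hot matrix, which matters when the number of classes is large.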

Poisson Loss

The Poisson loss suits data that follows a Poisson distribution, such as the number of calls a call center receives per hour.

import tensorflow as tf

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]
p = tf.keras.losses.Poisson()
print(p(y_true, y_pred).numpy())  # 0.5
print(p(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy())  # 0.4, with sample weights

p = tf.keras.losses.Poisson(reduction='sum')
print(p(y_true, y_pred).numpy())  # 0.999
p = tf.keras.losses.Poisson(reduction='none')
print(p(y_true, y_pred).numpy())  # [0.99999994 0.        ]
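
The Poisson loss is commonly written as y_pred - y_true * log(y_pred). The sketch below recomputes the example in plain Python. The constant `EPSILON` and the helper name are assumptions for illustration: a small epsilon is added inside the logarithm so that a prediction of 0 does not produce log(0).

```python
import math

EPSILON = 1e-7  # assumed small constant to avoid log(0)


def poisson_per_sample(y_true, y_pred):
    """Mean over the last axis of y_pred - y_true * log(y_pred + EPSILON)."""
    terms = [p - t * math.log(p + EPSILON) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]
per_sample = [poisson_per_sample(t, p) for t, p in zip(y_true, y_pred)]
print(per_sample)                         # about [1.0, 0.0]
print(sum(per_sample) / len(per_sample))  # about 0.5
```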

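Relative Entropy (Kullback-Leibler Divergence Loss)

Relative entropy (KLDivergence), also known as KL divergence, measures how far one probability distribution is from another. It is typically used when regressing directly over a space of discretely sampled continuous output distributions.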

loss = y_true * log(y_true / y_pred)

import tensorflow as tf

y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
kl = tf.keras.losses.KLDivergence()
print(kl(y_true, y_pred).numpy())  # 0.45814306
print(kl(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy())  # 0.3665154

kl = tf.keras.losses.KLDivergence(reduction='sum')
print(kl(y_true, y_pred).numpy())  # about 0.916
kl = tf.keras.losses.KLDivergence(reduction='none')
print(kl(y_true, y_pred).numpy())  # about [ 9.16e-01 -3.08e-06]
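
The formula above can be checked by hand. This sketch assumes both inputs are clipped into [EPSILON, 1] before the logarithm, which keeps zero entries finite. The constant and helper names are illustrative, not part of the Keras API.

```python
import math

EPSILON = 1e-7  # assumed clipping constant


def clip(v):
    """Clip a value into [EPSILON, 1] so the logarithm stays finite."""
    return min(max(v, EPSILON), 1.0)


def kl_per_sample(y_true, y_pred):
    """sum_i t_i * log(t_i / p_i) after clipping both inputs."""
    return sum(clip(t) * math.log(clip(t) / clip(p))
               for t, p in zip(y_true, y_pred))


y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
per_sample = [kl_per_sample(t, p) for t, p in zip(y_true, y_pred)]
print(per_sample)                         # about [0.916, -3.08e-06]
print(sum(per_sample) / len(per_sample))  # about 0.458
```

The tiny negative value for the second sample comes from the clipped zeros. It is a numerical artifact, not a genuinely negative divergence.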

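Object Detection

Focal Loss

SigmoidFocalCrossEntropy suits classification problems with imbalanced data, as in object detection. It down-weights well-classified examples and focuses on hard-to-classify ones.

Samples that the classifier misclassifies incur a much higher loss than well-classified samples.

A classic use case for focal loss is a multi-class problem with a background class alongside the other classes.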

$\text{FL}\left( p_t \right) =-\left( 1-p_t \right) ^{\gamma}\log \left( p_t \right)$

import tensorflow_addons as tfa

y_true = [[1.0], [1.0], [0.0]]
y_pred = [[0.97], [0.91], [0.03]]
sfc = tfa.losses.SigmoidFocalCrossEntropy()
print(sfc(y_true, y_pred).numpy())  # [6.8532745e-06 1.9097870e-04 2.0559824e-05]
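
The example output can be reproduced with a few lines of plain Python. Note that the formula above omits the class-balancing factor $\alpha_t$. The sketch below includes it, and the values `alpha=0.25` and `gamma=2.0` are assumptions chosen because they line up with the numbers printed above. `sigmoid_focal_loss` is a hypothetical helper that takes probabilities, not logits.

```python
import math


def sigmoid_focal_loss(y, p, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction p (a probability) with label y."""
    p_t = p if y == 1 else 1 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1 - alpha  # class-balancing factor
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


labels = [1, 1, 0]
probs = [0.97, 0.91, 0.03]
losses = [sigmoid_focal_loss(y, p) for y, p in zip(labels, probs)]
print(losses)  # about [6.85e-06, 1.91e-04, 2.06e-05]
```

The factor $(1-p_t)^{\gamma}$ is what shrinks the loss of confident, correct predictions: at $p_t = 0.97$ it scales the cross-entropy by 0.0009.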

GIoU Generalized Intersection over Union

GIoULoss improves on IoU for object detection. Unlike plain IoU, it still gives a useful signal when the bounding boxes do not overlap.

import tensorflow_addons as tfa

boxes1 = [[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]
boxes2 = [[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0]]
gl = tfa.losses.GIoULoss()
print(gl(boxes1, boxes2).numpy())  # 1.5041667
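
To make the computation concrete, here is a plain-Python sketch of 1 - GIoU for a single pair of boxes, assuming the [y_min, x_min, y_max, x_max] layout and non-degenerate boxes (positive area). The helper `giou_loss` is illustrative and not the TensorFlow Addons implementation.

```python
def giou_loss(box_a, box_b):
    """1 - GIoU for two boxes given as [y_min, x_min, y_max, x_max]."""
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # Intersection rectangle (zero area if the boxes do not overlap).
    inter_h = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_w = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_h * inter_w
    union = area_a + area_b - inter
    # Smallest box enclosing both inputs.
    enclose_h = max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])
    enclose_w = max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])
    enclose = enclose_h * enclose_w
    giou = inter / union - (enclose - union) / enclose
    return 1.0 - giou


boxes1 = [[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]
boxes2 = [[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0]]
losses = [giou_loss(a, b) for a, b in zip(boxes1, boxes2)]
print(losses)                     # about [1.075, 1.9333]
print(sum(losses) / len(losses))  # about 1.5042
```

The second pair does not overlap at all, so its IoU is 0. The enclosing-box term is what pushes its loss toward 2 and still reflects how far apart the boxes are.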

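Regression

Regression is mostly used to predict a specific numeric value, such as house prices or future weather.

Mean Squared Error

MeanSquaredError computes the mean of the squared errors between the true values and the predictions. It suits cases where large errors should be penalized more heavily than small ones.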

import tensorflow as tf

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
mse = tf.keras.losses.MeanSquaredError()
print(mse(y_true, y_pred).numpy())  # 0.5
print(mse(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy())  # 0.25

mse = tf.keras.losses.MeanSquaredError(reduction='sum')
print(mse(y_true, y_pred).numpy())  # 1.0
mse = tf.keras.losses.MeanSquaredError(reduction='none')
print(mse(y_true, y_pred).numpy())  # [0.5 0.5]

Mean Absolute Percentage Error

MeanAbsolutePercentageError is commonly used for sales forecasting

loss = 100 * abs(y_true - y_pred) / y_true

import tensorflow as tf

y_true = [[2., 1.], [2., 3.]]
y_pred = [[1., 1.], [1., 0.]]
mape = tf.keras.losses.MeanAbsolutePercentageError()
print(mape(y_true, y_pred).numpy())  # 50.0
print(mape(y_true, y_pred, sample_weight=[0.7, 0.3]).numpy())  # 20.0

mape = tf.keras.losses.MeanAbsolutePercentageError(reduction='sum')
print(mape(y_true, y_pred).numpy())  # 100.0
mape = tf.keras.losses.MeanAbsolutePercentageError(reduction='none')
print(mape(y_true, y_pred).numpy())  # [25. 75.]

Mean Squared Logarithmic Error

MeanSquaredLogarithmicError penalizes underestimates more than overestimates and is suited to handling outliers.

loss = square(log(y_true + 1.) - log(y_pred + 1.))

import tensorflow as tf

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
msle = tf.keras.losses.MeanSquaredLogarithmicError()
print(msle(y_true, y_pred).numpy())  # 0.24022643
print(msle(y_true, y_pred, sample_weight=[0.7, 0.3]).numpy())  # 0.12011322

msle = tf.keras.losses.MeanSquaredLogarithmicError(reduction='sum')
print(msle(y_true, y_pred).numpy())  # 0.48045287
msle = tf.keras.losses.MeanSquaredLogarithmicError(reduction='none')
print(msle(y_true, y_pred).numpy())  # [0.24022643 0.24022643]
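
The claim that underestimates are penalized more than overestimates can be checked with one value. In this sketch (illustrative helper name, made-up numbers) the prediction misses the target of 100 by 50 in each direction:

```python
import math


def squared_log_error(y_true, y_pred):
    """square(log(y_true + 1) - log(y_pred + 1)) for a single value."""
    return (math.log1p(y_true) - math.log1p(y_pred)) ** 2


under = squared_log_error(100.0, 50.0)   # prediction is 50 too low
over = squared_log_error(100.0, 150.0)   # prediction is 50 too high
print(under, over)  # about 0.467 and 0.162
```

The same absolute error costs roughly three times as much when the prediction is too low, because the logarithm compresses large values more than small ones.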

Cosine Similarity Loss

CosineSimilarity returns a value between -1 and 1. The closer to -1, the more similar the vectors.

loss = -sum(l2_norm(y_true) * l2_norm(y_pred))

import tensorflow as tf

y_true = [[0., 1.], [1., 1.]]
y_pred = [[1., 0.], [1., 1.]]
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1)
print(cosine_loss(y_true, y_pred).numpy())  # -0.5
print(cosine_loss(y_true, y_pred, sample_weight=[0.7, 0.3]).numpy())  # about -0.15

cosine_loss = tf.keras.losses.CosineSimilarity(axis=1, reduction='sum')
print(cosine_loss(y_true, y_pred).numpy())  # -0.999
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1, reduction='none')
print(cosine_loss(y_true, y_pred).numpy())  # [-0.         -0.99999994]

LogCosh Loss

LogCosh behaves like mean squared error but is not strongly affected by the occasional wildly incorrect prediction.

x = y_pred - y_true

logcosh = log((exp(x) + exp(-x))/2)

import tensorflow as tf

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]
l = tf.keras.losses.LogCosh()
print(l(y_true, y_pred).numpy())  # 0.108
print(l(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy())  # 0.087

l = tf.keras.losses.LogCosh(reduction='sum')
print(l(y_true, y_pred).numpy())  # 0.217
l = tf.keras.losses.LogCosh(reduction='none')
print(l(y_true, y_pred).numpy())  # [0.2168904 0.       ]
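
A short numeric comparison shows why. For small errors log(cosh(x)) is close to x²/2, while for large errors it grows roughly like |x| - log(2), that is, linearly. The helper below is a plain-Python illustration of the formula above, not the Keras implementation.

```python
import math


def log_cosh(x):
    """log((exp(x) + exp(-x)) / 2) for a single error value x."""
    return math.log((math.exp(x) + math.exp(-x)) / 2)


# Columns: error, log-cosh value, half the squared error.
for error in (0.1, 1.0, 10.0):
    print(error, log_cosh(error), 0.5 * error ** 2)
```

For an error of 10 the log-cosh value stays below 10, whereas half the squared error is 50, so a single outlier dominates the total far less.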

Huber loss

Huber loss suits regression problems where the loss should be less sensitive to outliers

x = y_true - y_pred

loss = 0.5 * x^2                  if |x| <= d
loss = 0.5 * d^2 + d * (|x| - d)  if |x| > d

import tensorflow as tf

y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
h = tf.keras.losses.Huber()
print(h(y_true, y_pred).numpy())  # 0.155
print(h(y_true, y_pred, sample_weight=[1, 0]).numpy())  # 0.09

h = tf.keras.losses.Huber(reduction='sum')
print(h(y_true, y_pred).numpy())  # 0.31
h = tf.keras.losses.Huber(reduction='none')
print(h(y_true, y_pred).numpy())  # [0.18 0.13]
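
The piecewise definition above translates directly into code. This plain-Python sketch assumes the threshold d = 1.0 and recomputes the per-sample Huber values for the example. The helper `huber` is illustrative.

```python
def huber(x, d=1.0):
    """Huber loss for a single error x with threshold d."""
    if abs(x) <= d:
        return 0.5 * x ** 2
    return 0.5 * d ** 2 + d * (abs(x) - d)


y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
per_sample = [sum(huber(t - p) for t, p in zip(ts, ps)) / len(ps)
              for ts, ps in zip(y_true, y_pred)]
print(per_sample)                         # about [0.18, 0.13]
print(sum(per_sample) / len(per_sample))  # about 0.155
print(huber(3.0))                         # 2.5, linear regime
```

All errors in the example are below the threshold, so only the quadratic branch is exercised there. The last line shows the linear branch, where an error of 3 costs 2.5 instead of the 4.5 that half the squared error would give.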

Learning Embeddings

Triplet Loss

TripletSemiHardLoss encourages the positive distance (between embeddings that share a label) to be smaller than the minimum negative distance

TensorFlow Addons Losses: TripletSemiHardLoss

Custom Loss Functions

import numpy as np
import tensorflow as tf


def custom_loss_function(y_true, y_pred):
    """Custom loss function: mean squared difference over the last axis."""
    squared_difference = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_difference, axis=-1)


y_true = [12, 20, 29., 60.]
y_pred = [14., 18., 27., 55.]
print(custom_loss_function(np.array(y_true), np.array(y_pred)).numpy())  # 9.25

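Sample Weights

Further reading: Computing class weights (class_weight) for a dataset in Python

Pass loss_weights to Model.compile() to weight the loss contributions of different model outputs

Or pass class_weight (per class) or sample_weight (per sample) to Model.fit()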
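
As an illustration of where a class_weight dictionary might come from, the sketch below applies one common "balanced" heuristic, n_samples / (n_classes * count). The heuristic, the helper name, and the label list are assumptions for this example, not something prescribed by Keras.

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c), one common heuristic."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


labels = [0, 0, 0, 0, 0, 0, 1, 1]  # hypothetical imbalanced labels
class_weight = balanced_class_weights(labels)
print(class_weight)  # {0: 0.6666666666666666, 1: 2.0}
```

A dictionary of this shape, mapping class index to weight, is what the class_weight argument of Model.fit() expects. The rarer class receives the larger weight.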

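Why Is the Loss nan

Once the loss becomes nan, the model stops learning and its weights are no longer usefully updated.

There are many possible causes of nan:

nan values in the training set propagate into the loss
Infinite values (NumPy inf) in the training set lead to nan
Training data that has not been scaled
Very strong L2 regularization
A learning rate that is too large, for example greater than 1
An unsuitable optimizer
Large gradients that cause large updates to the network weights during training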
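
The first three causes can be screened for before training starts. The following sketch uses only the standard library. The helper names and the sample data are hypothetical, and real pipelines would usually do the same checks with NumPy.

```python
import math


def count_non_finite(rows):
    """Count nan/inf entries in a list of numeric rows."""
    return sum(1 for row in rows for value in row if not math.isfinite(value))


def standardize(column):
    """Scale a column to zero mean and unit variance."""
    mean = sum(column) / len(column)
    variance = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(variance) or 1.0  # guard against a constant column
    return [(v - mean) / std for v in column]


features = [[0.5, 1.2], [float("nan"), 3.0], [float("inf"), -1.0]]  # hypothetical data
print(count_non_finite(features))  # 2

prices = [1000.0, 2000.0, 3000.0]  # hypothetical unscaled feature
scaled = standardize(prices)
print(scaled)  # about [-1.2247, 0.0, 1.2247]
```

If the count is non-zero, the offending rows should be fixed or dropped before fitting. Scaling keeps the inputs in a range where the first gradient steps stay moderate.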

Summary

Task	Loss function	Activation function
Binary classification	`BinaryCrossentropy`	`sigmoid`
Multiclass classification	`CategoricalCrossentropy` `SparseCategoricalCrossentropy`	`softmax`
Object detection	`SigmoidFocalCrossEntropy` `GIoULoss`
Regression	`MeanSquaredError`