Commonly used PyTorch functions (10): cross-entropy loss functions nn.BCELoss(), nn.BCEWithLogitsLoss(), and nn.CrossEntropyLoss() explained

undo_try

Last modified: 2024-05-13 23:27:06

Category: python syntax. Tags: pytorch, artificial intelligence, python

First published: 2024-05-13 23:20:43

Article link: https://blog.csdn.net/qq_44665283/article/details/138823892

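For the derivation and intuition behind the cross-entropy formula, see:

"Understanding information content, entropy, KL divergence, and cross-entropy"

Following the derivation in the article linked above, the binary cross-entropy loss is:
$loss=-\frac{1}{n}\sum\limits_{i=1}^n\left(y_{i}\log\hat{y}_{i}+(1-y_{i})\log(1-\hat{y}_{i})\right)$
where $n$ is the number of samples in the batch.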

The multi-class cross-entropy loss is:
$loss=-\frac{1}{n}\sum\limits_{i=1}^n\sum\limits_{c=1}^m y_{ic}\log\hat{y}_{ic}$
where $n$ is the number of samples in the batch and $m$ is the number of classes.

This formula can be simplified further. Consider a single sample $x$ whose true class is $class$:
$loss(x,class)=-\sum\limits_{c=1}^m y_{xc}\log\hat{y}_{xc}=-[y_{x1}\log(\hat{y}_{x1})+y_{x2}\log(\hat{y}_{x2})+...+y_{xm}\log(\hat{y}_{xm})]$
For this sample the label is one-hot: $y_{x[class]}=1$ and every other entry is 0, so
$loss(x,class)=-\log(\hat{y}_{x[class]})$
Note that in PyTorch $x$ is the vector of raw scores (logits), and the predicted probabilities $\hat{y}$ are obtained by applying softmax to $x$, so
$loss(x,class)=-\log(\hat{y}_{x[class]})=-\log\left(\frac{e^{x[class]}}{\sum\limits_{j}e^{x[j]}}\right)=-x[class]+\log\left(\sum\limits_{j}e^{x[j]}\right)$
As an example, suppose the raw scores for three classes are $[0.1, 0.2, 0.3]$ and the label is $class=1$:
$loss(x,class)=-x[class]+\log\left(\sum\limits_{j}e^{x[j]}\right)=-0.2+\log(e^{x[0]}+e^{x[1]}+e^{x[2]})=-0.2+\log(e^{0.1}+e^{0.2}+e^{0.3})\approx 1.1019$
This gives the simplified multi-class cross-entropy loss for a single sample $x$:
$loss(x,class)=-\log\left(\frac{e^{x[class]}}{\sum\limits_{j}e^{x[j]}}\right)=-x[class]+\log\left(\sum\limits_{j}e^{x[j]}\right)$
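The worked example above can be checked numerically. The following is a minimal sketch, assuming the values $[0.1, 0.2, 0.3]$ are raw scores and using `F.cross_entropy` only as a reference; the variable names are illustrative.

```python
import torch
import torch.nn.functional as F

# Raw scores (logits) for one sample over three classes, and its true class index.
x = torch.tensor([0.1, 0.2, 0.3])
target_class = 1

# Simplified formula: -x[class] + log(sum_j exp(x[j]))
manual = -x[target_class] + torch.logsumexp(x, dim=0)

# Reference value from PyTorch; a batch dimension of size 1 is added for the call.
reference = F.cross_entropy(x.unsqueeze(0), torch.tensor([target_class]))

print(manual.item(), reference.item())  # both approximately 1.1019
```

`torch.logsumexp` is used here instead of `torch.log(torch.exp(x).sum())` because it computes the same quantity in a numerically safer way.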

(1) Binary classification losses nn.BCELoss() and nn.BCEWithLogitsLoss()

$loss=-\frac{1}{n}\sum\limits_{i=1}^n\left(y_{i}\log\hat{y}_{i}+(1-y_{i})\log(1-\hat{y}_{i})\right)$
where $n$ is the number of samples in the batch.

PyTorch documentation: BCEWithLogitsLoss

torch.nn.BCEWithLogitsLoss(
    weight=None, 
    size_average=None, 
    reduce=None, 
    reduction='mean', # by default the mean loss over the batch is returned; 'sum' and 'none' are also available
    pos_weight=None
)

# The input may have any number of dimensions, as long as Target has the same shape as Input
Input: (*), where * means any number of dimensions.
Target: (*), same shape as the input.
# With reduction='mean' (or 'sum') the output is a scalar;
# with reduction='none' the output has the same shape as the input
Output: scalar. If reduction is 'none', then (*), same shape as input.
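The effect of `reduction` and `pos_weight` on the output can be seen in a short sketch. The tensors and the `pos_weight` value of 3.0 below are illustrative assumptions, not values from the original post.

```python
import torch
from torch import nn

# Input of arbitrary shape (here 2 x 2) and a float target of the same shape.
logits = torch.tensor([[0.8, -1.2], [0.3, 2.0]])
targets = torch.tensor([[1.0, 0.0], [0.0, 1.0]])

per_element = nn.BCEWithLogitsLoss(reduction='none')(logits, targets)
total = nn.BCEWithLogitsLoss(reduction='sum')(logits, targets)
mean = nn.BCEWithLogitsLoss(reduction='mean')(logits, targets)

print(per_element.shape)        # same shape as the input
print(total.shape, mean.shape)  # both are 0-dimensional (scalar) tensors

# pos_weight rescales only the positive (target == 1) terms of the loss.
pos_weight = torch.tensor([3.0])  # illustrative value
weighted = nn.BCEWithLogitsLoss(reduction='none', pos_weight=pos_weight)(logits, targets)
print(weighted / per_element)   # 3 where target is 1, 1 where target is 0
```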

PyTorch documentation: BCELoss

torch.nn.BCELoss(
    weight=None, 
    size_average=None, 
    reduce=None, 
    reduction='mean' # by default the mean loss over the batch is returned; 'sum' and 'none' are also available
)

# The input may have any number of dimensions, as long as Target has the same shape as Input
Input: (*), where * means any number of dimensions.
Target: (*), same shape as the input.
# With reduction='mean' (or 'sum') the output is a scalar;
# with reduction='none' the output has the same shape as the input
Output: scalar. If reduction is 'none', then (*), same shape as input.

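PyTorch provides nn.BCELoss() and nn.BCEWithLogitsLoss() as loss functions for binary classification.
nn.BCEWithLogitsLoss() applies sigmoid to its input internally, mapping the raw values into the range (0, 1), so it is equivalent to combining Sigmoid and BCELoss in a single module.
The code below shows how the two binary classification losses differ and how they are related.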

import torch
from torch import nn
import torch.nn.functional as F

y = torch.tensor([1, 0, 1], dtype=torch.float)
# y_hat holds raw model outputs (logits), not probabilities
y_hat = torch.tensor([0.8, 0.2, 0.4], dtype=torch.float)


bce_loss = nn.BCELoss()

# nn.BCELoss() expects probabilities, so apply sigmoid to the input first
print("Official BCELoss = ", bce_loss(torch.sigmoid(y_hat), y))

# nn.BCEWithLogitsLoss() applies sigmoid internally
bcelogits_loss = nn.BCEWithLogitsLoss()
print("Official BCEWithLogitsLoss = ", bcelogits_loss(y_hat, y))

# Our own implementation, following the binary cross-entropy formula:
def loss(y_hat, y):
    y_hat = torch.sigmoid(y_hat)
    # Use torch.log so the whole computation stays in PyTorch
    l = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat))
    return l.mean()

print('Custom loss = ', loss(y_hat, y))

# The three results are identical
Official BCELoss           =  tensor(0.5608)
Official BCEWithLogitsLoss =  tensor(0.5608)
Custom loss                =  tensor(0.5608)
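The three values agree for moderate inputs, but the fused version is also more numerically stable for extreme logits. The sketch below uses a deliberately extreme logit of -200, chosen only for illustration: sigmoid underflows to 0 in float32, and nn.BCELoss() then falls back on its documented clamping of log outputs at -100, while nn.BCEWithLogitsLoss() works directly from the logit.

```python
import torch
from torch import nn

logit = torch.tensor([-200.0])  # extreme logit, illustrative only
label = torch.tensor([1.0])

prob = torch.sigmoid(logit)                   # underflows to 0 in float32
two_step = nn.BCELoss()(prob, label)          # log(0) is clamped to -100, so the loss is capped
fused = nn.BCEWithLogitsLoss()(logit, label)  # computed from the logit, no intermediate probability

print(prob.item(), two_step.item(), fused.item())
```

In this case the two-step value is capped at 100 while the fused loss reports roughly 200, which is the value the formula gives for this logit.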

(2) nn.CrossEntropyLoss()

The simplified multi-class cross-entropy loss for a single sample $x$ is:
$loss(x,class)=-\log\left(\frac{e^{x[class]}}{\sum\limits_{j}e^{x[j]}}\right)=-x[class]+\log\left(\sum\limits_{j}e^{x[j]}\right)$
PyTorch documentation: CrossEntropyLoss

torch.nn.CrossEntropyLoss(
    weight=None, 
    size_average=None, 
    ignore_index=-100, 
    reduce=None, 
    reduction='mean', # as above, the mean loss over the batch is returned by default; 'sum' and 'none' are also available
    label_smoothing=0.0
)

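The shapes are related as follows:

For class-index targets, Input has one extra dimension of size C (the number of classes) compared with Target.
Target and Output have the same shape when reduction='none'.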
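The shape relationship can be illustrated with a short sketch. It assumes class-index targets; the sizes N, C and the extra dimension of length 5 are arbitrary illustrative choices.

```python
import torch
from torch import nn

N, C = 4, 3  # illustrative batch size and number of classes
loss_fn = nn.CrossEntropyLoss(reduction='none')

# Plain classification: Input (N, C), Target (N), Output (N)
logits = torch.randn(N, C)
target = torch.randint(0, C, (N,))
per_sample = loss_fn(logits, target)
print(logits.shape, target.shape, per_sample.shape)

# With an extra dimension: Input (N, C, 5), Target (N, 5), Output (N, 5)
logits_seq = torch.randn(N, C, 5)
target_seq = torch.randint(0, C, (N, 5))
per_position = loss_fn(logits_seq, target_seq)
print(logits_seq.shape, target_seq.shape, per_position.shape)
```

Note that the class dimension C always sits at index 1 of the input, directly after the batch dimension.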

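In binary classification, the network's predictions must be passed through Sigmoid before they enter the cross-entropy formula.
In multi-class classification, Sigmoid is replaced by Softmax; this is an important difference between the two.
CrossEntropyLoss combines softmax, log, and nll_loss in a single step.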

cross_loss = nn.CrossEntropyLoss(reduction="none") # 'none' returns the loss of each sample instead of the mean
target = torch.tensor([0, 1, 2])

predict = torch.tensor([[0.9, 0.2, 0.8],
                        [0.5, 0.2, 0.4],
                        [0.4, 0.2, 0.9]]) # raw scores, softmax not applied
print('Official CrossEntropyLoss: ', cross_loss(predict, target))


# A hand-written CrossEntropyLoss that is easy to follow
# (named manual_cross_loss so it does not shadow the nn.CrossEntropyLoss instance above)
def manual_cross_loss(predict, target, reduction='mean'):
    all_loss = []
    for index, value in enumerate(target):
        # Apply the simplified multi-class formula to each sample
        loss = -predict[index][value] + torch.log(torch.sum(torch.exp(predict[index])))
        all_loss.append(loss)
    all_loss = torch.stack(all_loss)
    if reduction == 'none':
        return all_loss
    else:
        return torch.mean(all_loss)

print('Hand-written CrossEntropyLoss: ', manual_cross_loss(predict, target, reduction='none'))

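# CrossEntropyLoss implemented with F.nll_loss
def cross_loss2(predict, target, reduction='mean'):
    # Drawbacks of computing softmax and then log separately:
    # 1. Very large scores can overflow in exp.
    # 2. Many very negative scores can underflow to 0, making the denominator 0 and breaking the computation.
    # log_softmax avoids both the overflow and the underflow problem
    logsoftmax = F.log_softmax(predict, dim=1)  # dim=1 normalizes over the class dimension
    print('target = ', target)
    print('logsoftmax:\n', logsoftmax)
    # nll_loss does not convert the label to one-hot; it uses the target value as an index
    # into each row and negates the selected element.
    loss = F.nll_loss(logsoftmax, target, reduction=reduction)
    return loss

print('CrossEntropyLoss via F.nll_loss: ', cross_loss2(predict, target, reduction='none'))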

Official CrossEntropyLoss:          tensor([0.8761, 1.2729, 0.7434])
Hand-written CrossEntropyLoss:      tensor([0.8761, 1.2729, 0.7434])

target =  tensor([0, 1, 2])
logsoftmax:
tensor([[-0.8761, -1.5761, -0.9761],
        [-0.9729, -1.2729, -1.0729],
        [-1.2434, -1.4434, -0.7434]])
CrossEntropyLoss via F.nll_loss:    tensor([0.8761, 1.2729, 0.7434])
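The per-sample loop used in the hand-written version can also be expressed without a Python loop. The following is a sketch of one possible vectorized form; the function name `vectorized_cross_loss` is hypothetical, and `gather` plus `torch.logsumexp` are used as one way to apply the simplified formula to every row at once.

```python
import torch
import torch.nn.functional as F

predict = torch.tensor([[0.9, 0.2, 0.8],
                        [0.5, 0.2, 0.4],
                        [0.4, 0.2, 0.9]])  # raw scores, softmax not applied
target = torch.tensor([0, 1, 2])

def vectorized_cross_loss(predict, target):
    # Pick x[class] for every row: gather along the class dimension.
    picked = predict.gather(1, target.unsqueeze(1)).squeeze(1)
    # -x[class] + log(sum_j exp(x[j])), computed row by row.
    return -picked + torch.logsumexp(predict, dim=1)

loss = vectorized_cross_loss(predict, target)
print(loss)
print(F.cross_entropy(predict, target, reduction='none'))
```

Both printed tensors should match the per-sample losses shown in the output above.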

Finally, a question: for a binary classification problem, should you choose sigmoid or softmax?

See: "For binary classification, should you choose sigmoid or softmax?"
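As one way to explore this question (a sketch, not an answer taken from the linked article): a single logit z passed through sigmoid gives the same probability as a two-class softmax over the logits [0, z], so the two losses coincide in that setup. The construction of `two_class_logits` below is an assumption made for illustration.

```python
import torch
import torch.nn.functional as F

z = torch.tensor([0.8, 0.2, 0.4])  # one raw score per sample
y = torch.tensor([1, 0, 1])        # binary labels as class indices

# Sigmoid route: binary cross-entropy on the single logit.
bce = F.binary_cross_entropy_with_logits(z, y.float())

# Softmax route: two logits per sample, fixing the class-0 logit at 0.
two_class_logits = torch.stack([torch.zeros_like(z), z], dim=1)  # shape (3, 2)
ce = F.cross_entropy(two_class_logits, y)

print(bce.item(), ce.item())  # the two values agree
```

A network with two free output logits is a different parameterization, so this equivalence only shows that the losses match when the logits are related in this way.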
