Computing Cross-Entropy Loss with softmax and sigmoid

I. Computing cross-entropy loss with softmax

1. Computing the softmax output

Suppose the model's output logits for an image are 2, 3, 0.1. After softmax they become 0.2585, 0.7028, 0.0387. Suppose the image's true label is 1; its one-hot encoding is then [0, 1, 0].

import torch
from torch import nn

logits = torch.tensor([2, 3, 0.1], dtype=torch.float)
print('logits:', logits)
print('logits.shape:', logits.shape)

softmax_out = torch.softmax(logits, dim=0)
print('softmax_out:', softmax_out)
print('softmax_out.shape:', softmax_out.shape)


def softmax_cal(x):
    return torch.exp(x) / torch.sum(torch.exp(x))


softmax_out_v2 = softmax_cal(logits)
print('softmax_out_v2:', softmax_out_v2)
print('softmax_out_v2.shape:', softmax_out_v2.shape)
Output:

logits: tensor([2.0000, 3.0000, 0.1000])
logits.shape: torch.Size([3])
softmax_out: tensor([0.2585, 0.7028, 0.0387])
softmax_out.shape: torch.Size([3])
softmax_out_v2: tensor([0.2585, 0.7028, 0.0387])
softmax_out_v2.shape: torch.Size([3])

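One caveat about the manual softmax_cal above: torch.exp can overflow for large logits. The standard fix, which gives the same result, is to subtract the maximum logit before exponentiating. A minimal sketch, reusing logits and torch from the block above:

def softmax_stable(x):
    # subtracting the max does not change the softmax result but keeps exp() from overflowing
    x = x - torch.max(x)
    return torch.exp(x) / torch.sum(torch.exp(x))


print('softmax_stable:', softmax_stable(logits))  # tensor([0.2585, 0.7028, 0.0387]), same as before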

2. Computing the loss

Compute the cross-entropy loss between the softmax output of the logits and the true label. nn.CrossEntropyLoss() and the manual computation both give 0.3527. Note that nn.CrossEntropyLoss() already includes the softmax operation.
loss = -(0 * log(0.2585) + 1 * log(0.7028) + 0 * log(0.0387)) = 0.3527

import torch
from torch import nn

logits = torch.tensor([2, 3, 0.1], dtype=torch.float)
logits = logits.unsqueeze(0)
label = torch.tensor([1], dtype=torch.long)

print('logits:', logits)
print('logits.shape:', logits.shape)
print('label:', label)
print('label.shape', label.shape)
print('---softmax output---')
softmax_out = torch.softmax(logits, dim=1)
print('softmax_out:', softmax_out)
print('softmax_out.shape:', softmax_out.shape)


def softmax_cal(x):
    return torch.exp(x) / torch.sum(torch.exp(x))


softmax_out_v2 = softmax_cal(logits)
print('softmax_out_v2.shape:', softmax_out_v2.shape)

print('---computing the loss---')
softmax_loss = nn.CrossEntropyLoss()(logits, label)
print('softmax_loss from nn.CrossEntropyLoss():', softmax_loss)
print(softmax_loss.shape)

softmax_loss_v2 = -1 * torch.log(torch.tensor(0.7028))
print('manually computed softmax_loss_v2:', softmax_loss_v2)
print(softmax_loss_v2.shape)

Output:

logits: tensor([[2.0000, 3.0000, 0.1000]])
logits.shape: torch.Size([1, 3])
label: tensor([1])
label.shape torch.Size([1])
---softmax output---
softmax_out: tensor([[0.2585, 0.7028, 0.0387]])
softmax_out.shape: torch.Size([1, 3])
softmax_out_v2.shape: torch.Size([1, 3])
---computing the loss---
softmax_loss from nn.CrossEntropyLoss(): tensor(0.3527)
torch.Size([])
manually computed softmax_loss_v2: tensor(0.3527)
torch.Size([])
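The statement that nn.CrossEntropyLoss() already contains the softmax can be checked directly: it is equivalent to applying log_softmax and then nn.NLLLoss(). A minimal sketch, reusing the logits and label from above:

import torch
from torch import nn
import torch.nn.functional as F

logits = torch.tensor([[2, 3, 0.1]], dtype=torch.float)
label = torch.tensor([1], dtype=torch.long)

ce_loss = nn.CrossEntropyLoss()(logits, label)
nll_loss = nn.NLLLoss()(F.log_softmax(logits, dim=1), label)
print(ce_loss, nll_loss)  # both tensor(0.3527)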

II. Computing cross-entropy loss with sigmoid

1. The sigmoid output

Again suppose the model's output logits for an image are 2, 3, 0.1. After sigmoid they become 0.8808, 0.9526, 0.5250. Suppose the image's true label is 1; its one-hot encoding is then [0, 1, 0].

import torch

logits = torch.tensor([2, 3, 0.1], dtype=torch.float)
logits = logits.unsqueeze(0)
label = torch.tensor([1], dtype=torch.long)

print('---sigmoid output---')
sigmoid_out = torch.sigmoid(logits)
print('sigmoid_out:', sigmoid_out)
print('sigmoid_out.shape:', sigmoid_out.shape)


def sigmoid_cal(x):
    return 1 / (1 + torch.exp(-x))


sigmoid_out_v2 = sigmoid_cal(logits)
print('sigmoid_out_v2:', sigmoid_out_v2)
print('sigmoid_v2.shape:', sigmoid_out_v2.shape)

Output:

sigmoid_out: tensor([[0.8808, 0.9526, 0.5250]])
sigmoid_out.shape: torch.Size([1, 3])
sigmoid_out_v2: tensor([[0.8808, 0.9526, 0.5250]])
sigmoid_v2.shape: torch.Size([1, 3])
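Note that, unlike the softmax output, the three sigmoid values are independent per-class probabilities and do not sum to 1. A quick check with the numbers from above:

import torch

sigmoid_out = torch.tensor([[0.8808, 0.9526, 0.5250]])
softmax_out = torch.tensor([[0.2585, 0.7028, 0.0387]])
print('sigmoid sum:', sigmoid_out.sum())  # ~2.3584, one independent probability per class
print('softmax sum:', softmax_out.sum())  # ~1.0, a single distribution over the classes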

2. Computing the loss

logits = [2, 3, 0.1]; label = 1
First we use nn.BCEWithLogitsLoss(). Unlike nn.CrossEntropyLoss(), passing the class-index label directly to nn.BCEWithLogitsLoss() raises an error: the label has to be converted to one-hot form first and then passed in.

import torch
from torch import nn

logits = torch.tensor([2, 3, 0.1], dtype=torch.float)
logits = logits.unsqueeze(0)
label = torch.tensor([1], dtype=torch.long)

print('---sigmoid output---')
sigmoid_out = torch.sigmoid(logits)
print('sigmoid_out:', sigmoid_out)
print('sigmoid_out.shape:', sigmoid_out.shape)


def sigmoid_cal(x):
    return 1 / (1 + torch.exp(-x))


sigmoid_out_v2 = sigmoid_cal(logits)
print('sigmoid_out_v2:', sigmoid_out_v2)
print('sigmoid_v2.shape:', sigmoid_out_v2.shape)


true_label = torch.tensor([0, 1, 0], dtype=torch.float)
true_label = true_label.unsqueeze(0)
#sigmoid_loss = nn.BCEWithLogitsLoss()(logits, true_label)
sigmoid_loss = nn.BCEWithLogitsLoss()(logits, label)
print('sigmoid_loss from nn.BCEWithLogitsLoss():', sigmoid_loss)

Error:

raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([1])) must be the same as input size (torch.Size([1, 3]))
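To avoid the error, the class index has to be turned into a one-hot vector first. Besides writing it out by hand as in the block below, the conversion can be done programmatically; a minimal sketch using torch.nn.functional.one_hot:

import torch
import torch.nn.functional as F

label = torch.tensor([1], dtype=torch.long)
true_label = F.one_hot(label, num_classes=3).float()  # BCEWithLogitsLoss expects a float target
print(true_label)  # tensor([[0., 1., 0.]])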

Convert label = 1 to the one-hot vector [0, 1, 0] and compute:

import torch
from torch import nn

logits = torch.tensor([2, 3, 0.1], dtype=torch.float)
logits = logits.unsqueeze(0)
label = torch.tensor([1], dtype=torch.long)

print('---sigmoid output---')
sigmoid_out = torch.sigmoid(logits)
print('sigmoid_out:', sigmoid_out)
print('sigmoid_out.shape:', sigmoid_out.shape)

def sigmoid_cal(x):
    return 1 / (1 + torch.exp(-x))

sigmoid_out_v2 = sigmoid_cal(logits)
print('sigmoid_out_v2:', sigmoid_out_v2)
print('sigmoid_v2.shape:', sigmoid_out_v2.shape)

true_label = torch.tensor([0, 1, 0], dtype=torch.float)
true_label = true_label.unsqueeze(0)
sigmoid_loss = nn.BCEWithLogitsLoss()(logits, true_label)
#sigmoid_loss = nn.BCEWithLogitsLoss()(logits, label)
print('sigmoid_loss from nn.BCEWithLogitsLoss():', sigmoid_loss)
Output:

sigmoid_out: tensor([[0.8808, 0.9526, 0.5250]])
sigmoid_out.shape: torch.Size([1, 3])
sigmoid_out_v2: tensor([[0.8808, 0.9526, 0.5250]])
sigmoid_v2.shape: torch.Size([1, 3])
sigmoid_loss from nn.BCEWithLogitsLoss(): tensor(0.9733)

So how is this 0.9733 obtained? If we follow the softmax cross-entropy approach from above:
loss = -(0 * log(0.8808) + 1 * log(0.9526) + 0 * log(0.5250)) = 0.0486, which is not 0.9733.
With nn.CrossEntropyLoss(), passing label = 1 is enough, whereas nn.BCEWithLogitsLoss() requires the one-hot label [0, 1, 0]. My understanding is that nn.BCEWithLogitsLoss() computes three binary-classification losses and averages them, while nn.CrossEntropyLoss() computes a single three-class loss.
To verify this, take the sigmoid output [0.8808, 0.9526, 0.5250] of logits = [2, 3, 0.1] together with the one-hot label [0, 1, 0], and apply the binary cross-entropy expression
loss_i = -(y_i * log(p_i) + (1 - y_i) * log(1 - p_i))

Compute the loss:

loss_0 = -(0 * log(0.8808) + (1-0) * log(1-0.8808))
loss_1 = -(1 * log(0.9526) + (1-1) * log(1-0.9526))
loss_2 = -(0 * log(0.5250) + (1-0) * log(1-0.5250))
loss = (loss_0 + loss_1 + loss_2) / 3 = 0.9733
This matches the result from nn.BCEWithLogitsLoss().
Here the one-hot label [0, 1, 0] can be read as [[1, 0], [0, 1], [1, 0]]: each class is one-hot encoded again, with 0 meaning the image does not have that class attribute and 1 meaning it does.

import torch
from torch import nn

logits = torch.tensor([2, 3, 0.1], dtype=torch.float)
logits = logits.unsqueeze(0)
label = torch.tensor([1], dtype=torch.long)

print('---sigmoid output---')
sigmoid_out = torch.sigmoid(logits)
print('sigmoid_out:', sigmoid_out)
print('sigmoid_out.shape:', sigmoid_out.shape)


def sigmoid_cal(x):
    return 1 / (1 + torch.exp(-x))


sigmoid_out_v2 = sigmoid_cal(logits)
print('sigmoid_out_v2:', sigmoid_out_v2)
print('sigmoid_v2.shape:', sigmoid_out_v2.shape)


true_label = torch.tensor([0, 1, 0], dtype=torch.float)
true_label = true_label.unsqueeze(0)
sigmoid_loss = nn.BCEWithLogitsLoss()(logits, true_label)
#sigmoid_loss = nn.BCEWithLogitsLoss()(logits, label)
print('sigmoid_loss from nn.BCEWithLogitsLoss():', sigmoid_loss)

sigmoid_loss_v2 = -1 * torch.log(torch.tensor([1 - 0.8808])) - 1 * torch.log(torch.tensor(0.9526)) - 1 * torch.log(
    torch.tensor(1 - 0.5250))
print('manually computed sigmoid_loss_v2:', sigmoid_loss_v2 / 3)

err_loss = -1 * torch.log(torch.tensor(0.9526))
print('incorrect err_loss:', err_loss)

Output:

---sigmoid output---
sigmoid_out: tensor([[0.8808, 0.9526, 0.5250]])
sigmoid_out.shape: torch.Size([1, 3])
sigmoid_out_v2: tensor([[0.8808, 0.9526, 0.5250]])
sigmoid_v2.shape: torch.Size([1, 3])
sigmoid_loss from nn.BCEWithLogitsLoss(): tensor(0.9733)
manually computed sigmoid_loss_v2: tensor([0.9733])
incorrect err_loss: tensor(0.0486)
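As a further cross-check, nn.BCEWithLogitsLoss can return the three per-class binary losses before averaging by passing reduction='none'. A minimal sketch with the same logits and one-hot label:

import torch
from torch import nn

logits = torch.tensor([[2, 3, 0.1]], dtype=torch.float)
true_label = torch.tensor([[0., 1., 0.]])

per_class_loss = nn.BCEWithLogitsLoss(reduction='none')(logits, true_label)
print(per_class_loss)         # tensor([[2.1269, 0.0486, 0.7444]]), i.e. loss_0, loss_1, loss_2 from above
print(per_class_loss.mean())  # tensor(0.9733), matches the default reduction='mean'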

III. Summary

1. In the example above, nn.CrossEntropyLoss() computes one 3-class loss, while nn.BCEWithLogitsLoss() computes three binary-classification losses and averages them.
2. Because sigmoid scores each class independently, sigmoid activation can be used for multi-label image classification (see the sketch after this list).
3. In the binary cross-entropy formula, p_i is the predicted probability of the positive class. Concretely, with softmax it is the predicted probability of class 1; with sigmoid it is the sigmoid output for that class.
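Because each class is scored independently, the same loss also handles the case where an image carries more than one label. A minimal multi-label sketch, with hypothetical logits and a hypothetical target in which classes 0 and 1 are both present:

import torch
from torch import nn

logits = torch.tensor([[1.5, 0.8, -2.0]], dtype=torch.float)  # hypothetical logits
target = torch.tensor([[1., 1., 0.]])                         # hypothetical multi-label target
multi_label_loss = nn.BCEWithLogitsLoss()(logits, target)
print(multi_label_loss)  # three independent binary losses, averaged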


Full experiment code

import torch
from torch import nn

logits = torch.tensor([2, 3, 0.1], dtype=torch.float)
logits = logits.unsqueeze(0)
label = torch.tensor([1], dtype=torch.long)

print('logits:', logits)
print('logits.shape:', logits.shape)
print('label:', label)
print('label.shape', label.shape)
print('---softmax output---')
softmax_out = torch.softmax(logits, dim=1)
print('softmax_out:', softmax_out)
print('softmax_out.shape:', softmax_out.shape)


def softmax_cal(x):
    return torch.exp(x) / torch.sum(torch.exp(x))


softmax_out_v2 = softmax_cal(logits)
print('softmax_out_v2.shape:', softmax_out_v2.shape)

print('---computing the loss---')
softmax_loss = nn.CrossEntropyLoss()(logits, label)
print('softmax_loss from nn.CrossEntropyLoss():', softmax_loss)
print(softmax_loss.shape)

softmax_loss_v2 = -1 * torch.log(torch.tensor(0.7028))
print('manually computed softmax_loss_v2:', softmax_loss_v2)
print(softmax_loss_v2.shape)
##########################
logits = torch.tensor([2, 3, 0.1], dtype=torch.float)
logits = logits.unsqueeze(0)
label = torch.tensor([1], dtype=torch.long)

print('---sigmoid output---')
sigmoid_out = torch.sigmoid(logits)
print('sigmoid_out:', sigmoid_out)
print('sigmoid_out.shape:', sigmoid_out.shape)


def sigmoid_cal(x):
    return 1 / (1 + torch.exp(-x))


sigmoid_out_v2 = sigmoid_cal(logits)
print('sigmoid_out_v2:', sigmoid_out_v2)
print('sigmoid_v2.shape:', sigmoid_out_v2.shape)


true_label = torch.tensor([0, 1, 0], dtype=torch.float)
true_label = true_label.unsqueeze(0)
sigmoid_loss = nn.BCEWithLogitsLoss()(logits, true_label)
#sigmoid_loss = nn.BCEWithLogitsLoss()(logits, label)
print('sigmoid_loss from nn.BCEWithLogitsLoss():', sigmoid_loss)

sigmoid_loss_v2 = -1 * torch.log(torch.tensor([1 - 0.8808])) - 1 * torch.log(torch.tensor(0.9526)) - 1 * torch.log(
    torch.tensor(1 - 0.5250))
print('manually computed sigmoid_loss_v2:', sigmoid_loss_v2 / 3)

err_loss = -1 * torch.log(torch.tensor(0.9526))
print('incorrect err_loss:', err_loss)