[Loss Functions] NLLLoss, CrossEntropy_Loss (Cross-Entropy Loss), and Label Smoothing: Examples and Code


风巽·剑染春水


Tags: deep learning, loss functions, pytorch, computer vision

Article link: https://blog.csdn.net/qq_43426908/article/details/127596364


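I recently happened to work through the computation of several loss functions in detail again, so I am recording it here for future reference.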

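To keep the formulas clear, let $y_n \in \{1, 2, \ldots, K\}$ be the true label of sample $n$, let $\mathbf{v} = (v_1, v_2, \ldots, v_K)$ be the network output, i.e. the prediction for sample $n$, let $N$ be the number of samples in a batch (the batch size), and let $K$ be the number of classes.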

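To keep all examples in this post consistent, we fix the network output $preds$ and the labels $target$: there are only two samples, one with label $2$ and one with label $3$. PyTorch counts class indices from 0, so label 2 refers to the third entry of a row and label 3 to the fourth.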
${preds = [[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]]}$
${target = [2, 3]}$

I. NLLLoss and CrossEntropy_Loss (Cross-Entropy Loss)

1. Softmax

Softmax is the first operation applied to the network output. The probability it assigns to the label $y_n$ is:
$\frac{e^{v_{y_n}}}{\sum_{m=1}^K e^{v_m}}$
The raw outputs can be positive or negative, large or small. Softmax maps them to values in $[0,1]$ that sum to 1 across the classes, so they can be compared as probabilities. For the example:
${[0.1,0.2,0.3,0.4]\mathop \to \limits^{{\mathop{\rm softmax}\nolimits} } \left[ {\frac{{{e^{0.1}}}}{{{S_1}}},\frac{{{e^{0.2}}}}{{{S_1}}},\frac{{{e^{0.3}}}}{{{S_1}}},\frac{{{e^{0.4}}}}{{{S_1}}}} \right] = [0.2138,0.2363,0.2612,0.2887]}$
${[0.1,0.1,0.1,0.1]\mathop \to \limits^{{\mathop{\rm softmax}\nolimits} } \left[ {\frac{{{e^{0.1}}}}{{{S_2}}},\frac{{{e^{0.1}}}}{{{S_2}}},\frac{{{e^{0.1}}}}{{{S_2}}},\frac{{{e^{0.1}}}}{{{S_2}}}} \right] = [0.2500,0.2500,0.2500,0.2500]}$
where ${{S_1} = {e^{0.1}} + {e^{0.2}} + {e^{0.3}} + {e^{0.4}}}$ and ${{S_2} = {e^{0.1}} + {e^{0.1}} + {e^{0.1}} + {e^{0.1}}}$.

Code:

import torch
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])

# Manual softmax: exponentiate, then divide each row by its row sum
exp = torch.exp(preds)
sum_ = torch.sum(exp, dim=1).reshape(-1, 1)  # row sums, shape (N, 1) so they broadcast
softmax = exp / sum_
print('softmax (manual):\n', softmax)

# Built-in softmax over the class dimension
softmax_ = F.softmax(preds, dim=1)
print('softmax (F.softmax):\n', softmax_)

Both give the same output:

softmax (manual):
 tensor([[0.2138, 0.2363, 0.2612, 0.2887],
        [0.2500, 0.2500, 0.2500, 0.2500]])
softmax (F.softmax):
 tensor([[0.2138, 0.2363, 0.2612, 0.2887],
        [0.2500, 0.2500, 0.2500, 0.2500]])
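Softmax does not change if the same constant is added to every output of a row, because the common factor cancels between numerator and denominator. Implementations use this to avoid overflow by subtracting the row maximum before exponentiating. The sketch below illustrates the idea, assuming float32 tensors; `stable_softmax` is a hypothetical helper written for illustration, not a PyTorch function.

```python
import torch
import torch.nn.functional as F

def stable_softmax(v: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtract the max along `dim`: the factor e^{-max} cancels between numerator
    # and denominator, so the result is unchanged, but exp() can no longer overflow
    shifted = v - v.max(dim=dim, keepdim=True).values
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=dim, keepdim=True)

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
probs = stable_softmax(preds, dim=1)
print(probs.sum(dim=1))                                # each row sums to 1
print(torch.allclose(probs, F.softmax(preds, dim=1)))  # True

# With large outputs the naive formula overflows: exp(1000) is inf in float32,
# and inf / inf gives nan
big = torch.tensor([[1000.0, 1001.0]])
naive = torch.exp(big) / torch.exp(big).sum(dim=1, keepdim=True)
print(naive)                       # tensor([[nan, nan]])
print(stable_softmax(big, dim=1))  # finite probabilities, same as F.softmax(big, dim=1)
```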

2. Log_Softmax

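Log_Softmax does what its name says: it takes the logarithm after Softmax:
$\log\left(\frac{e^{v_{y_n}}}{\sum_{m=1}^K e^{v_m}}\right)$
Note that the logarithm here is the natural logarithm (base $e$), written $\ln$ in mathematics. For the example:
$[0.2138,0.2363,0.2612,0.2887]\mathop \to \limits^{\ln} [\ln(0.2138),\ln(0.2363),\ln(0.2612),\ln(0.2887)]$
$[0.2500,0.2500,0.2500,0.2500]\mathop \to \limits^{\ln} [\ln(0.2500),\ln(0.2500),\ln(0.2500),\ln(0.2500)]$
Computing these by hand from the four-decimal softmax values gives results that differ slightly from the code output because of rounding; with more decimal places kept, they match.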

Code:

import torch
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])

# Manual log_softmax: softmax first, then the natural log
exp = torch.exp(preds)
sum_ = torch.sum(exp, dim=1).reshape(-1, 1)
softmax = exp / sum_
log_softmax = torch.log(softmax)
print('log_softmax (manual):\n', log_softmax)

# Built-in log_softmax over the class dimension
log_softmax_ = F.log_softmax(preds, dim=1)
print('log_softmax (F.log_softmax):\n', log_softmax_)

Both give the same output:

log_softmax (manual):
 tensor([[-1.5425, -1.4425, -1.3425, -1.2425],
        [-1.3863, -1.3863, -1.3863, -1.3863]])
log_softmax (F.log_softmax):
 tensor([[-1.5425, -1.4425, -1.3425, -1.2425],
        [-1.3863, -1.3863, -1.3863, -1.3863]])
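Log_Softmax can also be written without computing Softmax first: $\ln\mathrm{softmax}(\mathbf{v})_k = v_k - \ln\sum_{m=1}^K e^{v_m}$. A minimal sketch, assuming float32 tensors, that checks this identity with `torch.logsumexp` (documented as numerically stabilized) and shows why the direct `F.log_softmax` is preferable to `torch.log(F.softmax(...))` when one output dominates:

```python
import torch
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])

# log_softmax(v)_k = v_k - log(sum_m exp(v_m)), with the second term from logsumexp
log_softmax_lse = preds - torch.logsumexp(preds, dim=1, keepdim=True)
print(log_softmax_lse)
print(torch.allclose(log_softmax_lse, F.log_softmax(preds, dim=1)))  # True

# When one output dominates, the small probability underflows to 0 in float32,
# so taking the log afterwards yields -inf; log_softmax keeps the finite value
x = torch.tensor([[0.0, 200.0]])
print(torch.log(F.softmax(x, dim=1)))  # first entry is -inf
print(F.log_softmax(x, dim=1))         # first entry is -200
```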

3. NLLLoss

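NLLLoss takes the Log_Softmax result, picks out the value at each sample's label position, sums these values, divides by the number of samples, and finally negates the result: the log of a probability is never positive, and a loss should be positive. In our example, we take the entries at the label positions $target = [2, 3]$ from ${[ - 1.5425, - 1.4425, - 1.3425, - 1.2425]}$ and ${[-1.3863, -1.3863, -1.3863, -1.3863]}$, add them, divide by $2$, and negate:
$-\frac{(-1.3425) + (-1.3863)}{2} = 1.3644$
Code: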

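import torch
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
target = torch.tensor([2, 3])

exp = torch.exp(preds)
sum_ = torch.sum(exp, dim=1).reshape(-1, 1)
softmax = exp / sum_
log_softmax = torch.log(softmax)

# The one-hot mask keeps only the label position of each row; num_classes is passed
# explicitly so the width is K even if the largest class is absent from the batch
one_hot = F.one_hot(target, num_classes=preds.shape[1]).float()
nllloss = -torch.sum(one_hot * log_softmax) / target.shape[0]  # sum, average over N, negate
print('nllloss (manual):\n', nllloss)

Log_Softmax = F.log_softmax(preds, dim=1)
Nllloss = F.nll_loss(Log_Softmax, target)  # expects log-probabilities and integer class indices
print('nllloss (F.nll_loss):\n', Nllloss)

Both give the same output:

nllloss (manual):
 tensor(1.3644)
nllloss (F.nll_loss):
 tensor(1.3644)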
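Multiplying by a one-hot matrix is just a way of selecting one entry per row. A sketch of the same NLLLoss computation using indexing instead, which also shows the per-sample losses that `F.nll_loss` returns with `reduction='none'`:

```python
import torch
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
target = torch.tensor([2, 3])
log_probs = F.log_softmax(preds, dim=1)

# Select row n, column target[n] directly instead of multiplying by a one-hot matrix
rows = torch.arange(target.shape[0])
picked = log_probs[rows, target]  # log-probabilities at the label positions
print(-picked)                    # per-sample losses, about [1.3425, 1.3863]
print(-picked.mean())             # about 1.3644

# The same per-sample values from the library, without averaging
print(F.nll_loss(log_probs, target, reduction='none'))
# gather() is an equivalent way to do the lookup
print(-log_probs.gather(1, target.unsqueeze(1)).squeeze(1))
```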

4. CrossEntropy_Loss

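With Softmax, Log_Softmax and NLLLoss covered, the cross-entropy loss CrossEntropy_Loss is simply all of them put together:
$\text{CrossEntropy\_Loss} = \text{Softmax} + \text{Log} + \text{NLLLoss} = \text{Log\_Softmax} + \text{NLLLoss}$
Its formula is:
$-\frac{1}{N}\sum_{n=1}^N \log\left(\frac{e^{v_{y_n}}}{\sum_{m=1}^K e^{v_m}}\right)$
CrossEntropy_Loss and NLLLoss give the same result here because F.cross_entropy takes the raw network output and applies Log_Softmax internally, while F.nll_loss expects its input to already be the Log_Softmax output. Passing log_softmax(preds) to nll_loss therefore performs the same computation as cross_entropy(preds).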

Full code:

import torch
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
target = torch.tensor([2, 3])

one_hot = F.one_hot(target, num_classes=preds.shape[1]).float()  # one-hot encode the labels
print('[1] one-hot encoded target:\n', one_hot)
exp = torch.exp(preds)
print('[2] exponentiated preds:\n', exp)
sum_ = torch.sum(exp, dim=1).reshape(-1, 1)  # row sums
softmax = exp / sum_  # softmax
print('[3] softmax:\n', softmax)
log_softmax = torch.log(softmax)  # log_softmax
print('[4] log of softmax:\n', log_softmax)
nllloss = -torch.sum(one_hot * log_softmax) / target.shape[0]  # mask with the labels, average over the batch, negate
print('[5] cross-entropy via manual NLLLoss:', nllloss)

print('----------------------------------------------')
# Compute with nll_loss()
Log_Softmax = F.log_softmax(preds, dim=1)  # log_softmax activation
Nllloss = F.nll_loss(Log_Softmax, target)  # no one-hot encoding of the labels needed
print('cross-entropy via F.nll_loss:', Nllloss)
# Use the cross-entropy function directly
cross_entropy = F.cross_entropy(preds, target)  # takes raw preds; no one-hot encoding needed
print('F.cross_entropy:', cross_entropy)

Output:

[1] one-hot encoded target:
 tensor([[0., 0., 1., 0.],
        [0., 0., 0., 1.]])
[2] exponentiated preds:
 tensor([[1.1052, 1.2214, 1.3499, 1.4918],
        [1.1052, 1.1052, 1.1052, 1.1052]])
[3] softmax:
 tensor([[0.2138, 0.2363, 0.2612, 0.2887],
        [0.2500, 0.2500, 0.2500, 0.2500]])
[4] log of softmax:
 tensor([[-1.5425, -1.4425, -1.3425, -1.2425],
        [-1.3863, -1.3863, -1.3863, -1.3863]])
[5] cross-entropy via manual NLLLoss: tensor(1.3644)
----------------------------------------------
cross-entropy via F.nll_loss: tensor(1.3644)
F.cross_entropy: tensor(1.3644)
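In module form the same loss is `nn.CrossEntropyLoss`. The sketch below, assuming PyTorch 1.10 or newer (where `F.cross_entropy` also accepts class-probability targets), checks that a one-hot probability target gives the same value as integer labels, and shows the effect of the common mistake of passing softmax outputs instead of the raw network output:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])  # raw network output
target = torch.tensor([2, 3])

# Module form of the same loss; it expects raw outputs, not softmax probabilities
criterion = nn.CrossEntropyLoss()
print(criterion(preds, target))         # about 1.3644

# Class-probability targets: a float tensor with the same shape as preds.
# One-hot rows reproduce the result obtained with integer labels.
one_hot = F.one_hot(target, num_classes=preds.shape[1]).float()
print(F.cross_entropy(preds, one_hot))  # about 1.3644

# Common mistake: feeding softmax outputs applies softmax twice and changes the loss
print(F.cross_entropy(F.softmax(preds, dim=1), target))
```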

II. Label Smoothing for Cross-Entropy Loss

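Label Smoothing (see the original paper) is a regularization technique that helps reduce overfitting to some extent. In CrossEntropy_Loss with hard labels, only the log-probability at the label position enters the loss directly; the predictions at the other positions are not used. Label Smoothing does use this information. Viewed as a whole, it slightly changes the label distribution so that labels are no longer strictly 0 or 1, hence the name label smoothing.
The Label Smoothing loss can be written as: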
$(1-\varepsilon)\cdot\left[-\frac{1}{N}\sum_{n=1}^N \log\left(\frac{e^{v_{y_n}}}{\sum_{m=1}^K e^{v_m}}\right)\right] + \varepsilon\cdot\left[-\frac{1}{NK}\sum_{n=1}^N\sum_{k=1}^K \log\left(\frac{e^{v_k}}{\sum_{m=1}^K e^{v_m}}\right)\right]$
The first part, with coefficient $(1-\varepsilon)$, is exactly the cross-entropy loss. The second part covers the predictions at every position, including the non-label ones. In our example, the second part is computed as:
$[-1.5425,-1.4425,-1.3425,-1.2425]\mathop \to \limits^{\rm sum} -5.5700$
$[-1.3863,-1.3863,-1.3863,-1.3863]\mathop \to \limits^{\rm sum} -5.5452$
That is, sum each row of the Log_Softmax result, negate, and divide by the number of samples $2$ and the number of classes $4$:
$-\frac{(-5.5700)+(-5.5452)}{2\times 4} = 1.3894$
Finally, combine this with the cross-entropy loss using the weight ${\varepsilon}$. With ${\varepsilon}=0.1$:
$(1-0.1)\times 1.3644 + 0.1\times 1.3894 = 1.3669$
Code:

import torch
import torch.nn.functional as F
import torch.nn as nn

def linear_combination(x, y, epsilon):
    # Returns epsilon * x + (1 - epsilon) * y
    return epsilon * x + (1 - epsilon) * y

def reduce_loss(loss, reduction='mean'):
    # Applies the requested reduction ('mean', 'sum', or none) to a per-sample loss
    return loss.mean() if reduction == 'mean' else loss.sum() if reduction == 'sum' else loss

class LabelSmoothing_CrossEntropy(nn.Module):
    def __init__(self, epsilon: float = 0.1, reduction='mean'):
        super().__init__()
        self.epsilon = epsilon
        self.reduction = reduction

    def forward(self, preds, target):
        n = preds.size()[-1]  # number of classes K
        log_preds = F.log_softmax(preds, dim=-1)
        # Second term: negative sum of the log-probabilities over all classes, reduced over the batch
        loss = reduce_loss(-log_preds.sum(dim=-1), self.reduction)
        # First term: ordinary cross-entropy (NLLLoss on log_softmax)
        nll = F.nll_loss(log_preds, target, reduction=self.reduction)
        # epsilon * (loss / K) + (1 - epsilon) * nll
        return linear_combination(loss / n, nll, self.epsilon)

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
target = torch.tensor([2, 3])

ls = LabelSmoothing_CrossEntropy()  # epsilon = 0.1
lsloss = ls(preds, target)
print('Label smoothing loss:', lsloss)

Output:

Label smoothing loss: tensor(1.3669)
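The two-term formula is the same as ordinary cross-entropy against a smoothed target distribution $q_k = (1-\varepsilon)\,\mathbb{1}[k = y_n] + \varepsilon / K$: expanding $-\sum_k q_k \log p_k$ gives $(1-\varepsilon)$ times the first bracket plus $\varepsilon$ times the second. A sketch that checks this numerically and compares it with the `label_smoothing` argument of `F.cross_entropy`, assuming PyTorch 1.10 or newer:

```python
import torch
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
target = torch.tensor([2, 3])
epsilon, K = 0.1, preds.shape[1]

# Smoothed targets: (1 - eps) on the label plus eps / K spread over all K classes
one_hot = F.one_hot(target, num_classes=K).float()
smooth_target = (1 - epsilon) * one_hot + epsilon / K
print(smooth_target)   # label entries 0.925, all others 0.025

# Cross-entropy against the smoothed distribution, averaged over the batch
log_probs = F.log_softmax(preds, dim=1)
manual = -(smooth_target * log_probs).sum(dim=1).mean()
print(manual)          # about 1.3669

# Built-in option that mixes the labels with a uniform distribution in the same way
builtin = F.cross_entropy(preds, target, label_smoothing=epsilon)
print(builtin)
print(torch.allclose(manual, builtin))  # True
```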
