Loss Functions: Cross-Entropy, KLDivLoss, and Label Smoothing (LabelSmoothing)

FY_2018

Last modified 2022-08-07 10:00:26

Category: Machine Learning. Tags: deep learning, machine learning, neural networks

First published 2021-08-15 17:43:26

Article link: https://blog.csdn.net/FY_2018/article/details/119716777

A note up front: input (or x) denotes the model's prediction, and target denotes the label.

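1. torch.nn.CrossEntropyLoss() computes the cross-entropy. The predictions passed in need neither softmax nor log: pass the raw model outputs directly, and give the labels as a sequence of integer class indices.

From the official documentation: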

The `input` is expected to contain raw, unnormalized scores for each class.

`input` size: (minibatch, C)

`target` : 1D tensor of size `minibatch`

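2. torch.nn.KLDivLoss() is the loss function that computes the KL divergence. The raw model outputs must first go through softmax and then log (torch.nn.functional.log_softmax does both in one call), and the result is passed to KLDivLoss as input. The target is 2-D, with the same shape as input.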

Shape:
    - Input: (N, *), where * means any number of additional dimensions
    - Target: (N, *), same shape as the input
    - Output: scalar by default. If reduction is 'none', then (N, *), the same shape as the input
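
To make the "softmax, then log" step concrete, here is a minimal pure-Python sketch of what a log-softmax computes for one row of raw scores. The helper name `log_softmax_row` is hypothetical (it is not a PyTorch function), and subtracting the row maximum is a common numerical-stability trick assumed here rather than something the article states.

```python
import math

def log_softmax_row(scores):
    """Return log(softmax(scores)) for one row of raw scores (hypothetical helper)."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]

# First row of the test data used in the article's complete example.
row = [-1.5616, -0.7906, 1.4143, -0.0957, 0.1657]
log_probs = log_softmax_row(row)
probs = [math.exp(lp) for lp in log_probs]

print(log_probs)   # every entry is <= 0, because each probability is <= 1
print(sum(probs))  # sums to 1 up to floating-point error
```

Values of this kind (log-probabilities) are what KLDivLoss expects as input, whereas CrossEntropyLoss applies this step internally and takes the raw scores.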

size_average=False

Reference: "nn.KLDivLoss", 咕噜咕噜day's blog on CSDN

KLDivLoss

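Purpose:

The KL divergence is a useful distance measure for continuous distributions, and it is often useful when regressing over the space of (discretely sampled) continuous output distributions. Label smoothing uses this loss.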

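Formula:

$$D_{KL}(p \,\|\, q) = \sum_i p(x_i)\,\bigl(\log p(x_i) - \log q(x_i)\bigr)$$

PyTorch evaluates this element by element, with $y_n$ the target and $x_n$ the input:

$$l_n = y_n \cdot (\log y_n - x_n)$$

So here $x_n = \log q(x_i)$ and $y_n = p(x_i)$, which is why the predictions must be converted to log-probabilities before they are passed to KLDivLoss.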

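Understanding the formula:
p(x) is the true distribution and q(x) is the fitted distribution. In practice p(x) is passed as target, as a plain probability distribution, while x_n is the model output after LogSoftmax, that is, the predicted distribution mapped into log space.

The K-L divergence therefore looks at the difference log(p(x)) - log(q(x)), weighted by p(x): the smaller the difference, the closer the fit.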
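
The element-wise formula can be checked by hand with a small pure-Python sketch. The function name `kl_div_pointwise` and the two example distributions are made up for illustration, and treating terms with p = 0 as contributing 0 is a convention assumed here.

```python
import math

def kl_div_pointwise(log_q, p):
    """Element-wise p * (log p - log q); terms with p == 0 are taken to contribute 0."""
    return [pi * (math.log(pi) - lq) if pi > 0 else 0.0
            for lq, pi in zip(log_q, p)]

p = [0.7, 0.2, 0.1]                  # target distribution (plain probabilities)
q = [0.5, 0.3, 0.2]                  # fitted distribution
log_q = [math.log(qi) for qi in q]   # the input must be log-probabilities

kl_pq = sum(kl_div_pointwise(log_q, p))
kl_pp = sum(kl_div_pointwise([math.log(pi) for pi in p], p))

print(kl_pq)  # positive, because q differs from p
print(kl_pp)  # 0.0, because the two distributions are identical
```

Note the asymmetry in the arguments: the fitted distribution enters in log space, the target as plain probabilities.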

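Usage in PyTorch:

With the torch version used here (1.3.1), to obtain the true KL divergence, set:

reduce=False; size_average=False

(reduce defaults to True, which collapses the element-wise losses into a single scalar. size_average also defaults to True, which averages over every element in the batch; when it is False, the result is the sum of the losses over all samples and all dimensions.

Because reduce=False ignores the size_average argument, setting reduce=False alone is enough. The result is then the element-wise loss, which you sum yourself. Both arguments are deprecated in favor of the single reduction argument.)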
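
The reduction behaviours described above can be worked through by hand. The sketch below uses a made-up 2x3 matrix of element-wise loss terms and plain Python arithmetic; the comments give the `reduction` value that, to the best of my knowledge, corresponds to each legacy setting in PyTorch.

```python
# Hypothetical element-wise KL terms for a batch of 2 samples and 3 classes.
pointwise = [[0.2, -0.1, 0.0],
             [0.3,  0.1, -0.1]]

batch_size = len(pointwise)
num_elements = sum(len(row) for row in pointwise)
total = sum(sum(row) for row in pointwise)

loss_none = pointwise                # reduce=False                  ~ reduction='none'
loss_sum = total                     # size_average=False            ~ reduction='sum'
loss_mean = total / num_elements     # both True (the defaults)      ~ reduction='mean'
loss_batchmean = total / batch_size  # no legacy equivalent          ~ reduction='batchmean'

print(loss_sum, loss_mean, loss_batchmean)
```

Dividing by the number of elements (mean) shrinks the value by the number of classes relative to the per-sample KL divergence, which is why summing, or dividing by the batch size only, is what matches the mathematical definition.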

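II. Label Smoothing

Reference: "Pytorch: implementing cross-entropy loss (CrossEntropyLoss) and label smoothing (LabelSmoothing)", 我是大黄同学呀's blog on CSDN

Label smoothing is a regularization method that helps prevent overfitting. The conventional classification loss is the softmax loss: softmax is applied to the output of the fully connected layer, the result is treated as the confidence (probability) of each class, and the loss is then computed with cross-entropy.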

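$$q_i = \frac{\exp(x_i)}{\sum_{j} \exp(x_j)} \tag{1}$$

$$L = -\sum_{i} p_i \log q_i \tag{2}$$

Here $x_i$ is the raw score for class $i$, $q_i$ is the predicted probability, and $p_i$ is the true label value.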
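
As a small illustration of what label smoothing does to the targets, the sketch below builds the smoothed distribution the same way the `LabelSmoothingLoss` class in the complete example does: the true class gets `1 - smoothing` and the remaining mass is split evenly over the other `num_classes - 1` classes. The helper name `smooth_labels` is hypothetical, and other formulations spread the mass over all classes instead.

```python
def smooth_labels(target, num_classes, smoothing):
    """Return one smoothed target row per integer label (hypothetical helper)."""
    off_value = smoothing / (num_classes - 1)  # mass given to each wrong class
    on_value = 1.0 - smoothing                 # mass kept on the true class
    return [[on_value if c == t else off_value for c in range(num_classes)]
            for t in target]

dist = smooth_labels([2, 0, 3], num_classes=5, smoothing=0.1)
for row in dist:
    print(row)  # each row still sums to 1
```

With `smoothing=0` every row reduces to the ordinary one-hot vector, so the loss falls back to plain cross-entropy.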

III. CrossEntropyLoss

Reference: "Pytorch: implementing cross-entropy loss (CrossEntropyLoss) and label smoothing (LabelSmoothing)", 我是大黄同学呀's blog on CSDN

Reference: "Implementing torch.nn.CrossEntropyLoss from pytorch", Agwave's blog on CSDN

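Most readers already know how to compute cross-entropy. The usual steps are: (i) apply softmax to get the confidence of each class; (ii) compute the cross-entropy loss. The official PyTorch documentation, however, shows a more direct one-step form:

$$\text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right) = -x[class] + \log\left(\sum_j \exp(x[j])\right) \tag{3}$$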

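x has shape (batch_size, C)
class has shape (batch_size)
(here C is the number of classes)

So the plainest way to understand cross-entropy is: take the predicted probability (softmax) of the target class and apply -log (note the minus sign). Equation (2) can be rewritten as equation (3) because in (2) pi is the true label value: as i runs over the classes, only the correct class has value 1 and all others have value 0. Consequently, equation (2) needs one-hot encoded labels, while equation (3) only needs the label as an integer.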
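
The equivalence between the one-hot form (2) and the integer-label form (3) can be verified numerically. This is a minimal pure-Python sketch for a single sample, using the first row of the test data from the complete example; the variable names are my own.

```python
import math

logits = [-1.5616, -0.7906, 1.4143, -0.0957, 0.1657]  # raw scores for one sample
label = 2                                             # integer class index

# Log-probabilities via a numerically stable log-sum-exp.
m = max(logits)
log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
log_q = [v - log_sum_exp for v in logits]

# Form (2): sum over all classes against a one-hot label vector.
one_hot = [1.0 if i == label else 0.0 for i in range(len(logits))]
loss_one_hot = -sum(p_i * lq for p_i, lq in zip(one_hot, log_q))

# Form (3): index the target class directly.
loss_index = -logits[label] + log_sum_exp

print(loss_one_hot, loss_index)  # the two values agree up to floating-point error
```

Every term of the one-hot sum except the target class is multiplied by zero, which is exactly why the integer index is all that form (3) needs.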

Complete example:

# -*- coding: utf-8 -*-
"""
Created on 2021/8/9 16:52
@author: Janben
Reference links:
1. Cross-entropy: https://blog.csdn.net/qq_41805511/article/details/99438838
2. KLDivLoss: https://blog.csdn.net/qq_36533552/article/details/104034759
3. LabelSmoothing: https://blog.csdn.net/qq_36560894/article/details/118424356
"""
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as Function
from torch.autograd import Variable
import warnings
warnings.simplefilter('ignore')

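class MyCrossEntropyLoss():

    def __init__(self, weight=None, size_average=True):
        """
        Re-implements the two most important parameters of torch.nn.CrossEntropyLoss.

        :param weight: optional 1D tensor giving each class its own weight
        :param size_average: whether to average the loss over the batch
        """

        self.weight = weight
        self.size_average = size_average


    def __call__(self, input, target):
        """
        Compute the loss.
        Defining __call__ lets an instance be invoked like a function.

        :param input: (batch_size, C) raw scores, where C is the number of classes
        :param target: (batch_size,) integer class indices
        :return: the loss
        """

        batch_loss = 0.
        weight_sum = 0.  # normalizer used when averaging
        for i in range(input.shape[0]):
            numerator = torch.exp(input[i, target[i]])     # numerator: exp of the target-class score
            denominator = torch.sum(torch.exp(input[i, :]))   # denominator: sum of exp over all classes

            # loss for a single sample
            loss = -torch.log(numerator / denominator)
            # `if self.weight:` raises for a multi-element tensor, so compare against None
            w = 1. if self.weight is None else self.weight[target[i]]
            loss = w * loss
            print("Per-sample loss: ", loss)

            # accumulate the loss
            batch_loss += loss
            weight_sum += w

        # Optionally average over the batch. With class weights, nn.CrossEntropyLoss
        # divides by the sum of the weights used, which equals batch_size when unweighted.
        if self.size_average:
            batch_loss /= weight_sum

        return batch_loss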


class MyKLDivLossFunc(nn.Module):
    def __init__(self,reduce = True):
        super(MyKLDivLossFunc,self).__init__()
        self.reduce = reduce

    def forward(self,x,target):
        # x must already hold log-probabilities; target holds plain probabilities
        logtarget = torch.log(target+0.00001)  # add a tiny constant so that zeros in target do not give log(0) = -inf
        loss = target*(logtarget-x)
        if not self.reduce:
            return loss  # element-wise loss, same shape as x
        else:
            return torch.sum(loss)

class LabelSmoothingLoss(nn.Module):
    "Implement label smoothing."

    def __init__(self, class_num, smoothing):
        '''
        :param class_num: number of classes, e.g. class_num=5 for 5 classes
        :param smoothing: amount of label smoothing; 0 (or None) means no smoothing
        '''
        super(LabelSmoothingLoss, self).__init__()
        # treat None as "no smoothing" so that 1.0 - smoothing cannot fail
        self.smoothing = 0.0 if smoothing is None else smoothing
        self.confidence = 1.0 - self.smoothing
        self.class_num = class_num

    def forward(self, x, target):
        '''
        :param x: predictions (raw scores), shape (batchsize, classnum)
        :param target: ground-truth labels, shape (batchsize,)
        :return: (mean loss, smoothed target distribution)
        '''
        assert x.size(1) == self.class_num

        # build the smoothed targets without tracking gradients
        with torch.no_grad():
            true_dist = torch.full_like(x, self.smoothing / (self.class_num-1))
            true_dist.scatter_(1, target.unsqueeze(1), self.confidence)  # this line implements the label smoothing
        # Cross-entropy, same formula as nn.CrossEntropyLoss(), so with smoothing=0.0
        # the loss equals the value returned by nn.CrossEntropyLoss()
        logprobs = Function.log_softmax(x,dim=1)  # softmax + log
        mean_loss = -torch.sum(true_dist*logprobs)/x.size(0)  # mean loss: divide by the number of samples
        return mean_loss,true_dist

if __name__ == "__main__":
    input = np.array([[-1.5616, -0.7906,  1.4143, -0.0957,  0.1657],
        [-1.4285,  0.3045,  1.5844, -2.1508,  1.8181],
        [ 1.0205, -1.3493, -1.2965,  0.1715, -1.2118]])
    target = np.array([2, 0, 3])
    test_input = torch.from_numpy(input)
    test_target = torch.from_numpy(target).long()
    # custom cross-entropy
    criterion = MyCrossEntropyLoss()
    # the class defines __call__, so the instance can be called like a function
    loss = criterion(test_input, test_target)
    print("+++My_CrossEntropy: ", loss) # prints: tensor(1.9606, dtype=torch.float64)

    # library version from torch.nn
    # The `input` is expected to contain raw, unnormalized scores for each class.
    # Input: shape (N, C), where C = number of classes and N is the batch size
    # The cross-entropy input needs no normalization (no softmax, no log): use the raw scores
    # Target: shape (N)
    test_loss = nn.CrossEntropyLoss()
    test_out = test_loss(test_input,test_target)
    print('+++Normal CrossEntropy:',test_out) # prints: tensor(1.9606, dtype=torch.float64)

    print('+'*50)
    lloss = LabelSmoothingLoss(5,smoothing=0.1)
    loss_result, true_dist = lloss(test_input,test_target)
    print('label smoothing loss result:',loss_result)
    # with smoothing=0 the result should match nn.CrossEntropyLoss above
    lloss = LabelSmoothingLoss(5,smoothing=0.)
    loss_result,_ = lloss(test_input,test_target)
    print('label smoothing loss result:',loss_result)


    print('-' * 50)  # the checks below compare the custom KLDivLoss with nn.KLDivLoss
    # Treat test_input below as if it already held log-probabilities. It actually holds raw
    # scores, so the printed values only compare the two implementations with each other.
    print('normal kld loss:\n\t\t', nn.KLDivLoss(size_average=True, reduce=True)(test_input, Variable(true_dist, requires_grad=False)))
    print('normal kld loss(default):\n\t\t',nn.KLDivLoss()(test_input,Variable(true_dist,requires_grad=False))) # defaults: size_average=True, reduce=True
    print('normal kld loss:\n\t\t', nn.KLDivLoss(size_average=False, reduce=True)(test_input, Variable(true_dist, requires_grad=False)))
    print('my kld loss:\n\t\t', MyKLDivLossFunc(reduce=True)(test_input, true_dist)) # the custom version does no size_average; size_average divides by the number of elements, 15 in this example
    print('-'*50)
    print('normal kld loss:\n\t\t', nn.KLDivLoss(size_average=True, reduce=False)(test_input, Variable(true_dist, requires_grad=False)))
    print('normal kld loss:\n\t\t', nn.KLDivLoss(size_average=False, reduce=False)(test_input, Variable(true_dist, requires_grad=False)))
    print('my kld loss:\n\t\t', MyKLDivLossFunc(reduce=False)(test_input, true_dist))

    p = torch.Tensor([[0,0.1,0.3],[0.1,0.9,0.3],[0,0.1,0.]])
    t = torch.Tensor([[0,0,1.],[0,1.,0],[1.,0,0]])
    c = nn.KLDivLoss(size_average=False)
    print(c(p,t))
    print(MyKLDivLossFunc()(p,t))
    print(torch.log(t+0.00001))

Output: