0. Previous Posts
[2] Deep Learning with PyTorch: Tensor Operations (Concatenation, Splitting, Indexing, and Transformation)
[7] Deep Learning with PyTorch: DataLoader and Dataset (with an RMB Banknote Binary Classification Example)
[8] Deep Learning with PyTorch: Image Preprocessing with transforms
[9] Deep Learning with PyTorch: Data Augmentation with transforms (Cropping, Flipping, Rotation)
[10] Deep Learning with PyTorch: transforms Image Operations and Custom Methods
[11] Deep Learning with PyTorch: Model Creation and nn.Module
[12] Deep Learning with PyTorch: Model Containers and Building AlexNet
[13] Deep Learning with PyTorch: Convolution Layers (1D/2D/3D Convolution, nn.Conv2d, Transposed Convolution nn.ConvTranspose)
[16] Deep Learning with PyTorch: 18 Loss Functions
Deep Learning with PyTorch: Loss Functions
- 0. Previous Posts
- 1. The Concept of a Loss Function
- 2. 18 Loss Functions
  - 2.1 CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean')
  - 2.2 nn.NLLLoss(weight=None, ignore_index=-100, reduction='mean')
  - 2.3 nn.BCELoss(weight=None, reduction='mean')
  - 2.4 nn.BCEWithLogitsLoss(weight=None, reduction='mean', pos_weight=None)
  - 2.5 & 2.6 nn.L1Loss(reduction='mean') & nn.MSELoss(reduction='mean')
  - 2.7 nn.SmoothL1Loss(reduction='mean')
  - 2.8 nn.PoissonNLLLoss(log_input=True, full=False, eps=1e-08, reduction='mean')
  - 2.9 nn.KLDivLoss(reduction='mean')
  - 2.10 nn.MarginRankingLoss(margin=0.0, reduction='mean')
  - 2.11 nn.MultiLabelMarginLoss(reduction='mean')
  - 2.12 nn.SoftMarginLoss(reduction='mean')
  - 2.13 nn.MultiLabelSoftMarginLoss(reduction='mean')
  - 2.14 nn.MultiMarginLoss(p=1, margin=1.0, weight=None, reduction='mean')
  - 2.15 nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, reduction='mean')
  - 2.16 nn.HingeEmbeddingLoss(margin=1.0, reduction='mean')
  - 2.17 nn.CosineEmbeddingLoss(margin=0.0, reduction='mean')
  - 2.18 nn.CTCLoss(blank=0, reduction='mean')
- 3. Complete Code
1. The Concept of a Loss Function
A loss function measures the gap between a model's output and the ground-truth label for a single sample. The cost function is the average loss over the whole training set, and the objective function is what we actually optimize: the cost function plus a regularization term that constrains model complexity.
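To make the loss / cost distinction concrete, here is a minimal sketch (the tensor values and the choice of nn.MSELoss are made up for illustration):

import torch
import torch.nn as nn

y_hat = torch.tensor([0.9, 2.1, 2.8])  # model outputs (made-up values)
y = torch.tensor([1.0, 2.0, 3.0])      # ground-truth targets

# loss: one value per sample (reduction='none')
loss_each = nn.MSELoss(reduction='none')(y_hat, y)
# cost: the average over the whole batch (reduction='mean') -- what we actually minimize
cost = nn.MSELoss(reduction='mean')(y_hat, y)
print(loss_each)  # tensor([0.0100, 0.0100, 0.0400])
print(cost)       # tensor(0.0200)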
2. 18 Loss Functions
2.1 CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean')
CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)
Despite its name, CrossEntropyLoss is not the textbook cross-entropy formula applied verbatim: it first applies softmax to normalize the raw outputs into a probability distribution. It is the standard loss for classification tasks. Cross-entropy measures the difference between two probability distributions: the lower its value, the closer the two distributions.
(1) Entropy describes the uncertainty of an event: the greater the uncertainty, the higher the entropy.
(2) Information entropy is the expectation of self-information; self-information measures the uncertainty of a single event.
(3) Relative entropy, also called KL divergence, measures the difference between two distributions. It behaves like a distance between them, but it is not a true distance function because it is not symmetric. P is the true distribution and Q is the model's output distribution; Q is trained to approximate P.
(4) Cross-entropy measures the similarity between two distributions.
(5) P is the true distribution, i.e., the distribution of the samples in the training set. Since the training set is fixed, P is fixed, so H(P) is a constant; minimizing the cross-entropy is therefore equivalent to minimizing the KL divergence.
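For reference, these are the standard definitions behind points (1)-(5), written out explicitly:

$$I(x) = -\log P(x)$$
$$H(P) = \mathbb{E}_{x \sim P}[I(x)] = -\sum_{i} P(x_i) \log P(x_i)$$
$$D_{KL}(P \parallel Q) = \sum_{i} P(x_i) \log \frac{P(x_i)}{Q(x_i)}$$
$$H(P, Q) = -\sum_{i} P(x_i) \log Q(x_i) = H(P) + D_{KL}(P \parallel Q)$$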
Since the target is given as a class index, $P(x_i) = 1$ for the true class and $0$ for every other class, so the cross-entropy collapses to the single term $-\log Q(x_{\text{class}})$. The model outputs $Q(x_i)$ first need to be normalized into probabilities with softmax, which yields the formula actually used:

$$\text{loss}(x, \text{class}) = -x[\text{class}] + \log\Bigl(\sum_{j} \exp(x[j])\Bigr)$$
Code example:
# fake data
# binary classification, 2 output neurons, batch size 3, i.e. three samples: [1,2] [1,3] [1,3]
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
# dtype must be long; the 1D target tensor has one entry per sample
target = torch.tensor([0, 1, 1], dtype=torch.long)
# ----------------------------------- CrossEntropy loss: reduction -----------------------------------
flag = 0
# flag = 1
if flag:
    # define the loss function, one instance per reduction mode
    loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

    # forward
    loss_none = loss_f_none(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)
    # output: [1.3133, 0.1269, 0.1269] 1.5671 0.5224
# --------------------------------- compute by hand
flag = 0
# flag = 1
if flag:
    idx = 0

    input_1 = inputs.detach().numpy()[idx]  # [1, 2]
    target_1 = target.numpy()[idx]          # [0]

    # first term: the logit of the true class
    x_class = input_1[target_1]

    # second term: log of the sum of exponentials over all classes
    sigma_exp_x = np.sum(list(map(np.exp, input_1)))
    log_sigma_exp_x = np.log(sigma_exp_x)

    # loss: -x[class] + log(sum_j exp(x[j]))
    loss_1 = -x_class + log_sigma_exp_x
    print("loss of the first sample: ", loss_1)
# ----------------------------------- weight -----------------------------------
flag = 0
# flag = 1
if flag:
    # define the loss function
    # the weights vector must have one entry per class
    weights = torch.tensor([1, 2], dtype=torch.float)
    # weights = torch.tensor([0.7, 0.3], dtype=torch.float)

    loss_f_none_w = nn.CrossEntropyLoss(weight=weights, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("\nweights: ", weights)
    print(loss_none_w, loss_sum, loss_mean)
    # with weights [1, 2] the output is [1.3133, 0.2539, 0.2539] 1.8210 0.3642
    # target = [0, 1, 1], so class-0 losses are multiplied by weight 1 and class-1 losses by weight 2:
    # 1.3133 = 1.3133 * 1, 0.2539 = 0.1269 * 2, 0.2539 = 0.1269 * 2
    # 0.3642 = 1.8210 / (1 + 2 + 2): the denominator is the total number of weight shares (5), not the sample count
# --------------------------------- compute by hand
flag = 0
# flag = 1
if flag:
    weights = torch.tensor([1, 2], dtype=torch.float)
    # weights_all = 5: sum of the weight assigned to each sample, [0, 1, 1] ---> [1, 2, 2]
    weights_all = np.sum(list(map(lambda x: weights.numpy()[x], target.numpy())))

    mean = 0
    loss_sep = loss_none.detach().numpy()  # per-sample losses from the reduction='none' block above
    for i in range(target.shape[0]):
        x_class = target.numpy()[i]
        tmp = loss_sep[i] * (weights.numpy()[x_class] / weights_all)
        mean += tmp

    print(mean)
Official example:
# Example of target with class indices; here target is 1D with length equal to the batch size
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
# Example of target with class probabilities; here target has the same shape as input
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)
output.backward()
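nn.CrossEntropyLoss is documented as nn.LogSoftmax followed by nn.NLLLoss (covered next), so the two routes below must agree; a quick sketch to verify:

import torch
import torch.nn as nn

x = torch.randn(3, 5)        # raw logits
t = torch.tensor([1, 0, 4])  # class indices

ce = nn.CrossEntropyLoss()(x, t)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(x), t)
print(torch.allclose(ce, nll))  # True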
2.2 nn.NLLLoss(weight=None, ignore_index=-100, reduction='mean')
nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
NLLLoss applies neither softmax nor log itself: it simply takes the input value at each sample's target class index, negates it, and multiplies by the class weight. It is therefore meant to be fed log-probabilities, typically the output of nn.LogSoftmax.
Code example:
# fake data
# binary classification, 2 output neurons, batch size 3, i.e. three samples: [1,2] [1,3] [1,3]
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
# dtype must be long; the 1D target tensor has one entry per sample
target = torch.tensor([0, 1, 1], dtype=torch.long)
# ----------------------------------- 2 NLLLoss -----------------------------------
flag = 0
# flag = 1
if flag:
    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.NLLLoss(weight=weights, reduction='none')
    loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("\nweights: ", weights)
    print("NLL Loss", loss_none_w, loss_sum, loss_mean)
    # output: [-1, -3, -3] -7 -2.3333
    # target = [0, 1, 1]
    # -1: the first sample [1, 2] is class 0, so only the first neuron's output is used: -1 * 1
    # -3: the second sample [1, 3] is class 1, so only the second neuron's output is used: -1 * 3
    # -3: the third sample [1, 3] is class 1, so only the second neuron's output is used: -1 * 3
    # for 'mean', the denominator is the sum of the per-sample weights
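In other words, with reduction='none' and unit weights, NLLLoss just gathers the negated input at each sample's target index. A one-line check, reusing inputs and target from the snippet above:

# picks input[i, target[i]] for each sample and negates it -- same as nn.NLLLoss(reduction='none') with unit weights
picked = -inputs.gather(1, target.unsqueeze(1)).squeeze(1)
print(picked)  # tensor([-1., -3., -3.])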
Official example:
m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0, 4])
output = loss(m(input), target)
output.backward()
# 2D loss example (used, for example, with image inputs)
N, C = 5, 4
loss = nn.NLLLoss()
# input is of size N x C x height x width
data = torch.randn(N, 16, 10, 10)
conv = nn.Conv2d(16, C, (3, 3))
m = nn.LogSoftmax(dim=1)
# each element in target has to have 0 <= value < C
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
output = loss(m(conv(data)), target)
output.backward()
2.3 nn.BCELoss(weight=None, reduction='mean')
nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')
Note: the input values must lie in [0, 1], i.e., they must already be probabilities.
Each element's loss is $l_n = -w_n\left[y_n \log x_n + (1 - y_n)\log(1 - x_n)\right]$, where $x_n$ is the probability output by the model and $y_n$ is the label (0 or 1 in binary classification).
Code example:
# ----------------------------------- 3 BCE Loss -----------------------------------
flag = 0
# flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)  # 4 samples
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)  # must be float; each neuron is matched one-to-one with a target entry

    target_bce = target

    # sigmoid is mandatory here: it squashes the inputs into [0, 1]
    inputs = torch.sigmoid(inputs)

    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.BCELoss(weight=weights, reduction='none')
    loss_f_sum = nn.BCELoss(weight=weights, reduction='sum')
    loss_f_mean = nn.BCELoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    # view
    print("\nweights: ", weights)
    print("BCE Loss", loss_none_w, loss_sum, loss_mean)
    # output: [[0.3133, 2.1269], [0.1269, 2.1269], [3.0486, 0.0181], [4.0181, 0.0067]] 11.7856 1.4732
    # each neuron is paired with its own target, so there are 2 * 4 = 8 loss values
# --------------------------------- compute by hand
flag = 0
# flag = 1
if flag:
    idx = 0

    x_i = inputs.detach().numpy()[idx, idx]
    y_i = target.numpy()[idx, idx]

    # loss
    # l_i = -[ y_i * np.log(x_i) + (1-y_i) * np.log(1-x_i) ]  # direct form; np.log(0) = nan, hence the branch below
    l_i = -y_i * np.log(x_i) if y_i else -(1-y_i) * np.log(1-x_i)

    # print the loss
    print("BCE inputs: ", inputs)
    print("first loss: ", l_i)  # 0.3133
Remember: you must apply sigmoid first so that the inputs lie in [0, 1].
Official example:
m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(m(input), target)
output.backward()
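nn.BCEWithLogitsLoss (next section) folds the sigmoid into the loss for numerical stability, so sigmoid + nn.BCELoss and nn.BCEWithLogitsLoss on raw logits should produce the same values; a small sketch with made-up tensors:

import torch
import torch.nn as nn

logits = torch.tensor([[1., 2.], [3., 4.]])
labels = torch.tensor([[1., 0.], [0., 1.]])

a = nn.BCELoss()(torch.sigmoid(logits), labels)
b = nn.BCEWithLogitsLoss()(logits, labels)
print(torch.allclose(a, b))  # True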
2.4 nn.BCEWithLogitsLoss(weight=None, reduction='mean', pos_weight=None)
nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)
(1) pos_weight balances positive and negative samples: the loss of every positive sample is multiplied by pos_weight.
(2) For example, with 100 positive samples and 300 negative samples, setting pos_weight to 3 balances the two classes.
(3) pos_weight is a tensor whose length must equal the number of labels. In a multi-label task with 200 classes, pos_weight holds one weight per class, so its length is 200.
(4) For binary classification, you only need to supply the weight for the positive class. For instance, with 100 positive and 400 negative samples, we can scale the positive-sample loss by 4 to mitigate the class imbalance:
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([4]))
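To see the effect numerically, here is a small sketch with made-up values; with pos_weight = 3, only the terms whose target is 1 are scaled by 3:

import torch
import torch.nn as nn

logits = torch.tensor([[1., 2.]])
labels = torch.tensor([[1., 0.]])

plain = nn.BCEWithLogitsLoss(reduction='none')(logits, labels)
weighted = nn.BCEWithLogitsLoss(reduction='none', pos_weight=torch.tensor([3.]))(logits, labels)
print(plain)     # tensor([[0.3133, 2.1269]])
print(weighted)  # tensor([[0.9399, 2.1269]]) -- only the positive-target term is tripled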
Code example:
# ----------------------------------- 4 BCE with Logits Loss -----------------------------------
# flag = 0
flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

    target_bce = target

    # inputs = torch.sigmoid(inputs)  # do NOT apply sigmoid here -- the loss applies it internally!
    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none')
    loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    # view
    print("\nweights: ", weights)
    print(loss_none_w, loss_sum, loss_mean)
    # output: [[0.3133, 2.1269], [0.1269, 2.1269], [3.0486, 0.0181], [4.0181, 0.0067]] 11.7856 1.4732
# --------------------------------- pos weight
# flag = 0
flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

    target_bce = target

    # inputs = torch.sigmoid(inputs)  # again, no sigmoid here
    weights = torch.tensor(