Computing multi-class cross-entropy step by step (with NumPy and PyTorch implementations)

是大糊涂不聪明

Last modified 2022-11-19 19:37:34

First published 2022-08-26 16:35:42

Article link: https://blog.csdn.net/weixin_47289438/article/details/126388570

Contents

1. Concrete example
2. Computation steps
3. Using sigmoid for multi-class classification
4. Backpropagating the loss is what matters

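I have been calling library functions for so long that I forgot the basics.
I could not even recall the formula for cross-entropy, and when I wrote code to check it, the check failed because I had remembered the formula wrong.
So here is a fresh record of how cross-entropy is computed.

Cross-entropy itself comes out of an optimization problem, which is covered in material on optimization with the sigmoid function; the idea is to take partial derivatives and find the minimum. That derivation is not covered here.
This post only gives the formula and compares a hand-written implementation with the result from the PyTorch library.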

1. Concrete example

[Figure: worked example]

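2. Computation steps

Get the network output first: pred = model(x). The model can be as simple as a single fully connected (fc) layer.
Apply softmax to pred to get the probabilities P.
Use the true label of each sample to get y_i and the matching probability P_i.
Take the log of each P_i, negate it, and sum over the batch.
Divide by the batch size.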
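
Written as a formula, the steps above amount to the following. The symbols are notation chosen here for illustration: $z_{i,c}$ is the network output (logit) for sample $i$ and class $c$, $y_i$ is the true class index of sample $i$, $N$ is the batch size, and $C$ is the number of classes.

```latex
p_{i,c} = \frac{e^{z_{i,c}}}{\sum_{k=1}^{C} e^{z_{i,k}}}, \qquad
L = -\frac{1}{N} \sum_{i=1}^{N} \log p_{i,\,y_i}
```

Only the probability of the true class contributes to each sample's term, which is why the one-hot multiplication in the code keeps a single entry per row.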

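The code below took a long time to debug before it worked; the comments explain each step.

2.1 Manual implementation with NumPy

Take a matrix of shape (3, 4) as the example: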

"""
batch=3, nc = 4
"""
import numpy as np

target = np.array([
    [-1.0606, 1.5613, 1.2007, -0.2481],
    [-1.9652, -0.4367, -0.0645, -0.5104],
    [0.1011, -0.5904, 0.0243, 0.1002]
])

label = np.array([0, 2, 1])


def np_softmax(arr):
    assert len(arr.shape) == 2
    arr_exp = np.exp(arr)
    arr_sum = np.sum(arr_exp, axis=1)  # (3, 4) --> (3,)
    arr_sum = arr_sum[:, None]  # add an axis so the division can broadcast: (3,) --> (3, 1)
    return arr_exp / arr_sum


def np_onehot(nc, true_label):
    """
    param nc: number of classes
    param true_label: integer labels, shape (batch,)
    return: one-hot array of shape (batch, nc)
    """
    tmp = np.arange(nc)
    tmp = tmp[None, :]  # add a row axis: [0, 1, 2, 3] --> [[0, 1, 2, 3]], so every row broadcasts to 0, 1, 2, 3
    true_label = true_label[:, None]  # add a column axis: [0, 2, 1] --> [[0], [2], [1]], broadcast along columns
    ans = tmp == true_label  # broadcasts to a (batch, nc) array of True/False
    return ans.astype(int)  # bool --> int


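# 1. Apply softmax to the predictions
target_soft = np_softmax(target)
# 2. Take the log
target_log = np.log(target_soft)
# 3. One-hot encode the labels
label_one = np_onehot(4, label)
# 4. Element-wise product (not a matrix product), negated: the negative log-probability
res = -target_log * label_one
# 5. Sum over the classes of each row (axis=1), then average over the batch
loss = np.mean(np.sum(res, axis=1))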

print(loss)
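
The hand-written version exponentiates the raw values directly, which can overflow when the logits are large. As a minimal sketch of one common alternative (not the author's code), the block below subtracts the row-wise maximum before exponentiating and picks the true-class entries by integer indexing instead of a one-hot product. The names `logits`, `np_log_softmax`, `np_cross_entropy`, and `loss_stable` are chosen here for illustration; the data is the same matrix as above.

```python
import numpy as np

# Same values as `target` above; named `logits` here because they are raw network outputs.
logits = np.array([
    [-1.0606, 1.5613, 1.2007, -0.2481],
    [-1.9652, -0.4367, -0.0645, -0.5104],
    [0.1011, -0.5904, 0.0243, 0.1002],
])
label = np.array([0, 2, 1])


def np_log_softmax(arr):
    # Subtracting the row-wise maximum does not change the softmax result,
    # but keeps np.exp from overflowing on large inputs.
    shifted = arr - arr.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def np_cross_entropy(arr, true_label):
    log_prob = np_log_softmax(arr)
    # Integer indexing picks log_prob[i, true_label[i]] for every row i,
    # which gives the same values as multiplying by a one-hot matrix and summing.
    picked = log_prob[np.arange(arr.shape[0]), true_label]
    return -picked.mean()


loss_stable = np_cross_entropy(logits, label)
print(loss_stable)
```

On this small example the result should agree with the one-hot version; the difference only matters once the logits become large in magnitude.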

2.2 PyTorch implementation

import torch
import numpy as np
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

target = np.array([
    [-1.0606, 1.5613, 1.2007, -0.2481],
    [-1.9652, -0.4367, -0.0645, -0.5104],
    [0.1011, -0.5904, 0.0243, 0.1002]
])
target = torch.tensor(target)
label = torch.tensor([0, 2, 1])
res = criterion(target, label)
print(res)
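
To see the "sum, then divide by the batch size" step inside the library call, the sketch below asks nn.CrossEntropyLoss for the per-sample losses with reduction="none" and averages them by hand. The variable names `logits`, `per_sample`, and `mean_loss` are chosen here for illustration.

```python
import torch
import torch.nn as nn

logits = torch.tensor([
    [-1.0606, 1.5613, 1.2007, -0.2481],
    [-1.9652, -0.4367, -0.0645, -0.5104],
    [0.1011, -0.5904, 0.0243, 0.1002],
])
label = torch.tensor([0, 2, 1])

# reduction="none" keeps one loss value per sample instead of averaging.
per_sample = nn.CrossEntropyLoss(reduction="none")(logits, label)
# The default reduction="mean" averages those values over the batch.
mean_loss = nn.CrossEntropyLoss()(logits, label)

print(per_sample)
print(per_sample.mean(), mean_loss)
```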

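2.3 Equivalent PyTorch implementation

The version above uses nn.CrossEntropyLoss(), but the same result can be obtained by combining nn.LogSoftmax() with nn.NLLLoss(). In short,
nn.CrossEntropyLoss() is equivalent to nn.LogSoftmax() followed by nn.NLLLoss().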

import torch
import numpy as np
import torch.nn as nn


target = np.array([
    [-1.0606, 1.5613, 1.2007, -0.2481],
    [-1.9652, -0.4367, -0.0645, -0.5104],
    [0.1011, -0.5904, 0.0243, 0.1002]
])
target = torch.tensor(target)
label = torch.tensor([0, 2, 1])

# Apply softmax first, then take the log

m = nn.LogSoftmax(dim=1)
# The negative log likelihood loss. It is useful to train a classification
# problem with `C` classes
loss = nn.NLLLoss()


tmp = m(target)
output = loss(tmp, label)
print(output)
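
What NLLLoss does with the log-probabilities can also be spelled out by hand. The following is a small sketch, using the functional API and the illustrative names `logits`, `picked`, `manual`, and `reference`: it gathers the log-probability of the true class for each row, negates the mean, and compares the result with F.cross_entropy.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([
    [-1.0606, 1.5613, 1.2007, -0.2481],
    [-1.9652, -0.4367, -0.0645, -0.5104],
    [0.1011, -0.5904, 0.0243, 0.1002],
])
label = torch.tensor([0, 2, 1])

log_prob = F.log_softmax(logits, dim=1)  # (3, 4) log-probabilities
# gather picks log_prob[i, label[i]] for every row i, the entry NLLLoss reads.
picked = log_prob.gather(1, label.unsqueeze(1)).squeeze(1)
manual = -picked.mean()

reference = F.cross_entropy(logits, label)
print(manual, reference)
```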

[Figure: screenshot of the explanation, taken from the reference source]

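3. Using sigmoid for multi-class classification

Note that BCELoss and BCEWithLogitsLoss are different: BCELoss expects probabilities (values already passed through sigmoid), while BCEWithLogitsLoss applies the sigmoid itself and expects raw logits.

How the loss is computed is ultimately your own choice.
Sigmoid can also be used for multi-class classification.
In that case the prediction for each class is independent of the others, and the probabilities do not sum to 1.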

# loss_function = nn.CrossEntropyLoss()
criterion = nn.BCELoss()

# net, train_bar, optimizer and device are assumed to be defined earlier in the training script
for step, data in enumerate(train_bar):
    images, labels = data

    images = images.to(device)
    labels = labels.to(device)

    output = net(images)  # images is already on the device
    # One-hot encode the labels
    labels = nn.functional.one_hot(labels, num_classes=5)
    # Apply sigmoid to the output, since BCELoss expects probabilities
    output = torch.sigmoid(output)
    loss = criterion(output, labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
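
The difference between the two BCE losses can be checked on a small tensor. The sketch below uses made-up data (a random batch of 3 samples and 5 classes, with hypothetical labels) to show that the sigmoid outputs of a row need not sum to 1, and that nn.BCELoss on sigmoid outputs matches nn.BCEWithLogitsLoss on the raw logits as well as the formula written out by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, 5)        # hypothetical network outputs: 3 samples, 5 classes
labels = torch.tensor([0, 2, 4])  # hypothetical integer labels
one_hot = nn.functional.one_hot(labels, num_classes=5).float()

prob = torch.sigmoid(logits)
print(prob.sum(dim=1))  # each class is scored independently, so rows need not sum to 1

# BCELoss takes probabilities; BCEWithLogitsLoss takes raw logits.
loss_a = nn.BCELoss()(prob, one_hot)
loss_b = nn.BCEWithLogitsLoss()(logits, one_hot)

# The same quantity by hand: mean over every (sample, class) entry.
manual = -(one_hot * torch.log(prob) + (1 - one_hot) * torch.log(1 - prob)).mean()
print(loss_a, loss_b, manual)
```

In practice nn.BCEWithLogitsLoss is usually the safer choice for training, because it combines the sigmoid and the log in a numerically stable way.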