Testing PyTorch's softmax and Cross_entropy (cross-entropy)

重头再来69

Article link: https://blog.csdn.net/dhldxcycsdn/article/details/114169573

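I was mainly unsure about the input sizes (shapes) that CrossEntropyLoss expects, so I analyzed three functions: softmax, nll_loss and CrossEntropyLoss, because CrossEntropyLoss = log_softmax + nll_loss.
(This point follows the article by @小风, [A detailed explanation of PyTorch's nn.CrossEntropyLoss() method](https://blog.csdn.net/weixin_44298740/article/details/104928756).)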
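Written out for a single sample with raw scores x (one per class, C in total) and true class index y, the decomposition looks like this. log_softmax produces the second expression, nll_loss contributes the minus sign and the selection of entry y, and with the default reduction='mean' (no weight, no ignored entries) the per-sample losses are averaged over the N samples. These are the standard definitions, not taken from the linked article:

$$
\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}, \qquad
\log \mathrm{softmax}(x)_i = x_i - \log \sum_{j} e^{x_j}
$$

$$
\ell(x, y) = -\log \mathrm{softmax}(x)_y = -x_y + \log \sum_{j} e^{x_j}, \qquad
\text{loss} = \frac{1}{N} \sum_{n=1}^{N} \ell\big(x^{(n)}, y^{(n)}\big)
$$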

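Main conclusions:
1. softmax returns an output of the same size as its input. It can be applied along any dim, and when the result is then reduced along that dim (e.g. with sum or max), that dim disappears from the shape of the reduced tensor.
2. For cross-entropy, the target has one fewer dimension than the input; the missing dimension is C, the number of classes. The values in the target tensor must also be class indices (integers in [0, C)). If either the shape or the values are wrong, the loss cannot be computed.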
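Conclusion 2 can be turned into a small pre-flight check before calling the loss. The sketch below is a hypothetical helper (`check_target` is not a PyTorch function) that encodes the three rules: drop dim 1 (C) from the input shape, use int64, keep values in [0, C). It deliberately ignores `ignore_index` (default -100), which nll_loss/cross_entropy treat specially.

```python
import torch


def check_target(logits: torch.Tensor, target: torch.Tensor) -> None:
    """Hypothetical helper: check that `target` fits cross_entropy/nll_loss in
    the class-index form, for logits of shape (N, C) or (N, C, d1, ..., dK).
    Does not handle ignore_index."""
    num_classes = logits.shape[1]
    # The target drops the class dim C: (N, C, d1, ...) -> (N, d1, ...)
    expected_shape = logits.shape[:1] + logits.shape[2:]
    if target.shape != expected_shape:
        raise ValueError(f"target shape {tuple(target.shape)} != expected {tuple(expected_shape)}")
    if target.dtype != torch.long:
        raise TypeError(f"target dtype must be torch.long (int64), got {target.dtype}")
    if target.numel() > 0 and (target.min() < 0 or target.max() >= num_classes):
        raise ValueError(f"target values must lie in [0, {num_classes})")


logits = torch.randn(4, 3)          # N=4 samples, C=3 classes
good = torch.tensor([0, 2, 1, 2])   # one class index per sample
check_target(logits, good)          # passes silently

for bad in (torch.tensor([0, 1, 2, 3]),           # 3 is not < C
            torch.tensor([0.0, 2.0, 1.0, 2.0]),   # float, not long
            torch.tensor([[0, 2, 1, 2]])):        # wrong shape
    try:
        check_target(logits, bad)
    except (ValueError, TypeError) as err:
        print("rejected:", err)
```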

1. softmax / log_softmax

First, the official documentation:

```python
def log_softmax(input, dim=None, _stacklevel=3, dtype=None):
    # type: (Tensor, Optional[int], int, Optional[int]) -> Tensor
```

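log_softmax takes a tensor and a dim; dim specifies along which dimension of the tensor the probabilities are computed. For softmax, the output has the same size as the input, and the values along that dim sum to 1. If you then sum over that dim, that dim is removed from the shape of the result.
For example, with an input of shape [2,3,4], the sum has shape [2,3] for dim=2,
[2,4] for dim=1,
and [3,4] for dim=0.
log_softmax takes the log of the softmax result, so its values along the dim no longer sum to 1 (they are all ≤ 0, and it is their exponentials that sum to 1). The shape behaves in the same way.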
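The shape rules above can be checked in a few lines. A minimal sketch on seeded random data of shape [2, 3, 4], using the functional forms `F.softmax` / `F.log_softmax` rather than the `nn.Softmax` module used in the example below (they compute the same thing):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 3, 4)

for dim in (2, 1, 0):
    p = F.softmax(x, dim=dim)   # same shape as x: [2, 3, 4]
    s = p.sum(dim=dim)          # reducing over `dim` drops it
    # expected shapes of s: dim=2 -> (2, 3), dim=1 -> (2, 4), dim=0 -> (3, 4)
    print(dim, tuple(p.shape), tuple(s.shape), torch.allclose(s, torch.ones_like(s)))

logp = F.log_softmax(x, dim=2)
# log_softmax equals log(softmax), computed in a numerically more stable way
print(torch.allclose(logp, torch.log(F.softmax(x, dim=2)), atol=1e-6))
# its values are <= 0 and do not sum to 1, but their exponentials do
print(logp.sum(dim=2))
print(torch.allclose(logp.exp().sum(dim=2), torch.ones(2, 3)))
```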

Example:

```python
import torch
import torch.nn as nn

a = list(range(0, 24, 1))
a[1] = 9
a[15] = 5    # change two values so that the positions of the maxima after softmax are not all the same
input = torch.tensor(a, dtype=torch.float32).reshape(2, 3, 4)  # N, C, in_size
print()
print('Input:')
print(input.shape)   # [2, 3, 4]
print(input)

SF = nn.Softmax(dim=2)
out_softmax = SF(input)
print()
print('Result of softmax:')
print(out_softmax.shape)   # [2, 3, 4], same as the input
print(out_softmax)
s_max, s_loc = out_softmax.max(dim=2)   # max values and their indices along dim 2
print()
print('Max value / position of the result:')
print(s_max)
print(s_loc)
```

Output:

```
Input:
torch.Size([2, 3, 4])
tensor([[[ 0.,  9.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.]],

        [[12., 13., 14.,  5.],
         [16., 17., 18., 19.],
         [20., 21., 22., 23.]]])

Result of softmax:
torch.Size([2, 3, 4])
tensor([[[1.2298e-04, 9.9650e-01, 9.0869e-04, 2.4701e-03],
         [3.2059e-02, 8.7144e-02, 2.3688e-01, 6.4391e-01],
         [3.2059e-02, 8.7144e-02, 2.3688e-01, 6.4391e-01]],

        [[9.0023e-02, 2.4471e-01, 6.6519e-01, 8.2091e-05],
         [3.2059e-02, 8.7144e-02, 2.3688e-01, 6.4391e-01],
         [3.2059e-02, 8.7144e-02, 2.3688e-01, 6.4391e-01]]])

Max value / position of the result:
tensor([[0.9965, 0.6439, 0.6439],  # i.e. for each "page" (dim 0) and each row (dim 1), the max over all columns, and its position
        [0.6652, 0.6439, 0.6439]])
tensor([[1, 3, 3],
        [2, 3, 3]])
```

Note that the max result has one dimension fewer: the shape goes from [2, 3, 4] to [2, 3].
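If you would rather keep the reduced dim as size 1 (for example, to broadcast the result back against the [2, 3, 4] softmax output), `max` and `sum` accept `keepdim=True`. A small sketch on seeded random data rather than the tensor above:

```python
import torch

torch.manual_seed(0)
p = torch.softmax(torch.randn(2, 3, 4), dim=2)

values, indices = p.max(dim=2)                     # dim 2 is dropped: [2, 3]
values_k, indices_k = p.max(dim=2, keepdim=True)   # dim 2 kept as size 1: [2, 3, 1]
print(values.shape, values_k.shape)

# keepdim makes broadcasting back straightforward, e.g. a mask of the argmax positions
is_max = p == values_k          # [2, 3, 4] vs [2, 3, 1] broadcasts to [2, 3, 4]
print(is_max.sum(dim=2))        # one True per row (ties aside)
```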

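To see the sums, run the following:

```python
total = out_softmax.sum(dim=2)   # named `total` so that the built-in sum() is not shadowed
print()
print('Sum:')
print(total.shape)
print(total)
```

Output:

```
Sum:
torch.Size([2, 3])
tensor([[1., 1., 1.],
        [1., 1., 1.]])
```

A quick manual check on the first row of the softmax result (it is not exactly 1 only because the printed values are rounded):

```python
a = [1.2298e-04, 9.9650e-01, 9.0869e-04, 2.4701e-03]
print(sum(a))   # 1.0000017700000001
```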

If dim is changed to 1 in both places (nn.Softmax and the sum; the max is still taken along dim=2, which is why its result keeps the shape [2, 3]):

```
Result of softmax:
torch.Size([2, 3, 4])
tensor([[[3.2932e-04, 4.9546e-01, 3.2932e-04, 3.2932e-04],
         [1.7980e-02, 9.0747e-03, 1.7980e-02, 1.7980e-02],
         [9.8169e-01, 4.9546e-01, 9.8169e-01, 9.8169e-01]],

        [[3.2932e-04, 3.2932e-04, 3.2932e-04, 1.4956e-08],
         [1.7980e-02, 1.7980e-02, 1.7980e-02, 1.7986e-02],
         [9.8169e-01, 9.8169e-01, 9.8169e-01, 9.8201e-01]]])

Max value / position of the result:
tensor([[4.9546e-01, 1.7980e-02, 9.8169e-01],
        [3.2932e-04, 1.7986e-02, 9.8201e-01]])
tensor([[1, 0, 0],
        [0, 3, 3]])

Sum:
torch.Size([2, 4])
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])   # note the shape change: summing over dim 1 (down the rows) leaves one value per column, 4 columns
```

Sum check (column 0 of the first page):

```python
b = [3.2932e-04, 1.7980e-02, 9.8169e-01]
print(sum(b))  # 0.9999993199999999
```

If dim is changed to 0 (again in nn.Softmax and the sum; the max is still taken along dim=2):

```
Result of softmax:
torch.Size([2, 3, 4])
tensor([[[6.1442e-06, 1.7986e-02, 6.1442e-06, 1.1920e-01],
         [6.1442e-06, 6.1442e-06, 6.1442e-06, 6.1442e-06],
         [6.1442e-06, 6.1442e-06, 6.1442e-06, 6.1442e-06]],

        [[9.9999e-01, 9.8201e-01, 9.9999e-01, 8.8080e-01],
         [9.9999e-01, 9.9999e-01, 9.9999e-01, 9.9999e-01],
         [9.9999e-01, 9.9999e-01, 9.9999e-01, 9.9999e-01]]])

Max value / position of the result:
tensor([[1.1920e-01, 6.1442e-06, 6.1442e-06],
        [9.9999e-01, 9.9999e-01, 9.9999e-01]])
tensor([[3, 0, 0],
        [0, 0, 0]])

Sum:
torch.Size([3, 4])
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
```

Sum check (position [0, 0] on both pages):

```python
c = [6.1442e-06, 9.9999e-01]
print(sum(c))   # 0.9999961442
```

2. NLL_Loss

```python
def nll_loss(input, target, weight=None, size_average=None, ignore_index=-100,
             reduce=None, reduction='mean'):
    # type: (Tensor, Tensor, Optional[Tensor], Optional[bool], int, Optional[bool], str) -> Tensor
```

The important arguments are input and target:

```
Args:
    input: :math:`(N, C)` where `C = number of classes` or :math:`(N, C, H, W)`
        in case of 2D Loss, or :math:`(N, C, d_1, d_2, ..., d_K)` where :math:`K \geq 1`
        in the case of K-dimensional loss.
    target: :math:`(N)` where each value is :math:`0 \leq \text{targets}[i] \leq C-1`,
        or :math:`(N, d_1, d_2, ..., d_K)` where :math:`K \geq 1` for
        K-dimensional loss.
```
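Per the shapes above, nll_loss with the default reduction='mean' (and no weight) picks, for each sample, the entry input[n, target[n]] and averages the negatives. A minimal sketch checking that against `F.nll_loss` on seeded random data, for the (N, C) case and for a K-dimensional case (K=2 and the sizes are arbitrary choices for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# (N, C) case: N=3 items, C=5 classes
logp = F.log_softmax(torch.randn(3, 5), dim=1)   # nll_loss expects log-probabilities
target = torch.tensor([1, 0, 4])                 # shape (N,), values in [0, C)
manual = -logp[torch.arange(3), target].mean()   # log-prob of the true class per row
print(F.nll_loss(logp, target), manual)

# K-dimensional case: input (N, C, d1, d2), target (N, d1, d2)
logp2 = F.log_softmax(torch.randn(2, 5, 4, 4), dim=1)   # class dim is dim 1
target2 = torch.randint(0, 5, (2, 4, 4))
# gather along the class dim needs an index with a size-1 class dim
picked = logp2.gather(1, target2.unsqueeze(1)).squeeze(1)   # shape (N, d1, d2)
manual2 = -picked.mean()
print(F.nll_loss(logp2, target2), manual2)
```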

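So in the simple case the input is a 2-D matrix: N is the number of items to predict and C is the number of classes, and the target is a vector of length N. In other words, N items are to be classified into C classes; for each item the preceding log_softmax gives a (log-)probability for each class, and the target holds the index of the item's true class (not the class the model happens to rate highest). Below is the official example from the F.cross_entropy documentation (cross_entropy combines log_softmax and nll_loss):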

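```python
import torch
import torch.nn.functional as F

input = torch.randn(3, 5, requires_grad=True)
print(input)
target = torch.randint(5, (3,), dtype=torch.int64)
# Draws 3 random integers from [0, 5). They must be Long (int64) integers and valid class
# indices. This is a big pitfall: the target values must represent classes.
# The class docstring says: each element in target has to have 0 <= value < C
print(target)
loss = F.cross_entropy(input, target)
print(loss)
loss.backward()
print(loss)   # backward() does not change the loss value; the gradients end up in input.grad
```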

The output is:

```
tensor([[ 0.1578,  0.2282, -0.6279, -1.0986,  1.5271],
        [ 0.7805, -0.4901, -1.2760, -0.7056, -0.0250],
        [ 0.1768, -0.7692,  0.7770,  1.9761,  0.5800]], requires_grad=True)
tensor([2, 0, 0])
tensor(1.9343, grad_fn=<NllLossBackward>)
tensor(1.9343, grad_fn=<NllLossBackward>)
```

Note: the results differ on every run, because input and target are random.
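If you want repeatable numbers while experimenting, one option is to fix the random seed before creating the tensors. A minimal sketch; the helper name `run_once` and the seed values are arbitrary choices for illustration:

```python
import torch
import torch.nn.functional as F


def run_once(seed: int) -> torch.Tensor:
    """Hypothetical helper: repeat the cross_entropy example with a fixed seed."""
    torch.manual_seed(seed)   # makes randn/randint repeatable on CPU
    input = torch.randn(3, 5, requires_grad=True)
    target = torch.randint(5, (3,), dtype=torch.int64)
    return F.cross_entropy(input, target)


print(run_once(0), run_once(0))   # the same value twice
print(run_once(1))                # a different seed usually gives a different value
```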
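Also, F.cross_entropy and the nn.NLLLoss class are not used in quite the same way: NLLLoss expects log-probabilities, so its input must first go through LogSoftmax, whereas cross_entropy takes the raw scores directly.
The following examples are from the official documentation; try them yourself: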

```python
>>> import torch
>>> import torch.nn as nn
>>> m = nn.LogSoftmax(dim=1)
>>> loss = nn.NLLLoss()
>>> # input is of size N x C = 3 x 5
>>> input = torch.randn(3, 5, requires_grad=True)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.tensor([1, 0, 4])
>>> output = loss(m(input), target)
>>> output.backward()
>>>
>>> # 2D loss example (used, for example, with image inputs)
>>> N, C = 5, 4
>>> loss = nn.NLLLoss()
>>> # input is of size N x C x height x width
>>> data = torch.randn(N, 16, 10, 10)
>>> conv = nn.Conv2d(16, C, (3, 3))   # C output channels, 3x3 kernel: convolving a 10x10 input gives 8x8
>>> m = nn.LogSoftmax(dim=1)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)   # random integers in [0, C)
>>> output = loss(m(conv(data)), target)
>>> output.backward()
```
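For the image-style case, the NLLLoss + LogSoftmax(dim=1) route should give the same value as CrossEntropyLoss on the raw scores. A minimal sketch on seeded random scores of shape (N, C, H, W); it skips the convolution from the example and uses made-up sizes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C = 5, 4
scores = torch.randn(N, C, 8, 8)          # per-pixel class scores, (N, C, H, W)
target = torch.randint(0, C, (N, 8, 8))   # (N, H, W): the class dim C is dropped

nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(scores), target)
ce = nn.CrossEntropyLoss()(scores, target)   # takes the raw scores directly
print(nll, ce)
```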

3. class CrossEntropyLoss()

This one is simple to use:

```python
>>> import torch
>>> import torch.nn as nn
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)   # random_(5): random integers in [0, 5), 5 excluded
>>> output = loss(input, target)
>>> output.backward()
```

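No softmax is needed here: CrossEntropyLoss applies log_softmax internally, so it takes the raw, unnormalized scores (logits).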
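A quick way to convince yourself of the decomposition is to compare the two routes on the same scores. The sketch below uses seeded random data; it also shows a common mistake: applying softmax yourself before CrossEntropyLoss normalizes twice and gives a different loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)         # (N, C) raw scores
target = torch.tensor([1, 0, 4])   # (N,) class indices

ce = nn.CrossEntropyLoss()(logits, target)
two_step = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(ce, two_step)                # the same value

# Mistake: softmax first, then CrossEntropyLoss (which applies log_softmax again)
double = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), target)
print(double)                      # a different value
```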

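So, at least in the class-index form used throughout this post, the target for cross-entropy must be class indices.
In one experiment I passed raw numeric values directly as the target and kept getting errors; in hindsight that clearly cannot work.
One option is to convert the numeric values into class indices. (PyTorch 1.10 and later also accept a floating-point target of per-class probabilities with the same shape as the input, but that is a different use case.)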
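One way to do that conversion is to bin the values with `torch.bucketize`. A minimal sketch, assuming the numeric targets lie in [0, 1] and using four equal-width bins; the values, the bin edges and the random `logits` are hypothetical and would have to match your own data and model output (C = number of bins):

```python
import torch
import torch.nn.functional as F

values = torch.tensor([0.10, 0.45, 0.80, 0.99])   # hypothetical raw numeric targets
edges = torch.tensor([0.25, 0.50, 0.75])          # hypothetical inner bin edges -> 4 bins
target = torch.bucketize(values, edges)           # int64 class indices in [0, 4)
print(target)                                     # tensor([0, 1, 3, 3])

num_classes = len(edges) + 1
logits = torch.randn(len(values), num_classes)    # stand-in for a model's (N, C) output
loss = F.cross_entropy(logits, target)
print(loss)
```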
