A Concise PyTorch Implementation of Softmax Multi-class Classification


Article link: https://blog.csdn.net/weixin_42347778/article/details/113708004


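In the previous article we implemented the softmax operation and the cross-entropy computation by hand. PyTorch already provides both, so they can be used directly. Implementing them by hand once is still worthwhile for a deeper understanding; after that, the built-in versions are the practical choice. This article first shows how to call PyTorch's cross-entropy loss function and then combines it with nn.Linear() to build a concise version of softmax regression.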

1 Cross-entropy loss function

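In the previous article we first implemented softmax and cross-entropy separately and then combined them to compute the cross-entropy loss. PyTorch's CrossEntropyLoss() covers both steps, as the following comparison shows.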

import torch

def softmax(x):
    # Exponentiate, then normalize each row so that it sums to 1
    s = torch.exp(x)
    return s / torch.sum(s, dim=1, keepdim=True)

def crossEntropy(y_true, probs):
    # probs holds softmax outputs; pick the probability of the true class in each row
    c = -torch.log(probs.gather(1, y_true.reshape(-1, 1)))
    return torch.sum(c)

logits = torch.tensor([[0.5, 0.3, 0.6], [0.5, 0.4, 0.3]])
y = torch.LongTensor([2, 1])
c = crossEntropy(y, softmax(logits)) / len(y)
print(c)

loss = torch.nn.CrossEntropyLoss(reduction='mean')
# The mean divides by the number of samples in the batch that is passed in
# (not necessarily batch_size, since the last batch may be smaller)
cc = loss(logits, y)
print(cc)

# Output:
tensor(1.0374)
tensor(1.0374)

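As the code above shows, two lines of PyTorch (constructing CrossEntropyLoss and calling it) give the same result as the manual version. Note that with reduction='mean', the returned value is the total loss divided by the number of samples actually passed in, not by the configured batch size or by batch size * c (where c is the number of classes).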
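The point about reduction='mean' can be checked directly. The following is a minimal sketch, assuming a made-up "last batch" of 3 samples and 4 classes; the tensor values are random and only the relationship between the two reductions matters.

```python
import torch

torch.manual_seed(0)
loss_mean = torch.nn.CrossEntropyLoss(reduction='mean')
loss_sum = torch.nn.CrossEntropyLoss(reduction='sum')

# Hypothetical short batch: 3 samples, 4 classes
logits = torch.randn(3, 4)
y = torch.tensor([0, 3, 1])

mean_val = loss_mean(logits, y)
sum_val = loss_sum(logits, y)

# 'mean' equals the summed loss divided by the 3 samples passed in,
# not by the number of logit entries (3 * 4)
print(torch.allclose(mean_val, sum_val / len(y)))  # True
```

If a per-epoch average is needed, weighting each batch loss by len(y) before averaging keeps a short final batch from being over-counted.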

2 Implementing the softmax classification model

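In the previous article we already implemented loadDataset() for loading the data, accuracy() for computing classification accuracy, and evaluate() for model evaluation (which is also an accuracy computation). All that remains is to build the network model with PyTorch's built-in API.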
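The helpers from the previous article are not reproduced here. As a rough idea of what accuracy() and evaluate() could look like, here is a minimal sketch; the signatures and bodies are assumptions for illustration and may differ from the original implementations.

```python
import torch

def accuracy(logits, y_true):
    """Fraction of samples whose arg-max class matches the label (assumed signature)."""
    preds = logits.argmax(dim=1)
    return (preds == y_true).float().mean().item()

def evaluate(data_iter, net):
    """Accuracy of `net` over all batches yielded by `data_iter` (assumed signature)."""
    correct, total = 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for x, y in data_iter:
            correct += (net(x).argmax(dim=1) == y).sum().item()
            total += len(y)
    return correct / total

# Tiny hand-made example: 2 of the 3 predictions match the labels
logits = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
y = torch.tensor([1, 0, 0])
print(accuracy(logits, y))
```

Counting correct predictions and samples separately in evaluate() avoids averaging per-batch accuracies, which would give a smaller final batch too much weight.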

2.1 Softmax model implementation

import torch
from torch import nn

def train():
    input_nodes = 28 * 28
    output_nodes = 10
    epochs = 5
    lr = 0.1
    batch_size = 256
    # loadDataset() is the data-loading helper from the previous article
    train_iter, test_iter = loadDataset(batch_size)
    net = nn.Sequential(nn.Flatten(),
                        nn.Linear(input_nodes, output_nodes))
    loss = nn.CrossEntropyLoss(reduction='mean')
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)  # define the optimizer

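As with the concise linear regression from earlier, PyTorch's built-in API lets us define the softmax classification model in just two lines (the nn.Sequential call). nn.Flatten() flattens each input into a vector; here it turns each 28×28 image into a 784-dimensional vector. The model contains no explicit softmax layer because nn.CrossEntropyLoss applies log-softmax to the logits internally. The second-to-last line calls PyTorch's cross-entropy loss directly, and the last line defines an SGD optimizer. Next, a loop trains the network: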

    for epoch in range(epochs):
        for i, (x, y) in enumerate(train_iter):
            logits = net(x)
            l = loss(logits, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()  # perform one gradient descent step

The meaning of this code was covered in earlier articles and is not repeated here. The complete example code is available from reference [2].
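Two details of the model are easy to verify on a fake batch: what nn.Flatten() does to the tensor shape, and why the network outputs raw logits without a softmax layer. The sketch below assumes a random batch of four 1×28×28 inputs with arbitrary labels; it is an illustration, not part of the original training code.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

x = torch.randn(4, 1, 28, 28)   # fake batch: 4 single-channel 28x28 "images"
y = torch.tensor([3, 0, 9, 1])  # arbitrary labels in [0, 9]

flat = net[0](x)                # output of nn.Flatten alone
logits = net(x)                 # output of the whole model
print(flat.shape)               # torch.Size([4, 784])
print(logits.shape)             # torch.Size([4, 10])

# CrossEntropyLoss on logits matches log-softmax followed by NLL loss,
# which is why the model itself has no softmax layer
ce = nn.CrossEntropyLoss()(logits, y)
manual = F.nll_loss(F.log_softmax(logits, dim=1), y)
print(torch.allclose(ce, manual))  # True

# Class probabilities, if needed at prediction time
probs = F.softmax(logits, dim=1)
```

Adding a softmax layer to the model and still using CrossEntropyLoss would apply softmax twice, so the usual choice is to keep the model output as logits and apply softmax only when probabilities are needed.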

2.2 Results

if __name__ == '__main__':
    train()  # train() loads the data itself via loadDataset(batch_size)

# Output
Epochs[5/2]---batch[234/150]---acc 0.8438---loss 0.5209
Epochs[5/2]---batch[234/200]---acc 0.8086---loss 0.559
Epochs[5/2]--acc on test 0.8158
Epochs[5/3]---batch[234/0]---acc 0.8438---loss 0.4491
Epochs[5/3]---batch[234/50]---acc 0.8047---loss 0.5441
Epochs[5/3]---batch[234/100]---acc 0.8203---loss 0.5268
Epochs[5/3]---batch[234/150]---acc 0.8125---loss 0.4648
Epochs[5/3]---batch[234/200]---acc 0.875---loss 0.4342
Epochs[5/3]--acc on test 0.8124
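To try the same model, loss and optimizer without the loadDataset() helper, the loop can be run on synthetic data. This is a minimal sketch under stated assumptions: the inputs are random tensors, the labels come from a made-up random linear map (true_w), and the batch size of 64 is arbitrary. It does not reproduce the accuracy figures above, which come from the author's dataset.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for the real dataset (assumption): 512 random "images"
# labelled by a fixed random linear map, so a linear model can fit them
x_all = torch.randn(512, 1, 28, 28)
true_w = torch.randn(28 * 28, 10)
y_all = (x_all.flatten(1) @ true_w).argmax(dim=1)
train_iter = DataLoader(TensorDataset(x_all, y_all), batch_size=64, shuffle=True)

net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss = nn.CrossEntropyLoss(reduction='mean')
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(5):
    total, n = 0.0, 0
    for x, y in train_iter:
        logits = net(x)
        l = loss(logits, y)
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
        total += l.item() * len(y)  # undo the per-batch mean before accumulating
        n += len(y)
    epoch_losses.append(total / n)
    print(f"epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}")
```

The mean training loss should fall from the first epoch to the last; swapping train_iter for a real data loader leaves the rest of the loop unchanged.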