Softmax Algorithm Notes

ACplus464

Article link: https://blog.csdn.net/weixin_69718693/article/details/135603250

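Abstract: This post builds a Softmax network and uses it to train and test an image classifier on the Fashion-MNIST clothing dataset. As in the earlier posts, everything is built from scratch.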

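In machine learning, and in classification problems in particular, the Softmax function is commonly used to turn a raw output vector into a probability distribution: every class gets a value between 0 and 1, and the values over all classes sum to 1. This lets us read the outputs as class probabilities. The function is defined as: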

$Softmax(x_i) = \frac {e^{x_i}} {\sum_{j=1}^N e^{x_j}}$

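In other words, each element of the raw output vector is exponentiated, and the results are then normalized so that they sum to 1. Each element can then be interpreted as the probability of the corresponding class.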
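As a quick check of the definition, here is a minimal sketch in plain Python (standard library only). The function name `softmax_list` and the input values are chosen for illustration and do not come from the original post.

```python
import math

def softmax_list(xs):
    """Apply the softmax definition directly to a list of raw scores."""
    exps = [math.exp(x) for x in xs]      # exponentiate every element
    total = sum(exps)                     # normalizing constant
    return [e / total for e in exps]      # each entry is now in (0, 1)

scores = [1.0, 2.0, 3.0]                  # illustrative raw outputs
probs = softmax_list(scores)
print(probs)        # three probabilities, largest for the largest score
print(sum(probs))   # equals 1 up to floating-point rounding
```

The ordering of the inputs is preserved: a larger raw score always maps to a larger probability.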

**The network in this experiment has no hidden layer.** The figure originally shown here was only an illustration.

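0. Problem description

We use a Softmax model to classify the Fashion-MNIST dataset. Each image is flattened into $784$ values and there are $10$ classes, so the input size is $784$ and the output dimension is $10$.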

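1. Fetching the data

Each training batch holds 256 samples, and the data is downloaded through the d2l library.

Once loaded, train_iter covers 60000 samples in total and test_iter covers 10000.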

import torch
from IPython import display
from d2l import torch as d2l

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

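2. Initializing the model parameters

Below, the input dimension is set to 784 and the output dimension to 10.

Each image is $28\times 28$ pixels, so flattening it gives a vector of length 784. Flattening may discard spatial information, which is what motivates the convolutional neural network, the improvement proposed by Yann LeCun.

The model also needs its parameters $W, b$ .

Suppose a batch has batch_size == N. Then the input $X$ has shape $N \times 784$ and the output $y$ has shape $N \times 10$.
In this experiment, therefore, $W$ is a $784\times 10$ matrix (initialized from a Gaussian distribution) and $b$ is a vector of length 10. This gives the linear model $y = X W + b$ used for prediction.
Note that $y = X W + b$ relies on broadcasting: $b$ (shape $1 \times 10$) is added to every row of $X W$ (shape $N \times 10$).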

num_inputs = 784
num_outputs = 10
W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)
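The shapes and the row-wise addition of $b$ can be followed by hand on a tiny stand-in problem. The sketch below uses nested Python lists instead of tensors, with 3 inputs and 2 outputs in place of 784 and 10. The helper names `matmul` and `add_bias` and all numbers are illustrative assumptions, not part of the original code.

```python
def matmul(A, B):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add_bias(M, bias):
    """Add the same bias vector to every row of M, which is what broadcasting does."""
    return [[m + c for m, c in zip(row, bias)] for row in M]

# Tiny stand-in sizes: N=2 samples, 3 inputs, 2 outputs.
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
W_small = [[0.1, 0.2],
           [0.3, 0.4],
           [0.5, 0.6]]
b_small = [10.0, 20.0]

Y = add_bias(matmul(X, W_small), b_small)
print(len(Y), len(Y[0]))   # 2 2: one row per sample, one column per output
print(Y)
```

The result has one row per sample and one column per output, matching the $N \times 10$ shape described above.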

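3. Implementing the Softmax model

3.1 Model definition

Following the definition above, we define the softmax function. It takes a matrix X and applies Softmax to each row, that is:

$Softmax(X)_{ij}=\frac{\exp(X_{ij})}{\sum_k \exp(X_{ik})}$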

def softmax(X):
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition

# Quick test
X = torch.normal(0, 1, (2,5))
X_Prob = softmax(X)
X_Prob, X_Prob.sum(1)  # sum along each row

The output is:

(tensor([[0.0213, 0.0529, 0.1971, 0.5951, 0.1336],
         [0.6233, 0.1595, 0.0414, 0.1553, 0.0205]]),
 tensor([1.0000, 1.0000]))
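One caveat is that the softmax above exponentiates the raw values directly, so very large inputs can overflow. A common remedy, which is not part of the original post, is to subtract each row's maximum before exponentiating. The result is unchanged because the common factor cancels between numerator and denominator. A minimal sketch in plain Python, with the illustrative name `softmax_rows_stable`:

```python
import math

def softmax_rows_stable(rows):
    """Row-wise softmax that subtracts each row's maximum before exponentiating."""
    out = []
    for row in rows:
        m = max(row)                              # shift so the largest exponent is 0
        exps = [math.exp(x - m) for x in row]     # every term is now in (0, 1]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

big = [[1000.0, 1001.0, 1002.0]]   # math.exp(1000.0) alone would raise OverflowError
probs_big = softmax_rows_stable(big)
print(probs_big)
```

The probabilities for `[1000, 1001, 1002]` are the same as those for `[0, 1, 2]`, since only the differences between scores matter.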

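3.2 Forward pass

The Softmax regression network is implemented in two steps:

Multiply the input $X$ by $W$ and add $b$ by broadcasting, giving $y = X W + b$ ;
Apply softmax to each row of $y$, giving an $N \times 10$ matrix of predicted probability distributions.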

def net(X):
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)

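3.3 Loss function

We use cross-entropy as the loss function. (For an introduction to this function, see the post on the cross-entropy function.)

After the network output passes through the softmax activation, we have a predicted probability distribution. Suppose the prediction is $\hat y = [0.3, 0.4, 0.3]$ . The cross-entropy loss measures how far this prediction is from the true label $y$.

The cross-entropy loss is computed as:

$-\sum_i (y_i \times \log(p_i))$

Here $y_i$ is the one-hot encoding of the true label and $p_i$ is the probability predicted by the model.

A one-hot encoding is a code in which exactly one position is set. For example, if the true label is 8 (counting from 0), the encoding is $[0, 0, 0, 0, 0, 0, 0, 0, 1, 0]$ .
Strictly speaking, cross-entropy takes two arrays of the same size, but for convenience this experiment represents each one-hot vector by the index of its single 1, which is equivalent.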

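Substituting the prediction above and taking the second class as the true one, i.e. $y = [0, 1, 0]$, gives the loss (with the natural logarithm):

$-(0 \times \log(0.3) + 1 \times \log(0.4) + 0 \times \log(0.3)) \approx 0.916$

By minimizing this loss, backpropagation updates the network parameters so that the predictions move closer to the true labels, which improves classification accuracy.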

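# Build sample data: y_hat holds predicted probabilities for 2 samples over 3 classes,
# and y holds the true class index of each sample
y = torch.tensor([0, 2])
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y_hat[[0, 1], y]  # picks y_hat[0, 0] and y_hat[1, 2]

# Cross-entropy loss
# y_hat is the prediction, y is the ground truth (class indices, not one-hot vectors)
def cross_entropy(y_hat, y):
    return -torch.log(y_hat[range(len(y_hat)), y])

# Test
cross_entropy(y_hat, y)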
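The claim that indexing by the true class is equivalent to the full one-hot sum can be checked in plain Python on the prediction $[0.3, 0.4, 0.3]$ from the text. The sketch assumes the true class is the second one, as the $1 \times \log(0.4)$ term in the worked example implies. Both function names are illustrative.

```python
import math

def cross_entropy_one_hot(y_one_hot, p):
    """Full formula: -sum_i y_i * log(p_i)."""
    return -sum(y * math.log(q) for y, q in zip(y_one_hot, p))

def cross_entropy_index(label, p):
    """Shortcut: only the true class contributes, so take -log(p[label])."""
    return -math.log(p[label])

p = [0.3, 0.4, 0.3]                              # predicted distribution from the text
loss_full = cross_entropy_one_hot([0, 1, 0], p)  # one-hot label for class 1
loss_short = cross_entropy_index(1, p)           # the same label as an index
print(round(loss_full, 3), round(loss_short, 3))   # 0.916 0.916
```

Every term with $y_i = 0$ vanishes, which is why the index form computes the same value with less work.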

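3.4 Evaluation function

Next we compute the prediction accuracy of the model on a dataset given as data_iter. For example, if 128 of the 256 samples in a batch are predicted correctly, the accuracy is 50%:

# Count the correct predictions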
def accuracy(y_hat, y):
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())

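print(accuracy(y_hat, y))  # number of correctly predicted samples
print(accuracy(y_hat, y) / len(y))  # fraction of samples predicted correctly

Based on this, we implement the evaluation function for the network: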

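def eval_acc(net, data_iter):
    if isinstance(net, torch.nn.Module):
        net.eval()  # switch to evaluation mode
    metric = Accumulator(2)  # (correct predictions, total predictions)
    with torch.no_grad():  # gradients are not needed for evaluation
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]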
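The evaluation function divides the total number of correct predictions by the total number of samples, rather than averaging per-batch accuracies. The difference matters when the last batch is smaller than the others, as happens with 10000 test samples and a batch size of 256. A minimal sketch in plain Python, with hypothetical label lists standing in for real predictions:

```python
def count_correct(pred_labels, true_labels):
    """Number of positions where the predicted label equals the true label."""
    return sum(int(p == t) for p, t in zip(pred_labels, true_labels))

# Hypothetical batches of (predicted labels, true labels); the last one is smaller.
batches = [
    ([0, 1, 2, 3], [0, 1, 2, 0]),   # 3 of 4 correct
    ([4, 5, 6, 7], [4, 5, 0, 0]),   # 2 of 4 correct
    ([8, 9],       [8, 9]),         # 2 of 2 correct
]

correct = total = 0
for preds, trues in batches:
    correct += count_correct(preds, trues)
    total += len(trues)

overall = correct / total   # 7 correct out of 10 samples
per_batch_mean = sum(count_correct(p, t) / len(t) for p, t in batches) / len(batches)
print(overall, per_batch_mean)
```

Here the overall accuracy is 7/10, while the mean of the per-batch accuracies comes out higher because the small, fully correct last batch is over-weighted.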

4. Training

4.1 Training procedure

First, the work done in each epoch:

def train_epoch_ch3(net, train_iter, loss, updater):
    if isinstance(net, torch.nn.Module):  # the net is a PyTorch module
        net.train()
    metric = Accumulator(3)  # (summed loss, correct predictions, samples)
    for X, y in train_iter:
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Built-in optimizer: the loss is expected to be a scalar mean
            updater.zero_grad()
            l.backward()
            updater.step()
            metric.add(float(l) * len(y), accuracy(y_hat, y), y.numel())
        else:
            # Custom updater: the loss is per-sample, so sum it first
            l.sum().backward()
            updater(X.shape[0])
            metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    return metric[0] / metric[2], metric[1] / metric[2]

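Accumulator is a small helper that keeps $n$ running sums. In eval_acc it is created with $2$ variables, which store:

the number of correct predictions
the total number of predictions

In train_epoch_ch3 it is created with $3$ variables, with the summed loss stored ahead of these two.

It is implemented as follows: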

class Accumulator:
    def __init__(self, n):
        self.data = [0.0] * n
        
    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]
        
    def reset(self):
        self.data = [0.0] * len(self.data)
        
    def __getitem__(self, idx):
        return self.data[idx]

Next comes the full training loop:

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = eval_acc(net, test_iter)
        train_loss, train_acc = train_metrics
        print(f'epoch {epoch+1} \t train_loss {train_loss:.6f} \t train_acc {train_acc:.6f} \t test_acc {test_acc:.6f}')

Here, updater is the optimizer. As before, we implement it with the SGD algorithm.

lr = 0.1
def updater(batch_size):
    return d2l.sgd([W,b], lr, batch_size)
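In this branch train_epoch_ch3 calls l.sum().backward(), so the gradient is summed over the batch, and dividing by batch_size turns it into an average. As far as I know, d2l.sgd applies this update in place and then resets the gradients. The sketch below reproduces only the arithmetic on plain Python numbers. The function name `sgd_step` and the values are illustrative assumptions.

```python
def sgd_step(params, grads, lr, batch_size):
    """Return the parameters after one step: p <- p - lr * g / batch_size."""
    return [p - lr * g / batch_size for p, g in zip(params, grads)]

params = [0.5, -0.2]     # illustrative parameter values
grads = [25.6, -51.2]    # illustrative gradients of a loss summed over the batch
new_params = sgd_step(params, grads, lr=0.1, batch_size=256)
print(new_params)        # roughly [0.49, -0.18]
```

With lr = 0.1 and a batch of 256, a summed gradient of 25.6 moves the parameter by only 0.01, which shows how the batch size scales the effective step.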

4.2 Running

Now we can start training. We train for 20 epochs and print the loss and accuracy after each one:

num_epochs=20
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)

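5. Prediction

We define a prediction function that takes the first few samples of the first test batch and shows their true and predicted labels.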

def predict_ch3(net, test_iter, n=6):
    for X,y in test_iter:
        break
    trues = d2l.get_fashion_mnist_labels(y)
    preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
    titles = [true + '\n' + pred for true, pred in zip(trues, preds)]
    d2l.show_images(X[0:n].reshape((n,28,28)), 1, n, titles=titles[0:n])
    
predict_ch3(net, test_iter)
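The way each image title is built, true label on the first line and predicted label on the second, can be sketched without tensors. The label names below are the ones I understand d2l.get_fashion_mnist_labels to use, so treat them as an assumption and check your d2l version. The probability rows are made up for illustration.

```python
# Label names assumed to match d2l.get_fashion_mnist_labels (verify against your d2l version).
TEXT_LABELS = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
               'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']

def argmax(row):
    """Index of the largest entry in a list."""
    return max(range(len(row)), key=lambda i: row[i])

# Hypothetical predicted distributions for two images, plus their true class indices.
pred_probs = [
    [0.01, 0.02, 0.05, 0.02, 0.10, 0.05, 0.05, 0.10, 0.10, 0.50],
    [0.60, 0.02, 0.05, 0.03, 0.05, 0.05, 0.10, 0.05, 0.03, 0.02],
]
true_idx = [9, 6]

titles = [TEXT_LABELS[t] + '\n' + TEXT_LABELS[argmax(p)]
          for t, p in zip(true_idx, pred_probs)]
print(titles)
```

In this made-up case the first image is classified correctly and the second, a shirt, is mistaken for a t-shirt, so the two lines of its title differ.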