Implementing Softmax Regression from Scratch


菜小鸡同志要永远朝光明前进


Original source: https://zh-v2.d2l.ai/


Initializing the Model Parameters

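As in the earlier linear regression example, each sample here is represented by a fixed-length vector. Each sample in the original dataset is a 28×28 image. In this section we flatten each image and treat it as a vector of length 784. Later chapters discuss more sophisticated strategies that exploit the spatial structure of images, but for now we simply treat each pixel location as one feature. Because the dataset has 10 classes, the network's output dimension is 10. The weights therefore form a 784×10 matrix, and the bias forms a 1×10 row vector. As with linear regression, we initialize the weights W from a normal distribution (mean 0, standard deviation 0.01) and initialize the bias to 0.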

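import torch
from d2l import torch as d2l

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

num_inputs = 784   # 28 * 28 pixels per flattened image
num_outputs = 10   # one output per class
W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)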
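The text describes the bias as a 1×10 row vector, while the code creates b with shape (10,). The following small check, using random stand-in data rather than Fashion-MNIST and a deliberately nonzero bias, is a sketch showing that the two behave identically: PyTorch broadcasting adds b to every row of the (batch, 10) product.

```python
import torch

torch.manual_seed(0)
num_inputs, num_outputs = 784, 10

X = torch.rand(4, 1, 28, 28)  # random stand-in for 4 images of shape 1x28x28 (not real data)
W = torch.normal(0, 0.01, size=(num_inputs, num_outputs))
b = torch.arange(num_outputs, dtype=torch.float32)  # nonzero bias so broadcasting is visible

X_flat = X.reshape((-1, num_inputs))  # each image becomes one row of length 784
XW = torch.matmul(X_flat, W)           # shape (4, 10)
logits = XW + b                        # b has shape (10,) and is broadcast to every row

# Adding an explicit 1x10 row vector gives exactly the same result
logits_row = XW + b.reshape(1, num_outputs)
print(X_flat.shape, logits.shape)  # torch.Size([4, 784]) torch.Size([4, 10])
```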

Defining the Softmax Operation

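Softmax consists of three steps:

(1) exponentiate every entry (using exp);

(2) sum each row (each sample in the minibatch is one row) to obtain each sample's normalization constant;

(3) divide each row by its normalization constant, so that every row sums to 1.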
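Written as a formula, the three steps compute the following for sample (row) i and class (column) j; the denominator is the per-row normalization constant that the code below stores as `partition`:

```latex
\mathrm{softmax}(\mathbf{X})_{ij} = \frac{\exp(\mathbf{X}_{ij})}{\sum_{k} \exp(\mathbf{X}_{ik})}
```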

def softmax(X):
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition  # broadcasting divides each row by its own sum
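The version above computes exp(X) directly, which overflows to inf for large inputs and then produces nan. A common remedy, sketched here as an alternative rather than the post's method, subtracts each row's maximum before exponentiating. Softmax is unchanged when a constant is added to a whole row, so the result matches whenever the naive version does not overflow. The check compares both against `torch.softmax`:

```python
import torch

def softmax(X):
    """Naive version from the text: exp, row sums, divide."""
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition  # broadcasting divides each row by its own sum

def stable_softmax(X):
    """Shift each row by its max first; softmax is invariant to this shift."""
    X_shifted = X - X.max(dim=1, keepdim=True).values
    X_exp = torch.exp(X_shifted)
    return X_exp / X_exp.sum(dim=1, keepdim=True)

torch.manual_seed(0)
X = torch.normal(0, 1, (2, 5))
print(softmax(X).sum(1))  # each row sums to 1 (up to rounding)

big = torch.tensor([[1000.0, 0.0], [0.0, 1000.0]])
print(softmax(big))         # exp(1000) overflows to inf, giving nan entries
print(stable_softmax(big))  # [[1., 0.], [0., 1.]]
```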

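Defining the Model

Next we implement the softmax regression model. The code below defines how the input is mapped through the network to the output.

Note: before passing the data to the model, we use the reshape function to flatten each original image into a vector.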

def net(X):
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)
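A quick sanity check of the model, sketched on a fake batch of random pixels standing in for Fashion-MNIST images (the (batch, 1, 28, 28) input shape is an assumption matching the d2l data loader): the output should contain one row of 10 positive class probabilities per image, each row summing to 1.

```python
import torch

num_inputs, num_outputs = 784, 10
W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)

def softmax(X):
    X_exp = torch.exp(X)
    return X_exp / X_exp.sum(1, keepdim=True)

def net(X):
    # flatten (batch, 1, 28, 28) -> (batch, 784), apply the linear layer, then softmax
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)

X = torch.rand(3, 1, 28, 28)  # random stand-in batch of 3 "images"
probs = net(X)
print(probs.shape)            # torch.Size([3, 10])
print(probs.sum(1))           # approximately 1 for each of the 3 rows
```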

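Defining the Loss Function

Next we implement the cross-entropy loss function. For each sample, it uses the true label in y as an index to select the predicted probability of the correct class from y_hat, and returns the negative logarithm of that probability.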

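def cross_entropy(y_hat, y):
    # y_hat[i, y[i]] is the probability predicted for the true class of sample i
    return - torch.log(y_hat[range(len(y_hat)), y])

# toy example: 2 samples, 3 classes
y = torch.tensor([0, 2])
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
cross_entropy(y_hat, y)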
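As a cross-check, sketched here and not part of the original post, applying the softmax and cross_entropy defined above to raw scores should give the same per-sample losses as PyTorch's built-in `torch.nn.functional.cross_entropy` with `reduction='none'`. The built-in function takes unnormalized logits and applies log-softmax internally. The logits and labels below are random illustrative values:

```python
import torch
import torch.nn.functional as F

def softmax(X):
    X_exp = torch.exp(X)
    return X_exp / X_exp.sum(1, keepdim=True)

def cross_entropy(y_hat, y):
    # y_hat[i, y[i]] is the probability assigned to the true class of sample i
    return - torch.log(y_hat[range(len(y_hat)), y])

y = torch.tensor([0, 2])
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
print(cross_entropy(y_hat, y))  # -log(0.1) ≈ 2.3026, -log(0.5) ≈ 0.6931

torch.manual_seed(0)
logits = torch.randn(4, 3)      # random raw scores: 4 samples, 3 classes
labels = torch.tensor([0, 1, 2, 1])
ours = cross_entropy(softmax(logits), labels)
builtin = F.cross_entropy(logits, labels, reduction='none')
print(torch.allclose(ours, builtin, atol=1e-5))  # True
```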

Classification Accuracy

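def accuracy(y_hat, y):  #@save
    """Count the number of correct predictions."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        # predicted class = index of the largest probability in each row;
        # for the 2-sample, 3-class toy example this gives tensor([2, 2])
        y_hat = y_hat.argmax(axis=1)
    # convert y_hat to y's dtype and compare: cmp is a bool tensor, here [False, True]
    cmp = y_hat.type(y.dtype) == y
    # convert cmp to y's dtype and sum (correct = 1, wrong = 0): the number of correct predictions
    return float(cmp.type(y.dtype).sum())
#print(accuracy(y_hat, y) / len(y))  # dividing by len(y), the number of samples, gives the fraction correct: 0.5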
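Running accuracy on the toy batch used in the cross-entropy example (2 samples, 3 classes) confirms the values mentioned in the comments: the argmax is [2, 2], only the second sample is correct, so the count is 1 and the fraction is 0.5. This sketch repeats the function so it runs on its own:

```python
import torch

def accuracy(y_hat, y):
    """Count correct predictions."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(dim=1)    # predicted class = index of the largest probability
    cmp = y_hat.type(y.dtype) == y     # True where prediction matches label
    return float(cmp.type(y.dtype).sum())

y = torch.tensor([0, 2])
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
print(y_hat.argmax(dim=1))             # tensor([2, 2])
print(accuracy(y_hat, y))              # 1.0 correct prediction
print(accuracy(y_hat, y) / len(y))     # 0.5 fraction correct
```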

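# Accuracy of a model over an entire dataset
def evaluate_accuracy(net, data_iter):
    """Compute the model's accuracy on the given dataset."""
    # isinstance() checks whether an object is of a given type;
    # here it checks whether net is a torch.nn.Module
    if isinstance(net, torch.nn.Module):
        net.eval()  # evaluation mode (changes layers such as dropout; does not disable gradients)
    metric = Accumulator(2)  # running sums: number of correct predictions, total number of predictions
    with torch.no_grad():  # gradients are not needed for evaluation
        for X, y in data_iter:  # take one minibatch (X, y) at a time from the iterator
            # net(X) runs the forward pass (including softmax);
            # accuracy(net(X), y) counts the correctly predicted samples;
            # y.numel() returns the number of elements in y, i.e. the number of samples
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]  # metric[0]: correctly classified samples, metric[1]: total samples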

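Accumulator is a utility class that accumulates sums over several variables. In the evaluate_accuracy function above, it keeps running totals of the number of correct predictions and the total number of predictions.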

class Accumulator:  #@save
    """Accumulate sums over `n` variables."""
    def __init__(self, n):
        self.data = [0.0] * n
    def add(self, *args):
        # zip() pairs each running total with the corresponding new value.
        # In Python 3 it returns an iterator and stops at the shortest input:
        #   list(zip([1, 2, 3], [4, 5, 6]))        -> [(1, 4), (2, 5), (3, 6)]
        #   list(zip([1, 2, 3], [4, 5, 6, 7, 8]))  -> [(1, 4), (2, 5), (3, 6)]
        #   list(zip(*[(1, 4), (2, 5), (3, 6)]))   -> [(1, 2, 3), (4, 5, 6)]  (unzipping)
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
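To see how evaluate_accuracy and Accumulator fit together without downloading Fashion-MNIST, here is a small sketch on made-up data: a hypothetical "model" that always predicts class 0, and a plain list of (X, y) minibatches standing in for data_iter. Because the accumulator keeps [correct, total] across batches, the result is the dataset-wide fraction (2/4 = 0.5), not the average of per-batch accuracies ((2/3 + 0)/2 ≈ 0.33).

```python
import torch

class Accumulator:
    """Accumulate running sums over n variables."""
    def __init__(self, n):
        self.data = [0.0] * n
    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]
    def reset(self):
        self.data = [0.0] * len(self.data)
    def __getitem__(self, idx):
        return self.data[idx]

def accuracy(y_hat, y):
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(dim=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())

def evaluate_accuracy(net, data_iter):
    metric = Accumulator(2)  # [number correct, number of samples]
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]

def always_class_0(X):
    # hypothetical model: probability 1 for class 0 on every sample (3 classes)
    probs = torch.zeros(X.shape[0], 3)
    probs[:, 0] = 1.0
    return probs

# two minibatches of different sizes (the toy model ignores the features)
data_iter = [
    (torch.zeros(3, 4), torch.tensor([0, 1, 0])),  # 2 of 3 correct
    (torch.zeros(1, 4), torch.tensor([2])),        # 0 of 1 correct
]
print(evaluate_accuracy(always_class_0, data_iter))  # 0.5
```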

Training

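def train_epoch_ch3(net, train_iter, loss, updater):  # returns training loss and training accuracy
    """Train the model for one epoch (defined in Chapter 3)."""
    # Put the model in training mode (affects layers such as dropout;
    # gradients are computed either way)
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training loss, number of correct predictions, number of samples
    metric = Accumulator(3)
    for X, y in train_iter:
        # Forward pass, then per-sample losses (loss must return one value per sample,
        # e.g. cross_entropy above or nn.CrossEntropyLoss(reduction='none'))
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Built-in PyTorch optimizer
            updater.zero_grad()  # reset gradients to zero first
            l.mean().backward()  # compute gradients of the mean loss
            updater.step()       # update the parameters
        else:
            # Custom optimizer: l is a vector of per-sample losses,
            # so sum it before computing gradients
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # After all minibatches, return the training loss and training accuracy
    # metric[0]: total loss; metric[1]: correctly classified samples; metric[2]: total samples
    return metric[0] / metric[2], metric[1] / metric[2]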
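The built-in-optimizer branch can be exercised on its own. The following is a self-contained sketch on synthetic data (random features, labels produced by a hypothetical fixed linear rule, not Fashion-MNIST) using `torch.optim.SGD` and `nn.CrossEntropyLoss(reduction='none')`, so the loss is per-sample just as in the custom branch. Its epoch function mirrors train_epoch_ch3 but accumulates with plain Python numbers instead of Accumulator:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Synthetic 3-class data: labels come from a hypothetical fixed linear rule
X = torch.randn(600, 20)
true_W = torch.randn(20, 3)
y = (X @ true_W).argmax(dim=1)
train_iter = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

net = nn.Sequential(nn.Linear(20, 3))         # softmax is folded into the loss
loss = nn.CrossEntropyLoss(reduction='none')  # per-sample losses, like cross_entropy
updater = torch.optim.SGD(net.parameters(), lr=0.1)

def train_epoch(net, train_iter, loss, updater):
    net.train()
    total_loss, correct, n = 0.0, 0.0, 0
    for X_batch, y_batch in train_iter:
        y_hat = net(X_batch)               # forward pass
        l = loss(y_hat, y_batch)
        updater.zero_grad()
        l.mean().backward()                # mean over the batch for the built-in optimizer
        updater.step()
        total_loss += float(l.sum())
        correct += float((y_hat.argmax(dim=1) == y_batch).sum())
        n += y_batch.numel()
    return total_loss / n, correct / n     # computed once all minibatches are done

history = [train_epoch(net, train_iter, loss, updater) for _ in range(5)]
for epoch, (train_loss, train_acc) in enumerate(history, 1):
    print(f'epoch {epoch}: loss {train_loss:.3f}, acc {train_acc:.3f}')
```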

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):  #@save
    """Train the model (defined in Chapter 3)."""
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                            legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc

We use minibatch stochastic gradient descent to optimize the model's loss function, with a learning rate of 0.1.

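lr = 0.1

def updater(batch_size):
    # minibatch SGD on W and b, using the d2l package's helper
    return d2l.sgd([W, b], lr, batch_size)

num_epochs = 10
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)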
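d2l.sgd is the minibatch SGD helper from the d2l package. Below is a minimal re-implementation, assumed to match what that helper does: inside torch.no_grad(), each parameter moves by -lr * grad / batch_size, then its gradient is reset to zero. Dividing by batch_size turns the gradient of the summed loss (from l.sum().backward()) into an average. The check applies one step to a toy parameter, where the gradient works out to [0, -6] and the update gives [1.0, -1.8]:

```python
import torch

def sgd(params, lr, batch_size):
    """Minibatch SGD step (sketch of the d2l helper's assumed behavior)."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size  # average the summed-loss gradient
            param.grad.zero_()                     # clear it for the next minibatch

w = torch.tensor([1.0, -2.0], requires_grad=True)
batch = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 toy samples
l = (batch @ w) ** 2        # per-sample squared outputs: [1, 4, 1]
l.sum().backward()          # summed loss, as in the custom-updater branch
print(w.grad)               # tensor([ 0., -6.])
sgd([w], lr=0.1, batch_size=batch.shape[0])
print(w)                    # approximately [1.0, -1.8]
```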

Prediction

def predict_ch3(net, test_iter, n=6):  #@save
    """Predict labels (defined in Chapter 3)."""
    for X, y in test_iter:  # take only the first minibatch of the test set
        break
    trues = d2l.get_fashion_mnist_labels(y)
    preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
    titles = [true +'\n' + pred for true, pred in zip(trues, preds)]
    d2l.show_images(
        X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])

predict_ch3(net, test_iter)
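predict_ch3 relies on d2l.get_fashion_mnist_labels to turn class indices into names before plotting. A plotting-free sketch of the same idea, assuming the standard Fashion-MNIST class order (the list is typed out here rather than imported from d2l), with a random probability matrix standing in for net(X) and hypothetical true labels:

```python
import torch

# Fashion-MNIST class names in label order 0..9
TEXT_LABELS = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
               'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']

def get_labels(indices):
    """Map integer class indices to their text names."""
    return [TEXT_LABELS[int(i)] for i in indices]

torch.manual_seed(0)
probs = torch.softmax(torch.randn(4, 10), dim=1)  # stand-in for net(X) on 4 images
y = torch.tensor([9, 0, 7, 3])                    # hypothetical true labels
trues = get_labels(y)
preds = get_labels(probs.argmax(dim=1))
titles = [true + '\n' + pred for true, pred in zip(trues, preds)]
print(trues)  # ['ankle boot', 't-shirt', 'sneaker', 'dress']
```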

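Summary

With softmax regression, we can train models for multiclass classification.
The training loop of softmax regression is very similar to that of linear regression: read the data, define the model and the loss function, then train the model with an optimization algorithm. As you will soon see, most common deep learning models follow a similar training procedure.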
