PyTorch Practice Notes: FashionMNIST

天天向上sd

Last modified: 2022-06-27 10:08:14


Tags: pytorch

First published: 2022-06-26 20:19:00

Original source: https://tangshusen.me/Dive-into-DL-PyTorch/#/


I worked through the textbook examples myself and am recording my learning process here.

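1. Dataset Overview

FashionMNIST is a clothing-classification dataset with 10 classes; the dataset is about 80 MB in size.

1.1 Obtaining the Data

You can use the torchvision package that accompanies PyTorch. It consists mainly of the following parts: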

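torchvision.datasets: functions for loading data and interfaces to common datasets such as CIFAR10, MNIST, and ImageNet
torchvision.models: common model architectures such as AlexNet, VGG, ResNet, and ConvNeXt
torchvision.transforms: common image transforms such as ToTensor (convert to a tensor), LinearTransformation (linear transform), and GaussianBlur (Gaussian blur)
torchvision.utils: utility functions such as save_image (save a tensor as an image file) and make_grid (arrange several images into a single grid image)

Loading code: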

import torchvision
import torchvision.transforms as transforms
mnist_train = torchvision.datasets.FashionMNIST(root='', train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.FashionMNIST(root='', train=False, download=True, transform=transforms.ToTensor())

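Here root is the directory where the data files are stored. If it is an empty string, a folder for the downloaded data is created in the current working directory. The argument transform=transforms.ToTensor() converts the data to a Tensor; without it, the dataset returns PIL images (PIL, the Python Imaging Library, is a third-party image-processing library for Python). transforms.ToTensor() converts a PIL image of size (H, W, C) with values in [0, 255], or a NumPy array of dtype np.uint8, into a Tensor of shape (C, H, W) with dtype torch.float32 and values in [0.0, 1.0].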
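
As a rough illustration of the conversion just described, the sketch below mimics it by hand with plain torch operations. It is not torchvision's implementation: the helper name to_tensor_like and the random test image are made up for this example, and only the layout change and the scaling are reproduced.

```python
import numpy as np
import torch

# A fake single-channel 28x28 image in (H, W, C) layout with uint8 pixels.
rng = np.random.default_rng(0)
img_hwc = rng.integers(0, 256, size=(28, 28, 1), dtype=np.uint8)
img_hwc[0, 0, 0] = 0      # force the extremes so the value range is easy to check
img_hwc[0, 1, 0] = 255


def to_tensor_like(img):
    """Hypothetical stand-in for the conversion described in the text."""
    t = torch.from_numpy(img)           # uint8 tensor, shape (H, W, C)
    t = t.permute(2, 0, 1)              # reorder to (C, H, W)
    return t.to(torch.float32) / 255.0  # scale [0, 255] to [0.0, 1.0]


x = to_tensor_like(img_hwc)
print(x.shape, x.dtype, x.min().item(), x.max().item())
```

The channel dimension moves to the front, which matches the shape printed for a FashionMNIST sample further down.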

The mnist_train and mnist_test objects obtained above are instances of a subclass of torch.utils.data.Dataset, so the methods of that class are available, for example:

print(type(mnist_train))
print(len(mnist_train), len(mnist_test))

Output:
<class 'torchvision.datasets.mnist.FashionMNIST'>
60000 10000


feature, label = mnist_train[0]
print(feature.shape, label)

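Output:
torch.Size([1, 28, 28]) tensor(9)
Each item of mnist_train is therefore a (feature, label) pair.
The channel dimension comes first in the feature, and the label is represented by a number (newer torchvision versions may return it as a plain Python int rather than a tensor).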

A function that converts numeric labels to text labels:

def get_labels(labels):
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]

A function that draws several images and their labels in one row:

def show_fashion_mnist(images, labels):
    # Render figures as SVG (vector graphics); requires display from IPython.
    # This call prints a notice that a newer replacement method exists.
    display.set_matplotlib_formats('svg')
    # Requires pyplot from matplotlib. subplots draws one row with len(images)
    # columns; figsize sets the figure size; _ marks a variable that is ignored.
    _, figs = plt.subplots(1, len(images), figsize=(12, 12))
    for f, img, lbl in zip(figs, images, labels):
        f.imshow(img.view((28, 28)).numpy())  # view reshapes the 3-D tensor to 2-D
        f.set_title(lbl)  # use the label as the subplot title
        f.axes.get_xaxis().set_visible(False)
        f.axes.get_yaxis().set_visible(False)  # hide both axes
    plt.show()

Calling the functions above displays the samples and their labels. The following code shows 10 samples:

X, y = [], []
for i in range(10):
    X.append(mnist_train[i][0])
    y.append(mnist_train[i][1])
show_fashion_mnist(X, get_labels(y))

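1.2 Reading the Data

Because the dataset is a subclass of torch.utils.data.Dataset, it can be passed to torch.utils.data.DataLoader to create a DataLoader instance (note the capital L) that reads mini-batches of samples. The code below creates two DataLoader objects, train_iter and test_iter: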

batch_size = 256
if sys.platform.startswith('win'):
    num_workers = 0  # 0 means no extra worker processes are used to speed up data loading
else:
    num_workers = 4
train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)

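The first argument, dataset, is the dataset. batch_size sets the number of samples per batch. shuffle controls whether the data is shuffled; True reshuffles it at the start of every epoch. num_workers sets the number of worker processes used to load data. DataLoader accepts further parameters, but only these four are specified here.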
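
To see these arguments in action without downloading anything, here is a minimal sketch on synthetic data. The tensors below are random stand-ins for FashionMNIST (an assumption of this example), and TensorDataset is used only to wrap them.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the real data: 100 "images" of shape (1, 28, 28) and 100 labels.
features = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(features, labels)

# num_workers=0 keeps loading in the main process, as on Windows in the text.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=0)

# 100 = 3 * 32 + 4, so the last batch is smaller than the others.
batch_sizes = [y.shape[0] for X, y in loader]
print(batch_sizes)
```

The same effect explains why the last FashionMNIST training batch is smaller than 256 when the batch size does not divide the dataset size.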

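2. Implementing Softmax Regression

Code is generally written in a modular way, so the data-processing code above is rewritten as a function, the whole task is split into small modules, each module is implemented as a function, and the functions are combined at the end.

2.1 Data Processing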

def load_data_fashion_mnist(batch_size):
    if sys.platform.startswith('win'):
        num_workers = 0
    else:
        num_workers = 4
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_iter, test_iter

2.2 Implementing the Softmax Operation

def softmax(X):
    X_exp = X.exp()
    partition = X_exp.sum(dim=1, keepdims=True)
    return X_exp / partition

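For a 2-D Tensor input, the function first exponentiates every element and then sums each row. In sum(), dim=1 sums the elements within each row, and keepdims=True makes the result keep both the row and column dimensions; with False, the result becomes a 1-D Tensor.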
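
The effect of keeping the dimension, and one caveat of the direct formula, can be sketched as follows. The stable_softmax variant is not part of the original notes; it is a commonly used rearrangement (subtracting the row maximum before exponentiating) added here as an assumption for illustration. The sketch uses the keepdim spelling from the PyTorch documentation.

```python
import torch


def softmax(X):
    # Same formula as in the text: exponentiate, then normalize each row.
    X_exp = X.exp()
    partition = X_exp.sum(dim=1, keepdim=True)
    return X_exp / partition


def stable_softmax(X):
    # Hypothetical variant: subtracting the row maximum leaves the result
    # mathematically unchanged but keeps exp() from overflowing.
    shifted = X - X.max(dim=1, keepdim=True).values
    X_exp = shifted.exp()
    return X_exp / X_exp.sum(dim=1, keepdim=True)


X = torch.tensor([[1.0, 2.0, 3.0],
                  [1000.0, 1000.0, 1000.0]])

kept_shape = X.sum(dim=1, keepdim=True).shape  # row and column dimensions kept
flat_shape = X.sum(dim=1).shape                # collapses to a 1-D result

naive = softmax(X)           # second row overflows: exp(1000) is inf in float32
stable = stable_softmax(X)   # second row is a uniform distribution
print(kept_shape, flat_shape)
print(naive)
print(stable)
```

For pixel inputs in [0, 1] and small initial weights the direct formula is fine, which is presumably why the notes use it as is.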

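2.3 Building the Model

Only the simplest linear model is used here: the number of inputs is the number of pixels in an image, the number of outputs is the number of classes, and softmax is applied to the result.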

def net(X):
    return softmax(torch.mm(X.view(-1, num_inputs), W) + b)

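The weight matrix W and the bias b are not parameters of the function here, so they must be defined beforehand.

2.4 Defining the Loss Function

Cross-entropy loss is used for the classification problem.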

def cross_entropy(y_hat, y):
    return - torch.log(y_hat.gather(1, y.view(-1, 1)))

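y_hat is the predicted result and y is the true label. The gather function picks out the predicted probability that corresponds to the true label, and the negative log is then taken. (The cross-entropy formula is $H(y,\hat{y})=- \sum_{j=1}^{q}y_j \log{\hat{y}_j}$; because the true label is one-hot, each $y_j$ is either 1 or 0, so only the predicted probability at the position where the true label is 1 needs to be considered.)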
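
A small worked example may make the gather step concrete. The probabilities and labels below are invented for illustration; the one-hot computation is included only to check that picking one entry per row agrees with the full sum in the formula.

```python
import torch
import torch.nn.functional as F

# Two samples, three classes (made-up numbers; each row sums to 1).
y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])  # true classes of the two samples

# gather(1, index) selects, in each row, the column given by the index.
picked = y_hat.gather(1, y.view(-1, 1))  # probabilities 0.1 and 0.5
loss = -torch.log(picked)

# Same quantity computed from the full formula with one-hot labels.
one_hot = F.one_hot(y, num_classes=3).float()
full_sum = -(one_hot * torch.log(y_hat)).sum(dim=1, keepdim=True)

print(picked)
print(loss)
print(full_sum)
```

The result keeps the shape (batch, 1), which is why the training loop calls .sum() on it before backward().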

2.5 Computing Classification Accuracy

def accuracy(y_hat, y):
    return (y_hat.argmax(dim=1) == y).float().sum().item()

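argmax(dim=1) finds the column index of the largest value in each row. Comparing it with the true labels shows whether each sample is classified correctly. Because the comparison yields boolean values, they are first converted to floats and then summed; item() finally extracts the Python number from the tensor. The function therefore returns the number of correctly classified samples.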
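
Each intermediate step can be inspected on a tiny example. The numbers are invented for illustration.

```python
import torch

y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])

pred = y_hat.argmax(dim=1)                  # predicted class per row
correct = pred == y                         # boolean tensor, one entry per sample
num_correct = correct.float().sum().item()  # plain Python float

print(pred, correct, num_correct)
```

Here the first sample is misclassified and the second is correct, so the count is 1 out of 2.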

2.6 Computing the Model's Accuracy on a Dataset

def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += accuracy(net(X), y)
        n += y.shape[0]
    return acc_sum / n

Dividing the number of correctly classified samples by the total number of samples gives the accuracy.
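
Evaluation does not need gradients, so one possible variation wraps the loop in torch.no_grad() to avoid building the autograd graph. This is a sketch, not the version used in these notes: the function name evaluate_accuracy_no_grad, the tiny 4-feature model, and the synthetic batches are all assumptions made for the example.

```python
import torch

torch.manual_seed(0)
W = torch.randn(4, 3, requires_grad=True)
b = torch.zeros(3, requires_grad=True)


def net(X):
    # Returns raw scores; softmax is omitted because it does not change argmax.
    return torch.mm(X, W) + b


def evaluate_accuracy_no_grad(data_iter, net):
    acc_sum, n = 0.0, 0
    with torch.no_grad():  # no graph is recorded inside this block
        for X, y in data_iter:
            acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]
    return acc_sum / n


X = torch.randn(10, 4)
y = net(X).argmax(dim=1)  # labels the model gets right by construction
batches = [(X[:6], y[:6]), (X[6:], y[6:])]

acc = evaluate_accuracy_no_grad(batches, net)
acc_wrong = evaluate_accuracy_no_grad([(X, (y + 1) % 3)], net)  # every label shifted
print(acc, acc_wrong)
```

Any iterable of (X, y) pairs works as data_iter, so a DataLoader can be passed in the same way.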

2.7 Defining Gradient Descent

Gradient descent is implemented by hand here:

def sgd(params, lr, batch_size):
    for param in params:
        param.data -= lr * param.grad / batch_size

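.data accesses the Tensor's underlying data without recording the operation in autograd. A similar option is .detach(); if the detached data is modified in place and the original values are needed during backpropagation, autograd raises an error, so it is safer. param.grad is the gradient and lr is the learning rate. The gradient is divided by batch_size because the training loop sums the loss over the batch.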
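
Another way to write the same update, shown here only as an alternative sketch, is to perform the in-place step inside torch.no_grad() instead of going through .data. The function name sgd_no_grad and the one-parameter example are assumptions for illustration.

```python
import torch


def sgd_no_grad(params, lr, batch_size):
    # Same arithmetic as sgd() in the text, but the update is excluded from
    # autograd by the context manager rather than by accessing .data.
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size


w = torch.tensor([2.0], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()  # d(w^2)/dw = 2w = 4 at w = 2

sgd_no_grad([w], lr=0.1, batch_size=1)  # 2 - 0.1 * 4 = 1.6
print(w, w.grad)
```

After the step, w is still a leaf tensor that requires gradients, so it can be used in the next iteration once its gradient has been cleared.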

2.8 Defining the Training Loop

def train(net, train_iter, test_iter, loss, num_epochs, batch_size,
          params=None, lr=None, optimizer=None):
    # Arguments, in order: model, training data, test data, loss function,
    # number of epochs, batch size, parameters, learning rate, optimizer.
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0  # loss sum, correct count, sample count
        start = time.time()  # time each epoch
        # Iterate over all batches. X is 4-D [256, 1, 28, 28] and y is 1-D [256].
        # 60000 is not divisible by 256, so the last batch holds 96 samples.
        for X, y in train_iter:
            y_hat = net(X)  # net reshapes X internally
            l = loss(y_hat, y).sum()  # loss of one batch

            if optimizer is not None:
                optimizer.zero_grad()  # clear the optimizer's gradients
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()  # clear the parameter gradients

            l.backward()  # compute gradients so that param.grad is populated
            if optimizer is None:
                sgd(params, lr, batch_size)  # update the parameters with SGD
            else:
                optimizer.step()

            train_l_sum += l.item()  # accumulate the batch loss
            train_acc_sum += accuracy(y_hat, y)  # accumulate the batch's correct count
            n += y.shape[0]  # accumulate the batch's sample count
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time_cost %.3fs'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc, time.time()-start))

2.9 Initializing Parameters

batch_size = 256
train_iter, test_iter = load_data_fashion_mnist(batch_size)

num_inputs = 784
num_outputs = 10

W = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_outputs)), dtype=torch.float)
b = torch.zeros(num_outputs, dtype=torch.float)
W.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)

num_epochs, lr = 10, 0.1

W is initialized from a normal distribution (mean 0, standard deviation 0.01), and b is initialized to 0.
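
The same initialization can also be written without NumPy. The following is one possible sketch, assuming that scaling torch.randn by 0.01 is an acceptable substitute for np.random.normal(0, 0.01, ...); the seed is fixed only to make the example repeatable.

```python
import torch

num_inputs, num_outputs = 784, 10
torch.manual_seed(0)

# randn draws from N(0, 1); multiplying by 0.01 gives standard deviation 0.01.
# requires_grad_() is called last so that W remains a leaf tensor.
W = (torch.randn(num_inputs, num_outputs) * 0.01).requires_grad_()
b = torch.zeros(num_outputs, requires_grad=True)

print(W.shape, W.dtype, W.std().item())
print(b.shape, b.requires_grad)
```

Both tensors are float32 by default, so no explicit dtype argument is needed in this form.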

2.10 Training

train(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W,b], lr)

2.11 Complete Code

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time
import sys
from IPython import display
import numpy as np

mnist_train = torchvision.datasets.FashionMNIST(root='~/FashionMNIST', train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.FashionMNIST(root='~/FashionMNIST', train=False, download=True, transform=transforms.ToTensor())

# Used to show information about the data
# print(type(mnist_train))
# print(len(mnist_train), len(mnist_test))

def get_fashion_mnist_labels(labels):
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                  'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]

def use_svg_display():
    display.set_matplotlib_formats('svg')

def show_fashion_mnist(images, labels):
    use_svg_display()
    _, figs = plt.subplots(1, len(images), figsize=(12, 12))
    for f, img, lbl in zip(figs, images, labels):
        f.imshow(img.view((28, 28)).numpy())
        f.set_title(lbl)
        f.axes.get_xaxis().set_visible(False)
        f.axes.get_yaxis().set_visible(False)
    plt.show()

# Used to display 10 sample images
# X, y = [], []
# for i in range(10):
#     X.append(mnist_train[i][0])
#     y.append(mnist_train[i][1])
# show_fashion_mnist(X, get_fashion_mnist_labels(y))

def load_data_fashion_mnist(batch_size):
    if sys.platform.startswith('win'):
        num_workers = 0
    else:
        num_workers = 4
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_iter, test_iter
# start = time.time()
# for X, y in train_iter:
#     continue
# print('%.2f sec' % (time.time() - start))



def softmax(X):
    X_exp = X.exp()
    partition = X_exp.sum(dim=1, keepdims=True)
    return X_exp / partition

def net(X):
    return softmax(torch.mm(X.view(-1, num_inputs), W) + b)

def cross_entropy(y_hat, y):
    return - torch.log(y_hat.gather(1, y.view(-1, 1)))

def sgd(params, lr, batch_size):
    for param in params:
        param.data -= lr * param.grad / batch_size

def accuracy(y_hat, y):
    return (y_hat.argmax(dim=1) == y).float().sum().item()

def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += accuracy(net(X), y)
        n += y.shape[0]
    return acc_sum / n

def train(net, train_iter, test_iter, loss, num_epochs, batch_size,
          params = None, lr = None, optimizer = None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        start = time.time()
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()

            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()

            l.backward()
            if optimizer is None:
                sgd(params, lr, batch_size)
            else:
                optimizer.step()

            train_l_sum += l.item()
            train_acc_sum += accuracy(y_hat, y)
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time_cost %.3fs'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc, time.time()-start))

batch_size = 256
train_iter, test_iter = load_data_fashion_mnist(batch_size)

num_inputs = 784
num_outputs = 10

W = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_outputs)), dtype=torch.float)
b = torch.zeros(num_outputs, dtype=torch.float)
W.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)

num_epochs, lr = 10, 0.1

train(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W,b], lr)

Results:

epoch 1, loss 0.7841, train acc 0.750, test acc 0.790, time_cost 6.778s
epoch 2, loss 0.5710, train acc 0.813, test acc 0.811, time_cost 5.651s
epoch 3, loss 0.5244, train acc 0.826, test acc 0.821, time_cost 5.628s
epoch 4, loss 0.5020, train acc 0.831, test acc 0.827, time_cost 5.593s
epoch 5, loss 0.4853, train acc 0.836, test acc 0.825, time_cost 5.619s
epoch 6, loss 0.4745, train acc 0.840, test acc 0.829, time_cost 5.695s
epoch 7, loss 0.4652, train acc 0.843, test acc 0.833, time_cost 5.680s
epoch 8, loss 0.4582, train acc 0.845, test acc 0.831, time_cost 5.641s
epoch 9, loss 0.4521, train acc 0.847, test acc 0.834, time_cost 5.625s
epoch 10, loss 0.4470, train acc 0.848, test acc 0.833, time_cost 5.663s

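3. Summary

Get to know the commonly used torchvision modules, and learn to download and use the built-in datasets
Understand how to use torch.utils.data.DataLoader, and learn to build training and test sets from an existing dataset
Practice a modular programming style, expressing each piece of functionality as a separate function where possible
Learn how to use backward() and understand what it does
Understand how a simple linear model is built and how its parameters are updated

4. Implementing the Same Functionality with PyTorch's Built-in Modules

4.1 Building the Model

The network is represented by a class that inherits from torch.nn.Module, so it can use the methods that class provides.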

class LinearNet(nn.Module):
    def __init__(self, input_nums, output_nums):
        super(LinearNet, self).__init__()
        self.linear = nn.Linear(input_nums, output_nums)

    def forward(self, x):
        y = self.linear(x.view(x.shape[0], -1))
        return y

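According to the constructor, creating an instance requires the numbers of inputs and outputs, which become the input and output sizes of the linear layer. Because the class inherits from nn.Module, its instances can be called like functions, which automatically invokes forward. For example, an object net = LinearNet(input_nums, output_nums) is created here, and the train function above contains net(X), which is equivalent to calling forward.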
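
A quick check of that behavior, using a random batch in place of real images (the input tensor is an assumption of this sketch):

```python
import torch
import torch.nn as nn


class LinearNet(nn.Module):
    def __init__(self, input_nums, output_nums):
        super().__init__()
        self.linear = nn.Linear(input_nums, output_nums)

    def forward(self, x):
        # Flatten (batch, 1, 28, 28) to (batch, 784) before the linear layer.
        return self.linear(x.view(x.shape[0], -1))


net = LinearNet(784, 10)
X = torch.rand(4, 1, 28, 28)  # random stand-in for four images

out_call = net(X)             # calling the instance
out_forward = net.forward(X)  # calling forward directly

print(out_call.shape)
print(torch.equal(out_call, out_forward))
```

Calling the instance is the usual form, since it also runs any hooks registered on the module before and after forward.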

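4.2 Specifying Parameters

As before, specify batch_size, input_nums, output_nums, and num_epochs.

4.3 Loss Function

torch.nn provides many loss functions; the cross-entropy loss is used directly. nn.CrossEntropyLoss applies log-softmax to its input itself, which is why forward above returns the linear layer's raw output without a softmax.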

loss = nn.CrossEntropyLoss()
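
To relate this loss to the hand-written version in section 2, the sketch below compares the two on random scores. The logits and labels are synthetic; with its default settings nn.CrossEntropyLoss averages over the batch, so the manual result is averaged as well.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(5, 10)      # raw model outputs, no softmax applied
y = torch.randint(0, 10, (5,))   # synthetic class labels

# Built-in loss: takes raw scores and returns the batch mean by default.
builtin = nn.CrossEntropyLoss()(logits, y)

# Manual route from section 2: softmax, gather the true-class probability, negative log.
probs = torch.softmax(logits, dim=1)
manual = -torch.log(probs.gather(1, y.view(-1, 1))).mean()

print(builtin.item(), manual.item())
```

Because the built-in loss is already a scalar, the .sum() call in train leaves it unchanged.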

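4.4 Initializing Network Parameters

This uses init from torch.nn, a Python module that contains a number of functions; calling them directly initializes the parameters in place.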

init.normal_(net.linear.weight, mean=0, std=0.01)
init.constant_(net.linear.bias, val=0)

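4.5 Setting Up the Optimizer

torch.optim provides many optimizers that can be used directly; the network parameters and the learning rate must be specified.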

optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
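
For comparison with the hand-written sgd in section 2.7, here is a minimal sketch of a single optimizer step on a toy parameter. The parameter values and the quadratic loss are invented for the example.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()
optimizer.zero_grad()
loss.backward()  # gradient is 2w = [2, -4]

# Plain SGD update computed by hand for comparison: w - lr * grad.
expected = (w - 0.1 * w.grad).detach().clone()

optimizer.step()  # in-place update of w
print(w, expected)
```

Unlike sgd in section 2.7, this update does not divide by the batch size; with nn.CrossEntropyLoss the averaging over the batch already happens inside the loss.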

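4.6 Data Processing

The data-loading function was already written for the manual softmax implementation, so load_data_fashion_mnist from the code above can be called.

4.7 Training

Likewise, call the train function from the code above and pass in the corresponding arguments.

The complete code: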

import torch
import torch.nn as nn
from torch.nn import init
from FashionMNIST import train, load_data_fashion_mnist


class LinearNet(nn.Module):
    def __init__(self, input_nums, output_nums):
        super(LinearNet, self).__init__()
        self.linear = nn.Linear(input_nums, output_nums)

    def forward(self, x):
        y = self.linear(x.view(x.shape[0], -1))
        return y


batch_size = 256
input_nums = 784
output_nums = 10
num_epochs = 5

net = LinearNet(input_nums, output_nums)

loss = nn.CrossEntropyLoss()

init.normal_(net.linear.weight, mean=0, std=0.01)
init.constant_(net.linear.bias, val=0)

optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

train_iter, test_iter = load_data_fashion_mnist(batch_size)

train(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)

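Note that importing train and load_data_fashion_mnist from FashionMNIST runs that module's top-level code from top to bottom, so the FashionMNIST file is executed once before the current file continues. To avoid running the training code of the imported file, modify the FashionMNIST file by adding a check on __name__. When the current file is run, the imported module's __name__ is the module name FashionMNIST rather than '__main__', so the if condition is false and the code under it is not executed.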

if __name__ == '__main__':
    batch_size = 256
    train_iter, test_iter = load_data_fashion_mnist(batch_size)

    num_inputs = 784
    num_outputs = 10

    W = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_outputs)), dtype=torch.float)
    b = torch.zeros(num_outputs, dtype=torch.float)
    W.requires_grad_(requires_grad=True)
    b.requires_grad_(requires_grad=True)

    num_epochs, lr = 10, 0.1

    train(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W,b], lr)
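
The behavior of __name__ on import can be checked with a small self-contained sketch. It writes a throwaway module to a temporary directory and imports it; the module name demo_guard_module and its contents are hypothetical and have nothing to do with the FashionMNIST file itself.

```python
import importlib
import os
import sys
import tempfile

# Source of a throwaway module that records how it was executed.
module_source = '''
seen_name = __name__
ran_main_block = False
if __name__ == '__main__':
    ran_main_block = True
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, 'demo_guard_module.py'), 'w') as f:
        f.write(module_source)
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()  # make sure the new file is visible to the importer
    try:
        demo = importlib.import_module('demo_guard_module')
    finally:
        sys.path.remove(tmp_dir)

# On import, __name__ is the module's own name, so the guarded block is skipped.
print(demo.seen_name, demo.ran_main_block)
```

Running the same file directly with the Python interpreter would set __name__ to '__main__' instead, and the guarded block would execute.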