Handwritten Digit Recognition with a Fully Connected Neural Network in PyTorch

哈尔查理斯

Last modified 2022-03-25 11:07:12

Tags: deep learning, machine learning, pytorch

First published 2022-03-22 21:03:45

Article link: https://blog.csdn.net/qq_27849725/article/details/123671387

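In the previous article we used PyTorch to build a neural network for a binary classification problem. Handwritten digit recognition is also a classification problem: with the ten digits 0 to 9, it is a 10-class problem. We expect that the binary classification network needs only minor changes to handle handwritten digit recognition.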

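For binary classification, a single hidden layer was enough for training to converge well. In this article we use two hidden layers with 30 neurons each; after 5000 steps of gradient descent, the loss on the training set drops to about 0.1.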

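The dataset used here is MNIST. We did not use torchvision's data loader; instead, we downloaded the MNIST dataset from the official MNIST website and read the data from the files by hand. This is partly because I do not yet know how to use the data loader, and partly because I wanted to reuse the code from the binary classification article. We also did not split the data into batches to reduce the computation: every forward pass runs over the entire training set, so training is fairly slow. The sample code uses the GPU by default; readers without a GPU, or without CUDA configured, will need to adjust the code before running it.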

As a quick test: on my GeForce 940MX laptop GPU, 5000 steps take 278 seconds; on a classmate's GeForce RTX 2080 Ti desktop GPU, 5000 steps take 16 seconds.

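This article was written in Jupyter Notebook, and the source code is on my Gitee. If you are just starting out with deep learning, you can download it and try changing the number of layers, the learning rate, the gradient descent algorithm, the number of neurons, and so on, to see how different parameters affect how efficiently the network learns.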

import numpy as np
import matplotlib.pyplot as plt
import torch
import torchvision

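Loading data from the MNIST files

First we download the MNIST dataset from the official website. There are four files:

train-images-idx3-ubyte.gz: training set images (9912422 bytes)

train-labels-idx1-ubyte.gz: training set labels (28881 bytes)

t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)

t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)

Following the instructions on the official website, we decompress these four files and rename them to something easier to recognize.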

def read_mnist_data(lab_path, img_path):
    with open(lab_path, 'rb') as lab_file, open(img_path, 'rb') as img_file:
        # The label data starts after the 8-byte header
        labels = np.fromfile(lab_file, offset=8, dtype=np.uint8)
        # The image data starts after the 16-byte header; each image is 28*28 = 784 bytes
        images = np.fromfile(img_file, offset=16, dtype=np.uint8).reshape((len(labels), 784))
    return labels, images

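# Load the training set. Keeping the source file and the data files in one folder makes the paths simple.
train_labs, train_imgs = read_mnist_data("./train_labs", "./train_imgs")
# Load the test set
test_labs, test_imgs = read_mnist_data("./test_labs", "./test_imgs")

print("train_img num: ", len(train_labs))  # number of images in the training set
print("test_img num: ", len(test_labs))    # number of images in the test set

# Display one image to check that the data loaded correctly
plt.imshow(train_imgs[0:1].reshape(28, 28), cmap='gray')  # cmap='gray': show as a grayscale image
print("lab=",train_labs[0])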

train_img num:  60000
test_img num:  10000
lab= 5
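The offsets 8 and 16 used in read_mnist_data follow from the IDX header layout: two zero bytes, one byte for the data type (0x08 means unsigned byte), one byte for the number of dimensions, then one 4-byte big-endian integer per dimension. The sketch below parses that header instead of hard-coding the offsets. It is only an illustration: parse_idx is a hypothetical helper that is not part of the original notebook, and the byte strings are synthetic stand-ins for the real files.

```python
import struct

import numpy as np


def parse_idx(raw: bytes) -> np.ndarray:
    """Parse an IDX-formatted byte string holding unsigned bytes."""
    # Bytes 0-1 are zero, byte 2 is the data type code, byte 3 is the number of dimensions.
    zeros, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zeros != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # Each dimension size is a 4-byte big-endian unsigned integer.
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    # 8 bytes for the label file (1 dimension), 16 bytes for the image file (3 dimensions).
    header_size = 4 + 4 * ndim
    data = np.frombuffer(raw, dtype=np.uint8, offset=header_size)
    return data.reshape(dims)


# Synthetic stand-ins for the real files: two labels and two blank 28x28 images.
fake_labels = struct.pack(">HBBI", 0, 0x08, 1, 2) + bytes([5, 0])
fake_images = struct.pack(">HBBIII", 0, 0x08, 3, 2, 28, 28) + bytes(2 * 28 * 28)

labels = parse_idx(fake_labels)
images = parse_idx(fake_images)
print(labels.shape, images.shape)
```

With the real files, the bytes could be read with open(path, 'rb').read() and the image array reshaped to (len(labels), 784) to match the rest of the article.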


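Building the neural network

The structure is the same as the network for binary classification, except that it has one more hidden layer and many more neurons.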

class neural_net(torch.nn.Module):
    def __init__(self):
        super(neural_net, self).__init__()
        self.l1 = torch.nn.Linear(28*28, 30)
        self.l2 = torch.nn.Linear(30, 30)
        self.lo = torch.nn.Linear(30,10)
        
    def forward(self, input):
        out = torch.relu(self.l1(input))
        out = torch.relu(self.l2(out))
        out = self.lo(out)
        return out

net = neural_net()
print(net)

neural_net(
  (l1): Linear(in_features=784, out_features=30, bias=True)
  (l2): Linear(in_features=30, out_features=30, bias=True)
  (lo): Linear(in_features=30, out_features=10, bias=True)
)
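To get a feel for the size of this network, the number of trainable parameters can be worked out by hand from the printed layer shapes and checked against PyTorch. The sketch below rebuilds the same layer sizes with torch.nn.Sequential purely for brevity; the name model is a stand-in, not the notebook's neural_net class.

```python
import torch

# Same layer sizes as the network above, rebuilt with nn.Sequential for brevity.
model = torch.nn.Sequential(
    torch.nn.Linear(28 * 28, 30),
    torch.nn.ReLU(),
    torch.nn.Linear(30, 30),
    torch.nn.ReLU(),
    torch.nn.Linear(30, 10),
)

# Each Linear layer holds in_features * out_features weights plus out_features biases.
by_hand = (784 * 30 + 30) + (30 * 30 + 30) + (30 * 10 + 10)
counted = sum(p.numel() for p in model.parameters())
print(by_hand, counted)  # both are 24790
```

Almost all of the parameters (23550 of 24790) sit in the first layer, because it connects all 784 input pixels to the 30 hidden neurons.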

Formatting the data

X = torch.from_numpy(train_imgs).type(torch.FloatTensor)
Y = torch.from_numpy(train_labs).type(torch.LongTensor)

# Move the neural network and the training set to the GPU
net.cuda()
X = X.cuda()
Y = Y.cuda()
print("X.device:", X.device)
print("Y.device:", Y.device)

X.device: cuda:0
Y.device: cuda:0
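The calls to .cuda() above fail on a machine without a working CUDA setup. One common way to make the same step run on either kind of machine is to pick a device once and move everything to it with .to(device). The following is a minimal sketch under that assumption; the small Linear layer and zero tensors are hypothetical stand-ins for the article's net, X and Y.

```python
import torch

# Pick the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-ins for the article's net, X and Y (hypothetical small tensors).
net = torch.nn.Linear(784, 10).to(device)
X = torch.zeros(4, 784, dtype=torch.float32, device=device)
Y = torch.zeros(4, dtype=torch.long, device=device)

print("X.device:", X.device)
print("output shape:", tuple(net(X).shape))
```

In the notebook itself, the equivalent change would be to replace each .cuda() call with .to(device).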

Training

optimizer = torch.optim.SGD(net.parameters(), lr = 0.003)
loss_func = torch.nn.CrossEntropyLoss()

import time
t0 = time.time_ns()
loss = 0

for i in range(5000):
    Y_hat = net(X)
    loss = loss_func(Y_hat, Y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if i%200 == 0:
        print("step=",i, "loss=", loss.data.item())

t1 = time.time_ns()
print("in the end loss=", loss.data.item())
print("running time=%ds" %((t1-t0)/1000000000))

step= 0 loss= 13.057044982910156
step= 200 loss= 0.42877089977264404
step= 400 loss= 0.3179222345352173
step= 600 loss= 0.26870378851890564
step= 800 loss= 0.2374461144208908
step= 1000 loss= 0.21529117226600647
... ...
step= 4000 loss= 0.11838517338037491
step= 4200 loss= 0.1155635267496109
step= 4400 loss= 0.11289114505052567
step= 4600 loss= 0.11045588552951813
step= 4800 loss= 0.10818389803171158
in the end loss= 0.10601907223463058
running time=278s
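The loop above feeds the entire training set through the network at every step. A mini-batch variant, which this article does not use, might look roughly like the sketch below. It relies on torch.utils.data.TensorDataset and DataLoader, uses random tensors as a stand-in for MNIST, and the batch size of 64 and the two epochs are arbitrary illustrative choices rather than tuned values.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for the training set (random data, not MNIST).
X = torch.rand(256, 784)
Y = torch.randint(0, 10, (256,))

model = torch.nn.Sequential(
    torch.nn.Linear(784, 30), torch.nn.ReLU(),
    torch.nn.Linear(30, 30), torch.nn.ReLU(),
    torch.nn.Linear(30, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.003)
loss_func = torch.nn.CrossEntropyLoss()

# batch_size=64 is an arbitrary illustrative choice.
loader = DataLoader(TensorDataset(X, Y), batch_size=64, shuffle=True)

steps = 0
for epoch in range(2):
    for x_batch, y_batch in loader:
        # Each step now sees only one batch instead of the whole training set.
        loss = loss_func(model(x_batch), y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        steps += 1
    print("epoch=", epoch, "last batch loss=", loss.item())
```

Each optimizer step then costs far less than a full pass over 60000 images, at the price of a noisier gradient estimate.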

Checking the training results

X = torch.from_numpy(test_imgs).type(torch.FloatTensor).cuda()
Y = torch.from_numpy(test_labs).type(torch.LongTensor).cuda()
Y_hat = net(X)

loss = loss_func(Y_hat, Y) # loss on the test set
print("loss on test data: ",loss.data.item())

# Return the index of the largest value in array a (the first one if there are ties)
def get_max(a):
    return int(np.argmax(a))


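# The code below shows the misclassified images among the first 500 test images
Y_hat = Y_hat.cpu().detach().numpy() # move the predictions from the GPU to the CPU and convert them to a numpy array
j = 1
for i in range(500):
    p_value = get_max(Y_hat[i]) # the value predicted by the trained model
    r_value = test_labs[i] # the test set label, i.e. the actual value
    
    if p_value != r_value: # the predicted value differs from the actual value
        print("predicted:%d, actual:%d" % (p_value, r_value))
        plt.subplot(8,8,j)
        plt.imshow(test_imgs[i].reshape(28,28), cmap='gray')
        j += 1

loss on test data:  0.1586684137582779
predicted:8, actual:3
predicted:8, actual:2
predicted:8, actual:0
predicted:3, actual:8
predicted:5, actual:9
predicted:6, actual:4
predicted:1, actual:8
predicted:0, actual:6
predicted:7, actual:2
predicted:3, actual:5
predicted:7, actual:3
predicted:3, actual:2
predicted:7, actual:8
predicted:0, actual:6
predicted:8, actual:5
predicted:0, actual:8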
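The listing above shows 16 errors among the first 500 test images, which is 484/500 = 96.8% correct on that subset. To get a single accuracy figure over a whole set, the per-row argmax can be compared with the labels in one vectorized step. The sketch below uses a tiny hypothetical score matrix in place of Y_hat and test_labs, so the number it prints says nothing about the trained model.

```python
import numpy as np

# Hypothetical scores for 4 samples over 3 classes, standing in for Y_hat.
scores = np.array([
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 3.0],
    [2.2, 0.4, 0.1],
])
labels = np.array([1, 0, 2, 1])  # stand-in for test_labs

predictions = scores.argmax(axis=1)        # predicted class for each row
accuracy = (predictions == labels).mean()  # fraction of correct predictions
print("accuracy:", accuracy)  # 0.75 for this toy data
```

Applied to the notebook's variables, the same two lines would presumably read Y_hat.argmax(axis=1) and (predictions == test_labs).mean(), once Y_hat has been converted to a numpy array.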
