365-Day Deep Learning Training Camp, Week P1: MNIST Handwritten Digit Recognition

_yoking_____

Last modified: 2023-03-13 21:05:19


First published: 2023-03-12 11:01:47

Article link: https://blog.csdn.net/qq_40661227/article/details/129458808


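This article shows how to recognize handwritten digits with TensorFlow and PyTorch. First, the MNIST dataset is loaded and preprocessed in TensorFlow: normalization, data visualization, and reshaping the images. Next, a LeNet-5-style CNN model is built, compiled, trained, and used for prediction. The PyTorch part builds and trains the same model and shows the training process and a visualization of the results.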


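● Difficulty: beginner ⭐
● Languages: Python 3, PyTorch, TensorFlow 2.11
🍺 Requirements:
1. Understand the basic training workflow in TensorFlow and PyTorch
2. Train a model on handwritten digits and use it for prediction

🍨 This post is a study log from the 🔗365-Day Deep Learning Training Camp
🍖 Original author: K同学啊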

Week 1: MNIST handwritten digit recognition

My environment:

Language: Python
IDE: Jupyter Notebook
Deep learning framework: TensorFlow 2

I. Preparation

1. Set up the GPU (if you have one)

TensorFlow

import tensorflow as tf
gpus = tf.config.list_physical_devices("GPU")

if gpus:
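	gpu0 = gpus[0]  # use the first GPU by default
	tf.config.experimental.set_memory_growth(gpu0, True)  # allocate GPU memory on demand
	tf.config.set_visible_devices([gpu0], 'GPU')

Note: by default TensorFlow maps nearly all the memory of every visible GPU in order to reduce memory fragmentation and use the relatively scarce GPU memory more efficiently. tf.config.set_visible_devices([gpu0], 'GPU') restricts TensorFlow to the listed GPUs, and set_memory_growth makes it allocate memory only as it is needed.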

2. Load the data

TensorFlow

from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

# Load MNIST: training images, training labels, test images, test labels
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()

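3. Normalization

Why normalize the data:

It puts features with different scales on the same numeric range, so that features with large variance do not dominate, which helps model accuracy.
It speeds up convergence of the learning algorithm.

Normalization and standardization

# Scale pixel values from [0, 255] to the range [0, 1]
train_images, test_images = train_images / 255.0, test_images / 255.0
# Check the array shapes
train_images.shape, test_images.shape, train_labels.shape, test_labels.shape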
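As a small illustration of the scaling step above, the sketch below applies the same division to a tiny hand-made array instead of the real dataset. The array `pixels` and its values are made up for the example; only the `/ 255.0` operation comes from the post.

```python
import numpy as np

# Three hypothetical 8-bit pixel values: black, mid-gray, white
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Dividing by 255.0 converts the integers to floats in [0, 1]
scaled = pixels / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```

Dividing a uint8 array by a Python float promotes the result to a floating-point array, so no explicit cast is needed before the division.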

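4. Visualize the dataset

# Show the first 20 images of the training set
# Create a figure 20 inches wide and 5 inches tall
# (the original height was 10; I changed it to 5, possibly because of my screen resolution)
plt.figure(figsize=(20, 5))
# Loop over indices 0-19 of the training set
for i in range(20):
    # Split the figure into 2 rows and 10 columns and draw subplot i+1
    plt.subplot(2, 10, i+1)
    # Hide the x-axis ticks
    plt.xticks([])
    # Hide the y-axis ticks
    plt.yticks([])
    # Hide the subplot grid
    plt.grid(False)
    # Show the image; cmap is the colormap, and plt.cm.binary is a colormap from matplotlib.cm
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    # Use the digit label of the image as the x-axis label
    plt.xlabel(train_labels[i])
    
# Display the figure
plt.show()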


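5. Reshape the images

# Reshape the data into the format we need
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))

# Check the shapes again
train_images.shape, test_images.shape, train_labels.shape, test_labels.shape

Question: why reshape the images to (60000, 28, 28, 1)?
Conv2D expects input of shape (batch, height, width, channels). The last dimension is the number of color channels; the images are grayscale, so it is 1.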
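The reshape above hard-codes the number of images. As a hedged alternative, the sketch below shows three equivalent ways to add the channel axis without knowing the batch size in advance. The array `images` is a stand-in of zeros, not the MNIST data.

```python
import numpy as np

# Stand-in for a batch of 5 grayscale images of size 28 x 28
images = np.zeros((5, 28, 28), dtype=np.float32)

# -1 lets NumPy infer the batch size from the remaining dimensions
a = images.reshape((-1, 28, 28, 1))
# Indexing with np.newaxis appends a trailing axis of length 1
b = images[..., np.newaxis]
# np.expand_dims does the same thing with an explicit axis argument
c = np.expand_dims(images, axis=-1)

print(a.shape, b.shape, c.shape)
```

All three produce the (batch, height, width, channels) layout; which one to use is mostly a matter of taste.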

II. Build the CNN model

[Figure: network architecture diagram]

TensorFlow

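# Use a convolutional network for image recognition
# Convolutional layer: reduces dimensionality and extracts features from the image with convolution operations
# Pooling layer: a form of non-linear downsampling. It shrinks the feature maps, which reduces the amount of data and the number of downstream parameters, helps limit overfitting, and makes the model more robust.
# Fully connected layer: after several convolution and pooling layers, the high-level reasoning in the network is done by fully connected layers.
model = models.Sequential([
    # Conv layer 1: 32 kernels of size 3*3; the activation argument selects ReLU,
    # and input_shape sets the input shape of the layer to (28, 28, 1)
    # ReLU adds non-linearity to the decision function and to the whole network without changing the conv layer itself
    # ReLU is usually preferred over other activations because it is simple and speeds up training, typically without hurting generalization
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    # Pooling layer 1: 2*2 max pooling
    layers.MaxPooling2D((2, 2)),
    # Conv layer 2: 64 kernels of size 3*3 with ReLU (input_shape is only needed on the first layer)
    layers.Conv2D(64, (3, 3), activation='relu'),
    # Pooling layer 2: 2*2 max pooling
    layers.MaxPooling2D((2, 2)),
    
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10)  # 10 output logits, one per digit class
])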

model.summary()
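To make the layer shapes reported by model.summary() easier to follow, here is a minimal sketch that recomputes them by hand. It assumes the Keras defaults used above (no padding, stride 1 for the convolutions, pooling stride equal to the window size); the helper names `conv_out` and `pool_out` are made up for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window):
    """Spatial output size of max pooling with stride equal to the window."""
    return size // window

size = 28
size = conv_out(size, 3)   # conv layer 1: 28 -> 26
size = pool_out(size, 2)   # pooling layer 1: 26 -> 13
size = conv_out(size, 3)   # conv layer 2: 13 -> 11
size = pool_out(size, 2)   # pooling layer 2: 11 -> 5

# Flatten: 5 * 5 positions with 64 channels each
flat_features = size * size * 64

# Trainable parameters per layer: weights plus biases
params = {
    "conv1": 3 * 3 * 1 * 32 + 32,
    "conv2": 3 * 3 * 32 * 64 + 64,
    "dense1": flat_features * 64 + 64,
    "dense2": 64 * 10 + 10,
}
total_params = sum(params.values())

print(flat_features, total_params)  # 1600 121930
```

The value 1600 is also where the input size of nn.Linear(1600, 64) in the PyTorch appendix comes from.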


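III. Compile the model

TensorFlow

# model.compile() configures training: it specifies the optimizer, the loss function, and the metrics to report
model.compile(
    # Adam optimizer
    optimizer='adam',
    # Cross-entropy loss
    # from_logits=True means the model outputs raw logits, so the loss applies softmax internally to turn y_pred into probabilities
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), 
    # Metrics to monitor during training
    metrics=['accuracy'])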
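To show what a sparse categorical cross-entropy on logits computes for a single sample, here is a minimal pure-Python sketch. It is an illustration of the formula, not the Keras implementation; the function name `sparse_ce_from_logits` and the example logits are assumptions made for this example.

```python
import math

def sparse_ce_from_logits(logits, label):
    """Cross-entropy for one sample: -log(softmax(logits)[label]).

    `label` is an integer class index ("sparse"), not a one-hot vector.
    """
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# An untrained model that outputs equal logits for 10 classes: loss = ln(10)
uniform = sparse_ce_from_logits([0.0] * 10, 3)

# A confident, correct prediction gives a loss close to 0
confident = sparse_ce_from_logits([10.0] + [0.0] * 9, 0)

print(uniform, confident)
```

A loss near ln(10) ≈ 2.30 at the start of training is therefore what one would expect from a 10-class model that has not learned anything yet.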

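IV. Train the model

TensorFlow

history = model.fit(
    # Training images
    train_images, 
    # Training labels
    train_labels,
    # Train for 10 epochs; each epoch passes the whole training set through the model once
    epochs=10,
    # Validation data (the test set is used here)
    validation_data=(test_images, test_labels))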


V. Prediction

plt.imshow(test_images[13])


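def predict_num(model, test_data, index):
    pre = model.predict(test_data)
    # argmax picks the largest output and also works when every logit is negative
    num = int(pre[index].argmax())
    return num, pre[index][num]

# Use new names so that the function predict_num is not overwritten
pred_num, p = predict_num(model, test_images, 13)
pred_num, p

Output: (0, 26.312767)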
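The second value in this output is a raw logit rather than a probability, because the last Dense(10) layer has no softmax. As a hedged sketch of how logits map to probabilities, the pure-Python example below applies softmax to a short list of made-up logits (the values are hypothetical, not taken from the trained model).

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four classes
logits = [26.3, 3.1, -4.0, 0.5]
probs = softmax(logits)

# The most likely class is the same before and after softmax
best = max(range(len(probs)), key=probs.__getitem__)

print(best, probs[best])
```

Softmax preserves the ordering of the logits, so taking the argmax of the logits directly gives the same predicted digit.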

[Figure: a brief overview of the neural network training procedure]

Appendix: complete PyTorch training code

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import torchvision

# Select the GPU if one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

# Load MNIST (download=True fetches the files only if they are not already in 'data')
train_ds = torchvision.datasets.MNIST('data', 
                                      train=True, 
                                      transform=torchvision.transforms.ToTensor(), 
                                      download=True)

test_ds = torchvision.datasets.MNIST('data', 
                                     train=False, 
                                     transform=torchvision.transforms.ToTensor(), 
                                     download=True)

# Wrap the datasets in DataLoaders
batch_size = 32

train_dl = torch.utils.data.DataLoader(train_ds, batch_size=batch_size, shuffle=True)
test_dl = torch.utils.data.DataLoader(test_ds, batch_size=batch_size)

# Inspect one batch of data
imgs, labels = next(iter(train_dl))
print(imgs.shape)


# Visualize the data
import numpy as np

plt.figure(figsize=(20, 5))
for i, img in enumerate(imgs[:20]):
    # Drop the channel dimension of size 1
    npimg = np.squeeze(img.numpy())
    
    plt.subplot(2, 10, i+1)
    plt.imshow(npimg, cmap=plt.cm.binary)
    plt.axis('off')

# Build a simple CNN
import torch.nn.functional as F

num_classes = 10  # number of image classes

class LeNet_5(nn.Module):
    """A small LeNet-style CNN (the layer sizes differ from the original LeNet-5)."""
    def __init__(self):
        super().__init__()
        
        # Feature extraction network
        self.feature_net = nn.Sequential(nn.Conv2d(1, 32, kernel_size=3), 
                                         nn.ReLU(),
                                         nn.MaxPool2d(2),
                                         nn.Conv2d(32, 64, kernel_size=3),
                                         nn.ReLU(),
                                         nn.MaxPool2d(2))
        
        # Classification network (64 channels * 5 * 5 positions = 1600 input features)
        self.classes_net = nn.Sequential(nn.Linear(1600, 64), 
                                         nn.ReLU(), 
                                         nn.Linear(64, num_classes))
        
    def forward(self, x):
        x = self.feature_net(x)
        
        x = torch.flatten(x, start_dim=1)
        
        x = self.classes_net(x)
        
        return x

# Print the network structure
from torchinfo import summary

model = LeNet_5().to(device)

summary(model)

# Set the hyperparameters
loss_fn = nn.CrossEntropyLoss()
lr = 1e-2
opt = torch.optim.SGD(model.parameters(), lr=lr)

# Training function: one pass over the training set
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # size of the training set, 60000 images in total
    num_batches = len(dataloader)  # number of batches, 1875 (60000 / 32)
    
    train_loss, train_acc = 0, 0
    
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        
        # Compute the prediction error
        pred = model(X)  # network output
        loss = loss_fn(pred, y)
        
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # Accumulate accuracy and loss
        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()
        
    train_acc /= size
    train_loss /= num_batches
    
    return train_acc, train_loss

# Test function
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    
    test_acc, test_loss = 0, 0
    
    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)
            
            # Compute the loss
            target_pred = model(imgs)
            loss = loss_fn(target_pred, target)
            
            test_loss += loss.item()
            test_acc += (target_pred.argmax(1) == target).type(torch.float).sum().item()
            
    test_acc /= size
    test_loss /= num_batches
    
    return test_acc, test_loss

# Training loop
epochs = 5
train_loss, train_acc = [], []
test_loss, test_acc = [], []

for epoch in range(epochs):
    model.train()
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, opt)
    
    model.eval()
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)
    
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)
    
    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}')
    print(template.format(epoch+1, 
                          epoch_train_acc*100, epoch_train_loss, 
                          epoch_test_acc*100, epoch_test_loss))
print("Done")

# Visualize the results
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')  # suppress warning messages

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
plt.rcParams['figure.dpi'] = 300

epoch_range = range(epochs)

plt.figure(figsize=(12, 3))

plt.subplot(1, 2, 1)
plt.plot(epoch_range, train_acc, label='Training Accuracy')
plt.plot(epoch_range, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Training And Validation Accuracy')

plt.subplot(1,2,2)
plt.plot(epoch_range, train_loss, label='Training Loss')
plt.plot(epoch_range, test_loss, label='Test Loss')
plt.legend(loc='upper right')
plt.title('Training And Validation Loss')

plt.show()

# Prediction
def predict_num(model, test_data, index):
    # Run inference without tracking gradients
    model.eval()
    with torch.no_grad():
        logits = model(test_data.to(device))[index]
    # argmax also works when every logit is negative
    num = int(logits.argmax())
    return num, logits[num].item()

img_, label_ = next(iter(test_dl))

plt.imshow(np.squeeze(img_[7].numpy()))
label_[7]

predict_num(model, img_, 7)
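The train and test functions above divide the accumulated accuracy by the number of samples but the accumulated loss by the number of batches. The pure-Python sketch below illustrates that bookkeeping on made-up numbers; `aggregate_metrics` and the batch tuples are hypothetical and only mirror the arithmetic, not the PyTorch calls.

```python
def aggregate_metrics(batches):
    """Combine per-batch results into epoch-level accuracy and loss.

    `batches` is a list of (num_correct, batch_size, mean_batch_loss) tuples.
    """
    size = sum(n for _, n, _ in batches)       # total number of samples
    num_batches = len(batches)                 # number of batches
    correct = sum(c for c, _, _ in batches)    # correct predictions overall
    loss_sum = sum(l for _, _, l in batches)   # sum of per-batch mean losses
    return correct / size, loss_sum / num_batches

# Two full batches of 32 and a smaller final batch of 16 (made-up values)
batches = [(30, 32, 0.20), (28, 32, 0.40), (16, 16, 0.30)]
acc, loss = aggregate_metrics(batches)

print(acc, loss)
```

Because the loss is averaged per batch, a smaller final batch counts as much as a full one. With 10000 test images and a batch size of 32 the last batch holds only 16 images, so the reported test loss is a close approximation of the per-sample mean rather than the exact value.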