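Deep Learning: MNIST Classification with a Fully Connected Network in PyTorch, with Performance Metrics (Accuracy, Precision, Recall, and Confusion Matrix)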

Curry0330

Last modified 2024-07-18 11:44:55


Tags: deep learning, pytorch, classification, artificial intelligence

First published 2024-07-16 11:20:53

Article link: https://blog.csdn.net/Curry0330/article/details/140443905


This post walks through a complete classification pipeline, using the MNIST dataset and the PyTorch package as the example.

The three key ingredients behind progress in artificial intelligence: data, algorithms, and compute.

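Image preprocessing

1. What image preprocessing means

Image preprocessing is a basic technique in deep learning whose goal is to give the model standardized, normalized input. It consists of a series of operations, such as grayscale conversion, noise removal, scaling, and cropping, that ensure all images share a uniform format and set of properties. An effective preprocessing strategy can noticeably improve training speed and model performance.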

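2. Main image preprocessing strategies

Grayscale conversion: convert a color image to a single-channel grayscale image, which reduces the amount of computation.
Noise removal: remove random disturbances from the image with filters and similar techniques.
Scaling and cropping: resize the image so that it matches the input size the model expects.
Normalization: rescale pixel values to a fixed range such as [0, 1] or [-1, 1], which helps the model converge.

Common normalization methods: see the CSDN blog post "[Machine Learning] A Complete Summary of Data Normalization Methods: Max-Min Normalization, Z-score Normalization, Data-Type Normalization, Standard-Deviation Normalization, and More".
Data augmentation: increase the diversity of the training data through rotation, flipping, and similar operations.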

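3. Image preprocessing in deep learning

In deep learning, preprocessing matters because of its effect on training and on model performance. It helps counter the drop in generalization caused by differences across the data. Appropriate preprocessing makes the model more robust, so that it performs stably on images taken in different environments, lighting conditions, and angles.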

Import libraries

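import os  # used by os.system("pause") at the end of the main script

import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix  # used by draw()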

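Apply several preprocessing operations to the images

transform = transforms.Compose([
    # num_output_channels is the number of channels of the output image, either 1 or 3. Default: 1, i.e. a grayscale image
    transforms.Grayscale(num_output_channels=1),  # convert the image to grayscale
    transforms.ToTensor(),  # convert the image to a tensor with values in [0, 1]
    transforms.Normalize((0.5,), (0.5,))  # normalize: (x - 0.5) / 0.5 maps [0, 1] to [-1, 1]
])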
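
To make the effect of the last two transforms concrete: `ToTensor()` scales 8-bit pixel values to [0, 1], and `Normalize((0.5,), (0.5,))` then computes `(x - mean) / std` per channel. The sketch below reproduces that arithmetic in plain Python for a few pixel values. The helper name `normalize_pixel` is hypothetical and only illustrates the formula; it is not part of torchvision.

```python
def normalize_pixel(raw, mean=0.5, std=0.5):
    """Mimic ToTensor() followed by Normalize((mean,), (std,)) for one 8-bit pixel."""
    scaled = raw / 255.0           # ToTensor: [0, 255] -> [0.0, 1.0]
    return (scaled - mean) / std   # Normalize: (x - mean) / std


for raw in (0, 128, 255):
    print(raw, "->", round(normalize_pixel(raw), 4))
```

Black (0) maps to -1.0 and white (255) maps to 1.0, so the network sees inputs centered around zero.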

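Data loading

Split the data into a training set (80%), a validation set (10%), and a test set (10%).

Training set
The training set is used to fit the model, i.e. to determine its weights and biases. These are usually called the learnable parameters.

Validation set
The validation set is used for model selection. More precisely, it takes no part in determining the learnable parameters: it is never used in gradient descent. Its only purpose is to choose hyperparameters such as the number of layers, the number of units per layer, the number of training iterations, and the learning rate. In the k-NN algorithm, for example, k is a hyperparameter, so the validation set can be used to find the k with the lowest error rate.

Test set
The test set is used only once, to evaluate the final model after training is complete. It takes part neither in fitting the learnable parameters nor in hyperparameter selection; it is used solely for evaluation.

Further reading: "What Are the Training, Validation, and Test Sets?" (CSDN blog)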
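
The code in this post loads only a `training` and a `testing` folder, so the 80/10/10 split is not shown. As a minimal sketch of how such a split could be made, the snippet below shuffles sample indices with a fixed seed and cuts them into three lists. The function name `split_indices` and the sample count of 1000 are assumptions for illustration. In PyTorch the resulting index lists could be wrapped with `torch.utils.data.Subset`, or `torch.utils.data.random_split` could be used instead.

```python
import random


def split_indices(n_samples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle sample indices and cut them into train/val/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # fixed seed -> reproducible split
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]   # everything that remains
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The three lists are disjoint, so no sample used for training can leak into validation or testing.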

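# Load the images under each path automatically (ImageFolder treats each sub-folder as one class)
train_dataset = datasets.ImageFolder(root=".\\mnist_data\\training", transform=transform)
test_dataset = datasets.ImageFolder(root=".\\mnist_data\\testing", transform=transform)
# Define the data loaders (batch_size = 128 is set in the "Define the model" section and must be defined before this code runs)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True)
# shuffle=True reshuffles the samples every epoch so the model does not see them in a fixed, class-sorted order;
# this matters for training and has no effect on the test metrics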

Inspect the loaded images

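# Inspect the class names
print(train_dataset.classes)
# Inspect the label of every sample
print(test_dataset.targets)
# Look at one image from the dataset
img, label = train_loader.dataset[12345]
img = np.array(img)
print(img.shape)  # (1, 28, 28)
# imshow expects a 2-D array for a grayscale image, so reshape (1, 28, 28) to (28, 28);
# cmap='gray' replaces matplotlib's default color map
plt.imshow(img.reshape(28, 28), cmap='gray')
plt.show()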

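# Shape of each training batch
for X, y in train_loader:
    print(X.shape)
    # print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(y.shape)
    # print(f"Shape of y: {y.shape} {y.dtype}")
    break  # one batch is enough to inspect the shapes
# Output:
# torch.Size([128, 1, 28, 28])  -> dim 0 is the batch size, dim 1 the number of color channels, dims 2 and 3 the image height and width
# torch.Size([128])
# Output of the commented-out f-string variants:
# Shape of X [N, C, H, W]: torch.Size([128, 1, 28, 28])
# Shape of y: torch.Size([128]) torch.int64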

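The difference between the `imshow` and `show` functions of `plt`

`imshow` takes an image and only draws it onto the current axes; nothing appears on screen yet, and nothing is written to disk.
Once all `imshow` operations are finished, calling `show` is what actually displays the figure.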

Device check

# Choose the training device: use the GPU if CUDA is available, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Define the model

# Batch size: the number of samples fed to the model at each training step
batch_size = 128
# Learning rate: 0.001
learning_rate = 1e-3
# Number of training epochs
epochs = 10

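`batch_size` affects training speed. It is the number of samples processed in one batch; several batches together make up one pass over the dataset, and the value should be chosen according to what the machine's memory and compute can handle. Batch size also affects the stability of SGD: a larger, well-chosen batch gives less noisy gradient estimates, so SGD moves toward an optimum more smoothly. If the batch is too large, training may settle into a local optimum rather than the global one.
Further reading: "A Detailed Explanation of Stochastic Gradient Descent (SGD)" (CSDN blog)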
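
As a quick illustration of how batch size relates to the number of parameter updates per epoch, the sketch below counts batches for a given dataset size. The figure of 60000 samples is an assumption (the size of the standard MNIST training set); the folder used in this post may differ. The helper `batches_per_epoch` is hypothetical.

```python
def batches_per_epoch(n_samples, batch_size, drop_last=False):
    """Number of batches a data loader yields in one pass over the data."""
    full, remainder = divmod(n_samples, batch_size)
    if drop_last or remainder == 0:
        return full
    return full + 1  # the last, smaller batch is kept


n_samples = 60000  # assumption: standard MNIST training-set size
print(batches_per_epoch(n_samples, 128))  # 469
print(n_samples % 128)                    # 96 samples in the final batch
```

With `batch_size = 128` there are 469 updates per epoch under that assumption; doubling the batch size roughly halves the number of updates.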

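`learning_rate` controls how fast the network learns. It determines whether, and how quickly, training reaches a minimum of the loss (ideally the global minimum), and with it the optimal parameters.
Further reading: "Deep Learning Notes (5): How a Learning Rate That Is Too Large or Too Small Affects Training, and How to Fix It" (CSDN blog)
Further reading: "[Deep Learning] The Effect of the Learning Rate (lr) on Accuracy and Loss" (CSDN blog)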
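
The effect of the learning rate can be seen on a toy problem. The sketch below, which is only an illustration and not part of the MNIST pipeline, minimizes f(w) = w² with plain gradient descent for three learning rates. The function name `gradient_descent`, the starting point, and the step count are arbitrary choices.

```python
def gradient_descent(lr, w0=5.0, steps=50):
    """Minimize f(w) = w**2 with plain gradient descent; return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2 * w        # derivative of w**2
        w = w - lr * grad   # gradient-descent update
    return w


for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: final w = {gradient_descent(lr):.4g}")
```

With `lr=0.001` the iterate has barely moved from 5 after 50 steps, with `lr=0.1` it is very close to the minimum at 0, and with `lr=1.1` each step overshoots and the iterate diverges.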

`epochs` is the number of training epochs, i.e. complete passes over the training set.

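Set a random seed for reproducibility. Seeding PyTorch's random number generator makes it easier to reproduce the experiment's results on the next run.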

seed = 42
torch.manual_seed(seed)
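
A small check of what seeding buys: resetting the seed to the same value makes PyTorch produce the same random numbers again. This is a minimal sketch on the CPU; the variable names are illustrative.

```python
import torch

torch.manual_seed(42)
first = torch.rand(3)

torch.manual_seed(42)      # reset the generator to the same state
second = torch.rand(3)

print(torch.equal(first, second))
```

The same mechanism makes weight initialization and data shuffling repeat from run to run, provided the seed is set before the model and data loaders are created.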

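Define the model architecture

class NeuralNetwork(nn.Module):
    # Constructor
    def __init__(self):
        # Call the parent class constructor
        super().__init__()
        # Flatten reshapes each 2-D image into a 1-D vector
        self.flatten = nn.Flatten()
        # The network structure is defined in the constructor. Sequential is a container that lets you declare the layers in order
        self.linear_relu_stack = nn.Sequential(
            # A linear layer plus an activation forms one layer of the network
            nn.Linear(28 * 28, 256),
            # Activation function
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            # Output layer: 256 inputs, 10 outputs, one per class
            nn.Linear(256, 10)
        )

    # Forward pass; x is the data fed into the model
    def forward(self, x):
        x = self.flatten(x)
        # logits are the model's raw outputs, before any softmax is applied
        logits = self.linear_relu_stack(x)
        return logits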

View the network structure

# Instantiate the model
model = NeuralNetwork().to(device)
# Print the network structure
print(model)

# Output
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=10, bias=True)
  )
)
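
The layer sizes printed above determine how many learnable parameters the network has. Each `Linear` layer with bias holds `in_features * out_features` weights plus `out_features` biases, which the short sketch below adds up. The helper `linear_params` is hypothetical.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


layers = [(28 * 28, 256), (256, 256), (256, 10)]
counts = [linear_params(i, o) for i, o in layers]
print(counts)       # [200960, 65792, 2570]
print(sum(counts))  # 269322
```

Roughly three quarters of the parameters sit in the first layer, because it connects all 784 input pixels to 256 hidden units.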

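Define the loss function and optimizer

# Define the objective: cross-entropy is used as the objective function, i.e. the loss function
loss_fn = nn.CrossEntropyLoss()
# Define the optimizer: stochastic gradient descent, which updates the parameters using the gradients from backpropagation
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)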
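
`nn.CrossEntropyLoss` takes the raw logits and an integer class label, and for a single sample it evaluates to `-log(softmax(logits)[target])`. The sketch below computes that quantity by hand in plain Python, to show what the loss measures. The function name `cross_entropy` and the example logits are illustrative only.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


logits = [2.0, 1.0, 0.1]
print(round(cross_entropy(logits, 0), 4))  # confident and correct: small loss
print(round(cross_entropy(logits, 2), 4))  # confident and wrong: larger loss
```

The loss is small when the logit of the true class dominates and grows as the model puts more probability on other classes. Over a batch, `nn.CrossEntropyLoss` averages these per-sample values by default.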

Define the training procedure

def train(dataloader, model, loss_fn, optimizer):
    # Size of the training set
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        # Compute the predictions
        predict = model(X)
        # Compute the loss
        loss = loss_fn(predict, y)
        # Backpropagation: backward() computes the gradients automatically
        loss.backward()
        # Update the model parameters
        optimizer.step()
        # Clear the gradients; PyTorch accumulates them across backward() calls unless they are reset
        optimizer.zero_grad()

        # Report progress every 100 batches
        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss:{loss:>7f} [{current:>5d}/{size:>5d}]")

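Define the test procedure

def test(dataloader, model, loss_fn):
    # Size of the evaluation set
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    # Switch to evaluation mode
    model.eval()
    test_loss, correct = 0, 0
    # No gradients are needed during evaluation
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            # Compute the predictions
            predict = model(X)
            # Accumulate the loss over the whole dataset
            test_loss += loss_fn(predict, y).item()
            # Count correct predictions; argmax returns the index of the largest logit
            correct += (predict.argmax(1) == y).type(torch.float).sum().item()
    # Average only after every batch has been processed
    # (doing this inside the loop would return after the first batch)
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \nAccuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
    return correct, test_loss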
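
The accuracy bookkeeping in `test` comes down to taking the argmax of each row of logits and comparing it with the label. The sketch below does the same in plain Python on a made-up batch of four samples and three classes. The function name and the numbers are illustrative only.

```python
def accuracy_from_logits(logits_batch, targets):
    """Fraction of samples whose largest logit is at the target index."""
    correct = 0
    for logits, target in zip(logits_batch, targets):
        predicted = max(range(len(logits)), key=logits.__getitem__)  # argmax
        correct += int(predicted == target)
    return correct / len(targets)


logits_batch = [
    [0.1, 2.0, -1.0],  # predicts class 1
    [1.5, 0.2, 0.3],   # predicts class 0
    [0.0, 0.1, 3.0],   # predicts class 2
    [0.9, 0.8, 0.1],   # predicts class 0
]
targets = [1, 0, 2, 1]
print(accuracy_from_logits(logits_batch, targets))  # 0.75
```

Three of the four predictions match their labels, so the accuracy is 0.75. Note that the count of correct predictions is divided by the number of samples only once, after all of them have been seen.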

Plotting

# Store the iteration indices and the metrics recorded at each one
iterations = []
accuracies = []
losses = []

# Initialize the figure
plt.ion()
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

ax1.set_title("Accuracy over iterations")
ax1.set_xlabel("Iterations")
ax1.set_ylabel("Accuracy")
accuracy_line, = ax1.plot([], [], 'b')

ax2.set_title("Loss over iterations")
ax2.set_xlabel("Iterations")
ax2.set_ylabel("Loss")
loss_line, = ax2.plot([], [], 'r')


# Function that updates the figure in real time
def update_plot(iteration, accuracy, loss):
    # Append the new values to the end of the lists
    iterations.append(iteration)
    accuracies.append(accuracy)
    losses.append(loss)

    # Update the line data
    accuracy_line.set_data(iterations, accuracies)
    loss_line.set_data(iterations, losses)

    # Update the axis ranges (max(1, ...) avoids identical lower and upper x-limits on the first call)
    ax1.set_xlim(0, max(1, max(iterations)))
    ax1.set_ylim(0, 1)
    ax2.set_xlim(0, max(1, max(iterations)))
    ax2.set_ylim(0, max(losses) if losses else 1)

    plt.draw()
    plt.pause(0.1)

Output
[Figure: the generated accuracy and loss plots]

Run `train` and `test`

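if __name__ == '__main__':
    num_classes = 10  # digits 0-9
    for i in range(epochs):
        print(f"Epoch {i+1}\n--------------------------")
        train(train_loader, model, loss_fn, optimizer)
        acc, loss = test(test_loader, model, loss_fn)
        update_plot(i, acc, loss)
        # Assumption: the original post does not show how `predicts` and `labels`
        # are gathered, so they are collected here from the test set.
        predicts, labels = [], []
        model.eval()
        with torch.no_grad():
            for X, y in test_loader:
                predicts.extend(model(X.to(device)).argmax(1).cpu().tolist())
                labels.extend(y.tolist())
        draw(predicts, labels)
        # print(calculate_confusion_matrix(predicts, labels, num_classes))
        calculate_confusion_matrix(predicts, labels, num_classes)
    print("Done!")
    plt.show()
    os.system("pause")  # Windows only: keeps the console window open; needs `import os`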

Model performance metrics

Compute the confusion matrix without the `sklearn` library, and derive accuracy, precision, and recall from it

def calculate_confusion_matrix(predict, gt, n_classes):
    # Rows are true labels, columns are predicted labels
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for index in range(len(gt)):
        cm[gt[index], predict[index]] += 1
    accuracy = np.sum(np.diag(cm)) / np.sum(cm)
    # Per-class precision: correct predictions / all predictions of that class (column sums)
    precision = np.diag(cm) / np.sum(cm, axis=0)
    # Per-class recall: correct predictions / all true samples of that class (row sums)
    recall = np.diag(cm) / np.sum(cm, axis=1)
    print(f"Accuracy: {accuracy * 100},\nPrecision: {precision * 100},\nRecall: {recall * 100}\n")
    # Return integer counts; casting to uint8 would overflow once a cell exceeds 255
    return cm
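
A tiny worked example makes the row and column conventions easier to check by hand. The sketch below builds the confusion matrix for seven made-up samples and three classes, then derives the same three metrics. The labels are invented for illustration.

```python
import numpy as np

gt = [0, 0, 1, 1, 2, 2, 2]     # true labels
pred = [0, 1, 1, 1, 2, 0, 2]   # predicted labels
n_classes = 3

cm = np.zeros((n_classes, n_classes), dtype=np.int64)
for t, p in zip(gt, pred):
    cm[t, p] += 1              # row = true class, column = predicted class

accuracy = np.trace(cm) / cm.sum()
precision = np.diag(cm) / cm.sum(axis=0)   # per class: TP / (TP + FP)
recall = np.diag(cm) / cm.sum(axis=1)      # per class: TP / (TP + FN)

print(cm)
print(accuracy, precision, recall)
```

Here 5 of 7 samples lie on the diagonal, so accuracy is 5/7. Class 2 has precision 1.0 (every sample predicted as 2 really is a 2) but recall 2/3 (one true 2 was predicted as 0).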

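Using the `sklearn` library: plot the confusion matrix and compute accuracy, precision, and recall (requires `from sklearn.metrics import confusion_matrix`)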

def draw(predicts, labels):
    confusion_mat = confusion_matrix(labels, predicts)
    print(confusion_mat)
    plt.figure(figsize=(8, 6))
    plt.imshow(confusion_mat, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title("Confusion Matrix")
    plt.colorbar()
    tick_marks = np.arange(10)
    plt.xticks(tick_marks, tick_marks)
    plt.yticks(tick_marks, tick_marks)

    thresh = confusion_mat.max() / 2.
    for i in range(confusion_mat.shape[0]):
        for j in range(confusion_mat.shape[1]):
            plt.text(j, i, format(confusion_mat[i, j], 'd'),
                     ha="center", va="center",
                     color="white" if confusion_mat[i, j] > thresh else "black")
    plt.xlabel("Predicted Label")
    plt.ylabel("True Label")
    plt.show()

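    # Accuracy: diagonal sum / total count
    accuracy = np.sum(np.diag(confusion_mat)) / np.sum(confusion_mat)
    # Recall per class: diagonal / row sums (true labels)
    recall = np.diag(confusion_mat) / np.sum(confusion_mat, axis=1)
    # Precision per class: diagonal / column sums (predicted labels)
    precision = np.diag(confusion_mat) / np.sum(confusion_mat, axis=0)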
    print(f"Accuracy: {accuracy * 100},\nPrecision: {precision * 100},\nRecall: {recall * 100}\n")