Datawhale Hands-On CV with PyTorch: MNIST Classification Code Walkthrough - CSDN Blog

Article link: https://blog.csdn.net/2302_77608969/article/details/141102020

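Introduction to the MNIST dataset

The MNIST dataset (Modified National Institute of Standards and Technology database) is a large database of handwritten digits built from samples originally collected by the U.S. National Institute of Standards and Technology. It contains a training set of 60,000 examples and a test set of 10,000 examples. The training set consists of digits written by 250 different people, 50% of them high school students and 50% employees of the Census Bureau; the test set contains handwritten digits in the same proportion.

MNIST images are 28 * 28 pixels and contain only grayscale information. The raw pixel values are integers from 0 to 255; after transforms.ToTensor() (used below) they are scaled to floats between 0 and 1.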

1. Importing libraries

import torch
import torch.nn as nn
import numpy as np
from torch import optim
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torchvision.datasets import mnist
from torchvision import transforms
import matplotlib.pyplot as plt

A brief explanation of each import and its purpose:

import torch: This imports the main PyTorch library, which is used for tensor computation and deep learning.
import torch.nn as nn: This imports the torch.nn module, which contains classes and functions to build neural networks.
import numpy as np: This imports NumPy, a library for numerical operations in Python, often used for array manipulations.
from torch import optim: This imports the optimization package from PyTorch, which contains various optimization algorithms like SGD, Adam, etc.
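from torch.autograd import Variable: This imports the Variable class from PyTorch's autograd package, which historically wrapped tensors so that operations on them were recorded for automatic differentiation. Since PyTorch 0.4, Variable and Tensor have been merged, so wrapping a tensor in Variable is no longer necessary; the code below keeps it only because the original tutorial does.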
from torch.utils.data import DataLoader: This imports the DataLoader class, which provides an efficient way to load and iterate over datasets.
from torchvision.datasets import mnist: This imports the mnist module from torchvision.datasets, which provides the MNIST dataset class. torchvision is a package of popular datasets, model architectures, and image transformations.
from torchvision import transforms: This imports the transforms module from torchvision, which provides common image transformations for data preprocessing.
import matplotlib.pyplot as plt: This imports the pyplot module from Matplotlib, which is used for plotting and visualizing data.

① What are optimization algorithms?

Optimization algorithms are techniques used to find the best solution (or optimal solution) to a problem by minimizing or maximizing a specific objective function. These algorithms are fundamental in various fields, including machine learning, operations research, and engineering.

Here are some key points about optimization algorithms:

Objective Function: This is the function that needs to be optimized. It could be a cost function that needs to be minimized or a reward function that needs to be maximized.

Types of Optimization Algorithms:

Gradient-Based Algorithms: These use the gradient (first derivative) of the objective function to find the optimal solution. Examples include Gradient Descent, Adam, and RMSProp.
Gradient-Free Algorithms: These do not require the gradient of the objective function. Examples include Genetic Algorithms, Simulated Annealing, and Particle Swarm Optimization.

Constraints: Some optimization problems have constraints that the solution must satisfy. These constraints can be equality constraints (e.g., x + y = 1) or inequality constraints (e.g., x ≥ 0).

Applications: Optimization algorithms are used in various applications, such as training machine learning models, optimizing supply chains, and designing engineering systems.
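
To make the two families above concrete, here is a minimal sketch that minimizes the same objective with a gradient-based method (plain gradient descent) and a gradient-free one (random search). The objective function, the step size, and the search range are hypothetical choices made only for illustration.

```python
import random


def objective(x):
    """Hypothetical objective: a parabola with its minimum at x = 3."""
    return (x - 3.0) ** 2


def gradient(x):
    """Analytic first derivative of the objective."""
    return 2.0 * (x - 3.0)


def gradient_descent(x0, lr=0.1, steps=100):
    """Gradient-based: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * gradient(x)
    return x


def random_search(low, high, trials=2000, seed=0):
    """Gradient-free: sample candidates and keep the best one seen."""
    rng = random.Random(seed)
    best = low
    for _ in range(trials):
        candidate = rng.uniform(low, high)
        if objective(candidate) < objective(best):
            best = candidate
    return best


x_gd = gradient_descent(x0=-5.0)
x_rs = random_search(-10.0, 10.0)
print(f"gradient descent: x = {x_gd:.4f}")
print(f"random search:    x = {x_rs:.4f}")
```

Both approaches end up near x = 3, but gradient descent uses derivative information to get there in far fewer function evaluations, which is why gradient-based optimizers dominate neural network training.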

2. Defining the network structure

2.1 Initialization Method

def __init__(self, in_c=784, out_c=10):
    super(Net, self).__init__()

The __init__ method is the constructor for the class. It initializes the neural network. The super(Net, self).__init__() line calls the constructor of the parent class nn.Module.

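① The self parameter

self refers to the instance itself. In a Python class, the first parameter of an instance method receives the instance object, and by convention it is named self. In other words, an instance method must declare this first parameter, and it cannot be omitted.

self refers to the instance, not the class.
self could be given another name, such as this, but do not write it that way.
self cannot be omitted from an instance method's signature.

② The __init__() method

After creating a class in Python, you usually define an __init__() method, which runs automatically whenever an instance of the class is created. __init__() must take self as its first parameter.

Besides self, __init__() can define other parameters, so that some setup happens as soon as the instance is created. Values stored on self there can then be used by the other methods of the class.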
class Person():
    def __init__(self, if_like_study):
        print("Do you like studying?")
        self.if_like_study = if_like_study
    def like(self):
        print("%s, I like studying" % self.if_like_study)

Bob = Person('Of course')
Bob.like()

③ super(Net, self).__init__()

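In Python, super(Net, self).__init__() looks up the class that follows Net in the method resolution order (its parent class, here nn.Module) and calls that class's __init__() on the same object self. Put simply, the subclass runs the parent's __init__() inside its own __init__(), so the subclass instance gets everything the parent's initializer sets up.

Note:

At the core of super is the MRO (method resolution order): the order in which members are looked up in a class inheritance hierarchy.

Image source (recommended reading): self参数 - __ init__ ()方法 super(Net, self).__init__() - cltt - 博客园 (cnblogs.com)

How to use super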
class Person:
    def __init__(self,name,gender):
        self.name = name
        self.gender = gender
    def printinfo(self):
        print(self.name,self.gender)

class Stu(Person):
    def __init__(self,name,gender,school):
        super(Stu, self).__init__(name,gender) # Initialize the inherited attributes with the parent's __init__
        self.school = school
    def printinfo(self): # Override the parent's printinfo method
        print(self.name,self.gender,self.school)

if __name__ == '__main__':
    stu = Stu('djk','man','nwnu')
    stu.printinfo()
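
The method resolution order mentioned in the note can be inspected directly. The following is a small sketch with trimmed-down, hypothetical versions of the Person and Stu classes; it also uses the zero-argument form super(), which in Python 3 is equivalent to super(Stu, self) inside the class body.

```python
class Person:
    def __init__(self, name):
        self.name = name


class Stu(Person):
    def __init__(self, name, school):
        # Zero-argument super() resolves to the next class after Stu in the MRO.
        super().__init__(name)
        self.school = school


# The MRO lists the classes searched, in order, when a member is looked up.
mro_names = [cls.__name__ for cls in Stu.__mro__]
print(mro_names)

stu = Stu('djk', 'nwnu')
print(stu.name, stu.school)
```

The printed list starts with Stu, continues with Person, and ends with object, which is the order super follows when it picks the next __init__ to call.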

2.2 Fully Connected Layers

self.fc1 = nn.Linear(in_c, 512)
self.fc2 = nn.Linear(512, 256)
self.fc3 = nn.Linear(256, 128)
self.fc4 = nn.Linear(128, out_c)

These lines define four fully connected (linear) layers:

fc1: Takes an input of size in_c (default 784) and outputs 512 features.
fc2: Takes 512 features and outputs 256 features.
fc3: Takes 256 features and outputs 128 features.
fc4: Takes 128 features and outputs out_c (default 10) features.

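This code defines the layers used in the network's forward pass: four fully connected layers (also called linear layers), each specified by its number of input and output features.

Here, nn.Linear(in_features, out_features) applies the affine transformation y = xW^T + b to its input, where the weight W has shape (out_features, in_features) and the bias b has shape (out_features).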
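
A small sketch of what nn.Linear computes, assuming a batch of 64 random inputs (the batch size and the random data are illustrative only). It compares the layer's output with the affine transformation written out by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(784, 512)   # same sizes as fc1
x = torch.randn(64, 784)      # hypothetical batch of 64 flattened images

y = layer(x)
# The same computation by hand: multiply by the transposed weight, add the bias.
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape)     # weight is stored as (out_features, in_features)
print(y.shape)                # one 512-dimensional output per input row
print(torch.allclose(y, manual, atol=1e-4))
```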

2.3 Activation Layers

self.act1 = nn.ReLU(inplace=True)
self.act2 = nn.ReLU(inplace=True)
self.act3 = nn.ReLU(inplace=True)

These lines define three ReLU (Rectified Linear Unit) activation functions. The inplace=True argument means the operation overwrites its input tensor instead of allocating a new one, which saves memory.

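In the example below, the negative values in the input tensor x become 0 and the non-negative values stay unchanged. Because inplace=True, x itself is overwritten and y refers to the same tensor.
import torch
import torch.nn as nn

x = torch.tensor([-1.0, 0.0, 1.0])
relu = nn.ReLU(inplace=True)
y = relu(x)
print(y)  # Output: tensor([0., 0., 1.])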

2.4 Forward Method

def forward(self, x):
    x = self.act1(self.fc1(x))
    x = self.act2(self.fc2(x))
    x = self.act3(self.fc3(x))
    x = self.fc4(x)
    return x

The forward method defines the forward pass of the network. It takes an input x and passes it through each layer and activation function in sequence:

x is passed through fc1 and then act1.
The result is passed through fc2 and then act2.
The result is passed through fc3 and then act3.
Finally, the result is passed through fc4.

2.5 Creating an Instance

net = Net()

With the code above, net = Net() creates a neural network instance containing four fully connected layers and three ReLU activation functions.
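
As a sanity check on the instance, the sketch below rebuilds the same Net class, feeds it a hypothetical batch of 64 random flattened images, and counts the trainable parameters. The random input stands in for real MNIST data.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, in_c=784, out_c=10):
        super().__init__()
        self.fc1 = nn.Linear(in_c, 512)
        self.act1 = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(512, 256)
        self.act2 = nn.ReLU(inplace=True)
        self.fc3 = nn.Linear(256, 128)
        self.act3 = nn.ReLU(inplace=True)
        self.fc4 = nn.Linear(128, out_c)

    def forward(self, x):
        x = self.act1(self.fc1(x))
        x = self.act2(self.fc2(x))
        x = self.act3(self.fc3(x))
        return self.fc4(x)


net = Net()
dummy = torch.randn(64, 784)          # hypothetical batch, not real MNIST data
out = net(dummy)

# Each Linear layer holds in*out weights plus out biases.
n_params = sum(p.numel() for p in net.parameters())
print(out.shape)    # one row of 10 class scores per input
print(n_params)
```

The parameter count is (784*512 + 512) + (512*256 + 256) + (256*128 + 128) + (128*10 + 10) = 567,434; the ReLU layers contribute none.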

2.6 Loading data

2.6.1 Loading the training and test sets

train_set = mnist.MNIST('./data', train=True, transform=transforms.ToTensor(), download=True)
test_set = mnist.MNIST('./data', train=False, transform=transforms.ToTensor(), download=True)

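mnist.MNIST: loads the MNIST dataset.
./data: the directory where the data is stored.
train=True: loads the training set.
train=False: loads the test set.
transform=transforms.ToTensor(): converts each image to a PyTorch tensor with values scaled to the range 0 to 1.
download=True: downloads the dataset if it is not already present.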

2.6.2 Creating the data loaders

train_data = DataLoader(train_set, batch_size=64, shuffle=True)
test_data = DataLoader(test_set, batch_size=128, shuffle=False)

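DataLoader: wraps a dataset and serves it in batches.
batch_size: the number of samples in each batch.
shuffle: whether to shuffle the data at the start of each epoch.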
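
The batch sizes determine how many batches each loader yields per epoch, which is the value len(train_data) and len(test_data) return later in the training loop. A minimal sketch of that arithmetic follows; num_batches is a hypothetical helper, and it assumes DataLoader's default drop_last=False, under which the final partial batch is kept.

```python
import math


def num_batches(num_samples, batch_size, drop_last=False):
    """Number of batches per epoch for a dataset of num_samples items."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


train_batches = num_batches(60_000, 64)
test_batches = num_batches(10_000, 128)
# Size of the final, partially filled training batch.
last_train_batch = 60_000 - (train_batches - 1) * 64

print(train_batches, test_batches, last_train_batch)
```

So the training loader yields 938 batches, the last of which holds only 32 images, and the test loader yields 79.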

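2.7 Visualizing the training data

import random
for i in range(4):
    ax = plt.subplot(2, 2, i+1)
    idx = random.randint(0, len(train_set) - 1)  # randint includes both endpoints
    digit_0 = train_set[idx][0].numpy()
    digit_0_image = digit_0.reshape(28, 28)
    ax.imshow(digit_0_image, interpolation="nearest")
    ax.set_title('label: {}'.format(train_set[idx][1]), fontsize=10, color='black')
plt.show()

random.randint is used to pick random samples from the training set.
Each sample is converted to a NumPy array and reshaped into a 28x28 image.
ax.imshow displays the image.
The image title is set to the corresponding label.

In detail:

plt.subplot(2, 2, i+1): creates a 2x2 grid of subplots and selects subplot i+1. In plt.subplot, subplot indices start at 1, not 0, and run in row-major order; for example, plt.subplot(2, 2, 1) selects the first subplot of a 2x2 grid.
ax: the axes object of the current subplot, used to draw on that subplot.
random.randint(0, len(train_set) - 1): generates a random integer idx between 0 and len(train_set) - 1, both inclusive, used to select a random training sample. The upper bound must be len(train_set) - 1 because randint includes both endpoints; an upper bound of len(train_set) could produce an out-of-range index.
train_set[idx][0]: the image data of sample idx in the training set.
.numpy(): converts the image data from a PyTorch tensor to a NumPy array for further processing.
digit_0.reshape(28, 28): reshapes the array digit_0, which has shape (1, 28, 28), into a 28x28 two-dimensional array representing the image's pixel matrix.
ax.imshow(digit_0_image, interpolation="nearest"): displays the image digit_0_image on the current subplot ax.
interpolation="nearest": uses nearest-neighbor interpolation, so the image is displayed without smoothing.
train_set[idx][1]: the label of sample idx in the training set (the digit the image shows).
ax.set_title('label: {}'.format(train_set[idx][1]), fontsize=10, color='black'): sets the title of the current subplot to the sample's label, with font size 10 and black text.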
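
One detail worth checking when sampling indices is that random.randint includes both of its endpoints, whereas random.randrange excludes the upper one. The sketch below illustrates the difference on a hypothetical dataset of 5 items; the size and the seed are arbitrary.

```python
import random

n = 5                      # hypothetical dataset size; valid indices are 0..4
rng = random.Random(0)

# randint(0, n) can return n itself, which would be an out-of-range index.
inclusive_values = {rng.randint(0, n) for _ in range(1000)}

# Two safe ways to draw a valid index.
randint_values = {rng.randint(0, n - 1) for _ in range(1000)}
randrange_values = {rng.randrange(n) for _ in range(1000)}

print(sorted(inclusive_values))
print(sorted(randint_values))
print(sorted(randrange_values))
```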

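2.8 Defining the loss function and optimizer

[Figure: key points for choosing a cost function]

[Figure: common cost function examples]

# Define the loss function: cross-entropy
criterion = nn.CrossEntropyLoss()

# Define the optimizer: stochastic gradient descent
optimizer = optim.SGD(net.parameters(), lr=1e-2, weight_decay=5e-4)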

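nn.CrossEntropyLoss is PyTorch's loss function for multi-class classification. It combines nn.LogSoftmax and nn.NLLLoss (negative log-likelihood loss): it takes the raw, unnormalized scores (logits) output by the model, not probabilities, and measures the difference between the predicted distribution and the true labels.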
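
The combination of log-softmax and negative log-likelihood can be written out for a single sample in plain Python. This is a sketch of the underlying arithmetic, not PyTorch's implementation; the logits and the function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Log of the softmax probabilities, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


def cross_entropy(logits, target):
    """Cross-entropy for one sample: log-softmax followed by NLL."""
    return nll_loss(log_softmax(logits), target)


logits = [2.0, 1.0, 0.1]   # hypothetical scores for three classes
loss_correct = cross_entropy(logits, target=0)   # true class has the top score
loss_wrong = cross_entropy(logits, target=2)     # true class has the lowest score

print(f"{loss_correct:.4f} {loss_wrong:.4f}")
```

The loss is small when the true class receives the highest score and grows as the model assigns it less probability.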

optimizer = optim.SGD(net.parameters(), lr=1e-2, weight_decay=5e-4)

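optim.SGD is PyTorch's stochastic gradient descent optimizer, which updates the model parameters to minimize the loss function.

Here, net.parameters() supplies the parameters to be optimized, lr=1e-2 is the learning rate, and weight_decay=5e-4 is the weight decay (L2 penalty) coefficient.

Effect of the learning rate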

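The learning rate is a key hyperparameter in training a neural network: it sets the step size of each parameter update. Its value strongly affects both how well and how fast the model trains:

Learning rate too large:

Overshooting the optimum: with too large a learning rate, updates can jump past the optimum, making training unstable.
Failure to converge: the model may oscillate around the optimum without settling on it, or diverge outright.

Learning rate too small:

Slow convergence: with too small a learning rate, training is very slow and needs more time and compute.
Local optima: small steps make it harder to escape a poor local minimum or plateau, so the model may fail to reach a better solution.

Dynamically adjusting the learning rate:

Learning rate decay: gradually reducing the learning rate during training can help the model converge more stably.
Adaptive learning rates: adaptive algorithms such as Adam and Adagrad adjust the effective step size during training, which often improves results.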
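
The effect of the step size can be seen on a toy problem. The following sketch runs 20 steps of gradient descent on the hypothetical objective f(x) = x**2 (gradient 2x) with three illustrative learning rates.

```python
def run_gd(lr, x0=1.0, steps=20):
    """Gradient descent on f(x) = x**2, whose minimum is at x = 0."""
    x = x0
    for _ in range(steps):
        x -= lr * 2.0 * x   # 2x is the gradient of x**2
    return x


too_large = run_gd(lr=1.1)     # each step multiplies x by -1.2: diverges
reasonable = run_gd(lr=0.4)    # each step multiplies x by 0.2: converges fast
too_small = run_gd(lr=0.001)   # each step multiplies x by 0.998: barely moves

print(f"lr=1.1   -> x = {too_large:.3f}")
print(f"lr=0.4   -> x = {reasonable:.3e}")
print(f"lr=0.001 -> x = {too_small:.3f}")
```

With the largest rate the iterate moves away from the minimum, with the smallest it has hardly left its starting point after 20 steps, and the middle value reaches the minimum to within rounding.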

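Effect of the weights

Weights are the parameters that connect neurons in a neural network; during training they are updated using gradients computed by backpropagation. How the weights are initialized and updated strongly affects model performance:

Weight initialization:

Good initialization: a suitable weight initialization helps the model converge faster and avoids vanishing or exploding gradients.
Poor initialization: an unsuitable weight initialization can make training unstable and hurt the model's final performance.

Weight updates:

Backpropagation: in each iteration, the weights are updated according to the gradient of the loss function so as to minimize the loss.
Regularization: adding weight decay (such as L2 regularization) can reduce overfitting and improve the model's generalization.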
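
For plain SGD without momentum, weight decay amounts to adding weight_decay * w to the gradient before the update. Below is a minimal scalar sketch of one such step, using the same lr and weight_decay values as the optimizer above; sgd_step is a hypothetical helper, not a PyTorch function.

```python
def sgd_step(w, grad, lr=1e-2, weight_decay=5e-4):
    """One SGD update with an L2 penalty folded into the gradient."""
    return w - lr * (grad + weight_decay * w)


# With a non-zero loss gradient, the decay term adds a small extra pull.
w_new = sgd_step(w=1.0, grad=0.5)

# With a zero loss gradient, the decay term alone shrinks the weight toward 0.
w_shrunk = sgd_step(w=1.0, grad=0.0)

print(w_new, w_shrunk)
```

The second call shows why this acts as regularization: even when the loss gives no signal, every weight is nudged slightly toward zero on each step.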

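Summary

The learning rate and the weights are two important factors in how well a neural network trains. A suitable learning rate speeds up convergence, while good weight initialization and update strategies improve the model's performance and stability.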

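2.9 Forward and backward propagation

2.9.1 Initialization and the training loop

# Record the training loss
losses = []
# Record the training accuracy
acces = []
# Record the test loss
eval_losses = []
# Record the test accuracy
eval_acces = []
# Set the number of epochs
nums_epoch = 20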

for epoch in range(nums_epoch):
    train_loss = 0
    train_acc = 0
    net = net.train()

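for epoch in range(nums_epoch): runs nums_epoch rounds of training.
train_loss and train_acc: initialize the running training loss and accuracy for each epoch.
net.train(): puts the model in training mode.

2.9.2 Training on batches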

    for batch, (img, label) in enumerate(train_data):
        img = img.reshape(img.size(0), -1)
        img = Variable(img)
        label = Variable(label)

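for batch, (img, label) in enumerate(train_data): iterates over every batch of the training set.
img.reshape(img.size(0), -1): reshapes the image batch into a two-dimensional tensor of shape (batch_size, 784).
Variable(img) and Variable(label): wrap the image and label data in PyTorch Variable objects. Since PyTorch 0.4 this simply returns an ordinary tensor, so these two lines are optional.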
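
The reshape step can be checked on a dummy batch. This sketch assumes batches shaped (batch_size, 1, 28, 28), which is what ToTensor produces for MNIST images; the zero tensors stand in for real data.

```python
import torch

# A full batch and a smaller final batch, both filled with zeros as placeholders.
full_batch = torch.zeros(64, 1, 28, 28)
last_batch = torch.zeros(32, 1, 28, 28)

# Keep the batch dimension and let -1 infer the rest: 1 * 28 * 28 = 784.
flat_full = full_batch.reshape(full_batch.size(0), -1)
flat_last = last_batch.reshape(last_batch.size(0), -1)

print(flat_full.shape, flat_last.shape)
```

Using img.size(0) rather than a hard-coded 64 is what lets the same line handle the smaller last batch.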

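In detail:

for batch, (img, label) in enumerate(train_data):
# batch is the index of the current batch
# img is the image data of the current batch
# label is the label data of the current batch

enumerate(train_data): enumerate yields an (index, value) tuple for each element of train_data. train_data is a DataLoader object that wraps the training set.

batch:

batch is the index of the current batch, counting up from 0.
For example, if train_data has 100 batches, batch runs from 0 to 99.

# enumerate is a Python built-in function that yields (index, value) tuples
# while iterating over an iterable such as a list or tuple.

# By default the index starts at 0
data = ['a', 'b', 'c']
for index, value in enumerate(data):
    print(index, value)

# To use a custom starting index, pass enumerate's second argument, e.g. start at 1
for index, value in enumerate(data, start=1):
    print(index, value)

(img, label):

img and label are the data and labels of the current batch.
img: the image data of the current batch, as a tensor.
label: the label data of the current batch, as a tensor.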

2.9.3 Forward and backward propagation

        # Forward pass
        out = net(img)
        loss = criterion(out, label)
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

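out = net(img): passes the input images through the model to get the output.
loss = criterion(out, label): computes the loss between the output and the true labels, using cross-entropy to measure the difference between the predicted distribution and the labels.
optimizer.zero_grad(): clears the gradients left over from the previous step, because PyTorch accumulates gradients across backward() calls.
loss.backward(): backpropagates to compute the gradients.
optimizer.step(): updates the model parameters.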
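
Why the gradients need clearing on every iteration can be seen with a single scalar parameter. This sketch resets the gradient by hand with w.grad.zero_() in place of optimizer.zero_grad(); the parameter and the function 3 * w are hypothetical.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3.0 * w).backward()
first = w.grad.item()          # d(3w)/dw = 3

(3.0 * w).backward()
accumulated = w.grad.item()    # the second backward ADDS to the stored gradient

w.grad.zero_()                 # manual reset of the stored gradient
(3.0 * w).backward()
after_reset = w.grad.item()    # back to the gradient of a single pass

print(first, accumulated, after_reset)
```

Without the reset, each optimizer step would use the sum of all gradients computed so far rather than the gradient of the current batch.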

2.9.4 Recording the loss and computing accuracy

        # Record the loss
        train_loss += loss.item()
        # Compute the classification accuracy
        _, pred = out.max(1)
        num_correct = (pred == label).sum().item()
        acc = num_correct / img.shape[0]

        if (batch + 1) % 200 == 0:
            print('[INFO] Epoch-{}-Batch-{}: Train: Loss-{:.4f}, Accuracy-{:.4f}'.format(epoch + 1,
                                                                                         batch + 1,
                                                                                         loss.item(),
                                                                                         acc))
        train_acc += acc

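train_loss += loss.item(): accumulates the loss of the current batch.
_, pred = out.max(1): takes the index of the largest score in each row as the predicted class.
num_correct = (pred == label).sum().item(): counts the correctly predicted samples.
acc = num_correct / img.shape[0]: computes the accuracy of the current batch.
if (batch + 1) % 200 == 0: prints training information every 200 batches.
train_acc += acc: accumulates the accuracy of the current batch.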
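
Here is a small worked example of the accuracy computation, using a hypothetical batch of four samples and three classes instead of real network output.

```python
import torch

# Hypothetical class scores for 4 samples and 3 classes.
out = torch.tensor([[0.1, 2.0, 0.3],
                    [1.5, 0.2, 0.1],
                    [0.0, 0.1, 3.0],
                    [0.9, 0.8, 0.7]])
label = torch.tensor([1, 2, 2, 0])

# max over dimension 1 returns (largest value, its index) for each row.
values, pred = out.max(1)
num_correct = (pred == label).sum().item()
acc = num_correct / out.shape[0]

print(pred.tolist(), num_correct, acc)
```

The predictions are [1, 0, 2, 0], so three of the four samples match their labels and the batch accuracy is 0.75.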

2.9.5 Recording the loss and accuracy of each epoch

    losses.append(train_loss / len(train_data))
    acces.append(train_acc / len(train_data))

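losses.append(train_loss / len(train_data)): records the average training loss of the current epoch, averaged over batches.
acces.append(train_acc / len(train_data)): records the average training accuracy of the current epoch, averaged over batches.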
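
Averaging per-batch accuracies gives every batch the same weight, so it can differ slightly from the accuracy over all samples when the last batch is smaller. The sketch below uses made-up batch sizes and correct counts to show the gap; for MNIST with 938 training batches the difference is tiny.

```python
# Hypothetical epoch with three batches; the last one is half-sized.
batch_sizes = [64, 64, 32]
batch_correct = [64, 64, 16]

# What the training loop records: the mean of the per-batch accuracies.
mean_of_batch_acc = sum(c / n for c, n in zip(batch_correct, batch_sizes)) / len(batch_sizes)

# Accuracy over all samples: total correct divided by total seen.
overall_acc = sum(batch_correct) / sum(batch_sizes)

print(f"{mean_of_batch_acc:.4f} {overall_acc:.4f}")
```

If exact per-sample figures matter, accumulating num_correct and the sample count and dividing once at the end of the epoch avoids the discrepancy.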

2.9.6 Evaluating on the test set

    eval_loss = 0
    eval_acc = 0
    # No training on the test set
    for img, label in test_data:
        img = img.reshape(img.size(0), -1)
        img = Variable(img)
        label = Variable(label)

        out = net(img)
        loss = criterion(out, label)
        # Record the loss
        eval_loss += loss.item()

        _, pred = out.max(1)
        num_correct = (pred == label).sum().item()
        acc = num_correct / img.shape[0]

        eval_acc += acc
    eval_losses.append(eval_loss / len(test_data))
    eval_acces.append(eval_acc / len(test_data))

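eval_loss and eval_acc: initialize the running test loss and accuracy for each epoch.
for img, label in test_data: iterates over every batch of the test set.
img.reshape(img.size(0), -1): reshapes the image batch into a two-dimensional tensor.
Variable(img) and Variable(label): wrap the image and label data in PyTorch Variable objects.
out = net(img): passes the input images through the model to get the output.
loss = criterion(out, label): computes the loss between the output and the true labels.
eval_loss += loss.item(): accumulates the test loss of the current batch.
_, pred = out.max(1): takes the predicted class for each sample.
num_correct = (pred == label).sum().item(): counts the correctly predicted samples.
acc = num_correct / img.shape[0]: computes the accuracy of the current batch.
eval_acc += acc: accumulates the accuracy of the current batch.
eval_losses.append(eval_loss / len(test_data)): records the average test loss of the current epoch.
eval_acces.append(eval_acc / len(test_data)): records the average test accuracy of the current epoch.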
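
The evaluation loop above runs with the model still in training mode and with gradient tracking on. A common variant, sketched here as an alternative rather than as the original author's code, switches to evaluation mode with eval() and disables gradient tracking with torch.no_grad(). The evaluate helper, the single-layer model, and the random data are all hypothetical stand-ins so that the example runs without downloading MNIST.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion):
    """Return (mean loss per sample, accuracy) over a data loader."""
    model.eval()                      # evaluation mode (matters for dropout/batch norm)
    total_loss, total_correct, total_samples = 0.0, 0, 0
    with torch.no_grad():             # no gradients are needed for evaluation
        for img, label in loader:
            img = img.reshape(img.size(0), -1)
            out = model(img)
            # criterion returns the batch mean, so weight it by the batch size.
            total_loss += criterion(out, label).item() * img.size(0)
            total_correct += (out.argmax(dim=1) == label).sum().item()
            total_samples += img.size(0)
    return total_loss / total_samples, total_correct / total_samples


torch.manual_seed(0)
fake_images = torch.rand(100, 1, 28, 28)        # random stand-in for test images
fake_labels = torch.randint(0, 10, (100,))      # random stand-in for labels
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=32)
model = nn.Sequential(nn.Linear(784, 10))       # stand-in for the trained Net

loss, acc = evaluate(model, loader, nn.CrossEntropyLoss())
print(f"loss = {loss:.4f}, accuracy = {acc:.4f}")
```

For the network in this article, which has no dropout or batch normalization layers, eval() does not change the outputs, but no_grad() still saves memory and time. A training loop that calls a helper like this must switch back with net.train() before the next epoch, as the loop above already does.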

2.9.7 Printing the training and test results of each epoch

    print('[INFO] Epoch-{}: Train: Loss-{:.4f}, Accuracy-{:.4f} | Test: Loss-{:.4f}, Accuracy-{:.4f}'.format(
        epoch + 1, train_loss / len(train_data), train_acc / len(train_data), eval_loss / len(test_data),
        eval_acc / len(test_data)))

This prints the training and test loss and accuracy for the current epoch.

2.10 Visualizing the results

plt.figure()
plt.suptitle('Test', fontsize=12)
ax1 = plt.subplot(1, 2, 1)
ax1.plot(eval_losses, color='r')
ax1.plot(losses, color='b')
ax1.set_title('Loss', fontsize=10, color='black')
ax2 = plt.subplot(1, 2, 2)
ax2.plot(eval_acces, color='r')
ax2.plot(acces, color='b')
ax2.set_title('Acc', fontsize=10, color='black')
plt.show()

Complete code

import torch
import torch.nn as nn
import numpy as np
from torch import optim
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torchvision.datasets import mnist
from torchvision import transforms
import matplotlib.pyplot as plt


# Define the network structure
class Net(nn.Module):
    def __init__(self, in_c=784, out_c=10):
        super(Net, self).__init__()

        # Define the fully connected layers
        self.fc1 = nn.Linear(in_c, 512)
        # Define the activation layers
        self.act1 = nn.ReLU(inplace=True)

        self.fc2 = nn.Linear(512, 256)
        self.act2 = nn.ReLU(inplace=True)

        self.fc3 = nn.Linear(256, 128)
        self.act3 = nn.ReLU(inplace=True)

        self.fc4 = nn.Linear(128, out_c)

    def forward(self, x):
        x = self.act1(self.fc1(x))
        x = self.act2(self.fc2(x))
        x = self.act3(self.fc3(x))
        x = self.fc4(x)

        return x


# Build the network
net = Net()


# Prepare the datasets
# Training set
train_set = mnist.MNIST('./data', train=True, transform=transforms.ToTensor(), download=True)
# Test set
test_set = mnist.MNIST('./data', train=False, transform=transforms.ToTensor(), download=True)
# Training set loader
train_data = DataLoader(train_set, batch_size=64, shuffle=True)
# Test set loader
test_data = DataLoader(test_set, batch_size=128, shuffle=False)

# Visualize the data
import random
for i in range(4):
    ax = plt.subplot(2, 2, i+1)
    idx = random.randint(0, len(train_set) - 1)  # randint includes both endpoints
    digit_0 = train_set[idx][0].numpy()
    digit_0_image = digit_0.reshape(28, 28)
    ax.imshow(digit_0_image, interpolation="nearest")
    ax.set_title('label: {}'.format(train_set[idx][1]), fontsize=10, color='black')
plt.show()

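# Define the loss function: cross-entropy
criterion = nn.CrossEntropyLoss()

# Define the optimizer: stochastic gradient descent
optimizer = optim.SGD(net.parameters(), lr=1e-2, weight_decay=5e-4)

# Start training
# Record the training loss
losses = []
# Record the training accuracy
acces = []
# Record the test loss
eval_losses = []
# Record the test accuracy
eval_acces = []
# Set the number of epochs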
nums_epoch = 20
for epoch in range(nums_epoch):
    train_loss = 0
    train_acc = 0
    net = net.train()
    for batch, (img, label) in enumerate(train_data):
        img = img.reshape(img.size(0), -1)
        img = Variable(img)
        label = Variable(label)

        # Forward pass
        out = net(img)
        loss = criterion(out, label)
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Record the loss
        train_loss += loss.item()
        # Compute the classification accuracy
        _, pred = out.max(1)
        num_correct = (pred == label).sum().item()
        acc = num_correct / img.shape[0]

        if (batch + 1) % 200 ==0:
            print('[INFO] Epoch-{}-Batch-{}: Train: Loss-{:.4f}, Accuracy-{:.4f}'.format(epoch + 1,
                                                                                 batch+1,
                                                                                 loss.item(),
                                                                                 acc))
        train_acc += acc

    losses.append(train_loss / len(train_data))
    acces.append(train_acc / len(train_data))

    eval_loss = 0
    eval_acc = 0
    # No training on the test set
    for img, label in test_data:
        img = img.reshape(img.size(0),-1)
        img = Variable(img)
        label = Variable(label)

        out = net(img)
        loss = criterion(out, label)
        # Record the loss
        eval_loss += loss.item()

        _, pred = out.max(1)
        num_correct = (pred == label).sum().item()
        acc = num_correct / img.shape[0]

        eval_acc += acc
    eval_losses.append(eval_loss / len(test_data))
    eval_acces.append(eval_acc / len(test_data))

    print('[INFO] Epoch-{}: Train: Loss-{:.4f}, Accuracy-{:.4f} | Test: Loss-{:.4f}, Accuracy-{:.4f}'.format(
        epoch + 1, train_loss / len(train_data), train_acc / len(train_data), eval_loss / len(test_data),
        eval_acc / len(test_data)))


plt.figure()
plt.suptitle('Test', fontsize=12)
ax1 = plt.subplot(1, 2, 1)
ax1.plot(eval_losses, color='r')
ax1.plot(losses, color='b')
ax1.set_title('Loss', fontsize=10, color='black')
ax2 = plt.subplot(1, 2, 2)
ax2.plot(eval_acces, color='r')
ax2.plot(acces, color='b')
ax2.set_title('Acc', fontsize=10, color='black')
plt.show()