Training a model is an iterative process: in each iteration (also called an epoch) the model makes a guess about the output, calculates the error of that guess (the loss), collects the derivatives of the error with respect to its parameters, and then optimizes those parameters via gradient descent. We start from the dataset, dataloader, and model definitions used earlier:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
training_data = datasets.FashionMNIST(
root="data",
train=True,
download=True,
transform=ToTensor()
)
test_data = datasets.FashionMNIST(
root="data",
train=False,
download=True,
transform=ToTensor()
)
train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, 512),
nn.ReLU(),
nn.Linear(512, 512),
nn.ReLU(),
nn.Linear(512, 10),
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork()
Hyperparameters
Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can affect how quickly the model trains and converges.
We define the following hyperparameters for training:
- Number of Epochs - the number of times to iterate over the dataset
- Batch Size - the number of data samples propagated through the network before the parameters are updated
- Learning Rate - how much to update the model parameters at each batch/epoch. Smaller values yield slower learning, while larger values may lead to unpredictable behavior during training.
learning_rate = 1e-3
batch_size = 64
epochs = 5
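Note that batch_size only takes effect through the DataLoader. The dataloaders above were created with a literal 64; if you want the batch_size variable to control batching, you would recreate them as in the following optional sketch (not part of the original code):
# Optional: rebuild the dataloaders so the batch_size hyperparameter is actually used
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)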
The Optimization Loop
Once we have set our hyperparameters, we can train and optimize our model in an optimization loop. Each iteration of the optimization loop is called an epoch.
Each epoch consists of two main parts:
- The train loop: iterate over the training dataset and try to converge to the optimal parameters.
- The validation/test loop: iterate over the test dataset to check whether model performance is improving (see the sketch below).
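To make this structure concrete, here is a minimal sketch of the epoch-level loop; it assumes the train_loop and test_loop functions defined later in this section, as well as the loss_fn and optimizer created below:
# Sketch only: one training pass and one evaluation pass per epoch
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)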
Loss Function
When presented with some training data, our untrained model is unlikely to give the correct answer. The loss function measures how far the obtained result deviates from the target value, and it is this loss that we want to minimize during training. To compute the loss, we make a prediction on a given data sample and compare it against the true label.
Common loss functions include nn.MSELoss (Mean Square Error) for regression tasks, nn.NLLLoss (Negative Log Likelihood) for classification, and nn.CrossEntropyLoss, which combines nn.LogSoftmax and nn.NLLLoss.
We pass the logits output by the model to nn.CrossEntropyLoss, which normalizes the logits and computes the prediction error.
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
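As a quick sanity check (not part of the original code), the following sketch shows that nn.CrossEntropyLoss applied to raw logits is equivalent to applying nn.LogSoftmax followed by nn.NLLLoss:
# Illustration only: CrossEntropyLoss == LogSoftmax + NLLLoss on the same logits
dummy_logits = torch.randn(3, 10)        # batch of 3 samples, 10 classes
dummy_labels = torch.tensor([1, 0, 4])   # arbitrary class indices
ce = nn.CrossEntropyLoss()(dummy_logits, dummy_labels)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(dummy_logits), dummy_labels)
print(torch.isclose(ce, nll))            # tensor(True)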
Optimizer
Optimization is the process of adjusting model parameters at each training step to reduce model error. Optimization algorithms define how this process is carried out. All of the optimization logic is encapsulated in the optimizer object. Here we use the SGD optimizer; beyond that, PyTorch provides many other optimizers, such as Adam and RMSProp, which work better for different kinds of models and data.
We initialize the optimizer by registering the model parameters that need to be trained and passing in the learning rate hyperparameter.
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
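If you want to experiment with a different optimizer, only this line needs to change; for example, Adam could be swapped in as follows (shown commented out, since the rest of this tutorial keeps using SGD):
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)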
Inside the training loop, optimization happens in three steps (a minimal sketch of a single step follows this list):
- Call optimizer.zero_grad() to reset the gradients of the model parameters. Gradients accumulate by default; to prevent double counting, we explicitly zero them at every iteration.
- Call loss.backward() to backpropagate the prediction error. PyTorch stores the gradient of the loss with respect to each parameter.
- Once we have the gradients, call optimizer.step() to adjust the parameters using the gradients collected in the backward pass.
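Put together, a single optimization step looks like the minimal sketch below; the full train_loop that follows simply wraps this in a loop over batches (there it performs zero_grad after the step instead of before the forward pass, which is equivalent):
# Sketch of one optimization step on a single batch (X, y)
optimizer.zero_grad()          # reset accumulated gradients
pred = model(X)                # forward pass
loss = loss_fn(pred, y)        # compute the prediction error
loss.backward()                # backpropagate: populate parameter gradients
optimizer.step()               # update parameters using those gradients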
Full Implementation
We define train_loop, which contains the optimization loop code, and test_loop, which evaluates the model's performance on the test data.
def train_loop(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
# Set the model to training mode - important for batch normalization and dropout layers
# Unnecessary in this situation but added for best practices
model.train()
for batch, (X, y) in enumerate(dataloader):
# Compute prediction and loss
pred = model(X)
loss = loss_fn(pred, y)
# Backpropagation
loss.backward()
optimizer.step()
optimizer.zero_grad()
if batch % 100 == 0:
loss, current = loss.item(), (batch + 1) * len(X)
print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
def test_loop(dataloader, model, loss_fn):
# Set the model to evaluation mode - important for batch normalization and dropout layers
# Unnecessary in this situation but added for best practices
model.eval()
size = len(dataloader.dataset)
num_batches = len(dataloader)
test_loss, correct = 0, 0
# Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
# also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
with torch.no_grad():
for X, y in dataloader:
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
Complete Code
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
training_data = datasets.FashionMNIST(
root='data',
train=True,
download=True,
transform=ToTensor()
)
test_data = datasets.FashionMNIST(
root='data',
train=False,
download=True,
transform=ToTensor()
)
train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)
        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss = 0
    correct = 0
    # Disable gradient tracking during evaluation
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
model = NeuralNetwork()
learning_rate = 0.001
batch_size = 64
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")