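Underfitting means the model cannot reach a low error even on the training set, whereas overfitting means the training error is far lower than the error on the test set. Note that the two are not mutually exclusive: a model can show a high training error and an even higher test error at the same time. Both are problems we need to address when debugging a model.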
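To make the two definitions concrete, here is a minimal sketch that fits polynomials of different degrees to noisy samples of a cubic curve and compares training and test error. The data-generating function, the sample sizes, and the degrees are assumptions chosen only for illustration. A straight line should underfit (high error on both sets), while a high-degree polynomial fitted to only 20 points will typically push the training error down without a matching drop in test error.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Sample n points from a cubic curve with Gaussian noise (illustrative data)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 5.0 + 1.2 * x - 3.4 * x**2 + 5.6 * x**3 + rng.normal(0.0, 0.1, size=n)
    return x, y

x_train, y_train = make_data(20)    # small training set
x_test, y_test = make_data(200)     # larger held-out set

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

errors = {}
for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    errors[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree:2d}: train MSE {errors[degree][0]:.4f}, "
          f"test MSE {errors[degree][1]:.4f}")
```

Comparing the gap between the two numbers in each row is the practical check: a large error in both columns points to underfitting, a small training error with a much larger test error points to overfitting.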
Two common remedies for overfitting are weight decay and dropout.
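Weight decay is equivalent to L2-norm regularization: a penalty term is added to the model's loss function so that the learned parameter values stay small, which is a common way to counter overfitting.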
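Written out, the idea looks roughly as follows. The symbols are assumptions introduced here for illustration: $\ell(\mathbf{w}, b)$ is the unpenalized minibatch loss, $\lambda \ge 0$ is the penalty coefficient, and $\eta$ is the learning rate. The factor $1/2$ is a convention that makes the gradient of the penalty exactly $\lambda \mathbf{w}$.

```latex
% Penalized objective: original loss plus an L2 penalty on the weights only
L(\mathbf{w}, b) = \ell(\mathbf{w}, b) + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2

% One SGD step on the penalized objective
\mathbf{w} \leftarrow \mathbf{w} - \eta \left( \nabla_{\mathbf{w}} \ell(\mathbf{w}, b) + \lambda \mathbf{w} \right)
           = (1 - \eta \lambda)\, \mathbf{w} - \eta \nabla_{\mathbf{w}} \ell(\mathbf{w}, b)
```

The factor $(1 - \eta\lambda)$ shrinks the weights at every step before the usual gradient update, which is where the name "weight decay" comes from. With $\lambda = 0$ the update reduces to plain SGD.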
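import torch
from torch import nn

# num_inputs, lr, num_epochs, loss, train_iter, train_features, train_labels,
# test_features and test_labels are assumed to be defined earlier.
def fit_and_plot_pytorch(wd):
    net = nn.Linear(num_inputs, 1)
    nn.init.normal_(net.weight, mean=0, std=1)
    nn.init.normal_(net.bias, mean=0, std=1)
    # weight_decay is the penalty coefficient lambda; it is applied to the weight only
    optimizer_w = torch.optim.SGD(params=[net.weight], lr=lr, weight_decay=wd)
    # the bias gets its own optimizer without weight decay
    optimizer_b = torch.optim.SGD(params=[net.bias], lr=lr)
    # per-epoch losses, e.g. for plotting afterwards
    train_ls, test_ls = [], []
    for _ in range(num_epochs):
        for X, y in train_iter:
            l = loss(net(X), y).mean()
            optimizer_w.zero_grad()
            optimizer_b.zero_grad()
            l.backward()
            # step both optimizers so that weight and bias are each updated
            optimizer_w.step()
            optimizer_b.step()
        # record the loss on the full training and test sets once per epoch
        train_ls.append(loss(net(train_features), train_labels).mean().item())
        test_ls.append(loss(net(test_features), test_labels).mean().item())
    print('L2 norm of w:', net.weight.data.norm().item())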
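As a sanity check on what `weight_decay` does in `torch.optim.SGD`, the sketch below compares one optimizer step with `weight_decay=wd` against one step on a loss that carries an explicit penalty of `wd / 2 * ||w||^2`. The toy data, the layer sizes, and the values of `lr` and `wd` are assumptions made up for this illustration. For plain SGD the two updates should coincide up to floating-point error.

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(8, 5)   # toy inputs (illustrative)
y = torch.randn(8, 1)   # toy targets (illustrative)
lr, wd = 0.1, 3.0

# Two linear layers that start from identical parameters
net_a = nn.Linear(5, 1)
net_b = nn.Linear(5, 1)
net_b.load_state_dict(net_a.state_dict())

mse = nn.MSELoss()

# (a) Let the optimizer decay the weight; the bias group keeps the default of 0
opt_a = torch.optim.SGD(
    [
        {"params": [net_a.weight], "weight_decay": wd},
        {"params": [net_a.bias]},
    ],
    lr=lr,
)
opt_a.zero_grad()
mse(net_a(X), y).backward()
opt_a.step()

# (b) No weight_decay in the optimizer; add the L2 penalty to the loss instead
opt_b = torch.optim.SGD(net_b.parameters(), lr=lr)
opt_b.zero_grad()
penalty = wd / 2 * net_b.weight.pow(2).sum()
(mse(net_b(X), y) + penalty).backward()
opt_b.step()

same = torch.allclose(net_a.weight, net_b.weight, atol=1e-6)
print("updates match:", same)
```

Variant (a) uses a single optimizer with per-parameter options rather than the two optimizers in the function above; both approaches achieve the same goal of decaying the weight but not the bias.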
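Dropout randomly drops a fraction of the hidden units during training, which keeps the network from relying too heavily on any single unit and thereby acts as a regularizer. Dropout is not applied when the model is being tested.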
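The mechanism can be sketched from scratch in a few lines. The helper `dropout` below is a hypothetical function written for this note, not a library API. It also rescales the surviving elements by `1 / (1 - drop_prob)`, which the text above does not mention; this is the same scaling `nn.Dropout` applies during training, so the expected value of each element stays unchanged.

```python
import torch

def dropout(X, drop_prob):
    """Zero each element with probability drop_prob and rescale the survivors."""
    assert 0 <= drop_prob <= 1
    keep_prob = 1 - drop_prob
    if keep_prob == 0:
        # Everything is dropped; avoid dividing by zero
        return torch.zeros_like(X)
    # mask is 1 where the element is kept, 0 where it is dropped
    mask = (torch.rand(X.shape) < keep_prob).float()
    return mask * X / keep_prob

torch.manual_seed(0)
X = torch.ones(1000, 100)
out = dropout(X, 0.5)
print("fraction zeroed:", (out == 0).float().mean().item())
print("mean after dropout:", out.mean().item())
```

With `drop_prob=0.5` roughly half of the entries become zero and the rest become 2, so the mean stays close to 1.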
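# d2l is the helper module these notes rely on (FlattenLayer, train_ch3);
# num_inputs, num_hiddens1, num_hiddens2, drop_prob1, drop_prob2, train_iter,
# test_iter, loss, num_epochs and batch_size are assumed to be defined earlier.
net = nn.Sequential(
    d2l.FlattenLayer(),
    nn.Linear(num_inputs, num_hiddens1),
    nn.ReLU(),
    nn.Dropout(drop_prob1),  # dropout after the first hidden layer
    nn.Linear(num_hiddens1, num_hiddens2),
    nn.ReLU(),
    nn.Dropout(drop_prob2),  # dropout after the second hidden layer
    nn.Linear(num_hiddens2, 10)
)

# initialize every parameter from a normal distribution with std 0.01
for param in net.parameters():
    nn.init.normal_(param, mean=0, std=0.01)

optimizer = torch.optim.SGD(net.parameters(), lr=0.5)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)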
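The remark that dropout is not applied at test time corresponds, in PyTorch, to the module's mode: `nn.Dropout` is active after `.train()` and becomes the identity after `.eval()`. The small sketch below shows this on a standalone layer; the tensor shape and `p=0.5` are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(4, 6)

layer.train()            # training mode: elements are dropped and survivors rescaled
train_out = layer(x)

layer.eval()             # evaluation mode: dropout is switched off
eval_out = layer(x)

print(train_out)
print("eval output equals input:", torch.equal(eval_out, x))
```

For a whole model such as the `nn.Sequential` above, calling `net.eval()` before computing test accuracy and `net.train()` before resuming training switches every dropout layer in the same way.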