L2 Norm Regularization
L2 regularization mitigates overfitting by adding a "penalty term" to the loss function. The penalty shrinks the magnitude of the weights w, which lowers the model's complexity: the smaller the weights, the fewer "wiggles" the fitted function has, i.e. the simpler the fitted function.
The new loss function with the L2-norm penalty term is:

    L_new(w, b) = L(w, b) + (lambd / 2) * ||w||^2

where lambd is a hyperparameter. When lambd is 0, the penalty term vanishes and the loss is identical to the original one; as lambd tends to infinity, w is driven toward 0.

Accordingly, under stochastic gradient descent each update of w becomes (usually learning_rate × lambd < 1):

    w ← (1 − learning_rate × lambd) * w − learning_rate * ∂L/∂w

Because the factor (1 − learning_rate × lambd) is less than 1, every step shrinks w toward zero, which is why this is also called "weight decay".
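The shrinking effect of this update can be checked with a tiny numeric sketch (the values below are made up purely for illustration):

```python
import torch

# Hypothetical numbers, only to illustrate the update rule.
lr, lambd = 0.1, 3.0           # learning_rate * lambd = 0.3 < 1
w = torch.tensor([2.0])        # current weight
grad = torch.tensor([0.5])     # assumed gradient of the unpenalized loss
# w <- w - lr * (grad + lambd * w) = (1 - lr * lambd) * w - lr * grad
w_new = (1 - lr * lambd) * w - lr * grad
print(w_new)  # tensor([1.3500]): the weight decays toward zero
```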
Implementation
import torch
import torch.nn as nn
from torch.utils import data
from d2l import torch as d2l
import plotly.graph_objects as go
import plotly.offline as of
# Create the datasets
true_w = torch.ones(200) * 0.01  # many features, few samples: easy to overfit
true_b = 0.05
features, labels = d2l.synthetic_data(true_w, true_b, 20)       # 20 training samples
features_1, labels_1 = d2l.synthetic_data(true_w, true_b, 100)  # 100 test samples
# Load the datasets
def load_array(data_arrays, batch_size, is_train=True):
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 5
data_iter = load_array((features, labels), batch_size)
test_iter = load_array((features_1, labels_1), 100)  # test data loader
# Define the network
net = nn.Sequential(nn.Linear(200, 1))
nn.init.normal_(net[0].weight, std=0.01)
# Define the loss function
loss = nn.MSELoss()
# Define the optimizer
trainer = torch.optim.SGD(net.parameters(), lr=0.003)
# Train
def l2_penalty(w):
    return torch.sum(w.pow(2)) / 2

epochs = 100  # small sample size plus many epochs makes overfitting easy
lambd = 0
train_loss = []
test_loss = []
for epoch in range(epochs):
    for X, y in data_iter:
        # the objective now includes the penalty term
        l = loss(net(X), y) + lambd * l2_penalty(net[0].weight)
        trainer.zero_grad()
        l.backward()
        trainer.step()
    # evaluate once per epoch on the full training and test sets
    data_iter_2 = load_array((features, labels), batch_size=20)
    for X, y in data_iter_2:
        train_loss.append(loss(net(X), y).item())
    for X, y in test_iter:
        test_loss.append(loss(net(X), y).item())
# Plot the losses
e = list(range(epochs))
line1 = go.Scatter(x=e, y=train_loss, name='train')
line2 = go.Scatter(x=e, y=test_loss, name='test')
fig = go.Figure([line1, line2])
fig.update_layout(title='Lambd:' + str(lambd), xaxis_title='epoch', yaxis_title='loss')
of.plot(fig)
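As an alternative to adding l2_penalty to the loss by hand, torch.optim.SGD has a built-in weight_decay argument that applies the same lambd * w term inside the optimizer. Note that it decays every parameter passed to it, bias included, unlike the manual version above, which penalizes only net[0].weight. A one-parameter sketch of the built-in decay:

```python
import torch

# The gradient is set to zero, so the whole update comes from
# weight_decay alone: w <- w - lr * weight_decay * w.
w = torch.nn.Parameter(torch.tensor([2.0]))
opt = torch.optim.SGD([w], lr=0.1, weight_decay=3.0)
w.grad = torch.zeros(1)
opt.step()
print(w.data)  # tensor([1.4000])
```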
[Figure: loss curves with Lambd = 0]
[Figure: loss curves with Lambd = 3]
As the figures above show, with Lambd = 0 (equivalent to no weight decay) the model overfits: even though the training loss has become very small, the test loss barely changes. With Lambd = 3, the penalty takes effect and the test loss tracks the training loss more closely.