2--Weight Decay

pepsi_w

Last modified: 2022-08-11 10:39:27

Category: Deep Learning. Tags: deep learning, python

First published: 2022-08-11 10:08:22

Article link: https://blog.csdn.net/wangyumei0916/article/details/126278340

2.1 Weight Decay

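Weight decay is one of the most widely used regularization techniques, and it is also commonly called L2 regularization. It measures the complexity of a function by its distance from zero, because among all functions f, the function f = 0 (which maps every input to 0) is in some sense the simplest.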

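For a linear function $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x}$, complexity can be measured by a norm of the weight vector. To keep the weights small, that norm (usually the squared L2 norm) is added as a penalty term to the loss being minimized, so the objective becomes: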

$L(\mathbf{w}, b) + \frac{\lambda}{2} \|\mathbf{w}\|^2,$

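Here the regularization constant λ controls the trade-off between fitting the training data and keeping w small. It is a non-negative hyperparameter chosen using validation data. With λ = 0 the original loss is recovered; a small λ constrains w only weakly, while a large λ constrains w strongly.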

Accordingly, the minibatch stochastic gradient descent update for L2-regularized linear regression is:

$\begin{aligned} \mathbf{w} & \leftarrow \left(1- \eta\lambda \right) \mathbf{w} - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \mathbf{x}^{(i)} \left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right). \end{aligned}$
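The update above is just an ordinary gradient step on the penalized loss with the λ term pulled out as a shrink factor. The following is a minimal pure-Python sketch that checks this numerically on a toy batch. The helper names (`data_grad`, `step_penalized`, `step_weight_decay`) and the toy numbers are illustrative assumptions, not part of the original post, and the loss is assumed to be the batch mean of the halved squared error.

```python
def data_grad(w, b, X, y):
    """Gradient w.r.t. w of the mean loss 1/(2|B|) * sum_i (w.x_i + b - y_i)^2."""
    n = len(X)
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        err = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b - y_i
        for j, x_j in enumerate(x_i):
            grad[j] += x_j * err / n
    return grad


def step_penalized(w, b, X, y, lr, lambd):
    """Plain gradient step on L(w, b) + lambd/2 * ||w||^2."""
    g = data_grad(w, b, X, y)
    # The penalty lambd/2 * ||w||^2 contributes lambd * w to the gradient.
    return [w_j - lr * (g_j + lambd * w_j) for w_j, g_j in zip(w, g)]


def step_weight_decay(w, b, X, y, lr, lambd):
    """Same step written as in the formula: shrink w, then apply the data gradient."""
    g = data_grad(w, b, X, y)
    return [(1 - lr * lambd) * w_j - lr * g_j for w_j, g_j in zip(w, g)]


# Toy batch (hypothetical values chosen only for illustration).
w, b = [0.5, -1.0], 0.1
X = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
y = [1.0, 0.0, 2.0]

w_penalized = step_penalized(w, b, X, y, lr=0.1, lambd=3.0)
w_decay = step_weight_decay(w, b, X, y, lr=0.1, lambd=3.0)
print(w_penalized)
print(w_decay)
```

Both forms give the same new weights up to floating-point rounding, which is why the technique is called weight decay: each step first multiplies w by (1 - ηλ) before applying the usual data-driven update.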

2.2 Code Implementation

!pip install git+https://github.com/d2l-ai/d2l-zh@release  # installing d2l
!pip install matplotlib==3.0.0

%matplotlib inline
import torch
from torch import nn
from d2l import torch as d2l

n_train,n_test,num_inputs,batch_size = 20,100,200,5
# With such a small training set, a more complex model overfits more easily
true_w,true_b = torch.ones((num_inputs, 1))*0.01, 0.05
train_data = d2l.synthetic_data(true_w,true_b,n_train)
train_iter = d2l.load_array(train_data, batch_size)
test_data = d2l.synthetic_data(true_w,true_b,n_test)
test_iter = d2l.load_array(test_data, batch_size, is_train=False)

def init_params():
  w = torch.normal(0,1,size=(num_inputs,1),requires_grad=True)
  b = torch.zeros(1,requires_grad=True)
  return [w,b]

# Define the L2 norm penalty
def l2_penalty(w):
  # For an L1 penalty, use torch.sum(torch.abs(w)) instead
  return torch.sum(w.pow(2))/2


def train(lambd):
  w,b = init_params()
  # A lambda is an anonymous function, i.e. a function without a name.
  # For example, `def f(x): return x**2` and `g = lambda x: x**2` are equivalent, so f(4) == g(4).
  net, loss = lambda X:d2l.linreg(X,w,b),d2l.squared_loss
  num_epochs, lr = 100, 0.003
  animator = d2l.Animator(xlabel='epochs',ylabel='loss',yscale='log',
                          xlim=[5,num_epochs],legend=['train','test'])
  for epoch in range(num_epochs):
    for X,y in train_iter:
      l = loss(net(X),y)+ lambd*l2_penalty(w)
      l.sum().backward()
      d2l.sgd([w,b],lr,batch_size)
    if (epoch+1)%5 == 0:
      animator.add(epoch+1,(d2l.evaluate_loss(net,train_iter,loss),d2l.evaluate_loss(net,test_iter,loss)))
  print("L2 norm of w:", torch.norm(w).item())

train(lambd=0)  # regularization coefficient set to 0 (no weight decay)

train(lambd=10)  # regularization coefficient set to 10
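
The from-scratch version adds the penalty to the loss by hand. As a hedged alternative, the sketch below lets the optimizer apply the decay through the `weight_decay` option of `torch.optim.SGD`, using per-parameter groups so that only the weight is decayed and the bias is left alone. It does not use d2l: `make_data` and `train_concise` are hypothetical helpers written for this example, the data sizes and learning rate are copied from the code above, and the value `wd=3` is an arbitrary choice for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_data(w, b, n):
    """y = Xw + b + small Gaussian noise (a stand-in for d2l.synthetic_data)."""
    X = torch.normal(0, 1, (n, w.shape[0]))
    y = X @ w + b + torch.normal(0, 0.01, (n, 1))
    return X, y


def train_concise(wd, num_epochs=100, lr=0.003, seed=0):
    """Train a linear model with SGD and return the L2 norm of its weight."""
    torch.manual_seed(seed)
    num_inputs, n_train, batch_size = 200, 20, 5
    true_w, true_b = torch.ones((num_inputs, 1)) * 0.01, 0.05
    X, y = make_data(true_w, true_b, n_train)
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)

    net = nn.Linear(num_inputs, 1)
    nn.init.normal_(net.weight, mean=0.0, std=1.0)
    nn.init.zeros_(net.bias)

    # Decay only the weight; the bias group keeps the default weight_decay=0.
    optimizer = torch.optim.SGD(
        [{"params": [net.weight], "weight_decay": wd},
         {"params": [net.bias]}],
        lr=lr,
    )
    for _ in range(num_epochs):
        for Xb, yb in loader:
            optimizer.zero_grad()
            # Halved mean squared error, so the step matches the update formula.
            loss = 0.5 * ((net(Xb) - yb) ** 2).mean()
            loss.backward()
            optimizer.step()
    return net.weight.norm().item()


norm_plain = train_concise(wd=0)
norm_decay = train_concise(wd=3)
print("L2 norm of w without weight decay:", norm_plain)
print("L2 norm of w with weight decay:", norm_decay)
```

With the same seed, the run with weight decay should end with a noticeably smaller weight norm than the run without it, mirroring the comparison between `train(lambd=0)` and `train(lambd=10)`.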