What Is Weight Decay in Deep Learning?

Weight decay is a commonly used method for dealing with overfitting.

Method

Weight decay is equivalent to $L_2$-norm regularization. Regularization adds a penalty term to the model's loss function so that the learned parameter values stay small; it is a common way to counter overfitting.

$L_2$-norm regularization adds an $L_2$-norm penalty term to the model's original loss function, and the resulting sum is the function that training minimizes. The $L_2$-norm penalty term is the product of a positive constant and the sum of the squares of all elements of the model's weight parameters. Take linear regression as an example, with the loss function

$$\ell(w_1, w_2, b) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{2}\left(x_1^{(i)} w_1 + x_2^{(i)} w_2 + b - y^{(i)}\right)^2,$$

where $w_1, w_2$ are the weight parameters, $b$ is the bias parameter, the inputs of sample $i$ are $x_1^{(i)}, x_2^{(i)}$, its label is $y^{(i)}$, and $n$ is the number of samples. Writing the weight parameters as the vector $w = [w_1, w_2]$, the new loss function with the $L_2$-norm penalty term is

$$\ell(w_1, w_2, b) + \frac{\lambda}{2n}\|w\|^2,$$

where the hyperparameter $\lambda > 0$. When all weight parameters are 0, the penalty term is smallest. When $\lambda$ is large, the penalty term carries a large share of the loss, which usually drives the elements of the learned weight parameters close to 0. When $\lambda$ is set to 0, the penalty term has no effect at all. Expanding the squared $L_2$ norm $\|w\|^2$ gives $w_1^2 + w_2^2$. With the $L_2$-norm penalty term, the updates of the weights $w_1$ and $w_2$ in minibatch stochastic gradient descent become

$$w_1 \leftarrow \left(1 - \frac{\eta\lambda}{|\mathcal{B}|}\right) w_1 - \frac{\eta}{|\mathcal{B}|}\sum_{i \in \mathcal{B}} x_1^{(i)}\left(x_1^{(i)} w_1 + x_2^{(i)} w_2 + b - y^{(i)}\right),$$

$$w_2 \leftarrow \left(1 - \frac{\eta\lambda}{|\mathcal{B}|}\right) w_2 - \frac{\eta}{|\mathcal{B}|}\sum_{i \in \mathcal{B}} x_2^{(i)}\left(x_1^{(i)} w_1 + x_2^{(i)} w_2 + b - y^{(i)}\right),$$

where $\eta$ is the learning rate and $\mathcal{B}$ is the minibatch.

$L_2$-norm regularization thus makes the weights $w_1$ and $w_2$ first multiply themselves by a number smaller than 1 and then subtract the gradient that does not contain the penalty term. This is why $L_2$-norm regularization is also called weight decay. By penalizing model parameters with large absolute values, weight decay imposes a constraint on the model to be learned, which can be effective against overfitting. In practice, the sum of the squares of the bias elements is sometimes also added to the penalty term.
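This "shrink first, then take an ordinary gradient step" reading can be checked numerically. Below is a minimal sketch using MXNet's NDArray and autograd; the toy values of X, y, w, lr and lambd are made up purely for the check:

from mxnet import autograd, nd

# Toy data and parameters (hypothetical values, only for this check)
X = nd.array([[1.0, 2.0], [3.0, 4.0]])
y = nd.array([[1.0], [2.0]])
w = nd.array([[0.5], [-0.3]])
w.attach_grad()
b = nd.zeros((1,))
lr, lambd = 0.1, 0.5

# Gradient of the L2-penalized squared loss
with autograd.record():
    l = ((nd.dot(X, w) + b - y) ** 2).sum() / 2 + lambd * (w ** 2).sum() / 2
l.backward()
w_penalized_step = w - lr * w.grad

# The same step written as weight decay: shrink w, then subtract the
# gradient of the unpenalized loss
data_grad = nd.dot(X.T, nd.dot(X, w) + b - y)
w_decay_step = (1 - lr * lambd) * w - lr * data_grad

print(w_penalized_step)  # the two results should agree up to floating-point error
print(w_decay_step)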

High-Dimensional Linear Regression

We use the following function to generate the sample labels:

$$y = 0.05 + \sum_{i=1}^{p} 0.01 x_i + \varepsilon,$$

where the noise term $\varepsilon$ follows a normal distribution with mean 0 and standard deviation 0.01, and $p$ is the dimensionality of the input.


import gluonbook as gb
from mxnet import autograd, gluon, init, nd
from mxnet.gluon import data as gdata, loss as gloss

# Generate a high-dimensional dataset: 200 input features but only 20
# training examples, so the model overfits easily
n_train, n_test, num_inputs = 20, 100, 200
true_w, true_b = nd.ones((num_inputs, 1)) * 0.01, 0.05
features = nd.random.normal(shape=(n_train + n_test, num_inputs))
labels = nd.dot(features, true_w) + true_b
labels += nd.random.normal(scale=0.01, shape=labels.shape)
train_features, test_features = features[:n_train, :], features[n_train:, :]
train_labels, test_labels = labels[:n_train], labels[n_train:]

Implementation from Scratch


import gluonbook as gb
from mxnet import autograd, gluon, init, nd
from mxnet.gluon import data as gdata, loss as gloss
import matplotlib.pyplot as plt

n_train, n_test, num_inputs = 20, 100, 200
true_w, true_b = nd.ones((num_inputs, 1)) * 0.01, 0.05
features = nd.random.normal(shape=(n_train + n_test, num_inputs))
labels = nd.dot(features, true_w) + true_b
labels += nd.random.normal(scale=0.01, shape=labels.shape)
train_features, test_features = features[:n_train, :], features[n_train:, :]
train_labels, test_labels = labels[:n_train], labels[n_train:]

# Initialize the model parameters and attach gradients
def init_params():
    w = nd.random.normal(scale=1, shape=(num_inputs, 1))
    b = nd.zeros(shape=(1,))
    w.attach_grad()
    b.attach_grad()
    return [w, b]

# Define the L2 norm penalty term
def l2_penalty(w):
    return (w ** 2).sum() / 2

# Define training and testing
batch_size, num_epochs, lr = 1, 100, 0.003
net, loss = gb.linreg, gb.squared_loss
train_iter = gdata.DataLoader(gdata.ArrayDataset(train_features, train_labels),
                              batch_size, shuffle=True)

def semilogy(x_vals, y_vals, x_label, y_label, x2_vals=None, y2_vals=None,
             legend=None, figsize=(5.5, 2.5)):
    plt.rcParams['figure.figsize'] = figsize
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.semilogy(x_vals, y_vals)
    if x2_vals and y2_vals:
        plt.semilogy(x2_vals, y2_vals, linestyle=':')
        plt.legend(legend)
    plt.show()

def fit_and_plot(lambd):
    w, b = init_params()
    train_ls, test_ls = [], []
    for _ in range(num_epochs):
        for X, y in train_iter:
            with autograd.record():
                # Add the L2 norm penalty term to the loss
                l = loss(net(X, w, b), y) + lambd * l2_penalty(w)
            l.backward()
            gb.sgd([w, b], lr, batch_size)
        train_ls.append(loss(net(train_features, w, b),
                             train_labels).mean().asscalar())
        test_ls.append(loss(net(test_features, w, b),
                            test_labels).mean().asscalar())
    semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'loss',
             range(1, num_epochs + 1), test_ls, ['train', 'test'])
    print('L2 norm of w:', w.norm().asscalar())

# Without weight decay
fit_and_plot(lambd=0)
# With weight decay
# fit_and_plot(lambd=3)

Without weight decay (lambd=0), the error on the training set is far smaller than the error on the test set, a clear sign of overfitting:

L2 norm of w: 11.61194

With weight decay (lambd=3), the training error goes up, but the error on the test set goes down. Overfitting is alleviated to some degree, and the weight parameters end up much closer to 0:

L2 norm of w: 0.046675965
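The claim that a larger penalty drives the weights closer to 0 can be seen directly by rerunning the experiment for several penalty strengths. Below is a minimal sketch that reuses the fit_and_plot defined above; the particular lambd values are arbitrary:

# Sweep a few penalty strengths and compare the learned weight norms
for lambd in [0, 1, 3, 10]:
    print('lambd =', lambd)
    fit_and_plot(lambd)  # prints the L2 norm of w and plots the loss curves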

Gluon Implementation of Weight Decay


import gluonbook as gb
from mxnet import autograd, gluon, init, nd
from mxnet.gluon import data as gdata, loss as gloss, nn
import matplotlib.pyplot as plt

n_train, n_test, num_inputs = 20, 100, 200
true_w, true_b = nd.ones((num_inputs, 1)) * 0.01, 0.05
features = nd.random.normal(shape=(n_train + n_test, num_inputs))
labels = nd.dot(features, true_w) + true_b
labels += nd.random.normal(scale=0.01, shape=labels.shape)
train_features, test_features = features[:n_train, :], features[n_train:, :]
train_labels, test_labels = labels[:n_train], labels[n_train:]

def semilogy(x_vals, y_vals, x_label, y_label, x2_vals=None, y2_vals=None,
             legend=None, figsize=(4.5, 2.5)):
    plt.rcParams['figure.figsize'] = figsize
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.semilogy(x_vals, y_vals)
    if x2_vals and y2_vals:
        plt.semilogy(x2_vals, y2_vals, linestyle=':')
        plt.legend(legend)
    plt.show()

# Define training and testing
batch_size, num_epochs, lr = 1, 100, 0.003
net, loss = gb.linreg, gb.squared_loss
train_iter = gdata.DataLoader(gdata.ArrayDataset(train_features, train_labels),
                              batch_size, shuffle=True)

def fit_and_plot(wd):
    net = nn.Sequential()
    net.add(nn.Dense(1))
    net.initialize(init.Normal(sigma=1))
    # Apply weight decay to the weight parameters only; their names
    # generally end with "weight"
    train_w = gluon.Trainer(net.collect_params('.*weight'), 'sgd',
                            {'learning_rate': lr, 'wd': wd})
    train_b = gluon.Trainer(net.collect_params('.*bias'), 'sgd',
                            {'learning_rate': lr})
    train_ls, test_ls = [], []
    for _ in range(num_epochs):
        for X, y in train_iter:
            with autograd.record():
                l = loss(net(X), y)
            l.backward()
            # Call step on each Trainer separately, updating the weights
            # and the bias respectively
            train_b.step(batch_size)
            train_w.step(batch_size)
        train_ls.append(loss(net(train_features),
                             train_labels).mean().asscalar())
        test_ls.append(loss(net(test_features),
                            test_labels).mean().asscalar())
    semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'loss',
             range(1, num_epochs + 1), test_ls, ['train', 'test'])
    print('L2 norm of w:', net[0].weight.data().norm().asscalar())

# Without weight decay
fit_and_plot(wd=0)
# With weight decay
# fit_and_plot(wd=3)
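As a variation on the two-Trainer setup above, weight decay can also be applied through a single Trainer by passing the 'wd' option for all parameters and turning it off for the bias via the parameter's wd_mult attribute. The sketch below is only an alternative illustration, not part of the original example: the helper name fit_and_plot_single_trainer is made up, and it reuses nn, gluon, init, autograd, loss, train_iter, lr, batch_size and num_epochs from the listing above:

# Alternative sketch (hypothetical helper, not in gluonbook):
# a single Trainer with weight decay, bias excluded via wd_mult
def fit_and_plot_single_trainer(wd):
    net = nn.Sequential()
    net.add(nn.Dense(1))
    net.initialize(init.Normal(sigma=1))
    # Exclude the bias from decay by setting its weight-decay multiplier to 0
    net.collect_params('.*bias').setattr('wd_mult', 0)
    # 'wd' makes the optimizer apply L2 weight decay during each update
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': lr, 'wd': wd})
    for _ in range(num_epochs):
        for X, y in train_iter:
            with autograd.record():
                l = loss(net(X), y)
            l.backward()
            trainer.step(batch_size)
    print('L2 norm of w:', net[0].weight.data().norm().asscalar())

# fit_and_plot_single_trainer(wd=3)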
