Video link: 48、快速复现PyTorch的Weight Normalization_哔哩哔哩_bilibili
Official API: torch.nn.utils.weight_norm
Original paper: Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
Theory
- Weight normalization is used in reinforcement learning and in generative adversarial networks to make training more stable.
- torch.nn.utils.weight_norm(module, name='weight', dim=0)
  In the official PyTorch API, weight_norm is a function rather than a class; it takes a module as its argument.
- The formula is:
  $\mathbf{w} = g \frac{\mathbf{v}}{\left\| \mathbf{v} \right\|}$
  where $\mathbf{w}$ is the module's weight parameter and $g$ is the magnitude of $\mathbf{w}$, i.e. its norm. $\mathbf{v}$ is in fact just $\mathbf{w}$, so $\frac{\mathbf{v}}{\left\| \mathbf{v} \right\|}$ is the unit-length direction vector. The operation is best seen as a decomposition of the weight matrix.
- Without weight norm, the module optimizes a single parameter; with weight norm, it optimizes two parameters at the same time, computing a separate gradient for $g$ and for $\mathbf{v}$ (see the sketch below).
- Weight norm introduces no extra parameters in any substantive sense, and the module's output stays unchanged.
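The decomposition and its two gradients can be sketched directly with autograd. This is a minimal illustration, not the PyTorch API itself; g and v below are hand-made stand-ins for the reparameterized weight:

import torch

# Sketch of w = g * v / ||v|| with dim=0: one magnitude per output row.
torch.manual_seed(0)
v = torch.randn(4, 3, requires_grad=True)                   # direction parameter v
g = v.norm(dim=1, keepdim=True).detach().requires_grad_()   # magnitude g, initialized to ||v||
w = g * v / v.norm(dim=1, keepdim=True)                     # reconstructed weight; equals v at initialization

w.sum().backward()
print(g.grad.shape, v.grad.shape)  # torch.Size([4, 1]) torch.Size([4, 3]): one gradient each for g and v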
Code
Fully connected layer
First, use a fully connected layer as the module.
import torch
import torch.nn as nn
batch_size = 2
feat_dim = 3
hid_dim = 4
inputx = torch.randn(batch_size, feat_dim)          # 2-D tensor, shape [2, 3]
linear = nn.Linear(feat_dim, hid_dim, bias=False)   # weight shape [4, 3]
wn_linear = torch.nn.utils.weight_norm(linear)      # official API
Compute the magnitude and the unit direction vector of the linear layer's weight matrix. The magnitude of each row vector is its L2 norm.
weight_magnitude = torch.tensor([linear.weight[i, :].norm() for i in torch.arange(linear.weight.shape[0])], dtype=torch.float32).unsqueeze(-1)
weight_direction = linear.weight / weight_magnitude
# v in the formula is just w: v / ||v|| = w / ||w||
Print the relevant values:
print("weight_magnitude:") # 相当于公式中的g
print(weight_magnitude)
print("weight_direction:") # 相当于公式中的 v / ||v||
print(weight_direction)
print("magnitude of weight_direction:")
print((weight_direction ** 2).sum(dim=-1)) # 每一行元素的平方和为1
The output is:
weight_magnitude:
tensor([[0.3865],
[0.6001],
[0.4221],
[0.7440]]) # [4,1]
weight_direction:
tensor([[ 0.7945, 0.1528, 0.5877],
[-0.9337, 0.3558, -0.0405],
[ 0.8495, 0.0206, -0.5273],
[-0.7468, 0.6474, 0.1521]], grad_fn=<DivBackward0>) # [4,3]
magnitude of weight_direction:
tensor([1.0000, 1.0000, 1.0000, 1.0000], grad_fn=<SumBackward1>) # [4]
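As a side note, the same magnitudes can be computed without the list comprehension via the tensor norm along dim=1; this is an equivalent sketch, not part of the original walkthrough:

weight_magnitude_alt = linear.weight.norm(dim=1, keepdim=True)  # row-wise L2 norm, shape [4, 1]
print(torch.allclose(weight_magnitude, weight_magnitude_alt))   # True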
1. Verify the formula, i.e. linear.weight = weight_direction * weight_magnitude:
print("linear.weight:")
print(linear.weight)
print("weight_direction * weight_magnitude:")
print(weight_direction * weight_magnitude)
The two printed tensors are identical, confirming the formula:
linear.weight:
tensor([[ 0.3071, 0.0591, 0.2272],
[-0.5603, 0.2135, -0.0243],
[ 0.3585, 0.0087, -0.2225],
[-0.5556, 0.4817, 0.1132]], grad_fn=<MulBackward0>) # [4,3]
weight_direction * weight_magnitude:
tensor([[ 0.3071, 0.0591, 0.2272],
[-0.5603, 0.2135, -0.0243],
[ 0.3585, 0.0087, -0.2225],
[-0.5556, 0.4817, 0.1132]], grad_fn=<MulBackward0>) # [4,1]*[4,3]->[4,3]
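Rather than comparing the printouts by eye, the equality can also be checked programmatically (assuming the variables defined above):

print(torch.allclose(linear.weight, weight_direction * weight_magnitude))  # True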
2. Verify another claim: the weight-normalized module does not change the original module's output.
print("linear(inputx):") # linear 和 wn_linear 的输出值相同
print(linear(inputx))
print("wn_linear(inputx):")
print(wn_linear(inputx))
The outputs of linear and wn_linear are identical, so the claim holds.
linear(inputx):
tensor([[ 0.2138, 0.3498, -0.6853, 0.6026],
[ 0.2718, 0.2176, -0.5267, 0.4888]], grad_fn=<MmBackward0>) # [2,4]
wn_linear(inputx):
tensor([[ 0.2138, 0.3498, -0.6853, 0.6026],
[ 0.2718, 0.2176, -0.5267, 0.4888]], grad_fn=<MmBackward0>)
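The outputs match because weight_norm registers a forward pre-hook that recomputes module.weight from weight_g and weight_v before every forward pass. A small check of that recomputation, assuming the tensors above:

recomputed = wn_linear.weight_g * wn_linear.weight_v / wn_linear.weight_v.norm(dim=1, keepdim=True)
print(torch.allclose(wn_linear(inputx), inputx @ recomputed.t()))  # True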
3. Print the parameters of the weight-normalized fully connected layer:
print("parameters of wn_linear:")
for n, p in wn_linear.named_parameters():
    print(n, p)
The output is:
parameters of wn_linear:
weight_g Parameter containing:
tensor([[0.3865],
[0.6001],
[0.4221],
[0.7440]], requires_grad=True) # [4,1]
weight_v Parameter containing:
tensor([[ 0.3071, 0.0591, 0.2272],
[-0.5603, 0.2135, -0.0243],
[ 0.3585, 0.0087, -0.2225],
[-0.5556, 0.4817, 0.1132]], requires_grad=True) # [4,3]
As the output shows, wn_linear contains two parameters, weight_g and weight_v. weight_g matches the weight_magnitude computed earlier, and weight_v matches linear.weight, i.e. v in the formula is just w.
4. Using the parameters of the weight-normalized linear layer wn_linear, reconstruct the original weight linear.weight from the formula:
print("construct weight of linear:")
print(wn_linear.weight_g * (wn_linear.weight_v / torch.tensor([wn_linear.weight_v[i, :].norm() for i in torch.arange(wn_linear.weight_v.shape[0])], dtype=torch.float32).unsqueeze(-1)))
The output is:
construct weight of linear:
tensor([[ 0.3071, 0.0591, 0.2272],
[-0.5603, 0.2135, -0.0243],
[ 0.3585, 0.0087, -0.2225],
[-0.5556, 0.4817, 0.1132]], grad_fn=<MulBackward0>)
The result matches the original linear layer's weight linear.weight.
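For completeness, the reparameterization can be undone with torch.nn.utils.remove_weight_norm, which folds weight_g and weight_v back into a single weight parameter (shown here on the wn_linear above):

torch.nn.utils.remove_weight_norm(wn_linear)          # folds g and v back into .weight
print([n for n, _ in wn_linear.named_parameters()])   # ['weight']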
Convolutional layer
Use a 1-D convolutional layer as the module.
# instantiate a 1x1 convolution, which is equivalent to an MLP
conv1d = nn.Conv1d(feat_dim, hid_dim, kernel_size=1, bias=False)  # weight shape [4, 3, 1]
wn_conv1d = torch.nn.utils.weight_norm(conv1d)
Compute the magnitude and the unit direction vector of the conv1d layer's weight. Here the magnitude of each output channel's weight slice is its L2 norm.
conv1d_weight_magnitude = torch.tensor([conv1d.weight[i, :, :].norm() for i in torch.arange(conv1d.weight.shape[0])], dtype=torch.float32).reshape(conv1d.weight.shape[0], 1, 1)
conv1d_weight_direction = conv1d.weight / conv1d_weight_magnitude
Print these values:
print("conv1d_weight_magnitude:")
print(conv1d_weight_magnitude)
print("conv1d_weight_direction:")
print(conv1d_weight_direction)
The output is:
conv1d_weight_magnitude:
tensor([[[0.8938]],
[[0.5470]],
[[0.5421]],
[[0.5670]]])
conv1d_weight_direction:
tensor([[[ 0.6186],
[-0.4646],
[ 0.6336]],
[[ 0.0478],
[ 0.2542],
[-0.9660]],
[[-0.9090],
[ 0.1691],
[ 0.3808]],
[[-0.0041],
[-0.7952],
[-0.6063]]], grad_fn=<DivBackward0>)
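As with the linear layer, the per-output-channel norms can be computed without the Python loop by flattening everything after the channel dimension; this equivalent sketch yields the same values:

magnitude_alt = conv1d.weight.flatten(1).norm(dim=1).reshape(-1, 1, 1)  # shape [4, 1, 1]
print(torch.allclose(conv1d_weight_magnitude, magnitude_alt))           # True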
1. Verify the formula, i.e. conv1d.weight = conv1d_weight_direction * conv1d_weight_magnitude:
print("conv1d.weight:")
print(conv1d.weight)
print("conv1d_weight_magnitude * conv1d_weight_direction:")
print(conv1d_weight_magnitude * conv1d_weight_direction)
The two printed tensors are identical, confirming the formula.
conv1d.weight:
tensor([[[ 0.5529],
[-0.4153],
[ 0.5663]],
[[ 0.0262],
[ 0.1390],
[-0.5284]],
[[-0.4928],
[ 0.0917],
[ 0.2064]],
[[-0.0023],
[-0.4509],
[-0.3438]]], grad_fn=<MulBackward0>)
conv1d_weight_magnitude * conv1d_weight_direction:
tensor([[[ 0.5529],
[-0.4153],
[ 0.5663]],
[[ 0.0262],
[ 0.1390],
[-0.5284]],
[[-0.4928],
[ 0.0917],
[ 0.2064]],
[[-0.0023],
[-0.4509],
[-0.3438]]], grad_fn=<MulBackward0>)
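Again, a programmatic check confirms the reconstruction (assuming the variables above):

print(torch.allclose(conv1d.weight, conv1d_weight_magnitude * conv1d_weight_direction))  # True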
2. Print the parameters of the weight-normalized 1-D convolutional layer wn_conv1d:
print("parameter of wn_conv1d:")
for n, p in wn_conv1d.named_parameters():
    print(n, p, p.shape)
The output is:
parameter of wn_conv1d:
weight_g Parameter containing:
tensor([[[0.8938]],
[[0.5470]],
[[0.5421]],
[[0.5670]]], requires_grad=True) torch.Size([4, 1, 1])
weight_v Parameter containing:
tensor([[[ 0.5529],
[-0.4153],
[ 0.5663]],
[[ 0.0262],
[ 0.1390],
[-0.5284]],
[[-0.4928],
[ 0.0917],
[ 0.2064]],
[[-0.0023],
[-0.4509],
[-0.3438]]], requires_grad=True) torch.Size([4, 3, 1])
wn_conv1d likewise contains two parameters, weight_g and weight_v. weight_g matches the conv1d_weight_magnitude computed earlier, and weight_v matches conv1d.weight, i.e. v in the formula is just w.
3. Using the parameters of the weight-normalized convolutional layer wn_conv1d, reconstruct the original weight conv1d.weight from the formula:
print("construct weight of conv1d:")
print(wn_conv1d.weight_g * (wn_conv1d.weight_v / torch.tensor([wn_conv1d.weight_v[i, :, :].norm() for i in torch.arange(wn_linear.weight_v.shape[0])]).reshape(wn_linear.weight_v.shape[0], 1, 1)))
The output matches conv1d.weight:
construct weight of conv1d:
tensor([[[ 0.5529],
[-0.4153],
[ 0.5663]],
[[ 0.0262],
[ 0.1390],
[-0.5284]],
[[-0.4928],
[ 0.0917],
[ 0.2064]],
[[-0.0023],
[-0.4509],
[-0.3438]]], grad_fn=<MulBackward0>)
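Finally, note that recent PyTorch releases (2.1 and later) deprecate torch.nn.utils.weight_norm in favor of the parametrization-based torch.nn.utils.parametrizations.weight_norm. The sketch below assumes such a version is installed; the internal parameter names (original0 for g, original1 for v) may differ across releases:

import torch.nn as nn
from torch.nn.utils.parametrizations import weight_norm

pw_linear = weight_norm(nn.Linear(3, 4, bias=False))  # parametrize-based weight norm
for n, p in pw_linear.named_parameters():
    print(n, p.shape)  # parametrizations.weight.original0 (g), parametrizations.weight.original1 (v)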