Video link: 48、快速复现PyTorch的Weight Normalization_哔哩哔哩_bilibili
Official API: torch.nn.utils.weight_norm
Original paper: Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
Theory
- Weight normalization is used in reinforcement learning and in generative adversarial networks to make training more stable.
- torch.nn.utils.weight_norm(module, name='weight', dim=0)
  In the official PyTorch API, weight_norm is a function rather than a class; it takes a module as its argument.
- The formula is:
  $\mathbf{w} = g \frac{\mathbf{v}}{\left\| \mathbf{v} \right\|}$
  where $\mathbf{w}$ is the module's weight parameter and $g$ is the magnitude of $\mathbf{w}$, i.e. its norm. $\mathbf{v}$ is in fact just $\mathbf{w}$, so $\frac{\mathbf{v}}{\left\| \mathbf{v} \right\|}$ is the unit-length direction vector. The operation is best seen as a decomposition of the weight matrix.
- Without weight norm, the module optimizes a single parameter; with weight norm, it optimizes two parameters at the same time, computing a separate gradient for $g$ and for $\mathbf{v}$ (see the sketch below).
- Weight norm introduces no extra parameters in any substantive sense, and the module's output stays unchanged.
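The decomposition and its two gradients can be sketched directly with autograd. This is a minimal illustration, not the PyTorch API itself; g and v below are hand-made stand-ins for the reparameterized weight:

import torch

# Sketch of w = g * v / ||v|| with dim=0: one magnitude per output row.
torch.manual_seed(0)
v = torch.randn(4, 3, requires_grad=True)                   # direction parameter v
g = v.norm(dim=1, keepdim=True).detach().requires_grad_()   # magnitude g, initialized to ||v||
w = g * v / v.norm(dim=1, keepdim=True)                     # reconstructed weight; equals v at initialization

w.sum().backward()
print(g.grad.shape, v.grad.shape)  # torch.Size([4, 1]) torch.Size([4, 3]): one gradient each for g and v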
Code
Fully connected layer
First, use a fully connected layer as the module.
import torch
import torch.nn as nn
batch_size = 2
feat_dim = 3
hid_dim = 4
inputx = torch.randn(batch_size, feat_dim)          # 2-D tensor, shape [2, 3]
linear = nn.Linear(feat_dim, hid_dim, bias=False)   # weight shape [4, 3]
wn_linear = torch.nn.utils.weight_norm(linear)      # official API
Compute the magnitude and the unit direction vector of the linear layer's weight matrix. The magnitude of each row vector is its L2 norm.
weight_magnitude = torch.tensor([linear.weight[i, :].norm() for i in torch.arange(linear.weight.shape[0])], dtype=torch.float32).unsqueeze(-1)
weight_direction = linear.weight / weight_magnitude
# v in the formula is just w: v / ||v|| = w / ||w||
Print the relevant values:
print("weight_magnitude:") # 相当于公式中的g
print(weight_magnitude)
print("weight_direction:") # 相当于公式中的 v / ||v||
print(weight_direction)
print("magnitude of weight_direction:")
print((weight_direction ** 2).sum(dim=-1)) # 每一行元素的平方和为1
The output is:
weight_magnitude:
tensor([[0.3865],
[0.6001],
[0.4221],
[0.7440]]) # [4,1]
weight_direction:
tensor([[ 0.7945, 0.1528, 0.5877],
[-0.9337, 0.3558, -0.0405],
[ 0.8495, 0.0206, -0.5273],
[-0.7468, 0.6474, 0.1521]], grad_fn=<DivBackward0>) # [4,3]
magnitude of weight_direction:
tensor([1.0000, 1.0000, 1.0000, 1.0000], grad_fn=<SumBackward1>) # [4]
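As a side note, the same magnitudes can be computed without the list comprehension via the tensor norm along dim=1; this is an equivalent sketch, not part of the original walkthrough:

weight_magnitude_alt = linear.weight.norm(dim=1, keepdim=True)  # row-wise L2 norm, shape [4, 1]
print(torch.allclose(weight_magnitude, weight_magnitude_alt))   # True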
1. Verify the formula, i.e. linear.weight = weight_direction * weight_magnitude:
print("linear.weight:")
print(linear.weight)
print("weight_direction * weight_magnitude:")
print(weight_direction * weight_magnitude)
The two printed tensors are identical, confirming the formula:
linear.weight:
tensor([[ 0.3071, 0.0591, 0.2272],
[-0.5603, 0.2135, -0.0243],
[ 0.3585, 0.0087, -0.2225],
[-0.5556, 0.4817, 0.1132]], grad_fn=<MulBackward0>) # [4,3]
weight_direction * weight_magnitude:
tensor([[ 0.3071, 0.0591, 0.2272],
[-0.5603, 0.2135, -0.0243],
[ 0.3585, 0.0087, -0.2225],
[-0.5556, 0.4817, 0.1132]], grad_fn=<MulBackward0>) # [4,1]*[4,3]->[4,3]
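Rather than comparing the printouts by eye, the equality can also be checked programmatically (assuming the variables defined above):

print(torch.allclose(linear.weight, weight_direction * weight_magnitude))  # True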
2. Verify another claim: the weight-normalized module does not change the original module's output.
print("linear(inputx):") # linear 和 wn_linear 的输出值相同
print(linear(inputx))
print("wn_linear(inputx):")
print(wn_linear(inputx))
The outputs of linear and wn_linear are identical, so the claim holds.
linear(inputx):
tensor([[ 0.2138, 0.3498, -0.6853, 0.6026],
[ 0.2718, 0.2176, -0.5267, 0.4888]], grad_fn=<MmBackward0>) # [2,4]
wn_linear(inputx):
tensor([[ 0.2138, 0.3498, -0.6853, 0.6026],
[ 0.2718, 0.2176, -0.5267, 0.4888]], grad_fn=<MmBackward0>)
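The outputs match because weight_norm registers a forward pre-hook that recomputes module.weight from weight_g and weight_v before every forward pass. A small check of that recomputation, assuming the tensors above:

recomputed = wn_linear.weight_g * wn_linear.weight_v / wn_linear.weight_v.norm(dim=1, keepdim=True)
print(torch.allclose(wn_linear(inputx), inputx @ recomputed.t()))  # True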
3. Print the parameters of the weight-normalized fully connected layer:
print("parameters of wn_linear:")
for n, p in wn_linear.named_parameters():
    print(n, p)
The output is:
parameters of wn_linear:
weight_g Parameter containing:
tensor([[0.3865],
[0.6001],
[0.4221],
[0.7440]], requires_grad=True) # [4,1]
weight_v Parameter containing:
tensor([[ 0.3071, 0.0591, 0.2272],
[-0.5603, 0.2135, -0.0243],
[ 0.3585, 0.0087, -0.2225],
[-0.5556, 0.4817, 0.1132]], requires_grad=True) # [4,3]
As the output shows, wn_linear contains two parameters, weight_g and weight_v. weight_g matches the weight_magnitude computed earlier, and weight_v matches linear.weight, i.e. v in the formula is just w.
4. Using the parameters of the weight-normalized linear layer wn_linear, reconstruct the original weight linear.weight from the formula:
print("construct weight of linear:")
print(wn_linear.weight_g * (wn_linear.weight_v / torch.tensor([wn_linear.weight_v[i, :].norm() for i in torch.arange(wn_linear.weight_v.shape[0])], dtype=torch.float32).unsqueeze(-1)))
The output is:
construct weight of linear:
tensor([[ 0.3071, 0.0591, 0.2272],
[-0.5603, 0.2135, -0.0243],
[ 0.3585, 0.0087, -0.2225],
[-0.5556, 0.4817, 0.1132]], grad_fn=<MulBackward0>)
The result matches the original linear layer's weight linear.weight.
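For completeness, the reparameterization can be undone with torch.nn.utils.remove_weight_norm, which folds weight_g and weight_v back into a single weight parameter (shown here on the wn_linear above):

torch.nn.utils.remove_weight_norm(wn_linear)          # folds g and v back into .weight
print([n for n, _ in wn_linear.named_parameters()])   # ['weight']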
Convolutional layer
Use a 1-D convolutional layer as the module.
# instantiate a 1x1 convolution, which is equivalent to an MLP
conv1d = nn.Conv1d(feat_dim, hid_dim, kernel_size=1, bias=False)  # weight shape [4, 3, 1]
wn_conv1d = torch.nn.utils.weight_norm(conv1d)
Compute the magnitude and the unit direction vector of the conv1d layer's weight. Here the magnitude of each output channel's weight slice is its L2 norm.
conv1d_weight_magnitude = torch.tensor([conv1d.weight[i, :, :].norm() for i in torch.arange(conv1d.weight.shape[0])], dtype=torch.float32).reshape(conv1d.weight.shape[0], 1, 1)
conv1d_weight_direction = conv1d.weight / conv1d_weight_magnitude
Print these values:
print("conv1d_weight_magnitude:")
print(conv1d_weight_magnitude)
print("conv1d_weight_direction:")
print(conv1d_weight_direction)
The output is:
conv1d_weight_magnitude:
tensor([[[0.8938]],
[[0.5470]],
[[0.5421]],
[[0.5670]]])
conv1d_weight_direction:
tensor([[[ 0.6186],
[-0.4646],
[ 0.6336]],
[[ 0.0478],
[ 0.2542],
[-0.9660]],
[[-0.9090],
[ 0.1691],
[ 0.3808]],
[[-0.0041],
[-0.7952],
[-0.6063]]], grad_fn=<DivBackward0>)
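As with the linear layer, the per-output-channel norms can be computed without the Python loop by flattening everything after the channel dimension; this equivalent sketch yields the same values:

magnitude_alt = conv1d.weight.flatten(1).norm(dim=1).reshape(-1, 1, 1)  # shape [4, 1, 1]
print(torch.allclose(conv1d_weight_magnitude, magnitude_alt))           # True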
1. Verify the formula, i.e. conv1d.weight = conv1d_weight_direction * conv1d_weight_magnitude:
print("conv1d.weight:")
print(conv1d.weight)
print("conv1d_weight_magnitude * conv1d_weight_direction:")
print(conv1d_weight_magnitude * conv1d_weight_direction)
The two printed tensors are identical, confirming the formula.
conv1d.weight:
tensor([[[ 0.5529],
[-0.4153],
[ 0.5663]],
[[ 0.0262],
[ 0.1390],
[-0.5284]],
[[-0.4928],
[ 0.0917],
[ 0.2064]],
[[-0.0023],
[-0.4509],
[-0.3438]]], grad_fn=<MulBackward0>)
conv1d_weight_magnitude * conv1d_weight_direction:
tensor([[[ 0.5529],
[-0.4153],
[ 0.5663]],
[[ 0.0262],
[ 0.1390],
[-0.5284]],
[[-0.4928],
[ 0.0917],
[ 0.2064]],
[[-0.0023],
[-0.4509],
[-0.3438]]], grad_fn=<MulBackward0>)
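Again, a programmatic check confirms the reconstruction (assuming the variables above):

print(torch.allclose(conv1d.weight, conv1d_weight_magnitude * conv1d_weight_direction))  # True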
2. Print the parameters of the weight-normalized 1-D convolutional layer wn_conv1d:
print("parameter of wn_conv1d:")
for n, p in wn_conv1d.named_parameters():
    print(n, p, p.shape)
The output is:
parameter of wn_conv1d:
weight_g Parameter containing:
tensor([[[0.8938]],
[[0.5470]],
[[0.5421]],
[[0.5670]]], requires_grad=True) torch.Size([4, 1, 1])
weight_v Parameter containing:
tensor([[[ 0.5529],
[-0.4153],
[ 0.5663]],
[[ 0.0262],
[ 0.1390],
[-0.5284]],
[[-0.4928],
[ 0.0917],
[ 0.2064]],
[[-0.0023],
[-0.4509],
[-0.3438]]], requires_grad=True) torch.Size([4, 3, 1])
wn_conv1d likewise contains two parameters, weight_g and weight_v. weight_g matches the conv1d_weight_magnitude computed earlier, and weight_v matches conv1d.weight, i.e. v in the formula is just w.
3. Using the parameters of the weight-normalized convolutional layer wn_conv1d, reconstruct the original weight conv1d.weight from the formula:
print("construct weight of conv1d:")
print(wn_conv1d.weight_g * (wn_conv1d.weight_v / torch.tensor([wn_conv1d.weight_v[i, :, :].norm() for i in torch.arange(wn_linear.weight_v.shape[0])]).reshape(wn_linear.weight_v.shape[0], 1, 1)))
The output matches conv1d.weight:
construct weight of conv1d:
tensor([[[ 0.5529],
[-0.4153],
[ 0.5663]],
[[ 0.0262],
[ 0.1390],
[-0.5284]],
[[-0.4928],
[ 0.0917],
[ 0.2064]],
[[-0.0023],
[-0.4509],
[-0.3438]]], grad_fn=<MulBackward0>)
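Finally, note that recent PyTorch releases (2.1 and later) deprecate torch.nn.utils.weight_norm in favor of the parametrization-based torch.nn.utils.parametrizations.weight_norm. The sketch below assumes such a version is installed; the internal parameter names (original0 for g, original1 for v) may differ across releases:

import torch.nn as nn
from torch.nn.utils.parametrizations import weight_norm

pw_linear = weight_norm(nn.Linear(3, 4, bias=False))  # parametrize-based weight norm
for n, p in pw_linear.named_parameters():
    print(n, p.shape)  # parametrizations.weight.original0 (g), parametrizations.weight.original1 (v)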