torch.nn.functional.normalize(input, p=2, dim=1, eps=1e-12, out=None)
Here, p is the order of the norm (2 here, i.e. the L2 norm), dim is the dimension along which the norm is computed (default 1), and eps keeps the denominator away from zero.
The normalize function in PyTorch essentially normalizes along one chosen dimension, following the formula:
\nu = \frac{\nu}{\max(||\nu||_p, \epsilon)}
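As a quick sanity check, here is a minimal sketch (my own re-implementation of the formula above, not the library's internal code) that reproduces F.normalize by hand:
import torch
import torch.nn.functional as F

v = torch.randn(5, 3)
p, dim, eps = 2, 1, 1e-12
# Divide by the p-norm along dim, clamped from below by eps.
manual = v / v.norm(p=p, dim=dim, keepdim=True).clamp(min=eps)
print(torch.allclose(manual, F.normalize(v, p=p, dim=dim, eps=eps)))  # True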
As a mnemonic: for a 2D matrix, dim=1 normalizes within each row, and dim=0 normalizes within each column, as the short sketch below shows.
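A minimal illustration of the mnemonic (the 2x3 matrix below is made up for demonstration, not from the original examples):
import torch
import torch.nn.functional as F

# dim=1 -> every row becomes a unit vector; dim=0 -> every column does.
m = torch.tensor([[3.0, 0.0, 4.0],
                  [0.0, 5.0, 12.0]])
rows = F.normalize(m, dim=1)   # each row divided by its norm (5 and 13)
cols = F.normalize(m, dim=0)   # each column divided by its norm (3, 5, sqrt(160))
print(rows.norm(dim=1))        # tensor([1., 1.])
print(cols.norm(dim=0))        # tensor([1., 1., 1.])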
My understanding of dim was never quite solid in practice, so here are three code experiments.
Example 1: dim=1
import torch
import torch.nn.functional as F
a = F.softmax(torch.randn((1, 3, 4)), 1)
b = F.normalize(a)  # defaults: p=2, dim=1
Output:
// a
tensor([[[0.2621, 0.2830, 0.3758, 0.0260],
[0.3634, 0.3750, 0.5382, 0.1085],
[0.3744, 0.3420, 0.0860, 0.8655]]])
// b
tensor([[[0.4489, 0.4870, 0.5676, 0.0298],
[0.6224, 0.6454, 0.8130, 0.1243],
[0.6412, 0.5885, 0.1299, 0.9918]]])
The code normalizes along dimension 1, which has 3 channels, so the computation works out as:
0.4489 = \frac{0.2621}{\sqrt{0.2621^2 + 0.3634^2 + 0.3744^2}}
0.6224 = \frac{0.3634}{\sqrt{0.2621^2 + 0.3634^2 + 0.3744^2}}
0.6412 = \frac{0.3744}{\sqrt{0.2621^2 + 0.3634^2 + 0.3744^2}}
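To double-check the arithmetic, the three values below are copied from the printout of a above (a fresh run of torch.randn would give different numbers):
import torch

col = torch.tensor([0.2621, 0.3634, 0.3744])   # a[0, :, 0] from the printout
print(col / col.norm())  # tensor([0.4489, 0.6224, 0.6412])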
Example 2: dim=2
a = F.softmax(torch.randn((1, 3, 4)), 1)
c = F.normalize(a, dim=2)
// a
tensor([[[0.0861, 0.1087, 0.0518, 0.3551],
[0.8067, 0.4128, 0.0592, 0.2884],
[0.1072, 0.4785, 0.8890, 0.3565]]])
// c
tensor([[[0.2237, 0.2825, 0.1347, 0.9230],
[0.8467, 0.4332, 0.0621, 0.3027],
[0.0997, 0.4447, 0.8262, 0.3313]]])
Here normalization acts along dimension 2, which can be thought of as having 4 channels. The computation is:
0.2237 = \frac{0.0861}{\sqrt{0.0861^2 + 0.1087^2 + 0.0518^2 + 0.3551^2}}
0.2825 = \frac{0.1087}{\sqrt{0.0861^2 + 0.1087^2 + 0.0518^2 + 0.3551^2}}
0.1347 = \frac{0.0518}{\sqrt{0.0861^2 + 0.1087^2 + 0.0518^2 + 0.3551^2}}
0.9230 = \frac{0.3551}{\sqrt{0.0861^2 + 0.1087^2 + 0.0518^2 + 0.3551^2}}
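The same check for dim=2, with a copied from the printout above (the 4-decimal rounding of the copied values can shift the last digit of the result):
import torch
import torch.nn.functional as F

a = torch.tensor([[[0.0861, 0.1087, 0.0518, 0.3551],
                   [0.8067, 0.4128, 0.0592, 0.2884],
                   [0.1072, 0.4785, 0.8890, 0.3565]]])
# Each length-4 row is divided by its own 2-norm.
manual = a / a.norm(p=2, dim=2, keepdim=True)
print(torch.allclose(manual, F.normalize(a, dim=2)))  # True
print(manual[0, 0])  # close to [0.2237, 0.2825, 0.1347, 0.9230] up to rounding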
Example 3: dim=0
a = F.softmax(torch.randn((1, 3, 4)), 1)
c = F.normalize(a, dim=0)
// a
tensor([[[0.0861, 0.1087, 0.0518, 0.3551],
[0.8067, 0.4128, 0.0592, 0.2884],
[0.1072, 0.4785, 0.8890, 0.3565]]])
// c
tensor([[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]]])
Here normalization acts along dimension 0, which has size 1, so every element is divided by the norm of itself alone; since the softmax output is positive, the result is all ones, i.e.
1.0 = \frac{0.0861}{\sqrt{0.0861^2}}
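A short sketch of the dim=0 case (the negative-value example is made up for illustration: normalizing along a size-1 dimension computes x / |x|, which is always 1 here only because the softmax output is positive):
import torch
import torch.nn.functional as F

a = F.softmax(torch.randn(1, 3, 4), dim=1)
print(F.normalize(a, dim=0))   # all ones: dim 0 has size 1 and a is positive
# With a negative entry, a size-1 dimension normalizes to -1 instead of 1.
print(F.normalize(torch.tensor([[-2.0], [3.0]]), dim=1))  # tensor([[-1.], [1.]])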