After reading many blog posts online, I finally understood how the computation actually works, so I'm writing it down before I forget.
1. BatchNorm
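For contrast with LayerNorm below: BatchNorm2d computes one mean/variance pair per channel, pooled over all samples in the batch. A minimal sketch (the random input here is only illustrative, not the tensor used in the LayerNorm section):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(2, 4, 2, 2)  # (N, C, H, W); illustrative values

# BatchNorm2d normalizes each channel c over all N*H*W values of that channel.
# affine=False drops the learnable weight/bias so the raw normalization is visible.
bn = nn.BatchNorm2d(4, eps=1e-6, affine=False)
y = bn(x)  # training mode -> batch statistics are used

# Manual check for channel 0: mean and biased variance over dims (N, H, W).
mean0 = x[:, 0].mean()
var0 = x[:, 0].var(unbiased=False)
manual0 = (x[:, 0] - mean0) / torch.sqrt(var0 + 1e-6)
assert torch.allclose(y[:, 0], manual0, atol=1e-4)
```

Note that BatchNorm uses the biased variance (dividing by N rather than N-1), which is why `unbiased=False` is needed in the manual check.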
2. LayerNorm
The computation only looks at the current sample's feature maps; the values of the other samples in the batch play no role.
For example, take a tensor of shape (2, 4, 2, 2):
tensor([[[[0.4757, 0.4907],
[0.3027, 0.6742]],
[[0.0549, 0.1216],
[0.1029, 0.3212]],
[[0.4346, 0.4875],
[0.4800, 0.0753]],
[[0.2940, 0.7453],
[0.0109, 0.5594]]],
[[[0.0383, 0.3897],
[0.4732, 0.6865]],
[[0.4232, 0.2626],
[0.7349, 0.4870]],
[[0.7869, 0.3576],
[0.3803, 0.9292]],
[[0.9276, 0.1433],
[0.8078, 0.1724]]]])
Define the LayerNorm as cls_norm = nn.LayerNorm((4,2,2), eps=1e-6).
The argument (4,2,2) means the last three dimensions are normalized together: different samples in the batch do not affect each other, so the mean/variance pair is computed 2 times (once per sample).
If the argument were (2,2), only the last two dimensions would be normalized: each (sample, channel) slice is independent, giving 8 mean/variance pairs.
Passing the tensor through this LayerNorm gives:
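Both counting claims can be checked numerically. A minimal sketch using a fresh random tensor (its values are illustrative, not the ones printed above):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(2, 4, 2, 2)  # illustrative (N, C, H, W) input

# normalized_shape=(4, 2, 2): one mean/var per sample -> 2 pairs of statistics.
ln_full = nn.LayerNorm((4, 2, 2), eps=1e-6, elementwise_affine=False)
y_full = ln_full(x)
mean = x.mean(dim=(1, 2, 3), keepdim=True)                 # shape (2, 1, 1, 1)
var = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
assert torch.allclose(y_full, (x - mean) / torch.sqrt(var + 1e-6), atol=1e-4)

# normalized_shape=(2, 2): one mean/var per (sample, channel) -> 2*4 = 8 pairs.
ln_hw = nn.LayerNorm((2, 2), eps=1e-6, elementwise_affine=False)
y_hw = ln_hw(x)
mean = x.mean(dim=(2, 3), keepdim=True)                    # shape (2, 4, 1, 1)
var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
assert torch.allclose(y_hw, (x - mean) / torch.sqrt(var + 1e-6), atol=1e-4)
```

In general, the number of mean/variance pairs equals the product of the dimensions *not* covered by normalized_shape.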
tensor([[[[ 0.5604, 0.6280],
[-0.2230, 1.4589]],
[[-1.3448, -1.0427],
[-1.1273, -0.1389]],
[[ 0.3744, 0.6137],
[ 0.5799, -1.2525]],
[[-0.2622, 1.7807],
[-1.5438, 0.9393]]],
[[[-1.6919, -0.4042],
[-0.0983, 0.6832]],
[[-0.2817, -0.8701],
[ 0.8608, -0.0477]],
[[ 1.0510, -0.5217],
[-0.4386, 1.5724]],
[[ 1.5668, -1.3071],
[ 1.1277, -1.2004]]]], grad_fn=<NativeLayerNormBackward0>)
Verification: take the first sample's feature maps from the batch, compute their mean and variance, then normalize that sample by hand.
>>> x_np = np.array(x[0,...])
>>> x_np
array([[[0.47573125, 0.4906667 ],
[0.3026693 , 0.6741975 ]],
[[0.05485821, 0.12159312],
[0.1029042 , 0.32124496]],
[[0.43464422, 0.48749745],
[0.4800402 , 0.07525522]],
[[0.29400337, 0.7452886 ],
[0.01090312, 0.55943626]]], dtype=float32)
>>> x_np.mean()
0.35193336 # mean
>>> x_np.var()
0.04879661 # variance (biased, i.e. divided by N)
# normalize (the eps=1e-6 term is negligible and omitted here)
>>> (x_np-0.35193336)/np.sqrt(0.04879661)
array([[[ 0.5604262 , 0.62803805],
[-0.22301573, 1.4588718 ]],
[[-1.3448427 , -1.0427375 ],
[-1.1273412 , -0.1389247 ]],
[[ 0.37442747, 0.6136911 ],
[ 0.5799325 , -1.2525066 ]],
[[-0.26224586, 1.7806973 ],
[-1.5438249 , 0.9393541 ]]], dtype=float32)
As expected, the result matches LayerNorm's output.
The same verification also works for
cls_norm = nn.LayerNorm((2,2), eps=1e-6):
>>> cls_norm = nn.LayerNorm((2,2), eps=1e-6)
>>> y1 = cls_norm(x)
>>> y1
tensor([[[[-0.0767, 0.0369],
[-1.3928, 1.4327]],
[[-0.9366, -0.2807],
[-0.4644, 1.6817]],
[[ 0.3818, 0.6908],
[ 0.6472, -1.7199]],
[[-0.3911, 1.2372],
[-1.4126, 0.5666]]],
[[[-1.5351, -0.0308],
[ 0.3265, 1.2393]],
[[-0.3164, -1.2613],
[ 1.5183, 0.0593]],
[[ 0.6941, -1.0244],
[-0.9335, 1.2638]],
[[ 1.1601, -1.0332],
[ 0.8250, -0.9518]]]], grad_fn=<NativeLayerNormBackward0>)
>>> x_np = np.array(x[0,0, ...])
>>> x_np
array([[0.47573125, 0.4906667 ],
[0.3026693 , 0.6741975 ]], dtype=float32)
>>> x_np.mean()
0.48581618
>>> np.var(x_np)
0.017288884
>>> (x_np-0.48581618)/np.sqrt(0.017288884)
array([[-0.07669892, 0.03688957],
[-1.3928876 , 1.4326969 ]], dtype=float32)
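One detail worth noting: nn.LayerNorm defaults to elementwise_affine=True, which is why the outputs above carry grad_fn=&lt;NativeLayerNormBackward0&gt;. At initialization the learnable weight is all ones and the bias all zeros, so a freshly constructed module reproduces the plain normalization used in the manual checks. A sketch with an illustrative random input:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(2, 4, 2, 2)  # illustrative input

# Default elementwise_affine=True: learnable weight (init 1) and bias (init 0).
ln = nn.LayerNorm((2, 2), eps=1e-6)
assert torch.allclose(ln.weight, torch.ones(2, 2))
assert torch.allclose(ln.bias, torch.zeros(2, 2))

# Before any training step, the output equals the affine-free normalization.
mean = x.mean(dim=(2, 3), keepdim=True)
var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
manual = (x - mean) / torch.sqrt(var + 1e-6)
assert torch.allclose(ln(x), manual, atol=1e-4)
```

Once weight and bias are updated by training, the output is weight * normalized + bias, so the hand computation above only matches the module at initialization.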