Summary of Normalization Methods
1 nn.BatchNorm2d
torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1,
affine=True, track_running_stats=True, device=None, dtype=None)
Parameter explanations
num_features
: the value of C in the input feature map (B, C, H, W), i.e., the number of channels
eps
: the $\epsilon$ value, added to keep the denominator from being zero
momentum
: the update rule for the running statistics, $\hat{x}_{new}=(1-momentum)\times\hat{x}+momentum\times x_t$, where $\hat{x}$ is the running estimate and $x_t$ is the value observed on the current batch (see the sketch after this parameter list)
affine
: whether $\gamma$ and $\beta$ are learnable; if not learnable, they stay fixed at (1, 0)
track_running_stats
: whether running estimates of the mean and variance are recorded; usually set to True so that the statistics used at inference reflect the whole population
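A minimal sketch of the momentum update, assuming the default momentum=0.1: one training-mode forward pass moves running_mean from its initial zeros toward the current batch mean, exactly as the update rule above describes.
import torch

m = torch.nn.BatchNorm2d(2, momentum=0.1)  # track_running_stats=True by default
print(m.running_mean)  # running mean buffer, initialized to zeros
print(m.running_var)   # running variance buffer, initialized to ones

x = torch.randn(4, 2, 3, 3)
m.train()
m(x)  # a forward pass in training mode updates the running statistics

# (1 - momentum) * old_running_mean + momentum * batch_mean
expected = (1 - 0.1) * torch.zeros(2) + 0.1 * x.mean(dim=(0, 2, 3))
print(torch.allclose(m.running_mean, expected))  # True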
PyTorch overview
Applied to a four-dimensional input (batch, channel, height, width). Here $x$ denotes the set of all elements that share one channel, across every batch sample and every height/width position; the mean and variance of this set are written $E[x]$ and $Var[x]$. The output is computed as:
$y=\frac{x-E[x]}{\sqrt{Var[x]+\epsilon}}\times\gamma+\beta$
where $\gamma$ and $\beta$ are the learnable parameters, obtained through backpropagation, with initial values (1, 0)
Worked example
import torch

# Input of shape (2, 2, 3, 3): batch=2, channels=2, height=width=3
input = torch.Tensor([[[[ 1,  2,  3],
                        [ 4,  5,  6],
                        [ 7,  8,  9]],
                       [[11, 12, 13],
                        [14, 15, 16],
                        [17, 18, 19]]],
                      [[[11, 13, 15],
                        [17, 19, 21],
                        [23, 25, 27]],
                       [[21, 23, 25],
                        [27, 29, 31],
                        [33, 35, 37]]]])
m = torch.nn.BatchNorm2d(2)  # num_features = 2 channels
output = m(input)
print(output)
>>>
tensor([[[[-1.3574, -1.2340, -1.1106],
          [-0.9872, -0.8638, -0.7404],
          [-0.6170, -0.4936, -0.3702]],
         [[-1.3574, -1.2340, -1.1106],
          [-0.9872, -0.8638, -0.7404],
          [-0.6170, -0.4936, -0.3702]]],
        [[[-0.1234,  0.1234,  0.3702],
          [ 0.6170,  0.8638,  1.1106],
          [ 1.3574,  1.6042,  1.8511]],
         [[-0.1234,  0.1234,  0.3702],
          [ 0.6170,  0.8638,  1.1106],
          [ 1.3574,  1.6042,  1.8511]]]], grad_fn=<NativeBatchNormBackward0>)
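The first entry can be checked by hand, confirming that each channel's statistics are taken over batch, height, and width together (a quick verification sketch, using the biased variance that the forward pass uses):
# Channel-0 statistics over both batch samples and all spatial positions
mean_c0 = input[:, 0].mean()              # 12.0
var_c0 = input[:, 0].var(unbiased=False)  # biased variance, as in the forward pass
print((input[0, 0, 0, 0] - mean_c0) / torch.sqrt(var_c0 + 1e-5))  # ≈ -1.3574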
2 nn.LayerNorm
torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True, device=None, dtype=None)
Parameter explanations
normalized_shape (int or list or torch.Size)
: the expected size of the trailing input dimensions to normalize over
eps (float)
: the $\epsilon$ value, added for numerical stability
elementwise_affine (bool)
: whether $\gamma$ and $\beta$ are learnable; if not learnable, they stay fixed at (1, 0) (see the sketch below)
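As referenced above, a small sketch of what elementwise_affine controls; weight and bias are the parameters PyTorch exposes for $\gamma$ and $\beta$:
import torch
from torch import nn

ln = nn.LayerNorm(4, elementwise_affine=True)
print(ln.weight)  # learnable gamma, initialized to ones, shape (4,)
print(ln.bias)    # learnable beta, initialized to zeros, shape (4,)

ln_fixed = nn.LayerNorm(4, elementwise_affine=False)
print(ln_fixed.weight, ln_fixed.bias)  # None None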
PyTorch overview
LayerNorm normalizes over the trailing dimensions given by normalized_shape, so $x$ denotes the set of elements those dimensions cover within one sample. For example, for a (2, 3, 4) sentence tensor passed through nn.LayerNorm(4), each of the 3 positions in each of the 2 batch samples has its own set of 4 values normalized independently; for a (2, 2, 3, 3) image tensor passed through nn.LayerNorm([2, 3, 3]), all 2×3×3 elements of each sample (both channels and all spatial positions together) form a single set, as the second example below shows.
Worked example
import torch
from torch import nn

# NLP-style usage: input of shape (2, 3, 4) = (batch, sequence length, features)
input = torch.Tensor([[[ 1,  2,  3,  4],
                       [ 5,  6,  7,  8],
                       [ 9, 10, 11, 12]],
                      [[21, 22, 23, 24],
                       [25, 26, 27, 28],
                       [29, 30, 31, 32]]])
layer_norm = nn.LayerNorm(4)  # normalize over the last dimension (4 features)
print(layer_norm(input))
'''
tensor([[[-1.3416, -0.4472,  0.4472,  1.3416],
         [-1.3416, -0.4472,  0.4472,  1.3416],
         [-1.3416, -0.4472,  0.4472,  1.3416]],
        [[-1.3416, -0.4472,  0.4472,  1.3416],
         [-1.3416, -0.4472,  0.4472,  1.3416],
         [-1.3416, -0.4472,  0.4472,  1.3416]]],
       grad_fn=<NativeLayerNormBackward0>)
'''
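A quick verification sketch: normalizing each group of 4 values with its own biased mean and variance reproduces the output above.
mean = input.mean(dim=-1, keepdim=True)
var = input.var(dim=-1, unbiased=False, keepdim=True)
print((input - mean) / torch.sqrt(var + 1e-5))  # matches the output above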
# Image-style usage: input of shape (2, 2, 3, 3) = (batch, channel, height, width)
input = torch.Tensor([[[[  1,   2,   3],
                        [ 11,  12,  13],
                        [ 21,  22,  23]],
                       [[ 51,  52,  53],
                        [ 61,  62,  63],
                        [ 71,  72,  73]]],
                      [[[ 81,  82,  83],
                        [ 91,  92,  93],
                        [101, 102, 103]],
                       [[111, 112, 113],
                        [121, 122, 123],
                        [131, 132, 133]]]])
layer_norm = nn.LayerNorm([2, 3, 3])  # normalize over (C, H, W) of each sample
print(layer_norm(input))
'''
tensor([[[[-1.3682, -1.3302, -1.2922],
          [-0.9881, -0.9501, -0.9121],
          [-0.6081, -0.5701, -0.5321]],
         [[ 0.5321,  0.5701,  0.6081],
          [ 0.9121,  0.9501,  0.9881],
          [ 1.2922,  1.3302,  1.3682]]],
        [[[-1.5207, -1.4622, -1.4037],
          [-0.9358, -0.8773, -0.8188],
          [-0.3509, -0.2924, -0.2339]],
         [[ 0.2339,  0.2924,  0.3509],
          [ 0.8188,  0.8773,  0.9358],
          [ 1.4037,  1.4622,  1.5207]]]], grad_fn=<NativeLayerNormBackward0>)
'''
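A verification sketch confirming that with normalized_shape=[2, 3, 3] the statistics really span all channels and spatial positions of each sample, i.e., one mean and one variance per sample:
mean = input.mean(dim=(1, 2, 3), keepdim=True)
var = input.var(dim=(1, 2, 3), unbiased=False, keepdim=True)
print((input - mean) / torch.sqrt(var + 1e-5))  # matches the output above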