Batch normalization applies a transformation that keeps the output mean close to 0 and the output standard deviation close to 1.
Batch normalization works differently during training and during inference.
API
https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization
tf.keras.layers.BatchNormalization(
axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True,
beta_initializer='zeros', gamma_initializer='ones',
moving_mean_initializer='zeros',
moving_variance_initializer='ones', beta_regularizer=None,
gamma_regularizer=None, beta_constraint=None, gamma_constraint=None,
renorm=False, renorm_clipping=None, renorm_momentum=0.99, fused=None,
trainable=True, virtual_batch_size=None, adjustment=None, name=None, **kwargs
)
During training
Batch normalization normalizes its output using the mean and standard deviation of the current batch. For each channel being normalized, it returns (batch - mean(batch)) / sqrt(var(batch) + epsilon) * gamma + beta, where:
epsilon: a small constant to avoid division by zero
gamma: a learned scaling factor (initialized as 1), which can be disabled by setting scale=False
beta: a learned offset factor (initialized as 0), which can be disabled by setting center=False
moving_mean: the moving average of the batch means; moving_var: the moving average of the batch variances
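The training-mode formula above can be reproduced with plain NumPy. `batch_norm_train` below is a hypothetical helper (not part of the TF API), a minimal sketch assuming a 2-D [batch, features] input and the layer's default gamma=1, beta=0, epsilon=0.001:

```python
import numpy as np

def batch_norm_train(batch, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Training-mode batch norm: normalize with the current batch's statistics.

    Statistics are computed per channel, i.e. over axis 0 of a
    2-D [batch, features] input.
    """
    mean = batch.mean(axis=0)
    var = batch.var(axis=0)
    return (batch - mean) / np.sqrt(var + epsilon) * gamma + beta

a = np.array([[1., 2., 3.], [4., 5., 6.]])
print(batch_norm_train(a))  # each entry is close to -0.99978 or 0.99978
```

The result matches the `layer(a, training=True)` output shown in the Example section below, since both rows are symmetric around the per-channel means.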
During inference
Batch normalization normalizes its output using a moving average of the mean and standard deviation of the batches it has seen during training, returning (batch - self.moving_mean) / sqrt(self.moving_var + epsilon) * gamma + beta.
self.moving_mean and self.moving_var are non-trainable variables; they are updated each time the layer is called in training mode:
moving_mean = moving_mean * momentum + mean(batch) * (1-momentum)
moving_var = moving_var * momentum + var(batch) * (1-momentum)
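The two update equations can be sketched directly in NumPy. This is a standalone sketch (not the layer's internal code), assuming the default momentum=0.99, the default initializers (zeros/ones), and the two-row batch used in the Example below:

```python
import numpy as np

momentum = 0.99
moving_mean = np.zeros(3)  # moving_mean_initializer='zeros'
moving_var = np.ones(3)    # moving_variance_initializer='ones'

batch = np.array([[1., 2., 3.], [4., 5., 6.]])

# One training-mode call updates the moving statistics per channel.
moving_mean = moving_mean * momentum + batch.mean(axis=0) * (1 - momentum)
moving_var = moving_var * momentum + batch.var(axis=0) * (1 - momentum)

print(moving_mean)  # [0.025 0.035 0.045]
print(moving_var)   # [1.0125 1.0125 1.0125]
```

These values match the `moving_mean` and `moving_variance` weights printed after `layer(a, training=True)` in the Example section.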
As such, the layer will only normalize its inputs properly during inference after having been trained on data that has statistics similar to the inference data.
Example
>>> a = tf.constant([[1, 2, 3], [4, 5, 6]], dtype=tf.float32)
>>> a
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
[4., 5., 6.]], dtype=float32)>
>>> layer = tf.keras.layers.BatchNormalization()
>>> layer.weights
[]
>>> layer(a)
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0.9995004, 1.9990008, 2.9985013],
[3.9980016, 4.997502 , 5.9970026]], dtype=float32)>
>>> layer.weights
[<tf.Variable 'batch_normalization_1/gamma:0' shape=(3,) dtype=float32, numpy=array([1., 1., 1.], dtype=float32)>,
<tf.Variable 'batch_normalization_1/beta:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>,
<tf.Variable 'batch_normalization_1/moving_mean:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>,
<tf.Variable 'batch_normalization_1/moving_variance:0' shape=(3,) dtype=float32, numpy=array([1., 1., 1.], dtype=float32)>
]
# Calling layer(a) is equivalent to layer(a, training=False);
# at this point moving_mean and moving_variance still hold their initial values 0. and 1.
>>> layer(a, training=True) # training mode: normalize with the batch mean and variance
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[-0.9997779 , -0.9997778 , -0.9997779 ],
[ 0.9997778 , 0.99977803, 0.9997778 ]], dtype=float32)>
# sigma^2 = ((1-2.5)^2 + (4-2.5)^2)/2 = 2.25
# -0.9997779 = (1. - 2.5)/sqrt(2.25 + 0.001)
>>> layer.weights
[<tf.Variable 'batch_normalization_1/gamma:0' shape=(3,) dtype=float32, numpy=array([1., 1., 1.], dtype=float32)>,
<tf.Variable 'batch_normalization_1/beta:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>,
<tf.Variable 'batch_normalization_1/moving_mean:0' shape=(3,) dtype=float32, numpy=array([0.025, 0.035, 0.045], dtype=float32)>,
<tf.Variable 'batch_normalization_1/moving_variance:0' shape=(3,) dtype=float32, numpy=array([1.0125, 1.0125, 1.0125], dtype=float32)>
]
# The moving mean and variance are updated (momentum = 0.99):
# per-channel batch means: (1.+4.)/2 = 2.5, (2.+5.)/2 = 3.5, (3.+6.)/2 = 4.5
# moving_mean: 0.025 = 0.99 * 0 + 0.01 * 2.5 (likewise 0.035 and 0.045)
# per-channel batch variance: ((1-2.5)^2 + (4-2.5)^2)/2 = 2.25
# moving_variance: 1.0125 = 0.99 * 1 + 0.01 * 2.25
>>> layer(a, training=False) # inference mode: normalize with the moving mean and variance
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0.9684845, 1.9518689, 2.9352531],
[3.948437 , 4.9318213, 5.9152055]], dtype=float32)>
# 0.9684845 = (1-0.025)/sqrt(1.0125 + 0.001)
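The inference-mode output above can likewise be checked in NumPy. `batch_norm_infer` is a hypothetical helper (not part of the TF API) that plugs the stored moving statistics into the inference formula, assuming the default gamma=1, beta=0, epsilon=0.001:

```python
import numpy as np

def batch_norm_infer(batch, moving_mean, moving_var,
                     gamma=1.0, beta=0.0, epsilon=1e-3):
    """Inference-mode batch norm: normalize with the stored moving statistics."""
    return (batch - moving_mean) / np.sqrt(moving_var + epsilon) * gamma + beta

a = np.array([[1., 2., 3.], [4., 5., 6.]])
# Moving statistics as they stand after the single training-mode call above.
out = batch_norm_infer(a,
                       moving_mean=np.array([0.025, 0.035, 0.045]),
                       moving_var=np.full(3, 1.0125))
print(out[0, 0])  # ≈ 0.9684845, matching layer(a, training=False)
```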