Prerequisite basics:
Mean: $\mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
Variance: $\sigma^2 = \frac{(x_1-\mu)^2 + (x_2-\mu)^2 + \dots + (x_n-\mu)^2}{n}$
Standard deviation: $\sigma = \sqrt{\frac{(x_1-\mu)^2 + (x_2-\mu)^2 + \dots + (x_n-\mu)^2}{n}}$, which measures how spread out the data are.
$\frac{x_i-\mu}{\sigma}$ standardizes the data to a distribution with mean $\mu = 0$ and standard deviation $\sigma = 1$.
However, the best distribution for each layer's input (the training data) is not necessarily the standard normal distribution; forcing the input into a standard normal distribution weakens the network's expressive power. To solve this, a scale factor $\gamma$ and an offset $\beta$ are added to the standardization formula:
$x = \gamma \cdot \frac{x_i-\mu}{\sigma} + \beta$
where:
- the mean $\mu$ and variance $\sigma^2$ are computed from the batch input (via tf.nn.moments());
- the scale $\gamma$ and offset $\beta$ are learned by the network itself.
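As a quick sanity check, the formula can be evaluated with plain NumPy; $\gamma$ and $\beta$ are fixed here for illustration, whereas in a network they would be learned:

```python
import numpy as np

# Batch-norm transform on a toy 1-D batch: standardize, then scale and shift.
# gamma and beta are fixed here; in a real network they are trainable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mu = x.mean()              # batch mean
sigma = x.std()            # batch standard deviation (population, divides by n)
gamma, beta = 2.0, 0.5     # scale and offset

x_hat = (x - mu) / sigma   # standardized: mean 0, std 1
y = gamma * x_hat + beta   # scaled and shifted: mean beta, std gamma

print(np.isclose(x_hat.mean(), 0.0), np.isclose(x_hat.std(), 1.0))
print(np.isclose(y.mean(), beta), np.isclose(y.std(), gamma))
```

The last two lines confirm that the transform maps the batch to mean $\beta$ and standard deviation $\gamma$.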
A few notes up front:
- tf.nn.batch_normalization() is the lowest-level implementation of batch normalization; you have to supply scale and offset yourself.
- During training, $\mu$ and $\sigma^2$ are computed directly from the current batch, and moving averages of them are updated; at inference (test) time, the accumulated moving averages are used instead.
- tf.layers.batch_normalization() is a wrapper around tf.nn.batch_normalization(); it creates scale and offset as variables, and uses the training=True/False argument to decide how $\mu$ and $\sigma^2$ are obtained.
tf.nn.batch_normalization() signature:
tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon, name=None)
Parameters:
- x: input tensor, typically of shape [batch, height, width, channel]
- mean: the mean
- variance: the variance
- offset: the offset parameter ($\beta$)
- scale: the scale parameter ($\gamma$)
- variance_epsilon: a small number added to the variance to keep the denominator ($\sigma$) away from zero; 0.001 is a common choice
- name: name of this operation (default None)
Idea:
For a batch of input of shape [batch_num, height, width, channel], compute the mean and variance over all data in the batch channel by channel, normalize the input with that mean and variance, and finally apply a linear transform (scale and shift) to get the batch-norm output.
Pseudocode:
for i in range(channel):
    x = input[:, :, :, i]
    # the next two lines are what tf.nn.moments() does: compute mean and variance
    mean = mean(x)
    variance = variance(x)
    # the rest is what tf.nn.batch_normalization() does
    x = (x - mean) / sqrt(variance)
    x = scale * x + offset
    input[:, :, :, i] = x
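The pseudocode above can be written as a small, runnable NumPy function. This is a vectorized sketch over all channels at once, with an epsilon added to the variance as the TensorFlow functions do:

```python
import numpy as np

def batch_norm_numpy(x, scale, offset, eps=0.001):
    """Per-channel batch normalization of an NHWC tensor."""
    mean = x.mean(axis=(0, 1, 2))       # per-channel mean over batch, height, width
    variance = x.var(axis=(0, 1, 2))    # per-channel variance
    x_hat = (x - mean) / np.sqrt(variance + eps)
    return scale * x_hat + offset

x = np.random.RandomState(0).randn(4, 8, 8, 3)
y = batch_norm_numpy(x, scale=1.0, offset=0.0, eps=0.0)

# with scale=1, offset=0 and eps=0 the output has zero mean and unit variance per channel
print(np.allclose(y.mean(axis=(0, 1, 2)), 0.0))
print(np.allclose(y.var(axis=(0, 1, 2)), 1.0))
```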
In the example below, scale and offset are fixed: scale = 1, offset = 0.
For details on tf.nn.moments(), see its documentation.
import tensorflow as tf
import numpy as np

input = [[[[1, 2],
           [2, 1]],
          [[3, 4],
           [4, 3]]],
         [[[5, 6],
           [6, 5]],
          [[7, 8],
           [8, 7]]]]
input_tf = tf.constant(input, dtype=tf.float32)
# NCHW -> NHWC
input_tf = tf.transpose(input_tf, perm=(0, 2, 3, 1))
mean, var = tf.nn.moments(input_tf, [0, 1, 2])
_, _, _, c = input_tf.get_shape().as_list()
scale = tf.ones((c,), dtype=tf.float32)
offset = tf.zeros((c,), dtype=tf.float32)
output = tf.nn.batch_normalization(input_tf, mean, var, offset, scale, variance_epsilon=0.01, name='bn')

with tf.Session() as sess:
    m, v, out = sess.run([mean, var, output])
    print('mean:', m)
    print('variance:', v)
    out = np.transpose(out, axes=(0, 3, 1, 2))
    print('output:', out)
# Output:
# mean: [3.5 5.5]
# variance: [4.25 4.25]
# output: [[[[-1.2112539 -0.72675234]
# [-0.72675234 -1.2112539 ]]
#
# [[-1.211254 -0.7267524 ]
# [-0.7267524 -1.211254 ]]]
#
#
# [[[ 0.7267523 1.2112539 ]
# [ 1.2112539 0.7267523 ]]
#
# [[ 0.7267523 1.2112539 ]
# [ 1.2112539 0.7267523 ]]]]
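The printed numbers can be checked by hand with NumPy. Channel 0 of the example holds the values 1, 2, 2, 1, 5, 6, 6, 5; with variance_epsilon = 0.01 as above:

```python
import numpy as np

# Channel 0 of the example input, flattened across batch, height and width.
ch0 = np.array([1., 2., 2., 1., 5., 6., 6., 5.])

mean = ch0.mean()                         # 3.5, matching the printed mean
var = ch0.var()                           # 4.25, matching the printed variance
out = (ch0 - mean) / np.sqrt(var + 0.01)  # variance_epsilon=0.01 as in the example

print(mean, var)
print(out[0])   # first output value of channel 0
```

The first element, $(1 - 3.5)/\sqrt{4.25 + 0.01}$, reproduces the printed -1.2112539.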
In real projects, the scale parameter scale and the offset parameter offset are not fixed; they are learned. That means scale and offset must be tf.Variable objects, and trainable ones.
In the example below, we make the scale and offset parameters variables and wrap them, together with the mean, the variance, and tf.nn.batch_normalization, into a function that we then call from the main program.
import tensorflow as tf
import numpy as np
from tensorflow.python.training import moving_averages


def batch_norm(x, momentum=0.99, epsilon=0.001, is_training=True):
    _, _, _, c = x.get_shape().as_list()
    # learned scale and offset
    gamma = tf.get_variable('gamma', (c,), dtype=tf.float32,
                            initializer=tf.ones_initializer, trainable=True)
    beta = tf.get_variable('beta', (c,), dtype=tf.float32,
                           initializer=tf.zeros_initializer, trainable=True)
    # moving statistics are updated by the update ops, not by the optimizer
    moving_mean = tf.get_variable('moving_mean', (c,), dtype=tf.float32,
                                  initializer=tf.zeros_initializer, trainable=False)
    moving_variance = tf.get_variable('moving_variance', (c,), dtype=tf.float32,
                                      initializer=tf.ones_initializer, trainable=False)
    if is_training:
        # training: normalize with the batch statistics and update the moving averages
        mean, variance = tf.nn.moments(x, [0, 1, 2])
        update_moving_mean = moving_averages.assign_moving_average(moving_mean, mean, momentum)
        update_moving_variance = moving_averages.assign_moving_average(moving_variance, variance, momentum)
        tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_moving_mean)
        tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_moving_variance)
    else:
        # inference: normalize with the accumulated moving averages
        mean, variance = moving_mean, moving_variance
    return tf.nn.batch_normalization(x, mean, variance, beta, gamma, variance_epsilon=epsilon)


if __name__ == '__main__':
    input = [[[[1, 2],
               [2, 1]],
              [[3, 4],
               [4, 3]]],
             [[[5, 6],
               [6, 5]],
              [[7, 8],
               [8, 7]]]]
    input_tf = tf.constant(input, dtype=tf.float32)
    input_tf = tf.transpose(input_tf, perm=(0, 2, 3, 1))
    output_tf = batch_norm(input_tf, is_training=True)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        output = sess.run(output_tf)
        output = np.transpose(output, axes=(0, 3, 1, 2))
        print('output:', output)
# Output:
# output: [[[[-1.2125356 -0.72752136]
#   [-0.72752136 -1.2125356 ]]
#
#  [[-1.2125355 -0.7275213 ]
#   [-0.7275213 -1.2125355 ]]]
#
#
# [[[ 0.7275214  1.2125356 ]
#   [ 1.2125356  0.7275214 ]]
#
#  [[ 0.7275214  1.2125356 ]
#   [ 1.2125356  0.7275214 ]]]]
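The training/inference split can also be illustrated without TensorFlow. Below is a minimal NumPy sketch (a hypothetical SimpleBN helper, with gamma = 1 and beta = 0 for brevity) that normalizes with batch statistics while training and with the accumulated moving averages at inference:

```python
import numpy as np

class SimpleBN:
    """Minimal batch-norm bookkeeping: batch statistics in training,
    moving averages at inference (gamma=1, beta=0 for brevity)."""
    def __init__(self, momentum=0.99, eps=0.001):
        self.momentum, self.eps = momentum, eps
        self.moving_mean, self.moving_var = 0.0, 1.0   # same init as the TF variables

    def __call__(self, x, training):
        if training:
            mean, var = x.mean(), x.var()
            # exponential moving average, as assign_moving_average does
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.eps)

bn = SimpleBN()
rng = np.random.RandomState(0)
for _ in range(1000):                     # training batches drawn from N(5, 2^2)
    bn(5 + 2 * rng.randn(256), training=True)

# the moving averages now approximate the true mean 5 and variance 4,
# so inference-mode normalization uses them instead of the current batch
print(bn.moving_mean, bn.moving_var)
```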
tf.layers.batch_normalization signature:
tf.layers.batch_normalization(
inputs,
axis=-1,
momentum=0.99,
epsilon=0.001,
center=True,
scale=True,
beta_initializer=tf.zeros_initializer(),
gamma_initializer=tf.ones_initializer(),
moving_mean_initializer=tf.zeros_initializer(),
moving_variance_initializer=tf.ones_initializer(),
beta_regularizer=None,
gamma_regularizer=None,
beta_constraint=None,
gamma_constraint=None,
training=False,
trainable=True,
name=None,
reuse=None,
renorm=False,
renorm_clipping=None,
renorm_momentum=0.99,
fused=None,
virtual_batch_size=None,
adjustment=None
)
Parameters (only the key ones; the rest can be left at their defaults):
- inputs: input tensor, typically of shape [batch, height, width, channel]
- momentum: momentum of the moving averages; the previous moving value is weighted by momentum and the new batch statistic by (1 - momentum)
- epsilon: a small number added to the variance to keep the denominator ($\sigma$) away from zero; default 0.001
- center: bool; if True, add the offset parameter $\beta$ (default True)
- scale: bool; if True, add the scale parameter $\gamma$ (default True)
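To make the momentum parameter concrete, here is the exponential-moving-average update it controls, as a plain-Python sketch of the formula (not TensorFlow's internals):

```python
# momentum weights the PREVIOUS moving value; the new batch
# statistic only contributes (1 - momentum) per update.
momentum = 0.99
moving, value = 0.0, 10.0          # e.g. moving mean starting at 0, constant batch mean 10
for _ in range(3):
    moving = momentum * moving + (1 - momentum) * value

# after n updates with a constant statistic v: moving = (1 - momentum**n) * v
print(moving)
```

This is why a high momentum (e.g. 0.99) makes the moving statistics change slowly and only converge after many batches.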
tf.layers.batch_normalization() does essentially the same thing as the batch_norm() wrapper function in the example above. For example:
import tensorflow as tf
import numpy as np

if __name__ == '__main__':
    input = [[[[1, 2],
               [2, 1]],
              [[3, 4],
               [4, 3]]],
             [[[5, 6],
               [6, 5]],
              [[7, 8],
               [8, 7]]]]
    input_tf = tf.constant(input, dtype=tf.float32)
    input_tf = tf.transpose(input_tf, perm=(0, 2, 3, 1))
    output_tf = tf.layers.batch_normalization(input_tf, momentum=0.99, epsilon=0.001, training=True)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        output = sess.run(output_tf)
        output = np.transpose(output, axes=(0, 3, 1, 2))
        print('output:', output)
# Output:
# output: [[[[-1.2125356 -0.72752136]
# [-0.72752136 -1.2125356 ]]
#
# [[-1.2125356 -0.72752136]
# [-0.72752136 -1.2125356 ]]]
#
#
# [[[ 0.72752136 1.2125356 ]
# [ 1.2125356 0.72752136]]
#
# [[ 0.72752136 1.2125356 ]
# [ 1.2125356 0.72752136]]]]