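In TensorFlow, the APIs for adding a batch normalization layer are tf.layers.batch_normalization and tf.contrib.layers.batch_norm. If you build a network with either of them directly, you will often find that it performs very well during training but very poorly at test time, once the training flag (is_training in tf.contrib.layers.batch_norm, training in tf.layers.batch_normalization) is set to False. This is usually caused by a detail that was overlooked during training.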
Method 1:
The documentation of tf.contrib.layers.batch_norm contains the following note:
Note: when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op.
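In other words, we have to make sure moving_mean and moving_variance are actually updated during training. Their update ops are collected in tf.GraphKeys.UPDATE_OPS, but nothing runs them automatically, so train_op must depend on them (tf.layers.batch_normalization works the same way):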
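update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)  # ops that update moving_mean / moving_variance
with tf.control_dependencies(update_ops):
    # train_op runs only after the update ops (optimizer and loss are defined elsewhere)
    train_op = optimizer.minimize(loss)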
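This step is crucial, and many people skip it. Without it, moving_mean and moving_variance keep their initial values (0 and 1), and those are exactly what the layer uses at test time. This is why training and test results differ so much.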
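To see why skipping the update ops breaks test-time behavior, here is a framework-free sketch in plain Python. It makes several assumptions: no TensorFlow, a single scalar feature, hypothetical data drawn from N(5, 2^2), and a decay of 0.99. The helper names batch_moments, train and normalize are hypothetical. The sketch mimics the exponential moving average that the update ops apply on each training step:

```python
import random


def batch_moments(batch):
    """Mean and biased variance of one batch (like tf.nn.moments over axis 0)."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return mean, var


def train(num_steps, run_update_ops, decay=0.99, seed=0):
    """Return (moving_mean, moving_var) after num_steps training batches."""
    rng = random.Random(seed)
    moving_mean, moving_var = 0.0, 1.0  # initial values of the moving statistics
    for _ in range(num_steps):
        # Hypothetical data: one feature drawn from N(5, 2^2), batch size 32
        batch = [rng.gauss(5.0, 2.0) for _ in range(32)]
        batch_mean, batch_var = batch_moments(batch)
        if run_update_ops:
            # What the ops in UPDATE_OPS do on each training step
            moving_mean = decay * moving_mean + (1 - decay) * batch_mean
            moving_var = decay * moving_var + (1 - decay) * batch_var
    return moving_mean, moving_var


def normalize(x, mean, var, eps=1e-3):
    """Test-time normalization with the stored moving statistics."""
    return (x - mean) / (var + eps) ** 0.5


stale = train(2000, run_update_ops=False)  # (0.0, 1.0): never updated
fresh = train(2000, run_update_ops=True)   # roughly (5.0, 3.9)
print(normalize(5.0, *stale))  # about 5.0: a typical input is not centered at all
print(normalize(5.0, *fresh))  # close to 0.0, as during training
```

Without the updates, the test-time layer effectively skips normalization and only applies scale and offset. The downstream layers, which were trained on normalized activations, then see inputs on a completely different scale.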
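Another approach you may come across is to keep is_training set to True at test time as well.
Without the fix above, setting is_training to False at test time gives very low test accuracy, so some people simply leave it at True. This does give high accuracy, but it is clearly not a real fix! With is_training=True the layer normalizes each test batch using that batch's own mean and variance. A sample's prediction then depends on the other samples in its batch, and the stored moving averages are never used.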
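The following sketch shows why normalizing with batch statistics at test time is unreliable: the output for the same sample changes with whatever else is in its batch. It makes these assumptions: plain Python, a single scalar feature, gamma=1 and beta=0. The function name and input values are hypothetical:

```python
def batch_norm_train_mode(batch, eps=1e-3):
    """Normalize with the batch's own mean/variance (gamma=1, beta=0)."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]


sample = 7.0
alone = batch_norm_train_mode([sample])[0]
with_small = batch_norm_train_mode([sample, 1.0, 2.0, 3.0])[0]
with_large = batch_norm_train_mode([sample, 8.0, 9.0, 10.0])[0]

print(alone)       # 0.0: a batch of one is always normalized to beta
print(with_small)  # positive: 7.0 is above this batch's mean
print(with_large)  # negative: 7.0 is below this batch's mean
```

The same input 7.0 produces three different outputs, so test accuracy under is_training=True depends on batch size and batch composition rather than on the trained model alone.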
Method 2: write the layer yourself with tf.nn.batch_normalization
TensorFlow implementation:
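import tensorflow as tf  # TensorFlow 1.x API (tf.assign, tf.control_dependencies)

def batchNorm_layer(inputs, is_training, decay=0.999, epsilon=1e-3):
    # Assumes a 2-D input of shape [batch, features] (fully connected layer).
    # is_training is a Python bool, so the training and test graphs are built separately.
    shape = inputs.get_shape()[1:].as_list()
    scale = tf.Variable(tf.ones(shape))   # gamma
    beta = tf.Variable(tf.zeros(shape))   # offset
    # Population statistics used at test time; updated only while training
    pop_mean = tf.Variable(tf.zeros(shape), trainable=False)
    pop_var = tf.Variable(tf.ones(shape), trainable=False)
    if is_training:
        batch_mean, batch_var = tf.nn.moments(inputs, [0])
        # Exponential moving average: decay is the weight on the old value and must be close to 1
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_var = tf.assign(pop_var, pop_var * decay + batch_var * (1 - decay))
        # Force the updates to run whenever this layer's output is computed
        with tf.control_dependencies([train_mean, train_var]):
            return tf.nn.batch_normalization(inputs, batch_mean, batch_var, beta, scale, epsilon)
    else:
        # Test time: normalize with the accumulated population statistics
        return tf.nn.batch_normalization(inputs, pop_mean, pop_var, beta, scale, epsilon)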
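In batchNorm_layer, decay is the weight given to the old running value in pop_mean * decay + batch_mean * (1 - decay), so its value decides what pop_mean ends up tracking. The sketch below compares decay=1e-5 with decay=0.999. It makes these assumptions: plain Python with no TensorFlow, and made-up batch means. The function running_mean is hypothetical:

```python
def running_mean(batch_means, decay, init=0.0):
    """Exponential moving average with the same update rule as train_mean."""
    pop_mean = init
    for batch_mean in batch_means:
        pop_mean = pop_mean * decay + batch_mean * (1 - decay)
    return pop_mean


# Hypothetical batch means: 20000 batches around 5.0, then one outlier batch.
batch_means = [5.0] * 20000 + [9.0]

r_small = running_mean(batch_means, decay=1e-5)
r_large = running_mean(batch_means, decay=0.999)
print(r_small)  # about 9.0: only the last batch counts
print(r_large)  # about 5.004: a long-run average
```

With decay close to 0, the stored statistics are just those of the last training batch, so test results depend on whichever batch happened to come last. Values such as 0.99 or 0.999 average over roughly the last 100 or 1000 batches.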