Batch normalization in TensorFlow

 

1. Principle

The formula is as follows:

y = γ(x − μ)/σ + β

where x is the input, y is the output, μ is the mean, σ is the standard deviation (in practice √(σ² + ε) is used, with a small ε added to the variance for numerical stability), and γ and β are the scale and offset coefficients.

In general these parameters are computed per channel. For example, if the input x is a 16*32*32*128 feature map (NHWC format), then all of the above parameters are 128-dimensional vectors. γ and β are optional: when present, they are learnable parameters (participating in the forward and backward passes); when absent, the formula simplifies to y = (x − μ)/σ. As for μ and σ, the batch statistics are used during training, while at test/inference time the moving averages computed during training are used.
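As a concrete illustration of the per-channel computation (a minimal NumPy sketch, not one of the TensorFlow APIs; the shapes follow the 16*32*32*128 example above):

import numpy as np

# Hypothetical NHWC feature map: batch=16, height=32, width=32, channels=128
x = np.random.randn(16, 32, 32, 128).astype(np.float32)

# Per-channel statistics: reduce over the N, H, W axes, one value per channel
mu = x.mean(axis=(0, 1, 2))    # shape (128,)
var = x.var(axis=(0, 1, 2))    # shape (128,)

# Learnable per-channel scale/offset, initialized like TensorFlow's defaults
gamma = np.ones(128, dtype=np.float32)
beta = np.zeros(128, dtype=np.float32)

eps = 1e-3
y = gamma * (x - mu) / np.sqrt(var + eps) + beta   # broadcasts over the channel axis
print(y.shape)   # (16, 32, 32, 128)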

 

2. Usage in TensorFlow

TensorFlow's main batch normalization implementations are the following three:

tf.nn.batch_normalization

tf.layers.batch_normalization

tf.contrib.layers.batch_norm

The level of encapsulation increases from one to the next. tf.layers.batch_normalization or tf.contrib.layers.batch_norm is recommended, since these two are documented in more detail on the TensorFlow website.

2.1 tf.nn.batch_normalization

tf.nn.batch_normalization(
    x,
    mean,
    variance,
    offset,
    scale,
    variance_epsilon,
    name=None
)
Args:
x: Input Tensor of arbitrary dimensionality.
mean: A mean Tensor.
variance: A variance Tensor.
offset: An offset Tensor, often denoted β in equations, or None. If present, will be added to the normalized tensor.
scale: A scale Tensor, often denoted γ in equations, or None. If present, the scale is applied to the normalized tensor.
variance_epsilon: A small float number to avoid dividing by 0.
name: A name for this operation (optional).

Returns:
the normalized, scaled, offset tensor.

example:

import tensorflow as tf
import numpy as np
w1_initial = np.random.normal(size=(784,100)).astype(np.float32)
w2_initial = np.random.normal(size=(100,100)).astype(np.float32)
x = tf.placeholder(tf.float32, shape=[None, 784])
w1 = tf.Variable(w1_initial)
b1 = tf.Variable(tf.zeros([100]))
z1 = tf.matmul(x,w1)+b1
print("z1.shape:",z1.shape)
l1 = tf.nn.sigmoid(z1)
print("l1.shape:",l1.shape)
batch_mean2, batch_var2 = tf.nn.moments(l1,[0])  # axes=[0]: mean/variance per column (per feature); [1]: per row; [0,1]: over all elements
print("batch_mean2.shape:",batch_mean2.shape)
print("batch_var2.shape:",batch_var2.shape)
scale2 = tf.Variable(tf.ones([100]))
beta2 = tf.Variable(tf.zeros([100]))
epsilon = 1e-3
BN2 = tf.nn.batch_normalization(l1,batch_mean2,batch_var2,beta2,scale2,epsilon)
print("BN2.shape:",BN2.shape)

Output:

z1.shape: (?, 100)
l1.shape: (?, 100)
batch_mean2.shape: (100,)
batch_var2.shape: (100,)
BN2.shape: (?, 100)

2.2 tf.layers.batch_normalization

batch_normalization(inputs,
                        axis=-1,
                        momentum=0.99,
                        epsilon=1e-3,
                        center=True,
                        scale=True,
                        beta_initializer=init_ops.zeros_initializer(),
                        gamma_initializer=init_ops.ones_initializer(),
                        moving_mean_initializer=init_ops.zeros_initializer(),
                        moving_variance_initializer=init_ops.ones_initializer(),
                        beta_regularizer=None,
                        gamma_regularizer=None,
                        beta_constraint=None,
                        gamma_constraint=None,
                        training=False,
                        trainable=True,
                        name=None,
                        reuse=None,
                        renorm=False,
                        renorm_clipping=None,
                        renorm_momentum=0.99,
                        fused=None,
                        virtual_batch_size=None,
                        adjustment=None):

Note: during training, moving_mean and moving_variance need to be updated. By default, the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as dependencies of train_op. Also, be sure to add the batch_normalization ops before fetching the update_ops collection; otherwise update_ops will be empty and training/inference will not work correctly. For example:


x_norm = tf.layers.batch_normalization(x, training=training)

# ...

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)

Two things need attention during training: (1) pass training=True so that the per-batch mean and variance are computed and folded into the moving averages; at test time the input may be a single sample, for which batch statistics cannot be computed, so set training=False and the mean and variance saved during training are used instead. (2) When building the training step, add the code above (i.e., make update_ops a dependency of the final train_op). Omitting this control dependency leads to severely abnormal test accuracy. A fuller sketch of this pattern follows.
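Putting the pieces together, here is a minimal sketch of the whole pattern (the network, the layer sizes, and the names x, labels, is_training are invented for illustration):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784])
labels = tf.placeholder(tf.float32, shape=[None, 10])
is_training = tf.placeholder(tf.bool, name='is_training')

h = tf.layers.dense(x, 100)
h = tf.layers.batch_normalization(h, training=is_training)  # batch stats when True, moving averages when False
h = tf.nn.relu(h)
logits = tf.layers.dense(h, 10)

loss = tf.losses.softmax_cross_entropy(labels, logits)
optimizer = tf.train.AdamOptimizer(1e-3)

# The moving_mean/moving_variance update ops created above live in UPDATE_OPS;
# without this control dependency they never run and inference breaks.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)

# Training step:  sess.run(train_op, feed_dict={x: ..., labels: ..., is_training: True})
# Inference step: sess.run(logits, feed_dict={x: ..., is_training: False})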

2.3 tf.contrib.layers.batch_norm

tf.contrib.layers.batch_norm(
    inputs,
    decay=0.999,
    center=True,
    scale=False,
    epsilon=0.001,
    activation_fn=None,
    param_initializers=None,
    param_regularizers=None,
    updates_collections=tf.GraphKeys.UPDATE_OPS,
    is_training=True,
    reuse=None,
    variables_collections=None,
    outputs_collections=None,
    trainable=True,
    batch_weights=None,
    fused=None,
    data_format=DATA_FORMAT_NHWC,
    zero_debias_moving_mean=False,
    scope=None,
    renorm=False,
    renorm_clipping=None,
    renorm_decay=0.99,
    adjustment=None
)

During training, moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as dependencies of train_op.

This function is used in much the same way as tf.layers.batch_normalization. The main difference between the two lies in the default values of the scale and center parameters; these correspond to the scale and offset of the linear transformation applied after normalizing the input by its mean and variance, as described in the principle section above. Both default center (the offset) to True, but scale defaults to True in the former and False in the latter. In other words, tf.contrib.layers.batch_norm by default does not apply a learnable scaling to the normalized input, only an offset. A minimal usage sketch follows.
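The sketch below (a minimal, hypothetical example; the fully connected layer and the names x, h, is_training are invented for illustration) shows the typical is_training switch and enables scale=True explicitly:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784])
is_training = tf.placeholder(tf.bool, name='is_training')

# A plain fully connected layer, batch norm, then the nonlinearity
h = tf.contrib.layers.fully_connected(x, 100, activation_fn=None)
h = tf.contrib.layers.batch_norm(h,
                                 is_training=is_training,  # batch stats when True, moving averages when False
                                 scale=True,               # enable the learnable gamma (off by default here)
                                 updates_collections=tf.GraphKeys.UPDATE_OPS)
h = tf.nn.relu(h)

# As with tf.layers.batch_normalization, the moving-average update ops end up in
# UPDATE_OPS and must be added as a control dependency of train_op (see 2.2).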
 

 

