Why use a BN (batch normalization) layer?
How do you add a BN layer to a model?
How is the BN formula derived, and how does it work?
1. Why use a BN layer?
- Faster convergence
- Less sensitivity to hyperparameter tuning
- Mitigates vanishing gradients (where update magnitudes shrink layer by layer) [another remedy: use a non-saturating, one-sided activation such as ReLU]
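The vanishing-gradient point can be made concrete. The sigmoid derivative is at most 0.25, so the product of per-layer derivatives in backpropagation shrinks geometrically with depth (a small illustration, not from the original post):

```python
import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)  # maximum value is 0.25, reached at x = 0

# Best-case product of sigmoid derivatives through 10 layers
grad = 1.0
for _ in range(10):
    grad *= sigmoid_grad(0.0)  # 0.25 per layer
print(grad)  # 0.25**10 ≈ 9.5e-7: updates to early layers become tiny
```

BN keeps layer inputs in a well-scaled range, which keeps activations away from these saturated, near-zero-derivative regions.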
2. How do you add a BN layer to a model? (TensorFlow also provides the built-in tf.layers.batch_normalization)
Defining the BN layer:
import tensorflow as tf
from tensorflow.python.ops import control_flow_ops
from tensorflow.python.training import moving_averages

UPDATE_OPS_COLLECTION = 'bn_update_ops'

def bn(input, is_training):
    epsilon = 1e-7
    BN_DECAY = 0.9
    params_shape = input.get_shape()[-1:]  # last dimension: the channel (depth) axis
    axis = list(range(len(input.get_shape()) - 1))
    mean, variance = tf.nn.moments(input, axis)  # per-channel mean and variance over the batch
    beta = tf.get_variable('beta', params_shape, initializer=tf.zeros_initializer())
    gamma = tf.get_variable('gamma', params_shape, initializer=tf.ones_initializer())
    moving_mean = tf.get_variable('moving_mean', params_shape,
                                  initializer=tf.zeros_initializer(), trainable=False)
    moving_variance = tf.get_variable('moving_variance', params_shape,
                                      initializer=tf.ones_initializer(), trainable=False)
    # update the moving statistics: variable = variable * decay + value * (1 - decay)
    update_moving_mean = moving_averages.assign_moving_average(moving_mean, mean, BN_DECAY)
    update_moving_variance = moving_averages.assign_moving_average(moving_variance, variance, BN_DECAY)
    tf.add_to_collection(UPDATE_OPS_COLLECTION, update_moving_mean)
    tf.add_to_collection(UPDATE_OPS_COLLECTION, update_moving_variance)
    # use the batch statistics during training, the moving averages at test time
    mean, variance = control_flow_ops.cond(is_training,
                                           lambda: (mean, variance),
                                           lambda: (moving_mean, moving_variance))
    return tf.nn.batch_normalization(input, mean, variance,
                                     offset=beta, scale=gamma, variance_epsilon=epsilon)
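As a sanity check, the training-mode computation above (per-channel statistics over all axes except the last, then normalization) can be reproduced in plain NumPy. The function name `bn_forward` is my own, not from the post:

```python
import numpy as np

def bn_forward(x, gamma, beta, epsilon=1e-7):
    """Training-mode batch norm: normalize over every axis except the last (channel) axis."""
    axes = tuple(range(x.ndim - 1))   # e.g. (0, 1, 2) for NHWC input
    mean = x.mean(axis=axes)          # per-channel mean, like tf.nn.moments
    var = x.var(axis=axes)            # per-channel variance
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    return gamma * x_hat + beta

x = np.random.randn(8, 4, 4, 3) * 5.0 + 2.0   # NHWC batch with non-zero mean and scale
y = bn_forward(x, gamma=np.ones(3), beta=np.zeros(3))
# with gamma=1, beta=0, each output channel has roughly zero mean and unit variance
print(y.mean(axis=(0, 1, 2)), y.var(axis=(0, 1, 2)))
```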
Function notes:
mean, variance = tf.nn.moments(input, axis)  # input: the tensor whose mean and variance are computed; axis: the axes to reduce over
tf.nn.batch_normalization(input, mean, variance, offset=beta, scale=gamma, variance_epsilon=epsilon)  # input, mean, variance as above
moving_averages.assign_moving_average(moving_variance, variance, BN_DECAY)  # exponential moving-average update
tf.add_to_collection("name", tensor)  # appends the tensor to the named collection
result = control_flow_ops.cond(Bool, funcA, funcB)  # runs funcA when Bool == True, otherwise funcB
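The update rule used by assign_moving_average is easy to verify by hand (a tiny sketch of the formula, not TensorFlow's actual implementation):

```python
def update_moving(variable, value, decay):
    # same rule as moving_averages.assign_moving_average:
    # variable = variable * decay + value * (1 - decay)
    return variable * decay + value * (1 - decay)

m = 0.0
for batch_mean in [2.0, 2.0, 2.0]:
    m = update_moving(m, batch_mean, decay=0.9)
print(m)  # ≈ 0.542: with decay=0.9 the moving mean approaches 2.0 slowly
```

A high decay makes the moving statistics smooth but slow to track the data; that is why they are only trusted at test time, after many updates.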
Running the moving-average update ops (the collection holds update_moving_mean and update_moving_variance; beta and gamma themselves are trained by the optimizer) together with each training step:
batchnorm_updates = tf.get_collection(UPDATE_OPS_COLLECTION)
with tf.control_dependencies(batchnorm_updates):
    train_op = ...
3. How BN works
For each activation x, BN first standardizes it using the per-batch mean μ and variance σ²:

    x̂ = (x − μ) / √(σ² + ε)

and then applies a learned affine transform:

    y = γ · x̂ + β

(γ is the scale, gamma in the code; β is the shift, beta in the code; ε keeps the denominator away from zero, epsilon in the code.)
This transform turns the original non-standard distribution into a standardized one with zero mean and unit variance. The standardized distribution is symmetric and well-behaved, but it also cancels part of the influence that the weights and biases have on the output feature maps. The learned γ and β are a second adjustment of the standardized distribution that restores this expressive power.
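A tiny numeric illustration of that last point: after standardization the activations have mean 0 and standard deviation 1, and the learned parameters simply move them to mean β and standard deviation γ (the variable names below are just the symbols above):

```python
import numpy as np

x = np.random.randn(1000) * 3.0 + 7.0              # some non-standard distribution
x_hat = (x - x.mean()) / np.sqrt(x.var() + 1e-7)   # standardize: mean 0, std 1
gamma, beta = 2.0, 0.5
y = gamma * x_hat + beta                            # learned re-adjustment
print(y.mean(), y.std())                            # mean -> beta, std -> gamma
```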