Variance and Standard Deviation
Variance
The variance is the average of the squared deviations of every value in a population from the population's arithmetic mean; it measures how dispersed a set of data is on average. The standard deviation is the square root of the variance and measures the average spread of the data around the mean.
Variance is defined slightly differently in probability theory and in statistics:
- In probability theory, the variance measures how far a random variable deviates from its mathematical expectation (i.e., its mean)
- In statistics, the (sample) variance is the average of the squared differences between each sample value and the mean of all sample values
Standard Deviation
Formulas
For a population $x_1, \dots, x_N$ with mean $\mu$:

$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2, \qquad \sigma = \sqrt{\sigma^2}$

For a sample $x_1, \dots, x_n$ with sample mean $\bar{x}$, the unbiased sample variance divides by $n-1$ instead of $n$:

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
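As a quick sanity check of the formulas above, both variants can be computed with NumPy (not used elsewhere in this post, but convenient here): `ddof=0` gives the population variance from probability theory, `ddof=1` the unbiased sample variance from statistics.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mu = data.mean()

# population variance: mean of squared deviations from the mean
pop_var = np.mean((data - mu) ** 2)
print(pop_var, np.var(data))           # both 4.0 (np.var defaults to ddof=0)

# sample variance divides by n - 1 instead of n
print(np.var(data, ddof=1))            # 32/7 = 4.5714...

# the standard deviation is the square root of the variance
print(np.std(data), np.sqrt(pop_var))  # both 2.0
```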
How Layer Normalization Is Computed
The computation follows the formulas in the paper (for a layer with $H$ hidden units $a_1, \dots, a_H$):
- Compute the mean: $\mu = \frac{1}{H}\sum_{i=1}^{H} a_i$
- Compute the variance: $\sigma^2 = \frac{1}{H}\sum_{i=1}^{H}(a_i - \mu)^2$
- Normalize: $\hat{a}_i = \dfrac{a_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$, where $\epsilon$ is a small constant, typically 1e-6
- Introduce two more parameters, scale and bias, to obtain the output: $y_i = \text{scale} \cdot \hat{a}_i + \text{bias}$
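The four steps above can be sketched in plain NumPy (a minimal illustration; `scale` and `bias` stand in for the learnable parameters and are simply fixed to ones and zeros here):

```python
import numpy as np

def layer_norm_np(x, epsilon=1e-6):
    # learnable in a real implementation; fixed here for illustration
    scale = np.ones(x.shape[-1])
    bias = np.zeros(x.shape[-1])
    mean = x.mean(axis=-1, keepdims=True)                      # step 1: mean
    variance = ((x - mean) ** 2).mean(axis=-1, keepdims=True)  # step 2: variance
    norm = (x - mean) / np.sqrt(variance + epsilon)            # step 3: normalize
    return norm * scale + bias                                 # step 4: scale and shift

x = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 60.0]])
y = layer_norm_np(x)
print(y)  # each row now has (approximately) zero mean and unit variance
```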
Manual implementation in Python

```python
def layer_norm_compute_python(x, epsilon, scale, bias):
    mean = tf.reduce_mean(x, axis=[-1], keep_dims=True)
    # tf.square squares element-wise
    variance = tf.reduce_mean(tf.square(x - mean), axis=[-1], keep_dims=True)
    # tf.rsqrt is the reciprocal of the square root, hence the multiplication below
    norm_x = (x - mean) * tf.rsqrt(variance + epsilon)
    return norm_x * scale + bias


def layer_norm(x, filters=None, epsilon=1e-6, scope=None, reuse=None):
    if filters is None:
        filters = x.get_shape()[-1]
    with tf.variable_scope(scope, default_name="layer_norm", values=[x], reuse=reuse):
        scale = tf.get_variable(
            "layer_norm_scale", [filters], initializer=tf.ones_initializer())
        bias = tf.get_variable(
            "layer_norm_bias", [filters], initializer=tf.zeros_initializer())
        result = layer_norm_compute_python(x, epsilon, scale, bias)
        return result
```
Computation in TensorFlow
Here tf is version 1.x.
The source in BERT's modeling.py:

```python
def layer_norm(input_tensor, name=None):
    """Run layer normalization on the last dimension of the tensor."""
    return tf.contrib.layers.layer_norm(
        inputs=input_tensor, begin_norm_axis=-1, begin_params_axis=-1, scope=name)
```
Verifying the two approaches
- Another way to compute the mean and variance is `mean, variance = tf.nn.moments(img, axes=[-1])`
```python
# layer normalization
import tensorflow as tf

x1 = tf.convert_to_tensor(
    [[[18.369314, 2.6570225, 20.402943],
      [10.403599, 2.7813416, 20.794857]],
     [[19.0327, 2.6398268, 6.3894367],
      [3.921237, 10.761424, 2.7887821]],
     [[11.466338, 20.210938, 8.242946],
      [22.77081, 11.555874, 11.183836]],
     [[8.976935, 10.204252, 11.20231],
      [-7.356888, 6.2725096, 1.1952505]]])

mean_x = tf.reduce_mean(x1, axis=-1)  # mean over the last axis, shape (4, 2)
mean_x = tf.expand_dims(mean_x, -1)   # shape (4, 2, 1)

# std_x = tf.math.reduce_std(x1, axis=-1)  # tf 1.15: computes the standard deviation
# this computes the variance, i.e. the square of the standard deviation, shape (4, 2)
variance_x = tf.reduce_mean(tf.square(x1 - mean_x), axis=[-1])
# mean_x2, variance_x2 = tf.nn.moments(x1, axes=[-1])  # alternative approach
variance_x = tf.expand_dims(variance_x, -1)  # shape (4, 2, 1)

# manual computation; tf.rsqrt is the reciprocal of the square root
la_no1 = (x1 - mean_x) * tf.rsqrt(variance_x)

x = tf.placeholder(tf.float32, shape=[4, 2, 3])
la_no = tf.contrib.layers.layer_norm(
    inputs=x, begin_norm_axis=-1, begin_params_axis=-1)

with tf.Session() as sess1:
    sess1.run(tf.global_variables_initializer())
    x1 = sess1.run(x1)
    # manual computation
    print(sess1.run(la_no1))
    '''
    [[[ 0.5749929  -1.4064412   0.83144826]
      [-0.1250188  -1.1574404   1.2824593 ]]
     [[ 1.3801126  -0.9573896  -0.422723  ]
      [-0.5402143   1.4019758  -0.86176145]]
     [[-0.36398557  1.3654773  -1.0014919 ]
      [ 1.4136491  -0.6722269  -0.74142253]]
     [[-1.2645671   0.08396867  1.1806016 ]
      [-1.3146634   1.108713    0.20595042]]]
    '''
    # TensorFlow implementation
    print(sess1.run(la_no, feed_dict={x: x1}))  # feed the input via feed_dict
    '''
    [[[ 0.574993   -1.4064413   0.8314482 ]
      [-0.12501884 -1.1574404   1.2824591 ]]
     [[ 1.3801126  -0.9573896  -0.422723  ]
      [-0.5402143   1.4019756  -0.86176145]]
     [[-0.36398554  1.3654773  -1.0014919 ]
      [ 1.4136491  -0.67222667 -0.7414224 ]]
     [[-1.2645674   0.08396816  1.1806011 ]
      [-1.3146634   1.108713    0.20595042]]]
    '''
```
- Note: for simplicity, the epsilon, scale, and bias parameters are not used here
- The two results are nearly identical; the differences are only float32 rounding
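The printed results can also be cross-checked without TensorFlow at all; a NumPy re-computation on the same input (the first of the four blocks of `x1` above) reproduces the corresponding rows of both outputs:

```python
import numpy as np

x1 = np.array([[[18.369314, 2.6570225, 20.402943],
                [10.403599, 2.7813416, 20.794857]]])

mean = x1.mean(axis=-1, keepdims=True)
variance = ((x1 - mean) ** 2).mean(axis=-1, keepdims=True)
norm = (x1 - mean) / np.sqrt(variance)  # epsilon omitted, as in the session above

print(norm)
# matches [[ 0.5749929 -1.4064412  0.83144826] ...] up to float32 rounding
```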