Variance and Standard Deviation
Variance
The variance is the average of the squared deviations of every value in a population from the population's arithmetic mean; it measures how dispersed a set of data is on average. The standard deviation is the square root of the variance and measures the average spread of the data around the mean.
Variance is defined slightly differently in probability theory and in statistics:
- In probability theory, the variance measures how far a random variable deviates from its mathematical expectation (i.e., its mean)
- In statistics, the (sample) variance is the average of the squared differences between each sample value and the mean of all sample values
Standard Deviation
Formulas
For a population $x_1, \dots, x_N$ with mean $\mu$:

$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2, \qquad \sigma = \sqrt{\sigma^2}$

For a sample $x_1, \dots, x_n$ with sample mean $\bar{x}$, the unbiased sample variance divides by $n-1$ instead of $n$:

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
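As a quick sanity check of the formulas above, both variants can be computed with NumPy (not used elsewhere in this post, but convenient here): `ddof=0` gives the population variance from probability theory, `ddof=1` the unbiased sample variance from statistics.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mu = data.mean()

# population variance: mean of squared deviations from the mean
pop_var = np.mean((data - mu) ** 2)
print(pop_var, np.var(data))           # both 4.0 (np.var defaults to ddof=0)

# sample variance divides by n - 1 instead of n
print(np.var(data, ddof=1))            # 32/7 = 4.5714...

# the standard deviation is the square root of the variance
print(np.std(data), np.sqrt(pop_var))  # both 2.0
```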
How Layer Normalization Is Computed
The computation follows the formulas in the paper (for a layer with $H$ hidden units $a_1, \dots, a_H$):
- Compute the mean: $\mu = \frac{1}{H}\sum_{i=1}^{H} a_i$
- Compute the variance: $\sigma^2 = \frac{1}{H}\sum_{i=1}^{H}(a_i - \mu)^2$
- Normalize: $\hat{a}_i = \dfrac{a_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$, where $\epsilon$ is a small constant, typically 1e-6
- Introduce two more parameters, scale and bias, to obtain the output: $y_i = \text{scale} \cdot \hat{a}_i + \text{bias}$
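The four steps above can be sketched in plain NumPy (a minimal illustration; `scale` and `bias` stand in for the learnable parameters and are simply fixed to ones and zeros here):

```python
import numpy as np

def layer_norm_np(x, epsilon=1e-6):
    # learnable in a real implementation; fixed here for illustration
    scale = np.ones(x.shape[-1])
    bias = np.zeros(x.shape[-1])
    mean = x.mean(axis=-1, keepdims=True)                      # step 1: mean
    variance = ((x - mean) ** 2).mean(axis=-1, keepdims=True)  # step 2: variance
    norm = (x - mean) / np.sqrt(variance + epsilon)            # step 3: normalize
    return norm * scale + bias                                 # step 4: scale and shift

x = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 60.0]])
y = layer_norm_np(x)
print(y)  # each row now has (approximately) zero mean and unit variance
```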
Manual implementation in Python

```python
def layer_norm_compute_python(x, epsilon, scale, bias):
    mean = tf.reduce_mean(x, axis=[-1], keep_dims=True)
    # tf.square squares element-wise
    variance = tf.reduce_mean(tf.square(x - mean), axis=[-1], keep_dims=True)
    # tf.rsqrt is the reciprocal of the square root, hence the multiplication below
    norm_x = (x - mean) * tf.rsqrt(variance + epsilon)
    return norm_x * scale + bias


def layer_norm(x, filters=None, epsilon=1e-6, scope=None, reuse=None):
    if filters is None:
        filters = x.get_shape()[-1]
    with tf.variable_scope(scope, default_name="layer_norm", values=[x], reuse=reuse):
        scale = tf.get_variable(
            "layer_norm_scale", [filters], initializer=tf.ones_initializer())
        bias = tf.get_variable(
            "layer_norm_bias", [filters], initializer=tf.zeros_initializer())
        result = layer_norm_compute_python(x, epsilon, scale, bias)
        return result
```
Computation in TensorFlow
Here tf is version 1.x.
The source in BERT's modeling.py:

```python
def layer_norm(input_tensor, name=None):
    """Run layer normalization on the last dimension of the tensor."""
    return tf.contrib.layers.layer_norm(
        inputs=input_tensor, begin_norm_axis=-1, begin_params_axis=-1, scope=name)
```
Verifying the two approaches
- Another way to compute the mean and variance is `mean, variance = tf.nn.moments(img, axes=[-1])`
```python
# layer normalization
import tensorflow as tf

x1 = tf.convert_to_tensor(
    [[[18.369314, 2.6570225, 20.402943],
      [10.403599, 2.7813416, 20.794857]],
     [[19.0327, 2.6398268, 6.3894367],
      [3.921237, 10.761424, 2.7887821]],
     [[11.466338, 20.210938, 8.242946],
      [22.77081, 11.555874, 11.183836]],
     [[8.976935, 10.204252, 11.20231],
      [-7.356888, 6.2725096, 1.1952505]]])

mean_x = tf.reduce_mean(x1, axis=-1)  # mean over the last axis, shape (4, 2)
mean_x = tf.expand_dims(mean_x, -1)   # shape (4, 2, 1)

# std_x = tf.math.reduce_std(x1, axis=-1)  # tf 1.15: computes the standard deviation
# this computes the variance, i.e. the square of the standard deviation, shape (4, 2)
variance_x = tf.reduce_mean(tf.square(x1 - mean_x), axis=[-1])
# mean_x2, variance_x2 = tf.nn.moments(x1, axes=[-1])  # alternative approach
variance_x = tf.expand_dims(variance_x, -1)  # shape (4, 2, 1)

# manual computation; tf.rsqrt is the reciprocal of the square root
la_no1 = (x1 - mean_x) * tf.rsqrt(variance_x)

x = tf.placeholder(tf.float32, shape=[4, 2, 3])
la_no = tf.contrib.layers.layer_norm(
    inputs=x, begin_norm_axis=-1, begin_params_axis=-1)

with tf.Session() as sess1:
    sess1.run(tf.global_variables_initializer())
    x1 = sess1.run(x1)
    # manual computation
    print(sess1.run(la_no1))
    '''
    [[[ 0.5749929  -1.4064412   0.83144826]
      [-0.1250188  -1.1574404   1.2824593 ]]
     [[ 1.3801126  -0.9573896  -0.422723  ]
      [-0.5402143   1.4019758  -0.86176145]]
     [[-0.36398557  1.3654773  -1.0014919 ]
      [ 1.4136491  -0.6722269  -0.74142253]]
     [[-1.2645671   0.08396867  1.1806016 ]
      [-1.3146634   1.108713    0.20595042]]]
    '''
    # TensorFlow implementation
    print(sess1.run(la_no, feed_dict={x: x1}))  # feed the input via feed_dict
    '''
    [[[ 0.574993   -1.4064413   0.8314482 ]
      [-0.12501884 -1.1574404   1.2824591 ]]
     [[ 1.3801126  -0.9573896  -0.422723  ]
      [-0.5402143   1.4019756  -0.86176145]]
     [[-0.36398554  1.3654773  -1.0014919 ]
      [ 1.4136491  -0.67222667 -0.7414224 ]]
     [[-1.2645674   0.08396816  1.1806011 ]
      [-1.3146634   1.108713    0.20595042]]]
    '''
```
- Note: for simplicity, the epsilon, scale, and bias parameters are not used here
- The two results are nearly identical; the differences are only float32 rounding
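The printed results can also be cross-checked without TensorFlow at all; a NumPy re-computation on the same input (the first of the four blocks of `x1` above) reproduces the corresponding rows of both outputs:

```python
import numpy as np

x1 = np.array([[[18.369314, 2.6570225, 20.402943],
                [10.403599, 2.7813416, 20.794857]]])

mean = x1.mean(axis=-1, keepdims=True)
variance = ((x1 - mean) ** 2).mean(axis=-1, keepdims=True)
norm = (x1 - mean) / np.sqrt(variance)  # epsilon omitted, as in the session above

print(norm)
# matches [[ 0.5749929 -1.4064412  0.83144826] ...] up to float32 rounding
```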