[Software Engineering Applications and Practice] lingvo study notes

NewtonLoop

Published 2021-11-27 16:29:31

Article link: https://blog.csdn.net/NewtonLoop/article/details/121578508

2021SC@SDUSC

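You can tell that different parts of the code base were written by different authors, because the way the code is organized differs. The author of this module likes to give functions default arguments and to use abbreviated names. Reading a book is a conversation with its author, and reading code is much the same kind of conversation with whoever wrote it. Quite fun.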

lingvo.core.favor_attention module

Module purpose: implements the multi-head FAVOR attention layer and the FAVOR self-attention layer

Functions

next_seed(current_seed)

Purpose: produces the next random seed. It returns current_seed + 1, or None when no seed was given.

Source:

def next_seed(current_seed):
  if current_seed is None:
    return None
  else:
    return current_seed + 1

create_projection_matrix(nb_random_projections, dim, seed=0, scaling=0)

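Parameters:

nb_random_projections: number of random projections
dim: dimensionality of each random projection
seed: random seed used to construct the projections
scaling: 1 if all random projections must be renormalized to length \sqrt{dim}; 0 if the projection lengths should follow the \chi(dim) distribution.
Returns: a random projection matrix of shape [nb_random_projections, dim]

Purpose: constructs the random projection matrix

Source: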

def create_projection_matrix(nb_random_projections, dim, seed=0, scaling=0):

If the number of random projections is zero, there is nothing to build and the function returns None.

  if nb_random_projections == 0:
    return None

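Build the matrix block by block: each dim x dim Gaussian block is orthogonalized with a QR decomposition, and the seed is advanced after each block. A final partial block supplies any remaining rows.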

  nb_full_blocks = nb_random_projections // dim
  block_list = []
  current_seed = seed
  for _ in range(nb_full_blocks):
    unstructured_block = tf.random.normal((dim, dim), seed=current_seed)
    q, _ = tf.linalg.qr(unstructured_block)
    q = tf.transpose(q)
    block_list.append(q)
    current_seed = next_seed(current_seed)
  remaining_rows = nb_random_projections - nb_full_blocks * dim
  if remaining_rows > 0:
    unstructured_block = tf.random.normal((dim, dim), seed=current_seed)
    q, _ = tf.linalg.qr(unstructured_block)
    q = tf.transpose(q)
    block_list.append(q[0:remaining_rows])
  final_matrix = tf.concat(block_list, 0)
  current_seed = next_seed(current_seed)

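If scaling == 0, the rows are not renormalized to a fixed length: each row length is drawn as the norm of an independent Gaussian vector, so it follows \chi(dim).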

  if scaling == 0:
    squares = tf.math.square(
        tf.random.normal((nb_random_projections, dim), seed=current_seed))
    squared_lengths = tf.math.reduce_sum(squares, axis=1)
    multiplier = tf.math.sqrt(squared_lengths)

If scaling == 1, every row is scaled to length \sqrt{dim}.

  elif scaling == 1:
    multiplier = tf.math.sqrt(float(dim)) * tf.ones((nb_random_projections))
  else:
    raise ValueError("Scaling must be one of {0, 1}. Was %s" % scaling)

  return tf.linalg.matmul(tf.linalg.diag(multiplier), final_matrix)
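
To check the two properties described above (orthogonal rows inside each block, and row length \sqrt{dim} when scaling is 1), here is a minimal NumPy sketch of the same construction. It is not the lingvo code: the name projection_matrix_sketch is hypothetical, and a single NumPy generator replaces the per-block TensorFlow seeds.

```python
import numpy as np

def projection_matrix_sketch(nb_random_projections, dim, seed=0, scaling=0):
    """NumPy re-sketch of the block-orthogonal construction (hypothetical helper)."""
    if nb_random_projections == 0:
        return None
    rng = np.random.default_rng(seed)  # assumption: one generator instead of next_seed()
    blocks = []
    rows_left = nb_random_projections
    while rows_left > 0:
        # QR of a Gaussian block gives an orthogonal q; the rows of q.T are orthonormal.
        q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
        blocks.append(q.T[:min(dim, rows_left)])
        rows_left -= dim
    final_matrix = np.concatenate(blocks, axis=0)
    if scaling == 0:
        # Row lengths follow chi(dim): norms of independent Gaussian vectors.
        multiplier = np.linalg.norm(
            rng.standard_normal((nb_random_projections, dim)), axis=1)
    elif scaling == 1:
        multiplier = np.sqrt(dim) * np.ones(nb_random_projections)
    else:
        raise ValueError("scaling must be 0 or 1, got %s" % scaling)
    # Same effect as diag(multiplier) @ final_matrix.
    return multiplier[:, None] * final_matrix

m = projection_matrix_sketch(6, 4, seed=0, scaling=1)
first_block = m[:4]
gram = first_block @ first_block.T  # should be dim * identity for scaling=1
```

With 6 projections of dimension 4, the result has one full block of 4 rows and a partial block of 2 rows; the Gram matrix of the full block is diagonal because its rows are mutually orthogonal.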

relu_kernel_transformation(data, is_query, projection_matrix=None, numerical_stabilizer=0.001)

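Parameters:

data: tensor of shape [B, L, H, D], where B is the batch dimension, L the attention dimension, H the number of heads, and D the number of features.
is_query: indicates whether the input data is a query tensor or a key tensor
projection_matrix: Gaussian matrix of shape [M, D], where M is the number of random features and each D x D sub-block has pairwise orthogonal rows
numerical_stabilizer: small positive constant for numerical stability
Returns: the corresponding kernel feature map

Purpose: computes the features of the ReLU kernel

Source: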

def relu_kernel_transformation(data,
                               is_query,
                               projection_matrix=None,
                               numerical_stabilizer=0.001):
  del is_query
  if projection_matrix is None:
    return tf.nn.relu(data) + numerical_stabilizer
  else:
    ratio = 1.0 / tf.math.sqrt(
        tf.dtypes.cast(projection_matrix.shape[0], projection_matrix.dtype))
    data_dash = ratio * tf.einsum("blhd,md->blhm", data, projection_matrix)
    return tf.nn.relu(data_dash) + numerical_stabilizer

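The rectified linear unit (ReLU) is also known as the linear rectification function. To the right of the origin (positive inputs) ReLU is linear; to the left of the origin (negative inputs) its value is 0. tf.nn.relu() therefore sets every input below 0 to 0 and leaves inputs above 0 unchanged.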
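
As a small illustration of the no-projection branch (relu plus the stabilizer), here is a NumPy sketch. The name relu_features is hypothetical, and np.maximum stands in for tf.nn.relu.

```python
import numpy as np

def relu_features(data, numerical_stabilizer=0.001):
    # np.maximum(data, 0) is ReLU: negatives become 0, positives pass through.
    # The stabilizer keeps every feature strictly positive.
    return np.maximum(data, 0.0) + numerical_stabilizer

x = np.array([-2.0, -0.5, 0.0, 1.5])
features = relu_features(x)  # approximately [0.001, 0.001, 0.001, 1.501]
```

Keeping the features strictly positive is presumably what the stabilizer is for: the attention normalizer built from these features then cannot be exactly zero.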

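softmax_kernel_transformation(data, is_query, projection_matrix=None, numerical_stabilizer=0.000001)

Parameters: the same as for the previous function, except that numerical_stabilizer defaults to 0.000001

Purpose: computes random features for the softmax kernel using the FAVOR+ mechanism

The FAVOR+ mechanism was already covered in my earlier notes on attention, so it is not expanded on here.

Source: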


def softmax_kernel_transformation(data,
                                  is_query,
                                  projection_matrix=None,
                                  numerical_stabilizer=0.000001):
  projection_matrix = tf.cast(projection_matrix, data.dtype)

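Compute the normalization factors: data_normalizer is D^{-1/4} (D is the feature dimension) and ratio is 1/\sqrt{M} (M is the number of random features).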

  data_normalizer = 1.0 / tf.math.sqrt(
      (tf.math.sqrt(tf.dtypes.cast(data.shape[-1], data.dtype))))
  ratio = 1.0 / tf.math.sqrt(
      tf.dtypes.cast(projection_matrix.shape[0], data.dtype))

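Project the normalized data onto the random projection matrix. The einsum contracts over the feature dimension D, so this is a matrix product rather than an element-wise product. Then compute diag_data, half the squared norm of each vector scaled by data_normalizer squared.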

  data_dash = tf.einsum("blhd,md->blhm", data_normalizer * data,
                        projection_matrix)
  diag_data = tf.math.square(data)
  diag_data = tf.math.reduce_sum(
      diag_data, axis=tf.keras.backend.ndim(data) - 1)
  diag_data = (diag_data / 2.0) * data_normalizer * data_normalizer
  diag_data = tf.expand_dims(diag_data, axis=tf.keras.backend.ndim(data) - 1)

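For a query, the maximum is subtracted separately for each position (over the last axis, with keepdims=True) before exponentiating; for a key, a single global maximum is subtracted. The shape of data_dash is unchanged in both cases.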

  if is_query:
    last_dims_t = (len(data_dash.shape) - 1,)
    data_dash = ratio * (
        tf.math.exp(data_dash - diag_data - tf.math.reduce_max(
            data_dash, axis=last_dims_t, keepdims=True)) + numerical_stabilizer)
  else:
    data_dash = ratio * (
        tf.math.exp(data_dash - diag_data - tf.math.reduce_max(data_dash)) +
        numerical_stabilizer)

  return data_dash
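
To see what these features are for, the sketch below re-implements the transformation in NumPy and compares the attention weights obtained from the features with ordinary softmax attention. This is only an illustration under assumptions: softmax_features is a hypothetical name, the projection matrix is a plain Gaussian matrix rather than the block-orthogonal one, and the input sizes are arbitrary.

```python
import numpy as np

def softmax_features(data, is_query, projection_matrix, numerical_stabilizer=1e-6):
    """NumPy sketch mirroring the TensorFlow code above; data is [B, L, H, D]."""
    d = data.shape[-1]
    m = projection_matrix.shape[0]
    data_normalizer = d ** -0.25
    ratio = m ** -0.5
    data_dash = np.einsum("blhd,md->blhm", data_normalizer * data, projection_matrix)
    # ||x||^2 / (2 * sqrt(d)), kept as a trailing axis for broadcasting.
    diag_data = 0.5 * np.sum(data ** 2, axis=-1, keepdims=True) * data_normalizer ** 2
    if is_query:
        shift = np.max(data_dash, axis=-1, keepdims=True)  # per-position maximum
    else:
        shift = np.max(data_dash)  # one global maximum
    return ratio * (np.exp(data_dash - diag_data - shift) + numerical_stabilizer)

rng = np.random.default_rng(0)
B, L, H, D, M = 1, 5, 2, 8, 256
q = 0.5 * rng.standard_normal((B, L, H, D))
k = 0.5 * rng.standard_normal((B, L, H, D))
w = rng.standard_normal((M, D))  # assumption: unstructured Gaussian features

q_prime = softmax_features(q, True, w)
k_prime = softmax_features(k, False, w)

# Attention weights from the random features, normalized over the key axis.
scores = np.einsum("blhm,bshm->bhls", q_prime, k_prime)
approx = scores / scores.sum(axis=-1, keepdims=True)

# Reference: ordinary softmax attention weights.
logits = np.einsum("blhd,bshd->bhls", q, k) / np.sqrt(D)
exact = np.exp(logits - logits.max(axis=-1, keepdims=True))
exact = exact / exact.sum(axis=-1, keepdims=True)

print("max abs difference:", np.abs(approx - exact).max())
```

The subtracted maxima only rescale each query row and all keys by positive constants, so they cancel once the weights are normalized. The printed difference depends on M and on the random draw.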

noncausal_numerator(qs, ks, vs)

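Parameters:

qs: query_prime tensor of shape [L,B,H,M]
ks: key_prime tensor of shape [L,B,H,M]
vs: value tensor of shape [L,B,H,D]
Returns: the not-normalized FAVOR noncausal attention AV

Purpose: computes the not-normalized FAVOR noncausal attention AV

Source:
Easy to follow once you look at the parameter shapes.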

def noncausal_numerator(qs, ks, vs):
  kvs = tf.einsum("lbhm,lbhd->bhmd", ks, vs)
  return tf.einsum("lbhm,bhmd->lbhd", qs, kvs)
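
The two einsum calls compute Q'(K'^T V) without ever forming the L x L attention matrix. The NumPy sketch below (hypothetical name noncausal_numerator_np, arbitrary small shapes) checks that this gives the same result as the quadratic route (Q'K'^T)V.

```python
import numpy as np

def noncausal_numerator_np(qs, ks, vs):
    # Sum over positions first: kvs has shape [B, H, M, D], independent of L.
    kvs = np.einsum("lbhm,lbhd->bhmd", ks, vs)
    return np.einsum("lbhm,bhmd->lbhd", qs, kvs)

rng = np.random.default_rng(0)
L, B, H, M, D = 6, 2, 3, 4, 5
qs = rng.random((L, B, H, M))
ks = rng.random((L, B, H, M))
vs = rng.random((L, B, H, D))

linear = noncausal_numerator_np(qs, ks, vs)

# Quadratic reference: build the explicit L x L attention matrix first.
attn = np.einsum("lbhm,sbhm->bhls", qs, ks)
quadratic = np.einsum("bhls,sbhd->lbhd", attn, vs)
```

Both routes agree by associativity of matrix multiplication; the first costs on the order of L*M*D per head instead of L*L*(M+D).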
