Understanding tf.keras.layers.Attention


Official documentation: https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/keras/layers/Attention

 

Syntax:

tf.keras.layers.Attention(
    use_scale=False, **kwargs
)

Inputs are a query tensor of shape [batch_size, Tq, dim], a value tensor of shape [batch_size, Tv, dim], and a key tensor of shape [batch_size, Tv, dim]. The calculation follows these steps:

  1. Calculate scores with shape [batch_size, Tq, Tv] as a query-key dot product: scores = tf.matmul(query, key, transpose_b=True).
  2. Use scores to calculate a distribution with shape [batch_size, Tq, Tv]: distribution = tf.nn.softmax(scores).
  3. Use distribution to create a linear combination of value with shape [batch_size, Tq, dim]: return tf.matmul(distribution, value).
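
As a quick sanity check of these three steps and their shapes, here is a minimal sketch (the tensor sizes are arbitrary, chosen only for illustration):

import tensorflow as tf

batch_size, Tq, Tv, dim = 2, 3, 5, 4
query = tf.random.normal([batch_size, Tq, dim])
value = tf.random.normal([batch_size, Tv, dim])
key = tf.random.normal([batch_size, Tv, dim])

scores = tf.matmul(query, key, transpose_b=True)   # [2, 3, 5] = [batch_size, Tq, Tv]
distribution = tf.nn.softmax(scores)               # [2, 3, 5], each row sums to 1
output = tf.matmul(distribution, value)            # [2, 3, 4] = [batch_size, Tq, dim]
print(output.shape)                                # (2, 3, 4)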

 

Arguments:

  • use_scale: If True, will create a scalar variable to scale the attention scores.
  • causal: Boolean. Set to True for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past (see the sketch after this list).
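
Since causal appears in the arguments list above, a minimal decoder self-attention sketch might look like the following (the tensor sizes are illustrative, and this assumes causal is accepted as a constructor argument, as the list indicates):

import tensorflow as tf

# Self-attention: the same tensor serves as both query and value.
seq = tf.random.normal([2, 6, 8])                  # [batch_size, T, dim]
attention = tf.keras.layers.Attention(causal=True)
out = attention([seq, seq])                        # position i attends only to j <= i
print(out.shape)                                   # (2, 6, 8)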

 

Call arguments:

  • inputs: List of the following tensors:
    • query: Query Tensor of shape [batch_size, Tq, dim].
    • value: Value Tensor of shape [batch_size, Tv, dim].
    • key: Optional key Tensor of shape [batch_size, Tv, dim]. If not given, will use value for both key and value, which is the most common case.
  • mask: List of the following tensors:
    • query_mask: A boolean mask Tensor of shape [batch_size, Tq]. If given, the output will be zero at the positions where mask==False.
    • value_mask: A boolean mask Tensor of shape [batch_size, Tv]. If given, will apply the mask such that values at positions where mask==False do not contribute to the result (see the sketch after this list).
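
A minimal sketch of passing a value_mask so that padded value positions are ignored (the mask contents are made up for illustration; None stands in for the optional query_mask):

import tensorflow as tf

query = tf.random.normal([1, 2, 4])                # [batch_size, Tq, dim]
value = tf.random.normal([1, 3, 4])                # [batch_size, Tv, dim]
value_mask = tf.constant([[True, True, False]])    # last value position is padding
out = tf.keras.layers.Attention()([query, value], mask=[None, value_mask])
print(out.shape)                                   # (1, 2, 4)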

 

Output shape:

Attention outputs of shape [batch_size, Tq, dim].

The meaning of query, value and key depends on the application. In the case of text similarity, for example, query is the sequence embeddings of the first piece of text and value is the sequence embeddings of the second piece of text. key is usually the same tensor as value.
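
For the text-similarity case just described, a minimal sketch might look like this (the vocabulary size, embedding dimension, and pooling readout are assumptions for illustration):

import tensorflow as tf

text1 = tf.keras.Input(shape=(None,), dtype='int32')    # token ids of the first text
text2 = tf.keras.Input(shape=(None,), dtype='int32')    # token ids of the second text

embedding = tf.keras.layers.Embedding(10000, 64)        # shared embedding layer
query = embedding(text1)                                # [batch_size, Tq, dim]
value = embedding(text2)                                # [batch_size, Tv, dim]

# key defaults to value, matching the common case described above.
attended = tf.keras.layers.Attention()([query, value])  # [batch_size, Tq, dim]
pooled = tf.keras.layers.GlobalAveragePooling1D()(attended)
model = tf.keras.Model(inputs=[text1, text2], outputs=pooled)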

 

Example 1:

import tensorflow as tf
import numpy as np

# query: [batch_size=1, Tq=1, dim=4]
query = tf.convert_to_tensor(np.asarray([[[1., 1., 1., 3.]]]))

# key_list: [batch_size=2, Tv=3, dim=4]
key_list = tf.convert_to_tensor(np.asarray([[[1., 1., 2., 4.], [4., 1., 1., 3.], [1., 1., 2., 1.]],
                                            [[1., 0., 2., 1.], [1., 2., 1., 2.], [1., 0., 2., 1.]]]))

# Only [query, value] is passed, so key_list serves as both key and value.
query_value_attention_seq = tf.keras.layers.Attention()([query, key_list])

Result 1:
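
Working the three steps through by hand, the call should print approximately the following; note that the query batch of size 1 broadcasts against the value batch of size 2:

[[[1.8068, 1.0,    1.7311, 3.7308]],
 [[1.0,    1.9293, 1.0353, 1.9647]]]               # shape (2, 1, 4)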

 

Let's reproduce this result by following the computation steps from the Syntax section (here key and value both refer to key_list, since Example 1 passed no separate key):

scores = tf.matmul(query, key, transpose_b=True)   # [batch_size, Tq, Tv]

distribution = tf.nn.softmax(scores)               # attention weights

print(tf.matmul(distribution, value))              # [batch_size, Tq, dim]

Example 2:

import tensorflow as tf

# query and key_list are the tensors defined in Example 1.
scores = tf.matmul(query, key_list, transpose_b=True)   # [2, 1, 3]; the query batch broadcasts

distribution = tf.nn.softmax(scores)                    # weights over the three value vectors

result = tf.matmul(distribution, key_list)              # [2, 1, 4]

Result 2:
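
Worked by hand, distribution and result come out approximately as:

distribution ≈ [[[7.31e-01, 2.69e-01, 9.02e-05]],
                [[1.77e-02, 9.65e-01, 1.77e-02]]]       # shape (2, 1, 3)

result ≈ [[[1.8068, 1.0,    1.7311, 3.7308]],
          [[1.0,    1.9293, 1.0353, 1.9647]]]           # shape (2, 1, 4)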

 

Here, distribution gives the attention weight assigned to each input vector.

In the first batch, [0.731, 0.269, 9.02e-5] are the weights of [1., 1., 2., 4.], [4., 1., 1., 3.] and [1., 1., 2., 1.], respectively.

 

As you can see, the outputs of Result 1 and Result 2 are identical.
