Understanding tf.keras.layers.Attention


Official documentation: https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/keras/layers/Attention

 

Syntax:

tf.keras.layers.Attention(
    use_scale=False, **kwargs
)

Inputs are a query tensor of shape [batch_size, Tq, dim], a value tensor of shape [batch_size, Tv, dim], and a key tensor of shape [batch_size, Tv, dim]. The calculation follows these steps:

  1. Calculate scores with shape [batch_size, Tq, Tv] as a query-key dot product: scores = tf.matmul(query, key, transpose_b=True).
  2. Use scores to calculate a distribution with shape [batch_size, Tq, Tv]: distribution = tf.nn.softmax(scores).
  3. Use distribution to create a linear combination of value with shape [batch_size, Tq, dim]: return tf.matmul(distribution, value).
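
As a quick sanity check of these three steps and their shapes, here is a minimal sketch (the tensor sizes are arbitrary, chosen only for illustration):

import tensorflow as tf

batch_size, Tq, Tv, dim = 2, 3, 5, 4
query = tf.random.normal([batch_size, Tq, dim])
value = tf.random.normal([batch_size, Tv, dim])
key = tf.random.normal([batch_size, Tv, dim])

scores = tf.matmul(query, key, transpose_b=True)   # [2, 3, 5] = [batch_size, Tq, Tv]
distribution = tf.nn.softmax(scores)               # [2, 3, 5], each row sums to 1
output = tf.matmul(distribution, value)            # [2, 3, 4] = [batch_size, Tq, dim]
print(output.shape)                                # (2, 3, 4)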

 

Arguments:

  • use_scale: If True, will create a scalar variable to scale the attention scores.
  • causal: Boolean. Set to True for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past (see the sketch after this list).
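
Since causal appears in the arguments list above, a minimal decoder self-attention sketch might look like the following (the tensor sizes are illustrative, and this assumes causal is accepted as a constructor argument, as the list indicates):

import tensorflow as tf

# Self-attention: the same tensor serves as both query and value.
seq = tf.random.normal([2, 6, 8])                  # [batch_size, T, dim]
attention = tf.keras.layers.Attention(causal=True)
out = attention([seq, seq])                        # position i attends only to j <= i
print(out.shape)                                   # (2, 6, 8)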

 

Call arguments:

  • inputs: List of the following tensors:
    • query: Query Tensor of shape [batch_size, Tq, dim].
    • value: Value Tensor of shape [batch_size, Tv, dim].
    • key: Optional key Tensor of shape [batch_size, Tv, dim]. If not given, will use value for both key and value, which is the most common case.
  • mask: List of the following tensors:
    • query_mask: A boolean mask Tensor of shape [batch_size, Tq]. If given, the output will be zero at the positions where mask==False.
    • value_mask: A boolean mask Tensor of shape [batch_size, Tv]. If given, will apply the mask such that values at positions where mask==False do not contribute to the result (see the sketch after this list).
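
A minimal sketch of passing a value_mask so that padded value positions are ignored (the mask contents are made up for illustration; None stands in for the optional query_mask):

import tensorflow as tf

query = tf.random.normal([1, 2, 4])                # [batch_size, Tq, dim]
value = tf.random.normal([1, 3, 4])                # [batch_size, Tv, dim]
value_mask = tf.constant([[True, True, False]])    # last value position is padding
out = tf.keras.layers.Attention()([query, value], mask=[None, value_mask])
print(out.shape)                                   # (1, 2, 4)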

 

Output shape:

Attention outputs of shape [batch_size, Tq, dim].

The meaning of query, value and key depends on the application. In the case of text similarity, for example, query is the sequence embeddings of the first piece of text and value is the sequence embeddings of the second piece of text. key is usually the same tensor as value.
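
For the text-similarity case just described, a minimal sketch might look like this (the vocabulary size, embedding dimension, and pooling readout are assumptions for illustration):

import tensorflow as tf

text1 = tf.keras.Input(shape=(None,), dtype='int32')    # token ids of the first text
text2 = tf.keras.Input(shape=(None,), dtype='int32')    # token ids of the second text

embedding = tf.keras.layers.Embedding(10000, 64)        # shared embedding layer
query = embedding(text1)                                # [batch_size, Tq, dim]
value = embedding(text2)                                # [batch_size, Tv, dim]

# key defaults to value, matching the common case described above.
attended = tf.keras.layers.Attention()([query, value])  # [batch_size, Tq, dim]
pooled = tf.keras.layers.GlobalAveragePooling1D()(attended)
model = tf.keras.Model(inputs=[text1, text2], outputs=pooled)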

 

Example 1:

import tensorflow as tf
import numpy as np

# query: [batch_size=1, Tq=1, dim=4]
query = tf.convert_to_tensor(np.asarray([[[1., 1., 1., 3.]]]))

# key_list: [batch_size=2, Tv=3, dim=4]
key_list = tf.convert_to_tensor(np.asarray([[[1., 1., 2., 4.], [4., 1., 1., 3.], [1., 1., 2., 1.]],
                                            [[1., 0., 2., 1.], [1., 2., 1., 2.], [1., 0., 2., 1.]]]))

# Only [query, value] is passed, so key_list serves as both key and value.
query_value_attention_seq = tf.keras.layers.Attention()([query, key_list])

Result 1:
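
Working the three steps through by hand, the call should print approximately the following; note that the query batch of size 1 broadcasts against the value batch of size 2:

[[[1.8068, 1.0,    1.7311, 3.7308]],
 [[1.0,    1.9293, 1.0353, 1.9647]]]               # shape (2, 1, 4)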

 

Let's reproduce this result by following the computation steps from the Syntax section (here key and value both refer to key_list, since Example 1 passed no separate key):

scores = tf.matmul(query, key, transpose_b=True)   # [batch_size, Tq, Tv]

distribution = tf.nn.softmax(scores)               # attention weights

print(tf.matmul(distribution, value))              # [batch_size, Tq, dim]

Example 2:

import tensorflow as tf

# query and key_list are the tensors defined in Example 1.
scores = tf.matmul(query, key_list, transpose_b=True)   # [2, 1, 3]; the query batch broadcasts

distribution = tf.nn.softmax(scores)                    # weights over the three value vectors

result = tf.matmul(distribution, key_list)              # [2, 1, 4]

Result 2:
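
Worked by hand, distribution and result come out approximately as:

distribution ≈ [[[7.31e-01, 2.69e-01, 9.02e-05]],
                [[1.77e-02, 9.65e-01, 1.77e-02]]]       # shape (2, 1, 3)

result ≈ [[[1.8068, 1.0,    1.7311, 3.7308]],
          [[1.0,    1.9293, 1.0353, 1.9647]]]           # shape (2, 1, 4)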

 

Here, distribution gives the attention weight assigned to each input vector.

In the first batch, [0.731, 0.269, 9.02e-5] are the weights of [1., 1., 2., 4.], [4., 1., 1., 3.] and [1., 1., 2., 1.], respectively.

 

As you can see, the outputs of Result 1 and Result 2 are identical.
