Attention mechanism

DecafTea

Category: NLP

Article link: https://blog.csdn.net/DecafTea/article/details/111562295

1. Attention with RNN

Attention-based encoder-decoder sequence model architecture (A is an RNN, LSTM, or GRU).

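The decoder's input at each time step is usually the result predicted at the previous time step.
The next hidden state s is updated from the previous step's c and s together with the current new input: s_t = func(s_{t-1}, c_{t-1}, x'_t)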
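
To make the update s_t = func(s_{t-1}, c_{t-1}, x'_t) concrete, here is a minimal NumPy sketch. It assumes that func is a plain tanh RNN cell applied to the concatenation of the three inputs. The text does not fix this choice (an LSTM or GRU cell would work as well), and the names `decoder_step`, `A`, `b` and the vector sizes are hypothetical.

```python
import numpy as np

def decoder_step(s_prev, c_prev, x_new, A, b):
    """One decoder update: s_t = tanh(A @ [x'_t; s_{t-1}; c_{t-1}] + b).

    Assumption: func is a simple tanh RNN cell; the post leaves func open.
    """
    concat = np.concatenate([x_new, s_prev, c_prev])
    return np.tanh(A @ concat + b)

rng = np.random.default_rng(0)
d_x, d_s, d_c = 4, 3, 3                      # hypothetical sizes
A = rng.normal(size=(d_s, d_x + d_s + d_c))  # cell weight matrix
b = np.zeros(d_s)                            # cell bias

s_prev = np.zeros(d_s)            # previous decoder state s_{t-1}
c_prev = rng.normal(size=d_c)     # previous context vector c_{t-1}
x_new = rng.normal(size=d_x)      # current decoder input x'_t

s_t = decoder_step(s_prev, c_prev, x_new, A, b)
print(s_t.shape)
```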

The attention parameters are W_Q, W_K, and W_V, the matrices that produce the queries, keys, and values.

softmax(K^T q_j): measures how similar (how well matched) each of k_1, k_2, …, k_m is to q_j. Every weight lies in [0, 1], and the m weights sum to 1.

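Attention is essentially a weighted sum of the values. Each value carries the information of the corresponding word. The weight alpha expresses how important that information is: the larger the weight, the more attention is paid to the corresponding value.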
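
The weighted sum described above can be sketched in a few lines of NumPy. This is an illustration under assumptions rather than the exact formulation from the lectures: it assumes the query is q_j = W_Q s_j and that keys and values are k_i = W_K h_i and v_i = W_V h_i, where h_i are encoder states and s_j is a decoder state. The function name `attention_context` and all sizes are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def attention_context(s_j, H, W_Q, W_K, W_V):
    """Return the context vector c_j and the weights alpha for one query.

    H has shape (m, d_h): row i is the encoder state h_i.
    """
    q = W_Q @ s_j            # query q_j
    K = H @ W_K.T            # row i is the key k_i = W_K h_i
    V = H @ W_V.T            # row i is the value v_i = W_V h_i
    alpha = softmax(K @ q)   # same numbers as softmax(K^T q) with keys as columns
    c = alpha @ V            # weighted sum of the values
    return c, alpha

rng = np.random.default_rng(0)
m, d_h, d_s, d = 5, 4, 3, 6          # hypothetical sizes
H = rng.normal(size=(m, d_h))
s_j = rng.normal(size=d_s)
W_Q = rng.normal(size=(d, d_s))
W_K = rng.normal(size=(d, d_h))
W_V = rng.normal(size=(d, d_h))

c, alpha = attention_context(s_j, H, W_Q, W_K, W_V)
print(alpha.sum())  # the weights sum to 1 (up to floating-point error)
```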

[Figure: final summary diagram]
The approach used above is the first option in that summary: using the dot product to measure similarity.

2. Self-attention with RNN

The next hidden state h is updated from the previous step's c and h together with the current new input: h_t = func(h_{t-1}, c_{t-1}, x'_t). It is also possible to use only c and x': h_t = func(c_{t-1}, x'_t)

h_0 is the all-zero vector, so c_1 = h_1.

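Repeat the following until every input x has been read:
Read x_t, h_{t-1}, and c_{t-1}, and update h_t.
Compute how well h_t aligns with each of h_1, …, h_t; call these weights alpha.
Compute c_t as the sum of h_1, …, h_t, each multiplied by its corresponding alpha.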
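
The loop above can be sketched as follows. Two choices here are assumptions and not taken from the text: func is a plain tanh cell over the concatenation [x_t; h_{t-1}; c_{t-1}], and alignment is measured by a softmax over dot products between h_t and each stored state. The name `self_attention_rnn` and the sizes are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def self_attention_rnn(X, A, b):
    """Run an RNN with self-attention over the rows of X.

    Returns the states h_1..h_n and the context vectors c_1..c_n.
    """
    d_h = b.shape[0]
    h = np.zeros(d_h)   # h_0 is the all-zero vector
    c = np.zeros(d_h)   # c_0 is taken to be zero as well (assumption)
    hs, cs = [], []
    for x_t in X:
        # update h_t from x_t, h_{t-1}, c_{t-1}
        h = np.tanh(A @ np.concatenate([x_t, h, c]) + b)
        hs.append(h)
        H = np.stack(hs)          # rows are h_1..h_t
        alpha = softmax(H @ h)    # alignment of h_t with each h_i
        c = alpha @ H             # c_t = sum_i alpha_i * h_i
        cs.append(c)
    return np.stack(hs), np.stack(cs)

rng = np.random.default_rng(0)
n, d_x, d_h = 6, 4, 3                       # hypothetical sizes
X = rng.normal(size=(n, d_x))
A = rng.normal(size=(d_h, d_x + 2 * d_h))
b = np.zeros(d_h)

hs, cs = self_attention_rnn(X, A, b)
print(np.allclose(cs[0], hs[0]))  # at t = 1 only h_1 is stored, so c_1 = h_1
```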

repeat the process …

3. Attention without RNN

It can replace RNN models for seq2seq (many-to-many) tasks, e.g. machine translation.
x: the input.
x': also an input, but the predicted results are used in its place during inference.
Each x' corresponds to one c.
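
As a rough illustration of an attention layer with no recurrence, the sketch below computes one context vector per x'. It assumes queries come from x' and keys and values come from x, with dot-product similarity as in section 1. The 1/sqrt(d) scaling used in the Transformer is left out to match the formula in this post. The name `attention_layer` and all sizes are hypothetical.

```python
import numpy as np

def softmax_rows(Z):
    """Numerically stable softmax applied to each row of a 2-D array."""
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def attention_layer(X, X_prime, W_Q, W_K, W_V):
    """Attention between two sequences: one output row c_j per row x'_j."""
    Q = X_prime @ W_Q.T              # queries from x', shape (t, d)
    K = X @ W_K.T                    # keys from x, shape (m, d)
    V = X @ W_V.T                    # values from x, shape (m, d)
    weights = softmax_rows(Q @ K.T)  # row j holds the weights for x'_j
    return weights @ V               # row j is c_j

rng = np.random.default_rng(0)
m, t, d_in, d = 5, 3, 4, 6           # hypothetical sizes
X = rng.normal(size=(m, d_in))       # input sequence x_1..x_m
X_prime = rng.normal(size=(t, d_in)) # second sequence x'_1..x'_t
W_Q = rng.normal(size=(d, d_in))
W_K = rng.normal(size=(d, d_in))
W_V = rng.normal(size=(d, d_in))

C = attention_layer(X, X_prime, W_Q, W_K, W_V)
print(C.shape)  # one context vector per x'
```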

4. Self-attention without RNN

It can replace RNNs for any task, not just seq2seq.
Both input sequences are x.
Each x corresponds to one c.
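
A self-attention layer can be sketched the same way, with queries, keys, and values all derived from the single sequence x. As before, this is a minimal sketch under assumptions: dot-product similarity without scaling, and hypothetical names (`self_attention_layer`) and sizes.

```python
import numpy as np

def softmax_rows(Z):
    """Numerically stable softmax applied to each row of a 2-D array."""
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def self_attention_layer(X, W_Q, W_K, W_V):
    """Self-attention: queries, keys, and values all come from X."""
    Q = X @ W_Q.T
    K = X @ W_K.T
    V = X @ W_V.T
    weights = softmax_rows(Q @ K.T)  # shape (m, m): x_i attends to every x_j
    return weights @ V, weights      # row i of the first output is c_i

rng = np.random.default_rng(0)
m, d_in, d = 5, 4, 6                 # hypothetical sizes
X = rng.normal(size=(m, d_in))
W_Q = rng.normal(size=(d, d_in))
W_K = rng.normal(size=(d, d_in))
W_V = rng.normal(size=(d, d_in))

C, weights = self_attention_layer(X, W_Q, W_K, W_V)
print(C.shape, weights.shape)  # one c per x, and an m-by-m weight matrix
```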
