Fine-Grained Attention Mechanism for Neural Machine Translation

kaiyin_hzau

Category: Attention

Article link: https://blog.csdn.net/zhoukaiyin_hzau/article/details/80538701

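In this paper the authors propose a fine-grained attention mechanism in which each dimension of a context vector receives its own attention score.
The authors justify this design as follows.

Choi et al. (2017) contextualized the individual dimensions of word vectors and found that each dimension plays a different role depending on the context. Inspired by this observation, the authors (Choi, Cho, and Bengio) proposed the fine-grained attention mechanism.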

Background: Attention-based Neural Machine Translation

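This section reviews the attention mechanism proposed by Bahdanau et al. (2015).

Given a source sentence $X=(w_1^x,w_2^x,...,w_T^x)$,

the model's objective is $p(Y=(w_1^y,w_2^y,...,w_{T'}^y)|X)$,
that is, the probability of a particular output sequence given the input.

The model consists of three parts: an encoder, a decoder, and an attention mechanism.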

1. Encoder

The encoder is usually implemented as a bidirectional recurrent neural network. Before encoding begins, each source word $w_t^x$ is mapped into a continuous vector space using the embedding matrix $E^x$ of the source vocabulary:
$x_t=E^x[\cdot,w_t^x]$
This lookup can be illustrated as:

![Figure](https://img-blog.csdn.net/20180601155037397?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3pob3VrYWl5aW5faHphdQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)

The word vectors are then fed into the encoder one after another:
$\overrightarrow{h}_t = \overrightarrow{\Phi}(\overrightarrow{h}_{t-1},x_t)$, $\overleftarrow{h}_t = \overleftarrow{\Phi}(\overleftarrow{h}_{t+1},x_t)$

Here $\overrightarrow{\Phi}$ and $\overleftarrow{\Phi}$ can be LSTM (Hochreiter and Schmidhuber, 1997) or GRU (Cho et al., 2014) units.
At each time step $t$, the results from the two directions are concatenated:

$h_t = [\overrightarrow{h}_t;\overleftarrow{h}_t]$

This yields the set of annotation vectors $C=\{h_1,h_2,...,h_T\}$.
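The encoder steps above (embedding lookup, forward and backward recurrences, concatenation) can be sketched in NumPy. This is only an illustrative sketch: it uses a plain tanh recurrent cell in place of the LSTM or GRU named in the notes, and the function name `bidirectional_encode`, the parameter tuples, and the random weights are all assumptions made for the example.

```python
import numpy as np

def bidirectional_encode(word_ids, E, params_fwd, params_bwd):
    """Return annotation vectors h_1..h_T, each [forward; backward].

    E is an embedding matrix of shape (emb_dim, vocab_size).
    params_* are hypothetical (W, U, b) tuples for a simple tanh RNN cell,
    used here as a stand-in for LSTM/GRU.
    """
    X = E[:, word_ids].T                      # x_t = E[:, w_t], shape (T, emb_dim)
    T = X.shape[0]
    hid = params_fwd[1].shape[0]

    def step(h_prev, x, params):
        W, U, b = params
        return np.tanh(W @ x + U @ h_prev + b)

    fwd = np.zeros((T, hid))
    bwd = np.zeros((T, hid))

    h = np.zeros(hid)
    for t in range(T):                        # left-to-right pass
        h = step(h, X[t], params_fwd)
        fwd[t] = h

    h = np.zeros(hid)
    for t in reversed(range(T)):              # right-to-left pass
        h = step(h, X[t], params_bwd)
        bwd[t] = h

    return np.concatenate([fwd, bwd], axis=1)  # (T, 2 * hid)

# Toy usage with random (untrained) weights.
rng = np.random.default_rng(0)
vocab, emb_dim, hid = 10, 4, 3
E = rng.normal(size=(emb_dim, vocab))

def make_params():
    return (rng.normal(size=(hid, emb_dim)),
            rng.normal(size=(hid, hid)),
            np.zeros(hid))

C = bidirectional_encode([1, 5, 2], E, make_params(), make_params())
print(C.shape)  # (3, 6)
```

Each row of `C` corresponds to one annotation vector $h_t$, so a three-word source sentence gives three vectors of size twice the hidden size.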

2. Decoder

The decoder can be modeled as:

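$p(w_{t'}^y \mid w_{<t'}^y, X)$

that is, the probability of each target word given the previously generated target words and the source sentence.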

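3. Attention mechanism

The decoder maintains a hidden state $z_{t'}$. At each time step $t'$, the model first uses the attention mechanism $f_{Att}$ to select, or weight, each vector in $C$. In Bahdanau's attention mechanism, $f_{Att}$ is a feedforward neural network whose inputs are the decoder's hidden state from the previous time step and one vector from $C$. tanh is commonly used as the activation function in this network.

$e_{t',t}=f_{Att}(z_{t'-1},h_t)$

The scores $e_{t',t}$ are normalized with a softmax:

$a_{t',t}=\frac{\exp(e_{t',t})}{\sum_{k=1}^{T}\exp(e_{t',k})}$

The annotation vectors are then summed, each weighted by its attention weight:

$c_{t'}=\sum_{t=1}^{T}a_{t',t}h_t$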

The context vector $c_{t'}$ then serves as one of the decoder's inputs:

$z_{t'} = \Phi(z_{t'-1},y_{t'-1},c_{t'})$

Unlike in the encoder, $y_{t'-1}$ here is the vector of the previous target word.
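The scoring, softmax normalization, and weighted sum described in this section can be sketched as follows. The additive parametrization $e = v^\top \tanh(W_z z + W_h h)$ is one common choice for the feedforward scorer and is an assumption here, as are the names `additive_attention`, `W_z`, `W_h`, and `v`; the weights are random and untrained.

```python
import numpy as np

def softmax(e, axis=0):
    """Numerically stable softmax along the given axis."""
    e = e - e.max(axis=axis, keepdims=True)
    w = np.exp(e)
    return w / w.sum(axis=axis, keepdims=True)

def additive_attention(z_prev, H, W_z, W_h, v):
    """One scalar attention weight per annotation vector.

    z_prev: previous decoder state, shape (dz,)
    H:      annotation vectors, shape (T, dh)
    Returns the weights a (T,) and the context vector c (dh,).
    """
    hidden = np.tanh(H @ W_h.T + W_z @ z_prev)  # (T, da)
    e = hidden @ v                              # scores e_{t',t}, shape (T,)
    a = softmax(e)                              # weights a_{t',t}, sum to 1 over t
    c = a @ H                                   # c_{t'} = sum_t a_{t',t} h_t
    return a, c

# Toy usage with hypothetical sizes.
rng = np.random.default_rng(0)
T, dh, dz, da = 5, 6, 4, 8
H = rng.normal(size=(T, dh))
z_prev = rng.normal(size=dz)
a, c = additive_attention(
    z_prev, H,
    rng.normal(size=(da, dz)),
    rng.normal(size=(da, dh)),
    rng.normal(size=da),
)
print(a.shape, c.shape)  # (5,) (6,)
```

Note that `a` holds a single scalar per source position, which is exactly the property the fine-grained variant later relaxes.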

Variants of Attention Mechanism

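Jean et al. (2015a), Chung et al. (2016a), and Luong et al. (2016) made some improvements to the attention mechanism of Bahdanau et al. (2015).

The former changed the score function to $e_{t',t} = f_{AttY}(z_{t'-1},h_t,y_{t'-1})$,
that is, the output of the previous time step is added as an input.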
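As a rough sketch of this modified score function, the previous target word vector can enter an additive scorer as one extra term. The specific form below and the names `score_with_prev_word` and `W_y` are assumptions for illustration, not the parametrization used in the cited papers.

```python
import numpy as np

def score_with_prev_word(z_prev, H, y_prev, W_z, W_h, W_y, v):
    """Scores e_{t',t} that also depend on the previous target word vector.

    z_prev: (dz,), H: (T, dh), y_prev: (dy,)
    Returns one scalar score per source position, shape (T,).
    """
    pre = H @ W_h.T + W_z @ z_prev + W_y @ y_prev  # (T, da)
    return np.tanh(pre) @ v

# Toy usage with hypothetical sizes and random weights.
rng = np.random.default_rng(0)
T, dh, dz, dy, da = 5, 6, 4, 3, 8
H = rng.normal(size=(T, dh))
z_prev = rng.normal(size=dz)
y_prev = rng.normal(size=dy)
W_z = rng.normal(size=(da, dz))
W_h = rng.normal(size=(da, dh))
W_y = rng.normal(size=(da, dy))
v = rng.normal(size=da)

e = score_with_prev_word(z_prev, H, y_prev, W_z, W_h, W_y, v)
print(e.shape)  # (5,)
```

Setting `W_y` to zero recovers a scorer that depends only on $z_{t'-1}$ and $h_t$.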

Fine-Grained Attention Mechanism

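In both Bahdanau's mechanism and the later improvements described above, each context vector receives only a single attention value for a given query, as shown in panel (a) of the figure.
The model proposed by the authors corresponds to panel (b).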

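[Figure: (a) conventional attention, one score per context vector; (b) fine-grained attention, one score per dimension]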

$e_{t',t}^d = f_{AttY2D}^d(z_{t'-1},h_t,y_{t'-1})$

$a_{t',t}^d = \frac{\exp(e_{t',t}^d)}{\sum_{k=1}^{T}\exp(e_{t',k}^d)}$

$c_{t'}=\sum_{t=1}^{T}a_{t',t}\odot h_t$

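The change here is that the alignment scores, after the softmax, are combined with each annotation vector through a Hadamard (element-wise) product, so that attention is computed independently for each dimension of the annotation vector.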
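The per-dimension scoring, the softmax over source positions, and the Hadamard-weighted sum can be sketched in NumPy as below. The scorer is an assumed additive network whose output layer `V` produces one score per dimension of $h_t$; the function name and all weight names are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def fine_grained_attention(z_prev, H, y_prev, W_z, W_h, W_y, V):
    """One attention weight per source position AND per dimension.

    H: annotation vectors, shape (T, D); V: output layer, shape (D, da).
    Returns A with shape (T, D) and the context vector c with shape (D,).
    """
    hidden = np.tanh(H @ W_h.T + W_z @ z_prev + W_y @ y_prev)  # (T, da)
    E = hidden @ V.T                      # e^d_{t',t}, shape (T, D)
    E = E - E.max(axis=0, keepdims=True)  # for numerical stability
    A = np.exp(E)
    A /= A.sum(axis=0, keepdims=True)     # softmax over t, separately for each d
    c = (A * H).sum(axis=0)               # c_{t'} = sum_t a_{t',t} (Hadamard) h_t
    return A, c

# Toy usage with hypothetical sizes.
rng = np.random.default_rng(0)
T, D, dz, dy, da = 5, 6, 4, 3, 8
H = rng.normal(size=(T, D))
z_prev = rng.normal(size=dz)
y_prev = rng.normal(size=dy)
W_z = rng.normal(size=(da, dz))
W_h = rng.normal(size=(da, D))
W_y = rng.normal(size=(da, dy))
V = rng.normal(size=(D, da))

A, c = fine_grained_attention(z_prev, H, y_prev, W_z, W_h, W_y, V)
print(A.shape, c.shape)  # (5, 6) (6,)
```

Each column of `A` sums to 1 over the source positions. If every row of `V` were identical, all columns of `A` would coincide and the computation would reduce to the conventional single-score attention.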

[1] Fine-grained attention mechanism for neural machine translation
