Multi-source attention mechanism

I. Attention Strategies for Multi-Source Sequence-to-Sequence Learning (Libovický and Helcl, 2017)

This post considers the scenario of multiple encoders feeding a single RNN decoder. Three attention combination strategies are discussed:

1. Concatenation of the context vectors

A widely adopted technique for combining multiple attention models in a decoder is concatenation of the context vectors. This setting forces the model to attend to each encoder independently and leaves the attention combination to be resolved implicitly by the subsequent network layers.
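
To make this concrete, here is a minimal PyTorch sketch of the concatenation strategy (not the paper's code; module and variable names such as `BahdanauAttention`, `attn_src`, and `attn_img` are my own). Each encoder gets its own Bahdanau-style attention, and the resulting context vectors are simply concatenated before being fed to the decoder:

```python
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    """Additive attention over a single encoder (Bahdanau et al., 2015)."""
    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        self.W_a = nn.Linear(dec_dim, attn_dim, bias=False)  # decoder-state projection
        self.U_a = nn.Linear(enc_dim, attn_dim, bias=False)  # encoder-state projection
        self.v_a = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state: torch.Tensor, enc_states: torch.Tensor) -> torch.Tensor:
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        energy = self.v_a(torch.tanh(
            self.W_a(dec_state).unsqueeze(1) + self.U_a(enc_states)))  # (batch, src_len, 1)
        alpha = torch.softmax(energy.squeeze(-1), dim=-1)              # attention weights
        return torch.bmm(alpha.unsqueeze(1), enc_states).squeeze(1)    # context: (batch, enc_dim)

# One independent attention per encoder; the contexts are simply concatenated.
attn_src = BahdanauAttention(enc_dim=256, dec_dim=512, attn_dim=128)
attn_img = BahdanauAttention(enc_dim=300, dec_dim=512, attn_dim=128)

s = torch.randn(8, 512)            # current decoder state
h_src = torch.randn(8, 20, 256)    # e.g. source-sentence encoder states
h_img = torch.randn(8, 49, 300)    # e.g. image-feature "encoder" states
context = torch.cat([attn_src(s, h_src), attn_img(s, h_img)], dim=-1)  # (8, 556)
```

Because the combination is a plain concatenation, the decoder layers that consume `context` are what implicitly learn how to weigh the two sources against each other.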

2. Flat Attention Combination

We let the decoder learn the attention distribution α_i jointly over the hidden states of all encoders.

The α coefficients are normalized over the states of all encoders, not within each encoder separately.

The attention energy term e is computed as in Bahdanau et al. (2015). Note that the parameters v_a and W_a are shared among the encoders, while U_a is different for each encoder and serves as an encoder-specific projection of the hidden states into a common vector space.

The states of the individual encoders occupy different vector spaces and can have different dimensionalities, so the context vector cannot be computed as their simple weighted sum. We therefore project them into a single space using linear projections:
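
Written out in LaTeX, the flat combination over N encoders (the k-th having T^(k) states) reads as follows. This is my reconstruction following the paper's additive-attention notation; exact symbols and indexing may differ slightly from the published equations:

```latex
% Energy: v_a, W_a shared across encoders; U_a^{(k)} encoder-specific.
% Normalization runs jointly over the states of all encoders.
\begin{align}
e_{ij}^{(k)} &= v_a^\top \tanh\bigl(W_a s_{i-1} + U_a^{(k)} h_j^{(k)}\bigr) \\
\alpha_{ij}^{(k)} &= \frac{\exp\bigl(e_{ij}^{(k)}\bigr)}
  {\sum_{n=1}^{N}\sum_{m=1}^{T^{(n)}} \exp\bigl(e_{im}^{(n)}\bigr)} \\
c_i &= \sum_{k=1}^{N}\sum_{j=1}^{T^{(k)}} \alpha_{ij}^{(k)}\, U_c^{(k)} h_j^{(k)}
\end{align}
```

The encoder-specific matrices U_c^(k) are the linear projections mentioned above: they map each encoder's states into a shared space so the weighted sum is well defined.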

3. Hierarchical Attention Combination

The hierarchical attention combination model computes every context vector independently, similarly to the concatenation approach. Instead of concatenation, a second attention mechanism is constructed over the context vectors. 

First, we compute the context vector for each encoder independently using Equation 3.

Second, we project the context vectors (and optionally the sentinel) into a common space (Equation 8), compute another distribution over the projected context vectors (Equation 9), and take their corresponding weighted average (Equation 10):
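
A reconstruction of the referenced equations in LaTeX (the numbering follows the prose above; the parameter names v_b, W_b, U_b, U_c are my reading of the paper's conventions and may not match it exactly):

```latex
% Equation 3 (per-encoder context, standard Bahdanau attention):
%   c_i^{(k)} = \sum_j \alpha_{ij}^{(k)} h_j^{(k)}
% Hierarchical combination over the N context vectors:
\begin{align}
e_i^{(k)} &= v_b^\top \tanh\bigl(W_b s_{i-1} + U_b^{(k)} c_i^{(k)}\bigr)
  && \text{(8: projection into a common space)} \\
\beta_i^{(k)} &= \frac{\exp\bigl(e_i^{(k)}\bigr)}{\sum_{n=1}^{N} \exp\bigl(e_i^{(n)}\bigr)}
  && \text{(9: distribution over encoders)} \\
c_i &= \sum_{k=1}^{N} \beta_i^{(k)}\, U_c^{(k)} c_i^{(k)}
  && \text{(10: weighted average of projected contexts)}
\end{align}
```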


Both alternatives (the flat and the hierarchical combination) allow us to explicitly compute a distribution over the encoders, and thus to interpret how much attention is paid to each encoder at every decoding step; the sketch below makes this concrete.
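
As an illustration of that interpretability, here is a minimal PyTorch sketch of the second attention level (my own module, `HierarchicalCombination`, not the authors' implementation). It consumes the per-encoder context vectors from Equation 3 and returns both the combined context and the encoder-level weights β, which can be logged at every decoding step:

```python
import torch
import torch.nn as nn

class HierarchicalCombination(nn.Module):
    """Second-level attention over per-encoder context vectors."""
    def __init__(self, ctx_dims, dec_dim, attn_dim, out_dim):
        super().__init__()
        self.W_b = nn.Linear(dec_dim, attn_dim, bias=False)
        self.U_b = nn.ModuleList([nn.Linear(d, attn_dim, bias=False) for d in ctx_dims])
        self.v_b = nn.Linear(attn_dim, 1, bias=False)
        self.U_c = nn.ModuleList([nn.Linear(d, out_dim, bias=False) for d in ctx_dims])

    def forward(self, dec_state, contexts):
        # dec_state: (batch, dec_dim); contexts: list of (batch, ctx_dims[k]) tensors
        energies = torch.cat(
            [self.v_b(torch.tanh(self.W_b(dec_state) + U(c)))
             for U, c in zip(self.U_b, contexts)], dim=-1)      # (batch, N)
        beta = torch.softmax(energies, dim=-1)                  # distribution over encoders
        projected = torch.stack(
            [U(c) for U, c in zip(self.U_c, contexts)], dim=1)  # (batch, N, out_dim)
        combined = (beta.unsqueeze(-1) * projected).sum(dim=1)  # weighted average
        return combined, beta

combine = HierarchicalCombination(ctx_dims=[256, 300], dec_dim=512,
                                  attn_dim=128, out_dim=256)
c_src, c_img = torch.randn(8, 256), torch.randn(8, 300)  # per-encoder contexts (Equation 3)
context, beta = combine(torch.randn(8, 512), [c_src, c_img])
print(beta[0])  # e.g. tensor([0.61, 0.39]): attention paid to each encoder this step
```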

In the paper's multi-source MT experiments, the hierarchical attention combination performed best.
