The goal of this post is to introduce applications of the Attention Model in natural language processing. The structure is: first, two classic papers, one on NMT and one on Image Caption; then, applications of attention to various NLP tasks, some covered in detail and others only briefly.
Classic Papers
Two papers are widely cited by later work on attention, so they deserve to be introduced on their own:
NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE
NMT typically uses methods from the encoder-decoder family: the source sentence is encoded into a fixed-length vector, which is then decoded into the translation. The authors conjecture that this fixed-length vector is the bottleneck limiting encoder-decoder performance, so they instead let the model automatically search for the parts of the source sentence that are relevant to predicting the next target word.
For the encoder, the authors use a bidirectional RNN for annotating sequences: the annotation of each source word concatenates the forward and backward RNN hidden states, so it summarizes both the preceding and the following context.
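To make the alignment step concrete, here is a minimal NumPy sketch of one decoding step of this additive attention. The weight names (W_a, U_a, v_a) follow the paper's notation, but the shapes and the toy setup are my own illustrative assumptions, not the authors' code:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(s_prev, H, W_a, U_a, v_a):
    """One decoding step of Bahdanau-style (additive) attention.

    s_prev: previous decoder state s_{i-1}, shape (n,)
    H:      encoder annotations h_1..h_T from the bi-RNN, shape (T, 2n)
    Returns the context vector c_i and the alignment weights alpha.
    """
    # Alignment scores: e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)
    scores = np.tanh(s_prev @ W_a.T + H @ U_a.T) @ v_a  # shape (T,)
    alpha = softmax(scores)   # soft alignment over source positions
    c = alpha @ H             # context: expected annotation, shape (2n,)
    return c, alpha

# Toy example: T=5 source words, decoder size n=4, annotation size 2n=8
rng = np.random.default_rng(0)
T, n = 5, 4
H = rng.normal(size=(T, 2 * n))
s_prev = rng.normal(size=(n,))
W_a = rng.normal(size=(n, n))       # projects the decoder state
U_a = rng.normal(size=(n, 2 * n))   # projects each annotation
v_a = rng.normal(size=(n,))
c, alpha = additive_attention(s_prev, H, W_a, U_a, v_a)
print(alpha)   # weights sum to 1; alpha_j says how aligned h_j is
```

The key point is that alpha is recomputed at every output step, so different source words can dominate the context vector for different target words.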
Here is the authors' ICLR 2015 slide deck:
http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:bahdanau-iclr2015.pdf
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
The task in this paper is to generate a caption for an image. I made a one-page PPT summarizing the paper's approach.
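Since the slide itself cannot be embedded here, a minimal sketch of the paper's soft attention step may still help: a CNN yields L annotation vectors, one per spatial location of a feature map, and at each word-generation step the decoder attends over them. The layer sizes and weight names below are my own illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_visual_attention(h_prev, A, W_h, W_a, v):
    """Soft attention over CNN annotation vectors.

    h_prev: previous LSTM hidden state, shape (n,)
    A:      annotation vectors a_1..a_L from a conv feature map,
            shape (L, D), e.g. a 14x14 map flattened to L=196 locations
    Returns the context vector z_t and the attention map alpha.
    """
    scores = np.tanh(h_prev @ W_h.T + A @ W_a.T) @ v  # relevance per location
    alpha = softmax(scores)   # where the model "looks" for this word
    z = alpha @ A             # soft variant: expectation over locations
    return z, alpha

# Toy example: L=196 locations, depth D=512, LSTM size n=256, att size k=128
rng = np.random.default_rng(1)
L, D, n, k = 196, 512, 256, 128
A = rng.normal(size=(L, D)) * 0.1
h_prev = rng.normal(size=(n,)) * 0.1
W_h = rng.normal(size=(k, n)) * 0.1
W_a = rng.normal(size=(k, D)) * 0.1
v = rng.normal(size=(k,))
z, alpha = soft_visual_attention(h_prev, A, W_h, W_a, v)
print(z.shape, int(alpha.argmax()))   # context vector and top location
```

Reshaping alpha back to the 14x14 grid is what produces the "where the model is looking" visualizations that made this paper famous.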
Next, let's look at applications of attention across various NLP tasks.
Attention in Word Embedding
Not All Contexts Are Created Equal: Better Word Representations with Variable Attention
The general intuition of the model is that some words are only relevant for predicting local context (e.g. function words), while other words are more suited for determining global context, such as the topic of the document.
In CBOW, the predicted context vector is a uniform average of the context word embeddings, $c = \frac{1}{2b} \sum_{-b \le j \le b,\, j \neq 0} v_{w_j}$, so every position in the window counts equally.

In this paper, the uniform average is replaced by an attention-weighted sum, $c = \sum_j a_{w_j, j}\, v_{w_j}$, where the weights are a softmax over learned scores that depend on both the word identity and its relative position in the window.
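Here is a minimal sketch of the difference, assuming the attention scores live in a learned table K indexed by (word, position); the variable names and toy numbers are my own, not the paper's code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cbow_context(V, context_ids):
    """Plain CBOW: uniform average of the context word embeddings."""
    return V[context_ids].mean(axis=0)

def attention_cbow_context(V, K, context_ids, positions):
    """Variable-attention CBOW: each context word gets a weight from a
    learned score K[word, position], so the model can decide that a
    word matters only nearby (local) or anywhere in the window (global)."""
    scores = K[context_ids, positions]   # one score per context slot
    a = softmax(scores)                  # attention over the window
    return a @ V[context_ids]            # weighted, not uniform, average

# Toy example: vocab of 10 words, embedding dim 4, window b=2
rng = np.random.default_rng(2)
vocab, dim, b = 10, 4, 2
V = rng.normal(size=(vocab, dim))        # input embeddings
K = rng.normal(size=(vocab, 2 * b))      # per-(word, position) scores
context_ids = np.array([3, 7, 1, 5])     # words at positions -2,-1,+1,+2
positions = np.array([0, 1, 2, 3])       # relative positions, re-indexed
print(cbow_context(V, context_ids))
print(attention_cbow_context(V, K, context_ids, positions))
```

Under this scheme a function word can learn low scores at distant positions and still keep weight next to the target word, which matches the intuition quoted above.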