Teaching Machines to Read and Comprehend

  • Blog posts all call "Teaching Machines to Read and Comprehend" the seminal work of machine reading comprehension, so today let's work through it properly.
  • I couldn't find a TensorFlow implementation online, only a Theano one, so here I'll look at it from the paper's perspective only.
  • Keywords: MRC
  • The paper does two things:
  1. we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data
  2. it proposes three models.

Data

  1. we have collected two new corpora of roughly a million news stories with associated queries from the CNN and Daily Mail websites.
  2. Both news providers supplement their articles with a number of bullet points, summarising aspects of the information contained in the article. Of key importance is that these summary points are abstractive and do not simply copy sentences from the documents.
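
The queries themselves are built from these bullet points: each bullet becomes a Cloze-style question by deleting one entity at a time. Here is a minimal sketch of that idea (the helper function is my own; `@placeholder` is the token used in the released data):

```python
def make_cloze_queries(bullet, entities):
    """Turn one summary bullet into (query, answer) pairs, one per entity in it."""
    pairs = []
    for ent in entities:
        if ent in bullet:
            # replace the entity with the placeholder token used in the released data
            pairs.append((bullet.replace(ent, "@placeholder"), ent))
    return pairs

# e.g. make_cloze_queries("Could Saccharin help beat cancer ?", ["Saccharin", "cancer"])
# -> [("Could @placeholder help beat cancer ?", "Saccharin"),
#     ("Could Saccharin help beat @placeholder ?", "cancer")]
```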
  • The problem:

To understand that distinction consider for instance the following Cloze form queries (created from headlines in the
Daily Mail validation set): a) The hi-tech bra that helps you beat breast X; b) Could Saccharin help beat X ?; c) Can fish oils help fight prostate X ? An ngram language model trained on the Daily Mail would easily correctly predict that (X = cancer), regardless of the contents of the context document, simply because this is a very frequently cured entity in the Daily Mail corpus.

  • An n-gram model trained on the Daily Mail can easily predict (X = cancer) without using the document at all, simply because cancer is an entity that is very frequently "cured" in the Daily Mail corpus.
  • How do we avoid this?

To prevent such degenerate solutions and create a focused task we anonymise and randomise our corpora with the following procedure,

  • a) use a coreference system to establish coreferents in each data point;
    b) replace all entities with abstract entity markers according to coreference;
    c) randomly permute these entity markers whenever a data point is loaded.
  • Note: I was initially puzzled by this random permutation. My take: if we simply replaced every entity with a fixed abstract marker, that would amount to nothing more than a change of notation and would not help at all; the random permutation is what actually solves the problem. Because the marker assigned to a given real-world entity changes every time the data point is loaded, no marker is consistently tied to one entity, so corpus-level frequency statistics about entities become useless. A sketch of the procedure follows.
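
To make step (c) concrete, here is a minimal sketch of the anonymise-and-permute procedure (my own illustration, not the paper's code; the `entity_chains` input format is assumed, while the `@entityN` marker style matches the released dataset):

```python
import random

def anonymise(document, query, answer, entity_chains):
    """Replace real entities with abstract markers, shuffling the mapping per load.

    entity_chains: one list of mention strings per coreference chain, e.g.
    [["Barack Obama", "Obama", "the president"], ["BBC"]]  (assumed format).
    """
    markers = [f"@entity{i}" for i in range(len(entity_chains))]
    random.shuffle(markers)  # key step: a fresh entity<->marker mapping on every load
    for chain, marker in zip(entity_chains, markers):
        # replace longer mentions first so substrings don't clobber longer names
        for mention in sorted(chain, key=len, reverse=True):
            document = document.replace(mention, marker)
            query = query.replace(mention, marker)
            answer = answer.replace(mention, marker)
    return document, query, answer
```

Because the shuffle happens on every load, @entity3 may stand for "Obama" in one epoch and "BBC" in the next, so the only reliable signal is the document itself.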


Models

  • This should be the heart of the paper:

The Deep LSTM Reader

  • Our first neural model for reading comprehension tests the ability of Deep LSTM encoders to handle significantly longer sequences.

[Figure: the Deep LSTM Reader architecture]

  • We feed our documents one word at a time into a Deep LSTM encoder, after a delimiter we then also feed the query into the encoder. Alternatively we also experiment with processing the query then the document. The result is that this model processes each document query pair as a single long sequence.
  • Concatenate the document and the query into a single long sequence, separated by a delimiter, and feed it into the LSTM one word at a time; the authors also experiment with feeding the query first and the document second.
  • In short: a two-layer deep LSTM encodes query|||document (or document|||query), and the resulting representation is used for classification over the candidate entities, as in the sketch below.
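
A minimal PyTorch-style sketch of this idea (my own illustration, not the paper's exact architecture, which also adds skip connections from inputs to all hidden layers; dimensions and the entity-classification head are assumptions):

```python
import torch
import torch.nn as nn

class DeepLSTMReader(nn.Module):
    """Encode 'document <delim> query' as one long sequence, then classify the answer entity."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_entities=600):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_entities)  # scores over @entity markers

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) holding document, delimiter, then query
        x = self.embed(token_ids)
        _, (h, _) = self.lstm(x)   # h: (num_layers, batch, hidden_dim)
        return self.out(h[-1])     # final hidden state of the top layer -> entity logits
```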

The Attentive Reader

[Figure: the Attentive Reader architecture]

  • The Deep LSTM Reader must propagate dependencies over long distances in order to connect queries to their answers. The fixed width hidden vector forms a bottleneck for this information flow that we propose to circumvent using an attention mechanism inspired by recent results in translation and image recognition. This attention model first encodes the document and the query using separate bidirectional single layer LSTMs.

  • This model represents the document and the query separately. The query is encoded with a bidirectional LSTM, and the final hidden states of the two directions are concatenated into the query representation. The document is encoded with another bidirectional LSTM, each token represented by the concatenation of the two directions' hidden states at that position. The document representation is then a weighted average of all token representations, where the weights are the attention: the larger a token's weight, the more important that token is for answering the query. Finally, the document and query representations are combined for classification, as in the sketch below.
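
A rough sketch of this combination step, following the paper's equations m_t = tanh(W_ym y_t + W_um u), s_t ∝ exp(w_ms^T m_t), r = Σ_t s_t y_t, g = tanh(W_rg r + W_ug u) (the module name and dimensions are my own):

```python
import torch
import torch.nn as nn

class AttentiveReaderHead(nn.Module):
    """Attention over document tokens conditioned on the query representation.

    y: (batch, doc_len, 2h)  bidirectional token encodings of the document
    u: (batch, 2h)           concatenated final states of the query BiLSTM
    """
    def __init__(self, h, num_entities):
        super().__init__()
        d = 2 * h
        self.W_ym, self.W_um = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)
        self.w_ms = nn.Linear(d, 1, bias=False)
        self.W_rg, self.W_ug = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)
        self.out = nn.Linear(d, num_entities)

    def forward(self, y, u):
        m = torch.tanh(self.W_ym(y) + self.W_um(u).unsqueeze(1))  # (batch, L, 2h)
        s = torch.softmax(self.w_ms(m).squeeze(-1), dim=-1)       # attention weights
        r = torch.bmm(s.unsqueeze(1), y).squeeze(1)               # weighted doc rep
        g = torch.tanh(self.W_rg(r) + self.W_ug(u))               # joint representation
        return self.out(g)                                        # entity logits
```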

The Impatient Reader

[Figure: the Impatient Reader architecture]

  • The Attentive Reader can focus on the passages of a document that are most likely to yield the answer. We can go one step further: the model re-reads the document as each token of the query is read.
  • This model refines the Attentive Reader by one more step: every query token attends over the document tokens, instead of treating the whole query as a single unit as the previous model does. The process feels like looking up the related tokens in the document as you read each token of the query; see the sketch below.
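
A rough sketch of this recurrent attention (again my own paraphrase of the paper's equations, with illustrative names; the document reading r is updated once per query token):

```python
import torch
import torch.nn as nn

class ImpatientReaderHead(nn.Module):
    """Re-read the document once per query token, accumulating the reading in r.

    y_d: (batch, doc_len, 2h)    document token encodings
    y_q: (batch, query_len, 2h)  query token encodings
    u:   (batch, 2h)             overall query representation
    """
    def __init__(self, h, num_entities):
        super().__init__()
        d = 2 * h
        self.W_dm, self.W_rm, self.W_qm = (nn.Linear(d, d, bias=False) for _ in range(3))
        self.w_ms = nn.Linear(d, 1, bias=False)
        self.W_rr = nn.Linear(d, d, bias=False)
        self.W_rg, self.W_qg = nn.Linear(d, d, bias=False), nn.Linear(d, d, bias=False)
        self.out = nn.Linear(d, num_entities)

    def forward(self, y_d, y_q, u):
        r = torch.zeros_like(u)                     # r(0): nothing read yet
        for i in range(y_q.size(1)):                # one re-read per query token
            q_i = self.W_rm(r) + self.W_qm(y_q[:, i])
            m = torch.tanh(self.W_dm(y_d) + q_i.unsqueeze(1))
            s = torch.softmax(self.w_ms(m).squeeze(-1), dim=-1)
            r = torch.bmm(s.unsqueeze(1), y_d).squeeze(1) + torch.tanh(self.W_rr(r))
        g = torch.tanh(self.W_rg(r) + self.W_qg(u))  # combine final reading with query
        return self.out(g)
```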

Results

[Figure: accuracy of the models on the CNN and Daily Mail test sets]

The data mentioned in the paper

Each released example consists of the following fields, separated by blank lines:

[URL]

[Context]

[Question]

[Answer]

[Entity mapping]
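
A minimal sketch of loading one such example (assuming the blank-line-separated layout of the released `.question` files; the path and helper name are my own):

```python
def load_question_file(path):
    """Parse one example into (url, context, question, answer, entity_map)."""
    with open(path, encoding="utf-8") as f:
        parts = f.read().split("\n\n")
    url, context, question, answer = parts[0], parts[1], parts[2], parts[3]
    # the final block maps abstract markers back to the original entity strings,
    # one "@entityN:original text" pair per line
    entity_map = dict(line.split(":", 1) for line in parts[4].splitlines())
    return url, context, question, answer, entity_map
```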


References

[1] K. M. Hermann et al. Teaching Machines to Read and Comprehend. NIPS 2015.

[2] The Seminal Work of Machine Reading Comprehension: Teaching Machines to Read and Comprehend (Chinese blog post).

[3] Research Progress on Deep Learning for Machine Reading Comprehension (Chinese survey).
