Dynamic Memory Networks for Question Answering

Paper title: Ask Me Anything: Dynamic Memory Networks for Natural Language Processing

作者:Ankit Kumar, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong,
Romain Paulus, Richard Socher
firstname@metamind.io, MetaMind, Palo Alto, CA USA

Brief comment: word vectors (or some other word encoding) are encoded into a sequence by a GRU (think of time series; the "dynamic" in the title presumably refers to this). From the background knowledge in the corpus together with the question of the Q&A pair, a set of sequence representations is generated; a series of matrix operations then picks out the highest-scoring answer sequence, which is decoded back into natural language (this part may use an n-gram model, my guess). In effect it is, to some degree, a sequence-to-sequence model.
This post only covers the scoring part, which is directly related to the project I am currently working on.
*Figure 1. Overview of DMN modules. Communication between them is indicated by arrows and uses vector representations. Questions trigger gates which allow vectors for certain inputs to be given to the episodic memory module. The final state of the episodic memory is the input to the answer module.*

1. Gated Recurrent Units (GRU)

*Figure 2. Gated Recurrent Units.*
The RNN architecture is modified with gated activation functions. Although RNNs can in principle model very long sequences, training them is notoriously difficult. Gated recurrent units give an RNN more persistent memory, allowing it to handle longer sequences. The four basic operations of the GRU are given by the equations below.
New memory: a new memory $\tilde{h}_t$ is computed from the previous hidden state $h_{t-1}$ and the new input $x_t$. In other words, this step merges the newly observed information (the current word) with the historical hidden state $h_{t-1}$, summarizing in the candidate vector $\tilde{h}_t$ how the new word fits into the current context.
Reset gate: the reset signal $r_t$ determines how much $h_{t-1}$ matters for $\tilde{h}_t$. If $h_{t-1}$ is irrelevant to the computation of the new memory, the reset gate can erase the past hidden state completely.
Update gate: the update signal $z_t$ decides how much of $h_{t-1}$ is carried over to the next state. For example, if $z_t \approx 1$, then $h_{t-1}$ is passed on to $h_t$ almost unchanged; conversely, if $z_t \approx 0$, it is mostly the new memory $\tilde{h}_t$ that is passed forward to the next hidden state.
Hidden state: the hidden state $h_t$ is finally produced from the previous hidden state $h_{t-1}$ and the new memory $\tilde{h}_t$, weighted by the decision of the update gate.
In our experiments, we use a gated recurrent network (GRU) (Cho et al., 2014a; Chung
et al., 2014). We also explored the more complex LSTM (Hochreiter & Schmidhuber, 1997) but it performed similarly and is more computationally expensive. Both work much better than the standard tanh RNN and we postulate that the main strength comes from having gates that allow
the model to suffer less from the vanishing gradient problem (Hochreiter & Schmidhuber, 1997). Assume each time step t has an input xt and a hidden state ht. The internal mechanics of the GRU is defined as:
$$
\begin{aligned}
z_t &= \sigma\left(W^{(z)} x_t + U^{(z)} h_{t-1} + b^{(z)}\right) \\
r_t &= \sigma\left(W^{(r)} x_t + U^{(r)} h_{t-1} + b^{(r)}\right) \\
\tilde{h}_t &= \tanh\left(W x_t + r_t \circ U h_{t-1} + b^{(h)}\right) \\
h_t &= z_t \circ h_{t-1} + (1 - z_t) \circ \tilde{h}_t
\end{aligned}
$$
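To make these equations concrete, here is a minimal NumPy sketch of a single GRU step. This is not the paper's code: the class name `GRUCell`, the parameter names `Wz`, `Uz`, `bz`, etc., and the random initialization are illustrative assumptions mirroring the $W^{(z)}, U^{(z)}, b^{(z)}, \dots$ matrices above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """One GRU time step, following the four equations above."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        def m(rows, cols):
            return rng.normal(0.0, 0.1, size=(rows, cols))
        # W* act on the input x_t, U* act on the previous hidden state h_{t-1}
        self.Wz, self.Uz, self.bz = m(hidden_dim, input_dim), m(hidden_dim, hidden_dim), np.zeros(hidden_dim)
        self.Wr, self.Ur, self.br = m(hidden_dim, input_dim), m(hidden_dim, hidden_dim), np.zeros(hidden_dim)
        self.Wh, self.Uh, self.bh = m(hidden_dim, input_dim), m(hidden_dim, hidden_dim), np.zeros(hidden_dim)

    def step(self, x_t, h_prev):
        z_t = sigmoid(self.Wz @ x_t + self.Uz @ h_prev + self.bz)              # update gate
        r_t = sigmoid(self.Wr @ x_t + self.Ur @ h_prev + self.br)              # reset gate
        h_tilde = np.tanh(self.Wh @ x_t + r_t * (self.Uh @ h_prev) + self.bh)  # new memory
        return z_t * h_prev + (1.0 - z_t) * h_tilde                            # hidden state h_t

# Running the cell over a sequence of word vectors yields the hidden states
# that serve as sequence representations in the model.
cell = GRUCell(input_dim=50, hidden_dim=80)
h = np.zeros(80)
for x_t in np.random.default_rng(1).normal(size=(6, 50)):  # a toy 6-word sequence
    h = cell.step(x_t, h)
```

In practice one would use an existing framework GRU implementation; this sketch only spells out the gate arithmetic.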

2. Scoring Module

Figure 3. Real example of an input list of sentences and the attention gates that are triggered by a specific question from the bAbI tasks (Weston et al., 2015a). Gate values g are shown above the corresponding vectors. The gates change with each search over inputs. We do not draw connections for gates that are close to zero. Note that the second iteration has wrongly placed some weight in sentence 2, which makes some intuitive sense, as sentence 2 is another place John had been.

In our work, we use a gating function as our attention mechanism. For each pass i, the mechanism takes as input a candidate fact $c_t$, a previous memory $m^{i-1}$, and the question $q$ to compute a gate:

$$g_t^i = G(c_t, m^{i-1}, q)$$

The scoring function G takes as input the feature set z(c, m, q) and produces a scalar score. We first define a large feature vector that captures a variety of similarities between input, memory and question vectors:

$$z(c, m, q) = \left[\, c,\; m,\; q,\; c \circ q,\; c \circ m,\; |c - q|,\; |c - m|,\; c^{T} W^{(b)} q,\; c^{T} W^{(b)} m \,\right]$$
where $\circ$ is the element-wise product. The function G is a simple two-layer feed-forward neural network:

$$G(c, m, q) = \sigma\left( W^{(2)} \tanh\left( W^{(1)} z(c, m, q) + b^{(1)} \right) + b^{(2)} \right)$$
Some datasets, such as Facebook's bAbI dataset, specify which facts are important for a given question. In those cases, the attention mechanism of the G function can be trained in a supervised fashion with a standard cross-entropy cost function.
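As a rough sketch (not the authors' implementation), the feature vector $z(c, m, q)$ and the two-layer scoring network $G$ could be written as below; the parameter names `Wb`, `W1`, `b1`, `W2`, `b2` are my stand-ins for $W^{(b)}, W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}$, and the dimensions are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feature_vector(c, m, q, Wb):
    """z(c, m, q): similarity features between fact c, memory m and question q."""
    return np.concatenate([
        c, m, q,
        c * q, c * m,                   # element-wise products
        np.abs(c - q), np.abs(c - m),   # element-wise absolute differences
        [c @ Wb @ q], [c @ Wb @ m],     # bilinear terms c^T W^(b) q and c^T W^(b) m
    ])

def gate_score(c, m, q, params):
    """G(c, m, q): two-layer feed-forward net producing a scalar gate in (0, 1)."""
    z = feature_vector(c, m, q, params["Wb"])
    hidden = np.tanh(params["W1"] @ z + params["b1"])
    return sigmoid(params["W2"] @ hidden + params["b2"])

# Toy example: d-dimensional fact, memory and question vectors.
d, hidden_dim = 40, 64
rng = np.random.default_rng(0)
params = {
    "Wb": rng.normal(0, 0.1, (d, d)),
    "W1": rng.normal(0, 0.1, (hidden_dim, 7 * d + 2)),  # z has 7d + 2 entries
    "b1": np.zeros(hidden_dim),
    "W2": rng.normal(0, 0.1, hidden_dim),
    "b2": 0.0,
}
c, m, q = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
g = gate_score(c, m, q, params)  # attention gate value for this fact on this pass
```

With bAbI-style supervision, these gate values can be trained against the annotated supporting facts using the cross-entropy objective mentioned above.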
