19. Accurate Supervised and Semi-Supervised Machine Reading for Long Documents

Problem:

The authors use a hierarchical machine reading architecture to extract precise information from long documents. The model splits each long document into small, overlapping windows and encodes these windows in parallel. It then processes the window encodings, reduces them to a single encoding, and decodes an answer from it with a sequence decoder. This hierarchical design lets the model scale to long documents without a corresponding increase in sequential steps. In the supervised setting, the model reaches a new state-of-the-art score of 76.8 on the WikiReading dataset.
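As a rough illustration of the windowing step, the following minimal Python sketch splits a token sequence into fixed-length, overlapping windows; the window length and stride here are placeholder values, not the paper's settings.

```python
from typing import List

def sliding_windows(tokens: List[str], window_len: int = 30, stride: int = 15) -> List[List[str]]:
    """Split a token sequence into fixed-length, overlapping windows.

    With stride < window_len, consecutive windows overlap, so no answer span
    is stranded on a window boundary; all windows can then be encoded in
    parallel, independent of the total document length.
    """
    windows = []
    for start in range(0, max(len(tokens) - window_len, 0) + 1, stride):
        windows.append(tokens[start:start + window_len])
    return windows

doc = ("the quick brown fox jumps over the lazy dog " * 20).split()
print(len(sliding_windows(doc)))  # the number of windows grows linearly with document length
```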

We also evaluate the model in a semi-supervised setting, created by downsampling the WikiReading training set into progressively smaller amounts of supervision while training a sequence autoencoder over document windows from the full unlabeled document corpus. The evaluated models can reuse the autoencoder's states and outputs without fine-tuning their weights, enabling more efficient training and inference.

Introduction:
Deep neural networks have shown very strong results on machine reading and question answering tasks, which require finding the answer to a query within a document. A basic seq2seq model can handle these tasks by encoding the document and question and then decoding an answer sequence. This has a drawback, however: the answer may appear early in the text and then has to be carried through many further recurrent steps, where information can be lost. Adding an attention mechanism to the decoder addresses this problem.

Even with attention, approaches based on recurrent neural networks (RNNs) still require a number of sequential steps proportional to the document length in order to encode every document position. Hierarchical reading models address this by breaking the document into sentences (Choi et al., 2017). In this paper, we introduce a simple hierarchical model that achieves state-of-the-art performance on our benchmark task without relying on such linguistic structure, and we use it as a framework to explore semi-supervised learning for reading comprehension.

We first develop a hierarchical attentive reader, the sliding-window encoder attentive reader (SWEAR), which sidesteps the bottleneck of existing readers described above. As shown in Figure 1, SWEAR first encodes the question into a vector-space representation. It then divides each document into overlapping, fixed-length windows and encodes each window in parallel, conditioned on the question representation. Inspired by recent attention mechanisms such as Hermann et al. (2015), SWEAR attends over the window representations and reduces them to a single vector per document. Finally, the answer is decoded from this document vector. Our results show that SWEAR outperforms the previous state of the art on supervised WikiReading (Hewlett et al., 2016), improving Mean F1 from the previous best of 75.6 (Choi et al., 2017) to 76.8.
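A minimal NumPy sketch of the attend-and-reduce step described above; the dot-product scoring and the dimensions are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def reduce_windows(query_vec: np.ndarray, window_encs: np.ndarray) -> np.ndarray:
    """Attend over per-window encodings and reduce them to one document vector.

    query_vec:   (d,)   encoding of the question
    window_encs: (n, d) one encoding per document window
    Returns a single (d,) document vector for the answer decoder to consume.
    """
    scores = window_encs @ query_vec           # (n,) similarity of each window to the query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax attention weights over windows
    return weights @ window_encs               # attention-weighted sum of window encodings

rng = np.random.default_rng(0)
doc_vec = reduce_windows(rng.normal(size=128), rng.normal(size=(11, 128)))
print(doc_vec.shape)  # (128,)
```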
Although WikiReading is a large dataset with millions of labeled examples, many machine reading applications have far fewer labeled examples within a large set of unlabeled documents. To simulate this situation, we construct a semi-supervised version of WikiReading by downsampling the labeled corpus into various smaller subsets while retaining the full unlabeled corpus (i.e., Wikipedia). To exploit the unlabeled data, we evaluate several ways of reusing an unsupervised recurrent autoencoder within semi-supervised versions of SWEAR. Importantly, these models can reuse all autoencoder parameters without fine-tuning, so the supervised stage only needs to learn how to locate the answer given the document and query. This allows more efficient training and online operation: documents can be encoded offline in a single pass, and those encodings are reused by all models both during training and when answering queries. Our semi-supervised models outperform the purely supervised SWEAR on several subsets with different characteristics. The best-performing model reaches 66.5 on 1% of the WikiReading dataset, compared with the 2016 state of the art of 71.8 on 100% of the dataset.
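The offline reuse described above might look roughly like the following sketch; the helper names and the toy encoder are hypothetical, and the only point illustrated is that window encodings are computed once with frozen autoencoder weights and then shared across training and serving.

```python
import numpy as np

# Hypothetical cache of frozen autoencoder window encodings: each document is
# encoded once, offline, and the stored matrix is reused for every query and
# by every model; the encoder weights are never updated during supervised training.
encoding_cache = {}

def cached_window_encodings(doc_id, windows, frozen_encoder):
    """Return the (n_windows, d) encoding matrix for a document, encoding it only once."""
    if doc_id not in encoding_cache:
        encoding_cache[doc_id] = np.stack([frozen_encoder(w) for w in windows])
    return encoding_cache[doc_id]

# Stand-in for a trained (V)RAE encoder: a deterministic hash-seeded vector per window.
def toy_encoder(window_tokens):
    rng = np.random.default_rng(abs(hash(tuple(window_tokens))) % (2 ** 32))
    return rng.normal(size=16)

encs = cached_window_encodings("doc-1", [["a", "b"], ["b", "c"]], toy_encoder)
print(encs.shape)  # (2, 16)
```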

Related Work:

Our model architecture is one of many hierarchical models for documents proposed in the literature. The most similar is that of Choi et al. (2017), which uses a coarse-to-fine approach of first encoding each sentence with a cheap BoW or Conv model, then selecting the top k sentences to form a mini-document, which is then processed by a standard seq2seq model. While they also evaluate their approach on WikiReading, their emphasis is on efficiency rather than model accuracy, with the resulting model performing slightly worse than the full seq2seq model but taking much less time to execute. SWEAR also requires fewer sequential steps than the document length but still computes at least as many recurrent steps in parallel.

Our model can also be viewed as containing a Memory Network (MemNet) built from a document (Weston et al., 2014; Sukhbaatar et al., 2015), where the memories are the window encodings. The core MemNet operation consists of attention over a set of vectors (memories) based on a query encoding, followed by reduction of a second set of vectors by a weighted sum based on the attention weights. In particular, Miller et al. (2016) introduce the Key-Value MemNet, where the two sets of memories are computed from the keys and values of a map, respectively: in their QA task, each memory entry consists of a potential answer (the value) and its context bag of words (the key). Our reviewer approach is inspired by the "Encode, Review, Decode" approach introduced by Yang et al. (2016), which showed the value of introducing additional computation steps between the encoder and decoder in a seq2seq model. The basic recurrent autoencoder, a standard seq2seq model with the same input and output, was first introduced by Dai et al. (2015). Fabius et al. (2014) expanded this model into the Variational Recurrent Autoencoder (VRAE), which we describe in Section 4.1.1. The VRAE is an application of the general idea of variational autoencoding, which applies a variational approximation to the posterior in order to reconstruct the input (Kingma and Welling, 2013). While we train window autoencoders, an alternative approach is hierarchical document autoencoders (Li et al., 2015).
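A minimal NumPy sketch of the single key-value memory lookup described here; a full Key-Value MemNet would repeat this lookup over multiple hops, and the dimensions are illustrative.

```python
import numpy as np

def key_value_lookup(query: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """One Key-Value MemNet hop.

    Attention weights come from matching the query against the keys (e.g. context
    bags of words); the output is the weighted sum of the corresponding values
    (e.g. candidate answers) under those same weights.
    """
    scores = keys @ query                      # (n,) key-query match scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over memory slots
    return weights @ values                    # weighted sum of the values

rng = np.random.default_rng(1)
out = key_value_lookup(rng.normal(size=64), rng.normal(size=(5, 64)), rng.normal(size=(5, 64)))
print(out.shape)  # (64,)
```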

The semi-supervised approach of initializing the weights of an RNN encoder with those of a recurrent autoencoder was first studied by Dai et al. (2015) in the context of document classification and further studied by Ramachandran et al. (2016) for traditional sequence-to-sequence tasks such as machine translation. Our baseline semi-supervised model can be viewed as an extension of these approaches to a reading comprehension setting. Dai et al. (2015) also explore initialization from a language model, but find that the recurrent autoencoder is superior, which is why we do not consider language models in this work.

Conclusion:
We have demonstrated the efficacy of the SWEAR architecture, reaching state of the art performance on supervised WikiReading. The model improves the extraction of precise information from long documents over the baseline seq2seq model. In a semi-supervised setting, our method of reusing (V)RAE encodings in a reading comprehension framework is effective, with SWEAR-PR reaching an accuracy of 66.5 on 1% of the dataset against last year’s state of the art of 71.8 using the full dataset. However, these methods require careful configuration and tuning to succeed, and making them more robust presents an excellent opportunity for future work.

