Question Answering Papers (Question Answering Systems & Reading Comprehension)


funNLPer


Category: Paper Reading | Tags: deep learning

Article link: https://blog.csdn.net/orangerfun/article/details/107531356


1. Overview

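There are two main approaches: IR-based question answering (built on information retrieval) and knowledge-based question answering.

IR-based question answering:
Given a user's question, the system first uses information retrieval to find relevant documents or passages, then runs a reading comprehension algorithm over the retrieved documents to produce the answer directly as a span of text.

Knowledge-based question answering:
Rather than encoding the question as a vector, this kind of system builds a symbolic semantic representation of it. For example, it maps "What states border Texas?" to the logical form $\lambda x.\,\mathrm{state}(x) \wedge \mathrm{borders}(x, \mathrm{texas})$, or maps "When was Ada Lovelace born?" to the gapped relation birth-year(Ada Lovelace, ?x). These representations are then used to query a database.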
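To make the knowledge-based idea concrete, here is a minimal sketch that evaluates the two example representations against a tiny in-memory set of (subject, relation, object) triples. The triples, the relation names (`borders`, `isa`, `birth-year`) and the helpers `query_gapped` and `query_lambda` are hypothetical toy data and code, not a real knowledge base or query language.

```python
# Hypothetical toy knowledge base of (subject, relation, object) triples.
KB = {
    ("new_mexico", "borders", "texas"),
    ("oklahoma", "borders", "texas"),
    ("arkansas", "borders", "texas"),
    ("louisiana", "borders", "texas"),
    ("mexico", "borders", "texas"),
    ("new_mexico", "isa", "state"),
    ("oklahoma", "isa", "state"),
    ("arkansas", "isa", "state"),
    ("louisiana", "isa", "state"),
    ("mexico", "isa", "country"),
    ("ada_lovelace", "birth-year", "1815"),
}

def query_gapped(subject, relation):
    """Answer relation(subject, ?x): return every object that fills the gap."""
    return {o for (s, r, o) in KB if s == subject and r == relation}

def query_lambda():
    """Answer lambda x. state(x) AND borders(x, texas) by checking both conjuncts."""
    entities = {s for (s, _, _) in KB}
    return {x for x in entities
            if (x, "isa", "state") in KB and (x, "borders", "texas") in KB}

print(query_gapped("ada_lovelace", "birth-year"))  # {'1815'}
print(sorted(query_lambda()))  # ['arkansas', 'louisiana', 'new_mexico', 'oklahoma']
```

Mexico borders Texas in the toy data but is filtered out by the `state(x)` conjunct, which is exactly what the logical form expresses.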

2. IR-based Factoid Question Answering

An IR-based factoid question answering system has three stages:

Question processing
Passage retrieval and ranking
Answer extraction
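The three stages can be wired together as in the toy sketch below. It assumes a hypothetical stopword list for question processing, ranks passages by plain keyword overlap, and takes the answer extractor as a caller-supplied function; the "extractor" used in the demo just grabs a four-digit year and only stands in for the reading comprehension models of Section 3.

```python
import re

# Hypothetical stopword list: question words and function words dropped from the query.
STOPWORDS = {"what", "which", "who", "when", "where", "how", "is", "was",
             "are", "the", "a", "an", "of", "in", "did", "does"}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def process_question(question):
    """Stage 1 (question processing): keep content words as query keywords."""
    return [t for t in tokenize(question) if t not in STOPWORDS]

def rank_passages(query, passages):
    """Stage 2 (passage retrieval and ranking): sort by keyword overlap."""
    def overlap(passage):
        words = set(tokenize(passage))
        return sum(1 for t in query if t in words)
    return sorted(passages, key=overlap, reverse=True)

def answer_question(question, passages, extract):
    """Stage 3 (answer extraction): run `extract` on the top-ranked passage."""
    query = process_question(question)
    top = rank_passages(query, passages)[0]
    return extract(question, top)

def year_extractor(question, passage):
    """Toy extractor: return the first four-digit number in the passage."""
    return re.search(r"\b\d{4}\b", passage).group()

passages = [
    "Lasers are used in eye surgery.",
    "The laser was invented in 1960.",
    "The telephone was invented in 1876.",
]
print(answer_question("When was the laser invented?", passages, year_extractor))  # 1960
```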

2.1 Question Processing

The main goal of this stage is to extract keywords (the query) to send to an information retrieval system for matching documents. Some systems also extract other information, such as:

Answer type: person, location, time, etc.
Question type: is it a definition question, a math question, a list question?
Focus: the word(s) in the question most likely to be replaced by the answer

For example, question: Which US state capital has the largest population?
query: "US state capital has the largest population"
answer type: city
focus: state capital
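The query in this example is just the question with its question word and punctuation removed. A minimal sketch of that rule, assuming a small hand-written list of wh-words (the list and the name `keyword_query` are illustrative, not from the source):

```python
import re

# Hypothetical list of question words to drop from the query.
WH_WORDS = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}

def keyword_query(question):
    """Drop question words and punctuation; keep the remaining tokens in order."""
    tokens = re.findall(r"[A-Za-z0-9']+", question)
    return " ".join(t for t in tokens if t.lower() not in WH_WORDS)

print(keyword_query("Which US state capital has the largest population?"))
# US state capital has the largest population
```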

2.1.1 Query Formulation

Query formulation is the task of creating a query (a list of tokens) to send to an information retrieval system to retrieve documents that might contain answer strings.
For example, a question can be rewritten into a declarative phrase that may appear in the answer passage:
when was the laser invented? → the laser was invented
where is the Valley of the Kings? → the Valley of the Kings is located in
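The two rewrites above can be produced by hand-written pattern rules. Below is a sketch assuming two hypothetical regular-expression rules that reproduce exactly these examples; a practical system would need many more rules (or a learned rewriter), and `REWRITE_RULES` and `reformulate` are illustrative names.

```python
import re

# Hypothetical rewrite rules: (question pattern, declarative template).
REWRITE_RULES = [
    (re.compile(r"^when was (?P<x>.+) (?P<verb>\w+ed)\?$", re.IGNORECASE),
     r"\g<x> was \g<verb>"),
    (re.compile(r"^where is (?P<x>.+)\?$", re.IGNORECASE),
     r"\g<x> is located in"),
]

def reformulate(question):
    """Apply the first matching rule; fall back to the original question."""
    q = question.strip()
    for pattern, template in REWRITE_RULES:
        m = pattern.match(q)
        if m:
            return m.expand(template)
    return q

print(reformulate("when was the laser invented?"))       # the laser was invented
print(reformulate("where is the Valley of the Kings?"))  # the Valley of the Kings is located in
```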

2.1.2 Answer Types

A question like "Who founded Virgin Airlines?" expects an answer of type PERSON,
while "What Canadian city has the largest population?" expects an answer of type CITY.
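A minimal rule-based sketch of answer-type detection, assuming a hypothetical mapping from the wh-word and from head nouns such as "city" or "capital" to answer types. The rules and type labels here are illustrative only; real systems typically train a classifier over a larger answer-type taxonomy.

```python
import re

# Hypothetical head-noun -> answer-type table used for what/which questions.
HEAD_NOUN_TYPES = {"city": "CITY", "capital": "CITY", "country": "COUNTRY",
                   "year": "DATE", "person": "PERSON"}

def answer_type(question):
    tokens = re.findall(r"[a-z]+", question.lower())
    if not tokens:
        return "UNKNOWN"
    wh = tokens[0]
    if wh == "who":
        return "PERSON"
    if wh == "when":
        return "TIME"
    if wh == "where":
        return "LOCATION"
    if wh in {"what", "which"}:
        # Use the first known head noun after the wh-word.
        for t in tokens[1:]:
            if t in HEAD_NOUN_TYPES:
                return HEAD_NOUN_TYPES[t]
    return "UNKNOWN"

print(answer_type("Who founded Virgin Airlines?"))                    # PERSON
print(answer_type("What Canadian city has the largest population?"))  # CITY
```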

3. Neural Answer Extraction

Neural answer extractors are often designed in the context of the reading comprehension task.

3.1 A bi-LSTM-based Reading Comprehension Algorithm

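In reading comprehension we are given a question $q$ of $l$ words $q_1, q_2, \ldots, q_l$ and a passage $p$ of $m$ words $p_1, p_2, \ldots, p_m$; the goal is to compute, for each word $p_i$, the probability that it is the start of the answer span and the probability that it is the end.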

[Figure 1. A BiLSTM-based question answering system (architecture of the reading comprehension system)]
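Like most such systems, it represents the question as a vector and each passage word as a vector, computes the similarity between the question vector and each passage word vector, and uses these similarity scores to decide where the answer starts and ends.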

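Algorithm details
The question is represented as a single vector $\mathbf{q}$, a weighted sum of the vectors $\mathbf{q}_j$ of its words:
$\mathbf{q}=\sum_{j} b_{j} \mathbf{q}_{j}$

where the weight $b_j$ measures the relevance of each question word and is computed from a learned vector $\mathbf{w}$:
$b_{j}=\frac{\exp \left(\mathbf{w} \cdot \mathbf{q}_{j}\right)}{\sum_{j^{\prime}} \exp \left(\mathbf{w} \cdot \mathbf{q}_{j^{\prime}}\right)}$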
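The two formulas translate directly into a few lines of NumPy. In this sketch the question word vectors $\mathbf{q}_j$ and the learned vector $\mathbf{w}$ are random stand-ins (in a trained model both come from training), and `question_vector` is an illustrative name.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

def question_vector(Q, w):
    """Q: (l, d) question word vectors q_j; w: (d,) learned vector.
    Returns q = sum_j b_j q_j and the weights b = softmax(w . q_j)."""
    b = softmax(Q @ w)  # (l,)
    return b @ Q, b     # (d,), (l,)

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))  # 5 question words, 8-dim vectors (toy sizes)
w = rng.normal(size=8)
q, b = question_vector(Q, w)
print(q.shape, b.sum())
```

With $\mathbf{w} = \mathbf{0}$ every weight equals $1/l$ and $\mathbf{q}$ reduces to the plain average of the word vectors; learning $\mathbf{w}$ lets the model emphasize the informative words.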

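To compute the passage vectors $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_m$, we first build an input representation $\tilde{p}=\left\{\tilde{\mathbf{p}}_{1}, \ldots, \tilde{\mathbf{p}}_{m}\right\}$, where each $\tilde{\mathbf{p}}_i$ concatenates four components:

Word embedding of each word, e.g. taken directly from GloVe
Token features such as the part-of-speech tag or named-entity tag of $p_i$, produced by a POS tagger or NER system
Exact-match feature: whether the word $p_i$ appears in the question, $\mathbb{1}\left(p_{i} \in q\right)$
Aligned question embedding $\sum_{j} q_{i,j}\,\mathbf{E}(q_j)$, with attention weights $q_{i, j}=\frac{\exp \left(\alpha\left(\mathbf{E}\left(p_{i}\right)\right) \cdot \alpha\left(\mathbf{E}\left(q_{j}\right)\right)\right)}{\sum_{j^{\prime}} \exp \left(\alpha\left(\mathbf{E}\left(p_{i}\right)\right) \cdot \alpha\left(\mathbf{E}\left(q_{j^{\prime}}\right)\right)\right)}$, where $\alpha(\cdot)$ can be a simple feed-forward network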
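A sketch of building $\tilde{\mathbf{p}}_i$ for a toy passage. Assumptions, none of which come from the source: the embeddings $\mathbf{E}(\cdot)$ are random stand-ins for GloVe, $\alpha(\cdot)$ is a single ReLU dense layer with random weights, the POS/NER token features are zero placeholders, and exact match is checked case-insensitively.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 4, 3  # embedding size and alpha() hidden size (arbitrary toy values)

passage = ["The", "laser", "was", "invented", "in", "1960"]
question = ["When", "was", "the", "laser", "invented"]

# Random stand-ins for pre-trained embeddings E(.) such as GloVe.
vocab = sorted({t.lower() for t in passage + question})
E = {t: rng.normal(size=d) for t in vocab}
P_emb = np.stack([E[t.lower()] for t in passage])   # (m, d)
Q_emb = np.stack([E[t.lower()] for t in question])  # (l, d)

def exact_match(p_tokens, q_tokens):
    """1(p_i in q), compared case-insensitively; shape (m, 1)."""
    q_set = {t.lower() for t in q_tokens}
    return np.array([[float(t.lower() in q_set)] for t in p_tokens])

def aligned_question_embedding(P, Q, W, bias):
    """sum_j q_ij E(q_j), with alpha(x) = relu(x W + bias) as the feed-forward net."""
    ap = np.maximum(P @ W + bias, 0.0)              # alpha(E(p_i)), (m, h)
    aq = np.maximum(Q @ W + bias, 0.0)              # alpha(E(q_j)), (l, h)
    scores = ap @ aq.T                              # (m, l)
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    a = np.exp(scores)
    a /= a.sum(axis=1, keepdims=True)               # softmax over question words j
    return a @ Q, a                                 # (m, d), (m, l)

W, bias = rng.normal(size=(d, h)), np.zeros(h)
token_feats = np.zeros((len(passage), 2))  # placeholder for POS/NER one-hot features
em = exact_match(passage, question)
aligned, attn = aligned_question_embedding(P_emb, Q_emb, W, bias)

# p~_i = [E(p_i); token features; exact match; aligned question embedding]
p_tilde = np.concatenate([P_emb, token_feats, em, aligned], axis=1)
print(p_tilde.shape)  # (6, 11)
```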

Then $\tilde{p}$ is fed into a BiLSTM: $\left\{\mathbf{p}_{1}, \ldots, \mathbf{p}_{m}\right\}=\mathrm{RNN}\left(\left\{\tilde{\mathbf{p}}_{1}, \ldots, \tilde{\mathbf{p}}_{m}\right\}\right)$
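To show what the RNN step does to the shapes, here is a single-layer bidirectional LSTM in plain NumPy with random, untrained weights. The gate layout (input, forget, output, candidate), the initialization, and the helper names are assumptions for illustration; real readers use trained, often multi-layer BiLSTMs from a deep learning framework.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(X, Wx, Wh, b):
    """Run one LSTM direction over X (m, n); returns hidden states (m, h).
    Gate layout in the 4h pre-activation: input, forget, output, candidate."""
    h_dim = Wh.shape[0]
    h, c = np.zeros(h_dim), np.zeros(h_dim)
    outputs = []
    for x_t in X:
        z = x_t @ Wx + h @ Wh + b
        i = sigmoid(z[:h_dim])
        f = sigmoid(z[h_dim:2 * h_dim])
        o = sigmoid(z[2 * h_dim:3 * h_dim])
        g = np.tanh(z[3 * h_dim:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def bilstm(X, fwd_params, bwd_params):
    """Concatenate left-to-right and right-to-left states for each position."""
    fwd = lstm_pass(X, *fwd_params)
    bwd = lstm_pass(X[::-1], *bwd_params)[::-1]  # run right-to-left, then re-align
    return np.concatenate([fwd, bwd], axis=1)

def init_params(rng, n, h):
    return (rng.normal(scale=0.1, size=(n, 4 * h)),
            rng.normal(scale=0.1, size=(h, 4 * h)),
            np.zeros(4 * h))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 11))  # stand-in for p~ (m = 6 words, 11 features each)
fwd_params, bwd_params = init_params(rng, 11, 5), init_params(rng, 11, 5)
P = bilstm(X, fwd_params, bwd_params)
print(P.shape)  # (6, 10): each p_i has 5 forward + 5 backward dimensions
```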

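These steps produce a single question vector $\mathbf{q}$ and a vector $\mathbf{p}_i$ for each passage word, $\{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_m\}$. To find the answer span, we train two separate classifiers: one computes the probability $p_{\text{start}}(i)$ that word $p_i$ is the start of the answer, the other the probability $p_{\text{end}}(i)$ that it is the end. The classifiers could simply use the dot product to measure the similarity between $\mathbf{q}$ and $\mathbf{p}_i$, but they can also learn a more sophisticated similarity function, such as a bilinear attention layer $\mathbf{W}$ (one matrix $\mathbf{W}_s$ for the start and one $\mathbf{W}_e$ for the end):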
$\begin{array}{l} p_{\text {start}}(i) \propto \exp \left(\mathbf{p}_{i} \mathbf{W}_{s} \mathbf{q}\right) \\ p_{\text {end}}(i) \propto \exp \left(\mathbf{p}_{i} \mathbf{W}_{e} \mathbf{q}\right) \end{array}$
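The bilinear scores $\mathbf{p}_i \mathbf{W}_s \mathbf{q}$ and $\mathbf{p}_i \mathbf{W}_e \mathbf{q}$ become probabilities after a softmax over the $m$ passage words. In this sketch all arrays are random stand-ins and `span_probs` is an illustrative name; with $\mathbf{W}_s = \mathbf{I}$ the score reduces to the plain dot product mentioned above.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def span_probs(P, q, Ws, We):
    """P: (m, d_p) passage vectors p_i; q: (d_q,) question vector;
    Ws, We: (d_p, d_q) bilinear matrices. Returns p_start, p_end over the m words."""
    p_start = softmax(P @ Ws @ q)  # p_start(i) proportional to exp(p_i Ws q)
    p_end = softmax(P @ We @ q)    # p_end(i) proportional to exp(p_i We q)
    return p_start, p_end

rng = np.random.default_rng(0)
P = rng.normal(size=(6, 10))  # 6 passage words (e.g. BiLSTM outputs)
q = rng.normal(size=8)        # question vector
Ws, We = rng.normal(size=(10, 8)), rng.normal(size=(10, 8))
p_start, p_end = span_probs(P, q, Ws, We)
print(p_start.argmax(), p_end.argmax())
```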

3.2 BERT-based Question Answering

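As shown in Figure 2, BERT takes two sequences joined by the special [SEP] token as input, and the pre-trained BERT model outputs an embedding for each token. For question answering, the question is the first sequence and the passage is the second. At the output we add two new vectors, S and E, which are trained during fine-tuning: S is the span-start embedding and E is the span-end embedding. The span-start probability of each token is the dot product of S with that token's output vector $T_i$, normalized with a softmax over all tokens $T_i$ in the passage: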
$\text {Pstart}_{i}=\frac{e^{S \cdot T_{i}}}{\sum_{j} e^{S \cdot T_{j}}}$
The span-end probability is computed in the same way:
$\text {Pend}_{i}=\frac{e^{E \cdot T_{i}}}{\sum_{j} e^{E \cdot T_{j}}}$
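A NumPy sketch of these two softmaxes, restricted to the passage tokens as the text describes. The token list is illustrative (real BERT input uses WordPiece tokens), and the output vectors $T_i$ and the vectors S and E are random stand-ins for a fine-tuned model; `masked_softmax` and `start_end_probs` are hypothetical helper names.

```python
import numpy as np

def masked_softmax(scores, mask):
    """Softmax over positions where mask is True; all other positions get 0."""
    masked = np.where(mask, scores, -np.inf)
    masked = masked - masked[mask].max()  # stability; exp(-inf) becomes 0
    e = np.exp(masked)
    return e / e.sum()

def start_end_probs(T, S, E, passage_mask):
    """T: (n, H) output vectors T_i; S, E: (H,) span-start/end vectors."""
    return masked_softmax(T @ S, passage_mask), masked_softmax(T @ E, passage_mask)

tokens = ["[CLS]", "when", "was", "the", "laser", "invented", "?", "[SEP]",
          "the", "laser", "was", "invented", "in", "1960", "[SEP]"]
sep = tokens.index("[SEP]")
passage_mask = np.zeros(len(tokens), dtype=bool)
passage_mask[sep + 1:len(tokens) - 1] = True  # passage tokens only

rng = np.random.default_rng(0)
H = 16  # toy hidden size
T = rng.normal(size=(len(tokens), H))
S, E = rng.normal(size=H), rng.normal(size=H)
p_start, p_end = start_end_probs(T, S, E, passage_mask)
print(tokens[p_start.argmax()], tokens[p_end.argmax()])
```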

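The score of a candidate span from position $i$ to position $j$ is $S\cdot T_i+E\cdot T_j$, and the highest-scoring span with $j\geq i$ is taken as the model's prediction. The fine-tuning objective is the log-likelihood of the correct start and end positions for each training example.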
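Span selection and the training objective can be sketched as follows, assuming the start scores $S\cdot T_i$ and end scores $E\cdot T_j$ are already restricted to passage positions. The `max_len` cap on span length is an extra assumption (a common practical limit, not part of the text above), and the loss is written as the negative log-likelihood so that it is minimized.

```python
import numpy as np

def best_span(start_scores, end_scores, max_len=None):
    """Return (i, j, score) maximizing start_scores[i] + end_scores[j] with j >= i."""
    n = len(start_scores)
    best = (0, 0, -np.inf)
    for i in range(n):
        last = n if max_len is None else min(n, i + max_len)
        for j in range(i, last):
            score = start_scores[i] + end_scores[j]
            if score > best[2]:
                best = (i, j, score)
    return best

def log_softmax(x):
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def span_nll(start_scores, end_scores, gold_start, gold_end):
    """Negative log-likelihood of the gold start and end positions."""
    return -(log_softmax(start_scores)[gold_start] + log_softmax(end_scores)[gold_end])

start_scores = np.array([0.1, 2.0, 0.3, 0.0])
end_scores = np.array([3.0, 0.2, 1.5, 0.4])
print(best_span(start_scores, end_scores)[:2])  # (1, 2)
```

Note that taking the start and end argmax independently would give start 1 and end 0, an invalid span; the $j \geq i$ constraint is what rules it out.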
[Figure 2. A BERT-based question answering system]

References

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.
