Retrospective Reader for Machine Reading Comprehension (Zhuosheng): Paper Summary

aopolin

Category column: NLP入门 (NLP basics). Article tags: natural language processing, deep learning

Article link: https://blog.csdn.net/aopolin/article/details/113186325

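Research problem: When an MRC task includes unanswerable questions, the model needs an essential verification module (a verifier) in addition to the encoder. Yet current MRC modeling practice still benefits mainly from adopting a well-trained pre-trained LM as the encoder block, which focuses only on "reading".

Proposed solution: To better explore verifier design, the paper proposes the Retrospective Reader (Retro-Reader), which works in two steps:

Sketchy reading: briefly survey the overall relationship between the passage and the question and reach an initial judgment;
Intensive reading: verify the answer and give the final prediction.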

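Contributions:

A new retrospective reader design that performs answer verification fully and effectively, rather than simply stacking a verifier onto an existing reader.
Experiments show that the retrospective reader yields substantial improvements over strong baselines and achieves state-of-the-art results on benchmark MRC tasks.
Because the pre-trained LM on the encoder side is already so strong, the paper focuses on the decoder side, such as passage and question attention interaction, and especially answer verification.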

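1. Prior work:

Liu et al. 2018 append an empty word token to the context and add a simple classification layer on top of the reader.
Hu et al. 2019 use two additional types of loss: an independent span loss that predicts plausible answers, and an independent no-answer loss that decides whether the question is answerable. An extra verifier is also used to check whether the predicted answer is entailed by the input snippets (Figure 1-[b]).
Back et al. 2020 propose an attention-based satisfaction score that compares the question embedding with the candidate answer embeddings (Figure 1-[c]).
Zhang et al. 2020c propose a verifier layer on top of BERT: a linear layer applied to the context embedding, weighted by the start and end distributions over the context word representations and concatenated with the [CLS] token representation for verification.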

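2. The paper's model

The retrospective reader is a two-stage reading process built from two parallel modules:

Sketchy reading module: makes a coarse judgment on whether the question is answerable (external front verification).
Intensive reading module: predicts candidate answers and combines its answerability confidence (internal front verification) with the sketchy judgment score to produce the final answer (rear verification).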

2.1 Sketchy Reading Module

2.1.1 Embedding

The question and passage texts are concatenated as the input and fed into the encoder (for example, a PLM).

2.1.2 Interaction

The embedding output then passes through a multi-layer Transformer:
$\hat{h}_{i}^{l+1} = \sum_{m=1}^{M}W_m^{l+1}\left\{\sum_{j=1}^{n}A_{i,j}^m\cdot V_m^{l+1}x_j^l\right\}\tag{1}$

$h_i^{l+1} = \operatorname{LayerNorm}(x^l_i + \hat{h}_i^{l+1})\tag{2}$

$\hat{x}_i^{l+1} = W_2^{l+1}\cdot\operatorname{GELU}(W_1^{l+1}h_i^{l+1}+b_1^{l+1})+b_2^{l+1}\tag{3}$

$x_i^{l+1} = \operatorname{LayerNorm}(h_i^{l+1}+\hat{x}_i^{l+1})\tag{4}$

$m$ is the index of the attention heads.
$A_{i,j}^m$ is the first factor of $Attention(Q,K,V)=softmax(\frac{QK^T}{\sqrt{d_k}})V$, namely $softmax(\frac{QK^T}{\sqrt{d_k}})$, so $A_{i,j}^m\propto\exp[(Q_m^{l+1}x_i^l)^T(K_m^{l+1}x_j^l)]$, where $\propto$ denotes the normalization applied after the $\exp$.
$W_m^{l+1},Q_m^{l+1},K_m^{l+1},V_m^{l+1}$ are the learned parameters of the $m$-th attention head; $W_1^{l+1},W_2^{l+1},b_1^{l+1},b_2^{l+1}$ are the learned weights and biases.
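
To make the role of $A_{i,j}^m$ concrete, here is a minimal single-head sketch in plain Python. It assumes the query, key and value projections have already been applied, so the toy lists `Q`, `K` and `V` stand in for $Q_m x_i$, $K_m x_j$ and $V_m x_j$; the function names and the numbers are illustrative and do not come from the paper.

```python
import math

def attention_weights(queries, keys):
    """Row-wise softmax of scaled query-key dot products (one head)."""
    d_k = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])  # normalization after exp
    return weights

def attend(weights, values):
    """Weighted sum of value vectors: sum_j A[i][j] * values[j]."""
    width = len(values[0])
    return [
        [sum(w * v[c] for w, v in zip(row, values)) for c in range(width)]
        for row in weights
    ]

# Toy projected vectors (hypothetical numbers): 3 tokens, d_k = 2.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

A = attention_weights(Q, K)
context = attend(A, V)
```

Each row of `A` sums to 1, which is what the $\propto$ normalization expresses. The output projection $W_m$, the sum over heads, and the residual and LayerNorm steps of Eqs. (1) to (4) are left out to keep the sketch short.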

External Front Verification(E-FV)

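This is essentially a binary classification task: the representation of the first token ([CLS]), $h_1\in{\bf H}$, is fed to a fully connected layer to produce the classification logits, and training uses the cross-entropy loss
$L^{ans} = - \frac{1}{N}\sum^{N}_{i=1}[y_i\log\hat{y_i}+(1-y_i)\log(1-\hat{y_i})] \tag{5}$
where $\hat{y_i}$ is the prediction, $y_i$ is the target, and $N$ is the number of examples. The external verification score is then computed as:
$score_{ext}=logit_{na}-logit_{ans}$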
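
As a rough illustration of E-FV, the sketch below computes the binary cross-entropy of Eq. (5) and the external score from a pair of classifier logits. The label convention (1 means answerable), the helper names and the toy logits are assumptions made for this example only.

```python
import math

def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean of -[y log p + (1 - y) log(1 - p)] over the batch, as in Eq. (5)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)

def external_score(logit_na, logit_ans):
    """score_ext = logit_na - logit_ans; larger means 'more likely unanswerable'."""
    return logit_na - logit_ans

# Hypothetical classifier outputs, one (logit_ans, logit_na) pair per example.
logits = [(2.0, -1.0), (-0.5, 1.5)]
labels = [1, 0]  # assumption: 1 = answerable, 0 = unanswerable

# Probability of the 'answerable' class for each example.
probs = [softmax([na, ans])[1] for ans, na in logits]
loss = binary_cross_entropy(labels, probs)
scores = [external_score(na, ans) for ans, na in logits]
```

The first example is confidently answerable, so its external score is negative; the second leans toward no answer, so its score is positive.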

2.2 Intensive Reading Module

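The intensive reader obtains the representation $\bf H$ through the same interaction procedure as the sketchy reader. Earlier models such as BERT, XLNet and ALBERT feed $\bf H$ directly into a linear layer to produce the prediction.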

2.2.1 Question-aware Matching

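Using the position information, the representation $\bf H$ is split into ${\bf H}^Q$ (question) and ${\bf H}^P$ (passage). The paper then studies two candidate question-aware matching mechanisms.

Cross Attention: Transformer-style multi-head cross attention

$\bf H$ and ${\bf H}^Q$ are fed into a revised one-layer multi-head attention layer (following Lu et al. 2019). In ordinary multi-head self-attention $\bf Q=K=V$; here $\bf Q$ is replaced by $\bf H$, and $\bf K$ and $\bf V$ are replaced by ${\bf H}^Q$, which gives the final representation $\bf H'$.

Matching Attention: traditional matching attention

$\bf H$ and ${\bf H}^Q$ are fed into a traditional matching attention layer (proposed by Wang et al. 2017):
${\bf M}=\operatorname{SoftMax}({\bf H}({\bf W}{\bf H}^Q+{\bf b}\otimes{\bf e})^T)$

${\bf H'}={\bf M}{\bf H}^Q \tag{6}$

$\bf W$ and $\bf b$ are learned parameters.
$\bf e$ is a vector of all ones, used to repeat the bias vector into a matrix.
$\bf M$ holds the weights assigned to the different hidden states of the two sequences.
$\bf H'$ is the weighted sum of all the hidden states; it represents how the vectors in $\bf H$ align with each hidden state in ${\bf H}^Q$, and it is used for the final prediction.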
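
The matching attention step can be sketched in a few lines of plain Python. The sketch assumes a row-vector convention (one row per token, so the question states are projected as `HQ @ W + b` and then transposed) and uses toy values for `H`, `HQ`, `W` and `b`; those shapes and numbers are illustrative assumptions, not the paper's settings.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def row_softmax(m):
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def matching_attention(H, HQ, W, b):
    """Return (M, H') with M = softmax(H (HQ W + b)^T) and H' = M HQ."""
    # Adding b to every row plays the role of the all-ones vector e.
    projected = [[v + bias for v, bias in zip(row, b)] for row in matmul(HQ, W)]
    M = row_softmax(matmul(H, transpose(projected)))
    return M, matmul(M, HQ)

# Toy example: 3 tokens in total, the first 2 belong to the question, hidden size 2.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
HQ = H[:2]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, chosen for readability
b = [0.0, 0.0]

M, H_prime = matching_attention(H, HQ, W, b)
```

`M` has one row per token of `H` and one column per question token, and `H_prime` has the same shape as `H`, so it can replace `H` as the input to span prediction.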

2.2.2 Span Prediction

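A linear layer followed by a softmax takes $\bf H'$ as input and yields the start and end position probabilities $s$ and $e$:
$s, e \propto \operatorname{SoftMax}(\operatorname{FFN}({\bf H'})) \tag{7}$
The training loss is defined as
$L^{span}=-\frac{1}{N}\sum_{i=1}^{N}[\log(p^s_{y^s_i})+\log(p^e_{y_{i}^e})] \tag{8}$

where $y_i^s$ and $y_i^e$ are the ground-truth start and end positions of example $i$.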
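
A small sketch of the span loss in Eq. (8), assuming the linear layer has already produced one start logit and one end logit per token. The function names and the toy logits are hypothetical.

```python
import math

def log_softmax(logits):
    top = max(logits)
    log_total = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_total for x in logits]

def span_loss(start_logits, end_logits, start_gold, end_gold):
    """Mean over the batch of -(log p^s[y^s] + log p^e[y^e])."""
    total = 0.0
    for s_log, e_log, ys, ye in zip(start_logits, end_logits, start_gold, end_gold):
        total += log_softmax(s_log)[ys] + log_softmax(e_log)[ye]
    return -total / len(start_gold)

# One example with 4 token positions and uniform logits:
# every position has probability 1/4, so the loss is 2 * log(4).
uniform_loss = span_loss([[0.0] * 4], [[0.0] * 4], [1], [2])

# The same gold span with logits that favor it gives a smaller loss.
sharp_loss = span_loss([[0.0, 5.0, 0.0, 0.0]], [[0.0, 0.0, 5.0, 0.0]], [1], [2])
```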

2.2.3 Internal Front Verification(I-FV)

I-FV-CE

$\overline{y}_{i,k}=\operatorname{SoftMax}(\operatorname{FFN}(h_1'))$

$L^{ans} = -\frac{1}{N}\sum_{i=1}^N\sum_{k=1}^K[y_{i,k}\log{\overline{y}}_{i,k}] \tag{9}$

where $K$ is the number of classes; here $K = 2$.

I-FV-BE

$\overline{y}_{i}=\operatorname{Sigmoid}(\operatorname{FFN}(h_1'))$

$L^{ans} = -\frac{1}{N}\sum_{i=1}^{N}[y_i\log\overline{y}_i+(1-y_i)\log(1-\overline{y}_i)] \tag{10}$

I-FV-MSE

$\overline{y}_{i}=\operatorname{FFN}(h_1')$

$L^{ans} = \frac{1}{N}\sum_{i=1}^{N}(y_i-\overline{y}_i)^2 \tag{11}$

The joint loss function for FV is:
$L=\alpha_1L^{span}+\alpha_2L^{ans} \tag{13}$
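
The three I-FV variants differ only in the output activation and the loss. The sketch below writes them side by side in plain Python, taking the FFN outputs as given; the function names, the default weights `alpha1` and `alpha2`, and the toy inputs are assumptions for illustration, not values from the paper. The MSE loss is written as a non-negative mean of squared errors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ce_loss(logit_pairs, labels):
    """I-FV-CE: softmax over K = 2 logits, then cross-entropy with the gold class index."""
    total = 0.0
    for logits, k in zip(logit_pairs, labels):
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        total += logits[k] - log_z  # log-probability of the gold class
    return -total / len(labels)

def bce_loss(logits, labels):
    """I-FV-BE: sigmoid over one logit, then binary cross-entropy."""
    total = 0.0
    for x, y in zip(logits, labels):
        p = sigmoid(x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)

def mse_loss(preds, labels):
    """I-FV-MSE: regress the label directly with mean squared error."""
    return sum((y - p) ** 2 for p, y in zip(preds, labels)) / len(labels)

def joint_loss(l_span, l_ans, alpha1=1.0, alpha2=0.5):
    """Weighted sum of span loss and verification loss (weights are hypothetical)."""
    return alpha1 * l_span + alpha2 * l_ans

ce = ce_loss([(0.0, 0.0)], [1])   # uniform over two classes
be = bce_loss([0.0], [1])         # sigmoid(0) = 0.5
mse = mse_loss([0.5], [1])
total = joint_loss(2.0, 1.0)
```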

2.2.4 Threshold-based Answerable Verification(TAV)

$score_{has}=\max(s_k+e_l),\quad 1< k \leq l < n$

$score_{null} = s_1+e_1 \tag{14}$

$score_{diff} = score_{null}-score_{has}$
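
The TAV scores can be computed directly from the start and end scores. In this sketch, list index 0 corresponds to position 1 in Eq. (14), that is, the [CLS] slot used for the null answer. The search covers every remaining position with start no later than end, which is an assumption about the upper bound of the range; the function name and the toy scores are hypothetical.

```python
def tav_scores(start_scores, end_scores):
    """Return (score_has, score_null, score_diff) for one example."""
    n = len(start_scores)
    # Null answer: both start and end point at the first ([CLS]) position.
    score_null = start_scores[0] + end_scores[0]
    # Best real span: start index k and end index l with k <= l, skipping index 0.
    score_has = max(
        start_scores[k] + end_scores[l]
        for k in range(1, n)
        for l in range(k, n)
    )
    return score_has, score_null, score_null - score_has

start = [1.0, 0.2, 3.0, 0.1]
end = [0.5, 0.1, 0.3, 2.5]
score_has, score_null, score_diff = tav_scores(start, end)
```

Here the best span starts at index 2 and ends at index 3, so `score_has` is well above `score_null` and `score_diff` is negative.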

2.3 Rear Verification

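Rear verification combines the predicted probabilities of E-FV and I-FV:
$v=\beta_1score_{diff}+\beta_2score_{ext}$

$\beta_1$ and $\beta_2$ are weights.
$v$ is compared against a threshold $\delta$ to decide whether the question is answerable; when it is, the model outputs the answer span that gives the $has\text{-}answer$ score.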
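
A minimal sketch of the final decision follows. It rests on one loudly stated assumption: since $score_{diff}$ and $score_{ext}$ are both defined as "no-answer minus answer" quantities, a larger $v$ is read as more evidence for no answer, so the question is treated as answerable when $v$ falls below $\delta$. The weights, the threshold and the function names are hypothetical.

```python
def rear_verification(score_diff, score_ext, beta1=0.5, beta2=0.5):
    """v = beta1 * score_diff + beta2 * score_ext (weights are hypothetical)."""
    return beta1 * score_diff + beta2 * score_ext

def final_answer(v, span_text, delta=0.0):
    """Assumption: answerable when v < delta; otherwise return the empty string."""
    return span_text if v < delta else ""

v_answerable = rear_verification(score_diff=-4.0, score_ext=-3.0)
v_unanswerable = rear_verification(score_diff=1.0, score_ext=2.0)

kept = final_answer(v_answerable, "a candidate span")
dropped = final_answer(v_unanswerable, "a candidate span")
```

In practice $\delta$ would be tuned on a development set; the value 0.0 here is only a placeholder.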

3. Experimental analysis

Datasets: SQuAD2.0 and NewsQA

Model example:
