Paper Notes: HopRetriever (AAAI)

Link to this post: https://blog.csdn.net/ziuno/article/details/117263178

Paper

HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions.

Definition

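Symbol	Function
Retriever	Evidence retrieval module
Reader	Answer extraction module
$q$	Question
$K$	Knowledge
$D_q=$ Retriever $(q, K)$	The set of documents used to answer question $q$
$a =$ Reader $(q,D_q)$	Answer
$d_i$	Document $i$
$e_i$	The entity described by $d_i$
$m_{i,j}=e_i\stackrel{d_i}{\rightarrow} e_j$	$e_j$ is mentioned in $d_i$, the document that introduces $e_i$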

Hop encoding

Mention embedding: $\mathrm{m}_{i,j}$

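A [MARKER] token is inserted on each side of the entity mention $e$, and the output representation of the first [MARKER] is used as the mention embedding.
$\mathrm{m}_{i,j}= \left\{ \begin{aligned} BERT_{[M-j]}(q;d_i), && \text{if } m_{i,j}\in M \\ \mathrm{m}_P \text{ (a uniform vector)}, && \text{otherwise} \end{aligned} \right.$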
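The marker insertion step can be sketched on a plain token list. This is only an illustration, not the paper's preprocessing code: the helper names `insert_markers` and `first_marker_index`, the literal marker string, and the single-mention setup are all assumptions. In the real model the hidden state at the first marker position would come from BERT.

```python
def insert_markers(tokens, start, end, marker="[MARKER]"):
    """Return a new token list with a marker before tokens[start] and after tokens[end - 1].

    `start` and `end` delimit the mention span (end is exclusive).
    """
    return tokens[:start] + [marker] + tokens[start:end] + [marker] + tokens[end:]


def first_marker_index(tokens, marker="[MARKER]"):
    """Position whose encoder output would serve as the mention embedding."""
    return tokens.index(marker)


# Hypothetical document tokens; the mention of the entity "France" is at span [5, 6).
doc_tokens = ["Paris", "is", "the", "capital", "of", "France", "."]
marked = insert_markers(doc_tokens, 5, 6)
mention_pos = first_marker_index(marked)
print(marked)
print(mention_pos)
```

When $d_i$ contains no mention of $e_j$, no marker is inserted and the shared vector $\mathrm{m}_P$ is used instead.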

Document embedding: $\mathrm{u}_j$

The unstructured document of the mentioned entity $e_j$ is encoded with BERT, and the [CLS] output is used as $\mathrm{u}_j$.
$\mathrm{u}_j=BERT_{[CLS]}(q;d_j)$

Knowledge fusion: $\mathrm{hop}_{i,j}$

$a_m=\mathrm{h}\mathrm{W}_k\mathrm{m}_{i,j}$
$a_u=\mathrm{h}\mathrm{W}_k\mathrm{u}_j$
$\{w_m,w_u\}=softmax(\{a_m,a_u\})$
$\mathrm{hop}_{i,j}=w_m\cdot\mathrm{W}_v\mathrm{m}_{i,j}+w_u\cdot\mathrm{W}_v\mathrm{u}_j$

$\mathrm{h}$: encoding of the retrieval history, playing the role of the $query$
$\mathrm{W}_k$: projects $\mathrm{m}_{i,j}$ and $\mathrm{u}_j$ into $key$ vectors
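The fusion step is a two-way attention over the mention and document embeddings. The following is a minimal pure-Python sketch of the four equations above, assuming tiny 2-dimensional vectors and identity matrices for $\mathrm{W}_k$ and $\mathrm{W}_v$. The function names and the toy values are illustrative, not taken from the paper.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_hop(h, m_ij, u_j, W_k, W_v):
    """Return hop_{i,j} and the attention weights (w_m, w_u)."""
    a_m = dot(h, matvec(W_k, m_ij))  # a_m = h W_k m_{i,j}
    a_u = dot(h, matvec(W_k, u_j))   # a_u = h W_k u_j
    w_m, w_u = softmax([a_m, a_u])
    v_m = matvec(W_v, m_ij)
    v_u = matvec(W_v, u_j)
    hop = [w_m * x + w_u * y for x, y in zip(v_m, v_u)]
    return hop, (w_m, w_u)


# Toy inputs (assumed values, chosen so the result is easy to check by hand).
identity = [[1.0, 0.0], [0.0, 1.0]]
h = [1.0, 0.0]
m_ij = [2.0, 0.0]
u_j = [0.0, 2.0]
hop, (w_m, w_u) = fuse_hop(h, m_ij, u_j, identity, identity)
print(hop, w_m, w_u)
```

With these inputs the history vector aligns with the mention embedding, so $w_m > w_u$ and the fused hop is dominated by the mention side.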

Probability of retrieving $d_j$ : $p(d_j)$

Suppose the first $t - 1$ retrieval steps have arrived at entity $e_i$. At step $t$, the probability of retrieving $d_j$ from $e_i$ is $p(d_j)$.
$p(d_j)=sigmoid(\mathrm{h}_t^\mathrm{T}\mathrm{hop}_{i,j})$
$\mathrm{h}_t= \left\{ \begin{aligned} \mathrm{h}_s, && t=1\\ RNN(\mathrm{h}_{t-1},\mathrm{hop}_{k,i}), && t\geq2 \end{aligned} \right.$

$\mathrm{h}_s$: the initial hidden state vector
$\mathrm{hop}_{k,i}$: the hop encoding selected at step $t - 1$
$hop_e$: the marker indicating the end of the retrieval process
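One retrieval step can be sketched as scoring every candidate hop against the current history state and then updating the state with the chosen hop. This sketch makes several assumptions that are not from the notes above: the RNN is replaced by a toy element-wise `tanh` update, the next hop is chosen greedily by highest probability, and the candidate names and vectors are made up.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieval_prob(h_t, hop):
    """p(d_j) = sigmoid(h_t^T hop_{i,j})."""
    return sigmoid(dot(h_t, hop))


def rnn_step(h_prev, hop_prev):
    """Toy stand-in for RNN(h_{t-1}, hop_{k,i}); a real model would use learned weights."""
    return [math.tanh(a + b) for a, b in zip(h_prev, hop_prev)]


def select_next(h_t, candidate_hops):
    """Score all candidates and pick the most probable one (greedy choice, an assumption)."""
    probs = {name: retrieval_prob(h_t, hop) for name, hop in candidate_hops.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Step t = 1 starts from the initial state h_s (assumed values).
h_s = [0.5, -0.5]
candidates = {"d_1": [2.0, 0.0], "d_2": [0.0, 2.0], "end": [0.0, 0.0]}
best, probs = select_next(h_s, candidates)

# Step t = 2 uses the hop chosen at step t - 1 to update the history.
h_2 = rnn_step(h_s, candidates[best])
print(best, probs, h_2)
```

The `"end"` entry plays the role of $hop_e$: when it receives the highest probability, retrieval would stop.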

$hop$	Encoding	Note
$hop_{i,j}$	$f(\mathrm{m}_P,\mathrm{u}_j)$	$e_j$ is not mentioned in $d_i$
$hop_{i,j}$	$f(\mathrm{m}_{i,j},\mathrm{u}_j)$	$e_j$ is mentioned in $d_i$
$hop_{s,j}$	$f(\mathrm{m}_P,\mathrm{u}_j)$	$d_j$ is selected as the starting document
$hop_e$	$f(\mathrm{m}_P,\mathrm{u}_e)$	Retrieval ends

Sentence-Level Retrieval

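$\mathrm{s}_{i,l}=BERT_{[SM-l]}(q;d_i)$
$p(s_{i,l})=sigmoid(\mathrm{h}_t\mathrm{W}_s\mathrm{s}_{i,l})$

$\mathrm{s}_{i,l}$: the output at the sentence marker [SM-l], which is inserted at the end of the $l$-th sentence of $d_i$
$p(s_{i,l})$: if $p(s_{i,l})>0.5$, the $l$-th sentence is taken as a supporting sentence for the answer.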
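The thresholding rule can be sketched as follows. This is an illustrative pure-Python version: the sentence vectors stand in for the BERT outputs at the sentence markers, and `W_s`, `h_t`, and all numeric values are assumed toy inputs.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_supporting_sentences(h_t, W_s, sentence_vecs, threshold=0.5):
    """Return indices l with p(s_{i,l}) = sigmoid(h_t W_s s_{i,l}) > threshold."""
    selected = []
    for l, s in enumerate(sentence_vecs):
        p = sigmoid(dot(h_t, matvec(W_s, s)))
        if p > threshold:
            selected.append(l)
    return selected


# Toy inputs (assumed): three sentence-marker vectors from one document.
h_t = [1.0, 0.0]
W_s = [[1.0, 0.0], [0.0, 1.0]]
sentence_vecs = [[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0]]
selected = select_supporting_sentences(h_t, W_s, sentence_vecs)
print(selected)
```

The third sentence scores exactly 0.5 and is not selected, because the rule uses a strict inequality.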

Objective Functions: Cross Entropy

Symbol	Meaning
$d_j$	The ground-truth document
$s_{i,l}$	The $l$-th sentence in $d_i$
$L_i$	The set of indices of the supporting sentences

Objective for the retrieval at step $t$:
$\log p(d_j)+\sum_{\overline{d}_j\in D,\overline{d}_j\neq d_j}\log(1-p(\overline{d}_j))$
Objective for supporting-sentence prediction:
$\sum_{l\in L_i}\log p(s_{i,l})+\sum_{l\notin L_i}\log(1-p(s_{i,l}))$
Both objectives are maximized jointly.
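Both objectives are binary cross-entropy log-likelihoods, which can be evaluated directly from predicted probabilities. A minimal sketch follows. The function names, the probability values, and the choice to combine the two objectives by an unweighted sum are assumptions, not details from the paper.

```python
import math


def retrieval_objective(probs, gold):
    """log p(d_j) for the gold document plus log(1 - p) for every other candidate."""
    total = math.log(probs[gold])
    for name, p in probs.items():
        if name != gold:
            total += math.log(1.0 - p)
    return total


def sentence_objective(sent_probs, support_idx):
    """Sum of log p for supporting sentences and log(1 - p) for the rest."""
    total = 0.0
    for l, p in enumerate(sent_probs):
        total += math.log(p) if l in support_idx else math.log(1.0 - p)
    return total


# Assumed predictions at one retrieval step; "d_1" is the gold document.
probs = {"d_1": 0.9, "d_2": 0.2, "d_3": 0.1}
doc_obj = retrieval_objective(probs, "d_1")

# Assumed sentence probabilities for the gold document; sentence 0 is supporting.
sent_probs = [0.8, 0.3]
sent_obj = sentence_objective(sent_probs, {0})

# Joint objective to maximize (unweighted sum is an assumption).
joint = doc_obj + sent_obj
print(doc_obj, sent_obj, joint)
```

In practice the negated joint value would be minimized as the training loss.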