Leveraging Query Resolution and Reading Comprehension for Conversational Passage Retrieval

Query Resolution and Reading Comprehension in Conversational Passage Retrieval

Our passage retrieval pipeline is shown in Figure 1. Given the original current-turn query Q and the conversation history H, we first perform query resolution to obtain an expanded, resolved query Q' [8]. Next, we perform initial retrieval using Q' to get a list of top-k passages P. Finally, for each passage in P, we combine the scores of a BERT re-ranking module and a reading comprehension module to obtain the final ranked list R. The interpolation weight w (a mixing weight between the two scores) is tuned on the TREC CAsT 2019 dataset [1].
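The final scoring step above can be sketched as follows. This is a minimal illustration of linear score interpolation, not the paper's implementation; the function names and the weight value are assumptions.

```python
# Sketch of the final pipeline step: interpolate the re-ranker score and the
# reading-comprehension score for each passage, then sort by the result.

def final_score(rerank_score: float, rc_score: float, w: float) -> float:
    """Linear interpolation of the two module scores (w is tuned on CAsT 2019)."""
    return w * rerank_score + (1.0 - w) * rc_score

def rank_passages(passages, rerank_scores, rc_scores, w=0.5):
    """Return (passage, score) pairs sorted by interpolated score, highest first."""
    scored = [
        (p, final_score(r, c, w))
        for p, r, c in zip(passages, rerank_scores, rc_scores)
    ]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Toy usage with made-up scores: passage p1 wins under w = 0.7.
ranked = rank_passages(["p1", "p2"], [0.9, 0.2], [0.1, 0.8], w=0.7)
```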

Query resolution (QuReTeC). QuReTeC is a binary term classification query resolution model: it uses BERT to classify each term in the conversation history as relevant or not, and adds the relevant terms to the original current-turn query [8].
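The term-classification idea behind QuReTeC can be sketched as below. The classifier here is a toy stand-in for the fine-tuned BERT model, and all names are illustrative.

```python
# Sketch of QuReTeC-style query resolution: a binary classifier labels each
# history term as relevant or not, and relevant terms not already present are
# appended to the current-turn query. `is_relevant` stands in for BERT.

def resolve_query(current_query: str, history: list, is_relevant) -> str:
    seen = set(current_query.lower().split())
    added = []
    for turn in history:
        for term in turn.lower().split():
            if term not in seen and is_relevant(term):
                added.append(term)
                seen.add(term)
    return current_query + (" " + " ".join(added) if added else "")

# Toy relevance function in place of the trained term classifier.
relevant = {"neverending", "story"}.__contains__
q = resolve_query("who wrote it", ["tell me about the neverending story"], relevant)
# q == "who wrote it neverending story"
```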

Re-ranking (BERT). A BERT model is applied to each passage [4]. The model is initialized with BERT-Large and fine-tuned on the MS MARCO passage retrieval dataset [6].
Reading Comprehension. The model is RoBERTa-Large; it predicts either a text span within the passage or "No Answer". It is fine-tuned on the MRQA dataset [3]. We use the sum of the predicted start- and end-span logits (l_start + l_end) as this module's score.
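The span-scoring rule above can be sketched as follows. The logits here are toy values; in the pipeline they would come from RoBERTa-Large, and this simple exhaustive search over valid spans is an assumption about how the best span is selected.

```python
# Sketch of the reading-comprehension score: take the best valid span
# (start <= end) and use the sum of its start and end logits as the score.

def rc_score(start_logits, end_logits) -> float:
    """Return max over valid spans of l_start + l_end."""
    best = float("-inf")
    for i, l_start in enumerate(start_logits):
        for j in range(i, len(end_logits)):
            best = max(best, l_start + end_logits[j])
    return best

# Toy logits: best span is start index 1 (2.0) + end index 2 (1.2) = 3.2.
score = rc_score([0.1, 2.0, -1.0], [0.5, 0.3, 1.2])
```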

Quantitative analysis

In our pipeline, passage retrieval performance depends on the query resolution module, so we need to evaluate the two modules separately. Concretely, we compare passage retrieval performance when using the original queries, the QuReTeC-resolved queries, and the human-rewritten queries.

Error categorization: (i) ranking error, (ii) query resolution error, and (iii) no error. To simplify the analysis, we first choose a ranking metric m (e.g., NDCG@3) and a threshold t. If the human-rewritten query yields poor ranking performance (m <= t), the query falls under (i) ranking error. If the human-rewritten query has performance m > t but the QuReTeC-resolved query has performance m <= t, the query falls under (ii) query resolution error. Otherwise it is (iii) no error.
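The taxonomy above maps directly to a small decision rule. A minimal sketch, with illustrative names; the metric values below are made up:

```python
# Sketch of the error taxonomy: given metric values (e.g. NDCG@3) for the
# human-rewritten and QuReTeC-resolved versions of a query and threshold t,
# assign one of the three categories.

def categorize(m_human: float, m_quretec: float, t: float) -> str:
    if m_human <= t:
        return "ranking error"            # even the human rewrite ranks poorly
    if m_quretec <= t:
        return "query resolution error"   # human rewrite works, QuReTeC's does not
    return "no error"

# Toy metric values at threshold t = 0.3, one per category.
labels = [categorize(h, q, 0.3) for h, q in [(0.1, 0.1), (0.8, 0.2), (0.8, 0.7)]]
# labels == ["ranking error", "query resolution error", "no error"]
```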

Table 2 shows the results of this analysis. Since we assume the human rewrites are always well specified, the 13.5% of queries that perform poorly even with the human rewrite are classified as ranking errors. 61.0% of the queries are correctly resolved by QuReTeC and 25.5% are not, which shows there is substantial room for improving query resolution for conversational passage retrieval. In addition, we observe that (0 + 1 + 2 + 39)/208 ≈ 20% of the queries in the dataset do not need resolution at all, i.e., when using these queries as-is we retrieve at least one relevant passage in the top 3.

Each column on the right-hand side of Table 3 contains 208 queries in total and shows the breakdown for one threshold value. As the performance threshold increases, the number of ranking errors also increases, which indicates the passage ranking module also has considerable room for improvement.

Conclusion:

In the figure, green marks the range where the query resolution module already performs well, orange the range where query resolution still needs improvement, and blue the range where the retrieval module needs improvement.

References
1. Dalton, J., Xiong, C., Kumar, V., Callan, J.: CAsT-19: A dataset for conversational information seeking. In: SIGIR (2020)
2. Elgohary, A., Peskov, D., Boyd-Graber, J.: Can you unpack that? Learning to rewrite questions-in-context. In: EMNLP (2019)
3. Fisch, A., Talmor, A., Jia, R., Seo, M., Choi, E., Chen, D.: MRQA 2019 shared task: Evaluating generalization in reading comprehension. In: MRQA (2019)
4. Nogueira, R., Cho, K.: Passage re-ranking with BERT. arXiv preprint arXiv:1901.04085 (2019)
5. Vakulenko, S., Longpre, S., Tu, Z., Anantha, R.: A wrong answer or a wrong question? An intricate relationship between question reformulation and answer selection in conversational question answering. In: SCAI (2020)
6. Vakulenko, S., Longpre, S., Tu, Z., Anantha, R.: Question rewriting for conversational question answering. In: WSDM (2021)
7. Vakulenko, S., Voskarides, N., Tu, Z., Longpre, S.: A comparison of question rewriting methods for conversational passage retrieval. In: ECIR (2021)
8. Voskarides, N., Li, D., Ren, P., Kanoulas, E., de Rijke, M.: Query resolution for conversational search with limited supervision. In: SIGIR (2020)
