Common KBQA Datasets: ComplexWebQuestions

Latest recommended article published on 2024-12-16 10:22:02

Toady 元气满满

Category: Common KBQA Datasets. Tags: nlp, knowledge graph, natural language processing

Article link: https://blog.csdn.net/lft_happiness/article/details/122909370


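This post gives an overview of the ComplexWebQuestions dataset, covering its contents and statistics, and compares models proposed in recent years for answering complex knowledge-base questions, such as TextRay and PullNet, on metrics including accuracy and precision.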


1. Paper and Resources

ComplexWebQuestions [Talmor and Berant 2018b]

Source paper: The Web as a Knowledge-base for Answering Complex Questions

Dataset: https://www.dropbox.com/sh/7pkwkrfnwqhsnpo/AACuu4v3YNkhirzBOeeaHYala

Leaderboard: Leaderboard | tau-nlp

2. Dataset Overview

2.1 Contents

CWQ (ComplexWebQuestions) is built on the Freebase knowledge base. The dataset consists of question files and web snippet files.

The question files contain the following fields:

Field	Description
ID	The unique ID of the example
webqsp_ID	The original WebQuestionsSP ID from which the question was constructed
webqsp_question	The WebQuestionsSP question from which the question was constructed
machine_question	The artificial complex question, before paraphrasing
question	The natural language complex question
sparql	Freebase SPARQL query for the question. Note that the SPARQL was constructed for the machine question, so the actual question after paraphrasing may differ from it.
compositionality_type	An estimate of the compositionality type, one of {composition, conjunction, comparative, superlative}. The estimate has not been manually verified, and the question after paraphrasing may differ from it.
answers	A list of answers, each containing answer (the actual answer), answer_id (the Freebase answer ID), and aliases (aliases for the answer extracted from Freebase)
created	Creation time
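
As a rough illustration of how these fields might be consumed, here is a minimal Python sketch that parses question records, collects the accepted answer strings, and counts questions per compositionality_type. It assumes a question file is a JSON array of objects carrying the fields listed above. The two records, their values, and the helper names are invented for illustration and are not taken from the dataset.

```python
import json
from collections import Counter

# Invented records that only mimic the field layout described above.
SAMPLE_JSON = """
[
  {"ID": "q1", "question": "placeholder question one",
   "compositionality_type": "composition",
   "answers": [{"answer": "Foo", "answer_id": "m.0001", "aliases": ["F"]}]},
  {"ID": "q2", "question": "placeholder question two",
   "compositionality_type": "conjunction",
   "answers": [{"answer": "Bar", "answer_id": "m.0002", "aliases": []}]}
]
"""


def load_questions(text):
    """Parse a JSON array of question records (assumed file layout)."""
    return json.loads(text)


def answer_strings(record):
    """Collect every accepted surface form: each answer plus its aliases."""
    forms = []
    for ans in record["answers"]:
        forms.append(ans["answer"])
        forms.extend(ans.get("aliases", []))
    return forms


def count_by_type(records):
    """Count questions per compositionality_type."""
    return Counter(r["compositionality_type"] for r in records)


questions = load_questions(SAMPLE_JSON)
type_counts = count_by_type(questions)
first_answers = answer_strings(questions[0])
```

For a real file, the same helpers could be applied to the text read from disk. Treating aliases as accepted answers is a design choice of this sketch, not something the dataset prescribes.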

The web snippet files contain the following fields:

question_ID	The ID of the related question. Each ID appears in at least 3 instances (full question, split1, split2).
question	The natural language complex question
web_query	The query sent to the search engine
split_source	'noisy supervision split' or 'ptrnet split'. Train on examples containing 'ptrnet split' when comparing to Split+Decomp from https://arxiv.org/abs/1807.09623
split_type	'full_question', 'split_part1', or 'split_part2'. When training a reading comprehension model on splits as in Split+Decomp from https://arxiv.org/abs/1807.09623, use 'composition_answer' for questions of type composition with split_type 'split_part1'; in all other cases use the original answer.
web_snippets	About 100 web snippets per query. Each snippet includes a title and a snippet text.
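
To make the split_source and split_type guidance more concrete, the sketch below selects the records one might train on for a Split+Decomp-style comparison and groups them by question. It assumes each snippet record is a dictionary with the fields above. Whether split_source is stored as a single string or as a list of strings is not stated here, so the helper accepts both. The sample records and function names are hypothetical.

```python
from collections import defaultdict


def has_source(record, name):
    """True if the record's split_source equals or contains `name`.

    Accepts a plain string or a list of strings, because the exact
    storage format is an assumption in this sketch.
    """
    source = record["split_source"]
    if isinstance(source, str):
        return source == name
    return name in source


def select_records(records, source="ptrnet split", split_type=None):
    """Keep records from the given source, optionally of one split_type."""
    return [
        rec for rec in records
        if has_source(rec, source)
        and (split_type is None or rec["split_type"] == split_type)
    ]


def group_by_question(records):
    """Map question_ID -> {split_type: record}."""
    grouped = defaultdict(dict)
    for rec in records:
        grouped[rec["question_ID"]][rec["split_type"]] = rec
    return dict(grouped)


# Invented records; web_snippets is left empty for brevity.
SAMPLE_RECORDS = [
    {"question_ID": "q1", "split_source": "ptrnet split",
     "split_type": "full_question", "web_query": "full", "web_snippets": []},
    {"question_ID": "q1", "split_source": "ptrnet split",
     "split_type": "split_part1", "web_query": "part one", "web_snippets": []},
    {"question_ID": "q1", "split_source": "ptrnet split",
     "split_type": "split_part2", "web_query": "part two", "web_snippets": []},
    {"question_ID": "q2", "split_source": "noisy supervision split",
     "split_type": "full_question", "web_query": "other", "web_snippets": []},
]

ptrnet_records = select_records(SAMPLE_RECORDS)
part1_records = select_records(SAMPLE_RECORDS, split_type="split_part1")
by_question = group_by_question(ptrnet_records)
```

Grouping by question_ID keeps the full question and its two parts together, which is convenient when the answer used for split_part1 differs from the original answer.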

2.2 Dataset Statistics

Question file splits
Split	Count
Train	27,734
Dev	3,480
Test	3,475
Total	34,689

Web snippet file splits
Split	Snippets
Train	10,035,571
Dev	1,350,950
Test	1,339,468
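
The counts above can be sanity-checked with a little arithmetic. The short Python sketch below uses only the numbers from the two tables to compute each split's share of the questions and the average number of snippets per question. The variable names are illustrative.

```python
# Counts copied from the two tables above.
question_counts = {"train": 27734, "dev": 3480, "test": 3475}
snippet_counts = {"train": 10035571, "dev": 1350950, "test": 1339468}

total_questions = sum(question_counts.values())  # 34689, matching the Total row

# Fraction of all questions that falls into each split.
question_share = {
    split: count / total_questions for split, count in question_counts.items()
}

# Average number of web snippets collected per question in each split.
snippets_per_question = {
    split: snippet_counts[split] / question_counts[split]
    for split in question_counts
}

for split in question_counts:
    print(
        f"{split}: {question_share[split]:.1%} of questions, "
        f"{snippets_per_question[split]:.1f} snippets per question"
    )
```

The split works out to roughly 80/10/10, and each question has several hundred snippets on average, which is consistent with at least three queries per question at about 100 snippets each.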

3. Model Performance Comparison

Performance of models on ComplexWebQuestions (a dash marks a metric or link that is not listed for that model)
Model (year)	Accuracy	Precision	Hit@1	F1	Paper	Code
TextRay (2019)	-	40.83	-	33.87	Learning to Answer Complex Questions over Knowledge Bases with Query Composition	GitHub: umich-dbgroup/TextRay-Release
PullNet (2019)	-	-	47.2	-	PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text	-
FullModel (2019)	-	39.3	36.5	-	Knowledge Base Question Answering with Topic Units	-
HSP (2019)	66.18	-	-	-	Complex Question Decomposition for Semantic Parsing	https://github.com/cairohy/hsp
QGG (2020)	-	44.1	-	40.4	Query Graph Generation for Answering Multi-hop Complex Questions from Knowledge Bases	GitHub: lanyunshi/Multi-hopComplexKBQA
SPARQA (2020)	-	31.57	-	-	SPARQA: Skeleton-based Semantic Parsing for Complex Questions over Knowledge Bases	GitHub: nju-websoft/SPARQA
MULTIQUE (2020)	-	41.23	-	34.62	Answering Complex Questions by Combining Information from Curated and Extracted Knowledge Bases	-
Rigel-intersect (2021)	-	-	48.7	-	Expanding End-to-End Question Answering on Differentiable Knowledge Graphs with Intersection	-
TransferNet (2021)	-	-	48.6	-	TransferNet: An Effective and Transparent Framework for Multi-hop Question Answering over Relation Graph	GitHub: shijx12/TransferNet
NSM (2021)	-	-	47.6	-	Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals	https://github.com/RichardHGL/WSDM2021_NSM
BERT-Large (2021)	-	-	66.4	68.2	Unseen Entity Handling in Complex Question Answering over Knowledge Base via Language Generation	-
shrink KB (2021)	-	-	-	46.2	Improving Query Graph Generation for Complex Question Answering over Knowledge Base	-
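
The metric columns are not defined in this post. As a hedged reference, the sketch below shows one common way Hit@1, precision, and F1 are computed per question when answers are sets of entities, followed by averaging over questions. Individual papers may differ in details such as alias matching or averaging, so figures in different rows are not necessarily directly comparable. The function names are illustrative and not taken from any of the listed code bases.

```python
def hits_at_1(ranked_predictions, gold_answers):
    """Return 1.0 if the top-ranked prediction is a gold answer, else 0.0."""
    if not ranked_predictions:
        return 0.0
    return 1.0 if ranked_predictions[0] in set(gold_answers) else 0.0


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 for a single question."""
    pred_set, gold_set = set(predicted), set(gold)
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        # Also covers empty predictions or empty gold sets.
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_average(values):
    """Average a per-question metric over all questions."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Toy example with invented entity IDs.
predictions = [["a", "b"], ["x"]]
gold = [["a", "c", "d"], ["y"]]

per_question_f1 = [
    precision_recall_f1(p, g)[2] for p, g in zip(predictions, gold)
]
mean_f1 = macro_average(per_question_f1)
mean_hits = macro_average(hits_at_1(p, g) for p, g in zip(predictions, gold))
```

In the toy example, the first question has one correct entity out of two predicted and three gold entities, giving precision 0.5, recall 1/3, and F1 0.4. The second question has no overlap and scores zero on every metric.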