ComplexWebQuestions: A Commonly Used KBQA Dataset

Contents

1. Paper

2. Dataset overview

      2.1 Contents

      2.2 Statistics

3. Model performance comparison


1. Paper

      ComplexWebQuestions [Talmor and Berant 2018b]

      Source paper: The Web as a Knowledge-base for Answering Complex Questions

      Dataset: https://www.dropbox.com/sh/7pkwkrfnwqhsnpo/AACuu4v3YNkhirzBOeeaHYala

      Leaderboard: Leaderboard | tau-nlp

2. Dataset overview

    2.1 Contents

      CWQ (ComplexWebQuestions) is built on the Freebase knowledge base. The dataset consists of Question files and Web Snippet files.

      The Question files contain the following main fields:

ID: The unique ID of the example.
webqsp_ID: The original WebQuestionsSP ID from which the question was constructed.
webqsp_question: The WebQuestionsSP question from which the question was constructed.
machine_question: The artificial complex question, before paraphrasing.
question: The natural language complex question.
sparql: Freebase SPARQL query for the question. Note that the SPARQL was constructed for the machine question; the actual question after paraphrasing may differ from the SPARQL.
compositionality_type: An estimate of the type of compositionality, one of {composition, conjunction, comparative, superlative}. The estimate has not been manually verified; the question after paraphrasing may differ from it.
answers: A list of answers, each containing answer (the actual answer), answer_id (the Freebase answer ID), and aliases (aliases for the answer extracted from Freebase).
created: Creation time.
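As a rough illustration of how the fields above might be consumed, here is a minimal Python sketch. It assumes each Question file is a JSON array of objects with the field names listed above; the loader, the helper names, and the sample record (including all of its values) are hypothetical and not taken from the dataset.

```python
import json
from collections import Counter


def load_questions(path):
    """Load a Question file, assumed to be a JSON array of example objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def answer_strings(example):
    """Flatten every answer and its aliases into one list of strings."""
    strings = []
    for ans in example.get("answers", []):
        if ans.get("answer"):
            strings.append(ans["answer"])
        # aliases may be missing or empty, so fall back to an empty list
        strings.extend(ans.get("aliases") or [])
    return strings


def count_types(examples):
    """Count examples per compositionality_type."""
    return Counter(ex.get("compositionality_type", "unknown") for ex in examples)


# Hypothetical record that only mimics the documented field layout.
sample = [{
    "ID": "example-0",
    "question": "placeholder complex question",
    "compositionality_type": "conjunction",
    "answers": [
        {"answer": "Answer A", "answer_id": "m.placeholder", "aliases": ["A"]}
    ],
}]

print(answer_strings(sample[0]))
print(count_types(sample))
```

Collecting aliases next to the canonical answer is a common choice when matching predictions by string, but whether aliases should count as correct depends on the evaluation protocol being followed.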

       The Web Snippet files contain the following fields:

question_ID: The ID of the related question. Each ID appears in at least 3 instances (full question, split1, split2).
question: The natural language complex question.
web_query: The query sent to the search engine.
split_source: 'noisy supervision split' or 'ptrnet split'. Train on examples containing 'ptrnet split' when comparing to Split+Decomp from https://arxiv.org/abs/1807.09623.
split_type: 'full_question', 'split_part1', or 'split_part2'. When training a reading comprehension model on splits as in Split+Decomp from https://arxiv.org/abs/1807.09623, use 'composition_answer' for questions of type composition with split_type 'split_part1'; in all other cases use the original answer.
web_snippets: About 100 web snippets per query. Each snippet includes a Title and a Snippet.
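The guidance on split_source and split_type can be sketched in Python as follows. This is an illustration under assumptions, not the authors' preprocessing: it assumes that question_ID matches the ID field of the Question files, that composition_answer is stored on the question record, and that split_source is either a string or a list of strings. All function names and sample values are hypothetical.

```python
def is_ptrnet_split(record):
    """True when the snippet record comes from the 'ptrnet split' source."""
    # `in` covers both a plain string and a list of strings for split_source.
    return "ptrnet split" in record.get("split_source", "")


def training_answers(record, question):
    """Pick the answer target for one snippet record, following the split_type note."""
    if (question.get("compositionality_type") == "composition"
            and record.get("split_type") == "split_part1"):
        return [question["composition_answer"]]
    return [a["answer"] for a in question.get("answers", [])]


def build_training_pairs(snippet_records, questions):
    """Return (web_query, answers) pairs restricted to 'ptrnet split' records."""
    by_id = {q["ID"]: q for q in questions}
    pairs = []
    for rec in snippet_records:
        q = by_id.get(rec.get("question_ID"))
        if q is None or not is_ptrnet_split(rec):
            continue
        pairs.append((rec["web_query"], training_answers(rec, q)))
    return pairs


# Hypothetical records that only mimic the documented field layout.
questions = [{
    "ID": "q1",
    "compositionality_type": "composition",
    "composition_answer": "intermediate entity",
    "answers": [{"answer": "final answer"}],
}]
snippets = [
    {"question_ID": "q1", "web_query": "first hop query",
     "split_source": "ptrnet split", "split_type": "split_part1"},
    {"question_ID": "q1", "web_query": "full query",
     "split_source": "ptrnet split", "split_type": "full_question"},
    {"question_ID": "q1", "web_query": "noisy query",
     "split_source": "noisy supervision split", "split_type": "split_part1"},
]

pairs = build_training_pairs(snippets, questions)
print(pairs)
```

With the sample data, the first-hop record is paired with the intermediate answer, the full question with the original answer, and the noisy-supervision record is dropped.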

    2.2 Statistics

Question files: dataset splits

Split    Questions
Train    27,734
Dev      3,480
Test     3,475
Total    34,689

Web Snippet files: dataset splits

Split    Snippets
Train    10,035,571
Dev      1,350,950
Test     1,339,468
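The split sizes above can be sanity-checked with a few lines of Python. The counts are copied from the tables; the variable and function names are only illustrative.

```python
# Counts copied from the statistics tables.
question_counts = {"train": 27734, "dev": 3480, "test": 3475}
snippet_counts = {"train": 10035571, "dev": 1350950, "test": 1339468}


def split_shares(counts):
    """Fraction of all examples that falls into each split."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def snippets_per_question(snippets, questions):
    """Average number of snippets available per question in each split."""
    return {name: snippets[name] / questions[name] for name in questions}


total_questions = sum(question_counts.values())
shares = split_shares(question_counts)
per_question = snippets_per_question(snippet_counts, question_counts)

print(total_questions)
for name in question_counts:
    print(f"{name}: {shares[name]:.1%} of questions, "
          f"{per_question[name]:.0f} snippets per question")
```

The three splits add up to the stated total of 34,689 and follow a roughly 80/10/10 division. Each question comes with several hundred snippets on average, which fits the description of about 100 snippets per query and at least three queries per question.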

3. Model performance comparison

Model performance on ComplexWebQuestions

The table columns are Model (year), Accuracy, Precision, Hit@1, F1, Paper, and Code link. Each model reports only some of the four metrics; its scores are listed in the order in which they appear in the table.

TextRay (2019)
    Scores: 40.83, 33.87
    Paper: Learning to Answer Complex Questions over Knowledge Bases with Query Composition
    Code: GitHub - umich-dbgroup/TextRay-Release

PullNet (2019)
    Scores: 47.2
    Paper: PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text

FullModel (2019)
    Scores: 39.3, 36.5
    Paper: Knowledge Base Question Answering with Topic Units

HSP (2019)
    Scores: 66.18
    Paper: Complex Question Decomposition for Semantic Parsing
    Code: https://github.com/cairohy/hsp

QGG (2020)
    Scores: 44.1, 40.4
    Paper: Query Graph Generation for Answering Multi-hop Complex Questions from Knowledge Bases
    Code: GitHub - lanyunshi/Multi-hopComplexKBQA

SPARQA (2020)
    Scores: 31.57
    Paper: SPARQA: Skeleton-based Semantic Parsing for Complex Questions over Knowledge Bases
    Code: GitHub - nju-websoft/SPARQA

MULTIQUE (2020)
    Scores: 41.23, 34.62
    Paper: Answering Complex Questions by Combining Information from Curated and Extracted Knowledge Bases

Rigel-intersect (2021)
    Scores: 48.7
    Paper: Expanding End-to-End Question Answering on Differentiable Knowledge Graphs with Intersection

TransferNet (2021)
    Scores: 48.6
    Paper: TransferNet: An Effective and Transparent Framework for Multi-hop Question Answering over Relation Graph
    Code: GitHub - shijx12/TransferNet

NSM (2021)
    Scores: 47.6
    Paper: Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals
    Code: https://github.com/RichardHGL/WSDM2021_NSM

BERT-Large (2021)
    Scores: 66.4, 68.2
    Paper: Unseen Entity Handling in Complex Question Answering over Knowledge Base via Language Generation

shrink KB (2021)
    Scores: 46.2
    Paper: Improving Query Graph Generation for Complex Question Answering over Knowledge Base
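For readers unfamiliar with the metrics named in the comparison, the sketch below shows one common way to compute Hit@1 and set-based precision, recall, and F1 for a single question. It is an assumption about typical KBQA evaluation practice; the papers listed here may define or average their metrics differently, and the function names are illustrative.

```python
def hits_at_1(ranked_predictions, gold_answers):
    """1.0 if the top-ranked prediction is a gold answer, otherwise 0.0."""
    if not ranked_predictions:
        return 0.0
    return 1.0 if ranked_predictions[0] in set(gold_answers) else 0.0


def precision_recall_f1(predicted, gold_answers):
    """Set-based precision, recall, and F1 for one question."""
    predicted, gold = set(predicted), set(gold_answers)
    if not predicted or not gold:
        return 0.0, 0.0, 0.0
    overlap = len(predicted & gold)
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with placeholder answer strings.
p, r, f1 = precision_recall_f1(["a", "b"], ["a", "c", "d"])
print(hits_at_1(["a", "b"], ["a", "c", "d"]), p, r, f1)
```

A dataset-level score is then usually the mean of the per-question values, multiplied by 100 to match the scale used in the comparison.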

 

This post will be updated over time; comments and additions are welcome.

