Contents
1. Paper information
ComplexWebQuestions [Talmor and Berant 2018b]
Source paper: The Web as a Knowledge-base for Answering Complex Questions
Dataset: https://www.dropbox.com/sh/7pkwkrfnwqhsnpo/AACuu4v3YNkhirzBOeeaHYala
Leaderboard: Leaderboard | tau-nlp
2. Dataset overview
2.1 Contents
CWQ (ComplexWebQuestions) is built on the Freebase knowledge base. The dataset consists of question files and web snippet files.
The question files contain the following fields:
Field | Description |
ID | The unique ID of the example |
webqsp_ID | The original WebQuestionsSP ID from which the question was constructed |
webqsp_question | The WebQuestionsSP question from which the question was constructed |
machine_question | The artificial complex question, before paraphrasing |
question | The natural language complex question |
sparql | Freebase SPARQL query for the question. The SPARQL was constructed for the machine question, so the actual question after paraphrasing may differ from it |
compositionality_type | An estimate of the type of compositionality: {composition, conjunction, comparative, superlative}. The estimate has not been manually verified, and the question after paraphrasing may differ from it |
answers | A list of answers, each containing answer (the actual answer), answer_id (the Freebase answer ID), and aliases (aliases for the answer extracted from Freebase) |
created | Creation time |
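As a rough illustration of how these fields might be used, the sketch below counts examples per compositionality_type and collects the accepted surface forms (answer plus aliases) for one example. It assumes each question file parses as a JSON array of objects keyed by the field names above; the sample records, the `m.000`-style IDs, and the helper names `count_by_type` and `accepted_answers` are invented for illustration and do not come from the dataset.

```python
import json
from collections import Counter

# Hypothetical records shaped like the field table above; every value is invented.
SAMPLE_JSON = """
[
  {"ID": "ex-1", "question": "placeholder question 1",
   "compositionality_type": "composition",
   "answers": [{"answer": "Foo", "answer_id": "m.000", "aliases": ["F", "Foo Bar"]}]},
  {"ID": "ex-2", "question": "placeholder question 2",
   "compositionality_type": "conjunction",
   "answers": [{"answer": "Baz", "answer_id": "m.001", "aliases": []}]},
  {"ID": "ex-3", "question": "placeholder question 3",
   "compositionality_type": "composition",
   "answers": []}
]
"""


def count_by_type(records):
    """Count examples for each (estimated) compositionality type."""
    return Counter(r["compositionality_type"] for r in records)


def accepted_answers(record):
    """Return every answer string and alias listed for one example."""
    names = []
    for ans in record["answers"]:
        names.append(ans["answer"])
        names.extend(ans["aliases"])
    return names


# Assumption: a real question file would be read with json.load(open(path)).
records = json.loads(SAMPLE_JSON)
type_counts = count_by_type(records)
first_answers = accepted_answers(records[0])
print(type_counts)
print(first_answers)
```

Since the compositionality_type estimate is not manually verified, counts like these are best read as approximate.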
The web snippet files contain the following fields:
Field | Description |
question_ID | The ID of the related question. Each ID appears in at least 3 entries (full question, split1, split2) |
question | The natural language complex question |
web_query | The query sent to the search engine |
split_source | 'noisy supervision split' or 'ptrnet split'. Train on examples with 'ptrnet split' when comparing to Split+Decomp from https://arxiv.org/abs/1807.09623 |
split_type | 'full_question', 'split_part1', or 'split_part2'. When training a reading comprehension model on splits as in Split+Decomp from https://arxiv.org/abs/1807.09623, use 'composition_answer' for questions of type composition with split_type 'split_part1'; in all other cases use the original answer |
web_snippets | ~100 web snippets per query. Each snippet includes a Title and a Snippet |
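The selection described for split_source can be sketched as follows: keep only entries from one split source and group them by question_ID and split_type. This is a minimal sketch that assumes each entry is a dict keyed by the field names above; the sample entries and the helper name `group_by_question` are hypothetical, and whether 'full_question' entries carry a split_source value in the real files is an assumption made here for the sample only.

```python
from collections import defaultdict

# Hypothetical entries shaped like the field table above; every value is invented.
SAMPLE_ENTRIES = [
    {"question_ID": "ex-1", "web_query": "full query",
     "split_source": "ptrnet split", "split_type": "full_question", "web_snippets": []},
    {"question_ID": "ex-1", "web_query": "first part",
     "split_source": "ptrnet split", "split_type": "split_part1", "web_snippets": []},
    {"question_ID": "ex-1", "web_query": "second part",
     "split_source": "ptrnet split", "split_type": "split_part2", "web_snippets": []},
    {"question_ID": "ex-1", "web_query": "noisy first part",
     "split_source": "noisy supervision split", "split_type": "split_part1",
     "web_snippets": []},
]


def group_by_question(entries, split_source):
    """Group entries from one split source by question_ID, then by split_type."""
    grouped = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        if entry["split_source"] != split_source:
            continue  # skip entries produced by the other split method
        grouped[entry["question_ID"]][entry["split_type"]].append(entry)
    # Convert nested defaultdicts to plain dicts for predictable behavior.
    return {qid: dict(by_type) for qid, by_type in grouped.items()}


ptrnet_groups = group_by_question(SAMPLE_ENTRIES, "ptrnet split")
for qid, by_type in ptrnet_groups.items():
    print(qid, sorted(by_type))
```

Grouping by split_type keeps the full question and its two parts together, which is the shape a reading comprehension pipeline working on splits would likely need.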
2.2 Statistics
Split | Count |
Train | 27,734 |
Dev | 3,480 |
Test | 3,475 |
Total | 34,689 |
train set snippets | 10,035,571 |
dev set snippets | 1,350,950 |
test set snippets | 1,339,468 |
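The figures in this table can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers listed above to confirm the total, compute each split's share of the questions, and derive the average number of snippets per question; the variable names are illustrative.

```python
# Counts copied from the statistics table above.
questions = {"Train": 27_734, "Dev": 3_480, "Test": 3_475}
snippets = {"Train": 10_035_571, "Dev": 1_350_950, "Test": 1_339_468}

total_questions = sum(questions.values())

# Fraction of all questions that falls in each split.
share = {name: count / total_questions for name, count in questions.items()}

# Average number of web snippets attached to one question in each split.
snippets_per_question = {name: snippets[name] / questions[name] for name in questions}

print(total_questions)
for name in questions:
    print(f"{name}: {share[name]:.1%} of questions, "
          f"{snippets_per_question[name]:.1f} snippets per question")
```

With roughly 100 snippets per query, an average of several hundred snippets per question is consistent with each question having at least three associated queries (the full question and its two splits).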
3. Model performance comparison
This section will be updated over time; comments and additions are welcome.