fer2013 Dataset Introduction: Introducing Some Datasets

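This post introduces two important question-answering datasets: MS MARCO and SQuAD 2.0. MS MARCO is Microsoft's machine reading comprehension dataset; it includes question-answering data as well as passage data in ranking and triple form. SQuAD 2.0 adds adversarial questions to the original dataset and is used to improve a model's question-answering ability. The related datasets openKP and SearchQA are also mentioned.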

Continuously updated.


MS MARCO

The full name is Microsoft MAchine Reading COmprehension Dataset.

It is a collection of several datasets.

microsoft/MSMARCO-Question-Answering (github.com)

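Question-answering dataset: as the name says, a dataset of questions and answers. The jsonl format is shown below. Note that some of the answers are written by humans, while most are span based.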

{
	"answers":["A corporation is a company or group of people authorized to act as a single entity and recognized as such in law."],
	"passages":[
		{
			"is_selected":0,
			"url":"http://www.wisegeek.com/what-is-a-corporation.htm",
			"passage_text":"A company is incorporated in a specific nation, often within the bounds of a smaller subset of that nation, such as a state or province. The corporation is then governed by the laws of incorporation in that state. A corporation may issue stock, either private or public, or may be classified as a non-stock corporation. If stock is issued, the corporation will usually be governed by its shareholders, either directly or indirectly."},
		...
		}],
	"query":". what is a corporation?",
	"query_id":1102432,
	"query_type":"DESCRIPTION",
	"wellFormedAnswers":"[]"
}
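As a rough illustration of how a record like the one above might be consumed, the following minimal Python sketch parses jsonl lines and keeps only the passages flagged with is_selected. The field names come from the sample above; the helper names (iter_records, selected_passages) and the shortened sample record are hypothetical and used only for illustration.

```python
import json


def iter_records(lines):
    """Yield one parsed record per non-empty jsonl line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def selected_passages(record):
    """Return the text of every passage whose is_selected flag is 1."""
    return [
        p["passage_text"]
        for p in record.get("passages", [])
        if p.get("is_selected") == 1
    ]


# Shortened, illustrative record using the same field names as the sample above.
sample_line = json.dumps({
    "answers": ["A corporation is a company or group of people authorized to act as a single entity."],
    "passages": [
        {"is_selected": 0, "url": "http://www.wisegeek.com/what-is-a-corporation.htm",
         "passage_text": "A company is incorporated in a specific nation."},
        {"is_selected": 1, "url": "http://www.wisegeek.com/what-is-a-corporation.htm",
         "passage_text": "A corporation may issue stock."},
    ],
    "query": ". what is a corporation?",
    "query_id": 1102432,
    "query_type": "DESCRIPTION",
    "wellFormedAnswers": "[]",
})

if __name__ == "__main__":
    for record in iter_records([sample_line]):
        print(record["query_id"], selected_passages(record))
```

With a real file, the list passed to iter_records would simply be replaced by an open file handle, assuming the file really stores one JSON object per line.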
microsoft/MSMARCO-Passage-Ranking (github.com)

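This one is for ranking. Its main goal is to retrieve the passages relevant to a question, so it can be regarded as the upstream task of the QA dataset above.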

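The data comes in a ranking form and in a triple form: given one relevant passage and one irrelevant passage, choose between the two.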
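The triple form can be sketched as follows. This is a minimal example, assuming each triple is stored as one tab-separated line of query, relevant passage, and irrelevant passage; that layout, the Triple type, and the toy word-overlap scorer are all assumptions made for illustration, not the official baseline.

```python
from typing import Callable, Iterable, NamedTuple


class Triple(NamedTuple):
    query: str
    positive: str  # passage relevant to the query
    negative: str  # passage not relevant to the query


def parse_triple(line: str) -> Triple:
    """Split one tab-separated line into (query, positive, negative)."""
    query, positive, negative = line.rstrip("\n").split("\t")
    return Triple(query, positive, negative)


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words that occur in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def pairwise_accuracy(
    triples: Iterable[Triple],
    score: Callable[[str, str], float] = overlap_score,
) -> float:
    """Share of triples where the relevant passage outscores the irrelevant one."""
    triples = list(triples)
    if not triples:
        return 0.0
    correct = sum(
        score(t.query, t.positive) > score(t.query, t.negative) for t in triples
    )
    return correct / len(triples)


if __name__ == "__main__":
    line = "what is a corporation\tA corporation is a company recognized in law\tBananas are yellow"
    print(pairwise_accuracy([parse_triple(line)]))
```

A real ranking model would replace overlap_score with a learned scoring function; the two-way choice over each triple stays the same.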

SQuAD 2.0

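It builds on the roughly 100,000 questions of the original SQuAD and adds adversarial questions that cannot be answered from the passage.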
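To make the difference between the original and the adversarial questions concrete, here is a minimal sketch that separates the two groups. It assumes the usual SQuAD 2.0 JSON layout (data, paragraphs, qas, and an is_impossible flag on each question); the function name and the tiny inline dataset are hypothetical.

```python
def split_by_answerability(dataset):
    """Return (answerable_ids, unanswerable_ids) for a SQuAD 2.0 style dict."""
    answerable, unanswerable = [], []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                # Questions without the flag are treated as answerable.
                if qa.get("is_impossible", False):
                    unanswerable.append(qa["id"])
                else:
                    answerable.append(qa["id"])
    return answerable, unanswerable


# Tiny illustrative dataset in the assumed layout; not real SQuAD content.
toy_dataset = {
    "data": [{
        "title": "Corporation",
        "paragraphs": [{
            "context": "A corporation may issue stock.",
            "qas": [
                {"id": "q1", "question": "What may a corporation issue?",
                 "answers": [{"text": "stock", "answer_start": 24}],
                 "is_impossible": False},
                {"id": "q2", "question": "When was the corporation founded?",
                 "answers": [], "is_impossible": True},
            ],
        }],
    }]
}

if __name__ == "__main__":
    print(split_by_answerability(toy_dataset))
```

A model evaluated on this kind of data has to abstain on the second group instead of always returning a span.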

openKP

microsoft/OpenKP (github.com)

SearchQA
