Introduction to the GLUE Benchmark Datasets

不二当码农

Last modified: 2022-06-19 17:11:02

Tags: big data, python

First published: 2022-06-19 17:02:40

Article link: https://blog.csdn.net/qq_43525676/article/details/125358377

I. Introduction

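GLUE (General Language Understanding Evaluation) is a multi-task natural language understanding benchmark and analysis platform created by researchers from New York University, the University of Washington, and other institutions.
GLUE comprises nine tasks: CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, and WNLI. As shown in Figure 2, they fall into three categories: single-sentence tasks, similarity and paraphrase tasks, and inference tasks.
Figure 2: Descriptions and statistics of the nine GLUE tasks. All tasks are single-sentence or sentence-pair classification, except STS-B, which is a regression task. MNLI has three classes; all other classification tasks have two. Test sets shown in bold are those whose labels have never been shown in public forums or similar venues.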
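
As a quick reference, the grouping can be written down as a small lookup table. This is only a sketch: the assignment of tasks to the three categories follows the GLUE paper, the class counts come from the caption above, and the names `GLUE_TASK_GROUPS`, `task_group`, and `num_classes` are made up for illustration.

```python
# Task grouping as given in the GLUE paper; the dictionary layout is an
# illustrative choice, not part of any official GLUE tooling.
GLUE_TASK_GROUPS = {
    "single-sentence": ["CoLA", "SST-2"],
    "similarity and paraphrase": ["MRPC", "STS-B", "QQP"],
    "inference": ["MNLI", "QNLI", "RTE", "WNLI"],
}


def task_group(task: str) -> str:
    """Return the category that a GLUE task belongs to."""
    for group, tasks in GLUE_TASK_GROUPS.items():
        if task in tasks:
            return group
    raise KeyError(f"unknown GLUE task: {task}")


def num_classes(task: str):
    """Return the number of label classes, or None for the regression task."""
    task_group(task)  # raises KeyError for names that are not GLUE tasks
    if task == "STS-B":
        return None  # STS-B is a regression task
    if task == "MNLI":
        return 3  # MNLI has three classes
    return 2  # all other classification tasks have two classes


if __name__ == "__main__":
    for name in ("CoLA", "STS-B", "MNLI"):
        print(name, task_group(name), num_classes(name))
```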

II. Task Descriptions

Single-Sentence Tasks: CoLA and SST-2

2.1 CoLA

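CoLA (The Corpus of Linguistic Acceptability) is a single-sentence classification task. The corpus is drawn from books and journal articles on linguistic theory, and each sentence is annotated with whether it is a grammatical sequence of words. It is a binary classification task with two labels: 0 means ungrammatical and 1 means grammatical.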

Number of examples: 8,551 in the training set, 1,043 in the development set, and 1,063 in the test set.

Task: acceptability judgment, a binary classification between grammatical and ungrammatical.

Evaluation metric: Matthews correlation coefficient.
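
For binary labels, the Matthews correlation coefficient is computed from the confusion-matrix counts as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal standard-library sketch; the function name is chosen here for illustration, and returning 0.0 when the denominator is zero is a common convention assumed here rather than anything specified by GLUE.

```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal count is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    gold = [1, 1, 0, 0]
    pred = [1, 0, 0, 0]
    print(round(matthews_corrcoef_binary(gold, pred), 3))  # 0.577
```

The score ranges from -1 to 1, and a classifier that always predicts the same label scores 0 under the convention above. In practice, `sklearn.metrics.matthews_corrcoef` from scikit-learn provides the same metric.
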
CoLA download

Data Format

Each line in the .tsv files consists of 4 tab-separated columns.
Column 1: the code representing the source of the sentence.
Column 2: the acceptability judgment label (0=unacceptable, 1=acceptable).
Column 3: the acceptability judgment as originally notated by the author.
Column 4: the sentence.

Corpus Sample

clc95	0	*	In which way is Sandy very anxious to see if the students will be able to solve the homework problem?
c-05	1		The book was written by John.
c-05	0	*	Books were sent to each other by the students.
swb04	1		She voted for herself.
swb04	1		I saw that gas can explode.
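
The four-column layout described above can be read with Python's `csv` module. The sketch below parses two of the sample rows from an in-memory string; the `ColaExample` type and `read_cola` function are hypothetical names, and the code assumes the input has no header row and that quoting should be disabled so quotation marks inside sentences are kept literally.

```python
import csv
import io
from typing import NamedTuple


class ColaExample(NamedTuple):
    source: str             # column 1: code for the source of the sentence
    label: int              # column 2: 0 = unacceptable, 1 = acceptable
    original_judgment: str  # column 3: author's original notation, e.g. "*"
    sentence: str           # column 4: the sentence itself


def read_cola(lines):
    """Parse tab-separated CoLA rows from any iterable of text lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    examples = []
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed lines
        source, label, judgment, sentence = row
        examples.append(ColaExample(source, int(label), judgment, sentence))
    return examples


# Two rows taken from the corpus sample above.
SAMPLE = (
    "c-05\t0\t*\tBooks were sent to each other by the students.\n"
    "swb04\t1\t\tShe voted for herself.\n"
)

examples = read_cola(io.StringIO(SAMPLE))

if __name__ == "__main__":
    for ex in examples:
        print(ex.label, ex.sentence)
```

To read a file instead, pass an open file object, opened with `newline=""` and `encoding="utf-8"`, to `read_cola`.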

2.2 SST-2

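SST-2 (The Stanford Sentiment Treebank) is a single-sentence classification task consisting of sentences from movie reviews together with human annotations of their sentiment. The task is to predict the sentiment of a given sentence, which is either positive (label 1) or negative (label 0), using only sentence-level labels. In other words, this is also a binary classification task: each sentence is classified as expressing positive or negative sentiment.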

Number of examples: 67,349 in the training set, 872 in the development set, and 1,821 in the test set.

Task: sentiment classification, a binary classification between positive and negative sentiment.

Evaluation metric: accuracy.
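
Accuracy is simply the fraction of sentences whose predicted label matches the gold label. A minimal sketch, with the function name and the toy labels chosen here purely for illustration:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


if __name__ == "__main__":
    gold = [1, 0, 1, 1]  # 1 = positive, 0 = negative
    pred = [1, 0, 0, 1]
    print(accuracy(gold, pred))  # 0.75
```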

SST download

sentence        label
a stirring , funny and finally transporting re-imagining of beauty and the beast and 1930s horror films 1
apparently reassembled from the cutting-room floor of any given daytime soap .  0
they presume their audience wo n't sit still for