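NLP Datasets: GLUE [CoLA (single-sentence classification), SST-2 (binary sentiment classification), MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI] [Well-known models are all evaluated on this benchmark]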

u013250861

Last modified: 2022-03-25 22:15:07

First published: 2022-03-19 10:47:07

Article link: https://blog.csdn.net/u013250861/article/details/123590362

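GLUE is a multi-task benchmark for natural language understanding consisting of nine tasks, including CoLA, SST-2, and MRPC. CoLA evaluates the grammatical acceptability of sentences, while SST-2 covers sentiment analysis of movie reviews. Each task has its own evaluation metric, such as the Matthews correlation coefficient or accuracy. GLUE provides a platform for testing and comparing the performance of NLU models.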

The GLUE paper is: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

The official GLUE website is: gluebenchmark.com/

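This article describes each of the nine GLUE tasks in some detail, with examples, to give the reader a reasonably complete and accurate picture. It also provides a convenient link for downloading the GLUE datasets.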

I. Task Overview

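GLUE has nine tasks: CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, and WNLI. As shown in Figure 2 below, they fall into three categories: single-sentence tasks, similarity and paraphrase tasks, and inference tasks.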

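Descriptions and statistics of the nine GLUE tasks. All tasks are single-sentence or sentence-pair classification, except STS-B, which is a regression task. MNLI has three classes; all other classification tasks have two. Test sets shown in bold are those whose labels have never been released publicly.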
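Since the task table itself is an image, the grouping described above can be written down as a small lookup structure. This is only a sketch: the assignment of tasks to categories follows the table in the GLUE paper, while the field names (`category`, `kind`, `num_classes`) and the helper `tasks_in` are hypothetical names chosen for illustration.

```python
from collections import Counter

# Task grouping as described in the GLUE paper's task table.
# Field names are illustrative assumptions, not part of any official API.
GLUE_TASKS = {
    "CoLA":  {"category": "single-sentence", "kind": "classification", "num_classes": 2},
    "SST-2": {"category": "single-sentence", "kind": "classification", "num_classes": 2},
    "MRPC":  {"category": "similarity and paraphrase", "kind": "classification", "num_classes": 2},
    "STS-B": {"category": "similarity and paraphrase", "kind": "regression", "num_classes": None},
    "QQP":   {"category": "similarity and paraphrase", "kind": "classification", "num_classes": 2},
    "MNLI":  {"category": "inference", "kind": "classification", "num_classes": 3},
    "QNLI":  {"category": "inference", "kind": "classification", "num_classes": 2},
    "RTE":   {"category": "inference", "kind": "classification", "num_classes": 2},
    "WNLI":  {"category": "inference", "kind": "classification", "num_classes": 2},
}


def tasks_in(category):
    """Return the task names that belong to the given category, in table order."""
    return [name for name, info in GLUE_TASKS.items() if info["category"] == category]


# How many tasks fall into each of the three categories.
category_counts = Counter(info["category"] for info in GLUE_TASKS.values())

for category, count in category_counts.items():
    print(f"{category}: {count} tasks -> {tasks_in(category)}")
```

Keeping the task type and class count next to the name makes it easy to pick the right output head (regression for STS-B, three-way for MNLI, binary otherwise).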

1. CoLA

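CoLA (The Corpus of Linguistic Acceptability) is a single-sentence classification task. The corpus is drawn from books and journal articles on linguistic theory, and each sentence, treated as a sequence of words, is annotated for whether it is grammatical. It is a binary classification task with two labels, 0 and 1, where 0 means ungrammatical and 1 means grammatical.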

Number of samples: 8,551 in the training set, 1,043 in the development set, and 1,063 in the test set.

Task: acceptability, a binary classification between grammatical and ungrammatical.

Evaluation metric: Matthews correlation coefficient.
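For readers unfamiliar with the metric, the Matthews correlation coefficient (MCC) for binary labels can be computed from the four confusion-matrix counts as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal hand-written sketch; the function name `binary_mcc` is hypothetical, and returning 0.0 when the denominator is zero is a common convention assumed here rather than something stated in the source.

```python
import math


def binary_mcc(y_true, y_pred):
    """Matthews correlation coefficient for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: undefined MCC (e.g. all predictions identical) is 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example: 1 = grammatical, 0 = ungrammatical.
example_true = [1, 1, 0, 0]
example_pred = [1, 0, 0, 0]
example_mcc = binary_mcc(example_true, example_pred)
print(f"MCC = {example_mcc:.4f}")
```

MCC ranges from -1 to 1, with 0 corresponding to uninformed guessing, which makes it more informative than accuracy when the two classes are imbalanced. In practice, scikit-learn's `sklearn.metrics.matthews_corrcoef` can be used instead of a hand-written version.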

Examples with label 1 (grammatical):

She is proud.
she is the mother.
John thinks Mary left.
Yes, she did.
Will John not go to school?
Mary noticed John’s excessive appreciation of himself.

Examples with label 0 (ungrammatical):

Mary sent.
Yes, she used.
Mary wonders for Bill to come.
They are intense of Bill.
Mary thinks whether Bill will come.
Mary noticed John’s excessive appreciation of herself.

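Note that these sentences are fairly short, and the errors are varied: some are gender mismatches, some are missing words, and some concern whether an -s is added. I also noticed that some of the errors do not seem that serious, and a few of the sentences could even be acceptable in certain contexts.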

2. SST-2

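SST-2 (The Stanford Sentiment Treebank) is a single-sentence classification task consisting of sentences from movie reviews together with human annotations of their sentiment. The task is to predict the sentiment of a given sentence, which is either positive (label 1) or negative (label 0), using only sentence-level labels. In other words, this is also a binary classification task, performed at the sentence level.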

Number of samples: 67,349 in the training set, 872 in the development set, and 1,821 in the test set.

Task: sentiment classification, a binary classification between positive and negative sentiment.

Evaluation metric: accuracy.
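Accuracy is simply the fraction of predictions that match the gold labels. A minimal sketch follows, using the SST-2 label convention described above (1 = positive, 0 = negative); the names `accuracy`, `SST2_LABELS`, `gold`, and `pred` are hypothetical and the labels are toy values, not real model output.

```python
# Label convention for SST-2 as described in the text.
SST2_LABELS = {0: "negative", 1: "positive"}


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy gold labels and predictions for four sentences.
gold = [1, 0, 1, 1]
pred = [1, 0, 0, 1]
sst2_accuracy = accuracy(gold, pred)
print(f"accuracy = {sst2_accuracy:.2f}")
```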

Examples with label 1 (positive sentiment):

two central performances
against shimmering cinematography that lends the setting the ethereal beauty of an asian landscape painting
the situation in a well-balanced fashion
a better movie
at achieving the modest , crowd-pleasing goals it sets for itself
a patient viewer

Examples with label 0 (negative sentiment):

a transparently hypocritical work that feels as though it 's trying to set the women 's liberation movement back 20 years
so pat it makes your teeth hurt
blood work is laughable in the solemnity with which it tries to pump life into overworked elements from eastwood 's dirty harry period .
faced with the possibility that her life is meaningless , vapid and devoid of substance , in a movie that is definitely meaningless , vapid and devoid of substance
monotone
this new jangle of noise , mayhem and stupidity must be a serious contender for the title .

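Note that because the sentences come from movie reviews with human sentiment annotations, their lengths are far from uniform: unlike CoLA, where sentences are generally short, some SST-2 sentences are very long and some are very short.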
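The length variation can be checked quickly on a few of the examples quoted above. This sketch counts whitespace-separated tokens, which is an assumption made for illustration (SST-2 text is already tokenized with spaces around punctuation); the variable names are hypothetical.

```python
# A handful of the SST-2 examples quoted above, copied verbatim.
sst2_examples = [
    "monotone",
    "a better movie",
    "so pat it makes your teeth hurt",
    "faced with the possibility that her life is meaningless , vapid and "
    "devoid of substance , in a movie that is definitely meaningless , "
    "vapid and devoid of substance",
]

# Count whitespace-separated tokens in each example.
token_counts = [len(text.split()) for text in sst2_examples]
shortest = min(token_counts)
longest = max(token_counts)

for text, count in zip(sst2_examples, token_counts):
    print(f"{count:>3} tokens: {text[:40]}")
print(f"shortest = {shortest}, longest = {longest}")
```

Such a spread matters in practice when choosing a maximum sequence length or a padding strategy for batching.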

References:
Introduction to and download of the GLUE benchmark datasets
 https://gluebenchmark.com/tasks