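I. Introduction
GLUE (General Language Understanding Evaluation) is a multi-task benchmark and analysis platform for natural language understanding, created by researchers at New York University, the University of Washington, and other institutions.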
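GLUE comprises nine tasks: CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, and WNLI. As shown in Figure 2, they fall into three categories: single-sentence tasks, similarity and paraphrase tasks, and inference tasks.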
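For quick reference, the grouping of the nine tasks can be written down as a small lookup table. This is only a sketch: the names `GLUE_TASK_GROUPS` and `group_of` are made up for illustration, and the assignment of the seven tasks other than CoLA and SST-2 follows the usual GLUE categorization rather than anything stated in this section.

```python
# Illustrative grouping of the nine GLUE tasks into three categories.
# The dictionary name and helper below are hypothetical, not part of GLUE itself.
GLUE_TASK_GROUPS = {
    "single-sentence": ["CoLA", "SST-2"],
    "similarity and paraphrase": ["MRPC", "STS-B", "QQP"],
    "inference": ["MNLI", "QNLI", "RTE", "WNLI"],
}


def group_of(task):
    """Return the category name for a GLUE task, or raise KeyError if unknown."""
    for group, tasks in GLUE_TASK_GROUPS.items():
        if task in tasks:
            return group
    raise KeyError(f"unknown GLUE task: {task}")


if __name__ == "__main__":
    for name in ("CoLA", "QQP", "RTE"):
        print(name, "->", group_of(name))
```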
II. Task Descriptions
Single-sentence tasks: CoLA and SST-2
2.1 CoLA
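CoLA (The Corpus of Linguistic Acceptability) is a single-sentence classification task. The corpus is drawn from books and journal articles on linguistic theory, and each example is a word sequence annotated with whether it is grammatical. It is a binary classification task with two labels: 0 means ungrammatical and 1 means grammatical.
Number of examples: 8,551 in the training set, 1,043 in the development set, and 1,063 in the test set.
Task: acceptability judgment, a binary classification of grammatical versus ungrammatical.
Evaluation metric: Matthews correlation coefficient.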
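As a rough illustration of the metric, the Matthews correlation coefficient for binary labels can be computed from the four confusion-matrix counts as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is not the official GLUE scoring script; the function name `mcc` and the toy labels are assumptions made for the example, and returning 0.0 when the denominator is zero is a common convention rather than part of the definition.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: report 0.0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    gold = [1, 1, 1, 0, 0, 1]  # toy acceptability labels
    pred = [1, 1, 0, 0, 1, 1]  # toy model predictions
    print(mcc(gold, pred))
```

The score ranges from -1 to 1, and a model that always predicts the same label scores 0 under the convention above. That makes it more informative than accuracy when the two labels are not equally frequent.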
CoLA download
Data Format
Each line in the .tsv files consists of 4 tab-separated columns.
Column 1: the code representing the source of the sentence.
Column 2: the acceptability judgment label (0=unacceptable, 1=acceptable).
Column 3: the acceptability judgment as originally notated by the author.
Column 4: the sentence.
Corpus Sample
clc95 0 * In which way is Sandy very anxious to see if the students will be able to solve the homework problem?
c-05 1 The book was written by John.
c-05 0 * Books were sent to each other by the students.
swb04 1 She voted for herself.
swb04 1 I saw that gas can explode.
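The four-column format described above can be read with Python's standard `csv` module. The following is a minimal sketch, assuming the columns are separated by real tab characters and that the third column is empty when the original author left the sentence unmarked. The `ColaExample` type and `parse_cola` function are hypothetical names introduced here, and the inline sample is a reconstruction of three of the rows shown above.

```python
import csv
import io
from typing import NamedTuple


class ColaExample(NamedTuple):
    source: str         # column 1: code for the source of the sentence
    label: int          # column 2: 0 = unacceptable, 1 = acceptable
    original_mark: str  # column 3: author's original notation (may be empty)
    sentence: str       # column 4: the sentence itself


def parse_cola(tsv_text):
    """Parse CoLA-style TSV text into a list of ColaExample records."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    examples = []
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed lines
        source, label, mark, sentence = row
        examples.append(ColaExample(source, int(label), mark, sentence))
    return examples


# Three sample rows, with tabs assumed as the separator.
sample = (
    "c-05\t1\t\tThe book was written by John.\n"
    "c-05\t0\t*\tBooks were sent to each other by the students.\n"
    "swb04\t1\t\tShe voted for herself.\n"
)
examples = parse_cola(sample)

if __name__ == "__main__":
    for ex in examples:
        print(ex.label, repr(ex.original_mark), ex.sentence)
```

Quoting is disabled so that quotation marks inside sentences are kept as literal characters instead of being treated as CSV field delimiters.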
2.2 SST-2
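SST-2 (The Stanford Sentiment Treebank) is a single-sentence classification task consisting of sentences from movie reviews together with human annotations of their sentiment. The task is to predict the sentiment of a given sentence, with two classes: positive (label 1) and negative (label 0). Only sentence-level labels are used. In other words, this is also a binary classification task, at the sentence level, between positive and negative sentiment.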
Number of examples: 67,349 in the training set, 872 in the development set, and 1,821 in the test set.
Task: sentiment classification, a binary classification of positive versus negative sentiment.
Evaluation metric: accuracy.
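Accuracy is simply the fraction of sentences whose predicted label matches the gold label. A minimal sketch, with a hypothetical function name and toy labels chosen only for illustration:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the gold labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


if __name__ == "__main__":
    gold = [1, 0, 1, 1]  # toy sentiment labels: 1 = positive, 0 = negative
    pred = [1, 0, 0, 1]  # toy model predictions
    print(accuracy(gold, pred))
```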
sentence label
a stirring , funny and finally transporting re-imagining of beauty and the beast and 1930s horror films 1
apparently reassembled from the cutting-room floor of any given daytime soap . 0
they presume their audience wo n't sit still for