An Introduction to the GLUE Benchmark

For a more detailed introduction, see: "GLUE基准数据集介绍及下载" (GLUE benchmark dataset introduction and download) on Zhihu.

Natural language processing (NLP) consists mainly of natural language understanding (NLU) and natural language generation (NLG). To push NLU research forward, researchers from New York University, the University of Washington, and other institutions built a multi-task natural language understanding benchmark and analysis platform: GLUE (General Language Understanding Evaluation).

GLUE contains nine NLU tasks, all in English, covering natural language inference, textual entailment, sentiment analysis, semantic similarity, and related problems. Well-known models such as BERT, XLNet, RoBERTa, ERNIE, and T5 are all evaluated on this benchmark. For the test sets, you upload your predictions to the official website, which then returns the test scores.
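As a rough illustration of that upload step, the sketch below writes per-task test predictions into tab-separated files and bundles them into a single zip archive for submission. The file names, the index/prediction header, and the 0/1 label encoding used here are assumptions modeled on the sample submission; always check the format published on the official site before uploading.

```python
import csv
import zipfile

# Hypothetical predictions for two tasks; in practice these come from your model.
# File names, headers and label encodings are assumptions -- verify them against
# the official sample submission on gluebenchmark.com before uploading.
predictions = {
    "CoLA.tsv": [1, 0, 1],   # 0/1 acceptability labels
    "SST-2.tsv": [0, 1, 1],  # 0/1 sentiment labels
}

for filename, preds in predictions.items():
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["index", "prediction"])  # assumed header row
        for idx, label in enumerate(preds):
            writer.writerow([idx, label])

# The site expects a single zip archive containing one TSV per task.
with zipfile.ZipFile("submission.zip", "w") as zf:
    for filename in predictions:
        zf.write(filename)
```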

The GLUE paper: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding[1]

The official GLUE website: https://gluebenchmark.com/

This article aims to describe each of the nine GLUE tasks in reasonable detail, give a few examples so that readers can form a fairly concrete overall picture, and provide a convenient link for downloading the GLUE datasets.

The GLUE benchmark is a group of nine tasks on single sentences or sentence pairs (a short loading sketch follows the list):

  • CoLA (Corpus of Linguistic Acceptability): determine whether a sentence is grammatically acceptable or not.
  • MNLI (Multi-Genre Natural Language Inference): determine whether a premise sentence entails, contradicts, or is neutral with respect to a given hypothesis. (This dataset has two evaluation sets: "matched", where the validation and test sets come from the same sources as the training data, and "mismatched", where they use out-of-domain data.)
  • MRPC (Microsoft Research Paraphrase Corpus): determine whether two sentences are paraphrases of one another.
  • QNLI (Question-answering Natural Language Inference): determine whether the second sentence contains the answer to the question in the first sentence. (Built from the SQuAD dataset.)
  • QQP (Quora Question Pairs): determine whether two questions are semantically equivalent.
  • RTE (Recognizing Textual Entailment): determine whether a sentence entails a given hypothesis.
  • SST-2 (Stanford Sentiment Treebank): determine whether a sentence expresses positive or negative sentiment.
  • STS-B (Semantic Textual Similarity Benchmark): rate the similarity of two sentences on a scale from 0 to 5.
  • WNLI (Winograd Natural Language Inference): determine whether a sentence with an ambiguous pronoun entails a sentence in which that pronoun is replaced by one of its candidate referents. (Built from the Winograd Schema Challenge dataset.)
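To make the single-sentence vs. sentence-pair distinction above concrete, here is a minimal sketch that loads one task of each kind with the Hugging Face datasets library (my choice of tooling, not something the official download page requires) and prints a training example.

```python
from datasets import load_dataset  # pip install datasets

# CoLA: single sentences labeled 0 (unacceptable) or 1 (acceptable).
cola = load_dataset("glue", "cola")
print(cola["train"][0])
# e.g. {'sentence': "...", 'label': 1, 'idx': 0}

# MRPC: sentence pairs labeled 0 (not paraphrases) or 1 (paraphrases).
mrpc = load_dataset("glue", "mrpc")
print(mrpc["train"][0])
# e.g. {'sentence1': "...", 'sentence2': "...", 'label': 1, 'idx': 0}
```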

Dataset download page:

https://gluebenchmark.com/tasks
