===================================================================
General Language Understanding Evaluation (GLUE) benchmark
STS-B is a regression task; all the other GLUE tasks are single-sentence or sentence-pair classification.
MNLI is three-way classification; the remaining classification tasks are binary.
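Since STS-B is the one regression task, its official GLUE metric is Pearson (and Spearman) correlation between predicted and gold similarity scores. A minimal plain-Python sketch of the Pearson part (the scores below are made-up toy values, not real STS-B data):

```python
import math

def pearson(xs, ys):
    # Pearson correlation between predicted and gold similarity scores,
    # the metric GLUE reports for the STS-B regression task.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy example: gold STS-B similarity scores lie in [0, 5]
gold = [0.0, 1.5, 2.5, 4.0, 5.0]
pred = [0.2, 1.0, 3.0, 3.8, 4.9]
print(round(pearson(gold, pred), 3))
```

The classification tasks are scored with accuracy/F1 (and Matthews correlation for CoLA), so this correlation metric applies to STS-B only.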
Ref
GLUE: A MULTI-TASK BENCHMARK AND ANALYSIS PLATFORM FOR NATURAL LANGUAGE UNDERSTANDING
https://openreview.net/pdf?id=rJ4km2R5t7
Tasks https://gluebenchmark.com/tasks
Leaderboard https://gluebenchmark.com/leaderboard
GLUE: the benchmark for natural language understanding
https://blog.csdn.net/weixin_43269174/article/details/106382651
Ref
===================================================================
CoQA, A Conversational Question Answering Challenge (question answering dataset)
paper https://arxiv.org/pdf/1808.07042v1.pdf
github https://stanfordnlp.github.io/coqa/
CoQA: conversation-based question answering
https://blog.csdn.net/cindy_1102/article/details/88560048
===================================================================
SQuAD2.0, The Stanford Question Answering Dataset (reading comprehension dataset)
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset,
SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions
written adversarially by crowdworkers to look similar to answerable ones.
https://rajpurkar.github.io/SQuAD-explorer/
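SQuAD answers are scored with exact match and token-overlap F1; for SQuAD2.0's unanswerable questions the gold answer is empty, and a model scores only by also predicting empty. A simplified sketch of the F1 computation, loosely mirroring the normalization in the official evaluation script (not the script itself):

```python
import collections
import re
import string

def normalize(text):
    # Lowercase, drop punctuation and English articles, collapse whitespace,
    # roughly following the official SQuAD answer normalization.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def f1_score(prediction, gold):
    # Token-overlap F1 between a predicted answer span and a gold answer.
    # An empty gold answer (SQuAD2.0 unanswerable) matches only an empty
    # prediction, for which F1 is 1.0; any non-empty prediction scores 0.0.
    pred_toks = normalize(prediction).split()
    gold_toks = normalize(gold).split()
    if not pred_toks or not gold_toks:
        return float(pred_toks == gold_toks)
    common = collections.Counter(pred_toks) & collections.Counter(gold_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the Norman conquest", "Norman conquest of England"))
```

The real script also takes the maximum F1 over all gold answer variants per question; CoQA uses a similar macro-averaged F1.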
===================================================================