32 common NLP tasks, with the evaluation dataset, evaluation metric, current SOTA result, and corresponding paper for each

| Task | Chinese term | Corpus/dataset | Metric | SOTA result | Paper |
| --- | --- | --- | --- | --- | --- |
| Chunking | 组块分析 | Penn Treebank | F1 | 95.77 | A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks |
| Common sense reasoning | 常识推理 | Event2Mind | cross-entropy | 4.22 | Event2Mind: Commonsense Inference on Events, Intents, and Reactions |
| Parsing | 句法分析 | Penn Treebank | F1 | 95.13 | Constituency Parsing with a Self-Attentive Encoder |
| Coreference resolution | 指代消解 | CoNLL 2012 | average F1 | 73 | Higher-order Coreference Resolution with Coarse-to-fine Inference |
| Dependency parsing | 依存句法分析 | Penn Treebank | POS, UAS, LAS | 97.3, 95.44, 93.76 | Deep Biaffine Attention for Neural Dependency Parsing |
| Task-Oriented Dialogue/Intent Detection | 任务型对话/意图识别 | ATIS, Snips | accuracy | 94.1, 97.0 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
| Task-Oriented Dialogue/Slot Filling | 任务型对话/槽填充 | ATIS, Snips | F1 | 95.2, 88.8 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
| Task-Oriented Dialogue/Dialogue State Tracking | 任务型对话/状态追踪 | DSTC2 | Area, Food, Price, Joint | 90, 84, 92, 72 | Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems |
| Domain adaptation | 领域适配 | Multi-Domain Sentiment Dataset | average accuracy | 79.15 | Strong Baselines for Neural Semi-supervised Learning under Domain Shift |
| Entity Linking | 实体链接 | AIDA CoNLL-YAGO | Micro-F1-strong, Macro-F1-strong | 86.6, 89.4 | End-to-End Neural Entity Linking |
| Information Extraction | 信息抽取 | ReVerb45K | Precision, Recall, F1 | 62.7, 84.4, 81.9 | CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information |
| Grammatical Error Correction | 语法错误纠正 | JFLEG | GLEU | 61.5 | Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation |
| Language modeling | 语言模型 | Penn Treebank | Validation perplexity, Test perplexity | 48.33, 47.69 | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model |
| Lexical Normalization | 词汇规范化 | LexNorm2015 | F1, Precision, Recall | 86.39, 93.53, 80.26 | MoNoise: Modeling Noise Using a Modular Normalization System |
| Machine translation | 机器翻译 | WMT 2014 EN-DE | BLEU | 35.0 | Understanding Back-Translation at Scale |
| Multimodal Emotion Recognition | 多模态情感识别 | IEMOCAP | Accuracy | 76.5 | Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling |
| Multimodal Metaphor Recognition | 多模态隐喻识别 | verb-noun pairs, adjective-noun pairs | F1 | 0.75, 0.79 | Black Holes and White Rabbits: Metaphor Identification with Visual Features |
| Multimodal Sentiment Analysis | 多模态情感分析 | MOSI | Accuracy | 80.3 | Context-Dependent Sentiment Analysis in User-Generated Videos |
| Named entity recognition | 命名实体识别 | CoNLL 2003 | F1 | 93.09 | Contextual String Embeddings for Sequence Labeling |
| Natural language inference | 自然语言推理 | SciTail | Accuracy | 88.3 | Improving Language Understanding by Generative Pre-Training |
| Part-of-speech tagging | 词性标注 | Penn Treebank | Accuracy | 97.96 | Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings |
| Question answering | 问答 | CliCR | F1 | 33.9 | CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension |
| Word segmentation | 分词 | VLSP 2013 | F1 | 97.90 | A Fast and Accurate Vietnamese Word Segmenter |
| Word Sense Disambiguation | 词义消歧 | SemEval 2015 | F1 | 67.1 | Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison |
| Text classification | 文本分类 | AG News | Error rate | 5.01 | Universal Language Model Fine-tuning for Text Classification |
| Summarization | 摘要 | Gigaword | ROUGE-1, ROUGE-2, ROUGE-L | 37.04, 19.03, 34.46 | Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization |
| Sentiment analysis | 情感分析 | IMDb | Accuracy | 95.4 | Universal Language Model Fine-tuning for Text Classification |
| Semantic role labeling | 语义角色标注 | OntoNotes | F1 | 85.5 | Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling |
| Semantic parsing | 语义解析 | LDC2014T12 | F1 Newswire, F1 Full | 0.71, 0.66 | AMR Parsing with an Incremental Joint Model |
| Semantic textual similarity | 语义文本相似度 | SentEval | MRPC, SICK-R, SICK-E, STS | 78.6/84.4, 0.888, 87.8, 78.9/78.6 | Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning |
| Relationship Extraction | 关系抽取 | New York Times Corpus | P@10%, P@30% | 73.6, 59.5 | RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information |
| Relation Prediction | 关系预测 | WN18RR | H@10, H@1, MRR | 59.02, 45.37, 49.83 | Predicting Semantic Relations using Global Graph Properties |
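
Several entries above report F1 alongside precision and recall, and the Relation Prediction entry reports H@k (Hits@k) and MRR. As a rough illustration of how these metrics are usually defined, here is a minimal Python sketch. The function names and the toy rank list are made up for this example and do not come from any of the listed papers; each paper's official scorer may differ in details such as averaging or tie handling.

```python
from typing import Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (result is on the same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct answer is ranked within the top k (1-based ranks)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank of the correct answer over all queries (1-based ranks)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Lexical Normalization entry: precision 93.53, recall 80.26.
print(round(f1_score(93.53, 80.26), 2))  # 86.39, matching the reported F1

# Hypothetical ranks of the correct answer for four queries (illustration only).
toy_ranks = [1, 2, 4, 10]
print(hits_at_k(toy_ranks, 1))                     # 0.25
print(hits_at_k(toy_ranks, 10))                    # 1.0
print(round(mean_reciprocal_rank(toy_ranks), 4))   # 0.4625
```

Note that the list reports H@k and MRR as percentages, while the functions here return fractions between 0 and 1.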
