Task | Description | Corpus/Dataset | Metric | SOTA result | Paper |
Chunking | Chunk analysis | Penn Treebank | F1 | 95.77 | A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks |
Common sense reasoning | Commonsense reasoning | Event2Mind | Cross-entropy | 4.22 | Event2Mind: Commonsense Inference on Events, Intents, and Reactions |
Parsing | Syntactic parsing | Penn Treebank | F1 | 95.13 | Constituency Parsing with a Self-Attentive Encoder |
Coreference resolution | Coreference resolution | CoNLL 2012 | Average F1 | 73 | Higher-order Coreference Resolution with Coarse-to-fine Inference |
Dependency parsing | Dependency syntactic parsing | Penn Treebank | POS, UAS, LAS | 97.3, 95.44, 93.76 | Deep Biaffine Attention for Neural Dependency Parsing |
Task-Oriented Dialogue/Intent Detection | Task-oriented dialogue / intent recognition | ATIS, Snips | Accuracy | 94.1, 97.0 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
Task-Oriented Dialogue/Slot Filling | Task-oriented dialogue / slot filling | ATIS, Snips | F1 | 95.2, 88.8 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
Task-Oriented Dialogue/Dialogue State Tracking | Task-oriented dialogue / state tracking | DSTC2 | Area, Food, Price, Joint | 90, 84, 92, 72 | Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems |
Domain adaptation | Domain adaptation | Multi-Domain Sentiment Dataset | Average accuracy | 79.15 | Strong Baselines for Neural Semi-supervised Learning under Domain Shift |
Entity Linking | Entity linking | AIDA CoNLL-YAGO | Micro-F1-strong, Macro-F1-strong | 86.6, 89.4 | End-to-End Neural Entity Linking |
Information Extraction | Information extraction | ReVerb45K | Macro F1, Micro F1, Pairwise F1 | 62.7, 84.4, 81.9 | CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information |
Grammatical Error Correction | Grammatical error correction | JFLEG | GLEU | 61.5 | Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation |
Language modeling | Language model | Penn Treebank | Validation perplexity, Test perplexity | 48.33, 47.69 | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model |
Lexical Normalization | Lexical normalization | LexNorm2015 | F1, Precision, Recall | 86.39, 93.53, 80.26 | MoNoise: Modeling Noise Using a Modular Normalization System |
Machine translation | Machine translation | WMT 2014 EN-DE | BLEU | 35.0 | Understanding Back-Translation at Scale |
Multimodal Emotion Recognition | Multimodal emotion recognition | IEMOCAP | Accuracy | 76.5 | Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling |
Multimodal Metaphor Recognition | Multimodal metaphor recognition | Verb-noun pairs, adjective-noun pairs | F1 | 0.75, 0.79 | Black Holes and White Rabbits: Metaphor Identification with Visual Features |
Multimodal Sentiment Analysis | Multimodal sentiment analysis | MOSI | Accuracy | 80.3 | Context-Dependent Sentiment Analysis in User-Generated Videos |
Named entity recognition | Named entity recognition | CoNLL 2003 | F1 | 93.09 | Contextual String Embeddings for Sequence Labeling |
Natural language inference | Natural language inference | SciTail | Accuracy | 88.3 | Improving Language Understanding by Generative Pre-Training |
Part-of-speech tagging | Part-of-speech tagging | Penn Treebank | Accuracy | 97.96 | Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings |
Question answering | Question answering | CliCR | F1 | 33.9 | CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension |
Word segmentation | Word segmentation | VLSP 2013 | F1 | 97.90 | A Fast and Accurate Vietnamese Word Segmenter |
Word Sense Disambiguation | Word sense disambiguation | SemEval 2015 | F1 | 67.1 | Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison |
Text classification | Text classification | AG News | Error rate | 5.01 | Universal Language Model Fine-tuning for Text Classification |
Summarization | Summarization | Gigaword | ROUGE-1, ROUGE-2, ROUGE-L | 37.04, 19.03, 34.46 | Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization |
Sentiment analysis | Sentiment analysis | IMDb | Accuracy | 95.4 | Universal Language Model Fine-tuning for Text Classification |
Semantic role labeling | Semantic role labeling | OntoNotes | F1 | 85.5 | Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling |
Semantic parsing | Semantic parsing | LDC2014T12 | F1 Newswire, F1 Full | 0.71, 0.66 | AMR Parsing with an Incremental Joint Model |
Semantic textual similarity | Semantic textual similarity | SentEval | MRPC, SICK-R, SICK-E, STS | 78.6/84.4, 0.888, 87.8, 78.9/78.6 | Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning |
Relationship Extraction | Relation extraction | New York Times Corpus | P@10%, P@30% | 73.6, 59.5 | RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information |
Relation Prediction | Relation prediction | WN18RR | H@10, H@1, MRR | 59.02, 45.37, 49.83 | Predicting Semantic Relations using Global Graph Properties |
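
Many rows in the table report F1, and some list precision and recall next to it. As a rough sanity check, the sketch below recomputes F1 as the harmonic mean of precision and recall for the Lexical Normalization row (LexNorm2015). The helper name `f1_score` is a hypothetical choice for this illustration and is not taken from any of the cited papers; the standard balanced F1 definition is assumed.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall.

    Inputs and output share the same scale (here: percentages).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the Lexical Normalization row of the table.
lexnorm_f1 = f1_score(93.53, 80.26)
print(round(lexnorm_f1, 2))  # prints 86.39, matching the F1 in that row
```

The same check can be applied to any other row that lists precision, recall and F1 together; a mismatch would suggest that the metric labels or values were transcribed incorrectly.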