[Repost] NLP Tasks

weixin_34259559

Natural Language Processing Tasks and Selected References

I've been working on several natural language processing tasks for a long time. One day, I felt like drawing a map of the NLP field where I earn a living. I'm sure I'm not the only person who wants to see at a glance which tasks are in NLP.

I did my best to cover as many NLP tasks as possible, but admittedly this list is far from exhaustive, purely due to my lack of knowledge. The selected references are biased towards recent deep learning accomplishments. I hope they serve as a starting point when you're about to dig into a task. I'll keep updating this repo myself, but what I really hope is that you will collaborate on this work. Don't hesitate to send me a pull request!

Oct. 13, 2017.

by Kyubyong

Reviewed and updated by YJ Choe on Oct. 18, 2017.

Anaphora Resolution

See Coreference Resolution

Automated Essay Scoring

****PAPER**** Automatic Text Scoring Using Neural Networks
****PAPER**** A Neural Approach to Automated Essay Scoring
****CHALLENGE**** Kaggle: The Hewlett Foundation: Automated Essay Scoring
****PROJECT**** EASE (Enhanced AI Scoring Engine)

Automatic Speech Recognition

****WIKI**** Speech recognition
****PAPER**** Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
****PAPER**** WaveNet: A Generative Model for Raw Audio
****PROJECT**** A TensorFlow implementation of Baidu's DeepSpeech architecture
****PROJECT**** Speech-to-Text-WaveNet : End-to-end sentence level English speech recognition using DeepMind's WaveNet
****CHALLENGE**** The 5th CHiME Speech Separation and Recognition Challenge
****DATA**** The 5th CHiME Speech Separation and Recognition Challenge
****DATA**** CSTR VCTK Corpus
****DATA**** LibriSpeech ASR corpus
****DATA**** Switchboard-1 Telephone Speech Corpus
****DATA**** TED-LIUM Corpus
****DATA**** Open Speech and Language Resources

Automatic Summarisation

****WIKI**** Automatic summarization
****BOOK**** Automatic Text Summarization
****PAPER**** Text Summarization Using Neural Networks
****PAPER**** Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization
****DATA**** Text Analytics Conferences (TAC)
****DATA**** Document Understanding Conferences (DUC)

Coreference Resolution

****INFO**** Coreference Resolution
****PAPER**** Deep Reinforcement Learning for Mention-Ranking Coreference Models
****PAPER**** Improving Coreference Resolution by Learning Entity-Level Distributed Representations
****CHALLENGE**** CoNLL 2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes
****CHALLENGE**** CoNLL 2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes
****CHALLENGE**** SemEval 2018 Task 4: Character Identification on Multiparty Dialogues (In Progress)

Entity Linking

See Named Entity Disambiguation

Grammatical Error Correction

****PAPER**** Neural Network Translation Models for Grammatical Error Correction
****PAPER**** Adapting Sequence Models for Sentence Correction
****CHALLENGE**** CoNLL-2013 Shared Task: Grammatical Error Correction
****CHALLENGE**** CoNLL-2014 Shared Task: Grammatical Error Correction
****DATA**** NUS Non-commercial research/trial corpus license
****DATA**** Lang-8 Learner Corpora
****DATA**** Cornell Movie--Dialogs Corpus
****PROJECT**** Deep Text Corrector
****PRODUCT**** deep grammar

Grapheme To Phoneme Conversion

****PAPER**** Grapheme-to-Phoneme Models for (Almost) Any Language
****PAPER**** Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning
****PAPER**** Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion
****PROJECT**** Sequence-to-Sequence G2P toolkit
****DATA**** Multilingual Pronunciation Data

Humor and Sarcasm Detection

****PAPER**** Automatic Sarcasm Detection: A Survey
****PAPER**** Magnets for Sarcasm: Making Sarcasm Detection Timely, Contextual and Very Personal
****PAPER**** Sarcasm Detection on Twitter: A Behavioral Modeling Approach
****CHALLENGE**** SemEval-2017 Task 6: #HashtagWars: Learning a Sense of Humor
****CHALLENGE**** SemEval-2017 Task 7: Detection and Interpretation of English Puns
****DATA**** Sarcastic comments from Reddit
****DATA**** Sarcasm Corpus V2
****DATA**** Sarcasm Amazon Reviews Corpus

Language Grounding

****WIKI**** Symbol grounding problem
****PAPER**** The Symbol Grounding Problem
****PAPER**** From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning
****PAPER**** Encoding of phonology in a recurrent neural model of grounded speech
****PAPER**** Gated-Attention Architectures for Task-Oriented Language Grounding
****PAPER**** Sound-Word2Vec: Learning Word Representations Grounded in Sounds
****COURSE**** Language Grounding to Vision and Control
****WORKSHOP**** Language Grounding for Robotics

Language Guessing

See Language Identification

Language Identification

****WIKI**** Language identification
****PAPER**** AUTOMATIC LANGUAGE IDENTIFICATION USING DEEP NEURAL NETWORKS
****PAPER**** Natural Language Processing with Small Feed-Forward Networks
****CHALLENGE**** 2015 Language Recognition Evaluation

Language Modeling

****WIKI**** Language model
****TOOLKIT**** KenLM Language Model Toolkit
****PAPER**** Distributed Representations of Words and Phrases and their Compositionality
****PAPER**** Generating Sequences with Recurrent Neural Networks
****PAPER**** Character-Aware Neural Language Models
****THESIS**** Statistical Language Models Based on Neural Networks
****DATA**** Penn Treebank
****TUTORIAL**** TensorFlow Tutorial on Language Modeling with Recurrent Neural Networks

Language Recognition

See Language Identification

Lemmatisation

****WIKI**** Lemmatisation
****PAPER**** Joint Lemmatization and Morphological Tagging with LEMMING
****TOOLKIT**** WordNet Lemmatizer
****DATA**** Treebank-3

Lip-reading

****WIKI**** Lip reading
****PAPER**** Lip Reading Sentences in the Wild
****PAPER**** 3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition
****PROJECT**** Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks
****DATA**** The GRID audiovisual sentence corpus

Machine Translation

****PAPER**** Neural Machine Translation by Jointly Learning to Align and Translate
****PAPER**** Neural Machine Translation in Linear Time
****PAPER**** Attention Is All You Need
****PAPER**** Six Challenges for Neural Machine Translation
****CHALLENGE**** ACL 2014 NINTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION
****CHALLENGE**** EMNLP 2017 SECOND CONFERENCE ON MACHINE TRANSLATION (WMT17)
****DATA**** OpenSubtitles2016
****DATA**** WIT3: Web Inventory of Transcribed and Translated Talks
****DATA**** The QCRI Educational Domain (QED) Corpus
****PAPER**** Multi-task Sequence to Sequence Learning
****PAPER**** Unsupervised Pretraining for Sequence to Sequence Learning
****PAPER**** Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
****TOOLKIT**** Subword Neural Machine Translation with Byte Pair Encoding (BPE)
****TOOLKIT**** Multi-Way Neural Machine Translation
****TOOLKIT**** OpenNMT: Open-Source Toolkit for Neural Machine Translation

Morphological Inflection Generation

****WIKI**** Inflection
****PAPER**** Morphological Inflection Generation Using Character Sequence to Sequence Learning
****CHALLENGE**** SIGMORPHON 2016 Shared Task: Morphological Reinflection
****DATA**** sigmorphon2016

Named Entity Disambiguation

****WIKI**** Entity linking
****PAPER**** Robust and Collective Entity Disambiguation through Semantic Embeddings

Named Entity Recognition

****WIKI**** Named-entity recognition
****PAPER**** Neural Architectures for Named Entity Recognition
****PROJECT**** OSU Twitter NLP Tools
****CHALLENGE**** Named Entity Recognition in Twitter
****CHALLENGE**** CoNLL 2002 Language-Independent Named Entity Recognition
****CHALLENGE**** Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
****DATA**** CoNLL-2002 NER corpus
****DATA**** CoNLL-2003 NER corpus
****DATA**** NUT Named Entity Recognition in Twitter Shared task
****TOOLKIT**** Stanford Named Entity Recognizer

Paraphrase Detection

****PAPER**** Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
****PROJECT**** Paralex: Paraphrase-Driven Learning for Open Question Answering
****CHALLENGE**** SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter
****DATA**** Microsoft Research Paraphrase Corpus
****DATA**** Microsoft Research Video Description Corpus
****DATA**** Pascal Dataset
****DATA**** Flickr Dataset
****DATA**** The SICK data set
****DATA**** PPDB: The Paraphrase Database
****DATA**** WikiAnswers Paraphrase Corpus

Paraphrase Generation

****PAPER**** Neural Paraphrase Generation with Stacked Residual LSTM Networks
****DATA**** Neural Paraphrase Generation with Stacked Residual LSTM Networks
****CODE**** Neural Paraphrase Generation with Stacked Residual LSTM Networks
****PAPER**** A Deep Generative Framework for Paraphrase Generation
****PAPER**** Paraphrasing Revisited with Neural Machine Translation

Parsing

****WIKI**** Parsing
****TOOLKIT**** The Stanford Parser: A statistical parser
****TOOLKIT**** spaCy parser
****PAPER**** Grammar as a Foreign Language
****PAPER**** A fast and accurate dependency parser using neural networks
****PAPER**** Universal Semantic Parsing
****CHALLENGE**** CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
****CHALLENGE**** CoNLL 2016 Shared Task: Multilingual Shallow Discourse Parsing
****CHALLENGE**** CoNLL 2015 Shared Task: Shallow Discourse Parsing
****CHALLENGE**** SemEval-2016 Task 8: The meaning representations may be abstract, but this task is concrete!

Part-of-speech Tagging

****WIKI**** Part-of-speech tagging
****PAPER**** Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
****PAPER**** Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models
****DATA**** Treebank-3
****TOOLKIT**** nltk.tag package

Pinyin-To-Chinese Conversion

****WIKI**** Pinyin input method
****PAPER**** Neural Network Language Model for Chinese Pinyin Input Method Engine
****PROJECT**** Neural Chinese Transliterator

Question Answering

****WIKI**** Question answering
****PAPER**** Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
****PAPER**** Dynamic Memory Networks for Visual and Textual Question Answering
****CHALLENGE**** TREC Question Answering Task
****CHALLENGE**** NTCIR-8: Advanced Cross-lingual Information Access (ACLIA)
****CHALLENGE**** CLEF Question Answering Track
****CHALLENGE**** SemEval-2017 Task 3: Community Question Answering
****CHALLENGE**** SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge (In Progress)
****DATA**** MS MARCO: Microsoft MAchine Reading COmprehension Dataset
****DATA**** Maluuba NewsQA
****DATA**** SQuAD: 100,000+ Questions for Machine Comprehension of Text
****DATA**** GraphQuestions: A Characteristic-rich Question Answering Dataset
****DATA**** Story Cloze Test and ROCStories Corpora
****DATA**** Microsoft Research WikiQA Corpus
****DATA**** DeepMind Q&A Dataset
****DATA**** QASent
****DATA**** Textbook Question Answering

Relationship Extraction

****WIKI**** Relationship extraction
****PAPER**** A deep learning approach for relationship extraction from interaction context in social manufacturing paradigm
****CHALLENGE**** SemEval-2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers (In Progress)

Semantic Role Labeling

****WIKI**** Semantic role labeling
****BOOK**** Semantic Role Labeling
****PAPER**** End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks
****PAPER**** Neural Semantic Role Labeling with Dependency Path Embeddings
****PAPER**** Deep Semantic Role Labeling: What Works and What's Next
****CHALLENGE**** CoNLL-2005 Shared Task: Semantic Role Labeling
****CHALLENGE**** CoNLL-2004 Shared Task: Semantic Role Labeling
****TOOLKIT**** Illinois Semantic Role Labeler (SRL)
****DATA**** CoNLL-2005 Shared Task: Semantic Role Labeling

Sentence Boundary Disambiguation

****WIKI**** Sentence boundary disambiguation
****PAPER**** A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain
****TOOLKIT**** NLTK Tokenizers
****DATA**** The British National Corpus
****DATA**** Switchboard-1 Telephone Speech Corpus

Sentiment Analysis

****WIKI**** Sentiment analysis
****INFO**** Awesome Sentiment Analysis
****CHALLENGE**** Kaggle: UMICH SI650 - Sentiment Classification
****CHALLENGE**** SemEval-2017 Task 4: Sentiment Analysis in Twitter
****CHALLENGE**** SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
****PROJECT**** SenticNet
****PROJECT**** Stanford NLP Group Sentiment Analysis
****DATA**** Multi-Domain Sentiment Dataset (version 2.0)
****DATA**** Stanford Sentiment Treebank
****DATA**** Twitter Sentiment Corpus
****DATA**** Twitter Sentiment Analysis Training Corpus
****DATA**** AFINN: List of English words rated for valence

Singing Voice Synthesis

****PAPER**** Singing voice synthesis based on deep neural networks
****PAPER**** A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs
****PRODUCT**** VOCALOID: voice synthesis technology and software developed by Yamaha
****CHALLENGE**** Special Session Interspeech 2016 Singing synthesis challenge "Fill-in the Gap"

Social Science Applications

****WORKSHOP**** NLP+CSS: Workshops on Natural Language Processing and Computational Social Science
****TOOLKIT**** Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
****TOOLKIT**** Online Variational Bayes for Latent Dirichlet Allocation (LDA)
****GROUP**** The University of Chicago Knowledge Lab

Source Separation

****WIKI**** Source separation
****PAPER**** From Blind to Guided Audio Source Separation
****PAPER**** Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
****CHALLENGE**** Signal Separation Evaluation Campaign (SiSEC)
****CHALLENGE**** CHiME Speech Separation and Recognition Challenge

Speaker Authentication

See Speaker Verification

Speaker Diarisation

****WIKI**** Speaker diarisation
****PAPER**** DNN-based speaker clustering for speaker diarisation
****PAPER**** Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach
****PAPER**** Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
****CHALLENGE**** Rich Transcription Evaluation

Speaker Recognition

****WIKI**** Speaker recognition
****PAPER**** A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK
****PAPER**** DEEP NEURAL NETWORKS FOR SMALL FOOTPRINT TEXT-DEPENDENT SPEAKER VERIFICATION
****CHALLENGE**** NIST Speaker Recognition Evaluation (SRE)
****INFO**** Are there any suggestions for free databases for speaker recognition?

Speech Reading

See Lip-reading

Speech Recognition

See Automatic Speech Recognition

Speech Segmentation

****WIKI**** Speech segmentation
****PAPER**** Word Segmentation by 8-Month-Olds: When Speech Cues Count More Than Statistics
****PAPER**** Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings
****PAPER**** Unsupervised Lexicon Discovery from Acoustic Input
****PAPER**** Weakly supervised spoken term discovery using cross-lingual side information
****DATA**** CALLHOME Spanish Speech

Speech Synthesis

****WIKI**** Speech synthesis
****PAPER**** Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
****PAPER**** WaveNet: A Generative Model for Raw Audio
****PAPER**** Tacotron: Towards End-to-End Speech Synthesis
****PAPER**** Deep Voice 3: 2000-Speaker Neural Text-to-Speech
****PAPER**** Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
****DATA**** The World English Bible
****DATA**** LJ Speech Dataset
****DATA**** Lessac Data
****CHALLENGE**** Blizzard Challenge 2017
****PRODUCT**** Lyrebird
****PROJECT**** The Festvox project
****TOOLKIT**** Merlin: The Neural Network (NN) based Speech Synthesis System

Speech Enhancement

****WIKI**** Speech enhancement
****BOOK**** Speech enhancement: theory and practice
****PAPER**** An Experimental Study on Speech Enhancement Based on Deep Neural Network
****PAPER**** A Regression Approach to Speech Enhancement Based on Deep Neural Networks
****PAPER**** Speech Enhancement Based on Deep Denoising Autoencoder

Speech-To-Text

See Automatic Speech Recognition

Spoken Term Detection

See Speech Segmentation

Stemming

****WIKI**** Stemming
****PAPER**** A BACKPROPAGATION NEURAL NETWORK TO IMPROVE ARABIC STEMMING
****TOOLKIT**** NLTK Stemmers

Term Extraction

****WIKI**** Terminology extraction
****PAPER**** Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection

Text Similarity

****WIKI**** Semantic similarity
****PAPER**** A Survey of Text Similarity Approaches
****PAPER**** Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks
****PAPER**** Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
****CHALLENGE**** SemEval-2014 Task 3: Cross-Level Semantic Similarity
****CHALLENGE**** SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
****CHALLENGE**** SemEval-2017 Task 1: Semantic Textual Similarity
****WIKI**** Semantic Textual Similarity Wiki

Text Simplification

****WIKI**** Text simplification
****PAPER**** Aligning Sentences from Standard Wikipedia to Simple Wikipedia
****PAPER**** Problems in Current Text Simplification Research: New Data Can Help
****DATA**** Newsela Data

Text-To-Speech

See Speech Synthesis

Textual Entailment

****WIKI**** Textual entailment
****PROJECT**** Textual Entailment with TensorFlow
****PAPER**** Textual Entailment with Structured Attentions and Composition
****CHALLENGE**** SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment
****CHALLENGE**** SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge

Transliteration

****WIKI**** Transliteration
****INFO**** Transliteration of Non-Latin scripts
****PAPER**** A Deep Learning Approach to Machine Transliteration
****CHALLENGE**** NEWS 2016 Shared Task on Transliteration of Named Entities
****PROJECT**** Neural Japanese Transliteration—can you do better than SwiftKey™ Keyboard?

Voice Conversion

****PAPER**** PHONETIC POSTERIORGRAMS FOR MANY-TO-ONE VOICE CONVERSION WITHOUT PARALLEL DATA TRAINING
****PROJECT**** Deep neural networks for voice conversion (voice style transfer) in Tensorflow
****PROJECT**** An implementation of voice conversion system utilizing phonetic posteriorgrams
****CHALLENGE**** Voice Conversion Challenge 2016
****CHALLENGE**** Voice Conversion Challenge 2018
****DATA**** CMU_ARCTIC speech synthesis databases
****DATA**** TIMIT Acoustic-Phonetic Continuous Speech Corpus

Voice Recognition

See Speaker Recognition

Word Embeddings

****WIKI**** Word embedding
****TOOLKIT**** Gensim: word2vec
****TOOLKIT**** fastText
****TOOLKIT**** GloVe: Global Vectors for Word Representation
****INFO**** Where to get a pretrained model
****PROJECT**** Pre-trained word vectors of 30+ languages
****PROJECT**** Polyglot: Distributed word representations for multilingual NLP
****CHALLENGE**** SemEval-2018 Task 10: Capturing Discriminative Attributes (In Progress)
****PAPER**** Bilingual Word Embeddings for Phrase-Based Machine Translation
****PAPER**** A Survey of Cross-Lingual Embedding Models

Word Prediction

****INFO**** What is Word Prediction?
****PAPER**** The prediction of character based on recurrent neural network language model
****PAPER**** An Embedded Deep Learning based Word Prediction
****PAPER**** Evaluating Word Prediction: Framing Keystroke Savings
****DATA**** An Embedded Deep Learning based Word Prediction
****PROJECT**** Word Prediction using Convolutional Neural Networks—can you do better than iPhone™ Keyboard?
****CHALLENGE**** SemEval-2018 Task 2: Multilingual Emoji Prediction (In Progress)

Word Segmentation

****WIKI**** Word segmentation
****PAPER**** Neural Word Segmentation Learning for Chinese
****PROJECT**** Convolutional neural network for Chinese word segmentation
****TOOLKIT**** Stanford Word Segmenter
****TOOLKIT**** NLTK Tokenizers

Word Sense Disambiguation

****WIKI**** Word-sense disambiguation
****PAPER**** Train-O-Matic: Large-Scale Supervised Word Sense Disambiguation in Multiple Languages without Manual Training Data
****DATA**** Train-O-Matic Data
****DATA**** BabelNet

— Language Models, Segmentation
— Morphological Analysis, POS Tagging and Sequence Labeling
— Syntactic and Semantic Parsing
— Lexical and Compositional Semantics
— Discourse and Coreference
— Dialogue and Interactive Systems
— Narrative Understanding and Commonsense Reasoning
— Spoken Language Processing
— Text Mining
— Sentiment Analysis and Opinion Mining
— Information Retrieval, Question Answering
— Information Extraction
— Summarization
— Natural Language Generation
— Machine Translation
— Multilinguality and Cross-linguality
— Linguistic Theories and Resources
— Computational Psycholinguistics
— Multimodal and Grounded Language Processing
— Machine Learning for NLP
— Web, Social Media and Computational Social Science
— Ethics and Fairness in NLP
— Other NLP Applications
