A continuously updated collection; see the GitHub repo for more content.
1. BERT Series
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019) (a masked-LM usage sketch follows this list)
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (arXiv 2019)
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (arXiv 2019)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (arXiv 2019)
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (arXiv 2019)
- Multi-Task Deep Neural Networks for Natural Language Understanding (arXiv 2019)
- What does BERT learn about the structure of language? (ACL2019)
- Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned (ACL2019) [github]
- Open Sesame: Getting Inside BERT's Linguistic Knowledge (ACL2019 WS)
- Analyzing the Structure of Attention in a Transformer Language Model (ACL2019 WS)
- What Does BERT Look At? An Analysis of BERT's Attention (ACL2019 WS)
- Do Attention Heads in BERT Track Syntactic Dependencies?
- Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains (ACL2019 WS)
- Inducing Syntactic Trees from BERT Representations (ACL2019 WS)
- A Multiscale Visualization of Attention in the Transformer Model (ACL2019 Demo)
- Visualizing and Measuring the Geometry of BERT
- How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings (EMNLP2019)
- Are Sixteen Heads Really Better than One? (NeurIPS2019)
- On the Validity of Self-Attention as Explanation in Transformer Models
- Visualizing and Understanding the Effectiveness of BERT (EMNLP2019)
- Attention Interpretability Across NLP Tasks
- Revealing the Dark Secrets of BERT (EMNLP2019)
- Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs (EMNLP2019)
- The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives (EMNLP2019)
- A Primer in BERTology: What we know about how BERT works
- Do NLP Models Know Numbers? Probing Numeracy in Embeddings (EMNLP2019)
- How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations (CIKM2019)
- Whatcha lookin' at? DeepLIFTing BERT's Attention in Question Answering
- What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?
- Calibration of Pre-trained Transformers
- exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models [github]
- MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices [github]
- The 12 Most Cutting-Edge NLP Pre-trained Models
- NLP Pre-trained Models: From Transformer to ALBERT
- XLNet: How It Works and How It Differs from BERT
- Innovations in the BERT Era (Applications): Progress in Applying BERT across NLP Domains
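
A minimal sketch of the masked-language-model interface that many of the BERT-family papers above analyze. It assumes the HuggingFace `transformers` library and the public `bert-base-uncased` checkpoint; both are illustrative assumptions, not something prescribed by the papers themselves.

```python
# Hedged sketch: querying BERT's masked-LM head with HuggingFace transformers.
# Model name and library choice are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# BERT is pre-trained to fill in [MASK] tokens using context on both sides.
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Locate the masked position and read off the most likely token.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # expected to print something like "paris"
```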
2. Transformer Series
- Attention Is All You Need (arXiv 2017) (a scaled dot-product attention sketch follows this list)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (arXiv 2019)
- Universal Transformers (ICLR 2019)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (arXiv 2019)
- Reformer: The Efficient Transformer (ICLR 2020)
- Adaptive Attention Span in Transformers (ACL2019)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (ACL2019) [github]
- Generating Long Sequences with Sparse Transformers
- Adaptively Sparse Transformers (EMNLP2019)
- Compressive Transformers for Long-Range Sequence Modelling
- The Evolved Transformer (ICML2019)
- Reformer: The Efficient Transformer (ICLR2020) [github]
- GRET: Global Representation Enhanced Transformer (AAAI2020)
- Transformer on a Diet [github]
- Efficient Content-Based Sparse Attention with Routing Transformers
- BP-Transformer: Modelling Long-Range Context via Binary Partitioning
- Recipes for building an open-domain chatbot
- Longformer: The Long-Document Transformer
- UnifiedQA: Crossing Format Boundaries With a Single QA System [github]
- A Gentle Walkthrough of "Attention Is All You Need" (Overview + Code)
- Transformer Explained in Plain Language
- Abandon Your Illusions and Fully Embrace the Transformer: Comparing the Three Major Feature Extractors in NLP (CNN/RNN/TF)
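
To make the list above concrete, here is a short sketch of the scaled dot-product attention at the core of "Attention Is All You Need". Shapes and the mask convention are illustrative assumptions, not a definitive re-implementation.

```python
# Hedged sketch of scaled dot-product attention (PyTorch).
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, d_k); mask broadcastable to the score matrix."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, heads, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)              # attention distribution per query
    return weights @ v, weights                          # weighted sum of values

# Tiny usage example with random tensors (batch=2, heads=8, seq=16, d_k=64).
q = k = v = torch.randn(2, 8, 16, 64)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 8, 16, 64]) torch.Size([2, 8, 16, 16])
```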