A Big List of BERT / Transformer / Transfer-Learning NLP Resources

Compiled by: 专知

Edited by: zenRRan

[Overview] cedrickchee maintains this project, a large collection of machine (deep) learning resources for natural language processing (NLP), with a focus on Bidirectional Encoder Representations from Transformers (BERT), attention mechanisms, the Transformer architecture/network, and transfer learning in NLP.

https://github.com/cedrickchee/awesome-bert-nlp

Papers

  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.

  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le and Ruslan Salakhutdinov.

Uses smart caching to improve the learning of long-term dependencies in the Transformer. Key results: state-of-the-art on 5 language modeling benchmarks, including a perplexity of 21.8 on One Billion Word (LM1B) and 0.99 bits per character on enwiki8. The authors claim the method is more flexible, faster during evaluation (up to 1,874x speedup), generalizes well on small datasets, and is effective at modeling both short and long sequences.
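The caching idea is easy to state in code: hidden states computed for the previous segment are kept around (with gradients stopped) and reused as extra context when attending over the current segment. Below is a minimal, illustrative PyTorch sketch of that segment-level recurrence, not the authors' implementation; it omits Transformer-XL's relative positional encodings, and the class name `SegmentRecurrentAttention` is made up for illustration.

```python
# Illustrative sketch of Transformer-XL-style segment-level recurrence
# (not the authors' code; relative positional encodings are omitted).
import torch
import torch.nn as nn

class SegmentRecurrentAttention(nn.Module):  # hypothetical name
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x, memory=None):
        # x:      (batch, seg_len, d_model) -- current segment
        # memory: (batch, mem_len, d_model) -- cached states from the previous segment
        context = x if memory is None else torch.cat([memory, x], dim=1)
        out, _ = self.attn(query=x, key=context, value=context)
        # Cache this segment's output for the next segment; detach() stops
        # gradients from flowing across segment boundaries.
        return out, out.detach()

# Process a long sequence segment by segment, carrying the cache along.
layer = SegmentRecurrentAttention()
segments = torch.randn(4, 2, 16, 128)  # 4 segments, batch of 2, segment length 16
memory = None
for seg in segments:
    out, memory = layer(seg, memory)
print(out.shape)  # torch.Size([2, 16, 128])
```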

  • Conditional BERT Contextual Augmentation by Xing Wu, Shangwen Lv, Liangjun Zang, Jizhong Han and Songlin Hu.

  • SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering by Chenguang Zhu, Michael Zeng and Xuedong Huang.

  • Language Models are Unsupervised Multitask Learners by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever.

  • The Evolved Transformer by David R. So, Chen Liang and Quoc V. Le.

They used architecture search to improve the Transformer architecture. The key idea is to use evolutionary search and to seed the initial population with the Transformer itself. The resulting architecture performs better and is more efficient, especially for small models.
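The search strategy can be sketched in a few lines: a population of architecture configurations, seeded entirely with the vanilla Transformer, is evolved by tournament selection and random mutation. The toy sketch below only illustrates that loop; `evaluate_fitness` is a hypothetical stand-in for actually training and validating each candidate, and the mutation space here is far smaller than the paper's.

```python
# Toy sketch of tournament-selection evolution over architecture configs,
# seeded with the vanilla Transformer (not the paper's actual search space).
import copy
import random

BASE_TRANSFORMER = {"n_layers": 6, "n_heads": 8, "ffn_dim": 2048, "conv_branch": False}

def mutate(config):
    """Randomly perturb one architectural choice (illustrative mutation space)."""
    child = copy.deepcopy(config)
    key = random.choice(list(child))
    if key == "conv_branch":
        child[key] = not child[key]
    else:
        child[key] = max(1, int(child[key] * random.choice([0.5, 1.5])))
    return child

def evaluate_fitness(config):
    """Hypothetical placeholder: in practice, train the candidate briefly and
    score it by (negative) validation loss."""
    return -abs(config["n_layers"] - 8) - 0.1 * abs(config["n_heads"] - 8)

def evolve(generations=200, population_size=20, tournament_size=5):
    # Seed the entire initial population with the Transformer itself.
    population = [copy.deepcopy(BASE_TRANSFORMER) for _ in range(population_size)]
    for _ in range(generations):
        idxs = random.sample(range(population_size), tournament_size)
        best = max(idxs, key=lambda i: evaluate_fitness(population[i]))
        worst = min(idxs, key=lambda i: evaluate_fitness(population[i]))
        # The fittest tournament member is mutated; its child replaces the weakest.
        population[worst] = mutate(population[best])
    return max(population, key=evaluate_fitness)

print(evolve())
```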

Articles

BERT and Transformer

  • Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing from Google AI.

  • The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning).

  • Dissecting BERT by Miguel Romero and Francisco Ingham - Understand BERT in depth with an intuitive, straightforward explanation of the relevant concepts.

  • A Light Introduction to Transformer-XL.

  • Generalized Language Models by Lilian Weng, Research Scientist at OpenAI.

Attention Concept

  • The Annotated Transformer by Harvard NLP Group - Further reading to understand the "Attention is all you need" paper.

  • Attention? Attention! - Attention guide by Lilian Weng from OpenAI.

  • Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) by Jay Alammar, an Instructor from Udacity ML Engineer Nanodegree.
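All of the guides above revolve around the same core operation, scaled dot-product attention from "Attention is all you need". As a quick reference while reading them, here is a minimal NumPy sketch of the formula softmax(QK^T / sqrt(d_k)) V; it is illustrative only (single head, no masking).

```python
# Minimal single-head scaled dot-product attention, for reference only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) query, key, and value matrices
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                          # weighted sum of values

# Toy usage: self-attention over 3 tokens with 4-dimensional representations.
x = np.random.randn(3, 4)
print(scaled_dot_product_attention(x, x, x).shape)  # (3, 4)
```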

Transformer Architecture

  • The Transformer blog post.

  • The Illustrated Transformer by Jay Alammar, an Instructor from Udacity ML Engineer Nanodegree.

  • Watch Łukasz Kaiser’s talk walking through the model and its details.

  • Transformer-XL: Unleashing the Potential of Attention Models by Google Brain.

  • Generative Modeling with Sparse Transformers by OpenAI - an algorithmic improvement of the attention mechanism to extract patterns from sequences 30x longer than possible previously.
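To anchor the posts above, here is a compact PyTorch sketch of one Transformer encoder block: multi-head self-attention followed by a position-wise feed-forward network, each wrapped in a residual connection and layer normalization (post-norm, as in the original paper). The hyperparameters and class name are illustrative.

```python
# Compact sketch of a single Transformer encoder block (illustrative).
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):  # hypothetical name
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)   # multi-head self-attention
        x = self.norm1(x + attn_out)       # residual connection + layer norm
        x = self.norm2(x + self.ffn(x))    # position-wise FFN + residual + norm
        return x

block = EncoderBlock()
print(block(torch.randn(2, 10, 512)).shape)  # torch.Size([2, 10, 512])
```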

OpenAI Generative Pre-Training Transformer (GPT) and GPT-2

  • Better Language Models and Their Implications.

  • Improving Language Understanding with Unsupervised Learning - this is an overview of the original GPT model.

  • How to build a State-of-the-Art Conversational AI with Transfer Learning by Hugging Face.

Additional Reading

  • How to Build OpenAI's GPT-2: "The AI That's Too Dangerous to Release".

  • OpenAI’s GPT2 - Food to Media hype or Wake Up Call?

Official Implementations

  • google-research/bert - TensorFlow code and pre-trained models for BERT.

Other Implementations

PyTorch

  • huggingface/pytorch-pretrained-BERT - A PyTorch implementation of Google AI's BERT model, with a script to load Google's pre-trained models, by Hugging Face (see the usage sketch after this list).

  • codertimo/BERT-pytorch - Google AI 2018 BERT pytorch implementation.

  • innodatalabs/tbert - PyTorch port of BERT ML model.

  • kimiyoung/transformer-xl - Code repository associated with the Transformer-XL paper.

  • dreamgonfly/BERT-pytorch - PyTorch implementation of BERT in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".

  • dhlee347/pytorchic-bert - PyTorch implementation of Google BERT.
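For the first entry above (huggingface/pytorch-pretrained-BERT), loading a pre-trained model for feature extraction looks roughly like the following. This is a sketch based on that repository's documented interface; the exact API may differ across package versions.

```python
# Rough usage sketch for huggingface/pytorch-pretrained-BERT: load Google's
# pre-trained weights and extract contextual features for a short sentence.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

text = "[CLS] the cat sat on the mat [SEP]"
tokens = tokenizer.tokenize(text)
token_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # Returns the hidden states of all encoder layers plus the pooled [CLS] output.
    encoded_layers, pooled_output = model(token_ids)

print(len(encoded_layers), encoded_layers[-1].shape)  # 12 layers, (1, seq_len, 768)
```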

Keras

  • Separius/BERT-keras - Keras implementation of BERT with pre-trained weights.

  • CyberZHG/keras-bert - Implementation of BERT that could load official pre-trained models for feature extraction and prediction.

TensorFlow

  • guotong1988/BERT-tensorflow - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

  • kimiyoung/transformer-xl - Code repository associated with the Transformer-XL paper.

Chainer

  • soskek/bert-chainer - Chainer implementation of "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".

