1. Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018. [PDF](https://arxiv.org/pdf/1810.04805)
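2. BERT paper, paragraph-by-paragraph close reading. [Bilibili] https://www.bilibili.com/video/BV1PL411M7eQ/
3. BERT paper, paragraph-by-paragraph close reading: accompanying notes. [Bilibili] https://www.bilibili.com/read/cv14068934
4. NLP paper reading: BERT. [Zhihu] https://zhuanlan.zhihu.com/p/449782671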