BERT is a language representation model built on the Transformer encoder, pre-trained with two tasks: Masked LM and Next Sentence Prediction.

Transformer
- Self-attention
- Multi-head Self-attention
- Positional Encoding: a positional vector e is added to the input embedding so that information about any position can be incorporated

Training of BERT
- Masked LM
- Next Sentence Prediction

How to use BERT

Reference
- http://www.camdemy.com
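As a rough sketch of the self-attention mechanism named above, the following minimal NumPy implementation computes scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the function name, the toy input, and using the same matrix for Q, K, and V are illustrative assumptions, not real BERT weights:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq) pairwise similarity
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted mix of value rows

# Toy self-attention: Q = K = V = X, 3 tokens with dimension 4
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4)
```

In multi-head self-attention, this same computation is run in parallel on several learned linear projections of the input, and the per-head outputs are concatenated.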