ERNIE 3.0: LARGE-SCALE KNOWLEDGE ENHANCED PRE-TRAINING FOR LANGUAGE UNDERSTANDING AND GENERATION
Sun Y, Wang S, Feng S, et al. ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation[J]. arXiv preprint arXiv:2107.02137, 2021.
Keywords: 10-billion-parameter model, Transformer-XL, knowledge graph
Knowledge-graph triples are incorporated into pre-training, and the basic model unit is changed from the Transformer used in ERNIE 2.0 to Transformer-XL.
The model can be tried out on Baidu Wenxin: https://wenxin.baidu.com/wenxin/ernie
1. Key features of ERNIE 3.0
(1) Parameter scale: 10 billion
(2) Knowledge graph integration
- Large-scale knowledge-enhanced model: pre-trained on a 4 TB corpus consisting of plain texts and a large-scale knowledge graph
(3) Fuses an auto-regressive network and an auto-encoding network
- Handles both natural language understanding and generation tasks via zero-shot learning, few-shot learning, or fine-tuning.
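The fusion of the two paradigms can be pictured in terms of attention masks over a shared representation: an auto-encoding (understanding) branch attends bidirectionally, while an auto-regressive (generation) branch attends causally. A minimal sketch, assuming nothing beyond this idea (the function name and sequence length are illustrative, not from the paper):

```python
import numpy as np

def attention_mask(seq_len, causal):
    """1 = query position (row) may attend to key position (column)."""
    if causal:
        # Auto-regressive branch: each token sees only itself and earlier tokens.
        return np.tril(np.ones((seq_len, seq_len), dtype=int))
    # Auto-encoding branch: every token sees the full sequence.
    return np.ones((seq_len, seq_len), dtype=int)

nlu_mask = attention_mask(4, causal=False)  # bidirectional, for understanding
nlg_mask = attention_mask(4, causal=True)   # causal, for generation
```

Both masks can be applied on top of the same backbone outputs, which is the sense in which one model serves both NLU and NLG tasks.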
(4) Model performance
- Outperforms state-of-the-art models on 54 Chinese NLP tasks
- The English version took first place on the SuperGLUE benchmark (July 3, 2021), surpassing human performance by +0.8% (90.6% vs. 89.8%)
2. ERNIE 3.0 framework
Continual Multi-Paradigms Unified Pre-training Framework
(1) Universal Representation Module: once pre-training is complete, the universal semantic representation layer is no longer updated (it stays frozen even during fine-tuning)
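This freezing scheme can be sketched with a toy two-layer setup (all names, shapes, and the training step are illustrative assumptions, not the paper's implementation): fine-tuning updates only a task-specific head, while the shared representation weights stay fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "universal representation" weights, frozen after pre-training.
shared_W = rng.normal(size=(8, 8))
frozen_copy = shared_W.copy()

# Hypothetical task-specific head: the only parameters updated during fine-tuning.
head_W = rng.normal(size=(8, 2))
head_W_init = head_W.copy()

def forward(x):
    h = np.tanh(x @ shared_W)  # frozen universal representation layer
    return h @ head_W          # trainable task-specific head

def finetune_step(x, grad_out, lr=0.1):
    """One gradient step that touches only the head; shared_W is never written."""
    global head_W
    h = np.tanh(x @ shared_W)
    head_W = head_W - lr * (h.T @ grad_out)

x = rng.normal(size=(4, 8))
finetune_step(x, np.ones((4, 2)))
```

After the step, `shared_W` is unchanged while `head_W` has moved, mirroring the idea that downstream tasks adapt only their own module on top of a shared backbone.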
(