Venue: arXiv, July 2019
Type: pre-trained language model
Highlights: a further empirical study of the BERT model, covering hyperparameter settings and the contribution of each pre-training task to overall performance
This post covers the RoBERTa paper, co-authored by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, et al.
Overview
Motivation
- The BERT model has shortcomings and room for improvement:
We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it.
- State-of-the-art experimental results confirm the idea is viable:
These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements.
Dataset selection
Pre-training datasets:
BOOKCORPUS
CC-NEWS
OPENWEBTEXT
STORIES
Evaluation datasets:
GLUE
SQuAD
RACE
Contributions
- Detail-level refinements
  - Dynamic masking (see the sketch below)
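The paper contrasts dynamic masking with the static masking of the original BERT, where the mask pattern for each sequence is generated once during data preprocessing and then reused in every epoch; dynamic masking instead samples a fresh pattern every time a sequence is fed to the model. Below is a minimal Python sketch of the idea; the 15% masking rate and the 80/10/10 replacement split follow BERT's standard MLM recipe, while the function name and data handling are illustrative assumptions, not code from the RoBERTa repository.

```python
import random

MASK_TOKEN = "[MASK]"

def dynamic_mask(tokens, mask_prob=0.15, rng=None):
    """Sample a fresh MLM mask over one token sequence.

    Re-invoked on every pass through the data, so the same sentence is
    masked differently each time it is seen (dynamic masking), unlike
    static masking where the pattern is fixed once during preprocessing.
    """
    rng = rng or random.Random()
    masked = list(tokens)
    labels = [None] * len(tokens)      # None = position not predicted
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok            # target: recover the original token
            r = rng.random()
            if r < 0.8:                # 80%: replace with [MASK]
                masked[i] = MASK_TOKEN
            elif r < 0.9:              # 10%: random token (a real system
                                       # samples from the full vocabulary;
                                       # sampling from the sentence keeps
                                       # this sketch self-contained)
                masked[i] = rng.choice(tokens)
            # remaining 10%: keep the original token unchanged
    return masked, labels

# The same sentence gets a different mask on every epoch:
sentence = "the quick brown fox jumps over the lazy dog".split()
for epoch in range(3):
    print(dynamic_mask(sentence)[0])
```

In the paper, dynamic masking performs comparably to or slightly better than static masking, and it avoids reusing the same mask pattern when training for many more epochs over larger data.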