RoBERTa Explained


别水贴了



Article link: https://blog.csdn.net/fengzhou_/article/details/107028041


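Introduction

RoBERTa is an improved version of BERT. This post focuses on the differences between RoBERTa and BERT; for full details, see the paper "RoBERTa: A Robustly Optimized BERT Pretraining Approach".

RoBERTa vs. BERT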

Our modifications are simple, they include: (1) training the model longer, with bigger batches, over more data; (2) removing the next sentence prediction objective; (3) training on longer sequences; and (4) dynamically changing the masking pattern applied to the training data.
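Of the four changes, dynamic masking is the easiest to show in code. The following is a minimal sketch, not the authors' implementation: it assumes BERT's usual 15% masking rate with the 80/10/10 replacement rule, and the helper names (`mask_tokens`, `static_masking`, `dynamic_masking`) are hypothetical. The static variant is also simplified to a single fixed mask per sequence.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Mask about mask_prob of the tokens with the 80/10/10 rule.

    Returns (masked_tokens, labels); labels holds the original token at
    each selected position and None everywhere else.
    """
    n_to_mask = max(1, round(len(tokens) * mask_prob))
    positions = rng.sample(range(len(tokens)), n_to_mask)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for pos in positions:
        labels[pos] = tokens[pos]
        roll = rng.random()
        if roll < 0.8:
            masked[pos] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[pos] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return masked, labels

def static_masking(tokens, vocab, epochs, seed=0):
    """Simplified static masking: mask once, reuse the result every epoch."""
    rng = random.Random(seed)
    fixed = mask_tokens(tokens, vocab, rng)
    return [fixed for _ in range(epochs)]

def dynamic_masking(tokens, vocab, epochs, seed=0):
    """Dynamic masking: draw a fresh mask each time the sequence is used."""
    rng = random.Random(seed)
    return [mask_tokens(tokens, vocab, rng) for _ in range(epochs)]

if __name__ == "__main__":
    sentence = "roberta trains longer on more data with bigger batches".split()
    for masked, _ in dynamic_masking(sentence, sentence, epochs=3):
        print(" ".join(masked))
```

With dynamic masking the model sees a different set of prediction targets for the same sentence in each epoch, which matters more as training runs for more steps over the data.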

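Training data and parameters

RoBERTa uses far more training data. On top of the BookCorpus and Wikipedia data used by BERT (16GB), it adds CC-NEWS, OPENWEBTEXT, and STORIES, bringing the total to roughly 160GB. That is about ten times BERT's pretraining data, and training takes correspondingly longer.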
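As a quick check on the "ten times" figure, the sketch below sums the per-corpus sizes. The individual sizes for CC-NEWS, OPENWEBTEXT, and STORIES are not given in this post; they are the uncompressed sizes as I recall them from the RoBERTa paper, so treat them as an assumption.

```python
# Uncompressed corpus sizes in GB. The 16GB figure is from this post;
# the other three are recalled from the RoBERTa paper (assumption).
corpora_gb = {
    "BookCorpus + Wikipedia": 16,
    "CC-NEWS": 76,
    "OPENWEBTEXT": 38,
    "STORIES": 31,
}

bert_gb = corpora_gb["BookCorpus + Wikipedia"]
total_gb = sum(corpora_gb.values())
added_gb = total_gb - bert_gb
ratio = total_gb / bert_gb

print(f"total: {total_gb}GB, added: {added_gb}GB, ratio vs. BERT: {ratio:.1f}x")
```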

Past work in Neural Machine Translation has shown that training with very large mini-batches can both improve optimization speed and end-task performance when the learning rate is increased appropriately

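Borrowing from machine translation, RoBERTa also increases the mini-batch size, which improves both optimization speed and end-task performance, provided the learning rate is increased accordingly.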
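To compare batch sizes fairly, the number of training steps has to shrink as the batch grows so that the model sees the same number of sequences. The sketch below works this out, assuming BERT's baseline of 256 sequences per batch for 1M steps. The function names and the hardware numbers in the gradient-accumulation example are hypothetical, and the learning rate still has to be tuned separately for each batch size.

```python
def steps_for_equal_compute(base_batch, base_steps, new_batch):
    """Steps needed so that new_batch sees as many sequences as the baseline."""
    total_sequences = base_batch * base_steps
    return total_sequences // new_batch

def accumulation_steps(target_batch, per_device_batch, n_devices):
    """Gradient-accumulation steps needed to reach target_batch (rounded up)."""
    per_update = per_device_batch * n_devices
    return -(-target_batch // per_update)  # ceiling division

if __name__ == "__main__":
    for batch in (256, 2048, 8192):
        steps = steps_for_equal_compute(256, 1_000_000, batch)
        print(f"batch {batch:>5}: {steps:>9,} steps")
    # Hypothetical hardware: 8 devices holding 32 sequences each.
    print(accumulation_steps(8192, per_device_batch=32, n_devices=8))
```

Under this assumption, a batch of 2048 needs 125,000 steps and a batch of 8192 needs 31,250 steps to match the baseline's compute.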