# BERT Recap

## Overview
- BERT (Bidirectional Encoder Representations from Transformers) uses a "masked language model" objective: some tokens from the input are randomly masked, and the model predicts the original vocabulary IDs of the masked tokens (see the sketch after this list).
- BERT shows that "pre-trained representations reduce the need for many heavily-engineered task-specific architectures".
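To make the masked-language-model objective concrete, here is a minimal sketch using the Hugging Face `transformers` fill-mask pipeline with the public `bert-base-uncased` checkpoint (the sentence and model name are illustrative choices, not from the original post):

```python
from transformers import pipeline

# Load a pre-trained BERT checkpoint behind a fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT receives the sentence with one token replaced by [MASK] and predicts
# the original vocabulary id (shown here as the decoded token) for that slot.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```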
## BERT Specifics
There are two steps in the BERT framework: pre-training and fine-tuning.
- During pre-training, the model is trained on unlabeled data over different pre-training tasks.
- For fine-tuning, each downstream task gets a separate fine-tuned model, each first initialized with the pre-trained parameters (a sketch follows the figure below).
![](https://img-blog.csdnimg.cn/20200918115610836.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0pheV9UYW5n,size_1,color_FFFFFF,t_70#pic_center)
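As a rough illustration of the fine-tuning step (a sketch assuming the Hugging Face `transformers` library, not code from the post): every downstream task loads the same pre-trained weights and attaches its own task-specific head, which is then fine-tuned independently.

```python
from transformers import BertForSequenceClassification, BertForQuestionAnswering

# Both models start from the same pre-trained BERT parameters; only the
# small task-specific heads are initialized from scratch.
sentiment_model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # e.g. a binary sentiment task
)
qa_model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")

# Each model is then fine-tuned separately on its own labeled dataset,
# yielding one fine-tuned model per downstream task.
```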
## Input/Output Representations
- In order to handle a variety of downstream tasks, the input must be able to represent both a single sentence and a sentence pair in one sequence.
- The first token of every sequence is always the classification token `[CLS]`.
- Sentence pairs are separated by a special token `[SEP]`.
- Learned segment embeddings are added to every token, indicating whether it belongs to sentence A or sentence B (see the tokenization sketch below).
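The special tokens and segment assignments can be seen directly in a tokenizer's output; below is a minimal sketch assuming the Hugging Face `transformers` tokenizer for `bert-base-uncased` (the example sentences are arbitrary):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# A sentence pair packed into one sequence: [CLS] A ... [SEP] B ... [SEP]
encoding = tokenizer("How old are you?", "I am six.")

print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# ['[CLS]', 'how', 'old', 'are', 'you', '?', '[SEP]', 'i', 'am', 'six', '.', '[SEP]']

# token_type_ids are the segment ids: 0 for sentence A, 1 for sentence B.
print(encoding["token_type_ids"])
# [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```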