I came across a short introduction to BERT in a paper that I found helpful for deepening my understanding, so I'm noting it here.
BERT [6] is a transformer [49] model that learns textual representations by conditioning on both left and right context in all layers. BERT was pre-trained on two different tasks, masked language modeling (MLM) and next sentence prediction (NSP). For MLM, 15% of the tokens are replaced with a [MASK] token, and the model is trained to predict the masked tokens. For NSP, the model is trained to distinguish (binary classification) between pairs of sentences A and B, where 50% of the time B is the actual next sentence and 50% of the time it is a randomly selected sentence.
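To make the two objectives concrete, here is a minimal Python sketch of how MLM masking and NSP pair construction could work. The function names are my own illustration, and the rule of always substituting [MASK] follows the simplified description above; the full BERT recipe also sometimes keeps the selected token unchanged or swaps in a random token.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15):
    """MLM: replace ~15% of tokens with [MASK]; targets are the originals.

    Simplified relative to the actual BERT recipe, which replaces only 80%
    of the selected tokens with [MASK].
    """
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)   # the model must predict this token
        else:
            masked.append(tok)
            targets.append(None)  # no loss on unmasked positions
    return masked, targets

def make_nsp_pair(doc_sentences, all_sentences):
    """NSP: 50% of the time B is the true next sentence, else a random one."""
    i = random.randrange(len(doc_sentences) - 1)
    a = doc_sentences[i]
    if random.random() < 0.5:
        b, is_next = doc_sentences[i + 1], 1  # the actual next sentence
    else:
        b, is_next = random.choice(all_sentences), 0  # a random sentence
    return a, b, is_next
```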
The special token [CLS] is prepended to every input sequence during pre-training; its final hidden state is used for classification tasks. [SEP] is another special token, used to separate sentence pairs that are packed together into a single sequence. Additionally, a learned segment embedding indicates whether each token comes from sentence A or sentence B. BERT was pre-trained on both English Wikipedia (2.5B words) and the BookCorpus [63] (800M words).
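As an illustration of how [CLS], [SEP], and the segment (token type) embedding show up in practice, the sketch below packs a sentence pair with the Hugging Face transformers tokenizer. This library is my choice for illustration, not something the quoted paper uses, and the example sentences are arbitrary.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Packing a sentence pair into a single sequence:
# [CLS] sentence-A tokens [SEP] sentence-B tokens [SEP]
enc = tokenizer("the man went to the store", "he bought a gallon of milk")

print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
# ['[CLS]', 'the', 'man', ..., '[SEP]', 'he', 'bought', ..., '[SEP]']

print(enc["token_type_ids"])
# 0 for [CLS] + sentence A + the first [SEP], 1 for sentence B + the final [SEP];
# these ids select the learned A/B segment embedding described above.
```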