Masked Language Modeling (MLM) with Code Explanation
Knowledge points covered in this article:
- BERT: concept and theory
- BERT applications
- MLM: introduction and usage
- NLP
- Next sentence prediction (NSP)
Masked Language Modeling (MLM)
Here I would like to introduce Masked Language Modeling (MLM). Before the introduction, there are some basic ideas you need to know about BERT and MLM:
- BERT is easy to use for general-purpose tasks;
- BERT with MLM can be adapted to specific areas, domains, and problems.
The idea behind BERT + MLM: before the text is fed into BERT for training, MLM masks part of the tokens (a randomly chosen proportion), and BERT is then trained to fill in the masked parts. A minimal sketch of this idea follows.
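To make the idea concrete, here is a minimal sketch, assuming the Hugging Face `transformers` library and the pretrained `bert-base-uncased` checkpoint; it masks one token and lets BERT fill it in:

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Mask one token and let BERT predict the missing word.
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the position of the [MASK] token and take the most likely word.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # likely prints: paris
```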
The whole process of using BERT + MLM:
- Tokenize the text; this produces three tensors (see the snippet after this step):
  - input_ids – this is what will be used as input to BERT
  - token_type_ids – not necessary for MLM
  - attention_mask – marks which positions hold real tokens rather than padding
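A sketch of the tokenization step, again assuming `transformers` and `bert-base-uncased`; the example sentence is just an illustration:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

text = "Masked language modeling is a pre-training objective."
inputs = tokenizer(text, return_tensors="pt",
                   max_length=512, truncation=True, padding="max_length")

# The tokenizer returns exactly the three tensors listed above.
print(inputs.keys())
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])
```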
- Build the labels tensor (see the snippet after this step):
  - this is what the loss is calculated against and optimized towards
  - it is simply a copy of input_ids – the only tensor we operate on here
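Continuing the same sketch, the labels tensor is just a detached copy of input_ids, taken before any masking so the original token ids survive as training targets:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Masked language modeling is a pre-training objective.",
                   return_tensors="pt")

# Copy input_ids before masking; the copy keeps the true token ids
# that the model will be optimized to predict.
inputs["labels"] = inputs["input_ids"].detach().clone()
```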
- Randomly mask some tokens in input_ids (see the snippet after this step):
  - the original BERT pre-training process masks about 15% of the tokens
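A sketch of the random-masking step. The 15% figure matches the original BERT paper; the full BERT recipe additionally replaces some selected tokens with random tokens or leaves them unchanged, but this sketch uses the simpler all-[MASK] variant and skips the special tokens:

```python
import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Masked language modeling is a pre-training objective.",
                   return_tensors="pt")
inputs["labels"] = inputs["input_ids"].detach().clone()

# Draw a uniform random number per token and select ~15% of positions,
# excluding [CLS], [SEP], and [PAD] from masking.
rand = torch.rand(inputs["input_ids"].shape)
selected = (rand < 0.15) \
    & (inputs["input_ids"] != tokenizer.cls_token_id) \
    & (inputs["input_ids"] != tokenizer.sep_token_id) \
    & (inputs["input_ids"] != tokenizer.pad_token_id)

# Replace the selected positions with the [MASK] token id.
inputs["input_ids"][selected] = tokenizer.mask_token_id
```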
- Calculate the loss, which is used to optimize the model (see the snippet after this list):
  - feed input_ids and labels into BERT
  - the model does the calculation and returns the loss
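A sketch of the final step, putting the pieces together: when labels is passed in, Hugging Face's BertForMaskedLM computes the cross-entropy loss internally and returns it, ready for an optimizer such as torch.optim.AdamW:

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("Masked language modeling is a pre-training objective.",
                   return_tensors="pt")
inputs["labels"] = inputs["input_ids"].detach().clone()

# Mask ~15% of the non-special tokens, as in the previous snippet.
rand = torch.rand(inputs["input_ids"].shape)
selected = (rand < 0.15) \
    & (inputs["input_ids"] != tokenizer.cls_token_id) \
    & (inputs["input_ids"] != tokenizer.sep_token_id)
inputs["input_ids"][selected] = tokenizer.mask_token_id

# Feeding input_ids and labels together makes the model return the loss.
outputs = model(**inputs)
loss = outputs.loss
loss.backward()  # gradients are now ready for an optimizer step
print(loss.item())
```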