Model Architecture
BERT's model architecture is a multi-layer bidirectional Transformer encoder.
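A minimal sketch of such an encoder, using PyTorch's nn.TransformerEncoder as a stand-in for BERT's own implementation (the sizes below are those of BERT-BASE; the variable names are illustrative):

```python
import torch
import torch.nn as nn

# BERT-BASE: L = 12 layers, H = 768 hidden size, A = 12 attention heads
hidden_size, num_layers, num_heads = 768, 12, 12

encoder_layer = nn.TransformerEncoderLayer(
    d_model=hidden_size,
    nhead=num_heads,
    dim_feedforward=4 * hidden_size,  # feed-forward size is 4H
    activation="gelu",
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

# "Bidirectional" means no causal mask: every token attends to every other token.
embeddings = torch.randn(2, 128, hidden_size)  # toy input of shape (batch, seq_len, H)
hidden_states = encoder(embeddings)            # final hidden vectors, shape (2, 128, 768)
```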
Input/Output Representations
input representation = Token Embeddings + Segment Embeddings + Position Embeddings
Token Embeddings use WordPiece embeddings, which map each token to a fixed-length vector.
Segment Embeddings indicate whether a token belongs to sentence A or sentence B, because sentence pairs are packed together into a single input sequence.
Position Embeddings let BERT learn the position of each token in the sequence.
Output Representation: the final hidden vector of each token (of dimension H).
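A minimal sketch of building the input representation as the sum of the three embeddings, assuming H = 768 as in BERT-BASE (the token ids and variable names below are toy values for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical sizes; BERT-BASE uses hidden size H = 768
vocab_size, max_len, type_vocab_size, H = 30522, 512, 2, 768

token_emb    = nn.Embedding(vocab_size, H)       # WordPiece token embeddings
segment_emb  = nn.Embedding(type_vocab_size, H)  # sentence A -> 0, sentence B -> 1
position_emb = nn.Embedding(max_len, H)          # learned position embeddings

# Toy packed pair: [CLS] sentence A [SEP] sentence B [SEP] (ids are made up)
input_ids    = torch.tensor([[101, 2023, 2003, 102, 2008, 2001, 102]])
segment_ids  = torch.tensor([[0,   0,    0,    0,   1,    1,    1  ]])
position_ids = torch.arange(input_ids.size(1)).unsqueeze(0)

# Input representation = Token + Segment + Position embeddings, shape (batch, seq_len, H)
input_repr = token_emb(input_ids) + segment_emb(segment_ids) + position_emb(position_ids)
```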
Pre-training BERT
Task #1: Masked Language Model (MLM)
Definition: We simply mask some percentage of the input tokens at random, and then predict those masked tokens.
In this case, the final hidden vectors corresponding to the masked tokens are fed into an output softmax over the vocabulary, as in a standard LM, and the model is trained to predict the original tokens with a cross-entropy loss.
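A simplified sketch of the masking and loss computation, assuming a 15% masking rate for concreteness and using random tensors in place of a real encoder's hidden states (all sizes and ids are illustrative, and the 80/10/10 replacement details are omitted):

```python
import torch
import torch.nn as nn

# Hypothetical sizes and ids for illustration
vocab_size, H, mask_token_id, mask_prob = 30522, 768, 103, 0.15

input_ids = torch.randint(1000, 2000, (2, 32))   # toy token ids, shape (batch, seq_len)
labels = input_ids.clone()

# Randomly choose positions to mask; unmasked positions get label -100 (ignored by the loss)
mask = torch.rand(input_ids.shape) < mask_prob
labels[~mask] = -100
input_ids[mask] = mask_token_id                  # replace chosen tokens with [MASK]

# Stand-in for the encoder's final hidden vectors over the corrupted input
hidden_states = torch.randn(2, 32, H)

# Output softmax over the vocabulary, trained with cross-entropy on the masked positions only
mlm_head = nn.Linear(H, vocab_size)
logits = mlm_head(hidden_states)                 # shape (batch, seq_len, vocab_size)
loss = nn.functional.cross_entropy(
    logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100
)
```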
Task #2: Next Sentence Prediction (NSP)
We pre-train for a binarized next sentence prediction task (binary classification).
When choosing the sentences A and B for each pretraining example, 50% of the time B is the actual next sentence that follows A (labeled as IsNext), and 50% of the time it is a random sentence from the corpus (labeled as NotNext).
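A minimal sketch of how such pairs could be sampled (the corpus and helper function below are made up for illustration; a real implementation would avoid sampling the true next sentence as a NotNext example):

```python
import random

# Toy "document": a list of consecutive sentences (hypothetical data)
corpus = [
    "the man went to the store",
    "he bought a gallon of milk",
    "penguins are flightless birds",
    "they live in the southern hemisphere",
]

def make_nsp_example(corpus, idx):
    """Build one (sentence A, sentence B, label) example for next sentence prediction."""
    sent_a = corpus[idx]
    if random.random() < 0.5:
        sent_b, label = corpus[idx + 1], "IsNext"          # actual next sentence
    else:
        sent_b, label = random.choice(corpus), "NotNext"   # random sentence from the corpus
    return sent_a, sent_b, label

print(make_nsp_example(corpus, 0))
```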
Fine-tuning BERT
BERT encodes a concatenated text pair with self-attention, which effectively includes bidirectional cross attention between the two sentences.
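A minimal sketch of fine-tuning for sentence-pair classification, assuming an encoder like the one above and using the final hidden vector of the first ([CLS]) token as the aggregate representation (the classifier head, sizes, and label are illustrative):

```python
import torch
import torch.nn as nn

# Hypothetical sizes; H = 768 as in BERT-BASE, 2 labels for a binary task
H, num_labels = 768, 2
classifier = nn.Linear(H, num_labels)  # new task-specific layer added on top for fine-tuning

# Stand-in for the final hidden states of a packed "[CLS] sentence A [SEP] sentence B [SEP]" input
hidden_states = torch.randn(1, 32, H)
cls_vector = hidden_states[:, 0]       # final hidden vector of the [CLS] token

logits = classifier(cls_vector)
loss = nn.functional.cross_entropy(logits, torch.tensor([1]))
# During fine-tuning, the encoder and the classifier are trained end-to-end on this loss.
```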