Learning the Embedding Layer for BERT

This code defines a PyTorch module that builds the embeddings for a Transformer model. The module takes vocabulary size, model dimension, and maximum sequence length as parameters, and contains three layers: token embeddings, position embeddings, and segment embeddings. In the forward method it combines the input sequence, position information, and segment information to produce normalized embedding vectors.

The Implementation of the Embedding Layer for BERT in PyTorch

import torch
from torch import nn

class Embedding(nn.Module):
  def __init__(self, vocab_size=10, d_model=2, maxlen=512, n_segments=2):
    super(Embedding, self).__init__()
    self.tok_embed = nn.Embedding(vocab_size, d_model)   # token embedding table
    self.pos_embed = nn.Embedding(maxlen, d_model)       # learned position embedding table
    self.seg_embed = nn.Embedding(n_segments, d_model)   # segment (sentence A/B) embedding table
    self.norm = nn.LayerNorm(d_model)

  def forward(self, x, seg):
    seq_len = x.size(1)
    # position indices 0 .. seq_len-1, created on the same device as the input ids
    pos = torch.arange(seq_len, dtype=torch.long, device=x.device)
    pos = pos.unsqueeze(0).expand_as(x)  # (seq_len,) -> (batch_size, seq_len)
    # sum the three embeddings element-wise, then apply layer normalization
    embedding = self.tok_embed(x) + self.pos_embed(pos) + self.seg_embed(seg)
    return self.norm(embedding)

Test the Embedding class

batch_size = 2
seq_len = 3
x = torch.randint(0, 10, (batch_size, seq_len))    # random token ids in [0, vocab_size)
seg = torch.randint(0, 2, (batch_size, seq_len))   # random segment ids (0 or 1)

embed_layer = Embedding()
print(f"tok_embed: {embed_layer.tok_embed.weight.size()}")
output_embedding = embed_layer(x, seg)
print(f"output_embedding: {output_embedding}")

Corresponding output (the exact values vary from run to run because the embedding weights are randomly initialized):

tok_embed: torch.Size([10, 2])
output_embedding: tensor([[[-0.9996,  0.9996],
         [-1.0000,  1.0000],
         [ 1.0000, -1.0000]],

        [[-1.0000,  1.0000],
         [-1.0000,  1.0000],
         [-1.0000,  1.0000]]], grad_fn=<NativeLayerNormBackward0>)

An explanation of the code above

This code defines a PyTorch module that builds the input embeddings for a Transformer model. The constructor takes four parameters: vocab_size, d_model, maxlen, and n_segments. vocab_size is the size of the vocabulary, d_model is the dimensionality of the embeddings, maxlen is the maximum length of an input sequence, and n_segments is the number of segments in the input (two in BERT, for sentence A and sentence B).
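
For context, the same class can be instantiated with roughly BERT-base-sized values instead of the toy defaults. The hyperparameters, batch size, and sequence length below are illustrative assumptions, not part of the original code:

bert_like = Embedding(vocab_size=30522, d_model=768, maxlen=512, n_segments=2)
ids = torch.randint(0, 30522, (4, 128))        # batch of 4 sequences, 128 tokens each
segs = torch.zeros(4, 128, dtype=torch.long)   # single-segment input
print(bert_like(ids, segs).shape)              # torch.Size([4, 128, 768])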


The module contains three embedding layers: tok_embed maps each input token id to a vector, pos_embed maps each token's position in the sequence to a vector, and seg_embed maps each token's segment id to a vector.
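
As a quick sanity check (using the default arguments of the class above), each layer is just a lookup table whose first dimension is the number of distinct ids it can embed:

layer = Embedding(vocab_size=10, d_model=2, maxlen=512, n_segments=2)
print(layer.tok_embed.weight.shape)  # torch.Size([10, 2])  - one row per vocabulary id
print(layer.pos_embed.weight.shape)  # torch.Size([512, 2]) - one row per position
print(layer.seg_embed.weight.shape)  # torch.Size([2, 2])   - one row per segment id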


The forward method takes two inputs: x, a tensor of shape (batch_size, seq_len) containing the input token ids, and seg, a tensor of the same shape containing the segment id of each token. The method first builds a position index for every token and looks it up in the pos_embed layer, then combines the position embedding with the token embedding and the segment embedding by element-wise addition. The resulting embeddings are normalized with layer normalization and returned. (With the toy default d_model=2, LayerNorm maps every two-dimensional vector to approximately (-1, 1) or (1, -1), which is why the example output above consists almost entirely of ±1 values.)
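
The same computation can be reproduced step by step outside the module, which makes the element-wise addition and normalization explicit. This is a minimal sketch that reuses the layers of an Embedding instance defined above:

layer = Embedding()
x = torch.randint(0, 10, (2, 3))
seg = torch.randint(0, 2, (2, 3))

pos = torch.arange(x.size(1), dtype=torch.long, device=x.device)
pos = pos.unsqueeze(0).expand_as(x)

summed = layer.tok_embed(x) + layer.pos_embed(pos) + layer.seg_embed(seg)
manual = layer.norm(summed)
print(torch.allclose(manual, layer(x, seg)))  # True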

An explanation of ‘pos’

In the code above, pos is a tensor of shape (seq_len,) that holds the position of each token in the input sequence. It is created with torch.arange, which returns a 1-dimensional tensor containing the integers from 0 to seq_len - 1. The dtype argument sets the tensor's data type, torch.long in this case, and creating it on x.device keeps it on the same device as the input ids.
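
For example, with seq_len = 3:

pos = torch.arange(3, dtype=torch.long)
print(pos)        # tensor([0, 1, 2])
print(pos.shape)  # torch.Size([3])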


Next, unsqueeze(0) is called on pos, which adds a new dimension of size 1 at the front, turning the shape from (seq_len,) into (1, seq_len). This makes pos broadcast-compatible with the input tensor x, which has shape (batch_size, seq_len).
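
Continuing the seq_len = 3 example:

pos = torch.arange(3, dtype=torch.long)   # shape (3,)
pos = pos.unsqueeze(0)                    # shape (1, 3)
print(pos)        # tensor([[0, 1, 2]])
print(pos.shape)  # torch.Size([1, 3])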


Finally, expand_as(x) is called on pos, which returns a view whose singleton batch dimension is expanded to batch_size so that pos has the same shape as x; no data is copied. Each token's position index can then be looked up in pos_embed and added element-wise to the token and segment embeddings.
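
With batch_size = 2 and seq_len = 3, the expanded pos simply repeats the position indices for every sequence in the batch:

x = torch.randint(0, 10, (2, 3))          # batch_size=2, seq_len=3
pos = torch.arange(3, dtype=torch.long)
pos = pos.unsqueeze(0).expand_as(x)       # view of shape (2, 3), no data copied
print(pos)
# tensor([[0, 1, 2],
#         [0, 1, 2]])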
