Learning the Embedding Layer for BERT

This code defines a PyTorch module that builds the embeddings for a Transformer model. The module takes vocabulary size, model dimension, and maximum sequence length as parameters, and contains three layers: token embeddings, position embeddings, and segment embeddings. In the forward method it combines the input sequence, position information, and segment information to produce normalized embedding vectors.

The Implementation of the Embedding Layer for BERT in PyTorch

import torch
from torch import nn

class Embedding(nn.Module):
  def __init__(self, vocab_size=10, d_model=2, maxlen=512, n_segments=2):
    super(Embedding, self).__init__()
    self.tok_embed = nn.Embedding(vocab_size, d_model)   # token embedding table
    self.pos_embed = nn.Embedding(maxlen, d_model)       # learned position embedding table
    self.seg_embed = nn.Embedding(n_segments, d_model)   # segment (sentence A/B) embedding table
    self.norm = nn.LayerNorm(d_model)

  def forward(self, x, seg):
    seq_len = x.size(1)
    # position indices 0 .. seq_len-1, created on the same device as the input ids
    pos = torch.arange(seq_len, dtype=torch.long, device=x.device)
    pos = pos.unsqueeze(0).expand_as(x)  # (seq_len,) -> (batch_size, seq_len)
    # sum the three embeddings element-wise, then apply layer normalization
    embedding = self.tok_embed(x) + self.pos_embed(pos) + self.seg_embed(seg)
    return self.norm(embedding)

Test the Embedding class

batch_size = 2
seq_len = 3
x = torch.randint(0, 10, (batch_size, seq_len))    # random token ids in [0, vocab_size)
seg = torch.randint(0, 2, (batch_size, seq_len))   # random segment ids (0 or 1)

embed_layer = Embedding()
print(f"tok_embed: {embed_layer.tok_embed.weight.size()}")
output_embedding = embed_layer(x, seg)
print(f"output_embedding: {output_embedding}")

Corresponding output (the exact values vary from run to run because the embedding weights are randomly initialized):

tok_embed: torch.Size([10, 2])
output_embedding: tensor([[[-0.9996,  0.9996],
         [-1.0000,  1.0000],
         [ 1.0000, -1.0000]],

        [[-1.0000,  1.0000],
         [-1.0000,  1.0000],
         [-1.0000,  1.0000]]], grad_fn=<NativeLayerNormBackward0>)

An explanation of the code above

This code defines a PyTorch module that builds the input embeddings for a Transformer model. The constructor takes four parameters: vocab_size, d_model, maxlen, and n_segments. vocab_size is the size of the vocabulary, d_model is the dimensionality of the embeddings, maxlen is the maximum length of an input sequence, and n_segments is the number of segments in the input (two in BERT, for sentence A and sentence B).
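
For context, the same class can be instantiated with roughly BERT-base-sized values instead of the toy defaults. The hyperparameters, batch size, and sequence length below are illustrative assumptions, not part of the original code:

bert_like = Embedding(vocab_size=30522, d_model=768, maxlen=512, n_segments=2)
ids = torch.randint(0, 30522, (4, 128))        # batch of 4 sequences, 128 tokens each
segs = torch.zeros(4, 128, dtype=torch.long)   # single-segment input
print(bert_like(ids, segs).shape)              # torch.Size([4, 128, 768])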


The module contains three embedding layers: tok_embed maps each input token id to a vector, pos_embed maps each token's position in the sequence to a vector, and seg_embed maps each token's segment id to a vector.
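
As a quick sanity check (using the default arguments of the class above), each layer is just a lookup table whose first dimension is the number of distinct ids it can embed:

layer = Embedding(vocab_size=10, d_model=2, maxlen=512, n_segments=2)
print(layer.tok_embed.weight.shape)  # torch.Size([10, 2])  - one row per vocabulary id
print(layer.pos_embed.weight.shape)  # torch.Size([512, 2]) - one row per position
print(layer.seg_embed.weight.shape)  # torch.Size([2, 2])   - one row per segment id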


The forward method takes two inputs: x, a tensor of shape (batch_size, seq_len) containing the input token ids, and seg, a tensor of the same shape containing the segment id of each token. The method first builds a position index for every token and looks it up in the pos_embed layer, then combines the position embedding with the token embedding and the segment embedding by element-wise addition. The resulting embeddings are normalized with layer normalization and returned. (With the toy default d_model=2, LayerNorm maps every two-dimensional vector to approximately (-1, 1) or (1, -1), which is why the example output above consists almost entirely of ±1 values.)
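
The same computation can be reproduced step by step outside the module, which makes the element-wise addition and normalization explicit. This is a minimal sketch that reuses the layers of an Embedding instance defined above:

layer = Embedding()
x = torch.randint(0, 10, (2, 3))
seg = torch.randint(0, 2, (2, 3))

pos = torch.arange(x.size(1), dtype=torch.long, device=x.device)
pos = pos.unsqueeze(0).expand_as(x)

summed = layer.tok_embed(x) + layer.pos_embed(pos) + layer.seg_embed(seg)
manual = layer.norm(summed)
print(torch.allclose(manual, layer(x, seg)))  # True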

An explanation of ‘pos’

In the code above, pos is a tensor of shape (seq_len,) that holds the position of each token in the input sequence. It is created with torch.arange, which returns a 1-dimensional tensor containing the integers from 0 to seq_len - 1. The dtype argument sets the tensor's data type, torch.long in this case, and creating it on x.device keeps it on the same device as the input ids.
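
For example, with seq_len = 3:

pos = torch.arange(3, dtype=torch.long)
print(pos)        # tensor([0, 1, 2])
print(pos.shape)  # torch.Size([3])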


Next, unsqueeze(0) is called on pos, which adds a new dimension of size 1 at the front, turning the shape from (seq_len,) into (1, seq_len). This makes pos broadcast-compatible with the input tensor x, which has shape (batch_size, seq_len).
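
Continuing the seq_len = 3 example:

pos = torch.arange(3, dtype=torch.long)   # shape (3,)
pos = pos.unsqueeze(0)                    # shape (1, 3)
print(pos)        # tensor([[0, 1, 2]])
print(pos.shape)  # torch.Size([1, 3])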


Finally, expand_as(x) is called on pos, which returns a view whose singleton batch dimension is expanded to batch_size so that pos has the same shape as x; no data is copied. Each token's position index can then be looked up in pos_embed and added element-wise to the token and segment embeddings.
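
With batch_size = 2 and seq_len = 3, the expanded pos simply repeats the position indices for every sequence in the batch:

x = torch.randint(0, 10, (2, 3))          # batch_size=2, seq_len=3
pos = torch.arange(3, dtype=torch.long)
pos = pos.unsqueeze(0).expand_as(x)       # view of shape (2, 3), no data copied
print(pos)
# tensor([[0, 1, 2],
#         [0, 1, 2]])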
