PyTorch Word Embedding (word_embedding) 1: A First Look

csdn_1HAO

Category: PyTorch

Article link: https://blog.csdn.net/caomin1hao/article/details/106795809

torch.nn.Embedding(num_embeddings, embedding_dim, 
                   padding_idx=None, max_norm=None, 
                   norm_type=2, scale_grad_by_freq=False, 
                   sparse=False)

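Parameter meanings:

num_embeddings (int): size of the embedding dictionary, i.e. the number of distinct indices

embedding_dim (int): size of each embedding vector

padding_idx (int, optional): if given, the vector at this index is initialized to zeros and is not updated during training, so looking up this index yields a zero vector

max_norm (float, optional): if given, every embedding vector whose norm exceeds max_norm is renormalized so that its norm equals max_norm

norm_type (float, optional): the p of the p-norm computed for the max_norm option (default 2)

scale_grad_by_freq (bool, optional): if True, gradients are scaled by the inverse of the frequency of the words in the mini-batch (default False)

Note: when no pretrained word vectors are specified, the embedding automatically generates random word vectors (the weights are drawn from a standard normal distribution).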
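The effect of padding_idx and max_norm is easier to see on a tiny table. The following is a minimal sketch, not part of the original post: the table size (4 x 3), the choice of index 0 as the padding index, and the scaling by 10 are all assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only to make the random initialization repeatable

# 4 indices, 3-dimensional vectors; index 0 is reserved for padding
# (an assumption of this sketch).
emb = nn.Embedding(num_embeddings=4, embedding_dim=3, padding_idx=0)

# The row at padding_idx starts as zeros; the other rows are random.
pad_row = emb.weight[0].detach().clone()
print(pad_row)

# The padding row receives a zero gradient, so training never moves it.
ids = torch.tensor([0, 2, 0, 3])
emb(ids).sum().backward()
pad_grad = emb.weight.grad[0].clone()
print(pad_grad)

# With max_norm, any looked-up row whose norm exceeds the limit is
# rescaled in place during the forward pass.
clipped = nn.Embedding(4, 3, max_norm=1.0)
with torch.no_grad():
    clipped.weight.mul_(10.0)  # make the rows deliberately large
out = clipped(torch.tensor([1, 2]))
norms = out.norm(p=2, dim=1)
print(norms)  # each value is at most about 1.0
```

Note that max_norm only touches the rows that are actually looked up, and it modifies the weight tensor itself rather than just the returned output.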

Example:

import torch
import torch.nn as nn

# ----------- Define your data -----------
# Each word must be represented by an integer index, e.g. 'hello' is mapped to 0.
word2id = {'hello': 0, 'world': 1}

# ----------- Create the initial word vectors -----------
# nn.Embedding(vocab_size, embedding_dim):
# vocab_size is the number of words, embedding_dim is the length of each word vector.
# For 1000 words with 100-dimensional vectors, use nn.Embedding(1000, 100).
# Here there are 2 words, each with a 5-dimensional vector.
embeds = nn.Embedding(2, 5)


# ----------- Look up the vector of a single word -----------
# The embedding table is a learnable parameter of the network (embeds.weight).
# Indices must be passed as an integer tensor; torch.LongTensor([0]) builds one directly.
# Since PyTorch 0.4, Variable has been merged into Tensor, so the old
# torch.autograd.Variable wrapper is no longer needed.
hello_idx = torch.LongTensor([word2id['hello']])
print(hello_idx)  # tensor([0])


# Get the initial (randomly initialized) word vector for 'hello'.
hello_embed = embeds(hello_idx)
print(hello_embed)
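
The note in the parameter list says the vectors are random unless pretrained ones are supplied. As a hedged sketch of the other case, the snippet below loads a small table through nn.Embedding.from_pretrained and looks up a whole sentence in one call. The numeric values in `pretrained` and the three-word `sentence` are invented for illustration and do not come from any real model.

```python
import torch
import torch.nn as nn

word2id = {'hello': 0, 'world': 1}

# Made-up "pretrained" vectors, one row per word id (illustrative values only).
pretrained = torch.tensor([[0.1, 0.2, 0.3],
                           [0.4, 0.5, 0.6]])

# freeze=True (the default) keeps the vectors fixed during training.
embeds = nn.Embedding.from_pretrained(pretrained, freeze=True)

# Look up several words at once:
# indices of shape (seq_len,) give vectors of shape (seq_len, embedding_dim).
sentence = ['hello', 'world', 'hello']
ids = torch.tensor([word2id[w] for w in sentence], dtype=torch.long)
vectors = embeds(ids)

print(vectors.shape)                # torch.Size([3, 3])
print(embeds.weight.requires_grad)  # False
```

Passing freeze=False instead would let the loaded vectors be fine-tuned along with the rest of the network.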