Word-embedding matrix; pretrained vectors such as word2vec or GloVe can be loaded into it.
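For example, a pretrained matrix whose rows come from a word2vec or GloVe file can be wrapped with nn.Embedding.from_pretrained. A minimal sketch; the weight values here are made up for illustration:

>>> import torch
>>> import torch.nn as nn
>>> # hypothetical pretrained rows, e.g. parsed from a GloVe text file
>>> weights = torch.tensor([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
>>> # freeze=True (the default) keeps the loaded vectors fixed during training
>>> embedding = nn.Embedding.from_pretrained(weights, freeze=True)
>>> embedding(torch.LongTensor([1]))
tensor([[0.4000, 0.5000, 0.6000]])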
API
class torch.nn.Embedding(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[torch.Tensor] = None)
Parameter | Description |
---|---|
num_embeddings (int) | size of the dictionary of embeddings |
embedding_dim (int) | the size of each embedding vector |
padding_idx (int, optional) | If given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index. |
max_norm (float, optional) | If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm. |
norm_type (float, optional) | The p of the p-norm to compute for the max_norm option. Default 2. |
scale_grad_by_freq (boolean, optional) | If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default False. |
sparse (bool, optional) | If True, gradient w.r.t. weight matrix will be a sparse tensor. See Notes for more details regarding sparse gradients. |
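Before the full example below, a quick sketch of the two most commonly used options, padding_idx and max_norm (the embedding weights are random, so only deterministic values are shown):

>>> # the row at padding_idx is initialized to zeros and receives no gradient updates
>>> emb = nn.Embedding(10, 3, padding_idx=0)
>>> emb.weight.data[0]
tensor([0., 0., 0.])
>>> # with max_norm, every vector returned by a lookup is renormalized to 2-norm <= 1.0
>>> emb = nn.Embedding(10, 3, max_norm=1.0)
>>> emb(torch.LongTensor([1, 2, 3])).norm(dim=1) <= 1.0
tensor([True, True, True])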
>>> # an Embedding module containing 10 tensors of size 3
>>> embedding = nn.Embedding(10, 3) # a vocabulary of 10 words, each mapped to a 3-dimensional vector
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
>>> embedding(input)
tensor([[[-0.0251, -1.6902,  0.7172],
         [-0.6431,  0.0748,  0.6969],
         [ 1.4970,  1.3448, -0.9685],
         [-0.3677, -2.7265, -0.1685]],

        [[ 1.4970,  1.3448, -0.9685],
         [ 0.4362, -0.4004,  0.9400],
         [-0.6431,  0.0748,  0.6969],
         [ 0.9124, -2.3616,  1.1151]]])
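When the vocabulary is large, sparse=True stores the gradient of the weight matrix as a sparse tensor, so only the looked-up rows are updated. A minimal sketch; note that only a few optimizers (e.g. optim.SGD, optim.SparseAdam) accept sparse gradients:

>>> emb = nn.Embedding(10000, 128, sparse=True)
>>> optimizer = torch.optim.SparseAdam(emb.parameters())
>>> emb(torch.LongTensor([1, 5, 7])).sum().backward()
>>> emb.weight.grad.is_sparse
True
>>> optimizer.step()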
References:
https://pytorch.org/docs/master/generated/torch.nn.Embedding.html?highlight=nn%20embedding#torch.nn.Embedding
https://www.cnblogs.com/lindaxin/p/7991436.html