nn.Embedding, nn.LSTM, and nn.Linear in PyTorch


Building an LSTM network with PyTorch is straightforward; at its most basic it needs three components: nn.Embedding, nn.LSTM, and nn.Linear.

The basic framework is:

import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMModel(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMModel, self).__init__()
        self.hidden_dim = hidden_dim

        # vocab_size is the size of the vocabulary being used
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # the LSTM takes the word embeddings as input and outputs vectors of size hidden_dim
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)

        # nn.Linear maps the LSTM output into the target (tag) space
        self.linear = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        linear_out = self.linear(lstm_out.view(len(sentence), -1))

        # compute a score with your scoring function; log-softmax over the tag scores is one common choice
        score = F.log_softmax(linear_out, dim=1)

        return score
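To see how the pieces fit together, here is a hypothetical call to the class above (all sizes are made up):

>>> model = LSTMModel(embedding_dim=6, hidden_dim=8, vocab_size=10, tagset_size=3)
>>> sentence = torch.LongTensor([1, 4, 2, 7])   # a sentence of 4 word indices
>>> model(sentence).shape                       # one score vector per word
torch.Size([4, 3])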

(You can skip straight to the summary at the end.)

CLASS torch.nn.Embedding(num_embeddings: int, embedding_dim: int, padding_idx: 
      Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, 
      scale_grad_by_freq: bool = False, sparse: bool = False, _weight:  
      Optional[torch.Tensor] = None)

First, the definition from the official PyTorch documentation:

A simple lookup table that stores embeddings of a fixed dictionary and size.

This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.

Next, the parameters needed to initialize the module:

Parameters

  • num_embeddings (int) – size of the dictionary of embeddings. This is the size of the vocabulary being used.

  • embedding_dim (int) – the size of each embedding vector, i.e. the dimension of each word embedding.

  • The remaining parameters are optional:

  • padding_idx (int, optional) – If given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index. This specifies which index is used for padding; its embedding is initialized to zeros.

  • max_norm (float, optional) – If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm (a small sketch follows this list).

  • norm_type (float, optional) – The p of the p-norm to compute for the max_norm option. Default 2.

  • scale_grad_by_freq (boolean, optional) – If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default False.

  • sparse (bool, optional) – If True, gradient w.r.t. weight matrix will be a sparse tensor. See Notes for more details regarding sparse gradients.
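A small sketch of the max_norm behaviour mentioned above (the renormalization happens when the vectors are looked up in the forward pass; sizes here are made up, and the imports from the code block at the top are assumed):

>>> embedding = nn.Embedding(10, 3, max_norm=1.0)
>>> vecs = embedding(torch.LongTensor([1, 2, 3]))
>>> vecs.norm(dim=-1)   # no entry exceeds 1.0; any vector that was longer has been rescaled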

Variables

    ~Embedding.weight (Tensor) – the learnable weights of the module of shape (num_embeddings, embedding_dim), initialized from N(0, 1)
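The weight matrix is directly accessible, and it can also be filled from pretrained vectors via nn.Embedding.from_pretrained; a short sketch (the 10x3 matrix below is just random, standing in for real pretrained vectors):

>>> embedding = nn.Embedding(10, 3)
>>> embedding.weight.shape
torch.Size([10, 3])
>>> pretrained = torch.randn(10, 3)                        # stand-in for real pretrained vectors
>>> embedding2 = nn.Embedding.from_pretrained(pretrained)  # weights are frozen by default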

 

Shape:

  • Input: (*), LongTensor of arbitrary shape containing the indices to extract. It must be a LongTensor; any shape is allowed.

  • Output: (*, H), where * is the input shape and H=embedding_dim

 

Examples:

>>> # an Embedding module containing 10 tensors of size 3
>>> embedding = nn.Embedding(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
>>> embedding(input)

tensor([[[-0.0251, -1.6902,  0.7172],  # 1
         [-0.6431,  0.0748,  0.6969],  # 2
         [ 1.4970,  1.3448, -0.9685],  # 4
         [-0.3677, -2.7265, -0.1685]], # 5

        [[ 1.4970,  1.3448, -0.9685],  # 4
         [ 0.4362, -0.4004,  0.9400],  # 3
         [-0.6431,  0.0748,  0.6969],  # 2
         [ 0.9124, -2.3616,  1.1151]]]) # 9


>>> # example with padding_idx
>>> embedding = nn.Embedding(10, 3, padding_idx=0)
>>> input = torch.LongTensor([[0,2,0,5]])
>>> embedding(input)

tensor([[[ 0.0000,  0.0000,  0.0000],    # 0
         [ 0.1535, -2.0309,  0.9315],    # 2
         [ 0.0000,  0.0000,  0.0000],    # 0
         [-0.1655,  0.9897,  0.0635]]])  # 5

The module above is initialized to hold 10 embedding vectors, each of size 3.

The input has size (2, 4): 2 is the batch_size, i.e. how many sequences there are, and 4 is the size of each sequence – (batch_size, sequence_length).

The output has size (2, 4, 3), where 3 is embedding_dim, the length of each embedding vector.
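A quick way to confirm those shapes (repeating the first example and checking .shape):

>>> embedding = nn.Embedding(10, 3)
>>> input = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
>>> embedding(input).shape
torch.Size([2, 4, 3])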

 

 

CLASS torch.nn.LSTM(*args, **kwargs)

I will not repeat the introduction to LSTMs here; see https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Parameters

  • input_size – The number of expected features in the input x. This is the length of the feature vector of each input element, and corresponds to embedding_dim in nn.Embedding.

  • hidden_size – The number of features in the hidden state h  

  • num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1

  • bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True

  • batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False (a shape sketch follows this list)

  • dropout – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0

  • bidirectional – If True, becomes a bidirectional LSTM. Default: False
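Since batch_first changes the expected tensor layout, here is a minimal shape sketch with made-up sizes:

>>> rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
>>> input = torch.randn(3, 5, 10)   # (batch, seq, feature) instead of (seq, batch, feature)
>>> output, (hn, cn) = rnn(input)
>>> output.shape
torch.Size([3, 5, 20])
>>> hn.shape   # h_n and c_n keep the shape (num_layers, batch, hidden_size) regardless of batch_first
torch.Size([2, 3, 20])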

 

Inputs: input, (h_0, c_0)

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details. This is the main input shape, (seq_len, batch, input_size); note that batch sits in the second position (a packed-sequence sketch follows this list).

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. If the LSTM is bidirectional, num_directions should be 2, else it should be 1.

  • c_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial cell state for each element in the batch.

    If (h_0, c_0) is not provided, both h_0 and c_0 default to zero. In other words, if you do not initialize them manually, they are automatically set to zeros.
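As mentioned in the input description above, variable-length batches are usually handled with pack_padded_sequence; a minimal sketch (the lengths 5 and 3 are made up):

>>> from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
>>> rnn = nn.LSTM(input_size=10, hidden_size=20)
>>> padded = torch.randn(5, 2, 10)                 # (seq_len, batch, input_size); the second sequence is padded
>>> packed = pack_padded_sequence(padded, lengths=[5, 3])
>>> packed_out, (hn, cn) = rnn(packed)             # h_0 and c_0 are omitted, so they default to zeros
>>> output, lengths = pad_packed_sequence(packed_out)
>>> output.shape
torch.Size([5, 2, 20])
>>> lengths
tensor([5, 3])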

Outputs: output, (h_n, c_n)

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.

    For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len.

    Like output, the layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size) and similarly for c_n.

  • c_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t = seq_len.

Examples:

>>> rnn = nn.LSTM(10, 20, 2)  # input_size: 10, hidden_size: 20, num_layers: 2
>>> input = torch.randn(5, 3, 10)  # sequence length: 5, batch size: 3, input size: 10
>>> h0 = torch.randn(2, 3, 20)
>>> c0 = torch.randn(2, 3, 20)
>>> output, (hn, cn) = rnn(input, (h0, c0))
>>> output.size()

torch.Size([5, 3, 20])

The LSTM is initialized with input_size 10, hidden_size 20, and num_layers 2.

The input has shape (5, 3, 10): sequence length 5, batch size 3, input size 10.

The output has shape (5, 3, 20): sequence length 5, batch size 3, output size 20, where 20 = num_directions (1 or 2) * hidden_size (here num_directions is 1).
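With bidirectional=True the last dimension of output doubles, and the directions can be split out with view as described above; a sketch with the same sizes:

>>> rnn = nn.LSTM(10, 20, 2, bidirectional=True)
>>> input = torch.randn(5, 3, 10)
>>> output, (hn, cn) = rnn(input)
>>> output.shape   # num_directions * hidden_size = 2 * 20
torch.Size([5, 3, 40])
>>> hn.shape       # num_layers * num_directions = 2 * 2
torch.Size([4, 3, 20])
>>> output.view(5, 3, 2, 20).shape   # separate the forward and backward directions
torch.Size([5, 3, 2, 20])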

 

 

CLASS torch.nn.Linear(in_features: int, out_features: int, bias: bool = True)

Parameters

  • in_features – size of each input sample, i.e. the number of features in each input

  • out_features – size of each output sample, i.e. the number of features in each output

  • bias – If set to False, the layer will not learn an additive bias. Default: True

Shape:

  • Input: (N, *, H_in) where * means any number of additional dimensions and H_in = in_features. This is the input size.

  • Output: (N, *, H_out) where all but the last dimension are the same shape as the input and H_out = out_features. This is the output size.

>>> m = nn.Linear(20, 30)
>>> input = torch.randn(128, 20)
>>> output = m(input)  # the input has 128 samples, each of length 20
>>> print(output.size())  # the output likewise has 128 samples, each of length 30
torch.Size([128, 30])
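The example above uses a 2-D input, but since Linear only transforms the last dimension, a 3-D tensor such as an LSTM output works the same way; a short sketch:

>>> m = nn.Linear(20, 30)
>>> lstm_out = torch.randn(5, 3, 20)   # e.g. (seq_len, batch, hidden_size) from an LSTM
>>> m(lstm_out).shape                  # only the last dimension changes
torch.Size([5, 3, 30])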

 

Summary:

nn.Embedding:

Initialization needs num_embeddings, the number of words in the vocabulary being used, and embedding_dim, the length of the generated embedding vectors.

Input: (batch_size, sequence_size)

Output: (batch_size, sequence_size, embedding_dim); after reshaping, this can be fed directly into the LSTM.

 

nn.LSTM:

Initialization needs input_size, the length of the feature vector of each input element (corresponding to embedding_dim in the embedding), hidden_size, the length of the hidden_state vector, and num_layers, the number of stacked LSTM layers.

Input: (sequence_len, batch_size, input_size)

Output: (sequence_len, batch_size, num_directions * hidden_size)

 

nn.Linear

Initialization needs in_features and out_features.

Input: (batch_size, in_features)

Output: (batch_size, out_features)
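Putting the three parts together, a minimal end-to-end shape trace (all sizes are made up; permute moves the batch dimension because nn.Embedding outputs batch-first while nn.LSTM, with the default batch_first=False, expects (seq_len, batch, ...)):

>>> embedding = nn.Embedding(num_embeddings=10, embedding_dim=6)
>>> lstm = nn.LSTM(input_size=6, hidden_size=8)
>>> linear = nn.Linear(in_features=8, out_features=3)
>>> batch = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])   # (batch_size=2, sequence_size=4)
>>> embeds = embedding(batch)                                # (2, 4, 6)
>>> lstm_in = embeds.permute(1, 0, 2)                        # (4, 2, 6) = (seq_len, batch, input_size)
>>> lstm_out, _ = lstm(lstm_in)                              # (4, 2, 8)
>>> linear(lstm_out).shape                                   # (4, 2, 3)
torch.Size([4, 2, 3])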
