Using LSTM in torch.nn

就是一顿骚操作

Last modified on 2022-10-10 11:00:46

Tags: pytorch lstm

First published on 2022-10-09 18:27:58

Article link: https://blog.csdn.net/weixin_36893273/article/details/127230121

I. How LSTM is implemented in PyTorch

For each element of the input sequence, each layer computes the following functions:
$i_t = \sigma(W_{ii}x_t + b_{ii} + W_{hi}h_{t-1} + b_{hi})$
$f_t = \sigma(W_{if}x_t + b_{if} + W_{hf}h_{t-1} + b_{hf})$
$o_t = \sigma(W_{io}x_t + b_{io} + W_{ho}h_{t-1} + b_{ho})$
$g_t = \tanh(W_{ig}x_t + b_{ig} + W_{hg}h_{t-1} + b_{hg})$
$c_t = f_t \odot c_{t-1} + i_t \odot g_t$
$h_t = o_t \odot \tanh(c_t)$
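The symbols are defined as follows:

$h_t$: the hidden state at time step t
$c_t$: the cell state at time step t
$x_t$: the input at time step t
$h_{t-1}$: the hidden state at the previous time step, or the initial hidden state when t is the first time step
$i_t$, $f_t$, $g_t$, $o_t$: the input gate, forget gate, cell (candidate) gate, and output gate
$\sigma$: the sigmoid function
$\odot$: the Hadamard (element-wise) product, not a dot product
In a multi-layer LSTM, the input $x_t^{(l)}$ of layer $l$ ($l \ge 2$) is the hidden state $h_t^{(l-1)}$ of the previous layer at the same time step, with dropout applied.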
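To make the gate equations concrete, here is a minimal sketch that recomputes a single-layer, unidirectional LSTM by hand and compares it with nn.LSTM. It assumes PyTorch's documented parameter layout, in which weight_ih_l0 and weight_hh_l0 stack the input, forget, cell, and output gate weights in that order. The sizes and variable names are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 4, 3
lstm = nn.LSTM(input_size, hidden_size)  # one layer, unidirectional

x = torch.randn(5, 1, input_size)  # (L, N, H_in)
h = torch.zeros(1, hidden_size)    # initial hidden state for the manual loop
c = torch.zeros(1, hidden_size)    # initial cell state for the manual loop

# Stacked gate parameters, assumed order along dim 0: i, f, g, o
W_ih, W_hh = lstm.weight_ih_l0, lstm.weight_hh_l0
b_ih, b_hh = lstm.bias_ih_l0, lstm.bias_hh_l0

manual_steps = []
with torch.no_grad():
    for x_t in x:  # x_t has shape (N, H_in)
        gates = x_t @ W_ih.T + b_ih + h @ W_hh.T + b_hh  # (N, 4 * hidden_size)
        i, f, g, o = gates.chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g          # c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
        h = o * torch.tanh(c)      # h_t = o_t ⊙ tanh(c_t)
        manual_steps.append(h)
    manual_output = torch.stack(manual_steps)  # (L, N, hidden_size)
    ref_output, (h_n, c_n) = lstm(x)           # states default to zeros

print(torch.allclose(manual_output, ref_output, atol=1e-5))
```

If the assumed gate order holds, the manual loop and nn.LSTM should agree up to floating-point tolerance.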

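II. Parameters

1. Constructor parameters

input_size: the number of features in the input x
hidden_size: the number of features in the hidden state
num_layers: the number of stacked LSTM layers. With num_layers=2, two LSTMs are stacked, and the second one takes the output of the first and computes the final result. Default: 1
bias: if False, the layer does not use the bias weights b_ih and b_hh. Default: True
batch_first: if True, the input and output tensors have shape (batch, seq, feature) instead of (seq, batch, feature). Note: this flag applies only to the input and output tensors, not to the hidden state or cell state. Default: False
dropout: if non-zero, adds a dropout layer on the outputs of every LSTM layer except the last, with dropout probability equal to this value. It only has an effect in a multi-layer LSTM. Default: 0
bidirectional: if True, the LSTM is bidirectional. Default: False
proj_size: if > 0, uses an LSTM with projections of the corresponding size. Default: 0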
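As a quick illustration of how these constructor arguments interact, the sketch below builds a two-layer bidirectional LSTM with batch_first=True and a projection. The concrete sizes are arbitrary choices for the example, and proj_size assumes a PyTorch version that supports it (1.8 or later).

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(
    input_size=10,
    hidden_size=20,
    num_layers=2,
    batch_first=True,    # input/output are (N, L, feature)
    bidirectional=True,  # D = 2
    proj_size=8,         # hidden state is projected from 20 down to 8
)

x = torch.randn(3, 5, 10)  # (N=3, L=5, H_in=10)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([3, 5, 16]) -> (N, L, D * proj_size)
print(h_n.shape)     # torch.Size([4, 3, 8])  -> (D * num_layers, N, proj_size)
print(c_n.shape)     # torch.Size([4, 3, 20]) -> (D * num_layers, N, hidden_size)
```

Note that h_n and c_n keep the batch in the second dimension even though batch_first=True, and that only the hidden state, not the cell state, takes the projected size.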

2. Inputs to forward

input: shape $(L, H_{in})$ for unbatched input; $(L, N, H_{in})$ when batch_first=False; $(N, L, H_{in})$ when batch_first=True. The input can also be a packed variable-length sequence; see torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence().
h_0: shape $(D * \text{num\_layers}, H_{out})$ for unbatched input, or $(D * \text{num\_layers}, N, H_{out})$ for batched input. This is the initial hidden state; it defaults to zeros if (h_0, c_0) is not provided.
c_0: shape $(D * \text{num\_layers}, H_{cell})$ for unbatched input, or $(D * \text{num\_layers}, N, H_{cell})$ for batched input. This is the initial cell state; it defaults to zeros if (h_0, c_0) is not provided.
where:
N = batch size
L = sequence length
D = 2 if bidirectional=True, otherwise 1
$H_{in}$ = input_size
$H_{cell}$ = hidden_size
$H_{out}$ = proj_size if proj_size > 0, otherwise hidden_size
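The following sketch checks the statement that the initial states default to zeros: it runs the same input once without (h_0, c_0) and once with explicit zero tensors of shape (D * num_layers, N, hidden_size). The sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
L, N, H_in, H = 5, 3, 10, 20
lstm = nn.LSTM(input_size=H_in, hidden_size=H, num_layers=2, bidirectional=True)
D = 2  # bidirectional

x = torch.randn(L, N, H_in)                  # batch_first=False layout
h0 = torch.zeros(D * lstm.num_layers, N, H)  # (D * num_layers, N, H_out)
c0 = torch.zeros(D * lstm.num_layers, N, H)  # (D * num_layers, N, H_cell)

out_default, (hn_default, cn_default) = lstm(x)          # states omitted
out_explicit, (hn_explicit, cn_explicit) = lstm(x, (h0, c0))

same_output = torch.allclose(out_default, out_explicit)
print(same_output)
```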

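3. Outputs

output: shape $(L, D * H_{out})$ for unbatched input; $(L, N, D * H_{out})$ when batch_first=False; $(N, L, D * H_{out})$ when batch_first=True. It contains the output features $h_t$ of the last LSTM layer for every time step t. If a torch.nn.utils.rnn.PackedSequence was given as input, the output is also a packed sequence.
h_n: shape $(D * \text{num\_layers}, H_{out})$ for unbatched input, or $(D * \text{num\_layers}, N, H_{out})$ for batched input. It contains the final hidden state of each sequence in the batch. When bidirectional=True, h_n contains the final forward and the final reverse hidden states, concatenated along the first dimension.
c_n: shape $(D * \text{num\_layers}, H_{cell})$ for unbatched input, or $(D * \text{num\_layers}, N, H_{cell})$ for batched input. It contains the final cell state of each sequence in the batch.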
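The relationship between output and the final hidden state in the bidirectional case can be checked with a short sketch. It assumes the documented ordering of the first dimension of the final hidden state (layer by layer, forward direction before reverse), so the last two entries belong to the last layer. Sizes are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
H = 20
lstm = nn.LSTM(input_size=10, hidden_size=H, num_layers=2, bidirectional=True)

x = torch.randn(5, 3, 10)  # (L, N, H_in)
output, (h_n, c_n) = lstm(x)

# Forward direction finishes at the last time step:
# its final state is the first half of output[-1].
fwd_match = torch.allclose(output[-1, :, :H], h_n[-2])

# Reverse direction finishes at the first time step:
# its final state is the second half of output[0].
bwd_match = torch.allclose(output[0, :, H:], h_n[-1])

print(fwd_match, bwd_match)
```

This is why taking output[-1] alone as a "sentence vector" in a bidirectional model mixes a fully processed forward state with a reverse state that has seen only one time step.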

III. Examples

1)

import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)  # input_size, hidden_size, num_layers
input = torch.randn(5, 3, 10)  # (sequence length, batch size, input_size)
h0 = torch.randn(2, 3, 20)  # (D * num_layers, batch size, hidden_size)
c0 = torch.randn(2, 3, 20)  # (D * num_layers, batch size, hidden_size)
output, (hn, cn) = rnn(input, (h0, c0))
print(output.shape)  # torch.Size([5, 3, 20]): (sequence length, batch size, D * hidden_size)
print(hn.shape)  # torch.Size([2, 3, 20]): (D * num_layers, batch size, hidden_size)
# Inspect the returned cell state cn (the original printed the input c0 here)
print(cn.shape)  # torch.Size([2, 3, 20]): (D * num_layers, batch size, hidden_size)

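2)

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

rnn = nn.LSTM(input_size=1, hidden_size=20, num_layers=2)
# Three sequences with true lengths 2, 1 and 3, zero-padded to length 3
input = torch.tensor([[1, 2, 0], [3, 0, 0], [4, 5, 6]], dtype=torch.float)
lens = [2, 1, 3]
# Add a feature dimension: torch.Size([3, 3, 1]), i.e. batch size=3, sequence length=3, input_size=1
input = input.unsqueeze(2)
print(input)
# tensor([[[1.],
#          [2.],
#          [0.]],
#
#         [[3.],
#          [0.],
#          [0.]],
#
#         [[4.],
#          [5.],
#          [6.]]])
# The first dimension is the batch, so pass batch_first=True;
# enforce_sorted=False because lens is not sorted in decreasing order
packed_seq = pack_padded_sequence(input, lens, batch_first=True, enforce_sorted=False)
# Feed the packed sequence without initial hidden and cell states (they default to zeros)
output, (hn, cn) = rnn(packed_seq)
# Inverse operation: unpack back into a padded tensor plus the sequence lengths
output, out_lens = pad_packed_sequence(output, batch_first=True)
print(output.shape)  # torch.Size([3, 3, 20])
print(out_lens)  # tensor([2, 1, 3])
print(hn.shape)  # torch.Size([2, 3, 20])
print(cn.shape)  # torch.Size([2, 3, 20])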
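One practical consequence of packing is worth checking: with a packed input, hn should hold the hidden state at each sequence's true last step rather than at the padded last position, and the padded positions of the unpacked output should be filled with zeros. The sketch below verifies both under the same setup as example 2. The variable names are illustrative, and the zero padding relies on the default padding_value of pad_packed_sequence.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
rnn = nn.LSTM(input_size=1, hidden_size=20, num_layers=2)
x = torch.tensor([[1, 2, 0], [3, 0, 0], [4, 5, 6]], dtype=torch.float).unsqueeze(2)
lens = torch.tensor([2, 1, 3])  # true lengths, kept on the CPU

packed = pack_padded_sequence(x, lens, batch_first=True, enforce_sorted=False)
packed_out, (hn, cn) = rnn(packed)
output, out_lens = pad_packed_sequence(packed_out, batch_first=True)

# Pick, for every sequence, the output at its last valid time step
batch_idx = torch.arange(x.size(0))
last_valid = output[batch_idx, out_lens - 1]  # (N, hidden_size)

# hn[-1] is the final hidden state of the last layer, in the original batch order
matches_hn = torch.allclose(last_valid, hn[-1])

# Positions beyond each true length are padding in the unpacked output
padding_is_zero = bool((output[0, 2:] == 0).all() and (output[1, 1:] == 0).all())

print(matches_hn, padding_is_zero)
```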