1. nn.LSTM()参数
input_size: The number of expected features in the input `x`
hidden_size: The number of features in the hidden state `h`
num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
would mean stacking two LSTMs together to form a `stacked LSTM`,
with the second LSTM taking in outputs of the first LSTM and
computing the final results. Default: 1
bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
Default: ``True``
batch_first: If ``True``, then the input and output tensors are provided
as (batch, seq, feature). Default: ``False``
dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
LSTM layer except the last layer, with dropout probability equal to
:attr:`dropout`. Default: 0
bidirectional: If ``True``, becomes a bidirectional LSTM. Default: ``False``
LSTM总共有7个参数:前面3个是必须输入的
- input_size:输入特征维数,即每一行输入元素的个数。输入是一维向量。如:[1,2,3,4,5,6,7,8,9],input_size 就是9
- hidden_size: 隐藏层状态的维数,即隐藏层节点的个数,这个和单层感知器的结构是类似的。这个维数值是自定义的,根据具体业务需要决定
- num_layers: LSTM 堆叠的层数,默认值是1层,如果设置为2,第二个LSTM接收第一个LSTM的计算结果。也就是第一层输入 [ X0 X1 X2 … Xt],计算出 [ h0 h1 h2 … ht ],第二层将 [ h0 h1 h2 … ht ] 作为 [ X0 X1 X2 … Xt] 输入再次计算,输出最后的 [ h0 h1 h2 … ht ]。
2. 官方给出的例子
# 首先导入LSTM需要的相关模块
import torch
import torch.nn as nn # 神经网络模块
# 数据向量维数10, 隐藏元维度20, 2个LSTM层串联(如果是1,可以省略,默认为1)
rnn = nn.LSTM(10, 20, 2)
# 序列长度seq_len=5, batch_size=3, 数据向量维数=10
input = torch.randn(5, 3, 10)
# 初始化的隐藏元和记忆元,通常它们的维度是一样的
# 2个LSTM层,batch_size=3,隐藏元维度20
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
# 这里有2层lstm,output是最后一层lstm的每个词向量对应隐藏层的输出,其与层数无关,只与序列长度相关
# hn,cn是所有层最后一个隐藏元和记忆元的输出
output, (hn, cn) = rnn(input, (h0, c0))
print(output.size(),hn.size(),cn.size())
torch.Size([5, 3, 20]) torch.Size([2, 3, 20]) torch.Size([2, 3, 20])
3. 实现RNNCell
import torch as t
from torch import nn
from torch.autograd import Variable as V
t.manual_seed(1000) #作用:每次得到的1000个随机数是固定的
input = V(t.randn(2,3,4))
# 一个LSTMCell对应的层数只能是一层
lstm = nn.LSTMCell(4,3)
hx = V(t.randn(3,3))
cx = V(t.randn(3,3))
out = []
for i in input:
hx,cx = lstm(i,(hx,cx))
out.append(hx)
print(t.stack(out))
实验结果:
tensor([[[-0.3610, -0.1643, 0.1631],
[-0.0613, -0.4937, -0.1642],
[ 0.5080, -0.4175, 0.2502]],
[[-0.0703, -0.0393, -0.0429],
[ 0.2085, -0.3005, -0.2686],
[ 0.1482, -0.4728, 0.1425]]], grad_fn=<StackBackward>)