LSTM input and output dimensions
CLASS torch.nn.LSTM(*args, **kwargs)
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
For each element in the input sequence, each layer computes the following function:
i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})
f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf})
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg})
o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho})
c_t = f_t \odot c_{t-1} + i_t \odot g_t
h_t = o_t \odot \tanh(c_t)
where:
- h_t: the hidden state at time step t
- c_t: the cell state at time step t
- x_t: the input at time step t
- h_{t-1}: the hidden state at time step t-1, or the initial hidden state (at time step 0)
- i_t, f_t, g_t, o_t: the input, forget, cell, and output gates, respectively
- \sigma: the sigmoid function
- \odot: the Hadamard (element-wise) product
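To make the equations concrete, here is a minimal sketch that computes one LSTM step by hand and checks it against `nn.LSTMCell`. It relies on the documented PyTorch weight layout, where the four gate weights are packed as `[W_ii | W_if | W_ig | W_io]` along the first dimension; the tensor sizes here are arbitrary illustration values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 4, 3
cell = nn.LSTMCell(input_size, hidden_size)

x_t = torch.rand(1, input_size)        # x_t: input at time step t
h_prev = torch.zeros(1, hidden_size)   # h_{t-1}
c_prev = torch.zeros(1, hidden_size)   # c_{t-1}

# PyTorch packs the gate weights as [W_ii | W_if | W_ig | W_io] along dim 0
gates = (x_t @ cell.weight_ih.t() + cell.bias_ih
         + h_prev @ cell.weight_hh.t() + cell.bias_hh)
i, f, g, o = gates.chunk(4, dim=1)
i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
g = torch.tanh(g)

c_t = f * c_prev + i * g               # c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
h_t = o * torch.tanh(c_t)              # h_t = o_t ⊙ tanh(c_t)

h_ref, c_ref = cell(x_t, (h_prev, c_prev))
print(torch.allclose(h_t, h_ref, atol=1e-6))  # True
print(torch.allclose(c_t, c_ref, atol=1e-6))  # True
```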
Constructor arguments:
- input_size: dimensionality of the input features
- hidden_size: dimensionality of the hidden state h
- num_layers: number of stacked LSTM layers; default 1
- bias: whether to use bias terms; default True
- batch_first: if True, the input is shaped (batch, seq_len, input_size); default False, i.e. (seq_len, batch, input_size)
- bidirectional: whether the LSTM is bidirectional; default False
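A small sketch of the `batch_first` argument, with arbitrary example sizes. Note that `batch_first` only reorders `input` and `output`; the hidden and cell states keep the (num_layers * num_directions, batch, hidden_size) layout either way.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=100, hidden_size=20, batch_first=True)
x = torch.rand(50, 5, 100)  # (batch, seq_len, input_size) when batch_first=True
output, (h_n, c_n) = lstm(x)

print(output.size())  # torch.Size([50, 5, 20]) -- batch comes first in output too
print(h_n.size())     # torch.Size([1, 50, 20]) -- h_n is NOT batch first
```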
Inputs
Inputs: input, (h_0, c_0)
- input: shape (seq_len, batch, input_size), i.e. (number of tokens in the sentence, batch size, length of each token vector)
- h_0: shape (num_layers * num_directions, batch, hidden_size), i.e. (number of layers * number of LSTM directions (1 or 2), batch size, hidden vector dimension)
- c_0: shape (num_layers * num_directions, batch, hidden_size), same layout as h_0
- If (h_0, c_0) is not provided, both h_0 and c_0 default to all zeros.
Outputs
Outputs: output, (h_n, c_n)
- output: shape (seq_len, batch, num_directions * hidden_size), i.e. (number of tokens in the sentence, batch size, number of LSTM directions * hidden vector dimension)
- h_n: shape (num_layers * num_directions, batch, hidden_size)
- c_n: shape (num_layers * num_directions, batch, hidden_size)
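One useful relation between these two outputs: `output` holds the top layer's hidden state at every time step, while `h_n` holds the final hidden state of every layer. For a unidirectional LSTM, the last time step of `output` therefore equals the top layer's entry in `h_n`. A quick sketch (example sizes are arbitrary):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=100, hidden_size=20, num_layers=2)
x = torch.rand(5, 50, 100)  # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

# output[-1]: top layer's hidden state at the last time step
# h_n[-1]:    final hidden state of the top (last) layer
print(torch.allclose(output[-1], h_n[-1]))  # True
```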
Examples
- num_layers = 2
import torch
import torch.nn as nn

x = torch.rand(5, 50, 100)  # (seq_len, batch, input_size)
lstm = nn.LSTM(100, 20, num_layers=2)
output, (hidden, cell) = lstm(x)
print("output size:{} \nhidden size:{} \ncell size:{}".format(output.size(), hidden.size(), cell.size()))
Output:
output size:torch.Size([5, 50, 20])
hidden size:torch.Size([2, 50, 20])
cell size:torch.Size([2, 50, 20])
- bidirectional = True
import torch
import torch.nn as nn

x = torch.rand(5, 50, 100)  # (seq_len, batch, input_size)
lstm = nn.LSTM(100, 20, bidirectional=True)
output, (hidden, cell) = lstm(x)
print("output size:{} \nhidden size:{} \ncell size:{}".format(output.size(), hidden.size(), cell.size()))
Output:
output size:torch.Size([5, 50, 40])
hidden size:torch.Size([2, 50, 20])
cell size:torch.Size([2, 50, 20])
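In the bidirectional case, `output` concatenates the two directions along the last dimension (hence the size 40 above): the first `hidden_size` columns are the forward pass and the rest are the backward pass. The forward direction finishes at the last time step, while the backward direction finishes at the first. A sketch checking this (example sizes are arbitrary):

```python
import torch
import torch.nn as nn

hidden_size = 20
lstm = nn.LSTM(100, hidden_size, bidirectional=True)
x = torch.rand(5, 50, 100)  # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

# forward final state: last time step, first half of the feature dim
print(torch.allclose(output[-1, :, :hidden_size], h_n[0]))  # True
# backward final state: first time step, second half of the feature dim
print(torch.allclose(output[0, :, hidden_size:], h_n[1]))   # True
```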