nn.LSTM(*args, **kwargs)
- Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. Note that this module can be multi-layer.
- For each element in the input sequence, each layer computes the following function:
  .. math::

      \begin{array}{ll}
      i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
      f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\
      g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\
      o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\
      c_t = f_t * c_{(t-1)} + i_t * g_t \\
      h_t = o_t * \tanh(c_t) \\
      \end{array}
  where :math:`h_t` is the hidden state at time `t`, :math:`c_t` is the cell
  state at time `t`, :math:`x_t` is the input at time `t`, :math:`h_{(t-1)}`
  is the hidden state of the layer at time `t-1` or the initial hidden
  state at time `0`, and :math:`i_t`, :math:`f_t`, :math:`g_t`, :math:`o_t`
  are the input, forget, cell, and output gates, respectively. :math:`\sigma`
  is the sigmoid function, and :math:`*` is the Hadamard product.
  (A worked single-step sketch follows this list.)
- In a multilayer LSTM, the input :math:`x^{(l)}_t` of the :math:`l`-th layer
  (:math:`l >= 2`) is the hidden state :math:`h^{(l-1)}_t` of the previous layer
  multiplied by dropout :math:`\delta^{(l-1)}_t` where each :math:`\delta^{(l-1)}_t`
  is a Bernoulli random variable which is :math:`0` with probability :attr:`dropout`.
  (See the dropout sketch after this list.)
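
As a sanity check on the gate equations above, the following sketch (not part of
the original docs) computes one cell step by hand and compares it against
:class:`torch.nn.LSTMCell`; it relies on PyTorch's packed ``i|f|g|o`` ordering of
the gate weights::

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    input_size, hidden_size = 3, 4
    cell = nn.LSTMCell(input_size, hidden_size)

    x  = torch.randn(1, input_size)    # x_t
    h0 = torch.zeros(1, hidden_size)   # h_{t-1}
    c0 = torch.zeros(1, hidden_size)   # c_{t-1}

    # The packed weights stack the four gates in i, f, g, o order.
    W_ii, W_if, W_ig, W_io = cell.weight_ih.chunk(4, dim=0)
    W_hi, W_hf, W_hg, W_ho = cell.weight_hh.chunk(4, dim=0)
    b_ii, b_if, b_ig, b_io = cell.bias_ih.chunk(4)
    b_hi, b_hf, b_hg, b_ho = cell.bias_hh.chunk(4)

    i = torch.sigmoid(x @ W_ii.T + b_ii + h0 @ W_hi.T + b_hi)  # input gate
    f = torch.sigmoid(x @ W_if.T + b_if + h0 @ W_hf.T + b_hf)  # forget gate
    g = torch.tanh(x @ W_ig.T + b_ig + h0 @ W_hg.T + b_hg)     # cell gate
    o = torch.sigmoid(x @ W_io.T + b_io + h0 @ W_ho.T + b_ho)  # output gate
    c1 = f * c0 + i * g                # c_t = f_t * c_{t-1} + i_t * g_t
    h1 = o * torch.tanh(c1)            # h_t = o_t * tanh(c_t)

    h_ref, c_ref = cell(x, (h0, c0))
    print(torch.allclose(h1, h_ref), torch.allclose(c1, c_ref))  # True True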
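And a minimal sketch of the inter-layer dropout behaviour (sizes here are made
up for illustration; dropout only acts between stacked layers, so it is active
in train mode and disabled in eval mode)::

    import torch
    import torch.nn as nn

    # dropout=0.5 masks the outputs of layer 0 before they feed layer 1;
    # the final layer's outputs are never dropped.
    lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, dropout=0.5)
    x = torch.randn(5, 3, 10)

    lstm.train()
    out_a, _ = lstm(x)
    out_b, _ = lstm(x)
    print(torch.equal(out_a, out_b))   # False: fresh Bernoulli masks per call

    lstm.eval()                        # dropout is a no-op in eval mode
    out_a, _ = lstm(x)
    out_b, _ = lstm(x)
    print(torch.equal(out_a, out_b))   # True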
Args:
    input_size: The number of expected features in the input `x`
    hidden_size: The number of features in the hidden state `h`
    num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
        would mean stacking two LSTMs together to form a stacked LSTM,
        with the second LSTM taking in outputs of the first LSTM and
        computing the final results. Default: 1
    bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
        Default: ``True``
    batch_first: If ``True``, then the input and output tensors are provided
        as `(batch, seq, feature)`. Default: ``False``
    dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
        LSTM layer except the last layer, with dropout probability equal to
        :attr:`dropout`. Default: 0
    bidirectional: If ``True``, becomes a bidirectional LSTM. Default: ``False``
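
A minimal usage example with the arguments above (10 input features, 20 hidden
units, 2 layers; the tensor sizes are illustrative)::

    >>> rnn = nn.LSTM(10, 20, 2)
    >>> input = torch.randn(5, 3, 10)           # (seq_len, batch, input_size)
    >>> h0 = torch.randn(2, 3, 20)              # (num_layers, batch, hidden_size)
    >>> c0 = torch.randn(2, 3, 20)
    >>> output, (hn, cn) = rnn(input, (h0, c0))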
Inputs: input, (h_0, c_0)
    - **input** of shape `(seq_len, batch, input_size)`: tensor containing the features
      of the input sequence.
      The input can also be a packed variable length sequence.
      See :func:`torch.nn.utils.rnn.pack_padded_sequence` or
      :func:`torch.nn.utils.rnn.pack_sequence` for details.
    - **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the initial hidden state for each element in the batch.
      If the LSTM is bidirectional, num_directions should be 2, else it should be 1.
    - **c_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the initial cell state for each element in the batch.

      If `(h_0, c_0)` is not provided, both **h_0** and **c_0** default to zero.
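
A short sketch of the packed variable-length input path (the sequence lengths
here are made up; note that ``pack_padded_sequence`` expects lengths sorted in
decreasing order unless ``enforce_sorted=False`` is passed)::

    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

    rnn = nn.LSTM(input_size=4, hidden_size=6)
    padded = torch.randn(3, 2, 4)          # two sequences, zero-padded to length 3
    lengths = torch.tensor([3, 2])         # true lengths, sorted descending

    packed = pack_padded_sequence(padded, lengths)
    packed_out, (h_n, c_n) = rnn(packed)   # (h_0, c_0) omitted, so both default to zero
    out, out_lengths = pad_packed_sequence(packed_out)
    print(out.shape)                       # torch.Size([3, 2, 6])
    print(out_lengths)                     # tensor([3, 2])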
Outputs: output, (h_n, c_n)
    - **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor
      containing the output features `(h_t)` from the last layer of the LSTM,
      for each `t`. If a :class:`torch.nn.utils.rnn.PackedSequence` has been
      given as the input, the output will also be a packed sequence.
      For the unpacked case, the directions can be separated
      using ``output.view(seq_len, batch, num_directions, hidden_size)``,
      with forward and backward being direction `0` and `1` respectively.
      Similarly, the directions can be separated in the packed case.
- **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
containing the hidden state for `t = seq_len`.
Like *output*, the layers can be separated using
``h_n.view(num_layers, num_directions, batch, hidden_size)`` and similarly for *c_n*.
- **c_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
containing the cell state for `t = seq_len`.
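
A sketch of separating the directions of a bidirectional LSTM as described
above (sizes are illustrative)::

    import torch
    import torch.nn as nn

    seq_len, batch, input_size, hidden_size = 5, 3, 10, 20
    rnn = nn.LSTM(input_size, hidden_size, num_layers=2, bidirectional=True)
    x = torch.randn(seq_len, batch, input_size)
    output, (h_n, c_n) = rnn(x)

    print(output.shape)  # torch.Size([5, 3, 40]): num_directions * hidden_size
    print(h_n.shape)     # torch.Size([4, 3, 20]): num_layers * num_directions

    out_dirs = output.view(seq_len, batch, 2, hidden_size)   # split directions
    h_n_dirs = h_n.view(2, 2, batch, hidden_size)            # (layers, dirs, batch, hidden)

    # The forward direction's last-layer output at the final time step is
    # exactly h_n for that layer/direction.
    print(torch.equal(out_dirs[-1, :, 0], h_n_dirs[-1, 0]))  # True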
Attributes:
    weight_ih_l[k] : the learnable input-hidden weights of the :math:`\text{k}^{th}` layer
        `(W_ii|W_if|W_ig|W_io)`, of shape `(4*hidden_size, input_size)` for `k = 0`.
        Otherwise, the shape is `(4*hidden_size, num_directions * hidden_size)`
    weight_hh_l[k] : the learnable hidden-hidden weights of the :math:`\text{k}^{th}` layer
        `(W_hi|W_hf|W_hg|W_ho)`, of shape `(4*hidden_size, hidden_size)`
    bias_ih_l[k] : the learnable input-hidden bias of the :math:`\text{k}^{th}` layer
        `(b_ii|b_if|b_ig|b_io)`, of shape `(4*hidden_size)`
    bias_hh_l[k] : the learnable hidden-hidden bias of the :math:`\text{k}^{th}` layer
        `(b_hi|b_hf|b_hg|b_ho)`, of shape `(4*hidden_size)`
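
These shapes can be inspected directly; e.g. with the hypothetical sizes used
earlier::

    import torch.nn as nn

    rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
    print(rnn.weight_ih_l0.shape)  # torch.Size([80, 10]): (4*hidden_size, input_size) for k = 0
    print(rnn.weight_ih_l1.shape)  # torch.Size([80, 20]): (4*hidden_size, num_directions*hidden_size)
    print(rnn.weight_hh_l0.shape)  # torch.Size([80, 20]): (4*hidden_size, hidden_size)
    print(rnn.bias_ih_l0.shape)    # torch.Size([80]):     (4*hidden_size,)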
.. note::
    All the weights and biases are initialized from
    :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
    :math:`k = \frac{1}{\text{hidden\_size}}`
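
A quick check of those initialization bounds (a sketch; it relies only on the
uniform range stated in the note above)::

    import math
    import torch.nn as nn

    rnn = nn.LSTM(input_size=10, hidden_size=20)
    bound = math.sqrt(1.0 / 20)  # sqrt(k), with k = 1 / hidden_size
    print(all(p.min() >= -bound and p.max() <= bound for p in rnn.parameters()))  # True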
.. include:: cudnn_persistent_rnn.rst