nn.LSTM(*args, **kwargs)
- Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. Note that this module can be multi-layer.
- For each element in the input sequence, each layer computes the following function:
  .. math::

      \begin{array}{ll}
      i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
      f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\
      g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\
      o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\
      c_t = f_t * c_{(t-1)} + i_t * g_t \\
      h_t = o_t * \tanh(c_t) \\
      \end{array}
  where :math:`h_t` is the hidden state at time `t`, :math:`c_t` is the cell
  state at time `t`, :math:`x_t` is the input at time `t`, :math:`h_{(t-1)}`
  is the hidden state of the layer at time `t-1` or the initial hidden
  state at time `0`, and :math:`i_t`, :math:`f_t`, :math:`g_t`, :math:`o_t`
  are the input, forget, cell, and output gates, respectively. :math:`\sigma`
  is the sigmoid function, and :math:`*` is the Hadamard product.
  (A worked single-step sketch follows this list.)
- In a multilayer LSTM, the input :math:`x^{(l)}_t` of the :math:`l`-th layer
  (:math:`l >= 2`) is the hidden state :math:`h^{(l-1)}_t` of the previous layer
  multiplied by dropout :math:`\delta^{(l-1)}_t` where each :math:`\delta^{(l-1)}_t`
  is a Bernoulli random variable which is :math:`0` with probability :attr:`dropout`.
  (See the dropout sketch after this list.)
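
As a sanity check on the gate equations above, the following sketch (not part of
the original docs) computes one cell step by hand and compares it against
:class:`torch.nn.LSTMCell`; it relies on PyTorch's packed ``i|f|g|o`` ordering of
the gate weights::

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    input_size, hidden_size = 3, 4
    cell = nn.LSTMCell(input_size, hidden_size)

    x  = torch.randn(1, input_size)    # x_t
    h0 = torch.zeros(1, hidden_size)   # h_{t-1}
    c0 = torch.zeros(1, hidden_size)   # c_{t-1}

    # The packed weights stack the four gates in i, f, g, o order.
    W_ii, W_if, W_ig, W_io = cell.weight_ih.chunk(4, dim=0)
    W_hi, W_hf, W_hg, W_ho = cell.weight_hh.chunk(4, dim=0)
    b_ii, b_if, b_ig, b_io = cell.bias_ih.chunk(4)
    b_hi, b_hf, b_hg, b_ho = cell.bias_hh.chunk(4)

    i = torch.sigmoid(x @ W_ii.T + b_ii + h0 @ W_hi.T + b_hi)  # input gate
    f = torch.sigmoid(x @ W_if.T + b_if + h0 @ W_hf.T + b_hf)  # forget gate
    g = torch.tanh(x @ W_ig.T + b_ig + h0 @ W_hg.T + b_hg)     # cell gate
    o = torch.sigmoid(x @ W_io.T + b_io + h0 @ W_ho.T + b_ho)  # output gate
    c1 = f * c0 + i * g                # c_t = f_t * c_{t-1} + i_t * g_t
    h1 = o * torch.tanh(c1)            # h_t = o_t * tanh(c_t)

    h_ref, c_ref = cell(x, (h0, c0))
    print(torch.allclose(h1, h_ref), torch.allclose(c1, c_ref))  # True True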
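And a minimal sketch of the inter-layer dropout behaviour (sizes here are made
up for illustration; dropout only acts between stacked layers, so it is active
in train mode and disabled in eval mode)::

    import torch
    import torch.nn as nn

    # dropout=0.5 masks the outputs of layer 0 before they feed layer 1;
    # the final layer's outputs are never dropped.
    lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, dropout=0.5)
    x = torch.randn(5, 3, 10)

    lstm.train()
    out_a, _ = lstm(x)
    out_b, _ = lstm(x)
    print(torch.equal(out_a, out_b))   # False: fresh Bernoulli masks per call

    lstm.eval()                        # dropout is a no-op in eval mode
    out_a, _ = lstm(x)
    out_b, _ = lstm(x)
    print(torch.equal(out_a, out_b))   # True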
Args:
    input_size: The number of expected features in the input `x`
    hidden_size: The number of features in the hidden state `h`
    num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
        would mean stacking two LSTMs together to form a stacked LSTM,
        with the second LSTM taking in outputs of the first LSTM and
        computing the final results. Default: 1
    bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
        Default: ``True``
    batch_first: If ``True``, then the input and output tensors are provided
        as `(batch, seq, feature)`. Default: ``False``
    dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
        LSTM layer except the last layer, with dropout probability equal to
        :attr:`dropout`. Default: 0
    bidirectional: If ``True``, becomes a bidirectional LSTM. Default: ``False``
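
A minimal usage example with the arguments above (10 input features, 20 hidden
units, 2 layers; the tensor sizes are illustrative)::

    >>> rnn = nn.LSTM(10, 20, 2)
    >>> input = torch.randn(5, 3, 10)           # (seq_len, batch, input_size)
    >>> h0 = torch.randn(2, 3, 20)              # (num_layers, batch, hidden_size)
    >>> c0 = torch.randn(2, 3, 20)
    >>> output, (hn, cn) = rnn(input, (h0, c0))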
Inputs: input, (h_0, c_0)
    - **input** of shape `(seq_len, batch, input_size)`: tensor containing the features
      of the input sequence.
      The input can also be a packed variable length sequence.
      See :func:`torch.nn.utils.rnn.pack_padded_sequence` or
      :func:`torch.nn.utils.rnn.pack_sequence` for details.
    - **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the initial hidden state for each element in the batch.
      If the LSTM is bidirectional, num_directions should be 2, else it should be 1.
    - **c_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the initial cell state for each element in the batch.

      If `(h_0, c_0)` is not provided, both **h_0** and **c_0** default to zero.
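
A short sketch of the packed variable-length input path (the sequence lengths
here are made up; note that ``pack_padded_sequence`` expects lengths sorted in
decreasing order unless ``enforce_sorted=False`` is passed)::

    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

    rnn = nn.LSTM(input_size=4, hidden_size=6)
    padded = torch.randn(3, 2, 4)          # two sequences, zero-padded to length 3
    lengths = torch.tensor([3, 2])         # true lengths, sorted descending

    packed = pack_padded_sequence(padded, lengths)
    packed_out, (h_n, c_n) = rnn(packed)   # (h_0, c_0) omitted, so both default to zero
    out, out_lengths = pad_packed_sequence(packed_out)
    print(out.shape)                       # torch.Size([3, 2, 6])
    print(out_lengths)                     # tensor([3, 2])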
Outputs: output, (h_n, c_n)
    - **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor
      containing the output features `(h_t)` from the last layer of the LSTM,
      for each `t`. If a :class:`torch.nn.utils.rnn.PackedSequence` has been
      given as the input, the output will also be a packed sequence.
      For the unpacked case, the directions can be separated
      using ``output.view(seq_len, batch, num_directions, hidden_size)``,
      with forward and backward being direction `0` and `1` respectively.
      Similarly, the directions can be separated in the packed case.
- **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
containing the hidden state for `t = seq_len`.
Like *output*, the layers can be separated using
``h_n.view(num_layers, num_directions, batch, hidden_size)`` and similarly for *c_n*.
- **c_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
containing the cell state for `t = seq_len`.
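
A sketch of separating the directions of a bidirectional LSTM as described
above (sizes are illustrative)::

    import torch
    import torch.nn as nn

    seq_len, batch, input_size, hidden_size = 5, 3, 10, 20
    rnn = nn.LSTM(input_size, hidden_size, num_layers=2, bidirectional=True)
    x = torch.randn(seq_len, batch, input_size)
    output, (h_n, c_n) = rnn(x)

    print(output.shape)  # torch.Size([5, 3, 40]): num_directions * hidden_size
    print(h_n.shape)     # torch.Size([4, 3, 20]): num_layers * num_directions

    out_dirs = output.view(seq_len, batch, 2, hidden_size)   # split directions
    h_n_dirs = h_n.view(2, 2, batch, hidden_size)            # (layers, dirs, batch, hidden)

    # The forward direction's last-layer output at the final time step is
    # exactly h_n for that layer/direction.
    print(torch.equal(out_dirs[-1, :, 0], h_n_dirs[-1, 0]))  # True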
Attributes:
    weight_ih_l[k] : the learnable input-hidden weights of the :math:`\text{k}^{th}` layer
        `(W_ii|W_if|W_ig|W_io)`, of shape `(4*hidden_size, input_size)` for `k = 0`.
        Otherwise, the shape is `(4*hidden_size, num_directions * hidden_size)`
    weight_hh_l[k] : the learnable hidden-hidden weights of the :math:`\text{k}^{th}` layer
        `(W_hi|W_hf|W_hg|W_ho)`, of shape `(4*hidden_size, hidden_size)`
    bias_ih_l[k] : the learnable input-hidden bias of the :math:`\text{k}^{th}` layer
        `(b_ii|b_if|b_ig|b_io)`, of shape `(4*hidden_size)`
    bias_hh_l[k] : the learnable hidden-hidden bias of the :math:`\text{k}^{th}` layer
        `(b_hi|b_hf|b_hg|b_ho)`, of shape `(4*hidden_size)`
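
These shapes can be inspected directly; e.g. with the hypothetical sizes used
earlier::

    import torch.nn as nn

    rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
    print(rnn.weight_ih_l0.shape)  # torch.Size([80, 10]): (4*hidden_size, input_size) for k = 0
    print(rnn.weight_ih_l1.shape)  # torch.Size([80, 20]): (4*hidden_size, num_directions*hidden_size)
    print(rnn.weight_hh_l0.shape)  # torch.Size([80, 20]): (4*hidden_size, hidden_size)
    print(rnn.bias_ih_l0.shape)    # torch.Size([80]):     (4*hidden_size,)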
.. note::
    All the weights and biases are initialized from
    :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
    :math:`k = \frac{1}{\text{hidden\_size}}`
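
A quick check of those initialization bounds (a sketch; it relies only on the
uniform range stated in the note above)::

    import math
    import torch.nn as nn

    rnn = nn.LSTM(input_size=10, hidden_size=20)
    bound = math.sqrt(1.0 / 20)  # sqrt(k), with k = 1 / hidden_size
    print(all(p.min() >= -bound and p.max() <= bound for p in rnn.parameters()))  # True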
.. include:: cudnn_persistent_rnn.rst