PyTorch LSTM


nn.LSTM(*args, **kwargs)

  • Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. Note that it can be multi-layer.

  • For each element in the input sequence, each layer computes the following function:

$$
\begin{array}{ll}
i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\
o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\
c_t = f_t * c_{(t-1)} + i_t * g_t \\
h_t = o_t * \tanh(c_t)
\end{array}
$$

where $h_t$ is the hidden state at time $t$, $c_t$ is the cell state at time $t$, $x_t$ is the input at time $t$, $h_{(t-1)}$ is the hidden state of the layer at time $t-1$ or the initial hidden state at time $0$, and $i_t$, $f_t$, $g_t$, $o_t$ are the input, forget, cell, and output gates, respectively. $\sigma$ is the sigmoid function, and $*$ is the Hadamard product.

  • In a multilayer LSTM, the input $x^{(l)}_t$ of the $l$-th layer ($l \ge 2$) is the hidden state $h^{(l-1)}_t$ of the previous layer multiplied by dropout $\delta^{(l-1)}_t$, where each $\delta^{(l-1)}_t$ is a Bernoulli random variable which is $0$ with probability dropout.
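To make the gate equations above concrete, here is a minimal sketch of one time step of a single layer, written directly with tensor operations. The function name `lstm_cell_step` and the argument names are illustrative assumptions; `nn.LSTM` fuses this into optimized kernels, but the gate ordering (i, f, g, o) matches how its weights are laid out.

```python
import torch

def lstm_cell_step(x_t, h_prev, c_prev, w_ih, w_hh, b_ih, b_hh):
    # w_ih: (4*hidden_size, input_size), w_hh: (4*hidden_size, hidden_size).
    # Compute all four gate pre-activations in one matmul, then split
    # into the i, f, g, o blocks in the same order nn.LSTM uses.
    gates = x_t @ w_ih.t() + b_ih + h_prev @ w_hh.t() + b_hh
    i, f, g, o = gates.chunk(4, dim=1)
    i = torch.sigmoid(i)           # input gate
    f = torch.sigmoid(f)           # forget gate
    g = torch.tanh(g)              # candidate cell state
    o = torch.sigmoid(o)           # output gate
    c_t = f * c_prev + i * g       # new cell state
    h_t = o * torch.tanh(c_t)      # new hidden state
    return h_t, c_t
```

Run step by step over a sequence with a module's own `weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, and `bias_hh_l0`, this should reproduce the single-layer, unidirectional output of `nn.LSTM`.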

Args:
    input_size: The number of expected features in the input x
    hidden_size: The number of features in the hidden state h
    num_layers: Number of recurrent layers. E.g., setting num_layers=2
        would mean stacking two LSTMs together to form a stacked LSTM,
        with the second LSTM taking in outputs of the first LSTM and
        computing the final results. Default: 1
    bias: If False, then the layer does not use bias weights b_ih and b_hh.
        Default: True
    batch_first: If True, then the input and output tensors are provided
        as (batch, seq, feature). Default: False
    dropout: If non-zero, introduces a Dropout layer on the outputs of each
        LSTM layer except the last layer, with dropout probability equal to
        dropout. Default: 0
    bidirectional: If True, becomes a bidirectional LSTM. Default: False
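As an illustration of the constructor arguments above (the sizes are arbitrary example values, not defaults):

```python
import torch.nn as nn

# Illustrative values: 10 input features, 20 hidden units, 2 stacked layers.
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2,
               bias=True, batch_first=False, dropout=0.0,
               bidirectional=False)
```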

Inputs: input, (h_0, c_0)
    - **input** of shape `(seq_len, batch, input_size)`: tensor containing the features
      of the input sequence.
      The input can also be a packed variable length sequence.
      See torch.nn.utils.rnn.pack_padded_sequence or
      torch.nn.utils.rnn.pack_sequence for details.
    - **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the initial hidden state for each element in the batch.
      If the LSTM is bidirectional, num_directions should be 2, else it should be 1.
    - **c_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the initial cell state for each element in the batch.

    If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
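Continuing the example module above, a sketch of inputs with the shapes described here (the sequence and batch sizes are example values):

```python
import torch

seq_len, batch = 5, 3
num_directions = 1                                  # bidirectional=False
x = torch.randn(seq_len, batch, 10)                 # (seq_len, batch, input_size)
h_0 = torch.zeros(2 * num_directions, batch, 20)    # (num_layers * num_directions, batch, hidden_size)
c_0 = torch.zeros(2 * num_directions, batch, 20)

output, (h_n, c_n) = lstm(x, (h_0, c_0))
# Calling lstm(x) without (h_0, c_0) is equivalent: both default to zeros.
```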

Outputs: output, (h_n, c_n)
    - **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor
      containing the output features `(h_t)` from the last layer of the LSTM,
      for each t. If a torch.nn.utils.rnn.PackedSequence has been
      given as the input, the output will also be a packed sequence.

  For the unpacked case, the directions can be separated
  using ``output.view(seq_len, batch, num_directions, hidden_size)``,
  with forward and backward being direction `0` and `1` respectively.
  Similarly, the directions can be separated in the packed case.
- **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
  containing the hidden state for `t = seq_len`.

  Like *output*, the layers can be separated using
  ``h_n.view(num_layers, num_directions, batch, hidden_size)`` and similarly for *c_n*.
- **c_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
  containing the cell state for `t = seq_len`.
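For the bidirectional case, a sketch of how the output and final states can be reshaped to separate the two directions, using the view calls described above (all sizes are example values):

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
bi_lstm = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)
x = torch.randn(seq_len, batch, input_size)
output, (h_n, c_n) = bi_lstm(x)

print(output.shape)  # (seq_len, batch, 2 * hidden_size)
print(h_n.shape)     # (num_layers * 2, batch, hidden_size)

# Separate the forward (index 0) and backward (index 1) directions.
out_dirs = output.view(seq_len, batch, 2, hidden_size)
h_n_dirs = h_n.view(num_layers, 2, batch, hidden_size)
```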

Attributes:
    weight_ih_l[k]: the learnable input-hidden weights of the k-th layer
        (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size, input_size) for k = 0.
        Otherwise, the shape is (4*hidden_size, num_directions * hidden_size)
    weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer
        (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size, hidden_size)
    bias_ih_l[k]: the learnable input-hidden bias of the k-th layer
        (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
    bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer
        (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
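A quick way to check these parameter shapes is to inspect the attributes on a module instance (the sizes below are the same example values as before):

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10])  -> (4*hidden_size, input_size)
print(lstm.weight_hh_l0.shape)  # torch.Size([80, 20])  -> (4*hidden_size, hidden_size)
print(lstm.bias_ih_l0.shape)    # torch.Size([80])      -> (4*hidden_size,)
print(lstm.weight_ih_l1.shape)  # torch.Size([80, 20])  -> layer k>=1 takes hidden_size inputs
```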

Note:
    All the weights and biases are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{1}{\text{hidden\_size}}$.

