In PyTorch, the nn.RNN class builds a sequence-based recurrent neural network. Its constructor is:
nn.RNN(input_size, hidden_size, num_layers=1, nonlinearity='tanh', bias=True, batch_first=False, dropout=0, bidirectional=False)
- The structure of an RNN:
An RNN can be viewed as multiple copies of the same neural network, each module passing a message to the next; unrolling the loop makes this chain structure explicit.
- The parameters are as follows:
  - input_size: The number of expected features in the input x, i.e. the dimensionality of the input features. In typical RNN usage the input is a word vector, so input_size equals the dimension of the word vectors.
  - hidden_size: The number of features in the hidden state h, i.e. the number of hidden units, also called the output dimension (since the RNN outputs the hidden state at each time step).
  - num_layers: Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1.
  - nonlinearity: The non-linearity (activation function) to use. Can be either 'tanh' or 'relu'. Default: 'tanh'.
  - bias: If False, then the layer does not use bias weights b_ih and b_hh, i.e. whether to use a bias. Default: True.
  - batch_first: If True, then the input and output tensors are provided as (batch, seq, feature). Default: False, in which case the format is (seq_len, batch, input_size), i.e. the sequence length comes first and the batch second.
  - dropout: If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0 (no dropout); to enable it, set a value between 0 and 1.
  - bidirectional: If True, becomes a bidirectional RNN. Default: False.
The most important arguments to nn.RNN() are input_size and hidden_size; make sure you understand these two. The remaining arguments can usually be left at their default values.
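As a quick sketch of how these two arguments shape the layer (the sizes 100 and 20 below are arbitrary example values):

```python
import torch
from torch import nn

# A single-layer RNN: each input vector has 100 features (e.g. a
# word embedding of dimension 100), and the hidden state has 20.
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=1)

# The learned weight matrices reflect the two sizes:
# weight_ih_l0 maps input -> hidden, weight_hh_l0 maps hidden -> hidden.
print(rnn.weight_ih_l0.shape)  # torch.Size([20, 100])
print(rnn.weight_hh_l0.shape)  # torch.Size([20, 20])
```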
- Input and output shapes of the RNN
  - Inputs: input, h_0
    - input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence or torch.nn.utils.rnn.pack_sequence for details.
    - h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
  - Outputs: output, h_n
    - output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
    - h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size).
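The shapes above can be checked directly with a small forward pass (the sizes below are arbitrary example values):

```python
import torch
from torch import nn

seq_len, batch, input_size, hidden_size = 5, 3, 10, 20
num_layers, num_directions = 2, 2  # stacked and bidirectional

rnn = nn.RNN(input_size, hidden_size, num_layers=num_layers,
             bidirectional=True)

x = torch.randn(seq_len, batch, input_size)
h0 = torch.zeros(num_layers * num_directions, batch, hidden_size)

output, h_n = rnn(x, h0)
print(output.shape)  # torch.Size([5, 3, 40]): (seq_len, batch, num_directions * hidden_size)
print(h_n.shape)     # torch.Size([4, 3, 20]): (num_layers * num_directions, batch, hidden_size)

# Separate the two directions of the last layer's outputs:
# direction 0 is forward, direction 1 is backward.
directions = output.view(seq_len, batch, num_directions, hidden_size)
print(directions.shape)  # torch.Size([5, 3, 2, 20])
```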
- Shape:
  - Input1: (L, N, H_in) tensor containing input features, where H_in = input_size
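The packed variable-length-sequence path mentioned in the Inputs section can be sketched as follows (a minimal example with made-up sizes; lengths are sorted in descending order, as pack_padded_sequence expects by default):

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

rnn = nn.RNN(input_size=4, hidden_size=6)

# Two sequences of lengths 3 and 2, zero-padded to seq_len=3 (batch=2).
padded = torch.randn(3, 2, 4)
lengths = torch.tensor([3, 2])

packed = pack_padded_sequence(padded, lengths)
packed_out, h_n = rnn(packed)  # output is also a PackedSequence

# Recover a padded (seq_len, batch, hidden_size) tensor:
output, out_lengths = pad_packed_sequence(packed_out)
print(output.shape)  # torch.Size([3, 2, 6])
```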