PyTorch nn.GRU() and RNN: a detailed look at the code

The GRU, LSTM, and RNN network modules in PyTorch are all defined in torch/nn/modules/rnn.py.
GRU, RNN, and LSTM all inherit from the common parent class RNNBase.
The definition of the RNNBase class is:

    def __init__(self, mode, input_size, hidden_size,
                 num_layers=1, bias=True, batch_first=False,
                 dropout=0., bidirectional=False):
        super(RNNBase, self).__init__()
        self.mode = mode
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.bias = bias
        self.batch_first = batch_first
        self.dropout = float(dropout)
        self.bidirectional = bidirectional
        num_directions = 2 if bidirectional else 1

        if not isinstance(dropout, numbers.Number) or not 0 <= dropout <= 1 or \
                isinstance(dropout, bool):
            raise ValueError("dropout should be a number in range [0, 1] "
                             "representing the probability of an element being "
                             "zeroed")
        if dropout > 0 and num_layers == 1:
            warnings.warn("dropout option adds dropout after all but last "
                          "recurrent layer, so non-zero dropout expects "
                          "num_layers greater than 1, but got dropout={} and "
                          "num_layers={}".format(dropout, num_layers))

        if mode == 'LSTM':
            gate_size = 4 * hidden_size
        elif mode == 'GRU':
            gate_size = 3 * hidden_size
        elif mode == 'RNN_TANH':
            gate_size = hidden_size
        elif mode == 'RNN_RELU':
            gate_size = hidden_size
        else:
            raise ValueError("Unrecognized RNN mode: " + mode)

        self._all_weights = []
        for layer in range(num_layers):
            for direction in range(num_directions):
                layer_input_size = input_size if layer == 0 else hidden_size * num_directions

                w_ih = Parameter(torch.Tensor(gate_size, layer_input_size))
                w_hh = Parameter(torch.Tensor(gate_size, hidden_size))
                b_ih = Parameter(torch.Tensor(gate_size))
                # Second bias vector included for CuDNN compatibility. Only one
                # bias vector is needed in standard definition.
                b_hh = Parameter(torch.Tensor(gate_size))
                layer_params = (w_ih, w_hh, b_ih, b_hh)

                suffix = '_reverse' if direction == 1 else ''
                param_names = ['weight_ih_l{}{}', 'weight_hh_l{}{}']
                if bias:
                    param_names += ['bias_ih_l{}{}', 'bias_hh_l{}{}']
                param_names = [x.format(layer, suffix) for x in param_names]

                for name, param in zip(param_names, layer_params):
                    setattr(self, name, param)
                self._all_weights.append(param_names)

        self.flatten_parameters()
        self.reset_parameters()

The mode argument selects which model is built (GRU, LSTM, RNN_TANH, or RNN_RELU), as can be seen in the gate_size branch above. The remaining constructor arguments are listed below, followed by a short construction example:

  • input_size: the number of features of the input X.
  • hidden_size: the number of units in the hidden layer, i.e. the size of the hidden features.
  • num_layers: the number of stacked recurrent layers; defaults to 1.
  • bias: defaults to True; if False, the layers do not use bias parameters.
  • batch_first: if True, the first dimension of the input is the batch; defaults to False. By default the first dimension is the sequence length, the second is the batch, and the third is the number of features.
  • dropout: if non-zero, a dropout layer is applied to the output of every recurrent layer except the last, dropping elements with the given probability; defaults to 0.
  • bidirectional: if True, the network is bidirectional, with a forward and a backward direction; defaults to False.
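The following is a minimal sketch (with arbitrary example sizes) that constructs a GRU using these arguments and prints its registered parameters, matching the naming and gate_size logic in the RNNBase code above (for a GRU, gate_size = 3 * hidden_size):

```python
import torch.nn as nn

# Arbitrary example sizes
gru = nn.GRU(input_size=10, hidden_size=20, num_layers=2,
             bias=True, batch_first=False, dropout=0.5,
             bidirectional=True)

# Each layer/direction registers weight_ih, weight_hh, bias_ih, bias_hh,
# named exactly as in the RNNBase loop above.
for name, p in gru.named_parameters():
    print(name, tuple(p.shape))

# Expected shapes for a GRU (gate_size = 3 * hidden_size = 60):
#   weight_ih_l0         -> (60, 10)  # (gate_size, input_size)
#   weight_hh_l0         -> (60, 20)  # (gate_size, hidden_size)
#   weight_ih_l1         -> (60, 40)  # layer > 0: input is hidden_size * num_directions
#   weight_ih_l0_reverse -> (60, 10)  # backward direction of layer 0
```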

The inputs and outputs of a GRU have the following shapes:

INPUTS:

  • input: (seq_len, batch, input_size)
  • h_0: (num_layers * num_directions, batch, hidden_size)

OUTPUTS:

  • output: (seq_len, batch, num_directions * hidden_size)
  • h_n: (num_layers * num_directions, batch, hidden_size)

Inputs: input, h_0

- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable-length sequence; see torch.nn.utils.rnn.pack_padded_sequence or torch.nn.utils.rnn.pack_sequence for details.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs: output, h_n

- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size).
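Not part of the original post, but as a minimal sketch of the packed-sequence path mentioned above (arbitrary example sizes), variable-length inputs can be packed before being fed to the GRU and unpacked afterwards:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

gru = nn.GRU(input_size=10, hidden_size=20)

# Two sequences of lengths 5 and 3, zero-padded to shape (seq_len, batch, input_size)
padded = torch.randn(5, 2, 10)
lengths = torch.tensor([5, 3])

packed = pack_padded_sequence(padded, lengths, enforce_sorted=True)
packed_out, h_n = gru(packed)                     # output is also a PackedSequence
output, out_lengths = pad_packed_sequence(packed_out)

print(output.shape)   # torch.Size([5, 2, 20])
print(h_n.shape)      # torch.Size([1, 2, 20])
```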

The input and output shapes for RNN and LSTM follow the same pattern as above. If the network is bidirectional, num_directions = 2; otherwise it is 1 (a bidirectional example appears after the code reference below).

Code reference:
>>> import torch
>>> import torch.nn as nn
>>> gru = nn.GRU(input_size=50, hidden_size=50, batch_first=True)
>>> embed = nn.Embedding(3, 50)
>>> x = torch.LongTensor([[0, 1, 2]])
>>> x_embed = embed(x)
>>> x.size()
torch.Size([1, 3])
>>> x_embed.size()
torch.Size([1, 3, 50])
>>> out, hidden = gru(x_embed)
>>> out.size()
torch.Size([1, 3, 50])
>>> hidden.size()
torch.Size([1, 1, 50])
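As a complementary sketch (not from the original example; sizes chosen arbitrarily), a bidirectional GRU doubles the last dimension of output and the first dimension of h_n, and the two directions can be separated with view as described in the documentation above:

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 5, 4, 10, 20

gru = nn.GRU(input_size, hidden_size, num_layers=1, bidirectional=True)

x = torch.randn(seq_len, batch, input_size)
output, h_n = gru(x)    # h_0 defaults to zeros

print(output.shape)     # torch.Size([5, 4, 40])  = (seq_len, batch, 2*hidden_size)
print(h_n.shape)        # torch.Size([2, 4, 20])  = (num_layers*2, batch, hidden_size)

# Separate the forward (index 0) and backward (index 1) directions
dirs = output.view(seq_len, batch, 2, hidden_size)
forward_out, backward_out = dirs[..., 0, :], dirs[..., 1, :]
```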
Addendum (2020.05.05):

When the GRU is constructed with batch_first=True, its input and output dimensions take the following form (the sketch below checks these shapes):

# input:  (batch_size, seq_length, input_size)
# hidden: (num_layers * num_directions, batch_size, hidden_size)

# output: (batch_size, seq_length, num_directions * hidden_size)
# h_n:    (num_layers * num_directions, batch_size, hidden_size)
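A quick shape check for the batch_first=True case (arbitrary example sizes); note that only input and output become batch-first, while h_n keeps the (num_layers * num_directions, batch_size, hidden_size) layout:

```python
import torch
import torch.nn as nn

batch_size, seq_length, input_size, hidden_size = 8, 6, 10, 16

gru = nn.GRU(input_size, hidden_size, num_layers=2, batch_first=True)

x = torch.randn(batch_size, seq_length, input_size)
output, h_n = gru(x)

print(output.shape)   # torch.Size([8, 6, 16])  = (batch_size, seq_length, hidden_size)
print(h_n.shape)      # torch.Size([2, 8, 16])  = (num_layers, batch_size, hidden_size)
```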
References

  • pytorch中RNN,LSTM,GRU使用详解
  • torch.nn.GRU()函数解读

`torch.nn.GRU` is a recurrent neural network (RNN) module in PyTorch that implements the gated recurrent unit (GRU) model. The GRU is a widely used recurrent architecture that controls the flow of information and what is remembered through gating mechanisms. Compared with a plain RNN, it offers stronger modeling capacity and better gradient propagation.

A GRU model is used by creating an instance of `torch.nn.GRU`. A simple example:

```python
import torch
import torch.nn as nn

input_size = 100
hidden_size = 50
num_layers = 2

# Create a GRU model
gru = nn.GRU(input_size, hidden_size, num_layers)

# Define the input: sequence length 10, batch size 32
input = torch.randn(10, 32, input_size)

# Initialize the hidden state: shape (num_layers, batch_size, hidden_size)
h0 = torch.randn(num_layers, 32, hidden_size)

# Forward pass
output, hn = gru(input, h0)
```

In this code we first create an `nn.GRU` instance `gru`, specifying the input size `input_size`, hidden-state size `hidden_size`, and number of stacked layers `num_layers`. We then define an input tensor `input` of size 10x32x100, where 10 is the sequence length, 32 the batch size, and 100 the input feature dimension. Next we initialize the hidden state `h0` with shape (2, 32, 50). Finally we run a forward pass through `gru`, obtaining the output `output` and the hidden state of the last time step `hn`.

`torch.nn.GRU` also supports further options such as bidirectional GRUs, batch-first mode, and custom weight initialization (see the sketch below for the last of these), which make it convenient to build and train GRU models.
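As an illustrative sketch of the custom weight initialization mentioned above (not covered in the original post; the particular init schemes are just one common choice), the GRU's registered parameters can be re-initialized by name:

```python
import torch.nn as nn

gru = nn.GRU(input_size=100, hidden_size=50, num_layers=2)

# Re-initialize parameters by name.
for name, param in gru.named_parameters():
    if 'weight_ih' in name:
        nn.init.xavier_uniform_(param)   # input-to-hidden weights
    elif 'weight_hh' in name:
        nn.init.orthogonal_(param)       # hidden-to-hidden weights
    elif 'bias' in name:
        nn.init.zeros_(param)            # both bias vectors
```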