A Detailed Guide to Using RNN, LSTM, and GRU in PyTorch

Latest recommended article published on 2023-03-29 20:41:35

lkangkang

Category: Python  Tags: RNN LSTM GRU pytorch

Article link: https://blog.csdn.net/lkangkang/article/details/89814697

RNNCell

nn.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh')
$h^{\prime}=\tanh \left(W_{i h} x+b_{i h}+W_{h h} h+b_{h h}\right)$

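input_size: the number of features in the input x.
hidden_size: the number of units in the hidden layer, i.e. the number of features in the hidden state.
bias: defaults to True; if False, the cell does not use the bias terms.
nonlinearity: defaults to 'tanh'; 'relu' is also available.

Inputs:

input: [batch, input_size]
hidden: [batch, hidden_size]

Outputs:

$h^{'}$: [batch, hidden_size]

Parameters: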

RNNCell.weight_ih: [hidden_size, input_size]
RNNCell.weight_hh: [hidden_size, hidden_size]
RNNCell.bias_ih: [hidden_size]
RNNCell.bias_hh: [hidden_size]

import torch

# input feature size 5, hidden size 10
rnn_cell = torch.nn.RNNCell(5,10)
# batch_size = 2
input = torch.randn(2,5)
h_0 = torch.randn(2,10)
h = rnn_cell(input,h_0)
h.shape
>>torch.Size([2, 10])

[(para[0],para[1].shape) for para in list(rnn_cell.named_parameters())]
>>[('weight_ih', torch.Size([10, 5])),
 ('weight_hh', torch.Size([10, 10])),
 ('bias_ih', torch.Size([10])),
 ('bias_hh', torch.Size([10]))]
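The update formula above can be checked by hand against the module. The following is a minimal sketch, assuming the default tanh nonlinearity; the names `x`, `h`, `manual`, and `matches` are illustrative and not part of the original post. Because the weights are stored as [hidden_size, input_size], a batched input is multiplied by the transposed weight.

```python
import torch

torch.manual_seed(0)
rnn_cell = torch.nn.RNNCell(5, 10)
x = torch.randn(2, 5)    # [batch, input_size]
h = torch.randn(2, 10)   # [batch, hidden_size]

# h' = tanh(W_ih x + b_ih + W_hh h + b_hh), written for a batch of row vectors
manual = torch.tanh(
    x @ rnn_cell.weight_ih.T + rnn_cell.bias_ih
    + h @ rnn_cell.weight_hh.T + rnn_cell.bias_hh
)
module_out = rnn_cell(x, h)

# True if the hand-written formula reproduces the module's output
matches = torch.allclose(manual, module_out, atol=1e-6)
print(matches)
```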

RNN

torch.nn.RNN(*args, **kwargs)
$h_{t}=\tanh \left(W_{i h} x_{t}+b_{i h}+W_{h h} h_{(t-1)}+b_{h h}\right)$

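input_size: the number of features in the input x.
hidden_size: the number of units in the hidden layer, i.e. the number of features in the hidden state.
num_layers: the number of stacked recurrent layers; defaults to 1.
nonlinearity: defaults to 'tanh'; 'relu' is also available.
bias: defaults to True; if False, the layers do not use the bias terms.

batch_first: if True, the first dimension of the input is the batch. Defaults to False, in which case the first dimension is the sequence length, the second is the batch, and the third is the feature size.
dropout: if non-zero, adds a dropout layer on the outputs of every layer except the last, with drop probability equal to this value. Defaults to 0.
bidirectional: if True, becomes a bidirectional RNN. Defaults to False.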

Inputs:

input: [seq_len, batch, input_size]
$h_{0}$ : [num_layers * num_directions, batch, hidden_size]

Outputs:

out: [seq_len, batch, num_directions * hidden_size]
$h_{n}$ : [num_layers * num_directions, batch, hidden_size]

Parameters:

RNN.weight_ih_l[k]: [hidden_size, input_size] for layer 0; [hidden_size, num_directions * hidden_size] for later layers
RNN.weight_hh_l[k]: [hidden_size, hidden_size]
RNN.bias_ih_l[k]: [hidden_size]
RNN.bias_hh_l[k]: [hidden_size]

# input feature size 5, hidden size 10, 2 layers
rnn = torch.nn.RNN(5, 10, 2)
# seq_len = 4, batch_size = 2
input = torch.randn(4 , 2 , 5)
h_0 =torch.randn(2 , 2 , 10)
output,hn=rnn(input ,h_0) 

print(output.size(),hn.size())
>>torch.Size([4, 2, 10]) torch.Size([2, 2, 10])

[(para[0],para[1].shape) for para in list(rnn.named_parameters())]
>>[('weight_ih_l0', torch.Size([10, 5])),
 ('weight_hh_l0', torch.Size([10, 10])),
 ('bias_ih_l0', torch.Size([10])),
 ('bias_hh_l0', torch.Size([10])),
 ('weight_ih_l1', torch.Size([10, 10])),
 ('weight_hh_l1', torch.Size([10, 10])),
 ('bias_ih_l1', torch.Size([10])),
 ('bias_hh_l1', torch.Size([10]))]
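To make the relation between the recurrence formula, `output`, and `hn` concrete, the stacked RNN can be unrolled by hand. This is only a sketch under the default settings (tanh, no dropout, unidirectional); the helper `unroll` is a hypothetical name introduced here for illustration, not a PyTorch function.

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.RNN(5, 10, 2)
x = torch.randn(4, 2, 5)      # [seq_len, batch, input_size]
h_0 = torch.randn(2, 2, 10)   # [num_layers, batch, hidden_size]

def unroll(rnn, x, h_0):
    """Hypothetical helper: apply the RNN formula layer by layer, step by step."""
    layer_input, finals = x, []
    for k in range(rnn.num_layers):
        w_ih = getattr(rnn, f"weight_ih_l{k}")
        w_hh = getattr(rnn, f"weight_hh_l{k}")
        b_ih = getattr(rnn, f"bias_ih_l{k}")
        b_hh = getattr(rnn, f"bias_hh_l{k}")
        h, steps = h_0[k], []
        for t in range(layer_input.size(0)):
            h = torch.tanh(layer_input[t] @ w_ih.T + b_ih + h @ w_hh.T + b_hh)
            steps.append(h)
        layer_input = torch.stack(steps)  # this layer's outputs feed the next layer
        finals.append(h)                  # last hidden state of this layer
    return layer_input, torch.stack(finals)

with torch.no_grad():
    output, hn = rnn(x, h_0)
    manual_output, manual_hn = unroll(rnn, x, h_0)

output_matches = torch.allclose(output, manual_output, atol=1e-5)
hn_matches = torch.allclose(hn, manual_hn, atol=1e-5)
# output is the top layer at every step; hn is every layer at the last step
last_step_matches = torch.allclose(output[-1], hn[-1])
print(output_matches, hn_matches, last_step_matches)
```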

rnn = torch.nn.RNN(5, 10, 2, bidirectional=True)
[(para[0],para[1].shape) for para in list(rnn.named_parameters())]
>>[('weight_ih_l0', torch.Size([10, 5])),
 ('weight_hh_l0', torch.Size([10, 10])),
 ('bias_ih_l0', torch.Size([10])),
 ('bias_hh_l0', torch.Size([10])),
 ('weight_ih_l0_reverse', torch.Size([10, 5])),
 ('weight_hh_l0_reverse', torch.Size([10, 10])),
 ('bias_ih_l0_reverse', torch.Size([10])),
 ('bias_hh_l0_reverse', torch.Size([10])),
 ('weight_ih_l1', torch.Size([10, 20])),
 ('weight_hh_l1', torch.Size([10, 10])),
 ('bias_ih_l1', torch.Size([10])),
 ('bias_hh_l1', torch.Size([10])),
 ('weight_ih_l1_reverse', torch.Size([10, 20])),
 ('weight_hh_l1_reverse', torch.Size([10, 10])),
 ('bias_ih_l1_reverse', torch.Size([10])),
 ('bias_hh_l1_reverse', torch.Size([10]))]
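The shapes for the bidirectional case can also be checked directly. The sketch below additionally sets batch_first=True to show that it changes the layout of `input` and `output` but not of `hn`; the variable names are illustrative. Splitting the direction axis with `reshape` follows the layout listed above (num_directions * hidden_size in `output`, num_layers * num_directions in `hn`).

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.RNN(5, 10, num_layers=2, bidirectional=True, batch_first=True)
x = torch.randn(2, 4, 5)          # [batch, seq_len, input_size] since batch_first=True
output, hn = rnn(x)               # h_0 omitted, so it defaults to zeros

out_shape = tuple(output.shape)   # [batch, seq_len, num_directions * hidden_size]
hn_shape = tuple(hn.shape)        # [num_layers * num_directions, batch, hidden_size]

# Separate the two directions
out_dirs = output.reshape(2, 4, 2, 10)   # [batch, seq_len, num_directions, hidden_size]
hn_dirs = hn.reshape(2, 2, 2, 10)        # [num_layers, num_directions, batch, hidden_size]

# Forward direction finishes at the last time step, backward direction at the first
fwd_matches = torch.allclose(out_dirs[:, -1, 0], hn_dirs[-1, 0])
bwd_matches = torch.allclose(out_dirs[:, 0, 1], hn_dirs[-1, 1])
print(out_shape, hn_shape, fwd_matches, bwd_matches)
```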

LSTMCell

torch.nn.LSTMCell(input_size, hidden_size, bias=True)
$\begin{array}{l}{i=\sigma\left(W_{i i} x+b_{i i}+W_{h i} h+b_{h i}\right)} \\ {f=\sigma\left(W_{i f} x+b_{i f}+W_{h f} h+b_{h f}\right)} \\ {g=\tanh \left(W_{i g} x+b_{i g}+W_{h g} h+b_{h g}\right)} \\ {o=\sigma\left(W_{i o} x+b_{i o}+W_{h o} h+b_{h o}\right)} \\ {c^{\prime}=f * c+i * g} \\ {h^{\prime}=o * \tanh \left(c^{\prime}\right)}\end{array}$

input_size: the number of features in the input x.
hidden_size: the number of units in the hidden layer, i.e. the number of features in the hidden state.
bias: defaults to True; if False, the cell does not use the bias terms $bias_{ih}$ and $bias_{hh}$.

Inputs: input, (h0, c0); the latter two default to all zeros.

input: [batch, input_size]
$h_{0}$ : [batch, hidden_size]
$c_{0}$ : [batch, hidden_size]

Outputs:

$h_{1}$ : [batch, hidden_size]
$c_{1}$ : [batch, hidden_size]

Parameters:

LSTMCell.weight_ih: stacks (W_ii|W_if|W_ig|W_io), [4*hidden_size, input_size]
LSTMCell.weight_hh: stacks (W_hi|W_hf|W_hg|W_ho), [4*hidden_size, hidden_size]
LSTMCell.bias_ih: stacks (b_ii|b_if|b_ig|b_io), [4*hidden_size]
LSTMCell.bias_hh: stacks (b_hi|b_hf|b_hg|b_ho), [4*hidden_size]

lstm_cell = torch.nn.LSTMCell(5,10)
input = torch.randn(2,5)
h_0 = torch.randn(2,10)
c_0 = torch.randn(2,10)
h1,c1 = lstm_cell(input,(h_0,c_0))
print(h1.shape,c1.shape)
>>torch.Size([2, 10]) torch.Size([2, 10])

[(para[0],para[1].shape) for para in list(lstm_cell.named_parameters())]
>>[('weight_ih', torch.Size([40, 5])),
 ('weight_hh', torch.Size([40, 10])),
 ('bias_ih', torch.Size([40])),
 ('bias_hh', torch.Size([40]))]
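Since the four gate matrices are stacked in the order (i|f|g|o), the LSTMCell equations can be reproduced by splitting the stacked pre-activations into four chunks. This is a minimal sketch; the names `gates`, `h1_manual`, and `c1_manual` are illustrative.

```python
import torch

torch.manual_seed(0)
lstm_cell = torch.nn.LSTMCell(5, 10)
x = torch.randn(2, 5)
h0 = torch.randn(2, 10)
c0 = torch.randn(2, 10)

# All four gate pre-activations at once: [batch, 4*hidden_size]
gates = (x @ lstm_cell.weight_ih.T + lstm_cell.bias_ih
         + h0 @ lstm_cell.weight_hh.T + lstm_cell.bias_hh)
i, f, g, o = gates.chunk(4, dim=1)   # stacking order: input, forget, cell, output
i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
g = torch.tanh(g)

c1_manual = f * c0 + i * g                # c' = f * c + i * g
h1_manual = o * torch.tanh(c1_manual)     # h' = o * tanh(c')

h1, c1 = lstm_cell(x, (h0, c0))
h_matches = torch.allclose(h1, h1_manual, atol=1e-6)
c_matches = torch.allclose(c1, c1_manual, atol=1e-6)
print(h_matches, c_matches)
```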

LSTM

torch.nn.LSTM(*args, **kwargs)
$\begin{array}{l}{i_{t}=\sigma\left(W_{i i} x_{t}+b_{i i}+W_{h i} h_{(t-1)}+b_{h i}\right)} \\ {f_{t}=\sigma\left(W_{i f} x_{t}+b_{i f}+W_{h f} h_{(t-1)}+b_{h f}\right)} \\ {g_{t}=\tanh \left(W_{i g} x_{t}+b_{i g}+W_{h g} h_{(t-1)}+b_{h g}\right)} \\ {o_{t}=\sigma\left(W_{i o} x_{t}+b_{i o}+W_{h o} h_{(t-1)}+b_{h o}\right)} \\ {c_{t}=f_{t} * c_{(t-1)}+i_{t} * g_{t}} \\ {h_{t}=o_{t} * \tanh \left(c_{t}\right)}\end{array}$

input_size: the number of features in the input x.
hidden_size: the number of units in the hidden layer, i.e. the number of features in the hidden state.
num_layers: the number of stacked recurrent layers; defaults to 1.
bias: defaults to True; if False, the layers do not use the bias terms $bias_{ih}$ and $bias_{hh}$.
batch_first: if True, the first dimension of the input is the batch. Defaults to False, in which case the first dimension is the sequence length, the second is the batch, and the third is the feature size.
dropout: if non-zero, adds a dropout layer on the outputs of every layer except the last, with drop probability equal to this value. Defaults to 0.
bidirectional: if True, becomes a bidirectional LSTM. Defaults to False.

Inputs: input, (h0, c0); the latter two default to all zeros.

input: [seq_len, batch, input_size]
$h_{0}$ : [num_layers * num_directions, batch, hidden_size]
$c_{0}$ : [num_layers * num_directions, batch, hidden_size]

Outputs:

output: [seq_len, batch, num_directions * hidden_size]

$h_{n}$ : [num_layers * num_directions, batch, hidden_size]
$c_{n}$ : [num_layers * num_directions, batch, hidden_size]

Parameters:

LSTM.weight_ih_l[k]: stacks (W_ii|W_if|W_ig|W_io); [4*hidden_size, input_size] for layer 0, [4*hidden_size, num_directions * hidden_size] for later layers
LSTM.weight_hh_l[k]: stacks (W_hi|W_hf|W_hg|W_ho), [4*hidden_size, hidden_size]
LSTM.bias_ih_l[k]: stacks (b_ii|b_if|b_ig|b_io), [4*hidden_size]
LSTM.bias_hh_l[k]: stacks (b_hi|b_hf|b_hg|b_ho), [4*hidden_size]

lstm = torch.nn.LSTM(5, 10, 2)
# seq_len = 4, batch_size = 2
input = torch.randn(4 , 2 , 5)
h_0 =torch.randn(2 , 2 , 10)
c_0 =torch.randn(2 , 2 , 10)
output,(hn,cn)=lstm(input ,(h_0,c_0))
output.shape,hn.shape,cn.shape
>>(torch.Size([4, 2, 10]), torch.Size([2, 2, 10]), torch.Size([2, 2, 10]))

[(para[0],para[1].shape) for para in list(lstm.named_parameters())]
>>[('weight_ih_l0', torch.Size([40, 5])),
 ('weight_hh_l0', torch.Size([40, 10])),
 ('bias_ih_l0', torch.Size([40])),
 ('bias_hh_l0', torch.Size([40])),
 ('weight_ih_l1', torch.Size([40, 10])),
 ('weight_hh_l1', torch.Size([40, 10])),
 ('bias_ih_l1', torch.Size([40])),
 ('bias_hh_l1', torch.Size([40]))]
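Two properties of the (h, c) state are worth seeing in code: omitting it is the same as passing zeros, and the returned state can be fed back in to continue a sequence. The sketch below illustrates both under default settings (no dropout); splitting the sequence into two halves is an arbitrary choice made for illustration.

```python
import torch

torch.manual_seed(0)
lstm = torch.nn.LSTM(5, 10, 2)
x = torch.randn(4, 2, 5)            # [seq_len, batch, input_size]
zeros = torch.zeros(2, 2, 10)       # [num_layers, batch, hidden_size]

with torch.no_grad():
    # Omitting (h_0, c_0) is equivalent to passing all-zero states
    out_default, (hn_d, cn_d) = lstm(x)
    out_explicit, (hn_e, cn_e) = lstm(x, (zeros, zeros))

    # Processing the sequence in two chunks while carrying the state forward
    out_a, state = lstm(x[:2])
    out_b, (hn_c, cn_c) = lstm(x[2:], state)

default_is_zero = torch.allclose(out_default, out_explicit)
chunked_output_matches = torch.allclose(torch.cat([out_a, out_b]), out_default, atol=1e-6)
chunked_state_matches = (torch.allclose(hn_c, hn_d, atol=1e-6)
                         and torch.allclose(cn_c, cn_d, atol=1e-6))
print(default_is_zero, chunked_output_matches, chunked_state_matches)
```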

GRUCell

torch.nn.GRUCell(input_size, hidden_size, bias=True)
$\begin{array}{l}{r=\sigma\left(W_{i r} x+b_{i r}+W_{h r} h+b_{h r}\right)} \\ {z=\sigma\left(W_{i z} x+b_{i z}+W_{h z} h+b_{h z}\right)} \\ {n=\tanh \left(W_{i n} x+b_{i n}+r *\left(W_{h n} h+b_{h n}\right)\right)} \\ {h^{\prime}=(1-z) * n+z * h}\end{array}$

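input_size: the number of features in the input x.
hidden_size: the number of units in the hidden layer, i.e. the number of features in the hidden state.
bias: defaults to True; if False, the cell does not use the bias terms $bias_{ih}$ and $bias_{hh}$.

Inputs:

input: [batch, input_size]
hidden: [batch, hidden_size]

Outputs:

$h^{'}$: [batch, hidden_size]

Parameters: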

GRUCell.weight_ih: [3*hidden_size, input_size]
GRUCell.weight_hh: [3*hidden_size, hidden_size]
GRUCell.bias_ih: [3*hidden_size]
GRUCell.bias_hh: [3*hidden_size]

gru_cell = torch.nn.GRUCell(5,10)
input = torch.randn(2,5)
h_0 = torch.randn(2,10)
h1= gru_cell(input,h_0)
print(h1.shape)
>>torch.Size([2, 10])

[(para[0],para[1].shape) for para in list(gru_cell.named_parameters())]
>>[('weight_ih', torch.Size([30, 5])),
 ('weight_hh', torch.Size([30, 10])),
 ('bias_ih', torch.Size([30])),
 ('bias_hh', torch.Size([30]))]
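The GRUCell equations can be verified the same way. Note that the reset gate r multiplies the hidden-side term (W_hn h + b_hn) only, so the input-side and hidden-side pre-activations have to be kept separate before splitting. This is a sketch assuming the (r|z|n) stacking order used by nn.GRU's parameters; variable names are illustrative.

```python
import torch

torch.manual_seed(0)
gru_cell = torch.nn.GRUCell(5, 10)
x = torch.randn(2, 5)
h = torch.randn(2, 10)

# Input-side and hidden-side pre-activations, each [batch, 3*hidden_size]
gi = x @ gru_cell.weight_ih.T + gru_cell.bias_ih
gh = h @ gru_cell.weight_hh.T + gru_cell.bias_hh
i_r, i_z, i_n = gi.chunk(3, dim=1)
h_r, h_z, h_n = gh.chunk(3, dim=1)

r = torch.sigmoid(i_r + h_r)        # reset gate
z = torch.sigmoid(i_z + h_z)        # update gate
n = torch.tanh(i_n + r * h_n)       # candidate state; r scales only the hidden-side term
h1_manual = (1 - z) * n + z * h

h1 = gru_cell(x, h)
matches = torch.allclose(h1, h1_manual, atol=1e-6)
print(matches)
```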

GRU

torch.nn.GRU(*args,**kwargs)
$\begin{aligned} r_{t} &=\sigma\left(W_{i r} x_{t}+b_{i r}+W_{h r} h_{(t-1)}+b_{h r}\right) \\ z_{t} &=\sigma\left(W_{i z} x_{t}+b_{i z}+W_{h z} h_{(t-1)}+b_{h z}\right) \\ n_{t} &=\tanh \left(W_{i n} x_{t}+b_{i n}+r_{t} *\left(W_{h n} h_{(t-1)}+b_{h n}\right)\right) \\ h_{t} &=\left(1-z_{t}\right) * n_{t}+z_{t} * h_{(t-1)} \end{aligned}$

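input_size: the number of features in the input x.
hidden_size: the number of units in the hidden layer, i.e. the number of features in the hidden state.
num_layers: the number of stacked recurrent layers; defaults to 1.
bias: defaults to True; if False, the layers do not use the bias terms $bias_{ih}$ and $bias_{hh}$.
batch_first: if True, the first dimension of the input is the batch. Defaults to False, in which case the first dimension is the sequence length, the second is the batch, and the third is the feature size.
dropout: if non-zero, adds a dropout layer on the outputs of every layer except the last, with drop probability equal to this value. Defaults to 0.
bidirectional: if True, becomes a bidirectional GRU. Defaults to False.

Inputs:

input: [seq_len, batch, input_size]
$h_{0}$ : [num_layers * num_directions, batch, hidden_size]

Outputs:

output: [seq_len, batch, num_directions * hidden_size]
$h_{n}$ : [num_layers * num_directions, batch, hidden_size]

Parameters:

GRU.weight_ih_l[k]: stacks (W_ir|W_iz|W_in); [3*hidden_size, input_size] for layer 0, [3*hidden_size, num_directions * hidden_size] for later layers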
GRU.weight_hh_l[k]: stacks (W_hr|W_hz|W_hn), [3*hidden_size, hidden_size]
GRU.bias_ih_l[k]: stacks (b_ir|b_iz|b_in), [3*hidden_size]
GRU.bias_hh_l[k]: stacks (b_hr|b_hz|b_hn), [3*hidden_size]

gru = torch.nn.GRU(5,10,2)
input = torch.randn(4,2,5)
h_0 = torch.randn(2,2,10)
output,h1= gru(input,h_0)
print(output.shape,h1.shape)
>>torch.Size([4, 2, 10]) torch.Size([2, 2, 10])

[(para[0],para[1].shape) for para in list(gru.named_parameters())]
>>[('weight_ih_l0', torch.Size([30, 5])),
 ('weight_hh_l0', torch.Size([30, 10])),
 ('bias_ih_l0', torch.Size([30])),
 ('bias_hh_l0', torch.Size([30])),
 ('weight_ih_l1', torch.Size([30, 10])),
 ('weight_hh_l1', torch.Size([30, 10])),
 ('bias_ih_l1', torch.Size([30])),
 ('bias_hh_l1', torch.Size([30]))]
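The parameter shapes listed throughout this post follow one pattern: each layer and direction holds `gates` stacked blocks, with gates = 1 for RNN, 4 for LSTM, and 3 for GRU. As a rough sketch, the total parameter count can be predicted from that pattern and compared with the modules; `count_rnn_params` is a hypothetical helper written for this illustration and it assumes bias=True.

```python
import torch

def count_rnn_params(input_size, hidden_size, num_layers, gates, num_directions=1):
    """Hypothetical helper: expected parameter count, assuming bias=True."""
    total = 0
    for k in range(num_layers):
        # Layer 0 sees the raw input; later layers see the concatenated directions
        in_features = input_size if k == 0 else num_directions * hidden_size
        weights = gates * hidden_size * (in_features + hidden_size)  # weight_ih + weight_hh
        biases = 2 * gates * hidden_size                             # bias_ih + bias_hh
        total += num_directions * (weights + biases)
    return total

def actual_params(module):
    return sum(p.numel() for p in module.parameters())

gru_expected = count_rnn_params(5, 10, 2, gates=3)
gru_actual = actual_params(torch.nn.GRU(5, 10, 2))

lstm_expected = count_rnn_params(5, 10, 2, gates=4)
lstm_actual = actual_params(torch.nn.LSTM(5, 10, 2))

birnn_expected = count_rnn_params(5, 10, 2, gates=1, num_directions=2)
birnn_actual = actual_params(torch.nn.RNN(5, 10, 2, bidirectional=True))

print(gru_expected, gru_actual)      # 1170 1170
print(lstm_expected, lstm_actual)    # 1560 1560
print(birnn_expected, birnn_actual)  # 980 980
```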