torch.nn.RNN
Constructor arguments:
input_size: the feature dimension feature_len of the embedded input data
hidden_size: the dimension of the RNN hidden state
num_layers: the number of stacked RNN layers, default 1
Forward pass of RNN
out, h_t = rnn(x, h_0)
x: input data, shape (seq_len, batch_size, feature_len)
h_0/h_t: hidden state, shape (num_layers, batch_size, hidden_size)
out: the last layer's hidden output at every time step, i.e. [h_1, h_2, ..., h_t], shape (seq_len, batch_size, hidden_size)
Each RNN layer has two weight matrices shared across time steps, W_ih and W_hh (we can ignore the bias vectors b here):
W_ih (also written W_xh): the weight matrix applied to the input x_t of shape (batch_size, feature_len); it has shape (hidden_size, feature_len) and is applied transposed
W_hh: the weight matrix applied to the previous hidden state; it has shape (hidden_size, hidden_size)
Update:
h_t = tanh(x_t W_ih^T + h_{t-1} W_hh^T)
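The update above can be checked directly against nn.RNN. A minimal sketch (the full PyTorch update also adds the bias terms b_ih and b_hh, which are included here; the small sizes are arbitrary):

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.RNN(input_size=4, hidden_size=3, num_layers=1)
x = torch.randn(6, 2, 4)     # (seq_len, batch_size, feature_len)
h_0 = torch.zeros(1, 2, 3)   # (num_layers, batch_size, hidden_size)
out, h_t = rnn(x, h_0)

# Replay h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh) by hand
h_manual = torch.zeros(2, 3)
for x_t in x:
    h_manual = torch.tanh(x_t @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                          + h_manual @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)

print(torch.allclose(h_manual, h_t[0], atol=1e-6))  # True
```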
Single-layer RNN code verification
import torch
rnn = torch.nn.RNN(input_size=100, hidden_size=10, num_layers=1)
print(rnn._parameters.keys())
odict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'])
print(rnn.weight_ih_l0.shape, rnn.weight_hh_l0.shape)
torch.Size([10, 100]) torch.Size([10, 10])
In weight_ih_l0, "l0" denotes layer 0.
As expected, W_ih has shape (hidden_size, feature_len) = (10, 100),
and W_hh has shape (hidden_size, hidden_size) = (10, 10).
# input data x: (seq_len, batch_size, feature_len)
x = torch.randn(8, 5, 100)
# h_0: (num_layers, batch_size, hidden_size)
h_0 = torch.zeros(1, 5, 10)
out, h_t = rnn(x, h_0)
print(out.shape, h_t.shape)
torch.Size([8, 5, 10]) torch.Size([1, 5, 10])
You can compare these shapes with the ones listed above; they match.
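Since out stacks the hidden state of every time step, its last step should coincide with h_t for a single-layer RNN. A quick sketch to confirm:

```python
import torch

rnn = torch.nn.RNN(input_size=100, hidden_size=10, num_layers=1)
x = torch.randn(8, 5, 100)
out, h_t = rnn(x, torch.zeros(1, 5, 10))

# out[-1] is the hidden state after the final time step, i.e. h_t
print(torch.allclose(out[-1], h_t[0]))  # True
```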
Multi-layer RNN code verification
import torch
rnn = torch.nn.RNN(input_size=100, hidden_size=10, num_layers=2)
print(rnn._parameters.keys())
print(rnn.weight_ih_l0.shape, rnn.weight_hh_l0.shape)
print(rnn.weight_ih_l1.shape, rnn.weight_hh_l1.shape)
odict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0', 'weight_ih_l1', 'weight_hh_l1', 'bias_ih_l1', 'bias_hh_l1'])
torch.Size([10, 100]) torch.Size([10, 10])
torch.Size([10, 10]) torch.Size([10, 10])
Note that weight_ih_l1 has shape (10, 10): layer 1's input is layer 0's hidden state of size hidden_size = 10, not the original feature_len.
# input data x: (seq_len, batch_size, feature_len)
x = torch.randn(8, 5, 100)
# h_0: (num_layers, batch_size, hidden_size)
h_0 = torch.zeros(2, 5, 10)
out, h_t = rnn(x, h_0)
print(out.shape, h_t.shape)
torch.Size([8, 5, 10]) torch.Size([2, 5, 10])
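In the multi-layer case, out contains only the top layer's outputs, while h_t holds the final hidden state of every layer. A quick sketch to check that out's last step matches the top layer of h_t:

```python
import torch

rnn = torch.nn.RNN(input_size=100, hidden_size=10, num_layers=2)
x = torch.randn(8, 5, 100)
out, h_t = rnn(x, torch.zeros(2, 5, 10))

# out comes from the top layer only, so its last time step is h_t[-1]
print(torch.allclose(out[-1], h_t[-1]))  # True
```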
torch.nn.RNNCell
Takes the same constructor arguments as RNN, except there is no num_layers parameter.
With RNN we feed in the whole sequence at once and it processes every step internally; with RNNCell we feed in one time step per call.
That is, for RNN the input x has shape (seq_len, batch_size, feature_len), while for RNNCell each input x_t has shape (batch_size, feature_len) and we feed the seq_len steps in manually, one at a time.
Forward pass of RNNCell
h_t = cell(x_t, h_t_1)
x_t: input data, shape (batch_size, feature_len)
h_t_1/h_t: hidden state, shape (batch_size, hidden_size)
RNNCell returns only h_t; if you collect every step's output yourself, you get [h_1, h_2, ..., h_t], matching RNN's out of shape (seq_len, batch_size, hidden_size).
The weights are the same as in RNN.
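One detail worth noting: the cell's parameters are named weight_ih/weight_hh without a layer suffix, since a cell is a single layer. A quick check of the shapes:

```python
import torch

cell = torch.nn.RNNCell(input_size=100, hidden_size=10)
print(cell.weight_ih.shape, cell.weight_hh.shape)
# torch.Size([10, 100]) torch.Size([10, 10])
```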
Single-layer RNNCell code verification
import torch
cell1 = torch.nn.RNNCell(input_size=100, hidden_size=10)
x = torch.randn(10, 5, 100)  # (seq_len, batch_size, feature_len)
h1 = torch.zeros(5, 10)
for x_t in x:
    h1 = cell1(x_t, h1)
print(h1.shape)
torch.Size([5, 10])
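To confirm that this manual loop really computes the same thing as nn.RNN, we can copy one RNN layer's weights into a cell and compare the final hidden states. A minimal sketch:

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.RNN(input_size=100, hidden_size=10, num_layers=1)
cell = torch.nn.RNNCell(input_size=100, hidden_size=10)

# Copy the RNN layer's weights into the cell so both compute the same function
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

x = torch.randn(10, 5, 100)
out, h_t = rnn(x, torch.zeros(1, 5, 10))

h = torch.zeros(5, 10)
for x_t in x:
    h = cell(x_t, h)

print(torch.allclose(h, h_t[0], atol=1e-6))  # True
```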
Multi-layer RNNCell code verification
import torch
cell1 = torch.nn.RNNCell(input_size=100, hidden_size=30)
cell2 = torch.nn.RNNCell(input_size=30, hidden_size=10)
x = torch.randn(10, 5, 100)
h1 = torch.zeros(5, 30)
h2 = torch.zeros(5, 10)
for x_t in x:
    h1 = cell1(x_t, h1)
    h2 = cell2(h1, h2)
print(h2.shape)
torch.Size([5, 10])
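The loop above keeps only the final hidden state. To recover the same out that nn.RNN returns, collect the top cell's output at every step and stack the list. A sketch:

```python
import torch

cell1 = torch.nn.RNNCell(input_size=100, hidden_size=30)
cell2 = torch.nn.RNNCell(input_size=30, hidden_size=10)
x = torch.randn(10, 5, 100)  # (seq_len, batch_size, feature_len)
h1 = torch.zeros(5, 30)
h2 = torch.zeros(5, 10)

outputs = []
for x_t in x:
    h1 = cell1(x_t, h1)
    h2 = cell2(h1, h2)
    outputs.append(h2)  # keep the top layer's state at each step

# Stacking the per-step states gives the same shape as nn.RNN's out
out = torch.stack(outputs)
print(out.shape)  # torch.Size([10, 5, 10])
```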