1. RNN Structure
An RNN processes a sequence one step at a time: at each step it consumes one input element together with the hidden state from the previous step, and every step reuses the same weights.
2. RNN Core
The core equations of an RNN are as follows:
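In the standard (vanilla) formulation, which is also the one min-char-rnn implements, the hidden state and output at time step t are:

h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)
y_t = W_{hy} h_t + b_y
p_t = \mathrm{softmax}(y_t)

where W_{xh}, W_{hh}, W_{hy} are the input-to-hidden, hidden-to-hidden, and hidden-to-output weight matrices, and h_{t-1} is the hidden state carried over from the previous step.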
3. Karpathy's code
Karpathy implemented min-char-rnn using nothing but numpy, and it is well worth studying.
The data loading, preprocessing, and forward-pass code is fairly direct; the part that is harder to follow is the backward pass, sketched below.
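Here is a self-contained sketch of how min-char-rnn backpropagates through time. Variable names mirror Karpathy's code; the toy sizes and token sequences are made up for illustration. The key trick is dhnext, which carries the gradient from step t+1 back into step t:

import numpy as np

# Toy sizes and data, assumed only for this demo
vocab_size, hidden_size = 5, 8
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((hidden_size, vocab_size)) * 0.01   # input -> hidden
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.01  # hidden -> hidden
Why = rng.standard_normal((vocab_size, hidden_size)) * 0.01   # hidden -> output
bh, by = np.zeros((hidden_size, 1)), np.zeros((vocab_size, 1))
inputs, targets = [0, 1, 2, 3], [1, 2, 3, 4]  # token indices (next-token targets)

# Forward pass, as in min-char-rnn's lossFun
xs, hs, ps, loss = {}, {-1: np.zeros((hidden_size, 1))}, {}, 0.0
for t in range(len(inputs)):
    xs[t] = np.zeros((vocab_size, 1)); xs[t][inputs[t]] = 1   # one-hot input
    hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t-1] + bh)         # hidden state
    y = Why @ hs[t] + by                                      # logits
    ps[t] = np.exp(y) / np.sum(np.exp(y))                     # softmax
    loss += -np.log(ps[t][targets[t], 0])                     # cross-entropy

# Backward pass: backprop through time
dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
dbh, dby = np.zeros_like(bh), np.zeros_like(by)
dhnext = np.zeros_like(hs[0])                 # gradient arriving from step t+1
for t in reversed(range(len(inputs))):
    dy = np.copy(ps[t]); dy[targets[t]] -= 1  # softmax + cross-entropy gradient
    dWhy += dy @ hs[t].T; dby += dy
    dh = Why.T @ dy + dhnext                  # h_t feeds both y_t and h_{t+1}
    dhraw = (1 - hs[t] ** 2) * dh             # through tanh: tanh' = 1 - tanh^2
    dbh += dhraw
    dWxh += dhraw @ xs[t].T
    dWhh += dhraw @ hs[t-1].T
    dhnext = Whh.T @ dhraw                    # hand the gradient to step t-1
for dparam in (dWxh, dWhh, dWhy, dbh, dby):
    np.clip(dparam, -5, 5, out=dparam)        # clip to tame exploding gradients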
4. PyTorch API
4.1 Parameters
- input_size: the number of features in the input x
- hidden_size: the number of features in the hidden state h
- num_layers: the number of recurrent layers
- nonlinearity: 'tanh' or 'relu'; defaults to 'tanh'
- bias: if True, the layer uses bias weights
- batch_first: determines the expected shape of the input (see below)
- bidirectional: if True, the RNN is bidirectional
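A minimal construction showing these parameters (the sizes here are arbitrary):

import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2,
             nonlinearity='tanh', bias=True,
             batch_first=True, bidirectional=False)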
4.2 Inputs
- input: for unbatched input, a tensor of shape (sequence_length, input_size); for batched input, a tensor of shape (sequence_length, batch_size, input_size) if batch_first is False, otherwise (batch_size, sequence_length, input_size)
- h_0: the initial hidden state, defaulting to a tensor of all zeros; for batched input, a tensor of shape (D * num_layers, batch_size, hidden_size), otherwise (D * num_layers, hidden_size), where D = 2 if bidirectional=True, otherwise 1
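A quick sketch of the two batched layouts (sizes assumed for illustration):

import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2

rnn = nn.RNN(input_size, hidden_size, num_layers)       # batch_first=False
x  = torch.randn(seq_len, batch, input_size)            # (sequence_length, batch_size, input_size)
h0 = torch.zeros(num_layers, batch, hidden_size)        # (D * num_layers, batch_size, hidden_size), D = 1
output, hn = rnn(x, h0)

rnn_bf = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
x_bf = torch.randn(batch, seq_len, input_size)          # (batch_size, sequence_length, input_size)
output_bf, hn_bf = rnn_bf(x_bf)                         # h_0 omitted, so it defaults to zeros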
4.3 Outputs
- output: for unbatched input, a tensor of shape (sequence_length, D * hidden_size); for batched input, a tensor of shape (sequence_length, batch_size, D * hidden_size) if batch_first=False, or (batch_size, sequence_length, D * hidden_size) if batch_first=True
- h_n: the final hidden state for each element in the batch; for batched input, a tensor of shape (D * num_layers, batch_size, hidden_size), otherwise (D * num_layers, hidden_size)
- When applying an RNN to a concrete task, the hidden state obtained from the RNN is usually not tied to the task's labels directly; instead, a fully connected network is attached after the hidden state h_n, and it is the output of that network that is matched against the labels
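The D factor only shows up once the RNN is bidirectional. A quick sketch (sizes assumed):

import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
rnn = nn.RNN(input_size, hidden_size, num_layers, bidirectional=True)  # D = 2
x = torch.randn(seq_len, batch, input_size)
output, hn = rnn(x)
print(output.shape)  # torch.Size([5, 3, 40]) -> (seq_len, batch, D * hidden_size)
print(hn.shape)      # torch.Size([4, 3, 20]) -> (D * num_layers, batch, hidden_size)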
4.4 Test
We test the RNN on the MNIST dataset, setting sequence_length to 28, num_layers to 2, batch_size to 100, hidden_size to 128, and input_size to 28 (each 28x28 image is treated as a sequence of 28 rows). The test code is as follows:
for images, labels in train_loader:
    images = images.reshape(batch_size, sequence_length, input_size)  # (100, 1, 28, 28) -> (100, 28, 28)
    labels = labels.to(device)
    model = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
    h0 = torch.randn((num_layers, batch_size, hidden_size))
    output, hn = model(images, h0)
    break  # one batch is enough to inspect shapes
Inspecting the shape of each variable:
# images.shape -> (batch_size, sequence_length, input_size)
torch.Size([100, 28, 28])
# output.shape -> (batch_size, sequence_length, hidden_size), since batch_first=True and D = 1
torch.Size([100, 28, 128])
# hn.shape -> (num_layers, batch_size, hidden_size)
torch.Size([2, 100, 128])
As noted above, when using an RNN on a real task we do not tie the hidden state to the labels directly; instead we decode it with a fully connected layer. The experiment code is as follows:
"""
模型搭建,最后一个时刻的隐状态输出至全连接层
"""
class rnn(nn.Module):
def __init__(self, input_size, hidden_size, num_layers, num_classes):
super(rnn, self).__init__()
self.hidden_size = hidden_size
self.num_layers = num_layers
self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True) # 选用LSTM RNN结构
self.fc = nn.Linear(hidden_size, num_classes) # 最后一层为全连接层,将隐状态转为分类
def forward(self, x):
h0 = torch.randn((self.num_layers, batch_size, self.hidden_size)).to(device)
# 前向传播RNN
_, hn = self.rnn(x, h0)
# 解码最后一个时刻的隐状态
out = self.fc(hn[-1])
return out
"""
实例化一个模型
"""
model_rnn = rnn(input_size, hidden_size, num_layers, num_classes).to(device)
# 定义损失函数和优化器
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
# 训练
for epoch in range(num_epochs):
for i, (images, labels) in enumerate(train_loader):
images = images.reshape(-1, sequence_length, input_size).to(device) # 注意维度
labels = labels.to(device)
# 前向传播
outputs = model(images)
loss = criterion(outputs, labels)
# 反向传播和优化,注意梯度每次清零
optimizer.zero_grad()
loss.backward()
optimizer.step()
if (i+1) % 100 == 0:
print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
.format(epoch+1, num_epochs, i+1, total_step, loss.item()))
The training log is shown below; the results are very poor, with the loss stuck near chance level (ln 10 ≈ 2.30 for ten classes). One likely culprit is the forward pass above: h0 is redrawn from a standard normal on every call, so the network starts each batch from fresh random noise, whereas nn.RNN defaults to a zero initial state when h0 is omitted.
Epoch [1/2], Step [100/600], Loss: 2.3821
Epoch [1/2], Step [200/600], Loss: 2.3016
Epoch [1/2], Step [300/600], Loss: 2.4099
Epoch [1/2], Step [400/600], Loss: 2.3305
Epoch [1/2], Step [500/600], Loss: 2.3753
Epoch [1/2], Step [600/600], Loss: 2.3711
Epoch [2/2], Step [100/600], Loss: 2.3520
Epoch [2/2], Step [200/600], Loss: 2.3180
Epoch [2/2], Step [300/600], Loss: 2.3457
Epoch [2/2], Step [400/600], Loss: 2.3717
Epoch [2/2], Step [500/600], Loss: 2.3852
Epoch [2/2], Step [600/600], Loss: 2.4222
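Under that assumption, a minimal fix sketch is to replace the random initial state with zeros and size it from the incoming batch:

def forward(self, x):
    # zeros are the nn.RNN default; x.size(0) also handles a final partial batch
    h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
    _, hn = self.rnn(x, h0)
    return self.fc(hn[-1])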