Recurrent Neural Network from Scratch

1. RNN Structure

(Figure: RNN structure diagram)

2. RNN Core

The core equations of the RNN are as follows:
(Figure: RNN core equations)
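
Written out (the symbols here follow min-char-rnn's convention of a tanh hidden update and a softmax output; the figure may label them differently), the per-step updates are:

$$h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$$
$$y_t = W_{hy} h_t + b_y$$
$$p_t = \mathrm{softmax}(y_t)$$

where $x_t$ is the input at step $t$, $h_t$ is the hidden state carried to the next step, and $p_t$ is the output distribution.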

3. Karpathy’s code

Karpathy implemented min-char-rnn using only numpy, and it is well worth studying.
The code for reading and preprocessing the data and for the forward pass is fairly intuitive; the part that is harder to follow is the backward pass.
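
The key idea of the backward pass is backpropagation through time (BPTT): the gradient that reaches the hidden state at step t is the sum of the gradient coming from that step's own output and the gradient carried back from step t+1. Below is a minimal numpy sketch of one forward/backward pass in the spirit of min-char-rnn (a paraphrase, not Karpathy's exact code; the shapes follow the equations above):

import numpy as np

def loss_and_grads(inputs, targets, hprev, Wxh, Whh, Why, bh, by):
    """One forward/backward pass over a sequence of integer token ids."""
    vocab_size = Why.shape[0]
    xs, hs, ps, loss = {}, {-1: np.copy(hprev)}, {}, 0.0

    # forward pass: h_t = tanh(Wxh x_t + Whh h_{t-1} + bh), p_t = softmax(Why h_t + by)
    for t, (ix, target) in enumerate(zip(inputs, targets)):
        xs[t] = np.zeros((vocab_size, 1)); xs[t][ix] = 1                  # one-hot input
        hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1] + bh)               # hidden state
        ys = Why @ hs[t] + by
        ps[t] = np.exp(ys - ys.max()) / np.sum(np.exp(ys - ys.max()))     # softmax
        loss += -np.log(ps[t][target, 0])                                 # cross-entropy

    # backward pass (BPTT): walk the sequence in reverse, carrying dhnext back in time
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dhnext = np.zeros_like(hprev)
    for t in reversed(range(len(inputs))):
        dy = np.copy(ps[t]); dy[targets[t]] -= 1        # gradient of loss w.r.t. logits
        dWhy += dy @ hs[t].T; dby += dy
        dh = Why.T @ dy + dhnext                        # from this step's output + from the future
        dhraw = (1 - hs[t] * hs[t]) * dh                # back through tanh
        dbh += dhraw
        dWxh += dhraw @ xs[t].T
        dWhh += dhraw @ hs[t - 1].T
        dhnext = Whh.T @ dhraw                          # hand the gradient to step t-1
    return loss, (dWxh, dWhh, dWhy, dbh, dby), hs[len(inputs) - 1]

# tiny usage example: vocabulary of 4 symbols, 8 hidden units
V, H = 4, 8
params = [np.random.randn(H, V) * 0.01, np.random.randn(H, H) * 0.01,
          np.random.randn(V, H) * 0.01, np.zeros((H, 1)), np.zeros((V, 1))]
loss, grads, h = loss_and_grads([0, 1, 2], [1, 2, 3], np.zeros((H, 1)), *params)

min-char-rnn additionally clips each gradient to [-5, 5] before the parameter update to keep exploding gradients in check.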

4. PyTorch API

4.1 Parameters

  1. input_size: the number of features in the input x
  2. hidden_size: the number of features in the hidden state h
  3. num_layers: the number of recurrent layers
  4. nonlinearity: 'tanh' or 'relu'; the default is 'tanh'
  5. bias: if True, bias weights are used
  6. batch_first: determines the input layout, (batch, seq, feature) if True and (seq, batch, feature) if False
  7. bidirectional: if True, the RNN is bidirectional (a construction example follows this list)
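
For instance, the layer used in the MNIST test of section 4.4 below can be built like this (the concrete sizes are simply the ones chosen there):

import torch.nn as nn

# 28 features per time step, 128 hidden units, 2 stacked layers,
# tanh nonlinearity, biases enabled, input laid out as (batch, seq, feature)
rnn_layer = nn.RNN(input_size=28, hidden_size=128, num_layers=2,
                   nonlinearity='tanh', bias=True,
                   batch_first=True, bidirectional=False)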

4.2 Inputs

  1. input: for unbatched input, a tensor of shape (sequence_length, input_size); for batched input, a tensor of shape (sequence_length, batch_size, input_size) if batch_first is False, otherwise (batch_size, sequence_length, input_size)
  2. h_0: the initial hidden state; defaults to a tensor of all zeros. For batched input it has shape (D * num_layers, batch_size, hidden_size), otherwise (D * num_layers, hidden_size), where D = 2 if bidirectional=True and 1 otherwise

4.3 Outputs

  1. output: for unbatched input, a tensor of shape (sequence_length, D * hidden_size); for batched input, a tensor of shape (sequence_length, batch_size, D * hidden_size) when batch_first=False, or (batch_size, sequence_length, D * hidden_size) when batch_first=True
  2. h_n: the final hidden state of every layer for each element in the batch. For batched input it has shape (D * num_layers, batch_size, hidden_size); for unbatched input it has shape (D * num_layers, hidden_size) (a quick shape check follows this list)
  3. When applying the RNN to a concrete task, the hidden state is usually not matched to the task's labels directly; instead a fully connected network is attached after the hidden state h_n, and the output of that fully connected network is compared against the labels
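
A quick sanity check of these shape conventions (a small sketch with arbitrary sizes; unbatched inputs require a reasonably recent PyTorch release):

import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
rnn = nn.RNN(input_size, hidden_size, num_layers)         # batch_first=False, unidirectional (D=1)

# batched input: (seq_len, batch, input_size) because batch_first=False
x = torch.randn(seq_len, batch, input_size)
h0 = torch.zeros(num_layers, batch, hidden_size)          # (D*num_layers, batch, hidden_size)
output, h_n = rnn(x, h0)
print(output.shape)    # torch.Size([5, 3, 20]) -> (seq_len, batch, D*hidden_size)
print(h_n.shape)       # torch.Size([2, 3, 20]) -> (D*num_layers, batch, hidden_size)

# unbatched input: (seq_len, input_size); h_0 defaults to zeros when omitted
output_u, h_n_u = rnn(torch.randn(seq_len, input_size))
print(output_u.shape)  # torch.Size([5, 20])    -> (seq_len, D*hidden_size)
print(h_n_u.shape)     # torch.Size([2, 20])    -> (D*num_layers, hidden_size)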

4.4 Test

The RNN is tested on the MNIST dataset, treating each 28x28 image as a sequence of 28 rows with 28 pixels each, i.e. sequence_length = 28 and input_size = 28, with num_layers = 2, batch_size = 100 and hidden_size = 128. The test code is as follows:

import torch
import torch.nn as nn

# train_loader, device, num_classes (10 for MNIST) and the hyperparameters above
# are assumed to be defined already
for images, labels in train_loader:
    # (batch, 1, 28, 28) -> (batch_size, sequence_length, input_size)
    images = images.reshape(batch_size, sequence_length, input_size)
    model = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
    h0 = torch.randn(num_layers, batch_size, hidden_size)   # initial hidden state
    output, hn = model(images, h0)
    break   # one batch is enough to inspect the shapes

Inspect the shapes of the variables:

# images.shape -> (batch_size, sequence_length, input_size)
torch.Size([100, 28, 28])
# output.shape -> (batch_size, sequence_length, hidden_size), since batch_first=True and D=1
torch.Size([100, 28, 128])
# hn.shape -> (num_layers, batch_size, hidden_size)
torch.Size([2, 100, 128])

When using the RNN to solve a real problem, as mentioned above, the hidden state is not matched to the labels directly; instead a fully connected layer is added to decode it. The experiment code is as follows:

"""
模型搭建,最后一个时刻的隐状态输出至全连接层
"""
class rnn(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(rnn, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True) # 选用LSTM RNN结构
        self.fc = nn.Linear(hidden_size, num_classes) # 最后一层为全连接层,将隐状态转为分类
    
    def forward(self, x):
        h0 = torch.randn((self.num_layers, batch_size, self.hidden_size)).to(device)
        
        # 前向传播RNN
        _, hn = self.rnn(x, h0)
        # 解码最后一个时刻的隐状态
        out = self.fc(hn[-1])
        return out

"""
实例化一个模型
"""
model_rnn = rnn(input_size, hidden_size, num_layers, num_classes).to(device)

# define the loss function and the optimizer (learning_rate is assumed to be set elsewhere)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model_rnn.parameters(), lr=learning_rate)

# training loop (the run logged below used num_epochs = 2)
total_step = len(train_loader)   # number of batches per epoch
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.reshape(-1, sequence_length, input_size).to(device) # mind the dimensions: (batch, seq_len, input_size)
        labels = labels.to(device)
        
        # forward pass
        outputs = model_rnn(images)
        loss = criterion(outputs, labels)
        
        # backward pass and optimization; zero the gradients on every step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

The training log is shown below; the results are poor. The loss hovers around 2.30 ≈ ln(10), which is exactly the cross-entropy of uniformly random guessing over the 10 digit classes, i.e. the classifier was not learning at all. A plateau like this usually means the parameters being updated are not the ones producing the outputs; in particular, make sure the optimizer is built from model_rnn.parameters() rather than from the throwaway model of the shape test above, and that the initial hidden state is not re-randomized on every forward pass:

Epoch [1/2], Step [100/600], Loss: 2.3821
Epoch [1/2], Step [200/600], Loss: 2.3016
Epoch [1/2], Step [300/600], Loss: 2.4099
Epoch [1/2], Step [400/600], Loss: 2.3305
Epoch [1/2], Step [500/600], Loss: 2.3753
Epoch [1/2], Step [600/600], Loss: 2.3711
Epoch [2/2], Step [100/600], Loss: 2.3520
Epoch [2/2], Step [200/600], Loss: 2.3180
Epoch [2/2], Step [300/600], Loss: 2.3457
Epoch [2/2], Step [400/600], Loss: 2.3717
Epoch [2/2], Step [500/600], Loss: 2.3852
Epoch [2/2], Step [600/600], Loss: 2.4222
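
Loss values alone are hard to interpret, so once training finishes it also helps to measure accuracy on the test split. A small sketch, assuming a test_loader built the same way as train_loader:

model_rnn.eval()
with torch.no_grad():
    correct, total = 0, 0
    for images, labels in test_loader:
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
        outputs = model_rnn(images)          # (batch, num_classes) class scores
        predicted = outputs.argmax(dim=1)    # index of the highest score per sample
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Test accuracy: {:.2f}%'.format(100 * correct / total))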

5. RNN Types

(Figure: RNN types)

6. References

  1. Minimalist Recurrent Neural Network
  2. Backpropagation Through Time and Vanishing Gradients
  3. The Unreasonable Effectiveness of Recurrent Neural Networks