Study Notes: Implementing a Simple Recurrent Neural Network in PyTorch

卡塞尔学院临时校长

Category: 动手学深度学习pytorch笔记 (Dive into Deep Learning, PyTorch notes). Tags: neural networks, deep learning

Article link: https://blog.csdn.net/weixin_43901214/article/details/105074908

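Implementing a simple RNN in PyTorch

These notes do not yet cover LSTM or deep RNNs.

Defining the model

We use nn.RNN from PyTorch to build the recurrent neural network. In this section we focus on the following constructor arguments of nn.RNN: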

input_size – The number of expected features in the input x (here, the length of a one-hot vector, i.e. the vocabulary size)
hidden_size – The number of features in the hidden state h
nonlinearity – The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
batch_first – If True, then the input and output tensors are provided as (batch_size, num_steps, input_size). Default: False

The batch_first argument determines the layout of the input. We keep the default, False, so the input has shape (num_steps, batch_size, input_size).
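To make the effect of batch_first concrete, here is a minimal sketch that builds two nn.RNN layers and compares the shapes they return. The sizes (5 steps, batch of 3, 10 input features, 8 hidden units) and the variable names are arbitrary choices for illustration, not values from these notes.

```python
import torch
from torch import nn

num_steps, batch_size, input_size, hidden_size = 5, 3, 10, 8

# Default layout (batch_first=False): input is (num_steps, batch_size, input_size)
rnn_tm = nn.RNN(input_size=input_size, hidden_size=hidden_size)
out_tm, h_tm = rnn_tm(torch.rand(num_steps, batch_size, input_size))

# batch_first=True: input is (batch_size, num_steps, input_size)
rnn_bf = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True)
out_bf, h_bf = rnn_bf(torch.rand(batch_size, num_steps, input_size))

print(out_tm.shape, h_tm.shape)
print(out_bf.shape, h_bf.shape)
```

Only the input and the per-step output swap their first two dimensions. The returned hidden state keeps the layout (num_layers * num_directions, batch_size, hidden_size) in both cases.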

The forward function (analogous to the rnn function we wrote by hand earlier) takes the following arguments:

input of shape (num_steps, batch_size, input_size): tensor containing the features of the input sequence.
h_0 of shape (num_layers * num_directions, batch_size, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

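Some supplementary notes on the h_0 parameters in the official documentation (so that I don't forget):
h_0 plays the role of state in the from-scratch RNN built earlier. There, state was a tuple containing a single element (a tuple because LSTM, covered later, needs more than one state). For nn.RNN, h_0 is a single tensor.
num_layers relates to deep recurrent networks and num_directions to bidirectional recurrent networks, both covered later. Here both are 1.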
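As a quick check of the h_0 description, the sketch below passes an explicit all-zero h_0 of shape (num_layers * num_directions, batch_size, hidden_size) = (1, 2, 6) and compares the result with omitting the state. All sizes and names here are assumptions chosen for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=6)   # single layer, unidirectional
X = torch.rand(7, 2, 4)                     # (num_steps, batch_size, input_size)

# Explicit zero initial state: (num_layers * num_directions, batch_size, hidden_size)
h_0 = torch.zeros(1, 2, 6)

out_default, h_default = rnn(X)             # state omitted, treated as zeros
out_zero, h_zero = rnn(X, h_0)              # state given explicitly

same_output = torch.allclose(out_default, out_zero)
same_state = torch.allclose(h_default, h_zero)
print(same_output, same_state)
```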

The forward function returns:

output of shape (num_steps, batch_size, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t.
h_n of shape (num_layers * num_directions, batch_size, hidden_size): tensor containing the hidden state for t = num_steps.
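A note supplementing the official PyTorch documentation: nn.RNN performs only the hidden-layer computation, so output is really the hidden state at every time step. Its third dimension is num_directions * hidden_size, where num_directions is 1 here and hidden_size is the size of the hidden state. The return value h_n is the hidden state at the last time step.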
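The relationship between output and h_n can be checked directly. In this minimal sketch (a single-layer, unidirectional nn.RNN with arbitrary illustrative sizes), the last time step of output should match the only entry of h_n.

```python
import torch
from torch import nn

rnn = nn.RNN(input_size=4, hidden_size=6)    # num_layers=1, unidirectional
output, h_n = rnn(torch.rand(7, 2, 4))       # input: (num_steps, batch_size, input_size)

# output holds the hidden state of every time step: (7, 2, 6)
# h_n holds the hidden state of the final time step: (1, 2, 6)
last_step_matches = torch.allclose(output[-1], h_n[0])
print(output.shape, h_n.shape, last_step_matches)
```

This equality holds for the single-layer, unidirectional case used in these notes. With several layers, output only reflects the last layer, while h_n contains one entry per layer.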

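Here rnn_layer takes input of shape (num_steps, batch_size, input_size), where input_size is the length of the one-hot vector (the vocabulary size). As an nn.RNN instance, rnn_layer returns both an output and a hidden state h from its forward pass. The output is the hidden state that the hidden layer computes at each time step, and it usually serves as the input to a subsequent output layer. Note that this "output" involves no output-layer computation; its shape is (num_steps, batch_size, num_hiddens). The hidden state returned by the nn.RNN forward pass is the hidden state at the final time step. When there are multiple hidden layers, the hidden state of every layer is recorded in this variable. For a long short-term memory (LSTM) network, the hidden state is a tuple (h, c), i.e. the hidden state and the cell state.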

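LSTM and deep recurrent neural networks are covered later. The outputs of a recurrent neural network (using LSTM as an example) are illustrated in the figure below.
[Figure: outputs of a recurrent neural network, LSTM example (image not available)]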
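As a preview of the tuple-valued state and the multi-layer case described above, here is a small sketch using nn.LSTM with two layers. The sizes are arbitrary and chosen only to make the shapes easy to read.

```python
import torch
from torch import nn

lstm = nn.LSTM(input_size=4, hidden_size=6, num_layers=2)
output, state = lstm(torch.rand(7, 2, 4))    # input: (num_steps, batch_size, input_size)

h_n, c_n = state                             # LSTM state is a tuple (h, c)
print(isinstance(state, tuple))
print(output.shape)   # hidden states of the last layer at every time step
print(h_n.shape)      # one final hidden state per layer
print(c_n.shape)      # one final cell state per layer
```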

Now we construct an nn.RNN instance and use a simple example to look at the output shapes.

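import torch
from torch import nn

# vocab_size is the vocabulary size of the dataset loaded earlier in this series (not shown in these notes)
num_hiddens = 256  # number of hidden units; matches the shapes printed below
rnn_layer = nn.RNN(input_size=vocab_size, hidden_size=num_hiddens)  # create the instance
num_steps, batch_size = 35, 2
X = torch.rand(num_steps, batch_size, vocab_size)  # 3-D: (num_steps, batch_size, vocab_size)
state = None  # initial hidden state; None is treated as zeros
Y, state_new = rnn_layer(X, state)  # returns Y and the new state
print(Y.shape, state_new.shape)
# prints: torch.Size([35, 2, 256]) torch.Size([1, 2, 256])
# (num_steps, batch_size, num_hiddens) and (num_layers * num_directions = 1 * 1, batch_size, num_hiddens)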

Next we define a complete language model based on a recurrent neural network.

'''The rnn_layer argument is a PyTorch RNN instance. This class is reused later, when an LSTM
instance may be passed in instead. vocab_size is the vocabulary size.
2 if rnn_layer.bidirectional else 1: multiply by 2 if rnn_layer is bidirectional, by 1 if it
is unidirectional. We use a unidirectional layer for now.
'''
class RNNModel(nn.Module):  # a complete RNN-based language model
    def __init__(self, rnn_layer, vocab_size):
        super(RNNModel, self).__init__()
        self.rnn = rnn_layer
        self.hidden_size = rnn_layer.hidden_size * (2 if rnn_layer.bidirectional else 1)
        self.vocab_size = vocab_size
        # rnn_layer only outputs the hidden state at each time step; a language model
        # needs an output at every time step, hence the linear output layer
        self.dense = nn.Linear(self.hidden_size, vocab_size)

    def forward(self, inputs, state):
        # inputs.shape: (batch_size, num_steps)
        # to_onehot is the helper from earlier in this series; it returns a list of
        # num_steps tensors, each of shape (batch_size, vocab_size)
        X = to_onehot(inputs, self.vocab_size)
        X = torch.stack(X)  # X.shape: (num_steps, batch_size, vocab_size); stacks along dim=0
        hiddens, state = self.rnn(X, state)  # RNN forward pass
        hiddens = hiddens.view(-1, hiddens.shape[-1])  # hiddens.shape: (num_steps * batch_size, hidden_size)
        output = self.dense(hiddens)  # output layer
        return output, state
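The to_onehot helper called in forward comes from earlier in the series and is not shown in these notes. The following is only a sketch of what such a helper might look like, assuming it takes an integer tensor of shape (batch_size, num_steps) and returns a list of num_steps one-hot tensors. The implementation via torch.nn.functional.one_hot is my assumption, not necessarily the original.

```python
import torch
import torch.nn.functional as F

def to_onehot(inputs, n_class):
    """Hypothetical helper: (batch_size, num_steps) indices ->
    list of num_steps float tensors of shape (batch_size, n_class)."""
    return [F.one_hot(inputs[:, t].long(), n_class).float()
            for t in range(inputs.shape[1])]

indices = torch.arange(10).view(2, 5)      # batch_size=2, num_steps=5
onehots = to_onehot(indices, 10)
stacked = torch.stack(onehots)             # (num_steps, batch_size, n_class)
print(len(onehots), onehots[0].shape, stacked.shape)
```

Stacking the list along dim 0 gives exactly the (num_steps, batch_size, vocab_size) layout that nn.RNN expects with batch_first=False.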

Similarly, we need a prediction function. It differs from the earlier one in the forward pass and in how the hidden state is initialized.

def predict_rnn_pytorch(prefix, num_chars, model, vocab_size, device, idx_to_char,
                      char_to_idx):
    state = None
    output = [char_to_idx[prefix[0]]]  # output holds prefix plus the num_chars predicted characters
    for t in range(num_chars + len(prefix) - 1):
        X = torch.tensor([output[-1]], device=device).view(1, 1)
        (Y, state) = model(X, state)  # forward pass; model parameters need not be passed in
        if t < len(prefix) - 1:
            output.append(char_to_idx[prefix[t + 1]])
        else:
            output.append(Y.argmax(dim=1).item())
    return ''.join([idx_to_char[i] for i in output])

Predict once using a model whose weights are still random.

model = RNNModel(rnn_layer, vocab_size).to(device)
predict_rnn_pytorch('分开', 10, model, vocab_size, device, idx_to_char, char_to_idx)
# '分开胸呵以轮轮轮轮轮轮轮'

Next we implement the training function. Only consecutive (adjacent) sampling is used here.
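Consecutive sampling means that each minibatch continues exactly where the previous one stopped, which is why the hidden state can be carried across batches. The iterator used by the training function comes from the d2l helper package and is not shown in these notes. The sketch below is my own approximation of such an iterator; the body is an assumption and may differ from the real helper.

```python
import torch

def data_iter_consecutive(corpus_indices, batch_size, num_steps, device=None):
    """Hypothetical sketch of consecutive sampling over a list of token indices."""
    data = torch.tensor(corpus_indices, dtype=torch.long, device=device)
    batch_len = len(data) // batch_size
    # Each row is one contiguous stretch of the corpus
    data = data[: batch_size * batch_len].view(batch_size, batch_len)
    epoch_size = (batch_len - 1) // num_steps
    for i in range(epoch_size):
        start = i * num_steps
        X = data[:, start: start + num_steps]
        Y = data[:, start + 1: start + num_steps + 1]  # labels are inputs shifted by one
        yield X, Y

batches = list(data_iter_consecutive(list(range(30)), batch_size=2, num_steps=6))
for X, Y in batches:
    print(X.tolist(), Y.tolist())
```

In this toy corpus the first row of the second batch starts at 6, directly after the first batch ended at 5, so the state produced by one batch is a sensible starting state for the next.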

import math
import time

# d2l (data_iter_consecutive) and grad_clipping are helpers from earlier in this series (assumed available)
def train_and_predict_rnn_pytorch(model, num_hiddens, vocab_size, device,
                                corpus_indices, idx_to_char, char_to_idx,
                                num_epochs, num_steps, lr, clipping_theta,
                                batch_size, pred_period, pred_len, prefixes):
    loss = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device)
    for epoch in range(num_epochs):
        l_sum, n, start = 0.0, 0, time.time()  # running loss sum, token count, start time
        # Consecutive sampling: reset the hidden state at the start of each epoch.
        # PyTorch's RNN treats a missing state as zeros.
        data_iter = d2l.data_iter_consecutive(corpus_indices, batch_size, num_steps, device)
        state = None
        for X, Y in data_iter:
            if state is not None:  # not the first batch
                # Detach the hidden state from the computation graph so that the gradients
                # depend only on the minibatch read in this iteration (this keeps the
                # cost of backpropagation bounded)
                if isinstance(state, tuple):  # LSTM, state: (h, c)
                    state[0].detach_()
                    state[1].detach_()
                else:
                    state.detach_()
            (output, state) = model(X, state)  # output.shape: (num_steps * batch_size, vocab_size)
            y = torch.flatten(Y.T)
            l = loss(output, y.long())
            
            optimizer.zero_grad()
            l.backward()
            grad_clipping(model.parameters(), clipping_theta, device)
            optimizer.step()
            l_sum += l.item() * y.shape[0]
            n += y.shape[0]
        

        if (epoch + 1) % pred_period == 0:
            print('epoch %d, perplexity %f, time %.2f sec' % (
                epoch + 1, math.exp(l_sum / n), time.time() - start))
            for prefix in prefixes:
                print(' -', predict_rnn_pytorch(
                    prefix, pred_len, model, vocab_size, device, idx_to_char,
                    char_to_idx))
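The grad_clipping helper called in the training loop is not shown in these notes. Below is a minimal sketch of gradient clipping by global L2 norm with the same call signature. The body is my assumption about how such a helper could be written, not the original code.

```python
import torch
from torch import nn

def grad_clipping(params, theta, device):
    """Hypothetical sketch: rescale all gradients so their global L2 norm is at most theta."""
    # model.parameters() returns a generator, so materialize it before iterating twice
    params = [p for p in params if p.grad is not None]
    norm = torch.zeros(1, device=device)
    for p in params:
        norm += (p.grad.detach() ** 2).sum()
    norm = norm.sqrt().item()
    if norm > theta:
        for p in params:
            p.grad.mul_(theta / norm)
    return norm

layer = nn.Linear(3, 2)
layer(torch.ones(4, 3)).sum().backward()
norm_before = grad_clipping(layer.parameters(), 1.0, torch.device('cpu'))
norm_after = grad_clipping(layer.parameters(), 1.0, torch.device('cpu'))
print(norm_before, norm_after)
```

PyTorch also ships torch.nn.utils.clip_grad_norm_(parameters, max_norm), which performs this kind of norm-based clipping and could be used instead of a hand-written helper.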

Train the model.

num_epochs, batch_size, lr, clipping_theta = 250, 32, 1e-3, 1e-2
pred_period, pred_len, prefixes = 50, 50, ['分开', '不分开']
train_and_predict_rnn_pytorch(model, num_hiddens, vocab_size, device,
                            corpus_indices, idx_to_char, char_to_idx,
                            num_epochs, num_steps, lr, clipping_theta,
                            batch_size, pred_period, pred_len, prefixes)
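When reading the training log, it helps to remember that the printed perplexity is exp(l_sum / n), the exponential of the average cross-entropy per token. The sketch below uses only the standard library and a hypothetical vocabulary size of 1000 to show two reference points: a model that predicts uniformly at random and a model that is always right.

```python
import math

def perplexity(total_loss, num_tokens):
    """exp of the mean cross-entropy per token, as printed in the training loop."""
    return math.exp(total_loss / num_tokens)

vocab_size = 1000                              # hypothetical value for illustration
uniform_loss = -math.log(1.0 / vocab_size)     # per-token loss of a uniform guess
ppl_uniform = perplexity(uniform_loss * 50, 50)
ppl_perfect = perplexity(0.0, 50)              # zero loss on every token
print(ppl_uniform, ppl_perfect)
```

A uniform guess gives a perplexity equal to the vocabulary size and a perfect model gives 1, so a useful model should land between these two values.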