Building a Multi-Layer LSTM with LSTMCell in PyTorch for Time Series Forecasting

Cyril_KI

Last modified 2024-04-05 22:39:59

Category: Time Series Forecasting. Tags: pytorch, lstm, deep learning, lstmcell

First published 2022-12-14 11:45:06

Article link: https://blog.csdn.net/cyril_ki/article/details/128312874

Preface

I have already written a number of articles on time series forecasting.

In all of those articles, the LSTM model is built as follows:

import torch
from torch import nn


class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, batch_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.output_size = output_size
        self.num_directions = 1  # unidirectional LSTM
        self.batch_size = batch_size
        self.lstm = nn.LSTM(self.input_size, self.hidden_size, self.num_layers, batch_first=True)
        self.linear = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input_seq):
        batch_size, seq_len = input_seq.shape[0], input_seq.shape[1]
        # Use the batch size and device of the actual input, so a smaller final batch also works
        h_0 = torch.randn(self.num_directions * self.num_layers, batch_size, self.hidden_size, device=input_seq.device)
        c_0 = torch.randn(self.num_directions * self.num_layers, batch_size, self.hidden_size, device=input_seq.device)
        # output: (batch_size, seq_len, num_directions * hidden_size)
        output, _ = self.lstm(input_seq, (h_0, c_0))  # e.g. (5, 30, 64)
        pred = self.linear(output)  # e.g. (5, 30, 1)
        pred = pred[:, -1, :]  # keep only the last time step, e.g. (5, 1)
        return pred

The LSTM layer itself is defined with a statement such as:

self.lstm = nn.LSTM(self.input_size, self.hidden_size, self.num_layers, batch_first=True, dropout=0.5)

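With num_layers=2 and hidden_size=64, both LSTM layers have a hidden_size of 64. In addition, nn.LSTM applies dropout only to the output of each layer except the last one, so no dropout is performed after the final (second) layer.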
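The shared hidden size can be checked by inspecting the parameter shapes of nn.LSTM. The snippet below is a small sketch; input_size=7 is an arbitrary value chosen for illustration and does not come from the article.

```python
import torch.nn as nn

# input_size=7 is an assumed value used only for this illustration
lstm = nn.LSTM(input_size=7, hidden_size=64, num_layers=2, batch_first=True, dropout=0.5)

# Each weight matrix stacks the four gates, so its first dimension is 4 * hidden_size
shapes = {
    name: tuple(param.shape)
    for name, param in lstm.named_parameters()
    if name.startswith("weight_ih")
}
print(shapes)
```

The second layer's input-to-hidden weight has 64 input columns, which is the hidden_size of the first layer; there is no argument for giving the two layers different sizes.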

If we want the two LSTM layers to have different hidden_size values and want dropout after every layer, we can build the multi-layer LSTM from LSTMCell instead.

LSTMCell

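The official documentation describes three parameters for nn.LSTMCell: input_size, hidden_size, and bias. They have the same meaning as in the earlier articles, so they are not explained again here.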
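As a quick illustration of those parameters, a single call to nn.LSTMCell processes one time step for a whole batch. This is a minimal sketch; the sizes (batch of 5, 7 input features, 128 hidden units) are assumed values for the example.

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=7, hidden_size=128)  # bias defaults to True

x_t = torch.randn(5, 7)    # one time step: (batch_size, input_size)
h = torch.zeros(5, 128)    # hidden state: (batch_size, hidden_size)
c = torch.zeros(5, 128)    # cell state:   (batch_size, hidden_size)

# One step of the recurrence; returns the updated hidden and cell states
h, c = cell(x_t, (h, c))
print(h.shape, c.shape)
```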

A two-layer LSTM built from LSTMCell looks like this:

class LSTM(nn.Module):
    def __init__(self, args):
        super().__init__()
        self.args = args
        self.input_size = args.input_size
        self.output_size = args.output_size
        self.num_directions = 1
        self.batch_size = args.batch_size
        self.lstm0 = nn.LSTMCell(args.input_size, hidden_size=128)
        self.lstm1 = nn.LSTMCell(input_size=128, hidden_size=32)
        self.dropout = nn.Dropout(p=0.4)
        self.linear = nn.Linear(32, self.output_size)

    def forward(self, input_seq):
        batch_size, seq_len = input_seq.shape[0], input_seq.shape[1]
        device = input_seq.device  # keep the states on the same device as the input
        # initial states, each of shape (batch_size, hidden_size)
        h_l0 = torch.zeros(batch_size, 128, device=device)
        c_l0 = torch.zeros(batch_size, 128, device=device)
        h_l1 = torch.zeros(batch_size, 32, device=device)
        c_l1 = torch.zeros(batch_size, 32, device=device)
        output = []
        for t in range(seq_len):
            # layer 0 consumes the input at time step t
            h_l0, c_l0 = self.lstm0(input_seq[:, t, :], (h_l0, c_l0))
            # note: the dropped-out states are also the ones carried to the next time step
            h_l0, c_l0 = self.dropout(h_l0), self.dropout(c_l0)
            # layer 1 consumes the hidden state of layer 0
            h_l1, c_l1 = self.lstm1(h_l0, (h_l1, c_l1))
            h_l1, c_l1 = self.dropout(h_l1), self.dropout(c_l1)
            output.append(h_l1)

        # map the last time step's hidden state, (batch_size, 32), to the prediction
        pred = self.linear(output[-1])

        return pred

As you can see, we define two LSTMCells, one for each layer:

self.lstm0 = nn.LSTMCell(args.input_size, hidden_size=128)
self.lstm1 = nn.LSTMCell(input_size=128, hidden_size=32)

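The input_size of the first layer is the input_size of the raw data, and the input_size of the second layer must equal the hidden_size of the first layer so that data can be passed from one layer to the next.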
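The same chaining rule extends to any number of layers. The helper below is a hypothetical sketch (build_cells is not part of the article or of PyTorch) that creates the cells from a list of hidden sizes; input_size=7 is again an assumed value.

```python
import torch.nn as nn

def build_cells(input_size, hidden_sizes):
    """Chain LSTMCells so each layer's input_size equals the previous layer's hidden_size."""
    sizes = [input_size] + list(hidden_sizes)
    return nn.ModuleList(
        [nn.LSTMCell(in_size, out_size) for in_size, out_size in zip(sizes[:-1], sizes[1:])]
    )

# Same layout as the two-layer model: 7 -> 128 -> 32
cells = build_cells(input_size=7, hidden_sizes=[128, 32])
layer_sizes = [(cell.input_size, cell.hidden_size) for cell in cells]
print(layer_sizes)
```

Wrapping the cells in nn.ModuleList registers their parameters with the parent module, which a plain Python list would not do.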

When using LSTMCell, we have to compute each time step ourselves and pass the states along manually:

for t in range(seq_len):
    h_l0, c_l0 = self.lstm0(input_seq[:, t, :], (h_l0, c_l0))
    h_l0, c_l0 = self.dropout(h_l0), self.dropout(c_l0)
    h_l1, c_l1 = self.lstm1(h_l0, (h_l1, c_l1))
    h_l1, c_l1 = self.dropout(h_l1), self.dropout(c_l1)
    output.append(h_l1)

The shape of input_seq is:

input_seq(batch_size, seq_len, input_size)

At each iteration, one time step is sliced out and fed into the computation:

h_l0, c_l0 = self.lstm0(input_seq[:, t, :], (h_l0, c_l0))
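To make the slicing concrete, the sketch below shows what input_seq[:, t, :] looks like for an assumed input of shape (5, 30, 7). It also shows torch.unbind, which splits the tensor into per-step slices in one call and could be used in place of indexing.

```python
import torch

input_seq = torch.randn(5, 30, 7)  # (batch_size, seq_len, input_size), assumed sizes

x_t = input_seq[:, 0, :]           # first time step: (batch_size, input_size)
steps = input_seq.unbind(dim=1)    # tuple of seq_len tensors, each (batch_size, input_size)

print(x_t.shape, len(steps))
```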

The result of the first LSTMCell is then fed into the second LSTMCell:

h_l1, c_l1 = self.lstm1(h_l0, (h_l1, c_l1))

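At this point we have the output of a single time step, with shape (batch_size, hidden_size). Repeating this for every time step yields the outputs of all steps. Finally, we take the output of the last time step (see the first article in this series if this is unclear) and map it through the linear layer to obtain the final prediction: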

pred = self.linear(output[-1])

Notice that after each LSTMCell call we are free to add a dropout layer by hand:

h_l0, c_l0 = self.dropout(h_l0), self.dropout(c_l0)
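Note that the line above overwrites the states, so the dropout mask also affects what is carried to the next time step. One possible alternative, closer to how the dropout argument of nn.LSTM acts between layers, is to apply dropout only to the value handed to the next layer. The following is a sketch under that assumption, not the article's model; the class name StackedLSTMCells and the default sizes are hypothetical.

```python
import torch
import torch.nn as nn

class StackedLSTMCells(nn.Module):
    """Hypothetical variant: dropout on inter-layer outputs only."""

    def __init__(self, input_size=7, output_size=1, p=0.4):
        super().__init__()
        self.lstm0 = nn.LSTMCell(input_size, 128)
        self.lstm1 = nn.LSTMCell(128, 32)
        self.dropout = nn.Dropout(p=p)
        self.linear = nn.Linear(32, output_size)

    def forward(self, input_seq):
        batch_size, seq_len = input_seq.shape[0], input_seq.shape[1]
        # new_zeros creates the states on the same device and dtype as the input
        h_l0 = input_seq.new_zeros(batch_size, 128)
        c_l0 = input_seq.new_zeros(batch_size, 128)
        h_l1 = input_seq.new_zeros(batch_size, 32)
        c_l1 = input_seq.new_zeros(batch_size, 32)
        for t in range(seq_len):
            h_l0, c_l0 = self.lstm0(input_seq[:, t, :], (h_l0, c_l0))
            # Dropout only on the copy passed upward; the carried states stay intact
            h_l1, c_l1 = self.lstm1(self.dropout(h_l0), (h_l1, c_l1))
        # Dropout after the last layer as well, then map to the prediction
        return self.linear(self.dropout(h_l1))

model = StackedLSTMCells()
model.eval()  # nn.Dropout is inactive in eval mode
x = torch.randn(5, 30, 7)
with torch.no_grad():
    pred = model(x)
print(pred.shape)
```

Whichever variant is used, remember to call model.train() during training and model.eval() during testing, since nn.Dropout only drops values in training mode.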

Compare this with how nn.LSTM is executed:

output, _ = self.lstm(input_seq, (h_0, c_0))

Here the shape of output is:

output(batch_size, seq_len, hidden_size)

In other words, nn.LSTM does everything in a single call and directly returns all seq_len outputs, each of shape (batch_size, hidden_size).
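The correspondence between the two approaches can be checked numerically for a single layer. The sketch below copies the weights of a one-layer nn.LSTM into an nn.LSTMCell, runs the manual loop, and compares the stacked per-step outputs with the output of nn.LSTM. The sizes are assumed values, and dropout is left out so that both paths are deterministic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=7, hidden_size=64, num_layers=1, batch_first=True)
cell = nn.LSTMCell(input_size=7, hidden_size=64)

# Copy the layer's weights into the cell so both compute the same function
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

x = torch.randn(5, 30, 7)  # (batch_size, seq_len, input_size)

with torch.no_grad():
    full_output, _ = lstm(x)  # initial states default to zeros
    h = torch.zeros(5, 64)
    c = torch.zeros(5, 64)
    steps = []
    for t in range(x.shape[1]):
        h, c = cell(x[:, t, :], (h, c))
        steps.append(h)
    # Stack the per-step hidden states into (batch_size, seq_len, hidden_size)
    stepped_output = torch.stack(steps, dim=1)

matches = torch.allclose(full_output, stepped_output, atol=1e-5)
print(stepped_output.shape, matches)
```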

Training/Testing

There is nothing new to add here: training and testing are exactly the same as in the earlier articles.
