Deep Learning Study: Implementing an LSTM in PyTorch for Fire Temperature Prediction

qq_42967973

Last modified: 2024-10-04 00:40:46

Tags: deep learning study, pytorch

First published: 2024-10-04 00:35:54

Article link: https://blog.csdn.net/qq_42967973/article/details/142697763

I. Preface (Attribution)

>- **🍨 This post is a study log from the [🔗365天深度学习训练营](https://mp.weixin.qq.com/s/Z9yL_wt7L8aPOr9Lqb1K3w) (365-Day Deep Learning Training Camp)**
>- **🍖 Original author: [K同学啊](https://mtyjkh.blog.csdn.net/)**

II. Training Preparation

1. Training environment

Frameworks: PyTorch, matplotlib, numpy
IDE: Jupyter Lab
CPU: AMD Ryzen 5600H

import torch

# Use the GPU when one is available; this run fell back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cpu')

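2. Dataset

The dataset has 5948 rows × 4 columns. [Figure: preview of the data table]

The `.iloc` indexer selects data by position. `:` selects all rows, and `1:` selects every column from the second one onward (Python indices start at 0).

The line below therefore drops the first column of the data frame `data` and stores the result in a new data frame, `dataFrame`.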

dataFrame = data.iloc[:,1:]
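To make the positional slicing concrete, here is a minimal sketch on a toy data frame. The column names and values are placeholders chosen for illustration (the post only confirms that 'Tem1', 'CO 1' and 'Soot 1' exist; the name of the dropped first column is an assumption).

```python
import pandas as pd

# Toy stand-in for the CSV; "Time" and all values are illustrative assumptions
toy = pd.DataFrame({
    "Time": [0.0, 0.5, 1.0],
    "Tem1": [25.0, 25.1, 25.3],
    "CO 1": [0.0, 0.0, 0.1],
    "Soot 1": [0.0, 0.0, 0.0],
})

# All rows, columns from position 1 onward: the first column is dropped
trimmed = toy.iloc[:, 1:]

print(list(trimmed.columns))
print(toy.shape, trimmed.shape)
```

The original frame is left untouched; only the returned selection has three columns.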

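3. LSTM

Introduction

A long short-term memory network (LSTM) is a special kind of recurrent neural network (RNN). When a plain RNN is trained on long sequences, gradients are propagated back through many time steps and tend to explode or vanish, and stacking more layers makes this worse. As a result, a plain RNN struggles with long sequences and cannot capture long-range dependencies.

LSTMs are used for text generation, machine translation, speech recognition, image captioning, video tagging and similar tasks.

The gating mechanism of the LSTM was designed mainly to mitigate the vanishing and exploding gradient problems that arise when training on long sequences.

[Figure: an LSTM unrolled over the time steps of a sequence]

nn.LSTM()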

torch.nn.LSTM(input_size, hidden_size, num_layers=1, bias=True, batch_first=False, dropout=0.0, bidirectional=False, proj_size=0, device=None, dtype=None)

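At time step t:
h_t is the hidden state,
c_t is the cell state,
x_t is the input,
h_(t-1) is the hidden state at time t-1, or the initial hidden state at time 0,
i_t, f_t, g_t and o_t are the input, forget, cell and output gates, respectively.
σ is the sigmoid function and ⊙ is the Hadamard (element-wise) product.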
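The symbols listed above belong to the gate equations of the LSTM cell. For reference, the formulation given in the PyTorch `nn.LSTM` documentation can be written as follows. The weight and bias symbols (W_ii, b_ii and so on) follow that documentation's naming and are not defined elsewhere in this post.

```latex
\begin{aligned}
i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget gate f_t scales the previous cell state, the input gate i_t scales the candidate g_t, and the output gate o_t decides how much of the cell state is exposed as the hidden state.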

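In a multi-layer LSTM, the input x_t(l) of layer l (l ≥ 2) is the hidden state h_t(l-1) of the previous layer multiplied by the dropout mask δ_t(l-1), where each δ_t(l-1) is a Bernoulli random variable that equals 0 with probability dropout.

If proj_size > 0 is specified, an LSTM with projections is used. This changes the LSTM cell in two ways. First, the dimension of h_t changes from hidden_size to proj_size (the dimensions of W_hi change accordingly). Second, the output hidden state of each layer is multiplied by a learnable projection matrix: h_t = W_hr h_t. As a consequence, the output shape of the LSTM network changes as well. For the exact dimensions of all variables, see the Inputs/Outputs section of the PyTorch nn.LSTM documentation. More details can be found at https://arxiv.org/abs/1402.1128.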

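Parameters:

- input_size – the number of expected features in the input x
- hidden_size – the number of features in the hidden state h
- num_layers – the number of recurrent layers. For example, num_layers=2 stacks two LSTMs into a stacked LSTM, in which the second LSTM takes the outputs of the first and computes the final result. Default: 1
- bias – if False, the layer does not use the bias weights b_ih and b_hh. Default: True
- batch_first – if True, the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). Note that this does not apply to the hidden or cell states. See the Inputs/Outputs section of the PyTorch nn.LSTM documentation for details. Default: False
- dropout – if non-zero, adds a dropout layer on the outputs of each LSTM layer except the last, with dropout probability equal to dropout. Default: 0
- bidirectional – if True, the LSTM becomes bidirectional. Default: False
- proj_size – if > 0, uses an LSTM with projections of the corresponding size. Default: 0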
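The effect of several of these parameters is easiest to see from tensor shapes. The sketch below uses small, arbitrary sizes (batch 4, sequence length 8, hidden size 16, projection size 5) that are chosen for illustration and are not the values used later in this post.

```python
import torch
from torch import nn

x = torch.rand(4, 8, 3)  # (batch, seq, feature) because batch_first=True

# Two stacked layers; dropout is applied between them during training
stacked = nn.LSTM(input_size=3, hidden_size=16, num_layers=2,
                  batch_first=True, dropout=0.1)
out, (h_n, c_n) = stacked(x)
print(out.shape)   # (batch, seq, hidden_size)
print(h_n.shape)   # (num_layers, batch, hidden_size): batch_first does not apply

# With proj_size > 0 the hidden state shrinks to proj_size, the cell state does not
projected = nn.LSTM(input_size=3, hidden_size=16, proj_size=5, batch_first=True)
out_p, (h_p, c_p) = projected(x)
print(out_p.shape, h_p.shape, c_p.shape)
```

Note that `out` only carries the hidden states of the last layer, while `h_n` and `c_n` hold the final state of every layer.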

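Bidirectional LSTM:

Advantages and disadvantages of making an LSTM bidirectional:

Advantages:

* Fuller context: a bidirectional LSTM takes into account both what comes before and what comes after each position in the sequence. This is especially useful for natural language processing tasks such as sentiment analysis and named entity recognition, which often depend on the meaning of the whole sentence or paragraph.

* Better model performance: on many tasks a bidirectional LSTM improves results, for example with higher classification accuracy or a lower error rate.

* More complex patterns: a bidirectional LSTM can capture more complex patterns and dependencies in a sequence, including long-range ones.

Disadvantages:

* More computation: compared with a unidirectional LSTM, a bidirectional LSTM runs the sequence in both directions and needs more computing resources.

* Higher model complexity: the structure is more complex and has more parameters to train, which can lead to overfitting.

* Poor fit for real-time use: because the backward direction needs the rest of the sequence, a bidirectional LSTM can introduce latency when processing streaming data.

Summary:

Making an LSTM bidirectional can improve model performance, at the cost of higher model complexity and more computation. Whether to use it is a trade-off that depends on the task and the dataset. If the task depends strongly on context and computing resources are sufficient, a bidirectional LSTM is a reasonable choice.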

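When to consider a bidirectional LSTM:

* Natural language processing tasks such as sentiment analysis, named entity recognition and machine translation.

* Time series tasks in which both earlier and later observations are available for the point being estimated. In pure forecasting, future values are not known at prediction time, so the backward direction can only run over the input window.

* Speech recognition, when context in the speech signal matters.

When not to use a bidirectional LSTM:

* Strict real-time requirements: if the model must respond quickly, a bidirectional LSTM is not a good fit.

* Small datasets: with little training data, a bidirectional LSTM may overfit.

* Limited computing resources: if the available resources cannot support training a bidirectional LSTM, another model should be considered.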
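The cost side of this trade-off can be checked directly. The following sketch compares a unidirectional and a bidirectional `nn.LSTM` with the same arbitrary sizes (hidden size 16, chosen only for illustration); `n_params` is a small helper defined here, not a PyTorch function.

```python
import torch
from torch import nn

x = torch.rand(4, 8, 3)  # (batch, seq, feature)

uni = nn.LSTM(input_size=3, hidden_size=16, batch_first=True)
bi = nn.LSTM(input_size=3, hidden_size=16, batch_first=True, bidirectional=True)

out_uni, _ = uni(x)
out_bi, (h_bi, _) = bi(x)

def n_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

# The bidirectional output concatenates both directions along the feature axis
print(out_uni.shape, out_bi.shape, h_bi.shape)
# A single bidirectional layer holds two sets of weights
print(n_params(uni), n_params(bi))
```

Any layer that consumes the bidirectional output, such as a following `nn.Linear`, has to expect `2 * hidden_size` input features.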

III. Training Process

Setup (imports):

import torch.nn.functional as F
import numpy as np
import pandas as pd
import torch
from torch import nn
import matplotlib.pyplot as plt
import seaborn as sns

Load the data:

data = pd.read_csv("woodpine2.csv")

dataFrame = data.iloc[:,1:]

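Preprocessing:

The columns 'CO 1', 'Soot 1' and 'Tem1' of `dataFrame` are normalized to the range 0 to 1, so that all features share the same numerical range in later processing and modeling.

The steps are as follows:

1. Import the library:

from sklearn.preprocessing import MinMaxScaler

`MinMaxScaler` rescales data by its minimum and maximum into a given range (0 to 1 by default).

2. Copy the data:

dataFrame = data.iloc[:,1:].copy()

`data.iloc[:, 1:]` extracts every column of `data` except the first (which is assumed to be an index or otherwise irrelevant column). `.copy()` then creates an independent data frame, `dataFrame`, so that the original data is not modified.

3. Initialize the MinMaxScaler:

sc = MinMaxScaler(feature_range=(0,1))

This creates a `MinMaxScaler` with a target range of 0 to 1. The scaler maps the minimum of the data to 0, the maximum to 1, and everything in between proportionally.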

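4. Scale the data column by column:

for i in ['CO 1','Soot 1','Tem1']:
    dataFrame[i] = sc.fit_transform(dataFrame[i].values.reshape(-1,1))

This loop normalizes the three columns 'CO 1', 'Soot 1' and 'Tem1' of `dataFrame`. In detail:
- `dataFrame[i].values` extracts column `i` as a NumPy array.
- `reshape(-1,1)` turns the one-dimensional array into a two-dimensional one, because `MinMaxScaler` expects input of shape (n_samples, n_features).
- `sc.fit_transform()` computes the minimum and maximum of the column and scales its values to the range 0 to 1.
- The scaled values replace the corresponding column in `dataFrame`.
- Because the same `sc` is refit on every column, after the loop it only remembers the minimum and maximum of the last column, 'Tem1'. The `sc.inverse_transform` calls used later to recover temperatures rely on this.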
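What the scaler computes for one column can be written out by hand. This is a minimal NumPy sketch of 0-1 min-max scaling and its inverse, not scikit-learn's implementation; the function names and the temperature values are made up for illustration.

```python
import numpy as np

def min_max_fit(column):
    """Return the (min, max) pair that a 0-1 min-max scaling needs."""
    return float(np.min(column)), float(np.max(column))

def min_max_transform(column, lo, hi):
    """Map lo to 0 and hi to 1, with everything in between scaled linearly."""
    return (np.asarray(column, dtype=float) - lo) / (hi - lo)

def min_max_inverse(scaled, lo, hi):
    """Undo min_max_transform using the same (lo, hi) pair."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

temps = np.array([25.0, 100.0, 175.0, 325.0])  # made-up temperatures
lo, hi = min_max_fit(temps)
scaled = min_max_transform(temps, lo, hi)
restored = min_max_inverse(scaled, lo, hi)

print(scaled)
print(restored)
```

The inverse only works with the minimum and maximum of the column that was scaled, which is why the fitted statistics for the target column have to be kept around until evaluation.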

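5. Print the shape of the data frame:

print("dataFrame",dataFrame.shape)

dataFrame (5948, 3)

The time series is then converted into the inputs (X) and targets (y) required for training, so that it can be fed to a machine learning model such as a neural network.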

width_X = 8
width_y = 1

X = []
y =[]

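# Slide a window over the rows: width_X rows form one input,
# the next width_y rows of column 0 form its target
for in_start in range(len(dataFrame)):
    in_end = in_start + width_X
    out_end = in_end + width_y

    # Note: `<` leaves out the very last complete window; `<=` would include it
    if out_end<len(dataFrame):
        X_ = np.array(dataFrame.iloc[in_start:in_end,])
        y_ = np.array(dataFrame.iloc[in_end:out_end,0])
        X.append(X_)
        y.append(y_)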

X = np.array(X)
y = np.array(y).reshape(-1,1,1)

print("X", X.shape, "y", y.shape)

This gives X (5939, 8, 3) y (5939, 1, 1). Each sample uses 8 consecutive time steps as X and the first column of the following time step as y.
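The same windowing can be wrapped in a small reusable function. The sketch below is one possible way to write it, with a hypothetical helper name `make_windows`, and is shown on a toy array. Unlike the loop in this post, it also keeps the final complete window, so on the real data it would return one more sample.

```python
import numpy as np

def make_windows(values, width_x=8, width_y=1, target_col=0):
    """Split a (time, feature) array into sliding input windows and targets."""
    n = len(values) - width_x - width_y + 1  # number of complete windows
    X = np.stack([values[i:i + width_x] for i in range(n)])
    y = np.stack([values[i + width_x:i + width_x + width_y, target_col]
                  for i in range(n)])
    return X, y.reshape(-1, width_y, 1)

# Toy series: 12 time steps, 3 features, values 0..35 so rows are easy to track
toy = np.arange(36, dtype=np.float32).reshape(12, 3)
X_toy, y_toy = make_windows(toy)

print(X_toy.shape, y_toy.shape)
# The first target is column 0 of the row right after the first window
print(X_toy[0, -1, 0], y_toy[0, 0, 0])
```

Each window overlaps the previous one by all but one row, so neighbouring samples are highly correlated.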

Check for missing values; both checks print False:

print(np.any(np.isnan(X)))
print(np.any(np.isnan(y)))

Split the dataset:

# The first 5000 windows are used for training, the remaining ones for testing
X_train = torch.tensor(np.array(X[:5000]),dtype=torch.float32)
y_train = torch.tensor(np.array(y[:5000]),dtype= torch.float32)
X_test = torch.tensor(np.array(X[5000:]),dtype=torch.float32)
y_test = torch.tensor(np.array(y[5000:]),dtype= torch.float32)
print("X_train", X_train.shape, "y_train", y_train.shape)

X_train torch.Size([5000, 8, 3]) y_train torch.Size([5000, 1, 1])

Data loaders:

from torch.utils.data import TensorDataset,DataLoader
train_dl = DataLoader(TensorDataset(X_train,y_train),batch_size=64,shuffle=False)
test_dl = DataLoader(TensorDataset(X_test,y_test),batch_size=64,shuffle=False)
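As a quick check of what a loader like `train_dl` yields, the sketch below builds one from placeholder tensors with the same shapes as the training tensors (5000 samples of shape (8, 3)). The zero-filled tensors and the `_demo` names are assumptions made for illustration.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Placeholder tensors with the same shapes as X_train / y_train in this post
X_demo = torch.zeros(5000, 8, 3)
y_demo = torch.zeros(5000, 1, 1)

demo_dl = DataLoader(TensorDataset(X_demo, y_demo), batch_size=64, shuffle=False)

# Collect the size of every batch the loader produces
batch_sizes = [xb.shape[0] for xb, _ in demo_dl]
print(len(demo_dl), batch_sizes[0], batch_sizes[-1])
```

Since 5000 is not a multiple of 64, the last batch is smaller than the others. `len(demo_dl)` is the number of batches, which is the quantity a per-epoch average loss is divided by.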

LSTM model:

class model_lstm(nn.Module):
    def __init__(self):
        super(model_lstm,self).__init__()
        self.lstm0 = nn.LSTM(input_size=3,hidden_size=320,num_layers=1,batch_first=True)
        self.lstm1 = nn.LSTM(input_size=320,hidden_size=320,num_layers=1,batch_first=True)
        self.fc0 = nn.Linear(320,1)

    def forward(self,x):
        out,hidden1 = self.lstm0(x)
        out,_ = self.lstm1(out,hidden1)
        out = self.fc0(out)
        return out[:,-1:,:]  # Keeping only the last time step output
    
model = model_lstm()

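LSTM layers:
- self.lstm0 = nn.LSTM(input_size=3, hidden_size=320, num_layers=1, batch_first=True):
  - The first LSTM layer of the model. It takes inputs with 3 features, has a hidden size of 320, and consists of a single layer.
  - batch_first=True sets the input layout to (batch_size, seq_len, input_size), that is, with the batch as the first dimension.
- self.lstm1 = nn.LSTM(input_size=320, hidden_size=320, num_layers=1, batch_first=True):
  - The second LSTM layer. It takes the output of the first LSTM layer (320 features) and processes it further.
Fully connected layer:
- self.fc0 = nn.Linear(320, 1):
  - Maps the 320-dimensional output of the second LSTM layer to 1 dimension, which is the final prediction.

Forward pass (forward method):

First LSTM layer:
- out, hidden1 = self.lstm0(x): passes the input x through the first LSTM layer. out is the layer's output at every time step, and hidden1 is the tuple (h_n, c_n) holding the final hidden and cell states.
Second LSTM layer:
- out, _ = self.lstm1(out, hidden1): passes the output of the first layer to the second layer and uses hidden1 as the second layer's initial state. This is possible because both layers have the same hidden size.
Fully connected layer:
- out = self.fc0(out): passes the output of the second LSTM layer through the fully connected layer, reducing the feature dimension from 320 to 1 and giving one prediction per time step.
Keeping only the last time step:
- return out[:, -1:, :]: the model produces a prediction for every time step of the sequence, but only the prediction at the last time step is returned as the final output. This is a common approach for predicting the value at the next time step.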
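Two details of this forward pass can be verified on a small example: the slice `-1:` keeps the time axis, and for a single-layer unidirectional LSTM the output at the last time step is the same tensor as the final hidden state h_n. The sketch uses a hidden size of 16 instead of 320 purely to keep it light.

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=3, hidden_size=16, num_layers=1, batch_first=True)
x = torch.rand(5, 8, 3)  # (batch, seq, feature)

with torch.no_grad():
    out, (h_n, c_n) = lstm(x)

# Slicing with -1: keeps the time axis, so the result is (batch, 1, hidden)
last_step = out[:, -1:, :]

# For one unidirectional layer, the last output equals the final hidden state
same = torch.allclose(out[:, -1, :], h_n[-1])

print(last_step.shape, same)
```

Keeping the time axis is what lets the model output line up with targets of shape (batch, 1, 1).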

Structure:

model_lstm(
  (lstm0): LSTM(3, 320, batch_first=True)
  (lstm1): LSTM(320, 320, batch_first=True)
  (fc0): Linear(in_features=320, out_features=1, bias=True)
)

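Test the model with a random batch:

output = model(torch.rand(30, 8, 3))
print(output.shape)

torch.Size([30, 1, 1])

A layer-by-layer summary can also be printed with `summary` from the torchsummary package:

from torchsummary import summary
summary(model, input_size=(8, 3))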

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            LSTM-1          [-1, 8, 320]             414,720
            LSTM-2          [-1, 8, 320]             820,480
            Linear-3         [-1, 1, 1]                321
================================================================
Total params: 1,235,521
Trainable params: 1,235,521
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.06
Params size (MB): 4.71
Estimated Total Size (MB): 4.77
----------------------------------------------------------------

Training loop:

import copy
def train(train_dl, model, loss_fn, opt, lr_scheduler=None):
    size = len(train_dl.dataset)
    num_batches = len(train_dl)
    train_loss = 0
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)

        pred = model(x)
        loss = loss_fn(pred, y)

        opt.zero_grad()
        loss.backward()
        opt.step()

        train_loss += loss.item()

    if lr_scheduler is not None:
        lr_scheduler.step()
        print("learning rate = {:.5f}".format(opt.param_groups[0]['lr']), end=" ")

    train_loss /= num_batches
    return train_loss

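Learning rate scheduler (optional):

If lr_scheduler is not None, its step() method is called once per epoch, after all batches have been processed, to update the learning rate.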

Test loop:

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss = 0

    with torch.no_grad():
        for x, y in dataloader:  
            x, y = x.to(device), y.to(device)

            y_pred = model(x)
            loss = loss_fn(y_pred, y)

            test_loss += loss.item()

    test_loss /= num_batches
    return test_loss

Training run:

model = model_lstm()
model = model.to(device)
loss_fn = nn.MSELoss()
learn_rate = 1e-1
opt = torch.optim.SGD(model.parameters(),lr=learn_rate,weight_decay=1e-4)
epochs = 50
train_loss = []
test_loss = []
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(opt,epochs,last_epoch=-1)

for epoch in range(epochs):
    model.train()
    epoch_train_loss = train(train_dl,model,loss_fn,opt,lr_scheduler)

    model.eval()
    epoch_test_loss = test(test_dl,model,loss_fn)

    train_loss.append(epoch_train_loss)
    test_loss.append(epoch_test_loss)

    template = ('Epoch:{:2d},Train_loss:{:.5f},Test_loss:{:.5f}')
    print(template.format(epoch+1,epoch_train_loss,epoch_test_loss))

print("="*20,'Done',"="*20)

learning rate = 0.09990 Epoch: 1,Train_loss:0.00127,Test_loss:0.18627
learning rate = 0.09961 Epoch: 2,Train_loss:0.01406,Test_loss:0.17888
learning rate = 0.09911 Epoch: 3,Train_loss:0.01368,Test_loss:0.17160
learning rate = 0.09843 Epoch: 4,Train_loss:0.01326,Test_loss:0.16411
learning rate = 0.09755 Epoch: 5,Train_loss:0.01277,Test_loss:0.15613
learning rate = 0.09649 Epoch: 6,Train_loss:0.01221,Test_loss:0.14741
learning rate = 0.09524 Epoch: 7,Train_loss:0.01155,Test_loss:0.13776
learning rate = 0.09382 Epoch: 8,Train_loss:0.01078,Test_loss:0.12705
learning rate = 0.09222 Epoch: 9,Train_loss:0.00991,Test_loss:0.11526
learning rate = 0.09045 Epoch:10,Train_loss:0.00894,Test_loss:0.10248
learning rate = 0.08853 Epoch:11,Train_loss:0.00789,Test_loss:0.08898
learning rate = 0.08645 Epoch:12,Train_loss:0.00680,Test_loss:0.07519
learning rate = 0.08423 Epoch:13,Train_loss:0.00570,Test_loss:0.06165
learning rate = 0.08187 Epoch:14,Train_loss:0.00465,Test_loss:0.04895
learning rate = 0.07939 Epoch:15,Train_loss:0.00369,Test_loss:0.03759
learning rate = 0.07679 Epoch:16,Train_loss:0.00285,Test_loss:0.02794
learning rate = 0.07409 Epoch:17,Train_loss:0.00215,Test_loss:0.02013
learning rate = 0.07129 Epoch:18,Train_loss:0.00159,Test_loss:0.01411
learning rate = 0.06841 Epoch:19,Train_loss:0.00117,Test_loss:0.00966
learning rate = 0.06545 Epoch:20,Train_loss:0.00085,Test_loss:0.00650
learning rate = 0.06243 Epoch:21,Train_loss:0.00063,Test_loss:0.00432
learning rate = 0.05937 Epoch:22,Train_loss:0.00047,Test_loss:0.00286
learning rate = 0.05627 Epoch:23,Train_loss:0.00036,Test_loss:0.00190
learning rate = 0.05314 Epoch:24,Train_loss:0.00028,Test_loss:0.00128
learning rate = 0.05000 Epoch:25,Train_loss:0.00023,Test_loss:0.00088
...
learning rate = 0.00039 Epoch:48,Train_loss:0.00015,Test_loss:0.00015
learning rate = 0.00010 Epoch:49,Train_loss:0.00015,Test_loss:0.00015
learning rate = 0.00000 Epoch:50,Train_loss:0.00015,Test_loss:0.00015
==================== Done ====================
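The learning rates in the log above can be reproduced from the closed-form cosine annealing schedule, lr(t) = eta_min + (base_lr - eta_min) · (1 + cos(π · t / T_max)) / 2, using the settings from this post (base_lr = 0.1, T_max = 50, eta_min = 0). The sketch below evaluates that formula directly rather than calling PyTorch; the function name is chosen here for illustration.

```python
import math

def cosine_annealing_lr(epoch, base_lr=0.1, t_max=50, eta_min=0.0):
    """Closed-form cosine annealing: learning rate after `epoch` scheduler steps."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2

# A few epochs that also appear in the training log
for epoch in (1, 2, 25, 49, 50):
    print("epoch {:2d}: learning rate = {:.5f}".format(epoch, cosine_annealing_lr(epoch)))
```

The rate falls slowly at first, fastest around the middle of training, and reaches eta_min at epoch T_max, which is why the losses stop changing in the final epochs.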

Visualization:

import matplotlib.pyplot as plt

plt.figure(figsize=(5,3),dpi=120)

plt.plot(train_loss,label='LSTM Training Loss')
plt.plot(test_loss,label= 'LSTM Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()

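Validation example:

# `sc` was last fit on 'Tem1', so inverse_transform maps scaled values back to temperatures
model.eval()
with torch.no_grad():
    predicted_y_lstm = sc.inverse_transform(model(X_test.to(device)).cpu().numpy().reshape(-1,1))
y_test_1 = sc.inverse_transform(y_test.numpy().reshape(-1,1))
y_test_one = [i[0] for i in y_test_1]
predicted_y_lstm_one = [i[0] for i in predicted_y_lstm]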

plt.figure(figsize=(5,3), dpi=120)
plt.plot(y_test_one[:2000],color='red',label='real_temp')
plt.plot(predicted_y_lstm_one[:2000],color='blue',label='prediction')

plt.title('Title')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()

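from sklearn import metrics

# sklearn metrics take (y_true, y_pred); the order matters for R2
RMSE_lstm = metrics.mean_squared_error(y_test_one,predicted_y_lstm_one)**0.5
R2_lstm = metrics.r2_score(y_test_one,predicted_y_lstm_one)

print('RMSE:%.5f' % RMSE_lstm)
print('R2:%.5f' % R2_lstm)

Reported output (the R2 value was computed with the two arguments in the opposite order, so it may differ slightly):

RMSE:3.49072 R2:0.99709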
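Both metrics are simple enough to compute by hand, which also shows why the argument order matters for R2 but not for RMSE. This is a NumPy sketch with made-up numbers, not the scikit-learn implementation, and the values are unrelated to the results reported above.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error; symmetric in its two arguments."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination; normalized by the variance of y_true only."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

y_true = np.array([20.0, 40.0, 60.0, 80.0])  # made-up measurements
y_pred = np.array([22.0, 38.0, 63.0, 79.0])  # made-up predictions

print(rmse(y_true, y_pred), rmse(y_pred, y_true))
print(r2(y_true, y_pred), r2(y_pred, y_true))
```

Swapping the arguments leaves RMSE unchanged but changes R2, because the denominator uses the spread of whichever array is passed first.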

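IV. Summary

This experiment used PyTorch for data processing, model construction, training and testing, and gave a closer look at how an LSTM network is trained for a time series prediction task. The main points are:

1. Data preprocessing:
The data was first normalized so that all features share the same numerical range during training. A sliding window was then used to turn the time series into input features `X` and targets `y`. The input has shape `(batch_size, seq_len, feature_dim)`, and the samples are generated by moving the window forward one time step at a time.

2. Model construction:
The model stacks two LSTM layers and adds a fully connected layer to perform regression on the time series. The network is built from PyTorch's `nn.LSTM` and `nn.Linear`, with `input_size`, `hidden_size` and `batch_first` set to match the data.

3. Forward and backward propagation:
In the forward pass, the LSTM layers process the input sequence one time step at a time, and the fully connected layer produces the prediction for each sequence. Backpropagation computes the gradients used to update the parameters, so that the model gradually learns the patterns in the training data.

4. Learning rate scheduler:
`torch.optim.lr_scheduler.CosineAnnealingLR` was used to adjust the learning rate during training. The learning rate decays gradually from its initial value, which helps avoid the instability that a large learning rate can cause in the middle of training.

5. Loss function and optimizer:
The mean squared error loss (`nn.MSELoss`) measures the error between predictions and true values, and stochastic gradient descent (`SGD`) updates the parameters. Plotting the training and test losses shows that both decrease and converge over the 50 epochs.

6. Model evaluation and validation:
After training, the training and validation loss curves were plotted to observe how the model behaves at different stages of training. The model was also used to predict on the test set, and the predictions were plotted against the true values to show the prediction quality.

7. Overall:
The number of LSTM layers, the number of hidden units and the learning rate are the main hyperparameters that shape how well this model predicts the time series. With the dynamic learning rate, training was stable and the losses converged. The experiment also showed how preprocessing and a sliding window can be used to build time series samples that the model can learn temporal features from.

This experiment covered the complete workflow of training a deep learning model, from data preprocessing and model design through training and optimization to final evaluation, and gave a better understanding of how PyTorch is applied to time series prediction.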