Air Passengers (time series)

Mayese

Last modified: 2022-04-23 16:37:19

First published: 2022-04-23 16:36:19

Original source: https://www.kaggle.com/code/ohseokkim/predicting-future-by-lstm-prophet-neural-prophet#Predicting-by-Prophet

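Dataset overview:

Dataset name: AirPassengers.csv; size: 1.75 kB

df.shape: (144, 3)
144 rows: monthly observations from January 1949 to December 1960
3 columns: year, month, passengers
df.head()
df.tail()
A quick look at the data
(1) Plot the month column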

import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(df['month'])

(2) Plot the passengers column

plt.figure(figsize=(12,5))
plt.title("Month vs Passenger",fontsize = 20) # fontsize sets the font size
plt.ylabel("Total Passengers",fontsize = 20)
plt.xlabel("Months",fontsize = 20)
plt.grid(True)
plt.autoscale(axis = 'x',tight = True) # autoscale: fit the x-axis limits tightly to the data
plt.xticks(fontsize = 20)
plt.yticks(fontsize = 20)
plt.plot(df['passengers'])

For comparison, a bare plt.plot(df['passengers']) draws the same curve but without the grid, the axis labels, or the title.

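Time series analysis

Time series decomposition (decompose) splits a series into three components: trend, seasonality, and residual.

Properties of a time series: cyclicity, seasonality, stationarity, trend, and irregular variation.
Package used: statsmodels
statsmodels.api
statsmodels.tsa.stattools: acf (autocorrelation)
statsmodels.tsa.stattools: pacf (partial autocorrelation)
statsmodels.tsa.seasonal: seasonal_decompose (seasonal decomposition)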

import statsmodels
import statsmodels.api as sm
from statsmodels.tsa.stattools import acf
from statsmodels.tsa.stattools import pacf
from statsmodels.tsa.seasonal import seasonal_decompose

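# plot_decompose is not defined in this post; it is presumably a plotting helper from the original notebook
decomposition = seasonal_decompose(df['passengers'],period = 12)
plot_decompose(decomposition)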

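To make the three components concrete, here is a minimal pure-Python sketch of a classical additive decomposition (observed = trend + seasonal + residual). It follows the textbook recipe of a centered moving average plus per-position averages; it is not the statsmodels implementation, and the function name decompose_additive and the synthetic series are made up for illustration.

```python
def decompose_additive(values, period):
    """Classical additive decomposition: values = trend + seasonal + resid."""
    n = len(values)
    half = period // 2
    trend = [None] * n  # undefined at the edges, where the window does not fit
    for t in range(half, n - half):
        window = values[t - half:t + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: the two end points get half weight so the
            # window still covers exactly one period.
            total -= 0.5 * (window[0] + window[-1])
        trend[t] = total / period

    # Average the detrended values at each position within the period.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += values[t] - trend[t]
            counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    center = sum(means) / period  # shift so the seasonal effects sum to zero
    seasonal = [means[t % period] - center for t in range(n)]

    resid = [None if trend[t] is None else values[t] - trend[t] - seasonal[t]
             for t in range(n)]
    return trend, seasonal, resid


# Synthetic series (not the AirPassengers data): a linear trend plus a
# repeating 12-month pattern that sums to zero.
pattern = [-10, -8, 3, 0, 2, 12, 25, 24, 6, -12, -24, -18]
series = [100 + 2 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, resid = decompose_additive(series, period=12)
```

On this synthetic input the trend comes back as the straight line, the seasonal component as the repeating pattern, and the residual is zero up to rounding, which is the behavior to look for when reading a decomposition plot.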

LSTM

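A common LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell.
LSTM networks are well suited to classifying, processing, and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series. LSTMs were developed to deal with the vanishing gradient problem that can be encountered when training traditional RNNs. Relative insensitivity to gap length is an advantage of LSTM over RNNs, hidden Markov models, and other sequence learning methods in numerous applications.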
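The gating described above is usually written as the following update equations. This is the standard textbook formulation, not something taken from the original notebook; the symbols W, U, b (weights and biases), x_t (input), h_t (hidden state), and c_t (cell state) are notation introduced here.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The additive form of the cell state update is what lets gradients pass through many time steps without vanishing as quickly as in a plain RNN.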

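A brief introduction to the model
The model is built around a simple LSTM layer.
(1) input_size: the number of features at each time step. Each input sequence is 12 months long, but every month carries only one value, the total number of passengers, so the input size is 1.
(2) hidden_layer_size: the number of hidden units (neurons) in each LSTM layer. The number of stacked LSTM layers is set separately by num_layers.
(3) output_size: the output size is 1, because the model predicts a single value, the number of passengers in the following month.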
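To see what these sizes mean in practice, the sketch below counts the trainable parameters for the configuration used in this post (input_size=1, hidden_layer_size=128, num_layers=2, output_size=1). It assumes the parameter layout of PyTorch's nn.LSTM, where each layer holds four gates and every gate has input weights, recurrent weights, and two bias vectors; the helper names are made up for illustration.

```python
def lstm_layer_params(input_size, hidden_size):
    # Four gates, each with input weights, recurrent weights and two bias vectors.
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)


def model_params(input_size=1, hidden_size=128, num_layers=2, output_size=1):
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        total += lstm_layer_params(layer_input, hidden_size)
        layer_input = hidden_size  # later layers consume the previous layer's hidden state
    total += hidden_size * output_size + output_size  # final linear layer (weights + bias)
    return total


total_params = model_params()
print(total_params)  # 199297
```

Almost all of the parameters come from the hidden-to-hidden weights, so hidden_layer_size has far more influence on model capacity than input_size or output_size.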

# Load packages
import numpy as np
import torch
from torch import nn
import torch.nn.functional as F

# Define the LSTM: input and output size 1, 128 hidden units, 2 stacked layers
class LSTM(nn.Module):
    def __init__(self, input_size=1, hidden_layer_size=128, num_layers=2, output_size=1):
        super().__init__()
        self.hidden_layer_size = hidden_layer_size
        self.lstm = nn.LSTM(input_size, hidden_layer_size, num_layers=num_layers)
        self.linear = nn.Linear(hidden_layer_size, output_size)

    def forward(self, input_seq):
        # Reshape to (seq_len, batch=1, input_size), the default layout for nn.LSTM
        lstm_out, _ = self.lstm(input_seq.view(len(input_seq) ,1, -1))
        # Select the single batch element, then map each time step's hidden state to a prediction
        predictions = self.linear(lstm_out[:,-1,:])
        # Keep only the prediction for the last time step
        return predictions[-1]

# Model, loss function, optimizer
model = LSTM()
loss_function = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(),lr=0.001)

print(model)

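The training loop iterates over train_inout_seq, which this post never defines. A plausible construction, shown here as a sketch under that assumption, slides a window of train_window values over the normalized training series and pairs each window with the value that follows it. The function name create_inout_sequences and the toy series are illustrative; with torch the same slicing works on a 1-D FloatTensor.

```python
def create_inout_sequences(data, window):
    """Pair every run of `window` values with the single value that follows it."""
    pairs = []
    for start in range(len(data) - window):
        seq = data[start:start + window]                 # model input
        label = data[start + window:start + window + 1]  # next value, kept as a length-1 slice
        pairs.append((seq, label))
    return pairs


# Toy series, not the real data.
toy_series = [0.0, 0.1, 0.2, 0.3, 0.4]
toy_pairs = create_inout_sequences(toy_series, window=3)
# toy_pairs == [([0.0, 0.1, 0.2], [0.3]), ([0.1, 0.2, 0.3], [0.4])]
```

Assuming the first 132 months are used for training and the window is 12, this yields 120 (sequence, label) pairs per epoch.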

Training

epochs = 500
for i in range(epochs):
    for seq, labels in train_inout_seq:
        optimizer.zero_grad()

        y_pred = model(seq)

        single_loss = loss_function(y_pred, labels)
        single_loss.backward()
        optimizer.step()

    # Report the loss of the last sequence every 25 epochs
    if i%25 == 1:
        print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')

# Loss after the final epoch
print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')

Predicting

The test set contains the passenger counts for the following 12 months.

fut_pred = 12
test_inputs = train_data_normalized[-train_window:].tolist()
print(test_inputs)
[0.12527473270893097, 0.04615384712815285, 0.3274725377559662, 0.2835164964199066, 0.3890109956264496, 0.6175824403762817, 0.9516483545303345, 1.0, 0.5780220031738281, 0.33186814188957214, 0.13406594097614288, 0.32307693362236023]
len(test_inputs)
12

model.eval()

for i in range(fut_pred):
    seq = torch.FloatTensor(test_inputs[-train_window:])
    with torch.no_grad():
        test_inputs.append(model(seq).item())

test_inputs[fut_pred:]

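The loop above is a recursive multi-step forecast: each prediction is appended to the inputs and becomes part of the window for the next step. The sketch below isolates that pattern. The names forecast_recursive and mean_model are hypothetical, and mean_model is only a stand-in for the trained LSTM.

```python
def forecast_recursive(history, window, steps, predict_next):
    """Predict `steps` values, feeding each prediction back in as input."""
    values = list(history)
    for _ in range(steps):
        # The window always covers the most recent `window` values,
        # which include earlier predictions after the first step.
        values.append(predict_next(values[-window:]))
    return values[len(history):]  # only the newly predicted values


# Hypothetical stand-in for the trained model: predict the mean of the window.
def mean_model(window_values):
    return sum(window_values) / len(window_values)


future = forecast_recursive([1.0, 2.0, 3.0], window=3, steps=2, predict_next=mean_model)
# First step sees [1, 2, 3]; second step sees [2, 3, 2.0].
```

Slicing by len(history) makes the result independent of the window size. The slices test_inputs[fut_pred:] and test_inputs[train_window:] in this post give the same answer only because both values are 12. Errors also compound in this scheme, since later steps rely on earlier predictions rather than observed data.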

Converting to real values

scaler.inverse_transform()

Because the training data was normalized, the predictions are normalized too. To recover actual passenger counts, pass the normalized predictions to the inverse_transform method of the same min/max scaler object that was used to normalize the dataset.

actual_predictions = scaler.inverse_transform(np.array(test_inputs[train_window:]).reshape(-1,1))
print(actual_predictions)
[[437.15221213]
 [426.91397609]
 [429.98806806]
 [418.25631382]
 [494.82071275]
 [542.00735056]
 [550.70401156]
 [512.5313943 ]
 [476.29762107]
 [467.82860631]
 [475.12440497]
 [462.12389046]]
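The scaler itself is not shown in this post. As a sketch of what min/max scaling and its inverse compute, assuming a [0, 1] feature range (consistent with the normalized values printed earlier), the arithmetic is as follows. The function names and the numbers are made up for illustration.

```python
def fit_min_max(values):
    """Learn the scaling bounds from the training data only."""
    return min(values), max(values)


def transform(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]


def inverse_transform(scaled, lo, hi):
    return [s * (hi - lo) + lo for s in scaled]


train = [100.0, 150.0, 300.0]                 # made-up values, not the real data
lo, hi = fit_min_max(train)
scaled = transform(train, lo, hi)             # [0.0, 0.25, 1.0]
restored = inverse_transform([0.5], lo, hi)   # [200.0]
```

Because the bounds come from the training data, a normalized prediction above 1 maps back to a value above the training maximum, which is expected for a series with an upward trend.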

x = np.arange(132,144,1)
print(x)
[132 133 134 135 136 137 138 139 140 141 142 143]

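Checking results

Validation: the LSTM predictions are drawn as the orange line. The predictions are not accurate, but the model does capture the upward trend and the fluctuations in the total number of passengers over the last 12 months. More training epochs and more neurons in the LSTM layer may improve performance.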

plt.figure(figsize=(12,5))
plt.title('Month vs Passenger',fontsize = 20)
plt.ylabel('Total Passengers',fontsize = 20)
plt.xlabel('Months',fontsize = 20)
plt.grid(True)
plt.autoscale(axis='x',tight=True)
plt.xticks(fontsize = 20)
plt.yticks(fontsize = 20)
plt.plot(df['passengers'])
plt.plot(x,actual_predictions)

Zooming in on the last 12 months (the 1960 data)

plt.title('Month vs Passenger')
plt.ylabel('Total Passengers')
plt.grid(True)
plt.autoscale(axis='x', tight=True)

plt.plot(df['passengers'][-train_window:])  # flight_data is only defined later in this post, so df is used here
plt.plot(x,actual_predictions)
plt.show()

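Time series analysis with the predicted results

Check whether the predictions have learned, and preserve, the trend, seasonality, and residual structure of the original time series.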

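import pandas as pd

df['passengers'][:-train_window]
# Training portion of the series: everything except the last train_window months
train_df = pd.DataFrame(df['passengers'][:-train_window])
actual_df = pd.DataFrame(actual_predictions)
actual_df.columns = ['passengers']
# Training data followed by the 12 predictions, re-indexed from 0
new_predict = pd.concat([train_df,actual_df]).reset_index(drop=True)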

plt.figure(figsize=(12,5))
plt.title('Month vs Passenger',fontsize = 20)
plt.ylabel('Total Passengers',fontsize = 20)
plt.xlabel('Months',fontsize = 20)
plt.grid(True)
plt.autoscale(axis='x',tight=True)
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
plt.plot(new_predict)
plt.plot(df['passengers'])

Seasonal decomposition

decomposition = seasonal_decompose(new_predict, period=12) 
plot_decompose(decomposition)

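Predicting with Prophet

Prophet is a procedure for forecasting time series data based on an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.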
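The additive model mentioned above is commonly written as a sum of components. The symbols below are notation introduced here, not taken from the original notebook: g(t) is the trend, s(t) the seasonal term, h(t) the holiday effect, and the last term the error.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

This is the same structure as the trend, seasonality, and residual split used in the decomposition earlier, with an explicit term for holidays.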

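Preprocessing: Prophet expects its input in a specific form, a DataFrame with a datetime column named ds and a numeric column named y, so the columns and data types have to be converted before training.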

flight_data = df.copy()

month2int = {
'January':1,
'February':2,
'March':3,
'April':4,
'May':5,
'June':6,
'July':7,
'August':8,
'September':9,
'October':10,
'November':11,
'December':12
}
flight_data['month'] = flight_data['month'].map(month2int)


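# Build the ds/y frame from flight_data, whose month column is now numeric
flight_data['day'] = 1
flight_data['ds'] = pd.to_datetime(flight_data[['year','month','day']])
df_new = flight_data.drop(columns=['year','month','day'])
df_new.rename(columns={"passengers": "y"},inplace=True)
df_new.head()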
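The month-name mapping and date assembly can also be sketched with the standard library alone. This is an alternative illustration, not the notebook's code: it assumes English month names (calendar.month_name depends on the active locale), and first_of_month is a made-up helper.

```python
import calendar
from datetime import date

# calendar.month_name is ['', 'January', ..., 'December'] in an English locale;
# the empty first entry is skipped by the `if name` filter.
month2int = {name: number for number, name in enumerate(calendar.month_name) if name}


def first_of_month(year, month_name):
    """Return the first day of the given month as a date."""
    return date(year, month2int[month_name], 1)


stamp = first_of_month(1949, "January")
print(stamp.isoformat())  # 1949-01-01
```

Generating the mapping avoids typos in a hand-written dictionary, at the cost of depending on the locale setting.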


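Training

# Prophet releases before 1.0 are imported as fbprophet; from 1.0 on the package is named prophet (from prophet import Prophet)
from fbprophet import Prophet
m = Prophet()
m.fit(df_new)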

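Forecasting

# periods is counted in days by default (freq='D'), so this extends the frame 500 days past the last observation
future = m.make_future_dataframe(periods = 500)
forecast = m.predict(future)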

Checking results

fig2 = m.plot_components(forecast)

# With Prophet 1.0 or later: from prophet.plot import plot_plotly, plot_components_plotly
from fbprophet.plot import plot_plotly, plot_components_plotly

plot_plotly(m, forecast)

plot_components_plotly(m,forecast)

NeuralProphet

NeuralProphet is a Neural Network based PyTorch implementation of a user-friendly time series forecasting tool for practitioners. It is heavily inspired by Prophet, the popular forecasting tool developed by Facebook. NeuralProphet is developed in a fully modular architecture, which makes it straightforward to add further components in the future. Our vision is to develop a simple-to-use forecasting tool for users while retaining the original objectives of Prophet, such as interpretability and configurability, and providing much more, such as automatic differentiation capabilities, by using PyTorch as the backend.