8 An Introduction to Time Series Tools

汀沿河

Last modified: 2024-09-26 17:38:42

First published: 2024-09-26 15:02:29

Article link: https://blog.csdn.net/qq_28611929/article/details/142554713

1 Background

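I noticed that the tasks I have worked on recently are all time series tasks. My approaches all focused on cleaning the data, extracting periodic features and then constructing related features, which is closer to the traditional time series modeling approach.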

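Deep learning is very popular now, for example the recurrent neural network branch (LSTM, GRU) and the self-attention branch (Transformer, Informer, Autoformer, and so on). But these networks are not easy to build, and using them takes some expertise. So today I recommend a tool: NeuralForecast. I tested it and it works quite well. It includes a long list of models, many of them recent state-of-the-art (SOTA) ones. So how do we use it? Let's take a look.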

2 Examples

The complete code is below!

Quickstart - Nixtla

I'll simply test it on their example data. When will tools from China reach this level? Keep at it, everyone!

Forecasting Models - Nixtla

My environment:
Python 3.10.0
matplotlib 3.9.2
neuralforecast 1.7.5
optuna 4.0.0
numpy 1.26.2
pandas 2.2.3
scikit-learn 1.5.2
torch 2.2.2
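To compare your own setup with the versions above, a small check like the following can help. It is a convenience sketch that is not part of the original post; it only uses the standard library `importlib.metadata` (Python 3.8+), and `installed_versions` is a hypothetical helper name.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names of the packages listed in the environment above.
PACKAGES = ["matplotlib", "neuralforecast", "optuna", "numpy",
            "pandas", "scikit-learn", "torch"]

def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

print("Python", sys.version.split()[0])
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:15s} {ver or 'not installed'}")
```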

2.1 Univariate time series

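unique_id: the identifier of each time series. The example data (the classic AirPassengers dataset of monthly airline passengers) contains only one series.

ds: the timestamp (date).

y: the target value to forecast (a single variable).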

Data types:
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   unique_id  144 non-null    float64
 1   ds         144 non-null    datetime64[ns]
 2   y          144 non-null    float32
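NeuralForecast expects this "long" layout: one row per (series, timestamp) pair with the columns unique_id, ds and y. If your own data is stored "wide" (one column per series), a pandas `melt` converts it. The sketch below uses a hypothetical toy table with made-up values.

```python
import pandas as pd

# Hypothetical wide table: one column per series (toy values).
wide = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=3, freq="D"),
    "series_a": [1.0, 2.0, 3.0],
    "series_b": [10.0, 20.0, 30.0],
})

# Convert to the long format: one row per (unique_id, ds) pair.
long_df = (
    wide.melt(id_vars="ds", var_name="unique_id", value_name="y")
        .loc[:, ["unique_id", "ds", "y"]]
        .sort_values(["unique_id", "ds"])
        .reset_index(drop=True)
)
print(long_df)
print(long_df.dtypes)
```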

horizon = 12: the forecast window, i.e. how many future steps to predict. The data is monthly, so this means the next 12 months.

First put all the models you want to use into a list, then pass the list to NeuralForecast.

models

Type: list
Description: a list of one or more model instances, e.g. PatchTST, NBEATS, NHITS.

freq

Type: string
Description: the frequency of the time series, e.g. 'D' for daily, 'H' for hourly.
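The two settings work together: the forecast covers the next `h` timestamps after the last observation, spaced at the series frequency. The pandas-only sketch below generates those future timestamps; `future_timestamps` is a hypothetical helper for illustration, not a NeuralForecast function. It uses the lowercase alias "h" because pandas 2.2 deprecates "H".

```python
import pandas as pd

def future_timestamps(last_ds, freq, horizon):
    """Timestamps of the next `horizon` steps after `last_ds` at frequency `freq`."""
    # date_range includes last_ds itself, so generate horizon + 1 values and drop the first.
    return pd.date_range(start=last_ds, periods=horizon + 1, freq=freq)[1:]

# Daily data: a horizon of 3 means the next 3 days.
print(future_timestamps(pd.Timestamp("2024-01-10"), "D", 3))

# Hourly data: a horizon of 24 means the next 24 hours (first and last shown).
print(future_timestamps(pd.Timestamp("2024-01-10 23:00"), "h", 24)[[0, -1]])
```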

from neuralforecast import NeuralForecast
from neuralforecast.models import LSTM, NHITS
from neuralforecast.utils import AirPassengersDF

Y_df = AirPassengersDF  # example data shipped with neuralforecast (columns unique_id, ds, y)
horizon = 12

# Try different hyperparameters to improve accuracy.
models = [LSTM(h=horizon,                    # Forecast horizon
               max_steps=500,                # Number of steps to train
               scaler_type='standard',       # Type of scaler to normalize data
               encoder_hidden_size=64,       # Defines the size of the hidden state of the LSTM
               decoder_hidden_size=64,),     # Defines the number of hidden units of each layer of the MLP decoder
          NHITS(h=horizon,                   # Forecast horizon
                input_size=2 * horizon,      # Length of input sequence
                max_steps=100,               # Number of steps to train
                n_freq_downsample=[2, 1, 1]) # Downsampling factors for each stack output
          ]
nf = NeuralForecast(models=models, freq='M')
nf.fit(df=Y_df)

# Forecast the next `horizon` months; the result has one column per model (LSTM, NHITS).
Y_hat_df = nf.predict()

The forecast plot shows that the models capture the trend in the data.
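A plot shows the trend, but a number makes comparisons between models easier. A common check is to hold out the last `h` points of every series, train on the rest, and compute an error such as MAE on the held-out part. The sketch below uses only pandas; the helper names and the toy data are hypothetical.

```python
import pandas as pd

def train_test_split_last_h(df, h, id_col="unique_id", time_col="ds"):
    """Hold out the last h rows of every series as a test set."""
    df = df.sort_values([id_col, time_col])
    test = df.groupby(id_col).tail(h)
    train = df.drop(test.index)
    return train, test

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    y_true = pd.Series(y_true, dtype="float64").reset_index(drop=True)
    y_pred = pd.Series(y_pred, dtype="float64").reset_index(drop=True)
    return float((y_true - y_pred).abs().mean())

# Toy data: one series with 6 points; hold out the last 2.
toy = pd.DataFrame({
    "unique_id": ["a"] * 6,
    "ds": pd.date_range("2024-01-01", periods=6, freq="D"),
    "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
train, test = train_test_split_last_h(toy, h=2)
print(len(train), len(test))        # 4 2
print(mae(test["y"], [5.5, 5.0]))   # (0.5 + 1.0) / 2 = 0.75
```

With NeuralForecast one would fit on `train`, call `nf.predict()`, and compare each model's prediction column (e.g. `NHITS`) with `test["y"]` after aligning the rows on unique_id and ds.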

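It works out of the box and is very convenient. In practice, though, purely univariate forecasting problems are rare, so what about multivariate forecasting?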

2.2 Multivariate time series

If you have read this far, you are clearly interested, so let's continue! First, three terms:

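Static exogenous variables: often identifiers; their values are known for the future.

Static exogenous variables carry time-invariant information about each time series. When a model uses global parameters to forecast many time series, these variables let information be shared among groups of series with similar static values. Examples include identifiers such as a region ID or a product-group ID.

Historical exogenous variables: their future values are not known.

These time-dependent exogenous variables are restricted to past observed values.

Future exogenous variables: their future values are known.

Unlike historical exogenous variables, future exogenous variables are available at prediction time. Examples include calendar variables, weather forecasts, and known events that may cause large fluctuations, such as planned promotions.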
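Calendar variables show why a variable can be "future": its value can be computed exactly for any future timestamp. The sketch builds the rows for a 24-hour horizon of the series FR, assuming week_day is encoded like pandas `dayofweek` (Monday=0); the dataset's actual encoding is not stated in this post, and the last observed timestamp is hypothetical. A historical variable such as system_load cannot be filled in this way.

```python
import pandas as pd

last_ds = pd.Timestamp("2024-01-07 23:00")  # hypothetical last observed hour (a Sunday)
horizon = 24

# One row per future hour; unique_id is broadcast to every row.
futr_df = pd.DataFrame({
    "unique_id": "FR",
    "ds": pd.date_range(last_ds, periods=horizon + 1, freq="h")[1:],
})
# Assumed encoding: Monday=0 ... Sunday=6.
futr_df["week_day"] = futr_df["ds"].dt.dayofweek
print(futr_df.head(3))
```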

In practice, much of the extra data we bring in is not available for the future.

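unique_id: the series identifier (static, known for the future).
ds: the timestamp.
y: the target to forecast.
gen_forecast: itself a forecast produced by a model, so its values are available for the forecast period as well.
system_load: not known for the future.
week_day: known for the future.

 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   unique_id     32160 non-null  object
 1   ds            32160 non-null  datetime64[ns]
 2   y             32160 non-null  float64
 3   gen_forecast  32160 non-null  float64
 4   system_load   32160 non-null  float64
 5   week_day      32160 non-null  int64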

The data contains two identifiers, one of which is FR. A look at the data shows a trend, but also some anomalies.

We want to forecast the next 24 hours, so horizon = 24. Two models are defined below, one of which I commented out:

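futr_exog_list = ['gen_forecast', 'week_day']: future exogenous variables, known for the forecast period (gen_forecast is itself a model-generated forecast).
hist_exog_list = ['system_load']: a historical exogenous variable, not known for the future.
stat_exog_list = ['market_0', 'market_1']: static variables that do not change over time; they identify the market.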

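from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS  # add BiTCN here to try the commented-out model

# df: long-format data with unique_id, ds, y, gen_forecast, system_load, week_day
# static_df: one row per unique_id with the static columns market_0, market_1
horizon = 24 # day-ahead forecast: the next 24 hourly steps
models = [NHITS(h = horizon,
                input_size = 5*horizon,
                futr_exog_list = ['gen_forecast', 'week_day'], # <- Future exogenous variables
                hist_exog_list = ['system_load'], # <- Historical exogenous variables
                stat_exog_list = ['market_0', 'market_1'], # <- Static exogenous variables
                scaler_type = 'robust'),
          # BiTCN(h = horizon,
          #       input_size = 5*horizon,
          #       futr_exog_list = ['gen_forecast', 'week_day'], # <- Future exogenous variables
          #       hist_exog_list = ['system_load'], # <- Historical exogenous variables
          #       stat_exog_list = ['market_0', 'market_1'], # <- Static exogenous variables
          #       scaler_type = 'robust',
          #       ),
          ]
nf = NeuralForecast(models=models, freq='H')
nf.fit(df=df,
       static_df=static_df)

# Predicting needs the future values of futr_exog_list for the next 24 hours, e.g.:
# Y_hat_df = nf.predict(futr_df=futr_df)  # futr_df: unique_id, ds, gen_forecast, week_day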
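In this setup the three lists point at two different tables: futr_exog_list and hist_exog_list name columns of df, while stat_exog_list names columns of static_df. A small pre-flight check can catch a misspelled or misplaced column before training starts. `check_exog_columns` is a hypothetical helper, and the one-row frames are toy data.

```python
import pandas as pd

def check_exog_columns(df, static_df, futr_exog, hist_exog, stat_exog):
    """Return a list of problems with where the exogenous columns live (empty list = OK)."""
    problems = []
    # Time-varying variables (future and historical) live in df.
    for col in list(futr_exog) + list(hist_exog):
        if col not in df.columns:
            problems.append(f"time-varying column '{col}' missing from df")
    # Static variables live in static_df (one row per series).
    for col in stat_exog:
        if static_df is None or col not in static_df.columns:
            problems.append(f"static column '{col}' missing from static_df")
    if static_df is not None and "unique_id" not in static_df.columns:
        problems.append("static_df needs a unique_id column")
    return problems

df = pd.DataFrame({"unique_id": ["A"], "ds": pd.to_datetime(["2024-01-01"]), "y": [1.0],
                   "gen_forecast": [0.5], "week_day": [0], "system_load": [2.0]})
static_df = pd.DataFrame({"unique_id": ["A"], "market_0": [1], "market_1": [0]})
print(check_exog_columns(df, static_df, ["gen_forecast", "week_day"],
                         ["system_load"], ["market_0", "market_1"]))  # []
```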

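At first I wondered why the data does not need to be joined beforehand: simply passing static_df=static_df to fit is enough. The reason is that static_df also has a unique_id column, so NeuralForecast can match each series to its static features by unique_id without a manual join.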
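Conceptually, static_df carries the same information as the pandas merge below: every row of a series picks up that series' static values through unique_id. This only illustrates the relationship and is not how NeuralForecast stores the data; keeping static_df separate (one row per series) avoids repeating the same values on every timestamp. The series IDs "A" and "B" and all values are toy data.

```python
import pandas as pd

# Toy data in the same shape as the example: two series and their static features.
df = pd.DataFrame({
    "unique_id": ["A", "A", "B", "B"],
    "ds": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00"] * 2),
    "y": [1.0, 2.0, 3.0, 4.0],
})
static_df = pd.DataFrame({
    "unique_id": ["A", "B"],
    "market_0": [1, 0],
    "market_1": [0, 1],
})

# Each row gets its series' static values by matching on unique_id.
merged = df.merge(static_df, on="unique_id", how="left", validate="many_to_one")
print(merged)
```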

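2.3 My own data

This is the data from the Xiamen air quality prediction competition I entered earlier, where I won a third prize. Let's see how to use it here.

There isn't much new in this part, but my computer is not powerful enough and I need a server to run it. I'll test it later when I get the chance; for now, here is the example code.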

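import pytorch_lightning as pl
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

# Callback that records and prints the losses.
# NeuralForecast.fit() does not take callbacks; the models are trained with
# PyTorch Lightning, so the callback is a pl.Callback passed to the model.
# Assumption: the models log their losses as 'train_loss' and 'valid_loss'.
class LossCallback(pl.Callback):
    def __init__(self):
        super().__init__()
        self.train_losses = []
        self.val_losses = []

    def on_train_epoch_end(self, trainer, pl_module):
        metrics = trainer.callback_metrics
        train_loss = float(metrics.get('train_loss', float('nan')))
        val_loss = float(metrics.get('valid_loss', float('nan')))
        self.train_losses.append(train_loss)
        self.val_losses.append(val_loss)
        print(f"Epoch {trainer.current_epoch + 1} - Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}")

# Create the callback instance
loss_callback = LossCallback()

horizon = 7*24 # forecast the next 7 days of hourly data
models = [NHITS(h = horizon,
                input_size = 1*horizon,
                futr_exog_list = ['month', 'day', 'hour', 'dayofweek', 'dayofyear', 'sin_time','cos_time'], # <- Future exogenous variables
                hist_exog_list = ['so', 'no', 'pm10', 'co', 'o3'], # <- Historical exogenous variables (the target pm2.5 is already the model input)
                stat_exog_list = [], # <- Static exogenous variables
                scaler_type = 'robust',
                max_steps=550,
                learning_rate=0.001,
                callbacks=[loss_callback], # extra keyword arguments are passed on to the Lightning Trainer
               ),
          # BiTCN(h = horizon,
          #       input_size = 5*horizon,
          #       futr_exog_list = ['gen_forecast', 'week_day'], # <- Future exogenous variables
          #       hist_exog_list = ['system_load'], # <- Historical exogenous variables
          #       stat_exog_list = ['market_0', 'market_1'], # <- Static exogenous variables
          #       scaler_type = 'robust',
          #       ),
          ]

feats = ['so', 'no', 'pm10', 'co', 'o3', 'pm2.5','month', 'day', 'hour', 'dayofweek', 'dayofyear', 'sin_time',
       'cos_time']
label = 'pm2.5'

# train_df_only: my own long-format DataFrame with columns id, ds and the features above.
# val_size counts validation time steps per series, so base it on the shortest series.
val_size = int(train_df_only.groupby('id').size().min() * 0.2)

nf = NeuralForecast(models=models, freq='H')
nf.fit(df=train_df_only[feats + ['id', 'ds']],  # fit() trains the models in place
       val_size=val_size,
       id_col='id',
       time_col='ds',
       target_col=label,
       verbose=True)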
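The futr_exog_list above consists of calendar columns, which can be computed for any future hour, while the pollutant readings cannot. The post does not show how these columns were built, so the sketch below is only one plausible way with pandas and NumPy. Assumption: sin_time/cos_time encode the hour of day on a 24-hour cycle; `add_calendar_features` and the station ID are hypothetical.

```python
import numpy as np
import pandas as pd

def add_calendar_features(df, time_col="ds"):
    """Add calendar columns named like those in futr_exog_list above.

    Assumption: sin_time/cos_time encode the hour of day on a 24-hour cycle.
    """
    out = df.copy()
    ts = out[time_col]
    out["month"] = ts.dt.month
    out["day"] = ts.dt.day
    out["hour"] = ts.dt.hour
    out["dayofweek"] = ts.dt.dayofweek   # Monday=0 ... Sunday=6
    out["dayofyear"] = ts.dt.dayofyear
    angle = 2 * np.pi * out["hour"] / 24  # hour 0 and hour 24 map to the same point
    out["sin_time"] = np.sin(angle)
    out["cos_time"] = np.cos(angle)
    return out

demo = pd.DataFrame({"id": "station_1",
                     "ds": pd.date_range("2024-03-01", periods=24, freq="h")})
feats_df = add_calendar_features(demo)
print(feats_df.loc[[0, 6], ["ds", "hour", "sin_time", "cos_time"]])
```

The sine/cosine pair places 23:00 next to 00:00, which a plain hour number (23 vs 0) does not.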