Setting ARIMA Model Parameters with Grid Search and Auto-ARIMA in R

Overview

One of the most popular ways of representing and forecasting time series data is through Autoregressive Integrated Moving Average (ARIMA) models. These models are defined by three parameters:

  • p: the lag order (number of lag observations included)

  • d: the degree of differencing needed for stationarity (number of times the data is differenced)

  • q: the order of the moving average (number of lagged forecast errors included)

If you’re new to ARIMA modeling, there are lots of great instructional webpages and step-by-step guides, such as this one from otexts.com and this one on oracle.com, which give more thorough overviews.

First, we import the required libraries and the dataset. Here we assume the time series is stored in a pandas DataFrame named `data`.

```python
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ParameterGrid, TimeSeriesSplit
from statsmodels.tsa.arima.model import ARIMA  # the old statsmodels.tsa.arima_model is removed
import xgboost as xgb

# Load the dataset
data = pd.read_csv('data.csv')
```

Next, split the dataset into training and test sets. Here the first 80% of the data is used for training and the remaining 20% for testing. We also define the time-series split parameters used for cross-validation.

```python
# Split into training and test sets
train_size = int(len(data) * 0.8)
train_data, test_data = data[:train_size], data[train_size:]

# Time-series cross-validation parameters
n_splits = 5
window_size = int(len(train_data) / n_splits)
tscv = TimeSeriesSplit(n_splits=n_splits)
```

Next, fit an ARIMA model to the training data to generate features for the XGBoost model. We use the ARIMA residuals as features, since the residuals contain the information in the series that the ARIMA model did not capture, together with the rolling mean and rolling standard deviation of the original data.

```python
# ARIMA parameter grid
p = [0, 1, 2]
d = [0, 1]
q = [0, 1, 2]
arima_params = list(ParameterGrid({'p': p, 'd': d, 'q': q}))

# XGBoost parameter grid
xgb_params = {'max_depth': [3, 5, 7], 'n_estimators': [50, 100, 150]}

# Containers for features and targets
train_features = []
train_targets = []

# Fit an ARIMA model and extract features on each time-series split
for train_index, test_index in tscv.split(train_data):
    train, test = train_data.iloc[train_index], train_data.iloc[test_index]

    # Fit the ARIMA model (the current statsmodels API takes no disp argument)
    arima_model = ARIMA(train, order=(1, 1, 0))
    arima_fit = arima_model.fit()

    # ARIMA residuals
    arima_residuals = pd.DataFrame(arima_fit.resid)

    # Rolling mean and standard deviation of the original series
    rolling_mean = train.rolling(window=window_size).mean()
    rolling_std = train.rolling(window=window_size).std()

    # Combine residuals, rolling mean, and rolling std into one feature table;
    # dropna() removes the leading rows where the rolling window is not yet full
    features = pd.concat([arima_residuals, rolling_mean, rolling_std], axis=1).dropna()
    train_features.append(features)
    train_targets.append(train.loc[features.index])

# Convert the per-split features and targets to NumPy arrays
train_features = np.concatenate(train_features)
train_targets = np.concatenate(train_targets).ravel()
```

Now we can use grid search to find the best combined ARIMA/XGBoost configuration, with root mean squared error (RMSE) as the evaluation metric.

```python
# Combined grid over ARIMA and XGBoost parameters
# (ParameterGrid values must be lists, so expand xgb_params into a list of dicts)
grid = ParameterGrid({'xgb_params': list(ParameterGrid(xgb_params)),
                      'arima_params': arima_params})

# Track the best parameter combination and its RMSE
best_params = None
min_rmse = float('inf')

for params in grid:
    # Build the XGBoost and ARIMA models for this parameter combination
    xgb_model = xgb.XGBRegressor(**params['xgb_params'])
    arima_model = ARIMA(train_data, order=(params['arima_params']['p'],
                                           params['arima_params']['d'],
                                           params['arima_params']['q']))
    arima_fit = arima_model.fit()

    # ARIMA residuals plus rolling statistics as features
    arima_residuals = pd.DataFrame(arima_fit.resid)
    rolling_mean = train_data.rolling(window=window_size).mean()
    rolling_std = train_data.rolling(window=window_size).std()
    features = pd.concat([arima_residuals, rolling_mean, rolling_std], axis=1).dropna()
    features.columns = ['resid', 'roll_mean', 'roll_std']  # unique names for XGBoost
    targets = train_data.loc[features.index]

    # Cross-validate over the time-series splits
    rmse_scores = []
    for train_index, test_index in tscv.split(features):
        X_train, X_test = features.iloc[train_index], features.iloc[test_index]
        y_train, y_test = targets.iloc[train_index], targets.iloc[test_index]

        # Fit XGBoost and score the held-out window
        xgb_model.fit(X_train, y_train)
        y_pred = xgb_model.predict(X_test)
        rmse_scores.append(np.sqrt(mean_squared_error(y_test, y_pred)))

    # Keep the combination with the lowest mean RMSE
    mean_rmse = np.mean(rmse_scores)
    if mean_rmse < min_rmse:
        best_params = params
        min_rmse = mean_rmse

# Report the best parameters and minimum RMSE
print('Best params:', best_params)
print('Min RMSE:', min_rmse)
```

Finally, train the XGBoost model with the best parameters and make predictions on the test set.

```python
# Rebuild the ARIMA and XGBoost models with the best parameters
xgb_model = xgb.XGBRegressor(**best_params['xgb_params'])
arima_model = ARIMA(train_data, order=(best_params['arima_params']['p'],
                                       best_params['arima_params']['d'],
                                       best_params['arima_params']['q']))
arima_fit = arima_model.fit()

# Training features: ARIMA residuals plus rolling statistics
arima_residuals = pd.DataFrame(arima_fit.resid)
rolling_mean = train_data.rolling(window=window_size).mean()
rolling_std = train_data.rolling(window=window_size).std()
train_features = pd.concat([arima_residuals, rolling_mean, rolling_std], axis=1).dropna()
train_features.columns = ['resid', 'roll_mean', 'roll_std']
train_targets = train_data.loc[train_features.index]

# Fit the final XGBoost model
xgb_model.fit(train_features, train_targets)

# Test-set features: differenced ARIMA forecasts stand in for the (unknown)
# test residuals; forecast() returns the predictions directly in the current API
test_arima_residuals = pd.DataFrame(arima_fit.forecast(steps=len(test_data))).diff()[1:]
test_rolling_mean = test_data.rolling(window=window_size).mean()
test_rolling_std = test_data.rolling(window=window_size).std()
test_features = pd.concat([test_arima_residuals, test_rolling_mean, test_rolling_std], axis=1).dropna()
test_features.columns = ['resid', 'roll_mean', 'roll_std']
test_targets = test_data.loc[test_features.index]
test_pred = xgb_model.predict(test_features)

# Test-set RMSE
test_rmse = np.sqrt(mean_squared_error(test_targets, test_pred))
print('Test RMSE:', test_rmse)
```
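The cross-validation above relies on scikit-learn's `TimeSeriesSplit`, which, unlike `KFold`, always keeps every test index strictly after all of its training indices, so the model never trains on the future. A small self-contained sketch (the 10-point array and 3 splits are illustrative choices, not values from the walkthrough):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 10 observations, 3 splits: each split trains on an expanding prefix
tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(np.arange(10)))

# Every training window ends before its test window begins
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

This expanding-window behavior is why the walkthrough uses `TimeSeriesSplit` for RMSE estimation: shuffled folds would leak future observations into training.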
