Step 6. Model prediction (in-sample)
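Below we compare the predictions of the ARIMA and SARIMA models; the SARIMA models turn out to perform somewhat better.
We use the last 90 days of the training set as validation data.
We use MAE (Mean Absolute Error) and MAPE (Mean Absolute Percentage Error) as evaluation metrics.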
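To make the two metrics concrete, here is a minimal hand-written sketch of both. The function names `mae` and `mape` and the numbers are made up for illustration; the post itself uses the scikit-learn implementations. Like scikit-learn's `mean_absolute_percentage_error`, this version returns a fraction (0.1 means 10%), not a value in the range 0 to 100.

```python
def mae(actual, predicted):
    """Mean Absolute Error: average absolute gap between observed and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, as a fraction of the observed value.

    Assumes no observed value is zero; a zero would cause a division error.
    """
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up numbers for illustration only (not taken from the sales data).
observed = [100.0, 80.0, 50.0]
forecast = [90.0, 88.0, 50.0]

mae_value = mae(observed, forecast)    # average of 10, 8 and 0
mape_value = mape(observed, forecast)  # average of 0.10, 0.10 and 0.00
print(mae_value, mape_value)
```

MAE is expressed in the units of the series (here, units sold), while MAPE is relative, so a 10-unit miss weighs more on a low-sales day than on a high-sales day.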
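We have fitted three models: arima_results, sarima_01_results and sarima_02_results. None of the three shows autocorrelation in its residuals. The ARIMA model rejects the Jarque-Bera null hypothesis, which indicates that its residuals are not normally distributed.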
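As a reminder of what the Jarque-Bera test measures, the sketch below computes the statistic from its textbook definition, JB = n/6 · (S² + (K − 3)²/4), where S is the sample skewness and K the sample kurtosis. The function name `jarque_bera` and the sample data are assumptions made for illustration; this is not how the results in this post were produced. The p-value uses the asymptotic chi-square distribution with 2 degrees of freedom, whose survival function is simply exp(−x/2).

```python
import math


def jarque_bera(residuals):
    """Return (statistic, p_value) of the Jarque-Bera normality test."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Made-up, strongly right-skewed "residuals" for illustration only.
skewed = [0.0] * 95 + [10.0] * 5
statistic, p_value = jarque_bera(skewed)
print(statistic, p_value)
```

A small p-value (for example below 0.05) rejects the null hypothesis of normality, which is the situation described above for the ARIMA residuals.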
# Create ARIMA mean forecast
arima_pred = arima_results.get_prediction(start=-90, dynamic=True)
arima_mean = arima_pred.predicted_mean
# Create mean forecast for the first SARIMA model
sarima_01_pred = sarima_01_results.get_prediction(start=-90, dynamic=True)
sarima_01_mean = sarima_01_pred.predicted_mean
# Create mean forecast for the second SARIMA model (automated selection)
sarima_02_pred = sarima_02_results.get_prediction(start=-90, dynamic=True)
sarima_02_mean = sarima_02_pred.predicted_mean
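The calls above pass `dynamic=True`, so from the start point onward each model feeds its own earlier predictions back in instead of the observed values. The toy sketch below contrasts this with one-step-ahead prediction using a hand-written AR(1) rule, y_t = phi · y_{t−1}. The coefficient `phi`, the series and both helper functions are made up for illustration and are unrelated to the fitted models.

```python
def one_step_predictions(series, phi, start):
    """Predict series[t] from the OBSERVED series[t - 1], for every t >= start."""
    return [phi * series[t - 1] for t in range(start, len(series))]


def dynamic_predictions(series, phi, start):
    """Predict from `start` onward, feeding each prediction back in as input."""
    predictions = []
    previous = series[start - 1]  # last observed value the model is allowed to use
    for _ in range(start, len(series)):
        previous = phi * previous  # the next step sees only this prediction
        predictions.append(previous)
    return predictions


# Made-up values for illustration only.
series = [8.0, 4.0, 6.0, 3.0, 5.0]
phi = 0.5
start = 2

one_step = one_step_predictions(series, phi, start)
dynamic = dynamic_predictions(series, phi, start)
print(one_step)
print(dynamic)
```

Because errors can accumulate over the 90 steps, dynamic prediction is the stricter of the two checks and is closer to what a real 90-day forecast would face.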
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
metrics_arima = [round(mean_absolute_error(df_store_2_item_28_time[-90:],arima_mean),3),
round(mean_absolute_percentage_error(df_store_2_item_28_time[-90:],arima_mean),3)]
metrics_sarima_01 = [round(mean_absolute_error(df_store_2_item_28_time[-90:],sarima_01_mean),3),
round(mean_absolute_percentage_error(df_store_2_item_28_time[-90:],sarima_01_mean),3)]
metrics_sarima_02 = [round(mean_absolute_error(df_store_2_item_28_time[-90:],sarima_02_mean),3),
round(mean_absolute_percentage_error(df_store_2_item_28_time[-90:],sarima_02_mean),3)]
df_arima_results = pd.DataFrame({'metrics':['MAE','MAPE'],
'ARIMA(4,1,5)':metrics_arima,
'SARIMA(0,1,6)(0,1,1)7':metrics_sarima_01,
'SARIMA(6,1,1)(6,1,0)7':metrics_sarima_02,
})
df_arima_results
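The table above shows that the second SARIMA model, i.e. the automatically selected one, has the best evaluation metrics.
Next, let's compare the models visually.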
dates = df_store_2_item_28_time.index
# Plot mean ARIMA and SARIMA predictions and observed
plt.figure(figsize=(15,10))
plt.title('Comparing forecasting in sample of all models', size = 16)
plt.plot(arima_mean.index, arima_mean, label='ARIMA(4,1,5)')
plt.plot(sarima_01_mean.index, sarima_01_mean, label='SARIMA(0,1,6)(0,1,1)7')
plt.plot(sarima_02_mean.index, sarima_02_mean, label='SARIMA(6,1,1)(6,1,0)7')
plt.plot(df_store_2_item_28_time[-90:], label='observed')
plt.legend()
plt.show()
Look closely at how each of the three models differs from the observed values.
# Observed values vs ARIMA
plt.figure(figsize=(15,5))
plt.title('Forecasting in sample - Observed values vs ARIMA(4,1,5)', size = 16)
plt.plot(df_store_2_item_28_time[-90:], label='observed', color='red')
plt.plot(arima_mean.index, arima_mean, label='ARIMA(4,1,5)', color='blue')
plt.legend()  # without a legend the labels above are never displayed
plt.show()
# Observed values vs first SARIMA model
plt.figure(figsize=(15,5))
plt.title('Forecasting in sample - Observed values vs SARIMA(0,1,6)(0,1,1)7', size = 16)
plt.plot(df_store_2_item_28_time[-90:], label='observed', color='red')
plt.plot(sarima_01_mean.index, sarima_01_mean, label='SARIMA(0,1,6)(0,1,1)7', color='orange')
plt.legend()
plt.show()
# Observed values vs second SARIMA model (automated selection)
plt.figure(figsize=(15,5))
plt.title('Forecasting in sample - Observed values vs SARIMA(6,1,1)(6,1,0)7 (obtained by using automated selection)', size = 16)
plt.plot(df_store_2_item_28_time[-90:], label='observed', color='red')
plt.plot(sarima_02_mean.index, sarima_02_mean, label='SARIMA(6,1,1)(6,1,0)7', color='green')
plt.legend()
plt.show()
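As you can see, the green line, i.e. the automatically selected model, tracks the observed values most closely.
Step 7. Out-of-sample forecasting
This time we forecast 90 days ahead and compare only the ARIMA model with the second SARIMA model (sarima_02_results).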
# Create ARIMA mean forecast
arima_pred = arima_results.get_forecast(steps=90)
arima_mean = arima_pred.predicted_mean
# Create SARIMA mean forecast
sarima_02_pred = sarima_02_results.get_forecast(steps=90)
sarima_02_mean = sarima_02_pred.predicted_mean
dates = df_store_2_item_28_time.index
# Plot mean ARIMA and SARIMA predictions and observed
plt.title("Comparing Forecasting 90 days ahead - ARIMA vs SARIMA", size =16)
plt.plot(df_store_2_item_28_time['2017':], label='observed')
plt.plot(arima_mean.index, arima_mean, label='ARIMA(4,1,5)')
plt.plot(sarima_02_mean.index, sarima_02_mean, label='SARIMA(6,1,1)(6,1,0)7')
plt.legend()
plt.show()
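The plot clearly shows that the SARIMA forecast follows the seasonal pattern of the series more closely, which is of course because the SARIMA model incorporates a seasonal component.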
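One way to see what the seasonal part contributes: the seasonal orders (P, D, Q)7 with D = 1 mean the model works on the weekly difference y_t − y_{t−7}, so a repeating 7-day pattern is removed before the remaining terms are fitted. The sketch below applies that differencing to made-up data; the helper `seasonal_difference` and the numbers are assumptions for illustration, not the sales series used in this post.

```python
def seasonal_difference(series, period):
    """Return series[t] - series[t - period] for every t >= period."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Made-up weekly pattern (Monday to Sunday), for illustration only.
weekly_pattern = [10, 12, 11, 13, 15, 20, 22]

# Three weeks of data: the same pattern, shifted up by 1 unit each week.
series = [value + week for week in range(3) for value in weekly_pattern]

diffed = seasonal_difference(series, period=7)
print(diffed)
```

After differencing, the weekend peaks are gone and only the week-over-week change remains. A non-seasonal ARIMA model has no such step, so it has to approximate the weekly cycle with its short-lag AR and MA terms.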
To be continued…