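We will use the SARIMAX model from the statsmodels package. It is not a single model but a whole family of models, selected by which parameters you set. For example: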
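- To use a plain ARIMA model, set only the order parameter
- To use ARIMA plus seasonality (SARIMA), set the order and seasonal_order parameters
- To use ARIMA plus exogenous variables (ARIMAX), set the order and exog parameters
- To use ARIMA plus seasonality plus exogenous variables (SARIMAX), set the order, seasonal_order, and exog parameters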
Official documentation: https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html
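The main parameters are:
- endog: the time series to train on; it is also the first positional argument
- order: the ARIMA parameters, given as (p, d, q)
- seasonal_order: the seasonal parameters, given as (P, D, Q, s), where s is the number of periods in one season
- exog: the matrix of exogenous variables, with one row per observation in endog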
Example code:
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split
import statsmodels.api as sm


def load_data(samples=1000):
    """Generate the training and test data.

    :param samples: number of samples
    """
    data_x, data_y = make_regression(n_samples=samples, n_features=10)
    x_train, x_test, y_train, y_test = train_test_split(data_x, data_y, test_size=0.2, random_state=0,
                                                        shuffle=True)
    return x_train, x_test, y_train, y_test


def main():
    # Regression data generated by sklearn; results on it are usually very good
    train_x, test_x, train_y, test_y = load_data()
    # Build the model: pass the training features through exog and use ARIMA(1,0,1), set via order.
    # disp=0 fits "silently", i.e. without printing intermediate optimizer output;
    # remove disp=0 to see the many intermediate results printed during fitting.
    arimax_model = sm.tsa.statespace.SARIMAX(endog=train_y, exog=train_x, order=(1, 0, 1)).fit(disp=0)
    print(arimax_model.summary())  # model summary
    pred_result = arimax_model.forecast(test_x.shape[0], exog=test_x)  # predictions for the test set
    r2 = r2_score(test_y, pred_result)  # evaluate the predictions with R2
    mse = mean_squared_error(test_y, pred_result)  # evaluate the predictions with MSE
    print("Prediction result R2:", r2, " MSE:", mse)


if __name__ == '__main__':
    main()
Output:
                               SARIMAX Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  800
Model:               SARIMAX(1, 0, 1)   Log Likelihood                8536.644
Date:                Mon, 26 Sep 2022   AIC                         -17047.288
Time:                        09:17:27   BIC                         -16986.388
Sample:                             0   HQIC                        -17023.893
                                - 800
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1            98.4567   7.38e-07   1.33e+08      0.000      98.457      98.457
x2            76.3143   7.98e-07   9.56e+07      0.000      76.314      76.314
x3            40.8672   8.21e-07   4.98e+07      0.000      40.867      40.867
x4            33.0903   7.95e-07   4.16e+07      0.000      33.090      33.090
x5            56.4743   7.92e-07   7.13e+07      0.000      56.474      56.474
x6            32.4059   8.01e-07   4.05e+07      0.000      32.406      32.406
x7            82.9344   7.96e-07   1.04e+08      0.000      82.934      82.934
x8            36.9715   8.33e-07   4.44e+07      0.000      36.971      36.971
x9             3.7463   8.76e-07   4.28e+06      0.000       3.746       3.746
x10           79.4727   7.63e-07   1.04e+08      0.000      79.473      79.473
ar.L1         -0.0222   3.64e-13  -6.11e+10      0.000      -0.022      -0.022
ma.L1         -0.0282   3.54e-13  -7.98e+10      0.000      -0.028      -0.028
sigma2      7.445e-11   8.85e-11      0.841      0.400    -9.9e-11    2.48e-10
===================================================================================
Ljung-Box (L1) (Q):                     0.39   Jarque-Bera (JB):               4.54
Prob(Q):                                0.53   Prob(JB):                       0.10
Heteroskedasticity (H):                 0.91   Skew:                           0.00
Prob(H) (two-sided):                    0.43   Kurtosis:                       2.63
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 1.06e+26. Standard errors may be unstable.
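Prediction result R2: 0.9999999999999998  MSE: 1.0602913971787695e-11
The final predictions are almost absurdly perfect, and that is mostly because the data is so simple: make_regression adds no noise by default, so the target is an exact linear combination of the ten features and the exog regression alone explains it. The near-zero sigma2 and the near-singular covariance warning in the summary reflect the same thing.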