Computation steps and parameters of the auto.arima function in R

Author: 南瓜派三蔬

Last modified 2024-02-05 13:56:50


First published 2019-10-23 14:24:21

Original link: https://blog.csdn.net/qq_36810398/article/details/102700416


Note: adapted from Forecasting: Principles and Practice, Section 8.7. In Python, pmdarima.arima.auto_arima is also built on top of R's auto.arima.

1. Computation steps

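R's auto.arima is a variant of the Hyndman-Khandakar algorithm (Hyndman & Khandakar, 2008). It combines unit root tests, minimization of the AICc, and maximum likelihood estimation (MLE) to obtain an ARIMA model.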
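Since the search below ranks models by AICc, it helps to see the criterion itself. Forecasting: Principles and Practice defines, for an ARIMA model with log-likelihood L, AIC = -2log(L) + 2(p+q+k+1), where k=1 if a constant is included and k=0 otherwise, and AICc = AIC + 2(p+q+k+1)(p+q+k+2)/(T-p-q-k-2), where T is the number of observations. The following is a minimal sketch of those two formulas; the function names are hypothetical, and the log-likelihood is assumed to come from an already fitted model. Because differencing changes the data on which the likelihood is computed, AICc values are only comparable between models with the same d, which is why d is fixed by unit root tests first.

```python
def arima_aic(loglik: float, p: int, q: int, has_constant: bool) -> float:
    """AIC = -2 log(L) + 2(p + q + k + 1), with k = 1 if a constant is included."""
    k = 1 if has_constant else 0
    return -2.0 * loglik + 2.0 * (p + q + k + 1)


def arima_aicc(loglik: float, p: int, q: int, has_constant: bool, n_obs: int) -> float:
    """AICc = AIC + 2(p+q+k+1)(p+q+k+2) / (T - p - q - k - 2)."""
    k = 1 if has_constant else 0
    n_params = p + q + k + 1  # the extra 1 counts the residual variance sigma^2
    denom = n_obs - p - q - k - 2
    if denom <= 0:
        raise ValueError("too few observations for this model order")
    return arima_aic(loglik, p, q, has_constant) + 2.0 * n_params * (n_params + 1) / denom


# Hypothetical numbers: log-likelihood -100 for ARIMA(1,d,1) with a constant, T = 100.
print(round(arima_aicc(-100.0, 1, 1, True, 100), 3))  # 208.421
```

The correction term grows with the number of parameters and shrinks as T grows, so AICc penalizes large models more heavily than AIC on short series.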

The steps of the Hyndman-Khandakar automatic ARIMA modelling algorithm are as follows:

Step 1: The number of differences d (0≤d≤2) is determined using repeated KPSS tests.

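Step 2: After differencing the data d times, the values of p and q are chosen by minimizing the AICc.
Rather than trying every possible combination of p and q, the algorithm uses a stepwise search to find the best one.
   (2.a) Fit four initial models:
         ARIMA(0,d,0),
         ARIMA(2,d,2),
         ARIMA(1,d,0),
         ARIMA(0,d,1).
       A constant is included unless d=2. If d≤1, one additional model,
       ARIMA(0,d,0) without a constant, is also fitted.
   (2.b) The best of the models fitted in (2.a) (the one with the smallest
       AICc) is called the "current model".
   (2.c) Consider the following variations of the current model:
         - vary p and/or q by ±1;
         - include/exclude the constant c.
       Among these variations and the current model, the one with the
       smallest AICc becomes the new current model.
   (2.d) Repeat (2.c) until no model with a lower AICc can be found.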
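Steps 2.a to 2.d can be sketched as a small local search. This is a minimal illustration, not the forecast package's code: `score` is a hypothetical callback standing in for "fit ARIMA(p,d,q) with or without a constant and return its AICc", and the real implementations add further details (for example, handling models that fail to fit). The sketch follows the rules listed above: a constant is used unless d=2, the extra no-constant ARIMA(0,d,0) is tried when d≤1, and neighbours change p and/or q by ±1 or toggle the constant.

```python
from typing import Callable, Dict, List, Tuple

# (p, q, include_constant); d is fixed beforehand by Step 1.
Model = Tuple[int, int, bool]


def stepwise_search(score: Callable[[int, int, bool], float], d: int,
                    max_p: int = 5, max_q: int = 5) -> Tuple[Model, float]:
    """Sketch of Steps 2.a-2.d; `score(p, q, const)` is a hypothetical AICc callback."""
    allow_constant = d <= 1  # a constant is never used when d = 2
    cache: Dict[Model, float] = {}

    def fit(model: Model) -> float:
        # Each candidate model is scored at most once.
        if model not in cache:
            cache[model] = score(*model)
        return cache[model]

    # Step 2.a: four initial models (with a constant unless d = 2) ...
    initial: List[Model] = [(0, 0, allow_constant), (2, 2, allow_constant),
                            (1, 0, allow_constant), (0, 1, allow_constant)]
    # ... plus ARIMA(0,d,0) without a constant when d <= 1.
    if allow_constant:
        initial.append((0, 0, False))
    initial = [m for m in initial if m[0] <= max_p and m[1] <= max_q]

    # Step 2.b: the best initial model becomes the current model.
    current = min(initial, key=fit)
    best = fit(current)

    # Steps 2.c and 2.d: move to the best neighbour until AICc stops improving.
    while True:
        p, q, c = current
        neighbours: List[Model] = []
        for dp in (-1, 0, 1):
            for dq in (-1, 0, 1):
                new_p, new_q = p + dp, q + dq
                if (dp, dq) != (0, 0) and 0 <= new_p <= max_p and 0 <= new_q <= max_q:
                    neighbours.append((new_p, new_q, c))
        if allow_constant:
            neighbours.append((p, q, not c))  # include/exclude the constant
        if not neighbours:
            return current, best
        candidate = min(neighbours, key=fit)
        if fit(candidate) >= best:
            return current, best
        current, best = candidate, fit(candidate)


# Toy score standing in for real AICc values: lowest at p=3, q=1 with a constant.
def toy_score(p: int, q: int, const: bool) -> float:
    return (p - 3) ** 2 + (q - 1) ** 2 + (0.0 if const else 0.5)


print(stepwise_search(toy_score, d=0))  # ((3, 1, True), 0.0)
```

With the toy score, the search starts from ARIMA(2,d,2), moves once to ARIMA(3,d,1) and stops, having scored only a small fraction of the 36 possible (p, q) pairs.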

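[Note] The procedure above still leaves a few questions unexplained, such as how the KPSS test is computed and why it works, and what difference it makes whether a constant term is included. These are fairly involved topics that deserve a separate analysis; see the original source.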
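To make Step 1 a little more concrete, here is a minimal pure-Python sketch of "repeated KPSS tests". The KPSS null hypothesis is that the series is (level-)stationary, so the series is differenced again as long as the test rejects, up to max_d. Assumptions, all labelled in the code: the level-stationarity form of the test, a Bartlett-weighted long-run variance with lag truncation int(4(n/100)^(1/4)), and a fixed 5% critical value of 0.463 instead of a p-value. `choose_d` is a hypothetical name; real implementations such as ndiffs() in R's forecast package differ in details like the lag choice.

```python
from typing import List, Optional, Sequence

# 5% critical value of the KPSS level-stationarity test (Kwiatkowski et al., 1992).
KPSS_CRIT_5PCT = 0.463


def kpss_level_stat(x: Sequence[float], lags: Optional[int] = None) -> float:
    """KPSS statistic for the null hypothesis that x is level-stationary."""
    n = len(x)
    if lags is None:
        lags = int(4 * (n / 100) ** 0.25)  # assumed lag-truncation rule
    mean = sum(x) / n
    resid = [v - mean for v in x]
    # Long-run variance: autocovariances weighted by the Bartlett kernel.
    lrv = sum(e * e for e in resid) / n
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1)
        gamma_j = sum(resid[t] * resid[t - j] for t in range(j, n)) / n
        lrv += 2.0 * weight * gamma_j
    if lrv <= 0.0:
        return 0.0  # constant series: nothing left to difference
    partial, total = 0.0, 0.0
    for e in resid:
        partial += e  # partial sum S_t of the residuals
        total += partial * partial
    return total / (n * n * lrv)


def choose_d(x: Sequence[float], max_d: int = 2) -> int:
    """Step 1 sketch: keep differencing while the KPSS test rejects stationarity."""
    series: List[float] = list(x)
    d = 0
    while d < max_d and kpss_level_stat(series) > KPSS_CRIT_5PCT:
        series = [b - a for a, b in zip(series, series[1:])]
        d += 1
    return d


print(choose_d(list(range(100))))  # 1: a linear trend needs one difference
```

A linear trend is rejected at level but becomes constant after one difference, a quadratic trend needs two, and the cap max_d stops the loop regardless of what the test says.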

2. Overview of auto.arima parameters

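The parameters are summarized below using pyramid.arima.auto_arima in Python as the example (the pyramid-arima package has since been renamed pmdarima); see the function's own documentation for more detail. This signature is from an older release, and later versions have renamed or removed some parameters (for example, exogenous was replaced by X).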

    auto_arima(y, exogenous=None,
               start_p=2, d=None, start_q=2, max_p=5, max_d=2, max_q=5,
               start_P=1, D=None, start_Q=1, max_P=2, max_D=1, max_Q=2,
               max_order=10, m=1, seasonal=True, stationary=False,
               information_criterion='aic', alpha=0.05, test='kpss',
               seasonal_test='ch', stepwise=True, n_jobs=1, start_params=None,
               trend='c', method=None, transparams=True, solver='lbfgs',
               maxiter=50, disp=0, callback=None, offset_test_args=None,
               seasonal_test_args=None, suppress_warnings=False,
               error_action='warn', trace=False, random=False,
               random_state=None, n_fits=10, return_valid_fits=False,
               out_of_sample_size=0, scoring='mse', scoring_args=None,
               **fit_args)

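y : the time series to fit; it must be a one-dimensional array of floats and cannot contain np.nan or np.inf.

exogenous : optional extra features, supplied alongside the time series itself, to help with prediction. Note that when forecasting future values, the future values of these features must be supplied as well.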

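start_p : int, default 2; the starting value of p when the algorithm selects p automatically.

d : int, default None; the order of non-seasonal differencing. If None, it is selected automatically, which can increase the running time considerably.

start_q : int, default 2; the starting value of q when the algorithm selects q automatically.

max_p : int, default 5; the upper bound of p in the automatic search; must be ≥ start_p.

max_d : int, default 2; the upper bound of d (the non-seasonal differencing order) when it is selected automatically; must be ≥ d.

max_q : int, default 5; the upper bound of q in the automatic search; must be ≥ start_q.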

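start_P : int, default 1; the starting value of P for the seasonal model.

D : int, default None; the order of seasonal differencing. If None, it is selected automatically.

start_Q : int, default 1; the starting value of Q for the seasonal model.

max_P : int, default 2; the upper bound of P for the seasonal model.

max_D : int, default 1; the maximum order of seasonal differencing; must be ≥ D.

max_Q : int, default 2; the upper bound of Q for the seasonal model.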

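max_order : int, default 10; if p+q ≥ max_order, the model with that combination is not fitted.

m : int, default 1; the seasonal period, e.g. m=4 for quarterly data and m=12 for monthly data. If m=1, seasonal is set to False.

seasonal : bool, default True; whether to fit a seasonal ARIMA model. Note that if seasonal=True and m=1, seasonal is set to False.

stationary : bool, default False; whether the series is known to be stationary (in which case d is taken to be 0).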

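information_criterion : str, default 'aic'; the criterion used to select the model, one of 'aic', 'bic', 'hqic', 'oob'. Note that R's auto.arima uses AICc by default, whereas the default here is AIC.

alpha : float, default 0.05; the significance level of the test.

test : str, default 'kpss'; the type of unit root test. The test is run only when stationary=False and d=None.

seasonal_test : str, default 'ch'; the seasonal unit root test to use ('ch' is the Canova-Hansen test).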

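stepwise : bool, default True; if True, the stepwise search of Hyndman and Khandakar described in Section 1 is used, which is much faster. If False, every combination within the bounds is fitted, which widens the search but takes considerably longer.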

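n_jobs : int, default 1; the number of models to fit in parallel; -1 means as many as possible. Parallel fitting applies only to the non-stepwise search (stepwise=False).

start_params : array-like, default None; starting parameters for ARMA(p,q).

transparams : bool, default True; if True, the parameters are transformed to ensure stationarity; if False, stationarity and invertibility are not checked.

method : str; the likelihood to maximize, one of 'css-mle', 'mle', 'css'. 'css' maximizes the conditional sum of squares likelihood, 'mle' maximizes the exact likelihood via the Kalman filter, and 'css-mle' uses the CSS estimates as starting values for the exact likelihood.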

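trend : str or iterable, default 'c'; controls the deterministic trend polynomial: 'c' adds a constant, 't' a linear time trend, and 'ct' both. An iterable marks which powers of t are included, as with numpy.poly1d; e.g. [1,1,0,1] means a + bt + ct^3.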

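solver : str or None, default 'lbfgs'; the optimizer used to fit the model. Other options include 'bfgs', 'newton', etc.

maxiter : int, default 50; the maximum number of function evaluations.

disp : int, default 0; controls printing of convergence information; disp<0 means nothing is printed.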
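Several of the parameters above (the start_*/max_* bounds, max_order, m, seasonal and stepwise) together determine how many models a non-stepwise search has to fit. The helper below, exhaustive_grid, is hypothetical and not part of pmdarima: it enumerates (p, q, P, Q) candidates for fixed d and D, assuming the grid runs from each start_* value to the matching max_* value and applying the max_order rule exactly as worded above. pmdarima's internal grid may differ in these details.

```python
from itertools import product
from typing import List, Optional, Tuple


def exhaustive_grid(start_p: int = 2, max_p: int = 5, start_q: int = 2, max_q: int = 5,
                    start_P: int = 1, max_P: int = 2, start_Q: int = 1, max_Q: int = 2,
                    max_order: Optional[int] = 10, m: int = 1,
                    seasonal: bool = True) -> List[Tuple[int, int, int, int]]:
    """Hypothetical helper: (p, q, P, Q) candidates a non-stepwise search would fit."""
    if max_p < start_p or max_q < start_q:
        raise ValueError("max_p must be >= start_p and max_q must be >= start_q")
    if m == 1:
        seasonal = False  # m = 1 means no seasonal component, as described for m
    P_values = range(start_P, max_P + 1) if seasonal else range(1)  # P = 0 if non-seasonal
    Q_values = range(start_Q, max_Q + 1) if seasonal else range(1)  # Q = 0 if non-seasonal
    grid = []
    for p, q, P, Q in product(range(start_p, max_p + 1), range(start_q, max_q + 1),
                              P_values, Q_values):
        # max_order rule as stated above: skip combinations with p + q >= max_order.
        if max_order is not None and p + q >= max_order:
            continue
        grid.append((p, q, P, Q))
    return grid


# Default bounds: monthly data (m = 12) versus non-seasonal data (m = 1).
print(len(exhaustive_grid(m=12)), len(exhaustive_grid(m=1)))  # 60 15
```

With the defaults and m=12 this gives 60 candidate models for one choice of d and D, each needing its own maximum likelihood fit, which is why stepwise=True is usually much faster.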

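The following parameters seem less useful and are used less often, so their descriptions are quoted from the original documentation.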

callback : callable, optional (default=None)
Called after each iteration as callback(xk) where xk is the current
parameter vector. This is only used in non-seasonal ARIMA models.

offset_test_args : dict, optional (default=None)
The args to pass to the constructor of the offset (d) test. See pyramid.arima.stationarity for more details.

seasonal_test_args : dict, optional (default=None)
The args to pass to the constructor of the seasonal offset (D) test. See pyramid.arima.seasonality for more details.

suppress_warnings : bool, optional (default=False)
Many warnings might be thrown inside of statsmodels. If suppress_warnings is True, all of the warnings coming from
ARIMA will be squelched.

error_action : str, optional (default='warn')
If unable to fit an ARIMA due to stationarity issues, whether to
warn ('warn'), raise the ValueError ('raise') or ignore ('ignore'). Note that the default behavior is to warn, and fits that fail will be returned as None. This is the recommended behavior, as statsmodels ARIMA and SARIMAX models hit bugs periodically that can cause an otherwise healthy parameter combination to fail for reasons not related to pyramid.

trace : bool, optional (default=False)
Whether to print status on the fits. Note that this can be very verbose…

random : bool, optional (default=False)
Similar to grid searches, auto_arima provides the capability to
perform a “random search” over a hyper-parameter space. If random is True, rather than perform an exhaustive search or stepwise search, only n_fits ARIMA models will be fit (stepwise must be False for this option to do anything).

random_state : int, long or numpy RandomState, optional (default=None)
The PRNG for when random=True. Ensures replicable testing and results.

n_fits : int, optional (default=10)
If random is True and a "random search" is going to be performed, n_fits is the number of ARIMA models to be fit.

return_valid_fits : bool, optional (default=False)
If True, will return all valid ARIMA fits in a list. If False (by default), will only return the best fit.

out_of_sample_size : int, optional (default=0)
The ARIMA class can fit only a portion of the data if specified,
in order to retain an “out of bag” sample score. This is the
number of examples from the tail of the time series to hold out
and use as validation examples. The model will not be fit on these
samples, but the observations will be added into the model's endog and exog arrays so that future forecast values originate from the end of the endogenous vector.

For instance::
    y = [0, 1, 2, 3, 4, 5, 6]
    out_of_sample_size = 2
    
    > Fit on: [0, 1, 2, 3, 4]
    > Score on: [5, 6]
    > Append [5, 6] to end of self.arima_res_.data.endog values

scoring : str, optional (default='mse')
If performing validation (i.e., if out_of_sample_size > 0), the metric to use for scoring the out-of-sample data. One of {'mse', 'mae'}.

scoring_args : dict, optional (default=None)
A dictionary of key-word arguments to be passed to the scoring metric.

**fit_args : dict, optional (default=None)
A dictionary of keyword arguments to pass to the ARIMA.fit method.
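The out_of_sample_size and scoring options describe a simple hold-out validation. pmdarima performs this internally; the two helpers below are hypothetical and only mirror the behavior described in the docstring above (fit on the head of the series, score on the last out_of_sample_size points with 'mse' or 'mae'). As an assumption for illustration, a naive forecast that repeats the last training value stands in for the fitted ARIMA model's forecasts.

```python
from typing import List, Sequence, Tuple


def split_out_of_sample(y: Sequence[float], out_of_sample_size: int) -> Tuple[List[float], List[float]]:
    """Hold out the last `out_of_sample_size` observations for validation."""
    if not 0 <= out_of_sample_size < len(y):
        raise ValueError("out_of_sample_size must be in [0, len(y))")
    cut = len(y) - out_of_sample_size
    return list(y[:cut]), list(y[cut:])


def score_forecast(actual: Sequence[float], predicted: Sequence[float],
                   scoring: str = "mse") -> float:
    """Out-of-sample score; lower is better for both 'mse' and 'mae'."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equally long, non-empty sequences")
    errors = [a - p for a, p in zip(actual, predicted)]
    if scoring == "mse":
        return sum(e * e for e in errors) / len(errors)
    if scoring == "mae":
        return sum(abs(e) for e in errors) / len(errors)
    raise ValueError("scoring must be 'mse' or 'mae'")


# Reproduce the y = [0, ..., 6], out_of_sample_size = 2 example above.
train, holdout = split_out_of_sample([0, 1, 2, 3, 4, 5, 6], 2)
# Stand-in forecast: repeat the last training value (naive forecast, not an ARIMA fit).
naive = [train[-1]] * len(holdout)
print(train, holdout, score_forecast(holdout, naive, "mse"), score_forecast(holdout, naive, "mae"))
# [0, 1, 2, 3, 4] [5, 6] 2.5 1.5
```

In the real function the held-out points are afterwards appended back to the model's data, so forecasts still start from the end of the full series.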