Time Series Data Analysis - 2

Sarah_07

Last modified 2022-06-08 23:36:51

Tags: data mining, machine learning, artificial intelligence, data analysis

First published 2022-06-08 23:33:48

Article link: https://blog.csdn.net/Sarah_07/article/details/125194720

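This post shows how to forecast time series data with an ARIMA model. It covers the underlying AR, MA and ARMA models and how to choose the model orders from the ACF and PACF. A worked example then uses Python's pmdarima library to identify the ARIMA model automatically and plots the forecasts.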

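This post follows on from the previous one, which analyzed the impact of the pandemic on sales. Once the sales data has been decomposed, the trend component can be forecast. Previous post: https://blog.csdn.net/Sarah_07/article/details/124976510

This post shows how to forecast a time series with ARIMA and how to use pmdarima. We start with some basic ARIMA concepts.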

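Auto-Regressive (AR) Model $\hat y_t = \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p}$
The model assumes that the current value $y_t$ depends on its own past values. The lag order $p$ is usually identified from the PACF.
Moving Average (MA) Model $\hat y_t = \omega + \epsilon_t + \beta_1\epsilon_{t-1} + \cdots + \beta_q\epsilon_{t-q}$
Here $\omega$ is the mean and $\epsilon_t$ is the error term at time $t$. The model assumes that the current value is the mean combined with the current and past errors. The lag order $q$ is usually identified from the ACF.
ARMA Model $\hat y_t = \omega + \sum_{l=1}^p\alpha_l y_{t-l} + \epsilon_t + \sum_{l=1}^q\beta_l\epsilon_{t-l}$
The ARMA model combines AR and MA. The orders $p$ and $q$ are determined from the ACF and PACF.
ARIMA Model: ARMA assumes that the series is stationary, but real-world data often is not. In that case the series is differenced first, for example by taking the difference between each value and the previous one to obtain a new series, and the ARMA model is fit to the differenced series. This is what ARIMA does; the "I" (integrated) refers to the differencing step.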
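
The differencing step can be illustrated with a small sketch. It assumes a synthetic random walk rather than the sales data used in this post, and the variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random walk y_t = y_{t-1} + e_t is a standard example of a non-stationary series.
steps = rng.normal(size=500)
walk = np.cumsum(steps)

# First-order differencing (d=1): y'_t = y_t - y_{t-1}.
diff1 = np.diff(walk, n=1)

# Differencing a random walk recovers the underlying noise, which is stationary.
print(len(walk), len(diff1))
print(np.allclose(diff1, steps[1:]))

# The differencing can be undone by cumulative summation from the first value.
restored = np.concatenate([[walk[0]], walk[0] + np.cumsum(diff1)])
print(np.allclose(restored, walk))
```

In ARIMA(p, d, q), d is the number of times this differencing is applied before an ARMA(p, q) model is fit.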

Let's walk through a case to see how ARIMA is used for time series analysis in practice.

Check whether the time series is stationary

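from statsmodels.tsa.stattools import adfuller

def perform_adf_test(series):
    # adfuller returns (test statistic, p-value, lags used, n obs, critical values, ...)
    result = adfuller(series)
    print('ADF:%f'%result[0])
    print('P value:%.3f'%result[1])
    return result[1]

# trend_df['trend'] is the trend component from the decomposition in the previous post
perform_adf_test(trend_df['trend'])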

ADF:-3.252922
P value:0.017

0.01711854757353457

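The ADF test checks whether a series is stationary. Its null hypothesis is that the series has a unit root, i.e. that it is non-stationary. If the p-value is below 0.05, the null hypothesis is rejected and the series is treated as stationary. Here the p-value is 0.017, so the trend series is treated as stationary.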
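
To show what the test statistic measures, here is a minimal sketch of the plain Dickey-Fuller regression in NumPy. It is a simplified stand-in, not the statsmodels implementation: it omits the lagged-difference terms that `adfuller` adds, it compares against an approximate large-sample critical value instead of computing a p-value, and the function name and synthetic data are assumptions made for illustration.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of rho in: diff(y)_t = c + rho * y_{t-1} + e_t (no lagged differences)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
noise = rng.normal(size=500)             # stationary by construction
walk = np.cumsum(rng.normal(size=500))   # unit root by construction

CRIT_5PCT = -2.86  # approximate large-sample 5% critical value (constant, no trend)
stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(walk)
print('noise:', stat_noise, 'reject unit root:', stat_noise < CRIT_5PCT)
print('walk :', stat_walk, 'reject unit root:', stat_walk < CRIT_5PCT)
```

A strongly negative statistic means the series is pulled back toward its mean, which is evidence against a unit root. The statistic of -3.25 reported above falls below this 5% threshold, consistent with its p-value of 0.017.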

Use the ACF and PACF to determine q and p

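import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import acf,pacf
# ARIMA lives in statsmodels.tsa.arima.model (the old statsmodels.tsa.arima_model was removed in 0.13)
from statsmodels.tsa.arima.model import ARIMA

num_lags = 20
plt.figure(figsize=(15,8))
# request the lags explicitly; the default number of lags depends on the statsmodels version
acf_vals = acf(trend_df['trend'],nlags=num_lags)
plt.bar(range(num_lags),acf_vals[:num_lags])
plt.axhline(y=0,linestyle='--',color='gray')
# approximate 95% band for a zero autocorrelation: +/- 1.96/sqrt(n)
plt.axhline(y=-1.96/np.sqrt(len(trend_df['trend'])),linestyle='--',color='gray')
plt.axhline(y=1.96/np.sqrt(len(trend_df['trend'])),linestyle='--',color='gray')
plt.title('ACF')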

[Figure: ACF of the trend series; image not available]

pacf_vals = pacf(trend_df['trend'],nlags=20)
plt.figure(figsize=(15,8))
plt.bar(range(10),pacf_vals[:10])
plt.axhline(y=0,linestyle='--',color='gray')
plt.axhline(y=-1.96/np.sqrt(len(trend_df['trend'])),linestyle='--',color='gray')
plt.axhline(y=1.96/np.sqrt(len(trend_df['trend'])),linestyle='--',color='gray')
plt.title('PACF')

[Figure: PACF of the trend series; image not available]

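Here are the basic rules for choosing the model and its parameters from the ACF and PACF.

Model	ACF	PACF
AR	Tails off gradually, with many significant lags	Cuts off sharply
MA	Cuts off sharply	Tails off gradually, with many significant lags
ARMA	Tails off gradually, with many significant lags	Tails off gradually, with many significant lags

The two gray dashed lines in the ACF and PACF plots mark the error band, drawn at ±1.96/√n (an approximate 95% bound). A correlation coefficient that falls inside the band is not significantly different from 0 and can be ignored.
Reading the ACF and PACF plots against the table above, the model is identified as AR(1).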
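
The AR row of the table can be checked on data where the answer is known. The sketch below simulates an AR(1) series with coefficient 0.7 and computes the sample ACF and PACF by hand in NumPy. The helper functions and the simulated data are assumptions for illustration; `statsmodels` may use slightly different estimators, so its values need not match exactly.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([x[:len(x) - k] @ x[k:] / denom for k in range(nlags + 1)])

def sample_pacf(x, nlags):
    """PACF at lag k = last coefficient of an OLS regression on lags 1..k."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    out = [1.0]
    for k in range(1, nlags + 1):
        lags = np.column_stack([x[k - j:n - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(lags, x[k:], rcond=None)
        out.append(coef[-1])
    return np.array(out)

# Simulate AR(1): y_t = 0.7 * y_{t-1} + e_t
rng = np.random.default_rng(42)
n, phi = 5000, 0.7
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

acf_vals = sample_acf(y, 5)
pacf_vals = sample_pacf(y, 5)
band = 1.96 / np.sqrt(n)   # same band as the gray dashed lines

print('band :', round(band, 3))
print('ACF  :', np.round(acf_vals, 2))    # expected to decay roughly like 0.7**k
print('PACF :', np.round(pacf_vals, 2))   # expected to be near 0 beyond lag 1
```

For an AR(1) process the theoretical ACF at lag k is 0.7**k, so it tails off gradually, while the PACF is zero after lag 1. That is the pattern the AR row describes.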

Build the ARIMA model

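# d=0 because the ADF test treated the series as stationary; p=1, q=0 from the ACF/PACF plots
model = ARIMA(trend_df['trend'],order=(1,0,0))
model_fit = model.fit()

# train_start, train_end and test_end are assumed to be defined earlier (not shown in this post)
prediction_train = model_fit.predict(start = train_start, end = train_end)
prediction_test = model_fit.predict(start = train_end, end = test_end)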

plt.figure(figsize=(15,8))
plt.plot(trend_df['trend'],'blue',label='original_trend')
plt.plot(prediction_train,'r',label='generate_trend')
plt.plot(prediction_test,'green',label='predict_trend')
plt.title('trend',fontsize=16)
plt.legend()

[Figure: original trend with the in-sample fit and the predicted trend; image not available]
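
To make concrete what an order=(1,0,0) model computes, here is a minimal NumPy sketch on synthetic data. It estimates the AR(1) coefficients by conditional least squares, which is a simplification: statsmodels' ARIMA estimates them by maximum likelihood, so its numbers will differ slightly. The data, the function name and the parameter values are all assumptions for illustration.

```python
import numpy as np

# Synthetic AR(1) series with a non-zero mean (an assumption, not the post's data):
# y_t = 2.0 + 0.7 * y_{t-1} + e_t, long-run mean 2.0 / (1 - 0.7)
rng = np.random.default_rng(7)
n, c_true, phi_true = 3000, 2.0, 0.7
y = np.empty(n)
y[0] = c_true / (1 - phi_true)
for t in range(1, n):
    y[t] = c_true + phi_true * y[t - 1] + rng.normal()

# Conditional least squares: regress y_t on a constant and y_{t-1}.
X = np.column_stack([np.ones(n - 1), y[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(X, y[1:], rcond=None)

def forecast_ar1(last_value, c, phi, steps):
    """Multi-step forecast: each step feeds the previous forecast back in."""
    out, prev = [], last_value
    for _ in range(steps):
        prev = c + phi * prev
        out.append(prev)
    return np.array(out)

in_sample = c_hat + phi_hat * y[:-1]              # one-step-ahead fitted values
future = forecast_ar1(y[-1], c_hat, phi_hat, 30)  # 30 steps past the end

print('c_hat, phi_hat:', c_hat, phi_hat)
print('long-run mean :', c_hat / (1 - phi_hat))
print('last forecast :', future[-1])
```

Inside the sample, each prediction uses the actual previous value. Beyond the sample, the forecast feeds on itself, so it decays geometrically toward the long-run mean. An AR(1) forecast that extends far past the data therefore flattens out.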

auto-arima

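To use ARIMA forecasting in a production project, identifying the model structure by hand every time is not practical. This section introduces pmdarima, a library that selects the orders automatically.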

import numpy as np
import pandas as pd
from pmdarima.arima import auto_arima

# stepwise search over the non-seasonal orders; d is chosen with the ADF test
autoarima = auto_arima(trend_df['trend'],start_p=0,start_q=0,max_p=4,max_q=4,
                       test='adf',seasonal=False,stepwise=True,trace=False,
                       error_action='ignore',suppress_warnings=True)
# timeindex (the dates to forecast) is assumed to be defined earlier (not shown in this post)
n_period = len(timeindex)
fitted = pd.DataFrame(np.asarray(autoarima.predict(n_periods=n_period)),index=timeindex)

plt.figure(figsize=(15,8))
plt.plot(trend_df['trend'],'blue',label='original_trend')
plt.plot(prediction_train,'r',label='generate_trend')
plt.plot(prediction_test,'green',label='predict_trend')
plt.plot(fitted,'yellow',label='predict_trend_auto')
plt.title('trend',fontsize=16)
plt.legend()
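
The idea behind automatic order selection can be sketched without pmdarima. The example below is a deliberately simplified stand-in, not pmdarima's algorithm: it only searches pure AR(p) models, fits each by least squares, and keeps the order with the lowest AIC. `auto_arima` also searches over q and picks d with a unit-root test. The function name and the synthetic AR(2) data are assumptions for illustration.

```python
import numpy as np

def fit_ar_aic(y, p, max_p):
    """Fit AR(p) with intercept by OLS and return its AIC.

    The first max_p points are dropped for every p so all candidates
    are scored on the same observations.
    """
    y = np.asarray(y, dtype=float)
    target = y[max_p:]
    cols = [np.ones(len(target))]
    cols += [y[max_p - j:len(y) - j] for j in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = np.sum((target - X @ beta) ** 2)
    n_obs, n_params = len(target), X.shape[1] + 1   # +1 for the noise variance
    return n_obs * np.log(rss / n_obs) + 2 * n_params

# Synthetic AR(2) data (an assumption): y_t = 0.5*y_{t-1} + 0.3*y_{t-2} + e_t
rng = np.random.default_rng(3)
n = 3000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

max_p = 4
aic = {p: fit_ar_aic(y, p, max_p) for p in range(max_p + 1)}
best_p = min(aic, key=aic.get)
for p, value in aic.items():
    print(p, round(value, 1))
print('selected p:', best_p)
```

The AIC drops sharply up to the true order and then changes very little. It can still occasionally prefer a slightly larger order than the true one, which is one reason to check the selected model against the ACF and PACF plots.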