ARIMA (AutoRegressive Integrated Moving Average) in Time Series Modelling

ARIMA

An observed time series can be decomposed into three main components: the trend (the long cycle), the seasonal component (systematic or calendar-related movements), and the irregular component (unsystematic, short-term fluctuations).
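To make the three components concrete, here is a minimal classical decomposition sketch in NumPy. It makes three assumptions: the model is additive (value = trend + seasonal + irregular), the seasonal period is known, and the period is odd so that a simple centered moving average works. The helper name `additive_decompose` and the synthetic series are illustrative only. statsmodels also ships a ready-made `seasonal_decompose` in `statsmodels.tsa.seasonal`.

```python
import numpy as np


def additive_decompose(y, period):
    """Classical additive decomposition: y = trend + seasonal + irregular.

    Assumes an odd `period` so a plain centered moving average can be used.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    half = period // 2
    # Trend (long cycle): centered moving average over one full season.
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, np.ones(period) / period, mode="valid")
    # Seasonal: average detrended value at each position within the season,
    # centered so that the seasonal effects sum to zero over one period.
    detrended = y - trend
    means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    means -= means.mean()
    seasonal = np.tile(means, n // period + 1)[:n]
    # Irregular: whatever the trend and seasonal parts do not explain.
    irregular = y - trend - seasonal
    return trend, seasonal, irregular


# Synthetic example: linear trend + a period-7 pattern + small noise.
rng = np.random.default_rng(0)
t = np.arange(120)
season = np.tile([3.0, -1.0, -2.0, 0.0, 0.0, 2.0, -2.0], 20)[:120]
y = 0.5 * t + season + rng.normal(scale=0.3, size=120)
trend, seasonal, irregular = additive_decompose(y, period=7)
```

The trend is undefined (NaN) for the first and last `period // 2` points, because the centered window does not fit there.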

An ARIMA model is made up of three parts: AutoRegressive (AR), Integrated (I), and Moving Average (MA).
AR: autoregression, which regresses the series on its own past values. It emphasizes the dependent relationship between an observation and its preceding, or "lagged", observations.
I: integrated, meaning the data are differenced to make them stationary; the order is the number of times the series is differenced. Differencing typically involves subtracting an observation from its preceding observation.
MA: moving average, which models the relationship between the current value and the residual errors left when modelling past values. This component expresses an observation as a linear function of the forecast errors at previous lags.

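  • p: lag order, the number of lag observations included in the model.
  • d: degree of differencing, the number of times the raw observations are differenced.
  • q: order of the moving average, the size of the moving-average window (the number of lagged forecast errors included).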
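The role of d can be illustrated with NumPy. `np.diff(y, n=d)` applies first-order differencing d times, and a cumulative sum undoes one round, given the first value that differencing dropped. This is a minimal sketch. The helper names `difference` and `undifference_once` are illustrative and do not come from any library.

```python
import numpy as np


def difference(y, d):
    """Apply first-order differencing d times (y_t - y_{t-1}, repeated)."""
    return np.diff(np.asarray(y, dtype=float), n=d)


def undifference_once(dy, first_value):
    """Invert one round of differencing, given the first original value."""
    return np.concatenate([[first_value], first_value + np.cumsum(dy)])


y = np.array([3.0, 5.0, 9.0, 15.0, 23.0])
d1 = difference(y, 1)  # [2, 4, 6, 8]
d2 = difference(y, 2)  # [2, 2, 2]: a quadratic trend becomes a constant
restored = undifference_once(d1, y[0])  # back to the original levels
```

Each round of differencing shortens the series by one observation. This is why the differenced ACF plots later in this post call `.dropna()`.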

Before fitting an ARIMA model, we assume the time series is univariate and (after any differencing) stationary, so the full modelling workflow is:

  • Load and preprocess the data.
  • Check the stationarity of the data with a Dickey-Fuller test (from statsmodels.tsa.stattools import adfuller). If the series is stationary, proceed to the next steps; if not, make it stationary.
  • Determine the degree of differencing (d).
  • Determine the AR order (p) and the moving-average order (q), which can be done with PACF (partial autocorrelation function) and ACF (autocorrelation function) plots: p is usually read from the PACF and q from the ACF.
  • Fit the model and make predictions.
  • Evaluate the model by computing the RMSE (root mean square error) between the actual and predicted values.

AR model

A pure AutoRegressive (AR-only) model:
$$Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \varepsilon_t$$
where $Y_{t-1}$ is lag 1 of the series, $\beta_1$ is the coefficient of lag 1 that the model estimates, and $\alpha$ is the intercept term, also estimated by the model.
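To see what "the model estimates" means, the sketch below simulates an AR(1) series with a known intercept and coefficient, then recovers both by ordinary least squares of $Y_t$ on $Y_{t-1}$. The true values α = 2 and β₁ = 0.6 are arbitrary choices. OLS is used only because it is the simplest consistent estimator for a pure AR model; it is not necessarily what ARIMA libraries use internally.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_true, beta1_true, n = 2.0, 0.6, 5000

# Simulate Y_t = alpha + beta1 * Y_{t-1} + eps_t, starting at the process mean.
y = np.empty(n)
y[0] = alpha_true / (1 - beta1_true)
for t in range(1, n):
    y[t] = alpha_true + beta1_true * y[t - 1] + rng.normal()

# Regress Y_t on [1, Y_{t-1}] to estimate (alpha, beta1).
X = np.column_stack([np.ones(n - 1), y[:-1]])
(alpha_hat, beta1_hat), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print(f"alpha_hat={alpha_hat:.3f}, beta1_hat={beta1_hat:.3f}")
```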

MA model

A pure Moving Average (MA-only) model is one where $Y_t$ depends only on the lagged forecast errors:
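$$Y_t = \alpha + \varepsilon_t + \phi_1 \varepsilon_{t-1} + \phi_2 \varepsilon_{t-2} + \dots + \phi_q \varepsilon_{t-q}$$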
where the error terms are the errors of the autoregressive models at the respective lags. The errors $\varepsilon_t$ and $\varepsilon_{t-1}$ are the errors from the following equations:
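$$Y_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_t Y_0 + \varepsilon_t$$
$$Y_{t-1} = \beta_1 Y_{t-2} + \beta_2 Y_{t-3} + \dots + \beta_{t-1} Y_0 + \varepsilon_{t-1}$$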
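Simulating an MA(1) series shows that it depends only on lagged errors. For $Y_t = \alpha + \varepsilon_t + \phi_1\varepsilon_{t-1}$, the theoretical lag-1 autocorrelation is $\phi_1/(1+\phi_1^2)$ and every higher lag is zero, so the sample ACF should "cut off" after lag 1. The helper `sample_acf` and the values α = 10 and φ₁ = 0.4 are illustrative assumptions.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelation at lags 0..nlags (biased estimator, as in ACF plots)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)])


rng = np.random.default_rng(1)
alpha, phi1, n = 10.0, 0.4, 20000
eps = rng.normal(size=n + 1)
# MA(1): today's value = intercept + today's shock + phi1 * yesterday's shock.
y = alpha + eps[1:] + phi1 * eps[:-1]

acf = sample_acf(y, nlags=3)
print(acf.round(3))  # lag 1 near 0.4 / 1.16, lags 2 and 3 near 0
```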

ARIMA model

An ARIMA model is one where the time series has been differenced at least once to make it stationary and the AR and MA terms are combined. So the equation becomes:
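$$Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \varepsilon_t + \phi_1 \varepsilon_{t-1} + \phi_2 \varepsilon_{t-2} + \dots + \phi_q \varepsilon_{t-q}$$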
In words: predicted $Y_t$ = constant + linear combination of lags of $Y$ (up to p lags) + linear combination of lagged forecast errors (up to q lags), where $Y$ denotes the series after d rounds of differencing.
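The verbal formula can be checked by hand with a small function. It computes the one-step prediction of the once-differenced series (the d = 1 case) from given coefficients, then adds back the last observed level. The function name `arima_one_step` and every number below are hypothetical; in practice the coefficients and past errors come from a fitted model.

```python
import numpy as np


def arima_one_step(y, resid, alpha, betas, phis):
    """One-step ARIMA(p, 1, q) prediction of the next level of y.

    y      : observed levels, oldest first
    resid  : past forecast errors of the differenced series, oldest first
    alpha  : constant; betas: AR coefficients (lag 1 first); phis: MA coefficients
    """
    dy = np.diff(y)  # the model works on the once-differenced series
    ar_part = sum(b * dy[-(i + 1)] for i, b in enumerate(betas))
    ma_part = sum(f * resid[-(j + 1)] for j, f in enumerate(phis))
    dy_hat = alpha + ar_part + ma_part  # predicted next difference
    return y[-1] + dy_hat  # undo the differencing


y = np.array([10.0, 12.0, 13.0, 15.0])  # differences: 2, 1, 2
resid = np.array([0.5, -0.5, 0.2])
pred = arima_one_step(y, resid, alpha=0.3, betas=[0.5, 0.2], phis=[0.4])
# dy_hat = 0.3 + (0.5*2 + 0.2*1) + 0.4*0.2 = 1.58, so pred = 15 + 1.58 = 16.58
```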

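Determining the values of p, d and q

Determining d

What is stationarity?
Stationarity comes in two forms: strict stationarity and weak (wide-sense) stationarity. Strict stationarity is a demanding definition: all statistical properties of the series stay unchanged as time shifts. Weak stationarity is much looser: it only requires the low-order moments (mean and autocovariance) to be stable over time.
Two key properties of a (weakly) stationary series: 1. the mean of the series is constant; 2. the autocovariance and autocorrelation functions depend only on the time lag, not on the particular start or end points in time.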
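The time-invariance of these properties can be checked by simulation. Across many independent paths, white noise has the same variance at every time point. A random walk, the textbook non-stationary series, has a variance that grows with t. The numbers of paths and steps below are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, n_steps = 2000, 1000
shocks = rng.normal(size=(n_paths, n_steps))

white_noise = shocks                     # stationary: Var(Y_t) = 1 for every t
random_walk = np.cumsum(shocks, axis=1)  # non-stationary: Var(Y_t) = t + 1 here

for name, paths in [("white noise", white_noise), ("random walk", random_walk)]:
    var_early = paths[:, 9].var()   # variance across paths at t = 9
    var_late = paths[:, 999].var()  # variance across paths at t = 999
    print(f"{name}: Var at t=9 is {var_early:.2f}, at t=999 is {var_late:.2f}")
```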

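The purpose of differencing is to make the series stationary, so take care not to over-difference. An over-differenced series is still stationary, but over-differencing distorts the identification and estimation of the model parameters. The right order of differencing is the minimum number of differences needed to get a near-stationary series: one that roams around a defined mean and whose ACF plot drops to zero fairly quickly.
Some rules of thumb for choosing d:

  • If the autocorrelations are still positive for lags beyond 10 or so, the series needs further differencing.
  • If the lag-1 autocorrelation is strongly negative, the series has probably been over-differenced.
  • In practice, if you cannot decide between two values of d, choose the one whose differenced series has the smaller standard deviation.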
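These rules of thumb can be turned into a small diagnostic. For each candidate d, it reports the standard deviation of the differenced series and its lag-1 autocorrelation. This is only a sketch. The helper names `lag1_autocorr` and `differencing_report` and the simulated random walk are assumptions, and the final choice of d still deserves a look at the full ACF plot.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)


def differencing_report(y, max_d=2):
    """Return {d: (std of the d-times differenced series, its lag-1 autocorrelation)}."""
    return {d: (np.std(np.diff(y, n=d)), lag1_autocorr(np.diff(y, n=d)))
            for d in range(max_d + 1)}


rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(size=1000))  # a random walk: d = 1 should be enough

for d, (std, r1) in differencing_report(y).items():
    print(f"d={d}: std={std:.3f}, lag-1 autocorrelation={r1:.3f}")
# Expected pattern: d=0 has r1 close to 1 (under-differenced), d=1 has the
# smallest std, and d=2 has r1 near -0.5 (over-differenced).
```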

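Example

  • First, check stationarity with the Augmented Dickey-Fuller test (from statsmodels.tsa.stattools import adfuller). The null hypothesis of the ADF test is that the time series is non-stationary. So if the p-value of the test is less than the significance level (0.05), you reject the null hypothesis and infer that the time series is indeed stationary.

  • In this example, the p-value is greater than 0.05, so the series is non-stationary and needs to be differenced.

  • If the series is stationary, d = 0. If it is not, difference it, and as d increases, plot the autocorrelation (ACF) and partial autocorrelation (PACF) at each step to help judge the appropriate value of d.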

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

plt.rcParams.update({'figure.figsize': (9, 7), 'figure.dpi': 120})

# Import data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/wwwusage.csv', names=['value'], header=0)

# Left column: the series; right column: its ACF. Share the x-axis only within
# each column, since the series is indexed by time and the ACF by lag.
fig, axes = plt.subplots(3, 2, sharex='col')

# Original series
axes[0, 0].plot(df['value'])
axes[0, 0].set_title('Original Series')
plot_acf(df['value'], ax=axes[0, 1])

# 1st-order differencing
axes[1, 0].plot(df['value'].diff())
axes[1, 0].set_title('1st Order Differencing')
plot_acf(df['value'].diff().dropna(), ax=axes[1, 1])

# 2nd-order differencing
axes[2, 0].plot(df['value'].diff().diff())
axes[2, 0].set_title('2nd Order Differencing')
plot_acf(df['value'].diff().diff().dropna(), ax=axes[2, 1])

plt.tight_layout()
plt.show()
```
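The code above only plots the ACF. To read off p, the PACF is used the same way (statsmodels offers `plot_pacf`). The sketch below computes the PACF numerically with the Durbin-Levinson recursion on the sample ACF. It shows why an AR(p) series has a PACF that cuts off after lag p. The helper names, the simulated AR(2) coefficients (0.5, 0.3), and the ±1.96/√n band are assumptions for illustration. This is not necessarily the exact estimator `plot_pacf` uses by default.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelation at lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)])


def pacf_durbin_levinson(x, nlags):
    """Partial autocorrelations at lags 0..nlags via the Durbin-Levinson recursion."""
    r = sample_acf(x, nlags)
    pacf = np.zeros(nlags + 1)
    pacf[0] = 1.0
    phi = np.zeros(nlags + 1)  # phi[j] holds phi_{k-1, j} from the previous step
    for k in range(1, nlags + 1):
        num = r[k] - np.dot(phi[1:k], r[k - 1:0:-1])
        den = 1.0 - np.dot(phi[1:k], r[1:k])
        phi_kk = num / den
        new_phi = phi.copy()
        new_phi[k] = phi_kk
        new_phi[1:k] = phi[1:k] - phi_kk * phi[k - 1:0:-1]
        phi = new_phi
        pacf[k] = phi_kk
    return pacf


# Simulate AR(2): y_t = 0.5 y_{t-1} + 0.3 y_{t-2} + e_t, dropping a burn-in.
rng = np.random.default_rng(5)
n, burn = 5000, 200
e = rng.normal(size=n + burn)
y = np.zeros(n + burn)
for t in range(2, n + burn):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + e[t]
y = y[burn:]

pacf = pacf_durbin_levinson(y, nlags=6)
band = 1.96 / np.sqrt(len(y))  # approximate 95% band, as drawn on PACF plots
print(np.round(pacf[1:], 3), "band: +/-", round(band, 3))
```

Lags 1 and 2 should stand well outside the band, and lags 3 and beyond should fall near zero, which suggests p = 2.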