Chapter 2 Difference Equations and Their Solutions


This solution $y_{t}=A \phi_{1}^{t}$ to the homogeneous equation is called the homogeneous solution.

  • If $\left|\phi_{1}\right|<1$, ==the homogeneous solution converges to zero as $t \rightarrow \infty$.== Convergence is direct if $0<\phi_{1}<1$ and oscillatory if $-1<\phi_{1}<0$.
  • If $\left|\phi_{1}\right|>1$, the homogeneous solution is divergent. If $\phi_{1}>1$, the solution grows without bound in absolute value as $t$ increases (toward $+\infty$ when $A>0$). If $\phi_{1}<-1$, the solution oscillates explosively.
  • If $\phi_{1}=1$, any arbitrary constant $A$ satisfies the homogeneous equation and $y_{t}=y_{t-1}$. If $\phi_{1}=-1$, then $y_{t}=A$ for even values of $t$ and $y_{t}=-A$ for odd values of $t$, so that $y_{t}=-y_{t-1}$.

Terminology: odd (奇数), even (偶数).


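  1. Form the homogeneous equation and solve it.
  2. Find a particular solution.
  3. Obtain the general solution (homogeneous solution plus particular solution).
  4. Substitute the initial conditions to pin down the arbitrary constants (if there are any).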
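The four steps can be sketched for the first-order equation $y_{t}=\phi_{0}+\phi_{1} y_{t-1}$ with a known $y_{0}$. This is a minimal illustration, assuming $\phi_{1} \neq 1$; the function names and the numbers in the usage note are hypothetical.

```python
def solve_first_order(phi0, phi1, y0, t):
    """Closed-form solution of y_t = phi0 + phi1 * y_{t-1}, assuming phi1 != 1."""
    # Step 1: the homogeneous solution is A * phi1**t.
    # Step 2: a constant particular solution is phi0 / (1 - phi1).
    particular = phi0 / (1.0 - phi1)
    # Step 4: the initial condition at t = 0 pins down A.
    A = y0 - particular
    # Step 3: general solution = homogeneous + particular.
    return A * phi1 ** t + particular


def iterate(phi0, phi1, y0, t):
    """Brute-force iteration of the same equation, used as a cross-check."""
    y = y0
    for _ in range(t):
        y = phi0 + phi1 * y
    return y
```

For example, with `phi0=1.0`, `phi1=0.5`, `y0=3.0`, both functions give $2+0.5^{t}$, which converges to the particular solution 2 because $|\phi_{1}|<1$.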

lag operators

For $|a|<1$, the infinite sum
$$\left(1+a L+a^{2} L^{2}+a^{3} L^{3}+\cdots\right) y_{t}=\frac{y_{t}}{1-a L}$$


It is straightforward to use lag operators to solve linear difference equations. If $\left|\phi_{1}\right|<1$, we obtain
$$\begin{aligned} y_{t} &=\left(1-\phi_{1} L\right)^{-1}\left(\phi_{0}+x_{t}\right)=\left(1+\phi_{1} L+\phi_{1}^{2} L^{2}+\cdots\right)\left(\phi_{0}+x_{t}\right) \\ &=\frac{\phi_{0}}{1-\phi_{1}}+\sum_{j=0}^{\infty} \phi_{1}^{j} x_{t-j} \end{aligned}$$


characteristic equation and inverse characteristic equation

$$A \alpha^{t}=\phi_{1} A \alpha^{t-1}+\phi_{2} A \alpha^{t-2} \Rightarrow \underbrace{\alpha^{2}-\phi_{1} \alpha-\phi_{2}=0}_{\text{characteristic equation}}$$

Case 1: If $\phi_{1}^{2}+4 \phi_{2}>0$, then $\alpha_{1}, \alpha_{2}=\frac{\phi_{1} \pm \sqrt{\phi_{1}^{2}+4 \phi_{2}}}{2}$ and
$$y_{t}=A_{1} \alpha_{1}^{t}+A_{2} \alpha_{2}^{t}$$

Case 2: If $\phi_{1}^{2}+4 \phi_{2}=0$, then $\alpha_{1}=\alpha_{2}=\frac{\phi_{1}}{2}$ and
$$y_{t}=A_{1}\left(\frac{\phi_{1}}{2}\right)^{t}+A_{2} t\left(\frac{\phi_{1}}{2}\right)^{t}$$

Case 3: If $\phi_{1}^{2}+4 \phi_{2}<0$, then $\alpha_{1}, \alpha_{2}=\frac{\phi_{1} \pm i \sqrt{-\phi_{1}^{2}-4 \phi_{2}}}{2}$, that is,
$$\begin{aligned} &\alpha_{1}=\frac{\phi_{1}+i \sqrt{-\phi_{1}^{2}-4 \phi_{2}}}{2}=r(\cos \theta+i \sin \theta) \\ &\alpha_{2}=\frac{\phi_{1}-i \sqrt{-\phi_{1}^{2}-4 \phi_{2}}}{2}=r(\cos \theta-i \sin \theta) \end{aligned}$$
where $r=\sqrt{-\phi_{2}}$ is the modulus of $\alpha_{1}$ and $\alpha_{2}$, and $\theta=\arccos \left(\frac{\phi_{1}}{2 \sqrt{-\phi_{2}}}\right)$ is the argument of $\alpha_{1}$ (the argument of $\alpha_{2}$ is $-\theta$). The solution is
$$y_{t}=A_{1} \alpha_{1}^{t}+A_{2} \alpha_{2}^{t} \equiv B_{1} r^{t} \cos \left(\theta t+B_{2}\right)$$
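The three cases can be checked numerically. The sketch below computes the roots of $\alpha^{2}-\phi_{1} \alpha-\phi_{2}=0$ with the standard library's `cmath`; the function names and the coefficient values in the usage note are illustrative assumptions, not part of the notes.

```python
import cmath
import math


def characteristic_roots(phi1, phi2):
    """Roots of alpha^2 - phi1*alpha - phi2 = 0 (complex-valued in general)."""
    s = cmath.sqrt(phi1 ** 2 + 4.0 * phi2)
    return (phi1 + s) / 2.0, (phi1 - s) / 2.0


def classify(phi1, phi2):
    """Name the case according to the sign of the discriminant phi1^2 + 4*phi2."""
    disc = phi1 ** 2 + 4.0 * phi2
    if disc > 0:
        return "case 1: distinct real roots"
    if disc == 0:
        return "case 2: repeated real root"
    return "case 3: complex conjugate roots"


def modulus_and_argument(phi1, phi2):
    """r and theta for case 3, using the closed forms from the text."""
    r = math.sqrt(-phi2)
    theta = math.acos(phi1 / (2.0 * r))
    return r, theta
```

With `phi1=0.5` and `phi2=-0.5` the discriminant is negative, the roots are $0.25 \pm 0.661 i$, and `abs()` and `cmath.phase()` of the first root agree with $r=\sqrt{0.5}$ and $\theta=\arccos(0.5 /(2 \sqrt{0.5}))$.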

stability conditions

Stability requires that all characteristic roots (defined in Eq. (10)) lie within the unit circle, i.e. $\left|\alpha_{j}\right|<1$ for all $j$.

  • A necessary condition for stability: $\sum_{j=1}^{p} \phi_{j}<1$.
  • A sufficient condition for stability: $\sum_{j=1}^{p}\left|\phi_{j}\right|<1$.
  • At least one characteristic root equals unity if

$$\sum_{j=1}^{p} \phi_{j}=1$$
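To see that the necessary condition is not sufficient (and the sufficient condition is not necessary), the sketch below compares both shortcuts with a direct root check. The exact check is written only for the second-order case, and all names are hypothetical.

```python
import cmath


def necessary_condition(phis):
    """sum(phi_j) < 1: must hold for stability, but does not guarantee it."""
    return sum(phis) < 1.0


def sufficient_condition(phis):
    """sum(|phi_j|) < 1: guarantees stability, but is not required for it."""
    return sum(abs(p) for p in phis) < 1.0


def stable_order2(phi1, phi2):
    """Exact check for p = 2: both characteristic roots inside the unit circle."""
    s = cmath.sqrt(phi1 ** 2 + 4.0 * phi2)
    return abs((phi1 + s) / 2.0) < 1.0 and abs((phi1 - s) / 2.0) < 1.0
```

For instance, $(\phi_{1}, \phi_{2})=(0.2,-1.2)$ passes the necessary condition but has roots of modulus $\sqrt{1.2}>1$, while $(1.2,-0.5)$ fails the sufficient condition but is stable (modulus $\sqrt{0.5}$).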

Chapter 3 Univariate Time Series

This chapter introduces the MA(q), AR(p), and ARMA(p,q) models.

white noise

$\left\{\epsilon_{t}\right\}$ is called a white noise process if for all $t$
$$\begin{aligned} E\left(\epsilon_{t}\right) &=0 && \text{(mean zero)} \\ E\left(\epsilon_{t}^{2}\right) &=\operatorname{var}\left(\epsilon_{t}\right)=\sigma^{2} && \text{(variance } \sigma^{2}\text{)} \\ E\left(\epsilon_{t} \epsilon_{\tau}\right) &=\operatorname{cov}\left(\epsilon_{t}, \epsilon_{\tau}\right)=0 \text{ for all } \tau \neq t && \text{(uncorrelated across time)} \end{aligned}$$
If, in addition, $\left\{\epsilon_{t}\right\}$ is independent across time, then it is called an independent white noise process.
If, furthermore, $\epsilon_{t} \sim N\left(0, \sigma^{2}\right)$, then we have the Gaussian white noise process.


Strict or strong stationarity :

  • distributions are time-invariant. This is a very strong condition that is hard to verify empirically.
  • The mean and variance are not necessarily finite.

Weak stationarity (covariance stationarity):

  • first 2 moments are time-invariant. In this course, we are mainly concerned with weakly stationary series.

A stochastic process $\left\{y_{t}\right\}$ having a finite mean and variance is covariance stationary (weakly stationary) if
(1) Mean (or expectation) is the same for each period:
$$E\left(y_{t}\right)=\mu \text{ for all } t$$
(2) Variance (variability) is the same for each period:
$$\operatorname{var}\left(y_{t}\right)=E\left[\left(y_{t}-\mu\right)^{2}\right]=\sigma_{y}^{2} \text{ for all } t$$
(3) Lag-k autocovariance depends only on $k$, not on $t$:
$$\gamma_{k}=\operatorname{cov}\left(y_{t}, y_{t-k}\right)=E\left[\left(y_{t}-\mu\right)\left(y_{t-k}-\mu\right)\right] \text{ for all } t \text{ and any } k$$

Lag-k autocorrelation (or serial correlation)
$$\rho_{k} \equiv \frac{\operatorname{cov}\left(y_{t}, y_{t-k}\right)}{\operatorname{var}\left(y_{t}\right)}=\frac{\gamma_{k}}{\gamma_{0}}$$
In multivariate models, the autocovariance refers to the covariance between $y_{t}$ and its own lags, whereas the covariance refers to the covariance between one series and another. In univariate time series models there is no ambiguity.

MA(q) models

$$x_{t}=\mu+\sum_{j=0}^{q} \beta_{j} \epsilon_{t-j}, \qquad \mathrm{MA}(q) \text{ model.}$$

$\beta_{0}$ is always set to unity for normalization.



AR(p) models

$$y_{t}=\phi_{0}+\sum_{j=1}^{p} \phi_{j} y_{t-j}+\epsilon_{t}, \quad \mathrm{AR}(p) \text{ model.}$$

For the AR(1) case, without initial conditions, the general solution to Eq.(6) is:
$$y_{t}=A \phi_{1}^{t}+\frac{\phi_{0}}{1-\phi_{1}}+\sum_{j=0}^{\infty} \phi_{1}^{j} \epsilon_{t-j}, \quad \text{if } \left|\phi_{1}\right|<1.$$

Stationarity of the AR(1) process requires:

  • The characteristic root $\phi_{1}$ must be less than unity in absolute value.
  • The homogeneous solution $A \phi_{1}^{t}$ must be zero. Either the sequence must have started infinitely far in the past (so that $\phi_{1}^{t} \approx 0$) or the process must always be in equilibrium (so that $A=0$).


Stationarity conditions for AR(p) processes

  • $\left|\alpha_{j}\right|<1$ for all $j=1, \cdots, p$.
  • The homogeneous solution must be zero. Either the sequence must have started infinitely far in the past or the process must always be in equilibrium (so that the arbitrary constants are zero).

ARMA(p,q) models

$$y_{t}=\phi_{0}+\sum_{j=1}^{p} \phi_{j} y_{t-j}+\sum_{j=0}^{q} \beta_{j} \epsilon_{t-j}, \quad \mathrm{ARMA}(p, q) \text{ model.}$$

A stationary ARMA process can be written as an infinite-order moving average:

$$y_{t}=c+\sum_{j=0}^{\infty} c_{j} \epsilon_{t-j}$$

  • The plot of ==$\gamma_{k}$== against $k$ is called the autocovariance function.

  • ACF: The plot of ==$\rho_{k}$== against $k$ is called the autocorrelation function (ACF) or correlogram.

$$\rho_{k} \equiv \frac{\operatorname{cov}\left(y_{t}, y_{t-k}\right)}{\operatorname{var}\left(y_{t}\right)}=\frac{\gamma_{k}}{\gamma_{0}}$$

Yule-Walker equations


The first $p$ Yule-Walker equations determine the initial conditions.

  • The key point is that $\left\{\gamma_{k}\right\}$ and $\left\{\rho_{k}\right\}$ eventually will satisfy the homogeneous equation of this $\mathrm{AR}(p)$ process.
  • ACF should converge to zero geometrically if the series is stationary.

Form the Yule-Walker equations (shown here for an AR(2) process with zero mean):
$$\begin{aligned} E\left(y_{t} y_{t}\right) &=\phi_{1} E\left(y_{t-1} y_{t}\right)+\phi_{2} E\left(y_{t-2} y_{t}\right)+E\left(\epsilon_{t} y_{t}\right) \\ &\Rightarrow \gamma_{0}=\phi_{1} \gamma_{1}+\phi_{2} \gamma_{2}+\sigma^{2} \\ E\left(y_{t} y_{t-1}\right) &=\phi_{1} E\left(y_{t-1} y_{t-1}\right)+\phi_{2} E\left(y_{t-2} y_{t-1}\right)+E\left(\epsilon_{t} y_{t-1}\right) \\ &\Rightarrow \gamma_{1}=\phi_{1} \gamma_{0}+\phi_{2} \gamma_{1} \\ E\left(y_{t} y_{t-k}\right) &=\phi_{1} E\left(y_{t-1} y_{t-k}\right)+\phi_{2} E\left(y_{t-2} y_{t-k}\right)+E\left(\epsilon_{t} y_{t-k}\right) \\ &\Rightarrow \gamma_{k}=\phi_{1} \gamma_{k-1}+\phi_{2} \gamma_{k-2} \quad \text{for } k \geq 2 \end{aligned}$$
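Dividing the Yule-Walker equations by $\gamma_{0}$ gives $\rho_{1}=\phi_{1} /\left(1-\phi_{2}\right)$ and $\rho_{k}=\phi_{1} \rho_{k-1}+\phi_{2} \rho_{k-2}$ for $k \geq 2$, and the first equation then gives $\gamma_{0}=\sigma^{2} /\left(1-\phi_{1} \rho_{1}-\phi_{2} \rho_{2}\right)$. A small sketch of this recursion for a stationary AR(2), with hypothetical function names:

```python
def ar2_acf(phi1, phi2, max_lag):
    """Theoretical ACF rho_0..rho_max_lag of a stationary AR(2) (max_lag >= 1)."""
    rho = [1.0, phi1 / (1.0 - phi2)]  # rho_1 from the second Yule-Walker equation
    for k in range(2, max_lag + 1):
        # rho_k satisfies the homogeneous equation of the AR(2) process
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[: max_lag + 1]


def ar2_variance(phi1, phi2, sigma2):
    """gamma_0 from the first Yule-Walker equation."""
    rho = ar2_acf(phi1, phi2, 2)
    return sigma2 / (1.0 - phi1 * rho[1] - phi2 * rho[2])
```

With `phi1=0.5`, `phi2=0.2` the ACF starts 1, 0.625, 0.5125, 0.38125 and decays geometrically, as expected for a stationary series.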

partial autocorrelation function(PACF)

$$\begin{aligned} y_{t} &=\phi_{1,0}+\phi_{1,1} y_{t-1}+e_{1 t} \\ y_{t} &=\phi_{2,0}+\phi_{2,1} y_{t-1}+\phi_{2,2} y_{t-2}+e_{2 t} \\ y_{t} &=\phi_{3,0}+\phi_{3,1} y_{t-1}+\phi_{3,2} y_{t-2}+\phi_{3,3} y_{t-3}+e_{3 t} \\ y_{t} &=\phi_{4,0}+\phi_{4,1} y_{t-1}+\phi_{4,2} y_{t-2}+\phi_{4,3} y_{t-3}+\phi_{4,4} y_{t-4}+e_{4 t} \\ & \vdots \end{aligned}$$

$\left\{\phi_{k, k}, k \geq 1\right\}$ is the partial autocorrelation function.

  • $e_{it}$ is not necessarily white noise.
  • $e_{it}$ is uncorrelated with $y_{t-1}, y_{t-2}, \cdots$.

For example, for $k=2$ (with the series measured in deviations from its mean):
$$\begin{aligned} E\left(y_{t} y_{t-1}\right) &=\phi_{2,1} E\left(y_{t-1} y_{t-1}\right)+\phi_{2,2} E\left(y_{t-2} y_{t-1}\right)+E\left(y_{t-1} e_{2 t}\right) \\ &\Rightarrow \gamma_{1}=\phi_{2,1} \gamma_{0}+\phi_{2,2} \gamma_{1} \Rightarrow \rho_{1}=\phi_{2,1}+\phi_{2,2} \rho_{1} \\ E\left(y_{t} y_{t-2}\right) &=\phi_{2,1} E\left(y_{t-1} y_{t-2}\right)+\phi_{2,2} E\left(y_{t-2} y_{t-2}\right)+E\left(y_{t-2} e_{2 t}\right) \\ &\Rightarrow \gamma_{2}=\phi_{2,1} \gamma_{1}+\phi_{2,2} \gamma_{0} \Rightarrow \rho_{2}=\phi_{2,1} \rho_{1}+\phi_{2,2} \end{aligned}$$
Thus $\phi_{2,2}=\frac{\rho_{2}-\rho_{1}^{2}}{1-\rho_{1}^{2}}$. A similar procedure works for any $\phi_{k, k}$, $k \geq 1$.
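For general $k$, solving the order-$k$ system by hand becomes tedious. One common way to obtain the same $\phi_{k,k}$ values is the Durbin-Levinson recursion, which is a swapped-in technique here (the notes solve the equations directly); the sketch below assumes the autocorrelations are given as a list with `rho[0] == 1`.

```python
def pacf_from_acf(rho, max_lag):
    """Return [phi_11, ..., phi_mm] from autocorrelations rho[0..max_lag]."""
    pacf = []
    prev = []  # prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the remaining order-k coefficients phi_{k,1..k-1}.
        cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf
```

For $k=2$ the recursion reduces to $\left(\rho_{2}-\rho_{1}^{2}\right) /\left(1-\rho_{1}^{2}\right)$. Feeding it the ACF of an AR(2) with coefficients 0.5 and 0.2 returns $\phi_{2,2}=0.2$ and $\phi_{3,3}=0$.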

$$y_{t}=\phi_{0}+\sum_{j=1}^{p} \phi_{j} y_{t-j}+\sum_{j=0}^{q} \beta_{j} \epsilon_{t-j}, \quad \mathrm{ARMA}(p, q) \text{ model.}$$

  • Assuming a series is stationary, we can use the sample mean, variance, ACF, and PACF to estimate the parameters of the actual data-generating process.

  • The sample ACF and sample PACF can be compared with various theoretical functions to help identify the actual nature of the data-generating process.

Sample mean: $\bar{y}=\frac{\sum_{t=1}^{T} y_{t}}{T}$.
Sample variance: $\widehat{\sigma}_{y}^{2}=\frac{\sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2}}{T}$.
Lag-k sample autocorrelation:
$$\widehat{\rho}_{k}=\frac{\sum_{t=k+1}^{T}\left(y_{t}-\bar{y}\right)\left(y_{t-k}-\bar{y}\right)}{\sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2}}, \quad k \geq 1.$$
The statistics $\left\{\widehat{\rho}_{1}, \widehat{\rho}_{2}, \ldots\right\}$ are called the sample ACF of $\left\{y_{t}\right\}$.
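These sample statistics translate directly into code. A minimal sketch using only the standard library (the function name is hypothetical):

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat_1..rho_hat_max_lag of the sequence y."""
    T = len(y)
    y_bar = sum(y) / T
    d = [v - y_bar for v in y]          # deviations from the sample mean
    denom = sum(v * v for v in d)       # T times the sample variance
    acf = []
    for k in range(1, max_lag + 1):
        # Sum over t = k+1..T of (y_t - y_bar)(y_{t-k} - y_bar).
        num = sum(d[t] * d[t - k] for t in range(k, T))
        acf.append(num / denom)
    return acf
```

For the toy series `[1, 2, 3, 4, 5]` the deviations are $-2,-1,0,1,2$, so $\widehat{\rho}_{1}=4 / 10$ and $\widehat{\rho}_{2}=-1 / 10$.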


t test


For a given positive integer $k$, test $H_{0}: \rho_{k}=0$ against $H_{1}: \rho_{k} \neq 0$.

  • If $\left\{y_{t}\right\}$ is a stationary Gaussian series satisfying $\rho_{j}=0$ for $j \geq k$ (i.e., if $\left\{y_{t}\right\}$ is an $\mathrm{MA}(k-1)$ with normally distributed $\left\{\epsilon_{t}\right\}$), then $\widehat{\rho}_{k}$ is asymptotically normal with mean zero and variance $\frac{1+2 \sum_{j=1}^{k-1} \rho_{j}^{2}}{T}$. Therefore, the test statistic is
    $$\mathrm{t} \text{ ratio}=\frac{\widehat{\rho}_{k}}{\sqrt{\left(1+2 \sum_{j=1}^{k-1} \widehat{\rho}_{j}^{2}\right) / T}} \stackrel{\mathcal{D}}{\longrightarrow} \mathcal{N}(0,1), \quad \text{as } T \rightarrow \infty$$
    where $\stackrel{\mathcal{D}}{\longrightarrow}$ denotes convergence in distribution.

  • If $\left\{y_{t}\right\}$ is an i.i.d. sequence satisfying $\operatorname{var}\left(y_{t}\right)<\infty$ (i.e., if $\left\{y_{t}\right\}$ is an i.i.d. $\mathrm{MA}(0)$), then $\widehat{\rho}_{k}$ is asymptotically normal with mean zero and variance $\frac{1}{T}$ for any $k \geq 1$. Thus the $t$ ratio $=\frac{\widehat{\rho}_{k}}{\sqrt{1 / T}} \stackrel{\mathcal{D}}{\longrightarrow} \mathcal{N}(0,1)$ for sufficiently large $T$.

  • $\left\{\widehat{\rho}_{1}, \widehat{\rho}_{2}, \ldots\right\}$ will be calculated along with the acceptance intervals (significance level $=5 \%$) under this i.i.d. assumption, once you put the data into EViews or MATLAB.


Reject $H_{0}$ if $|t|>1.96$; do not reject $H_{0}$ if $|t|<1.96$ (5% significance level).

  1. $H_{0}: \rho_{1}=0$ against $H_{1}: q \geq 1$, with $t=\frac{\widehat{\rho}_{1}}{\sqrt{1 / T}}$.
  2. If $H_{0}$ is rejected, continue testing.
  3. $H_{0}: \rho_{2}=0$ against $H_{1}: q \geq 2$, with $t \text{ ratio}=\frac{\widehat{\rho}_{k}}{\sqrt{\left(1+2 \sum_{j=1}^{k-1} \widehat{\rho}_{j}^{2}\right) / T}}$ evaluated at $k=2$.
  4. Continue until $H_{0}$ is not rejected at some lag $k$; the MA order is then $q=k-1$.
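The sequential procedure above can be sketched as a short loop over lags. This is an illustration under the stated asymptotic approximation; the function names, the 1.96 default, and the numbers in the usage note are assumptions.

```python
import math


def acf_t_ratios(rho_hat, T):
    """t ratios for lags 1..len(rho_hat); rho_hat[k-1] is the lag-k sample ACF."""
    ratios = []
    for k in range(1, len(rho_hat) + 1):
        # Variance under H0 that the series is MA(k-1).
        var_k = (1.0 + 2.0 * sum(r * r for r in rho_hat[: k - 1])) / T
        ratios.append(rho_hat[k - 1] / math.sqrt(var_k))
    return ratios


def suggest_ma_order(rho_hat, T, crit=1.96):
    """First lag k at which H0 is not rejected gives q = k - 1."""
    for k, t in enumerate(acf_t_ratios(rho_hat, T), start=1):
        if abs(t) < crit:
            return k - 1
    return len(rho_hat)  # every tested lag was significant
```

With `rho_hat=[0.5, 0.05, 0.02]` and `T=100`, the lag-1 ratio is 5 (reject), the lag-2 ratio is about 0.41 (do not reject), so the suggested order is $q=1$.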

Joint Test (Ljung-Box Q-statistics)


  • Significance test for a group of autocorrelations: $H_{0}: \rho_{1}=\cdots=\rho_{m}=0$ against $H_{1}: \rho_{i} \neq 0$ for some $1 \leq i \leq m$.
  • Under the assumption that $\left\{y_{t}\right\}$ is an i.i.d. sequence with certain moment conditions,
    $$Q(m)=T(T+2) \sum_{k=1}^{m} \frac{\widehat{\rho}_{k}^{2}}{T-k} \stackrel{\mathcal{D}}{\longrightarrow} \chi_{m}^{2}$$
  • Decision rule: reject $H_{0}$ if $Q(m)>\chi_{m}^{2}(\alpha)$, where $\chi_{m}^{2}(\alpha)$ denotes the $100(1-\alpha)$th percentile of a chi-squared distribution with $m$ degrees of freedom.

When the test is applied to the residuals of an estimated model,
$$Q(m)=T(T+2) \sum_{k=1}^{m} \frac{\tilde{\rho}_{k}^{2}}{T-k} \stackrel{\mathcal{D}}{\longrightarrow} \chi_{m-g}^{2},$$
where $\tilde{\rho}_{k}$ is the sample ACF of the estimation residuals, and $g$ denotes the number of parameters estimated in the model.
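The Q-statistic is a one-line sum once the sample autocorrelations are available. A minimal sketch (hypothetical function name); the critical value still has to come from a chi-squared table or a statistics package.

```python
def ljung_box_q(rho_hat, T):
    """Q(m) for m = len(rho_hat); rho_hat[k-1] is the lag-k sample autocorrelation."""
    return T * (T + 2) * sum(
        r * r / (T - k) for k, r in enumerate(rho_hat, start=1)
    )
```

For example, `ljung_box_q([0.1, 0.1], 100)` is about 2.07, to be compared with the $\chi_{2}^{2}$ percentile (or $\chi_{2-g}^{2}$ when the autocorrelations come from model residuals).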



$\widehat{\phi}_{k, k}$ converges to $\phi_{k, k}$ in probability as the sample size $T$ goes to infinity. For a stationary AR(p) process:

  • $\phi_{p, p}=\phi_{p}$.
  • $\phi_{k, k}=0$ for $k>p$.
    For $k>p$, $\widehat{\phi}_{k, k}$ is asymptotically normal with mean zero and variance $\frac{1}{T}$.

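Reject $H_{0}$ if $|t|>1.96$; do not reject $H_{0}$ if $|t|<1.96$ (5% significance level).

  1. $H_{0}: \phi_{1,1}=0$ against $H_{1}: p \geq 1$. Reject $H_{0}$ if $\left|\widehat{\phi}_{1,1}\right|>1.96 \sqrt{1 / T}$.
  2. If $H_{0}$ is rejected, continue testing.
  3. $H_{0}: \phi_{2,2}=0$ against $H_{1}: p \geq 2$. Reject $H_{0}$ if $\left|\widehat{\phi}_{2,2}\right|>1.96 \sqrt{1 / T}$.
  4. Continue until $H_{0}$ is not rejected at some lag $k$; the AR order is then $p=k-1$.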

Criteria: AIC, BIC

The criteria trade off degrees of freedom (parsimony) against the size of the residuals (fit).

  • Akaike information criterion (AIC):
    $$\mathrm{AIC}(l)=\underbrace{\log \left(\tilde{\sigma}_{l}^{2}\right)}_{\text{goodness of fit}}+\underbrace{\frac{2 l}{T}}_{\text{penalty function}}$$
    where $l$ is the number of parameters estimated and $\tilde{\sigma}_{l}^{2}=\frac{SSR}{T}$.

  • Bayesian information criterion (BIC) or Schwarz information criterion (SBC, SIC):
    $$\mathrm{BIC}(l)=\underbrace{\log \left(\tilde{\sigma}_{l}^{2}\right)}_{\text{goodness of fit}}+\underbrace{\frac{l \log (T)}{T}}_{\text{penalty function}}$$
    Choose the model with minimum AIC or BIC.

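  • BIC is suited to large samples. BIC asymptotically delivers the correct model, whereas AIC tends to select an overparameterized model.

  • In small samples, AIC can work better than BIC. Under BIC, the average variation in the selected model order across different samples from a given population will be larger than under AIC.


  • Since BIC selects the more parsimonious model, you should check whether the residuals appear to be white noise.
  • Since AIC may select an overparameterized model, the t-statistics of all coefficients should be significant at conventional levels.
  • It may not be possible to find one model that clearly dominates all the others.

Checks: whether the residuals are white noise, and how well the model forecasts future data.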
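As a small illustration of the two criteria, the sketch below evaluates them from a sum of squared residuals. It assumes the usual penalties $2 l / T$ for AIC and $l \log (T) / T$ for BIC; the function names and the numbers in the usage note are hypothetical.

```python
import math


def aic(ssr, T, n_params):
    """AIC(l) = log(SSR / T) + 2 * l / T."""
    return math.log(ssr / T) + 2.0 * n_params / T


def bic(ssr, T, n_params):
    """BIC(l) = log(SSR / T) + l * log(T) / T."""
    return math.log(ssr / T) + n_params * math.log(T) / T


def pick_model(candidates, T, criterion):
    """candidates maps a label to (ssr, n_params); return the label minimising the criterion."""
    return min(candidates, key=lambda name: criterion(candidates[name][0], T, candidates[name][1]))
```

Because $\log (T)>2$ once $T \geq 8$, BIC penalises each extra parameter more heavily than AIC, which is why it tends to pick the more parsimonious model. For example, with `T=100` and candidates `{"AR(1)": (100.0, 2), "AR(3)": (95.0, 4)}`, AIC prefers the AR(3) while BIC prefers the AR(1).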

Box-Jenkins Approach

Box and Jenkins (1970, 1976) popularized a three-stage method to estimate an ARMA model in a systematic manner.

  1. Identification

  2. Estimation

  3. Model diagnostic checking

  1. Identification

  • First of all, one might visually examine the time plot of the series, sample ACF and sample PACF. A comparison of the sample ACF and PACF to those of various theoretical ARMA processes may suggest several candidate models.
  • Model specification: AIC, BIC.


  • Similar processes can be approximated by very different models.
  • Common factor problem.
  • Each coefficient should be significantly different from zero at the conventional level.
  2. Estimation

Estimation can be done using least squares or maximum likelihood depending on the model.

  • AR models : least squares method or maximum likelihood method
  • MA and ARMA models : maximum likelihood method


Stationarity and Invertibility

  • t-stats, ACF, Q-stats, $\cdots$ all assume that the process is stationary.
  • Be suspicious of implied roots near the unit circle.
  • Invertibility implies the model has an AR representation.
    • No unit root in the MA part of the model.
    • The invertibility condition of an ARMA(p,q) process is determined entirely by its MA part.


  3. Model diagnostic checking
  • Residual diagnostics:
    • Plot residuals: look for outliers and periods of poor fit.
    • Residuals should be serially uncorrelated: examine ACF, PACF, Q-stats of residuals.
  • Divide the sample into subperiods: use in-sample prediction to check the fit.
  • Out-of-sample forecasts.



This result is really quite general: for any stationary ARMA model, the conditional forecast of $y_{t+j}$ converges to the unconditional mean as $j \rightarrow \infty$. The $j$-step-ahead forecast error is
$$e_{t}(j)=y_{t+j}-E_{t} y_{t+j}$$
The $95 \%$ confidence interval for the $j$-step-ahead forecast is:
$$\left[E_{t} y_{t+j}-1.96 \sqrt{\operatorname{var}\left(e_{t}(j)\right)},\; E_{t} y_{t+j}+1.96 \sqrt{\operatorname{var}\left(e_{t}(j)\right)}\right].$$
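As a concrete case, consider an AR(1), $y_{t}=\phi_{0}+\phi_{1} y_{t-1}+\epsilon_{t}$, with known parameters and Gaussian errors (assumptions made only for this sketch). Iterating forward gives $E_{t} y_{t+j}=\phi_{0} \sum_{i=0}^{j-1} \phi_{1}^{i}+\phi_{1}^{j} y_{t}$ and $\operatorname{var}\left(e_{t}(j)\right)=\sigma^{2} \sum_{i=0}^{j-1} \phi_{1}^{2 i}$, so the interval can be computed as follows (hypothetical function name):

```python
import math


def ar1_forecast(phi0, phi1, sigma2, y_t, j):
    """j-step-ahead point forecast and 95% interval for an AR(1) with known parameters."""
    mean = phi0 * sum(phi1 ** i for i in range(j)) + phi1 ** j * y_t
    var = sigma2 * sum(phi1 ** (2 * i) for i in range(j))  # var(e_t(j))
    half_width = 1.96 * math.sqrt(var)
    return mean, (mean - half_width, mean + half_width)
```

With `phi0=1.0`, `phi1=0.5`, `sigma2=1.0`, `y_t=4.0`, the one-step forecast is 3 with interval (1.04, 4.96); for large `j` the forecast approaches the unconditional mean $\phi_{0} /\left(1-\phi_{1}\right)=2$ and the variance approaches $\sigma^{2} /\left(1-\phi_{1}^{2}\right)$.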

test whether a forecast is accurate or not

We want the forecast errors to be small!

If there are $H$ observations in the holdback period, and $\left\{e_{j}\right\}_{j=1}^{H}$ are the forecast errors from the candidate model:

  • Mean squared prediction error: $\mathrm{MSPE}=\frac{1}{H} \sum_{j=1}^{H} e_{j}^{2}$. It is also called mean squared error (MSE).
  • Mean absolute error: $\mathrm{MAE}=\frac{1}{H} \sum_{j=1}^{H}\left|e_{j}\right|$.
  • Mean absolute percentage error: $\mathrm{MAPE}=\frac{1}{H} \sum_{j=1}^{H}\left|\frac{e_{j}}{y_{T+j}}\right| \cdot 100$.

Many researchers would select the model with the smallest MSPE (or MAE, MAPE).
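The three accuracy measures are simple averages over the holdback period. A minimal sketch, where `errors` holds $e_{j}$ and `actual` holds the realised values $y_{T+j}$ (both names are assumptions):

```python
def mspe(errors):
    """Mean squared prediction error."""
    return sum(e * e for e in errors) / len(errors)


def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def mape(errors, actual):
    """Mean absolute percentage error; assumes no realised value is zero."""
    return 100.0 * sum(abs(e / y) for e, y in zip(errors, actual)) / len(errors)
```

For errors `[1, -2]` against realised values `[10, 20]`, these give MSPE 2.5, MAE 1.5, and MAPE 10 (percent).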

Diebold-Mariano Test

  • Let the loss from a forecast error in period $j$ be denoted by $g\left(e_{j}\right)$. In the typical case of mean squared errors, the loss is $e_{j}^{2}$.
  • We can write the differential loss in period $j$ from using model 1 versus model 2 as $d_{j}=g\left(e_{1 j}\right)-g\left(e_{2 j}\right)$. The mean loss can be obtained as
    $$\bar{d}=\frac{1}{H} \sum_{j=1}^{H}\left[g\left(e_{1 j}\right)-g\left(e_{2 j}\right)\right]$$
  • The hypotheses are:
    • $H_{0}: E(\bar{d})=E\left(d_{j}\right)=0$ (equal forecast accuracy).
    • $H_{1}: E(\bar{d})>0$ (model 2 is better) or $E(\bar{d})<0$ (model 1 is better).

Under fairly weak conditions, the central limit theorem implies that $\bar{d} \stackrel{\mathcal{D}}{\longrightarrow} \mathcal{N}(0, \operatorname{var}(\bar{d}))$ as $H \rightarrow \infty$, under the null hypothesis.

  • If the $\left\{d_{j}\right\}$ series is serially uncorrelated with a sample variance of $\widehat{\gamma}$, the estimator of $\operatorname{var}(\bar{d})$ is simply $\frac{\widehat{\gamma}}{H-1}$, and
    $$\frac{\bar{d}}{\sqrt{\widehat{\gamma} /(H-1)}} \stackrel{\mathcal{D}}{\longrightarrow} \mathcal{N}(0,1), \quad \text{as } H \rightarrow \infty,$$
    under the null hypothesis.

  • If the $\left\{d_{j}\right\}$ series is serially correlated, there is a very large literature on the best way to estimate $\operatorname{var}(\bar{d})$ in the presence of serial correlation (e.g., the Newey-West estimator of the variance proposed in Newey and West (1987)). Then
    $$\frac{\bar{d}}{\sqrt{\widehat{\operatorname{var}}(\bar{d})}} \stackrel{\mathcal{D}}{\longrightarrow} \mathcal{N}(0,1), \quad \text{as } H \rightarrow \infty,$$
    under the null hypothesis, where $\widehat{\operatorname{var}}(\bar{d})$ is an appropriate estimator of $\operatorname{var}(\bar{d})$.



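For $H_{1}: E(\bar{d})>0$ (model 2 is better), reject $H_{0}$ at the 5% level if the statistic exceeds $1.645$.

For $H_{1}: E(\bar{d})<0$ (model 1 is better), reject $H_{0}$ at the 5% level if the statistic is below $-1.645$.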
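For the serially uncorrelated case with squared-error loss, the statistic can be sketched as below. The sample variance $\widehat{\gamma}$ is computed with divisor $H$, matching the sample-variance convention used earlier in these notes; that choice and the function name are assumptions of this sketch.

```python
import math


def dm_statistic(e1, e2):
    """Diebold-Mariano statistic with loss g(e) = e^2, assuming d_j is serially uncorrelated."""
    d = [a * a - b * b for a, b in zip(e1, e2)]   # loss differential d_j
    H = len(d)
    d_bar = sum(d) / H
    gamma_hat = sum((x - d_bar) ** 2 for x in d) / H  # sample variance of d_j
    return d_bar / math.sqrt(gamma_hat / (H - 1))
```

With the toy errors `e1=[1, 2, 1, 2]` and `e2=[1, 1, 1, 1]`, $d_{j}$ is $0,3,0,3$ and the statistic equals $\sqrt{3} \approx 1.73>1.645$, which would favour model 2 in a one-sided test (a sample of four is of course far too small for the asymptotic approximation).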

time series with trend

  1. Difference the series to remove the trend (e.g., take log differences).
  2. Estimate the regression equation directly.

Periodic or seasonal ARMA models


Chapter 4 Modeling Volatility

the volatility equation:
$$\sigma_{t}^{2}=\operatorname{var}\left(y_{t} \mid \mathcal{F}_{t-1}\right)=\operatorname{var}\left(\epsilon_{t} \mid \mathcal{F}_{t-1}\right)$$

  • Volatility $\sigma_{t}$: the conditional standard deviation of $y_{t}$ based on a past information set $\mathcal{F}_{t-1}$.

ARCH(q): autoregressive conditional heteroskedasticity

A natural idea is to model $\epsilon_{t}^{2}$ using an $\mathrm{AR}(q)$ process:
$$\begin{aligned} \epsilon_{t}^{2} &=\alpha_{0}+\alpha_{1} \epsilon_{t-1}^{2}+\cdots+\alpha_{q} \epsilon_{t-q}^{2}+\eta_{t} \\ \Rightarrow \sigma_{t}^{2} &=\alpha_{0}+\alpha_{1} \epsilon_{t-1}^{2}+\cdots+\alpha_{q} \epsilon_{t-q}^{2} \end{aligned}$$
This equation is called an autoregressive conditional heteroskedasticity (ARCH) model of order $q$.

ARCH (q) model
$$\begin{aligned} y_{t} &=E\left(y_{t} \mid \mathcal{F}_{t-1}\right)+\epsilon_{t}, \quad \epsilon_{t}=\sigma_{t} v_{t} \\ \sigma_{t}^{2} &=\alpha_{0}+\alpha_{1} \epsilon_{t-1}^{2}+\cdots+\alpha_{q} \epsilon_{t-q}^{2} \end{aligned}$$

  • This is an ARCH(q) model.
  • $\alpha_{0}>0$, and $\alpha_{i} \geq 0$ for $i>0$, for positiveness.
  • $\sum_{i=1}^{q} \alpha_{i}<1$ for stationarity.
  • $\left\{v_{t}\right\}$ is a sequence of i.i.d. random variables with mean 0 and variance 1.

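The covariance Cov(·,·) captures only linear dependence.

Corr(·,·) indicates whether two variables are (linearly) correlated; independence means $p(x, y)=p(x) p(y)$.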

$$\begin{aligned} E\left(\epsilon_{t}\right) &=E\left[E\left(\epsilon_{t} \mid \mathcal{F}_{t-1}\right)\right]=0 \\ E\left(\epsilon_{t} \epsilon_{t-j}\right) &=E\left[E\left(\epsilon_{t} \epsilon_{t-j} \mid \mathcal{F}_{t-1}\right)\right]=E\left[\epsilon_{t-j} E\left(\epsilon_{t} \mid \mathcal{F}_{t-1}\right)\right]=0, \quad j \geq 1 \end{aligned}$$

$\epsilon_{t}^{2}$

Assuming stationarity $\left(\alpha_{1}+\cdots+\alpha_{q}<1\right)$,
$$\begin{aligned} \operatorname{var}\left(\epsilon_{t}\right) &=E\left(\epsilon_{t}^{2}\right)=E\left[E\left(\epsilon_{t}^{2} \mid \mathcal{F}_{t-1}\right)\right]=E\left(\sigma_{t}^{2}\right) \\ &=\alpha_{0}+\alpha_{1} E\left(\epsilon_{t-1}^{2}\right)+\cdots+\alpha_{q} E\left(\epsilon_{t-q}^{2}\right) \end{aligned}$$
which implies that
$$\operatorname{var}\left(\epsilon_{t}\right)=E\left(\sigma_{t}^{2}\right)=\frac{\alpha_{0}}{1-\alpha_{1}-\cdots-\alpha_{q}}$$
The error $\left\{\epsilon_{t}\right\}$ is uncorrelated and stationary with mean zero and constant unconditional variance (subject to the constraints on the ARCH parameters).
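A quick simulation can make these moment results tangible. The sketch below generates an ARCH(1) series with Gaussian $v_{t}$ using only the standard library; the parameter values, seed, and burn-in length are arbitrary assumptions.

```python
import math
import random


def simulate_arch1(alpha0, alpha1, n, seed=0, burn=500):
    """Simulate eps_t = sigma_t * v_t with sigma_t^2 = alpha0 + alpha1 * eps_{t-1}^2."""
    rng = random.Random(seed)
    eps_prev = 0.0
    out = []
    for t in range(n + burn):
        sigma2 = alpha0 + alpha1 * eps_prev ** 2      # conditional variance
        eps_prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        if t >= burn:                                 # drop the start-up values
            out.append(eps_prev)
    return out


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of x."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
    return num / sum((v - m) ** 2 for v in x)
```

With `alpha0=1.0` and `alpha1=0.3`, the sample variance of a long simulated series should be close to $\alpha_{0} /\left(1-\alpha_{1}\right) \approx 1.43$, the lag-1 autocorrelation of $\epsilon_{t}$ should be close to zero, and the lag-1 autocorrelation of $\epsilon_{t}^{2}$ should be clearly positive.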

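$\epsilon_{t}$ is a white noise: uncorrelated, but not independent, because $\epsilon_{t}^{2}$ is serially correlated (a nonlinear form of dependence).

An ARCH model can describe both the stationarity and the volatility of a series.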

Heavy tails

long tail / fat tail/ heavy tail

  • Kurtosis of a random variable $y$ is defined to be $\frac{E\left[(y-E(y))^{4}\right]}{[\operatorname{var}(y)]^{2}}$. For example,
    • the kurtosis of a normal distribution is 3;
    • the kurtosis of a Student's $t$ distribution with $\nu$ degrees of freedom is $\frac{6}{\nu-4}+3$, for $\nu>4$.
  • Kurtosis identifies whether the tails of a given distribution contain extreme values.
  • Excess kurtosis = kurtosis − 3 measures how much heavier the tails of a distribution are than the tails of a normal distribution.
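A sample analogue of these definitions, replacing expectations with sample averages, can be sketched as follows (hypothetical function names):

```python
def sample_kurtosis(y):
    """Fourth central sample moment divided by the squared sample variance."""
    n = len(y)
    m = sum(y) / n
    m2 = sum((v - m) ** 2 for v in y) / n
    m4 = sum((v - m) ** 4 for v in y) / n
    return m4 / m2 ** 2


def excess_kurtosis(y):
    """Kurtosis minus 3, the value for a normal distribution."""
    return sample_kurtosis(y) - 3.0


def student_t_kurtosis(nu):
    """Theoretical kurtosis of a Student's t distribution, valid for nu > 4."""
    return 6.0 / (nu - 4.0) + 3.0
```

For example, `student_t_kurtosis(10)` is 4, so a $t_{10}$ distribution has excess kurtosis 1; a positive `excess_kurtosis` on return data is the usual sign of heavy tails.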

ARCH heavy tail


Advantages of ARCH models:

  • Simplicity
  • ARCH can model the volatility clustering effect since the conditional variance is autoregressive. Such models can be used to forecast volatility.
  • Heavy tails (high kurtosis)

Weaknesses of ARCH models:

  • Symmetric between positive and negative prior shocks
  • Restrictive on parameter space

Idea: ARCH is like an AR model for volatility. GARCH is like an ARMA model for volatility.

$\mathrm{GARCH}(p, q)$ model:

$$\begin{aligned} y_{t} &=E\left(y_{t} \mid \mathcal{F}_{t-1}\right)+\epsilon_{t}, \quad \epsilon_{t}=\sigma_{t} v_{t} \\ \sigma_{t}^{2} &=\alpha_{0}+\sum_{i=1}^{q} \alpha_{i} \epsilon_{t-i}^{2}+\sum_{j=1}^{p} \beta_{j} \sigma_{t-j}^{2} \end{aligned}$$

  • $\alpha_{0}>0$, and $\alpha_{i} \geq 0, \beta_{j} \geq 0$ for $i, j>0$ ensure positiveness.
  • $\sum_{i=1}^{\max (p, q)}\left(\alpha_{i}+\beta_{i}\right)<1$ ensures stationarity (see the re-parameterization that follows).
  • $\left\{v_{t}\right\}$ is a sequence of i.i.d. random variables with mean 0 and variance 1.

Re-parameterization :
Let η t = ϵ t 2 − σ t 2 . { η t } \eta_{t}=\epsilon_{t}^{2}-\sigma_{t}^{2} .\left\{\eta_{t}\right\} ηt=ϵt2σt2.{ηt} are uncorrelated series. The G A R C H \mathrm{GARCH} GARCH model becomes
ϵ t 2 = α 0 + ∑ i = 1 max ⁡ ( p , q ) ( α i + β i ) ϵ t − i 2 + η t − ∑ j = 1 p β j η t − j \epsilon_{t}^{2}=\alpha_{0}+\sum_{i=1}^{\max (p, q)}\left(\alpha_{i}+\beta_{i}\right) \epsilon_{t-i}^{2}+\eta_{t}-\sum_{j=1}^{p} \beta_{j} \eta_{t-j} ϵt2=α0+i=1max(p,q)(αi+βi)ϵti2+ηtj=1pβjηtj
This is an ARMA form for the squared series ϵ t 2 \epsilon_{t}^{2} ϵt2.

The error $\left\{\epsilon_{t}\right\}$ is uncorrelated and stationary with mean zero and finite unconditional variance:
$$
\begin{aligned}
E\left(\epsilon_{t} \mid \mathcal{F}_{t-1}\right) &= 0, \quad E\left(\epsilon_{t}\right)=0, \quad E\left(\epsilon_{t} \epsilon_{t-j}\right)=0, \quad j \geq 1, \\
\operatorname{var}\left(\epsilon_{t}\right) &= E\left(\epsilon_{t}^{2}\right)=\frac{\alpha_{0}}{1-\left(\sum_{i=1}^{q} \alpha_{i}\right)-\left(\sum_{j=1}^{p} \beta_{j}\right)}
\end{aligned}
$$
provided that $\sum_{i=1}^{\max (p, q)}\left(\alpha_{i}+\beta_{i}\right)<1$.
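
The unconditional variance formula can be checked numerically. The following is a minimal sketch, assuming a GARCH(1,1) with standard normal $v_t$; the parameter values, the burn-in length, and the function name `simulate_garch11` are illustrative choices and do not come from the notes.

```python
import random


def simulate_garch11(n, alpha0, alpha1, beta1, seed=0, burn_in=1000):
    """Simulate eps_t = sigma_t * v_t, v_t ~ N(0, 1), with GARCH(1,1) variance."""
    rng = random.Random(seed)
    # Assumption: start the recursion at the unconditional variance.
    sigma2 = alpha0 / (1.0 - alpha1 - beta1)
    eps = 0.0
    path = []
    for t in range(n + burn_in):
        # sigma_t^2 = alpha0 + alpha1 * eps_{t-1}^2 + beta1 * sigma_{t-1}^2
        sigma2 = alpha0 + alpha1 * eps * eps + beta1 * sigma2
        eps = (sigma2 ** 0.5) * rng.gauss(0.0, 1.0)
        if t >= burn_in:  # drop the burn-in draws
            path.append(eps)
    return path


alpha0, alpha1, beta1 = 0.1, 0.1, 0.8  # alpha1 + beta1 < 1, so the variance is finite
eps = simulate_garch11(200_000, alpha0, alpha1, beta1)

sample_mean = sum(eps) / len(eps)
sample_var = sum(e * e for e in eps) / len(eps)
theory_var = alpha0 / (1.0 - alpha1 - beta1)

print(f"sample mean          : {sample_mean:.4f}")
print(f"sample variance      : {sample_var:.4f}")
print(f"alpha0 / (1 - a - b) : {theory_var:.4f}")
```

With a long simulated path the sample variance should land close to the theoretical value; the match gets slower as $\alpha_1+\beta_1$ approaches 1.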

GARCH heavy tails



Identification of ARCH and GARCH Models

  1. Model the mean equation and test for ARCH effects.
    • $H_{0}$: no ARCH effects versus $H_{1}$: ARCH effects.
    • Use Q-statistics of the squared residuals $\left\{\hat{\epsilon}_{t}^{2}\right\}$ or the LM test.
  2. Order determination:
    • The PACF of the squared residuals $\hat{\epsilon}_{t}^{2}$ gives useful information about the ARCH order $q$ (see Eq. (2)).
    • To identify GARCH models, we use information criteria.

Testing for ARCH Effects

This is similar to the tests used for AR models.

Ljung-Box statistics

Consider testing $H_{0}$: (no ARCH) $\alpha_{1}=\alpha_{2}=\cdots=\alpha_{q}=0$ against $H_{1}$: (ARCH) at least one $\alpha_{i} \neq 0$.

  • Step 1: Compute the residuals $\left\{\hat{\epsilon}_{t}\right\}$ from the mean equation regression.
  • Step 2: Apply the usual Ljung-Box statistic $Q(m)$ to the $\left\{\hat{\epsilon}_{t}^{2}\right\}$ series.
  • View/Residual Diagnostics : Correlogram Squared Residuals
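
The two steps above can be sketched in plain Python. This is an illustration under stated assumptions: the helper name `ljung_box_q` is hypothetical, and the "residuals" are a simulated ARCH(1) series standing in for residuals from a fitted mean equation.

```python
import random


def ljung_box_q(x, m):
    """Ljung-Box statistic Q(m) = T (T + 2) * sum_{k=1..m} rho_k^2 / (T - k)."""
    T = len(x)
    mean = sum(x) / T
    dev = [xi - mean for xi in x]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, m + 1):
        # Sample autocorrelation at lag k
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, T)) / denom
        total += rho_k * rho_k / (T - k)
    return T * (T + 2) * total


rng = random.Random(1)

# Stand-in residuals with ARCH(1) effects (hypothetical parameters 0.5 and 0.5)
resid = []
prev = 0.0
for _ in range(2000):
    sigma2 = 0.5 + 0.5 * prev * prev
    prev = (sigma2 ** 0.5) * rng.gauss(0.0, 1.0)
    resid.append(prev)

# Benchmark series with no ARCH effects
iid = [rng.gauss(0.0, 1.0) for _ in range(2000)]

q_arch = ljung_box_q([e * e for e in resid], m=5)
q_iid = ljung_box_q([e * e for e in iid], m=5)

# Under H0, Q(5) is approximately chi-square with 5 degrees of freedom
# (5% critical value about 11.07).
print(f"Q(5), ARCH(1) residuals squared: {q_arch:.2f}")
print(f"Q(5), i.i.d. residuals squared : {q_iid:.2f}")
```

A large $Q(m)$ for the squared residuals is evidence against $H_0$, that is, evidence of ARCH effects.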

Engle derived a simple LM test:

  • Step 1: Compute the residuals $\left\{\hat{\epsilon}_{t}\right\}$ from the mean equation regression.

  • Step 2: Estimate the auxiliary regression
    $$
    \hat{\epsilon}_{t}^{2}=\alpha_{0}+\alpha_{1} \hat{\epsilon}_{t-1}^{2}+\cdots+\alpha_{q} \hat{\epsilon}_{t-q}^{2}+\text{error}_{t}
    $$
    and obtain $R^{2} \equiv R_{AUX}^{2}$ from this regression.

  • Step 3: Form the LM test statistic
    $$
    LM_{ARCH}=T \cdot R_{AUX}^{2}
    $$
    where $T$ is the sample size of the auxiliary regression.

Under $H_{0}$ (no ARCH), $LM_{ARCH}$ is asymptotically distributed as $\chi^{2}(q)$.

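The three LM steps can be sketched with NumPy least squares. This is a minimal illustration, not a reference implementation: the function name `arch_lm_test` is hypothetical, and the residuals are again a simulated ARCH(1) series used as a stand-in for mean-equation residuals.

```python
import numpy as np


def arch_lm_test(resid, q):
    """Return (LM, R^2, T) for the ARCH LM test with q lags."""
    e2 = np.asarray(resid, dtype=float) ** 2
    n = len(e2)
    y = e2[q:]  # dependent variable: squared residual at time t
    # Regressors: squared residuals lagged by 1, ..., q
    lags = [e2[q - i:n - i] for i in range(1, q + 1)]
    X = np.column_stack([np.ones(n - q)] + lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_aux = y - X @ coef
    r2 = 1.0 - (resid_aux @ resid_aux) / np.sum((y - y.mean()) ** 2)
    T = n - q  # sample size of the auxiliary regression
    return T * r2, r2, T


rng = np.random.default_rng(0)
v = rng.standard_normal(2000)

# Stand-in residuals with ARCH(1) effects (hypothetical parameters 0.5 and 0.5)
resid = np.empty(2000)
prev = 0.0
for t in range(2000):
    sigma2 = 0.5 + 0.5 * prev ** 2
    prev = np.sqrt(sigma2) * v[t]
    resid[t] = prev

lm, r2, T = arch_lm_test(resid, q=2)

# Under H0, LM is approximately chi-square with q = 2 degrees of freedom
# (5% critical value about 5.991).
print(f"T = {T}, R^2 = {r2:.4f}, LM = {lm:.2f}")
```

Reject $H_0$ when the statistic exceeds the $\chi^2(q)$ critical value at the chosen level.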
Estimation of ARCH/GARCH Models


The steps involved in estimating an ARCH or GARCH model are as follows:
(1) Specify the appropriate equations for the mean and the variance, e.g. an $\mathrm{AR}(1)$-$\mathrm{GARCH}(1,1)$ model:
$$
\begin{aligned}
&y_{t}=\phi_{0}+\phi_{1} y_{t-1}+\epsilon_{t}, \quad \epsilon_{t}=\sigma_{t} v_{t}, \quad v_{t} \stackrel{i.i.d.}{\sim} \mathcal{N}(0,1), \\
&\sigma_{t}^{2}=\alpha_{0}+\alpha_{1} \epsilon_{t-1}^{2}+\beta_{1} \sigma_{t-1}^{2}.
\end{aligned}
$$
(2) Specify the log-likelihood function to maximize:
$$
L=-\frac{T}{2} \log (2 \pi)-\frac{1}{2} \sum_{t=1}^{T} \log \left(\sigma_{t}^{2}\right)-\frac{1}{2} \sum_{t=1}^{T} \frac{\left(y_{t}-\phi_{0}-\phi_{1} y_{t-1}\right)^{2}}{\sigma_{t}^{2}}.
$$
(3) Maximize the log-likelihood numerically; the software reports the parameter estimates and their standard errors.
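
The log-likelihood in step (2) can be written as a short function. This is a sketch under assumptions the notes do not specify: the likelihood is conditioned on the first observation, and the variance recursion is started at the sample variance of the residuals (one common choice among several). The name `ar1_garch11_loglik` is hypothetical.

```python
import math


def ar1_garch11_loglik(params, y, sigma2_init=None):
    """Gaussian log-likelihood of an AR(1)-GARCH(1,1), conditional on y[0]."""
    phi0, phi1, alpha0, alpha1, beta1 = params
    # Mean-equation residuals eps_t = y_t - phi0 - phi1 * y_{t-1}
    eps = [y[t] - phi0 - phi1 * y[t - 1] for t in range(1, len(y))]
    T = len(eps)
    # Assumption: pre-sample sigma^2 and eps^2 are both set to the same start value.
    if sigma2_init is None:
        sigma2 = sum(e * e for e in eps) / T
    else:
        sigma2 = sigma2_init
    prev_eps2 = sigma2
    loglik = -0.5 * T * math.log(2.0 * math.pi)
    for e in eps:
        # sigma_t^2 = alpha0 + alpha1 * eps_{t-1}^2 + beta1 * sigma_{t-1}^2
        sigma2 = alpha0 + alpha1 * prev_eps2 + beta1 * sigma2
        loglik -= 0.5 * (math.log(sigma2) + e * e / sigma2)
        prev_eps2 = e * e
    return loglik


# Tiny hand-checkable case: with alpha1 = beta1 = 0 and alpha0 = 1, sigma_t^2 = 1,
# so L = -(T/2) log(2 pi) - (1/2) * sum(eps_t^2) with eps = [1, 2].
y = [0.0, 1.0, 2.0]
ll = ar1_garch11_loglik((0.0, 0.0, 1.0, 0.0, 0.0), y)
print(ll)
```

For step (3), the negative of this function could be passed to a general-purpose numerical optimizer, with the positivity and stationarity constraints on $\alpha_0$, $\alpha_1$, $\beta_1$ imposed as bounds.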

Model Checking

For a properly specified ARCH/GARCH model, the standardized residuals
$$
\widehat{v}_{t}=\frac{\widehat{\epsilon}_{t}}{\widehat{\sigma}_{t}}
$$
should behave as an i.i.d. sequence with mean 0 and variance 1.

  • If the mean equation is adequate (i.e. the serial correlation in $y_{t}$ is completely captured), the residuals $\left\{\widehat{\epsilon}_{t}\right\}$ should behave as a white noise process. Consequently, the standardized residuals $\left\{\widehat{v}_{t}\right\}$ should also behave as a white noise process.
  • If the volatility equation is adequate (i.e. the dependence in $\epsilon_{t}^{2}$ is completely captured), the squared standardized residuals $\left\{\widehat{v}_{t}^{2}=\widehat{\epsilon}_{t}^{2} / \widehat{\sigma}_{t}^{2}\right\}$ should be uncorrelated across time.


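  1. The Ljung-Box statistics of $\left\{\widehat{v}_{t}\right\}$ can be used to check the adequacy of the mean equation.
  2. The Ljung-Box statistics of $\left\{\widehat{v}_{t}^{2}=\widehat{\epsilon}_{t}^{2} / \widehat{\sigma}_{t}^{2}\right\}$ can be used to check the adequacy of the volatility equation.
  3. The skewness, kurtosis, and QQ plot of $\left\{\widehat{v}_{t}\right\}$ can be used to check the validity of the distributional assumption. ???
Open question: what the heavy tail is used for, along with the points above.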
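
The distributional check in item 3 can be illustrated with sample moments. This is a minimal sketch, assuming the fitted residuals and conditional standard deviations are already available as lists; the function names and the toy numbers are hypothetical.

```python
def standardized_residuals(eps_hat, sigma_hat):
    """v_hat_t = eps_hat_t / sigma_hat_t."""
    return [e / s for e, s in zip(eps_hat, sigma_hat)]


def skewness_kurtosis(v):
    """Sample skewness m3 / m2^1.5 and kurtosis m4 / m2^2 (not excess kurtosis)."""
    n = len(v)
    mean = sum(v) / n
    m2 = sum((x - mean) ** 2 for x in v) / n
    m3 = sum((x - mean) ** 3 for x in v) / n
    m4 = sum((x - mean) ** 4 for x in v) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Toy inputs, chosen only so the result is easy to verify by hand
eps_hat = [-2.0, 0.0, 3.0]
sigma_hat = [2.0, 1.0, 3.0]

v_hat = standardized_residuals(eps_hat, sigma_hat)
skew, kurt = skewness_kurtosis(v_hat)
print(v_hat, skew, kurt)
```

If $v_t$ is normal, the skewness of $\widehat{v}_t$ should be near 0 and the kurtosis near 3; a kurtosis well above 3 suggests that a heavier-tailed distribution for $v_t$ may fit better.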

Forecasting Variances using ARCH Models

Interval Forecasting

Confidence bands for the 1-step-ahead forecast at the forecast origin $t$

  • The conditional distribution of $y_{t+1}$ given the information available at time $t$ is $D\left[E\left(y_{t+1} \mid \mathcal{F}_{t}\right), \operatorname{var}\left(y_{t+1} \mid \mathcal{F}_{t}\right)\right]$, where $D$ depends on the distribution of $v_{t}$.
  • $\operatorname{var}\left(y_{t+1} \mid \mathcal{F}_{t}\right)=\sigma_{t+1}^{2}$.
  • Under the normality assumption, the 95% confidence interval of the prediction is $\left[E\left(y_{t+1} \mid \mathcal{F}_{t}\right)-1.96 \sigma_{t+1},\ E\left(y_{t+1} \mid \mathcal{F}_{t}\right)+1.96 \sigma_{t+1}\right]$.
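
As an illustration of the interval above, here is a minimal sketch for an AR(1)-GARCH(1,1) model, where $E(y_{t+1} \mid \mathcal{F}_t)=\phi_0+\phi_1 y_t$ and $\sigma_{t+1}^2=\alpha_0+\alpha_1\epsilon_t^2+\beta_1\sigma_t^2$. The function name and all numeric inputs are hypothetical.

```python
def one_step_interval(y_t, eps_t, sigma2_t, phi0, phi1, alpha0, alpha1, beta1, z=1.96):
    """1-step-ahead interval forecast under normality for an AR(1)-GARCH(1,1)."""
    mean_next = phi0 + phi1 * y_t  # E(y_{t+1} | F_t)
    sigma2_next = alpha0 + alpha1 * eps_t ** 2 + beta1 * sigma2_t  # var(y_{t+1} | F_t)
    half_width = z * sigma2_next ** 0.5
    return mean_next - half_width, mean_next + half_width


# Hypothetical values at the forecast origin t
lower, upper = one_step_interval(
    y_t=1.0, eps_t=2.0, sigma2_t=5.0,
    phi0=0.5, phi1=0.5, alpha0=1.0, alpha1=0.5, beta1=0.2,
)
# mean = 0.5 + 0.5 * 1 = 1, sigma^2_{t+1} = 1 + 0.5 * 4 + 0.2 * 5 = 4
print(lower, upper)
```

Because $\sigma_{t+1}^2$ depends on the latest shock $\epsilon_t$, the interval widens after large shocks and narrows in calm periods, unlike the constant-width interval of a homoskedastic model.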
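Real data example (will this be on the exam?)
  1. Fit the mean equation.
  2. Test for ARCH effects.
  3. Use the ACF and PACF of the squared residuals to decide between ARCH and GARCH.
  4. Estimate the mean equation jointly with the ARCH/GARCH equation.
  5. Run Wald tests on the coefficients.
  6. If some coefficients are insignificant, simplify the model.
  7. Check the model, adjusting it as needed.
  8. Forecast.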
