[Econometrics] Time Series Analysis Notes (Part 1)

拔剑吧！

Article link: https://blog.csdn.net/weixin_48642879/article/details/125256766

Time Series Review

密码, June 11 to June 13, 2022

Additional reference: Peking University lecture notes on financial time series

Contents

Time Series Review
Chapter 2 Difference Equations and Their Solutions
Chapter 3 Univariate Time Series
Chapter 4 Modeling Volatility

Chapter 2 Difference Equations and Their Solutions

Describing the general solution

This solution $y_{t}=A \phi_{1}^{t}$ to the homogeneous equation is called the homogeneous solution.

If $\left|\phi_{1}\right|<1$ , the homogeneous solution converges to zero as $t \rightarrow \infty$ . Convergence is direct if $0<\phi_{1}<1$ and oscillatory if $-1<\phi_{1}<0$ .
If $\left|\phi_{1}\right|>1$ , the homogeneous solution is divergent. If $\phi_{1}>1$ , the solution approaches $\infty$ as $t$ increases. If $\phi_{1}<-1$ , the solution oscillates explosively.
If $\phi_{1}=1$ , any arbitrary constant $A$ satisfies the homogeneous equation and $y_{t}=y_{t-1}$ . If $\phi_{1}=-1, y_{t}=A$ for even values of $t$ and $y_{t}=-A$ for odd values of $t$ , and $y_{t}=-y_{t-1}$ .

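Procedure for solving a difference equation

Form the homogeneous equation and solve it.
Find a particular solution.
Add the two to obtain the general solution.
Substitute the initial conditions (if any are given) to pin down the arbitrary constants.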
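
The four steps can be sketched for the deterministic first-order equation $y_t=\phi_0+\phi_1 y_{t-1}$ with a known $y_0$. This is only an illustration under that assumption (no forcing term, $\phi_1 \neq 1$); the function names are hypothetical and do not come from the notes.

```python
def solve_first_order(phi0, phi1, y0, t):
    """Closed-form y_t for y_t = phi0 + phi1 * y_{t-1}, y_0 = y0 (assumes phi1 != 1)."""
    # Step 1: the homogeneous equation y_t = phi1 * y_{t-1} has solution A * phi1**t.
    # Step 2: a constant particular solution c solves c = phi0 + phi1 * c.
    particular = phi0 / (1.0 - phi1)
    # Step 4: the initial condition y_0 = A + particular pins down A.
    A = y0 - particular
    # Step 3: general solution = homogeneous solution + particular solution.
    return A * phi1 ** t + particular


def iterate_first_order(phi0, phi1, y0, t):
    """Brute-force iteration of the same equation, used as a cross-check."""
    y = y0
    for _ in range(t):
        y = phi0 + phi1 * y
    return y


if __name__ == "__main__":
    for t in (0, 1, 5, 20):
        print(t, solve_first_order(2.0, 0.5, 10.0, t), iterate_first_order(2.0, 0.5, 10.0, t))
```

With $|\phi_1|<1$ both columns approach the particular solution $\phi_0/(1-\phi_1)$ as $t$ grows, which matches the convergence discussion above.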

lag operators

For $|a|<1$ , the infinite sum
$\left(1+a L+a^{2} L^{2}+a^{3} L^{3}+\cdots\right) y_{t}=\frac{y_{t}}{1-a L}$
follows from the formula for the sum of a geometric series.

Lag operators can be used to solve difference equations.

It is straightforward to use lag operators to solve linear difference equations. If $\left|\phi_{1}\right|<1$ , we obtain
$\begin{aligned} y_{t} &=\left(1-\phi_{1} L\right)^{-1}\left(\phi_{0}+x_{t}\right)=\left(1+\phi_{1} L+\phi_{1}^{2} L^{2}+\cdots\right)\left(\phi_{0}+x_{t}\right) \\ &=\frac{\phi_{0}}{1-\phi_{1}}+\sum_{j=0}^{\infty} \phi_{1}^{j} x_{t-j} \end{aligned}$

Characteristic equation

characteristic equation and inverse characteristic equation

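Solving: substitute the trial solution $y_{t}=A \alpha^{t}$ into the homogeneous equation $y_{t}=\phi_{1} y_{t-1}+\phi_{2} y_{t-2}$ :
$A \alpha^{t}=\phi_{1} A \alpha^{t-1}+\phi_{2} A \alpha^{t-2} \Rightarrow \underbrace{\alpha^{2}-\phi_{1} \alpha-\phi_{2}=0}_{\text {characteristic equation }}$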
Case 1: If $\phi_{1}^{2}+4 \phi_{2}>0, \alpha_{1}, \alpha_{2}=\frac{\phi_{1} \pm \sqrt{\phi_{1}^{2}+4 \phi_{2}}}{2}$
$y_{t}=A_{1} \alpha_{1}^{t}+A_{2} \alpha_{2}^{t}$
Case 2: If $\phi_{1}^{2}+4 \phi_{2}=0, \alpha_{1}=\alpha_{2}=\frac{\phi_{1}}{2}$
$y_{t}=A_{1}\left(\frac{\phi_{1}}{2}\right)^{t}+A_{2} t\left(\frac{\phi_{1}}{2}\right)^{t}$
Case 3: If $\phi_{1}^{2}+4 \phi_{2}<0, \alpha_{1}, \alpha_{2}=\frac{\phi_{1} \pm i \sqrt{-\phi_{1}^{2}-4 \phi_{2}}}{2}$ .
$\begin{aligned} &\alpha_{1}=\frac{\phi_{1}+i \sqrt{-\phi_{1}^{2}-4 \phi_{2}}}{2}=r(\cos \theta+i \sin \theta) \\ &\alpha_{2}=\frac{\phi_{1}-i \sqrt{-\phi_{1}^{2}-4 \phi_{2}}}{2}=r(\cos \theta-i \sin \theta) \end{aligned}$
where $r=\sqrt{-\phi_{2}}$ is the modulus of $\alpha_{1}$ and $\alpha_{2}$ ,

and $\theta=\arccos \left(\frac{\phi_{1}}{2 \sqrt{-\phi_{2}}}\right)$ is the argument of $\alpha_{1}$ and $\alpha_{2}$ .
$y_{t}=A_{1} \alpha_{1}^{t}+A_{2} \alpha_{2}^{t} \equiv B_{1} r^{t} \cos \left(\theta t+B_{2}\right)$

stability conditions

Stability requires that all characteristic roots (defined in Eq. (10)) lie within the unit circle, i.e. $\left|\alpha_{j}\right|<1$ for all $j$ .

a necessary condition for stability : $\sum_{j=1}^{p} \phi_{j}<1$ .
a sufficient condition for stability : $\sum_{j=1}^{p}\left|\phi_{j}\right|<1$ .
At least one characteristic root equals unity if

$\sum_{j=1}^{p} \phi_{j}=1$
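
For the second-order case, the three root cases and the stability conditions can be checked numerically. The sketch below covers only $p=2$ and uses the quadratic formula directly; the helper names are hypothetical.

```python
import cmath
import math


def characteristic_roots(phi1, phi2):
    """Roots of alpha^2 - phi1*alpha - phi2 = 0 (second-order case only)."""
    root = cmath.sqrt(phi1 ** 2 + 4.0 * phi2)
    return (phi1 + root) / 2.0, (phi1 - root) / 2.0


def root_case(phi1, phi2):
    """Classify by the sign of the discriminant phi1^2 + 4*phi2."""
    disc = phi1 ** 2 + 4.0 * phi2
    if disc > 0:
        return "case 1: distinct real roots"
    if disc == 0:
        return "case 2: repeated real root"
    return "case 3: complex conjugate roots"


def is_stable(phi1, phi2):
    """Stable when every characteristic root lies strictly inside the unit circle."""
    return all(abs(a) < 1.0 for a in characteristic_roots(phi1, phi2))


if __name__ == "__main__":
    a1, a2 = characteristic_roots(1.0, -0.5)
    # Complex case: modulus should equal sqrt(-phi2), argument arccos(phi1 / (2*sqrt(-phi2))).
    print(root_case(1.0, -0.5), abs(a1), math.sqrt(0.5), cmath.phase(a1))
    # phi1 + phi2 = 1 gives a root equal to one.
    print(characteristic_roots(0.5, 0.5), is_stable(0.5, 0.5))
    # Sum of coefficients below 1 is necessary, not sufficient: this one is unstable.
    print(is_stable(-1.5, 0.0))
```

The last call illustrates why $\sum_j \phi_j<1$ is only a necessary condition: the coefficients sum to $-1.5$, yet one root is $-1.5$.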

Chapter 3 Univariate Time Series

Introduces the MA(q), AR(p), and ARMA(p,q) models.

white noise

$\left\{\epsilon_{t}\right\}$ is called a white noise process if for all $t$
$\begin{aligned} E\left(\epsilon_{t}\right) &=0 \quad \text { mean zero } \\ E\left(\epsilon_{t}^{2}\right) &=\operatorname{var}\left(\epsilon_{t}\right)=\sigma^{2} \quad \text { variance } \sigma^{2} \\ E\left(\epsilon_{t} \epsilon_{\tau}\right) &=\operatorname{cov}\left(\epsilon_{t}, \epsilon_{\tau}\right)=0, \text { for all } \tau \neq t \quad \text { uncorrelated across time } \end{aligned}$
If in addition, $\left\{\epsilon_{t}\right\}$ is independent across time, then it is called an independent white noise process.
If furthermore, $\epsilon_{t} \sim N\left(0, \sigma^{2}\right)$ , then we have the Gaussian white noise process.

stationarity

Strict or strong stationarity :

distributions are time-invariant. This is a very strong condition that is hard to verify empirically.
The mean and variance need not be finite.

Weak stationarity (covariance stationarity):

first 2 moments are time-invariant. In this course, we are mainly concerned with weakly stationary series.

A stochastic process $\left\{y_{t}\right\}$ having a finite mean and variance is covariance stationary (weakly stationary) if
(1) Mean (or expectation) is the same for each period:
$E\left(y_{t}\right)=\mu \text { for all } t$
(2) Variance (variability) is the same for each period:
$\operatorname{var}\left(y_{t}\right)=E\left[\left(y_{t}-\mu\right)^{2}\right]=\sigma_{y}^{2} \text { for all } t$
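(3) Lag-k autocovariance depends only on $k$ :
$\gamma_{k}=\operatorname{cov}\left(y_{t}, y_{t-k}\right)=E\left[\left(y_{t}-\mu\right)\left(y_{t-k}-\mu\right)\right] \text { for all } t \text { and any } k$
None of these quantities depends on $t$ ; they are all constants.

Lag-k autocorrelation (or serial correlation)
$\rho_{k} \equiv \frac{\operatorname{cov}\left(y_{t}, y_{t-k}\right)}{\operatorname{var}\left(y_{t}\right)}=\frac{\gamma_{k}}{\gamma_{0}}$
In multivariate models, autocovariance refers to the covariance between $y_t$ and its own lags, whereas covariance refers to the covariance between one series and another. In univariate time series models there is no ambiguity.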

MA(q) models

$x_{t}=\mu+\sum_{j=0}^{q} \beta_{j} \epsilon_{t-j}, \quad \mathrm{MA}(\mathrm{q}) \text { model. }$

$\beta_{0}$ is always set to be unity for normalization.

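Applying the definition of stationarity shows that an MA(q) process is stationary.

If the coefficients are absolutely summable, then they are also square summable.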
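
To see why the definition is satisfied, note that the autocovariances of an MA(q) process are $\gamma_{k}=\sigma^{2} \sum_{j=0}^{q-k} \beta_{j} \beta_{j+k}$ for $0 \leq k \leq q$ and zero for $k>q$ , none of which depends on $t$ . A minimal sketch of that formula (the function names are hypothetical):

```python
def ma_autocovariance(betas, sigma2, k):
    """gamma_k of an MA(q) process; betas = [beta_0, ..., beta_q] with beta_0 = 1."""
    k = abs(k)
    q = len(betas) - 1
    if k > q:
        return 0.0  # the autocovariance cuts off after lag q
    return sigma2 * sum(betas[j] * betas[j + k] for j in range(q - k + 1))


def ma_autocorrelation(betas, sigma2, k):
    """rho_k = gamma_k / gamma_0."""
    return ma_autocovariance(betas, sigma2, k) / ma_autocovariance(betas, sigma2, 0)


if __name__ == "__main__":
    betas = [1.0, 0.5]  # an MA(1) with beta_1 = 0.5
    print([ma_autocovariance(betas, 2.0, k) for k in range(4)])
    print(ma_autocorrelation(betas, 2.0, 1))
```

The cut-off at lag $q$ is the feature later used to read the MA order off the sample ACF.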

AR(p) models

$y_{t}=\phi_{0}+\sum_{j=1}^{p} \phi_{j} y_{t-j}+\epsilon_{t}, \quad \mathrm{AR}(\mathrm{p}) \text { model. }$

Without initial conditions, the general solution to the AR(1) model $y_{t}=\phi_{0}+\phi_{1} y_{t-1}+\epsilon_{t}$ (Eq.(6)) is:
$y_{t}=A \phi_{1}^{t}+\frac{\phi_{0}}{1-\phi_{1}}+\sum_{j=0}^{\infty} \phi_{1}^{j} \epsilon_{t-j}, \quad \text { if }\left|\phi_{1}\right|<1 .$

Stationarity conditions for the AR(1) process

The characteristic root $\phi_{1}$ must be less than unity in absolute value.
The homogeneous solution $A \phi_{1}^{t}$ must be zero. Either the sequence must have started infinitely far in the past (so that $\phi_{1}^{t} \approx 0$ ) or the process must always be in equilibrium (so that $A = 0$ ).

Or, more generally:

Stationarity conditions for AR(p) processes

$\left|\alpha_{j}\right|<1$ for all $j=1, \cdots, p$ .
The homogeneous solution must be zero. Either the sequence must have started infinitely far in the past or the process must always be in equilibrium (so that the arbitrary constants are zero).

ARMA(p,q) models

$y_{t}=\phi_{0}+\sum_{j=1}^{p} \phi_{j} y_{t-j}+\sum_{j=0}^{q} \beta_{j} \epsilon_{t-j}, \quad \operatorname{ARMA}(\mathrm{p}, \mathrm{q}) \text { model. }$

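Particular solution:
$y_{t}=c+\sum_{j=0}^{\infty} c_{j} \epsilon_{t-j}$

Adding the particular solution to the general solution of the homogeneous equation gives the solution of the ARMA(p,q) model.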

ACF

The plot of $\gamma_{k}$ against $k$ is called the autocovariance function.
ACF: The plot of $\rho_{k}$ against $k$ is called the autocorrelation function (ACF) or correlogram.

$\rho_{k} \equiv \frac{\operatorname{cov}\left(y_{t}, y_{t-k}\right)}{\operatorname{var}\left(y_{t}\right)}=\frac{\gamma_{k}}{\gamma_{0}}$

Yule-Walker equations

Used to analyze the ACF.

The first $p$ Yule-Walker equations determine the initial conditions.

The key point is that $\left\{\gamma_{k}\right\}$ and $\left\{\rho_{k}\right\}$ eventually will satisfy the homogeneous equation of this $A R (p)$ process.
ACF should converge to zero geometrically if the series is stationary.

Pay close attention to how each term relates to the others, and do not drop any.

For a zero-mean AR(2) process, form the Yule-Walker equations :
$\begin{aligned} E\left(y_{t} y_{t}\right) &=\phi_{1} E\left(y_{t-1} y_{t}\right)+\phi_{2} E\left(y_{t-2} y_{t}\right)+E\left(\epsilon_{t} y_{t}\right) \\ &\Rightarrow \gamma_{0}=\phi_{1} \gamma_{1}+\phi_{2} \gamma_{2}+\sigma^{2} \\ E\left(y_{t} y_{t-1}\right) &=\phi_{1} E\left(y_{t-1} y_{t-1}\right)+\phi_{2} E\left(y_{t-2} y_{t-1}\right)+E\left(\epsilon_{t} y_{t-1}\right) \\ &\Rightarrow \gamma_{1}=\phi_{1} \gamma_{0}+\phi_{2} \gamma_{1} \\ E\left(y_{t} y_{t-k}\right) &=\phi_{1} E\left(y_{t-1} y_{t-k}\right)+\phi_{2} E\left(y_{t-2} y_{t-k}\right)+E\left(\epsilon_{t} y_{t-k}\right) \\ &\Rightarrow \gamma_{k}=\phi_{1} \gamma_{k-1}+\phi_{2} \gamma_{k-2} \quad \text { for } k \geq 2 \end{aligned}$
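
Dividing the Yule-Walker equations by $\gamma_0$ gives $\rho_1=\phi_1/(1-\phi_2)$ and $\rho_k=\phi_1\rho_{k-1}+\phi_2\rho_{k-2}$ for $k \geq 2$ , and the first equation gives $\gamma_0=\sigma^2/(1-\phi_1\rho_1-\phi_2\rho_2)$ . The sketch below evaluates these for a stationary, zero-mean AR(2); the function names are hypothetical.

```python
def ar2_acf(phi1, phi2, max_lag):
    """Theoretical ACF [rho_0, ..., rho_max_lag] of a stationary AR(2)."""
    rho = [1.0, phi1 / (1.0 - phi2)]  # rho_1 from the second Yule-Walker equation
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])  # homogeneous recursion
    return rho[: max_lag + 1]


def ar2_variance(phi1, phi2, sigma2):
    """gamma_0 from gamma_0 = phi1*gamma_1 + phi2*gamma_2 + sigma^2."""
    rho = ar2_acf(phi1, phi2, 2)
    return sigma2 / (1.0 - phi1 * rho[1] - phi2 * rho[2])


if __name__ == "__main__":
    print(ar2_acf(0.5, 0.3, 6))       # decays geometrically toward zero
    print(ar2_variance(0.5, 0.3, 1.0))
```

The first two values act as the initial conditions; every later $\rho_k$ follows the homogeneous equation of the process.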

Partial autocorrelation function (PACF)

$\begin{aligned} y_{t} &=\phi_{1,0}+\phi_{1,1} y_{t-1}+e_{1 t} \\ y_{t} &=\phi_{2,0}+\phi_{2,1} y_{t-1}+\phi_{2,2} y_{t-2}+e_{2 t} \\ y_{t} &=\phi_{3,0}+\phi_{3,1} y_{t-1}+\phi_{3,2} y_{t-2}+\phi_{3,3} y_{t-3}+e_{3 t} \\ y_{t} &=\phi_{4,0}+\phi_{4,1} y_{t-1}+\phi_{4,2} y_{t-2}+\phi_{4,3} y_{t-3}+\phi_{4,4} y_{t-4}+e_{4 t} \\ & \vdots \end{aligned}$

$\left\{\phi_{k, k}, k \geq 1\right\}$ is the partial autocorrelation function.

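$e_{it}$ is not necessarily white noise.
$e_{it}$ is uncorrelated with the regressors $y_{t-1}, y_{t-2}, \cdots, y_{t-i}$ .

We can use the Yule-Walker equations to derive the PACF from the ACF.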
$\begin{aligned} E\left(y_{t} y_{t-1}\right) &=\phi_{2,1} E\left(y_{t-1} y_{t-1}\right)+\phi_{2,2} E\left(y_{t-2} y_{t-1}\right)+E\left(y_{t-1} e_{2 t}\right) \\ & \Rightarrow \gamma_{1}=\phi_{2,1} \gamma_{0}+\phi_{2,2} \gamma_{1} \Rightarrow \rho_{1}=\phi_{2,1}+\phi_{2,2} \rho_{1} \\ E\left(y_{t} y_{t-2}\right) &=\phi_{2,1} E\left(y_{t-1} y_{t-2}\right)+\phi_{2,2} E\left(y_{t-2} y_{t-2}\right)+E\left(y_{t-2} e_{2 t}\right) \\ & \Rightarrow \gamma_{2}=\phi_{2,1} \gamma_{1}+\phi_{2,2} \gamma_{0} \Rightarrow \rho_{2}=\phi_{2,1} \rho_{1}+\phi_{2,2} \end{aligned}$
Thus $\phi_{2,2}=\frac{\rho_{2}-\rho_{1}^{2}}{1-\rho_{1}^{2}}$ . For any $\phi_{k, k}, k \geq 1$ , a similar procedure works.
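
Solving a new Yule-Walker system for every $k$ gets tedious. One common shortcut, swapped in here and not taken from the notes, is the Durbin-Levinson recursion, which produces $\phi_{k,k}$ from $\rho_1,\cdots,\rho_k$ step by step. A minimal sketch with a hypothetical function name:

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion.

    rho = [rho_0 = 1, rho_1, ..., rho_K]; returns [phi_11, phi_22, ..., phi_KK].
    """
    K = len(rho) - 1
    pacf = []
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1} of the previous regression
    for k in range(1, K + 1):
        num = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        # Update the lower-order coefficients: phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        cur = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
        cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf


if __name__ == "__main__":
    ar1_rho = [0.6 ** k for k in range(5)]  # ACF of an AR(1) with phi_1 = 0.6
    print(pacf_from_acf(ar1_rho))            # first entry 0.6, the rest near zero
```

For $k=2$ the recursion reduces to $\phi_{2,2}=(\rho_2-\rho_1^2)/(1-\rho_1^2)$ , the expression derived above.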

$y_{t}=\phi_{0}+\sum_{j=1}^{p} \phi_{j} y_{t-j}+\sum_{j=0}^{q} \beta_{j} \epsilon_{t-j}, \quad \operatorname{ARMA}(\mathrm{p}, \mathrm{q}) \text { model. }$
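ACF, PACF

Assuming a series is stationary, we can use the sample mean, variance, ACF, and PACF to estimate the parameters of the actual data-generating process.
The sample ACF and sample PACF can be compared with various theoretical functions to help identify the actual nature of the data-generating process.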

Sample mean : $\bar{y}=\frac{\sum_{t=1}^{T} y_{t}}{T}$
Sample variance : $\widehat{\sigma}_{y}^{2}=\frac{\sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2}}{T}$
Lag-k sample autocorrelation:
$\widehat{\rho}_{k}=\frac{\sum_{t=k+1}^{T}\left(y_{t}-\bar{y}\right)\left(y_{t-k}-\bar{y}\right)}{\sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2}} \quad k \geq 1 .$
The statistics $\left\{\widehat{\rho}_{1}, \widehat{\rho}_{2}, \ldots\right\}$ are called the sample ACF of $\left\{y_{t}\right\}$
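
The sample statistics above translate directly into code. A minimal sketch in plain Python (the function name is hypothetical; in practice a statistics package would normally be used):

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations [rho_hat_1, ..., rho_hat_max_lag] as defined above."""
    T = len(y)
    ybar = sum(y) / T
    denom = sum((v - ybar) ** 2 for v in y)  # T times the sample variance
    acf = []
    for k in range(1, max_lag + 1):
        # Python indices t = k, ..., T-1 correspond to t = k+1, ..., T in the formula.
        num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, T))
        acf.append(num / denom)
    return acf


if __name__ == "__main__":
    print(sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], 2))
```

Note that the denominator always uses all $T$ observations, while the numerator has only $T-k$ terms.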

Identifying the order of an ARMA(p,q) model

t test

Tests the ACF, i.e. determines the MA order $q$ .

For a given positive integer $k$ , test $H_{0}: \rho_{k}=0$ against $H_{1}: \rho_{k} \neq 0$

If $\left\{y_{t}\right\}$ is a stationary Gaussian series satisfying $\rho_{j}=0$ for $j \geq k$ (i.e., if $\left\{y_{t}\right\}$ is an $\mathrm{MA}(\mathrm{k}-1)$ with normally distributed $\left\{\epsilon_{t}\right\}$ ), then $\widehat{\rho}_{k}$ is asymptotically normal with mean zero and variance $\frac{1+2 \sum_{j=1}^{k-1} \rho_{j}^{2}}{T}$ . Therefore, the test statistic is
$\mathrm{t} \text { ratio }=\frac{\widehat{\rho}_{k}}{\sqrt{\left(1+2 \sum_{j=1}^{k-1} \widehat{\rho}_{j}^{2}\right) / T}} \stackrel{\mathcal{D}}{\longrightarrow} \mathcal{N}(0,1), \quad \text { as } T \rightarrow \infty$
where $\stackrel{\mathcal{D}}{\longrightarrow}$ denotes “converge in distribution”.
If $\left\{y_{t}\right\}$ is an i.i.d. sequence satisfying $\operatorname{var}\left(y_{t}\right)<\infty$ (i.e., if $\left\{y_{t}\right\}$ is an i.i.d. $\mathrm{MA}(0)$ ), then $\widehat{\rho}_{k}$ is asymptotically normal with mean zero and variance $\frac{1}{T}$ for any $k \geq 1$ . Thus $t$ ratio $=\frac{\widehat{\rho}_{k}}{\sqrt{1 / T}} \stackrel{\mathcal{D}}{\longrightarrow} \mathcal{N}(0,1)$ , for sufficiently large $T$ .
$\left\{\widehat{\rho}_{1}, \widehat{\rho}_{2}, \ldots\right\}$ will be calculated along with the acceptance intervals (significance level $\%$ ) under this i.i.d. assumption, once you put the data into Eviews or MATLAB.

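Testing procedure

Reject $H_0$ if $|t|>1.96$ ; do not reject $H_0$ if $|t|<1.96$ (5% significance level).

$H_0: \rho_1=0$ against $H_1: q\ge1$ . $t=\frac{\widehat{\rho}_{1}}{\sqrt{1 / T}}$
If the null hypothesis is rejected, continue testing:
$H_0: \rho_2=0$ against $H_1: q\ge2$ . $t \text { ratio }=\frac{\widehat{\rho}_{2}}{\sqrt{\left(1+2 \widehat{\rho}_{1}^{2}\right) / T}}$
Continue until the null hypothesis is not rejected. If the first non-rejection occurs at lag $k$ , the MA order is $q=k-1$ .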
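
The sequential procedure can be sketched as a loop over lags, using the variance $\left(1+2 \sum_{j=1}^{k-1} \widehat{\rho}_{j}^{2}\right) / T$ for the lag-$k$ ratio. The function names and the example numbers are made up for illustration.

```python
import math


def acf_t_ratios(rho_hat, T):
    """t ratios for rho_hat = [rho_hat_1, ..., rho_hat_K] with sample size T."""
    ratios = []
    cum = 0.0  # running sum of squared autocorrelations at lower lags
    for r in rho_hat:
        se = math.sqrt((1.0 + 2.0 * cum) / T)
        ratios.append(r / se)
        cum += r * r
    return ratios


def identify_ma_order(rho_hat, T, crit=1.96):
    """Return q = k - 1 at the first lag k whose t ratio is not significant.

    Returns None if every supplied lag is significant.
    """
    for k, t in enumerate(acf_t_ratios(rho_hat, T), start=1):
        if abs(t) <= crit:
            return k - 1
    return None


if __name__ == "__main__":
    rho_hat = [0.5, 0.1, 0.0]  # illustrative sample ACF values
    print(acf_t_ratios(rho_hat, 100))
    print(identify_ma_order(rho_hat, 100))
```

With these illustrative numbers the lag-1 ratio is large and the lag-2 ratio is not, so the procedure points to an MA(1).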

Joint Test (Ljung-Box Q-statistics)

Jointly tests a group of autocorrelations, to help judge $q$ .

Significance test for a group of autocorrelations: $H_{0}: \rho_{1}=\cdots=\rho_{m}=0$ against $H_{1}: \rho_{i} \neq 0$ for some $1 \leq i \leq m$ .
Under the assumption that $\left\{y_{t}\right\}$ is an i.i.d. sequence with certain moment conditions,
$Q(m)=T(T+2) \sum_{k=1}^{m} \frac{\widehat{\rho}_{k}^{2}}{T-k} \stackrel{\mathcal{D}}{\longrightarrow} \chi_{m}^{2}$
Decision rule : reject $H_{0}$ if $Q(m)>\chi_{m}^{2}(\alpha)$ , where $\chi_{m}^{2}(\alpha)$ denotes the $100(1-\alpha)$ th percentile of a chi-squared distribution with $m$ degrees of freedom.

The Q test can also be used to check whether the residuals are white noise, but the degrees of freedom are reduced:
$Q(m)=T(T+2) \sum_{k=1}^{m} \frac{\tilde{\rho}_{k}^{2}}{T-k} \stackrel{\mathcal{D}}{\longrightarrow} \chi_{m-g}^{2},$
where $\tilde{\rho}_{k}$ is the sample ACF of estimation residuals, and $g$ denotes the number of parameters estimated in the model.
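
A minimal sketch of the Ljung-Box statistic in its standard form $Q(m)=T(T+2) \sum_{k=1}^{m} \widehat{\rho}_{k}^{2} /(T-k)$ . The critical value is passed in by hand because the standard library has no chi-squared quantile function; 5.991 is the 5% critical value for 2 degrees of freedom. Function names and the sample numbers are hypothetical.

```python
def ljung_box_q(rho_hat, T):
    """Q(m) for rho_hat = [rho_hat_1, ..., rho_hat_m] and sample size T."""
    return T * (T + 2) * sum(r * r / (T - k) for k, r in enumerate(rho_hat, start=1))


def reject_white_noise(rho_hat, T, critical_value):
    """Reject H0 (all m autocorrelations are zero) when Q(m) exceeds the critical value."""
    return ljung_box_q(rho_hat, T) > critical_value


if __name__ == "__main__":
    rho_hat = [0.2, 0.1]  # illustrative sample autocorrelations, m = 2
    print(ljung_box_q(rho_hat, 100))
    print(reject_white_noise(rho_hat, 100, critical_value=5.991))
```

When the statistic is applied to residuals of a fitted model, the critical value should come from the $\chi^2_{m-g}$ distribution instead.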

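The PACF helps identify $p$

For a stationary AR(p) model:

$\widehat{\phi}_{k, k}$ converges to $\phi_{k, k}$ in probability as the sample size $T$ goes to infinity.

$\phi_{p, p}=\phi_{p}$
$\phi_{k, k}=0$ for $k > p$ .
For $k>p$ , $\widehat{\phi}_{k, k}$ is asymptotically normal with mean zero and variance $\frac{1}{T}$ .

Reject $H_0$ if $|t|>1.96$ ; do not reject $H_0$ if $|t|<1.96$ .

$H_0: \phi_{1,1}=0$ against $H_1: p\ge1$ . Reject $H_0$ if $|\widehat{\phi}_{1,1}|>\frac{1.96}{\sqrt{T}}$ .
If the null hypothesis is rejected, continue testing:
$H_0: \phi_{2,2}=0$ against $H_1: p\ge2$ . Reject $H_0$ if $|\widehat{\phi}_{2,2}|>\frac{1.96}{\sqrt{T}}$ .
Continue until the null hypothesis is not rejected; this gives the order $p$ .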

Criteria: AIC, BIC

A trade-off between degrees of freedom and residual fit.

Akaike information criterion (AIC):
$C(l)=\underbrace{\log \left(\tilde{\sigma}_{l}^{2}\right)}_{\text {goodness of fit }}+\underbrace{\frac{2 l}{T}}_{\text {penalty function }}$
where $l$ is the number of parameters estimated and $\tilde{\sigma}_{l}^{2}=\frac{S S R}{T}$ .
Bayesian information criterion (BIC) or Schwarz information criterion $(\mathrm{SBC}, \mathrm{SIC})$ :
$C(l)=\underbrace{\log \left(\tilde{\sigma}_{l}^{2}\right)}_{\text {goodness of fit }}+\underbrace{\frac{l \log (T)}{T}}_{\text {penalty function }}$
Choose the model with minimum $\mathrm{AIC}$ or $\mathrm{BIC}$ .
BIC is suited to large samples: it asymptotically delivers the correct model, whereas AIC tends to select an overparameterized model.
In small samples, AIC can work better than BIC. Under BIC, the average variation in the selected model order across different samples from a given population is greater than under AIC.
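
The two criteria differ only in the penalty term, $2l/T$ for AIC and $l \log(T)/T$ for BIC, so they can disagree. The sketch below uses made-up SSR values for two hypothetical candidate models to show such a disagreement; the function names are not from the notes.

```python
import math


def aic(ssr, T, n_params):
    """log(SSR/T) + 2*l/T, with l = n_params."""
    return math.log(ssr / T) + 2.0 * n_params / T


def bic(ssr, T, n_params):
    """log(SSR/T) + l*log(T)/T, with l = n_params."""
    return math.log(ssr / T) + n_params * math.log(T) / T


if __name__ == "__main__":
    T = 100
    small = {"ssr": 120.0, "l": 2}  # hypothetical parsimonious model
    large = {"ssr": 115.0, "l": 4}  # hypothetical richer model with slightly better fit
    for name, m in (("small", small), ("large", large)):
        print(name, aic(m["ssr"], T, m["l"]), bic(m["ssr"], T, m["l"]))
```

Here AIC is lower for the larger model while BIC is lower for the smaller one, because $\log(100) \approx 4.6$ makes the BIC penalty per parameter more than twice the AIC penalty.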

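If the two criteria suggest different models, which one should be used?

Since BIC selects the more parsimonious model, you should check that its residuals appear to be white noise.
Since AIC can select an overparameterized model, the t-statistics of all coefficients should be significant at conventional levels.
It may not be possible to find one model that clearly dominates all the others.

Checks: whether the residuals are white noise; how well the model forecasts future data.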

Box-Jenkins Approach

Box and Jenkins $(1970, 1976)$ popularized a three-stage method to estimate an ARMA model in a systematic manner.

Identification
Estimation
Model diagnostic checking
Identification

First of all, one might visually examine the time plot of the series, sample ACF and sample PACF. A comparison of the sample ACF and PACF to those of various theoretical ARMA processes may suggest several candidate models.
Model specification: AIC, BIC.

Parsimony

Similar processes can be approximated by very different models.
Common factor problem.
Each coefficient is significantly different from zero at the conventional level.

Estimation

Estimation can be done using least squares or maximum likelihood depending on the model.

AR models : least squares method or maximum likelihood method
MA and ARMA models : maximum likelihood method

That is, minimizing the residual sum of squares, or maximum likelihood estimation.

Stationarity and Invertibility

t-stats, ACF, Q-stats, $\cdots$ all assume that the process is stationary.
Be suspicious of implied roots near the unit circle.
Invertibility implies the model has an AR representation.
- No unit root in MA part of the model.
- The invertibility condition of an ARMA(p,q) process is determined entirely by its MA part.

Maximum likelihood estimation?

To be completed...

Model diagnostic checking

Residual diagnostics :
- Plot residuals : look for outliers and periods of poor fit.
- Residuals should be serially uncorrelated : examine ACF, PACF, Q-stats of residuals.
Divide sample into subperiods : forecast in sample and check the fit.
Out-of-sample forecasts

Forecasting

Conditional mean (expectation)

This result is really quite general: for any stationary ARMA model, the conditional forecast of $y_{t+j}$ converges to the unconditional mean as $j \rightarrow \infty$ .
The $j$ -step ahead forecast error is
$e_t(j)=y_{t+j}-E_ty_{t+j}$
The 95% confidence interval for the $j$ -step ahead forecast is :
$\left[E_{t} y_{t+j}-1.96 \sqrt{\operatorname{var}\left(e_{t}(j)\right)}, E_{t} y_{t+j}+1.96 \sqrt{\operatorname{var}\left(e_{t}(j)\right)}\right] \text {. }$
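
As a concrete case, take a stationary AR(1) with known parameters (an assumption made here for illustration). Then $E_t y_{t+j}=\mu+\phi_1^j(y_t-\mu)$ with $\mu=\phi_0/(1-\phi_1)$ , and $\operatorname{var}(e_t(j))=\sigma^2(1-\phi_1^{2j})/(1-\phi_1^2)$ . A minimal sketch with a hypothetical function name:

```python
import math


def ar1_forecast(phi0, phi1, sigma2, y_t, j):
    """j-step ahead forecast, forecast-error variance and 95% interval for an AR(1).

    Assumes |phi1| < 1 and treats the parameters as known.
    """
    mu = phi0 / (1.0 - phi1)                       # unconditional mean
    mean = mu + phi1 ** j * (y_t - mu)             # conditional forecast E_t y_{t+j}
    var = sigma2 * (1.0 - phi1 ** (2 * j)) / (1.0 - phi1 ** 2)
    half = 1.96 * math.sqrt(var)
    return mean, var, (mean - half, mean + half)


if __name__ == "__main__":
    for j in (1, 2, 5, 50):
        print(j, ar1_forecast(1.0, 0.5, 1.0, 4.0, j))
```

As $j$ grows the forecast approaches the unconditional mean and the interval width approaches that implied by the unconditional variance $\sigma^2/(1-\phi_1^2)$ .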

test whether a forecast is accurate or not

We want the forecast errors to be small!

If there are $\mathrm{H}$ observations in the holdback periods, and $\left\{e_{j}\right\}_{j=1}^{H}$ are the forecast errors from the candidate model :

Mean squared prediction error: MSPE $=\frac{1}{H} \sum_{j=1}^{H} e_{j}^{2}$ . It is also called mean squared error (MSE).
Mean absolute error: $\mathrm{MAE}=\frac{1}{H} \sum_{j=1}^{H}\left|e_{j}\right|$ .
Mean absolute percentage error: MAPE $=\frac{1}{H} \sum_{j=1}^{H}\left|\frac{e_{j}}{y_{T+j}}\right| \cdot 100$ .

Many researchers would select the model with the smallest MSPE (or MAE, MAPE).
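
The three accuracy measures are straightforward to compute from holdback-period forecast errors. A minimal sketch with hypothetical function names and made-up numbers:

```python
def mspe(errors):
    """Mean squared prediction error."""
    return sum(e * e for e in errors) / len(errors)


def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def mape(errors, actuals):
    """Mean absolute percentage error; actuals are the realized y_{T+j} (must be nonzero)."""
    return 100.0 * sum(abs(e / y) for e, y in zip(errors, actuals)) / len(errors)


if __name__ == "__main__":
    errors = [1.0, -2.0, 3.0]      # illustrative forecast errors
    actuals = [10.0, 20.0, 30.0]   # illustrative realized values
    print(mspe(errors), mae(errors), mape(errors, actuals))
```

MSPE penalizes large errors more heavily than MAE, so the two can rank candidate models differently.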

Diebold-Mariano Test

Let the loss from a forecast error in period $j$ be denoted by $g\left(e_{j}\right)$ . In the typical case of mean squared errors, the loss is $e_{j}^{2}$
We can write the differential loss in period $j$ from using model 1 versus model 2 as $d_{j}=g\left(e_{1 j}\right)-g\left(e_{2 j}\right)$ . The mean loss can be obtained as
$\bar{d}=\frac{1}{H} \sum_{j=1}^{H}\left[g\left(e_{1 j}\right)-g\left(e_{2 j}\right)\right]$
Under the null hypothesis of equal forecast accuracy,
- $H_0:\quad E(\bar{d})=E\left(d_{j}\right)=0$
- $H_1: \quad E(\bar{d})>0$ (model 2 is better) or $E(\bar{d})<0$ (model 1 is better)

Under fairly weak conditions, the central limit theorem implies that $\bar{d} \stackrel{\mathcal{D}}{\longrightarrow} \mathcal{N}(0, \operatorname{var}(\bar{d}))$ , as $H \rightarrow \infty$ , under the null hypothesis.

If the $\left\{d_{j}\right\}$ series is serially uncorrelated with a sample variance of $\hat{\gamma}$ , the estimator of $\operatorname{var}(\bar{d})$ is simply $\frac{\widehat{\gamma}}{H-1}$ . The expression
$\frac{\bar{d}}{\sqrt{\hat{\gamma} /(H-1)}} \stackrel{\mathcal{D}}{\longrightarrow} \mathcal{N}(0,1), \quad \text { as } \quad H \rightarrow \infty,$
under the null hypothesis.
If the $\left\{d_{j}\right\}$ series is serially correlated, there is a very large literature on the best way to estimate $\operatorname{var}(\bar{d})$ in the presence of serial correlation (e.g., the Newey-West estimator of the variance proposed in Newey and West (1987)).
$\frac{\bar{d}}{\sqrt{\widehat{\operatorname{var}}(\bar{d})}} \stackrel{\mathcal{D}}{\longrightarrow} \mathcal{N}(0,1), \quad \text { as } \quad H \rightarrow \infty$
under the null hypothesis, where $\widehat{\operatorname{var}}(\bar{d})$ is an appropriate estimator of $\operatorname{var}(\bar{d})$ .

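Two-sided test: critical value 1.96 (5% level).

One-sided test: critical value 1.645 (5% level).

$H_1: \quad E(\bar{d})>0$ (model 2 is better): reject $H_0$ if the statistic exceeds 1.645.

$H_1: \quad E(\bar{d})<0$ (model 1 is better): reject $H_0$ if the statistic is below -1.645.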
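
For the serially uncorrelated case, the Diebold-Mariano statistic $\bar{d} / \sqrt{\hat{\gamma} /(H-1)}$ can be sketched as follows. The sketch assumes squared-error loss by default and takes $\hat{\gamma}$ to be the sample variance of $d_j$ with divisor $H$ ; both are assumptions made here, and the function name and numbers are hypothetical.

```python
import math


def dm_statistic(e1, e2, loss=lambda e: e * e):
    """Diebold-Mariano statistic for serially uncorrelated loss differentials.

    e1, e2: forecast errors of model 1 and model 2 over the same H periods.
    """
    d = [loss(a) - loss(b) for a, b in zip(e1, e2)]
    H = len(d)
    dbar = sum(d) / H
    gamma_hat = sum((x - dbar) ** 2 for x in d) / H  # sample variance of d_j (divisor H)
    return dbar / math.sqrt(gamma_hat / (H - 1))


if __name__ == "__main__":
    e1 = [1.0, 2.0, 3.0, 2.0]  # illustrative errors from model 1
    e2 = [1.0, 1.0, 2.0, 1.0]  # illustrative errors from model 2
    stat = dm_statistic(e1, e2)
    print(stat, "model 2 better at 5% (one-sided)" if stat > 1.645 else "no evidence for model 2")
```

A positive statistic means model 1 has the larger average loss. With serially correlated $d_j$ the denominator would need a robust variance estimator instead, which this sketch does not implement.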

time series with trend

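Detrend by differencing (log difference).
Or fit a regression equation directly.

Periodic or seasonal ARMA model

The fitted model should account for both the seasonal and the nonseasonal patterns in the data.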

Chapter 4 Modeling Volatility

the volatility equation:
$\sigma_{t}^{2}=\operatorname{var}\left(y_{t} \mid \mathcal{F}_{t-1}\right)=\operatorname{var}\left(\epsilon_{t} \mid \mathcal{F}_{t-1}\right)$

Volatility $\sigma_{t}$ : the conditional standard deviation of $y_{t}$ based on a past information set $\mathcal{F}_{t-1}$ .

ARCH(q): autoregressive conditional heteroskedasticity

A natural idea is to model $\epsilon_{t}^{2}$ using an $A R (q)$ process:
$\begin{aligned} \epsilon_{t}^{2} &=\alpha_{0}+\alpha_{1} \epsilon_{t-1}^{2}+\cdots+\alpha_{q} \epsilon_{t-q}^{2}+\eta_{t} \\ \Rightarrow \sigma_{t}^{2} &=\alpha_{0}+\alpha_{1} \epsilon_{t-1}^{2}+\cdots+\alpha_{q} \epsilon_{t-q}^{2} \end{aligned}$
The second equation is called an autoregressive conditional heteroskedasticity $(\mathbf{A R C H})$ model of order $q$ .

ARCH (q) model
$\begin{aligned} y_{t} &=E\left(y_{t} \mid \mathcal{F}_{t-1}\right)+\epsilon_{t}, \quad \epsilon_{t}=\sigma_{t} v_{t} \\ \sigma_{t}^{2} &=\alpha_{0}+\alpha_{1} \epsilon_{t-1}^{2}+\cdots+\alpha_{q} \epsilon_{t-q}^{2} \end{aligned}$

This is an ARCH (q) model.
$\alpha_{0}>0$ , and $\alpha_{i} \geq 0$ for $i > 0$ for positiveness.
$\sum_{i=1}^{q} \alpha_{i}<1$ for stationarity.
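$\left\{v_{t}\right\}$ is a sequence of i.i.d. random variables with mean 0 and variance 1 .

The covariance Cov(·,·) captures only linear dependence.

Corr(·,·) indicates whether two variables are correlated; independence is the stronger condition $p(x, y)=p(x) p(y)$ .

Make good use of the law of total expectation: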
$\begin{aligned} E\left(\epsilon_{t}\right) &=E\left[E\left(\epsilon_{t} \mid \mathcal{F}_{t-1}\right)\right]=0 \\ E\left(\epsilon_{t} \epsilon_{t-j}\right) &=E\left[E\left(\epsilon_{t} \epsilon_{t-j} \mid \mathcal{F}_{t-1}\right)\right]=E\left[\epsilon_{t-j} E\left(\epsilon_{t} \mid \mathcal{F}_{t-1}\right)\right]=0 \quad j \geq 1 \end{aligned}$

$\epsilon_t^2$

Assuming stationarity $\left(\alpha_{1}+\cdots+\alpha_{q}<1\right)$
$\begin{aligned} \operatorname{var}\left(\epsilon_{t}\right) &=E\left(\epsilon_{t}^{2}\right)=E\left[E\left(\epsilon_{t}^{2} \mid \mathcal{F}_{t-1}\right)\right]=E\left(\sigma_{t}^{2}\right) \\ &=\alpha_{0}+\alpha_{1} E\left(\epsilon_{t-1}^{2}\right)+\cdots+\alpha_{q} E\left(\epsilon_{t-q}^{2}\right) \end{aligned}$
which implies that
$\operatorname{var}\left(\epsilon_{t}\right)=E\left(\sigma_{t}^{2}\right)=\frac{\alpha_{0}}{1-\alpha_{1}-\cdots-\alpha_{q}}$
The error $\left\{\epsilon_{t}\right\}$ is uncorrelated and stationary with mean zero and constant unconditional variance (with constraints to $\mathrm{ARCH}$ parameters).

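$\epsilon_t$ is white noise: uncorrelated, but dependent. The dependence is nonlinear and shows up in $\epsilon_t^2$ .

An ARCH model can describe both the stationarity and the volatility of a series.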

Heavy tails

long tail / fat tail/ heavy tail

Kurtosis of a random variable $y$ is defined to be $\frac{E\left[(y-E(y))^{4}\right]}{[\operatorname{var}(y)]^{2}}$ . For example,
the kurtosis of a normal distribution is $3 .$
the kurtosis of a student’s $\mathrm{t}$ distribution with $\nu$ degrees of freedom is $\frac{6}{\nu-4}+3$ , for $\nu>4$ .
Kurtosis identifies whether the tails of a given distribution contain extreme values.
Excess kurtosis=kurtosis-3 defines how heavily the tails of a distribution differ from the tails of a normal distribution.

ARCH heavy tail

Advantages

Simplicity
$\mathrm{ARCH}$ can model the volatility clustering effect since the conditional variance is autoregressive. Such models can be used to forecast volatility.
Heavy tails (high kurtosis)

Weaknesses

Symmetric between positive & negative prior shocks
Restrictive on parameter space

Idea : $\mathrm{ARCH}$ is like an $\mathrm{AR}$ model for volatility. $\mathrm{GARCH}$ is like an ARMA model for volatility.

$G A R C H (p, q)$ model :

$\begin{aligned} y_{t} &=E\left(y_{t} \mid \mathcal{F}_{t-1}\right)+\epsilon_{t} \quad \epsilon_{t}=\sigma_{t} v_{t} \\ \sigma_{t}^{2} &=\alpha_{0}+\sum_{i=1}^{q} \alpha_{i} \epsilon_{t-i}^{2}+\sum_{j=1}^{p} \beta_{j} \sigma_{t-j}^{2} \end{aligned}$

$\alpha_{0}>0$ , and $\alpha_{i} \geq 0, \beta_{j} \geq 0$ for $i, j > 0$ ensure positiveness.
$\sum_{i=1}^{\max (p, q)}\left(\alpha_{i}+\beta_{i}\right)<1$ ensures stationarity (proof below).
$\left\{v_{t}\right\}$ is a sequence of i.i.d. r.v. with mean 0 and variance 1 .

Re-parameterization :
Let $\eta_{t}=\epsilon_{t}^{2}-\sigma_{t}^{2}$ . Then $\left\{\eta_{t}\right\}$ is an uncorrelated series. The $\mathrm{GARCH}$ model becomes
$\epsilon_{t}^{2}=\alpha_{0}+\sum_{i=1}^{\max (p, q)}\left(\alpha_{i}+\beta_{i}\right) \epsilon_{t-i}^{2}+\eta_{t}-\sum_{j=1}^{p} \beta_{j} \eta_{t-j}$
This is an ARMA form for the squared series $\epsilon_{t}^{2}$ .

The error $\left\{\epsilon_{t}\right\}$ is uncorrelated and stationary with mean zero and finite unconditional variance,
$\begin{aligned} E\left(\epsilon_{t} \mid \mathcal{F}_{t-1}\right) &=0, \quad E\left(\epsilon_{t}\right)=0, \quad E\left(\epsilon_{t} \epsilon_{t-j}\right)=0 \quad j \geq 1 \\ \operatorname{var}\left(\epsilon_{t}\right) &=E\left(\epsilon_{t}^{2}\right)=\frac{\alpha_{0}}{1-\left(\sum_{i=1}^{q} \alpha_{i}\right)-\left(\sum_{j=1}^{p} \beta_{j}\right)} \end{aligned}$
provided that $\sum_{i=1}^{\max (p, q)}\left(\alpha_{i}+\beta_{i}\right)<1$ .
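
For the GARCH(1,1) special case, the stationarity check, the unconditional variance $\alpha_0/(1-\alpha_1-\beta_1)$ , and the conditional variance recursion can be sketched as below. Starting the recursion at the unconditional variance is an assumption made here (a common but not unique choice), and the function names are hypothetical.

```python
def garch11_is_stationary(alpha1, beta1):
    """Covariance stationarity requires alpha1 + beta1 < 1."""
    return alpha1 + beta1 < 1.0


def garch11_unconditional_variance(alpha0, alpha1, beta1):
    """var(eps_t) = alpha0 / (1 - alpha1 - beta1), valid when alpha1 + beta1 < 1."""
    return alpha0 / (1.0 - alpha1 - beta1)


def garch11_variance_path(eps, alpha0, alpha1, beta1):
    """Conditional variances for a list of shocks eps.

    The first entry is the assumed start-up value (the unconditional variance);
    each later entry is alpha0 + alpha1*eps_{t-1}^2 + beta1*sigma2_{t-1}.
    The last entry is the 1-step ahead variance forecast after the final shock.
    """
    sigma2 = [garch11_unconditional_variance(alpha0, alpha1, beta1)]
    for e in eps:
        sigma2.append(alpha0 + alpha1 * e * e + beta1 * sigma2[-1])
    return sigma2


if __name__ == "__main__":
    print(garch11_is_stationary(0.1, 0.8))
    print(garch11_variance_path([1.0, 2.0], 0.1, 0.1, 0.8))
```

A large shock raises the next period's conditional variance, and the $\beta_1$ term carries that increase forward, which is how the model produces volatility clustering.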

GARCH heavy tails

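In general, a GARCH(1,1) model is sufficient to capture the volatility clustering in the data.

Besides ARCH and GARCH, there are other model forms that incorporate past information.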

Identification of $\mathrm{ARCH}$ and GARCH Models

Modeling the mean equation and testing for $\mathrm{ARCH}$ effects.
- $H_{0}$ : no $\mathrm{ARCH}$ effects versus $H_{1}: \mathrm{ARCH}$ effects.
- Use Q-statistics of squared residuals $\left\{\hat{\epsilon}_{t}^{2}\right\}$ or LM test.
Order determination :
- PACF of the squared residuals $\widehat{\epsilon}_{t}^{2}$ gives useful information about the ARCH order $q$ (see Eq.(2));
- to identify GARCH models, we use information criteria.

Testing for ARCH Effects

Analogous to the tests for an AR model.

Ljung-Box statistics

Consider testing $H_{0}:($ No $\mathrm{ARCH}) \alpha_{1}=\alpha_{2}=\cdots=\alpha_{q}=0$ against $H_{1}:(\mathrm{ARCH})$ at least one $\alpha_{i} \neq 0$

Step 1 : Compute residuals $\left\{\hat{\epsilon}_{t}\right\}$ from mean equation regression.
Step 2 : Apply the usual Ljung-Box statistics $Q (m)$ to $\left\{\hat{\epsilon}_{t}^{2}\right\}$ series.
View/Residual Diagnostics : Correlogram Squared Residuals

Engle derived a simple LM test :

Step 1 : Compute residuals $\left\{\hat{\epsilon}_{t}\right\}$ from mean equation regression.
Step 2 : Estimate auxiliary regression
$\hat{\epsilon}_{t}^{2}=\alpha_{0}+\alpha_{1} \hat{\epsilon}_{t-1}^{2}+\cdots+\alpha_{q} \hat{\epsilon}_{t-q}^{2}+\text { error }_{t}$
Obtain $R^{2} \equiv R_{A U X}^{2}$ from this regression.
Step 3 : Form the LM test statistic
$L M_{A R C H}=T \cdot R_{A U X}^{2}$
where $T =$ sample size from auxiliary regression.

Under $H_{0}$ : (No $\mathrm{ARCH}), L M_{A R C H}$ is asymptotically distributed as $\chi^{2}(q)$ .
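
The three LM steps can be sketched for $q=1$ , where the auxiliary regression has a single regressor and its $R^2$ equals the squared sample correlation between $\hat{\epsilon}_{t}^{2}$ and $\hat{\epsilon}_{t-1}^{2}$ . This restriction to one lag is a simplification made here to stay within the standard library; the function name and residuals are hypothetical. The 5% critical value of $\chi^2(1)$ is 3.841.

```python
def arch_lm_q1(resid):
    """Engle's LM statistic T * R^2 for ARCH(1), from mean-equation residuals."""
    sq = [e * e for e in resid]
    y, x = sq[1:], sq[:-1]          # regress eps_t^2 on a constant and eps_{t-1}^2
    n = len(y)                      # sample size of the auxiliary regression
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    r2 = sxy * sxy / (sxx * syy)    # R^2 of a simple regression with intercept
    return n * r2


if __name__ == "__main__":
    resid = [1.0, 2.0, 1.0, 2.0, 1.0]  # illustrative residuals with alternating magnitude
    lm = arch_lm_q1(resid)
    print(lm, "reject no-ARCH at 5%" if lm > 3.841 else "do not reject")
```

For general $q$ the auxiliary regression needs a multiple-regression routine, which is why a numerical library is normally used in practice.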

Estimation of ARCH/GARCH Models

Because the model is nonlinear, maximum likelihood estimation has to be used.

The steps involved in actually estimating an $\mathrm{ARCH}$ or GARCH model are as follows :
(1) Specify the appropriate equations for the mean and the variance - e.g. an $\mathrm{AR}(1)-\mathrm{GARCH}(1,1)$ model:
$\begin{aligned} &y_{t}=\phi_{0}+\phi_{1} y_{t-1}+\epsilon_{t}, \epsilon_{t}=\sigma_{t} v_{t}, v_{t} \stackrel{i . i . d}{\sim} \mathcal{N}(0,1), \\ &\sigma_{t}^{2}=\alpha_{0}+\alpha_{1} \epsilon_{t-1}^{2}+\beta_{1} \sigma_{t-1}^{2} . \end{aligned}$
(2) Specify the log-likelihood function to maximize:
$L=-\frac{T}{2} \log (2 \pi)-\frac{1}{2} \sum_{t=1}^{T} \log \left(\sigma_{t}^{2}\right)-\frac{1}{2} \sum_{t=1}^{T} \frac{\left(y_{t}-\phi_{0}-\phi_{1} y_{t-1}\right)^{2}}{\sigma_{t}^{2}} .$
(3) The computer will maximize the function and give estimates and their standard errors.
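
Step (2) can be made concrete by evaluating the Gaussian log-likelihood at given parameter values. The sketch conditions on the first observation and starts $\sigma_t^2$ at the unconditional variance; both start-up choices are assumptions made here, and the maximization in step (3) is left to a numerical optimizer. The function name is hypothetical.

```python
import math


def ar1_garch11_loglik(y, phi0, phi1, alpha0, alpha1, beta1):
    """Conditional Gaussian log-likelihood of an AR(1)-GARCH(1,1) model.

    Requires alpha1 + beta1 < 1 for the start-up variance to exist.
    """
    # Mean-equation residuals, conditioning on the first observation.
    eps = [y[t] - phi0 - phi1 * y[t - 1] for t in range(1, len(y))]
    sigma2 = alpha0 / (1.0 - alpha1 - beta1)  # assumed start-up value
    loglik = 0.0
    for e in eps:
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(sigma2) + e * e / sigma2)
        sigma2 = alpha0 + alpha1 * e * e + beta1 * sigma2  # variance for the next period
    return loglik


if __name__ == "__main__":
    y = [0.0, 1.0, 0.5, -0.3, 0.8]  # illustrative data
    print(ar1_garch11_loglik(y, 0.0, 0.5, 0.1, 0.1, 0.8))
```

Setting $\alpha_1=\beta_1=0$ collapses the expression to the ordinary Gaussian log-likelihood with constant variance $\alpha_0$ , which is a convenient sanity check.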

Model Checking

For a properly specified $\mathrm{ARCH} / \mathrm{GARCH}$ model, the standardized residuals
$\widehat{v}_{t}=\frac{\widehat{\epsilon}_{t}}{\widehat{\sigma}_{t}}$
should behave like an i.i.d. sequence with mean 0 and variance 1 .

If the mean equation is adequate
- (i.e. the serial correlation in $y_{t}$ is completely captured),
- the residuals $\left\{\widehat{\epsilon}_{t}\right\}$ should behave as a white noise process.
- Consequently, the standardized residuals $\left\{\widehat{v}_{t}\right\}$ should also behave as a white noise process.
If the volatility equation is adequate
- (i.e. the dependence in $\epsilon_{t}^{2}$ is completely captured),
- the squared standardized residuals $\left\{\widehat{v}_{t}^{2}=\frac{\hat{\epsilon}_{t}^{2}}{\widehat{\sigma}_{t}^{2}}\right\}$ should be uncorrelated across time.

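In summary

The Ljung-Box statistics of $\left\{\widehat{v}_{t}\right\}$ can be used to check the adequacy of the mean equation.
The Ljung-Box statistics of $\left\{\widehat{v}_{t}^{2}=\frac{\hat{\epsilon}_{t}^{2}}{\widehat{\sigma}_{t}^{2}}\right\}$ can be used to check the adequacy of the volatility equation.
The skewness, kurtosis, and QQ plot of $\left\{\widehat{v}_{t}\right\}$ can be used to check the validity of the distributional assumption. (?)

Still unclear to me: what the heavy-tail property is used for, and the point above.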

Forecasting Variances using ARCH Models

Interval Forecasting

Confidence bands for the 1-step ahead forecast at the forecast origin $\mathrm{t}$

The conditional distribution of $y_{t+1}$ given the information available at time $t$ is $D\left[E\left(y_{t+1} \mid \mathcal{F}_{t}\right), \operatorname{var}\left(y_{t+1} \mid \mathcal{F}_{t}\right)\right]$ , where $D$ depends on the distribution of $v_{t}$ .
$\operatorname{var}\left(y_{t+1} \mid \mathcal{F}_{t}\right)=\sigma_{t+1}^{2}$
Under normality assumption, 95% confidence interval of the prediction is $\left[E\left(y_{t+1} \mid \mathcal{F}_{t}\right)-1.96 \sigma_{t+1}, E\left(y_{t+1} \mid \mathcal{F}_{t}\right)+1.96 \sigma_{t+1}\right]$ .