Time Series Analysis 15: Cointegrated Series (Part 1)


Magic Ktwc37


Category: Time Series Analysis. Tags: python, data analysis, finance, time series analysis, cointegration, time series, stationary process

Article link: https://blog.csdn.net/weixin_43171270/article/details/110821031


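This article introduces the concept of cointegrated series in financial time series analysis and stresses why stationarity matters for the analysis. It generates a stationary series A and a non-stationary series B, and uses the moving average representation and the ADF test to show the difference between the two. It also covers the order of integration and explains how differencing turns a series into a stationary process. Finally, using Microsoft's stock price as an example, it tests the first difference and the multiplicative returns of real data for stationarity.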


Cointegration

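In financial time series analysis we often meet two series that are tied together by an underlying economic relationship, for example the price of a derivative and its underlying asset, or the US dollar index and oil. Because of that economic link, the two series do not drift too far apart, or they stay bound by some necessary relationship.
Cointegration is one tool for studying this kind of problem.
This article first presents the theory, defining cointegration and explaining the concepts it depends on, and then gives practical examples.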

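Theory
Before discussing cointegration we need the concept of a stationary stochastic process. None of the techniques discussed earlier in this series required the data to be strictly stationary. The covariance stationarity mentioned earlier is weak stationarity, whereas "stationary process" without qualification usually means strict stationarity.
The difference between strong and weak stationarity is:
- Weak stationarity requires the first and second moments (mean, variance, covariance) to be invariant over time.
- Strong stationarity requires all statistical properties of the series to be invariant over time.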
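For reference, the weak (covariance) stationarity conditions in the first bullet can be written out as follows. The symbols $\mu$, $\sigma^2$ and $\gamma(h)$ are notation introduced here for illustration and do not come from the earlier articles.

```latex
\begin{aligned}
E[Y_t] &= \mu && \text{for all } t \\
\operatorname{Var}(Y_t) &= \sigma^2 < \infty && \text{for all } t \\
\operatorname{Cov}(Y_t, Y_{t+h}) &= \gamma(h) && \text{for all } t \text{ and every lag } h
\end{aligned}
```

Strict stationarity is the stronger requirement that the joint distribution of $(Y_{t_1}, \dots, Y_{t_k})$ is the same as that of $(Y_{t_1+h}, \dots, Y_{t_k+h})$ for every choice of time points and every shift $h$.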
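Put differently, a stationary process assumes that the parameters generating it do not change over time. Let us simulate two series, one whose generating parameters stay fixed and one whose parameters change.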

import numpy as np
import pandas as pd

import statsmodels
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint, adfuller

import matplotlib.pyplot as plt

Series A (stationary)

def generate_datapoint(params):
    mu = params[0]
    sigma = params[1]
    return np.random.normal(mu, sigma)

# Set the parameters and the number of datapoints
params = (0, 1)
T = 100

# An explicit dtype avoids an object-dtype Series in recent pandas versions
A = pd.Series(index=range(T), dtype=float)
A.name = 'A'

for t in range(T):
    A[t] = generate_datapoint(params)

plt.plot(A)
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend(['Series A']);

Series B (non-stationary)

# Set the number of datapoints
T = 100

B = pd.Series(index=range(T), dtype=float)
B.name = 'B'

for t in range(T):
    # Now the parameters are dependent on time
    # Specifically, the mean of the series changes over time
    params = (t * 0.1, 1)
    B[t] = generate_datapoint(params)

plt.plot(B)
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend(['Series B']);

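Many statistical tests for time series require the data to be stationary. If that assumption is violated, the results are unpredictable and invalid. Let us look at an example: we take the mean of the non-stationary series B.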

m = np.mean(B)

plt.plot(B)
plt.hlines(m, 0, len(B), linestyles='dashed', colors='r')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend(['Series B', 'Mean']);

The computed mean is not wrong, but it is meaningless: it does not describe the long-run behaviour of this series.
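One quick way to see why a single mean says little about a trending series is to compare the sample mean of the first half with that of the second half. The sketch below regenerates series shaped like A and B with a fixed seed; the seed and the helper `half_means` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
T = 100
t = np.arange(T)

a = rng.normal(0, 1, T)          # constant mean, like series A
b = rng.normal(0.1 * t, 1, T)    # mean grows with time, like series B

def half_means(x):
    """Return the sample means of the first and second halves of x."""
    half = len(x) // 2
    return x[:half].mean(), x[half:].mean()

a_first, a_second = half_means(a)
b_first, b_second = half_means(b)
print(f"A: first half {a_first:.2f}, second half {a_second:.2f}")
print(f"B: first half {b_first:.2f}, second half {b_second:.2f}")
```

For the stationary series the two halves should give similar means, while for the trending series they differ by roughly the drift accumulated over half the sample.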
Now let us test the series for stationarity.

def check_for_stationarity(X, cutoff=0.01):
    # H_0 in adfuller is unit root exists (non-stationary)
    # We must observe significant p-value to convince ourselves that the series is stationary
    pvalue = adfuller(X)[1]
    if pvalue < cutoff:
        print('p-value = ' + str(pvalue) + ' The series ' + X.name +' is likely stationary.')
        return True
    else:
        print('p-value = ' + str(pvalue) + ' The series ' + X.name +' is likely non-stationary.')
        return False

check_for_stationarity(A);
check_for_stationarity(B);

p-value = 1.558759378041957e-06 The series A is likely stationary.
p-value = 0.8502071583742886 The series B is likely non-stationary.

The results clearly match expectations. Let us try a subtler example.

# Set the number of datapoints
T = 100

C = pd.Series(index=range(T), dtype=float)
C.name = 'C'

for t in range(T):
    # Now the parameters are dependent on time
    # Specifically, the mean of the series changes over time
    params = (np.sin(t), 1)
    C[t] = generate_datapoint(params)

plt.plot(C)
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend(['Series C']);

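Here we use a sine function to simulate a mean that moves cyclically. This is hard to tell apart from noise, and in practice statistical tests sometimes do not help us much.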

check_for_stationarity(C);

p-value = 0.03862100125339725 The series C is likely non-stationary.

Moving average representation

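In time series analysis, the moving average representation is an important way to describe a series (AR, MA and ARMA were covered in detail in earlier articles). A time series can be described in the following form (for covariance-stationary series this is guaranteed by Wold's decomposition theorem):
$Y_t = \sum_{j=0}^\infty b_j \epsilon_{t-j} + \eta_t$

- $\epsilon$ is the 'innovation' series
- $b_j$ are the moving average weights of the innovation series
- $\eta$ is a deterministic series

Here $\eta$ is a deterministic term, for example a sine wave. The innovation process is the random term: it represents the change to the data caused by new information arriving as new events occur.
More precisely, $\epsilon_t = Y_t - \hat Y_t$, where $\hat Y_t$ is the optimal forecast of $Y_t$ given only the information available before time $t$. $b_j$ measures how strongly past values of $\epsilon$ influence $Y_t$.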
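To make the formula concrete, the sketch below builds a series from a finite set of weights, a simulated innovation series and a sine-wave deterministic term. The weights `b`, the seed and the form of `eta` are hypothetical values picked for illustration, and innovations before the start of the sample are treated as zero.

```python
import numpy as np

rng = np.random.default_rng(1)       # fixed seed, chosen only for reproducibility
T = 200
b = np.array([1.0, 0.6, 0.3])        # hypothetical weights b_0, b_1, b_2
eps = rng.normal(0, 1, T)            # innovation series epsilon_t
eta = np.sin(np.arange(T) / 10.0)    # hypothetical deterministic term eta_t

# np.convolve(eps, b)[t] equals sum_j b[j] * eps[t - j]; keep the first T values
stochastic = np.convolve(eps, b)[:T]
Y = stochastic + eta

# Check one point against the formula written out by hand
t = 50
by_hand = b[0] * eps[t] + b[1] * eps[t - 1] + b[2] * eps[t - 2] + eta[t]
print(np.isclose(Y[t], by_hand))
```

With only three non-zero weights this is an MA(2) process plus a deterministic wave; the infinite sum in the formula is the general case of the same construction.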

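Now we can introduce the concept of order of integration, written $I(i)$.

It is the number of times a series must be differenced to obtain a covariance-stationary series.
A series is $I(0)$ if, in its moving average representation,
$\sum_{k=0}^\infty |b_k|^2 < \infty$
In other words, the autocovariance of the series decays quickly. (This condition alone does not make a process stationary, but every stationary process is $I(0)$.)
A series is integrated of order $d$ if $(1-L)^d Y_t$ is a stationary process, where $L$ is the lag operator, $(1-L)$ denotes a first difference, and $d$ is the number of differences taken.
If $Y_t$ is $I(1)$, differencing it yields an $I(0)$ series; conversely, taking the cumulative sum of an $I(0)$ series yields an $I(1)$ series.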
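The relationship between differencing and cumulative sums can be checked numerically. The sketch below assumes white noise as the $I(0)$ starting point and uses a fixed seed; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)   # fixed seed, chosen only for reproducibility
y0 = rng.normal(0, 1, 100)       # an I(0) series (white noise)
y1 = np.cumsum(y0)               # cumulative sum: I(1)
y2 = np.cumsum(y1)               # cumulative sum again: I(2)

# (1 - L) Y_t = Y_t - Y_{t-1}: one difference of y1 recovers y0 without its first point
d1 = np.diff(y1)
print(np.allclose(d1, y0[1:]))

# (1 - L)^2 Y_t = Y_t - 2 Y_{t-1} + Y_{t-2}: two differences of y2
d2 = np.diff(y2, n=2)
by_formula = y2[2:] - 2 * y2[1:-1] + y2[:-2]
print(np.allclose(d2, by_formula), np.allclose(d2, y0[2:]))
```

Each difference shortens the series by one observation, which is why the comparisons drop the first one or two points of `y0`.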

Let us demonstrate.

A is a stationary process

plt.plot(A)
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend(['Series A']);

A1 is $I(1)$

A1 = np.cumsum(A)

plt.plot(A1)
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend(['Series A1']);

A2 is $I(2)$

A2 = np.cumsum(A1)

plt.plot(A2)
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend(['Series A2']);

Now let us look at real data.

import yfinance as yf
tickerSymbol = 'MSFT'
tickerData = yf.Ticker(tickerSymbol)
# Daily bars are requested with `interval`; `period` should not be combined with start and end
tickerDf = tickerData.history(interval='1d', start='2019-01-01', end='2020-01-01')
tickerDf.head()


X=tickerDf['Open']
check_for_stationarity(X);

p-value = 0.7505522076130329 The series Open is likely non-stationary.

Let us look at the time series plot.

plt.plot(X.index, X.values)
plt.ylabel('Price')
plt.legend([X.name]);

First difference (additive returns)

X1 = X.diff()[1:]
X1.name = X.name + ' Additive Returns'
check_for_stationarity(X1)
plt.plot(X1.index, X1.values)
plt.ylabel('Additive Returns')
plt.legend([X1.name]);

p-value = 2.7841038621445375e-30 The series Open Additive Returns is likely stationary.

Multiplicative returns

X1 = X.pct_change()[1:]
X1.name = X.name + ' Multiplicative Returns'
check_for_stationarity(X1)
plt.plot(X1.index, X1.values)
plt.ylabel('Multiplicative Returns')
plt.legend([X1.name]);
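The two kinds of returns are closely related: a multiplicative return is the additive return divided by the previous price. The sketch below checks this on a short series of made-up prices, used only to illustrate the arithmetic without downloading data.

```python
import numpy as np
import pandas as pd

# Made-up prices, used only to illustrate the arithmetic
prices = pd.Series([100.0, 102.0, 101.0, 105.0], name='Open')

additive = prices.diff().iloc[1:]              # P_t - P_{t-1}
multiplicative = prices.pct_change().iloc[1:]  # (P_t - P_{t-1}) / P_{t-1}

# The two are linked through the previous price
by_hand = additive / prices.shift(1).iloc[1:]
print(np.allclose(multiplicative, by_hand))
```

Dividing by the previous price removes the dependence on the price level, which is one reason multiplicative returns are often preferred when prices trend over the sample.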