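The next step is to choose the order of the VAR model. We let the order p increase one step at a time starting from 1, estimate the coefficients of every equation by least squares at each step, and take the order with the smallest AIC as the maximum lag. The Python code is as follows: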
import numpy as np
import pandas as pd

# Increase the model order one step at a time, starting from 1
rows, cols = subdata_diff1.shape
aicList = []
lmList = []

for p in range(1, 11):
    # Each row: the current observation followed by its p lagged observations (oldest first)
    baseData = np.array([
        list(subdata_diff1[i, :]) + list(subdata_diff1[i-p:i].flatten())
        for i in range(p, rows)
    ])
    X = np.c_[np.ones(baseData.shape[0]), baseData[:, cols:]]  # intercept column + lagged values
    Y = baseData[:, 0:cols]
    # Least squares estimate via the normal equations
    coefMatrix = np.matmul(np.matmul(np.linalg.inv(np.matmul(X.T, X)), X.T), Y)
    # AIC: log-determinant of the residual covariance plus a penalty term.
    # coefMatrix.shape[0]-1 equals cols*p, so the penalty used here is 2*(cols*p)**2*p/T
    aic = np.log(np.linalg.det(np.cov(Y - np.matmul(X, coefMatrix), rowvar=False))) + 2*(coefMatrix.shape[0]-1)**2*p/baseData.shape[0]
    aicList.append(aic)
    lmList.append(coefMatrix)

# Compare the orders and their AIC values
pd.DataFrame({"P": range(1, 11), "AIC": aicList})
#     P        AIC
# 0   1  13.580156
# 1   2  13.312225
# 2   3  13.543633
# 3   4  14.266087
# 4   5  15.512437
# 5   6  17.539047
# 6   7  20.457337
# 7   8  24.385459
# 8   9  29.438091
# 9  10  35.785909
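As the output above shows, the AIC reaches its minimum of 13.312225 at p=2. The VAR model order is therefore set to 2, and the fitted linear model for each indicator can be read from lmList[1].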
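The order-selection procedure can also be tried on data whose true order is known. The following is only a minimal sketch under stated assumptions: the helper names `build_lagged` and `var_aic` are hypothetical, the data is a synthetic two-variable VAR(2), and the penalty is the common textbook form 2pK²/T, which differs from the penalty used in the listing above.

```python
import numpy as np

def build_lagged(series, p):
    """Return (X, Y) for a VAR(p) regression.

    Each row of X is [1, y_{t-p}, ..., y_{t-1}] (oldest lag first) and the
    matching row of Y is y_t.
    """
    T, K = series.shape
    Y = series[p:]
    lags = np.array([series[t - p:t].flatten() for t in range(p, T)])
    X = np.c_[np.ones(T - p), lags]
    return X, Y

def var_aic(series, p):
    """Fit a VAR(p) by least squares and return (aic, coefficient matrix)."""
    X, Y = build_lagged(series, p)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    n_obs, K = Y.shape
    sigma = resid.T @ resid / n_obs            # residual covariance estimate
    # Textbook AIC (an assumption here, not the formula from the listing above)
    aic = np.log(np.linalg.det(sigma)) + 2 * p * K**2 / n_obs
    return aic, coef

# Synthetic data (assumed for illustration): y_t = 0.5 * y_{t-2} + noise
rng = np.random.default_rng(0)
T, K = 500, 2
y = np.zeros((T, K))
for t in range(2, T):
    y[t] = 0.5 * y[t - 2] + rng.standard_normal(K)

aics = [var_aic(y, p)[0] for p in range(1, 6)]
best_p = int(np.argmin(aics)) + 1
print(best_p, np.round(aics, 4))
```

Because the simulated series depends on its second lag only, the AIC at p=1 should come out clearly worse than at p=2, which is the same pattern the table above shows for the real data.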
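Using the linear models stored in lmList[1], we forecast the next 30 periods and compare the results with the validation dataset. Each step forecasts one period ahead and then appends the observed difference before moving on, so this is a rolling one-step-ahead forecast. The Python code is as follows: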
p = np.argmin(aicList)+1
n = rows
preddf = None
for i in range(30):
    predData = list(subdata_diff1[n+i-p:n+i].flatten())
    predVals = np.matmul([1]+predData,lmList[p-1])
    # Undo the differencing to recover the predicted levels
    predVals=data.iloc[n+i,:].values[:4]+predVals
    if preddf is None:
        preddf = [predVals]
    else:
        preddf = np.r_[preddf, [predVals]]
    # Append the newly observed difference as a new row of subdata_diff1
    subdata_diff1 = np.r_[subdata_diff1, [data.iloc[n+i+1,:].values[:4] - data.iloc[n+i,:].values[:4]]]

# Examine the prediction errors
(np.abs(preddf - data.iloc[-30:data.shape[0],:4])/data.iloc[-30:data.shape[0],:4]).describe()
#             High        Low       Open      Close
# count  30.000000  30.000000  30.000000  30.000000
# mean    0.010060   0.009380   0.005661   0.013739
# std     0.008562   0.009968   0.006515   0.013674
# min     0.001458   0.000115   0.000114   0.000130
# 25%     0.004146   0.001950   0.001653   0.002785
# 50%     0.007166   0.007118   0.002913   0.010414
# 75%     0.014652   0.012999   0.006933   0.022305
# max     0.039191   0.045802   0.024576   0.052800
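The forecasting loop above combines three steps: predict the next difference from the last p differences, add it to the last observed level, then compute the absolute percentage error. The sketch below isolates that pattern on toy data. It is an illustration only: `rolling_one_step` and `abs_pct_error` are hypothetical helper names, the level series and coefficient matrix are made up, and the coefficient layout (intercept row first, lags oldest first) is assumed to match the one used when fitting.

```python
import numpy as np

def rolling_one_step(levels, coef, p, n_train, horizon):
    """Rolling one-step-ahead level forecasts from a VAR(p) on first differences.

    levels  : (N, K) array of observed levels
    coef    : (1 + K*p, K) coefficients, intercept row first, lags oldest first
    n_train : number of differenced rows available before forecasting starts
    """
    diffs = np.diff(levels, axis=0)            # diffs[t] = levels[t+1] - levels[t]
    preds = []
    for i in range(horizon):
        lagged = diffs[n_train + i - p:n_train + i].flatten()
        pred_diff = np.r_[1.0, lagged] @ coef  # predicted next difference
        preds.append(levels[n_train + i] + pred_diff)  # undo the differencing
    return np.array(preds)

def abs_pct_error(pred, actual):
    """Element-wise absolute percentage error, as a fraction."""
    return np.abs(pred - actual) / np.abs(actual)

# Toy data (hypothetical): two series that grow by a constant step each period
levels = np.cumsum(np.ones((12, 2)) * [1.0, 2.0], axis=0) + 100.0
p, n_train, horizon = 2, 6, 5
coef = np.zeros((1 + 2 * p, 2))
coef[0] = [1.0, 2.0]                           # intercept only: predicts the constant step

preds = rolling_one_step(levels, coef, p, n_train, horizon)
actual = levels[n_train + 1:n_train + 1 + horizon]
ape = abs_pct_error(preds, actual)
print(ape.max())
```

Since every forecast starts from the latest observed level, errors do not accumulate across the horizon. This is worth keeping in mind when reading the error statistics: they describe one-step accuracy, not the accuracy of a 30-step-ahead forecast made at a single point in time.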
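The `describe()` summary of the prediction errors (line 17 of the listing that computes `preddf`) shows that the maximum absolute percentage errors of the four indicators (High, Low, Open, Close) are 3.9191%, 4.5802%, 2.4576% and 5.28% respectively, and the minimum absolute percentage errors are 0.1458%, 0.0115%, 0.0114% and 0.013%. Next, we draw two-dimensional plots to see how closely the predictions track the actual data. The Python code is as follows: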
import matplotlib.pyplot as plt
plt.figure(figsize=(10,7))
for i in range(4):
    plt.subplot(2,2,i+1)
    plt.plot(range(30),data.iloc[-30: