Autoregressive Moving Average Model (ARMA): Stationary Series

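The autoregressive moving average model (ARMA model, Auto-Regression and Moving Average Model) is an important method for studying time series. It is a "mixture" built on the autoregressive model (AR model) and the moving average model (MA model), and it is widely applicable and often gives small prediction errors. For an analysis of the theory behind ARMA, see this blog post.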
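To make the "mixture" of the AR and MA parts concrete, here is a minimal sketch that simulates an ARMA(1, 1) process, x_t = phi * x_{t-1} + eps_t + theta * eps_{t-1}, with NumPy. The function name simulate_arma11 and the parameter values are illustrative assumptions, not part of the original post.

```python
import numpy as np

def simulate_arma11(n, phi=0.6, theta=0.3, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_{t-1} + eps_t + theta * eps_{t-1}."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n)  # white-noise shocks
    x = np.zeros(n)
    x[0] = eps[0]  # start-up value: no past observation or shock is available yet
    for t in range(1, n):
        # AR part: phi * x[t-1]; MA part: theta * eps[t-1]
        x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]
    return x

series = simulate_arma11(500)
print(series[:5])
```

Setting phi=0 leaves a pure MA(1) process and theta=0 leaves a pure AR(1) process, which is the sense in which ARMA combines the two.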

1. Importing the required Python modules

import tushare as ts
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.tsa.api as smtsa
from statsmodels.tsa.stattools import adfuller as ADF
from statsmodels.tsa.arima_model import ARMA  # legacy module: deprecated in statsmodels 0.12 and removed in 0.13
from statsmodels.stats.diagnostic import acorr_ljungbox as acorr
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import warnings
warnings.filterwarnings('ignore')

2. Loading the dataset

The model uses a public daily-price dataset for one stock (ts_code 000001.SZ).

ts.set_token('46af5ae83a803c592eeff2620f87402fea47911650b115cf1f671f64')
pro = ts.pro_api()
data = pro.daily(ts_code='000001.SZ', start_date='20100101', end_date='20200101')
data.sort_values(by='trade_date', inplace=True)
data.reset_index(drop=True, inplace=True)
data  # inspect data
        ts_code  trade_date   open   high    low  close  pre_close  change  pct_chg         vol        amount
0     000001.SZ    20100104  24.52  24.58  23.68  23.71      24.37   -0.66  -2.7100   241922.76  5.802495e+05
1     000001.SZ    20100105  23.75  23.90  22.75  23.30      23.71   -0.41  -1.7300   556499.82  1.293477e+06
2     000001.SZ    20100106  23.25  23.25  22.72  22.90      23.30   -0.40  -1.7200   412143.13  9.444537e+05
3     000001.SZ    20100107  22.90  23.05  22.40  22.65      22.90   -0.25  -1.0900   355336.85  8.041663e+05
4     000001.SZ    20100108  22.50  22.75  22.35  22.60      22.65   -0.05  -0.2200   288543.06  6.506674e+05
...
2356  000001.SZ    20191225  16.45  16.56  16.24  16.30      16.40   -0.10  -0.6098   414917.98  6.796646e+05
2357  000001.SZ    20191226  16.34  16.48  16.32  16.47      16.30    0.17   1.0429   372033.86  6.103818e+05
2358  000001.SZ    20191227  16.53  16.93  16.43  16.63      16.47    0.16   0.9715  1042574.72  1.741473e+06
2359  000001.SZ    20191230  16.46  16.63  16.10  16.57      16.63   -0.06  -0.3608   976970.31  1.603153e+06
2360  000001.SZ    20191231  16.57  16.63  16.31  16.45      16.57   -0.12  -0.7242   704442.25  1.154704e+06

2361 rows × 11 columns

Fill the missing values in data

data = data.iloc[:, 1:]  # drop the first column (ts_code)
data = data.ffill()  # forward fill: replace each missing value with the previous non-missing value
data.head()
   trade_date   open   high    low  close  pre_close  change  pct_chg        vol        amount
0    20100104  24.52  24.58  23.68  23.71      24.37   -0.66    -2.71  241922.76  5.802495e+05
1    20100105  23.75  23.90  22.75  23.30      23.71   -0.41    -1.73  556499.82  1.293477e+06
2    20100106  23.25  23.25  22.72  22.90      23.30   -0.40    -1.72  412143.13  9.444537e+05
3    20100107  22.90  23.05  22.40  22.65      22.90   -0.25    -1.09  355336.85  8.041663e+05
4    20100108  22.50  22.75  22.35  22.60      22.65   -0.05    -0.22  288543.06  6.506674e+05
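As a small illustration of what forward filling does, the sketch below applies it to a toy Series. The values are made up for the example and are not taken from the stock data.

```python
import numpy as np
import pandas as pd

prices = pd.Series([24.5, np.nan, np.nan, 23.3, np.nan])
filled = prices.ffill()  # each NaN takes the last non-missing value before it
print(filled.tolist())   # [24.5, 24.5, 24.5, 23.3, 23.3]
```

Note that a missing value at the very start of a series has no predecessor, so forward filling would leave it as NaN.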
data = data[['trade_date', 'open', 'close', 'high', 'low']]
data.plot(subplots=True, figsize=(10, 12))
plt.title(' zhangshang stock attributes from 2010-01-01 to 2020-01-01')
plt.show()

[Figure: subplots of the open, close, high, and low series]
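The plots above use the integer row index on the x-axis. If a calendar axis is preferred, one option is to parse trade_date into a DatetimeIndex. This is a sketch on a three-row toy frame, assuming trade_date is stored as 'YYYYMMDD' strings as the table output suggests; the name demo is hypothetical.

```python
import pandas as pd

# Toy frame mirroring the first three rows of the table (trade_date assumed to be a string)
demo = pd.DataFrame({
    'trade_date': ['20100104', '20100105', '20100106'],
    'close': [23.71, 23.30, 22.90],
})
demo['trade_date'] = pd.to_datetime(demo['trade_date'], format='%Y%m%d')
demo = demo.set_index('trade_date')  # plots now use dates on the x-axis
print(demo.index[0])  # 2010-01-04 00:00:00
```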

3. Stationarity test

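# Stationarity test (augmented Dickey-Fuller)
adf = ADF(data['close'])
if adf[1] > 0.05:  # adf[1] is the p-value of the ADF test
    print('The original series is non-stationary; p-value: %s' % (adf[1]))
else:
    print('The original series is stationary; p-value: %s' % (adf[1]))
The original series is stationary; p-value: 0.017668059342580877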

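4. White-noise test

A large p-value means the null hypothesis of no autocorrelation cannot be rejected, i.e. the series is likely white noise. A purely random series has no structure worth modeling.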

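# White-noise test using the Ljung-Box (LB) statistic
p = acorr(data['close'], lags=1)  # older statsmodels releases return a tuple; p[1] holds the p-values
if p[1] < 0.05:
    print('The original series is not white noise; p-value: %s' % p[1])
else:
    print('The original series is white noise; p-value: %s' % p[1])
The original series is not white noise; p-value: [0.]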
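To show what the LB statistic measures, here is a hand-rolled sketch of Q = n(n+2) * sum_k rho_k^2 / (n-k), compared against the 5% chi-square critical value for one degree of freedom. The function name ljung_box_q and the synthetic random walk are assumptions for illustration; in practice acorr_ljungbox does this work.

```python
import numpy as np

def ljung_box_q(x, lags=1):
    """Ljung-Box Q = n (n + 2) * sum_{k=1..lags} rho_k^2 / (n - k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    q = 0.0
    for k in range(1, lags + 1):
        rho_k = np.sum(xc[k:] * xc[:-k]) / denom  # lag-k sample autocorrelation
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=500))  # strongly autocorrelated, like a price level
CHI2_95_DF1 = 3.841  # 5% critical value of chi-square with 1 degree of freedom
q = ljung_box_q(walk, lags=1)
print(q > CHI2_95_DF1)  # True means "reject white noise" at the 5% level
```

A price level behaves much like the random walk here: adjacent values are almost identical, so the lag-1 autocorrelation is near 1 and Q is far above the critical value, which is consistent with the p-value of 0 reported above.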

5. Model identification

# Define the plotting function plotds
def plotds(xt, nlag=30, fig_size=(12, 8)):
    if not isinstance(xt, pd.Series):  # convert xt to a pd.Series if it is not one already
        xt = pd.Series(xt)

    plt.figure(figsize=fig_size)
    plt.plot(xt)  # time plot of the original data
    plt.title("Time Series")
    plt.show()

    plt.figure(figsize=fig_size)
    layout = (2, 2)
    ax_acf = plt.subplot2grid(layout, (1, 0))
    ax_pacf = plt.subplot2grid(layout, (1, 1))
    plot_acf(xt, lags=nlag, ax=ax_acf)  # autocorrelation plot
    plot_pacf(xt, lags=nlag, ax=ax_pacf)  # partial autocorrelation plot
    plt.show()

    return None


plotds(data['close'].dropna(), nlag=50)

[Figure: time series plot of the close price]

[Figure: ACF and PACF of the close price, 50 lags]
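When reading ACF and PACF plots, the usual rule of thumb is that an AR(p) process has an ACF that decays gradually and a PACF that cuts off after lag p, while an MA(q) process has an ACF that cuts off after lag q. The sketch below checks the first half of that rule on a simulated AR(1) series, whose theoretical ACF is phi^k. The helper sample_acf and the value phi = 0.8 are assumptions for illustration.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([1.0] + [np.sum(xc[k:] * xc[:-k]) / denom
                             for k in range(1, nlags + 1)])

phi = 0.8
rng = np.random.default_rng(0)
eps = rng.normal(size=5000)
x = np.zeros_like(eps)
for t in range(1, len(x)):
    x[t] = phi * x[t - 1] + eps[t]  # AR(1) recursion

acf = sample_acf(x, nlags=5)
theory = phi ** np.arange(6)  # theoretical AR(1) autocorrelation: phi^k
for k in range(6):
    print(k, round(acf[k], 3), round(theory[k], 3))
```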

# Order selection
data_df = data.copy()
aicVal = []
for ari in range(1, 3):
    for maj in range(0,5):
        try:
            arma_obj = smtsa.ARMA(data_df.close.tolist(), order=(ari, maj))\
            .fit(maxlag=30, method='mle', trend='nc')
            aicVal.append([ari, maj, arma_obj.aic])
        except Exception as e:
            print(e)
            
aicVal
[[1, 0, 1958.0143376009719],
 [1, 1, 1958.7504776249589],
 [1, 2, 1960.3388645021796],
 [1, 3, 1962.0299202559945],
 [1, 4, 1963.6365504220475],
 [2, 0, 1958.7847615798437],
 [2, 1, 1960.5080165518111],
 [2, 2, 1957.0524190412552],
 [2, 3, 1959.0187650693679],
 [2, 4, 1961.0165491508224]]

6. Training the model

The aicVal results show that the AIC is smallest, 1957.0524190412552, at order (2, 2), so the ARMA(2, 2) model is selected.
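Rather than reading the minimum off the list by eye, the selection can be done programmatically. The sketch below copies the AIC values reported above into a list; the name aic_table is an assumption for illustration.

```python
# [p, q, AIC] rows copied from the aicVal output
aic_table = [
    [1, 0, 1958.0143376009719],
    [1, 1, 1958.7504776249589],
    [1, 2, 1960.3388645021796],
    [1, 3, 1962.0299202559945],
    [1, 4, 1963.6365504220475],
    [2, 0, 1958.7847615798437],
    [2, 1, 1960.5080165518111],
    [2, 2, 1957.0524190412552],
    [2, 3, 1959.0187650693679],
    [2, 4, 1961.0165491508224],
]

# Pick the row with the smallest AIC (third element of each row)
best_p, best_q, best_aic = min(aic_table, key=lambda row: row[2])
print(best_p, best_q, round(best_aic, 3))  # 2 2 1957.052
```

The gap between (2, 2) and the runner-up (1, 0) is under one AIC unit, so the simpler model may be a reasonable alternative; that is a judgment call rather than something the original post explores.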

arma_obj_fin = smtsa.ARMA(data_df['close'].tolist(), order=(2, 2)).fit(maxlag=30, method='mle', trend='nc', disp=False)
arma_obj_fin.summary()
                       ARMA Model Results
Dep. Variable:                  y   No. Observations:         2361
Model:                 ARMA(2, 2)   Log Likelihood        -973.526
Method:                       mle   S.D. of innovations      0.365
Date:            Thu, 06 May 2021   AIC                   1957.052
Time:                    19:06:47   BIC                   1985.887
Sample:                         0   HQIC                  1967.551

             coef  std err       z  P>|z|  [0.025  0.975]
ar.L1.y    0.0320      nan     nan    nan     nan     nan
ar.L2.y    0.9676      nan     nan    nan     nan     nan
ma.L1.y    0.9468      nan     nan    nan     nan     nan
ma.L2.y   -0.0329    0.020  -1.628  0.103  -0.073   0.007

Roots
         Real  Imaginary  Modulus  Frequency
AR.1   1.0002   +0.0000j   1.0002     0.0000
AR.2  -1.0333   +0.0000j   1.0333     0.5000
MA.1  -1.0200   +0.0000j   1.0200     0.5000
MA.2  29.7678   +0.0000j  29.7678     0.0000
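The numbers in the summary can be cross-checked by hand. The sketch below recomputes the roots of the AR and MA polynomials from the rounded coefficients, and the AIC and BIC from the log likelihood. The parameter count k = 5 (two AR terms, two MA terms, and the innovation variance) is an assumption inferred from the reported values.

```python
import numpy as np

ar = [0.0320, 0.9676]   # ar.L1.y, ar.L2.y from the summary
ma = [0.9468, -0.0329]  # ma.L1.y, ma.L2.y from the summary

# AR polynomial: 1 - phi1*z - phi2*z^2; MA polynomial: 1 + theta1*z + theta2*z^2.
# np.roots expects coefficients ordered from the highest power down.
ar_roots = np.roots([-ar[1], -ar[0], 1.0])
ma_roots = np.roots([ma[1], ma[0], 1.0])
print(np.sort(ar_roots.real))  # compare with AR.1 and AR.2
print(np.sort(ma_roots.real))  # compare with MA.1 and MA.2 (rounding shifts the large root slightly)

loglik, n_obs, k = -973.526, 2361, 5  # k is an inferred assumption, see the lead-in
aic = -2 * loglik + 2 * k
bic = -2 * loglik + k * np.log(n_obs)
print(round(aic, 3), round(bic, 3))
```

An AR root with modulus 1.0002 lies almost on the unit circle, which suggests the fitted process is close to a unit-root process and may be related to the nan standard errors. Treating that as a cue to try a differenced model is a suggestion here, not a step taken in the original post.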

7. Model fit

#plot the curves
data_df["ARMA"] = arma_obj_fin.predict()
plt.figure(figsize=(10,8))
plt.plot(data_df['close'].iloc[-100:], color='b', label='Actual')
plt.plot(data_df["ARMA"].iloc[-100:], color='r', linestyle='--', label='ARMA(2,2)_pre')
plt.xlabel('index')
plt.ylabel('close price')
plt.legend(loc='best')
plt.show()

[Figure: actual close price versus ARMA(2,2) fitted values, last 100 observations]
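A plot shows the fit qualitatively; a numeric summary such as RMSE or MAE makes it comparable across models. The helper fit_errors below is a hypothetical sketch, demonstrated on three close values from the table with the previous close used as a naive prediction.

```python
import numpy as np

def fit_errors(actual, predicted):
    """Return root-mean-square error and mean absolute error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    resid = actual - predicted
    return {
        'rmse': float(np.sqrt(np.mean(resid ** 2))),
        'mae': float(np.mean(np.abs(resid))),
    }

# close vs. pre_close for 2019-12-25 to 2019-12-27 (naive "previous close" prediction)
errors = fit_errors([16.30, 16.47, 16.63], [16.40, 16.30, 16.47])
print(errors)
```

With the fitted model, the same helper could be called as fit_errors(data_df['close'], data_df['ARMA']), and the naive previous-close error gives a baseline to compare against.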

8. Forecasting with the model

fig = arma_obj_fin.plot_predict(len(data_df)-50, len(data_df)+10)

[Figure: plot_predict output from observation len(data_df)-50 to len(data_df)+10]

predict = arma_obj_fin.predict(start=1, end=len(data_df)+10)
predict[-10:]
array([16.44779647, 16.45696253, 16.44202199, 16.4504122 , 16.43622479,
       16.44388842, 16.43040644, 16.43738965, 16.42456844, 16.43091446])