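If you look at the closing level of the CSI 300 index (HS300), you will find that the closing prices of most constituent stocks do not move in step with it. The program below offers one way to find the constituents whose closing prices match the index's closing level relatively well. The main idea is to first estimate the linear coefficient relating the index close to a stock's close, then compute the residual of that linear relation and run an ADF (Augmented Dickey-Fuller) test on it to judge which stocks move most like the HS300 index.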
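The idea can be tried on synthetic data before touching real prices. The sketch below is only an illustration under stated assumptions: the series are made up, the helper names are hypothetical, and it swaps in a plain Dickey-Fuller t-statistic (no lagged differences) computed with NumPy instead of a full ADF routine. The threshold of -3.43 is the large-sample 1% Dickey-Fuller value for a regression with a constant.

```python
import numpy as np

DF_1PCT = -3.43  # large-sample 1% Dickey-Fuller critical value (with constant)


def ols_slope_no_intercept(x, y):
    """Least-squares slope beta of y = beta * x (regression through the origin)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / (x @ x))


def dickey_fuller_t(series):
    """t-statistic of rho in: e[t] - e[t-1] = c + rho * e[t-1] + noise."""
    e = np.asarray(series, dtype=float)
    de = np.diff(e)
    lagged = e[:-1]
    X = np.column_stack([np.ones_like(lagged), lagged])
    coef, *_ = np.linalg.lstsq(X, de, rcond=None)
    resid = de - X @ coef
    sigma2 = resid @ resid / (len(de) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(coef[1] / np.sqrt(cov[1, 1]))


rng = np.random.default_rng(0)
n = 750  # roughly three years of trading days

# Made-up "stock": a random walk around 10.
stock = 10 + np.cumsum(rng.normal(0, 0.1, n))
# Made-up "index" that tracks the stock: 300 * stock plus stationary noise.
index_tracking = 300 * stock + rng.normal(0, 20, n)
# Made-up "index" that is an independent random walk.
index_unrelated = 3000 + np.cumsum(rng.normal(0, 30, n))

results = {}
for label, index in [("tracking", index_tracking), ("unrelated", index_unrelated)]:
    beta = ols_slope_no_intercept(stock, index)
    residual = index - beta * stock
    t_stat = dickey_fuller_t(residual)
    results[label] = (beta, t_stat)
    print(label, "beta=%.2f" % beta, "t=%.2f" % t_stat, "passes:", t_stat < DF_1PCT)
```

A pair passes when the statistic falls below the critical value, which means the residual is judged stationary. The tracking pair passes by a wide margin; the unrelated pair typically does not.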
The index and stock data come from tushare: http://tushare.org/trading.html
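Before any regression, the two close series have to be lined up by date. The sketch below shows one way to do that with pandas on tiny made-up frames. It assumes the history frames are indexed by 'YYYY-MM-DD' strings with a `close` column and arrive newest-first; the helper name `align_closes` and all the numbers are hypothetical.

```python
import pandas as pd


def align_closes(stock_df, index_df, stock_label, index_label):
    """Return both close columns side by side, oldest date first, gaps forward-filled."""
    closes = pd.concat(
        [stock_df["close"], index_df["close"]],
        axis=1,
        keys=[stock_label, index_label],
    )
    # Sort ascending first so that forward-fill only copies earlier prices,
    # then drop any leading rows that still have no value.
    return closes.sort_index().ffill().dropna()


# Made-up data: the stock did not trade on 2017-09-28.
stock_df = pd.DataFrame(
    {"close": [10.4, 10.1, 10.0]},
    index=["2017-09-29", "2017-09-27", "2017-09-26"],
)
index_df = pd.DataFrame(
    {"close": [3830.0, 3820.0, 3810.0, 3800.0]},
    index=["2017-09-29", "2017-09-28", "2017-09-27", "2017-09-26"],
)

aligned = align_closes(stock_df, index_df, "600000", "hs300")
print(aligned)
```

Sorting before the fill matters: on a newest-first frame, a forward-fill would copy a later price into an earlier date.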
Over the period 2014-10-1 to 2017-10-1, only 15 stocks were found to move closely with the HS300 index. The codes of these well-synchronized stocks are:
['002024', '600038', '002415', '601333', '000060', '600406', '600332', '000540', '000402', '601988', '601998', '601328', '601111', '600048', '000069']
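Here are a few of the price charts, although in some of them the rises and falls do not look closely matched. Of the three charts below, the third matches poorly at times.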
[Figures: three charts, each comparing the HS300 index with one stock's daily closing price]
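One possible reason a chart can look poorly matched even though the pair passed the test: a stationary residual constrains price levels over time, not the size of day-to-day moves. As a complementary check (an addition here, not part of the original program), the sketch below compares the correlation of daily returns for two made-up stocks that both stay close to the same made-up index in level terms. All names and numbers are hypothetical.

```python
import numpy as np


def daily_returns(prices):
    """Simple day-over-day percentage change."""
    p = np.asarray(prices, dtype=float)
    return p[1:] / p[:-1] - 1.0


def return_correlation(a, b):
    """Pearson correlation of the daily returns of two price series."""
    return float(np.corrcoef(daily_returns(a), daily_returns(b))[0, 1])


rng = np.random.default_rng(1)
n = 750

# Made-up index: a random walk around 3000.
index = 3000 + np.cumsum(rng.normal(0, 20, n))
# Both stocks equal index / 300 plus stationary noise, so both spreads are
# stationary; only the size of the noise differs.
tight = index / 300 + rng.normal(0, 0.01, n)
loose = index / 300 + rng.normal(0, 0.5, n)

tight_corr = return_correlation(index, tight)
loose_corr = return_correlation(index, loose)
print("tight pair return correlation: %.2f" % tight_corr)
print("loose pair return correlation: %.2f" % loose_corr)
```

Both pairs have a stationary spread by construction, yet the noisy one has a much lower return correlation, which is the kind of chart that looks mismatched day to day.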
The Python program follows:
# -*- coding: utf-8 -*-
'''
Version: V17.1.0
Date: 2017-11-5
@Author: Cheney
'''
# Fetch data from tushare and, over 2014-10 to 2017-10, find the HS300
# constituents whose closing prices move like the HS300 index.

# Part I
import datetime
import os
import traceback

import numpy as np
import pandas as pd
import tushare as ts
import matplotlib.pyplot as plt
import statsmodels.tsa.stattools as sts
import statsmodels.api as sm

PLOT_DIR = 'hs30index_pair_stock_plot'

t = datetime.datetime.now()
print('Program is starting ... %s' % t)


def plot_price_relation(df, start, end, st_a, st_b='hs300'):
    '''
    Draw the HS300 index and a stock's closing price in two stacked panels.
    df -- DataFrame indexed by date string, one close column per series
    start, end -- date strings bounding the comparison window
    st_a, st_b -- stock code and index code (column labels in df)
    '''
    df = df.loc[start:end]
    fig, (ax, bx) = plt.subplots(nrows=2)
    x_date = [datetime.datetime.strptime(d, '%Y-%m-%d').date() for d in df.index]

    ax.plot(x_date, df[st_b], label=st_b, c='g')
    ax.set_title("%s index and stock %s daily prices relation" % (st_b, st_a))
    ax.set_xticklabels([])
    ax.set_ylabel("HS300Index")
    ax.grid(True)
    ax.legend(loc='best')

    bx.plot(x_date, df[st_a], label=st_a, c='b')
    bx.set_xlabel("Year/Month")
    bx.set_ylabel("Stock Price")
    bx.grid(True)
    bx.legend(loc='best')

    fig.autofmt_xdate()
    # Save the figure in a folder (created if missing), or show it instead
    os.makedirs(PLOT_DIR, exist_ok=True)
    fig.savefig(os.path.join(PLOT_DIR, '%s+%s.png' % (st_b, st_a)))
    plt.close(fig)
    # plt.show()


def get_df_close(stocka, stockb, start=None, end=None):
    # Fetch daily history for both codes and return their closes side by side.
    # stocka, stockb -- codes such as '600036', or the index label 'hs300'
    sta = ts.get_hist_data(stocka, start=start, end=end)
    stb = ts.get_hist_data(stockb, start=start, end=end)
    # One column per code. Sort ascending by date so that forward-fill only
    # copies earlier prices, then drop leading rows that are still missing.
    df = pd.concat([sta['close'], stb['close']], axis=1, keys=[stocka, stockb])
    return df.sort_index().ffill().dropna()


# Part II
if __name__ == "__main__":
    start = datetime.datetime(2014, 10, 1).strftime('%Y-%m-%d')
    end = datetime.datetime(2017, 10, 1).strftime('%Y-%m-%d')

    # Get the HS300 constituent codes
    hs_name = 'hs300'
    hs = ts.get_hs300s()
    hs_list = hs['code']

    stockADF = {}
    for code in hs_list:
        try:
            # Closing prices of the stock and the HS300 index
            df = get_df_close(code, hs_name, start, end)
            # Regress the index on the stock through the origin (no intercept)
            # and take the slope as the linear coefficient
            res = sm.OLS(df[hs_name], df[code]).fit()
            betaCoef = float(np.asarray(res.params)[0])
            if not np.isfinite(betaCoef):
                raise ValueError('non-finite regression coefficient')
            # Residual of the linear relation, then the ADF test on it
            df['res'] = df[hs_name] - betaCoef * df[code]
            tempStockADF = sts.adfuller(df['res'])
        except Exception:
            print("Can't get the regression result for stock %s and %s" % (code, hs_name))
            traceback.print_exc()
            continue
        # Keep the ADF statistic and the 1% critical value for each pair
        stockADF[(code, hs_name)] = [tempStockADF[0], tempStockADF[4]['1%']]

    # A pair passes when the ADF statistic is below the 1% critical value,
    # i.e. the residual is judged to be a stationary time series
    for (code, hs_name), value in stockADF.items():
        if value[0] < value[1]:
            print("The best pairs stocks %s, ADF values %s and percent-1 %s"
                  % (code + hs_name, value[0], value[1]))
            df = get_df_close(code, hs_name, start, end)
            plot_price_relation(df, start, end, code, hs_name)

    print('Program total running time is %s' % (datetime.datetime.now() - t))

This is a small piece of knowledge picked up while learning quantitative trading. Corrections and advice from more experienced readers are very welcome.