Fetching data and backtesting every stock in the A-share market with Python


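Preface:

This is my first article, haha! I'm sharing the script I have been working on recently for backtesting A-shares. I started with no programming background at all, so I write code by looking things up as I go: the browser usually has lots of tabs open, plus a Baidu search for translating error messages. Text-based AI has been developing fast lately, so I also keep an AI chat window open to ask for solutions and code. The Chinese large model I use most is iFlytek's Spark (讯飞星火); its accuracy is fairly high, although it sometimes gives wrong answers. Anyway, every tool that helps should be put to use. Along the way I spent a lot of time writing and debugging code, then optimized the code and the workflow step by step and gradually added features I had never used before, such as floating-point arithmetic, multithreaded backtesting, and saving data as binary or NumPy files. I used many new libraries and learned a great deal. CSDN also helped me solve many of the problems I ran into, and I'm very grateful to the bloggers who share their experience there.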

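Overall steps:

Step 1: Build a script that fetches the data for a single stock

Step 2: Write the technical indicator to be backtested and wrap it in a function

Step 3: Loop over all Shanghai, Shenzhen and Beijing stocks, call the indicator function, collect the backtest data and save it

Step 4: Statistics on the results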

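Preparation:

I installed Python through conda, and the editor I use is JupyterLab, which I absolutely love. Before Jupyter I used Thonny. Jupyter notebooks are great for writing code: they don't take up much space, variables keep their values after a cell has run, and cells can be run one at a time, which makes Jupyter ideal for beginners.

Step 1: Build a script that fetches the data for a single stock

This part is fairly simple: the open-source Python library akshare scrapes data from Chinese financial websites. It is completely free with no catches, and in my opinion far better than tushare.

The available functions are documented here: Welcome to AKShare’s Online Documentation! (AKShare 1.11.31 documentation). The DataFrames it returns use Chinese column names, which the code keeps as-is: 日期 (date), 开盘 (open), 收盘 (close), 最高 (high), 最低 (low), 成交量 (volume), 成交额 (turnover amount), 振幅 (amplitude, %), 涨跌幅 (change, %), 涨跌额 (price change), 换手率 (turnover rate, %).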

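import akshare as ak
import pandas as pd
import sys

#symbol = sys.argv[1]      # alternative: pass the stock code on the command line
symbol = '000004'
print(f'Backtest of {symbol} started')

# Daily bars, forward-adjusted ('qfq'); the wide date range simply means "all available data"
df = ak.stock_zh_a_hist(symbol=symbol,
    period='daily',
    start_date='20000101',
    end_date='20500101',
    adjust='qfq'
    )
# Convert to datetime so the dates can be compared with the monthly dates later
df['日期'] = pd.to_datetime(df['日期'])
#df.to_excel(f'{symbol}.xlsx', index=False)
df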
            日期   开盘   收盘   最高   最低     成交量          成交额   振幅   涨跌幅   涨跌额   换手率
0     2000-01-04   8.47   8.66   8.67   8.28    6577  5.628000e+06  4.63   2.73   0.23   1.58
1     2000-01-05   8.67   8.94   8.97   8.43    5707  5.044000e+06  6.24   3.23   0.28   1.37
2     2000-01-06   8.92   9.32   9.39   8.80   32360  3.039400e+07  6.60   4.25   0.38   7.77
3     2000-01-07   9.17   9.79   9.79   9.15   21303  2.088600e+07  6.87   5.04   0.47   5.11
4     2000-01-10  10.20  10.28  10.28   9.93   24623  2.540500e+07  3.58   5.01   0.49   5.91
...          ...    ...    ...    ...    ...     ...           ...   ...    ...    ...    ...
5411  2023-10-09  16.04  15.76  16.34  15.71   84680  1.351435e+08  3.93  -1.68  -0.27   6.71
5412  2023-10-10  15.93  17.34  17.34  15.84  155377  2.674170e+08  9.52  10.03   1.58  12.30
5413  2023-10-11  18.23  17.56  18.50  17.48  374027  6.687257e+08  5.88   1.27   0.22  29.62
5414  2023-10-12  17.00  17.32  17.76  17.00  166427  2.884935e+08  4.33  -1.37  -0.24  13.18
5415  2023-10-13  17.10  18.33  18.39  17.08  131655  2.335914e+08  7.56   5.83   1.01  10.43

5416 rows × 11 columns

My formula also uses monthly bars, so fetch those as well:

df_monthly = ak.stock_zh_a_hist(symbol=symbol,
    period='monthly',
    start_date='20000101',
    end_date='20500101',
    adjust='qfq'
    )
df_monthly['日期'] = pd.to_datetime(df_monthly['日期'])
mike(df_monthly, 'm')   # mike() is defined in Step 2; run that cell first
#df_monthly.to_excel(f'{symbol}.xlsx', index=False)
df_monthly
           日期   开盘   收盘   最高   最低     成交量          成交额    振幅   涨跌幅  涨跌额   换手率       WEKR   zfyq   冲不冲
0    2009-10-30   6.42   6.62   9.73   5.44    226207  2.373081e+09  122.57  89.14   3.12   89.48        NaN   True  False
1    2009-11-27   5.90   8.05   8.40   5.83    620152  6.386345e+09   38.82  21.60   1.43  245.31   9.086667   True  False
2    2009-12-31   8.70   6.78   9.17   6.58    387471  4.580679e+09   32.17 -15.78  -1.27  153.27   9.099865  False  False
3    2010-01-29   6.87   7.32   8.15   6.45    341343  3.827560e+09   25.07   7.96   0.54  135.02   9.127664   True  False
4    2010-02-26   6.75   9.49   9.78   6.75    331039  4.204930e+09   41.39  29.64   2.17  104.76   9.147199   True   True
...         ...    ...    ...    ...    ...       ...           ...     ...    ...    ...     ...        ...    ...    ...
162  2023-06-30  13.34  14.00  16.30  12.30  35653893  4.927038e+10   29.54   3.40   0.46  195.99   8.001622  False  False
163  2023-07-31  13.96  10.37  13.96   9.78  22634508  2.544471e+10   29.86 -25.93  -3.63  124.42   9.405911  False  False
164  2023-08-31  10.35  10.87  11.22   9.13  23266718  2.388117e+10   20.15   4.82   0.50  127.90  10.640708  False   True
165  2023-09-28  10.87  10.06  10.93   8.69  14796886  1.446398e+10   20.61  -7.45  -0.81   81.34  11.656688  False  False
166  2023-10-12   9.95  10.17  10.66   9.77   3029435  3.121413e+09    8.85   1.09   0.11   16.65  12.434831  False  False

167 rows × 14 columns

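Step 2: Write the technical indicator

This indicator was shared in a finance chat group; it is called MIKE.

First, the TongDaXin (通达信, TDX) source code. The original post showed a chart screenshot of the indicator here; the formula source is: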

HLC:=REF(EMA((HIGH+LOW+CLOSE)/3,10),1);
HV:=EMA(HHV(HIGH,10),3);
LV:=EMA(LLV(LOW,10),3);
WEKR:EMA(HLC*2-LV,8),LINETHICK5,COLORBLUE;
IF(C>=REF(WEKR,1),WEKR,DRAWNULL),LINETHICK5,COLORMAGENTA;
IF(C<=REF(WEKR,1),WEKR,DRAWNULL),LINETHICK5,COLORBLUE;
DRAWKLINE(HIGH,OPEN,LOW,CLOSE);
涨幅要求2:=(C/REF(C,1)-1)*100>7;
冲:=O<WEKR AND C>WEKR  AND 涨幅要求2 ;
STICKLINE(冲,O,C,4,0),COLORYELLOW;
DRAWTEXT(冲,L*0.955, '冲'),COLORRED;

The key function in this formula is EMA, which needs its own Python implementation.
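For reference, here are the TDX definitions written as equations, with H, L, C, O the high, low, close and open. TP (typical price) and LLV are labels introduced here for readability; EMA_N is the recursion that the Python ema() function in this step implements:

```latex
\begin{aligned}
\mathrm{EMA}_N(X)_t &= \frac{2X_t + (N-1)\,\mathrm{EMA}_N(X)_{t-1}}{N+1}, \qquad \mathrm{EMA}_N(X)_0 = X_0 \\
\mathrm{TP}_t &= \frac{H_t + L_t + C_t}{3}, \qquad \mathrm{HLC}_t = \mathrm{EMA}_{10}(\mathrm{TP})_{t-1} \\
\mathrm{LLV}_t &= \min_{0 \le k < 10} L_{t-k}, \qquad \mathrm{LV}_t = \mathrm{EMA}_{3}(\mathrm{LLV})_t \\
\mathrm{WEKR}_t &= \mathrm{EMA}_{8}(2\,\mathrm{HLC} - \mathrm{LV})_t \\
\text{冲}_t &= \bigl(O_t < \mathrm{WEKR}_t\bigr) \wedge \bigl(C_t > \mathrm{WEKR}_t\bigr) \wedge \Bigl(100\bigl(\tfrac{C_t}{C_{t-1}} - 1\bigr) > 7\Bigr)
\end{aligned}
```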

Below is the translation into Python; its input is the df table fetched above.

import pandas as pd
import numpy as np

def ema(data, n):
    """TDX-style EMA: EMA_t = (2*X_t + (n-1)*EMA_{t-1}) / (n+1).
    Restarts from the raw value whenever the previous input is NaN."""
    ema_values = []
    for i in range(len(data)):
        if i == 0 or pd.isnull(data[i-1]):   # first cell, or the previous cell is empty
            ema_values.append(data[i])        # start over from the raw value
        else:
            # EMA = previous EMA x (N-1)/(N+1) + today's value x 2/(N+1)
            ema_values.append((2*data[i] + (n-1)*ema_values[i-1]) / (n+1))
    return ema_values

def mike(df, data_type='d'):   # main function; intermediate results are stored as extra columns of df
    HIGH = df['最高']
    LOW = df['最低']
    CLOSE = df['收盘']
    df['data1'] = (HIGH + LOW + CLOSE) / 3
    df['HLC'] = ema(df['data1'].values, 10)
    df['HLC'] = df['HLC'].shift(1)                               # REF(..., 1): previous bar's value
    df['HHV(HIGH,10)'] = HIGH.rolling(10, min_periods=1).max()   # HHV: highest high of the last 10 bars
    df['LLV(LOW,10)'] = LOW.rolling(10, min_periods=1).min()     # LLV: lowest low of the last 10 bars
    df['HV'] = ema(df['HHV(HIGH,10)'].values, 3)                 # defined in the TDX source but not used
    df['LV'] = ema(df['LLV(LOW,10)'].values, 3)
    df['HLC*2-LV'] = df['HLC']*2 - df['LV']
    df['WEKR'] = ema(df['HLC*2-LV'].values, 8)
    df['zfyq'] = df['涨跌幅'] >= 7                    # gain requirement (TDX uses a strict > 7)
    df['WEKR>O'] = df['WEKR'] >= df['开盘']           # opened at or below WEKR
    df['C>WEKR'] = df['收盘'] >= df['WEKR']           # closed at or above WEKR
    df['On-1<WEKR'] = df['WEKR'] >= df['收盘'].shift(1)   # previous close at or below WEKR
    # Unlike the TDX source, the "below WEKR" condition also accepts the previous close
    tj1 = df['WEKR>O'] | df['On-1<WEKR']
    tj2 = df['zfyq']
    tj3 = df['C>WEKR']
    if data_type == 'd':
        df['冲不冲'] = tj1 & tj2 & tj3
    elif data_type == 'm':
        df['冲不冲'] = tj1 & tj3                        # monthly bars skip the 7% gain requirement
    else:
        raise ValueError("data_type must be 'd' (daily) or 'm' (monthly)")
    # finally drop the intermediate columns that are no longer needed
    df.drop(['data1', 'HLC', 'HHV(HIGH,10)', 'LLV(LOW,10)', 'HV', 'LV', 'HLC*2-LV',
             'WEKR>O', 'C>WEKR', 'On-1<WEKR'], axis=1, inplace=True)
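Because ema() is the recursion with a smoothing factor of 2/(n+1), it should agree with pandas' built-in Series.ewm(span=n, adjust=False).mean(), which could serve as a faster replacement. The sketch below checks this on made-up prices. The two only agree when NaNs occur at the start of the series (as in the shifted HLC column): ema() restarts after any NaN, while ewm does not.

```python
import numpy as np
import pandas as pd

def ema_loop(data, n):
    """Copy of ema() above: EMA_t = (2*X_t + (n-1)*EMA_{t-1}) / (n+1),
    restarting from the raw value after a NaN."""
    out = []
    for i in range(len(data)):
        if i == 0 or pd.isnull(data[i - 1]):
            out.append(data[i])
        else:
            out.append((2 * data[i] + (n - 1) * out[i - 1]) / (n + 1))
    return np.array(out, dtype=float)

def ema_pandas(series, n):
    """Same recursion via pandas: span=n means alpha = 2/(n+1); adjust=False selects
    the recursive form rather than the weighted-sum form."""
    return series.ewm(span=n, adjust=False).mean()

# made-up prices with a leading NaN, like the shifted HLC column
prices = pd.Series([np.nan, 10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.4])
loop_result = ema_loop(prices.values, 10)
pandas_result = ema_pandas(prices, 10).to_numpy()
```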

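The daily calculation is simple: just call mike(df, 'd'). The second argument selects daily ('d') or monthly ('m') bars. The resulting 冲不冲 column (literally "charge or not") is the buy-signal flag. (From here on, the outputs were generated for stock 300002, as the symbol column of the final table shows.)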

mike(df, 'd')
df
            日期   开盘   收盘   最高   最低   成交量          成交额    振幅   涨跌幅  涨跌额  换手率       WEKR   zfyq   冲不冲
0     2009-10-30   6.42   6.62   9.73   5.44  226207  2.373081e+09  122.57  89.14   3.12  89.48        NaN   True  False
1     2009-11-02   5.90   5.97   6.56   5.90   75328  7.190947e+08    9.97  -9.82  -0.65  29.80   9.086667  False  False
2     2009-11-03   6.14   6.15   6.31   5.99   41747  4.004070e+08    5.36   3.02   0.18  16.51   8.996162  False  False
3     2009-11-04   6.14   6.42   6.55   6.13   33682  3.357801e+08    6.83   4.39   0.27  13.32   8.852258  False  False
4     2009-11-05   6.48   6.64   6.85   6.42   32405  3.350685e+08    6.70   3.43   0.22  12.82   8.697696  False  False
...          ...    ...    ...    ...    ...     ...           ...     ...    ...    ...    ...        ...    ...    ...
3264  2023-09-28  10.11  10.06  10.18   9.99  575699  5.789843e+08    1.89   0.10   0.01   3.16  10.233231  False  False
3265  2023-10-09   9.95  10.19  10.41   9.77  925070  9.380054e+08    6.36   1.29   0.13   5.09  10.321869  False  False
3266  2023-10-10  10.28  10.35  10.66  10.18  919924  9.616347e+08    4.71   1.57   0.16   5.06  10.428242  False  False
3267  2023-10-11  10.21  10.38  10.58  10.18  702760  7.296557e+08    3.86   0.29   0.03   3.86  10.563653  False  False
3268  2023-10-12  10.39  10.17  10.41  10.10  481682  4.921174e+08    2.99  -2.02  -0.21   2.65  10.710706  False  False

3269 rows × 14 columns

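The current month's monthly bar only runs up to that day's close. So for each signal, the months before it use the fetched monthly data, while the signal month's bar is computed by hand from the daily data. That comes in a later step; first, select the days that meet the daily condition: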

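import sys

rows = df[df['冲不冲'] == True].copy()   # days that meet the daily condition
if rows.empty:
    print(f'Finished backtest data for {symbol}: {rows.shape[0]} signals\n', end='')
    sys.exit(0)   # once this code is wrapped in a function, use return instead
rows.reset_index(inplace=True)   # keeps df's original row number in the 'index' column
rows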
   index        日期   开盘   收盘   最高   最低    成交量         成交额   振幅  涨跌幅 涨跌额  换手率       WEKR  zfyq 冲不冲  日期范围
0    18  2009-11-25   6.67   7.39   7.39   6.67    33132  3.656763e+08  10.79  10.79  0.72  13.11   7.252044  True  True  (2009-10-31, 2009-11-30]
1    92  2010-03-22  10.64  11.19  11.40  10.51    15432  2.543407e+08   8.52   7.18  0.75   4.88  10.581057  True  True  (2010-02-28, 2010-03-31]
2   102  2010-04-06  12.73  13.74  13.75  12.72    20556  4.105921e+08   8.05   7.43  0.95   6.50  13.260945  True  True  (2010-03-31, 2010-04-30]
3   420  2011-08-01   5.57   6.00   6.13   5.57    80982  2.492260e+08  10.13   8.50  0.47   7.84   5.976972  True  True  (2011-07-31, 2011-08-31]
4   435  2011-08-23   6.40   6.88   6.91   6.34    84290  2.884709e+08   8.96   8.18  0.52   8.16   6.775105  True  True  (2011-07-31, 2011-08-31]
5   552  2012-02-21   3.86   4.13   4.17   3.75    75526  1.601069e+08  10.94   7.55  0.29   7.31   3.883402  True  True  (2012-01-31, 2012-02-29]
6   824  2013-04-08   3.28   3.72   3.72   3.25   104370  1.998146e+08  14.03  11.04  0.37   5.17   3.672354  True  True  (2013-03-31, 2013-04-30]
7   861  2013-06-05   4.45   4.76   4.91   4.43   272415  4.106805e+08  10.81   7.21  0.32   8.40   4.547126  True  True  (2013-05-31, 2013-06-30]
8   883  2013-08-23   6.78   6.78   6.78   6.78     2356  5.011339e+06   0.00  10.42  0.64   0.07   6.274634  True  True  (2013-07-31, 2013-08-31]
9   892  2013-09-05   9.47  10.28  10.43   9.39   247005  7.613480e+08  10.84   7.19  0.69   7.82   9.536524  True  True  (2013-08-31, 2013-09-30]
10  904  2013-09-25  10.21  11.12  11.28  10.15   175912  5.919190e+08  11.02   8.49  0.87   5.55  10.356636  True  True  (2013-08-31, 2013-09-30]
11  964  2013-12-25   8.63   9.26   9.37   8.55   165617  4.649416e+08   9.50   7.30  0.63   5.23   9.068894  True  True  (2013-11-30, 2013-12-31]
12  966  2013-12-27   8.86   9.75   9.75   8.84   168891  4.935936e+08  10.20   9.30  0.83   5.33   9.163578  True  True  (2013-11-30, 2013-12-31]
13  975  2014-01-20  11.11  11.11  11.11  10.72   291067  9.950094e+08   3.87  10.33  1.04   8.23  10.337553  True  True  (2013-12-31, 2014-01-31]
14 1056  2014-05-23   7.63   8.22   8.22   7.62   439044  5.420473e+08   8.04  10.19  0.76   6.07   8.042405  True  True  (2014-04-30, 2014-05-31]
15 1102  2014-10-30  10.06  10.06  10.06  10.06     5444  8.405736e+06   0.00  10.19  0.93   0.08   9.769167  True  True  (2014-09-30, 2014-10-31]
16 1123  2014-11-28  10.36  11.00  11.27  10.29   618805  1.029956e+09   9.58   7.53  0.77   8.52  10.378976  True  True  (2014-10-31, 2014-11-30]
17 1148  2015-01-06  10.93  12.07  12.07  10.77   459309  8.192253e+08  11.87  10.23  1.12   6.29  12.040501  True  True  (2014-12-31, 2015-01-31]
18 1156  2015-01-16  12.25  13.16  13.29  12.20   416274  8.174527e+08   8.99   8.49  1.03   5.70  13.106821  True  True  (2014-12-31, 2015-01-31]
19 1182  2015-03-02  14.60  15.95  15.97  14.60   623351  1.454102e+09   9.45  10.00  1.45   8.54  15.109282  True  True  (2015-02-28, 2015-03-31]
20 1199  2015-04-02  15.89  17.09  17.26  15.88   657733  1.651666e+09   8.75   8.30  1.31   9.01  16.850508  True  True  (2015-03-31, 2015-04-30]
21 1219  2015-05-04  22.48  24.72  24.72  22.28   329972  1.170745e+09  10.87  10.11  2.27   4.52  23.557275  True  True  (2015-04-30, 2015-05-31]
22 1235  2015-05-26  23.75  25.71  25.75  23.74   508543  1.919091e+09   8.46   8.21  1.95   6.80  25.080889  True  True  (2015-04-30, 2015-05-31]
23 1369  2015-12-17  12.46  13.61  13.61  12.37  1040420  1.386511e+09  10.03  10.11  1.25   9.11  13.276117  True  True  (2015-11-30, 2015-12-31]
24 1447  2016-04-14   9.02   9.71   9.90   9.00  1016533  9.929497e+08  10.01   8.01  0.72   8.84   9.212066  True  True  (2016-03-31, 2016-04-30]
25 1875  2018-01-11   6.25   6.77   6.77   6.24   512947  3.428190e+08   8.62  10.08  0.62   3.71   6.276158  True  True  (2017-12-31, 2018-01-31]
26 2162  2019-03-21   4.91   5.35   5.41   4.89  1234702  6.541604e+08  10.59   8.96  0.44   8.76   5.309214  True  True  (2019-02-28, 2019-03-31]
27 2223  2019-06-21   4.24   4.33   4.45   4.15   928968  4.099626e+08   7.43   7.18  0.29   6.59   4.156205  True  True  (2019-05-31, 2019-06-30]
28 2308  2019-10-28   3.80   3.80   3.80   3.80   397858  1.535731e+08   0.00  10.14  0.35   2.81   3.700198  True  True  (2019-09-30, 2019-10-31]
29 2393  2020-03-04   3.65   4.07   4.07   3.60  1468908  5.876640e+08  12.74  10.30  0.38   9.01   3.751133  True  True  (2020-02-29, 2020-03-31]
30 2400  2020-03-13   3.92   4.54   4.54   3.92  1501340  6.606580e+08  15.05  10.19  0.42   9.21   4.244405  True  True  (2020-02-29, 2020-03-31]
31 2417  2020-04-08   4.31   4.31   4.31   4.31    76435  3.340189e+07   0.00  10.23  0.40   0.47   4.264800  True  True  (2020-03-31, 2020-04-30]
32 2427  2020-04-22   5.74   6.35   6.35   5.72  2972948  1.848982e+09  10.92  10.05  0.58  18.24   6.329473  True  True  (2020-03-31, 2020-04-30]
33 2443  2020-05-19   6.04   6.55   6.55   5.97  2357484  1.513926e+09   9.75  10.08  0.60  14.46   6.298994  True  True  (2020-04-30, 2020-05-31]
34 2456  2020-06-05   6.15   6.87   6.87   6.07  1252051  8.465352e+08  12.82  10.10  0.63   7.68   6.507364  True  True  (2020-05-31, 2020-06-30]
35 2476  2020-07-07   6.48   7.03   7.15   6.44  2169895  1.513698e+09  10.94   8.32  0.54  13.32   6.985538  True  True  (2020-06-30, 2020-07-31]
36 2521  2020-09-08   6.93   7.43   7.44   6.84  1784993  1.300526e+09   8.66   7.22  0.50  10.96   7.133406  True  True  (2020-08-31, 2020-09-30]
37 2568  2020-11-20   5.67   6.06   6.24   5.67  1603040  9.721904e+08  10.40  10.58  0.58   9.84   5.833056  True  True  (2020-10-31, 2020-11-30]
38 2644  2021-03-16   4.51   4.81   5.00   4.50  1033546  5.019764e+08  11.36   9.32  0.41   6.34   4.757991  True  True  (2021-02-28, 2021-03-31]
39 2739  2021-08-03   4.59   5.52   5.52   4.55  2443362  1.321823e+09  21.13  20.26  0.93  14.54   5.031415  True  True  (2021-07-31, 2021-08-31]
40 2763  2021-09-06   5.27   5.86   6.22   5.27  2724360  1.618332e+09  18.27  12.69  0.66  16.21   5.387002  True  True  (2021-08-31, 2021-09-30]
41 2969  2022-07-18   3.92   4.14   4.15   3.89   808393  3.298014e+08   6.90   9.81  0.37   4.58   4.096755  True  True  (2022-06-30, 2022-07-31]
42 3119  2023-03-01   5.99   6.62   6.74   5.98  2105477  1.354974e+09  12.60   9.78  0.59  11.57   6.170033  True  True  (2023-02-28, 2023-03-31]
43 3132  2023-03-20   7.17   7.38   7.74   7.17  2932969  2.219677e+09   8.62  11.65  0.77  16.12   6.660683  True  True  (2023-02-28, 2023-03-31]
44 3141  2023-03-31   8.28   9.39   9.63   8.11  2215561  1.987753e+09  18.40  13.68  1.13  12.18   8.896722  True  True  (2023-02-28, 2023-03-31]
45 3154  2023-04-20  11.27  13.12  13.49  11.26  3027320  3.711599e+09  19.49  14.69  1.68  16.64  12.197207  True  True  (2023-03-31, 2023-04-30]
46 3160  2023-04-28  12.25  13.70  14.28  12.25  3021591  4.088951e+09  16.48  11.20  1.38  16.61  13.535083  True  True  (2023-03-31, 2023-04-30]
47 3194  2023-06-20  14.35  15.48  16.30  14.23  2709833  4.162040e+09  14.43   7.87  1.13  14.90  14.631142  True  True  (2023-05-31, 2023-06-30]
48 3242  2023-08-29   9.73  10.70  10.96   9.70  2119703  2.208865e+09  13.21  12.16  1.16  11.65  10.649530  True  True  (2023-07-31, 2023-08-31]
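The rows table above carries a 日期范围 (date range) column, and the loop below groups the daily bars by it, but the post never shows the code that creates it. Its labels, such as (2009-10-31, 2009-11-30], look like pd.cut output with month-end bin edges. Assuming exactly that, one way to build a compatible column is sketched here (add_month_bucket is a made-up name):

```python
import pandas as pd

def add_month_bucket(df):
    """Hypothetical helper: tag each daily bar with its calendar month as a right-closed
    interval (previous month end, this month end], the format the 日期范围 column shows."""
    dates = pd.to_datetime(df['日期'])
    first_edge = dates.min() - pd.offsets.MonthEnd(1)   # month end before the first bar
    last_edge = dates.max() + pd.offsets.MonthEnd(0)    # roll forward to the last bar's month end
    edges = pd.date_range(first_edge, last_edge, freq=pd.offsets.MonthEnd())
    df['日期'] = dates
    df['日期范围'] = pd.cut(dates, bins=edges)          # right=True by default, i.e. (a, b]
    return df

demo = pd.DataFrame({'日期': ['2009-10-30', '2009-11-02', '2009-11-25', '2009-12-01']})
demo = add_month_bucket(demo)
```

Calling it on df after mike(df, 'd') and before selecting rows would give every row a month bucket for the loop to match on.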

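Among the days that pass the daily condition, keep those that also pass the monthly condition. For each selected day, find its month and take all monthly bars before that month. Then compute the current month's bar from the daily data up to that day, append it to the earlier monthly bars, run the MIKE indicator on the combined series, and store the resulting True/False in the rows table.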

# One month-to-date bar per signal day
monthly_data = pd.DataFrame(index=range(len(rows)), columns=rows.columns)
monthly_data['日期'] = rows['日期']
monthly_data.drop(columns=['index'], inplace=True)

for i in range(len(rows)):
    the_day = rows.loc[i]
    # daily bars of the same calendar month, up to and including the signal day
    group = df[df['日期范围'] == the_day['日期范围']]
    group = group[group['日期'] <= the_day['日期']]
    group.reset_index(inplace=True)
    if group.size > 0:
        monthly_data.loc[i, '开盘'] = group['开盘'].iloc[0]    # month open = first day's open
        monthly_data.loc[i, '收盘'] = group['收盘'].iloc[-1]   # month-to-date close = signal day's close
        monthly_data.loc[i, '最高'] = group['最高'].max()
        monthly_data.loc[i, '最低'] = group['最低'].min()
        # previous month's close is the base for the month-to-date change
        divisor = df_monthly[df_monthly['日期'] < the_day['日期']].tail(1)['收盘']
        if divisor.size > 0 and divisor.iloc[0] != 0:
            pct = (group['收盘'].iloc[-1] / divisor.iloc[0] - 1) * 100
            monthly_data.loc[i, '涨跌幅'] = round(pct, 2)
        # To get the monthly WEKR, append this month's bar to the earlier monthly bars and recompute
        # all monthly bars before the signal day
        df_before_date = df_monthly[df_monthly['日期'] < the_day['日期']].copy()
        df_insert = pd.DataFrame(monthly_data.loc[i]).transpose()   # the computed month-to-date bar
        df_insert.dropna(axis=1, inplace=True)
        if df_insert.size > 0:
            df_before_date = pd.concat([df_before_date, df_insert], axis=0)   # append to the earlier months
        mike(df_before_date)   # default data_type='d', so the month must also have gained >= 7%
        rows.loc[i, '日月穿'] = df_before_date['冲不冲'].iloc[-1]
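The loop above builds the month-to-date bar for one signal day at a time. As an alternative sketch (not the code that produced the results here), the same open, high, low and close values can be computed for every trading day in one pass by grouping on the calendar month. It assumes 日期 is a datetime column; the demo prices are made up and month_to_date_bars is a hypothetical name:

```python
import pandas as pd

def month_to_date_bars(daily):
    """For every daily row, the monthly bar as it stands at that day's close:
    first open of the month, running high/low, latest close."""
    month = pd.to_datetime(daily['日期']).dt.to_period('M')   # calendar-month key
    g = daily.groupby(month)
    return pd.DataFrame({
        '日期': daily['日期'],
        '开盘': g['开盘'].transform('first'),   # open of the month's first trading day
        '收盘': daily['收盘'],                  # close of the current day
        '最高': g['最高'].cummax(),             # highest high so far this month
        '最低': g['最低'].cummin(),             # lowest low so far this month
    })

daily = pd.DataFrame({
    '日期': pd.to_datetime(['2023-03-30', '2023-03-31', '2023-04-03', '2023-04-04', '2023-04-06']),
    '开盘': [10.0, 10.5, 11.0, 11.2, 11.5],
    '收盘': [10.4, 10.9, 11.1, 11.6, 11.3],
    '最高': [10.6, 11.0, 11.4, 11.8, 11.7],
    '最低': [9.9, 10.4, 10.8, 11.0, 11.2],
})
mtd = month_to_date_bars(daily)
```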

Then keep only the rows whose 日月穿 (daily and monthly cross) column is True:

rows_ryc = rows[rows['日月穿'] == True].copy()   # signals that pass both the daily and the monthly check
if rows_ryc.empty:
    print(f'Finished backtest data for {symbol}: {rows_ryc.shape[0]} signals\n', end='')
rows_ryc
   index        日期   开盘   收盘   最高   最低    成交量         成交额   振幅  涨跌幅 涨跌额  换手率       WEKR  zfyq 冲不冲  日期范围                  日月穿
7   861  2013-06-05   4.45   4.76   4.91   4.43   272415  4.106805e+08  10.81   7.21  0.32   8.40   4.547126  True  True  (2013-05-31, 2013-06-30]  True
17 1148  2015-01-06  10.93  12.07  12.07  10.77   459309  8.192253e+08  11.87  10.23  1.12   6.29  12.040501  True  True  (2014-12-31, 2015-01-31]  True
18 1156  2015-01-16  12.25  13.16  13.29  12.20   416274  8.174527e+08   8.99   8.49  1.03   5.70  13.106821  True  True  (2014-12-31, 2015-01-31]  True
32 2427  2020-04-22   5.74   6.35   6.35   5.72  2972948  1.848982e+09  10.92  10.05  0.58  18.24   6.329473  True  True  (2020-03-31, 2020-04-30]  True

These final 4 rows are the buy signals that satisfy both the daily and the monthly condition.

Finally, I want the price change 1, 3, 5, 10 and 20 trading days after each of these dates:

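def zf_after_index(index, daynum):
    """Change (%) in close from row `index` of df to `daynum` trading days later; None past the end of the data."""
    try:
        zf = (df.loc[index + daynum, '收盘'] / df.loc[index, '收盘'] - 1) * 100
        return zf.round(2)
    except KeyError:   # fewer than daynum trading days after the signal
        return None

for i in rows_ryc.index:
    for daynum in [1, 3, 5, 10, 20]:
        zf = zf_after_index(rows_ryc.loc[i, 'index'], daynum)
        rows_ryc.loc[i, f'{daynum}日后涨幅'] = zf

rows_ryc['symbol'] = str(symbol)
rows_ryc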
   index        日期   开盘   收盘   最高   最低    成交量         成交额   振幅  涨跌幅  ...  zfyq 冲不冲  日期范围                  日月穿  1日后涨幅  3日后涨幅  5日后涨幅  10日后涨幅  20日后涨幅  symbol
7   861  2013-06-05   4.45   4.76   4.91   4.43   272415  4.106805e+08  10.81   7.21  ...  True  True  (2013-05-31, 2013-06-30]  True   -0.21   12.18   18.49   -0.63   22.06  300002
17 1148  2015-01-06  10.93  12.07  12.07  10.77   459309  8.192253e+08  11.87  10.23  ...  True  True  (2014-12-31, 2015-01-31]  True   -0.33    3.56    2.90   15.24   16.57  300002
18 1156  2015-01-16  12.25  13.16  13.29  12.20   416274  8.174527e+08   8.99   8.49  ...  True  True  (2014-12-31, 2015-01-31]  True    0.46    8.13    5.70    1.90    2.81  300002
32 2427  2020-04-22   5.74   6.35   6.35   5.72  2972948  1.848982e+09  10.92  10.05  ...  True  True  (2020-03-31, 2020-04-30]  True   -6.46  -12.76  -21.42   -8.03  -12.60  300002

4 rows × 23 columns
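zf_after_index() looks up one signal row at a time. A vectorized alternative (a sketch, not the code used for the table above) computes the forward change for every row with Series.shift(-n); the signal rows can then be picked out, e.g. with forward_returns(df['收盘']).loc[rows_ryc['index']]. The helper name is made up:

```python
import pandas as pd

def forward_returns(close, horizons=(1, 3, 5, 10, 20)):
    """Change (%) from each row's close to the close n rows later; NaN near the end of the data."""
    out = {}
    for n in horizons:
        out[f'{n}日后涨幅'] = ((close.shift(-n) / close - 1) * 100).round(2)
    return pd.DataFrame(out, index=close.index)

close = pd.Series([10.0, 11.0, 12.1, 11.0])
fr = forward_returns(close, horizons=(1, 2))
```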

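At this point the backtest code for a single stock is complete. Add the following line so that progress is easy to follow when running in batches:

print(f'Finished backtest data for {symbol}: {rows_ryc.shape[0]} signals\n', end='')

Next, wrap everything in a function: indent the code and put def test(symbol): at the top.

(I also tried not wrapping it in a function and passing the stock code as a command-line argument instead, but command-line arguments can only be strings, so in the end defining a function was the more convenient way to call it.)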

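Step 3: Loop over all Shanghai, Shenzhen and Beijing stocks, call the indicator function, collect the backtest data and save it

First I need the codes of all stocks, again from akshare, and then a for loop over them. There are now more than 5,000 A-share stocks. The computer could simply run until everything is done, but that is inefficient, any failure along the way throws away all the work done so far, and it would not serve my learning goals. So I fetch and compute in batches and store everything locally: the list of stock codes already processed and the final results table. The next run loads both and carries on where the last one stopped.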

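I save the backtest results (the signal dates and their data) to Excel with pandas, and the list of stock codes already processed to a binary file. I find this fast to read and write because I never need to open the file outside Python, and no data-type handling is needed when reading it back: I write a list straight into the file and get a list back when loading it, whereas a .txt file would need extra parsing. Binary files like this are handled by the pickle library: pickle.dump() saves and pickle.load() loads.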
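A minimal round trip with pickle.dump() and pickle.load(), using a temporary file and made-up helper names, shows the "write a list, get a list back" behaviour:

```python
import os
import pickle
import tempfile

def save_done(symbols, path):
    """Write the list of processed stock codes to a binary pickle file."""
    with open(path, 'wb') as f:          # pickle needs binary mode
        pickle.dump(symbols, f)

def load_done(path):
    """Read the list back; an empty list if the file does not exist yet."""
    if not os.path.isfile(path):
        return []
    with open(path, 'rb') as f:
        return pickle.load(f)

path = os.path.join(tempfile.gettempdir(), 'saved_stocks_demo.bin')
save_done(['000001', '000002'], path)
done = load_done(path)
```

Only unpickle files you created yourself: pickle.load() can execute code embedded in a malicious file.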

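As mentioned above, the simplest way to process many stocks is a for loop, but it is far too slow: each stock has to finish before the next one can start. Single-threaded processing was clearly impractical, so I went on to learn multithreading with the threading library. Threads help here because most of the time is spent waiting for akshare's network responses; for CPU-bound pure-Python work, Python's GIL would limit the gain.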

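The threading library works by putting a function into a child thread:

       threading.Thread(target=function_name, args=(arg1, arg2, ...))

Then Thread.start() starts the child thread.

So the for loop that computed one stock at a time becomes a for loop that creates a child thread for each stock and starts it.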

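The next feature is Thread.join(), which pauses the main thread until a child thread has finished its work, so the main thread does not run to the end before the children have produced their results. It is generally needed. Since the main thread has nothing to do once it has created the child threads, you can also have it run other code in the meantime and place Thread.join() further down; that way both run in parallel.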
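A small sketch of the start/join pattern, with a stand-in workload instead of a real backtest and a lock protecting the shared dictionary (the same role the lock plays for count_done in the main script):

```python
import threading

results = {}
results_lock = threading.Lock()

def work(symbol):
    value = len(symbol)               # stand-in for a real per-stock backtest
    with results_lock:                # only one thread updates the shared dict at a time
        results[symbol] = value

threads = [threading.Thread(target=work, args=(s,)) for s in ['000001', '600000', '830799']]
for t in threads:
    t.start()                         # the workers now run concurrently
# ... the main thread is free to do other work here ...
for t in threads:
    t.join()                          # block until each worker has finished
```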

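That covers multithreading. The other feature I used is the queue. Results computed by the functions are pushed into the queue one by one, like loading rounds into a magazine, and taken out one by one when needed, like firing them: each shot uses up one round, and likewise each get removes one item from the queue. If the queue is empty, get() waits until an item is put in (unless a timeout is given). Usage is simple: queue.Queue() creates the queue, q.put() loads a round and q.get() fires one, where q is the Queue object.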
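A minimal producer/consumer sketch of the same idea; the consumer uses get(timeout=...) so that it gives up instead of waiting forever if a producer dies (the item values are arbitrary):

```python
import queue
import threading

q = queue.Queue()

def producer(n):
    for i in range(n):
        q.put(i * i)                  # load a round into the magazine

def consume(expected, timeout=5):
    got = []
    while len(got) < expected:
        try:
            got.append(q.get(timeout=timeout))   # blocks until an item arrives or the timeout expires
        except queue.Empty:
            break                     # give up rather than wait forever
    return got

t = threading.Thread(target=producer, args=(5,))
t.start()
items = consume(5)
t.join()
```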

One more feature is calling a function by its name. I did this so that I can switch easily between indicators when backtesting several of them.
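The main script does this with globals()[name]. An explicit dictionary of allowed functions is a common alternative that makes the available choices visible and turns a typo into a clear error. A sketch with placeholder strategy functions (all names here are illustrative):

```python
def backtest_mike(symbol):
    return f'mike:{symbol}'           # placeholder for a real backtest function

def backtest_rsi(symbol):
    return f'rsi:{symbol}'            # placeholder for a real backtest function

# explicit registry: plays the role of globals()[name] but only exposes these functions
STRATEGIES = {'test': backtest_mike, 'testrsi': backtest_rsi}

def run(name, symbol):
    try:
        func = STRATEGIES[name]
    except KeyError:
        raise ValueError(f'unknown strategy {name!r}, choose from {sorted(STRATEGIES)}') from None
    return func(symbol)
```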

That covers all the features used. Here is the code:

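from formula_testV2 import test    # the MIKE backtest function wrapped earlier
from RSI import testrsi            # an RSI backtest function (its code is not shown in this post)
import queue
import subprocess
import sys
import os
import threading
import pandas as pd
import pickle
from concurrent.futures import ThreadPoolExecutor    # thread pool; I tried it and it was no better than plain threads
import time
import numpy as np

start_time = time.time()

with open('list.bin', 'rb') as f:
    a_list = pickle.load(f)   # list of all stock codes

# The lock was first added to keep printed output from getting jumbled. Printing used to take
# two calls and now takes one, so print no longer needs it; it now guards count_done.
lock = threading.Lock()
rows_rycs = []
q = queue.Queue()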

'''
if os.path.isfile('history_test.xlsx'):
    main_sheet = pd.read_excel('history_test.xlsx')
    rows_rycs.append(main_sheet)
'''
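formula = 'testrsi'      # name of the backtest function to run for each stock
combiner = 'concat_np'   # name of the function that merges the results
total = 100              # number of stocks to process in this run

def process(symbol):
    global count_done
    print(f'Loading task {symbol}\n', end='')
    try:
        func = globals()[formula]   # look up the backtest function by name
        # If the first task has not finished, count_done is still 0, the queue is empty and the
        # merged count is 0 as well, so a "merged >= done" test would stop at once;
        # the merger therefore compares against total instead
        rows_ryc = func(symbol)
        q.put(rows_ryc)
        symbols.append(symbol)
        with lock:            # a child must hold the lock (the "remote control") to change the counter
            count_done += 1   # only counted after the backtest succeeded; the run ends at >= total
    except Exception as e:
        print(f"Error while processing {symbol}:", type(e).__name__)   # exception type
        raise
    finally:
        print(f"{symbol} thread finished, {count_done}/{total} done, {q.qsize()} in queue\n", end='')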

def concat():
    print('Merger started')
    if os.path.isfile('history_test.xlsx'):
        main_sheet = pd.read_excel('history_test.xlsx')
    else:
        main_sheet = pd.DataFrame()

    num = 0
    while True:  # if a result never arrives (e.g. a worker hit a connection timeout), this loop waits forever
        # q.get() blocks while the queue is empty; q.get(timeout=5) would raise queue.Empty instead
        main_sheet = pd.concat([main_sheet, q.get()])
        num += 1
        print(f'Merged {num}')
        if num % 100 == 0:
            main_sheet.to_excel('history_test.xlsx', index=False)
            print('Autosaved history_test.xlsx')
        if num >= total:    # the merged count also has to reach total before stopping
            print('Merging finished')
            break
    main_sheet.to_excel('history_test.xlsx', index=False)
    print('Final save of history_test.xlsx done')

def concat_np():
    print('Merger started')
    if os.path.isfile('rsi_result.npy'):
        rsi = np.load('rsi_result.npy')
    else:
        rsi = np.array([])

    num = 0
    while True:
        try:
            # wait at most 10 s for the next result instead of blocking forever
            rsi = np.append(rsi, q.get(timeout=10))
        except Exception as e:
            # typically queue.Empty after the timeout
            print('Failed to get a result from the queue:', type(e).__name__)
        num += 1
        print(f'Merged {num}')
        if num % 100 == 0:
            np.save('rsi_result', rsi)   # np.save appends the .npy extension
            print('Autosaved rsi_result.npy')
        if num >= total:    # the merged count also has to reach total before stopping
            print('Merging finished')
            break
    np.save('rsi_result', rsi)
    print('Final save of rsi_result.npy done')

threads = []

if os.path.isfile('saved_stocks.bin'):
    with open('saved_stocks.bin', 'rb') as f:
        symbols = pickle.load(f)             # stocks already processed in earlier runs
else:
    symbols = []

print(f'{len(symbols)} stocks already processed, {len(a_list) - len(symbols)} remaining')
count = 0
if total + len(symbols) > len(a_list):       # never schedule more stocks than remain
    total = len(a_list) - len(symbols)
count_done = count
if total == 0:
    print('All stocks already processed')
    sys.exit(0)

for symbol in a_list:   # hand out the tasks to the child threads and start them
    if symbol not in symbols:
        thread = threading.Thread(target=process, args=(symbol, ))
        threads.append(thread)
        thread.start()
        count += 1
        if count >= total or len(symbols) >= len(a_list):
            break

# With the queue, the parent merges results while the children are still producing them
func_combine = globals()[combiner]   # call the merge function by its name
func_combine()

# The parent only gets here after merging is finished, so the join below is optional
for thread in threads:  # make the parent wait until every child thread has finished
    thread.join()       # blocks the parent

# only reached after all child threads have finished
with open('saved_stocks.bin', 'wb') as f:
    pickle.dump(symbols, f)

expand = time.time() - start_time
expand = round(expand, 2)
print(f'Backtest finished in {expand}s, {round(expand/total, 2)}s per stock on average')

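After backtesting 5,288 stocks, I got an Excel sheet with 25,229 rows.

With this table you can simply scroll through the data, or use pandas to extract further information. Some key backtest statistics are: the overall win rate; the average gain and loss over each holding period after buying; how the win rate depends on time; when high win rates occur; how the moves after buying correlate with sector, market cap, share price, announcements, that day's turnover rate and that day's market sentiment; the expected return over each holding period; and so on.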
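As one example of the time-related statistics listed above, here is a sketch that groups signals by the year of their 日期 and reports the count, win rate and mean return for one horizon column. The function name and output column names are made up, and rows with a missing return are left out:

```python
import pandas as pd

def win_rate_by_year(data, col='1日后涨幅'):
    """Per signal year: signals with a known return, share of them with a gain (%),
    and the mean return (%)."""
    ret = data[col]
    win = (ret > 0).astype(float).where(ret.notna())   # 1.0 gain, 0.0 otherwise, NaN if missing
    year = pd.to_datetime(data['日期']).dt.year
    return pd.DataFrame({
        'signals': ret.groupby(year).count(),
        'win_rate_%': (win.groupby(year).mean() * 100).round(2),
        'mean_%': ret.groupby(year).mean().round(2),
    })

demo = pd.DataFrame({
    '日期': ['2015-01-06', '2015-03-02', '2020-04-22', '2020-05-19'],
    '1日后涨幅': [-0.33, 2.33, -6.46, None],
})
stats = win_rate_by_year(demo)
```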

Step 4: Statistics

This part is simple; a little pandas is all it takes. Without further ado, here is the code:

import pandas as pd
%matplotlib inline

data = pd.read_excel('history_testV2.xlsx', usecols=['index', 
                                                      '日期', 
                                                      'symbol', 
                                                      '1日后涨幅', 
                                                      '3日后涨幅', 
                                                      '5日后涨幅',
                                                      '10日后涨幅',
                                                      '20日后涨幅',
                                                      ])
data

Output:

       index          日期  1日后涨幅  3日后涨幅  5日后涨幅  10日后涨幅  20日后涨幅  symbol
0       3086  2013-05-30    6.08   12.71   25.97   11.60    3.87       8
1       4977  2022-02-18    0.00    3.82   -2.78   -5.90  -13.54       8
2         57  2000-04-06    4.00   -1.89   -1.05   -3.16  -11.79       5
3       3217  2014-09-25   -4.86   -8.56   -7.64   -7.18  -11.34       5
4       3375  2015-10-22   10.04    3.31    2.37   -6.72    0.57       5
...      ...         ...     ...     ...     ...     ...     ...     ...
25223    669  2020-10-19   -2.69  -14.95  -14.50   -8.52   -4.04  831039
25224    697  2021-09-02   21.89   34.76   53.36   29.76   36.48  831039
25225    787  2022-01-18   -6.11   -7.44  -14.00  -13.11  -17.89  831039
25226    206  2017-01-20  -10.42    2.08   -6.25   39.58   29.17  831445
25227    624  2019-03-22    7.50   14.50    9.00   13.00   40.00  831445

25228 rows × 8 columns

# Up/down statistics for each holding period
rom = [[] for i in range(6)]
index = []

for daynum in [1, 3, 5, 10, 20]:
    col = f'{daynum}日后涨幅'
    index.append(f'{daynum}日后')
    p_up = data[data[col] > 0].shape[0] / data.shape[0] * 100   # share of signals that rose, in %
    up_average = data[data[col] > 0][col].mean()                # average gain of the rising cases
    down_average = data[data[col] < 0][col].mean()              # average loss of the falling cases
    up_max = data[data[col] > 0][col].max()
    down_max = data[data[col] < 0][col].min()
    # p_up is a percentage, so the probability of a fall is (100 - p_up), not (1 - p_up)
    p_math = ((1 + up_average * p_up/10000) * (1 + down_average * (100 - p_up)/10000) - 1) * 100

    '''
    print(f'Probability of a rise after {daynum} days: {p_up:.2f}%')
    print(f'Average gain: {up_average:.2f}%')
    print(f'Average loss: {down_average:.2f}%')
    print(f'Largest gain: {up_max:.2f}%')
    print(f'Largest loss: {down_max:.2f}%')
    print(f'Expected value: {p_math:.2f}%')
    '''

    rom[0].append(round(p_up, 2))
    rom[1].append(round(up_average, 2))
    rom[2].append(round(down_average, 2))
    rom[3].append(round(up_max, 2))
    rom[4].append(round(down_max, 2))
    rom[5].append(round(p_math, 2))

df = {'上涨概率': rom[0],
      '平均涨幅': rom[1],
      '平均跌幅': rom[2],
      '最大涨幅': rom[3],
      '最大跌幅': rom[4],
      '数学期望': rom[5], }

index = {i: x for i, x in enumerate(index)}

df = pd.DataFrame(df)
df = df.rename(index=index)
df

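Output (上涨概率 = probability of a rise; 平均涨幅 / 平均跌幅 = average gain / average loss; 最大涨幅 / 最大跌幅 = largest gain / largest loss; 数学期望 = expected value; all in %):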

        上涨概率  平均涨幅  平均跌幅  最大涨幅  最大跌幅  数学期望
1日后      50.13      5.55     -4.30    222.22   -235.42      4.96
3日后      48.76      9.92     -6.99    751.51   -158.33      8.34
5日后      46.36     12.70     -8.68    882.35   -341.67     10.06
10日后     46.17     18.03    -11.36   1247.06   -252.94     13.88
20日后     45.44     25.67    -14.65   5716.67   -276.47     18.93

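Judging only by the expected value and the average gains and losses, the result looks decent: when it rises, it rises a lot, and when it falls, it falls less. The win rate, however, is mediocre: only the 1-day horizon has a better than even chance of rising, and all the others are below 50%, which is not promising for high-frequency quantitative trading. Two caveats apply to this table. First, the 数学期望 column was computed with the factor (1 - p_up) although p_up is already a percentage, which turns the loss term into a gain; with the table's own 1-day figures, the probability-weighted expectation is only about 0.5013 × 5.55 − 0.4987 × 4.30 ≈ 0.64%. Second, a loss below −100% cannot happen to a real price, so the extreme values most likely come from forward-adjusted (qfq) prices that approach zero or turn negative; the maxima, and the averages they feed into, should be read with that in mind.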
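As a cross-check on any expected-value column: weighting the average gain by the probability of a gain and the average loss by the probability of a loss gives exactly the plain mean of the return column (zero returns count as neither and contribute nothing). A sketch with illustrative numbers; horizon_stats is a made-up name:

```python
import pandas as pd

def horizon_stats(returns):
    """Win rate and expected return (both in %) for one holding-period column."""
    r = returns.dropna()
    p_up = (r > 0).mean()               # probability of a gain (fraction)
    p_down = (r < 0).mean()             # probability of a loss; zero returns count as neither
    up_avg = r[r > 0].mean()            # average gain
    down_avg = r[r < 0].mean()          # average loss (negative)
    expected = p_up * up_avg + p_down * down_avg
    return {'p_up_%': p_up * 100, 'expected_%': expected, 'mean_%': r.mean()}

stats = horizon_stats(pd.Series([5.0, -2.0, 0.0, 3.0, -4.0, None]))
```

In practice, data['1日后涨幅'].mean() therefore already gives the expected 1-day return of the signals.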


Author: 枫!@爷,%
