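Recently I have been studying *Python机器学习指南* (Posts & Telecom Press), whose Chapter 7 covers stock market prediction. I applied the same approach to the Shanghai Composite Index (上证指数) and found that its trend differs greatly from the book's S&P 500 example: different markets show different volatility.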
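The analysis shows that:
1. Since 2010-01-01, the Shanghai Composite Index has been highly volatile;
2. The intraday trading strategy clearly outperforms the daily-return and overnight-return strategies;
3. The model built on these strategies has some predictive power, but the effect is weak.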
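The three strategies in finding 2 are not defined in this part of the article. Below is a minimal sketch, which assumes they are point changes measured close-to-close (daily), from the previous close to today's open (overnight), and from open to close (intraday). The five rows are copied from the Shanghai Composite table shown later in this article.

```python
import pandas as pd

# Five trading days copied from the Shanghai Composite table in this article
shx = pd.DataFrame(
    {
        "open": [3254.47, 3277.52, 3253.99, 3177.26, 3301.61],
        "close": [3282.18, 3254.21, 3192.78, 3196.00, 3212.75],
    },
    index=pd.DatetimeIndex(
        ["2010-01-05", "2010-01-06", "2010-01-07", "2010-01-08", "2010-01-11"],
        name="datetime",
    ),
)

prev_close = shx["close"].shift(1)  # previous trading day's close (NaN on day 1)

# Assumed definitions of the three return types, in index points
returns = pd.DataFrame(
    {
        "daily": shx["close"] - prev_close,      # close-to-close
        "overnight": shx["open"] - prev_close,   # previous close -> today's open
        "intraday": shx["close"] - shx["open"],  # today's open -> today's close
    }
)
print(returns.round(2))
```

Under these definitions, daily = overnight + intraday for every day after the first. Comparing the cumulative sums of the three columns therefore shows which part of the session drives the index's movement.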
This article is intended only as practice in DataFrame manipulation and prediction-model building. To be continued!
I. Major Stock Indices
import tushare as ts
ts.get_index()  # real-time quotes for the major Shanghai and Shenzhen indices
| | code | name | change | open | preclose | close | high | low | volume | amount |
|---|---|---|---|---|---|---|---|---|---|---|
0 | 000001 | 上证指数 | -1.83 | 2792.3221 | 2779.6407 | 2728.7563 | 2815.8737 | 2728.7563 | 291159852 | 3186.1139 |
1 | 000002 | A股指数 | -1.83 | 2926.1598 | 2912.8667 | 2859.4864 | 2950.8481 | 2859.4864 | 290911924 | 3184.7548 |
2 | 000003 | B股指数 | -0.91 | 223.8562 | 222.9788 | 220.9449 | 226.0923 | 220.8130 | 247928 | 1.3591 |
3 | 000008 | 综合指数 | -1.89 | 2624.3784 | 2615.6766 | 2566.3254 | 2638.1342 | 2565.1782 | 58775863 | 618.0305 |
4 | 000009 | 上证380 | -1.61 | 4635.3816 | 4601.6229 | 4527.5488 | 4690.9859 | 4527.5407 | 64660678 | 688.4413 |
5 | 000010 | 上证180 | -2.15 | 7979.1711 | 7944.9118 | 7774.3879 | 8059.9412 | 7774.3879 | 91047723 | 1314.0773 |
6 | 000011 | 基金指数 | -1.12 | 6096.8121 | 6078.7292 | 6010.9243 | 6157.6370 | 5996.7194 | 176965959 | 484.9572 |
7 | 000012 | 国债指数 | 0.00 | 180.7271 | 180.7106 | 180.7158 | 180.7271 | 180.6389 | 710175 | 7.1898 |
8 | 000016 | 上证50 | -2.23 | 2695.4241 | 2685.5907 | 2625.7470 | 2724.3019 | 2625.6003 | 37681392 | 661.8735 |
9 | 000017 | 新综指 | -1.83 | 2359.4264 | 2348.7081 | 2305.6608 | 2379.3315 | 2305.6608 | 287887091 | 3025.1833 |
10 | 000300 | 沪深300 | -1.98 | 3729.6853 | 3709.6822 | 3636.2565 | 3775.8506 | 3636.2565 | 155037547 | 2441.2716 |
11 | 000905 | 中证500 | -1.78 | 5257.9438 | 5212.0619 | 5119.4716 | 5333.1204 | 5119.4716 | 164286604 | 1605.3606 |
12 | 399001 | 深证成指 | -1.70 | 10294.9550 | 10202.7530 | 10029.5680 | 10479.7830 | 10029.5680 | 45258080567 | 5183.7388 |
13 | 399002 | 深成指R | -1.70 | 12629.8200 | 12516.7070 | 12304.2440 | 12856.5660 | 12304.2440 | 19749137662 | 2811.7789 |
14 | 399003 | 成份B指 | -2.42 | 4914.2920 | 4888.0810 | 4769.6800 | 5006.8100 | 4762.6150 | 10605034 | 0.4797 |
15 | 399004 | 深证100R | -1.68 | 5715.5180 | 5671.0870 | 5575.9920 | 5819.3920 | 5574.7890 | 6297467580 | 1187.9062 |
16 | 399005 | 中小板指 | -1.05 | 6692.9000 | 6616.3500 | 6546.8130 | 6850.9000 | 6546.8130 | 21684891350 | 2288.3061 |
17 | 399006 | 创业板指 | -1.60 | 1941.3010 | 1917.7020 | 1887.0420 | 1980.7890 | 1887.0420 | 11913560022 | 1752.0276 |
18 | 399008 | 中小300 | -1.35 | 1291.8000 | 1277.8930 | 1260.6520 | 1318.8290 | 1260.6520 | 11579259839 | 1484.3824 |
19 | 399100 | 新 指 数 | -1.64 | 7856.6660 | 7787.0980 | 7659.3310 | 7990.9410 | 7659.3310 | 44510130185 | 5157.1885 |
20 | 399101 | 中小板综 | -1.31 | 9754.3890 | 9659.9810 | 9533.4390 | 9939.2100 | 9533.4390 | 21684891350 | 2288.3061 |
21 | 399106 | 深证综指 | -1.55 | 1720.5270 | 1704.7360 | 1678.2480 | 1750.6060 | 1678.2480 | 45258080567 | 5183.7388 |
22 | 399107 | 深证A指 | -1.55 | 1800.1780 | 1783.6340 | 1755.9270 | 1831.6840 | 1755.9270 | 45239025191 | 5183.0058 |
23 | 399108 | 深证B指 | -1.73 | 858.8660 | 856.0570 | 841.2740 | 870.7270 | 840.2250 | 19055376 | 0.7330 |
24 | 399333 | 中小板R | -1.05 | 7534.8670 | 7448.6870 | 7370.4010 | 7712.7430 | 7370.4010 | 4507355897 | 814.0477 |
25 | 399606 | 创业板R | -1.60 | 2048.0590 | 2023.1620 | 1990.8160 | 2089.7190 | 1990.8160 | 3025419853 | 647.9224 |
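To find the indices that fell the most in a snapshot like this one, sort on the `change` column. Below is a minimal sketch that uses a few rows copied from the table above; a live `ts.get_index()` call needs network access, so it is not used here. Keeping `code` as strings preserves leading zeros such as `000001`.

```python
import pandas as pd

# A few rows copied from the ts.get_index() output above
idx = pd.DataFrame(
    {
        "code": ["000001", "000012", "000016", "399003", "399005"],
        "name": ["上证指数", "国债指数", "上证50", "成份B指", "中小板指"],
        "change": [-1.83, 0.00, -2.23, -2.42, -1.05],
    }
)

# Three biggest decliners; nsmallest is a shortcut for sort_values().head()
worst = idx.nsmallest(3, "change")
print(worst[["code", "name", "change"]])
```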
II. Importing the Shanghai Composite Index History
import pandas as pd
import numpy as np
import tushare as ts
cons = ts.get_apis()  # open a connection to the tushare data API
# an empty end_date fetches data up to the latest trading day (the output below ends on 2020-03-18)
shx = ts.bar('000001', conn=cons, asset='INDEX', start_date='2010-01-01', end_date='')
shx = shx.dropna()  # drop every row that contains a null value
shx = shx.sort_index()  # sort chronologically by the datetime index
shx
datetime | code | open | close | high | low | vol | amount | p_change
---|---|---|---|---|---|---|---|---
2010-01-05 | 000001 | 3254.47 | 3282.18 | 3290.51 | 3221.46 | 1422749.0 | 1.653394e+11 | 1.18 |
2010-01-06 | 000001 | 3277.52 | 3254.21 | 3295.87 | 3253.04 | 1351312.0 | 1.602844e+11 | -0.85 |
2010-01-07 | 000001 | 3253.99 | 3192.78 | 3268.82 | 3176.71 | 1452287.0 | 1.605338e+11 | -1.89 |
2010-01-08 | 000001 | 3177.26 | 3196.00 | 3198.92 | 3149.02 | 1146025.0 | 1.248136e+11 | 0.10 |
2010-01-11 | 000001 | 3301.61 | 3212.75 | 3306.75 | 3197.33 | 1615021.0 | 1.825149e+11 | 0.52 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
2020-03-12 | 000001 | 2936.02 | 2923.49 | 2944.47 | 2906.28 | 3077784.0 | 3.282092e+11 | -1.52 |
2020-03-13 | 000001 | 2804.23 | 2887.43 | 2910.88 | 2799.98 | 3664504.0 | 3.930197e+11 | -1.23 |
2020-03-16 | 000001 | 2897.30 | 2789.25 | 2898.03 | 2784.66 | 3518786.0 | 3.756270e+11 | -3.40 |
2020-03-17 | 000001 | 2796.28 | 2779.64 | 2826.91 | 2715.22 | 3061496.0 | 3.230128e+11 | -0.34 |
2020-03-18 | 000001 | 2792.32 | 2728.76 | 2815.87 | 2728.76 | 2911598.0 | 3.186114e+11 | -1.83 |
2479 rows × 8 columns
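The cleaning and sorting step can be checked on a small frame. Below is a minimal sketch, which assumes that `ts.bar` returns a frame indexed by `datetime`, as the output above shows. The rows are copied from the table above and shuffled. The missing close value is added purely for illustration. Because `datetime` is the index rather than a column, `sort_index()` is the direct way to put the rows in chronological order.

```python
import numpy as np
import pandas as pd

# Rows copied from the table above, deliberately out of order;
# the 2010-01-08 close is set to NaN only to demonstrate dropna()
raw = pd.DataFrame(
    {
        "open": [3253.99, 3254.47, 3177.26, 3277.52],
        "close": [3192.78, 3282.18, np.nan, 3254.21],
    },
    index=pd.DatetimeIndex(
        ["2010-01-07", "2010-01-05", "2010-01-08", "2010-01-06"], name="datetime"
    ),
)

clean = raw.dropna()        # drop every row that contains a missing value
clean = clean.sort_index()  # chronological order by the datetime index
print(clean)
```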
III. Plotting the Closing Price
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(figsize=(15,10))
shx['close'].plot(ax=ax, color='black')  # draw on the axes created above
plt.title('SHX', fontsize=20)
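The same plot can also be rendered outside a notebook. Below is a minimal sketch that selects the non-interactive `Agg` backend so it can run headless; this setup is an assumption, since in Jupyter the `%matplotlib inline` line above is enough. The five closing prices are copied from the data table. Passing `ax=ax` draws the line on the axes created by `plt.subplots`.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend: render to files instead of a window
import matplotlib.pyplot as plt
import pandas as pd

close = pd.Series(
    [3282.18, 3254.21, 3192.78, 3196.00, 3212.75],
    index=pd.to_datetime(
        ["2010-01-05", "2010-01-06", "2010-01-07", "2010-01-08", "2010-01-11"]
    ),
    name="close",
)

fig, ax = plt.subplots(figsize=(15, 10))
close.plot(ax=ax, color="black")  # draw on the axes we created
ax.set_title("SHX", fontsize=20)

buf = io.BytesIO()
fig.savefig(buf, format="png")  # write a PNG into memory instead of showing it
plt.close(fig)
```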
IV. Data Analysis
1. Daily change (close minus open)
shx['daily change'] = shx['close'] - shx['open']  # intraday move in index points
shx['daily change']
datetime
2010-01-05 27.71
2010-01-06 -23.31
2010-01-07 -61.21
2010-01-08 18.74
2010-01-11 -88.86
...
2020-03-12 -12.53
2020-03-13 83.20
2020-03-16 -108.05
2020-03-17 -16.64
2020-03-18 -63.56
Name: daily change, Length: 2479, dtype: float64
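Column arithmetic in pandas is vectorized and aligned on the index, so `close - open` is already a Series and does not need to be wrapped in `pd.Series`. Below is a minimal sketch that reproduces the first five values above from the open and close prices in the table.

```python
import pandas as pd

shx = pd.DataFrame(
    {
        "open": [3254.47, 3277.52, 3253.99, 3177.26, 3301.61],
        "close": [3282.18, 3254.21, 3192.78, 3196.00, 3212.75],
    },
    index=pd.DatetimeIndex(
        ["2010-01-05", "2010-01-06", "2010-01-07", "2010-01-08", "2010-01-11"],
        name="datetime",
    ),
)

# Intraday move in index points: positive when the index closed above its open
shx["daily change"] = shx["close"] - shx["open"]
print(shx["daily change"].round(2))
```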
shx
datetime | code | open | close | high | low | vol | amount | p_change | daily change
---|---|---|---|---|---|---|---|---|---
2010-01-05 | 000001 | 3254.47 | 3282.18 | 3290.51 | 3221.46 | 1422749.0 | 1.653394e+11 | 1.18 | 27.71 |
2010-01-06 | 000001 | 3277.52 | 3254.21 | 3295.87 | 3253.04 | 1351313.0 | 1.602844e+11 | -0.85 | -23.31 |
2010-01-07 | 000001 | 3253.99 | 3192.78 | 3268.82 | 3176.71 | 1452287.0 | 1.605338e+11 | -1.89 | -61.21 |
2010-01-08 | 000001 | 3177.26 | 3196.00 | 3198.92 | 3149.02 | 1146025.0 | 1.248136e+11 | 0.10 | 18.74 |
2010-01-11 | 000001 | 3301.61 | 3212.75 | 3306.75 | 3197.33 | 1615021.0 | 1.825149e+11 | 0.52 | -88.86 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2020-03-12 | 000001 | 2936.02 | 2923.49 | 2944.47 | 2906.28 | 3077784.0 | 3.282092e+11 | -1.52 | -12.53 |
2020-03-13 | 000001 | 2804.23 | 2887.43 | 2910.88 | 2799.98 | 3664504.0 | 3.930197e+11 | -1.23 | 83.20 |
2020-03-16 | 000001 | 2897.30 | 2789.25 | 2898.04 | 2784.66 | 3518786.0 | 3.756270e+11 | -3.40 | -108.05 |
2020-03-17 | 000001 | 2796.28 | 2779.64 | 2826.91 | 2715.22 | 3061496.0 | 3.230128e+11 | -0.34 | -16.64 |
2020-03-18 | 000001 | 2792.32 | 2728.76 | 2815.87 | 2728.76 | 2911598.0 | 3.186114e+11 | -1.83 | -63.56 |
2479 rows × 9 columns
shx['daily change'].sum()  # total of all daily (close minus open) changes
7698.730000000007
shx['close'].sum() - shx['open'].sum()  # the same total, computed as the sum of closes minus the sum of opens
7698.730000000447
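The two totals above agree except in the last few decimal places (…0007 versus …0447). This is floating-point rounding. Summing the differences and subtracting two large sums accumulate different rounding errors, even though the two expressions are mathematically equal. Below is a minimal sketch on five rows copied from the table. It compares the two forms with `np.isclose` rather than `==`.

```python
import numpy as np
import pandas as pd

shx = pd.DataFrame(
    {
        "open": [3254.47, 3277.52, 3253.99, 3177.26, 3301.61],
        "close": [3282.18, 3254.21, 3192.78, 3196.00, 3212.75],
    }
)

sum_of_diffs = (shx["close"] - shx["open"]).sum()      # sum of daily changes
diff_of_sums = shx["close"].sum() - shx["open"].sum()  # same total, other order

# Mathematically equal; the last digits may differ because of rounding
print(sum_of_diffs, diff_of_sums, np.isclose(sum_of_diffs, diff_of_sums))
```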
shx[shx['daily change'] > 200]['daily change'].sum()  # total change on days the index gained more than 200 points
547.3500000000004
shx[shx['daily change'] > 200]  # days the index gained more than 200 points
datetime | code | open | close | high | low | vol | amount | p_change | daily change
---|---|---|---|---|---|---|---|---|---
2015-06-30 | 000001 | 4006.75 | 4277.22 | 4279.97 | 3847.88 | 7091766.0 | 9.415246e+11 | 5.53 | 270.47 |
2015-07-09 | 000001 | 3432.45 | 3709.33 | 3748.48 | 3373.54 | 6569146.0 | 6.733110e+11 | 5.76 | 276.88 |
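Chained indexing such as `shx[mask]['daily change']` works for reading values. Selecting with `.loc[mask, 'daily change']` picks the rows and the column in one step, and the pandas documentation recommends this form whenever the result might later be assigned to. Below is a minimal sketch on rows copied from the tables in this section: two days above +200 points and two ordinary days.

```python
import pandas as pd

shx = pd.DataFrame(
    {"daily change": [27.71, -23.31, 270.47, 276.88]},
    index=pd.to_datetime(["2010-01-05", "2010-01-06", "2015-06-30", "2015-07-09"]),
)

mask = shx["daily change"] > 200        # days the index gained over 200 points
big_up = shx.loc[mask, "daily change"]  # rows and column selected in one step
print(big_up.count(), big_up.sum())
```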
shx[shx['daily change'] < -200]['daily change'].sum()  # total change on days the index lost more than 200 points
-1730.150000000001
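Both thresholds can be handled in one pass. The sketch below filters on the absolute change and groups the result by direction. The helper `large_moves`, its `threshold` parameter and the `up`/`down` labels are illustrative names, not from the original. The data are the two +200 days, one ordinary day, and only a subset of the -200 days from the tables in this section, so the down-day total here is not the -1730.15 printed above.

```python
import numpy as np
import pandas as pd

dc = pd.Series(
    [-323.47, -211.57, -207.06, 270.47, 276.88, 27.71],
    index=pd.to_datetime(
        ["2015-05-28", "2015-06-19", "2015-06-26", "2015-06-30", "2015-07-09", "2010-01-05"]
    ),
    name="daily change",
)


def large_moves(changes: pd.Series, threshold: float = 200.0) -> pd.DataFrame:
    """Count and sum moves whose absolute size exceeds `threshold` points (hypothetical helper)."""
    big = changes[changes.abs() > threshold]
    direction = np.where(big > 0, "up", "down")  # label each large move by its sign
    return big.groupby(direction).agg(["count", "sum"])


print(large_moves(dc))
```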
shx[shx['daily change'] < -200]  # days the index lost more than 200 points
datetime | code | open | close | high | low | vol | amount | p_change | daily change
---|---|---|---|---|---|---|---|---|---
2015-05-28 | 000001 | 4943.74 | 4620.27 | 4986.50 | 4614.24 | 7829646.0 | 1.247926e+12 | -6.50 | -323.47 |
2015-06-19 | 000001 | 4689.93 | 4478.36 | 4744.08 | 4476.50 | 4526896.0 | 6.854582e+11 | -6.42 | -211.57 |
2015-06-26 | 000001 | 4399.93 | 4192.87 | 4456.90 | 4139.53 | 5652178.0 | 7.878357e+11 | -7.40 | -207.06 |
2015-06-29 | 000001 | 4289.77 | 4053.03 | 4297.47 | 3875.05 | 6737863.0 | 9.042714e+11 | -3.34 | -236.74 |
2015-07-27 | 000001 | 3985.57 | 3725.56 | 4051.16 | 3720.44 | 5560032.0 | 7.212981e+11 | -8.48 | -260.01 |
2015-08-18 | 000001 | 3999.13 | 3748.16 | 4006.34 | 3743.39 | 5437708.0 | 7.224672e+11 | -6.15 | -250.97 |