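Continuing my study of 《机器学习实践指南》 (published by Posts & Telecom Press), Chapter 7, stock market prediction. I applied the book's model unchanged to the Shanghai Composite Index, using dynamic time warping, and the results differ markedly from the tutorial's S&P 500 predictions.
This post uses the pandas_profiling module. Its charts are embedded images, so the results do not show up in the Markdown version.
This post is intended solely for learning DataFrame operations in Python data analysis!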
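Since the model rests on dynamic time warping (DTW), a minimal sketch of the textbook DTW distance may help make the idea concrete. This is not necessarily the implementation the book uses; the function name `dtw_distance` and the absolute-difference cost are assumptions chosen for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) dynamic time warping distance.

    Hypothetical helper for illustration; uses |x - y| as the local cost.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i, j] = local + min(
                cost[i - 1, j],      # a advances, b repeats
                cost[i, j - 1],      # b advances, a repeats
                cost[i - 1, j - 1],  # both advance
            )
    return cost[n, m]


# Two series with the same shape but different lengths align at zero cost
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because DTW allows one series to "stretch" against the other, two price windows that follow the same pattern at different speeds can still be judged similar, which a plain point-by-point distance would miss.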
I. Index lookup
import tushare as ts
ts.get_index()
 | code | name | change | open | preclose | close | high | low | volume | amount |
---|---|---|---|---|---|---|---|---|---|---|
0 | 000001 | 上证指数 | 1.61 | 2727.0186 | 2702.1296 | 2745.6182 | 2751.8964 | 2702.4933 | 252019507 | 2815.8337 |
1 | 000002 | A股指数 | 1.61 | 2857.6618 | 2831.5502 | 2877.1556 | 2883.7260 | 2831.9092 | 251741036 | 2814.3846 |
2 | 000003 | B股指数 | 1.06 | 220.9384 | 220.0086 | 222.3326 | 223.2548 | 220.5027 | 278471 | 1.4491 |
3 | 000008 | 综合指数 | 1.73 | 2544.9532 | 2522.4297 | 2565.9899 | 2574.0928 | 2528.0644 | 53368275 | 585.2588 |
4 | 000009 | 上证380 | 1.46 | 4579.1702 | 4537.5197 | 4603.5710 | 4609.5472 | 4528.2879 | 55041282 | 629.8522 |
5 | 000010 | 上证180 | 1.88 | 7734.6028 | 7654.7969 | 7798.7930 | 7821.4258 | 7661.7982 | 83125699 | 1155.2277 |
6 | 000011 | 基金指数 | 0.91 | 6009.5347 | 5985.2705 | 6039.5412 | 6040.9660 | 5964.3047 | 153023459 | 407.7303 |
7 | 000012 | 国债指数 | 0.03 | 180.8213 | 180.8046 | 180.8513 | 180.8523 | 180.8106 | 400973 | 4.0256 |
8 | 000016 | 上证50 | 2.28 | 2597.9793 | 2569.7891 | 2628.4161 | 2636.1406 | 2580.3573 | 37231436 | 618.8246 |
9 | 000017 | 新综指 | 1.61 | 2304.1910 | 2283.1352 | 2319.9088 | 2325.2075 | 2283.4227 | 249192522 | 2677.9100 |
10 | 000300 | 沪深300 | 1.79 | 3629.5126 | 3589.0926 | 3653.2239 | 3663.9517 | 3588.7766 | 139603000 | 2112.2307 |
11 | 000905 | 中证500 | 1.19 | 5217.4880 | 5157.8235 | 5219.2809 | 5225.0268 | 5129.0733 | 127513289 | 1315.1903 |
12 | 399001 | 深证成指 | 1.30 | 10150.7840 | 10019.8580 | 10150.1250 | 10174.5090 | 9964.2860 | 38004697447 | 4354.5853 |
13 | 399002 | 深成指R | 1.30 | 12453.4930 | 12292.8670 | 12452.6860 | 12482.6010 | 12224.6880 | 15836815504 | 2267.2686 |
14 | 399003 | 成份B指 | 1.57 | 4664.6670 | 4648.4650 | 4721.5380 | 4721.5380 | 4619.9620 | 10341618 | 0.7069 |
15 | 399004 | 深证100R | 1.54 | 5590.8520 | 5518.6460 | 5603.5730 | 5620.6970 | 5501.8790 | 5399802857 | 993.9201 |
16 | 399005 | 中小板指 | 0.71 | 6681.1970 | 6588.8830 | 6635.7860 | 6699.7590 | 6525.1590 | 17873990979 | 1922.1422 |
17 | 399006 | 创业板指 | 1.06 | 1923.9600 | 1894.9430 | 1915.0460 | 1928.8790 | 1878.8490 | 9944795691 | 1473.2099 |
18 | 399008 | 中小300 | 0.79 | 1287.6060 | 1270.6290 | 1280.6110 | 1290.9450 | 1259.3420 | 9323739585 | 1237.4965 |
19 | 399100 | 新 指 数 | 1.33 | 7766.5520 | 7673.6800 | 7775.6440 | 7784.8760 | 7640.9580 | 37295220442 | 4316.5081 |
20 | 399101 | 中小板综 | 1.06 | 9720.5630 | 9601.0040 | 9702.4630 | 9740.9030 | 9546.9760 | 17873990979 | 1922.1422 |
21 | 399106 | 深证综指 | 1.28 | 1702.9490 | 1682.9300 | 1704.4620 | 1707.0660 | 1675.4420 | 38004697447 | 4354.5853 |
22 | 399107 | 深证A指 | 1.28 | 1781.9020 | 1760.9240 | 1783.4680 | 1786.2040 | 1753.0940 | 37985716774 | 4353.5633 |
23 | 399108 | 深证B指 | 0.87 | 821.9270 | 819.7140 | 826.8650 | 827.0460 | 814.0530 | 18980673 | 1.0220 |
24 | 399333 | 中小板R | 0.71 | 7521.6910 | 7417.7640 | 7470.5670 | 7542.5880 | 7346.0230 | 3718392677 | 695.0080 |
25 | 399606 | 创业板R | 1.06 | 2029.7650 | 1999.1520 | 2020.3610 | 2034.9540 | 1982.1730 | 2292605286 | 496.8369 |
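To pick out the code of a given index from a table like this, a boolean mask on the `name` column is enough. The sketch below rebuilds three rows of the table by hand so it runs without a network connection; the helper name `find_index_code` is hypothetical.

```python
import pandas as pd

# Three rows copied from the table above; the real call returns all 26.
indices = pd.DataFrame({
    "code": ["000001", "000300", "399001"],
    "name": ["上证指数", "沪深300", "深证成指"],
    "close": [2745.6182, 3653.2239, 10150.1250],
})


def find_index_code(df, name):
    """Return the code of the first row whose name matches exactly."""
    match = df.loc[df["name"] == name, "code"]
    if match.empty:
        raise KeyError(name)
    return match.iloc[0]


sh_code = find_index_code(indices, "上证指数")
print(sh_code)
```

Keeping `code` as a string matters here: stored as an integer, `000001` would lose its leading zeros.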
II. Loading the Shanghai Composite Index data
import pandas as pd
import numpy as np
#import pandas_datareader as pdr
cons = ts.get_apis()  # open a connection
#start_date = pd.to_datetime('2010-01-01')
#stop_date = pd.to_datetime('2020-03-12')
shx = ts.bar('000001',conn=cons,asset='INDEX',start_date='2019-01-01',end_date="")
shx = shx.dropna()  # drop every row that contains a null value
shx.sort_values(['datetime'], inplace=True)  # sort by date, ascending
shx
datetime | code | open | close | high | low | vol | amount | p_change |
---|---|---|---|---|---|---|---|---|
2019-01-03 | 000001 | 2461.78 | 2464.36 | 2488.48 | 2455.93 | 1243975.0 | 1.069228e+11 | -0.04 |
2019- |