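Access to reliable data is essential for quantitative trading, but much of the data on public platforms is not collected or processed rigorously. This article shows how to retrieve data from a professional database instead. WIND is a good data source, but its API returns its own WindData objects rather than the familiar pandas DataFrame, which some users find inconvenient. As one part of an introductory series on quantitative data, we start with this topic to save you the time of reading the documentation.
(Disclaimer: all stock tickers in this example were chosen at random and do not constitute investment advice or carry any investment value!)
Example 1: Use WIND to retrieve data for a single stock from January 1, 2020 to January 1, 2024, and organize it as a DataFrame.
The code is as follows:
Step 1: Import the required libraries.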
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings; warnings.simplefilter('ignore')
from WindPy import w
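w.start() # The default command timeout is 120 seconds; pass waitTime to change it, e.g. waitTime=30 sets the timeout to 30 seconds
w.isconnected() # Check whether WindPy has logged in successfully
# 1. Retrieve Tesla's price and volume data for the past four years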
data_tela = w.wsd("TSLA.O", "open,high,low,close,volume", "2020-01-01", "2024-01-01", "TradingCalendar=AMEX;Currency=USD;PriceAdj=F")
data_tela
.ErrorCode=0
.Codes=[TSLA.O]
.Fields=[OPEN,HIGH,LOW,CLOSE,VOLUME]
.Times=[20200102,20200103,20200106,20200107,20200108,20200109,20200110,20200113,20200114,20200115,...]
.Data=[[28.300003598531546,29.36667040083191,29.364670400577598,30.76000391133676,31.580004015605166,33.14000421396945,32.11933741751829,32.90000418345187,36.28367128037405,35.317337824165065,...],[28.713050317719816,30.266670515272843,30.104003827922035,31.442003998057555,33.23267089241929,33.25333756171386,32.329337444221174,35.042004455821285,36.49400464045266,35.856004559326756,...],[28.114003574880417,29.128003703817203,29.333337063260025,30.22367050980511,31.21533730256873,31.524670675235836,31.580004015605166,32.80000417073621,34.99333778296633,34.45235771417716,...],[28.684003647359674,29.53400375544278,30.10267049441916,31.27067064293806,32.80933750525634,32.08933741370359,31.87667071999495,34.99067111596058,35.86133789333825,34.56667106204619,...],[9558386.0,17774420.0,10157500.0,18209140.0,31199390.0,28463190.0,12976830.0,26634550.0,29061380.0,17368830.0,...]]
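Note: this is data in the WindPy.w.WindData format, which many people are unfamiliar with. To get the data as a DataFrame, pass usedf=True. To then work with the columns one at a time, iterate over the DataFrame with df.items(), which yields (column name, Series) pairs.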
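If you already hold a WindData object, it can also be converted by hand. The following is a minimal sketch, assuming only the attributes visible in the output above (ErrorCode, Fields, Times, Data) and the single-ticker layout shown there, where Data contains one list per field. The function name wind_data_to_frame and the stand-in object are hypothetical; the stand-in replaces a real Wind response so the snippet runs without a Wind terminal, and its values are rounded from the output above.

```python
from datetime import date
from types import SimpleNamespace

import pandas as pd


def wind_data_to_frame(wind_data):
    """Build a DataFrame from an object exposing ErrorCode, Fields, Times and Data.

    Assumes the single-ticker layout shown above: Data holds one list per field.
    """
    if wind_data.ErrorCode != 0:
        raise RuntimeError(f"Wind request failed with ErrorCode={wind_data.ErrorCode}")
    # Pair each field name with its list of values to form the columns.
    columns = dict(zip(wind_data.Fields, wind_data.Data))
    frame = pd.DataFrame(columns, index=pd.to_datetime(wind_data.Times))
    frame.index.name = "date"
    return frame


# Hypothetical stand-in for a WindData object (not a real Wind response).
sample = SimpleNamespace(
    ErrorCode=0,
    Codes=["TSLA.O"],
    Fields=["OPEN", "CLOSE"],
    Times=[date(2020, 1, 2), date(2020, 1, 3)],
    Data=[[28.30, 29.37], [28.68, 29.53]],
)

frame = wind_data_to_frame(sample)
print(frame)
```

With a real session, the same function would be called on the object returned by w.wsd for one ticker. Requests with several tickers may lay out Data differently, so check the layout before reusing this sketch.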
Step 2: Convert the WindData object to a DataFrame.
Code:
data1 = w.wsd("TSLA.O", "open,high,low,close,volume", "2020-01-01", "2024-01-01", "TradingCalendar=AMEX;Currency=USD;PriceAdj=F",usedf=True)
data1
(0,
OPEN HIGH LOW CLOSE VOLUME
2020-01-02 28.300004 28.713050 28.114004 28.684004 9558386.0
2020-01-03 29.366670 30.266671 29.128004 29.534004 17774420.0
2020-01-06 29.364670 30.104004 29.333337 30.102670 10157500.0
2020-01-07 30.760004 31.442004 30.223671 31.270671 18209140.0
2020-01-08 31.580004 33.232671 31.215337 32.809338 31199390.0
... ... ... ... ... ...
2023-12-22 256.760000 258.220000 251.370000 252.540000 93370094.0
2023-12-26 254.490000 257.970000 252.910000 256.610000 86892382.0
2023-12-27 258.350000 263.340000 257.520000 261.440000 106494359.0
2023-12-28 263.660000 265.130000 252.710000 253.180000 113619943.0
2023-12-29 255.100000 255.190000 247.430000 248.480000 100891578.0
[1006 rows x 5 columns])
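Note that this is still not quite the format we want: the result is a tuple, so one more step is needed.
Here, data1[0] is the ErrorCode, where 0 means the request ran successfully; the second element of the tuple is the data itself.
data1[1] # Take the second element, which is the data itself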
OPEN HIGH LOW CLOSE VOLUME
2020-01-02 28.300004 28.713050 28.114004 28.684004 9558386.0
2020-01-03 29.366670 30.266671 29.128004 29.534004 17774420.0
2020-01-06 29.364670 30.104004 29.333337 30.102670 10157500.0
2020-01-07 30.760004 31.442004 30.223671 31.270671 18209140.0
2020-01-08 31.580004 33.232671 31.215337 32.809338 31199390.0
... ... ... ... ... ...
2023-12-22 256.760000 258.220000 251.370000 252.540000 93370094.0
2023-12-26 254.490000 257.970000 252.910000 256.610000 86892382.0
2023-12-27 258.350000 263.340000 257.520000 261.440000 106494359.0
2023-12-28 263.660000 265.130000 252.710000 253.180000 113619943.0
2023-12-29 255.100000 255.190000 247.430000 248.480000 100891578.0
1006 rows × 5 columns
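The unpacking step can be wrapped in a small helper that checks the error code before handing back the DataFrame. This is a sketch under the assumption, taken from the output above, that the usedf=True result is a two-element tuple of (ErrorCode, DataFrame). The helper name unpack_wsd_result and the sample tuple are hypothetical; the sample stands in for a real response, with values rounded from the output above.

```python
import pandas as pd


def unpack_wsd_result(result):
    """Split an (ErrorCode, DataFrame) tuple and fail loudly on a non-zero code."""
    error_code, frame = result
    if error_code != 0:
        raise RuntimeError(f"Wind request failed with ErrorCode={error_code}")
    return frame


# Hypothetical stand-in for the tuple returned with usedf=True.
sample_result = (
    0,
    pd.DataFrame(
        {"OPEN": [28.30, 29.37], "CLOSE": [28.68, 29.53]},
        index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
    ),
)

prices = unpack_wsd_result(sample_result)

# DataFrame.items() yields (column name, Series) pairs, one per field.
for name, series in prices.items():
    print(name, series.iloc[-1])
```

Checking the code up front avoids silently working with an empty or partial DataFrame when a request fails.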