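Preprocessing Historical Stock Data with Python (Part 1)
Quantitative trading programs need historical stock data as the basis for analysis. This post shows how to fetch historical stock data with Python and store the result as a DataFrame. The processed historical data can be downloaded from: http://download.csdn.net/detail/suiyingy/9688505.
The steps are as follows: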
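- (1) Build the stock pool. Here the stocks are selected by market capitalization (the code below takes the 50 smallest).
- (2) Read the historical daily price change rate of every stock in the pool.
- (3) Store the price change rates in a single DataFrame, one column per stock. For dates within the chosen period on which a stock was not yet listed, the price change rate is set to 0.
- (4) Append a final row to the DataFrame holding the number of trading days of each stock.
- (5) Save the DataFrame to a csv file.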
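Steps (3) to (5) can be sketched with plain pandas, independent of any data platform. This is only a minimal sketch under stated assumptions: the function name `align_to_reference`, the `trading_days` row label, and the sample data are made up for illustration, and it uses `reindex` rather than an explicit loop over dates.

```python
import pandas as pd


def align_to_reference(reference_index, series_by_code):
    """Align each return series to the reference dates, filling gaps with 0.

    series_by_code maps a column name to a pandas Series of daily returns,
    or to None when no data is available for that stock (hypothetical input).
    """
    columns = {}
    counts = {}
    for code, series in series_by_code.items():
        if series is None:
            counts[code] = 0
            aligned = pd.Series(0.0, index=reference_index)
        else:
            counts[code] = len(series)
            # Dates missing from the stock's own history become 0.0
            aligned = series.reindex(reference_index, fill_value=0.0)
        columns[code] = aligned.round(4)
    returns = pd.DataFrame(columns, index=reference_index)
    # One extra row at the bottom with the number of trading days per stock
    count_row = pd.DataFrame(counts, index=["trading_days"])
    return pd.concat([returns, count_row])


# Made-up data: five reference dates; the stock only trades on the last three.
dates = pd.date_range("2016-11-14", periods=5, freq="D")
stock = pd.Series([0.01234, -0.02, 0.005], index=dates[2:])
table = align_to_reference(dates, {"000001XSHE": stock, "unlisted": None})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = table.to_csv()
```

Keeping the returns as floats (instead of formatted strings) means every column stays numeric until the count row is appended.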
The full code is as follows:
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 17 23:04:33 2016
Fetch the historical price change rates of a pool of stocks,
merge them into one DataFrame, and save the result as a csv file.
@author: yehxqq1513760265
"""
import pandas as pd

# get_fundamentals, query, fundamentals and get_price_change_rate are not
# imported here; they are assumed to be supplied by the quant platform.

# Get the codes of the 50 stocks with the smallest market cap
df = get_fundamentals(
    query(fundamentals.eod_derivative_indicator.market_cap)
    .order_by(fundamentals.eod_derivative_indicator.market_cap.asc())
    .limit(50), '2016-11-17', '1y'
)
stockCodes = df['market_cap'].columns

# The index 000300.XSHG provides the reference list of trading dates
priceChangeRate_300 = get_price_change_rate('000300.XSHG', '20060101', '20161118')
df300 = pd.DataFrame(priceChangeRate_300)
lenReference = len(priceChangeRate_300)
dfout = df300.copy()
dflen = pd.DataFrame()
dflen['000300.XSHG'] = [lenReference]

# For each stock in the pool:
#   fetch its price change rates from 20150101 to 20161118
#   (the reference dates start at 20060101; earlier dates are filled with 0),
#   align them to the reference dates, and add them to dfout as a new column
for stockCode in stockCodes:
    priceChangeRate = get_price_change_rate(stockCode, '20150101', '20161118')
    if priceChangeRate is None:
        openDays = 0
    else:
        openDays = len(priceChangeRate)
    dftempPrice = pd.DataFrame(priceChangeRate)
    tradedDates = set(dftempPrice.index)
    tempArr = []
    for i in range(lenReference):
        if df300.index[i] in tradedDates:
            # Keep 4 decimal places, stored as a float
            tempArr.append(round(float(dftempPrice.loc[df300.index[i]].iloc[0]), 4))
        else:
            # No data for this stock on this reference date
            tempArr.append(0.0)
    # Column name is the stock code with the dot removed
    fileName = ''.join(stockCode.split('.'))
    dfout[fileName] = tempArr
    # Use openDays so that a stock without data (None) does not raise an error
    dflen[fileName] = [openDays]

# Append the trading-day counts as the last row, then save as csv
dfout = pd.concat([dfout, dflen])
dfout.to_csv('00050.csv')
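Because the last row of the saved file holds trading-day counts rather than price change rates, it has to be split off before the data is analyzed. The following is a minimal sketch, assuming the csv layout described above (dates in the first column, one column per stock, counts in the final row). The function name `split_returns_and_counts` is hypothetical, and the sample csv text is made up to stand in for `00050.csv`.

```python
import io

import pandas as pd


def split_returns_and_counts(csv_source):
    """Return (daily_returns, trading_day_counts) from the saved table."""
    raw = pd.read_csv(csv_source, index_col=0)
    # Last row: number of trading days per column
    counts = raw.iloc[-1].astype(int)
    # All earlier rows: daily price change rates indexed by date
    returns = raw.iloc[:-1].astype(float)
    returns.index = pd.to_datetime(returns.index)
    return returns, counts


# Made-up csv text in the assumed layout; a real call would pass '00050.csv'.
sample = io.StringIO(
    ",000300.XSHG,000001XSHE\n"
    "2016-11-16,0.0012,0.0\n"
    "2016-11-17,-0.0034,0.0105\n"
    "0,2,1\n"
)
returns, counts = split_returns_and_counts(sample)
```

Separating the two parts first avoids accidentally treating the count row as one more day of returns.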