使用Tushare下载日线数据
Tushare是一个免费、开源的python财经数据接口包。主要实现对股票等金融数据从数据采集、清洗加工 到 数据存储的过程,能够为金融分析人员提供快速、整洁、和多样的便于分析的数据,为他们在数据获取方面极大地减轻工作量,使他们更加专注于策略和模型的研究与实现上。考虑到Python pandas包在金融量化分析中体现出的优势,Tushare返回的绝大部分的数据格式都是pandas DataFrame类型,非常便于用pandas/NumPy/Matplotlib进行数据分析和可视化。
注册一个tushare账号,到这里注册:https://tushare.pro/register?reg=350968
上一篇提到了每天下载日线数据,代码给出后有些人说跑不通,我看了一下,主要是因为数据存储的原因,在我的代码里,用了mysql来进行本地数据的存储。
为了使用mysql,首先要安装mysql: pip install pymysql
然后在文件中要进行mysql的引入
#获取股票信息,存入数据库
import pymysql
然后首先要链接数据库,初始化连接和游标,这里定义一个函数:
def init_cursor():
conn = pymysql.connect('127.0.0.1', port=3306, user='lhjr', password='Lhjr%2020', db='stock', charset='utf8')
return conn, conn.cursor()
当然在mysql数据库中首先要创建一个stock数据库,可以在mysql的图形界面中创建一个,很简单,大家自行操作一下,端口号3306是默认的,用户名密码大家在自己的mysql中自行定义自行更改这里的user和password字符串就行。
后续每次要使用数据库,都可以采用本函数进行连接初始化。
有了连接,先要创建初始的数据表,在tushare中,基本的数据表有:股票列表和基本行情表,定义两个函数,如下:
#函数一,股票列表
#创建股票列表数据表
def createStockListTable():
create_sql = '''
CREATE TABLE stocklist (
id int not null auto_increment PRIMARY KEY comment "编号",
ts_code varchar(12) not null comment "股票代码",
symbol varchar(8) comment "交易简码",
ts_name varchar(20) comment "交易名称",
area varchar(20) comment "所在区域",
industry varchar(100) comment "所在行业",
fullname varchar(100) comment "股票全称",
enname varchar(100) comment "英文全称",
market varchar(100) comment "市场类型 (主板/中小板/创业板/科创板/CDR)",
exchange varchar(100) comment "交易所代码",
curr_type varchar(100) comment "交易货币",
list_status varchar(100) comment "上市状态: L上市 D退市 P暂停上市",
list_date varchar(100) comment "上市日期",
delist_date varchar(100) comment "退市日期",
is_hs varchar(100) comment "是否沪深港通标的,N否 H沪股通 S深股通"
)
'''
conn, cursor = init_cursor()
excute_sql(conn, cursor, create_sql)
close_conn(conn, cursor)
#函数二,基本行情表,日线
def createDailyTable():
create_sql = '''
CREATE TABLE stockdaily (
id int not null auto_increment PRIMARY KEY comment "编号",
ts_code varchar(12) not null comment "股票代码",
trade_date DATE comment "交易日期",
open float comment "开盘价",
high float comment "最高价",
low float comment "最低价",
close float comment "收盘价",
pre_close float comment "昨日收价",
chg float comment "涨幅",
pct_chg float comment "涨幅比",
vol float comment "成交量",
amount float comment "成交额"
)
'''
conn, cursor = init_cursor()
excute_sql(conn, cursor, create_sql)
close_conn(conn, cursor)
在上面两个函数中,有个 excute_sql(conn, cursor, create_sql),这个函数主要负责sql语句的执行,这个函数主要用来进行sql语句的执行,如果成功,则更新数据,如果失败,则回滚数据。涉及到数据库的操作,大家自行查找相关资料。函数如下:
def excute_sql(conn, cursor, str_sql):
try:
cursor.execute(str_sql)
conn.commit()
pos = str_sql.index('(')
#print("执行SQL成功", str_sql[ 12/ : pos])
except pymysql.Error as e:
conn.rollback()
print("执行SQL失败! \n" + str_sql[0:43] + '\n', e)
有了数据表stockdaily和stocklist,就可以进行数据插入更新了。
首先进行stocklist的插入更新,由于插入数据是很费时间的事情,所以我们将100条数据打包进行一次性更新,这样用来缩短时间,
#插入股票列表
def insert_gpList(conn, cursor,gpList):
prefix_sql = "insert into stocklist (ts_code,symbol,ts_name,area,industry, fullname,enname,market," \
"exchange,curr_type,list_status,list_date,delist_date,is_hs) values {0} "
val = " ('{0}','{1}','{2}','{3}','{4}','{5}','{6}','{7}','{8}','{9}','{10}','{11}','{12}','{13}')"
vals = ''
cnt = gpList.ts_code.count() - 1
for ind in gpList.index:
#批量插入数据,每一百条插入一次
if ind == 0:
vals = val.format(str(gpList.loc[ind, 'ts_code']), str(gpList.loc[ind, 'symbol']),
str(gpList.loc[ind, 'name']), str(gpList.loc[ind, 'area']),
str(gpList.loc[ind, 'industry']),str(gpList.loc[ind, 'fullname']) ,
str(gpList.loc[ind, 'enname']),str(gpList.loc[ind, 'market']),
str(gpList.loc[ind, 'exchange']),str(gpList.loc[ind, 'curr_type']),
str(gpList.loc[ind, 'list_status']),
str(gpList.loc[ind, 'list_date']),str(gpList.loc[ind, 'delist_date']),
str(gpList.loc[ind, 'is_hs']))
elif ind % 1 != 0:
vals = vals + ", " + val.format(str(gpList.loc[ind, 'ts_code']), str(gpList.loc[ind, 'symbol']),
str(gpList.loc[ind, 'name']), str(gpList.loc[ind, 'area']),
str(gpList.loc[ind, 'industry']),str(gpList.loc[ind, 'fullname']) ,
str(gpList.loc[ind, 'enname']),str(gpList.loc[ind, 'market']),
str(gpList.loc[ind, 'exchange']),str(gpList.loc[ind, 'curr_type']),
str(gpList.loc[ind, 'list_status']),
str(gpList.loc[ind, 'list_date']),str(gpList.loc[ind, 'delist_date']),
str(gpList.loc[ind, 'is_hs']))
elif ind % 1 == 0:
#每100条数据一起插入表格
insert_sql = prefix_sql.format(vals)
print(insert_sql)
excute_sql(conn, cursor, insert_sql)
#插入数据后本条数据作为下一批次插入数据的第一条
vals = val.format(str(gpList.loc[ind, 'ts_code']), str(gpList.loc[ind, 'symbol']),
str(gpList.loc[ind, 'name']), str(gpList.loc[ind, 'area']),
str(gpList.loc[ind, 'industry']), str(gpList.loc[ind, 'fullname']),
str(gpList.loc[ind, 'enname']), str(gpList.loc[ind, 'market']),
str(gpList.loc[ind, 'exchange']), str(gpList.loc[ind, 'curr_type']),
str(gpList.loc[ind, 'list_status']),
str(gpList.loc[ind, 'list_date']), str(gpList.loc[ind, 'delist_date']),
str(gpList.loc[ind, 'is_hs']))
if ind == cnt:
insert_sql = prefix_sql.format(vals)
excute_sql(conn, cursor, insert_sql)
print('一共插入{0}条数据'.format(gpList['ts_code'].count))
#插入基本行情数据
def insert_daily_data(conn, cursor,df):
prefix_sql = '''
insert into stockdaily (ts_code, trade_date, open, high, low ,close ,pre_close ,
chg ,pct_chg, vol,amount) values {0}
'''
val = " ('{0}',{1},{2},{3},{4},{5},{6},{7},{8},{9},{10})"
vals = ''
cnt = df.ts_code.count() - 1
for ind in df.index:
#批量插入数据,每一百条插入一次
if ind == 0:
vals = val.format(str(df.loc[ind, 'ts_code']), str(df.loc[ind, 'trade_date']),
str(df.loc[ind, 'open']), str(df.loc[ind, 'high']),
str(df.loc[ind, 'low']), str(df.loc[ind, 'close']), str(df.loc[ind, 'pre_close']),
str(df.loc[ind, 'change']), str(df.loc[ind, 'pct_chg']), str(df.loc[ind, 'vol']),
str(df.loc[ind, 'amount']))
elif ind % 100 != 0:
vals = vals + ", " + val.format(str(df.loc[ind, 'ts_code']), str(df.loc[ind, 'trade_date']),
str(df.loc[ind, 'open']), str(df.loc[ind, 'high']),
str(df.loc[ind, 'low']), str(df.loc[ind, 'close']), str(df.loc[ind, 'pre_close']),
str(df.loc[ind, 'change']), str(df.loc[ind, 'pct_chg']), str(df.loc[ind, 'vol']),
str(df.loc[ind, 'amount']))
elif ind % 100 == 0:
#每100条数据一起插入表格
insert_sql = prefix_sql.format(vals)
excute_sql(conn, cursor, insert_sql)
#插入数据后本条数据作为下一批次插入数据的第一条
vals = val.format(str(df.loc[ind, 'ts_code']), str(df.loc[ind, 'trade_date']),
str(df.loc[ind, 'open']), str(df.loc[ind, 'high']),
str(df.loc[ind, 'low']), str(df.loc[ind, 'close']), str(df.loc[ind, 'pre_close']),
str(df.loc[ind, 'change']), str(df.loc[ind, 'pct_chg']), str(df.loc[ind, 'vol']),
str(df.loc[ind, 'amount']))
if ind == cnt:
insert_sql = prefix_sql.format(vals)
excute_sql(conn, cursor, insert_sql)
print('一共插入{0}条数据'.format(df['ts_code'].count()))
最后,每次数据库使用完以后要进行数据库的关闭工作,定义y一个函数:
#关闭数据库连接
def close_conn(conn, cursor):
try:
cursor.close()
conn.close()
except pymysql.Error as e:
print("数据库关闭失败!" + conn.host_info, e)
print("数据库关闭成功!")
本文有点长哈,基本把数据插入这部分搞完了,咱么为了简单,也不考虑什么数据唯一性啊啥的,先把数据下回来再说。
行文至此,准备工作基本完毕,基本就可以进行日线数据的获取了。