2021-02-17 使用Tushare下载日线数据的准备工作

最新推荐文章于 2022-06-11 20:13:58 发布

weixin_54389451

最新推荐文章于 2022-06-11 20:13:58 发布

阅读量344

点赞数

文章标签： mysql tushare python 数据库列表

本文链接：https://blog.csdn.net/weixin_54389451/article/details/113833613

版权

使用Tushare下载日线数据

Tushare是一个免费、开源的python财经数据接口包。主要实现对股票等金融数据从数据采集、清洗加工到数据存储的过程，能够为金融分析人员提供快速、整洁、和多样的便于分析的数据，为他们在数据获取方面极大地减轻工作量，使他们更加专注于策略和模型的研究与实现上。考虑到Python pandas包在金融量化分析中体现出的优势，Tushare返回的绝大部分的数据格式都是pandas DataFrame类型，非常便于用pandas/NumPy/Matplotlib进行数据分析和可视化。
注册一个tushare账号，到这里注册：https://tushare.pro/register?reg=350968

上一篇提到了每天下载日线数据，代码给出后有些人说跑不通，我看了一下，主要是因为数据存储的原因，在我的代码里，用了mysql来进行本地数据的存储。

为了使用mysql，首先要安装mysql： pip install pymysql
然后在文件中要进行mysql的引入
#获取股票信息，存入数据库
import pymysql
然后首先要链接数据库，初始化连接和游标，这里定义一个函数：

def init_cursor():
    conn = pymysql.connect('127.0.0.1', port=3306, user='lhjr', password='Lhjr%2020', db='stock', charset='utf8')
     return conn, conn.cursor()

 当然在mysql数据库中首先要创建一个stock数据库，可以在mysql的图形界面中创建一个，很简单，大家自行操作一下，端口号3306是默认的，用户名密码大家在自己的mysql中自行定义自行更改这里的user和password字符串就行。

后续每次要使用数据库，都可以采用本函数进行连接初始化。
有了连接，先要创建初始的数据表，在tushare中，基本的数据表有：股票列表和基本行情表，定义两个函数，如下：
#函数一，股票列表

#创建股票列表数据表
def createStockListTable():
    create_sql = '''
                  CREATE TABLE stocklist (
                            id int not null auto_increment PRIMARY KEY comment "编号",
                            ts_code varchar(12) not null comment "股票代码",
                            symbol varchar(8) comment "交易简码",
                            ts_name varchar(20) comment "交易名称",
                            area varchar(20) comment "所在区域",
                            industry varchar(100) comment "所在行业",    
                            fullname varchar(100) comment "股票全称",
                            enname	varchar(100) comment "英文全称",
                            market	varchar(100) comment "市场类型 （主板/中小板/创业板/科创板/CDR）",
                            exchange varchar(100) comment "交易所代码",
                            curr_type varchar(100) comment "交易货币",
                            list_status	varchar(100) comment "上市状态： L上市 D退市 P暂停上市",
                            list_date varchar(100) comment "上市日期",
                            delist_date varchar(100) comment "退市日期",
                            is_hs varchar(100) comment "是否沪深港通标的，N否 H沪股通 S深股通"                        
                            )
                  '''
    conn, cursor = init_cursor()
    excute_sql(conn, cursor, create_sql)
    close_conn(conn, cursor)

#函数二，基本行情表,日线

def createDailyTable():
    create_sql = '''
                  CREATE TABLE stockdaily (
                            id int not null auto_increment PRIMARY KEY comment "编号",
                            ts_code varchar(12) not null comment "股票代码",
                            trade_date DATE comment "交易日期",
                            open float comment "开盘价",
                            high float comment "最高价",
                            low float comment "最低价",
                            close float comment "收盘价",
                            pre_close float comment "昨日收价",
                            chg float comment "涨幅",
                            pct_chg float comment "涨幅比",
                            vol float comment "成交量",
                            amount float comment "成交额"                            
                            )
                  '''
    conn, cursor = init_cursor()
    excute_sql(conn, cursor, create_sql)
    close_conn(conn, cursor)

在上面两个函数中，有个 excute_sql(conn, cursor, create_sql)，这个函数主要负责sql语句的执行，这个函数主要用来进行sql语句的执行，如果成功，则更新数据，如果失败，则回滚数据。涉及到数据库的操作，大家自行查找相关资料。函数如下：

def excute_sql(conn, cursor, str_sql):
    try:
        cursor.execute(str_sql)
        conn.commit()
        pos = str_sql.index('(')
        #print("执行SQL成功", str_sql[ 12/ : pos])
    except pymysql.Error as e:
        conn.rollback()
        print("执行SQL失败！ \n" + str_sql[0:43] + '\n', e)

有了数据表stockdaily和stocklist，就可以进行数据插入更新了。
首先进行stocklist的插入更新，由于插入数据是很费时间的事情，所以我们将100条数据打包进行一次性更新，这样用来缩短时间，

#插入股票列表
def insert_gpList(conn, cursor,gpList):
    prefix_sql = "insert into stocklist (ts_code,symbol,ts_name,area,industry, fullname,enname,market," \
                 "exchange,curr_type,list_status,list_date,delist_date,is_hs) values {0} "
    val = " ('{0}','{1}','{2}','{3}','{4}','{5}','{6}','{7}','{8}','{9}','{10}','{11}','{12}','{13}')"
    vals = ''
    cnt = gpList.ts_code.count() - 1
    for ind in gpList.index:
        #批量插入数据，每一百条插入一次
        if ind == 0:
            vals = val.format(str(gpList.loc[ind, 'ts_code']), str(gpList.loc[ind, 'symbol']),
                              str(gpList.loc[ind, 'name']), str(gpList.loc[ind, 'area']),
                              str(gpList.loc[ind, 'industry']),str(gpList.loc[ind, 'fullname']) ,
                              str(gpList.loc[ind, 'enname']),str(gpList.loc[ind, 'market']),
                              str(gpList.loc[ind, 'exchange']),str(gpList.loc[ind, 'curr_type']),
                              str(gpList.loc[ind, 'list_status']),
                              str(gpList.loc[ind, 'list_date']),str(gpList.loc[ind, 'delist_date']),
                              str(gpList.loc[ind, 'is_hs']))
        elif ind % 1 != 0:
            vals = vals + ", " + val.format(str(gpList.loc[ind, 'ts_code']), str(gpList.loc[ind, 'symbol']),
                              str(gpList.loc[ind, 'name']), str(gpList.loc[ind, 'area']),
                              str(gpList.loc[ind, 'industry']),str(gpList.loc[ind, 'fullname']) ,
                              str(gpList.loc[ind, 'enname']),str(gpList.loc[ind, 'market']),
                              str(gpList.loc[ind, 'exchange']),str(gpList.loc[ind, 'curr_type']),
                              str(gpList.loc[ind, 'list_status']),
                              str(gpList.loc[ind, 'list_date']),str(gpList.loc[ind, 'delist_date']),
                              str(gpList.loc[ind, 'is_hs']))
        elif ind % 1 == 0:
            #每100条数据一起插入表格
            insert_sql = prefix_sql.format(vals)
            print(insert_sql)
            excute_sql(conn, cursor, insert_sql)
            #插入数据后本条数据作为下一批次插入数据的第一条
            vals = val.format(str(gpList.loc[ind, 'ts_code']), str(gpList.loc[ind, 'symbol']),
                              str(gpList.loc[ind, 'name']), str(gpList.loc[ind, 'area']),
                              str(gpList.loc[ind, 'industry']), str(gpList.loc[ind, 'fullname']),
                              str(gpList.loc[ind, 'enname']), str(gpList.loc[ind, 'market']),
                              str(gpList.loc[ind, 'exchange']), str(gpList.loc[ind, 'curr_type']),
                              str(gpList.loc[ind, 'list_status']),
                              str(gpList.loc[ind, 'list_date']), str(gpList.loc[ind, 'delist_date']),
                              str(gpList.loc[ind, 'is_hs']))

        if ind == cnt:
            insert_sql = prefix_sql.format(vals)
            excute_sql(conn, cursor, insert_sql)

    print('一共插入{0}条数据'.format(gpList['ts_code'].count))

#插入基本行情数据
def insert_daily_data(conn, cursor,df):
    prefix_sql = '''
                    insert into stockdaily (ts_code, trade_date, open, high, low ,close ,pre_close ,
                    chg ,pct_chg, vol,amount) values {0}     
                 '''
    val = " ('{0}',{1},{2},{3},{4},{5},{6},{7},{8},{9},{10})"
    vals = ''
    cnt = df.ts_code.count() - 1
    for ind in df.index:
        #批量插入数据，每一百条插入一次
        if ind == 0:
            vals = val.format(str(df.loc[ind, 'ts_code']), str(df.loc[ind, 'trade_date']),
                              str(df.loc[ind, 'open']), str(df.loc[ind, 'high']),
                              str(df.loc[ind, 'low']), str(df.loc[ind, 'close']), str(df.loc[ind, 'pre_close']),
                              str(df.loc[ind, 'change']), str(df.loc[ind, 'pct_chg']), str(df.loc[ind, 'vol']),
                              str(df.loc[ind, 'amount']))
        elif ind % 100 != 0:
            vals = vals + ", " + val.format(str(df.loc[ind, 'ts_code']), str(df.loc[ind, 'trade_date']),
                              str(df.loc[ind, 'open']), str(df.loc[ind, 'high']),
                              str(df.loc[ind, 'low']), str(df.loc[ind, 'close']), str(df.loc[ind, 'pre_close']),
                              str(df.loc[ind, 'change']), str(df.loc[ind, 'pct_chg']), str(df.loc[ind, 'vol']),
                              str(df.loc[ind, 'amount']))
        elif ind % 100 == 0:
            #每100条数据一起插入表格
            insert_sql = prefix_sql.format(vals)
            excute_sql(conn, cursor, insert_sql)
            #插入数据后本条数据作为下一批次插入数据的第一条
            vals = val.format(str(df.loc[ind, 'ts_code']), str(df.loc[ind, 'trade_date']),
                              str(df.loc[ind, 'open']), str(df.loc[ind, 'high']),
                              str(df.loc[ind, 'low']), str(df.loc[ind, 'close']), str(df.loc[ind, 'pre_close']),
                              str(df.loc[ind, 'change']), str(df.loc[ind, 'pct_chg']), str(df.loc[ind, 'vol']),
                              str(df.loc[ind, 'amount']))

        if ind == cnt:
            insert_sql = prefix_sql.format(vals)
            excute_sql(conn, cursor, insert_sql)
    print('一共插入{0}条数据'.format(df['ts_code'].count()))

最后，每次数据库使用完以后要进行数据库的关闭工作，定义y一个函数：

#关闭数据库连接
def close_conn(conn, cursor):
    try:
        cursor.close()
        conn.close()
    except pymysql.Error as e:
        print("数据库关闭失败！" + conn.host_info, e)
    print("数据库关闭成功！")