[Python]动态生成百万sql语句

最新推荐文章于 2024-04-23 15:49:27 发布

咸煌

最新推荐文章于 2024-04-23 15:49:27 发布

阅读量719

点赞数 1

分类专栏： Python

本文链接：https://blog.csdn.net/qq_21084687/article/details/104366207

版权

Python 专栏收录该内容

3 篇文章 0 订阅

订阅专栏

有一天，leader需要我造一个月大约300万的某演示数据，且要求每天的统计数据有规律，比如高峰低峰时段，工作日和周末数据差异等等，一天之内就要完成，开完这个会上午基本就过去了，吃完午饭的时候还没头绪咋弄呢。

一条一条往数据库里面手动插入是不可能的，一条数据字段就几十个了，真干起来不得累死，还干不完，于是就想着写个插入的sql语句就好了撒。写个一条完整的插入语句会的，但是要动态变化字段这就难为我了，毕竟还没那么了解数据库语句的使用。

数据处理不是python的拿手好戏么，正好前个周六的时候没事在家看了本python的书《Python全栈工程师养成攻略》，跟着前面章节用python统计了一下西游记里面字的个数，并进行了读取和写入的操作。嘻嘻，正好可以整个百万insert语句的文件嘛。

说干就干，分为几步：
1、读取目标文件
2、生成一条insert语句
3、写入目标文件
4、循环2、3步
5、关闭文件

得到的sql文件往数据库一跑，数据就哗哗的进去了。

生成百万数据的核心代码示例：

import date_util
import random
import time

# 类型
types = [5, 6, 12, 16, 17, 21]

# 删除原因
pre_delete_reasons = ['E0203', 'E0301', 'E0408', 'E0503', 'E0305']

frmt='%Y-%m-%d %H:%M:%S'

def generate_sql(id, count, start_time, end_time, file_name):
    fw = open(file_name, 'a', encoding='utf-8')

    while id < count:
        index = random.randint(0, len(illegal_types) - 1)
        # 随机类型
        illegal_type = illegal_types[index]
        # 随机状态
        pre_status = random.randint(1, 2)
        # 审核状态
        audit_status = 0
        if pre_status == 2:
            audit_status = 9
        # 处理时间
        pre_handle_time = date_util.random_timestamp(start_time, end_time) * 1000
        # 通过时间
        pass_time = "'" + time.strftime(frmt, time.localtime(pre_handle_time / 1000)) + ".001'"
        # 删除原因
        index = random.randint(0, len(pre_delete_reasons) - 1)
        pre_delete_reason = "'" + pre_delete_reasons[index] + "'"
        
        # 拼接sql语句
        sql = f"""INSERT INTO i.info VALUES ({id}, {pass_time}, NULL, NULL, NULL, {pre_handle_time}, {type}, 1, {audit_status}, NULL, 'B528', 5, 1, '50010600051211000001', '50010600051191000001', '1', 27, 'K33', NULL, 2, '["/images/dn@dn127/2/25-2631f8b0-1dfd8"]', NULL, NULL, 80.0000, NULL, NULL, NULL, NULL, NULL, 'PASS', NULL, {type}, 'B528', 5, 1, NULL, NULL, NULL, NULL, {pre_status}, {pre_delete_reason}, {type}, 'B528', 5, {pre_handle_time}, 44, 1, NULL, 0, 0, 0, 0, '1B2B3B', 0, NULL, 0, NULL, 0, NULL, NULL);"""
        id += 1
        fw.write(sql + '\n')

    fw.close()


def generate():
    count = 1

    # 生成半夜数据
    # id = count
    # count = id + 5000 + random.randint(0, 3000)
    # start_time = '2020-01-05 00:00:00'
    # end_time = '2020-01-05 6:59:59'
    # print(f"id: {id}, count: {count}, start_time: {start_time}, end_time: {end_time}")
    # generate_sql(id, count, start_time, end_time, '1_half_night.sql')

    # # 生成早高峰数据1
    # id = count
    # count = id + 5000 + random.randint(0, 200)
    # start_time = '2020-01-05 07:00:00'
    # end_time = '2020-01-05 07:59:59'
    # print(f"id: {id}, count: {count}, start_time: {start_time}, end_time: {end_time}")
    # generate_sql(id, count, start_time, end_time, '2_morning_top.sql')
    #
    # # 生成早高峰数据2
    # id = count
    # count = id + 8000 + random.randint(0, 200)
    # start_time = '2020-01-05 08:00:00'
    # end_time = '2020-01-05 08:59:59'
    # print(f"id: {id}, count: {count}, start_time: {start_time}, end_time: {end_time}")
    # generate_sql(id, count, start_time, end_time, '2_morning_top.sql')
    #
    # # 生成早高峰数据3
    # id = count
    # count = id + 7200 + random.randint(0, 200)
    # start_time = '2020-01-05 09:00:00'
    # end_time = '2020-01-05 09:59:59'
    # print(f"id: {id}, count: {count}, start_time: {start_time}, end_time: {end_time}")
    # generate_sql(id, count, start_time, end_time, '2_morning_top.sql')
    #
    #  生成白天日常数据
    id = count
    count = id + 3000 + random.randint(0, 500)
    start_time = '2020-01-05 10:00:00'
    end_time = '2020-01-05 16:59:59'
    print(f"id: {id}, count: {count}, start_time: {start_time}, end_time: {end_time}")
    generate_sql(id, count, start_time, end_time, '3_day.sql')
    #
    # # 生成晚高峰数据1
    # id = count
    # count = id + 7500 + random.randint(0, 200)
    # start_time = '2020-01-05 17:00:00'
    # end_time = '2020-01-05 17:59:59'
    # print(f"id: {id}, count: {count}, start_time: {start_time}, end_time: {end_time}")
    # generate_sql(id, count, start_time, end_time, '4_night_top.sql')
    #
    # # 生成晚高峰数据2
    # id = count
    # count = id + 10000 + random.randint(0, 200)
    # start_time = '2020-01-05 18:00:00'
    # end_time = '2020-01-05 18:59:59'
    # print(f"id: {id}, count: {count}, start_time: {start_time}, end_time: {end_time}")
    # generate_sql(id, count, start_time, end_time, '4_night_top.sql')
    #
    # # 生成晚高峰数据3
    # id = count
    # count = id + 6500 + random.randint(0, 200)
    # start_time = '2020-01-05 19:00:00'
    # end_time = '2020-01-05 19:59:59'
    # print(f"id: {id}, count: {count}, start_time: {start_time}, end_time: {end_time}")
    # generate_sql(id, count, start_time, end_time, '4_night_top.sql')
    #
    # # 生成晚上数据
    # id = count
    # count = id + 15000 + random.randint(0, 200)
    # start_time = '2020-01-05 20:00:00'
    # end_time = '2020-01-05 23:59:59'
    # print(f"id: {id}, count: {count}, start_time: {start_time}, end_time: {end_time}")
    # generate_sql(id, count, start_time, end_time, '5_night.sql')

    print(f"count: {count}")

generate()

本来想使用一个sql文件来承载这么多数据，没想到不得行，大概是内存不够，导致数据写不完，虽然程序正常执行完，检查数据的时候发现id到一定数之后的数据都没在里面，于是分了好几个文件来写不同时段的数据，一个文件几十万数据还是得行的。
还遇到的一个坑是，Insert语句values后面的字段要跟数据库里面的表字段顺序一致才行，不同数据库，同一张表可能顺序还不一样，语句就得调整，还好一条语句能用，n条都能用，字段太多的时候，眼睛都看花了。

回顾一下看还是觉得写得很挫，以后再改进吧。

咸煌

关注

1
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
[Python]动态生成百万sql语句

有一天，leader需要我造一个月大约300万的某演示数据，且要求每天的统计数据有规律，比如高峰低峰时段，工作日和周末数据差异等等，一天之内就要完成，开完这个会上午基本就过去了，吃完午饭的时候还没头绪咋弄呢。一条一条往数据库里面手动插入是不可能的，一条数据字段就几十个了，真干起来不得累死，还干不完，于是就想着写个插入的sql语句就好了撒。写个一条完整的插入语句会的，但是要动态变化字段这就难为我了...
复制链接

扫一扫

专栏目录