通过python+内存表快速批量生成可重复的数据

这里写自定义目录标题

`import mysql.connector
import time
from faker import Faker

config = {
    "host": "192.168.100.128",
    "user": "xxxxx",
    "password": "xxxxx",
    "database": "xxxxx"
}

connection = mysql.connector.connect(**config)
curs = connection.cursor(dictionary=True)

'''
curs.callproc("hello_procedure", [1111])
for result in curs.stored_results():
    data = result.fetchall()
    print(data)
'''


def generate_sql():
    faker = Faker('zh-CN')
    name = faker.name()
    phone_number = faker.phone_number()
    address = faker.address()
    nick_name = faker.text(7)[0:6]
    email = faker.email()
    sql = 'INSERT INTO user_system_memory VALUES ' \
          '("%s", "%s", "%s", "%s", "%s", "0", NOW(), "", 0)'
    sql_formated = sql % (name, nick_name, address, email, phone_number)
    print(sql_formated)
    return sql_formated


def deal_memory_data(count):
    try:
        for m in range(count):
            for n in range(1000):
                sql_formated = generate_sql()
                curs.execute(sql_formated)
            sql = "INSERT INTO user_system (user_name, nick_name, address, email, phonenumber, user_status, create_time, avatar, sex) SELECT * FROM user_system_memory";
            curs.execute(sql)
            connection.commit()
            empty_sql = "DELETE FROM user_system_memory"
            curs.execute(empty_sql)
    finally:
        curs.close()
        connection.close()


if __name__ == "__main__":
    start_time = time.time()
    deal_memory_data(10)
    end_time = time.time()
    total_time = end_time-start_time
    print("total_time is "+str(total_time))

最终花费时间: total_time is 267.61084628105164 4.45分钟,时间还是太长了,待优化
遇到的小问题: connection没commit,导致数据库一直查询不到数据,还是用少了

def deal_memory_data_extra(count):
    try:
        for n in range(2000):
            sql_formated = generate_sql()
            curs.execute(sql_formated)
        for m in range(count):
            sql = "INSERT INTO user_system (user_name, nick_name, address, email, phonenumber, user_status, create_time, avatar, sex) SELECT * FROM user_system_memory";
            curs.execute(sql)
            connection.commit()
        empty_sql = "DELETE FROM user_system_memory"
        curs.execute(empty_sql)
    finally:
        curs.close()
        connection.close()

最终耗时54.26630163192749,优化的内容: 上面是每次清空内存表,再生成新的内存表数据,这里耗费太多时间了,如果没有强制要求数据不能重复,则此方案可行
每个内存表存10000条数据后,提交10次生成10万数据,耗时: 275.8327052593231。

创建10000条数据写入内存表的时间: 270.4083368778229
将内存表数据写入数据库的时间: 0.6571922302246094 性能恐怖如斯呀!
此处将数据库插入内存表的时间过长,是因为每次都要new一个faker对象,优化后,耗时如下:
memory data deal time is 7.69097900390625
copy memory data to database is 2.7691831588745117

def get_user_info(n):
    faker = Faker('zh-CN')
    info = []
    for i in range(n):
        name = faker.name()
        phone_number = faker.phone_number()
        address = faker.address()
        nick_name = faker.text(7)[0:6]
        email = faker.email()
        info.append((name, nick_name, address, email, phone_number))
    return info
def deal_memory_data_extra2(count):
    try:
        start_time1 = time.time()
        sql = 'INSERT INTO user_system_memory VALUES(%s, %s, %s, %s, %s, 0, NOW(), "", 0)'
        values = get_user_info(10000)
        curs.executemany(sql, values)
        connection.commit()
        end_time1 = time.time()
        total_time1 = end_time1 - start_time1
        print("memory data deal time is " + str(total_time1))

        start_time = time.time()
        for m in range(count):
            sql = "INSERT INTO user_system (user_name, nick_name, address, email, phonenumber, user_status, create_time, avatar, sex) SELECT * FROM user_system_memory";
            curs.execute(sql)
            connection.commit()
        empty_sql = "DELETE FROM user_system_memory"
        curs.execute(empty_sql)
        end_time = time.time()
        total_time = end_time - start_time
        print("copy memory data to database is " + str(total_time))
    finally:
        curs.close()
        connection.close()

生成10000数据插入到内存表的时候,通过批处理后,再次有了提升,时间如下:
memory data deal time is 2.1205103397369385
copy memory data to database is 3.463122844696045
生成10万的数据总共花费时间: 5秒多

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值