MySQL parallel data insertion: how to insert a pandas DataFrame into a MySQL table using parallel insert statements?

This article covers writing a DataFrame with millions of rows to MySQL via pandas' `to_sql` with multi-row inserts, and how to deal with MySQL's limit on query length. The solutions are to use the `method='multi'` argument for bulk inserts, and to build a custom parallel insert function with the `multiprocessing.dummy` library that splits the DataFrame into chunks and inserts them through a worker pool, significantly speeding up the import.

I am working on a project where I have to write a DataFrame with millions of rows and about 25 columns, mostly of numeric type, to MySQL. I am using pandas' `DataFrame.to_sql` function to dump the DataFrame into a MySQL table. I have found that this function creates an INSERT statement that can insert multiple rows at once. This is a good approach, but MySQL has a limit on the length of a query built this way.

Is there a way to run the inserts in parallel against the same table so that I can speed up the process?

Solution

You can do a few things to achieve that.

One way is to pass an additional argument while writing to SQL.

df.to_sql(method='multi')

According to the documentation, passing 'multi' as the method argument allows you to bulk insert.
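A minimal sketch of what that call might look like in full, assuming a SQLAlchemy engine and a placeholder table name my_table (both are assumptions, not part of the original answer); the chunksize argument bounds how many rows go into each generated INSERT so the statement stays under MySQL's query length limit:

from sqlalchemy import create_engine

# Placeholder connection string; adjust the driver, credentials and database to your setup.
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

# method='multi' packs many rows into a single INSERT statement;
# chunksize=1000 caps the rows per statement so the generated query
# stays within MySQL's limits (e.g. max_allowed_packet).
df.to_sql("my_table", engine, if_exists="append", index=False,
          method="multi", chunksize=1000)

A chunksize of 1000 is only a starting point; the right value depends on the row width and the server's max_allowed_packet setting.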

Another solution is to construct a custom insert function using multiprocessing.dummy.

Here is the link to the documentation: https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.dummy

import math
from multiprocessing.dummy import Pool as ThreadPool

...

def insert_df(df, *args, **kwargs):
    nworkers = 4  # number of workers that execute the inserts in parallel

    chunksize = math.floor(df.shape[0] / nworkers)  # number of rows per chunk
    chunks = [(chunksize * i, (chunksize * i) + chunksize) for i in range(nworkers)]
    chunks.append((chunksize * nworkers, df.shape[0]))  # remainder rows go into a final chunk

    pool = ThreadPool(nworkers)

    def worker(chunk):
        i, j = chunk  # start and end row positions for this worker
        df.iloc[i:j, :].to_sql(*args, **kwargs)

    pool.map(worker, chunks)
    pool.close()
    pool.join()

....

insert_df(df, "foo_bar", engine, if_exists='append')
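One practical caveat, added here as an assumption rather than part of the original answer: each to_sql call checks out its own connection from the engine's pool, so the threads do not share a single database connection, but if the target table does not exist yet they can race each other to create it. A simple workaround is to create the empty table serially first and then let the pool append to it, reusing the same SQLAlchemy engine as above:

# Hypothetical setup: create an empty table with the right schema first,
# then let the parallel workers append the actual rows.
df.head(0).to_sql("foo_bar", engine, if_exists="replace")
insert_df(df, "foo_bar", engine, if_exists="append")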
