Python chunked processing: multiprocessing over chunks of data from a large file?

Latest recommended article published on 2022-06-30 19:42:05

weixin_39806065

Article tags: python chunked processing

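When file_obj is large, list(file_obj) can require a great deal of memory, because it reads every line of the file into a list at once. We can reduce the memory requirement by using itertools to pull out chunks of lines only as they are needed.

In particular, we can use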

reader = csv.reader(f)

chunks = itertools.groupby(reader, keyfunc)

to split the file into processable chunks (one chunk per run of consecutive rows that share the same key), and

groups = [list(chunk) for key, chunk in itertools.islice(chunks, num_chunks)]

result = pool.map(worker, groups)

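to have the multiprocessing pool work on num_chunks chunks at a time.

By doing this, we need only enough memory to hold a few (num_chunks) chunks at once, rather than the whole file.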
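The batching step can be tried on its own, without a file or a pool. The following is a minimal sketch under stated assumptions: the in-memory `rows` list stands in for `csv.reader(f)`, and `take_batches` is a hypothetical helper name that does not appear in the original. It shows that each call to `itertools.islice` pulls only the next `num_chunks` groups from the `groupby` iterator.

```python
import itertools

# Stand-in for csv.reader(f): rows whose first column is the grouping key.
rows = [
    ["a", "1"], ["a", "2"],
    ["b", "3"],
    ["c", "4"], ["c", "5"], ["c", "6"],
    ["d", "7"],
    ["e", "8"],
]


def keyfunc(row):
    # Group rows by their first column.
    return row[0]


def take_batches(rows, num_chunks):
    # Hypothetical helper: collect successive batches of up to num_chunks groups.
    chunks = itertools.groupby(rows, keyfunc)
    batches = []
    while True:
        # Each group is turned into a list before the next group is requested,
        # which groupby requires because groups share the underlying iterator.
        groups = [list(chunk) for key, chunk in itertools.islice(chunks, num_chunks)]
        if not groups:
            break
        batches.append(groups)
    return batches


batches = take_batches(rows, 2)
batch_sizes = [[len(group) for group in batch] for batch in batches]
print(batch_sizes)
```

In the real program each batch is handed to `pool.map` and then discarded, so only one batch is held in memory at a time; the sketch keeps all of them only so they can be inspected.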

import multiprocessing as mp
import itertools
import csv


def worker(chunk):
    # `chunk` will be a list of CSV rows all with the same name column
    # replace this with your real computation
    # print(chunk)
    return len(chunk)


def keyfunc(row):
    # `row` is one row of the CSV file.
    # replace this with the name column.
    return row[0]


def main():
    pool = mp.Pool()
    largefile = 'test.dat'
    num_chunks = 10
    results = []
    # newline='' is what the csv module recommends for files passed to csv.reader
    with open(largefile, newline='') as f:
        reader = csv.reader(f)
        chunks = itertools.groupby(reader, keyfunc)
        while True:
            # make a list of num_chunks chunks
            groups = [list(chunk) for key, chunk in
                      itertools.islice(chunks, num_chunks)]
            if groups:
                # process this batch in parallel before reading the next one
                result = pool.map(worker, groups)
                results.extend(result)
            else:
                break
    pool.close()
    pool.join()
    print(results)


if __name__ == '__main__':
    main()
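One caveat is worth checking against your own data: `itertools.groupby` only merges consecutive rows with equal keys, so the promise in the `worker` comment (every chunk holds rows with the same name column) also means that a name can appear in more than one chunk if its rows are not adjacent in the file. A small sketch with made-up rows illustrates this; `group_sizes` is a hypothetical helper used only for this demonstration.

```python
import itertools


def keyfunc(row):
    # Group rows by their first column.
    return row[0]


def group_sizes(rows):
    # Hypothetical helper: list each group's key and its number of rows.
    return [(key, len(list(chunk))) for key, chunk in itertools.groupby(rows, keyfunc)]


# The key "a" appears twice, but not on adjacent rows.
unsorted_rows = [["a", "1"], ["b", "2"], ["a", "3"]]
sorted_rows = sorted(unsorted_rows, key=keyfunc)

unsorted_groups = group_sizes(unsorted_rows)
sorted_groups = group_sizes(sorted_rows)
print(unsorted_groups)
print(sorted_groups)
```

Sorting in memory, as done here, defeats the purpose for a file that does not fit in RAM. If the rows for one name must be processed together, the file would need to be ordered by that column beforehand, for example by an external sort.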
