Python Multiprocess Concurrency [glob, concurrent, and Pool]

Latest recommended article published on 2022-12-11 21:45:53

wwlhz

Category: Python, multiprocessing, very large data volumes

Article link: https://blog.csdn.net/wwlhz/article/details/103506052

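Processing large amounts of data on a single core is slow. Running the work in multiple processes with concurrent.futures or multiprocessing.Pool can improve throughput significantly. Pool accepts a processes argument that sets the number of worker processes; if it is omitted, Pool picks a count based on the number of CPUs on the machine. Loading and processing the data in batches with Pandas is another effective strategy.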

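Straightforward code runs on a single core, so it becomes very slow when it has to process a very large amount of data.

You can use concurrent.futures or Pool instead.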

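import concurrent.futures
from glob import glob
import verify  # the author's own module, which defines get_text

if __name__ == '__main__':
    roots = glob('/root/code/ocr/dataset/images/*.jpg')
    with concurrent.futures.ProcessPoolExecutor() as executor:
        executor.map(verify.get_text, roots)  # one call per image path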
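
The snippet above discards whatever get_text returns. As a minimal sketch of collecting the results, the example below wraps the call in a function and converts the iterator that executor.map returns into a list, which preserves input order. The worker describe, the helper run_all, and the sample paths are hypothetical stand-ins, not part of the original code; chunksize=16 is an arbitrary illustrative value.

```python
import concurrent.futures
import os


def describe(path):
    """Hypothetical stand-in for verify.get_text: returns (file name, name length)."""
    name = os.path.basename(path)
    return name, len(name)


def run_all(paths, max_workers=None):
    """Apply describe to every path in worker processes; results keep input order."""
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
        # chunksize sends several paths per task, which reduces inter-process overhead
        return list(executor.map(describe, paths, chunksize=16))


if __name__ == "__main__":
    # Made-up paths; a real run would build the list with glob as in the snippet above
    sample = [f"images/{i:04d}.jpg" for i in range(100)]
    results = run_all(sample, max_workers=2)
    print(results[:3])
```

Keeping the worker at module top level matters: the function is pickled by name and sent to the worker processes, so a lambda or nested function would not work here.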

Or use Pool:

import time
from glob import glob
from multiprocessing import Pool

def get_text(file_name):
    print(file_name)

if __name__ == '__main__':
    roots = glob('images/*.jpg')
    start = time.time()
    pool = Pool(20)  # number of worker processes
    pool.map(get_text, roots)  # call get_text once per file path
    pool.close()  # no more tasks will be submitted
    pool.join()  # wait for the workers to exit
    end = time.time()
    print(end - start)  # elapsed seconds

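Note that the number of processes is set on the pool itself, not on map: the Pool constructor takes a processes parameter (the 20 in the code above). The parameter is optional. If it is omitted, Pool starts as many worker processes as the operating system reports CPUs. You can also set it yourself, but keep the capacity of your machine in mind, since more processes than cores rarely helps CPU-bound work.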
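
To make the two options concrete, here is a small sketch that first creates a pool with the default size and then one with an explicit, capped size. The functions square and pick_workers are hypothetical illustrations, not part of the original post; capping the count at os.cpu_count() is one reasonable policy for CPU-bound work, not a rule.

```python
import os
from multiprocessing import Pool


def square(n):
    """Trivial CPU-bound stand-in for the real per-file work."""
    return n * n


def pick_workers(requested=None):
    """Hypothetical helper: cap a requested worker count at the CPUs the OS reports."""
    available = os.cpu_count() or 1
    if requested is None:
        return available
    return max(1, min(requested, available))


if __name__ == "__main__":
    # No argument: Pool sizes itself from the CPU count
    with Pool() as pool:
        print(pool.map(square, range(5)))

    # Explicit size, capped so that asking for 20 never exceeds the available CPUs
    with Pool(processes=pick_workers(20)) as pool:
        print(pool.map(square, range(5)))
```

Using the pool in a with statement shuts it down automatically when the block exits; because map blocks until every result is ready, no work is lost.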