python多进程池初始化_如何使用初始化程序设置我的多进程池？

最新推荐文章于 2023-08-10 16:48:14 发布

佳同学

最新推荐文章于 2023-08-10 16:48:14 发布

阅读量701

点赞数

文章标签： python多进程池初始化

版权声明：本文为博主原创文章，遵循 CC 4.0 BY-SA 版权协议，转载请附上原文出处链接和本声明。

本文链接：https://blog.csdn.net/weixin_42137723/article/details/114970672

版权

在Python中使用多进程Pool时，为了实现每个进程启动时打开数据库连接并复用，可以使用初始化程序。由于初始化函数不返回值，可以考虑使用全局变量、类封装、传递函数到初始化程序或利用`functools.lru_cache`进行缓存。这些方法可以避免在每个数据处理时频繁打开和关闭连接。

摘要由CSDN通过智能技术生成

如何使用初始化程序设置我的多进程池？

我正在尝试使用多进程Pool对象。我希望每个进程在启动时打开数据库连接，然后使用该连接来处理传入的数据。(而不是为每个数据位打开和关闭连接。)这看起来像是初始化程序为此，但我无法确定工作人员和初始化程序如何通信。所以我有这样的事情：

def get_cursor():

return psycopg2.connect(...).cursor()

def process_data(data):

# here I'd like to have the cursor so that I can do things with the data

if __name__ == "__main__":

pool = Pool(initializer=get_cursor, initargs=())

pool.map(process_data, get_some_data_iterator())

我(或我)如何将光标从get_cursor()返回到process_data()？

4个解决方案

83 votes

初始化函数因此被调用：

def worker(...):

...

if initializer is not None:

initializer(*args)

因此，任何地方都没有保存返回值。您可能会认为这注定了您的命运，但是没有！每个工人都处于单独的过程中。因此，您可以使用普通的global变量。

这并不完全漂亮，但是可以正常工作：

cursor = None

def set_global_cursor(...):

global cursor

cursor = ...

现在您可以在multiprocessing函数中使用psycopg2。每个单独进程中的multiprocessing变量与所有其他进程是分开的，因此它们不会互相踩踏。

(我不知道psycopg2496是否有不同的处理方式，而该方式首先不涉及使用multiprocessing；这是对multiprocessing模块的一般问题的一般回答。)

torek answered 2020-02-20T05:54:40Z

12 votes

torek已经很好地解释了为什么在这种情况下初始化程序不起作用。但是，我个人并不喜欢Global变量，因此我想在此粘贴另一个解决方案。

这个想法是使用一个类来包装函数，并使用“全局”变量初始化该类。

class Processor(object):

"""Process the data and save it to database."""

def __init__(self, credentials):

"""Initialize the class with 'global' variables"""

self.cursor = psycopg2.connect(credentials).cursor()

def __call__(self, data):

"""Do something with the cursor and data"""

self.cursor.find(data.key)

然后致电

p = Pool(5)

p.map(Processor(credentials), list_of_data)

因此，第一个参数使用凭证初始化了类，返回了该类的实例，并使用数据映射调用该实例。

尽管这不像全局变量解决方案那么简单，但我强烈建议避免使用全局变量，并以某种安全的方式封装变量。 (我真的很希望他们有一天能够支持lambda表达式，这将使事情变得更加容易...)

yeelan answered 2020-02-20T05:55:19Z

9 votes

您还可以将函数发送到初始化程序，并在其中创建连接。之后，将光标添加到函数中。

def init_worker(function):

function.cursor = db.conn()

现在，您无需使用全局变量就可以通过function.cursor访问数据库，例如：

def use_db(i):

print(use_db.cursor) #process local

pool = Pool(initializer=init_worker, initargs=(use_db,))

pool.map(use_db, range(10))

The Unfun Cat answered 2020-02-20T05:55:44Z

5 votes

鉴于在初始化器中定义全局变量通常是不可取的，因此我们可以避免使用它们，也可以避免在每个调用中通过在每个子进程中进行简单的缓存来重复进行昂贵的初始化：

from functools import lru_cache

from multiprocessing.pool import Pool

from time import sleep

@lru_cache(maxsize=None)

def _initializer(a, b):

print(f'Initialized with {a}, {b}')

def _pool_func(a, b, i):

_initializer(a, b)

sleep(1)

print(f'got {i}')

arg_a = 1

arg_b = 2

with Pool(processes=5) as pool:

pool.starmap(_pool_func, ((arg_a, arg_b, i) for i in range(0, 20)))

输出：

Initialized with 1, 2

Initialized with 1, 2

Initialized with 1, 2

Initialized with 1, 2

Initialized with 1, 2

got 1

got 0

got 4

got 2

got 3

got 5

got 7

got 8

got 6

got 9

got 10

got 11

got 12

got 14

got 13

got 15

got 16

got 17

got 18

got 19

mcguip answered 2020-02-20T05:56:08Z

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。