Internal data structures and threads of the process pool

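The Pool class provides a fixed number of worker processes for the caller to use. In CPython these workers are started when the pool is created, and each new request submitted to the Pool is queued and handed to the next idle worker. If every worker is busy, the request waits in the queue until a worker finishes its current task and picks it up.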
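One way to observe how requests are spread over the pool's workers is to record which process ID handles each task. The sketch below is only an illustration: the helper name worker_pids is made up for this example, and it submits os.getpid as the task because a standard-library function can always be pickled and sent to a worker.

```python
import os
from multiprocessing import Pool


def worker_pids(pool, n_tasks):
    """Submit n_tasks trivial tasks and return the set of PIDs that ran them.

    worker_pids is a hypothetical helper written for this illustration.
    """
    pending = [pool.apply_async(os.getpid) for _ in range(n_tasks)]
    return {result.get() for result in pending}


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        pids = worker_pids(pool, n_tasks=6)
    # Six tasks were submitted, yet at most two distinct workers handled them,
    # and none of them ran in the main process.
    print(len(pids) <= 2, os.getpid() not in pids)
```

With a pool of two workers, the six tasks should be served by no more than two distinct processes, which suggests that tasks are queued for existing workers rather than each getting a process of its own.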

The pool object returned by Pool() contains the following data structures:

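self._inqueue  task-receiving queue (SimpleQueue), used by the main process to send tasks to the worker processes
self._outqueue  result-sending queue (SimpleQueue), used by the worker processes to send results back to the main process
self._taskqueue  synchronized task queue, holding the tasks submitted to the pool in the main process until they are forwarded to the workers
self._cache = {}  task cache, mapping each pending job to its result object
self._processes  number of worker processes
self._pool = []  list of worker processes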
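
These attributes can be inspected on a live pool. The sketch below assumes CPython's implementation: every name starting with an underscore is a private detail that may change between versions, and describe_pool is a hypothetical helper written only for this illustration.

```python
from multiprocessing import Pool


def describe_pool(pool):
    """Summarize a Pool's private state (CPython implementation details)."""
    return {
        "processes": pool._processes,            # configured worker count
        "live_workers": len(pool._pool),         # worker Process objects
        "pending_jobs": len(pool._cache),        # jobs still awaiting a result
        "inqueue_type": type(pool._inqueue).__name__,
        "outqueue_type": type(pool._outqueue).__name__,
        "taskqueue_type": type(pool._taskqueue).__name__,
    }


if __name__ == "__main__":
    with Pool(processes=3) as pool:
        for name, value in describe_pool(pool).items():
            print(f"{name}: {value}")
```

On a freshly created pool the number of live workers should already equal the configured process count, and the task cache should be empty.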
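
While the pool is running, receiving tasks, dispatching them, and returning results are all handled cooperatively by several threads inside the pool. Let's look at which threads the pool contains: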

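_worker_handler thread: makes sure that whenever a worker process exits, a new worker is created and added to the process list (_pool), so that the number of workers in the pool always stays at _processes. Its target function is Pool._handle_workers, which, while the pool's state is RUN, repeatedly calls _maintain_pool to check for exited workers, create replacements, and append them to _pool.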

_task_handler thread: takes tasks out of the pool's _taskqueue and puts them into the task-receiving queue _inqueue (backed by a Pipe).

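_result_handler thread (target function Pool._handle_results): reads the results of finished tasks from _outqueue (backed by a Pipe) and hands them to the corresponding result objects in the task cache _cache.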

_terminate: this is not a thread but a Finalize object, which is used to tear the pool down when it is terminated.
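
The helper threads can be looked up on a pool object as well. This is a sketch that assumes CPython's private attribute names (_worker_handler, _task_handler, _result_handler, _terminate), which are not part of the public API. The function handler_threads is a hypothetical helper for this illustration.

```python
import threading
from multiprocessing import Pool
from multiprocessing.util import Finalize


def handler_threads(pool):
    """Return the pool's internal helper threads (CPython private attributes)."""
    return {
        "_worker_handler": pool._worker_handler,
        "_task_handler": pool._task_handler,
        "_result_handler": pool._result_handler,
    }


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        for name, thread in handler_threads(pool).items():
            print(name, isinstance(thread, threading.Thread), thread.is_alive())
        # _terminate is a Finalize object rather than a thread.
        print("_terminate", isinstance(pool._terminate, Finalize))
```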

The figure below shows how the data structures and the threads in the pool work together:


multiprocessing.Pool tries to provide an interface similar to Python 2's built-in apply(func, args, kwargs).

Pool.apply is like Python 2's built-in apply, except that the function call is performed in a separate process. Pool.apply blocks until the function is completed.

Pool.apply_async is also like Python's built-in apply, except that the call returns immediately instead of waiting for the result. An ApplyResult object is returned. You call its get() method to retrieve the result of the function call. The get() method blocks until the function is completed. Thus, pool.apply(func, args, kwargs) is equivalent to pool.apply_async(func, args, kwargs).get().
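
The equivalence can be checked directly. In this minimal sketch, apply_both_ways is a hypothetical helper name, and the built-ins pow and int stand in for real work because they can be pickled without any extra setup.

```python
from multiprocessing import Pool


def apply_both_ways(pool, func, args=(), kwargs=None):
    """Run one call with Pool.apply and once more with apply_async(...).get()."""
    kwargs = kwargs or {}
    blocking = pool.apply(func, args, kwargs)               # blocks until done
    via_async = pool.apply_async(func, args, kwargs).get()  # returns at once, get() blocks
    return blocking, via_async


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        print(apply_both_ways(pool, pow, (2, 10)))
        print(apply_both_ways(pool, int, ("ff",), {"base": 16}))
```

Both elements of each returned pair should be equal, since the two spellings perform the same call in a worker process.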

In contrast to Pool.apply, the Pool.apply_async method also has a callback which, if supplied, is called when the function is complete. This can be used instead of calling get().
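
A small sketch of the callback style follows. The helper collect_with_callback is hypothetical. It relies on the callback being invoked in the parent process with the call's return value, and it uses wait() only to make sure every call has finished before the collected list is returned.

```python
from multiprocessing import Pool


def collect_with_callback(pool, func, arg_tuples):
    """Gather results through the callback instead of calling get()."""
    collected = []
    pending = [
        # The callback receives the return value once the call succeeds.
        pool.apply_async(func, args, callback=collected.append)
        for args in arg_tuples
    ]
    for result in pending:
        result.wait()  # block until this call has completed
    return collected


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        values = collect_with_callback(pool, pow, [(2, 3), (3, 2), (2, 5)])
    # Completion order is not guaranteed, so sort before printing.
    print(sorted(values))
```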

If you want the Pool of worker processes to perform many function calls asynchronously, use Pool.apply_async. The order of the results is not guaranteed to be the same as the order of the calls to Pool.apply_async.

Notice also that you could call a number of different functions with Pool.apply_async (not all calls need to use the same function).
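
As an illustration of mixing functions, the sketch below submits four different built-ins to the same pool. The helper run_mixed_calls is a made-up name. Calling get() on the results in submission order returns the values in that order, even if the calls finish in a different one.

```python
from multiprocessing import Pool


def run_mixed_calls(pool, calls):
    """Submit (func, args) pairs with apply_async and collect their results."""
    pending = [pool.apply_async(func, args) for func, args in calls]
    # Reading results in submission order keeps them aligned with `calls`.
    return [result.get() for result in pending]


if __name__ == "__main__":
    calls = [(abs, (-7,)), (len, ("pool",)), (pow, (2, 8)), (sum, ([1, 2, 3],))]
    with Pool(processes=2) as pool:
        print(run_mixed_calls(pool, calls))
```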

In contrast, Pool.map applies the same function to many arguments. However, unlike Pool.apply_async, the results are returned in an order corresponding to the order of the arguments.
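
The ordering guarantee of map can be sketched as follows. The helper ordered_roots is hypothetical, and math.sqrt is used simply because it is a picklable standard-library function.

```python
import math
from multiprocessing import Pool


def ordered_roots(pool, values):
    """Apply math.sqrt to every value with map and with map_async."""
    blocking = pool.map(math.sqrt, values)
    via_async = pool.map_async(math.sqrt, values).get()
    return blocking, via_async


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        blocking, via_async = ordered_roots(pool, [16, 9, 4, 1])
    # Both lists follow the order of the input values.
    print(blocking, via_async)
```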

I recommend map_async for three reasons:

  1. It's cleaner looking code. This:

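    from multiprocessing import Pool

    # post_processing_0, split_list and proc_num come from the surrounding program.
    pool = Pool(processes=proc_num)
    async_result = pool.map_async(post_processing_0.main, split_list)
    pool.close()
    pool.join()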
    

    looks nicer than this:

    pool = Pool(processes=proc_num)
    P = {}
    for i in range(proc_num):
        P['process_' + str(i)] = pool.apply_async(post_processing_0.main, [split_list[i]])
    pool.close()
    pool.join()
    
  2. With apply_async, if an exception occurs inside post_processing_0.main, you won't know about it unless you explicitly call P['process_x'].get() on the failing AsyncResult object, which would require iterating over all of P. With map_async, the exception will be raised when you call async_result.get(), with no iteration required.

  3. map_async has built-in chunking functionality, which will make your code perform noticeably better if split_list is very large.

Other than that, the behavior is basically the same if you don't care about the results.
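
The second and third points can be sketched with a task that sometimes fails. Both helper names below are hypothetical, and math.sqrt stands in for post_processing_0.main because it raises ValueError for negative input. The chunksize argument is the standard parameter of map_async. The value 10 is an arbitrary choice for the example.

```python
import math
from multiprocessing import Pool


def failures_with_apply_async(pool, values):
    """Find failing inputs by calling get() on every AsyncResult."""
    pending = {v: pool.apply_async(math.sqrt, (v,)) for v in values}
    failed = {}
    for value, result in pending.items():
        try:
            result.get()
        except ValueError as exc:  # re-raised in the parent by get()
            failed[value] = exc
    return failed


def roots_with_map_async(pool, values, chunksize=None):
    """A single get() returns all results or raises a worker's exception."""
    return pool.map_async(math.sqrt, values, chunksize=chunksize).get()


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        print(sorted(failures_with_apply_async(pool, [4, -1, 9])))
        print(len(roots_with_map_async(pool, range(1000), chunksize=10)))
        try:
            roots_with_map_async(pool, [4, -1, 9])
        except ValueError as exc:
            print("map_async raised:", exc)
```

With apply_async the failure surfaces only when the matching result is queried, whereas the single get() on the map_async result raises it directly.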

