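Straight to the conclusions:
- Similarity: apply() is a thin wrapper around apply_async().get().
- Difference: apply_async() returns immediately, so several tasks can be submitted and run in parallel before their results are collected with .get(); apply() blocks on every call, so tasks run one at a time.
- Observation: in my measurements, apply_async().get() was somewhat faster.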
1. Why is apply() a wrapper around apply_async().get()?
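First, a bit of history. In early Python, calling a function with a set of arguments looked like this:
apply(function, args, kwargs)  # `apply` still exists in Python 2.7 but was removed in Python 3.x
Today you simply write:
function(*args, **kwargs)
The methods of multiprocessing.Pool borrow their names and calling convention from that old built-in.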
Now for apply and apply_async:
- Here are two excerpts from the source code of multiprocessing/pool.py (the idea of reading the source this way comes from a linked reference):
def apply(self, func, args=(), kwds={}):
    """
    Equivalent of `func(*args, **kwds)`.  # i.e. the counterpart of the old built-in apply()
    """
    assert self._state == RUN
    return self.apply_async(func, args, kwds).get()

def apply_async(self, func, args=(), kwds={}, callback=None, error_callback=None):
    """
    Asynchronous equivalent of `apply()` method.
    """
    assert self._state == RUN
    result = ApplyResult(self._cache, callback)
    self._taskqueue.put(([(result._job, 0, func, args, kwds)], None))
    return result
So the source makes it plain: pool.apply(func, args, kwds) is essentially equivalent to pool.apply_async(func, args, kwds).get().
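The equivalence can be checked with a small sketch. It uses ThreadPool (which shares the same apply/apply_async methods) only to keep the example light; the helper names `add` and `compare_apply_variants` are made up for illustration.

```python
from multiprocessing.pool import ThreadPool


def add(a, b=0):
    # Hypothetical task: takes one positional and one keyword argument.
    return a + b


def compare_apply_variants():
    # Run the same call through both entry points and return both results.
    with ThreadPool(2) as pool:
        blocking = pool.apply(add, (1,), {"b": 2})
        via_async = pool.apply_async(add, (1,), {"b": 2}).get()
    return blocking, via_async


if __name__ == "__main__":
    print(compare_apply_variants())
```

Both calls pass `args` and `kwds` the same way; the only difference is who calls `.get()`.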
2. How do the two differ when handling tasks?
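This comes down to what apply is and how it runs. When you submit several tasks, apply hands one task to an available worker in the pool and blocks until its result is back, so the next task is only submitted after the previous one has finished. apply_async, by contrast, drops each task onto the pool's task queue and returns right away; a thread that services that queue then dispatches the tasks to whichever workers are free. With apply_async, therefore, more than one worker process may be running at the same time.
For example: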
from multiprocessing import Pool
from time import sleep, time

def worker(i):
    print('Entering worker', i)
    sleep(2)
    print('Exiting worker')
    return 'worker_response'
# Variant 1: apply
if __name__ == '__main__':  # tasks run strictly one after another, even with Pool(4)
    print('Starting...')
    start_time = time()
    pool = Pool(4)
    print([pool.apply(worker, (i, )) for i in range(8)])
    print(f'The End. Time: {time() - start_time:.5f} s')  # so roughly 2*8 = 16 s
# Output:
Starting...
Entering worker 0
Exiting worker
Entering worker 1
Exiting worker
Entering worker 2
Exiting worker
Entering worker 3
Exiting worker
Entering worker 4
Exiting worker
Entering worker 5
Exiting worker
Entering worker 6
Exiting worker
Entering worker 7
Exiting worker
['worker_response', 'worker_response', 'worker_response', 'worker_response', 'worker_response', 'worker_response', 'worker_response', 'worker_response']
The End. Time: 16.10335 s
# Variant 2: apply_async
if __name__ == '__main__':  # with Pool(4), four tasks run at a time
    print('Starting...')
    start_time = time()
    pool = Pool(4)
    a = [pool.apply_async(worker, (i, )) for i in range(8)]
    print([res.get() for res in a])
    print(f'The End. Time: {time() - start_time:.5f} s')  # two batches of 4 tasks at 2 s each, so roughly 2*2 = 4 s
# Output:
Starting...
Entering worker 0
Entering worker 1
Entering worker 2
Entering worker 3 # 8 tasks in total. When you actually run it, these 4 lines appear first: 4 tasks start at once
Exiting worker
Entering worker 4
Exiting worker
Entering worker 5
Exiting worker
Entering worker 6
Exiting worker
Entering worker 7 # Phase two is the 8 lines above: the first 4 tasks finish and the remaining 4 start
Exiting worker
Exiting worker
Exiting worker
Exiting worker
['worker_response', 'worker_response', 'worker_response', 'worker_response', 'worker_response', 'worker_response', 'worker_response', 'worker_response']
The End. Time: 4.13072 s # Phase three: everything has finished.
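The overlap in the second run follows from apply_async returning before the task has finished. A minimal sketch of that behaviour, assuming a hypothetical `slow_square` task and using ThreadPool to keep it lightweight:

```python
import time
from multiprocessing.pool import ThreadPool


def slow_square(x):
    # Hypothetical task that takes about half a second.
    time.sleep(0.5)
    return x * x


def submit_timings():
    with ThreadPool(2) as pool:
        # apply_async only enqueues the task and hands back a result handle.
        t0 = time.perf_counter()
        handle = pool.apply_async(slow_square, (3,))
        submit_cost = time.perf_counter() - t0
        ready_immediately = handle.ready()  # the task is still sleeping
        value = handle.get()                # blocks until the result arrives

        # apply does the enqueue and the wait in a single call.
        t1 = time.perf_counter()
        blocking_value = pool.apply(slow_square, (3,))
        apply_cost = time.perf_counter() - t1
    return submit_cost, ready_immediately, value, apply_cost, blocking_value


if __name__ == "__main__":
    print(submit_timings())
```

Because the submit step costs almost nothing, a loop of apply_async calls can queue all eight tasks before any of them completes, which is what lets the four workers stay busy.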
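3. An unexpected observation
I have not found the cause yet. While running a task I had at hand, I noticed the following timing differences: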
from multiprocessing.pool import Pool, ThreadPool

# func is the task being timed; its definition is not shown here
if __name__ == '__main__':
    # took 32 s
    pool = Pool(processes=8)
    a = pool.apply(func, ())
    # took 28 s
    pool = Pool(processes=8)
    a = pool.apply_async(func, ()).get()
    # took 25 s
    pool = ThreadPool(processes=8)
    a = pool.apply(func, ())
    # took 23 s
    pool = ThreadPool(processes=8)
    a = pool.apply_async(func, ()).get()
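One possible way to check whether such gaps are real, rather than pool start-up cost or run-to-run noise, is to time only the call itself on an already created pool and repeat it a few times. The sketch below is an assumption about how one might measure this, not the measurement used above; `sample_task`, `time_call` and `compare` are hypothetical names, and `sample_task` merely stands in for the real func.

```python
import time
from multiprocessing.pool import ThreadPool


def sample_task():
    # Hypothetical stand-in for the real task being timed.
    return sum(range(100_000))


def time_call(submit, repeats=3):
    # Best wall-clock time of submit() over a few repetitions.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        submit()
        best = min(best, time.perf_counter() - start)
    return best


def compare(pool):
    # Time both call styles on the same, already running pool.
    return {
        "apply": time_call(lambda: pool.apply(sample_task)),
        "apply_async+get": time_call(lambda: pool.apply_async(sample_task).get()),
    }


if __name__ == "__main__":
    with ThreadPool(processes=8) as pool:
        print(compare(pool))
```

The same `compare` function accepts a process-based Pool as well, provided the task is defined at module level so it can be pickled.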
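Note: the following two imports appear to behave identically, and I saw no difference in practice:
from multiprocessing import Pool
and
from multiprocessing.pool import Pool
In Python 3, multiprocessing.Pool is a factory that builds a multiprocessing.pool.Pool for the default context, which is why the two are interchangeable in normal use.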