Source code analysis of as_completed and wait
Preface
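When doing multithreaded development with ThreadPoolExecutor, there are two helper functions: as_completed() and wait(). Below we read the source code to see what each of them does. Event locks come up repeatedly later on, so you may want to read up first on Python's synchronization mechanisms (semaphore, event, queue).
(Everything below is based on Python 3.7.)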
as_completed
from concurrent.futures import ThreadPoolExecutor, as_completed

def greet():
    print("hello world")

if __name__ == "__main__":
    executor = ThreadPoolExecutor()
    task = executor.submit(greet)
    print(type(task))  # Output: <class 'concurrent.futures._base.Future'>
    results = as_completed([task])
    print(type(results))  # Output: <class 'generator'>
    for result in results:
        print(type(result))  # Output: <class 'concurrent.futures._base.Future'>
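When I first learned about as_completed(), I was baffled. As the code above shows, submit() returns a Future object, and wrapping it with as_completed turns it into a generator, yet when you open up the generator, what comes out is still a Future object! Is that all this thing is good for? Obviously not. The description in the official documentation reveals what it is really for: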
Returns an iterator over the Future instances (possibly created by different Executor instances) given by fs that yields futures as they complete (finished or were cancelled). Any futures given by fs that are duplicated will be returned once. Any futures that completed before as_completed() is called will be yielded first. The returned iterator raises a concurrent.futures.TimeoutError if __next__() is called and the result isn't available after timeout seconds from the original call to as_completed(). timeout can be an int or float. If timeout is not specified or None, there is no limit to the wait time.
Two sentences in this passage are especially important:
- Any futures given by fs that are duplicated will be returned once.
- Any futures that completed before as_completed() is called will be yielded first.
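As an aside before looking at those two sentences: the quoted passage also says the timeout is counted from the original call to as_completed(), not per future. Below is a minimal sketch of that behaviour. The helper names slow and collect_with_timeout and the delay values are my own, chosen only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed

def slow(seconds):
    # Hypothetical task: sleep, then return the delay it was given.
    time.sleep(seconds)
    return seconds

def collect_with_timeout(delays, timeout):
    """Collect results until as_completed() runs out of time."""
    done, timed_out = [], False
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        futures = [executor.submit(slow, d) for d in delays]
        try:
            for future in as_completed(futures, timeout=timeout):
                done.append(future.result())
        except TimeoutError:
            # Raised by the iterator once the overall deadline has passed.
            timed_out = True
    return done, timed_out

if __name__ == "__main__":
    print(collect_with_timeout([0.05, 1.0], 0.4))  # ([0.05], True)
```

Note that the with block still waits for the slow task to finish on exit; the timeout only stops the iteration, it does not cancel running work.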
Take the first sentence: when a future is duplicated, it is returned only once. For example:
from concurrent.futures import ThreadPoolExecutor, as_completed

def greet(word):
    return word

if __name__ == "__main__":
    executor = ThreadPoolExecutor()
    tasks = [executor.submit(greet, word) for word in ["hello", "world"]]
    tasksDouble = tasks * 2  # futures x 2
    for item in as_completed(tasksDouble):
        print(item.result())
# Output:
# hello
# world
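As you can see, we multiplied tasks by 2, but after going through as_completed() neither hello nor world is printed twice. So as_completed must deduplicate somewhere. Internally, Python does something very simple: it uses a set, which really is a powerful deduplication helper.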
# as_completed source
def as_completed(fs, timeout=None):
    ...
    fs = set(fs)  # deduplication
    ...
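To see why a plain set() is enough here, this is a small sketch that counts the futures before and after deduplication. It relies on the fact that repeating a list copies references to the same Future objects, and on the assumption that Future objects are hashed by identity (the default for Python objects). The helper count_yielded is hypothetical, not part of the library.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def greet(word):
    return word

def count_yielded(copies):
    """Return (len of duplicated list, len of its set, number of futures yielded)."""
    with ThreadPoolExecutor(max_workers=2) as executor:
        tasks = [executor.submit(greet, w) for w in ["hello", "world"]]
        duplicated = tasks * copies          # same Future objects, repeated
        unique = set(duplicated)             # what as_completed does internally
        yielded = list(as_completed(duplicated))
    return len(duplicated), len(unique), len(yielded)

if __name__ == "__main__":
    print(count_yielded(2))  # (4, 2, 2)
```

Submitting the same function twice would create two distinct Future objects, so those would not be collapsed; only repeated references to one Future are.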
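Now the second sentence: as_completed first yields, one by one, the futures that had already completed before the function was called. In other words, the results no longer come back in submission order. That may seem puzzling, so let's compare what happens with and without as_completed.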
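Before that comparison, here is a minimal sketch of the "yielded first" rule on its own. It submits a slow task first and a fast task second, waits for the fast one to finish, and only then calls as_completed(). The names work and completion_order and the 0.5 second delay are assumptions made for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(name, seconds):
    # Hypothetical task: sleep, then return its own name.
    time.sleep(seconds)
    return name

def completion_order():
    with ThreadPoolExecutor(max_workers=2) as executor:
        slow = executor.submit(work, "slow", 0.5)   # submitted first
        fast = executor.submit(work, "fast", 0)     # submitted second
        fast.result()  # block until the fast task has definitely finished
        # fast is already done at this point, so it is yielded before slow
        return [f.result() for f in as_completed([slow, fast])]

if __name__ == "__main__":
    print(completion_order())  # ['fast', 'slow']
```

Even though slow comes first in the list passed to as_completed(), it is yielded last, because it is still running when the call is made.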
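# Without as_completed
import collections
import random
import time
from concurrent.futures import ThreadPoolExecutor

def print_num(order):
    """
    order is the order in which the thread was started.
    A random num makes the threads likely to finish at different times.
    """
    num = random.randrange(10)
    time.sleep(num)
    ordict = collections.OrderedDict()
    ordict["order"] = order
    ordict["value"] = num
    return ordict  # the start order and the approximate run time, printed at the end

if __name__ == "__main__":
    executor = ThreadPoolExecutor()
    alltasks = [executor.submit(print_num, i) for i in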