concurrent.futures — Manage Pools of Concurrent Tasks

Purpose: Make managing the execution of concurrent and parallel tasks easier.
Translated from: concurrent.futures — Manage Pools of Concurrent Tasks

The concurrent.futures module provides interfaces for running tasks using pools of thread or process workers. The APIs are the same, so applications can switch between threads and processes with only minimal changes.

The module provides two kinds of classes for interacting with the pools. Executors are used for managing pools of workers, and futures are used for managing results computed by the workers. To use a pool of workers, an application creates an instance of the appropriate executor class and then submits tasks for it to run. When each task is started, a Future instance is returned. When the result of the task is needed, an application can use the Future to block until the result is available. The module also provides various APIs that make it convenient to wait for tasks to complete, so the Future objects do not need to be managed directly.
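
For example, the module-level wait() function blocks until a group of futures has finished, without the caller touching each Future individually. A minimal sketch of that convenience API (the task() function and its sleep time are assumptions made up for this note, not part of the original examples):

# futures_wait_sketch.py (illustrative sketch)

from concurrent import futures
import time


def task(n):
    time.sleep(0.1)
    return n * 2


ex = futures.ThreadPoolExecutor(max_workers=2)
fs = [ex.submit(task, i) for i in range(4)]

# wait() blocks until all futures finish (the default
# return_when=futures.ALL_COMPLETED) and returns a named
# tuple of the done and not-done sets.
done, not_done = futures.wait(fs)
print('done: {}'.format(sorted(f.result() for f in done)))
print('still pending: {}'.format(len(not_done)))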

Using map() with a Basic Thread Pool

ThreadPoolExecutor manages a set of worker threads, passing tasks to them as they become available for more work. This example uses map() to concurrently produce a set of results from an iterable of inputs. The task uses time.sleep() to pause for a different amount of time to demonstrate that, regardless of the order in which the concurrent tasks execute, map() always returns the values in order based on the inputs.

# futures_thread_pool_map.py

from concurrent.futures import ThreadPoolExecutor as Executor
import threading
import time


def task(n):
    print('{}: sleeping {}'.format(threading.current_thread().name, n))
    time.sleep(n / 10)
    print('{}: done with {}'.format(threading.current_thread().name, n))
    return n / 10


ex = Executor(max_workers=2)
print('main: starting')
results = ex.map(task, range(5, 0, -1))
print('main: unprocessed results {}'.format(results))
print('main: waiting for real results')
real_results = list(results)
print('main: results: {}'.format(real_results))

The return value from map() is actually a special type of iterator that knows to wait for each response as the main program iterates over it.

main: starting
ThreadPoolExecutor-0_0: sleeping 5
ThreadPoolExecutor-0_1: sleeping 4
main: unprocessed results <generator object Executor.map.<locals>.result_iterator at 0x10f4670a0>
main: waiting for real results
ThreadPoolExecutor-0_1: done with 4
ThreadPoolExecutor-0_1: sleeping 3
ThreadPoolExecutor-0_0: done with 5
ThreadPoolExecutor-0_0: sleeping 2
ThreadPoolExecutor-0_1: done with 3
ThreadPoolExecutor-0_1: sleeping 1
ThreadPoolExecutor-0_0: done with 2
ThreadPoolExecutor-0_1: done with 1
main: results: [0.5, 0.4, 0.3, 0.2, 0.1]
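
A related detail: if one of the calls raises an exception, map() re-raises it from the result iterator when that item is reached, not when map() is called. A short illustrative sketch (the failing input value is an assumption made up for this note):

# futures_thread_pool_map_error.py (illustrative sketch)

from concurrent.futures import ThreadPoolExecutor as Executor


def task(n):
    if n == 3:
        raise ValueError('the value {} is no good'.format(n))
    return n / 10


ex = Executor(max_workers=2)
results = ex.map(task, range(5, 0, -1))
try:
    # The ValueError from task(3) surfaces here, during iteration.
    print(list(results))
except ValueError as e:
    print('caught: {}'.format(e))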

Scheduling Individual Tasks

In addition to using map(), it is possible to schedule an individual task with an executor using submit(), and then use the returned Future instance to wait for that task's result.

# futures_thread_pool_submit.py

from concurrent.futures import ThreadPoolExecutor as Executor
import threading
import time


def task(n):
    print('{}: sleeping {}'.format(threading.current_thread().name, n))
    time.sleep(n / 10)
    print('{}: done with {}'.format(threading.current_thread().name, n))
    return n / 10


ex = Executor(max_workers=2)
print('main: starting')
f = ex.submit(task, 5)
print('main: future: {}'.format(f))
print('main: waiting for results')
result = f.result()
print('main: result: {}'.format(result))
print('main: future after result: {}'.format(f))

The status of the future changes after the task is completed and the result is available.

main: starting
ThreadPoolExecutor-0_0: sleeping 5
main: future: <Future at 0x109daf898 state=running>
main: waiting for results
ThreadPoolExecutor-0_0: done with 5
main: result: 0.5
main: future after result: <Future at 0x109daf898 state=finished returned float>
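
result() also accepts an optional timeout in seconds; if the task has not finished in time, concurrent.futures.TimeoutError is raised instead of blocking forever. A hedged sketch (the 2 second sleep and 0.5 second timeout are arbitrary values chosen for illustration):

# futures_thread_pool_submit_timeout.py (illustrative sketch)

from concurrent import futures
import time


def slow_task():
    time.sleep(2)
    return 'done'


ex = futures.ThreadPoolExecutor(max_workers=1)
f = ex.submit(slow_task)
try:
    # Give up waiting after 0.5 seconds; the task itself keeps running.
    print(f.result(timeout=0.5))
except futures.TimeoutError:
    print('timed out, still running: {}'.format(f.running()))
print(f.result())  # block without a timeout for the real value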

Waiting for Tasks in Any Order

Invoking the result() method of a Future blocks until the task completes (either by returning a value or by raising an exception) or is canceled. The results of multiple tasks can be accessed in the order in which the tasks were scheduled using map(). If the order in which the results are processed does not matter, use as_completed() to process them as each task finishes.

# futures_as_completed.py

from concurrent import futures
import random
import time


def task(n):
    sleep_time = random.random()
    time.sleep(sleep_time)
    return n, "%.3f" % sleep_time


ex = futures.ThreadPoolExecutor(max_workers=5)
print('main: starting')

wait_for = [ex.submit(task, i) for i in range(5, 0, -1)]

for f in futures.as_completed(wait_for):
    print('main: result: {}'.format(f.result()))

Because the pool has as many worker threads as tasks, all of the tasks can be started. They finish in a random order, so the values generated by as_completed() are different each time the example is run.

main: starting
main: result: (1, '0.059')
main: result: (5, '0.249')
main: result: (3, '0.276')
main: result: (4, '0.341')
main: result: (2, '0.804')
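
as_completed() also takes an optional timeout; if some of the futures are still unfinished after that many seconds, the iterator raises concurrent.futures.TimeoutError. A hedged sketch (the sleep durations and the 0.5 second timeout are assumptions for illustration):

# futures_as_completed_timeout.py (illustrative sketch)

from concurrent import futures
import time


def task(n):
    time.sleep(n)
    return n


ex = futures.ThreadPoolExecutor(max_workers=2)
wait_for = [ex.submit(task, n) for n in (0.1, 0.2, 2)]

try:
    for f in futures.as_completed(wait_for, timeout=0.5):
        print('completed: {}'.format(f.result()))
except futures.TimeoutError as e:
    print('gave up waiting: {}'.format(e))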

Future Callbacks

To take some action when a task completes, without explicitly waiting for the result, use add_done_callback() to specify a new function to call when the Future is done. The callback should be a callable that takes a single argument, the Future instance.

# futures_future_callback.py

from concurrent import futures
import time


def task(n):
    print('{}: sleeping'.format(n))
    time.sleep(0.5)
    print('{}: done'.format(n))
    return n / 10


def done(fn):
    if fn.cancelled():
        print('{}: canceled'.format(fn.arg))
    elif fn.done():
        error = fn.exception()
        if error:
            print('{}: error returned: {}'.format(fn.arg, error))
        else:
            result = fn.result()
            print('{}: value returned: {}'.format(fn.arg, result))


if __name__ == '__main__':
    ex = futures.ThreadPoolExecutor(max_workers=2)
    print('main: starting')
    f = ex.submit(task, 5)
    # Attach the argument to the Future so the callback can report it.
    f.arg = 5
    f.add_done_callback(done)
    result = f.result()

The callback is invoked regardless of the reason the Future is considered "done", so it is necessary to check the status of the object passed to the callback before using it in any way.

main: starting
5: sleeping
5: done
5: value returned: 0.5
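
Attaching an attribute to the Future (f.arg above) is one way to pass extra context into the callback. An alternative sketch of the same idea binds the argument with functools.partial instead; the cancellation and exception checks from done() above are shortened here for brevity:

# futures_future_callback_partial.py (illustrative sketch)

import functools
import time
from concurrent import futures


def task(n):
    time.sleep(0.5)
    return n / 10


def done(n, fn):
    # n was bound with functools.partial; fn is the completed Future.
    # A real callback should still check fn.cancelled() and fn.exception().
    print('{}: value returned: {}'.format(n, fn.result()))


if __name__ == '__main__':
    ex = futures.ThreadPoolExecutor(max_workers=2)
    f = ex.submit(task, 5)
    f.add_done_callback(functools.partial(done, 5))
    f.result()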

Canceling Tasks

A Future can be canceled, if it has been submitted but not started, by calling its cancel() method.

# futures_future_callback_cancel.py

from concurrent import futures
import time


def task(n):
    print('{}: sleeping'.format(n))
    time.sleep(0.5)
    print('{}: done'.format(n))
    return n / 10


def done(fn):
    if fn.cancelled():
        print('{}: canceled'.format(fn.arg))
    elif fn.done():
        print('{}: not canceled'.format(fn.arg))


if __name__ == '__main__':
    ex = futures.ThreadPoolExecutor(max_workers=2)
    print('main: starting')
    tasks = []

    for i in range(10, 0, -1):
        print('main: submitting {}'.format(i))
        f = ex.submit(task, i)
        f.arg = i
        f.add_done_callback(done)
        tasks.append((i, f))

    for i, t in reversed(tasks):
        if not t.cancel():
            print('main: did not cancel {}'.format(i))

    ex.shutdown()

cancel() returns a Boolean indicating whether or not the task could be canceled.

main: starting
main: submitting 10
10: sleeping
main: submitting 9
9: sleeping
main: submitting 8
main: submitting 7
main: submitting 6
main: submitting 5
main: submitting 4
main: submitting 3
main: submitting 2
main: submitting 1
1: canceled
2: canceled
3: canceled
4: canceled
5: canceled
6: canceled
7: canceled
8: canceled
main: did not cancel 9
main: did not cancel 10
9: done
9: not canceled
10: done
10: not canceled
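
As a side note, Python 3.9 and later also add a cancel_futures argument to shutdown(), which cancels every future that has not started yet in a single call. A hedged sketch (requires Python 3.9 or newer; the sleep time and task count are made up for illustration):

# futures_shutdown_cancel.py (illustrative sketch, Python 3.9+)

from concurrent import futures
import time


def task(n):
    time.sleep(0.5)
    return n


ex = futures.ThreadPoolExecutor(max_workers=2)
fs = [ex.submit(task, i) for i in range(10)]

# cancel_futures=True cancels every future that has not started yet,
# then wait=True blocks until the already-running ones finish.
ex.shutdown(wait=True, cancel_futures=True)

print('canceled: {}'.format(sum(1 for f in fs if f.cancelled())))
print('finished: {}'.format(sum(1 for f in fs if f.done() and not f.cancelled())))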

Exceptions in Tasks

If a task raises an unhandled exception, it is saved to the task's Future and made available through the result() or exception() methods.

# futures_future_exception.py

from concurrent import futures


def task(n):
    print('{}: starting'.format(n))
    raise ValueError('the value {} is no good'.format(n))


ex = futures.ThreadPoolExecutor(max_workers=2)
print('main: starting')
f = ex.submit(task, 5)

error = f.exception()
print('main: error: {}'.format(error))

try:
    result = f.result()
except ValueError as e:
    print('main: saw error "{}" when accessing result'.format(e))

If result() is called after an unhandled exception is raised within a task function, the same exception is re-raised in the current context.

main: starting
5: starting
main: error: the value 5 is no good
main: saw error "the value 5 is no good" when accessing result

Context Manager

Executors work as context managers, running tasks concurrently and waiting for them all to complete. When the context manager exits, the executor's shutdown() method is called.

# futures_context_manager.py

from concurrent import futures


def task(n):
    print(n)


with futures.ThreadPoolExecutor(max_workers=2) as ex:
    print('main: starting')
    ex.submit(task, 1)
    ex.submit(task, 2)
    ex.submit(task, 3)
    ex.submit(task, 4)

print('main: done')

This mode of using the executor is useful when the thread or process resources should be cleaned up as execution leaves the current scope.

main: starting
1
2
3
4
main: done
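
The with block is roughly equivalent to calling shutdown() explicitly in a try/finally block; a sketch of that expanded form, reusing the same task() function:

# futures_context_manager_expanded.py (illustrative sketch)

from concurrent import futures


def task(n):
    print(n)


ex = futures.ThreadPoolExecutor(max_workers=2)
try:
    print('main: starting')
    ex.submit(task, 1)
    ex.submit(task, 2)
finally:
    # shutdown(wait=True) blocks until all submitted tasks have finished,
    # then frees the worker threads.
    ex.shutdown(wait=True)

print('main: done')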

Process Pools

ProcessPoolExecutor works in the same way as ThreadPoolExecutor, but uses processes instead of threads. This allows CPU-intensive operations to use a separate CPU and not be blocked by the CPython interpreter's global interpreter lock.

# futures_process_pool_map.py

from concurrent import futures
import os


def task(n):
    return n, os.getpid()


ex = futures.ProcessPoolExecutor(max_workers=2)
results = ex.map(task, range(5, 0, -1))
for n, pid in results:
    print('ran task {} in process {}'.format(n, pid))

As with the thread pool, individual worker processes are reused for multiple tasks.

ran task 5 in process 12096
ran task 4 in process 12097
ran task 3 in process 12096
ran task 2 in process 12097
ran task 1 in process 12096
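
To actually benefit from multiple CPUs, the task needs to be CPU-bound rather than I/O-bound. A hedged sketch (the sum-of-squares workload and the input sizes are assumptions made up for illustration); the __main__ guard matters on platforms that start worker processes with the "spawn" method, such as Windows:

# futures_process_pool_cpu.py (illustrative sketch)

from concurrent import futures


def sum_of_squares(n):
    # A deliberately CPU-bound loop; each call can run on its own CPU.
    return sum(i * i for i in range(n))


if __name__ == '__main__':
    inputs = [10 ** 6] * 4
    with futures.ProcessPoolExecutor(max_workers=2) as ex:
        for n, total in zip(inputs, ex.map(sum_of_squares, inputs)):
            print('sum of squares below {}: {}'.format(n, total))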

If something happens to one of the worker processes that causes it to exit unexpectedly, the ProcessPoolExecutor is considered "broken" and will no longer schedule tasks.

# futures_process_pool_broken.py

from concurrent import futures
import os
import signal


with futures.ProcessPoolExecutor(max_workers=2) as ex:
    print('getting the pid for one worker')
    f1 = ex.submit(os.getpid)
    pid1 = f1.result()

    print('killing process {}'.format(pid1))
    os.kill(pid1, signal.SIGHUP)

    print('submitting another task')
    f2 = ex.submit(os.getpid)
    try:
        pid2 = f2.result()
    except futures.process.BrokenProcessPool as e:
        print('could not start new tasks: {}'.format(e))

The BrokenProcessPool exception is actually raised when the result is processed, rather than when the new task is submitted.

getting the pid for one worker
killing process 28877
submitting another task
could not start new tasks: A process in the process pool was terminated abruptly while the future was running or pending.