Contents
- How to use concurrent.futures
- 1. Executor
- 2. submit(self, fn, *args, **kwargs)
- 3. map(self, fn, *iterables, **kwargs)
- 4. as_completed(fs, timeout=None)
- 5. wait(fs, timeout=None, return_when=ALL_COMPLETED)
- 6. On Python < 3.5 with a large task load, do not use the process pool ProcessPoolExecutor; the thread pool ThreadPoolExecutor is unaffected
How to use concurrent.futures
1. Executor
- (1) Executor is an abstract class that provides methods for asynchronous execution. It cannot be used directly, but its two subclasses, ThreadPoolExecutor and ProcessPoolExecutor, create thread pools and process pools.
- (2) Executor provides 3 abstract methods:
Abstract method | Notes
---|---
Executor.submit(fn, *args, **kwargs) | Use submit() when the submitted functions differ, or when exceptions may occur during execution: map() re-raises an exception directly when the failing result is consumed, while submit() lets each task's error be handled separately.
Executor.map(func, *iterables, timeout=None) | Use map() when every task runs the same function.
Executor.shutdown(wait=True) | Because Executor implements __enter__ and __exit__, its instances can be used in a with statement, which calls shutdown() automatically once the tasks finish, so no explicit cleanup code is needed.
- (3) For process pools on Windows, note two points, otherwise the pool cannot start and may raise: AttributeError: Can't pickle local object ...:
- a. Start the pool under if __name__ == '__main__':
- b. If the called function uses a database, do not keep the handle on a custom class (e.g. self.db, self.collection); such attributes cannot be used with a Windows process pool, so it is best to perform database operations inside the worker function itself.
- (4) On Python < 3.5 with a large number of pending tasks, concurrent.futures' ProcessPoolExecutor should not be used (ThreadPoolExecutor is unaffected; see section 6).
2. submit(self, fn, *args, **kwargs)
- (1) submit() schedules a callable for parallel execution and immediately returns a Future instance.
- (2) The Future object represents a computation that runs asynchronously in a thread/process and completes at some time in the future; it exposes the state and result of that thread/process.
- (3) Using submit() with a ThreadPoolExecutor:
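The submit-vs-map distinction in the table above can be seen with a small sketch (invert is a hypothetical worker that fails for one input, not part of the library): with submit(), each Future's error can be inspected individually via future.exception(), whereas iterating executor.map() would re-raise the ZeroDivisionError as soon as the failing result is reached.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker that raises ZeroDivisionError for x == 0.
def invert(x):
    return 1 / x

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(invert, n) for n in [2, 0, 4]]

results = []
for future in futures:
    exc = future.exception()  # None if the task succeeded
    results.append(future.result() if exc is None else f'failed: {type(exc).__name__}')

print(results)  # [0.5, 'failed: ZeroDivisionError', 0.25]
```

future.result() would re-raise the stored exception, so checking future.exception() first keeps the loop running past failed tasks.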
import time
from concurrent.futures import ThreadPoolExecutor, wait

def execute_thread():
    def fib(n):
        if n <= 2:
            return 1
        return fib(n - 1) + fib(n - 2)

    start = time.time()
    numbers = list(range(30, 40))
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = []
        for num in numbers:
            task = executor.submit(fib, num)
            futures.append(task)
        wait(futures)
        for num, future in zip(numbers, futures):
            print(f'fib({num}) = {future.result()}')
    print(f'COST: {time.time() - start}s')

execute_thread()
>>>>> Output:
fib(30) = 832040
fib(31) = 1346269
fib(32) = 2178309
fib(33) = 3524578
fib(34) = 5702887
fib(35) = 9227465
fib(36) = 14930352
fib(37) = 24157817
fib(38) = 39088169
fib(39) = 63245986
COST: 47.68672752380371s
- (4) Using submit() with a ProcessPoolExecutor:
import time
from concurrent.futures import ProcessPoolExecutor, wait

def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)

def execute_process():
    start = time.time()
    numbers = list(range(30, 40))
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = []
        for num in numbers:
            task = executor.submit(fib, num)
            futures.append(task)
        wait(futures)
        for num, future in zip(numbers, futures):
            print(f'fib({num}) = {future.result()}')
    print(f'COST: {time.time() - start}')

if __name__ == '__main__':
    execute_process()
>>>>> Output:
fib(30) = 832040
fib(31) = 1346269
fib(32) = 2178309
fib(33) = 3524578
fib(34) = 5702887
fib(35) = 9227465
fib(36) = 14930352
fib(37) = 24157817
fib(38) = 39088169
fib(39) = 63245986
COST: 33.01688861846924
3. map(self, fn, *iterables, **kwargs)
- (1) Arguments: a function name and one or more sequences (e.g. list, tuple).
- (2) The returned results object is an ordered generator; its order matches that of the *iterables.
- (3) Using map() with a ThreadPoolExecutor:
import time
from concurrent.futures import ThreadPoolExecutor

def execute_thread():
    def fib(n):
        if n <= 2:
            return 1
        return fib(n - 1) + fib(n - 2)

    start = time.time()
    numbers = list(range(30, 40))
    with ThreadPoolExecutor(max_workers=4) as executor:
        result = executor.map(fib, numbers)
        for num, value in zip(numbers, result):
            print(f'fib({num}) = {value}')
    print(f'COST: {time.time() - start}')

execute_thread()
>>>>> Output:
fib(30) = 832040
fib(31) = 1346269
fib(32) = 2178309
fib(33) = 3524578
fib(34) = 5702887
fib(35) = 9227465
fib(36) = 14930352
fib(37) = 24157817
fib(38) = 39088169
fib(39) = 63245986
COST: 49.969857931137085
- (4) Using map() with a ProcessPoolExecutor:
import time
from concurrent.futures import ProcessPoolExecutor

def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)

def execute_process():
    start = time.time()
    numbers = list(range(30, 40))
    with ProcessPoolExecutor(max_workers=4) as executor:
        result = executor.map(fib, numbers)
        for num, value in zip(numbers, result):
            print(f'fib({num}) = {value}')
    print(f'COST: {time.time() - start}')

if __name__ == '__main__':
    execute_process()
>>>>> Output:
fib(30) = 832040
fib(31) = 1346269
fib(32) = 2178309
fib(33) = 3524578
fib(34) = 5702887
fib(35) = 9227465
fib(36) = 14930352
fib(37) = 24157817
fib(38) = 39088169
fib(39) = 63245986
COST: 33.468914270401
4. as_completed(fs, timeout=None)
- (1) submit() returns a Future object, and the Future provides methods for tracking the task's state, e.g. future.running() to check whether the task is still executing and future.done() to check whether it has finished.
- (2) as_completed() takes an iterator of futures and a timeout. With the default timeout=None it blocks until tasks finish, returning an iterator (implemented with yield) over the completed Future objects. If timeout > 0, it waits up to timeout seconds; if tasks are still unfinished when the time is up, it stops and raises TimeoutError.
- (3) as_completed() is a generator: while no task has completed it blocks; as soon as some task completes it yields that task so the body of the for loop can run, then blocks again, looping until every task has finished. The output also shows that tasks finishing first notify the main thread first.
from concurrent.futures import ThreadPoolExecutor, as_completed

def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)

with ThreadPoolExecutor(max_workers=2) as executor:
    numbers = [17, 25, 28]
    futures = [executor.submit(fib, n) for n in numbers]
    for future in futures:
        print(f'running: {future.running()}, done: {future.done()}, result: {future.result()}')
    print('#### divider ####')
    for future in as_completed(futures, timeout=2):
        print(f'running: {future.running()}, done: {future.done()}, result: {future.result()}')
>>>>> Output:
running: False, done: True, result: 1597
running: False, done: True, result: 75025
running: True, done: False, result: 317811
#### divider ####
running: False, done: True, result: 75025
running: False, done: True, result: 1597
running: False, done: True, result: 317811
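The timeout behaviour described in (2) can be sketched as follows. This is an illustrative example, not the article's original code: slow is a hypothetical task, and the 3-second job deliberately outlives the 1.5-second budget so that as_completed() raises TimeoutError after yielding the two quick tasks.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed, TimeoutError

def slow(seconds):
    time.sleep(seconds)
    return seconds

completed = []
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(slow, s) for s in (0.1, 0.2, 3)]
    try:
        # Budget of 1.5s: the two quick tasks are yielded, the 3s task is not.
        for future in as_completed(futures, timeout=1.5):
            completed.append(future.result())
    except TimeoutError:
        print(f'timed out, finished so far: {sorted(completed)}')
```

Note that leaving the with block still waits for the straggler to finish; the timeout only bounds how long the iteration itself waits.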
5. wait(fs, timeout=None, return_when=ALL_COMPLETED)
- (1) wait() returns a tuple containing two sets: one of completed futures and one of uncompleted futures.
- (2) One advantage of wait() is the extra control it offers: return_when accepts FIRST_COMPLETED, FIRST_EXCEPTION, or ALL_COMPLETED (the default).
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)

with ThreadPoolExecutor(max_workers=2) as executor:
    numbers = [17, 25, 28]
    futures = [executor.submit(fib, n) for n in numbers]
    done, unfinished = wait(futures, return_when=ALL_COMPLETED)
    for d in done:
        print(f'running: {d.running()}, done: {d.done()}, result: {d.result()}')
    print('wait() blocks the main thread until the chosen condition is met')
>>>>> Output:
running: False, done: True, result: 317811
running: False, done: True, result: 75025
running: False, done: True, result: 1597
wait() blocks the main thread until the chosen condition is met
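The other return_when modes can be sketched the same way. In this illustrative example (slow is a hypothetical task, not from the article), FIRST_COMPLETED makes wait() return as soon as the quick task finishes, while the slow one is still pending:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def slow(seconds):
    time.sleep(seconds)
    return seconds

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(slow, s) for s in (0.1, 2)]
    # Returns as soon as any one future finishes, instead of waiting for all.
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    print(f'done: {len(done)}, pending: {len(not_done)}')  # done: 1, pending: 1
```

FIRST_EXCEPTION behaves like ALL_COMPLETED except that it also returns early if any future raises.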
6. On Python < 3.5 with a large task load, do not use the process pool ProcessPoolExecutor; the thread pool ThreadPoolExecutor is unaffected
- (1) When the Python version is < 3.5 and there are many pending tasks, the process pool ProcessPoolExecutor should not be used; the thread pool ThreadPoolExecutor is unaffected.
- (2) When processing a very large iterable, concurrent.futures can take many times longer than multiprocessing.
- (3) This is because multiprocessing.pool submits tasks in batches, which saves IPC (inter-process communication) overhead, while ProcessPoolExecutor submits only one task at a time. Python 3.5 solved this by adding a chunksize parameter to map().
A. map() method
import time
from multiprocessing.pool import Pool
from concurrent.futures import ProcessPoolExecutor

NUMBERS = range(1, 100000)

def f(x):
    r = 0
    for k in range(1, 52):
        r += x ** (1 / k**1.5)
    return r

def muti_process():
    start = time.time()
    lis = []
    pool = Pool(4)
    for num, result in zip(NUMBERS, pool.map(f, NUMBERS)):
        lis.append(result)
    print(len(lis))
    print('multiprocessing.pool.Pool_COST: {}\n'.format(time.time() - start))

def exec_process():
    start = time.time()
    lis = []
    with ProcessPoolExecutor(max_workers=4) as executor:
        for num, result in zip(NUMBERS, executor.map(f, NUMBERS)):
            lis.append(result)
    print(len(lis))
    print('ProcessPoolExecutor without chunksize_COST: {}\n'.format(time.time() - start))

def exec_process_chuk():
    start = time.time()
    lis = []
    with ProcessPoolExecutor(max_workers=4) as executor:
        for num, result in zip(NUMBERS, executor.map(f, NUMBERS, chunksize=25000)):
            lis.append(result)
    print(len(lis))
    print('ProcessPoolExecutor with chunksize_COST: {}\n'.format(time.time() - start))

if __name__ == '__main__':
    muti_process()
    exec_process()
    exec_process_chuk()
>>>>> Output:
99999
multiprocessing.pool.Pool_COST: 0.5101141929626465
99999
ProcessPoolExecutor without chunksize_COST: 67.97954988479614
99999
ProcessPoolExecutor with chunksize_COST: 0.45710229873657227
B. submit() and apply_async() methods
import time
from multiprocessing.pool import Pool
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

NUMBERS = range(1, 100000)

def f(x):
    r = 0
    for k in range(1, 52):
        r += x ** (1 / k**1.5)
    return r

def muti_process():
    start = time.time()
    pool = Pool(4)
    all_task = []
    for i in range(1, 100000):
        task = pool.apply_async(f, (i,))
        all_task.append(task)
    pool.close()
    pool.join()
    print('multiprocessing.pool.Pool_COST: {}\n'.format(time.time() - start))

def exec_process():
    start = time.time()
    futures = []
    with ProcessPoolExecutor(max_workers=4) as executor:
        for i in range(1, 100000):
            task = executor.submit(f, i)
            futures.append(task)
    print('ProcessPoolExecutor: {}\n'.format(time.time() - start))

def exec_thread():
    start = time.time()
    futures = []
    with ThreadPoolExecutor(max_workers=4) as executor:
        for num, result in zip(NUMBERS, executor.map(f, NUMBERS)):
            futures.append(result)
    print('ThreadPoolExecutor_COST: {}\n'.format(time.time() - start))

if __name__ == '__main__':
    muti_process()
    exec_process()
    exec_thread()
>>>>> Output:
multiprocessing.pool.Pool_COST: 4.382080316543579
ProcessPoolExecutor: 63.726953983306885
ThreadPoolExecutor_COST: 4.074100494384766