Python: Process Pools and Thread Pools

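Using concurrent.futures
1. Executor
  • (1) Executor is an abstract class that provides methods for executing calls asynchronously. It cannot be used directly; instead, use its two subclasses, ThreadPoolExecutor and ProcessPoolExecutor, to create a thread pool or a process pool.
  • (2) Executor defines three methods, which differ as follows:
    • Executor.submit(fn, *args, **kwargs): use submit() when the submitted functions differ, or when a task may raise an exception. Each call returns its own Future, so failures can be handled task by task.
    • Executor.map(func, *iterables, timeout=None, chunksize=1): use map() when every task calls the same function. If a call raises, the exception is re-raised when that result is retrieved from the returned iterator, and iteration stops there.
    • Executor.shutdown(wait=True): Executor implements __enter__ and __exit__, so an executor can be used in a with statement. shutdown() then runs automatically when the block exits, and no explicit cleanup code is needed.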
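  • (3) On Windows, a process pool needs two precautions; otherwise it fails to start and may raise: AttributeError: Can't pickle local object …
    • a. Start the pool from code under if __name__ == '__main__':
    • b. If the submitted function uses a database, do not keep the connection on a custom class instance (for example self.db or self.collection). Such handles typically cannot be pickled and sent to the worker processes, so they are unusable in a Windows process pool; create the database connection inside the function instead.
  • (4) On Python < 3.5 with a large number of tasks, concurrent.futures should be avoided (section 6 narrows this to ProcessPoolExecutor).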
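The pickling problem from point (3) can be reproduced without starting a pool at all. The sketch below is only an illustration: Repository, query_standalone and can_pickle are hypothetical names, and a threading.Lock stands in for a live database connection (both refuse to be pickled).

```python
import pickle
import threading


class Repository:
    """Hypothetical class that keeps an unpicklable handle on self."""

    def __init__(self):
        # Stand-in for self.db: a lock, like a live connection, cannot be pickled.
        self.db = threading.Lock()

    def query(self, key):
        return key * 2


def query_standalone(key):
    # The (stand-in) connection is created inside the function, so only
    # `key` would have to travel to a worker process.
    db = threading.Lock()
    with db:
        return key * 2


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, as a process pool requires."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


if __name__ == '__main__':
    # Pickling a bound method also pickles its instance, including self.db.
    print('bound method picklable:', can_pickle(Repository().query))
    # A module-level function is pickled by reference to its name.
    print('module-level function picklable:', can_pickle(query_standalone))
```

Run as a script, the first check should report False and the second True, which is why moving the database work into a plain module-level function avoids the error.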
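2. submit(self, fn, *args, **kwargs)
  • (1) submit() schedules a callable for concurrent execution and immediately returns a Future instance.
  • (2) The Future represents a call that runs asynchronously in a thread or process and completes at some point in the future. Through it you can query the state of the call, fetch its result, or attach a callback.
  • (3) Using submit() with a ThreadPoolExecutor thread pool: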
import time
from concurrent.futures import ThreadPoolExecutor, wait
def execute_thread():
    def fib(n):
        if n <= 2:
            return 1
        return fib(n - 1) + fib(n - 2)
    start = time.time()
    numbers = list(range(30, 40))
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = []
        for num in numbers:
            task = executor.submit(fib, num)
            futures.append(task)
        wait(futures)
        for num, future in zip(numbers, futures):
            print(f'fib({num}) = {future.result()}')
    print(f'COST: {time.time() - start}s')
execute_thread()

>>>>> Output:
fib(30) = 832040
fib(31) = 1346269
fib(32) = 2178309
fib(33) = 3524578
fib(34) = 5702887
fib(35) = 9227465
fib(36) = 14930352
fib(37) = 24157817
fib(38) = 39088169
fib(39) = 63245986
COST: 47.68672752380371s
  • (4) Using submit() with a ProcessPoolExecutor process pool:
import time
from concurrent.futures import ProcessPoolExecutor, wait
def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)
def execute_process():
    start = time.time()
    numbers = list(range(30, 40))
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = []
        for num in numbers:
            task = executor.submit(fib, num)
            futures.append(task)
        wait(futures)
        for num, future in zip(numbers, futures):
            print(f'fib({num}) = {future.result()}')
    print(f'COST: {time.time() - start}')
if __name__ == '__main__':
    execute_process()
 
>>>>> Output:
fib(30) = 832040
fib(31) = 1346269
fib(32) = 2178309
fib(33) = 3524578
fib(34) = 5702887
fib(35) = 9227465
fib(36) = 14930352
fib(37) = 24157817
fib(38) = 39088169
fib(39) = 63245986
COST: 33.01688861846924
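Because every submit() call returns its own Future, a failure in one task can be inspected without losing the others. A minimal sketch, assuming a hypothetical divide function that fails for one of its inputs:

```python
from concurrent.futures import ThreadPoolExecutor


def divide(a, b):
    return a / b


with ThreadPoolExecutor(max_workers=2) as executor:
    # Map each Future back to the divisor it was submitted with.
    futures = {executor.submit(divide, 10, d): d for d in (2, 0, 5)}

results, errors = {}, {}
for future, divisor in futures.items():
    exc = future.exception()  # None if the call finished without raising
    if exc is None:
        results[divisor] = future.result()
    else:
        errors[divisor] = type(exc).__name__

print(results)  # {2: 5.0, 5: 2.0}
print(errors)   # {0: 'ZeroDivisionError'}
```

Calling future.result() on the failed task would re-raise the ZeroDivisionError in the caller, so checking future.exception() first (or wrapping result() in try/except) keeps the loop going.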
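3. map(self, fn, *iterables, timeout=None, chunksize=1)
  • (1) Arguments: a function, followed by one or more iterables (such as lists or tuples).
  • (2) map() returns an iterator, not a list. It yields results in the same order as the input iterables, regardless of which task finishes first.
  • (3) Using map() with a ThreadPoolExecutor thread pool: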
import time
from concurrent.futures import ThreadPoolExecutor

def execute_thread():
    def fib(n):
        if n <= 2:
            return 1
        return fib(n - 1) + fib(n - 2)
    start = time.time()
    numbers = list(range(30, 40))
    with ThreadPoolExecutor(max_workers=4) as executor:
        result = executor.map(fib, numbers)
        # print(f'result:{list(result)}')  # list() would exhaust the iterator, leaving nothing for the loop below
        for num, value in zip(numbers, result):
            print(f'fib({num}) = {value}')
    print(f'COST: {time.time() - start}')
execute_thread()

>>>>> Output:
fib(30) = 832040
fib(31) = 1346269
fib(32) = 2178309
fib(33) = 3524578
fib(34) = 5702887
fib(35) = 9227465
fib(36) = 14930352
fib(37) = 24157817
fib(38) = 39088169
fib(39) = 63245986
COST: 49.969857931137085
  • (4) Using map() with a ProcessPoolExecutor process pool:
import time
from concurrent.futures import ProcessPoolExecutor
def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)
def execute_process():
    start = time.time()
    numbers = list(range(30, 40))
    with ProcessPoolExecutor(max_workers=4) as executor:
        result = executor.map(fib, numbers)
        # print(f'result:{list(result)}')  # list() would exhaust the iterator, leaving nothing for the loop below
        for num, value in zip(numbers, result):
            print(f'fib({num}) = {value}')
    print(f'COST: {time.time() - start}')
if __name__ == '__main__':
    execute_process()
    
>>>>> Output:
fib(30) = 832040
fib(31) = 1346269
fib(32) = 2178309
fib(33) = 3524578
fib(34) = 5702887
fib(35) = 9227465
fib(36) = 14930352
fib(37) = 24157817
fib(38) = 39088169
fib(39) = 63245986
COST: 33.468914270401
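Two details of map() are easy to check in isolation: several iterables are consumed in parallel (like the built-in map), and the returned iterator can be consumed only once. A small sketch using the built-in pow:

```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=3) as executor:
    # The two lists are paired up: pow(2, 5), pow(3, 2), pow(10, 3).
    results = executor.map(pow, [2, 3, 10], [5, 2, 3])
    first_pass = list(results)
    second_pass = list(results)  # the iterator is already exhausted

print(first_pass)   # [32, 9, 1000]
print(second_pass)  # []
```

This is why the commented-out print in the examples above has to stay disabled: converting the iterator to a list first would leave the for loop with nothing to iterate over.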
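4. as_completed(fs, timeout=None)
  • (1) submit() returns a Future, which provides methods for tracking the task: future.running() reports whether it is currently executing, future.done() whether it has finished, and so on.
  • (2) as_completed() takes an iterable of futures and a timeout. With the default timeout=None it blocks until tasks complete and returns an iterator (implemented with yield) over the futures as they finish. With timeout > 0 it waits at most that many seconds in total; if some tasks are still unfinished when the time is up, iteration stops and TimeoutError is raised. The unfinished tasks themselves are not cancelled.
  • (3) as_completed() is a generator: it blocks while no task has completed, yields a future as soon as its task finishes so that the body of the for loop can run, then blocks again, until every task has ended. The output also shows that tasks which finish first are reported to the main thread first.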
from concurrent.futures import ThreadPoolExecutor, as_completed
def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)
with ThreadPoolExecutor(max_workers=2) as executor:
    numbers = [17, 25, 28]
    futures = [executor.submit(fib, n) for n in numbers]
    for future in futures:
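        # running() and done() are evaluated before result() blocks,
        # so a task that is still executing prints running: True.
        print(f'running: {future.running()}, done: {future.done()}, result: {future.result()}')
    print('#### divider ####')
    for future in as_completed(futures, timeout=2):
        print(f'running: {future.running()}, done: {future.done()}, result: {future.result()}')

>>>>> Output:
running: False, done: True, result: 1597
running: False, done: True, result: 75025
running: True, done: False, result: 317811
#### divider ####
running: False, done: True, result: 75025
running: False, done: True, result: 1597
running: False, done: True, result: 317811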
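The timeout behaviour is easier to see with tasks of known duration. The sketch below is an assumption-laden illustration: sleep_then_return is a hypothetical helper, and the delays are chosen so that two tasks finish inside the timeout and one does not.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeoutError


def sleep_then_return(seconds):
    time.sleep(seconds)
    return seconds


finished = []
timed_out = False
with ThreadPoolExecutor(max_workers=3) as executor:
    # A dict lets us map each completed Future back to its input.
    future_to_delay = {
        executor.submit(sleep_then_return, d): d for d in (0.2, 0.05, 1.0)
    }
    try:
        for future in as_completed(future_to_delay, timeout=0.5):
            finished.append(future_to_delay[future])
    except FuturesTimeoutError:
        # The slow task keeps running; the with block still waits for it.
        timed_out = True

print(finished)   # [0.05, 0.2], in completion order
print(timed_out)  # True
```

Note that the results arrive in completion order, not submission order, and that the timeout only ends the iteration: leaving the with block still waits for the slow task to finish.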
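5. wait(fs, timeout=None, return_when=ALL_COMPLETED)
  • (1) wait() returns a tuple of two sets: the first holds the futures that have completed (done), the second those that have not (not_done).
  • (2) wait() gives finer control over when to stop blocking: its return_when parameter accepts one of three constants, FIRST_COMPLETED, FIRST_EXCEPTION, or ALL_COMPLETED, and defaults to ALL_COMPLETED.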
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED
def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)
with ThreadPoolExecutor(max_workers=2) as executor:
    numbers = [17, 25, 28]
    futures = [executor.submit(fib, n) for n in numbers]
    done, unfinished = wait(futures, return_when=ALL_COMPLETED)
    for d in done:
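        print(f'running: {d.running()}, done: {d.done()}, result: {d.result()}')
    print('wait() blocks the main thread until the chosen condition is met')

>>>>> Output:
running: False, done: True, result: 317811
running: False, done: True, result: 75025
running: False, done: True, result: 1597
wait() blocks the main thread until the chosen condition is met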
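The example above only uses ALL_COMPLETED. For comparison, here is a minimal sketch of FIRST_COMPLETED, again with a hypothetical sleep_then_return helper and delays picked so that exactly one task finishes early:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def sleep_then_return(seconds):
    time.sleep(seconds)
    return seconds


with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(sleep_then_return, d) for d in (0.05, 0.6, 0.6)]
    # Returns as soon as any one future has finished.
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    first_results = sorted(f.result() for f in done)
    pending_count = len(not_done)

print(first_results)  # [0.05]
print(pending_count)  # 2
```

The futures left in not_done are still running; they can be waited on again, or simply left to finish when the with block exits.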
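6. On Python < 3.5 with a large workload, avoid the process pool ProcessPoolExecutor; the thread pool ThreadPoolExecutor is unaffected
  • (1) The restriction concerns only ProcessPoolExecutor, because the extra cost comes from inter-process communication; ThreadPoolExecutor is not affected.
  • (2) When the input is a very large iterable, concurrent.futures can take many times longer than multiprocessing.
  • (3) The reason is that multiprocessing.pool submits tasks in batches, which saves IPC (inter-process communication) overhead, whereas ProcessPoolExecutor submits one task at a time. Python 3.5 addressed this by adding a chunksize parameter to map(). It defaults to 1, so it must be passed explicitly to get the batching.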
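To get a feel for why chunksize matters, it helps to count how many batches cross the process boundary. The sketch below is not how ProcessPoolExecutor is implemented; chunked is a hypothetical helper that only mimics the idea of grouping the input before it is sent to the workers.

```python
from itertools import islice


def chunked(iterable, chunksize):
    """Yield successive lists of at most `chunksize` items (illustrative helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


numbers = range(1, 100000)  # 99999 items, as in the benchmark that follows

# One item per message versus 25000 items per message.
messages_unbatched = len(list(chunked(numbers, 1)))
messages_batched = len(list(chunked(numbers, 25000)))

print(messages_unbatched)  # 99999
print(messages_batched)    # 4
```

With chunksize=25000 and four workers, each worker receives roughly one large batch instead of about 25000 separate hand-offs, which is where the saved IPC overhead comes from.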
A. The map() method
import time
from multiprocessing.pool import Pool
from concurrent.futures import ProcessPoolExecutor
NUMBERS = range(1, 100000)
def f(x):
    r = 0
    for k in range(1, 52):
        r += x ** (1 / k**1.5)
    return r
# Method 1: multiprocessing.pool.Pool
def muti_process():
    start = time.time()
    lis = []
    # The with block releases the worker processes once map() has returned.
    with Pool(4) as pool:
        for num, result in zip(NUMBERS, pool.map(f, NUMBERS)):
            lis.append(result)
    print(len(lis))
    print('multiprocessing.pool.Pool_COST: {}\n'.format(time.time() - start))
# Method 2: ProcessPoolExecutor without chunksize
def exec_process():
    start = time.time()
    lis = []
    with ProcessPoolExecutor(max_workers=4) as executor:
        for num, result in zip(NUMBERS, executor.map(f, NUMBERS)):
            lis.append(result)
    print(len(lis))
    print('ProcessPoolExecutor without chunksize_COST: {}\n'.format(time.time() - start))
# Method 3: ProcessPoolExecutor with chunksize
def exec_process_chuk():
    start = time.time()
    lis = []
    with ProcessPoolExecutor(max_workers=4) as executor:
        for num, result in zip(NUMBERS, executor.map(f, NUMBERS, chunksize=25000)):
            lis.append(result)
    print(len(lis))
    print('ProcessPoolExecutor with chunksize_COST: {}\n'.format(time.time() - start))
    
if __name__ == '__main__':
    muti_process()
    exec_process()
    exec_process_chuk()

>>>>> Output:
99999
multiprocessing.pool.Pool_COST: 0.5101141929626465

99999
ProcessPoolExecutor without chunksize_COST: 67.97954988479614

99999
ProcessPoolExecutor with chunksize_COST: 0.45710229873657227
B. The submit() and apply_async() methods
import time
from multiprocessing.pool import Pool
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

NUMBERS = range(1, 100000)
def f(x):
    r = 0
    for k in range(1, 52):
        r += x ** (1 / k**1.5)
    return r
# Method 1: multiprocessing.pool.Pool
def muti_process():
    start = time.time()
    pool = Pool(4)
    all_task = []
    for i in range(1, 100000):
        task = pool.apply_async(f, (i,))
        all_task.append(task)
    pool.close()
    pool.join()
    print('multiprocessing.pool.Pool_COST: {}\n'.format(time.time() - start))
# Method 2: ProcessPoolExecutor
def exec_process():
    start = time.time()
    futures = []
    with ProcessPoolExecutor(max_workers=4) as executor:
        for i in range(1, 100000):
            task = executor.submit(f, i)
            futures.append(task)
    print('ProcessPoolExecutor: {}\n'.format(time.time() - start))
# Method 3: ThreadPoolExecutor (note: this one calls map(), not submit())
def exec_thread():
    start = time.time()
    futures = []
    with ThreadPoolExecutor(max_workers=4) as executor:
        for num, result in zip(NUMBERS, executor.map(f, NUMBERS)):
            futures.append(result)
    print('ThreadPoolExecutor_COST: {}\n'.format(time.time() - start))
if __name__ == '__main__':
    muti_process()
    exec_process()
    exec_thread()
   

>>>>> Output:
multiprocessing.pool.Pool_COST: 4.382080316543579

ProcessPoolExecutor: 63.726953983306885

ThreadPoolExecutor_COST: 4.074100494384766
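submit() has no chunksize parameter, so one possible workaround is to batch by hand: submit one task per chunk rather than one per item. The following is a sketch under that assumption, not a measured fix; f_chunk and run_batched are hypothetical helpers, and f is the same function as in the benchmark.

```python
from concurrent.futures import ProcessPoolExecutor


def f(x):
    r = 0
    for k in range(1, 52):
        r += x ** (1 / k**1.5)
    return r


def f_chunk(chunk):
    # Runs inside a worker: one call handles a whole batch of inputs.
    return [f(x) for x in chunk]


def run_batched(executor, numbers, chunk_size):
    """Submit one task per chunk and return the results in input order."""
    numbers = list(numbers)
    futures = [
        executor.submit(f_chunk, numbers[i:i + chunk_size])
        for i in range(0, len(numbers), chunk_size)
    ]
    results = []
    for future in futures:  # iterate in submission order to keep ordering
        results.extend(future.result())
    return results


if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = run_batched(executor, range(1, 100000), 25000)
    print(len(results))
```

Because run_batched accepts any executor, the same helper can be tried with a ThreadPoolExecutor as well; how much time it saves in practice would need to be measured on your own machine.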