Using Thread Pools and Process Pools from the concurrent.futures Module

Thread Pools and Process Pools

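Take thread pools as the example. When a program runs each task in its own thread, threads are constantly being created and destroyed. That creation and destruction has a cost, and in the extreme it can exhaust system resources. A thread pool is a good way to avoid this.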

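The word "pool" implies that more than one thread is maintained. A thread pool keeps up to a fixed number of worker threads (ThreadPoolExecutor starts them on demand, up to max_workers). Tasks that need to run concurrently are submitted to the pool, and the pool hands each one to an idle thread. When a task finishes, its thread is not destroyed; it stays in the pool and waits for the next task. When every thread in the pool is busy, newly submitted tasks wait in a queue, which means the pool caps the number of threads running at once.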
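
The reuse of threads described above can be seen in a minimal sketch. The task record_worker and its short sleep are hypothetical stand-ins, not part of the original example; the point is only that six tasks end up running on at most two threads.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def record_worker(task_id):
    """Hypothetical task: sleep briefly, then report which thread ran it."""
    time.sleep(0.05)
    return task_id, threading.current_thread().name


# Six tasks are submitted, but the pool never has more than two threads.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(record_worker, i) for i in range(6)]
    results = [f.result() for f in futures]

worker_names = {name for _, name in results}
print(len(results), "tasks ran on", len(worker_names), "thread(s)")
```

Tasks beyond the first two simply wait in the pool's queue until one of the existing threads becomes free.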

The same reasoning applies to process pools.

The concurrent.futures module provides ThreadPoolExecutor and ProcessPoolExecutor for working with thread pools and process pools.

ThreadPoolExecutor

Here is a simple example:

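from concurrent.futures import ThreadPoolExecutor
import time

import requests

url_list = ['https://www.cnblogs.com/', 'https://www.csdn.net/', 'https://github.com/']

def get_url(url):
    # Download the page; the decoded body is not used further in this demo.
    content = requests.get(url).content.decode()
    print(url+'已获取')  # '已获取' means "fetched"

pool = ThreadPoolExecutor(max_workers=3)

start = time.time()
for url in url_list:
    # submit() returns immediately; the task runs in a pool thread.
    future = pool.submit(get_url,url)
    # print(future)
end = time.time()
# This measures only the time spent submitting, not the downloads.
print(end-start)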

The output is:

0.0016434192657470703

https://www.cnblogs.com/已获取

https://www.csdn.net/已获取

https://github.com/已获取

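In the example, max_workers sets the maximum number of threads in the pool. pool.submit submits a task to the pool for execution: get_url is the function to call and url is its argument.

The order of the output also shows that submitting work to the pool does not block the main thread: the elapsed time is printed before any URL has been fetched.

The print(future) line was commented out. Uncommenting it and running again gives: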

<Future at 0x7ff6cfaa8860 state=running>

<Future at 0x7ff6ce965860 state=running>

<Future at 0x7ff6ce96e278 state=running>

0.006175518035888672

https://www.cnblogs.com/已获取

https://www.csdn.net/已获取

https://github.com/已获取

Each call to submit returns a Future object, which can be used to check on the task. state=running means the task is currently running.
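
The states of a Future can also be queried from code rather than read off its printed form. The sketch below is an illustration under assumptions: the task hold_until_released and the two threading.Event objects are hypothetical, and are there only to keep the task running until the main thread has inspected it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()   # set by the task once it is running
release = threading.Event()   # set by the main thread to let the task finish


def hold_until_released():
    """Hypothetical task that stays running until the main thread releases it."""
    started.set()
    release.wait()
    return "finished"


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(hold_until_released)
    started.wait()                      # the task is now executing in the worker
    state_while_running = (future.running(), future.done())
    release.set()                       # allow the task to return
    value = future.result()             # blocks until the task has finished
    state_after = (future.running(), future.done())

print(state_while_running, value, state_after)
```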

Future objects have a number of methods:

future.done()

from concurrent.futures import ThreadPoolExecutor
import time

import requests

url_list = ['https://www.cnblogs.com/', 'https://www.csdn.net/', 'https://github.com/']

def get_url(url):
    content = requests.get(url).content.decode()
    print(url+'已获取')  # '已获取' means "fetched"

pool = ThreadPoolExecutor(max_workers=3)
future_list = []
start = time.time()
for url in url_list:
    future = pool.submit(get_url,url)
    # Checked right after submitting, so the task has not finished yet.
    print(future.done())
    future_list.append(future)
end = time.time()

print(end-start)
# Give the downloads time to complete before checking again.
time.sleep(5)
for future in future_list:
    print(future.done())

This version adds future_list, plus a sleep in the middle so that the effect is visible. The result is:

False

False

False

0.001546621322631836

https://www.cnblogs.com/已获取

https://www.csdn.net/已获取

https://github.com/已获取

True

True

True

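future.done() reports whether the task has finished: it returns False while the task is still pending or running, and True once it has completed or been cancelled.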

future.result()

from concurrent.futures import ThreadPoolExecutor
import time

import requests

url_list = ['https://www.cnblogs.com/', 'https://www.csdn.net/', 'https://github.com/']

def get_url(url):
    content = requests.get(url).content.decode()
    print(url+'已获取')  # '已获取' means "fetched"
    return url

pool = ThreadPoolExecutor(max_workers=3)
future_list = []
start = time.time()
for url in url_list:
    future = pool.submit(get_url,url)
    # result() blocks here until this task has finished.
    print(future.result())
    future_list.append(future)
end = time.time()

print(end-start)
# All tasks are already done, so these calls return at once.
for future in future_list:
    print(future.result())

The result is:

https://www.cnblogs.com/已获取

https://www.cnblogs.com/

https://www.csdn.net/已获取

https://www.csdn.net/

https://github.com/已获取

https://github.com/

2.0975613594055176

https://www.cnblogs.com/

https://www.csdn.net/

https://github.com/

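As the output shows, result() returns the task's return value, but it blocks until the task has finished, since there is no return value to hand back before then. Because result() is called inside the submit loop here, each task finishes before the next one is submitted, so the three requests run one after another (about 2.1 seconds in total).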
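
To keep the tasks concurrent while still collecting return values, one common pattern is to submit everything first and call result() afterwards. The sketch below assumes a hypothetical slow_echo task that sleeps instead of making a network request, so it runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_echo(value):
    """Hypothetical stand-in for a network request: wait, then return the input."""
    time.sleep(0.3)
    return value


items = ["a", "b", "c"]

start = time.time()
with ThreadPoolExecutor(max_workers=3) as pool:
    # Submit everything first so all three tasks start right away...
    futures = [pool.submit(slow_echo, item) for item in items]
    # ...and only then block on the results, in submission order.
    results = [future.result() for future in futures]
elapsed = time.time() - start

print(results, round(elapsed, 2))
```

With three workers the three sleeps overlap, so the total time should be close to that of a single task rather than the sum of all three.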

Future has other methods as well, including cancel(), cancelled(), running(), exception() and add_done_callback().
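
Two of the other Future methods, exception() and add_done_callback(), can be sketched as follows. The divide task and the on_done callback are hypothetical names chosen for illustration; they do not appear in the original examples.

```python
from concurrent.futures import ThreadPoolExecutor

finished = []  # filled in by the callback


def divide(a, b):
    """Hypothetical task that raises ZeroDivisionError when b is 0."""
    return a / b


def on_done(future):
    # Called with the future once it has finished (or been cancelled).
    finished.append(future)


with ThreadPoolExecutor(max_workers=2) as pool:
    ok = pool.submit(divide, 6, 3)
    bad = pool.submit(divide, 1, 0)
    ok.add_done_callback(on_done)
    bad.add_done_callback(on_done)

# Leaving the with block waits for both tasks, so nothing below blocks.
print(ok.exception())         # None: the task did not raise
print(ok.result())            # 2.0
print(type(bad.exception()))  # <class 'ZeroDivisionError'>
```

Calling bad.result() instead would re-raise the ZeroDivisionError in the calling thread, so exception() is a way to inspect a failure without a try/except.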

Using the map method

from concurrent.futures import ThreadPoolExecutor

import requests

url_list = ['https://www.cnblogs.com/', 'https://www.csdn.net/', 'https://github.com/']

def get_url(url):
    content = requests.get(url).content.decode()
    print(url+'已获取')  # '已获取' means "fetched"
    return url

pool = ThreadPoolExecutor(max_workers=3)

# Schedules get_url once per URL; the returned iterator of results is ignored here.
pool.map(get_url,url_list)

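Its usage is similar to the built-in map function: get_url is called once for each item in url_list, with the calls running in the pool's threads. pool.map returns an iterator that yields the return values in the order of the inputs.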
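
A minimal sketch of consuming the results of map, assuming a hypothetical square_after_delay task in place of the network request. It also uses the pool as a context manager, which shuts the pool down when the block ends.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def square_after_delay(n):
    """Hypothetical task: larger inputs sleep for less time."""
    time.sleep(0.05 * (4 - n))
    return n * n


numbers = [1, 2, 3]

# The with block shuts the pool down once the work is done.
with ThreadPoolExecutor(max_workers=3) as pool:
    # map returns an iterator; list() collects the results.
    squares = list(pool.map(square_after_delay, numbers))

print(squares)  # [1, 4, 9], input order regardless of which task finished first
```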

Using the wait function

from concurrent.futures import ThreadPoolExecutor, wait
import time

import requests

url_list = ['https://www.cnblogs.com/', 'https://www.csdn.net/', 'https://github.com/']

def get_url(url):
    content = requests.get(url).content.decode()
    print(url+'已获取')  # '已获取' means "fetched"
    return url

pool = ThreadPoolExecutor(max_workers=3)
future_list = []
start = time.time()
for url in url_list:
    future = pool.submit(get_url,url)
    future_list.append(future)

# Blocks until every future in the list has finished (the default behaviour).
print(wait(future_list))
end = time.time()
print(end-start)

https://www.cnblogs.com/已获取

https://www.csdn.net/已获取

https://github.com/已获取

DoneAndNotDoneFutures(done={<Future at 0x7f7506447da0 state=finished returned str>, <Future at 0x7f75074c9828 state=finished returned str>, <Future at 0x7f75064477f0 state=finished returned str>}, not_done=set())

6.678021430969238

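wait returns a named tuple of two sets: the futures that have completed (done) and those that have not (not_done). Its return_when parameter accepts three options: FIRST_COMPLETED, FIRST_EXCEPTION and ALL_COMPLETED. The default is ALL_COMPLETED, which waits until every future has finished. FIRST_COMPLETED returns as soon as any one future finishes. FIRST_EXCEPTION returns as soon as any future raises an exception; if none does, it behaves like ALL_COMPLETED.

For example: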

from concurrent.futures import ThreadPoolExecutor, wait, FIRST_EXCEPTION
import time

import requests

url_list = ['https://www.cnblogs.com/', 'https://www.csdn.net/', 'https://github.com/']

def get_url(url):
    content = requests.get(url).content.decode()
    print(url+'已获取')  # '已获取' means "fetched"
    return url

def error(url):
    # gg is deliberately undefined, so this task raises NameError.
    gg

pool = ThreadPoolExecutor(max_workers=4)
future_list = []
start = time.time()
future_list.append(pool.submit(error,'https://www.cnblogs.com/'))
for url in url_list:
    future = pool.submit(get_url,url)
    future_list.append(future)

# Return as soon as any task raises, without waiting for the rest.
print(wait(future_list,return_when=FIRST_EXCEPTION))
end = time.time()
print(end-start)

DoneAndNotDoneFutures(done={<Future at 0x7fd1a5b95320 state=finished raised NameError>}, not_done={<Future at 0x7fd1a4b11a90 state=running>, <Future at 0x7fd1a4b11a20 state=running>, <Future at 0x7fd1a4c897f0 state=running>})

0.001996755599975586

https://www.cnblogs.com/已获取

https://www.csdn.net/已获取

https://github.com/已获取
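
The FIRST_COMPLETED option can be sketched in the same way. The task sleep_then_return and the two durations are assumptions chosen so that one task clearly finishes before the other.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def sleep_then_return(seconds):
    """Hypothetical task: sleep for the given time, then return it."""
    time.sleep(seconds)
    return seconds


with ThreadPoolExecutor(max_workers=2) as pool:
    fast = pool.submit(sleep_then_return, 0.05)
    slow = pool.submit(sleep_then_return, 0.5)
    # Returns once the first task finishes; the other is still running.
    done, not_done = wait([fast, slow], return_when=FIRST_COMPLETED)
    fast_first = fast in done and slow in not_done

print(fast_first)
```

The futures left in not_done keep running; leaving the with block waits for them before the program continues.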

ProcessPoolExecutor

A process pool is used in essentially the same way as a thread pool, so the patterns above carry over.
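
A minimal sketch of the process-pool equivalent follows. The helper factorials is a hypothetical name, and math.factorial stands in for a CPU-bound task. Two differences from threads are worth keeping in mind: the function and its arguments are pickled to be sent to the worker processes, and the code that creates the pool should sit behind an `if __name__ == "__main__":` guard.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def factorials(numbers, max_workers=2):
    """Compute factorials in worker processes (hypothetical helper)."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # math.factorial is importable in the workers, so it can be pickled.
        return list(pool.map(math.factorial, numbers))


# The guard keeps worker processes from re-running this block when the
# start method re-imports the main module (for example with spawn).
if __name__ == "__main__":
    print(factorials([5, 10]))  # [120, 3628800]
```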
