The benefit of a callback function is decoupling.
Without a callback function:
from concurrent.futures import ProcessPoolExecutor
import os
import time

import requests


def get(url):  # download the page and parse it in the same function
    print("pid:%s is getting %s" % (os.getpid(), url))
    time.sleep(2)
    response = requests.get(url)
    if response.status_code == 200:
        res = response.text
        print("pid:%s is parse,parse result:%s" % (os.getpid(), len(res)))


if __name__ == "__main__":
    urls = [
        "https://www.taobao.com/",
        "https://www.jd.com/",
        "https://www.python.org/",
        "https://www.baidu.com/",
        "https://www.openstack.org/",
    ]
    p = ProcessPoolExecutor(4)  # max_workers defaults to the number of CPUs
    for url in urls:
        p.submit(get, url)  # a child process downloads the page
    print("main-------")
Result:
main-------
pid:552 is getting https://www.taobao.com/
pid:13688 is getting https://www.jd.com/
pid:12356 is getting https://www.python.org/
pid:6320 is getting https://www.baidu.com/
pid:552 is parse,parse result:149483
pid:552 is getting https://www.openstack.org/
pid:13688 is parse,parse result:109103
pid:6320 is parse,parse result:2443
pid:552 is parse,parse result:65078
pid:12356 is parse,parse result:48821
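Each child process downloads a page and parses it itself. This runs quickly, but get also contains the parsing logic, so downloading and parsing are coupled in one function, which makes later changes harder. The two steps need to be decoupled.
Callback functions with multiple processes: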
from concurrent.futures import ProcessPoolExecutor
import os
import time

import requests


def get(url):  # download the page
    print("pid:%s is getting %s" % (os.getpid(), url))
    time.sleep(2)
    response = requests.get(url)
    if response.status_code == 200:
        return response.text
    return None  # a non-200 response yields nothing to parse


def parse(obj):  # parse the page
    res = obj.result()  # obj is the Future returned by p.submit()
    if res is not None:
        print("pid:%s is parse,parse result:%s" % (os.getpid(), len(res)))


if __name__ == "__main__":
    urls = [
        "https://www.taobao.com/",
        "https://www.jd.com/",
        "https://www.python.org/",
        "https://www.baidu.com/",
        "https://www.openstack.org/",
    ]
    p = ProcessPoolExecutor(8)
    for url in urls:
        obj = p.submit(get, url)  # a child process downloads the page
        obj.add_done_callback(parse)  # the main process parses; parse is the callback
    print("main-------")
Result:
main-------
pid:2792 is getting https://www.taobao.com/
pid:10480 is getting https://www.jd.com/
pid:1220 is getting https://www.python.org/
pid:3232 is getting https://www.baidu.com/
pid:4432 is getting https://www.openstack.org/
pid:12120 is parse,parse result:149483
pid:12120 is parse,parse result:109091
pid:12120 is parse,parse result:2443
pid:12120 is parse,parse result:48821
pid:12120 is parse,parse result:65078
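The output shows that 12120 is the main process, which parses the pages, while the child processes download them. As soon as a child process finishes a download, the main process parses the result, so the calls are asynchronous.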
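The claim that the callback runs in the main process can be checked without network access. The following is a minimal sketch, not part of the original example: it submits `os.getpid` itself as the task and records which process runs the callback. The names `run_demo` and `record` are made up for this illustration, and the pool size of 2 is arbitrary.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def run_demo(tasks=3):
    """Return the parent pid and a list of (worker pid, callback pid) pairs."""
    parent_pid = os.getpid()
    seen = []

    def record(future):
        # future.result() is the pid of the worker that ran os.getpid;
        # os.getpid() here is the pid of the process running this callback.
        seen.append((future.result(), os.getpid()))

    # Leaving the with block shuts the pool down and waits for pending work.
    with ProcessPoolExecutor(2) as pool:
        for _ in range(tasks):
            pool.submit(os.getpid).add_done_callback(record)
    return parent_pid, seen


if __name__ == "__main__":
    parent_pid, seen = run_demo()
    for worker_pid, callback_pid in seen:
        print("worker:%s callback:%s parent:%s" % (worker_pid, callback_pid, parent_pid))
```

If the behavior described above holds, every callback pid should equal the parent pid, and every worker pid should differ from it.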
Callbacks behave slightly differently with threads than with processes.
from concurrent.futures import ThreadPoolExecutor
from threading import current_thread
import time

import requests


def get(url):  # download the page
    print("pid:%s is getting %s" % (current_thread().name, url))
    time.sleep(2)
    response = requests.get(url)
    if response.status_code == 200:
        return response.text
    return None  # a non-200 response yields nothing to parse


def parse(obj):  # parse the page
    res = obj.result()  # obj is the Future returned by t.submit()
    if res is not None:
        print("pid:%s is parse,parse result:%s" % (current_thread().name, len(res)))


if __name__ == "__main__":
    urls = [
        "https://www.taobao.com/",
        "https://www.jd.com/",
        "https://www.python.org/",
        "https://www.baidu.com/",
        "https://www.openstack.org/",
    ]
    # Default max_workers was CPU count * 5 in Python 3.5 to 3.7;
    # since Python 3.8 it is min(32, os.cpu_count() + 4).
    t = ThreadPoolExecutor(8)
    for url in urls:
        obj = t.submit(get, url)  # a worker thread downloads the page
        obj.add_done_callback(parse)  # a worker thread parses; parse is the callback
    print("main-------")
Result:
pid:ThreadPoolExecutor-0_0 is getting https://www.taobao.com/
pid:ThreadPoolExecutor-0_1 is getting https://www.jd.com/
pid:ThreadPoolExecutor-0_2 is getting https://www.python.org/
pid:ThreadPoolExecutor-0_3 is getting https://www.baidu.com/
main-------
pid:ThreadPoolExecutor-0_4 is getting https://www.openstack.org/
pid:ThreadPoolExecutor-0_3 is parse,parse result:2443
pid:ThreadPoolExecutor-0_0 is parse,parse result:149483
pid:ThreadPoolExecutor-0_1 is parse,parse result:109092
pid:ThreadPoolExecutor-0_2 is parse,parse result:48821
pid:ThreadPoolExecutor-0_4 is parse,parse result:65078
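The output shows that worker threads both download the pages and parse them. The API is used the same way for processes and threads; the difference is where the callback runs: in the main process with a process pool, and in a worker thread with a thread pool.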
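One detail worth knowing here: according to the `concurrent.futures` documentation, a callback added to a future that has already completed is called immediately, which means it runs in the thread that attached it rather than in a worker. The sketch below illustrates both cases offline. The names `callback_threads`, `wait_for_gate`, and `remember`, as well as the `worker` thread name prefix, are invented for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def callback_threads():
    """Return the names of the threads in which two callbacks ran."""
    ran_in = {}
    gate = threading.Event()

    def wait_for_gate():
        gate.wait()  # keep the task running until its callback is attached
        return "late"

    def remember(key):
        def callback(future):
            ran_in[key] = threading.current_thread().name
        return callback

    with ThreadPoolExecutor(max_workers=1, thread_name_prefix="worker") as pool:
        # Case 1: the callback is attached while the task is still running.
        running = pool.submit(wait_for_gate)
        running.add_done_callback(remember("attached_before_done"))
        gate.set()

        # Case 2: the callback is attached after the task has finished.
        finished = pool.submit(str, "early")
        finished.result()  # block until the task is done
        finished.add_done_callback(remember("attached_after_done"))
    # The with block has joined the worker, so both callbacks have run.
    return ran_in


if __name__ == "__main__":
    for key, name in callback_threads().items():
        print(key, "->", name)
```

In the download example this rarely matters, because each task sleeps for two seconds before it can finish, so the callbacks are attached while the tasks are still running. For very short tasks, parse could end up running in the main thread.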