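Concepts
Coroutines, sometimes called micro-threads, work well for I/O-bound tasks. I won't go into the individual concepts (event loop, Task, Future and so on) here; look them up if you need to. Let's get straight to the practical part.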
async/await
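Earlier Python versions wrote coroutines with generators (yield / yield from) and the @asyncio.coroutine decorator (removed in Python 3.11). Python 3.5 introduced the async/await keywords, which are now the officially recommended asynchronous syntax, so this article covers only them.
Writing async methods
Unlike a regular function, an async function is defined with the async keyword: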
import asyncio

async def asd():
    await asyncio.sleep(2)
    return 123
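Here, async marks the function as asynchronous: calling it returns a coroutine object instead of running its body. await must be followed by an awaitable, that is, another coroutine, a Task, or a Future. Asynchronous I/O such as a non-blocking network request is exposed through these awaitables, and awaiting one suspends the current coroutine so the event loop can run other coroutines in the meantime. A coroutine that is never awaited (or scheduled as a Task) never runs at all.
- Note: calling blocking synchronous code (for example time.sleep or requests) inside an async function blocks the whole event loop, so every other coroutine stalls until it returns and the program effectively runs synchronously.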
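To make the point about await concrete, here is a minimal sketch (the 0.1-second sleep is an arbitrary stand-in for the 2 seconds above): calling asd() only creates a coroutine object, and the body runs only once something drives it, here asyncio.run.

```python
import asyncio
import inspect

async def asd():
    await asyncio.sleep(0.1)  # shortened delay for the demo
    return 123

coro = asd()                        # nothing has run yet: this is just a coroutine object
is_coro = inspect.iscoroutine(coro)
result = asyncio.run(coro)          # the event loop actually executes the body
print(is_coro, result)              # True 123
```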
Calling async methods
OK, the method is written. How do we call it? Most answers you'll find online look like this:
import asyncio

loop = asyncio.get_event_loop()
result = loop.run_until_complete(asd())
or
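l = asyncio.get_event_loop()
# create the task through the loop: asyncio.create_task() raises
# "RuntimeError: no running event loop" when no loop is running yet
task = l.create_task(asd())
l.run_until_complete(task)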
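This works (although calling asyncio.get_event_loop() with no running loop is deprecated in recent Python versions), but it is verbose: it is the pre-3.7 style. Since Python 3.7 you only need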
asyncio.run(asd())
asyncio.run creates a new event loop, runs the coroutine to completion, and then closes the loop. Now let's test it.
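For comparison, here is a simplified approximation of what asyncio.run does, written with an explicit loop. The helper name run_manually is made up for this sketch, and the real asyncio.run additionally cancels leftover tasks and shuts down async generators and the default executor before closing the loop. Creating the loop with asyncio.new_event_loop() also avoids relying on asyncio.get_event_loop().

```python
import asyncio

async def asd():
    await asyncio.sleep(0.1)
    return 123

def run_manually(coro):
    # hypothetical helper: a stripped-down version of asyncio.run
    loop = asyncio.new_event_loop()   # fresh loop, not the "current" one
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()                  # always release the loop

manual = run_manually(asd())
builtin = asyncio.run(asd())
print(manual, builtin)  # 123 123
```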
Test results
First, the time taken by synchronous execution:
import time

def asd():
    time.sleep(2)
    print(123)

def asd2():
    time.sleep(2)
    print(456)

def main():
    asd()
    asd2()

start = time.time()
main()
print('Elapsed time:', time.time() - start)
C:\Users\main.py
123
456
Elapsed time: 4.010389566421509
Process finished with exit code 0
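The program takes 4 seconds, which is no surprise: each function sleeps 2 seconds, so the two together take 4. Now let's look at the asynchronous version: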
import asyncio
import time

async def asd():
    await asyncio.sleep(2)
    # l = asyncio.get_running_loop()
    # future = l.run_in_executor(None, print_, 3, 4)
    # await future
    return 123

async def asd3():
    await asyncio.sleep(2)
    # l = asyncio.get_running_loop()
    # future = l.run_in_executor(None, print_, 3, 5)
    # await future
    return 456

async def main():
    # wrap both coroutines in Tasks so they are scheduled concurrently
    task_list = [asyncio.create_task(asd()), asyncio.create_task(asd3())]
    done, pending = await asyncio.wait(task_list)
    # for i in done:
    #     print(i.result())
    # print(done)

start = time.time()
asyncio.run(main())
print(time.time() - start)
C:\Users\main.py
2.0098962783813477
Process finished with exit code 0
It takes only 2 seconds: the two waits overlap. This is exactly where asynchronous coroutines pay off.
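The test above collects results with asyncio.wait, whose done value is an unordered set of Tasks. If you only need the return values, in order, asyncio.gather is a common alternative. A minimal sketch with a shortened delay (DELAY = 0.3 s instead of 2 s, an arbitrary choice), timed with time.perf_counter:

```python
import asyncio
import time

DELAY = 0.3  # stand-in for the 2-second sleeps above

async def asd():
    await asyncio.sleep(DELAY)
    return 123

async def asd3():
    await asyncio.sleep(DELAY)
    return 456

async def main():
    # gather runs both coroutines concurrently and returns results in argument order
    return await asyncio.gather(asd(), asd3())

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
# results is [123, 456]; elapsed is close to DELAY rather than 2 * DELAY
print(results, round(elapsed, 2))
```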
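Running synchronous functions asynchronously
As said above, calling synchronous (blocking) code inside an async function makes everything run synchronously. In production, though, we often have to use modules that don't support async. Do we have to give up the performance benefits of async? Of course not. The code is the part I commented out above, namely: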
import asyncio
import time

def print_(i, y):
    time.sleep(i)
    print('print_', i, y)

async def asd():
    # await is only valid inside an async function
    l = asyncio.get_running_loop()
    future = l.run_in_executor(None, print_, 3, 4)
    await future
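Here print_ has no real logic: it just sleeps for the given time and then prints, so it is plainly synchronous. Inside the coroutine, asyncio.get_running_loop() returns the event loop that is already running (it does not create a new one), and its run_in_executor method runs print_ in an executor and returns an asyncio Future that we can await. The first argument is the executor; passing None uses the loop's default executor, which is a ThreadPoolExecutor, so print_ does run in a separate thread. After that come the function and its positional arguments. Let's test it.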
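A small sketch to check two details of run_in_executor: with None the function really runs in a worker thread rather than on the event-loop thread, and only positional arguments are forwarded, so keyword arguments have to be bound with functools.partial. It also shows asyncio.to_thread (Python 3.9+), which uses the default executor and accepts keyword arguments directly. The function blocking and its sep parameter are made up for the demo.

```python
import asyncio
import functools
import threading
import time

def blocking(seconds, label, sep=':'):
    # hypothetical synchronous function: blocks its thread, then reports which thread it ran in
    time.sleep(seconds)
    return f'{label}{sep}{threading.current_thread().name}'

async def main():
    loop = asyncio.get_running_loop()
    # None -> the loop's default ThreadPoolExecutor
    r1 = await loop.run_in_executor(None, blocking, 0.1, 'a')
    # keyword arguments are not forwarded, so bind them with functools.partial
    r2 = await loop.run_in_executor(None, functools.partial(blocking, 0.1, 'b', sep='='))
    # Python 3.9+: asyncio.to_thread accepts keyword arguments directly
    r3 = await asyncio.to_thread(blocking, 0.1, 'c', sep='-')
    return r1, r2, r3, threading.current_thread().name

r1, r2, r3, loop_thread = asyncio.run(main())
print(r1, r2, r3, loop_thread)
```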
First, the program that calls print_ synchronously:
import asyncio
import time

def print_(i, y):
    time.sleep(i)
    print('print_', i, y)

async def asd():
    await asyncio.sleep(2)
    # l = asyncio.get_running_loop()
    # future = l.run_in_executor(None, print_, 3, 4)
    # await future
    print_(3, 4)  # blocking call: stalls the whole event loop for 3 seconds
    return 123

async def asd3():
    await asyncio.sleep(2)
    # l = asyncio.get_running_loop()
    # future = l.run_in_executor(None, print_, 3, 5)
    # await future
    print_(3, 4)
    return 456

async def main():
    task_list = [asyncio.create_task(asd()), asyncio.create_task(asd3())]
    done, pending = await asyncio.wait(task_list)
    # for i in done:
    #     print(i.result())
    # print(done)

start = time.time()
asyncio.run(main())
print(time.time() - start)
C:\Users\main.py
print_ 3 4
print_ 3 4
8.032764434814453
Process finished with exit code 0
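The program takes 8 seconds, which is easy to explain: the asyncio.sleep(2) calls in asd and asd3 overlap, but the synchronous print_ is called twice and each call blocks the event loop for 3 seconds, one after the other: 2 + 3 + 3 = 8. Now let's look at the time taken when print_ is run through the executor: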
import asyncio
import time

def print_(i, y):
    time.sleep(i)
    print('print_', i, y)

async def asd():
    await asyncio.sleep(2)
    # run the blocking print_ in the default thread pool and await the result
    l = asyncio.get_running_loop()
    future = l.run_in_executor(None, print_, 3, 4)
    await future
    return 123

async def asd3():
    await asyncio.sleep(2)
    l = asyncio.get_running_loop()
    future = l.run_in_executor(None, print_, 3, 5)
    await future
    return 456

async def main():
    task_list = [asyncio.create_task(asd()), asyncio.create_task(asd3())]
    done, pending = await asyncio.wait(task_list)
    # for i in done:
    #     print(i.result())
    # print(done)

start = time.time()
asyncio.run(main())
print(time.time() - start)
C:\Users\main.py
print_print_ 3 5
3 4
5.022708892822266
Process finished with exit code 0
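It takes 5 seconds: now the sleep inside print_ overlaps as well, 2 + 3 = 5. (The jumbled output "print_print_ 3 5" / "3 4" comes from the two worker threads printing at the same moment.)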
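The 8 s versus 5 s arithmetic can be reproduced in a scaled-down sketch. Here the async wait is 0.2 s and the blocking call 0.3 s (arbitrary stand-ins for 2 s and 3 s), so the blocking version should take about 0.2 + 0.3 + 0.3 = 0.8 s and the executor version about 0.2 + 0.3 = 0.5 s. The helper names are made up for this sketch, and asyncio.gather replaces asyncio.wait for brevity.

```python
import asyncio
import time

WAIT, BLOCK = 0.2, 0.3  # scaled-down stand-ins for 2 s and 3 s

def print_(i, y):
    time.sleep(i)
    return ('print_', i, y)

async def worker_blocking(y):
    await asyncio.sleep(WAIT)
    return print_(BLOCK, y)  # blocks the event loop

async def worker_executor(y):
    await asyncio.sleep(WAIT)
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, print_, BLOCK, y)  # runs in a worker thread

async def run_pair(worker):
    return await asyncio.gather(worker(4), worker(5))

def timed(worker):
    start = time.perf_counter()
    asyncio.run(run_pair(worker))
    return time.perf_counter() - start

blocking_time = timed(worker_blocking)  # about WAIT + 2 * BLOCK
executor_time = timed(worker_executor)  # about WAIT + BLOCK
print(round(blocking_time, 2), round(executor_time, 2))
```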
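Supplement 1: running synchronous functions through a thread pool or process pool
As mentioned above, the first argument of run_in_executor is the executor. Where does it come from? It is a thread pool or process pool from the concurrent.futures module (ThreadPoolExecutor or ProcessPoolExecutor). The code is as follows: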
import asyncio
import concurrent.futures
import time

def print_(i, y):
    time.sleep(i)
    print('print_', i, y)

async def asd():
    await asyncio.sleep(2)
    return 123

async def asd3():
    await asyncio.sleep(2)
    return 456

async def main():
    task_list = [asyncio.create_task(asd()), asyncio.create_task(asd3())]
    done, pending = await asyncio.wait(task_list)
    loop = asyncio.get_running_loop()
    # run the blocking function in an explicitly created thread pool
    with concurrent.futures.ThreadPoolExecutor() as pool:
        result = await loop.run_in_executor(pool, print_, 3, 4)
    # with concurrent.futures.ProcessPoolExecutor() as pool:
    #     result = await loop.run_in_executor(pool, print_, 3, 4)
    # for i in done:
    #     print(i.result())
    # print(done)

# the __main__ guard is required for ProcessPoolExecutor on Windows,
# where worker processes re-import this module
if __name__ == '__main__':
    start = time.time()
    asyncio.run(main())
    print(time.time() - start)
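A variation on the code above, as a hedged sketch: several blocking calls submitted to one explicitly sized ThreadPoolExecutor and collected with asyncio.gather. With max_workers=2 (an arbitrary choice) and four 0.2-second jobs, they run two at a time, so the total is about 0.4 s. ProcessPoolExecutor can be swapped in for CPU-bound work, as long as the function is defined at module top level (so it can be pickled) and the entry point sits under if __name__ == '__main__':.

```python
import asyncio
import concurrent.futures
import time

def print_(i, y):
    # synchronous stand-in: blocks its worker thread for i seconds
    time.sleep(i)
    return f'print_ {i} {y}'

async def main():
    loop = asyncio.get_running_loop()
    # at most 2 blocking calls run at once; the rest wait for a free worker
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [loop.run_in_executor(pool, print_, 0.2, n) for n in range(4)]
        return await asyncio.gather(*futures)

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, round(elapsed, 2))
```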
Supplement 2: aiohttp, an async HTTP library for crawlers
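The snippet below is adapted into a standalone function. In the original, the proxy and the User-Agent list came from helpers on a class (self.proxy(), self.user_agent_list) that are not shown, so they are passed in or replaced by placeholders here.
import asyncio
import logging
import random
import re

import aiohttp

logger = logging.getLogger(__name__)

# placeholder list; the original read it from self.user_agent_list
user_agent_list = ['Mozilla/5.0 (Windows NT 10.0; Win64; x64)']

async def company_detail(url, proxy=None):
    # proxy: one proxy URL, e.g. f'http://{proxies_user}:{proxies_pass}@{ip}:{port}', or None
    header = {'User-Agent': random.choice(user_agent_list),
              'Connection': 'close',
              }
    timeout = aiohttp.ClientTimeout(total=20)
    async with aiohttp.ClientSession(headers=header, timeout=timeout) as session:
        # aiohttp takes a single proxy URL via proxy= (there is no proxies= argument)
        async with session.get(url, proxy=proxy) as response:
            if response.status != 200:
                logger.error('Unexpected status code, aborting')
                raise ConnectionError(f'unexpected status {response.status}')
            response_data = await response.text()
    # first regex match or None (assumed to be what the original list_get helper did)
    matches = re.findall('props = ({.*})', response_data)
    data = matches[0] if matches else None
    return data

async def main():
    url = 'https://example.com/'  # placeholder: replace with the target page
    done = await company_detail(url)

asyncio.run(main())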
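When crawling many pages, a common pattern is to start all requests with asyncio.gather and cap how many run at once with asyncio.Semaphore. In this sketch the network call is simulated with asyncio.sleep so it runs offline; fake_fetch, crawl and the limit of 3 are assumptions. In real code fake_fetch would be a request such as company_detail(url), ideally sharing one ClientSession across requests rather than opening a new session per call.

```python
import asyncio
import time

MAX_CONCURRENCY = 3  # assumed limit; tune to what the target site tolerates
active = 0           # fetches currently in flight
peak = 0             # highest number of fetches seen in flight at once

async def fake_fetch(url):
    # stand-in for a real request such as company_detail(url)
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.1)
    active -= 1
    return f'<html>{url}</html>'

async def crawl(urls, limit=MAX_CONCURRENCY):
    sem = asyncio.Semaphore(limit)

    async def bounded(url):
        async with sem:  # at most `limit` fetches run at the same time
            return await fake_fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))

start = time.perf_counter()
pages = asyncio.run(crawl([f'https://example.com/{n}' for n in range(6)]))
elapsed = time.perf_counter() - start
print(len(pages), peak, round(elapsed, 2))
```

With six URLs and a limit of 3, the fetches run in two waves, so the whole crawl takes roughly two fetch durations instead of six.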
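Differences from the usual requests library
- Getting the HTML text: requests: response.text (an attribute); aiohttp: await response.text() (a coroutine method)
- Getting binary content: requests: response.content (an attribute); aiohttp: await response.read()
- Setting a proxy: requests: proxies={'http': …, 'https': …}; aiohttp: proxy='http://ip:port'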