Python Asynchronous Coroutines: asyncio, async/await, aiohttp

Author: 擒贼先擒王

Last modified: 2024-11-25 15:27:53


First published: 2019-03-02 22:16:14

Article link: https://blog.csdn.net/freeking101/article/details/85286199


Python: handling concurrency with Future and asyncio: https://blog.csdn.net/sinat_38682860/article/details/105419842
Zhihu: From 0 to 1, the evolution of asynchronous programming in Python (demonstrated with a web crawler): https://zhuanlan.zhihu.com/p/25228075

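1. Coroutines

The message model

The message model has long been used in desktop applications. The main thread of a GUI program does nothing but read messages and handle them. Every keyboard and mouse event is posted to the program's message queue and then processed by the main thread.

Because the GUI thread handles keyboard and mouse messages very quickly, the user notices no delay. Occasionally the GUI thread hits a problem while handling one message and takes too long; the user then sees the whole program stop responding to key presses and mouse clicks. This shows that in the message model each message must be handled very quickly; otherwise the main thread cannot get to the other messages in the queue in time, and the program appears to hang.

How does the message model solve the problem that synchronous IO has to wait for the IO operation?

When message handling reaches an IO operation, the code only issues the IO request and does not wait for the result. It ends the current round of message handling and moves on to the next one. When the IO operation completes, an "IO complete" message arrives, and the handler for that message can read the result directly.

Between "issue IO request" and "IO complete", the main thread under synchronous IO can only be suspended. Under asynchronous IO the main thread does not rest; it keeps handling other messages in the message loop. A single thread can therefore service many IO requests at once without any thread switching. For most IO-bound applications, asynchronous IO greatly improves multitasking capacity.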

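What is a coroutine

A coroutine, also called a micro-thread, is a lightweight thread that runs in user space.

A coroutine has its own register context and stack. When the scheduler switches away from a coroutine, its register context and stack are saved elsewhere; when it switches back, they are restored. A coroutine therefore keeps the state of its previous invocation and resumes from exactly that state.

Coroutines run inside a single thread. Compared with multithreading, there is no thread context-switch overhead and no cost for atomic operations, locking, or synchronization, and the programming model is simple. Coroutines can be used to implement asynchronous operations. In a web crawler, for example, a request takes some time to receive its response; during that wait the program can do plenty of other work and switch back once the response arrives, which makes good use of the CPU and other resources.

Function calls: in every language, function calls are hierarchical. For example, A calls B, B calls C while it runs, C returns, then B returns, and finally A returns. Subroutine (function) calls are therefore implemented with a stack, and a thread executes one subroutine. A subroutine call always has one entry and one return, and the call order is fixed.
Coroutine calls: in Python a coroutine is a function declared with async. During execution it can be interrupted inside the function, control moves to another coroutine function, and at a suitable moment it comes back and continues. The defining trait is that the switch happens within one thread: execution is interrupted partway through one coroutine function and continues in another, yet this is "not a function call".

Advantages of coroutines over multithreading

1. The biggest advantage is very high execution efficiency. Switching between functions is not a thread switch; it is controlled by the program itself, so there is no thread-switch overhead.
2. No multithreading lock mechanism is needed. With only one thread there are no conflicting simultaneous writes to a variable, so shared resources can be controlled without locks by simply checking state, which is much more efficient than multithreading.

Since coroutines run in one thread, how can multiple CPU cores be used?

The simplest approach is multiple processes plus coroutines, which uses all cores and keeps the efficiency of coroutines, giving very high performance.

Example: verifying that coroutines run in a single thread.

Printing the current thread shows that the two coroutines are executed concurrently by the same thread.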

import threading
import asyncio


async def hello():
    print(f'hello_1 ---> {threading.current_thread()}')
    await asyncio.sleep(5)
    print(f'hello_2 ---> {threading.current_thread()}')


async def main():
    # Run both coroutines concurrently on the same event loop
    # (asyncio.create_task(hello()) for each one would work as well).
    await asyncio.gather(hello(), hello())


asyncio.run(main())

"""
hello_1 ---> <_MainThread(MainThread, started 2332)>
hello_1 ---> <_MainThread(MainThread, started 2332)>
hello_2 ---> <_MainThread(MainThread, started 2332)>
hello_2 ---> <_MainThread(MainThread, started 2332)>
"""

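Python's support for coroutines

Python's coroutine support was originally built on generators.

With a generator we can iterate using a for loop, or call next() repeatedly to get the next value produced by a yield statement. But Python's yield can do more than return a value: it can also receive a value sent in by the caller.

Example: the traditional producer-consumer model uses one thread to write messages and another to read them, with locks controlling the queue and the waiting, and a small mistake can cause a deadlock. With coroutines, after the producer creates a message it jumps straight to the consumer through yield; when the consumer has finished, control switches back to the producer, which continues producing. This is very efficient: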

def func_consumer():
    ret = ''
    while True:
        n = yield ret
        if not n:
            return
        print('[CONSUMER] Consuming %s...' % n)
        ret = '200 OK'


def func_produce(c):
    c.send(None)
    n = 0
    while n < 5:
        n = n + 1
        print('[PRODUCER] Producing %s...' % n)
        r = c.send(n)
        print('[PRODUCER] Consumer return: %s' % r)
    c.close()


c = func_consumer()
func_produce(c)

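func_consumer is a generator. After a consumer is passed to func_produce:

First, c.send(None) starts the generator;
Then, whenever something has been produced, c.send(n) switches execution to the consumer;
The consumer receives the message through yield, processes it, and passes the result back through yield;
The producer receives the consumer's result and goes on to produce the next message;
When the producer decides to stop, it closes the consumer with c.close() and the whole process ends.

The whole flow is lock-free and runs in one thread. The producer and consumer cooperate to finish the job, hence the name "coroutine", in contrast to the preemptive multitasking of threads. To sum up coroutines with a line from Donald Knuth: a subroutine is a special case of a coroutine. Reference source: https://github.com/michaelliao/learn-python3/blob/master/samples/async/coroutine.py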

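Using coroutines (asynchronous functions)

A coroutine is created with the async / await keywords, or historically with the @asyncio.coroutine decorator. The @asyncio.coroutine and yield from style is deprecated and was removed in Python 3.11. Where both are available, the two forms below are equivalent.

import asyncio


# Approach 1
async def ping_server(ip):
    pass


# Approach 2 (legacy; only runs on Python 3.10 and earlier):
# @asyncio.coroutine marks a generator as a coroutine, and inside it
# `yield from` calls another coroutine to perform the asynchronous operation.
#
# @asyncio.coroutine
# def load_file(path):
#     pass

Note: async and await are the newer syntax for coroutines. Moving to the new syntax takes two substitutions:

Replace @asyncio.coroutine with async
Replace yield from with await

Calling either of the special functions above returns a coroutine object. Readers familiar with Promise in JavaScript can think of the returned object as roughly similar. Calling the function does not run it immediately; it returns a coroutine object, which is then handed to the event loop and executed later.

How do you check whether a function is a coroutine function? asyncio provides asyncio.iscoroutinefunction(func).
How do you check whether a function's return value is a coroutine object? Use asyncio.iscoroutine(obj).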
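As a small illustration of the two checks above, the sketch below reuses the ping_server name from the earlier snippet with a made-up body. It uses inspect.iscoroutinefunction, the standard-library equivalent of the asyncio helper, which is a choice made for this sketch rather than something the article prescribes.

```python
import asyncio
import inspect


async def ping_server(ip):
    # Placeholder body: a real implementation would perform I/O here.
    await asyncio.sleep(0)
    return f"pong from {ip}"


# The function itself is a coroutine function ...
is_func = inspect.iscoroutinefunction(ping_server)

# ... and calling it only creates a coroutine object; nothing runs yet.
coro = ping_server("127.0.0.1")
is_obj = asyncio.iscoroutine(coro)

# The body executes only once an event loop drives the coroutine.
result = asyncio.run(coro)
print(is_func, is_obj, result)
```

The call to ping_server(...) by itself does no work; only asyncio.run() makes the body execute.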

Before Python 3.5 (this generator-based form only runs on Python 3.10 and earlier)

import asyncio


@asyncio.coroutine
def hello():
    print("Hello world!")
    r = yield from asyncio.sleep(1)
    print("Hello again!")

Python 3.5 and later

To simplify asynchronous IO and mark it more clearly, Python 3.5 introduced the async and await syntax, which makes coroutine code shorter and easier to read.

import asyncio


async def hello():
    print("Hello world!")
    r = await asyncio.sleep(1)
    print("Hello again!")

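2. Python coroutine libraries

asyncio

asyncio is a standard library module introduced in Python 3.4 that provides built-in support for asynchronous IO. asyncio itself only implements fairly low-level protocols such as TCP and UDP, so HTTP needs a third-party library such as aiohttp.

asyncio is a coroutine library built around the async / await syntax, but async/await does not depend on asyncio. Besides asyncio, curio and trio are lighter-weight alternatives that are also easier to use.

The asyncio programming model is a message loop. We obtain a reference to an EventLoop from the asyncio module and hand it the coroutines to run; that is all it takes to get asynchronous IO.

Python implements asynchronous programming with the asyncio module, whose defining feature is that there is only one thread. With a single thread, several tasks cannot run at the same instant. asyncio uses cooperative multitasking: an asynchronous task gives up control to other tasks and takes it back later to continue where it left off.

The event loop

asyncio starts an event loop on a single thread. The loop keeps watching for new events, handles them, and repeats until the asynchronous tasks are finished.

What is an event loop?

A single thread means all tasks queue up on that thread: the next task cannot start until the previous one has finished. For CPU-bound tasks this is acceptable, but what if the tasks are IO-bound? If most tasks spend their time waiting for data from the network or from disk, the CPU sits idle waiting for each one to finish before it can start the next.

Is there a way to make the single thread less wasteful? We can set aside the tasks that are waiting on IO. When the thread reaches a task that has to wait, it suspends the task and places it in a data structure (a queue, for example) of tasks in the "executing but waiting" state: they have started, but must wait for something such as network data. Once all tasks are registered this way, the single thread begins its loop, repeatedly asking: has any task finished waiting? If so, come and run.

When a task's IO completes, it has a result and signals that it is done. The looping thread receives the signal and checks whether a callback is bound to the task. If there is one, it runs the callback; if not, it resumes the task and hands it the result.

asyncio is a coroutine library

Event loop. The event loop has two jobs: run coroutine code in order, and schedule coroutines, that is, decide which coroutine runs next when one pauses.
Coroutine context switching. Python's generator yield can already perform the switch, and Python 3 adds dedicated syntax for it.

Note: a coroutine cannot be invoked directly; it needs an event loop to run it.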
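To make the two jobs of an event loop concrete, here is a toy round-robin scheduler built on plain generators. It is only a sketch of the idea, not how asyncio is implemented; the names worker and run_all are invented for this illustration, and there is no real IO involved.

```python
from collections import deque


def worker(name, steps, log):
    # A generator-based "coroutine": each bare `yield` hands control back to the loop.
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield


def run_all(coroutines):
    # Round-robin scheduler: resume each coroutine in turn until all are finished.
    ready = deque(coroutines)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the next yield (the "pause")
        except StopIteration:
            continue               # finished: do not reschedule
        ready.append(current)      # paused: put it at the back of the queue


log = []
run_all([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

The two workers interleave step by step in one thread, which is the scheduling behaviour the list above describes.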

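asyncio high-level and low-level APIs

Python asynchronous IO API. Official documentation: https://docs.python.org/zh-cn/3/library/asyncio.html

asyncio is the standard library for writing concurrent code with the async / await syntax. Python 3.7 reorganized it substantially: the API was divided into a high-level API and a low-level API, and high-level functions such as asyncio.run() were added to make asynchronous programs more concise.

Let us first get an overview of the library.

The high-level asyncio API provides:

Running Python coroutines concurrently with full control over their execution;
Performing network IO and IPC;
Controlling subprocesses;
Distributing tasks via queues;
Synchronizing concurrent code.

The low-level asyncio API supports the development of asynchronous libraries and frameworks:

Creating and managing event loops, which provide asynchronous APIs for networking, running subprocesses, handling OS signals, and so on;
Implementing efficient protocols using transports;
Bridging callback-based libraries and code with async/await syntax.

asyncio high-level API (tasks, streams, synchronization, subprocesses, queues, exceptions)

Application code that uses asynchronous IO only needs the high-level API.

The low-level API matters only when writing asynchronous libraries and frameworks.

The high-level API makes it easier to write asyncio-based applications. It includes:

(1) Coroutines and tasks

Coroutines are declared with the async/await syntax and run with asyncio.run(coro, *, debug=False). asyncio.run manages the event loop and finalizes asynchronous generators. It serves as the main entry point of an asyncio program, like a main function, and should ideally be called only once.
asyncio.create_task() turns a coroutine into a task. Tasks schedule coroutines to run concurrently, which is useful for concurrent web crawling. A coroutine wrapped in a task is scheduled automatically and runs soon. Task objects can be created with the high-level asyncio.create_task() function or with the low-level loop.create_task() or asyncio.ensure_future(). Instantiating Task objects by hand is discouraged.
Coroutines, Tasks, and Futures are all awaitable objects. A Future is a low-level awaitable that represents the eventual result of an asynchronous operation.

Example code: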

import asyncio
import httpx
import datetime

pool_size_limit = httpx.Limits(max_keepalive_connections=300, max_connections=500)


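async def fetch(url=None):
    # The test URL delays its response by 5 seconds, so allow more than httpx's default 5-second timeout.
    async with httpx.AsyncClient(limits=pool_size_limit, timeout=10) as client:
        resp = await client.get(url)  # request the URL passed in by the caller
        print(resp.status_code)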


async def main():
    url = 'https://www.httpbin.org/delay/5'
    task_list = []
    for index in range(100):
        task_list.append(asyncio.create_task(fetch(url)))
    await asyncio.wait(task_list)


if __name__ == '__main__':
    time_1 = datetime.datetime.now()
    asyncio.run(main())
    time_2 = datetime.datetime.now()
    print((time_2 - time_1).seconds)

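(2) Streams

Streams are high-level, async/await-ready primitives for working with network connections. They allow sending and receiving data without callbacks or low-level protocols and transports. For asynchronous TCP there is the client function asyncio.open_connection() and the server function asyncio.start_server(). Unix sockets are supported as well: asyncio.open_unix_connection() and asyncio.start_unix_server().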
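A minimal sketch of the two TCP stream functions, assuming a local echo server on 127.0.0.1 with an OS-assigned port (port 0). The names handle_echo and echo_once are made up for this example.

```python
import asyncio


async def handle_echo(reader, writer):
    # Read one line from the client and send it straight back.
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def echo_once(message: bytes) -> bytes:
    # Port 0 asks the OS for any free port (an assumption made for this demo).
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(message)
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
    return reply


if __name__ == "__main__":
    print(asyncio.run(echo_once(b"hello\n")))
```

Both sides use only reader/writer objects; no callbacks, protocols, or transports appear in the code.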

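(3) Synchronization primitives

asyncio synchronization primitives are designed to resemble those of the threading module, with two important caveats:
asyncio primitives are not thread-safe, so they should not be used for OS thread synchronization (use threading for that);
The methods of these primitives do not accept a timeout argument; use asyncio.wait_for() to perform operations with a timeout.
asyncio has the following basic synchronization primitives: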

Lock
Event
Condition
Semaphore
BoundedSemaphore
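The sketch below shows two of these primitives under assumed, made-up workloads: a Semaphore capping how many workers run at once, and a Lock whose acquire() is given a timeout through asyncio.wait_for(). The function names and the delays are illustrative only.

```python
import asyncio


async def worker(sem, state):
    async with sem:                      # at most `limit` workers pass this point at once
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)        # stand-in for real I/O
        state["running"] -= 1


async def run_limited(n_workers=6, limit=2):
    sem = asyncio.Semaphore(limit)
    state = {"running": 0, "peak": 0}
    await asyncio.gather(*(worker(sem, state) for _ in range(n_workers)))
    return state["peak"]


async def lock_with_timeout():
    lock = asyncio.Lock()
    await lock.acquire()                 # hold the lock so the next acquire must wait
    try:
        # Lock.acquire() has no timeout parameter; wrap it in wait_for() instead.
        await asyncio.wait_for(lock.acquire(), timeout=0.05)
    except asyncio.TimeoutError:
        return "timed out"
    finally:
        lock.release()
    return "acquired"


if __name__ == "__main__":
    print(asyncio.run(run_limited()), asyncio.run(lock_with_timeout()))
```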

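(4) Subprocesses

asyncio provides async/await APIs for creating and managing subprocesses. Unlike the standard library's subprocess module, the asyncio subprocess functions are asynchronous, and asyncio offers several tools for working with them, which makes it easy to run and monitor multiple subprocesses in parallel. There are two main functions for creating a subprocess: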

coroutine asyncio.create_subprocess_exec()
coroutine asyncio.create_subprocess_shell()
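A minimal sketch of create_subprocess_exec(), assuming the child process is simply another Python interpreter (sys.executable) running a one-line script. The helper name run_python is made up for this example.

```python
import asyncio
import sys


async def run_python(code: str) -> str:
    # Launch a child Python interpreter and capture its standard output.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout.decode().strip()


async def main():
    # Both children run at the same time; gather() collects their output in order.
    return await asyncio.gather(
        run_python("print(1 + 1)"),
        run_python("print('hello')"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```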

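(5) Queues

asyncio queues are designed to resemble the classes of the standard queue module. They are not thread-safe; they are meant specifically for async/await code. Note that their methods have no timeout parameter; use asyncio.wait_for() for queue operations with a timeout.
Because the design mirrors the standard queue module, usage is almost the same: just put await in front of the corresponding calls. asyncio provides three kinds of queue:

class asyncio.Queue: first-in, first-out queue
class asyncio.PriorityQueue: priority queue
class asyncio.LifoQueue: last-in, first-out queue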
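A short sketch of two points made above, with made-up job names: PriorityQueue returns entries lowest priority number first, and a timeout on get() is obtained by wrapping the call in asyncio.wait_for().

```python
import asyncio


async def demo_queues():
    pq = asyncio.PriorityQueue()
    for priority, job in [(3, "low"), (1, "high"), (2, "medium")]:
        await pq.put((priority, job))
    # Entries come out lowest priority number first.
    order = [(await pq.get())[1] for _ in range(3)]

    empty = asyncio.Queue()
    try:
        # get() has no timeout parameter; wait_for() supplies one.
        await asyncio.wait_for(empty.get(), timeout=0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return order, timed_out


if __name__ == "__main__":
    print(asyncio.run(demo_queues()))
```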

(6) Exceptions

asyncio defines several exceptions:

TimeoutError
CancelledError
InvalidStateError
SendfileNotAvailableError
IncompleteReadError
LimitOverrunError
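As one example of these exceptions in practice, the sketch below cancels a task and observes CancelledError. The coroutine names slow and cancel_demo are invented for this illustration.

```python
import asyncio


async def slow():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # Clean-up would go here; re-raise so the cancellation is not swallowed.
        raise


async def cancel_demo():
    task = asyncio.create_task(slow())
    await asyncio.sleep(0)        # let the task start and reach its sleep
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(cancel_demo()))
```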

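asyncio low-level API (event loop, Futures, transports and protocols, policies)

The low-level API supports writing asyncio-based libraries and frameworks; anyone planning to write one needs to know it. It mainly covers:

(1) The event loop

The event loop is the core of every asyncio application. It runs asynchronous tasks and callbacks, performs network IO, and runs subprocesses.

Application developers should normally use high-level asyncio functions such as asyncio.run() and rarely need to reference the loop object or call its methods.

Python 3.7 added the asyncio.get_running_loop() function.

(2) Futures

Future objects bridge low-level callback-based code and high-level async/await code.
A Future represents the eventual result of an asynchronous operation. It is not thread-safe.
A Future is awaitable. A coroutine can wait on a Future until it has a result or an exception set, or until it is cancelled.
Typically, Futures let low-level callback-based code (for example, protocols implemented with asyncio transports) interoperate with high-level async/await code.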
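A minimal sketch of that bridging role. legacy_fetch is a hypothetical callback-style API invented for this example; the Future lets a coroutine await its result.

```python
import asyncio


def legacy_fetch(callback):
    # Hypothetical callback-style API: reports its result by calling `callback` later.
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, callback, "payload")


async def fetch():
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    # The callback's only job is to hand the value to the Future.
    legacy_fetch(future.set_result)
    return await future   # suspends here until set_result() is called


if __name__ == "__main__":
    print(asyncio.run(fetch()))
```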

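(3) Transports and Protocols

Transports and Protocols are used by low-level event loop APIs such as loop.create_connection(). They use a callback-based programming style and enable high-performance implementations of network or IPC protocols (such as HTTP).

At the highest level, the transport is concerned with how bytes are transmitted, while the protocol determines which bytes to transmit (and, to some extent, when).

Put differently: a transport is an abstraction for a socket (or a similar I/O endpoint), while a protocol is an abstraction for an application, seen from the transport's point of view.

Another view is that the transport and protocol interfaces together define an abstract interface for using network I/O and interprocess I/O.

There is always a 1:1 relationship between transport and protocol objects: the protocol calls transport methods to send data, and the transport calls protocol methods to deliver data that has been received.

Most connection-oriented event loop methods (such as loop.create_connection()) accept a protocol_factory argument, which is used to create a Protocol object for an accepted connection, represented by a Transport object. These methods usually return a (transport, protocol) tuple.

(4) Policies

An event loop policy is a global, per-process object that controls the management of the event loop. Each event loop has a default policy, which can be changed and customized with the policy API.

A policy defines the notion of context and manages a separate event loop per context. The default policy defines the context to be the current thread.

With a custom event loop policy, the behavior of get_event_loop(), set_event_loop(), and new_event_loop() can be customized.

(5) Platform support

The asyncio module is designed to be portable, but some platforms have subtle differences and limitations because of their underlying architecture and capabilities. On Windows some functions are not supported, for example loop.create_unix_connection() and loop.create_unix_server(). Linux and recent versions of macOS support everything.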

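asyncio.ensure_future, loop.create_task, asyncio.create_task

In asyncio, ensure_future() and create_task() both wrap a coroutine object in a Future-like object (a Task) and schedule it on the event loop. The main differences:

asyncio.ensure_future(coroutine):

Takes a coroutine object and returns a Future object (a Task).
If the argument is already a Future, it is returned unchanged; otherwise it is wrapped in a new Task.
It accepts any awaitable, not only coroutine objects.
ensure_future() was added in Python 3.4.4 and predates asyncio.create_task(), so it appears mostly in older code.

loop.create_task(coroutine):

A method of the event loop object that wraps the coroutine in a Task and schedules it on that loop.
Like ensure_future(), it wraps a coroutine and adds it to the event loop, but it accepts only coroutines.
It must be called on an event loop object, such as the one returned by asyncio.get_running_loop() or asyncio.new_event_loop().
loop.create_task() has existed since Python 3.4.2; the module-level asyncio.create_task() was added in Python 3.7 and is the recommended way to create tasks in application code, in preference to ensure_future().

asyncio.create_task(coroutine):

loop.create_task() and asyncio.create_task() both create tasks.
asyncio.create_task is a top-level function in the asyncio module. It wraps the coroutine in a Task, schedules it on the currently running event loop, and returns the Task. No loop object has to be passed, but a loop must already be running; otherwise RuntimeError is raised.
asyncio.create_task() is very useful: in a crawler it lets us download many pages concurrently. In Python 3.6 the equivalent is ensure_future().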
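The differences can be checked with a short sketch. The names job, inside_loop, and outside_loop are invented for this illustration; the second function shows what happens when asyncio.create_task() is called with no running loop.

```python
import asyncio


async def job():
    return 42


async def inside_loop():
    task = asyncio.create_task(job())
    # ensure_future() returns an existing Task/Future unchanged ...
    same = asyncio.ensure_future(task) is task
    # ... and wraps a bare coroutine in a new Task.
    wrapped = asyncio.ensure_future(job())
    results = await asyncio.gather(task, wrapped)
    return same, isinstance(wrapped, asyncio.Task), results


def outside_loop():
    coro = job()
    try:
        asyncio.create_task(coro)   # no running loop here
    except RuntimeError as exc:
        return type(exc).__name__
    finally:
        coro.close()                # avoid a "never awaited" warning


if __name__ == "__main__":
    print(asyncio.run(inside_loop()), outside_loop())
```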

Example: creating tasks with loop.create_task() and asyncio.create_task():

import asyncio


async def my_task():
    print("Task started")
    await asyncio.sleep(2)
    print("Task completed")


async def main():
    loop = asyncio.get_running_loop()

    # Using loop.create_task()
    task1 = loop.create_task(my_task())

    # Using asyncio.create_task()
    task2 = asyncio.create_task(my_task())

    # Wait for both tasks to finish
    await asyncio.gather(task1, task2)

    # Both tasks are done
    print("All tasks have finished")


asyncio.run(main())

Adding tasks dynamically with asyncio

import asyncio


async def task1():
    print("Task 1 started")
    await asyncio.sleep(2)
    print("Task 1 finished")


async def task2():
    print("Task 2 started")
    await asyncio.sleep(1)
    print("Task 2 finished")


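async def dynamic_task():
    tasks = []
    for i in range(3):
        # Create a task1 task and schedule it on the event loop; keep a reference to it
        tasks.append(asyncio.create_task(task1()))
        await asyncio.sleep(1)
    # Wait for the dynamically added tasks so none is cancelled when main() returns
    await asyncio.gather(*tasks)


async def main():
    background = asyncio.create_task(task2())  # create a task2 task and schedule it on the event loop
    await dynamic_task()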


if __name__ == "__main__":
    asyncio.run(main())

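The asyncio source code

In the asyncio source, these are the files that matter here and that are summarized next:

File	Description
base_events	Base event loop implementation; provides BaseEventLoop
coroutines	Helpers for wrapping and identifying coroutines
events	Abstract event classes; for example, BaseEventLoop inherits from AbstractEventLoop
futures	Provides the Future class
tasks	Provides the Task class and related functions

The following example sets the result of a Future: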

import asyncio


async def slow_operation(future):
    await asyncio.sleep(1)  # sleep for one second
    future.set_result('Future is done!')  # set the future's result


loop = asyncio.new_event_loop()
future = loop.create_future()  # create a Future bound to this loop
task = loop.create_task(slow_operation(future))  # schedule the coroutine as a task
loop.run_until_complete(future)  # block until the future is done, then stop the loop
print(future.result())
loop.close()

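Internally, run_until_complete calls the future's add_done_callback, so the event loop is notified when the future is done.

The next example uses the future's add_done_callback method directly to achieve the same effect as the example above: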

import asyncio


async def slow_operation(future):
    await asyncio.sleep(1)
    future.set_result('Future is done!')


def got_result(future):
    print(future.result())
    loop.stop()  # stop the event loop


loop = asyncio.new_event_loop()
future = loop.create_future()
task = loop.create_task(slow_operation(future))
future.add_done_callback(got_result)  # run this callback once the future is done
try:
    loop.run_forever()
finally:
    loop.close()

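As soon as slow_operation finishes, got_result runs and stops the event loop, so there is no risk of the loop running forever.

Task (a subclass of Future)

Task is a subclass of Future, so every Future method can be used on a task.

An example that runs three tasks concurrently: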

import asyncio


async def factorial(name, number):
    f = 1
    for i in range(2, number + 1):
        print("Task %s: Compute factorial(%s)..." % (name, i))
        await asyncio.sleep(1)
        f *= i
    print("Task %s: factorial(%s) = %s" % (name, number, f))


async def main():
    await asyncio.gather(
        factorial("A", 2),
        factorial("B", 3),
        factorial("C", 4),
    )


asyncio.run(main())

Output:

Task A: Compute factorial(2)...
Task B: Compute factorial(2)...
Task C: Compute factorial(2)...
Task A: factorial(2) = 2
Task B: Compute factorial(3)...
Task C: Compute factorial(3)...
Task B: factorial(3) = 6
Task C: Compute factorial(4)...
Task C: factorial(4) = 24

A, B, and C run concurrently, and the program exits only after the combined future has completed.

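The following functions are related to tasks. The signatures are shown as in older releases; the loop parameter was removed in Python 3.10.

Function	Description
as_completed(fs, *, loop=None, timeout=None)	Returns an iterator of awaitables that yields results in the order the inputs complete
ensure_future(coro_or_future, *, loop=None)	Schedules a coroutine object for execution, wrapping it in a future; returns the Task object
async(coro_or_future, *, loop=None)	Deprecated alias; use ensure_future instead (removed in Python 3.7, where async became a keyword)
wrap_future(future, *, loop=None)	Wraps a concurrent.futures.Future object in an asyncio Future object
gather(*coros_or_futures, loop=None, return_exceptions=False)	Returns a future that aggregates the results of the given coroutines or futures
sleep(delay, result=None, *, loop=None)	Coroutine that completes after the given time (in seconds)
shield(arg, *, loop=None)	Waits for a future while shielding it from cancellation
wait(futures, *, loop=None, timeout=None, return_when=ALL_COMPLETED)	Waits for the Futures and coroutine objects in the sequence futures to complete. Coroutines are wrapped in Tasks (passing bare coroutines is no longer accepted since Python 3.11). Returns two sets: (done, pending)
wait_for(fut, timeout, *, loop=None)	Waits for a single Future or coroutine object to complete, with a timeout. If timeout is None, blocks until the future completes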

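3. Steps in Python asynchronous IO programming

The asynchronous ecosystem is not as large as the synchronous one, but it is worth trying when you need high concurrency.

Some mature asynchronous libraries:

aiohttp: asynchronous HTTP client/server framework. GitHub: https://github.com/aio-libs/aiohttp
sanic: a fast, Flask-like web framework. GitHub: https://github.com/channelcat/sanic
uvloop: a fast drop-in replacement for the asyncio event loop, implemented in Cython on top of libuv. GitHub: https://github.com/MagicStack/uvloop

Creating a coroutine

First create a coroutine function that prints "你好", waits one second, and then prints "大家同好".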

import asyncio


async def say_hi():
    print('你好')
    await asyncio.sleep(1)
    print('大家同好')

asyncio.run(say_hi())

"""
你好
大家同好
"""

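say_hi() is declared as a coroutine function with async, which is cleaner than the earlier decorator-based declaration.

In practice, which functions should be declared as coroutine functions with async?

Those that can benefit from asynchronous IO, such as reading and writing files, network traffic, or database access. These IO operations spend most of their time waiting; turning them into coroutines improves the overall efficiency (speed) of the program.

say_hi() is run through asyncio.run() rather than called directly. A direct call does not schedule it; it merely returns a coroutine object: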

print(say_hi()) # <coroutine object say_hi at 0x000001264DB3FCC0>

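Actually running a coroutine

So how do we actually run a coroutine?

asyncio provides three mechanisms:

(1) The asyncio.run() function. This is the main entry point of an asynchronous program, comparable to main in C.
(2) Awaiting the coroutine with await, as in await asyncio.sleep(1) in the example above.

In the next example we define the coroutine say_delay() and call it twice from the main() coroutine: the first call prints "你好" after a 2-second delay, and the second prints "大家同好" after a 1-second delay. Both coroutines are run via await.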

import asyncio
import datetime


async def say_delay(msg=None, delay=None):
    await asyncio.sleep(delay)
    print(msg)


async def main():
    print(f'begin at {datetime.datetime.now().replace(microsecond=0)}')
    await say_delay('你好', 2)
    await say_delay('大家同好', 1)
    print(f'end at {datetime.datetime.now().replace(microsecond=0)}')

asyncio.run(main())

'''
begin at 2020-12-19 00:55:01
你好
大家同好
end at 2020-12-19 00:55:04
'''

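The start and end times show that the two coroutines ran sequentially, taking 2 + 1 = 3 seconds in total.

(3) Using asyncio.create_task() to run several coroutines concurrently as asyncio Tasks. Below, main() is rewritten with create_task() so that the two say_delay() coroutines run concurrently: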

import asyncio
import datetime


async def say_delay(msg=None, delay=None):
    await asyncio.sleep(delay)
    print(msg)


async def main_1():
    task_list = [
        asyncio.create_task(say_delay('你好', 2)),
        asyncio.create_task(say_delay('大家同好', 1))
    ]
    print(f'begin at {datetime.datetime.now().replace(microsecond=0)}')
    for item in task_list:
        await item
    print(f'end at {datetime.datetime.now().replace(microsecond=0)}')


async def main_2():
    task_list = [
        asyncio.create_task(say_delay('你好', 2)),
        asyncio.create_task(say_delay('大家同好', 1))
    ]
    print(f'begin at {datetime.datetime.now().replace(microsecond=0)}')
    # asyncio.create_task() has already scheduled both coroutines on the running loop; any await here lets them run
    await task_list[0]
    print(f'end at {datetime.datetime.now().replace(microsecond=0)}')


async def main_3():
    task_list = [
        asyncio.create_task(say_delay('你好', 2)),
        asyncio.create_task(say_delay('大家同好', 1))
    ]
    print(f'begin at {datetime.datetime.now().replace(microsecond=0)}')
    # asyncio.create_task() has already scheduled both coroutines on the running loop; any await here lets them run
    await asyncio.sleep(2)
    print(f'end at {datetime.datetime.now().replace(microsecond=0)}')


asyncio.run(main_1())
print("*" * 50)
asyncio.run(main_2())
print("*" * 50)
asyncio.run(main_3())

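The start and end times in the output show that the two coroutines now run concurrently: the total time equals the longest delay, 2 seconds.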


Producers and consumers

Example 1:

import asyncio


async def consumer(n, q):
    print(f'消费者 {n}: 开始')
    while True:
        print(f'消费者 {n}: 等待任务')
        item = await q.get()
        print(f'消费者 {n}: 获取任务 ---> {item}')
        if item is None:
            # None is the signal to stop.
            q.task_done()
            break
        else:
            await asyncio.sleep(0.01 * item)
            q.task_done()
    print(f'消费者 {n}: 结束')


async def producer(q, num_workers):
    print('生产者: 开始')
    # Add some numbers to the queue to simulate jobs
    for i in range(num_workers * 3):
        await q.put(i)
        print(f'生产者: 添加任务 ---> {i}')
    # Add None entries in the queue
    # to signal the consumers to exit
    print('生产者: 添加 None 到队列, 相当于一个停止信号')
    for i in range(num_workers):
        await q.put(None)
    print('生产者: 等待队列为空')
    await q.join()
    print('生产者: 结束')


async def main(num_consumers=1):
    q = asyncio.Queue(maxsize=num_consumers)
    consumer_list = [
        asyncio.create_task(consumer(i, q)) for i in range(num_consumers)
    ]
    produce_list = [asyncio.create_task(producer(q, num_consumers))]
    task_list = consumer_list + produce_list
    await asyncio.wait(task_list)


if __name__ == '__main__':
    asyncio.run(main(num_consumers=3))
    pass

Example 2:

import asyncio


async def print_hello():
    while True:
        print("hello")
        await asyncio.sleep(1)  # suspend this coroutine for 1 second


async def print_goodbye():
    while True:
        print("bye bye")
        await asyncio.sleep(2)  # suspend this coroutine for 2 seconds


async def main():
    # Create the coroutine objects
    co1 = print_hello()
    co2 = print_goodbye()
    task_list = [co1, co2]
    # Run both on the event loop (they loop forever, so stop with Ctrl+C)
    await asyncio.gather(*task_list)


asyncio.run(main())

Example 3:

import time
import asyncio


async def producer(event):
    n = 0
    while True:
        print("Running producer...")
        await asyncio.sleep(0.5)
        n += 1
        if n == 2:
            event.set()
            break


async def consumer(event):
    await event.wait()
    print("Running consumer...")
    await asyncio.sleep(0.5)


async def main():
    event = asyncio.Event()
    tasks = [asyncio.create_task(producer(event))] + [
        asyncio.create_task(consumer(event)) for _ in range(3)
    ]

    await asyncio.gather(*tasks)


while True:
    asyncio.run(main())
    print("\nSleeping for 1 sec...\n")
    time.sleep(1)

Example 4:

import asyncio
import random


async def cro_scheduler():
    page = 1
    while True:
        url = f'https://www.xxx.com/{page}'
        asyncio.create_task(cron_job(url))  # create a new task on the event loop; it runs concurrently with this coroutine
        await asyncio.sleep(0)  # not a blocking call: voluntarily yields control so the jobs can run and print
        page += 1


async def cron_job(url):
    tick = random.randint(1, 3)  # simulated download latency
    await asyncio.sleep(tick)  # suspend this coroutine to simulate the download
    print("下载结束：", url)


if __name__ == '__main__':
    asyncio.run(cro_scheduler())

Example 5:

import asyncio
import random
# import uvloop  # makes asyncio 2-4 times faster
# asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())


# Async
async def produce(queue, n):
    for x in range(1, n + 1):
        # produce an item
        print('producing {}/{}'.format(x, n))
        # simulate i/o operation using sleep
        await asyncio.sleep(random.random())
        item = str(x)
        # put the item in the queue
        await queue.put(item)

    # indicate the producer is done
    await queue.put(None)


async def consume(queue):
    while True:
        # wait for an item from the producer
        item = await queue.get()
        if item is None:
            # the producer emits None to indicate that it is done
            break

        # process the item
        print('consuming item {}...'.format(item))
        # simulate i/o operation using sleep
        await asyncio.sleep(random.random())


def main_1():
    # Older style: create and manage the event loop explicitly
    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(temp())
    finally:
        loop.close()


async def temp():
    queue = asyncio.Queue()
    producer_coro = produce(queue, 10)
    consumer_coro = consume(queue)
    await asyncio.gather(*(producer_coro, consumer_coro))


def main_2():
    asyncio.run(temp())


if __name__ == '__main__':
    main_1()
    print("*" * 100)
    main_2()

Example:

import time
import random
import asyncio
import math


def is_prime(num: int):
    """Return True if num is a prime number."""
    if num < 2:
        return False
    if num == 2 or num == 3:
        return True
    if num % 6 != 1 and num % 6 != 5:
        return False
    for i in range(5, int(math.sqrt(num)) + 1, 6):
        if num % i == 0 or num % (i + 2) == 0:
            return False
    return True


def big_number():
    # Generate a random number of at least 20 billion (2 * 10 ** 10); the upper bound is arbitrary
    return random.randint(2 * 10 ** 10, 2 * 10 ** 15)


class ProducerConsumerModel(object):
    def __init__(self, c_num=1, p_num=1, size=1000000, is_print=False):
        """
        Producer-consumer model
        :param c_num: number of consumers
        :param p_num: number of producers
        :param size: number of items to process
        :param is_print: whether to print log output
        """
        self.consumer_num = c_num
        self.producer_num = p_num
        self.size = size
        self.print_log = is_print

    async def consumer(self, buffer, name):
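        for _ in iter(int, 1):  # infinite loop: int() always returns 0 and never equals the sentinel 1
            try:
                # Take a value from the buffer; if none arrives within the timeout, assume the work is done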
                value = await asyncio.wait_for(buffer.get(), timeout=0.5)
                if is_prime(value):
                    if self.print_log:
                        print('[{}]{} is Prime'.format(name, value))
                else:
                    if self.print_log:
                        print('[{}]{} is not Prime'.format(name, value))
            except asyncio.TimeoutError:
                break
            await asyncio.sleep(0)

    async def producer(self, buffer, name):
        for i in range(self.size // self.producer_num):  # split the total workload evenly among the producers
            big_num = big_number()  # generate a large random number
            await buffer.put(big_num)  # put it into the buffer
            if self.print_log:
                print('[{}] {} is Produced'.format(name, big_num))
            await asyncio.sleep(0)

    async def main(self):
        buffer = asyncio.Queue()  # shared buffer
        worker_list = []  # all worker tasks
        # Add both consumers and producers to the worker list
        for i in range(self.consumer_num):
            # Give each consumer the shared buffer and its own name
            worker_list.append(asyncio.create_task(self.consumer(buffer, 'Consumer' + str(i + 1))))
        for i in range(self.producer_num):
            # Give each producer the shared buffer and its own name
            worker_list.append(asyncio.create_task(self.producer(buffer, 'Producer' + str(i + 1))))

        # Start all workers and wait for them to finish
        await asyncio.gather(*worker_list)
        # await asyncio.wait(worker_list)


if __name__ == '__main__':
    start_time = time.perf_counter()  # start timing
    pc_model = ProducerConsumerModel(c_num=2, p_num=2, size=100, is_print=True)
    asyncio.run(pc_model.main())  # run the coroutines
    end_time = time.perf_counter()
    print("Elapsed time: {:.3f} seconds".format(end_time - start_time))

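Awaitables

An awaitable is an object that can be used in an await expression. We have already met two kinds, coroutines and tasks; the third is the low-level Future.

Many asyncio APIs take awaitables as arguments, for example run() and create_task() (both of which expect a coroutine).

(1) Coroutines

A coroutine is awaitable and can be awaited from other coroutines.

(2) Tasks

When a coroutine is wrapped in a task by asyncio.create_task(), it is automatically added to the scheduling queue but has not started running yet.

The basic use of create_task() was covered in the earlier examples: the task it returns is awaited until it finishes. What happens if we do not await it? And what does "ready to run soon" mean? Consider the following example: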
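The listing for this example is not included in the text. A minimal reconstruction, consistent with the description and output that follow, might look like the sketch below. The parameters main_sleep, delay, and log are assumptions added so the behaviour can be varied and checked, and the line numbers quoted in the discussion refer to the original listing, not to this sketch.

```python
import asyncio
import datetime


async def whattime(i, delay=1.0, log=None):
    # Sleep first, then report; the task cannot print before `delay` elapses.
    await asyncio.sleep(delay)
    print(f"calling:{i}, now is {datetime.datetime.now():%H:%M:%S}")
    if log is not None:
        log.append(i)


async def main(main_sleep=1.5, delay=1.0):
    log = []
    # Plays the role of "lines 13 and 14": awaited directly, so it runs to completion.
    await whattime(0, delay, log)
    # Four tasks are scheduled but never awaited.
    tasks = [asyncio.create_task(whattime(i, delay, log)) for i in range(1, 5)]
    # Plays the role of "line 18": main() suspends here, giving the tasks a chance to run.
    await asyncio.sleep(main_sleep)
    return log


if __name__ == "__main__":
    asyncio.run(main())
```

Calling main(main_sleep=0) instead lets main() return before the four tasks wake up, so only the first line is printed.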

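Running the code gives this: after one second a single line is printed, which is the result of lines 13 and 14 of the listing:

calling:0, now is 09:15:15

Then, after a one-second pause, four lines are printed at once: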

calling:1, now is 09:15:16
calling:2, now is 09:15:16
calling:3, now is 09:15:16
calling:4, now is 09:15:16

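The four tasks created by asyncio.create_task() were never awaited, yet they ran. The key is the await on line 18. If that line is removed, or its sleep is shorter than 1 second (anything shorter than the sleep inside whattime()), only the first line of output appears. That is because main() then returns before whattime() has had time to print (it has to sleep for 1 second first), the whole program exits, and the whattime() output is never produced.

Now back to "ready to run soon". create_task() only wraps the coroutine and puts it on the scheduling queue; it has not run yet. It runs when the main coroutine suspends, which can happen in two ways:

First, by running the task through await task;
Second, by the main coroutine suspending at await sleep, at which point the event loop runs the task.

asyncio achieves asynchrony through the event loop. Inside the main coroutine main(), as long as no await is reached, the event loop is running main(). At an await, the event loop goes on to run other coroutines, here the four whattime() tasks created by create_task(), each of which immediately awaits a 1-second sleep. At that point the main coroutine and all four tasks are suspended, the CPU is idle, and the event loop waits for a wake-up.

If main() sleeps for only 0.1 seconds, it wakes first. The event loop resumes main(), which has no code left and returns. Returning from it ends the program, so the four tasks never get to print.

If main() sleeps for more than 1 second, the four tasks wake first and all of the output appears.

If the sleep on line 18 of main() is exactly 1 second, the same as the tasks' sleep, all of the output still appears. Why?

My guess: the four tasks are created before the sleep on line 18, and the event loop may deliver wake-ups in first-in, first-out order. I will dig into the asyncio source later to check whether this guess is right.

Example: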

# -*- coding: utf-8 -*-

"""
@File    : aio_test.py
@Author  : XXX
@Time    : 2020/12/25 23:54
"""

import asyncio
import datetime


async def hi(msg=None, sec=None):
    print(f'enter hi(), {msg} @{datetime.datetime.now().replace(microsecond=0)}')
    await asyncio.sleep(sec)
    print(f'leave hi(), {msg} @{datetime.datetime.now().replace(microsecond=0)}')
    return sec


async def main_1():
    print(f'main() begin at {datetime.datetime.now().replace(microsecond=0)}')
    task_list = [asyncio.create_task(hi(i, i)) for i in range(5, -1, -1)]
    for task in task_list:
        ret_val = await task
        print(f'ret_val:{ret_val}')
    print(f'main() end at {datetime.datetime.now().replace(microsecond=0)}')


async def main_2():
    # Note: main_2 sleeps for only 2 seconds, so coroutines that sleep longer than that are cancelled before they finish
    print(f'main() begin at {datetime.datetime.now().replace(microsecond=0)}')
    task_list = [asyncio.create_task(hi(i, i)) for i in range(5, -1, -1)]
    await asyncio.sleep(2)
    print(f'main() end at {datetime.datetime.now().replace(microsecond=0)}')


async def main_2_1():
    # Improvement: wait for every task so main() does not return before the coroutines have finished
    print(f'main() begin at {datetime.datetime.now().replace(microsecond=0)}')
    task_list = [asyncio.create_task(hi(i, i)) for i in range(5, -1, -1)]
    await asyncio.wait(task_list)
    print(f'main() end at {datetime.datetime.now().replace(microsecond=0)}')


async def main_3():
    # Note: main_3 does not run the tasks concurrently; awaiting each task right after creating it makes them run one after another
    print(f'main() begin at {datetime.datetime.now().replace(microsecond=0)}')
    tasks = []
    for i in range(1, 5):
        tsk = asyncio.create_task(hi(i, i))
        await tsk
    print(f'main() end at {datetime.datetime.now().replace(microsecond=0)}')


print('*' * 50)
asyncio.run(main_1())
print('*' * 50)
asyncio.run(main_2())
asyncio.run(main_2_1())
print('*' * 50)
asyncio.run(main_3())
print('*' * 50)

await asyncio.wait(tasks)  # await asyncio.gather(*tasks) works as well

dones, pendings = await asyncio.wait(tasks)

Running coroutines without the asyncio message loop

First, here is how to drive a coroutine by hand, without the asyncio message loop:

async def func_1():
    print("func_1 start")
    print("func_1 end")


async def func_2():
    print("func_2 start")
    print("func_2 a")
    print("func_2 b")
    print("func_2 c")
    print("func_2 end")


f_1 = func_1()
print(f_1)

f_2 = func_2()
print(f_2)


try:
    print('f_1.send')
    f_1.send(None)
except StopIteration as e:
    # StopIteration has to be caught here as well
    pass

try:
    print('f_2.send')
    f_2.send(None)
except StopIteration as e:
    pass

Output:

<coroutine object func_1 at 0x0000020121A07C40>
<coroutine object func_2 at 0x0000020121B703C0>
f_1.send
func_1 start
func_1 end
f_2.send
func_2 start
func_2 a
func_2 b
func_2 c
func_2 end

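Example code 2:

async def test(x):
    return x * 2

print(test(100))

try:
    # It is a coroutine, so drive it with send(None) just like a yield-based coroutine
    test(100).send(None)
except StopIteration as e:
    print(type(e))
    # The coroutine's return value travels on the StopIteration exception
    ret_val = e.value
    print(ret_val)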

Example code 3:

def simple_coroutine():
    print('-> start')
    x = yield
    print('-> recived', x)


sc = simple_coroutine()

next(sc)

try:
    sc.send('zhexiao')
except BaseException as e:
    print(e)

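Analysis of the example above: there is no expression to the right of yield, so the value it produces defaults to None. next(...) is called first because the generator has not started yet; it is not paused at a yield, so data cannot be sent to it. Once next(...) has activated the coroutine, execution runs up to x = yield. Note that x = yield is evaluated right side first and assigned afterwards, so after activation the generator is paused at yield and x has not been assigned. When send is called, yield receives the value and assigns it to x. When execution reaches the end of the coroutine body, StopIteration is raised, just as with an ordinary generator.

If the coroutine has not been activated with next(...) (send(None) works too) and we call send with a value directly, Python raises TypeError: can't send non-None value to a just-started generator.

The initial call to next(sc) is usually called priming the coroutine: it advances the coroutine to the first yield expression so that it is ready to be used as a live coroutine.

A coroutine is in one of four states:

GEN_CREATED: waiting to start execution
GEN_RUNNING: currently being executed by the interpreter; this state is rarely observed
GEN_SUSPENDED: paused at a yield expression
GEN_CLOSED: execution has finished

The following example looks at a coroutine's state: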
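A short sketch that records the state at each stage with inspect.getgeneratorstate(), reusing the simple_coroutine generator from example code 3 (the print inside it is kept, with the spelling corrected):

```python
from inspect import getgeneratorstate


def simple_coroutine():
    x = yield
    print("-> received", x)


sc = simple_coroutine()
states = [getgeneratorstate(sc)]      # before priming
next(sc)                              # prime: run to the first yield
states.append(getgeneratorstate(sc))
try:
    sc.send("zhexiao")                # body finishes, so StopIteration is raised
except StopIteration:
    pass
states.append(getgeneratorstate(sc))
print(states)
```

GEN_RUNNING does not show up here because it can only be observed from inside the generator while it is executing.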

Example code 4 (computing a running average with a coroutine):

def averager():
    total = 0.0
    count = 0
    avg = None

    while True:
        num = yield avg
        total += num
        count += 1
        avg = total / count


# run
ag = averager()
# Prime the coroutine
print(next(ag))  # None

print(ag.send(10))  # 10
print(ag.send(20))  # 15

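This is an infinite loop: as long as values keep being sent to the coroutine, it keeps computing.

Explanation:

1. After next(ag) is called, the coroutine advances to the yield expression and produces the initial value of avg, which is None.
2. The coroutine is now paused at the yield expression.
3. send() resumes the coroutine, assigns the sent value to num, and computes the new avg.
4. print shows the value that yield produced.

Step through the program above in a debugger to follow this.

Running coroutines with the asyncio message loop

Calling coroutines with asyncio asynchronous IO

Example code 1: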

import asyncio


async def func_1():
    print("func_1 start")
    print("func_1 end")
    # await asyncio.sleep(1)


async def func_2():
    print("func_2 start")
    print("func_2 a")
    print("func_2 b")
    print("func_2 c")
    print("func_2 end")
    # await asyncio.sleep(1)


f_1 = func_1()
print(f_1)

f_2 = func_2()
print(f_2)


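async def main():
    # asyncio.wait() no longer accepts bare coroutines (Python 3.11+),
    # so wrap the coroutine objects created above in tasks first.
    tasks = [asyncio.create_task(f_1), asyncio.create_task(f_2)]
    await asyncio.wait(tasks)


# Run the coroutines on an event loop
asyncio.run(main())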

Example code 2:

import asyncio
import time

start = time.time()


def tic():
    return 'at %1.1f seconds' % (time.time() - start)


async def gr1():
    print('gr1 started work: {}'.format(tic()))
    # Pause for two seconds without blocking the event loop (same below)
    await asyncio.sleep(2)
    print('gr1 ended work: {}'.format(tic()))


async def gr2():
    # Busy waits for a second, but we don't want to stick around...
    print('gr2 started work: {}'.format(tic()))
    await asyncio.sleep(2)
    print('gr2 Ended work: {}'.format(tic()))


async def gr3():
    print("Let's do some stuff while the coroutines are blocked, {}".format(tic()))
    await asyncio.sleep(1)
    print("Done!")

# 事件循环
ioloop = asyncio.get_event_loop()

# tasks中也可以使用 asyncio.ensure_future(gr1())..
tasks = [
    ioloop.create_task(gr1()),
    ioloop.create_task(gr2()),
    ioloop.create_task(gr3())
]
ioloop.run_until_complete(asyncio.wait(tasks))
ioloop.close()


"""
结果：
gr1 started work: at 0.0 seconds
gr2 started work: at 0.0 seconds
Let's do some stuff while the coroutines are blocked, at 0.0 seconds
Done!
gr2 Ended work: at 2.0 seconds
gr1 ended work: at 2.0 seconds
"""

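Running several coroutine tasks concurrently

asyncio.wait() and asyncio.gather() both wait for several awaitables to finish.

asyncio.wait(aws, *, timeout=None, return_when=ALL_COMPLETED):

asyncio.wait() takes an iterable of Task or Future objects. Passing bare coroutine objects was deprecated in Python 3.8 and is an error since 3.11, so wrap them with asyncio.create_task() first. The loop parameter shown in older documentation was removed in Python 3.10.
It returns a (done, pending) tuple holding the finished and unfinished tasks. Both are sets, so results come back unordered.
return_when sets the condition for returning; the default, ALL_COMPLETED, returns only after every task has finished.

asyncio.gather(*aws, return_exceptions=False):

asyncio.gather() takes coroutine objects, Tasks or Futures as positional arguments.
It returns a list with the result of every awaitable, in the same order as the arguments, so results come back ordered.
With return_exceptions=True, exceptions are not raised but placed in the result list. With False (the default), the first exception is propagated to the code awaiting gather(); the other awaitables are not cancelled and keep running.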
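The difference between the two can be seen in a small sketch that uses asyncio.sleep() in place of real I/O. The `work` coroutine and its delays are illustrative assumptions, not part of the original examples.

```python
import asyncio


async def work(name, delay):
    """Stand-in for an I/O-bound job."""
    await asyncio.sleep(delay)
    return name


async def boom():
    raise ValueError("boom")


async def main():
    # gather(): results follow the argument order, not the completion order.
    ordered = await asyncio.gather(work("slow", 0.2), work("fast", 0.05))

    # wait(): takes Task objects and returns two sets (done, pending).
    tasks = [asyncio.create_task(work("slow", 0.2)),
             asyncio.create_task(work("fast", 0.05))]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    first_done = {t.result() for t in done}
    for t in pending:                 # cancel what is still running
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    # return_exceptions=True: the exception becomes an item of the result list.
    mixed = await asyncio.gather(work("ok", 0), boom(), return_exceptions=True)
    return ordered, first_done, len(pending), mixed


ordered, first_done, n_pending, mixed = asyncio.run(main())
print(ordered, first_done, n_pending, mixed)
```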

loop.run_until_complete()

The argument is a future or a coroutine. A coroutine is first wrapped in a Task, as asyncio.ensure_future() does, and then run until it completes.
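A minimal sketch of both call styles, using an explicitly created loop and a hypothetical `answer` coroutine:

```python
import asyncio


async def answer():
    await asyncio.sleep(0)
    return 42


loop = asyncio.new_event_loop()
try:
    # A bare coroutine is wrapped in a Task internally.
    from_coroutine = loop.run_until_complete(answer())

    # An explicit Task (a Future subclass) is accepted as well.
    task = loop.create_task(answer())
    is_future = isinstance(task, asyncio.Future)
    from_task = loop.run_until_complete(task)
finally:
    loop.close()

print(from_coroutine, from_task, is_future)
```

In new code asyncio.run() is usually enough; managing the loop by hand is mainly useful when the loop has to outlive a single call.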

Example:

import asyncio
import time
import aiohttp
import async_timeout


headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6)'
}

url_list = [f'https://www.xxx.com.cn/{i}.html' for i in range(5400, 5500)]


async def fetch(session, url):
    # async_timeout 4.0+ only supports "async with"
    async with async_timeout.timeout(10):
        async with session.get(url) as response:
            return response.status


async def main():
    # One session shared by all requests, so connections can be reused
    async with aiohttp.ClientSession(headers=headers) as session:
        task_list = [fetch(session, url) for url in url_list]
        # gather() returns a list of results in the same order as task_list
        return await asyncio.gather(*task_list)


if __name__ == '__main__':
    start = time.time()
    status_list = asyncio.run(main())
    print(len([status for status in status_list if status == 200]))
    end = time.time()
    print("cost time:", end - start)

A case study

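The example fetches JSON from Reddit, parses it, and prints the day's top posts from /r/python, /r/programming and /r/C++.

The first function, get_json(), is called by get_reddit_top() and simply issues a GET request to the given URL. Because it is called with await, the event loop can serve other coroutines while the HTTP response is pending. Once the response arrives, the JSON data is returned to get_reddit_top(), which parses and prints it.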

import asyncio
import aiohttp
import json


async def get_json(client, url):
    async with client.get(url) as response:
        assert response.status == 200
        return await response.read()


async def get_reddit_top(subreddit, client):
    r_url = f'https://www.reddit.com/r/{subreddit}/top.json?sort=top&t=day&limit=5'
    data1 = await get_json(client, r_url)

    j = json.loads(data1.decode('utf-8'))
    for i in j['data']['children']:
        score = i['data']['score']
        title = i['data']['title']
        link = i['data']['url']
        print(str(score) + ': ' + title + ' (' + link + ')')

    print('DONE:', subreddit + '\n')


async def main():
    # Create the session inside a coroutine so that it is bound to the running
    # loop and closed properly (ClientSession.close() is a coroutine)
    async with aiohttp.ClientSession() as client:
        await asyncio.gather(
            get_reddit_top('python', client),
            get_reddit_top('programming', client),
            get_reddit_top('C++', client),
        )


if __name__ == '__main__':
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        # Ctrl+C: asyncio.run() cancels the pending tasks and closes the loop
        pass

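Run the code several times and the order in which the subreddits are printed changes slightly. Each await on get_json() hands control back to the event loop, which can then service another HTTP call, so whichever response arrives first is printed first.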
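The "first response wins" behaviour can be reproduced without a network. The sketch below assumes a hypothetical `fake_request` coroutine whose latency is simulated with asyncio.sleep(), and collects results in completion order with asyncio.as_completed():

```python
import asyncio


async def fake_request(name, latency):
    """Stand-in for an HTTP call; `latency` simulates the response time."""
    await asyncio.sleep(latency)
    return name


async def main():
    tasks = [asyncio.create_task(fake_request("python", 0.15)),
             asyncio.create_task(fake_request("programming", 0.05)),
             asyncio.create_task(fake_request("cpp", 0.10))]
    finished = []
    # as_completed() yields awaitables in the order in which they finish.
    for fut in asyncio.as_completed(tasks):
        finished.append(await fut)
    return finished


finished = asyncio.run(main())
print(finished)
```

The tasks are started in one order and finish in another, which is exactly what happens with the real HTTP responses.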

Target: the novel 明朝那些事儿 at http://www.mingchaonaxieshier.com/

import asyncio
import aiohttp
import aiofiles
import requests
from lxml import etree
import os


def get_chapter_info(url):
    resp = requests.get(url)
    resp.encoding = 'utf-8'
    page_source = resp.text
    resp.close()

    result = []

    # parse page_source
    tree = etree.HTML(page_source)
    mulus = tree.xpath("//div[@class='main']/div[@class='bg']/div[@class='mulu']")
    for mulu in mulus:
        trs = mulu.xpath("./center/table/tr")
        title = trs[0].xpath(".//text()")
        chapter_name = "".join(title).strip()

        chapter_hrefs = []
        for tr in trs[1:]:  # the remaining rows hold the chapter links
            hrefs = tr.xpath("./td/a/@href")
            chapter_hrefs.extend(hrefs)

        result.append(
            {"chapter_name": chapter_name, "chapter_hrefs": chapter_hrefs}
        )

    return result


async def download_one(name, href):
    async with aiohttp.ClientSession() as session:
        async with session.get(href) as resp:
            hm = await resp.text(encoding="utf-8", errors="ignore")
            # parse the page
            tree = etree.HTML(hm)
            title = tree.xpath("//div[@class='main']/h1/text()")[0].strip()
            content_list = tree.xpath("//div[@class='main']/div[@class='content']/p/text()")
            content = "\n".join(content_list).strip()
            async with aiofiles.open(f"{name}/{title}.txt", mode="w", encoding="utf-8") as f:
                await f.write(content)

    print(title)

# Option 1
async def download_chapter(chapter):
    chapter_name = chapter['chapter_name']

    if not os.path.exists(chapter_name):
        os.makedirs(chapter_name)
    tasks = []
    for href in chapter['chapter_hrefs']:
        tasks.append(asyncio.create_task(download_one(chapter_name, href)))
    await asyncio.wait(tasks)


# Option 2
async def download_all(chapter_info):
    tasks = []
    for chapter in chapter_info:
        name = chapter['chapter_name']
        if not os.path.exists(name):
            os.makedirs(name)
        for url in chapter['chapter_hrefs']:
            task = asyncio.create_task(download_one(name, url))
            tasks.append(task)

    await asyncio.wait(tasks)


def main():
    url = "http://www.mingchaonaxieshier.com/"
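    # Get the name and the URLs of every chapter
    chapter_info = get_chapter_info(url)

    # The downloads can be run chapter by chapter or all together.
    # Option 1, chapter by chapter:
    # for chapter in chapter_info:
    #     asyncio.run(download_chapter(chapter))

    # Option 2, everything at once: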
    asyncio.run(download_all(chapter_info))


if __name__ == '__main__':
    main()

Example:

import asyncio

import aiohttp  # pip install aiohttp   => requests
import aiofiles  # pip install aiofiles   => open

async def download(url):
    print("starting download", url)
    file_name = url.split("/")[-1]
    # Send the request.
    # When the object after "with" comes from an async library, it almost always needs "async with".
    async with aiohttp.ClientSession() as session:  # think: session = requests.session()
        async with session.get(url) as resp:  # think: resp = session.get()
            # Wait for the server's response. Depending on what is needed:
            # page source
            # page_source = await resp.text(encoding="utf-8")
            # JSON
            # dic = await resp.json()
            # raw bytes
            content = await resp.content.read()
            # Synchronous code also works inside a coroutine, but open()/write()
            # block the event loop while they run:
            # with open(file_name, mode="wb") as f:
            #     f.write(content)
            async with aiofiles.open(file_name, mode="wb") as f:
                await f.write(content)

    print("one image downloaded!")


async def main():
    urls = [
        "https://www.xiurenji.vip/uploadfile/202110/20/1F214426892.jpg",
        "https://www.xiurenji.vip/uploadfile/202110/20/91214426753.jpg"
    ]
    tasks = []
    for url in urls:
        tasks.append(asyncio.create_task(download(url)))
    await asyncio.wait(tasks)


if __name__ == '__main__':
    asyncio.run(main())

4. aiohttp usage examples

Install aiohttp: pip install aiohttp

asyncio implements transports for TCP, UDP, SSL and so on; aiohttp is an HTTP framework built on top of asyncio.

GitHub: https://github.com/aio-libs/aiohttp

Official documentation: https://docs.aiohttp.org/en/stable/

Client

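Why use a client session?

Requests made through the top-level API open a new connection every time and never reuse one. This quickly becomes inefficient as the number of requests to a host grows. The recommended usage is one session per application for all of its requests. More complex cases may call for one session per site, for example one for GitHub and another for the Facebook API. Either way, creating a session for every request is a very bad idea.
A client session uses an HTTP connection pool: when several requests go to the same host, the underlying TCP connection is reused instead of being re-created for each request.

Compared with the top-level API, this can bring significant performance gains:

Lower latency across requests (no handshake).
Less CPU usage and fewer round trips.
Less network congestion.

Official example: client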

import aiohttp
import asyncio


async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('http://python.org') as response:
            print("Status:", response.status)
            print("Content-type:", response.headers['content-type'])

            html = await response.text()
            print("Body:", html[:15], "...")


asyncio.run(main())

In most cases a single session is created and then used for every request.

import asyncio
import aiohttp


async def download(cs=None, url=None, name=None):
    async with cs.get(url) as resp:
        with open(name, mode='w', encoding='utf-8') as f:
            f.write(await resp.text())


async def main_1():
    url_map = {
        'baidu': "https://www.baidu.com",
        'bilibili': "https://www.bilibili.com",
        '163': "https://www.163.com"
    }
    async with aiohttp.ClientSession() as cs:
        tasks = [asyncio.create_task(download(cs, v, k)) for k, v in url_map.items()]
        await asyncio.wait(tasks)


async def main_2():
    url_map = {
        'baidu': "https://www.baidu.com",
        'bilibili': "https://www.bilibili.com",
        '163': "https://www.163.com"
    }
    cs = aiohttp.ClientSession()
    tasks = [asyncio.create_task(download(cs, v, k)) for k, v in url_map.items()]
    await asyncio.wait(tasks)
    await cs.close()


if __name__ == "__main__":
    asyncio.run(main_1())
    # asyncio.run(main_2())

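Custom cookies are best set on the ClientSession rather than in session.get().
Custom headers can be passed to session.get(), just as with requests.
The default total timeout is 5 minutes; it can be changed with the timeout argument, either on the session or in session.get().
Proxies are also configured in session.get().
To disable SSL certificate verification (verify_ssl is deprecated in aiohttp 3; use ssl=False):
async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=False)) as session:

Creating an aiohttp.ClientSession() for every request quickly becomes inefficient as the number of requests to a host grows: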

import asyncio
import aiohttp


async def download(url, name):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            with open(name, mode='w', encoding='utf-8') as f:
                f.write(await resp.text())


async def main():
    url_map = {
        'baidu': "https://www.baidu.com",
        'bilibili': "https://www.bilibili.com",
        '163': "https://www.163.com"
    }
    tasks = [asyncio.create_task(download(v, k)) for k, v in url_map.items()]
    await asyncio.wait(tasks)


if __name__ == "__main__":
    asyncio.run(main())

Top-level API example (not recommended): aiohttp.request. It quickly becomes inefficient as the number of requests to a host grows.

import asyncio
import aiohttp


async def aiohttp_requests(url):  # one-off request through aiohttp's top-level API
    async with aiohttp.request("GET", url=url) as response:
        return await response.text(encoding='UTF-8')


async def main():  # entry-point coroutine
    url = 'https://www.baidu.com'
    html = await aiohttp_requests(url)  # await the coroutine to get its result
    print(html)


if __name__ == '__main__':
    asyncio.run(main())

Server

Official example: server

from aiohttp import web


async def handle(request):
    name = request.match_info.get('name', "Anonymous")
    text = "Hello, " + name
    return web.Response(text=text)


app = web.Application()
app.add_routes([web.get('/', handle),
                web.get('/{name}', handle)])

if __name__ == '__main__':
    web.run_app(app)

Example: an HTTP server that handles two URLs:

import asyncio

from aiohttp import web


async def index(request):
    await asyncio.sleep(0.5)
    return web.Response(body=b'<h1>Index</h1>', content_type='text/html')


async def hello(request):
    await asyncio.sleep(0.5)
    text = '<h1>hello, %s!</h1>' % request.match_info['name']
    return web.Response(body=text.encode('utf-8'), content_type='text/html')


def init():
    # web.Application() no longer takes a loop argument, and
    # app.make_handler() is deprecated in favour of run_app()/AppRunner
    app = web.Application()
    app.router.add_route('GET', '/', index)
    app.router.add_route('GET', '/hello/{name}', hello)
    return app


if __name__ == '__main__':
    # run_app() sets up the event loop and serves until interrupted
    web.run_app(init(), host='127.0.0.1', port=8000)

Comparing requests + ThreadPoolExecutor with aiohttp

import requests
import timeit
from concurrent.futures import ThreadPoolExecutor
import aiohttp
import asyncio

session = requests.session()
url = "https://www.baidu.com"

request_count = 50


def req(url: str):
    resp = requests.get(url)
    if 200 != resp.status_code:
        print(f'status_code: {resp.status_code}')


def requests_test():
    """
    Group 1: sequential loop
    :return:
    """
    for i in range(request_count):
        req(url)


def pool_requests_test():
    """
    Group 2: thread pool
    :return:
    """
    url_list = [url for _ in range(request_count)]
    with ThreadPoolExecutor(max_workers=20) as pool:
        pool.map(req, url_list)


async def fetch(session, url: str):
    async with session.get(url) as resp:
        if 200 != resp.status:
            print(f'status_code: {resp.status}')


async def start():
    # One session (and one connection pool) shared by all requests
    connector = aiohttp.TCPConnector(ssl=False)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [asyncio.create_task(fetch(session, url)) for _ in range(request_count)]
        await asyncio.wait(tasks)


def aiohttp_test():
    """
    Group 3: aiohttp
    :return:
    """
    asyncio.run(start())


if __name__ == '__main__':
    # sequential loop
    print(timeit.timeit(stmt=requests_test, number=1))
    # thread pool
    print(timeit.timeit(stmt=pool_requests_test, number=1))
    # aiohttp
    print(timeit.timeit(stmt=aiohttp_test, number=1))

Using asyncio.Queue

import asyncio
import aiohttp

template = 'http://exercise.kingname.info/exercise_middleware_ip/{page}'


async def get(session, queue):
    while True:
        try:
            page = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        url = template.format(page=page)
        # "async with" releases the connection back to the pool when done
        async with session.get(url) as resp:
            print(f'session id ---> {id(session)}')
            print(await resp.text(encoding='utf-8'))


async def main():
    async with aiohttp.ClientSession() as session:
        queue = asyncio.Queue()
        for page in range(1000):
            queue.put_nowait(page)
        # 100 workers drain the same queue; asyncio.wait() needs Task objects
        tasks = [asyncio.create_task(get(session, queue)) for _ in range(100)]
        await asyncio.wait(tasks)


asyncio.run(main())
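The crawler above fills the queue before the workers start, so get_nowait() is enough. When items may arrive later, a common alternative is to let the workers block on queue.get() and to wait with queue.join(). The sketch below shows that pattern with asyncio.sleep() standing in for the HTTP request; the worker names and counts are illustrative assumptions.

```python
import asyncio


async def worker(name, queue, results):
    while True:
        page = await queue.get()        # waits until an item is available
        try:
            await asyncio.sleep(0.01)   # stand-in for the HTTP request
            results.append((name, page))
        finally:
            queue.task_done()           # tell join() this item is finished


async def main(pages=20, workers=4):
    queue = asyncio.Queue()
    for page in range(pages):
        queue.put_nowait(page)
    results = []
    tasks = [asyncio.create_task(worker(f"w{i}", queue, results))
             for i in range(workers)]
    await queue.join()                  # returns once every item is task_done()
    for t in tasks:                     # the workers loop forever, so cancel them
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return results


results = asyncio.run(main())
print(len(results))
```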

aiohttp with aiomultiprocess (asyncio + multiprocessing)

import asyncio
import aiohttp
import time
from aiomultiprocess import Pool

start = time.time()


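async def get(url):
    # "async with" closes the session and releases the response;
    # ClientSession.close() is a coroutine and would have to be awaited otherwise
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()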


async def main():
    url = 'http://127.0.0.1:5000'
    urls = [url for _ in range(100)]
    async with Pool() as pool:
        result = await pool.map(get, urls)
        return result


if __name__ == '__main__':
    asyncio.run(main())
    end = time.time()
    print('Cost time:', end - start)

Running a coroutine in a child process

import asyncio
from aiohttp import request
from aiomultiprocess import Process


async def put(url, params):
    async with request("PUT", url, params=params) as response:
        pass


async def main():
    p = Process(target=put, args=("https://jreese.sh", {}))
    await p


if __name__ == "__main__":
    asyncio.run(main())

If you want the coroutine's return value, use Worker:

import asyncio
from aiohttp import request
from aiomultiprocess import Worker


async def get(url):
    async with request("GET", url) as response:
        return await response.text("utf-8")


async def main():
    p = Worker(target=get, args=("https://jreese.sh",))
    response = await p


if __name__ == "__main__":
    asyncio.run(main())

If you need a managed pool of worker processes, use Pool:

import asyncio
from aiohttp import request
from aiomultiprocess import Pool


async def get(url):
    async with request("GET", url) as response:
        return await response.text("utf-8")


async def main():
    urls = ["https://jreese.sh", "https://www.baidu.com"]
    async with Pool() as pool:
        result = await pool.map(get, urls)
        print(result)


if __name__ == "__main__":
    asyncio.run(main())

Limiting concurrency and writing to MongoDB asynchronously

import asyncio
import aiohttp
import logging
import json

# Motor provides a coroutine-based API for non-blocking access to MongoDB.
from motor.motor_asyncio import AsyncIOMotorClient

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s: %(message)s'
)

INDEX_URL = 'https://spa5.scrape.center/api/book/?limit=18&offset={offset}'
DETAIL_URL = 'https://spa5.scrape.center/api/book/{id}'
PAGE_SIZE = 18  # items per page
PAGE_NUMBER = 3  # number of pages
CONCURRENCY = 5  # concurrency limit (semaphore size)

semaphore = asyncio.Semaphore(CONCURRENCY)  # limits the number of concurrent requests
session = None

MONGO_CONNECTION_STRING = 'mongodb://localhost:27017'
MONGO_DB_NAME = 'books'
MONGO_COLLECTION_NAME = 'books'

client = AsyncIOMotorClient(MONGO_CONNECTION_STRING)
db = client[MONGO_DB_NAME]
collection = db[MONGO_COLLECTION_NAME]


# Request a URL and return the JSON payload
async def scrape_api(url):
    async with semaphore:  # hold the semaphore for the duration of the request
        try:
            logging.info('scraping %s', url)
            async with session.get(url) as response:  # used much like requests
                return await response.json()
        except aiohttp.ClientError:
            logging.error('error occurred while scraping %s', url, exc_info=True)


# Scrape one index page
async def scrape_index(page):
    url = INDEX_URL.format(offset=PAGE_SIZE * (page - 1))
    return await scrape_api(url)


# Scrape one detail page
async def scrape_detail(id):
    url = DETAIL_URL.format(id=id)
    data = await scrape_api(url)
    await save_data(data)


# Save one record
async def save_data(data):
    logging.info('saving data %s', data)
    if data:
        return await collection.update_one({'id': data.get('id')}, {'$set': data}, upsert=True)


async def main():
    global session
    session = aiohttp.ClientSession()  # client session shared by all requests
    # Index pages: asyncio.ensure_future() wraps each coroutine in a Task
    scrape_index_tasks = [asyncio.ensure_future(scrape_index(page)) for page in range(1, PAGE_NUMBER + 1)]
    json_data = await asyncio.gather(*scrape_index_tasks)
    # logging.info('results %s', json.dumps(json_data, ensure_ascii=False, indent=2))
    ids = []
    for index_data in json_data:
        if not index_data:
            continue
        for item in index_data.get('results'):
            ids.append(item.get('id'))
    # Detail pages
    scrape_detail_tasks = [asyncio.ensure_future(scrape_detail(id)) for id in ids]
    await asyncio.wait(scrape_detail_tasks)
    await session.close()


if __name__ == '__main__':
    loop = asyncio.get_event_loop()  # get the event loop
    loop.run_until_complete(main())  # run until main() completes
    # asyncio.run(main())  # on Python 3.7+ this can replace the two lines above

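On Linux the default limit on open files is 1024, and on Windows the select() limit is 509. When more files or sockets than that are handled asynchronously at once, the program fails with ValueError: too many file descriptors in select(). asyncio.Semaphore(100) can be used to cap the concurrency: with the semaphore in place, the number of tasks running at the same time is bounded, which in effect rate-limits aiohttp.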
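The effect of the semaphore can be measured without any network traffic. The sketch below is an illustration with assumed names (`limited_job`, a `state` dictionary): each job records how many jobs are inside the semaphore at the same time, and the peak never exceeds the limit.

```python
import asyncio


async def limited_job(sem, state):
    async with sem:                      # at most `limit` jobs pass this point
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)        # stand-in for the real request
        state["running"] -= 1


async def main(jobs=50, limit=5):
    sem = asyncio.Semaphore(limit)       # created inside the running loop
    state = {"running": 0, "peak": 0}
    await asyncio.gather(*(limited_job(sem, state) for _ in range(jobs)))
    return state["peak"]


peak = asyncio.run(main())
print(peak)
```

Creating the semaphore inside main() avoids binding it to a different event loop on older Python versions.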

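Speeding things up with uvloop

uvloop is built on libuv, a high-performance asynchronous I/O library written in C. It replaces asyncio's default event loop and can make asynchronous I/O faster still.

Using uvloop is simple: before obtaining the event loop, call the following to set asyncio's event loop policy to uvloop's.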

asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
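A possible way to make uvloop optional is sketched below. It assumes uvloop 0.18 or newer, which provides uvloop.run() as a counterpart to asyncio.run(); when uvloop is missing or older, the sketch falls back to the default asyncio loop.

```python
import asyncio

try:
    import uvloop                        # optional third-party dependency
    run = getattr(uvloop, "run", None)   # uvloop.run() exists in uvloop 0.18+
except ImportError:
    uvloop = None
    run = None

if run is None:
    run = asyncio.run                    # fall back to the default event loop


async def main():
    await asyncio.sleep(0)
    # Report which implementation is actually driving the coroutine.
    return type(asyncio.get_running_loop()).__module__


loop_module = run(main())
print(loop_module)
```

The application code stays the same either way; only the entry point changes.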

Bypassing JA3 fingerprint anti-crawling measures with aiohttp and Scrapy

Article: https://mp.weixin.qq.com/s?__biz=MzI2MzEwNTY3OQ==&mid=2648981871&idx=1&sn=109d482a9bd656bd6be93870c155f52d

aiohttp and aiomysql

"""
asyncio itself does not speak HTTP, so it offers no asynchronous equivalent of requests.
There is no need to implement HTTP by hand: aiohttp already does it.
aiohttp also ships a high-concurrency web server.
sanic is another high-concurrency web server; its performance is claimed to rival Go.

Crawler built on the aiohttp client:
asyncio crawling, de-duplication and storage (through the asynchronous driver aiomysql)

Target: www.jobbole.com
Strategy: collect every URL on a page and decide whether it is an article detail page
"""
import aiohttp
import asyncio
import re
import aiomysql
from pyquery import PyQuery

stopping = False  # flag that tells the consumer loop to stop
start_url = "http://www.jobbole.com"
waiting_urls = []  # a list works; a queue would work too
seen_urls = set()  # crawled URLs; a set stops being practical at hundreds of millions of entries

# allow 3 concurrent requests
sem = asyncio.Semaphore(3)


# Fetch a page and return its HTML (None on failure)
async def fetch(url, session):
    # Reuse the session passed in by the caller instead of opening a new one per request
    async with sem:
        await asyncio.sleep(1)
        try:
            async with session.get(url) as resp:
                print("url status:{}".format(resp.status))
                if resp.status in [200, 201]:
                    return await resp.text()
        except Exception as e:
            print(e)


# Crawl strategy: parse the page and queue the URLs worth crawling
def extract_urls(html):
    urls = []
    if not html:  # fetch() failed, nothing to parse
        return urls
    pq = PyQuery(html)
    for link in pq.items("a"):
        url = link.attr("href")
        if url and url.startswith("/caijing") and url not in seen_urls:
            urls.append(url)
            waiting_urls.append(url)
    return urls


# Fetch a non-article page and queue the URLs found on it
async def init_urls(url, session):
    html = await fetch(url, session)
    seen_urls.add(url)
    # The return value is not needed: extract_urls() appends to waiting_urls
    extract_urls(html)


# Fetch an article page, parse it and store it
async def article_handler(url, session, pool):
    html = await fetch(url, session)
    seen_urls.add(url)
    extract_urls(html)
    if not html:
        return

    pq = PyQuery(html)
    title = pq("title").text()
    # pool.acquire() takes a connection from the pool
    async with pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute("SELECT 42;")
            # The database and table can be created beforehand, e.g. with Navicat.
            # A parameterised query avoids SQL injection through the title.
            insert_sql = "insert into article_test(title) values(%s)"
            await cur.execute(insert_sql, (title,))


# Consumer: keeps taking URLs from waiting_urls and schedules a coroutine for each
async def consumer(pool):
    async with aiohttp.ClientSession() as session:
        while not stopping:
            # Wait while the list is empty, otherwise pop() would raise
            if len(waiting_urls) == 0:
                await asyncio.sleep(0.5)
                continue
            url = waiting_urls.pop()
            if url in seen_urls:
                continue
            print("start get url:{}".format(url))
            # Article pages are parsed and stored; other pages are only scanned for links
            if re.match(r'http://.*?jobbole.com/\d+/', url):
                asyncio.ensure_future(article_handler(url, session, pool))
                await asyncio.sleep(30)  # throttle requests
            else:
                asyncio.ensure_future(init_urls(url, session))


async def main():
    # charset must be set to insert Chinese text; autocommit must be on for inserts to be committed
    pool = await aiomysql.create_pool(
        host='127.0.0.1', port=3306,
        user='root', password='', db='aiomysql_test',
        charset="utf8", autocommit=True
    )
    # "async with" closes the session automatically once the start page is processed
    async with aiohttp.ClientSession() as session:
        html = await fetch(start_url, session)
        seen_urls.add(start_url)
        # The return value is not needed: extract_urls() appends to waiting_urls
        extract_urls(html)
    # Runs until `stopping` is set to True
    await consumer(pool)


if __name__ == "__main__":
    # Python 3.10+: the module-level semaphore binds to the loop on first use
    asyncio.run(main())

Using aiohttp in Scrapy

To enable asyncio in Scrapy, add one extra setting to settings.py:

TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'

Article: https://mp.weixin.qq.com/s?__biz=MzI2MzEwNTY3OQ==&mid=2648978965&idx=1&sn=9cf95229f79bd544ec4565ca34283f69

Multiple processes + coroutines

# -*- coding: utf-8 -*-
# multiprocessing.dummy would create threads; real processes need multiprocessing
from multiprocessing import Process

import time
import asyncio
from aiohttp import ClientSession, ClientTimeout
from loguru import logger


async def fetch_all() -> list:
    urls = ["https://www.baidu.com/", "https://www.so.com"] * 4
    tasks = [asyncio.ensure_future(request_with_aio(url)) for url in urls]
    return await asyncio.gather(*tasks)


def get_data_index() -> list:
    # Each process gets its own event loop; asyncio.run() creates and closes it
    return asyncio.run(fetch_all())


async def request_with_aio(url):
    logger.info('request begin time:%s' % time.time())
    async with ClientSession() as session:
        # ssl=False replaces the deprecated verify_ssl=False
        async with session.get(url, timeout=ClientTimeout(total=10), ssl=False) as response:
            response = await response.read()
            logger.info('request end time:%s' % time.time())
            return response


if __name__ == '__main__':
    processes = [Process(target=get_data_index) for _ in range(2)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()