[Advanced] --- Practical Examples of Multithreading, Multiprocessing, and Async IO

Latest recommended article published on 2023-07-17 17:25:17

[Account deactivated]

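Tags: python, web crawler, programming language

High-quality articles from around the web, reposted and collected here; they do not represent my own position.

Article link: https://blog.csdn.net/lyshark_lyshark/article/details/125848137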


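[Advanced] --- Practical Examples of Multithreading, Multiprocessing, and Async IO: https://blog.csdn.net/lu8000/article/details/82315576

Python crawlers and concurrency (serial, multithreading, multiprocessing, async IO): https://www.cnblogs.com/fat39/archive/2004/01/13/9044474.html

Python concurrency summary: multithreading, multiprocessing, async IO: https://www.cnblogs.com/junmoxiao/p/11948993.html

asyncio --- Asynchronous I/O, official documentation: https://docs.python.org/zh-cn/3.10/library/asyncio.html
Concurrent programming with asyncio async IO: https://zhuanlan.zhihu.com/p/158641367

Async Python libraries that support asyncio: https://github.com/aio-libs

Zhihu column https://zhuanlan.zhihu.com/zarten, "Coroutines and async IO (asyncio) in Python explained" ( https://zhuanlan.zhihu.com/p/59621713 )

asyncio: asynchronous I/O, event loops, and concurrency tools: https://www.cnblogs.com/sidianok/p/12210857.html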

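When writing a crawler, most of the time is spent waiting on IO requests. In single-process, single-thread mode every URL request blocks until it returns, which slows down the crawl as a whole. The code below assumes Python 3.

httpie: a detailed HTTPie usage guide: https://zhuanlan.zhihu.com/p/45093545
grequests (Requests + Gevent): https://github.com/kennethreitz/grequests
gevent, a high-concurrency networking library: http://www.gevent.org/
twisted, an event-driven networking engine framework: https://twistedmatrix.com/trac/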

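I. Multithreading and Multiprocessing
        1. Synchronous execution
        2. Multithreaded execution
        3. Multithreading + callbacks
        4. Multiprocess execution
        5. Multiprocessing + callbacks

II. Asynchronous
        1. asyncio example 1
           asyncio example 2
           Async programming in Python with asyncio (a million concurrent connections)
           Learning the Python high-concurrency module asyncio
        2. asyncio + aiohttp
        3. asyncio + requests
        4. gevent + requests
        5. grequests
        6. Twisted example
        7. Tornado
        8. More on Twisted
        9. The most impressive async IO module ever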

I. Multithreading and Multiprocessing

1. Synchronous execution

Example 1 (synchronous execution):

import requests
import time
from lxml import etree

urls = [
    'https://blog.csdn.net/Jmilk/article/details/103218919',
    'https://blog.csdn.net/stven_king/article/details/103256724',
    'https://blog.csdn.net/csdnnews/article/details/103154693',
    'https://blog.csdn.net/dg_lee/article/details/103951021',
    'https://blog.csdn.net/m0_37907797/article/details/103272967',
    'https://blog.csdn.net/zzq900503/article/details/49618605',
    'https://blog.csdn.net/weixin_44339238/article/details/103977138',
    'https://blog.csdn.net/dengjin20104042056/article/details/103930275',
    'https://blog.csdn.net/Mind_programmonkey/article/details/103940511',
    'https://blog.csdn.net/xufive/article/details/102993570',
    'https://blog.csdn.net/weixin_41010294/article/details/104009722',
    'https://blog.csdn.net/yunqiinsight/article/details/103137022',
    'https://blog.csdn.net/qq_44210563/article/details/102826406',
]


def get_title(url: str):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36'
    }
    r = requests.get(url, headers=headers)
    if 200 == r.status_code:
        title = etree.HTML(r.content).xpath('//h1[@class="title-article"]/text()')[0]
        print(title)
    else:
        print(f'[status_code:{r.status_code}]:{r.url}')


def main():
    for url in urls:
        get_title(url)


if __name__ == '__main__':
    start = time.time()
    main()
    print(f'cost time: {time.time() - start}s')

A synchronous call using the httpx module (httpx supports both synchronous and asynchronous use)

import time
import httpx


def make_request(client):
    resp = client.get('https://httpbin.org/get')
    result = resp.json()
    print(f'status_code : {resp.status_code}')
    assert 200 == resp.status_code


def main():
    # Reuse one client so connections are pooled across requests
    with httpx.Client() as session:
        # 10 calls
        for _ in range(10):
            make_request(session)


if __name__ == '__main__':
    # start
    start = time.time()
    main()
    # end
    end = time.time()
    print(f'sync: sent 10 requests, elapsed: {end - start}')

2. Multithreaded execution (thread pool)

from concurrent.futures import ThreadPoolExecutor
import requests


def fetch_sync(r_url):
    response = requests.get(r_url)
    print(f"{r_url} ---> {response.status_code}")


url_list = [
    'https://www.baidu.com',
    'https://www.bing.com'
]
pool = ThreadPoolExecutor(5)
for url in url_list:
    pool.submit(fetch_sync, url)
pool.shutdown(wait=True)
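
The same pool can also be driven with a with-block and executor.map, which returns results in input order. Below is a minimal sketch under one stated assumption: fake_fetch is a hypothetical stand-in for the real requests.get call, used here only so the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Hypothetical stand-in for requests.get: sleeps briefly, returns (url, status)."""
    time.sleep(0.1)  # simulate network latency
    return url, 200


def fetch_all(urls, workers=5):
    # Leaving the with-block calls shutdown(wait=True) automatically.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input iterable.
        return list(pool.map(fake_fetch, urls))


if __name__ == '__main__':
    for url, status in fetch_all(['https://www.baidu.com', 'https://www.bing.com']):
        print(f'{url} ---> {status}')
```

Compared with submit, map is convenient when every call takes the same kind of argument and the results are wanted in input order.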

3. Multithreading + callbacks

from concurrent.futures import ThreadPoolExecutor
import requests


def fetch_sync(r_url):
    response = requests.get(r_url, verify=False)
    return response


def callback(future):
    resp = future.result()
    print(f"{resp.url} ---> {resp.status_code}")


url_list = [
    'https://www.baidu.com',
    'https://www.bing.com'
]
pool = ThreadPoolExecutor(5)
for url in url_list:
    v = pool.submit(fetch_sync, url)
    v.add_done_callback(callback)
pool.shutdown(wait=True)

4. Multiprocess execution

import requests
from concurrent import futures


def fetch_sync(r_url):
    response = requests.get(r_url)
    return response


if __name__ == '__main__':
    url_list = [
        'https://www.baidu.com',
        'https://www.bing.com'
    ]
    with futures.ProcessPoolExecutor(5) as executor:
        res = [executor.submit(fetch_sync, url) for url in url_list]
    print(res)

Example:

import requests
from concurrent import futures
import time
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def fetch_sync(args):
    i, r_url = args
    print(f'index : {i}')
    response = requests.get(r_url, verify=False)
    time.sleep(2)
    return response.status_code


def callback(future):
    print(future.result())


if __name__ == '__main__':
    # url_list = ['https://www.github.com', 'https://www.bing.com']
    url = 'https://www.github.com'
    with futures.ProcessPoolExecutor(5) as executor:
        for index in range(1000):
            v = executor.submit(fetch_sync, (index, url))
            v.add_done_callback(callback)
    pass

5. Multiprocessing + callbacks

import requests
from concurrent import futures


def fetch_sync(r_url):
    response = requests.get(r_url, verify=False)
    return response


def callback(future):
    print(future.result())


if __name__ == '__main__':
    url_list = ['https://www.github.com', 'https://www.bing.com']
    with futures.ProcessPoolExecutor(5) as executor:
        for url in url_list:
            v = executor.submit(fetch_sync, url)
            v.add_done_callback(callback)
    pass

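II. Asynchronous

Coroutines can be added to an event loop dynamically; they do not all have to be known up front. Coroutines run only inside an event loop.
By default, asyncio.get_event_loop() returns a selector-based event loop on Unix-like systems (on Windows, Python 3.8+ defaults to a proactor-based loop).
The default loop returned by asyncio.get_event_loop() belongs to the main thread. In recent Python versions, calling it when no event loop is set or running is deprecated, and asyncio.run() is the recommended entry point.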
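
To illustrate adding coroutines while the loop is already running, here is a minimal sketch. The names worker and log are made up for this example, and asyncio.run is used as the entry point (Python 3.7+).

```python
import asyncio


async def worker(name, delay, log):
    await asyncio.sleep(delay)
    log.append(name)


async def main():
    log = []
    first = asyncio.create_task(worker('first', 0.2, log))
    await asyncio.sleep(0.01)  # the loop is already running at this point
    # A second coroutine is handed to the loop while it is running.
    second = asyncio.create_task(worker('second', 0.01, log))
    await asyncio.gather(first, second)
    return log


if __name__ == '__main__':
    print(asyncio.run(main()))
```

The second worker is added later but sleeps for less time, so it finishes before the first one.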

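Reference, python asyncio coroutines: https://blog.csdn.net/dashoumeixi/article/details/81001681

Tip: the usual arrangement is one event loop per thread. The reason is that running a loop blocks the thread it runs in, and most loop methods are not thread-safe, so each loop should be driven by a single thread.

To use several event loops, create a thread and call the following inside it: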

lp = asyncio.new_event_loop() # create a new event loop
asyncio.set_event_loop(lp)    # make it the current thread's event loop

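Core idea: it all comes down to two keywords, yield from / await. They run (drive) a coroutine and at the same time give up control of the current function so the event loop can run the next task.

How yield from works: https://blog.csdn.net/dashoumeixi/article/details/84076812

To understand asyncio coroutines, first get comfortable with generators. If generators are still fuzzy, for example if "yield from <generator object>" does not make sense yet, read this first: Python generators and yield from: https://blog.csdn.net/dashoumeixi/article/details/80936798

A coroutine (generator) does not run by itself. There are two ways to get it running:

1. await / yield from the coroutine. This pair waits for the coroutine to finish.

2. asyncio.ensure_future(coroutine) (very old code used asyncio.async, which is no longer usable since async became a reserved keyword in Python 3.7). This schedules the coroutine without waiting for it to finish.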
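
The difference between the two ways can be seen in elapsed time. The sketch below is illustrative only; job, sequential, and scheduled are names invented for this example.

```python
import asyncio
import time


async def job(delay):
    await asyncio.sleep(delay)
    return delay


async def sequential():
    # Way 1: await each coroutine directly; the second starts only after the first ends.
    start = time.perf_counter()
    await job(0.2)
    await job(0.2)
    return time.perf_counter() - start


async def scheduled():
    # Way 2: ensure_future schedules both right away; the awaits only collect results.
    start = time.perf_counter()
    t1 = asyncio.ensure_future(job(0.2))
    t2 = asyncio.ensure_future(job(0.2))
    await t1
    await t2
    return time.perf_counter() - start


if __name__ == '__main__':
    print(f'sequential: {asyncio.run(sequential()):.2f}s')
    print(f'scheduled:  {asyncio.run(scheduled()):.2f}s')
```

The sequential version takes roughly the sum of the delays, while the scheduled version takes roughly the longest single delay.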

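Notes:

1. A coroutine is an enhanced generator (it adds send and the ability of yield to receive values). In asyncio, coroutines differ from ordinary generator objects as follows:
asyncio coroutines: a generator-based coroutine must not yield arbitrary values (the Task raises RuntimeError if it does) and delegates with yield from. Inside an async def function, use await: yield from is a SyntaxError there, and a plain yield turns the function into an async generator (Python 3.6+).
Ordinary generators: can use yield, yield from, or both, and must use at least one. The two are the same mechanism; the only difference is the restriction on yield in coroutines.

2. In asyncio, a coroutine that is scheduled on the loop is automatically wrapped in a Task (a subclass of Future). Underneath it is still driven like a generator, so you can yield from / await a Task or Future.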
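
The distinctions between generators, coroutines, and async generators can be checked with the standard inspect module. This is a small sketch; the function names are invented for illustration.

```python
import inspect


def gen():
    yield 1


async def coro():
    return 1


async def agen():
    yield 1  # allowed since Python 3.6: this makes an async generator, not a coroutine


def yield_from_in_async_def_is_rejected():
    """Return True if compiling 'yield from' inside async def raises SyntaxError."""
    source = "async def f():\n    yield from []\n"
    try:
        compile(source, '<demo>', 'exec')
    except SyntaxError:
        return True
    return False


if __name__ == '__main__':
    print(inspect.isgeneratorfunction(gen))
    print(inspect.iscoroutinefunction(coro))
    print(inspect.isasyncgenfunction(agen))
    print(yield_from_in_async_def_is_rejected())
```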

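Basic flow:

1. Define a coroutine (async def, or in older code a function decorated with @asyncio.coroutine; the decorator was removed in Python 3.11).
2. Call that function to get a coroutine object. [Do not use yield here unless you are writing your own async module; in the end everything is built on yield-based generator functions.] Schedule the coroutine with asyncio.ensure_future (asyncio.async in very old code). This step queues the coroutine to start running and returns a Task object; Task is a subclass of Future. (This step is optional: any coroutine object handed to the event loop is wrapped in a Task automatically.)
3. Get an event loop with asyncio.get_event_loop(); by default this loop belongs to the main thread.
4. Wait for the event loop to schedule the coroutine.

The later examples pay particular attention to as_completed and include its source. A quick summary first:

1. Each iteration of as_completed returns a coroutine.
2. Internally, that coroutine takes the next finished Future from a Queue.
3. We then await that coroutine.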
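
The three as_completed steps above can be sketched as follows. The job coroutine and its delays are made up for this example.

```python
import asyncio


async def job(delay):
    await asyncio.sleep(delay)
    return delay


async def main():
    results = []
    coros = [job(0.3), job(0.1), job(0.2)]
    # Each iteration yields an awaitable; awaiting it gives the next finished result.
    for next_done in asyncio.as_completed(coros):
        results.append(await next_done)
    return results


if __name__ == '__main__':
    print(asyncio.run(main()))
```

The results arrive in completion order (shortest delay first), not in the order the coroutines were listed.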

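Example 1:

import asyncio

"""
 First example.
 Not useful on its own.
 Note: a coroutine is used the same way as a generator: the object only exists after the function is called.
"""


async def func():
    print('hi')

lp = asyncio.get_event_loop()  # get the event loop

# Hand it to the event loop. Note: func(), not func. Only the call produces a coroutine object.
lp.run_until_complete(func())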

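Example 2:

import asyncio

"""
 Define a function with async def (the newer syntax) and return a value.
 asyncio.sleep simulates blocking IO; await is the equivalent of yield from.
 await / yield from gives up control of the function (suspends it) so the event loop
 can run the next task while waiting for the awaited coroutine to finish.
"""


async def func(i):
    print('start')
    await asyncio.sleep(i)  # give up control
    print('done')
    return i


co = func(2)  # create the coroutine object
print(co)
lp = asyncio.get_event_loop()     # get the event loop
task = asyncio.ensure_future(co)  # schedule it
lp.run_until_complete(task)       # wait for completion
print(task.result())              # fetch the result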

Adding a callback

Example 1:

import asyncio

"""
 Add a callback with add_done_callback
"""


async def func(i):
    print('start')
    await asyncio.sleep(i)
    return i


def call_back(v):
    print('callback , arg:', v, 'result:', v.result())


if __name__ == '__main__':
    co = func(2)  # create the coroutine object
    lp = asyncio.get_event_loop()  # get the event loop
    # task = asyncio.run_coroutine_threadsafe(co, lp)  # schedule from another thread
    task = asyncio.ensure_future(co)  # schedule it
    task.add_done_callback(call_back)  # add the callback
    lp.run_until_complete(task)  # wait
    print(task.result())  # fetch the result

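How sub-coroutine calls work (diagram)

The official documentation gives an example built around a print_sum coroutine that awaits a compute coroutine.

The diagram for that example shows the following:

1 While the event loop is running, the Task is pending and hands control to the delegating generator print_sum.

2 The delegating generator print_sum sets up a two-way channel between the Task and the sub-generator, calls the sub-generator compute, and passes the values along.

3 Through that channel, the sub-generator compute reports its current state (suspended) back to the Task, and the Task tells the loop that its work is not finished yet.

4 The loop keeps checking the Task, and the Task checks through the channel whether the sub-generator has finished.

5 When the sub-generator finishes, it raises an exception (StopIteration) carrying the computed value to the delegating generator, and the generator is closed.

6 The delegating generator in turn raises the exception to the Task, which is then finished.

7 The loop stops.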
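
The official example itself is not reproduced in this copy. The names print_sum and compute match the chained-coroutine example from older versions of the asyncio documentation, so the following is a reconstruction under that assumption, written with async / await and a shortened sleep.

```python
import asyncio


async def compute(x, y):
    # Sub-coroutine: suspends at the sleep, then returns a value to its caller.
    print(f'Compute {x} + {y} ...')
    await asyncio.sleep(0.1)
    return x + y


async def print_sum(x, y):
    # Delegating coroutine: awaiting compute() passes control down and
    # receives the return value once compute() has finished.
    result = await compute(x, y)
    print(f'{x} + {y} = {result}')
    return result


if __name__ == '__main__':
    asyncio.run(print_sum(1, 2))
```

While compute is suspended in the sleep, the Task wrapping print_sum stays pending and the loop is free to run other work.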

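call_soon, call_at, call_later, call_soon_threadsafe

call_soon: runs a callback on the next iteration of the loop
call_at: runs a callback at a given absolute time on the loop's clock (loop.time())
call_later: runs a callback after a given delay in seconds, for example 10 s from now
call_soon_threadsafe: the thread-safe variant of call_soon, for scheduling a callback from another thread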

import asyncio

import time


def call_back(str_var, loop):
    print("success time {}".format(str_var))


def stop_loop(str_var, loop):
    time.sleep(str_var)
    loop.stop()


# call_soon, call_at, call_later
if __name__ == "__main__":
    event_loop = asyncio.get_event_loop()
    event_loop.call_soon(call_back, 'runs as soon as the loop starts', event_loop)
    now = event_loop.time()  # current time on the loop's clock
    event_loop.call_at(now + 2, call_back, 2, event_loop)
    event_loop.call_at(now + 1, call_back, 1, event_loop)
    event_loop.call_at(now + 3, call_back, 3, event_loop)
    event_loop.call_later(6, call_back, "runs after 6s", event_loop)
    # event_loop.call_soon_threadsafe(stop_loop, event_loop)
    event_loop.run_forever()
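
call_soon_threadsafe only appears as a commented-out line above. Here is a minimal sketch of its intended use, scheduling callbacks from a second thread and stopping the loop from there. The names run_demo and from_other_thread are invented for this example.

```python
import asyncio
import threading
import time


def run_demo():
    loop = asyncio.new_event_loop()
    events = []

    def from_other_thread():
        time.sleep(0.1)  # blocking is fine here: this is not the loop's thread
        # Thread-safe scheduling: queues the callback and wakes the loop up.
        loop.call_soon_threadsafe(events.append, 'scheduled from worker thread')
        loop.call_soon_threadsafe(loop.stop)

    worker = threading.Thread(target=from_other_thread)
    worker.start()
    loop.run_forever()  # returns once loop.stop() has run
    worker.join()
    loop.close()
    return events


if __name__ == '__main__':
    print(run_demo())
```

Calling plain call_soon from the worker thread would not reliably wake the loop, which is why the thread-safe variant exists.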

Event loops in different threads

The event loop maintains a FIFO queue. Here is another way to schedule work on it:

import time
import datetime
import asyncio

"""
 The event loop maintains a FIFO queue.
 call_soon asks the event loop to schedule a plain function.
"""


def func(x):
    print(f'x:{x}, start time:{datetime.datetime.now().replace(microsecond=0)}')
    time.sleep(x)
    print(f'func invoked:{x}')


loop = asyncio.get_event_loop()
loop.call_soon(func, 1)  # schedule a function
loop.call_soon(func, 2)
loop.call_soon(func, 3)
loop.run_forever()  # blocks

'''
x:1, start time:2020-10-01 15:45:46
func invoked:1
x:2, start time:2020-10-01 15:45:47
func invoked:2
x:3, start time:2020-10-01 15:45:49
func invoked:3
'''

As the output shows, these calls run synchronously, one after another. With asyncio.run_coroutine_threadsafe the same work can be scheduled to run asynchronously:

import time
import datetime
import asyncio

"""
    1. Schedule with asyncio.run_coroutine_threadsafe.
    2. Turn the earlier plain function into a coroutine.
"""


async def func(x):
    print(f'x:{x}, start time:{datetime.datetime.now().replace(microsecond=0)}')
    await asyncio.sleep(x)
    print(f'func invoked:{x}, now:{datetime.datetime.now().replace(microsecond=0)}')


loop = asyncio.get_event_loop()
co1 = func(1)
co2 = func(2)
co3 = func(3)
asyncio.run_coroutine_threadsafe(co1, loop)  # schedule
asyncio.run_coroutine_threadsafe(co2, loop)
asyncio.run_coroutine_threadsafe(co3, loop)
loop.run_forever()  # blocks

'''
x:1, start time:2020-10-01 15:49:32
x:2, start time:2020-10-01 15:49:32
x:3, start time:2020-10-01 15:49:32
func invoked:1, now:2020-10-01 15:49:33
func invoked:2, now:2020-10-01 15:49:34
func invoked:3, now:2020-10-01 15:49:35
'''

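These two examples make two points.

1. run_coroutine_threadsafe schedules a coroutine asynchronously and is thread-safe; callbacks passed to call_soon run synchronously, one at a time.
2. run_coroutine_threadsafe is the cross-thread counterpart of ensure_future (ensure_future only works within the loop's own thread).

You can run an event loop in a child thread and add coroutines to it dynamically from the main thread. The main thread stays free for other work, and the child thread runs the coroutines asynchronously.

Note: the event loop obtained by default belongs to the main thread, so a child thread that needs a loop must get one created with new_event_loop. Calling get_event_loop directly in a child thread raises an exception.

The check in the source code: isinstance(threading.current_thread(), threading._MainThread)

Example: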

import queue
import threading
import time
import datetime
import asyncio

"""
    1. call_soon and call_soon_threadsafe run their callbacks synchronously.
    2. asyncio.run_coroutine_threadsafe(coro, loop) is the counterpart of asyncio.ensure_future
       and runs the coroutine asynchronously in the event loop.
"""


# Run an event loop in a child thread. Note that this needs a new event loop.
def thread_loop(loop: asyncio.AbstractEventLoop):
    print('thread started, tid:', threading.current_thread().ident)
    asyncio.set_event_loop(loop)  # install the new event loop in this thread
    loop.run_forever()            # run_forever blocks, so the child thread does not exit


async def func(x, q):
    current_time = datetime.datetime.now().replace(microsecond=0)
    msg = f'func: {x}, time:{current_time}, tid:{threading.current_thread().ident}'
    print(msg)
    await asyncio.sleep(x)
    q.put(x)


if __name__ == '__main__':
    temp_queue = queue.Queue()

    lp = asyncio.new_event_loop()  # create a new loop; the default one cannot be moved into the child thread
    thread_1 = threading.Thread(target=thread_loop, args=(lp,))
    thread_1.start()
    co1 = func(2, temp_queue)  # two coroutines
    co2 = func(3, temp_queue)
    asyncio.run_coroutine_threadsafe(co1, lp)  # schedule onto the loop running in the child thread
    asyncio.run_coroutine_threadsafe(co2, lp)
    print(f'start time:{datetime.datetime.now().replace(microsecond=0)}')
    while 1:
        if temp_queue.empty():
            print('queue is empty, sleeping 1s...')
            time.sleep(1)
            continue
        x = temp_queue.get()  # get() would block here if the queue were empty
        current_time = datetime.datetime.now().replace(microsecond=0)
        msg = f'main :{x}, time:{current_time}'
        print(msg)
        time.sleep(1)
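
The example above passes results back through a queue and never stops the loop. As a complementary sketch, run_coroutine_threadsafe also returns a concurrent.futures.Future whose result() the calling thread can block on, and the loop can be stopped from outside with call_soon_threadsafe. The names add and run_in_background_loop are invented for this example.

```python
import asyncio
import threading


async def add(a, b):
    await asyncio.sleep(0.1)
    return a + b


def run_in_background_loop():
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()

    # Returns a concurrent.futures.Future, so this thread can block on the result.
    future = asyncio.run_coroutine_threadsafe(add(1, 2), loop)
    result = future.result(timeout=5)

    loop.call_soon_threadsafe(loop.stop)  # stop the loop from outside its thread
    thread.join()
    loop.close()
    return result


if __name__ == '__main__':
    print(run_in_background_loop())
```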

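In the examples below, asyncio.ensure_future can always be replaced with asyncio.run_coroutine_threadsafe [for an event loop running in a different thread]:

ThreadPoolExecutor and asyncio for blocking IO requests

Integrating a thread pool into asyncio to handle slow IO

Blocking, synchronous calls should not be written inside coroutines, but sometimes a slow synchronous interface is all that is available.

In that case a thread pool can be integrated into the asyncio module: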

import asyncio
from concurrent import futures

task_list = []
loop = asyncio.get_event_loop()
executor = futures.ThreadPoolExecutor(3)


def get_url(t_url=None):
    print(t_url)


for url in range(20):
    url = "http://shop.projectsedu.com/goods/{}/".format(url)
    task = loop.run_in_executor(executor, get_url, url)
    task_list.append(task)

loop.run_until_complete(asyncio.wait(task_list))

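Example code:

# Using threads: integrating blocking IO into coroutines
import asyncio
from concurrent.futures import ThreadPoolExecutor
import socket
from urllib.parse import urlparse


def get_url(url):
    # fetch the HTML over a raw socket
    url = urlparse(url)
    host = url.netloc
    path = url.path
    if path == "":
        path = "/"

    # open the socket connection
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # client.setblocking(False)
    client.connect((host, 80))  # blocks, but does not burn CPU while waiting

    # With a non-blocking socket we would have to poll in a while loop until the connection
    # is established, doing other work or opening other connections in the meantime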

    client.send("GET {} HTTP/1.1\r\nHost:{}\r\nConnection:close\r\n\r\n".format(path, host).encode("utf8"))

    data = b""
    while True:
        d = client.recv(1024)
        if d:
            data += d
        else:
            break

    data = data.decode("utf8")
    html_data = data.split("\r\n\r\n")[1]
    print(html_data)
    client.close()


if __name__ == "__main__":
    import time

    start_time = time.time()
    loop = asyncio.get_event_loop()
    executor = ThreadPoolExecutor(3)
    tasks = []
    for url in range(20):
        url = "http://shop.projectsedu.com/goods/{}/".format(url)
        task = loop.run_in_executor(executor, get_url, url)
        tasks.append(task)
    loop.run_until_complete(asyncio.wait(tasks))
    print("last time:{}".format(time.time() - start_time))

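The same function can also be written with async def and no thread pool. Note, however, that adding async alone does not make the blocking socket calls non-blocking: each coroutine still blocks the event loop while it runs, so the requests execute one after another.

# Without a thread pool: blocking IO written directly inside a coroutine
import asyncio
from concurrent.futures import ThreadPoolExecutor
import socket
from urllib.parse import urlparse
import time

async def get_html(url):
    # fetch the HTML over a raw socket
    url = urlparse(url)
    host = url.netloc
    path = url.path
    if path == "":
        path = "/"

    # open the socket connection
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # client.setblocking(False)
    client.connect((host, 80)) # blocks, but does not burn CPU while waiting

    # With a non-blocking socket we would have to poll in a while loop until the connection
    # is established, doing other work or opening other connections in the meantime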

    client.send("GET {} HTTP/1.1\r\nHost:{}\r\nConnection:close\r\n\r\n".format(path, host).encode("utf8"))

    data = b""
    while True:
        d = client.recv(1024)
        if d:
            data += d
        else:
            break

    data = data.decode("utf8")
    html_data = data.split("\r\n\r\n")[1]
    print(html_data)
    client.close()

if __name__ == "__main__":
    start_time = time.time()
    loop = asyncio.get_event_loop()
    # Wrap each coroutine in a Task: asyncio.wait no longer accepts bare coroutines (Python 3.11+)
    tasks = [loop.create_task(get_html("http://shop.projectsedu.com/goods/2/")) for i in range(10)]
    loop.run_until_complete(asyncio.wait(tasks))
    print(time.time() - start_time)
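
To see why a thread pool matters for blocking calls, the following sketch compares blocking code wrapped in async def with the same code handed to run_in_executor. time.sleep stands in for the blocking socket calls; the function names are invented for this example.

```python
import asyncio
import time


def blocking_io(delay):
    time.sleep(delay)  # stands in for a blocking socket call
    return delay


async def blocking_coroutine(delay):
    return blocking_io(delay)  # blocks the whole event loop while it runs


async def timed(make_awaitables):
    start = time.perf_counter()
    await asyncio.gather(*make_awaitables())
    return time.perf_counter() - start


async def main():
    loop = asyncio.get_running_loop()
    # Three "async" wrappers around blocking code still run one after another.
    serial = await timed(lambda: [blocking_coroutine(0.2) for _ in range(3)])
    # Handing the same function to the default thread pool lets the calls overlap.
    pooled = await timed(
        lambda: [loop.run_in_executor(None, blocking_io, 0.2) for _ in range(3)]
    )
    return serial, pooled


if __name__ == '__main__':
    serial, pooled = asyncio.run(main())
    print(f'async def around blocking code: {serial:.2f}s')
    print(f'run_in_executor:                {pooled:.2f}s')
```

The first timing is close to the sum of the three delays; the second is close to a single delay.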

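Synchronization and communication in asyncio

Multithreaded code needs locks for safety. Coroutines often do not, because they run in a single thread and only switch at await points: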

import asyncio

total = 0
lock = None


async def add():
    global total
    for _ in range(1000):
        total += 1


async def desc():
    global total, lock
    for _ in range(1000):
        total -= 1


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    # asyncio.wait needs Tasks, not bare coroutines (Python 3.11+)
    tasks = [loop.create_task(add()), loop.create_task(desc())]
    loop.run_until_complete(asyncio.wait(tasks))
    print(total)

In some situations, coroutines still need a lock-like mechanism.
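
A minimal sketch of such a situation: once an await sits between reading and writing shared state, other coroutines can run in the gap and updates get lost. The state dictionary and function names are invented for this example.

```python
import asyncio


async def unsafe_increment(state):
    value = state['total']      # read
    await asyncio.sleep(0)      # suspension point: other coroutines may run here
    state['total'] = value + 1  # write back a possibly stale value


async def safe_increment(state, lock):
    async with lock:            # only one coroutine at a time inside this block
        value = state['total']
        await asyncio.sleep(0)
        state['total'] = value + 1


async def main():
    unsafe = {'total': 0}
    await asyncio.gather(*(unsafe_increment(unsafe) for _ in range(100)))

    safe = {'total': 0}
    lock = asyncio.Lock()  # created while the loop is running
    await asyncio.gather(*(safe_increment(safe, lock) for _ in range(100)))
    return unsafe['total'], safe['total']


if __name__ == '__main__':
    unsafe_total, safe_total = asyncio.run(main())
    print('without lock:', unsafe_total)  # lost updates: ends up below 100
    print('with lock:   ', safe_total)
```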

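Example: parse_response and use_response share code, since both call get_response. If both trigger the request at the same time, the site's anti-crawler protection may kick in.
So, unlike the code above, a lock is needed here. Because parse_response and use_response both call get_response, we want get_response to send the request only once and serve later calls from a cache, which is what the lock guarantees.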

import asyncio
import aiohttp
from asyncio import Lock

cache = {}
lock = Lock()


async def get_response(url):
    async with lock:  # async with works because Lock implements __aenter__ / __aexit__
        # As with threads, await lock.acquire() followed by lock.release() at the end also works
        if url in cache:
            return cache[url]
        print("first request")
        # Fetch the body once and cache it
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                response = await resp.text()
        cache[url] = response
        return response


async def parse_response(url):
    response = await get_response(url)
    print('parse_response', len(response))
    # do some parse


async def use_response(url):
    response = await get_response(url)
    print('use_response', len(response))
    # use response to do something interesting


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    url = 'https://www.baidu.com'
    # asyncio.wait expects Tasks (or Futures), so wrap the coroutines first
    tasks = [loop.create_task(parse_response(url)), loop.create_task(use_response(url))]
    loop.run_until_complete(asyncio.wait(tasks))


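Communicating through an asyncio queue

Coroutines run in a single thread, so they can communicate through a plain global variable used as a queue. To cap the number of items the queue may hold, use asyncio.Queue with maxsize; put and get must then both be awaited.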

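import asyncio

async def main():
    queue = asyncio.Queue(maxsize=3)  # holds at most 3 items
    await queue.put('item')           # suspends while the queue is full
    print(await queue.get())          # suspends while the queue is empty

asyncio.run(main())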
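
A fuller sketch of two coroutines communicating through a bounded queue follows. The producer / consumer names and the None sentinel are conventions chosen for this example, not part of the asyncio API.

```python
import asyncio


async def producer(queue, count):
    for i in range(count):
        await queue.put(i)   # suspends while the queue already holds maxsize items
    await queue.put(None)    # sentinel: tells the consumer to stop


async def consumer(queue, received):
    while True:
        item = await queue.get()  # suspends while the queue is empty
        if item is None:
            break
        received.append(item)


async def main():
    queue = asyncio.Queue(maxsize=3)
    received = []
    await asyncio.gather(producer(queue, 10), consumer(queue, received))
    return received


if __name__ == '__main__':
    print(asyncio.run(main()))
```

Because the queue holds at most three items, the producer is paused whenever it gets ahead of the consumer.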

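Running multiple tasks in one event loop for concurrent execution

future and task:

A future is a container for a result; once the result is set, the callbacks registered on it are invoked.
task is a subclass of future and is what actually drives a coroutine. (A task is the bridge between a coroutine and a Future.)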
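
A small sketch of both ideas: a bare Future used as a result container with a done-callback, and a Task being a Future that drives a coroutine. The timer that fills in the result is just a stand-in for whatever code produces the value.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()  # an empty container, no result yet
    seen = []
    # Callbacks registered on the future run once a result has been set.
    future.add_done_callback(lambda f: seen.append(f.result()))

    # Some other piece of code fills the container later; here a timer does it.
    loop.call_later(0.1, future.set_result, 'ready')
    value = await future  # suspends main() until set_result is called

    # A Task is a Future that additionally drives a coroutine to completion.
    task = asyncio.create_task(asyncio.sleep(0.01, result='from task'))
    is_future = isinstance(task, asyncio.Future)
    return value, seen, is_future, await task


if __name__ == '__main__':
    print(asyncio.run(main()))
```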

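wait, gather, await

1. wait and gather are both used to collect results. Neither blocks when called: each returns an awaitable that can be used with yield from / await.

2. There are two ways to get the results after completion:
First: result = loop.run_until_complete(asyncio.wait(...) or asyncio.gather(...)), which returns the results once everything has finished.
Second: result = await asyncio.wait(...) or asyncio.gather(...), which gets the results inside a coroutine.

3. as_completed behaves like its counterpart in the concurrent.futures package: whichever task finishes first is returned first. Internally it is implemented as yield from Queue.get().

4. Nesting: with await / yield from followed by a coroutine, the code below the await / yield from runs only after that coroutine has finished. The event loop is not blocked at any point in the process.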

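Differences between wait and gather

Both can add multiple tasks to the event loop.

In most places where asyncio.wait(tasks) is used, asyncio.gather(*tasks) works as well. The call signatures differ: wait takes a single iterable of tasks, while gather takes the tasks as separate positional arguments.

asyncio.wait(tasks) returns two sets of tasks/futures: dones, pendings = await asyncio.wait(tasks), where

dones is the set of tasks that have finished,
pendings is the set of tasks that have not finished yet.

asyncio.gather(*tasks) returns a list of results.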
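
The difference in return values can be sketched like this. The job coroutine is invented for the example; the tasks for wait are created with asyncio.create_task because recent Python versions do not accept bare coroutines there.

```python
import asyncio


async def job(delay):
    await asyncio.sleep(delay)
    return delay


async def main():
    # wait() takes one iterable of tasks and returns two sets: (done, pending).
    tasks = [asyncio.create_task(job(d)) for d in (0.2, 0.1)]
    done, pending = await asyncio.wait(tasks)
    wait_results = sorted(t.result() for t in done)  # sets are unordered, so sort

    # gather() takes awaitables as positional arguments and returns a list of
    # results in the order they were passed in, not in completion order.
    gather_results = await asyncio.gather(job(0.2), job(0.1))
    return wait_results, len(pending), gather_results


if __name__ == '__main__':
    print(asyncio.run(main()))
```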

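gather is the higher-level of the two:

it can group tasks
it can cancel tasks (cancelling the object returned by gather cancels the whole group)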

import asyncio
import time


async def get_html(url):
    global index
    print(f"{index}  start get url")
    await asyncio.sleep(2)
    index += 1
    print(f"{index}  end get url")


async def main():
    # Difference between gather and wait:
    # tasks = [asyncio.ensure_future(get_html("http://www.imooc.com")) for i in range(10)]
    # await asyncio.wait(tasks)
    group1 = [get_html("http://projectsedu.com") for i in range(2)]
    group2 = [get_html("http://www.imooc.com") for i in range(2)]
    group1 = asyncio.gather(*group1)
    group2 = asyncio.gather(*group2)
    await asyncio.gather(group1, group2)


if __name__ == "__main__":
    start_time = time.time()

    index = 1
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    print(time.time() - start_time)
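
Cancelling a group is not shown above. A minimal sketch: calling cancel() on the object returned by gather cancels every child that has not finished yet. The job names and delays are invented for this example.

```python
import asyncio


async def job(name, delay, finished):
    await asyncio.sleep(delay)
    finished.append(name)


async def main():
    finished = []
    fast = asyncio.gather(job('fast-1', 0.05, finished), job('fast-2', 0.05, finished))
    slow = asyncio.gather(job('slow-1', 5, finished), job('slow-2', 5, finished))

    await fast
    slow.cancel()  # cancels every child of the group that has not completed yet
    try:
        await slow
    except asyncio.CancelledError:
        pass  # awaiting a cancelled group raises CancelledError
    return finished


if __name__ == '__main__':
    print(asyncio.run(main()))
```

Only the fast group gets to record its names; the slow group is cancelled long before its five-second sleeps end.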

Example 1:

import asyncio

"""
    Run multiple tasks concurrently.
    Schedule a list of Task objects.
    Call asyncio.wait or asyncio.gather to get the results.
"""


async def func(i):
    print('start')
    # give up control: the event loop runs the next task while this one waits
    await asyncio.sleep(i)
    return i


async def func_sleep():
    await asyncio.sleep(2)


def test_1():
    # asyncio create_task runs forever
    # https://www.pythonheidong.c
