Coroutines, Threads, and Thread Pools


Article link: https://blog.csdn.net/piglite/article/details/108703450


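In one sentence: coroutines run on a single thread, and what drives them is the event loop.

So to run a coroutine in a multithreaded program, the thread that runs it must be given an event loop of its own, set up by hand.

A typical case is loop.run_in_executor(executor, func, *args), where func itself has to run a coroutine: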

import asyncio
from concurrent.futures import ThreadPoolExecutor


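def run(corofn, *args):
    # A coroutine can only run in a thread that has an event loop.
    # Create a brand-new event loop for this worker thread.
    # get_event_loop() will not work here: a worker thread has no current
    # event loop until one is set, and the loop that is running main()
    # belongs to the main thread.
    loop = asyncio.new_event_loop()
    try:
        # Call the coroutine function to get a coroutine object.
        coro = corofn(*args)
        # Install the new loop as the current event loop of the pool thread
        # that is executing run(), so that the coroutine can run on it.
        asyncio.set_event_loop(loop)
        # Run the coroutine to completion on this thread's loop.
        return loop.run_until_complete(coro)
    finally:
        # Once the coroutine has finished, close this thread's event loop.
        loop.close()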


async def main():
    loop = asyncio.get_running_loop()
    executor = ThreadPoolExecutor(max_workers=5)
    futures = [
        # run is an ordinary function, so it executes in one of the pool's threads,
        # but its body has to drive a coroutine: asyncio.sleep(1, x), which returns x after 1 second.
        # For that to work, the thread executing run must have its own event loop.
        loop.run_in_executor(executor, run, asyncio.sleep, 1, x)
        for x in range(10)]
    print(await asyncio.gather(*futures))
    # Prints: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


if __name__ == '__main__':
    asyncio.run(main())
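
To make the point about per-thread event loops concrete, here is a minimal sketch. It assumes Python 3.7 or later and uses asyncio.run inside the worker thread instead of the manual new_event_loop / set_event_loop / close sequence; asyncio.run creates a fresh loop, runs the coroutine, and closes the loop. The helper names has_running_loop, run_in_fresh_loop, report, and demo are made up for this illustration.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def has_running_loop():
    """Return True if the calling thread currently has a running event loop."""
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False


def run_in_fresh_loop(corofn, *args):
    # asyncio.run creates a new event loop for the calling thread, runs the
    # coroutine to completion, and closes the loop afterwards.
    return asyncio.run(corofn(*args))


async def report():
    # Runs on the loop that asyncio.run created inside the worker thread.
    in_worker = threading.current_thread() is not threading.main_thread()
    return in_worker, has_running_loop()


async def demo():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=2) as executor:
        # A plain function in a pool thread sees no running loop.
        bare = await loop.run_in_executor(executor, has_running_loop)
        # The same pool can run coroutines once the thread gets its own loop.
        in_worker, with_loop = await loop.run_in_executor(
            executor, run_in_fresh_loop, report)
        value = await loop.run_in_executor(
            executor, run_in_fresh_loop, asyncio.sleep, 0.01, "done")
    return bare, in_worker, with_loop, value


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The first element of the result shows that a pool thread has no running loop by default; the remaining ones show a coroutine running in that pool once a loop exists there.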

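Running a coroutine inside a thread is not what loop.run_in_executor is really for; doing so brings no noticeable gain in efficiency. Its real purpose is to meet two needs:

1. You want to run a synchronous, blocking function asynchronously.

2. Even though it runs asynchronously, you still want its return value.

A plain thread is enough to run a blocking function asynchronously; no Future is needed for that. But a thread's run method hands nothing back to the caller: the return value of the target is discarded. This suggests that threads were designed to let the main thread finish its own logic sooner by moving heavy work elsewhere, work whose result the main thread does not directly depend on. Put bluntly, a function handed to a thread is best one that needs no return value. In the past, the usual way to get the result of an asynchronous call was probably a callback; with Future there is now another option for code that must be asynchronous and still return a value. The design of Python's Future is very similar to Java's Future.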
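
The contrast between a bare thread and a Future can be sketched as follows. This is only an illustration: square, with_thread, and with_future are hypothetical names, and the shared list is just one of several ways to pass a result out of a thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


def with_thread(n):
    # Thread.run() discards the target's return value, so the result has
    # to travel back through shared state (here a list) or a callback.
    box = []
    worker = threading.Thread(target=lambda: box.append(square(n)))
    worker.start()
    worker.join()
    return box[0]


def with_future(n):
    # submit() returns a concurrent.futures.Future immediately;
    # result() blocks until the value is available and then returns it.
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(square, n)
        return future.result()


if __name__ == "__main__":
    print(with_thread(7), with_future(7))
```

With the Future, the caller gets the value (or the exception) from the object it already holds, without arranging a side channel.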

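The example below uses the return value of run_in_executor to turn a non-awaitable task into an awaitable one: the synchronous blocking function is handed to a thread pool, the coroutine awaits it without blocking the event loop, and the return value is available when it finishes.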

import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(5)

# A synchronous, blocking function
def mysleep(num):
    time.sleep(1)
    return num


async def main():
    # Get the event loop that is running main().
    loop = asyncio.get_running_loop()
    # run_in_executor submits mysleep to the pool and wraps each call
    # in an awaitable asyncio Future.
    fs = [loop.run_in_executor(executor, mysleep, i) for i in range(10)]
    # Gather the awaitables in fs. With 5 threads in the pool, the results
    # of all 10 blocking mysleep calls arrive after about 2 seconds.
    print(await asyncio.gather(*fs))
    # Prints: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


if __name__ == '__main__':
    asyncio.run(main())
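
On Python 3.9 and later, the same pattern can also be written with asyncio.to_thread, which hands the call to the loop's default thread pool instead of an explicitly created executor. The sketch below is an alternative spelling, not the method used in this article; blocking_echo and gather_in_threads are hypothetical names, and the short delay is chosen only to keep the run quick.

```python
import asyncio
import time


def blocking_echo(num, delay=0.05):
    # Stands in for any synchronous, blocking call that returns a value.
    time.sleep(delay)
    return num


async def gather_in_threads(count):
    # asyncio.to_thread runs the function in the loop's default executor
    # and returns an awaitable; gather keeps the input order.
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_echo, i) for i in range(count)))


if __name__ == "__main__":
    print(asyncio.run(gather_in_threads(5)))
```

An explicit ThreadPoolExecutor with run_in_executor remains the better fit when the pool size has to be controlled, as in the 5-thread examples here.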

If the mysleep example feels too abstract, think of fetching a batch of URLs with requests:

import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

import requests
import fake_useragent
import bs4

executor = ThreadPoolExecutor(5)
urls = [
    'https://sports.sina.com.cn/g/laliga/2020-09-21/doc-iivhvpwy7861279.shtml',
    'https://sports.sina.com.cn/g/laliga/2020-09-21/doc-iivhuipp5474305.shtml',
    'https://sports.sina.com.cn/g/laliga/2020-09-21/doc-iivhuipp5474709.shtml',
    'https://sports.sina.com.cn/g/seriea/2020-09-21/doc-iivhvpwy7860211.shtml',
    'https://sports.sina.com.cn/g/laliga/2020-09-20/doc-iivhuipp5449461.shtml'
]

agent = fake_useragent.UserAgent()


def get(url):
    # An ordinary synchronous, blocking function that returns a value.
    time.sleep(1)
    r = requests.get(url, headers={'user-agent': agent.random})
    return r.content.decode('utf-8')


async def main():
    # Get the event loop that is running main().
    loop = asyncio.get_running_loop()
    # Fetch the 5 pages concurrently through get().
    # The return values form the contents list in the same order as urls.
    contents = await asyncio.gather(*[
        loop.run_in_executor(executor, get, url) for url in urls])

    # Build the 5 BeautifulSoup objects in the pool as well.
    # They form the soups list in the same order as contents.
    soups = await asyncio.gather(*[
        loop.run_in_executor(executor, bs4.BeautifulSoup, content, 'lxml')
        for content in contents
    ])
    # Print the 5 article titles in the order of urls.
    for soup in soups:
        print(soup.find('h1').text)


if __name__ == '__main__':
    s = time.perf_counter()
    asyncio.run(main())
    print(time.perf_counter() - s)

Combining run_in_executor with gather gives you asynchrony, return values, and, more importantly, results in order. What could be better than that?
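
The ordering guarantee is easy to check with a small sketch in which the tasks finish in reverse order. The names slow_echo and ordered_results and the delay values are assumptions made for this illustration; the finished list records the order in which the threads actually complete.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def slow_echo(value, delay, finished):
    time.sleep(delay)
    # Record the completion order (list.append is thread-safe in CPython).
    finished.append(value)
    return value


async def ordered_results():
    loop = asyncio.get_running_loop()
    finished = []
    delays = [0.3, 0.2, 0.1]  # the first task submitted finishes last
    with ThreadPoolExecutor(max_workers=3) as executor:
        results = await asyncio.gather(*[
            loop.run_in_executor(executor, slow_echo, i, delay, finished)
            for i, delay in enumerate(delays)])
    return results, finished


if __name__ == "__main__":
    results, finished = asyncio.run(ordered_results())
    print("gather order:    ", results)
    print("completion order:", finished)
```

gather returns the results in the order the awaitables were passed in, so results is [0, 1, 2], while finished will typically show the reverse, since the shortest delay completes first.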
