Python Coroutines

小Pawn爷

Last modified: 2023-04-15 22:04:18


First published: 2022-01-13 08:22:39

Article link: https://blog.csdn.net/weixin_44689630/article/details/122466242


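1.Basic concepts

1.Blocking / non-blocking

Concept	Definition
Blocking	The program is suspended while it waits for a resource it needs (for example the result of an I/O call); during that time it cannot do anything else
Non-blocking	The program is not suspended while it waits for the resource; it can keep handling other work in the meantime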
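The difference can be sketched with the standard library's queue module. This is only an illustration: the helper name try_get is hypothetical and not part of the article.

```python
import queue


def try_get(q):
    """Non-blocking read: return at once, with None if nothing is available."""
    try:
        return q.get(block=False)
    except queue.Empty:
        return None


q = queue.Queue()
first = try_get(q)   # the queue is empty, so this returns None immediately
q.put("job")
second = try_get(q)  # an item is available now, so this returns "job"

# A blocking read would be q.get(): on an empty queue it suspends the caller
# until another thread puts an item, and nothing else can run in that thread.
print(first, second)
```

With the non-blocking call, the caller is free to do other work and try again later instead of being suspended.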

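2.Synchronous / asynchronous

Concept	Definition	Scenario
Synchronous	Different program units must coordinate through some form of communication while carrying out a task; such units are said to execute synchronously	Updating product stock in a shopping system, with a row lock acting as the communication signal
Asynchronous	Different program units can complete the task without communicating or coordinating with one another along the way; unrelated program units can run asynchronously	Web crawlers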
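The stock-update scenario can be approximated in a few lines. This sketch uses an in-process threading.Lock in place of a database row lock, and the names stock and buy are made up for the illustration.

```python
import threading

stock = 100                    # hypothetical inventory counter
stock_lock = threading.Lock()  # plays the role of the row lock


def buy(times):
    """Decrement the shared stock, coordinating with other buyers via the lock."""
    global stock
    for _ in range(times):
        with stock_lock:       # only one buyer may update the stock at a time
            stock -= 1


buyers = [threading.Thread(target=buy, args=(50,)) for _ in range(2)]
for t in buyers:
    t.start()
for t in buyers:
    t.join()

print(stock)
```

The two buyers are synchronous in the sense of the table: each must wait for the lock before touching the shared value. Crawling two unrelated pages needs no such coordination, which is why it suits asynchronous execution.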

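2.Definition of a coroutine

1.Meaning

A coroutine is a lightweight unit of execution scheduled in user space, often described as a user-mode lightweight thread

2.Characteristics

Each coroutine has its own execution context (register context and stack; in Python, its own frame and local variables)
Coroutines run inside a single thread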
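A plain generator is enough to see what "its own context" means in Python: the function is suspended at yield and later resumed with its local variables intact. This is a minimal sketch, and the name running_total is hypothetical.

```python
def running_total():
    """Suspend at each yield; the local variable `total` survives the suspension."""
    total = 0
    while True:
        received = yield total  # pause here and hand `total` back to the caller
        total += received


gen = running_total()
next(gen)        # run up to the first yield
a = gen.send(5)  # resume with 5, pause again at the next yield
b = gen.send(7)  # resume with 7; the earlier total was preserved
print(a, b)
```

Everything happens in the calling thread. Control simply moves back and forth between the caller and the generator.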

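3.Coroutines compared with threads

1.Because of the GIL, only one thread executes Python bytecode at a time. Threads must repeatedly acquire and release the lock and be switched by the operating system, which hurts concurrent performance, especially for CPU-bound work.

2.Coroutines run in a single thread, so there is no thread context-switch overhead, and because a switch only happens at explicit points, shared state usually needs no locks or atomic operations.

3.When a coroutine is switched out, its register context and stack are saved elsewhere; when it is switched back in, they are restored. This is much cheaper than a thread switch, which is where the gain in concurrent performance comes from.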
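The single-thread claim can be checked directly with asyncio. The sketch below records the thread id each time a coroutine runs; the names worker and seen are chosen for the illustration only.

```python
import asyncio
import threading


async def worker(name, seen):
    for _ in range(2):
        seen.append((name, threading.get_ident()))  # note which thread is running us
        await asyncio.sleep(0)                      # explicit switch point


async def main():
    seen = []
    await asyncio.gather(worker("a", seen), worker("b", seen))
    return seen


seen = asyncio.run(main())
thread_ids = {tid for _, tid in seen}
order = [name for name, _ in seen]
print(len(thread_ids), order)
```

Both workers report the same thread id, and they only alternate at the await, which is the explicit switch point mentioned in item 2.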

3.greenlet

Switching between greenlets manually

from greenlet import greenlet
import time

def test1():
    while True:
        print("---A--")
        gr2.switch()
        time.sleep(0.5)

def test2():
    while True:
        print("---B--")
        gr1.switch()
        time.sleep(0.5)

gr1 = greenlet(test1)
gr2 = greenlet(test2)

# Manually switch into gr1 to start running
gr1.switch()

Output

---A--
---B--
---A--
---B--
---A--
---B--
---A--
---B--

4.gevent

1.Basic usage

import gevent
import random
import time


def coroutine_work(coroutine_name):
    for i in range(10):
        print(coroutine_name, i)
        # gevent does not treat time.sleep as a cooperative wait; it blocks the whole thread
        time.sleep(random.random())


gevent.joinall([
    gevent.spawn(coroutine_work, "work1"),
    gevent.spawn(coroutine_work, "work2")
])

Result: the two greenlets run one after the other, because gevent does not treat time.sleep as a wait it can switch on

work1 0
work1 1
work1 2
work1 3
work1 4
work1 5
work1 6
work1 7
work1 8
work1 9
work2 0
work2 1
work2 2
work2 3
work2 4
work2 5
work2 6
work2 7
work2 8
work2 9

2.Simulating an I/O operation

import gevent
import random


def coroutine_work(coroutine_name):
    for i in range(10):
        print(coroutine_name, i)
        # Simulate a time-consuming operation; gevent.sleep yields to other greenlets
        gevent.sleep(random.random())


gevent.joinall([
    gevent.spawn(coroutine_work, "work1"),
    gevent.spawn(coroutine_work, "work2")
])

Output: the two greenlets run alternately. Every run starts with the two lines below; after that the interleaving depends on the random sleep durations and differs from run to run

work1 0
work2 0
...

3.Monkey patching

from gevent import monkey
import gevent
import random
import time
# When the program contains blocking operations that are not gevent-aware, monkey patching replaces them with gevent's cooperative versions
monkey.patch_all()


def coroutine_work(coroutine_name):
    for i in range(10):
        print(coroutine_name, i)
        time.sleep(random.random())


gevent.joinall([
    gevent.spawn(coroutine_work, "work1"),
    gevent.spawn(coroutine_work, "work2")
])

Output: the two greenlets run alternately (excerpt; the exact order varies between runs)

work1 0
work2 0
work1 1
work2 1
work2 2
work1 2
work1 3
work1 4
work2 3
work2 4
work1 5
work1 6
work2 5
work2 6

5.asyncio

1.Starting from a crawler

Iteration 1: a serial crawler

import time


def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    time.sleep(sleep_time)
    print('OK {}'.format(url))


def main(urls):
    for url in urls:
        crawl_page(url)


begin_time = time.perf_counter()
main(['url_1', 'url_2', 'url_3', 'url_4'])
end_time = time.perf_counter()
print(end_time - begin_time)

Result

crawling url_1
OK url_1
crawling url_2
OK url_2
crawling url_3
OK url_3
crawling url_4
OK url_4
10.015771199949086

Iteration 2: mark the functions with async and call them with await

import time
import asyncio


async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))


async def main(urls):
    for url in urls:
        await crawl_page(url)


begin_time = time.perf_counter()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
end_time = time.perf_counter()
print(end_time - begin_time)

Result

crawling url_1
OK url_1
crawling url_2
OK url_2
crawling url_3
OK url_3
crawling url_4
OK url_4
10.009807399939746

Iteration 3: using Task

import time
import asyncio


async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))


async def main(urls):
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    for task in tasks:
        await task


begin_time = time.perf_counter()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
end_time = time.perf_counter()
print(end_time - begin_time)

Result

crawling url_1
crawling url_2
crawling url_3
crawling url_4
OK url_1
OK url_2
OK url_3
OK url_4
3.9802477001212537
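The drop from about 10 seconds to about 4 seconds follows from the sleeps overlapping: the serial version takes roughly the sum of the delays (1+2+3+4), the Task version roughly the longest one (4). The sketch below repeats the comparison with the delays scaled down; the scale factor and the helper names serial, concurrent and timed are assumptions made for this illustration.

```python
import asyncio
import time

SCALE = 0.02  # assumed factor so the demo finishes quickly


async def crawl_page(url):
    await asyncio.sleep(int(url.split('_')[-1]) * SCALE)
    return url


async def serial(urls):
    # Each await finishes before the next coroutine even starts
    return [await crawl_page(url) for url in urls]


async def concurrent(urls):
    # All tasks are scheduled first, so their sleeps overlap
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    return [await task for task in tasks]


def timed(coro):
    start = time.perf_counter()
    asyncio.run(coro)
    return time.perf_counter() - start


urls = ['url_1', 'url_2', 'url_3', 'url_4']
serial_time = timed(serial(urls))          # roughly (1+2+3+4) * SCALE
concurrent_time = timed(concurrent(urls))  # roughly 4 * SCALE
print(serial_time, concurrent_time)
```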

Iteration 4: using gather

import time
import asyncio


async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))


async def main(urls):
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    await asyncio.gather(*tasks)


begin_time = time.perf_counter()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
end_time = time.perf_counter()
print(end_time - begin_time)
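Besides waiting for everything at once, asyncio.gather also returns the results as a list, in the order the awaitables were passed in rather than the order in which they finished. A minimal sketch, assuming a variant of crawl_page that returns a string and uses shortened sleeps (both are changes made for this illustration, not the article's code):

```python
import asyncio


async def crawl_page(url):
    # Shortened delay: url_3 still finishes last, url_1 first
    await asyncio.sleep(int(url.split('_')[-1]) / 100)
    return 'OK {}'.format(url)


async def main(urls):
    # gather accepts coroutines directly and wraps them in tasks itself
    return await asyncio.gather(*(crawl_page(url) for url in urls))


results = asyncio.run(main(['url_3', 'url_1', 'url_2']))
print(results)
```

Although url_1 completes first, its result still appears second in the list, matching its position in the input.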

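2.Demystifying how coroutines run

Iteration 1: awaiting coroutines directly without creating tasks, which is equivalent to serial execution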

import asyncio
import time


async def worker_1():
    print('worker_1 start')
    await asyncio.sleep(1)
    print('worker_1 done')


async def worker_2():
    print('worker_2 start')
    await asyncio.sleep(2)
    print('worker_2 done')


async def main():
    print('before await')
    await worker_1()
    print('awaited worker_1')
    await worker_2()
    print('awaited worker_2')


begin_time = time.perf_counter()
asyncio.run(main())
end_time = time.perf_counter()
print(end_time - begin_time)

Result:

before await
worker_1 start
worker_1 done
awaited worker_1
worker_2 start
worker_2 done
awaited worker_2
2.9887187001295388

Iteration 2: creating tasks so that the workers run concurrently

import time
import asyncio


async def worker_1():
    print('worker_1 start')
    await asyncio.sleep(1)
    print('worker_1 done')


async def worker_2():
    print('worker_2 start')
    await asyncio.sleep(2)
    print('worker_2 done')


async def main():
    task1 = asyncio.create_task(worker_1())
    task2 = asyncio.create_task(worker_2())
    print('before await')
    await task1
    print('awaited worker_1')
    await task2
    print('awaited worker_2')


begin_time = time.perf_counter()
asyncio.run(main())
end_time = time.perf_counter()
print(end_time - begin_time)

Result

before await
worker_1 start
worker_2 start
worker_1 done
awaited worker_1
worker_2 done
awaited worker_2
2.002005099784583
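The output above starts with "before await" even though both tasks were created first. The reason is that asyncio.create_task only schedules a task; the task does not start running until the current coroutine gives up control at an await. The sketch below records the order of events in a list; the names worker and events are chosen for the illustration.

```python
import asyncio


async def worker(events):
    events.append('worker start')
    await asyncio.sleep(0.01)
    events.append('worker done')


async def main():
    events = []
    task = asyncio.create_task(worker(events))  # scheduled, but not running yet
    events.append('before await')               # main still holds control here
    await task                                  # main yields; the worker runs now
    events.append('awaited worker')
    return events


events = asyncio.run(main())
print(events)
```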

3.Producer-consumer with coroutines

import asyncio
import time
from asyncio import Queue


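async def customer(id, queue: Queue):
    while True:
        # queue.get() returns a coroutine; await it to obtain the item
        val = await queue.get()
        print(f"{id} get a val:{val}")
        await asyncio.sleep(1)


async def producer(id, queue: Queue):
    for i in range(5):
        await queue.put(i)
        print(f"{id} put a val:{i}")
        await asyncio.sleep(1)


async def main():
    queue = asyncio.Queue()
    # Start two consumers
    cust_1 = asyncio.create_task(customer('customer_1', queue))
    cust_2 = asyncio.create_task(customer('customer_2', queue))

    # Start two producers
    prod_1 = asyncio.create_task(producer('prod_1', queue))
    prod_2 = asyncio.create_task(producer('prod_2', queue))

    # Let all four tasks run for a while
    await asyncio.sleep(1)

    # The consumers loop forever, so cancel them; otherwise gather() would never return
    cust_1.cancel()
    cust_2.cancel()

    # return_exceptions=True collects the consumers' CancelledError as a result instead of raising it
    await asyncio.gather(cust_1, cust_2, prod_1, prod_2, return_exceptions=True)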


begin_time = time.perf_counter()
asyncio.run(main())
end_time = time.perf_counter()
print(end_time - begin_time)

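In the code above, the two consumers loop forever, so they are cancelled explicitly before gather() is awaited; otherwise gather() would never return. Calling cancel() does not stop a task on the spot: it requests cancellation, and asyncio.CancelledError is raised inside the task at the await where it is suspended, or at the next await it reaches. Until then the task may keep running and holding CPU and memory without doing anything useful, so a task that is no longer needed should be cancelled as early as possible. Because return_exceptions=True is passed, gather() returns the consumers' CancelledError as an ordinary result instead of raising it. A few points to keep in mind:

cancel() only sends a cancellation request; the task itself is responsible for handling it (for example by catching CancelledError) and doing any cleanup.
A task suspended at an await is cancelled promptly, but a task that is executing blocking synchronous code (such as time.sleep or a blocking I/O call) cannot respond until it returns control to the event loop.
Cancellation is not guaranteed to succeed: a task may catch CancelledError and suppress it, or handle it incorrectly. When writing asynchronous programs, take cancellation into account and handle it appropriately for each task.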
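The points above can be tried out with a small sketch in which a consumer catches the cancellation, records a cleanup step and re-raises. The names consumer and log are hypothetical and only serve this illustration.

```python
import asyncio


async def consumer(log):
    try:
        while True:
            await asyncio.sleep(1)  # the cancellation is delivered at this await
    except asyncio.CancelledError:
        log.append('cleanup')       # release resources here
        raise                       # re-raise so the task really ends up cancelled


async def main():
    log = []
    task = asyncio.create_task(consumer(log))
    await asyncio.sleep(0.01)       # give the consumer a chance to start
    task.cancel()                   # request cancellation
    results = await asyncio.gather(task, return_exceptions=True)
    return log, results, task.cancelled()


log, results, was_cancelled = asyncio.run(main())
print(log, results, was_cancelled)
```

If the except block swallowed the exception instead of re-raising it, the loop would have to be left some other way and task.cancelled() would report False, which is the "cancellation may not succeed" case described above.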

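4.Douban crawler

Prerequisites

pip install lxml
pip install beautifulsoup4
pip install requests
pip install aiohttp

Iteration 1: a synchronous version using requests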

import time

import requests
from bs4 import BeautifulSoup


def main():
    url = "https://movie.douban.com/cinema/later/beijing/"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/52.0.2743.116 Safari/537.36 '
    }

    response = requests.get(url, headers=headers)
    init_page = response.text

    init_soup = BeautifulSoup(init_page, 'lxml')

    all_movies = init_soup.find('div', id="showing-soon")
    for each_movie in all_movies.find_all('div', class_="item"):
        all_a_tag = each_movie.find_all('a')
        all_li_tag = each_movie.find_all('li')

        movie_name = all_a_tag[1].text
        url_to_fetch = all_a_tag[1]['href']
        movie_date = all_li_tag[0].text

        response_item = requests.get(url_to_fetch, headers=headers).content
        soup_item = BeautifulSoup(response_item, 'lxml')
        img_tag = soup_item.find('img')

        print('{} {} {}'.format(movie_name, movie_date, img_tag['src']))


begin_time = time.perf_counter()
main()
end_time = time.perf_counter()
print(end_time - begin_time)

Result

灌篮高手 04月20日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2888398295.jpg
长空之王 04月28日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2889598060.jpg
人生路不熟 04月28日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2889864501.jpg
这么多年 04月28日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2890327372.jpg
长沙夜生活 04月28日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2888648134.jpg
倒数说爱你 04月28日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2888751814.jpg
惊天救援 04月28日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2890135815.jpg
检察风云 04月29日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2890643247.jpg
天堂谷大冒险 04月29日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2890601753.jpg
新猪猪侠大电影·超级赛车 04月29日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2890529454.jpg
魔幻奇缘之宝石公主 04月29日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2890340206.jpg
宇宙护卫队：风暴力量 04月29日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2881560195.jpg
鲛在水中央 05月01日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2890526786.jpg
马庄村 05月01日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2890134283.jpg
银河护卫队3 05月05日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2889358680.jpg
我和妈妈的最后一年 05月12日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2890599819.jpg
荒野狂兽 05月12日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2890557435.jpg
贫民窟之王 05月12日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2889912399.jpg
速度与激情10 05月17日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2890343870.jpg
余生那些年 05月20日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2888332880.jpg
请别相信她 05月20日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2886540928.jpg
69.0037998999469

Iteration 2: an asynchronous version using aiohttp

import asyncio
import aiohttp
import time

from bs4 import BeautifulSoup

header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/52.0.2743.116 Safari/537.36 '
}


async def fetch_content(url):
    async with aiohttp.ClientSession(
            headers=header, connector=aiohttp.TCPConnector(ssl=False)
    ) as session:
        async with session.get(url) as response:
            return await response.text()


async def main():
    url = "https://movie.douban.com/cinema/later/beijing/"
    init_page = await fetch_content(url)
    init_soup = BeautifulSoup(init_page, 'lxml')

    movie_names, urls_to_fetch, movie_dates = [], [], []

    all_movies = init_soup.find('div', id="showing-soon")
    for each_movie in all_movies.find_all('div', class_="item"):
        all_a_tag = each_movie.find_all('a')
        all_li_tag = each_movie.find_all('li')

        movie_names.append(all_a_tag[1].text)
        urls_to_fetch.append(all_a_tag[1]['href'])
        movie_dates.append(all_li_tag[0].text)

    tasks = [fetch_content(url) for url in urls_to_fetch]
    pages = await asyncio.gather(*tasks)

    for movie_name, movie_date, page in zip(movie_names, movie_dates, pages):
        soup_item = BeautifulSoup(page, 'lxml')
        img_tag = soup_item.find('img')
        print('{} {} {}'.format(movie_name, movie_date, img_tag['src']))


begin_time = time.perf_counter()
asyncio.run(main())
end_time = time.perf_counter()
print(end_time - begin_time)

Result

灌篮高手 04月20日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2888398295.jpg
长空之王 04月28日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2889598060.jpg
人生路不熟 04月28日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2889864501.jpg
这么多年 04月28日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2890327372.jpg
长沙夜生活 04月28日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2888648134.jpg
倒数说爱你 04月28日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2888751814.jpg
惊天救援 04月28日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2890135815.jpg
检察风云 04月29日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2890643247.jpg
天堂谷大冒险 04月29日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2890601753.jpg
新猪猪侠大电影·超级赛车 04月29日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2890529454.jpg
魔幻奇缘之宝石公主 04月29日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2890340206.jpg
宇宙护卫队：风暴力量 04月29日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2881560195.jpg
鲛在水中央 05月01日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2890526786.jpg
马庄村 05月01日 https://img2.doubanio.com/view/photo/s_ratio_poster/public/p2890134283.jpg
银河护卫队3 05月05日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2889358680.jpg
我和妈妈的最后一年 05月12日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2890599819.jpg
荒野狂兽 05月12日 https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2890557435.jpg
贫民窟之王 05月12日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2889912399.jpg
速度与激情10 05月17日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2890343870.jpg
余生那些年 05月20日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2888332880.jpg
请别相信她 05月20日 https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2886540928.jpg
4.7766728000715375
