Python Concurrency Series (6): Coroutine Functions and Their Implementation (gevent)

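1. Coroutine implementation

Threads and coroutines

As mentioned earlier, coroutines are also called micro-threads. Here is how they compare with threads:

Context switching between threads is expensive relative to coroutines, especially when many threads are running; switching between coroutines costs very little.

Thread switching is mostly controlled by the operating system, whereas coroutine switching is controlled by our own code.

The figures below make this easier to see: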

2b5409a453bcc14451b6c9eba09c2a73.png

21a076dba9dbbba39da85fe356ac8fa2.png

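As the figures show, coroutines simply switch among themselves inside a single thread. This is much like threads, which switch among themselves inside a single process, and is probably why coroutines are called micro-threads.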
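To make "switching inside a single thread, under our own control" concrete, here is a minimal sketch that uses plain Python generators instead of gevent. The names `worker` and `run_round_robin` are made up for this illustration; gevent's real scheduler is event-driven and far more capable.

```python
from collections import deque


def worker(name, steps, log):
    """A generator-based coroutine: each `yield` gives up control voluntarily."""
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield  # switch point chosen by the program, not by the OS


def run_round_robin(coroutines):
    """Run all coroutines in one thread, resuming each in turn until it ends."""
    ready = deque(coroutines)
    while ready:
        current = ready.popleft()
        try:
            next(current)        # resume until the next yield
        except StopIteration:
            continue             # finished: do not reschedule it
        ready.append(current)    # still has work: back of the line


log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)
# ['A step 0', 'B step 0', 'A step 1', 'B step 1', 'B step 2']
```

No threads are created here. The interleaving comes entirely from each coroutine handing control back at `yield`.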

Looking at coroutines more closely:

1afc16edfd34bf730cdef4b8f6e3657d.png

Since gevent is built on greenlet, the figure below helps explain greenlet:

510ffa6d841926d90ecd51b5c3b9c4fe.png

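Every coroutine has a parent. The top-level coroutine is the main thread, or more precisely the current thread. When a coroutine hits I/O it hands control to the top-level coroutine, which checks which coroutine's I/O event has completed and passes control to that one.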

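from greenlet import greenlet

def test1(x, y):
    # Switch to gr2, passing x + y; execution pauses here until
    # something switches back to gr1.
    z = gr2.switch(x + y)
    print(z)            # prints 42, the value passed by gr1.switch(42)

def test2(u):
    print(u)            # prints helloworld
    gr1.switch(42)      # resume gr1; 42 becomes the result of gr2.switch(...)

gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch("hello", 'world')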

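greenlet(run=None, parent=None): creates a greenlet instance.

gr.parent: every coroutine has a parent, and when the current coroutine finishes, execution returns to that parent. By default the parent is the coroutine that created it.

gr.run: the code the coroutine actually runs. When run returns, the coroutine is finished.

gr.switch(*args, **kwargs): switches to the coroutine gr.

gr.throw(): switches to the coroutine gr and immediately raises an exception inside it.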
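The attributes above can be exercised in a few lines. This is only a sketch, assuming the `greenlet` package is installed; the function name `child` and the `events` list are invented for the illustration.

```python
from greenlet import greenlet, getcurrent

events = []


def child():
    events.append("child started")
    try:
        # Hand control back to the parent. The argument becomes the
        # return value of the parent's gr.switch() call.
        getcurrent().parent.switch("paused")
        events.append("child resumed normally")
    except ValueError as exc:
        # Reached when the parent resumes us with gr.throw(...)
        events.append("child caught %s" % exc)
    return "done"  # delivered to the parent when run() finishes


gr = greenlet(child)                    # parent defaults to the current greenlet
first = gr.switch()                     # runs child until it switches back
second = gr.throw(ValueError("boom"))   # resume child by raising inside it

print(first, second, gr.dead)
print(events)
```

Here `first` is the value the child passed to `parent.switch`, and `second` is the child's return value, because a finished greenlet returns control, and its result, to its parent.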

Below is a gevent example:

import gevent

def func1():
    print("start func1")
    gevent.sleep(1)     # yields to other greenlets while "sleeping"
    print("end func1")

def func2():
    print("start func2")
    gevent.sleep(1)
    print("end func2")

gevent.joinall(
    [
        gevent.spawn(func1),
        gevent.spawn(func2)
    ]
)

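2. Multiple coroutines

A simple multi-coroutine example

import gevent

def func1():
    print("start func1")
    gevent.sleep(1)
    print("end func1")

def func2():
    print("start func2")
    gevent.sleep(1)
    print("end func2")

# Spawn both greenlets and wait until both have finished.
gevent.joinall(
    [
        gevent.spawn(func1),
        gevent.spawn(func2)
    ]
)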

joinall(greenlets, timeout=None, raise_error=False, count=None)

Wait for the greenlets to finish.

Parameters:
    greenlets: A sequence (supporting len()) of greenlets to wait for.
    timeout (float): If given, the maximum number of seconds to wait.

Returns: A sequence of the greenlets that finished before the timeout (if any) expired.

wait(objects=None, timeout=None, count=None)

Wait for objects to become ready or for event loop to finish.
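The `timeout` parameter and the return value of `joinall` are easiest to see with one fast and one slow greenlet. A small sketch, assuming gevent is installed; `slow_square` and the delays are arbitrary choices for the demonstration.

```python
import gevent


def slow_square(n, delay):
    gevent.sleep(delay)  # cooperative sleep: other greenlets run meanwhile
    return n * n


fast = gevent.spawn(slow_square, 3, 0.01)
slow = gevent.spawn(slow_square, 4, 5)

# Wait at most 0.5 s. joinall returns only the greenlets that finished in time.
finished = gevent.joinall([fast, slow], timeout=0.5)

fast_result = fast.value            # return value of a finished greenlet
slow_done_in_time = slow.ready()    # False: still sleeping when we timed out
slow.kill()                         # stop the straggler so nothing is left running

print(fast_result, slow_done_in_time, len(finished))
```

A greenlet that misses the timeout is not cancelled by `joinall`; it keeps running until it is killed or the program ends, which is why the sketch calls `kill()` explicitly.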

Communication between coroutines

import gevent
from gevent.queue import Queue

tasks = Queue()

def worker(n):
    while not tasks.empty():
        task = tasks.get()
        print('Worker %s got task %s' % (n, task))
        gevent.sleep(0)     # yield so the other workers get a turn
    print('Quitting time!')

def boss():
    for i in range(1, 25):  # range replaces the Python 2 xrange
        tasks.put_nowait(i)

# Fill the queue first, then let three workers drain it.
gevent.spawn(boss).join()

gevent.joinall([
    gevent.spawn(worker, 'steve'),
    gevent.spawn(worker, 'john'),
    gevent.spawn(worker, 'nancy'),
])

Output:

Worker steve got task 1
Worker john got task 2
Worker nancy got task 3
Worker steve got task 4
Worker john got task 5
Worker nancy got task 6
Worker steve got task 7
Worker john got task 8
Worker nancy got task 9
Worker steve got task 10
Worker john got task 11
Worker nancy got task 12
Worker steve got task 13
Worker john got task 14
Worker nancy got task 15
Worker steve got task 16
Worker john got task 17
Worker nancy got task 18
Worker steve got task 19
Worker john got task 20
Worker nancy got task 21
Worker steve got task 22
Worker john got task 23
Worker nancy got task 24
Quitting time!
Quitting time!
Quitting time!

full()
    Return True if the queue is full, False otherwise. Queue(None) is never full.

get(block=True, timeout=None)
    Remove and return an item from the queue.
    If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).

get_nowait()
    Remove and return an item from the queue without blocking.
    Only get an item if one is immediately available. Otherwise raise the Empty exception.

peek(block=True, timeout=None)
    Return an item from the queue without removing it.
    If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).

peek_nowait()
    Return an item from the queue without blocking.
    Only return an item if one is immediately available. Otherwise raise the Empty exception.

put(item, block=True, timeout=None)
    Put an item into the queue.
    If optional arg block is true and timeout is None (the default), block if necessary until a free slot is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Full exception if no free slot was available within that time. Otherwise (block is false), put an item on the queue if a free slot is immediately available, else raise the Full exception (timeout is ignored in that case).

put_nowait(item)
    Put an item into the queue without blocking.
    Only enqueue the item if a free slot is immediately available. Otherwise raise the Full exception.

qsize()
    Return the size of the queue.
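The blocking and non-blocking variants described above differ mainly in when they raise `Full` or `Empty`. The following sketch walks through them on a queue bounded to two items; the variable names are illustrative only.

```python
from gevent.queue import Queue, Empty, Full

q = Queue(maxsize=2)
q.put_nowait("a")
q.put_nowait("b")

# The queue is full, so a non-blocking put raises Full instead of waiting.
try:
    q.put_nowait("c")
    overflowed = False
except Full:
    overflowed = True

head = q.peek_nowait()   # look at the next item without removing it
first = q.get_nowait()   # remove it
second = q.get()         # an item is available, so this returns at once
remaining = q.qsize()

# The queue is now empty: a blocking get with a timeout raises Empty.
try:
    q.get(timeout=0.05)
    timed_out = False
except Empty:
    timed_out = True

print(overflowed, head, first, second, remaining, timed_out)
```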

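3. Coroutine pool

Note that the example below uses gevent.threadpool.ThreadPool, whose workers are native threads. The greenlet-based pool, gevent.pool.Pool, is used in the pooled version of the crawler later on.

from __future__ import print_function
import time
import gevent
from gevent.threadpool import ThreadPool

pool = ThreadPool(3)    # at most 3 native worker threads

start = time.time()
for _ in range(4):
    pool.spawn(time.sleep, 1)   # the 4th task waits for a free thread
gevent.wait()

delay = time.time() - start
print('Running "time.sleep(1)" 4 times with 3 threads. Should take about 2 seconds: %.3fs' % delay)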

spawn(func, *args, **kwargs)

Add a new task to the threadpool that will run func(*args, **kwargs).

Waits until a slot is available. Creates a new native thread if necessary.

join()

Waits until all outstanding tasks have been completed.
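For comparison with the thread pool above, here is a sketch of the greenlet-based `gevent.pool.Pool`, which caps how many greenlets run at once. The `task` function and the `running`/`peak` counters are hypothetical, added only to observe the concurrency limit.

```python
import gevent
from gevent.pool import Pool

running = 0   # greenlets currently inside task()
peak = 0      # highest value `running` ever reached


def task(n):
    global running, peak
    running += 1
    peak = max(peak, running)
    gevent.sleep(0.01)   # simulated I/O; yields to the other greenlets
    running -= 1
    return n * 2


pool = Pool(2)                        # at most 2 greenlets at a time
results = pool.map(task, range(5))    # results come back in input order
pool.join()

print(results)
print("peak concurrency:", peak)
```

Because greenlets are scheduled cooperatively, the plain counters need no lock: a switch can only happen at `gevent.sleep`.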

4. A coroutine-based crawler

Plain multi-coroutine version

import gevent
from gevent import monkey
monkey.patch_all()  # patch blocking I/O first, so requests cooperates with gevent

import json
import math
import requests
from gevent.queue import Empty, Queue

HEADERS = {
    # 'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
}

def start_urls(tasks, total_page):
    # Producer: fill the queue with the URLs to be consumed
    url = "https://api.bilibili.com/x/v2/reply?jsonp=jsonp&pn={}&type=1&oid=455312953&sort=2&_=1587372277524"
    for i in range(1, total_page + 1):
        tasks.put(url.format(i))
    return tasks

def init_start():
    # Fetch the first page to work out the total number of comment pages
    url = "https://api.bilibili.com/x/v2/reply?jsonp=jsonp&pn=1&type=1&oid=455312953&sort=2&_=1587372277524"
    content = downloader(url)
    data = json.loads(content.text)
    total_page = math.ceil(int(data['data']['page']['count']) / int(data['data']['page']['size']))
    print(total_page)
    return total_page

def downloader(url):
    # Download task
    content = requests.get(url, headers=HEADERS)
    print(content.status_code, type(content.status_code))
    return content

def work(tasks, n):
    # Consumer
    while not tasks.empty():
        gevent.sleep(1)
        try:
            # get_nowait() avoids blocking forever if another worker
            # emptied the queue while this one was sleeping
            url = tasks.get_nowait()
        except Empty:
            break
        print(url)
        data = downloader(url)

if __name__ == '__main__':
    total_page = init_start()
    tasks = Queue()
    task_urls = start_urls(tasks, total_page)
    gevent.joinall([gevent.spawn(work, task_urls, i) for i in range(3)])
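The page-count arithmetic in `init_start` can be checked offline. The sketch below applies the same ceiling division to a hand-written payload; the helper name `total_pages` and the sample numbers are made up, and only the `data.page.count` / `data.page.size` shape is taken from the code above.

```python
import json
import math


def total_pages(payload_text):
    """Number of pages needed to show `count` items at `size` items per page."""
    page = json.loads(payload_text)["data"]["page"]
    return math.ceil(int(page["count"]) / int(page["size"]))


# Hypothetical response body: 45 comments at 20 per page needs 3 pages.
sample = json.dumps({"data": {"page": {"count": 45, "size": 20}}})
print(total_pages(sample))  # 3
```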

Coroutine pool version

import gevent
from gevent import monkey
monkey.patch_all()  # the function is patch_all, not patchall

import time
import json
import math
import requests
from gevent.queue import Empty, Queue
from gevent import pool

HEADERS = {
    # 'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
}

def start_urls(tasks, total_page):
    # Producer: fill the queue with the URLs to be consumed
    url = "https://api.bilibili.com/x/v2/reply?jsonp=jsonp&pn={}&type=1&oid=455312953&sort=2&_=1587372277524"
    for i in range(1, total_page + 1):
        tasks.put(url.format(i))
    return tasks

def init_start():
    # Fetch the first page to work out the total number of comment pages
    url = "https://api.bilibili.com/x/v2/reply?jsonp=jsonp&pn=1&type=1&oid=455312953&sort=2&_=1587372277524"
    content = downloader(url)
    data = json.loads(content.text)
    total_page = math.ceil(int(data['data']['page']['count']) / int(data['data']['page']['size']))
    print(total_page)
    return total_page

def downloader(url):
    # Download task
    content = requests.get(url, headers=HEADERS)
    print(content.status_code, type(content.status_code))
    return content

def work(tasks, n):
    # Consumer
    while not tasks.empty():
        time.sleep(1)   # cooperative after monkey.patch_all()
        try:
            # get_nowait() avoids blocking forever if another worker
            # emptied the queue while this one was sleeping
            url = tasks.get_nowait()
        except Empty:
            break
        print(url)
        data = downloader(url)

if __name__ == '__main__':
    total_page = init_start()
    tasks = Queue()
    task_urls = start_urls(tasks, total_page)
    worker_pool = pool.Pool(3)  # renamed so it no longer shadows the pool module
    for i in range(3):
        worker_pool.spawn(work, task_urls, i)
    worker_pool.join()
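With a pool in hand, the explicit queue and worker loop are not strictly necessary. As an alternative, not the approach used above, the pool can map a download function over the URLs directly. The sketch below replaces the network call with a stand-in, `fake_download`, and uses placeholder URLs, so it runs offline.

```python
import gevent
from gevent.pool import Pool


def fake_download(url):
    """Stand-in for requests.get: pretend the request took 10 ms and succeeded."""
    gevent.sleep(0.01)
    return url, 200


# Placeholder URLs, not a real API.
urls = ["https://example.invalid/page/%d" % i for i in range(1, 6)]

pool = Pool(3)  # at most 3 downloads in flight
# imap_unordered yields each (url, status) pair as soon as it is ready.
results = dict(pool.imap_unordered(fake_download, urls))

print(len(results), set(results.values()))
```

With a real downloader, `monkey.patch_all()` would still need to run before `requests` is imported so that the network calls yield to other greenlets.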

5. Web server and client implementation
