python (34) - High Concurrency - Coroutines - Gevent - Crawler

多云的夏天

Article link: https://blog.csdn.net/aggie4628/article/details/105340808

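There are plenty of ready-made third-party crawler libraries. If you write one yourself with Gevent, there is one thing to watch out for:
gevent cannot detect the blocking I/O performed by urllib, so it never switches between greenlets and the requests always run serially. So how do we make gevent aware of urllib's I/O? By applying the monkey patch (gevent.monkey).
A cute name, isn't it?
1. Before adding the monkey patch, the synchronous and asynchronous versions below take about the same time.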

import gevent
import time
from urllib.request import urlopen
urls=[
    'https://www.python.org/',
    'https://www.yahoo.com/',
    'https://github.com/'
]
def f(url):
    print('GET: %s' % url)
    resp = urlopen(url)          # 1. open the URL
    data = resp.read()           # 2. read the response; data is the downloaded page
    print('%d bytes received from %s.' % (len(data), url))


# Synchronous version
time_start = time.time()
for url in urls:
    f(url)
print("sync cost", time.time() - time_start)

# Asynchronous version: one greenlet per URL
async_time_start = time.time()
gevent.joinall([gevent.spawn(f, url) for url in urls])
print("async cost", time.time() - async_time_start)
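
The reason the unpatched version stays serial can be reproduced without any network access. The following is only a minimal sketch: it stands in for the blocking urllib call with time.sleep and for a gevent-aware call with gevent.sleep. The names blocking_task, cooperative_task and timed are made up for this illustration and do not come from the original code.

```python
import time
import gevent


def blocking_task(delay):
    # time.sleep is not patched here, so it blocks the whole thread
    # and gevent gets no chance to switch to another greenlet.
    time.sleep(delay)


def cooperative_task(delay):
    # gevent.sleep yields to the gevent hub, so other greenlets can run.
    gevent.sleep(delay)


def timed(task, count=3, delay=0.2):
    # Spawn `count` greenlets running `task` and measure the total wall time.
    start = time.perf_counter()
    gevent.joinall([gevent.spawn(task, delay) for _ in range(count)])
    return time.perf_counter() - start


blocking_cost = timed(blocking_task)
cooperative_cost = timed(cooperative_task)
print("blocking cost", blocking_cost)
print("cooperative cost", cooperative_cost)
```

The blocking variant should take roughly the sum of the three delays, while the cooperative one should take roughly a single delay. An unpatched urlopen behaves like the blocking variant.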

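2. After adding the monkey patch, the asynchronous version takes noticeably less time. The synchronous loop still fetches one URL at a time, so its time barely changes.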

from gevent import monkey
monkey.patch_all()  # patch first, before urllib (and through it socket/ssl) is imported

import gevent
import time
from urllib.request import urlopen

urls=[
    'https://www.python.org/',
    'https://www.yahoo.com/',
    'https://github.com/'
]
def f(url):
    print('GET: %s' % url)
    resp = urlopen(url)          # 1. open the URL
    data = resp.read()           # 2. read the response; data is the downloaded page
    print('%d bytes received from %s.' % (len(data), url))


# Synchronous version
time_start = time.time()
for url in urls:
    f(url)
print("sync cost", time.time() - time_start)

# Asynchronous version: one greenlet per URL
async_time_start = time.time()
gevent.joinall([gevent.spawn(f, url) for url in urls])
print("async cost", time.time() - async_time_start)
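
For a real crawler with many URLs, spawning one greenlet per URL may open too many connections at once. One possible approach, shown here only as a sketch, is to cap concurrency with gevent.pool.Pool and to return results instead of printing them. The names fetch and crawl, the fetcher and concurrency parameters, and the 10 second timeout are assumptions made for this example, not part of the original post.

```python
from gevent import monkey
monkey.patch_all()  # patch as early as possible, before urllib is imported

import gevent
from gevent.pool import Pool
from urllib.request import urlopen


def fetch(url, timeout=10):
    # Download one page and return (url, byte count), or (url, None) on error.
    try:
        with urlopen(url, timeout=timeout) as resp:
            return url, len(resp.read())
    except OSError as exc:  # URLError, HTTPError and socket timeouts are OSError subclasses
        print("failed: %s (%s)" % (url, exc))
        return url, None


def crawl(urls, fetcher=fetch, concurrency=2):
    # Run `fetcher` over `urls` with at most `concurrency` greenlets at a time.
    pool = Pool(concurrency)
    # Pool.map returns the results in the same order as `urls`.
    return pool.map(fetcher, urls)
```

Calling crawl(urls) with the list defined above would return one (url, size) pair per URL. monkey.is_module_patched("socket") can be used to confirm that the patch is active.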
