python aiohttp_Python-异步之aiohttp

最新推荐文章于 2023-12-23 22:24:10 发布

weixin_39864101

最新推荐文章于 2023-12-23 22:24:10 发布

阅读量242

点赞数

文章标签： python aiohttp

什么是 aiohttp？一个异步的 HTTP 客户端\服务端框架，基于 asyncio 的异步模块。可用于实现异步爬虫，更快于 requests 的同步爬虫。

aiohttp 和 requests

requests 版爬虫

requests 同步方式连续 30 次简单爬取http://httpbin.org网站

importrequestsfrom datetime importdatetimedeffetch(url):

r=requests.get(url)print(r.text)

start=datetime.now()for i in range(30):

fetch('http://httpbin.org/get')

end=datetime.now()print("requests版爬虫花费时间为：")print(end - start)

从爬取结果可以看出，同步爬取 30 次网站将花费 43 秒左右的时间，耗时非常长。

aiohttp 版爬虫

使用 aiohttp 和 asyncio 异步方式简单爬取 30 次网站

importaiohttpimportasynciofrom datetime importdatetime

asyncdeffetch(client):

async with client.get('http://httpbin.org/get') as resp:assert resp.status == 200

returnawait resp.text()

asyncdefmain():

async with aiohttp.ClientSession() as client:

html=await fetch(client)print(html)

loop=asyncio.get_event_loop()

tasks=[]for i in range(30):

task=loop.create_task(main())

tasks.append(task)

start=datetime.now()

loop.run_until_complete(main())

end=datetime.now()print("aiohttp版爬虫花费时间为：")print(end - start)

从爬取时间可以看出，aiohttp 异步爬取网站只用了 0.5 秒左右的时间，比 requests 同步方式快了 80 倍左右，速度非常之快。

同一个 session

aiohttp.ClientSession() 中封装了一个 session 的连接池，并且在默认情况下支持 keepalives，官方建议在程序中使用单个 ClientSession 对象，而不是像上面示例中的那样每次连接都创建一个 ClientSession 对象，除非在程序中遇到大量的不同的服务。

将上面的示例修改为：

importaiohttpimportasynciofrom datetime importdatetime

asyncdeffetch(client):print("打印 ClientSession 对象")print(client)

async with client.get('http://httpbin.org/get') as resp:assert resp.status == 200

returnawait resp.text()

asyncdefmain():

async with aiohttp.ClientSession() as client:

tasks=[]for i in range(30):

tasks.append(asyncio.create_task(fetch(client)))

await asyncio.wait(tasks)

loop=asyncio.get_event_loop()

start=datetime.now()

loop.run_until_complete(main())

end=datetime.now()print("aiohttp版爬虫花费时间为：")print(end - start)

示例结果

#重复30遍

打印 ClientSession 对象aiohttp版爬虫花费时间为：

0:00:01.778045

从上面爬取的时间可以看出单个 ClientSession 对象比多个 ClientSession 对象多花了 3 倍时间。ClientSession 对象一直是同一个 0x1094aff98。

返回值

Json 串

在上面的示例中使用 response.text() 函数返回爬取到的内容，aiohttp 在处理 Json 返回值的时候，可以直接将字符串转换为 Json。

async deffetch(client):

async with client.get('http://httpbin.org/get') as resp:return await resp.json()

示例结果

{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'Python/3.7 aiohttp/3.6.2'}, 'origin': '49.80.42.33, 49.80.42.33', 'url': 'https://httpbin.org/get'}

当返回的 Json 串不是一个标准的 Json 时，resp.json() 函数可以传递一个函数对 json 进行预处理，如：resp.json(replace(a, b))，replace()函数表示 a 替换为 b。

字节流

aiohttp 使用 response.read() 函数处理字节流，使用 with open() 方式保存文件或者图片

async deffetch(client):

async with client.get('http://httpbin.org/image/png') as resp:returnawait resp.read()

asyncdefmain():

async with aiohttp.ClientSession() as client:

image=await fetch(client)

with open("/Users/xxx/Desktop/image.png", 'wb') as f:

f.write(image)

response.read() 函数可以传递数字参数用于读取多少个字节，如：response.read(3)读取前 3 个字节。

参数

aiohttp 可以使用 3 种方式在 URL 地址中传递参数

async deffetch(client):

params= [('a', 1), ('b', 2)]

async with client.get('http://httpbin.org/get',params=params) as resp:return await resp.text()

示例 URL 地址

http://httpbin.org/get?a=1&b=2asyncdeffetch(client):

params= {"a": 1,"b": 2}

async with client.get('http://httpbin.org/get',params=params) as resp:return await resp.text()

示例 URL 地址

http://httpbin.org/get?a=1&b=2asyncdeffetch(client):

async with client.get('http://httpbin.org/get',params='q=aiohttp+python&a=1') as resp:return await resp.text()

请求头

aiohttp 在自定义请求头时，类似于向 URL 传递参数的方式

async deffetch(client):

headers= {'content-type': 'application/json', 'User-Agent': 'Python/3.7 aiohttp/3.7.2'}

async with client.get('http://httpbin.org/get',headers=headers) as resp:return await resp.text()

cookies 是整个会话共用的，所以应该在初始化 ClientSession 对象时传递

async deffetch(client):

async with client.get('http://httpbin.org/get') as resp:returnawait resp.text()

asyncdefmain():

cookies= {'cookies': 'this is cookies'}

async with aiohttp.ClientSession(cookies=cookies) as client:

html=await fetch(client)print(html)

POST 方式

在前面的示例中都是以 GET 方式提交请求，下面用 POST 方式请求

async deffetch(client):

data= {'a': '1', 'b': '2'}

async with client.post('http://httpbin.org/post', data =data) as resp:return await resp.text()

示例结果

{"args": {},"data": "","files": {},"form": {"a": "1","b": "2"},"headers": {"Accept": "*/*","Accept-Encoding": "gzip, deflate","Content-Length": "7","Content-Type": "application/x-www-form-urlencoded","Host": "httpbin.org","User-Agent": "Python/3.7 aiohttp/3.6.2"},"json": null,"origin": "49.80.42.33, 49.80.42.33","url": "https://httpbin.org/post"}

aiohttp版爬虫花费时间为：

0:00:00.514402

超时

在请求网站时，有时会遇到超时问题，aiohttp 中使用 timeout 参数设置，单位为秒数，aiohttp 默认超时时间为 5 分钟

async deffetch(client):

async with client.get('http://httpbin.org/get', timeout=60) as resp:return await resp.text()

weixin_39864101

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
python aiohttp_Python-异步之aiohttp

什么是 aiohttp？一个异步的 HTTP 客户端\服务端框架，基于 asyncio 的异步模块。可用于实现异步爬虫，更快于 requests 的同步爬虫。aiohttp 和 requestsrequests 版爬虫requests 同步方式连续 30 次简单爬取http://httpbin.org网站importrequestsfrom datetime importdatetimedeffe...
复制链接

扫一扫