Introduction
asyncio implements protocols such as TCP, UDP, and SSL; aiohttp is an asynchronous HTTP request framework built on top of asyncio. It can be used for the first step of the crawler workflow, replacing the earlier requests calls to speed up requesting; the later parsing and storage steps stay the same.
Chinese documentation: https://www.cntofu.com/book/127/index.html
Installation
pip install aiohttp
Using the Tsinghua mirror is faster:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple aiohttp
Basics
This library is essentially the asynchronous version of requests; its syntax and usage are almost identical.
Session mechanism
Similar to requests' session: first create an aiohttp.ClientSession object, then use that session object to open pages.
import asyncio
import aiohttp
import time
import requests

# aiohttp implementation
async def print_page():
    async with aiohttp.ClientSession() as session:
        async with session.get("http://www.baidu.com") as resp:
            print(resp.status)
            body = await resp.read()
            print("Response body length:", len(body))

startTime = time.time()
loop = asyncio.get_event_loop()
# gather the 100 coroutines into one awaitable and run them concurrently
tasks = asyncio.gather(*[print_page() for i in range(100)])
loop.run_until_complete(tasks)
endTime = time.time()
print("Time for aiohttp to request the Baidu homepage:", endTime - startTime)
The elapsed time is also recorded here, so with a simple requests version of the code you can directly compare aiohttp against requests.
# requests implementation
startTime = time.time()
for i in range(100):
    response = requests.get("http://www.baidu.com")
    print(response.status_code, len(response.content))
endTime = time.time()
print("Time for requests to request the Baidu homepage:", endTime - startTime)
Although the results are affected by many factors, especially the network, aiohttp clearly comes out ahead.
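One practical note when launching this many coroutines at once: it is common to cap concurrency with asyncio.Semaphore so the target server is not overwhelmed. A minimal sketch of the pattern, with asyncio.sleep standing in for a real aiohttp request and an arbitrary limit of 10:

```python
import asyncio

async def fetch(sem, i):
    # at most 10 coroutines execute this body at the same time
    async with sem:
        await asyncio.sleep(0.01)  # stand-in for a real aiohttp request
        return i

async def main():
    sem = asyncio.Semaphore(10)  # concurrency cap (arbitrary value)
    # gather preserves argument order, so results come back as 0..99
    return await asyncio.gather(*(fetch(sem, i) for i in range(100)))

results = asyncio.run(main())
print(len(results))  # 100
```

In a real crawler you would replace the sleep with a session.get call inside the `async with sem:` block.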
Adding request headers
This one is simple: pass headers as an option to session.get/post. Note that the headers must be a dict.
import asyncio
import aiohttp

# aiohttp implementation
async def print_page():
    headers = {
        "User-Agent": "my test UA"
    }
    async with aiohttp.ClientSession() as session:
        async with session.get("http://127.0.0.1:5000/", headers=headers) as resp:
            print(resp.status)
            body = await resp.read()
            print("Response body length:", len(body))
            print(body.decode())

loop = asyncio.get_event_loop()
loop.run_until_complete(print_page())
loop.close()
Adding Cookies
import asyncio
import aiohttp

# aiohttp implementation
async def print_page():
    cookies = {
        "keyA": "valueA",
        "keyB": "valueB",
        "keyC": "valueC",
    }
    async with aiohttp.ClientSession() as session:
        async with session.get("http://127.0.0.1:5000/Cookie", cookies=cookies) as resp:
            print(resp.status)
            body = await resp.read()
            print("Response body length:", len(body))
            print(body.decode())

loop = asyncio.get_event_loop()
loop.run_until_complete(print_page())
loop.close()
Adding a proxy
import asyncio
import aiohttp

# aiohttp implementation
async def print_page():
    headers = {
        "User-Agent": "my test UA"
    }
    async with aiohttp.ClientSession() as session:
        # pass the proxy URL via the proxy option of session.get
        async with session.get("http://ip.27399.com/", headers=headers,
                               proxy="http://47.95.249.140:8118") as resp:
            print(resp.status)
            body = await resp.read()
            print("Response body length:", len(body))
            print(body.decode())

loop = asyncio.get_event_loop()
loop.run_until_complete(print_page())
loop.close()
Exercise
Take some of the earlier practice tasks and compare requests against aiohttp for yourself.