Using HTTPX in Python: Building an Efficient Asynchronous HTTP Client

Latest recommended article published on 2024-09-04 22:01:23

傻啦嘿哟

Article tags: python httpx http

Article link: https://blog.csdn.net/weixin_43856625/article/details/141251283

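In Python web development, HTTP client libraries play a central role. As asynchronous programming has become more common, synchronous clients such as requests struggle to meet the high-concurrency, low-latency needs of modern web applications.

This is where httpx comes in: a modern HTTP client library that has become popular for its native support for asynchronous requests, its flexible API design, and its rich feature set.

This article covers how to use httpx, its features, and its advantages, with code examples and a case study to help newcomers get started quickly.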

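1. Introduction to HTTPX

httpx is a full-featured HTTP client library for Python. It offers both synchronous and asynchronous client interfaces and supports HTTP/1.1 and, as an optional extra, HTTP/2. WebSocket is not part of httpx itself and requires a third-party package. httpx was designed to fill the gap left by requests in asynchronous programming while keeping a broadly requests-compatible API, so most code migrates with few changes.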

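1.1 Installing HTTPX

To use httpx, first install it into your Python environment. Recent httpx releases require Python 3.8 or later (older releases also supported 3.7), and installation is done with pip: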

pip install httpx

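Asynchronous support is built into httpx and relies on asyncio from the Python standard library, so no extra dependency is needed for async requests. Optional features are installed as extras; for example, HTTP/2 support requires the http2 extra:

pip install "httpx[http2]"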

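2. Basic Usage of HTTPX

2.1 Sending HTTP Requests

httpx provides a function for each common HTTP method, such as get, post, put, and delete, and they closely mirror their counterparts in requests.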

2.1.1 GET Request

import httpx

response = httpx.get('https://example.com')
print(response.status_code)
print(response.text)

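2.1.2 POST Request

import httpx

# data= sends the dictionary as a form-encoded body
response = httpx.post('https://example.com', data={'key': 'value'})
print(response.status_code)
# example.com answers with HTML; call response.json() only when the endpoint returns JSON
print(response.text)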

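2.2 Asynchronous HTTP Requests

One of httpx's main strengths is native support for asynchronous requests. With AsyncClient, requests do not block the event loop, which pays off when many requests run concurrently.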

import httpx
import asyncio

async def fetch_data():
    async with httpx.AsyncClient() as client:
        response = await client.get('https://example.com')
        print(response.status_code)
        print(response.text)

asyncio.run(fetch_data())

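2.3 Context Managers

httpx clients can be used as context managers, which ties the client's lifetime to the block and ensures its connections are released afterwards. Reusing one Client for several requests also lets httpx reuse connections.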

import httpx

with httpx.Client() as client:
    response = client.get('https://example.com')
    print(response.status_code)
    print(response.text)

3. Advanced Features of HTTPX

3.1 Setting Request Headers and Parameters

Requests often need custom headers and query parameters. httpx accepts both through the headers and params arguments.

import httpx

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}
params = {'key1': 'value1', 'key2': 'value2'}

response = httpx.get('https://example.com', headers=headers, params=params)
print(response.status_code)
print(response.text)

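3.2 Sending JSON Data

POST and PUT requests often carry a JSON body. Passing a Python dictionary as the json argument makes httpx serialize it to JSON and set the Content-Type header accordingly.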

import httpx

data = {'key': 'value'}

response = httpx.post('https://example.com', json=data)
print(response.status_code)
# example.com answers with HTML; call response.json() only when the endpoint returns JSON
print(response.text)

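3.3 Timeouts and Error Handling

Setting a timeout is good practice because it keeps a request from hanging indefinitely when the network misbehaves. httpx applies a default timeout of 5 seconds, and the timeout argument overrides it. httpx also defines a hierarchy of exceptions that lets you handle failures cleanly.

3.3.1 Setting a Timeout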

import httpx

try:
    response = httpx.get('https://example.com', timeout=5.0)  # time out after 5 seconds
    print(response.status_code)
except httpx.TimeoutException:
    print("Request timed out")
except httpx.HTTPError as exc:
    print(f"HTTP error: {exc}")
except Exception as e:
    print(f"Unexpected error: {e}")

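3.3.2 Error Handling

httpx does not raise an exception just because the server returned an error status. To turn error responses such as 4xx and 5xx into exceptions, call response.raise_for_status(), which raises httpx.HTTPStatusError; that exception carries the response, including its status code and body. Network-level failures such as DNS errors or refused connections raise subclasses of httpx.RequestError, which have no response attached. Both inherit from httpx.HTTPError.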

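import httpx

try:
    response = httpx.get('https://nonexistentdomain.com')
    response.raise_for_status()  # raise HTTPStatusError on an error status
    print(response.status_code)
except httpx.HTTPStatusError as exc:
    # The server answered, but with an error status
    if exc.response.status_code == 404:
        print("Resource not found")
    else:
        print(f"HTTP error: {exc.response.status_code}")
except httpx.RequestError as e:
    # No usable response at all (DNS failure, connection error, timeout)
    print(f"Request failed: {e}")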

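3.4 File Uploads

httpx supports file uploads through the files parameter. It takes a dictionary whose keys are form field names and whose values are tuples of the file name, the file content (a file object or bytes), and an optional MIME type.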

import httpx

# Open the file in a with block so it is closed once the upload finishes
with open('report.xlsx', 'rb') as f:
    files = {'file': ('report.xlsx', f, 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')}
    response = httpx.post('https://example.com/upload', files=files)

print(response.status_code)

3.5 Handling Response Content

The httpx Response object exposes the body in several forms: text (the decoded text), json() (the body parsed as JSON into Python objects), and content (the raw bytes).

import httpx

response = httpx.get('https://jsonplaceholder.typicode.com/todos/1')
print(response.text)     # decoded text
print(response.json())   # parsed JSON
print(response.content)  # raw bytes

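4. Case Study: Building a Web Crawler with HTTPX

The following example builds a simple web crawler with httpx. It requests a page, reads the HTML, and extracts the links it contains.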

import httpx
import re

async def fetch_urls(url):
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        response.raise_for_status()  # raise HTTPStatusError on an error status
        html = response.text
        links = re.findall(r'href="([^"]+)"', html)  # extract links with a regular expression
        # Keep only absolute URLs
        absolute_urls = [link for link in links if link.startswith('http')]
        return absolute_urls

Running the crawler asynchronously

import asyncio

async def main():
    urls = await fetch_urls('https://example.com')
    for url in urls:
        print(url)

asyncio.run(main())

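Note: in real applications, parsing HTML with regular expressions runs into many problems, such as complex document structure and nested tags. A dedicated HTML parsing library such as BeautifulSoup or lxml is a better choice.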
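As a rough illustration of parser-based extraction, the sketch below uses the standard library's html.parser instead of BeautifulSoup or lxml, purely to avoid an extra dependency. LinkCollector and extract_links are hypothetical names, and the sample HTML is made up. Unlike the regular expression above, it handles single-quoted attributes and resolves relative links against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin turns relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


sample = '<p><a href="/about">About</a> <a href=\'https://example.org/x\'>X</a></p>'
links = extract_links(sample, "https://example.com")
print(links)
```

In the crawler, extract_links(response.text, url) could take the place of the re.findall call and the absolute-URL filter.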

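5. Summary

httpx is a modern HTTP client library for Python. Its native support for asynchronous programming, flexible API design, and rich feature set give developers an efficient and convenient way to make HTTP requests. This article covered its basic usage, its advanced features, and a small web crawler built on it. Hopefully this helps newcomers get started with httpx and apply it in real projects.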