When scraping, the urllib and requests libraries only support HTTP/1.1. Some sites enforce HTTP/2.0, and against those urllib and requests are of no use. The most widely used HTTP/2.0-capable request libraries at the moment are hyper and httpx; of the two, httpx is the more convenient and more capable, covering almost all of the functionality requests offers.
1. Installation
httpx requires Python 3.6 or later and can be installed with:
pip3 install httpx[http2]
2. Basic usage
httpx's API closely mirrors requests'. Basic usage looks like this:
# *********************************************
# Basic use of httpx
# *********************************************
import httpx
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/90.0.4430.93 Safari/537.36'}
url = 'https://www.httpbin.org/get'
response = httpx.get(url=url, headers=headers)
print(response.text)
Running it prints:
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "www.httpbin.org",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
"X-Amzn-Trace-Id": "Root=1-64d344dc-766eeeb637fbfa915a2e6b0f"
},
"origin": "xxx.xxx.12.60",
"url": "https://www.httpbin.org/get"
}
To request an HTTP/2.0 site with httpx:
import httpx
url = 'https://spa16.scrape.center/'
client = httpx.Client(http2=True)
response = client.get(url)
print(response.text)
Running it prints:
<!DOCTYPE html><html lang=en><head><meta charset=utf-8><meta http-equiv=X-UA-Compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><meta name=referrer content=no-referrer><link rel=icon href=/favicon.ico><title>Scrape | Book</title><link href=/css/chunk-50522e84.e4e1dae6.css rel=prefetch><link href=/css/chunk-f52d396c.4f574d24.css rel=prefetch><link href=/js/chunk-50522e84.6b3e24aa.js rel=prefetch><link href=/js/chunk-f52d396c.f8f41620.js rel=prefetch><link href=/css/app.ea9d802a.css rel=preload as=style><link href=/js/app.b93891e2.js rel=preload as=script><link href=/js/chunk-vendors.a02ff921.js rel=preload as=script><link href=/css/app.ea9d802a.css rel=stylesheet></head><body><noscript><strong>We're sorry but portal doesn't work properly without JavaScript enabled. Please enable it to continue.</strong></noscript><div id=app></div><script src=/js/chunk-vendors.a02ff921.js></script><script src=/js/app.b93891e2.js></script></body></html>
By default httpx speaks HTTP/1.1 and does not enable HTTP/2.0 support. To turn it on, construct the client with httpx.Client(http2=True), as above.
The following response attributes and methods expose the information you usually need:
- status_code: the HTTP status code
- text: the body decoded as text
- content: the body as raw bytes, useful when the target is binary data
- headers: the response Headers object
- json(): parses the text body and returns the resulting JSON object
3. The Client object
httpx.Client works much like requests' Session:
# *********************************************
# use of httpx.Client
# *********************************************
import httpx
with httpx.Client() as client:
    response = client.get('https://www.httpbin.org/get')
    print(response.text)
Running it prints:
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "www.httpbin.org",
"User-Agent": "python-httpx/0.24.1",
"X-Amzn-Trace-Id": "Root=1-64d43731-7ee43b435ecd09e174f2d47a"
},
"origin": "xxx.xxx.12.60",
"url": "https://www.httpbin.org/get"
}
4. Asynchronous requests: AsyncClient
httpx also ships an asynchronous client that plugs into Python's async/await model. Usage looks like this:
# *********************************************
# httpx.AsyncClient
# *********************************************
import httpx
import asyncio
async def fetch(url):
    async with httpx.AsyncClient(http2=True) as client:
        response = await client.get(url)
        print(response.text)

if __name__ == '__main__':
    url = "https://www.httpbin.org/get"
    asyncio.run(fetch(url))
Running it prints:
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "www.httpbin.org",
"User-Agent": "python-httpx/0.24.1",
"X-Amzn-Trace-Id": "Root=1-64d43b02-64f889d97b6deba3369a2f6f"
},
"origin": "xxx.xxx.12.60",
"url": "https://www.httpbin.org/get"
}