Python's requests Library in One Article: The Basics of Network Interaction

橙色小博

Published 2025-05-09 11:29:48

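1. Introduction

In Python, the requests library makes it easy to talk to the outside world. Whether you are fetching web pages, calling APIs, or handling other kinds of HTTP traffic, requests offers a concise yet powerful interface. Beginners can be productive without mastering the finer details of the HTTP protocol, while advanced users get enough flexibility and extensibility to handle complex networking scenarios.

If you are not familiar with the HTTP and HTTPS protocols, see:
"HTTP Protocol: Principles, Applications, and Python Practice"

"HTTPS Protocol: A More Secure HTTP"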

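2. Basic Concepts of the requests Library

requests is a third-party Python library for sending HTTP requests to a URL. It wraps the low-level HTTP logic so that we can communicate with a server in a few lines of code.

Why choose requests?

A simple, intuitive API
Support for the common HTTP methods (GET, POST, PUT, DELETE, and others)
Automatic cookie handling (cookies persist across requests when a Session is used)
File upload and download support
SSL certificate verification
Good extensibility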

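3. Use Cases for the requests Library

requests fits almost any Python application that needs to communicate with a server over HTTP:

Web scraping: fetch a page's HTML for analysis
API integration: talk to RESTful APIs to retrieve or send data
File download: download files from a server
Web application testing: simulate user requests against a web application
Data exchange: exchange JSON or XML data with external services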

4. Basic Usage of the requests Library

4.1 Installing requests

# Install with pip
pip install requests

4.2 Sending Your First Request

import requests

# Send a GET request
response = requests.get('https://api.github.com')
print(response.status_code)  # Print the status code
print(response.text)         # Print the response body

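4.3 Common HTTP Request Methods

# GET request
response = requests.get('https://api.github.com/events')

# POST request
response = requests.post('https://httpbin.org/post', data={'key': 'value'})

# PUT request
response = requests.put('https://httpbin.org/put', data={'key': 'value'})

# DELETE request
response = requests.delete('https://httpbin.org/delete')

# OPTIONS request
response = requests.options('https://httpbin.org/get')

# HEAD request
response = requests.head('https://httpbin.org/get')

GET: retrieves the content of the specified resource from the server.
POST: submits data to the server, typically to create a new resource.
PUT: submits data to the server to replace or update an existing resource.
DELETE: asks the server to delete the specified resource.
OPTIONS: asks the server which HTTP methods and related options the specified resource supports.
HEAD: asks for the same response a GET would produce, but without the body; usually used to read a resource's metadata.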

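4.4 Attributes of the Response Object

# Status code
print(response.status_code)

# Response headers
print(response.headers)

# Response body as bytes
print(response.content)

# Response body as a string, decoded with the encoding declared in the response headers
# (requests guesses the encoding when the headers do not declare one)
print(response.text)

# Response body parsed as JSON (raises an exception if the body is not valid JSON)
print(response.json())

# Final URL of the response
print(response.url)

# Redirect history: the responses that were followed to reach this one
print(response.history)

# Cookies sent back by the server
print(response.cookies)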

4.5 Sending Requests with Parameters

URL parameters

# Build the URL by hand
response = requests.get('https://httpbin.org/get?name=John&age=30')

# Use the params argument and let requests encode the query string
params = {'name': 'John', 'age': 30}
response = requests.get('https://httpbin.org/get', params=params)
print(response.url)  # https://httpbin.org/get?name=John&age=30
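
To see what the params argument does without touching the network, the request can be prepared but not sent. This is a minimal sketch; the extra 'tag' and 'skip' keys are hypothetical and only there to show how list and None values are handled.

```python
import requests

# Build the request without sending it, so no network access is needed.
req = requests.Request(
    'GET',
    'https://httpbin.org/get',
    params={'name': 'John', 'age': 30, 'tag': ['a', 'b'], 'skip': None},
)
prepared = req.prepare()

# A list value becomes a repeated key; a key whose value is None is left out.
print(prepared.url)
```

Letting requests build the query string also takes care of percent-encoding, which is easy to get wrong when concatenating strings by hand.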

POST request data

# Send form data (application/x-www-form-urlencoded)
payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.post('https://httpbin.org/post', data=payload)

# Send JSON data (application/json)
payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.post('https://httpbin.org/post', json=payload)
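
The difference between data= and json= is easiest to see in the prepared request itself. The sketch below prepares both variants without sending them and inspects the Content-Type header and body that requests would put on the wire.

```python
import json

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
url = 'https://httpbin.org/post'

# Prepare (but do not send) the same payload in both styles.
form_req = requests.Request('POST', url, data=payload).prepare()
json_req = requests.Request('POST', url, json=payload).prepare()

# data= produces a URL-encoded form body.
print(form_req.headers['Content-Type'])
print(form_req.body)

# json= serializes the dict to JSON and sets the matching Content-Type.
print(json_req.headers['Content-Type'])
print(json.loads(json_req.body))
```

Which one to use depends on what the server expects: HTML-form style endpoints usually want data=, while most REST APIs want json=.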

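Setting request headers

headers = {
    'User-Agent': 'My-App/1.0',
    'Accept': 'application/json',
    'Authorization': 'Bearer YOUR_ACCESS_TOKEN'
}

response = requests.get('https://api.example.com/data', headers=headers)

4.6 Handling Requests and Responses

Setting a timeout

# Set the timeout in seconds. It limits the wait for the connection and for each
# read from the server, not the total time of the request.
response = requests.get('https://api.example.com/data', timeout=5)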
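
The timeout can also be given as a (connect, read) tuple. The sketch below demonstrates the read timeout against a throwaway local server; SlowHandler and the one-second delay are hypothetical test scaffolding, not part of requests.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that waits one second before answering."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'late')
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(('127.0.0.1', 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/'

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local test

try:
    # Allow 3.05 s to connect, but only 0.3 s of silence while reading.
    session.get(url, timeout=(3.05, 0.3))
    outcome = 'ok'
except requests.exceptions.ReadTimeout:
    outcome = 'read timeout'
finally:
    session.close()
    server.shutdown()
    server.server_close()

print(outcome)
```

Because the server stays silent for longer than the read timeout, the request is abandoned with ReadTimeout instead of hanging.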

Verifying SSL certificates

# SSL certificates are verified by default
response = requests.get('https://api.example.com/data', verify=True)

# Skip certificate verification (not recommended in production)
response = requests.get('https://api.example.com/data', verify=False)

Handling redirects

# Redirects are followed by default
response = requests.get('http://github.com')

# Do not follow redirects
response = requests.get('http://github.com', allow_redirects=False)
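
To make the effect of allow_redirects visible, the sketch below runs both variants against a small local server. RedirectHandler and its /old and /new paths are hypothetical, chosen only so the example runs without internet access.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'final page'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(('127.0.0.1', 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f'http://127.0.0.1:{server.server_port}'

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local test

try:
    followed = session.get(base + '/old', timeout=5)
    not_followed = session.get(base + '/old', allow_redirects=False, timeout=5)
finally:
    session.close()
    server.shutdown()
    server.server_close()

# With redirects followed, the 302 ends up in response.history.
print(followed.status_code, followed.url, [r.status_code for r in followed.history])
# Without following, the 302 itself is returned along with its Location header.
print(not_followed.status_code, not_followed.headers['Location'])
```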

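5. Advanced Features

5.1 File Upload

# Upload a single file (the with block closes the file afterwards)
with open('example.txt', 'rb') as f:
    response = requests.post('https://httpbin.org/post', files={'file': f})

# Upload multiple files
with open('photo1.jpg', 'rb') as f1, open('photo2.jpg', 'rb') as f2:
    files = [
        ('image1', ('photo1.jpg', f1, 'image/jpeg')),
        ('image2', ('photo2.jpg', f2, 'image/jpeg')),
    ]
    response = requests.post('https://httpbin.org/post', files=files)

image1 and image2: the form field names that identify each uploaded file. They are usually dictated by the server or the API you are calling.
photo1.jpg and photo2.jpg: the file names sent to the server. They can match the local file names or be any custom names the server should see.
open('photo1.jpg', 'rb') and open('photo2.jpg', 'rb'): open the local files in binary read mode ('rb') so that their bytes are uploaded unchanged.
'image/jpeg': the MIME type of the file, which tells the server the file format. For JPEG images it is image/jpeg.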
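
The tuple form described above can be checked without a real file or server. In this sketch an in-memory io.BytesIO object stands in for an opened file (an assumption made so the example is self-contained), and the request is only prepared, not sent.

```python
import io

import requests

# io.BytesIO stands in for a file opened with open(..., 'rb').
fake_file = io.BytesIO(b'hello upload')
files = {'file': ('example.txt', fake_file, 'text/plain')}

# Prepare (but do not send) the multipart request.
prepared = requests.Request('POST', 'https://httpbin.org/post', files=files).prepare()

content_type = prepared.headers['Content-Type']
print(content_type.split(';')[0])                   # the multipart media type
print(b'filename="example.txt"' in prepared.body)   # file name is part of the body
print(b'hello upload' in prepared.body)             # so is the file content
```

requests generates the multipart boundary itself, so the Content-Type header should not be set by hand when files= is used.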

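5.2 Session Objects

# Create a session; state such as cookies is kept across requests in the same session
with requests.Session() as s:
    # A login-style request (a real login endpoint would set a session cookie here)
    payload = {'username': 'user', 'password': 'pass'}
    s.post('https://httpbin.org/post', data=payload)

    # Later requests automatically send any cookies received earlier
    response = s.get('https://httpbin.org/get')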
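
What a session adds to each request can be inspected with Session.prepare_request, which merges session-level state into a request without sending it. In this sketch the cookie name and value are hypothetical and set by hand to imitate what a login response would store.

```python
import requests

with requests.Session() as s:
    # Session-wide defaults: a header and a cookie (set by hand for illustration).
    s.headers.update({'User-Agent': 'My-App/1.0'})
    s.cookies.set('sessionid', 'abc123', domain='httpbin.org', path='/')

    # A per-request header is merged with the session defaults.
    req = requests.Request(
        'GET',
        'https://httpbin.org/get',
        headers={'Accept': 'application/json'},
    )
    prepared = s.prepare_request(req)

user_agent = prepared.headers['User-Agent']
accept = prepared.headers['Accept']
cookie = prepared.headers.get('Cookie')
print(user_agent, accept, cookie)
```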

5.3 Handling Exceptions

try:
    response = requests.get('https://api.example.com/data', timeout=5)
    response.raise_for_status()  # Raises HTTPError for 4xx and 5xx status codes
except requests.exceptions.HTTPError as errh:
    print(f"HTTP Error: {errh}")
except requests.exceptions.ConnectionError as errc:
    print(f"Error Connecting: {errc}")
except requests.exceptions.Timeout as errt:
    print(f"Timeout Error: {errt}")
except requests.exceptions.RequestException as err:
    print(f"Oops: Something Else: {err}")
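
It is worth being precise about which status codes raise_for_status treats as errors. The sketch below builds bare Response objects by hand (something normal code rarely needs to do, used here only to avoid network access) and checks a few codes.

```python
import requests


def raises_http_error(status_code):
    """Return True if raise_for_status() raises HTTPError for this status code."""
    response = requests.Response()
    response.status_code = status_code
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        return True
    return False


results = {code: raises_http_error(code) for code in (200, 204, 301, 404, 500)}
print(results)
```

Only the 4xx and 5xx codes raise; other 2xx codes and 3xx redirects pass through, so a check like status_code == 200 is stricter than raise_for_status.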

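5.4 Proxy Support

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

response = requests.get('http://example.org', proxies=proxies)

Sending requests through a proxy server: when the target URL cannot be reached directly (for example from a corporate network, or when the site is blocked on the local network), the request can be relayed through a proxy.
Hiding the real IP address: the target server sees the proxy's address instead of the client's, which helps protect privacy or get around IP-based restrictions.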

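6. Practical Examples

To make the requests library more concrete, this section walks through a few classic examples.

6.1 Fetching Weather Data

# Fetch data from a weather API
api_key = 'your_api_key'
city = 'Beijing'

url = 'https://api.openweathermap.org/data/2.5/weather'
params = {'q': city, 'appid': api_key, 'units': 'metric'}

response = requests.get(url, params=params, timeout=5)
response.raise_for_status()  # Stop here on a 4xx/5xx response, e.g. an invalid API key
data = response.json()

print(f"Current weather in {city}:")
print(f"Temperature: {data['main']['temp']}°C")
print(f"Conditions: {data['weather'][0]['description']}")
print(f"Humidity: {data['main']['humidity']}%")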

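6.2 Downloading a File

# Download a large file and save it to disk
url = 'https://example.com/largefile.zip'

# stream=True defers downloading the body; the with block releases the connection afterwards
with requests.get(url, stream=True, timeout=10) as response:
    response.raise_for_status()
    with open('largefile.zip', 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)

iter_content(chunk_size=8192) yields the response body in chunks of up to 8192 bytes (8 KB). Reading a large file piece by piece avoids loading the whole body into memory at once.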
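
The same pattern can be wrapped in a small helper and tried end to end against a local server. This is a sketch under stated assumptions: download, FileHandler, and the 50,000-byte payload are hypothetical names and data invented for the demonstration.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = b'x' * 50_000  # stand-in for a large file


class FileHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that returns PAYLOAD for every GET."""

    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Length', str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the example output quiet


def download(url, dest, chunk_size=8192):
    """Stream url into the file dest and return the number of bytes written."""
    written = 0
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy environment variables for this local test
        with session.get(url, stream=True, timeout=10) as response:
            response.raise_for_status()
            with open(dest, 'wb') as f:
                for chunk in response.iter_content(chunk_size=chunk_size):
                    written += f.write(chunk)
    return written


server = HTTPServer(('127.0.0.1', 0), FileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
dest = os.path.join(tempfile.mkdtemp(), 'largefile.bin')

try:
    written = download(f'http://127.0.0.1:{server.server_port}/', dest)
finally:
    server.shutdown()
    server.server_close()

print(written, os.path.getsize(dest))
```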

6.3 Using API Authentication

# Basic Auth
url = 'https://api.example.com/protected'
auth = ('username', 'password')
response = requests.get(url, auth=auth)

# OAuth2 (requires the separate requests-oauthlib package: pip install requests-oauthlib)
import requests_oauthlib

redirect_uri = 'http://localhost:8000/callback'
oauth = requests_oauthlib.OAuth2Session('client_id', redirect_uri=redirect_uri)
authorization_url, state = oauth.authorization_url('https://provider.com/oauth2/authorize')

This creates an OAuth2Session object with the client ID 'client_id' and the redirect URI 'http://localhost:8000/callback'.
The authorization_url method then builds the OAuth2 authorization URL and returns it together with the state value.
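
For the Basic Auth part, it can help to see what the auth tuple turns into. The sketch below prepares the request without sending it and decodes the resulting Authorization header.

```python
import base64

import requests

# Prepare (but do not send) a request that uses Basic Auth.
prepared = requests.Request(
    'GET',
    'https://api.example.com/protected',
    auth=('username', 'password'),
).prepare()

header = prepared.headers['Authorization']
scheme, token = header.split(' ', 1)

# The token is only base64-encoded, not encrypted.
decoded = base64.b64decode(token).decode('utf-8')
print(scheme, decoded)
```

Since the credentials can be decoded by anyone who sees the header, Basic Auth should only be used over HTTPS.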

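7. Performance Tips

Use a session object

A session reuses the underlying TCP connection across requests to the same host, which improves performance.

with requests.Session() as s:
    s.headers.update({'User-Agent': 'My-App/1.0'})

    # Send several requests over the same session
    response1 = s.get('https://api.example.com/v1/data')
    response2 = s.get('https://api.example.com/v1/more-data')

Streaming requests

For large files, use a streaming request so the whole body is not loaded at once.

response = requests.get('https://api.example.com/large-file', stream=True)

for chunk in response.iter_content(chunk_size=1024):
    if chunk:
        process_chunk(chunk)  # process_chunk is a placeholder for your own handler

Connection pool

Use a transport adapter to configure the connection pool size.

from requests.adapters import HTTPAdapter

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=100, pool_maxsize=100))

response = s.get('https://api.example.com/data')

This mounts a configured HTTPAdapter on the session s, where it handles every request whose URL starts with 'https://'.
pool_connections=100: the number of connection pools to cache (one pool per host), here 100.
pool_maxsize=100: the maximum number of connections kept in each pool, here 100.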
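
An adapter only applies to URLs that start with the prefix it is mounted on, and the longest matching prefix wins. The sketch below shows this with Session.get_adapter and no network traffic; the adapter names and pool sizes are hypothetical choices for illustration.

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()

# One general-purpose adapter, and one tuned for a single busy host.
default_adapter = HTTPAdapter(pool_connections=10, pool_maxsize=10)
busy_host_adapter = HTTPAdapter(pool_connections=1, pool_maxsize=100)

# Mount on both schemes; 'https://' alone would leave plain HTTP on the default adapter.
s.mount('https://', default_adapter)
s.mount('http://', default_adapter)
s.mount('https://api.example.com/', busy_host_adapter)

# get_adapter returns the adapter whose prefix matches the URL.
api_adapter = s.get_adapter('https://api.example.com/data')
other_https = s.get_adapter('https://other.example.org/')
other_http = s.get_adapter('http://other.example.org/')
print(api_adapter is busy_host_adapter, other_https is default_adapter, other_http is default_adapter)

s.close()
```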

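8. Things to Keep in Mind When Making Network Requests

Always handle exceptions: network requests can fail, so catch the exceptions that may be raised
Set a reasonable timeout: requests has no timeout by default, so a request can otherwise hang indefinitely
Verify SSL certificates: do not disable SSL verification in production
Handle redirects: disable them when they are not needed, which saves the extra round trips
Do not overuse session objects: send only related requests through the same session
Clean up resources: when streaming large files, release the connection promptly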

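9. Summary

requests is the de facto standard for HTTP in Python and gives us a simple yet powerful way to interact with the outside world, whether you are a beginner or an experienced developer. This article has covered the basics of the library along with some more advanced techniques, from simple GET requests to authentication and session management. In real projects, pick the API and methods that fit the task, make full use of what requests offers, and follow the notes above to keep your code robust and performant.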