Requests: The Essential Python Module for Web Scraping and API Test Automation


需要休息的KK.


Article link: https://blog.csdn.net/weixin_54217348/article/details/136482204


Introduction

Python's charm is like a sky full of stars, vast and boundless. Life is short, use Python!

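HTTP Overview

HTTP (Hypertext Transfer Protocol) is a stateless application-layer protocol built on a request/response model. It uses URLs to identify network resources;
A URL is the Internet path used to access a resource over HTTP, and each URL identifies one resource;
URL format: http://host[:port][path]

**host:** a valid Internet host name or IP address;
**port:** the port number, 80 by default for http (443 for https);
**path:** the path of the requested resource;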
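To make the three parts concrete, here is a minimal sketch that splits a URL with the standard library's urllib.parse. The helper name describe_url and its default-port table are illustrative assumptions, not part of HTTP or Requests.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the host, port, and path parts described above."""
    parts = urlsplit(url)
    # Assumed defaults for this sketch: 80 for http, 443 for https
    default_ports = {"http": 80, "https": 443}
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or default_ports.get(parts.scheme),
        "path": parts.path or "/",
        "query": parts.query,
    }


info = describe_url("http://httpbin.org/get?key1=value1")
print(info["host"], info["port"], info["path"])  # httpbin.org 80 /get
```

When the URL names a port explicitly, as in https://example.com:8443, that value is used instead of the default.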

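HTTP Methods for Operating on Resources

**GET:** retrieve the resource at the URL;
**HEAD:** retrieve only the response headers for the resource at the URL, without the body;
**POST:** submit new data to the resource at the URL;
**PUT:** store a resource at the URL, replacing whatever is already there;
**PATCH:** partially update the resource at the URL, changing only part of its content;
**DELETE:** delete the resource stored at the URL;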

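Differences Between GET and POST

GET requests

GET retrieves a resource: the server processes the request and returns the content right away. A GET request should normally carry no request body, so any data the server needs is passed in the URL;

POST requests

POST submits data. The data travels in the request body, and the server may create a new resource or update an existing one in response. The body can hold large amounts of data in almost any format, so POST is widely used and almost any submit operation can be implemented with it;

Differences at the transport level

A common claim is that GET produces one TCP packet and POST produces two. This is not part of the HTTP specification and depends on the client:
For a GET request, the client sends the request line and headers together (there is normally no body), and the server replies, for example with 200 and the data;
For a POST request, some clients first send the headers with `Expect: 100-continue`, wait for the server's 100 Continue, then send the body, after which the server replies with 200. Many clients, Requests included, do not add that header by default and send headers and body in one go;

Differences in parameter passing

GET requests carry their parameters in the URL
POST requests carry their parameters in the request body
As a result, GET parameters are visible in the browser's address bar, while POST parameters are not.

Differences in security

POST is usually considered safer than GET because its data does not appear in the URL, so it is less likely to leak through the address bar, browser history, or server logs. Over plain HTTP, however, both are sent unencrypted; only HTTPS protects the content in transit.

Differences in caching

Responses to GET requests can be cached by the browser, which speeds up repeated visits to the same resource.
Responses to POST requests are usually not cached, because the submitted data may differ each time and a cached result could be inconsistent.

Differences in URL length

GET parameters are appended to the URL, so they are subject to the URL length limits that browsers and servers impose in practice (the HTTP specification itself sets no limit).
If there are too many parameters, or they are too long, the server may refuse the request;
POST avoids this problem because its parameters are sent in the request body;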
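The difference in where the parameters travel can be checked without sending anything. The sketch below builds both requests with requests.Request(...).prepare(), which only constructs the request; the payload values are made up for illustration.

```python
import requests

payload = {"username": "zhangsan", "lang": "zh"}

# prepare() builds the request exactly as it would be sent, without any network traffic
get_req = requests.Request("GET", "http://httpbin.org/get", params=payload).prepare()
post_req = requests.Request("POST", "http://httpbin.org/post", data=payload).prepare()

print(get_req.url)    # http://httpbin.org/get?username=zhangsan&lang=zh
print(get_req.body)   # None
print(post_req.url)   # http://httpbin.org/post
print(post_req.body)  # username=zhangsan&lang=zh
```

The GET parameters end up in the URL and the body is empty, while the POST parameters end up in the body and the URL is unchanged.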


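Common HTTP Request Parameters and Headers

Name	Description
url	The target address of the request
headers	The request headers
data	The data sent in the request body
params	The query string
Host	The domain name of the web server being requested
User-Agent	Details about the client (such as browser type and version) that issued the request; the web server can use this header to identify the kind of client
Accept	The content types the client can accept, with preferences expressed through q-values
Accept-Encoding	The content encodings (compression formats) the client supports for the response
Accept-Language	The languages the client prefers for the returned content
Connection	Whether a persistent connection is wanted. With the value "keep-alive", or with HTTP/1.1 (where persistent connections are the default), the connection stays open for reuse instead of being closed after each response
Cookie	When the request is sent, the cookies stored for the request's domain are sent to the web server along with it
Referer	The URL of the page from which the user navigated to the currently requested page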

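Introduction to Requests

Requests simplifies talking to web services from Python: its intuitive API handles HTTP requests and responses cleanly;
It supports the common HTTP methods and authentication schemes, which makes scraping web pages, calling APIs, and simulating user logins straightforward;
Compared with the more verbose urllib, Requests is friendlier and easier to use, in keeping with its tagline "HTTP for Humans";
It has become the first choice of many developers for HTTP requests in Python.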

Installing Requests

Requests is a third-party library. Open a terminal (or command prompt) and run:

pip install requests

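Overview of the Core Methods

requests.request() builds and sends a request; it is the foundation for all of the methods below

requests.get() sends a GET request, the main way to fetch a page or resource;
requests.head() sends a HEAD request to fetch only the response headers;
requests.post() sends a POST request to submit data;
requests.put() sends a PUT request to store or replace a resource;
requests.patch() sends a PATCH request to partially update a resource;
requests.delete() sends a DELETE request to delete a resource;

All of them are convenience wrappers around requests.request; calling requests.get(url) is equivalent to calling requests.request("GET", url);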

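The requests.request Method

Signature: requests.request(method, url, **kwargs)

method: the request method, one of:

GET
HEAD
POST
PUT
PATCH
DELETE
OPTIONS

url: the URL to request
kwargs: optional arguments that control the request; the 13 covered here are:

Parameter	Type	Description
params	dict, list of tuples, or bytes	Appended to the URL as the query string
data	dict, list of tuples, bytes, or file-like object	Sent in the request body
json	any JSON-serializable object	Serialized to JSON and sent in the request body
headers	dict	Custom HTTP request headers
cookies	dict or CookieJar	Cookies sent with the request
auth	tuple	Enables HTTP authentication
files	dict	Files to upload (multipart encoding)
timeout	float or (connect, read) tuple	Request timeout in seconds
proxies	dict	Proxy servers to use, optionally with login credentials
allow_redirects	bool	Whether to follow redirects; defaults to True
stream	bool	If False (the default), the response body is downloaded immediately; if True, it is fetched only when accessed
verify	bool or str	Whether to verify the server's TLS certificate, or the path to a CA bundle; defaults to True
cert	str or tuple	Path to a client-side SSL certificate file, or a ('cert', 'key') pair

httpbin.org, used in the examples below, is an online service for testing and demonstrating HTTP features; it does not actually store or manage session state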

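GET Requests

Compared with the more cumbersome urllib interface, sending a GET request with Requests is like telling Python: "Hey, go to this URL and bring back the content!"

A GET request normally has no request body
The data it carries is limited in practice by the URL length that browsers and servers accept (often a few kilobytes); HTTP itself defines no fixed limit
GET request data is exposed in the browser's address bar

1. Sending a GET request without parameters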

import requests

response = requests.get('https://httpbin.org')

print(response.text)
# Output (truncated because the page is long):
# <!DOCTYPE html>
# <html lang="en">
# 
# <head>
#     <meta charset="UTF-8">
#     <title>httpbin.org</title>
#     <link href="https://fonts.googleapis.com/css?family=Open+Sans:400,700|Source+Code+Pro:300,600|Titillium+Web:400,600,700"
#         rel="stylesheet">
#     <link rel="stylesheet" type="text/css" href="/flasgger_static/swagger-ui.css">
#     <link rel="icon" type="image/png" href="/static/favicon.ico" sizes="64x64 32x32 16x16" />
#     <style>
#         html {
#             box-sizing: border-box;
#             overflow: -moz-scrollbars-vertical;
#             overflow-y: scroll;
#         }
# 
#         *,
#         *:before,
#         *:after {
#             box-sizing: inherit;
#         }
# 
#         body {
#             margin: 0;
#             background: #fafafa;
#         }
#     </style>
# </head>
# ...

2. Sending a GET request with parameters


import requests

payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.get('http://httpbin.org/get', params=payload)

print(response.text)
# {
#   "args": {
#     "key1": "value1", 
#     "key2": "value2"
#   }, 
#   "headers": {
#     "Accept": "*/*", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Host": "httpbin.org", 
#     "User-Agent": "python-requests/2.31.0", 
#     "X-Amzn-Trace-Id": "Root=1-65e5cd40-3bbebd954933e1aa374f79b6"
#   }, 
#   "origin": "180.164.28.66", 
#   "url": "http://httpbin.org/get?key1=value1&key2=value2"
# }

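POST Requests

For POST requests, Requests is likewise more direct and concise than urllib when it comes to submitting data:

The data is not exposed in the browser's address bar
The protocol sets no upper limit on the size of the data (servers may enforce their own)
The request has a body
Non-ASCII characters such as Chinese in a form-encoded body are URL-encoded (percent-encoded)

1. Sending a POST request without parameters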
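The encoding of the body depends on which keyword argument carries the data. As a rough illustration, the sketch below prepares the same payload once with data= and once with json= and inspects the result offline; the payload itself is made up.

```python
import json

import requests

url = "http://httpbin.org/post"
payload = {"name": "张三"}

# prepare() only builds the requests; nothing is sent
form_req = requests.Request("POST", url, data=payload).prepare()
json_req = requests.Request("POST", url, json=payload).prepare()

print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # name=%E5%BC%A0%E4%B8%89
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # {'name': '张三'}
```

With data= the Chinese characters are percent-encoded as UTF-8 bytes; with json= the body is a JSON document and no URL encoding is involved.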

import requests

# No parameters
response = requests.post('http://httpbin.org/post')

print(response.text)
# Output:
# {
#   "args": {}, 
#   "data": "", 
#   "files": {}, 
#   "form": {}, 
#   "headers": {
#     "Accept": "*/*", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Content-Length": "0", 
#     "Host": "httpbin.org", 
#     "User-Agent": "python-requests/2.31.0", 
#     "X-Amzn-Trace-Id": "Root=1-65e672cc-6607c15c29c4b0f87bb95b7e"
#   }, 
#   "json": null, 
#   "origin": "101.82.87.75", 
#   "url": "http://httpbin.org/post"
# }

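2. Sending a POST request with query-string parameters


import requests

data = {'username': 'zhangsan', 'password': 'abcdefg1234567'}

# The params keyword appends the key/value pairs to the URL as a query string,
# so they appear under "args" in the output and the request body stays empty
response = requests.post('http://httpbin.org/post', params=data)

print(response.text)
# Output: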
# {
#   "args": {
#     "password": "abcdefg1234567", 
#     "username": "zhangsan"
#   }, 
#   "data": "", 
#   "files": {}, 
#   "form": {}, 
#   "headers": {
#     "Accept": "*/*", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Content-Length": "0", 
#     "Host": "httpbin.org", 
#     "User-Agent": "python-requests/2.31.0", 
#     "X-Amzn-Trace-Id": "Root=1-65e672f6-6e7c9158279bdfa8070b24b3"
#   }, 
#   "json": null, 
#   "origin": "101.82.87.75", 
#   "url": "http://httpbin.org/post?username=zhangsan&password=abcdefg1234567"
# }

3. Sending a POST request with form data

import requests

data = {'username': 'zhangsan', 'password': 'abcdefg1234567'}

# The data keyword sends the parameters as a form-encoded request body
response = requests.post('http://httpbin.org/post', data=data)

print(response.text)
# Output:
# {
#   "args": {}, 
#   "data": "", 
#   "files": {}, 
#   "form": {
#     "password": "abcdefg1234567", 
#     "username": "zhangsan"
#   }, 
#   "headers": {
#     "Accept": "*/*", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Content-Length": "41", 
#     "Content-Type": "application/x-www-form-urlencoded", 
#     "Host": "httpbin.org", 
#     "User-Agent": "python-requests/2.31.0", 
#     "X-Amzn-Trace-Id": "Root=1-65e6732b-022e03c80e99f52a4e69fb0d"
#   }, 
#   "json": null, 
#   "origin": "101.82.87.75", 
#   "url": "http://httpbin.org/post"
# }

4. Sending a POST request with a JSON body


import requests

data = {'username': 'zhangsan', 'password': 'abcdefg1234567'}

# The json keyword serializes the parameters and sends them as a JSON request body
response = requests.post('http://httpbin.org/post', json=data)

print(response.text)
# {
#   "args": {},
#   "data": "{\"username\": \"zhangsan\", \"password\": \"abcdefg1234567\"}",
#   "files": {},
#   "form": {},
#   "headers": {
#     "Accept": "*/*",
#     "Accept-Encoding": "gzip, deflate, br",
#     "Content-Length": "54",
#     "Content-Type": "application/json",
#     "Host": "httpbin.org",
#     "User-Agent": "python-requests/2.31.0",
#     "X-Amzn-Trace-Id": "Root=1-65e5ce52-3137c56054637a925191dcd2"
#   },
#   "json": {
#     "password": "abcdefg1234567",
#     "username": "zhangsan"
#   },
#   "origin": "180.164.28.66",
#   "url": "http://httpbin.org/post"
# }

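Handling the Response

The Response object offers many attributes and methods for working with the result;
for example, status_code gives the response status code;
headers gives the response headers;
content gives the response body as bytes, and so on;
Some commonly used attributes and methods of the response object:

Name	Description
text	The response body as text (str), for example the page source
content	The response body as raw bytes
status_code	The HTTP status code (for example 200 or 404).
headers	A case-insensitive dictionary of response headers: keys are header names, values are header values.
cookies	A RequestsCookieJar containing the cookies the server sent back.
url	The final URL (after any redirects).
history	A list of Response objects from the redirects that were followed, in the order they occurred.
encoding	The encoding used to decode text, guessed from the HTTP headers.
reason	The textual reason for the status code (for example "Not Found" or "OK").
elapsed	The time between sending the request and the arrival of the response, as a timedelta.
request	The PreparedRequest object that produced this response.
json()	A method that parses the body as JSON. It returns the decoded dict/list on success and raises an exception otherwise.
raise_for_status()	Raises an HTTPError if the status code indicates an HTTP error (4xx or 5xx); otherwise does nothing.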


import requests

response = requests.get('http://httpbin.org/get')

print("Response body as text:", response.text)
print("Response body as bytes:", response.content)
print("Status code:", response.status_code)
print("Response headers:", response.headers)
print("Cookies:", response.cookies)
print("Final request URL:", response.url)
print("Redirect history:", response.history)
print("Encoding:", response.encoding)
print("Reason phrase:", response.reason)
print("Request that produced this response:", response.request)
print("Body parsed as JSON:", response.json())
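In practice raise_for_status() and json() are often combined. The following is a minimal sketch of that pattern; parse_response and fake_response are hypothetical helper names, and fake_response builds a Response by hand only so that the example runs without network access.

```python
import io

import requests


def parse_response(response):
    """Return the decoded JSON when possible, otherwise the text body."""
    response.raise_for_status()  # raises HTTPError for 4xx/5xx
    try:
        return response.json()
    except ValueError:  # the JSON decode error raised by requests is a ValueError
        return response.text


def fake_response(status_code, body):
    """Demo-only helper: build a Response object without any network traffic."""
    response = requests.Response()
    response.status_code = status_code
    response.encoding = "utf-8"
    response.raw = io.BytesIO(body)
    return response


ok_json = parse_response(fake_response(200, b'{"id": 1}'))
ok_text = parse_response(fake_response(200, b"plain text"))
try:
    parse_response(fake_response(404, b"missing"))
    error = None
except requests.exceptions.HTTPError as exc:
    error = str(exc)

print(ok_json)  # {'id': 1}
print(ok_text)  # plain text
```

With a real request, the same parse_response function can be applied to the object returned by requests.get or requests.post.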

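Request Headers

Sometimes you need to set request headers, for example to make the request look like it comes from a browser. Use the headers parameter to do so;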

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'Mark': 'zhangsan'  # A marker header; check the echoed response to see whether it arrived
}

response = requests.get('http://httpbin.org/get', headers=headers)

print(response.text)
# Output:
# {
#   "args": {}, 
#   "headers": {
#     "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8", 
#     "Host": "httpbin.org", 
#     "Mark": "zhangsan", 
#     "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36", 
#     "X-Amzn-Trace-Id": "Root=1-65e6764d-00190c076b2527315871a837"
#   }, 
#   "origin": "101.82.87.75", 
#   "url": "http://httpbin.org/get"
# }

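Cookies

A server can set one or more cookies in its response, and a browser then sends them back automatically on later requests. With Requests you can pass cookies explicitly through the cookies parameter;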

import requests

response = requests.get('http://httpbin.org/cookies')

print(response.text)
# Output:
# {
#   "cookies": {}
# }

cookies = {
    'session_id': 'abc123',
    'user_pref': 'lang=zh_CN'
}

response = requests.get('http://httpbin.org/cookies', cookies=cookies)

print(response.text)
# Output:
# {
#   "cookies": {
#     "session_id": "abc123", 
#     "user_pref": "lang=zh_CN"
#   }
# }

Redirects

By default, Requests follows redirects automatically;
the allow_redirects parameter controls whether redirects are followed;

import requests

url = "http://httpbin.org/redirect-to?url=http://example.com"

# Send the request
response = requests.get(url, allow_redirects=True)

# Print the redirect history
print("Redirect history:", response.history)   # [<Response [302]>]

# Print the final response URL
print("Final URL:", response.url)  # http://example.com

# Print the response body
print("Response body:", response.text)
# Response body: <!doctype html>
# <html>
# <head>
#     <title>Example Domain</title>
# 
#     <meta charset="utf-8" />
#     <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
#     <meta name="viewport" content="width=device-width, initial-scale=1" />
#     <style type="text/css">
#     body {
#         background-color: #f0f0f2;
#         margin: 0;
#         padding: 0;
#         font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
#         
#     }
#     div {
#         width: 600px;
#         margin: 5em auto;
#         padding: 2em;
#         background-color: #fdfdff;
#         border-radius: 0.5em;
#         box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
#     }
#     a:link, a:visited {
#         color: #38488f;
#         text-decoration: none;
#     }
#     @media (max-width: 700px) {
#         div {
#             margin: 0 auto;
#             width: auto;
#         }
#     }
#     </style>    
# </head>
# 
# <body>
# <div>
#     <h1>Example Domain</h1>
#     <p>This domain is for use in illustrative examples in documents. You may use this
#     domain in literature without prior coordination or asking for permission.</p>
#     <p><a href="https://www.iana.org/domains/example">More information...</a></p>
# </div>
# </body>
# </html>

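Handling Timeouts

Use the timeout parameter to set a request timeout; without it, Requests waits indefinitely for the server:

import requests

# Call an endpoint that responds after 10 seconds
response = requests.get('http://httpbin.org/delay/10')

print(response.text)    # Waits until the endpoint responds
# Output: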
# {
#   "args": {}, 
#   "data": "", 
#   "files": {}, 
#   "form": {}, 
#   "headers": {
#     "Accept": "*/*", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Host": "httpbin.org", 
#     "User-Agent": "python-requests/2.31.0", 
#     "X-Amzn-Trace-Id": "Root=1-65e6771e-348de36a30fda99f7878fc8c"
#   }, 
#   "origin": "101.82.87.75", 
#   "url": "http://httpbin.org/delay/10"
# }

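try:
    # Call the endpoint that responds after 10 seconds, with a 5-second timeout
    response = requests.get('http://httpbin.org/delay/10', timeout=5)
    print(response.text)    # Not reached: no response arrives within 5 seconds, so Timeout is raised
except requests.exceptions.Timeout:
    print("Request timed out!")  # Output: Request timed out!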
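The timeout can also be given as a (connect, read) tuple, which bounds the two phases separately. The sketch below is one way to try this without httpbin: it starts a throwaway local server that waits one second before answering. SlowHandler and fetch are hypothetical names, and trust_env is switched off only so that proxy environment variables cannot interfere with the loopback demo.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Demo-only handler that answers after a one-second delay."""

    def do_GET(self):
        time.sleep(1.0)
        body = b"done"
        try:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # the client already gave up

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(url, timeout):
    """Return the body, or a short label describing which timeout fired."""
    session = requests.Session()
    session.trust_env = False  # ignore proxy settings from the environment
    try:
        return session.get(url, timeout=timeout).text
    except requests.exceptions.ConnectTimeout:
        return "connect timeout"
    except requests.exceptions.ReadTimeout:
        return "read timeout"
    finally:
        session.close()


server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

fast = fetch(url, timeout=(3.05, 0.2))  # read timeout shorter than the delay
slow = fetch(url, timeout=(3.05, 5))    # read timeout longer than the delay
server.shutdown()
server.server_close()

print(fast)  # read timeout
print(slow)  # done
```

Both ConnectTimeout and ReadTimeout are subclasses of requests.exceptions.Timeout, so catching Timeout alone is enough when the distinction does not matter.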

Using Proxies

Use the proxies parameter to send requests through a proxy

import requests

proxies = {
    'http': 'http://127.0.0.1:8080',
    'https': 'https://127.0.0.1:8080',
}

response = requests.get('http://httpbin.org/get', proxies=proxies)

print(response.text)    # Not reached: the request above raises because the proxy cannot be reached
# requests.exceptions.ProxyError: HTTPConnectionPool(host='127.0.0.1', port=8080): Max retries exceeded with url: http://httpbin.org/get (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcf084ee4f0>: Failed to establish a new connection: [Errno 61] Connection refused')))

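SSL Certificate Verification

Certificates are used to authenticate the parties in a secure connection between client and server.
In HTTPS, the server presents an SSL/TLS certificate and the client checks that it is valid, to make sure it is talking to the right server; the traffic is encrypted to keep the data safe.
By default, Requests verifies SSL certificates. To disable verification, set the verify parameter to False


import requests

# When you send a request to an HTTPS URL, requests verifies the server's SSL certificate by default.
# The cert parameter supplies a client-side certificate for servers that require one;
# to trust a custom CA instead, pass the path of a CA bundle to verify.
cert_path = '/my_cert_file.crt'


# verify=True tells requests to verify the server's SSL certificate.
# If verification fails, requests raises an SSLError.
# response = requests.post('https://httpbin.org/post', verify=True, cert=cert_path)
# print(response.text)    # Raises an error because this certificate file does not exist: OSError: Could not find the TLS certificate file, invalid path: /my_cert_file.crt

# verify=False disables certificate verification, but a warning is emitted about the
# unverified HTTPS request. Use it only for local development or testing.
response = requests.post('https://httpbin.org/post', verify=False)

print(response.text)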
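Because verify and cert are easy to mix up, here is a small sketch that sets both on a Session, where they apply to every request made through it. build_session is a hypothetical helper and the file paths are placeholders, not real files.

```python
import requests


def build_session(ca_bundle=None, client_cert=None):
    """Return a Session with optional custom TLS settings.

    ca_bundle:   path to a CA bundle used to verify the *server* certificate
    client_cert: path to a client certificate, or a (cert, key) tuple,
                 presented to servers that require client authentication
    """
    session = requests.Session()
    if ca_bundle is not None:
        session.verify = ca_bundle
    if client_cert is not None:
        session.cert = client_cert
    return session


default_session = build_session()
custom_session = build_session(
    ca_bundle="/path/to/ca-bundle.pem",  # placeholder path
    client_cert=("/path/to/client.crt", "/path/to/client.key"),  # placeholder paths
)

print(default_session.verify)  # True
print(custom_session.verify)   # /path/to/ca-bundle.pem
```

Leaving verify at its default of True and only pointing it at a CA bundle when needed keeps certificate checking enabled.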

Downloading Files

The target URL returns the file's binary data, and we write the response's content to a local file;

import requests

# URL of the file to download
download_url = 'http://httpbin.org/bytes/1024'

# Send a GET request and get the response
response = requests.get(download_url)

# Save the response body to a local file
with open('./downloaded_file.bin', 'wb') as file:
    file.write(response.content)

# Report the result
print('File downloaded successfully.')
# downloaded_file.bin now exists in the current directory
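Reading response.content loads the whole file into memory, which is fine for 1 KB but not for large files. A common alternative is stream=True together with iter_content. The sketch below assumes a throwaway local server so it can run offline; BytesHandler and download are hypothetical names, and trust_env is disabled only to keep proxy environment variables out of the loopback demo.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests

PAYLOAD = os.urandom(64 * 1024)  # 64 KiB of random bytes to serve


class BytesHandler(BaseHTTPRequestHandler):
    """Demo-only handler that serves PAYLOAD for any GET request."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def download(session, url, path, chunk_size=8192):
    """Stream the response body to path in chunks and return the file size."""
    with session.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()
        with open(path, "wb") as file:
            for chunk in response.iter_content(chunk_size=chunk_size):
                file.write(chunk)
    return os.path.getsize(path)


server = ThreadingHTTPServer(("127.0.0.1", 0), BytesHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment
target = os.path.join(tempfile.mkdtemp(), "downloaded_file.bin")
size = download(session, f"http://127.0.0.1:{server.server_port}/bytes", target)

session.close()
server.shutdown()
server.server_close()
print(size)  # 65536
```

Only one chunk is held in memory at a time, so the same function works for files much larger than available RAM.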

Uploading Files

The key 'file' in the files dictionary is the form field name, and the file object is passed as its value;

import requests

# Upload URL
upload_url = 'http://httpbin.org/post'

# Path of the file to upload
# Its content is: The contents of my file: abcdefg1234567
file_path = './test.txt'

# Send the file as multipart/form-data
with open(file_path, 'rb') as file:
    files = {'file': file}
    response = requests.post(upload_url, files=files)

# Print the upload result
print(response.text)
# Output:
# {
#   "args": {}, 
#   "data": "", 
#   "files": {
#     "file": "The contents of my file: abcdefg1234567"
#   }, 
#   "form": {}, 
#   "headers": {
#     "Accept": "*/*", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Content-Length": "183", 
#     "Content-Type": "multipart/form-data; boundary=fb427351fb3bfca331fd23c761ce5223", 
#     "Host": "httpbin.org", 
#     "User-Agent": "python-requests/2.31.0", 
#     "X-Amzn-Trace-Id": "Root=1-65e67ead-7ed04d2d39de1b00477bd288"
#   }, 
#   "json": null, 
#   "origin": "101.82.87.75", 
#   "url": "http://httpbin.org/post"
# }
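The files values can also be tuples of (filename, content, content_type), which lets you control the file name and MIME type that the server sees. The sketch below prepares such a request offline and inspects the multipart body; the in-memory bytes stand in for a real file.

```python
import requests

files = {
    # (filename, file content or file object, content type)
    "file": ("test.txt", b"The contents of my file: abcdefg1234567", "text/plain"),
}

# prepare() encodes the multipart body without sending the request
prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

content_type = prepared.headers["Content-Type"]
print(content_type.split(";")[0])                     # multipart/form-data
print(b'filename="test.txt"' in prepared.body)        # True
print(b"Content-Type: text/plain" in prepared.body)   # True
```

The boundary that follows multipart/form-data in the header is generated by Requests, so there is no need to set Content-Type by hand when using files.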

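Managing Sessions with Session

A session keeps state across multiple requests, which is very useful for sites that require login:
you send the login request in a session and then reuse the same session for later requests, without logging in again each time.
A requests.Session() object persists certain parameters and cookies across requests

import requests

# Call the endpoint that returns the cookies it received
response1 = requests.get('https://httpbin.org/cookies')
print(response1.text)
# Output: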
# {
#   "cookies": {}
# }

# Create a session object
session = requests.Session()

# First request through the session: have the server set the sessioncookie cookie to zhangsan
response2 = session.get('https://httpbin.org/cookies/set/sessioncookie/zhangsan')
print(response2.text)
# Output:
# {
#   "cookies": {
#     "sessioncookie": "zhangsan"
#   }
# }

# Second request through the same session, which sends the cookie set earlier
response3 = session.get('https://httpbin.org/cookies')
print(response3.text)
# Output:
# {
#   "cookies": {
#     "sessioncookie": "zhangsan"
#   }
# }
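Besides cookies received from the server, a session can carry default headers and cookies that you set yourself. The sketch below uses Session.prepare_request to show, without any network traffic, what a request made through the session would contain; the header and cookie values are illustrative.

```python
import requests

session = requests.Session()
session.headers.update({"Mark": "zhangsan"})      # sent with every request
session.cookies.set("sessioncookie", "zhangsan")  # stored in the session's cookie jar

request = requests.Request(
    "GET",
    "https://httpbin.org/cookies",
    headers={"Accept": "application/json"},  # per-request header, overrides the session default
)
prepared = session.prepare_request(request)  # merges session state into the request

print(prepared.headers["Mark"])    # zhangsan
print(prepared.headers["Accept"])  # application/json
print(prepared.headers["Cookie"])  # sessioncookie=zhangsan
session.close()
```

Per-request values take precedence over session defaults, while anything not overridden is inherited from the session.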

Source Code Analysis

1. First layer

Let's start with the source of the GET, POST, PUT, and DELETE helpers and see what they have in common;

Source of get

def get(url, params=None, **kwargs):
    r"""Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request("get", url, params=params, **kwargs)

Source of post

def post(url, data=None, json=None, **kwargs):
    r"""Sends a POST request.

    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request("post", url, data=data, json=json, **kwargs)

Source of put

def put(url, data=None, **kwargs):
    r"""Sends a PUT request.

    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request("put", url, data=data, **kwargs)

Source of delete

def delete(url, **kwargs):
    r"""Sends a DELETE request.

    :param url: URL for the new :class:`Request` object.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request("delete", url, **kwargs)

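2. Second layer

The source shows that every helper ends up calling the request function;
This confirms the earlier statement that GET, POST, PUT, DELETE and the rest are all convenience wrappers;
Next, let's go one level deeper and look at the source of request;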


def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'https://httpbin.org/get')
      >>> req
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

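3. Third layer

The source of request shows that it uses a Python context manager before going down to the lower-level method,
which ensures that the session object is closed properly after use, even if an exception occurs;
Next, let's go deeper still and look at the source of session.request;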


    def request(
        self,
        method,
        url,
        params=None,
        data=None,
        headers=None,
        cookies=None,
        files=None,
        auth=None,
        timeout=None,
        allow_redirects=True,
        proxies=None,
        hooks=None,
        stream=None,
        verify=None,
        cert=None,
        json=None,
    ):
        """Constructs a :class:`Request <Request>`, prepares it and sends it.
        Returns :class:`Response <Response>` object.

        :param method: method for the new :class:`Request` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary or bytes to be sent in the query
            string for the :class:`Request`.
        :param data: (optional) Dictionary, list of tuples, bytes, or file-like
            object to send in the body of the :class:`Request`.
        :param json: (optional) json to send in the body of the
            :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the
            :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the
            :class:`Request`.
        :param files: (optional) Dictionary of ``'filename': file-like-objects``
            for multipart encoding upload.
        :param auth: (optional) Auth tuple or callable to enable
            Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How long to wait for the server to send
            data before giving up, as a float, or a :ref:`(connect timeout,
            read timeout) <timeouts>` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Set to True by default.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol or protocol and
            hostname to the URL of the proxy.
        :param stream: (optional) whether to immediately download the response
            content. Defaults to ``False``.
        :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``. When set to
            ``False``, requests will accept any TLS certificate presented by
            the server, and will ignore hostname mismatches and/or expired
            certificates, which will make your application vulnerable to
            man-in-the-middle (MitM) attacks. Setting verify to ``False``
            may be useful during local development or testing.
        :param cert: (optional) if String, path to ssl client cert file (.pem).
            If Tuple, ('cert', 'key') pair.
        :rtype: requests.Response
        """
        # Create the Request.
        req = Request(
            method=method.upper(),
            url=url,
            headers=headers,
            files=files,
            data=data or {},
            json=json,
            params=params or {},
            auth=auth,
            cookies=cookies,
            hooks=hooks,
        )
        prep = self.prepare_request(req)

        proxies = proxies or {}

        settings = self.merge_environment_settings(
            prep.url, proxies, stream, verify, cert
        )

        # Send the request.
        send_kwargs = {
            "timeout": timeout,
            "allow_redirects": allow_redirects,
        }
        send_kwargs.update(settings)
        resp = self.send(prep, **send_kwargs)

        return resp

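The source of session.request shows the flow:
it creates a Request from the arguments it was given, prepares it with prepare_request, and then calls self.send() with the prepared request;
We will not follow the code beyond send here; interested readers can explore it on their own;
The takeaway from this analysis is that a wrapper class does not need separate Get, Post, and similar methods that each call request.
We can simply call request directly.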
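One way to watch this flow without touching the network is to mount a custom transport adapter on a session, so that session.request still builds and prepares the request but hands it to your own send method. CannedAdapter below is a hypothetical, demo-only class and example.invalid is a placeholder host; this is a sketch for exploration, not how Requests is normally used.

```python
import io

import requests
from requests.adapters import BaseAdapter


class CannedAdapter(BaseAdapter):
    """Demo-only transport: returns a fixed response instead of using the network."""

    def send(self, request, **kwargs):
        # `request` is the PreparedRequest produced by Session.prepare_request
        response = requests.Response()
        response.status_code = 200
        response.reason = "OK"
        response.url = request.url
        response.request = request
        response.encoding = "utf-8"
        response.raw = io.BytesIO(b'{"handled_by": "CannedAdapter"}')
        return response

    def close(self):
        pass


session = requests.Session()
session.trust_env = False                  # keep the demo independent of proxy settings
session.mount("http://", CannedAdapter())  # replace the default transport for http:// URLs

# Lowercase method on purpose: Session.request upper-cases it when building the Request
response = session.request("get", "http://example.invalid/items", params={"page": 2})

print(response.request.method)  # GET
print(response.url)             # http://example.invalid/items?page=2
print(response.json())          # {'handled_by': 'CannedAdapter'}
session.close()
```

The method was upper-cased and the params were merged into the URL before the adapter saw the request, which matches the Request, prepare_request, send sequence in the source above.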

Building a Wrapper

1. Wrapping request

import ast
import functools

import requests


# Decorator that assembles the response details after the request completes
def after(func):
    @functools.wraps(func)
    def inside(*args, **kwargs):
        requests_obj = func(*args, **kwargs)
        # timedelta.microseconds holds only the sub-second part,
        # so every unit below is derived from total_seconds()
        elapsed_seconds = requests_obj.elapsed.total_seconds()
        # Assemble the response details
        response_dict = {
            'url': requests_obj.url,  # final URL
            'encoding': requests_obj.encoding,  # response encoding
            'info': requests_obj.reason,  # reason phrase
            'code': requests_obj.status_code,  # status code
            'headers': requests_obj.headers,  # response headers
            'cookies': dict(requests_obj.cookies),  # cookies
            'seconds': elapsed_seconds,  # seconds
            'microseconds': elapsed_seconds * 1_000_000,  # microseconds
            'millisecond': elapsed_seconds * 1000  # milliseconds
        }
        try:
            response_dict['json'] = requests_obj.json()
        except ValueError:
            response_dict['text'] = requests_obj.text

        return response_dict

    return inside


class HTTPClient(object):

    def __init__(self):
        """
        Session holder.
        requests.session() keeps the session alive and persists parameters across requests.
        """
        self.session = requests.session()

    @after
    def seed_request(self, method: str, url: str, params=None, data=None, json=None, headers=None, **kwargs):
        """
        Send an HTTP request to url with the given method, carrying params/data/json/...
        :param method:  required, str, the request method, e.g. GET, POST, PUT, DELETE
        :param url:     required, str, the request URL, e.g. http://127.0.0.1/test
        :param params:  optional, dict, parameters appended to the URL
        :param data:    optional, dict, sent as the request body (form-encoded)
        :param json:    optional, JSON-serializable object, sent as the request body
        :param headers: optional, dict or str, the request headers
        :param kwargs:  optional, any other arguments accepted by requests
        :return:        the requests Response (the after decorator converts it to a dict)
        """

        # 1. GET: fetch a resource
        # 2. HEAD: fetch the response headers
        # 3. POST: submit data
        # 4. PUT: upload or replace data
        # 5. PATCH: like PUT, but partially updates a known resource
        # 6. DELETE: delete data
        # 7. OPTIONS: query the communication options
        # 8. CONNECT: ask a proxy to switch the connection to a tunnel
        # 9. TRACE: echo the request as received by the server, for testing and diagnostics

        # 1. Check that the request method is allowed
        methods = ('GET', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE', 'OPTIONS', 'CONNECT', 'TRACE')
        if method.upper() not in methods:
            raise ValueError(f'Unsupported request method [{method}]! Supported methods: {methods}')

        # 2. Default the timeout to 120 seconds unless the caller set one
        #    (a float or a (connect, read) tuple is passed through unchanged)
        kwargs.setdefault("timeout", 120)

        # 3. Resolve the request headers
        headers = HTTPClient.process_headers(headers=headers)

        # 4. Set the Content-Type according to the kind of payload
        if json is not None:
            # Optional: with the json parameter, requests serializes the data and sets Content-Type itself.
            headers['Content-Type'] = "application/json;charset=utf-8"
        elif data is not None:
            headers['Content-Type'] = "application/x-www-form-urlencoded;charset=UTF-8"
        elif params is not None:
            # params normally need no Content-Type because they are appended to the URL
            pass

        # 5. Send the request
        try:
            return self.session.request(method=method, url=url, params=params,
                                        data=data, json=json,
                                        headers=headers, **kwargs)
        except requests.exceptions.RequestException as exc:
            raise ValueError(f"Request failed; check the URL, the parameters, and the parameter types. Error: {exc}") from exc

    @staticmethod
    def process_headers(headers):
        default_headers = {
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Connection': 'keep-alive',
            'Accept': 'application/json, text/plain, */*',
            'User-Agent': "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36"
        }

        if headers is None:
            return default_headers

        if isinstance(headers, str):
            try:
                # ast.literal_eval accepts only Python literals, so unlike eval
                # it cannot execute arbitrary code embedded in the string
                headers_dict = ast.literal_eval(headers)
            except (ValueError, SyntaxError):
                raise TypeError(f"Cannot parse the string as a dict: {headers}")
            if isinstance(headers_dict, dict):
                return headers_dict
            raise TypeError(f"The string did not evaluate to a dict: {headers}")

        if isinstance(headers, dict):
            return dict(headers)  # copy, so the caller's dict is not modified

        raise TypeError(f"headers must be None, a str, or a dict, but got: {type(headers)}")


if __name__ == '__main__':
    http = HTTPClient()
    test_get = http.seed_request("GET", "http://httpbin.org/get")
    print(test_get.get("url"))  # http://httpbin.org/get
    print(test_get.get("encoding"))  # utf-8
    print(test_get.get("info"))  # OK
    print(test_get.get("code"))  # 200

    data = {"name": "张三", "age": 20, "phone": "10086", "address": "上海市浦东新区"}
    test_post = http.seed_request("POST", "http://httpbin.org/post", data=data)
    print(test_post)

2. Wrapping response assertions

from jsonpath_ng import parse
import requests


class AssertTool:
    def __init__(self, response):
        """
        Initialize the assertion helper
        :param response: a requests Response object
        """
        self.response = response
        try:
            self.response_json = response.json()
        except ValueError:  # raised when the body is not valid JSON
            self.response_json = None

    def assert_status_code(self, expected_code):
        """
        Assert the response status code
        """
        assert self.response.status_code == expected_code, f"Expected status code [{expected_code}], but got [{self.response.status_code}]"

    def assert_status_message(self, expected_message):
        """
        Assert the response reason phrase
        """
        assert self.response.reason == expected_message, f"Expected reason phrase [{expected_message}], but got [{self.response.reason}]"

    def assert_json_value_exists(self, json_path):
        """
        Use jsonpath to assert that a value exists at the given path in the response body
        """
        # Compare with None so that an empty dict or list still counts as valid JSON
        if self.response_json is None:
            raise ValueError("The response is not JSON or is empty")

        jsonpath_expression = parse(json_path)
        match = jsonpath_expression.find(self.response_json)
        assert match, f"No value found for JsonPath: {json_path}"

    def assert_json_value(self, json_path, expected_value):
        """
        Assert that the value at the given path in the response body equals the expected value
        """
        if self.response_json is None:
            raise ValueError("The response is not JSON or is empty")

        jsonpath_expression = parse(json_path)
        match = jsonpath_expression.find(self.response_json)
        assert match, f"No value found for JsonPath: {json_path}"
        actual_value = match[0].value
        assert actual_value == expected_value, f"At path [{json_path}] expected [{expected_value}], but got [{actual_value}]"

    def get_json_value(self, json_path):
        """
        Return the value at the given jsonpath
        """
        if self.response_json is None:
            raise ValueError("Response is not in JSON format or is empty.")

        jsonpath_expression = parse(json_path)
        match = jsonpath_expression.find(self.response_json)
        if not match:
            raise ValueError(f"No value found for the jsonpath: {json_path}")
        return match[0].value  # value of the first match

# Usage example
if __name__ == "__main__":


    # Send a sample request
    data = {"name": "张三", "age": 20, "phone": {"中国联通": "1001", "中国移动": "1002", "中国电信": "1003"}, "address": "上海市浦东新区"}

    response = requests.post("http://httpbin.org/post", json=data)
    print(response.json())
    # Create the assertion helper
    assert_tool = AssertTool(response)

    # Run the assertions
    assert_tool.assert_status_code(200)
    assert_tool.assert_status_message('OK')
    print(assert_tool.get_json_value('$.json.phone'))   # {'中国电信': '1003', '中国移动': '1002', '中国联通': '1001'}
    assert_tool.assert_json_value_exists('$.json.phone."中国移动"')
    # The expected value is wrong on purpose (the actual value is "1002"),
    # so this raises AssertionError and shows the failure message
    assert_tool.assert_json_value('$.json.phone."中国移动"', 666)

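Summary

Python's requests module is a convenient tool for sending HTTP requests and supports GET, POST, and the other common methods.
It offers an intuitive API for customizing requests, such as setting headers and passing parameters.
requests also takes care of many low-level details automatically, such as cookies and sessions.
In short, requests simplifies network programming in Python and makes sending HTTP requests efficient and simple. It is a popular choice for handling HTTP in Python and is widely used for web scraping, data analysis, and API test automation.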
