Introduction
Python's appeal is as vast as a sky full of stars. Life is short, I use Python!
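About HTTP
HTTP stands for Hypertext Transfer Protocol;
HTTP is a stateless application-layer protocol built on a request/response model, and it uses URLs to identify network resources;
A URL is the Internet path through which a resource is accessed over HTTP; each URL identifies one resource;
URL format: http://host[:port][path]
- **host:** a valid Internet host name or IP address;
- **port:** the port number; defaults to 80 for http;
- **path:** the path of the requested resource;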
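The three parts of the URL format above can be inspected with the standard library. This is a small sketch; the URL is a made-up example and is never contacted.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
parts = urlsplit("http://www.example.com:8080/docs/index.html")

print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /docs/index.html

# When the port is omitted, urlsplit reports None; an HTTP client then
# falls back to the scheme default (80 for http).
default = urlsplit("http://www.example.com/docs/index.html")
print(default.port)    # None
```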
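HTTP methods for operating on resources
- **GET:** retrieve the resource at the URL;
- **HEAD:** retrieve only the response headers for the resource at the URL;
- **POST:** append new data to the resource at the URL;
- **PUT:** store a resource at the URL, replacing whatever is already there;
- **PATCH:** partially update the resource at the URL, changing only part of its content;
- **DELETE:** delete the resource stored at the URL;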
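Differences between GET and POST
GET requests
- Used to retrieve a resource. The server processes the request and returns the resource in its response. A GET request normally carries no body, so any data intended for the server is passed in the URL;
POST requests
- Used to submit data. The data travels in the request body, and the server may create a new resource or update an existing one in response. The body can be large and HTTP does not restrict its format, which is why POST is used for almost every kind of submission;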
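Differences on the wire
- A widely repeated claim says that GET produces one TCP packet while POST produces two. The HTTP specification defines no such rule; how a request is split into packets depends on the client and the network stack;
- With GET, the client sends the request line and headers together, and the server replies (for example with 200 and the data);
- With POST, a client may first send the headers with Expect: 100-continue, wait for the server's 100 Continue, then send the body and receive the final response. This two-step exchange is optional, and many clients send headers and body together;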
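How parameters are passed
- GET puts its parameters in the URL
- POST passes its parameters in the request body
- As a result, GET parameters are visible in the browser's address bar, while POST parameters are not.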
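The difference in where the parameters end up can be seen without sending anything, by preparing the requests and inspecting them. This is a sketch assuming the Requests library is installed; the URL is only a placeholder and is never contacted.

```python
import requests

url = "http://httpbin.org/anything"  # placeholder; the request is only prepared, not sent
payload = {"key1": "value1", "key2": "value2"}

# GET: params are serialized into the query string and there is no body.
get_req = requests.Request("GET", url, params=payload).prepare()
print(get_req.url)    # http://httpbin.org/anything?key1=value1&key2=value2
print(get_req.body)   # None

# POST: data is form-encoded into the body and the URL stays unchanged.
post_req = requests.Request("POST", url, data=payload).prepare()
print(post_req.url)   # http://httpbin.org/anything
print(post_req.body)  # key1=value1&key2=value2
```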
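Security differences
- POST is usually regarded as safer than GET because its data does not appear in the URL, which reduces the risk of leaking sensitive information through the address bar, browser history, or server logs. Neither method encrypts anything by itself; sensitive data still needs HTTPS.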
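Caching differences
- Responses to GET requests can be cached by the browser, so repeated visits to the same resource load faster.
- Responses to POST requests are normally not cached, because each submission may carry different data and a cached result could be inconsistent.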
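URL length differences
- GET parameters are appended to the URL, and browsers and servers cap URL length in practice (HTTP itself sets no limit).
- With too many or overly long parameters, the server may refuse to process the request;
- POST avoids this problem because its parameters are sent in the request body;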
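Common HTTP request parameters
Name | Description |
---|---|
url | Target address of the request |
headers | Request headers |
data | Data sent in the request body |
params | Query string |
Host | Domain name of the web server being requested |
User-Agent | Details of the browser the HTTP client is running in. The web server can use this header to identify the type of client browser |
Accept | Content types the client can accept; relative preference is expressed with q values |
Accept-Encoding | Content compression encodings the client can accept in the server's response |
Accept-Language | Languages the client prefers for the returned content |
Connection | Whether a persistent connection is wanted. If the value is Keep-Alive, or the request uses HTTP/1.1 (where persistent connections are the default), the connection is kept open for reuse instead of being closed after the response |
Cookie | All cookies stored for the request's domain are sent to the web server with the request |
Referer | The URL of the page from which the user navigated to the currently requested page |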
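About Requests
Requests simplifies talking to web services from Python: a small, readable API covers sending HTTP requests and handling responses;
It supports the full range of HTTP methods and common authentication schemes, which makes web scraping, API calls, and simulated user logins straightforward;
Compared with the more verbose urllib, Requests is easier to use, and it has earned a reputation as the most human-friendly HTTP client.
Requests has become the first choice for many developers making HTTP requests in Python.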
Installing Requests
Requests is a third-party library. Open a terminal (or command prompt) and run:
pip install requests
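Core methods
requests.request() builds and sends a request; it is the foundation for the methods below
- requests.get() retrieves a resource; corresponds to HTTP GET;
- requests.head() retrieves only the response headers; corresponds to HTTP HEAD;
- requests.post() submits data; corresponds to HTTP POST;
- requests.put() stores a resource; corresponds to HTTP PUT;
- requests.patch() submits a partial update; corresponds to HTTP PATCH;
- requests.delete() requests deletion of a resource; corresponds to HTTP DELETE;
They are all convenience wrappers around requests.request; calling requests.get(url) is equivalent to calling requests.request("GET", url);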
The requests.request method
Signature: requests.request(method, url, **kwargs)
method: the request method, one of
- GET
- HEAD
- POST
- PUT
- PATCH
- DELETE
- OPTIONS
url: the URL to request
kwargs: 13 optional arguments that control the request, described below
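Parameter | Type | Description |
---|---|---|
params | dict, list of tuples, or bytes | Appended to the URL as the query string |
data | dict, list of tuples, bytes, or file-like object | Sent in the request body |
json | JSON-serializable object | Serialized to JSON and sent in the request body |
headers | dict | Custom HTTP request headers |
cookies | dict or CookieJar | Cookies sent with the request |
auth | tuple | Enables HTTP authentication |
files | dict | Files to upload |
timeout | float or (connect, read) tuple | Request timeout, in seconds |
proxies | dict | Proxy servers to use; the proxy URL may include login credentials |
allow_redirects | bool | Whether to follow redirects; defaults to True |
stream | bool | If False (the default), the response body is downloaded immediately; if True, the download is deferred until the body is read |
verify | bool or str | Whether to verify the server's TLS certificate (defaults to True), or a path to a CA bundle |
cert | str or tuple | Path to a client-side SSL certificate file, or a (cert, key) pair |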
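To see how several of these arguments shape a request, the sketch below prepares a request without sending it and prints the result. It assumes Requests is installed; the header value, cookie, and credentials are placeholders. Arguments such as timeout, proxies, and verify apply when the request is sent, so they do not appear on the prepared request.

```python
import requests

# Prepare (but do not send) a request that combines several keyword arguments.
# The header value, cookie, and credentials are placeholders for illustration.
req = requests.Request(
    "GET",
    "http://httpbin.org/get",
    params={"page": 1},
    headers={"User-Agent": "my-app/0.1"},
    cookies={"session_id": "abc123"},
    auth=("user", "pass"),
)
prepared = req.prepare()

print(prepared.url)                       # http://httpbin.org/get?page=1
print(prepared.headers["User-Agent"])     # my-app/0.1
print(prepared.headers["Cookie"])         # session_id=abc123
print(prepared.headers["Authorization"])  # Basic dXNlcjpwYXNz
```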
httpbin.org is an online service for testing and demonstrating HTTP features; it does not actually store or manage session state
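GET requests
Compared with the verbose urllib interface, sending a GET request with Requests is like telling Python: "Hey, go to this URL and bring back the content!"
- A GET request normally has no request body
- HTTP sets no size limit on the data, but it travels in the URL, so the practical URL-length limits of browsers and servers apply
- GET data is visible in the browser's address bar
1. Sending a GET request without parameters
import requests
response = requests.get('https://httpbin.org')
print(response.text)
# Output (truncated because it is long):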
# <!DOCTYPE html>
# <html lang="en">
#
# <head>
# <meta charset="UTF-8">
# <title>httpbin.org</title>
# <link href="https://fonts.googleapis.com/css?family=Open+Sans:400,700|Source+Code+Pro:300,600|Titillium+Web:400,600,700"
# rel="stylesheet">
# <link rel="stylesheet" type="text/css" href="/flasgger_static/swagger-ui.css">
# <link rel="icon" type="image/png" href="/static/favicon.ico" sizes="64x64 32x32 16x16" />
# <style>
# html {
# box-sizing: border-box;
# overflow: -moz-scrollbars-vertical;
# overflow-y: scroll;
# }
#
# *,
# *:before,
# *:after {
# box-sizing: inherit;
# }
#
# body {
# margin: 0;
# background: #fafafa;
# }
# </style>
# </head>
# ...
2. Sending a GET request with parameters
import requests
payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.get('http://httpbin.org/get', params=payload)
print(response.text)
# {
# "args": {
# "key1": "value1",
# "key2": "value2"
# },
# "headers": {
# "Accept": "*/*",
# "Accept-Encoding": "gzip, deflate, br",
# "Host": "httpbin.org",
# "User-Agent": "python-requests/2.31.0",
# "X-Amzn-Trace-Id": "Root=1-65e5cd40-3bbebd954933e1aa374f79b6"
# },
# "origin": "180.164.28.66",
# "url": "http://httpbin.org/get?key1=value1&key2=value2"
# }
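POST requests
For POST requests, Requests is likewise more convenient than urllib, offering a direct and concise way to submit data:
- The submitted data is not shown in the browser's address bar
- HTTP itself places no upper limit on the size of the data (servers may enforce their own)
- The request has a body
- In a form-encoded body, Chinese and other non-ASCII characters are URL-encoded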
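The URL encoding mentioned in the last point can be reproduced with the standard library. This sketch uses urllib.parse.urlencode to show what a form-encoded body looks like for Chinese text; the field names and values are arbitrary examples.

```python
from urllib.parse import urlencode, parse_qs

form = {"name": "张三", "city": "上海"}

# Form encoding percent-encodes the UTF-8 bytes of each non-ASCII character.
encoded = urlencode(form)
print(encoded)  # name=%E5%BC%A0%E4%B8%89&city=%E4%B8%8A%E6%B5%B7

# Decoding restores the original text.
print(parse_qs(encoded))  # {'name': ['张三'], 'city': ['上海']}
```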
1. Sending a POST request without parameters
import requests
# No parameters
response = requests.post('http://httpbin.org/post')
print(response.text)
# Output:
# {
# "args": {},
# "data": "",
# "files": {},
# "form": {},
# "headers": {
# "Accept": "*/*",
# "Accept-Encoding": "gzip, deflate, br",
# "Content-Length": "0",
# "Host": "httpbin.org",
# "User-Agent": "python-requests/2.31.0",
# "X-Amzn-Trace-Id": "Root=1-65e672cc-6607c15c29c4b0f87bb95b7e"
# },
# "json": null,
# "origin": "101.82.87.75",
# "url": "http://httpbin.org/post"
# }
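2. Sending a POST request with parameters: key-value pairs in the query string
import requests
data = {'username': 'zhangsan', 'password': 'abcdefg1234567'}
# Pass the dictionary via params; the key-value pairs are appended to the URL as a query string
response = requests.post('http://httpbin.org/post', params=data)
print(response.text)
# Output: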
# {
# "args": {
# "password": "abcdefg1234567",
# "username": "zhangsan"
# },
# "data": "",
# "files": {},
# "form": {},
# "headers": {
# "Accept": "*/*",
# "Accept-Encoding": "gzip, deflate, br",
# "Content-Length": "0",
# "Host": "httpbin.org",
# "User-Agent": "python-requests/2.31.0",
# "X-Amzn-Trace-Id": "Root=1-65e672f6-6e7c9158279bdfa8070b24b3"
# },
# "json": null,
# "origin": "101.82.87.75",
# "url": "http://httpbin.org/post?username=zhangsan&password=abcdefg1234567"
# }
3. Sending a POST request with parameters: form data
import requests
data = {'username': 'zhangsan', 'password': 'abcdefg1234567'}
# Pass the dictionary via data; it is sent as a form-encoded body
response = requests.post('http://httpbin.org/post', data=data)
print(response.text)
# Output:
# {
# "args": {},
# "data": "",
# "files": {},
# "form": {
# "password": "abcdefg1234567",
# "username": "zhangsan"
# },
# "headers": {
# "Accept": "*/*",
# "Accept-Encoding": "gzip, deflate, br",
# "Content-Length": "41",
# "Content-Type": "application/x-www-form-urlencoded",
# "Host": "httpbin.org",
# "User-Agent": "python-requests/2.31.0",
# "X-Amzn-Trace-Id": "Root=1-65e6732b-022e03c80e99f52a4e69fb0d"
# },
# "json": null,
# "origin": "101.82.87.75",
# "url": "http://httpbin.org/post"
# }
4. Sending a POST request with parameters: JSON object
import requests
data = {'username': 'zhangsan', 'password': 'abcdefg1234567'}
# Pass the dictionary via json; it is serialized and sent as a JSON body
response = requests.post('http://httpbin.org/post', json=data)
print(response.text)
# {
# "args": {},
# "data": "{\"username\": \"zhangsan\", \"password\": \"abcdefg1234567\"}",
# "files": {},
# "form": {},
# "headers": {
# "Accept": "*/*",
# "Accept-Encoding": "gzip, deflate, br",
# "Content-Length": "54",
# "Content-Type": "application/json",
# "Host": "httpbin.org",
# "User-Agent": "python-requests/2.31.0",
# "X-Amzn-Trace-Id": "Root=1-65e5ce52-3137c56054637a925191dcd2"
# },
# "json": {
# "password": "abcdefg1234567",
# "username": "zhangsan"
# },
# "origin": "180.164.28.66",
# "url": "http://httpbin.org/post"
# }
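Handling the response
Requests offers several ways to work with a response;
For example, the status_code attribute gives the status code;
the headers attribute gives the response headers;
the content attribute gives the body as bytes, and so on;
Commonly used attributes and methods of the response object:
Name | Description |
---|---|
text | The response body decoded to text (str), for example the page source |
content | The response body as raw bytes |
status_code | The HTTP status code (for example 200 or 404). |
headers | A case-insensitive dictionary of response headers, keyed by header name. |
cookies | A RequestsCookieJar containing the cookies sent by the server. |
url | The final URL (after any redirects). |
history | A list of Response objects from redirects, in the order they occurred. |
encoding | The encoding used to decode text, guessed from the HTTP headers. |
reason | The textual form of the status code (for example "Not Found" or "OK"). |
elapsed | The time between sending the request and the response arriving, as a timedelta. |
request | The PreparedRequest object that produced this response. |
json() | A method that parses the body as JSON. Returns the resulting dict/list on success; otherwise raises an exception. |
raise_for_status() | Raises HTTPError if the status code indicates an HTTP error (4xx or 5xx); otherwise does nothing. |
import requests
response = requests.get('http://httpbin.org/get')
print("Response text:", response.text)
print("Response bytes:", response.content)
print("Status code:", response.status_code)
print("Response headers:", response.headers)
print("Cookies:", response.cookies)
print("Final URL:", response.url)
print("Redirect history:", response.history)
print("Encoding:", response.encoding)
print("Reason:", response.reason)
print("Originating request:", response.request)
print("Parsed JSON:", response.json())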
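The behaviour of raise_for_status() is easiest to see with a failing status code. The sketch below builds a Response object by hand so it runs without network access; in real code the object would come from requests.get() or a similar call, and the URL here is only a label.

```python
import requests

# Build a Response by hand so the example runs offline.
# In real code this object would be returned by requests.get(...).
response = requests.Response()
response.status_code = 404
response.reason = "Not Found"
response.url = "http://httpbin.org/status/404"

print(response.ok)  # False, because the status code is in the 4xx range

try:
    response.raise_for_status()
except requests.exceptions.HTTPError as exc:
    print("request failed:", exc)
    # The failing response stays reachable from the exception.
    print(exc.response.status_code)  # 404
```

Calling raise_for_status() right after a request is a common way to turn 4xx and 5xx responses into exceptions instead of checking status_code by hand.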
Request headers
Sometimes you need to set request headers to mimic a browser. Use the headers argument for this;
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'Mark': 'zhangsan'  # custom marker header; check that it is echoed back in the response
}
response = requests.get('http://httpbin.org/get', headers=headers)
print(response.text)
# Output:
# {
# "args": {},
# "headers": {
# "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
# "Accept-Encoding": "gzip, deflate, br",
# "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
# "Host": "httpbin.org",
# "Mark": "zhangsan",
# "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
# "X-Amzn-Trace-Id": "Root=1-65e6764d-00190c076b2527315871a837"
# },
# "origin": "101.82.87.75",
# "url": "http://httpbin.org/get"
# }
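Cookie
A server can set one or more cookies in a response, and a browser then sends them automatically on later requests. With Requests, cookies can be sent explicitly through the cookies argument;
import requests
response = requests.get('http://httpbin.org/cookies')
print(response.text)
# Output: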
# {
# "cookies": {}
# }
cookies = {
'session_id': 'abc123',
'user_pref': 'lang=zh_CN'
}
response = requests.get('http://httpbin.org/cookies', cookies=cookies)
print(response.text)
# Output:
# {
# "cookies": {
# "session_id": "abc123",
# "user_pref": "lang=zh_CN"
# }
# }
Redirects
By default, Requests follows redirects automatically;
The allow_redirects argument controls whether redirects are followed;
import requests
url = "http://httpbin.org/redirect-to?url=http://example.com"
# Send the request
response = requests.get(url, allow_redirects=True)
# Print the redirect history
print("Redirect history:", response.history) # [<Response [302]>]
# Print the final URL
print("Final URL:", response.url) # http://example.com/
# Print the response body
print("Response body:", response.text)
# Response body: <!doctype html>
# <html>
# <head>
# <title>Example Domain</title>
#
# <meta charset="utf-8" />
# <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
# <meta name="viewport" content="width=device-width, initial-scale=1" />
# <style type="text/css">
# body {
# background-color: #f0f0f2;
# margin: 0;
# padding: 0;
# font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
#
# }
# div {
# width: 600px;
# margin: 5em auto;
# padding: 2em;
# background-color: #fdfdff;
# border-radius: 0.5em;
# box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
# }
# a:link, a:visited {
# color: #38488f;
# text-decoration: none;
# }
# @media (max-width: 700px) {
# div {
# margin: 0 auto;
# width: auto;
# }
# }
# </style>
# </head>
#
# <body>
# <div>
# <h1>Example Domain</h1>
# <p>This domain is for use in illustrative examples in documents. You may use this
# domain in literature without prior coordination or asking for permission.</p>
# <p><a href="https://www.iana.org/domains/example">More information...</a></p>
# </div>
# </body>
# </html>
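The example above only shows allow_redirects=True. To compare both settings without depending on an external site, the following sketch starts a throwaway local server that redirects /old to /new. The handler class, paths, and port selection are all assumptions made for this illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Tiny test server: /old answers 302 to /new, everything else answers 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment

followed = session.get(base + "/old", timeout=5)
not_followed = session.get(base + "/old", allow_redirects=False, timeout=5)

server.shutdown()
server.server_close()

print(followed.status_code, [r.status_code for r in followed.history])  # 200 [302]
print(not_followed.status_code, not_followed.headers["Location"])       # 302 /new
```

With allow_redirects=False the 302 response itself is returned, which is useful when the Location header is the information you want.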
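Timeouts
Use the timeout argument to set a request timeout:
import requests
# Call an endpoint that responds after 10 seconds
response = requests.get('http://httpbin.org/delay/10')
print(response.text) # without a timeout, this waits until the endpoint responds
# Output: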
# {
# "args": {},
# "data": "",
# "files": {},
# "form": {},
# "headers": {
# "Accept": "*/*",
# "Accept-Encoding": "gzip, deflate, br",
# "Host": "httpbin.org",
# "User-Agent": "python-requests/2.31.0",
# "X-Amzn-Trace-Id": "Root=1-65e6771e-348de36a30fda99f7878fc8c"
# },
# "origin": "101.82.87.75",
# "url": "http://httpbin.org/delay/10"
# }
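try:
    # Same endpoint (responds after 10 seconds), now with a 5-second timeout
    response = requests.get('http://httpbin.org/delay/10', timeout=5)
    print(response.text) # not reached: no response arrives within 5 seconds, so an exception is raised
except requests.exceptions.Timeout:
    print("Request timed out!") # prints: Request timed out!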
Proxies
Use the proxies argument to send requests through a proxy
import requests
proxies = {
    'http': 'http://127.0.0.1:8080',
    'https': 'https://127.0.0.1:8080',
}
response = requests.get('http://httpbin.org/get', proxies=proxies)
print(response.text) # raises an exception here because no proxy is listening at that address
# requests.exceptions.ProxyError: HTTPConnectionPool(host='127.0.0.1', port=8080): Max retries exceeded with url: http://httpbin.org/get (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcf084ee4f0>: Failed to establish a new connection: [Errno 61] Connection refused')))
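SSL certificate verification
Certificate verification is used to authenticate the communication between client and server.
With HTTPS, the server presents an SSL/TLS certificate and the client checks its validity to make sure it is talking to the right server; the traffic is also encrypted to protect the data.
By default, Requests verifies SSL certificates. To disable verification, set the verify argument to False
import requests
# When you send a request to an HTTPS URL, requests verifies the server's SSL certificate by default
# To present a client-side certificate, pass its path via the cert argument
cert_path = '/my_cert_file.crt'
# verify=True tells requests to verify the server's SSL certificate.
# If verification fails, requests raises an SSLError.
# response = requests.post('https://httpbin.org/post', verify=True, cert=cert_path)
# print(response.text) # raises an error because no such certificate file exists here: OSError: Could not find the TLS certificate file, invalid path: /my_cert_file.crt
# verify=False disables certificate verification, but triggers an InsecureRequestWarning about the unverified HTTPS request
response = requests.post('https://httpbin.org/post', verify=False)
print(response.text)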
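File download
The target URL returns the file's binary data; the content of the response is then written to a local file;
import requests
# URL of the file to download
download_url = 'http://httpbin.org/bytes/1024'
# Send the GET request and get the response
response = requests.get(download_url)
# Save the response body to a local file
with open('./downloaded_file.bin', 'wb') as file:
    file.write(response.content)
# Report the result
print('File downloaded successfully.')
# downloaded_file.bin now exists in the current directory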
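Reading response.content loads the whole body into memory, which is fine for 1 KB but less so for large files. A common alternative is stream=True together with iter_content(). The sketch below demonstrates it against a throwaway local server so that it runs offline; the handler, payload size, and file names are assumptions made for the illustration.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = os.urandom(64 * 1024)  # 64 KiB of stand-in "file" data


class FileHandler(BaseHTTPRequestHandler):
    """Tiny test server that returns PAYLOAD for every GET."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), FileHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/file.bin"

target = os.path.join(tempfile.mkdtemp(), "downloaded_file.bin")
session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment

# stream=True defers the body download until it is iterated.
with session.get(url, stream=True, timeout=5) as response:
    response.raise_for_status()
    with open(target, "wb") as fh:
        for chunk in response.iter_content(chunk_size=8192):
            fh.write(chunk)

server.shutdown()
server.server_close()
print(os.path.getsize(target))  # 65536
```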
File upload
In the files dictionary, the key 'file' is the name of the form field, and the file object is passed as its value;
import requests
# Upload URL
upload_url = 'http://httpbin.org/post'
# Path of the file to upload
# File content: The contents of my file: abcdefg1234567
file_path = './test.txt'
# Send the file as multipart/form-data
with open(file_path, 'rb') as file:
    files = {'file': file}
    response = requests.post(upload_url, files=files)
# Print the upload result
print(response.text)
# Output:
# {
# "args": {},
# "data": "",
# "files": {
# "file": "The contents of my file: abcdefg1234567"
# },
# "form": {},
# "headers": {
# "Accept": "*/*",
# "Accept-Encoding": "gzip, deflate, br",
# "Content-Length": "183",
# "Content-Type": "multipart/form-data; boundary=fb427351fb3bfca331fd23c761ce5223",
# "Host": "httpbin.org",
# "User-Agent": "python-requests/2.31.0",
# "X-Amzn-Trace-Id": "Root=1-65e67ead-7ed04d2d39de1b00477bd288"
# },
# "json": null,
# "origin": "101.82.87.75",
# "url": "http://httpbin.org/post"
# }
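The files argument also accepts a tuple per field, (filename, content, content_type), which lets you control the file name and content type sent to the server. The sketch below only prepares the request so it runs offline; the file name and content are the same sample values as above, passed as bytes so that no file on disk is needed.

```python
import requests

# (filename, content, content_type) tuple; the content is given as bytes.
files = {
    "file": ("test.txt", b"The contents of my file: abcdefg1234567", "text/plain"),
}
prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

# The boundary in the Content-Type header is generated per request.
print(prepared.headers["Content-Type"].startswith("multipart/form-data; boundary="))  # True
print(b'filename="test.txt"' in prepared.body)       # True
print(b"Content-Type: text/plain" in prepared.body)  # True
```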
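Managing sessions with Session
A Session keeps state across multiple requests, which is very useful for sites that require login;
You can send the login request within a session and then reuse the same session for later requests, without logging in again each time.
A requests.Session() object creates a session that persists certain parameters and cookies across requests
import requests
# Call the cookies endpoint without a session
response1 = requests.get('https://httpbin.org/cookies')
print(response1.text)
# Output:
# {
# "cookies": {}
# }
# Create a session object
session = requests.Session()
# Send the first request through the session, setting the cookie sessioncookie to zhangsan
response2 = session.get('https://httpbin.org/cookies/set/sessioncookie/zhangsan')
print(response2.text)
# Output: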
# {
# "cookies": {
# "sessioncookie": "zhangsan"
# }
# }
# Send a second request through the same session; it carries the cookie set earlier
response3 = session.get('https://httpbin.org/cookies')
print(response3.text)
# Output:
# {
# "cookies": {
# "sessioncookie": "zhangsan"
# }
# }
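Besides cookies, a session can hold default headers that are merged into every request it sends. The sketch below shows the merge without any network traffic by asking the session to prepare a request; the X-Client header name is made up for the example.

```python
import requests

session = requests.Session()
# Defaults stored on the session are merged into every request it prepares.
session.headers.update({"X-Client": "demo"})  # hypothetical header name
session.cookies.set("sessioncookie", "zhangsan")

request = requests.Request(
    "GET",
    "https://httpbin.org/cookies",
    headers={"Accept": "application/json"},  # per-request header
)
prepared = session.prepare_request(request)

print(prepared.headers["X-Client"])  # demo
print(prepared.headers["Accept"])    # application/json
print(prepared.headers["Cookie"])    # sessioncookie=zhangsan
session.close()
```

Per-request values take precedence over session defaults, which is why the Accept header above replaces the session's default value.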
Source code analysis
1. First layer
Let's start with the source of the GET, POST, PUT, and DELETE helpers and see what they have in common;
GET source
def get(url, params=None, **kwargs):
    r"""Sends a GET request.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """
    return request("get", url, params=params, **kwargs)
POST source
def post(url, data=None, json=None, **kwargs):
    r"""Sends a POST request.
    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """
    return request("post", url, data=data, json=json, **kwargs)
PUT source
def put(url, data=None, **kwargs):
    r"""Sends a PUT request.
    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """
    return request("put", url, data=data, **kwargs)
DELETE source
def delete(url, **kwargs):
    r"""Sends a DELETE request.
    :param url: URL for the new :class:`Request` object.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """
    return request("delete", url, **kwargs)
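2. Second layer
The source shows that every one of these helpers ends up calling request;
This confirms the earlier point that GET, POST, PUT, DELETE and the rest are just convenience wrappers;
Next, let's go one level deeper and look at the source of request;
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.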
:param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
:param url: URL for the new :class:`Request` object.
:param params: (optional) Dictionary, list of tuples or bytes to send
in the query string for the :class:`Request`.
:param data: (optional) Dictionary, list of tuples, bytes, or file-like
object to send in the body of the :class:`Request`.
:param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
:param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
:param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
:param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
to add for the file.
:param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
:param timeout: (optional) How many seconds to wait for the server to send data
before giving up, as a float, or a :ref:`(connect timeout, read
timeout) <timeouts>` tuple.
:type timeout: float or tuple
:param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
:type allow_redirects: bool
:param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
:param verify: (optional) Either a boolean, in which case it controls whether we verify
the server's TLS certificate, or a string, in which case it must be a path
to a CA bundle to use. Defaults to ``True``.
:param stream: (optional) if ``False``, the response content will be immediately downloaded.
:param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
:return: :class:`Response <Response>` object
:rtype: requests.Response
Usage::
>>> import requests
>>> req = requests.request('GET', 'https://httpbin.org/get')
>>> req
<Response [200]>
"""
    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
3. Third layer
The request source shows that, before calling into the lower-level method, it uses a Python context manager;
This ensures the session object is closed properly after use, even if an exception is raised;
Next, let's go deeper and look at the source of session.request;
def request(
    self,
    method,
    url,
    params=None,
    data=None,
    headers=None,
    cookies=None,
    files=None,
    auth=None,
    timeout=None,
    allow_redirects=True,
    proxies=None,
    hooks=None,
    stream=None,
    verify=None,
    cert=None,
    json=None,
):
    """Constructs a :class:`Request <Request>`, prepares it and sends it.
Returns :class:`Response <Response>` object.
:param method: method for the new :class:`Request` object.
:param url: URL for the new :class:`Request` object.
:param params: (optional) Dictionary or bytes to be sent in the query
string for the :class:`Request`.
:param data: (optional) Dictionary, list of tuples, bytes, or file-like
object to send in the body of the :class:`Request`.
:param json: (optional) json to send in the body of the
:class:`Request`.
:param headers: (optional) Dictionary of HTTP Headers to send with the
:class:`Request`.
:param cookies: (optional) Dict or CookieJar object to send with the
:class:`Request`.
:param files: (optional) Dictionary of ``'filename': file-like-objects``
for multipart encoding upload.
:param auth: (optional) Auth tuple or callable to enable
Basic/Digest/Custom HTTP Auth.
:param timeout: (optional) How long to wait for the server to send
data before giving up, as a float, or a :ref:`(connect timeout,
read timeout) <timeouts>` tuple.
:type timeout: float or tuple
:param allow_redirects: (optional) Set to True by default.
:type allow_redirects: bool
:param proxies: (optional) Dictionary mapping protocol or protocol and
hostname to the URL of the proxy.
:param stream: (optional) whether to immediately download the response
content. Defaults to ``False``.
:param verify: (optional) Either a boolean, in which case it controls whether we verify
the server's TLS certificate, or a string, in which case it must be a path
to a CA bundle to use. Defaults to ``True``. When set to
``False``, requests will accept any TLS certificate presented by
the server, and will ignore hostname mismatches and/or expired
certificates, which will make your application vulnerable to
man-in-the-middle (MitM) attacks. Setting verify to ``False``
may be useful during local development or testing.
:param cert: (optional) if String, path to ssl client cert file (.pem).
If Tuple, ('cert', 'key') pair.
:rtype: requests.Response
"""
    # Create the Request.
    req = Request(
        method=method.upper(),
        url=url,
        headers=headers,
        files=files,
        data=data or {},
        json=json,
        params=params or {},
        auth=auth,
        cookies=cookies,
        hooks=hooks,
    )
    prep = self.prepare_request(req)
    proxies = proxies or {}
    settings = self.merge_environment_settings(
        prep.url, proxies, stream, verify, cert
    )
    # Send the request.
    send_kwargs = {
        "timeout": timeout,
        "allow_redirects": allow_redirects,
    }
    send_kwargs.update(settings)
    resp = self.send(prep, **send_kwargs)
    return resp
The session.request source shows the following;
It first creates a Request from the arguments that describe the request, prepares it, and then calls self.send() with the prepared request and the remaining settings;
We will not go further into the code behind send here; interested readers can explore it on their own;
After reading the source, it is clear there is no need to define separate Get, Post, and similar methods in a class and have each of them call request.
We can simply call request directly.
Wrapping Requests
1. Wrapping request
import ast
import functools
import traceback
import requests
# Decorator that assembles the response details after a request
def after(func):
    @functools.wraps(func)
    def inside(*args, **kwargs):
        requests_obj = func(*args, **kwargs)
        elapsed = requests_obj.elapsed
        # Assemble the response information
        response_dict = {
            'url': requests_obj.url,  # final URL
            'encoding': requests_obj.encoding,  # response encoding
            'info': requests_obj.reason,  # status text
            'code': requests_obj.status_code,  # status code
            'headers': requests_obj.headers,  # response headers
            'cookies': dict(requests_obj.cookies),  # cookies
            'seconds': elapsed.total_seconds(),  # elapsed time in seconds
            # elapsed.microseconds is only the sub-second component, so derive both values from total_seconds()
            'microseconds': elapsed.total_seconds() * 1_000_000,  # elapsed time in microseconds
            'millisecond': elapsed.total_seconds() * 1000  # elapsed time in milliseconds
        }
        try:
            response_dict['json'] = requests_obj.json()
        except ValueError:
            response_dict['text'] = requests_obj.text
        return response_dict
    return inside
class HTTPClient(object):
    def __init__(self):
        """
        Session holder
        requests.session(): keeps the session alive and persists parameters across requests
        """
        self.session = requests.session()
    @after
    def seed_request(self, method: str, url: str, params=None, data=None, json=None, headers=None, **kwargs):
        """
        Send an HTTP request to url with the given method, carrying params/data/json/...
        :param method: required, str; the request method, e.g. GET, POST, PUT, DELETE
        :param url: required, str; the request URL, e.g. http://127.0.0.1/test
        :param params: optional, dict; parameters appended to the URL
        :param data: optional, dict; sent as the request body (form data)
        :param json: optional, JSON-serializable object; sent as the request body
        :param headers: optional, dict; request headers
        :param kwargs: optional; any other arguments accepted by requests
        :return: the requests response object (the after decorator converts it to a dict)
        """
        # 1. GET: retrieve an entity
        # 2. HEAD: retrieve the response headers
        # 3. POST: submit data
        # 4. PUT: upload data
        # 5. PATCH: like PUT, but applies a partial update to a known resource
        # 6. DELETE: delete data
        # 7. OPTIONS: test communication
        # 8. CONNECT: switch the connection to a tunnel through a proxy server
        # 9. TRACE: echo the request as received by the server, for testing and diagnostics
        # 1. Check that the request method is allowed
        methods = ('GET', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE', 'OPTIONS', 'CONNECT', 'TRACE')
        if method.upper() not in methods:
            raise ValueError(f'Unsupported request method [{method}]! Supported methods: {methods}')
        # 2. Apply a default timeout of 120 seconds unless the caller passed one
        #    (setdefault keeps float and (connect, read) tuple values intact)
        kwargs.setdefault("timeout", 120)
        # 3. Resolve the request headers; copy them so the caller's dict is not modified
        headers = dict(HTTPClient.process_headers(headers=headers))
        # 4. Set the Content-Type according to the kind of payload
        if json is not None:
            # Optional: with the json argument, requests serializes the data and sets Content-Type itself.
            headers['Content-Type'] = "application/json;charset=utf-8"
        elif data is not None:
            headers['Content-Type'] = "application/x-www-form-urlencoded;charset=UTF-8"
        elif params is not None:
            # params normally need no Content-Type because they are appended to the URL
            pass
        # 5. Send the request
        try:
            return self.session.request(method=method, url=url, params=params,
                                        data=data, json=json,
                                        headers=headers, **kwargs)
        except Exception as exc:
            raise ValueError(f"Request failed; check the URL, the parameters, and the parameter types! Details: {traceback.format_exc()}") from exc
    @staticmethod
    def process_headers(headers):
        default_headers = {
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Connection': 'keep-alive',
            'Accept': 'application/json, text/plain, */*',
            'User-Agent': "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36"
        }
        if headers is None:
            return default_headers
        if isinstance(headers, str):
            try:
                # ast.literal_eval accepts only Python literals; eval would execute arbitrary code
                headers_dict = ast.literal_eval(headers)
            except (ValueError, SyntaxError):
                raise TypeError(f"Cannot parse the string as a dictionary: {headers}")
            if isinstance(headers_dict, dict):
                return headers_dict
            raise TypeError(f"The string did not evaluate to a dictionary: {headers}")
        if isinstance(headers, dict):
            return headers
        raise TypeError(f"headers must be None, a string, or a dictionary, but got: {type(headers)}")
if __name__ == '__main__':
    http = HTTPClient()
    test_get = http.seed_request("GET", "http://httpbin.org/get")
    print(test_get.get("url")) # http://httpbin.org/get
    print(test_get.get("encoding")) # utf-8
    print(test_get.get("info")) # OK
    print(test_get.get("code")) # 200
    data = {"name": "张三", "age": 20, "phone": "10086", "address": "上海市浦东新区"}
    test_post = http.seed_request("POST", "http://httpbin.org/post", data=data)
    print(test_post)
2. Wrapping response assertions
from jsonpath_ng import parse
import requests
class AssertTool:
    def __init__(self, response):
        """
        Initialize the assertion helper
        :param response: a requests response object
        """
        self.response = response
        try:
            self.response_json = response.json()
        except ValueError:
            # The JSON decode errors raised by response.json() are ValueError subclasses
            self.response_json = None
    def assert_status_code(self, expected_code):
        """
        Assert the response status code
        """
        assert self.response.status_code == expected_code, f"Expected status code [{expected_code}], got [{self.response.status_code}]"
    def assert_status_message(self, expected_message):
        """
        Assert the response status text
        """
        assert self.response.reason == expected_message, f"Expected status text [{expected_message}], got [{self.response.reason}]"
    def assert_json_value_exists(self, json_path):
        """
        Use jsonpath to assert that a value exists at the given path in the response body
        """
        if not self.response_json:
            raise ValueError("Response is not in JSON format or is empty.")
        jsonpath_expression = parse(json_path)
        match = jsonpath_expression.find(self.response_json)
        assert match, f"No value found for the jsonpath: {json_path}"
    def assert_json_value(self, json_path, expected_value):
        """
        Assert that the value at the given path in the response body equals the expected value
        """
        if not self.response_json:
            raise ValueError("Response is not in JSON format or is empty.")
        jsonpath_expression = parse(json_path)
        match = jsonpath_expression.find(self.response_json)
        assert match, f"No value found for the jsonpath: {json_path}"
        actual_value = match[0].value
        assert actual_value == expected_value, f"At path [{json_path}] expected [{expected_value}], got [{actual_value}]"
    def get_json_value(self, json_path):
        """
        Return the value at the given jsonpath
        """
        if not self.response_json:
            raise ValueError("Response is not in JSON format or is empty.")
        jsonpath_expression = parse(json_path)
        match = jsonpath_expression.find(self.response_json)
        if not match:
            raise ValueError(f"No value found for the jsonpath: {json_path}")
        return match[0].value  # value of the first match
# Usage example
if __name__ == "__main__":
    # Send a sample request
    data = {"name": "张三", "age": 20, "phone": {"中国联通": "1001", "中国移动": "1002", "中国电信": "1003"}, "address": "上海市浦东新区"}
    response = requests.post("http://httpbin.org/post", json=data)
    print(response.json())
    # Create the assertion helper
    assert_tool = AssertTool(response)
    # Run the assertions
    assert_tool.assert_status_code(200)
    assert_tool.assert_status_message('OK')
    print(assert_tool.get_json_value('$.json.phone')) # {'中国电信': '1003', '中国移动': '1002', '中国联通': '1001'}
    assert_tool.assert_json_value_exists('$.json.phone."中国移动"')
    # This assertion raises AssertionError, because the actual value is "1002", not 666
    assert_tool.assert_json_value('$.json.phone."中国移动"', 666)
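Summary
Python's requests module is a convenient tool for sending HTTP requests and supports GET, POST, and the other request methods.
It provides a readable API for customizing requests, such as setting headers and passing parameters.
It also takes care of many low-level details automatically, such as cookies and sessions.
In short, requests simplifies network programming in Python and makes sending HTTP requests efficient and simple. It is a preferred library for Python developers handling network requests and is commonly used for web scraping, data analysis, and API test automation.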