Requests: The Essential Python Module for Web Scraping and API Test Automation



Introduction

Python's charm is as dazzling and boundless as a starry sky. Life is short, I use Python!


Introduction to HTTP

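HTTP stands for Hypertext Transfer Protocol.
HTTP is a stateless, application-layer protocol built on a request/response model, and it uses URLs to identify network resources.
A URL is the Internet path used to access a resource over HTTP; each URL corresponds to one resource.
URL format: http://host[:port][path]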

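  • **host:** a valid Internet host domain name or IP address;
  • **port:** the port number; defaults to 80 for http (443 for https);
  • **path:** the path of the requested resource;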
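To make the host/port/path breakdown concrete, here is a small sketch using the standard library's `urllib.parse.urlsplit` (an illustration chosen for this article, not something Requests requires you to call). The URL with port 8080 is hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only to illustrate the http://host[:port][path] format
parts = urlsplit("http://httpbin.org:8080/get?key1=value1")

print(parts.scheme)    # http
print(parts.hostname)  # httpbin.org
print(parts.port)      # 8080
print(parts.path)      # /get
print(parts.query)     # key1=value1  (the query string that carries GET parameters)

# When the URL omits the port, urlsplit reports None and an http client falls back to 80
default = urlsplit("http://httpbin.org/get")
effective_port = default.port or 80
print(effective_port)  # 80
```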

HTTP methods for operating on resources

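  • **GET:** retrieves the resource at the URL;
  • **HEAD:** retrieves only the response headers for the resource at the URL, without the body;
  • **POST:** submits data to the resource at the URL, for example appending new data to it;
  • **PUT:** stores a resource at the URL, replacing whatever resource was there;
  • **PATCH:** partially updates the resource at the URL, changing only part of its content;
  • **DELETE:** deletes the resource stored at the URL;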

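Differences between GET and POST

GET requests

  • Used to retrieve resources. When a resource is requested with GET, the server processes the request and returns the response content directly. A GET request normally should not carry a request body; any data that needs to reach the requested resource should be passed to the server through the URL;

POST requests

  • Used to submit data. When data is submitted with POST, it travels in the request body, and on receiving it the server may create a new resource or update an existing one. A POST body can carry large amounts of data in any format, which makes POST very versatile: almost any submit operation can be done with POST;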

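The underlying difference (a common misconception)

  • A widely repeated claim says GET produces one TCP packet and POST produces two: the client supposedly sends the headers first, waits for the server to answer 100 Continue, and only then sends the body before the server replies 200.
  • In practice this is not a property of GET or POST. The 100 Continue handshake only happens when the client sends an Expect: 100-continue header (curl, for example, does this for larger bodies); most browsers, and Requests, send the headers and the body together.
  • How many TCP packets a request occupies depends on its size and on the network stack, not on the HTTP method.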

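Differences in parameter passing

  • GET carries its parameters in the URL (the query string)
  • POST carries its parameters in the request body
  • As a result, GET parameters appear directly in the browser's address bar, while POST parameters do not.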

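Differences in security

  • POST is often described as safer than GET because its data does not appear in the URL, so it does not end up in the address bar, browser history or server access logs. A POST body is not encrypted, however: over plain HTTP anyone on the network path can read it, so sensitive data needs HTTPS regardless of the method.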

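Differences in caching

  • GET responses can be cached by the browser, so repeated visits to the same resource load faster.
  • POST responses are generally not cached, because each submission may carry different data and caching could return inconsistent results.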

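Differences in URL length

  • Because GET parameters are appended to the URL, their size is bounded by URL length limits. The HTTP specification sets no fixed limit; the limits are imposed by browsers and servers.
  • If the parameters are too many or too long, the server may refuse the request (for example with 414 URI Too Long);
  • POST avoids this particular limit because its parameters travel in the request body, although servers usually cap the body size as well;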
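Where the parameters travel can be observed without any network traffic. The sketch below builds requests with `requests.Request(...).prepare()` and inspects what would be sent; the httpbin.org URLs are placeholders here and nothing is transmitted.

```python
import requests

payload = {"username": "zhangsan", "password": "abcdefg1234567"}

# GET: the parameters are encoded into the URL's query string
get_req = requests.Request("GET", "http://httpbin.org/get", params=payload).prepare()

# POST: the same parameters are encoded into the request body
post_req = requests.Request("POST", "http://httpbin.org/post", data=payload).prepare()

print(get_req.url)    # http://httpbin.org/get?username=zhangsan&password=abcdefg1234567
print(get_req.body)   # None (no body)
print(post_req.url)   # http://httpbin.org/post
print(post_req.body)  # username=zhangsan&password=abcdefg1234567
print(post_req.headers["Content-Type"])  # application/x-www-form-urlencoded

# Requests does not add an "Expect: 100-continue" header, so the body is not held back
print("Expect" in post_req.headers)  # False
```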



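Common HTTP request parameters and headers

| Name | Description |
| --- | --- |
| url | Target address of the request |
| headers | Request headers |
| data | Data sent in the request body |
| params | Query-string parameters |
| Host | Domain name (and optional port) of the web server being requested |
| User-Agent | Details about the client, such as browser type and version and operating system; the web server can use it to identify what kind of client sent the request |
| Accept | Content types the client can accept; preference among them is expressed with q weights such as q=0.9 |
| Accept-Encoding | Compression encodings (such as gzip) the client supports for the response content |
| Accept-Language | Languages the client prefers for the returned content |
| Connection | Whether the connection should stay open. With keep-alive, which is the default in HTTP/1.1, the connection is reused for later requests instead of being closed after one response |
| Cookie | When the request is sent, the cookies stored for the request's domain are sent to the web server along with it |
| Referer | URL of the page from which the user navigated to the currently requested page (the header name is spelled "Referer") |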
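Several of the headers above are filled in automatically. A quick way to see what Requests sends when you set nothing is `requests.utils.default_headers()`, the helper that initializes `Session.headers`. The exact values depend on the installed versions; for instance, `br` only appears in Accept-Encoding when a Brotli package is installed.

```python
import requests

# The same default headers a fresh requests.Session() starts with
headers = requests.utils.default_headers()

for name, value in headers.items():
    print(f"{name}: {value}")

# Header names are case-insensitive, just as in HTTP itself
print(headers["user-agent"] == headers["User-Agent"])  # True
print(requests.Session().headers == headers)  # True
```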

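Introduction to Requests

Requests simplifies interaction between Python and web services: its intuitive, efficient API makes sending HTTP requests and handling responses straightforward.
It supports all the common HTTP methods and the usual authentication schemes, and whether you are scraping web pages, calling APIs or simulating user logins, it proves powerful and flexible;
Compared with the verbose, cumbersome urllib, Requests has won developers over with its user-friendly design, as its own tagline "HTTP for Humans" suggests.
In the field of Python networking, Requests has become the tool of choice for many developers.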


Installing Requests

Requests is a third-party library. Open your terminal (or command prompt) and run:

pip install requests

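Core methods

requests.request() constructs and sends a request; it is the base method underlying all of the following:

  • requests.get(): the main method for fetching a resource such as an HTML page; corresponds to HTTP GET;
  • requests.head(): fetches only the headers of a resource; corresponds to HTTP HEAD;
  • requests.post(): submits a POST request; corresponds to HTTP POST;
  • requests.put(): submits a PUT request; corresponds to HTTP PUT;
  • requests.patch(): submits a partial-modification request; corresponds to HTTP PATCH;
  • requests.delete(): submits a delete request; corresponds to HTTP DELETE;

All of them are convenience wrappers around requests.request; calling requests.get(url) is effectively the same as calling requests.request("GET", url);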


The requests.request() method

Signature: requests.request(method, url, **kwargs)

method: the HTTP method, one of:

  • GET
  • HEAD
  • POST
  • PUT
  • PATCH
  • DELETE
  • OPTIONS

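url: the URL of the resource to request;
kwargs: optional keyword arguments that control the request. The 13 most commonly used are:

| Parameter | Type | Description |
| --- | --- | --- |
| params | dict, list of tuples, or bytes | Added to the URL as the query string |
| data | dict, list of tuples, bytes, or file object | Content sent in the request body |
| json | JSON-serializable Python object (e.g. dict or list) | Serialized to JSON and sent as the body; Content-Type is set to application/json |
| headers | dict | Custom HTTP request headers |
| cookies | dict or CookieJar | Cookies sent with the request |
| auth | tuple | Enables HTTP authentication, e.g. ('user', 'password') for Basic auth |
| files | dict | Files to upload (multipart/form-data) |
| timeout | float or (connect, read) tuple | Request timeout in seconds |
| proxies | dict | Proxy servers to use; the proxy URL can include login credentials |
| allow_redirects | bool | Whether to follow redirects; defaults to True |
| stream | bool | If False (the default), the response body is downloaded immediately; if True, it is downloaded only when accessed |
| verify | bool or str | Whether to verify the server's TLS certificate, or the path to a CA bundle to verify against; defaults to True |
| cert | str or (cert, key) tuple | Path to a local client-side SSL certificate |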
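Most of these keyword arguments end up as headers, URL parts or body bytes. Take `auth` as one example: a minimal sketch, assuming HTTP Basic authentication and the made-up credentials `user`/`passwd`, that prepares (but does not send) a request so the generated Authorization header can be inspected.

```python
import requests
from requests.auth import HTTPBasicAuth

url = "https://httpbin.org/basic-auth/user/passwd"

# A plain (username, password) tuple is shorthand for HTTPBasicAuth
req_tuple = requests.Request("GET", url, auth=("user", "passwd")).prepare()
req_class = requests.Request("GET", url, auth=HTTPBasicAuth("user", "passwd")).prepare()

# Basic auth is only base64("user:passwd"), not encryption, so pair it with HTTPS
print(req_tuple.headers["Authorization"])  # Basic dXNlcjpwYXNzd2Q=
print(req_tuple.headers["Authorization"] == req_class.headers["Authorization"])  # True
```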

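httpbin.org is an online service for testing and demonstrating HTTP features: it echoes back what it receives and does not actually store or manage session state. The examples below use it as their target.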


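GET Requests

Compared with urllib's cumbersome interface, sending a GET request with Requests is like telling Python: "Hey, go visit this URL and bring back the content!"

  • A GET request normally has no request body
  • The data it carries is limited by URL length; the HTTP specification sets no fixed cap, but browsers and servers do (often a few kilobytes)
  • GET data is exposed in the browser's address bar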

1. Sending a GET request without parameters
import requests

response = requests.get('https://httpbin.org')

print(response.text)
# Output (truncated because it is long):
# <!DOCTYPE html>
# <html lang="en">
# 
# <head>
#     <meta charset="UTF-8">
#     <title>httpbin.org</title>
#     <link href="https://fonts.googleapis.com/css?family=Open+Sans:400,700|Source+Code+Pro:300,600|Titillium+Web:400,600,700"
#         rel="stylesheet">
#     <link rel="stylesheet" type="text/css" href="/flasgger_static/swagger-ui.css">
#     <link rel="icon" type="image/png" href="/static/favicon.ico" sizes="64x64 32x32 16x16" />
#     <style>
#         html {
#             box-sizing: border-box;
#             overflow: -moz-scrollbars-vertical;
#             overflow-y: scroll;
#         }
# 
#         *,
#         *:before,
#         *:after {
#             box-sizing: inherit;
#         }
# 
#         body {
#             margin: 0;
#             background: #fafafa;
#         }
#     </style>
# </head>
# ...

2. Sending a GET request with parameters

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.get('http://httpbin.org/get', params=payload)

print(response.text)
# {
#   "args": {
#     "key1": "value1", 
#     "key2": "value2"
#   }, 
#   "headers": {
#     "Accept": "*/*", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Host": "httpbin.org", 
#     "User-Agent": "python-requests/2.31.0", 
#     "X-Amzn-Trace-Id": "Root=1-65e5cd40-3bbebd954933e1aa374f79b6"
#   }, 
#   "origin": "180.164.28.66", 
#   "url": "http://httpbin.org/get?key1=value1&key2=value2"
# }
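`params` also handles list values and non-ASCII text: a list becomes a repeated key, a `None` value is dropped, and Chinese characters are UTF-8 percent-encoded. A sketch that only prepares the request (the parameter names `q`, `tag` and `empty` are made up):

```python
import requests

params = {"q": "中文", "tag": ["a", "b"], "empty": None}

prepared = requests.Request("GET", "http://httpbin.org/get", params=params).prepare()

# List values repeat the key, None values are skipped, non-ASCII text is percent-encoded
print(prepared.url)  # http://httpbin.org/get?q=%E4%B8%AD%E6%96%87&tag=a&tag=b
```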


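POST Requests

For POST requests, Requests likewise improves on urllib, offering a more intuitive and concise way to submit data:

  • The data carried by the request is not exposed in the browser's address bar
  • The HTTP specification sets no size limit on the data, though servers usually enforce their own maximum body size
  • It has a request body
  • With form encoding (application/x-www-form-urlencoded), non-ASCII characters such as Chinese in the body are percent-encoded (URL-encoded)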

1. Sending a POST request without parameters
import requests

# No parameters
response = requests.post('http://httpbin.org/post')

print(response.text)
# Output:
# {
#   "args": {}, 
#   "data": "", 
#   "files": {}, 
#   "form": {}, 
#   "headers": {
#     "Accept": "*/*", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Content-Length": "0", 
#     "Host": "httpbin.org", 
#     "User-Agent": "python-requests/2.31.0", 
#     "X-Amzn-Trace-Id": "Root=1-65e672cc-6607c15c29c4b0f87bb95b7e"
#   }, 
#   "json": null, 
#   "origin": "101.82.87.75", 
#   "url": "http://httpbin.org/post"
# }

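2. Sending a POST request with parameters: key-value pairs in the query string

import requests

data = {'username': 'zhangsan', 'password': 'abcdefg1234567'}

# params puts the key-value pairs into the URL's query string, even for POST; the body stays empty
response = requests.post('http://httpbin.org/post', params=data)

print(response.text)
# Output: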
# {
#   "args": {
#     "password": "abcdefg1234567", 
#     "username": "zhangsan"
#   }, 
#   "data": "", 
#   "files": {}, 
#   "form": {}, 
#   "headers": {
#     "Accept": "*/*", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Content-Length": "0", 
#     "Host": "httpbin.org", 
#     "User-Agent": "python-requests/2.31.0", 
#     "X-Amzn-Trace-Id": "Root=1-65e672f6-6e7c9158279bdfa8070b24b3"
#   }, 
#   "json": null, 
#   "origin": "101.82.87.75", 
#   "url": "http://httpbin.org/post?username=zhangsan&password=abcdefg1234567"
# }

3. Sending a POST request with parameters: form
import requests

data = {'username': 'zhangsan', 'password': 'abcdefg1234567'}

# data sends the parameters in the body as a form (application/x-www-form-urlencoded)
response = requests.post('http://httpbin.org/post', data=data)

print(response.text)
# Output:
# {
#   "args": {}, 
#   "data": "", 
#   "files": {}, 
#   "form": {
#     "password": "abcdefg1234567", 
#     "username": "zhangsan"
#   }, 
#   "headers": {
#     "Accept": "*/*", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Content-Length": "41", 
#     "Content-Type": "application/x-www-form-urlencoded", 
#     "Host": "httpbin.org", 
#     "User-Agent": "python-requests/2.31.0", 
#     "X-Amzn-Trace-Id": "Root=1-65e6732b-022e03c80e99f52a4e69fb0d"
#   }, 
#   "json": null, 
#   "origin": "101.82.87.75", 
#   "url": "http://httpbin.org/post"
# }

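4. Sending a POST request with parameters: JSON object

import requests

data = {'username': 'zhangsan', 'password': 'abcdefg1234567'}

# json serializes the parameters into a JSON document and sends it as the body (application/json)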
response = requests.post('http://httpbin.org/post', json=data)

print(response.text)
# {
#   "args": {},
#   "data": "{\"username\": \"zhangsan\", \"password\": \"abcdefg1234567\"}",
#   "files": {},
#   "form": {},
#   "headers": {
#     "Accept": "*/*",
#     "Accept-Encoding": "gzip, deflate, br",
#     "Content-Length": "54",
#     "Content-Type": "application/json",
#     "Host": "httpbin.org",
#     "User-Agent": "python-requests/2.31.0",
#     "X-Amzn-Trace-Id": "Root=1-65e5ce52-3137c56054637a925191dcd2"
#   },
#   "json": {
#     "password": "abcdefg1234567",
#     "username": "zhangsan"
#   },
#   "origin": "180.164.28.66",
#   "url": "http://httpbin.org/post"
# }
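The POST examples differ only in how the body is encoded. A sketch that prepares each variant side by side, without sending anything, shows the body and Content-Type Requests would produce, including how Chinese text is percent-encoded in a form body as noted in the POST bullet list (the value 张三 is sample data):

```python
import json

import requests

url = "http://httpbin.org/post"
data = {"username": "张三", "password": "abcdefg1234567"}

as_form = requests.Request("POST", url, data=data).prepare()
as_json = requests.Request("POST", url, json=data).prepare()
# A pre-serialized string is sent as-is, and Requests adds no Content-Type for it
as_raw = requests.Request("POST", url, data=json.dumps(data)).prepare()

print(as_form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(as_form.body)  # username=%E5%BC%A0%E4%B8%89&password=abcdefg1234567
print(as_json.headers["Content-Type"])  # application/json
print(json.loads(as_json.body) == data)  # True: the body is the JSON document as bytes
print(as_raw.headers.get("Content-Type"))  # None
```

This is why `json=` is usually preferable to `data=json.dumps(...)`: without a Content-Type, some servers will not treat the body as JSON.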

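Handling Response Content

Requests provides many ways to work with a response:
for example, the status_code attribute gives the response status code,
the headers attribute gives the response headers,
and the content attribute gives the response body as bytes, and so on.
Commonly used attributes of the response object are listed below: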

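| Attribute | Description |
| --- | --- |
| text | Response body as text, e.g. the page source (str) |
| content | Response body as raw bytes (bytes) |
| status_code | HTTP status code (e.g. 200, 404) |
| headers | Case-insensitive dict of response headers; keys are header names, values are header values |
| cookies | A RequestsCookieJar containing the cookies sent by the server |
| url | The final URL (after any redirects) |
| history | List of Response objects from any redirects, in the order they occurred |
| encoding | Encoding used to decode text, derived from the HTTP headers (the charset in Content-Type) |
| reason | Text form of the status code (e.g. "Not Found" or "OK") |
| elapsed | Time between sending the request and the arrival of the response, as a timedelta |
| request | The PreparedRequest object that produced this response |
| json() | Method that parses the body as JSON and returns the resulting dict/list; raises an exception (a ValueError subclass) if the body is not valid JSON |
| raise_for_status() | Raises HTTPError if the status code indicates an error (4xx or 5xx); otherwise does nothing |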

import requests

response = requests.get('http://httpbin.org/get')

print("Response text:", response.text)
print("Response body as bytes:", response.content)
print("Status code:", response.status_code)
print("Response headers:", response.headers)
print("Cookies:", response.cookies)
print("Final URL:", response.url)
print("Redirect history:", response.history)
print("Encoding:", response.encoding)
print("Status text:", response.reason)
print("PreparedRequest that produced this response:", response.request)
print("Body parsed as JSON:", response.json())
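The example above assumes the request succeeded. In real scripts it is usually safer to check the status before using the body. A minimal sketch of a helper, `check_response` (a hypothetical name), built on `raise_for_status()`. The Response objects are constructed by hand only so the example runs offline; in practice they come from `requests.get()`.

```python
import requests


def check_response(response):
    """Return None for a non-error status, or the HTTPError message for 4xx/5xx."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return str(exc)
    return None


# Hand-built responses, for illustration only
not_found = requests.Response()
not_found.status_code = 404
not_found.reason = "Not Found"
not_found.url = "http://httpbin.org/status/404"

ok = requests.Response()
ok.status_code = 200
ok.reason = "OK"
ok.url = "http://httpbin.org/get"

print(check_response(not_found))  # 404 Client Error: Not Found for url: http://httpbin.org/status/404
print(check_response(ok))  # None
print(not_found.ok, ok.ok)  # False True
```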


Request Headers

Sometimes you need to set request headers to mimic a browser; use the headers parameter for this;

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'Mark': 'zhangsan'  # a custom marker header; check below that it shows up in the echoed request
}

response = requests.get('http://httpbin.org/get', headers=headers)

print(response.text)
# Output
# {
#   "args": {}, 
#   "headers": {
#     "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8", 
#     "Host": "httpbin.org", 
#     "Mark": "zhangsan", 
#     "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36", 
#     "X-Amzn-Trace-Id": "Root=1-65e6764d-00190c076b2527315871a837"
#   }, 
#   "origin": "101.82.87.75", 
#   "url": "http://httpbin.org/get"
# }
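When many requests share the same headers, they can be set once on a `requests.Session` and merged into every request. Per-request headers take precedence, and setting a header to `None` drops it for that request. A sketch that inspects the merged result with `Session.prepare_request()`, again without sending anything (Mark is the marker header from the example above):

```python
import requests

session = requests.Session()
# Session-level headers are merged into every request made through this session
session.headers.update({"Mark": "zhangsan", "Accept-Language": "zh-CN,zh;q=0.9"})

request = requests.Request(
    "GET",
    "http://httpbin.org/get",
    headers={
        "Accept-Language": "en",  # a per-request value overrides the session value
        "Accept-Encoding": None,  # None removes this default header for this request only
    },
)
prepared = session.prepare_request(request)

print(prepared.headers["Mark"])  # zhangsan
print(prepared.headers["Accept-Language"])  # en
print("Accept-Encoding" in prepared.headers)  # False
```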

Cookie

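A server can set one or more cookies in its response, and a browser then sends them automatically with later requests. With Requests, you can send cookies explicitly through the cookies parameter, as below; to have them remembered across requests, use a Session (covered later).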

import requests

response = requests.get('http://httpbin.org/cookies')

print(response.text)
# Output:
# {
#   "cookies": {}
# }

cookies = {
    'session_id': 'abc123',
    'user_pref': 'lang=zh_CN'
}

response = requests.get('http://httpbin.org/cookies', cookies=cookies)

print(response.text)
# Output:
# {
#   "cookies": {
#     "session_id": "abc123", 
#     "user_pref": "lang=zh_CN"
#   }
# }
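Under the hood, the cookies dict becomes a single Cookie request header. A sketch that prepares the request from the example above without sending it:

```python
import requests

cookies = {"session_id": "abc123", "user_pref": "lang=zh_CN"}

prepared = requests.Request("GET", "http://httpbin.org/cookies", cookies=cookies).prepare()

# All cookies travel in one header, as name=value pairs separated by "; "
print(prepared.headers["Cookie"])
```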

Redirects

By default, Requests follows redirects automatically.
The allow_redirects parameter controls whether redirects are followed;

import requests

url = "http://httpbin.org/redirect-to?url=http://example.com"

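# Send the request
response = requests.get(url, allow_redirects=True)

# Print the redirect history
print("Redirect history:", response.history)   # [<Response [302]>]

# Print the URL of the final response (Requests normalizes it with a trailing slash)
print("Final URL:", response.url)  # http://example.com/

# Print the response body
print("Response body:", response.text)
# Response body: <!doctype html>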
# <html>
# <head>
#     <title>Example Domain</title>
# 
#     <meta charset="utf-8" />
#     <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
#     <meta name="viewport" content="width=device-width, initial-scale=1" />
#     <style type="text/css">
#     body {
#         background-color: #f0f0f2;
#         margin: 0;
#         padding: 0;
#         font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
#         
#     }
#     div {
#         width: 600px;
#         margin: 5em auto;
#         padding: 2em;
#         background-color: #fdfdff;
#         border-radius: 0.5em;
#         box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
#     }
#     a:link, a:visited {
#         color: #38488f;
#         text-decoration: none;
#     }
#     @media (max-width: 700px) {
#         div {
#             margin: 0 auto;
#             width: auto;
#         }
#     }
#     </style>    
# </head>
# 
# <body>
# <div>
#     <h1>Example Domain</h1>
#     <p>This domain is for use in illustrative examples in documents. You may use this
#     domain in literature without prior coordination or asking for permission.</p>
#     <p><a href="https://www.iana.org/domains/example">More information...</a></p>
# </div>
# </body>
# </html>
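To see `allow_redirects=False` without depending on httpbin.org, a self-contained sketch can start a throwaway local server with the standard library's `http.server`. Its `/old` path (a made-up route) answers with a 302 redirect to `/new`. With redirects allowed, Requests follows the Location header and records the 302 in `history`; with `allow_redirects=False` it returns the 302 response itself.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Throwaway local server: /old answers 302 -> /new, everything else 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

session = requests.Session()
session.trust_env = False  # ignore HTTP(S)_PROXY variables so localhost is reached directly
try:
    followed = session.get(base + "/old", timeout=5)
    not_followed = session.get(base + "/old", allow_redirects=False, timeout=5)
finally:
    server.shutdown()
    server.server_close()

print(followed.status_code, [r.status_code for r in followed.history])  # 200 [302]
print(followed.url == base + "/new")  # True
print(not_followed.status_code, not_followed.headers["Location"])  # 302 /new
print(not_followed.is_redirect)  # True
```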

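Handling Timeouts

Use the timeout parameter to set a timeout for the request. Without it, Requests waits indefinitely for a response: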

import requests

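# Request an endpoint that responds only after 10 seconds
response = requests.get('http://httpbin.org/delay/10')

print(response.text)    # no timeout was set, so the call above waited until the endpoint responded
# Output: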
# {
#   "args": {}, 
#   "data": "", 
#   "files": {}, 
#   "form": {}, 
#   "headers": {
#     "Accept": "*/*", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Host": "httpbin.org", 
#     "User-Agent": "python-requests/2.31.0", 
#     "X-Amzn-Trace-Id": "Root=1-65e6771e-348de36a30fda99f7878fc8c"
#   }, 
#   "origin": "101.82.87.75", 
#   "url": "http://httpbin.org/delay/10"
# }

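try:
    # Request the endpoint that responds after 10 seconds, with a 5-second timeout
    response = requests.get('http://httpbin.org/delay/10', timeout=5)
    print(response.text)    # not reached: no response within 5 seconds, so an exception is raised
except requests.exceptions.Timeout:
    print("Request timed out!")  # prints: Request timed out!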
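`timeout` also accepts a `(connect, read)` tuple: the connect timeout limits how long establishing the TCP connection may take, and the read timeout limits how long to wait for the server to send data. A self-contained sketch, assuming a local socket that accepts connections but never replies (standing in for a hung server), shows the read timeout firing:

```python
import socket

import requests

# A listening socket that never answers: the TCP handshake completes,
# but no HTTP response ever arrives
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for localhost

try:
    # timeout=(connect timeout, read timeout), both in seconds
    session.get(f"http://127.0.0.1:{port}/", timeout=(3, 0.5))
    outcome = "response"
except requests.exceptions.ConnectTimeout:
    outcome = "connect timeout"
except requests.exceptions.ReadTimeout:
    outcome = "read timeout"
finally:
    listener.close()

print(outcome)  # read timeout
```

Both ConnectTimeout and ReadTimeout subclass requests.exceptions.Timeout, so the single `except requests.exceptions.Timeout` in the previous example catches either one.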
    

Using Proxies

Set the proxies parameter to send requests through a proxy

import requests

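proxies = {
    'http': 'http://127.0.0.1:8080',
    'https': 'http://127.0.0.1:8080',  # HTTPS traffic is usually tunnelled (CONNECT) through a plain-HTTP proxy URL
}

# Raises an exception here, since nothing is listening on the proxy address
response = requests.get('http://httpbin.org/get', proxies=proxies)

print(response.text)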
# requests.exceptions.ProxyError: HTTPConnectionPool(host='127.0.0.1', port=8080): Max retries exceeded with url: http://httpbin.org/get (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcf084ee4f0>: Failed to establish a new connection: [Errno 61] Connection refused')))
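The proxies dict can also be keyed by scheme plus host, and the proxy URL can embed credentials (the "login authentication" mentioned in the parameter table). The sketch below uses `requests.utils.select_proxy`, the helper Requests uses internally to pick a proxy for a URL, to check which entry would apply. It is not part of the documented public API, and all addresses and credentials here are made up.

```python
from requests.utils import select_proxy

proxies = {
    # Hypothetical proxies; credentials can be embedded as user:password@host:port
    "http://httpbin.org": "http://user:secret@10.0.0.2:3128",  # only for this host
    "http": "http://127.0.0.1:8080",   # every other http:// URL
    "https": "http://127.0.0.1:8080",  # https:// URLs, tunnelled through an HTTP proxy
}

print(select_proxy("http://httpbin.org/get", proxies))  # http://user:secret@10.0.0.2:3128
print(select_proxy("http://example.com/", proxies))     # http://127.0.0.1:8080
print(select_proxy("https://example.com/", proxies))    # http://127.0.0.1:8080
print(select_proxy("ftp://example.com/", proxies))      # None
```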


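SSL Certificate Verification

Certificate verification is used to secure communication between client and server.
With HTTPS, the server presents an SSL/TLS certificate and the client checks that it is valid, ensuring it is talking to the right server; the traffic is also encrypted to protect the data.
By default, Requests verifies SSL certificates. To disable verification, set the verify parameter to False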


import requests

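# When you send a request to an HTTPS URL, requests verifies the server's SSL certificate by default
# To trust a custom CA, pass the path of its bundle to verify; the cert parameter supplies a client-side certificate
cert_path = '/my_cert_file.crt'


# verify=True tells requests to verify the server's SSL certificate.
# If verification fails, requests raises an SSLError.
# response = requests.post('https://httpbin.org/post', verify=True, cert=cert_path)
# print(response.text)    # raises an error because I have no such certificate: OSError: Could not find the TLS certificate file, invalid path: /my_cert_file.crt

# verify=False disables certificate verification, but a warning is emitted: Unverified HTTPS request is being made to host 'httpbin.org'
response = requests.post('https://httpbin.org/post', verify=False)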

print(response.text)
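Instead of turning verification off entirely, `verify` can point at a CA bundle file, while `cert` supplies a client certificate when the server requires mutual TLS. A minimal sketch with hypothetical file paths; the helper `get_unverified` (a made-up name) shows how to silence InsecureRequestWarning for a single `verify=False` call instead of globally. Nothing is sent when the block runs.

```python
import warnings

import certifi
import requests
import urllib3

# Path of the CA bundle Requests normally verifies against (usually supplied by certifi)
print(certifi.where())

session = requests.Session()
# Hypothetical paths: a private CA bundle to trust, and a client certificate/key pair
session.verify = "/path/to/internal-ca.pem"
session.cert = ("/path/to/client.crt", "/path/to/client.key")


def get_unverified(url):
    """Hypothetical helper: skip verification for one call and hide only its warning."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", urllib3.exceptions.InsecureRequestWarning)
        return requests.get(url, verify=False, timeout=10)
```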

Downloading Files

The target URL returns the file as binary data; we then write the response's content to a local file

import requests

# URL of the file to download
download_url = 'http://httpbin.org/bytes/1024'

# Send the GET request and get the response
response = requests.get(download_url)

# Save the response body to a local file
with open('./downloaded_file.bin', 'wb') as file:
    file.write(response.content)

# Report the result
print('File downloaded successfully.')
# downloaded_file.bin now appears in the current directory
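`response.content` loads the whole file into memory, which is fine for the 1 KB example but not for large files. A common alternative is `stream=True` with `iter_content()`, writing chunk by chunk. A self-contained sketch, assuming a throwaway local server (built with the standard library) that serves 64 KiB of random bytes; the helper name `download` is hypothetical.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = os.urandom(64 * 1024)  # 64 KiB of random bytes standing in for a large file


class FileHandler(BaseHTTPRequestHandler):
    """Throwaway local server that returns PAYLOAD for any GET."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def download(session, url, dest, chunk_size=8192):
    """Stream the body to dest in chunks instead of holding it all in memory."""
    with session.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()
        with open(dest, "wb") as file:
            for chunk in response.iter_content(chunk_size=chunk_size):
                file.write(chunk)
    return os.path.getsize(dest)


server = HTTPServer(("127.0.0.1", 0), FileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for localhost
dest = os.path.join(tempfile.mkdtemp(), "downloaded_file.bin")
try:
    size = download(session, f"http://127.0.0.1:{server.server_port}/file.bin", dest)
finally:
    server.shutdown()
    server.server_close()

print(size)  # 65536
```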


Uploading Files

The key 'file' in the files dict is the name of the form field, and the file object is passed as its value;

import requests

# URL to upload to
upload_url = 'http://httpbin.org/post'

# Path of the file to upload
# The file contains: The contents of my file: abcdefg1234567
file_path = './test.txt'

# Send the file as multipart/form-data encoded form data
with open(file_path, 'rb') as file:
    files = {'file': file}
    response = requests.post(upload_url, files=files)

# Print the upload result
print(response.text)
# Output:
# {
#   "args": {}, 
#   "data": "", 
#   "files": {
#     "file": "The contents of my file: abcdefg1234567"
#   }, 
#   "form": {}, 
#   "headers": {
#     "Accept": "*/*", 
#     "Accept-Encoding": "gzip, deflate, br", 
#     "Content-Length": "183", 
#     "Content-Type": "multipart/form-data; boundary=fb427351fb3bfca331fd23c761ce5223", 
#     "Host": "httpbin.org", 
#     "User-Agent": "python-requests/2.31.0", 
#     "X-Amzn-Trace-Id": "Root=1-65e67ead-7ed04d2d39de1b00477bd288"
#   }, 
#   "json": null, 
#   "origin": "101.82.87.75", 
#   "url": "http://httpbin.org/post"
# }
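Besides a bare file object, each files value may be a tuple `(filename, content, content_type)`, as the request() docstring in the source walkthrough describes, and ordinary form fields can be sent alongside via `data`. A sketch that prepares (but does not send) such a multipart request; the file name report.csv, its CSV content and the note field are made up.

```python
import requests

files = {
    # (filename, file content or file object, content type)
    "file": ("report.csv", b"name,age\nzhangsan,20\n", "text/csv"),
}
data = {"note": "monthly report"}  # ordinary form fields travel in the same multipart body

prepared = requests.Request("POST", "http://httpbin.org/post", files=files, data=data).prepare()

print(prepared.headers["Content-Type"].startswith("multipart/form-data; boundary="))  # True
print(b'filename="report.csv"' in prepared.body)  # True
print(b"Content-Type: text/csv" in prepared.body)  # True
print(b'name="note"' in prepared.body)  # True
```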

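Managing Sessions with Session

A Session keeps state across multiple requests, which is very useful on sites that require login:
you can send the login request in one session and then reuse that session for later requests, without logging in every time.
A requests.Session() object creates a session that persists certain parameters and cookies across requests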

import requests

# Request the cookies endpoint
response1 = requests.get('https://httpbin.org/cookies')
print(response1.text)
# Output:
# {
#   "cookies": {}
# }

# Create a session object
session = requests.Session()

# Send the first request through the session; httpbin sets the cookie sessioncookie=zhangsan
response2 = session.get('https://httpbin.org/cookies/set/sessioncookie/zhangsan')
print(response2.text)
# Output:
# {
#   "cookies": {
#     "sessioncookie": "zhangsan"
#   }
# }

# Send a second request with the same session; the cookie set earlier is sent along
response3 = session.get('https://httpbin.org/cookies')
print(response3.text)
# Output:
# {
#   "cookies": {
#     "sessioncookie": "zhangsan"
#   }
# }
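A Session can also carry headers and cookies you set yourself, and it can be used as a context manager so its pooled connections are closed afterwards (the same pattern the request() source uses). A sketch that inspects the prepared request instead of sending it; the Mark header is just a marker.

```python
import requests

with requests.Session() as session:  # closes pooled connections on exit
    # Settings stored on the session apply to every request made through it
    session.headers.update({"Mark": "zhangsan"})
    session.cookies.set("sessioncookie", "zhangsan")

    prepared = session.prepare_request(
        requests.Request("GET", "https://httpbin.org/cookies")
    )

print(prepared.headers["Cookie"])  # sessioncookie=zhangsan
print(prepared.headers["Mark"])  # zhangsan
```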


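Source Code Walkthrough

1. Layer one: the convenience functions

First, let's look at the source of the GET, POST, PUT and DELETE functions to see what they have in common;


Source of get()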

def get(url, params=None, **kwargs):
    r"""Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request("get", url, params=params, **kwargs)

Source of post()

def post(url, data=None, json=None, **kwargs):
    r"""Sends a POST request.

    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request("post", url, data=data, json=json, **kwargs)

Source of put()

def put(url, data=None, **kwargs):
    r"""Sends a PUT request.

    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request("put", url, data=data, **kwargs)

Source of delete()

def delete(url, **kwargs):
    r"""Sends a DELETE request.

    :param url: URL for the new :class:`Request` object.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request("delete", url, **kwargs)
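2. Layer two: requests.request()

The source shows that every method function ultimately calls request(),
which confirms the earlier point that GET, POST, PUT, DELETE and the rest are all convenience wrappers.
Next, let's dig one level deeper and look at the source of request();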


def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'https://httpbin.org/get')
      >>> req
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

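3. Layer three: Session.request()

The source of request() shows that before calling the lower-level method, it uses a Python context manager (the with statement)
to ensure the session object is closed properly after use, even if an exception occurs;
Next, let's dig further into the source of session.request;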


    def request(
        self,
        method,
        url,
        params=None,
        data=None,
        headers=None,
        cookies=None,
        files=None,
        auth=None,
        timeout=None,
        allow_redirects=True,
        proxies=None,
        hooks=None,
        stream=None,
        verify=None,
        cert=None,
        json=None,
    ):
        """Constructs a :class:`Request <Request>`, prepares it and sends it.
        Returns :class:`Response <Response>` object.

        :param method: method for the new :class:`Request` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary or bytes to be sent in the query
            string for the :class:`Request`.
        :param data: (optional) Dictionary, list of tuples, bytes, or file-like
            object to send in the body of the :class:`Request`.
        :param json: (optional) json to send in the body of the
            :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the
            :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the
            :class:`Request`.
        :param files: (optional) Dictionary of ``'filename': file-like-objects``
            for multipart encoding upload.
        :param auth: (optional) Auth tuple or callable to enable
            Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How long to wait for the server to send
            data before giving up, as a float, or a :ref:`(connect timeout,
            read timeout) <timeouts>` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Set to True by default.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol or protocol and
            hostname to the URL of the proxy.
        :param stream: (optional) whether to immediately download the response
            content. Defaults to ``False``.
        :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``. When set to
            ``False``, requests will accept any TLS certificate presented by
            the server, and will ignore hostname mismatches and/or expired
            certificates, which will make your application vulnerable to
            man-in-the-middle (MitM) attacks. Setting verify to ``False``
            may be useful during local development or testing.
        :param cert: (optional) if String, path to ssl client cert file (.pem).
            If Tuple, ('cert', 'key') pair.
        :rtype: requests.Response
        """
        # Create the Request.
        req = Request(
            method=method.upper(),
            url=url,
            headers=headers,
            files=files,
            data=data or {},
            json=json,
            params=params or {},
            auth=auth,
            cookies=cookies,
            hooks=hooks,
        )
        prep = self.prepare_request(req)

        proxies = proxies or {}

        settings = self.merge_environment_settings(
            prep.url, proxies, stream, verify, cert
        )

        # Send the request.
        send_kwargs = {
            "timeout": timeout,
            "allow_redirects": allow_redirects,
        }
        send_kwargs.update(settings)
        resp = self.send(prep, **send_kwargs)

        return resp

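From the source of session.request we can see that
it first creates a Request holding all the arguments passed in, turns it into a PreparedRequest with self.prepare_request() (merging in the session's headers, cookies and auth), merges the environment settings (proxies, verify and so on), and finally calls self.send() with the PreparedRequest;
we won't go further into the code behind send here; interested readers can explore it on their own.
Having read the source, we can conclude that there is no need to define separate get, post and other methods in a wrapper class that each call request;
we can simply call request directly.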
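The steps Session.request() performs can also be run by hand, which is handy for inspecting exactly what will be sent before sending it. A minimal sketch of that pipeline using only the Request, Session.prepare_request(), merge_environment_settings() and send() calls shown in the source above; the network step is left commented out.

```python
import requests

session = requests.Session()

# Step 1: a plain Request object holding the caller's arguments
# (Session.request() upper-cases the method; here it is written in upper case directly)
req = requests.Request(method="GET", url="http://httpbin.org/get", params={"key1": "value1"})

# Step 2: merge session-level headers/cookies/auth and encode everything
prep = session.prepare_request(req)
print(prep.method, prep.url)  # GET http://httpbin.org/get?key1=value1
print(prep.headers["User-Agent"].startswith("python-requests/"))  # True (session default)

# Step 3 (network): merge environment settings, then hand the PreparedRequest to send()
# settings = session.merge_environment_settings(prep.url, {}, None, None, None)
# response = session.send(prep, timeout=10, **settings)
```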

Wrapping Requests

1. Wrapping the request call
import ast
import functools

import requests
from requests.structures import CaseInsensitiveDict


# Decorator: after the request is sent, assemble the response details into a dict
def after(func):
    @functools.wraps(func)
    def inside(*args, **kwargs):
        requests_obj = func(*args, **kwargs)
        elapsed = requests_obj.elapsed.total_seconds()  # total elapsed time in seconds (float)
        # Assemble the response information
        response_dict = {
            'url': requests_obj.url,  # final URL of the response
            'encoding': requests_obj.encoding,  # response encoding
            'info': requests_obj.reason,  # status text
            'code': requests_obj.status_code,  # status code
            'headers': requests_obj.headers,  # response headers
            'cookies': dict(requests_obj.cookies),  # cookies
            'seconds': elapsed,  # seconds
            # total microseconds (timedelta.microseconds alone is only the sub-second part)
            'microseconds': round(elapsed * 1_000_000),
            'millisecond': elapsed * 1000  # total milliseconds
        }
        try:
            response_dict['json'] = requests_obj.json()
        except ValueError:  # the body is not valid JSON
            response_dict['text'] = requests_obj.text

        return response_dict

    return inside


class HTTPClient(object):

    def __init__(self):
        """
        Session manager
        requests.Session(): keeps the session alive and persists parameters across requests
        """
        self.session = requests.Session()

    @after
    def send_request(self, method: str, url: str, params=None, data=None, json=None, headers=None, **kwargs):
        """
        Send an HTTP request to url with the given method, carrying params/data/json/... data
        :param method:  required, str, HTTP method, e.g. GET, POST, PUT, DELETE
        :param url:     required, str, request URL, e.g. http://127.0.0.1/test
        :param params:  optional, dict, query parameters (appended to the URL)
        :param data:    optional, dict, form data (sent as the request body)
        :param json:    optional, JSON-serializable object (sent as the request body)
        :param headers: optional, dict (or a string containing a dict literal), request headers
        :param kwargs:  optional, other keyword arguments accepted by requests (timeout, files, cookies, ...)
        :return:        the requests Response object (converted to a dict by @after)
        """

        # 1. GET: retrieve an entity
        # 2. HEAD: retrieve only the response headers
        # 3. POST: submit data
        # 4. PUT: upload data
        # 5. PATCH: like PUT, but partially updates a known resource
        # 6. DELETE: delete data
        # 7. OPTIONS: query the communication options
        # 8. CONNECT: ask a proxy server to open a tunnel
        # 9. TRACE: echo back the request the server received, for testing and diagnostics

        # 1. Check that the method is allowed
        methods = ('GET', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE', 'OPTIONS', 'CONNECT', 'TRACE')
        if method.upper() not in methods:
            raise ValueError(f'Unsupported request method [{method}]! Supported methods: {methods}')

        # 2. Default to a 120-second timeout unless kwargs already contains one
        #    (the caller's value is kept as-is: it may be a float or a (connect, read) tuple)
        kwargs.setdefault("timeout", 120)

        # 3. Normalize the headers; copying them means the caller's dict is never modified
        headers = CaseInsensitiveDict(HTTPClient.process_headers(headers=headers))

        # 4. Set the client content type from the parameter type, unless the caller set one
        if kwargs.get("files"):
            # multipart upload: requests must generate the boundary itself, so leave Content-Type alone
            pass
        elif json is not None:
            # Optional: with json=, requests serializes the data and sets Content-Type automatically
            headers.setdefault('Content-Type', "application/json;charset=utf-8")
        elif data is not None:
            headers.setdefault('Content-Type', "application/x-www-form-urlencoded;charset=UTF-8")
        # params need no Content-Type, since they are appended to the URL

        # 5. Send the request
        try:
            return self.session.request(method=method, url=url, params=params,
                                        data=data, json=json,
                                        headers=headers, **kwargs)
        except requests.exceptions.RequestException as exc:
            raise ValueError(f"Request failed; check the URL, parameters and parameter types! Error: {exc}") from exc

    @staticmethod
    def process_headers(headers):
        default_headers = {
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Connection': 'keep-alive',
            'Accept': 'application/json, text/plain, */*',
            'User-Agent': "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36"
        }

        if headers is None:
            return default_headers

        if isinstance(headers, str):
            try:
                # ast.literal_eval only accepts Python literals; unlike eval() it cannot execute code
                headers_dict = ast.literal_eval(headers)
            except (ValueError, SyntaxError) as exc:
                raise TypeError(f"Cannot parse the string as a dict: {headers}") from exc
            if not isinstance(headers_dict, dict):
                raise TypeError(f"The string does not evaluate to a dict: {headers}")
            return headers_dict

        if isinstance(headers, dict):
            return headers

        raise TypeError(f"headers must be None, a str or a dict, but got: {type(headers)}")


if __name__ == '__main__':
    http = HTTPClient()
    test_get = http.send_request("GET", "http://httpbin.org/get")
    print(test_get.get("url"))  # http://httpbin.org/get
    print(test_get.get("encoding"))  # utf-8
    print(test_get.get("info"))  # OK
    print(test_get.get("code"))  # 200

    data = {"name": "张三", "age": 20, "phone": "10086", "address": "上海市浦东新区"}
    test_post = http.send_request("POST", "http://httpbin.org/post", data=data)
    print(test_post)

2. Wrapping response assertions
# jsonpath_ng is a third-party package: pip install jsonpath-ng
from jsonpath_ng import parse
import requests


class AssertTool:
    def __init__(self, response):
        """
        Initialize the assertion helper
        :param response: a requests Response object
        """
        self.response = response
        try:
            self.response_json = response.json()
        except ValueError:  # the JSON decode error raised by requests subclasses ValueError
            self.response_json = None

    def assert_status_code(self, expected_code):
        """
        Assert the response status code
        """
        assert self.response.status_code == expected_code, f"Expected status code {expected_code!r}, but got {self.response.status_code!r}"

    def assert_status_message(self, expected_message):
        """
        Assert the response status text
        """
        assert self.response.reason == expected_message, f"Expected status text {expected_message!r}, but got {self.response.reason!r}"

    def assert_json_value_exists(self, json_path):
        """
        Use JSONPath to assert that a value exists at the given path in the response body
        """
        if self.response_json is None:
            raise ValueError("Response is not in JSON format")

        jsonpath_expression = parse(json_path)
        match = jsonpath_expression.find(self.response_json)
        assert match, f"No value found for JSONPath: {json_path}"

    def assert_json_value(self, json_path, expected_value):
        """
        Assert that the value at the given path in the response body equals the expected value
        """
        if self.response_json is None:
            raise ValueError("Response is not in JSON format")

        jsonpath_expression = parse(json_path)
        match = jsonpath_expression.find(self.response_json)
        assert match, f"No value found for JSONPath: {json_path}"
        actual_value = match[0].value
        assert actual_value == expected_value, f"At path {json_path!r} expected {expected_value!r}, but got {actual_value!r}"

    def get_json_value(self, json_path):
        """
        Get the value at the given JSONPath
        """
        if self.response_json is None:
            raise ValueError("Response is not in JSON format")

        jsonpath_expression = parse(json_path)
        match = jsonpath_expression.find(self.response_json)
        if not match:
            raise ValueError(f"No value found for JSONPath: {json_path}")
        return match[0].value  # return the value of the first match

# Usage example
if __name__ == "__main__":

    # Send a sample request
    data = {"name": "张三", "age": 20, "phone": {"中国联通": "1001", "中国移动": "1002", "中国电信": "1003"}, "address": "上海市浦东新区"}

    response = requests.post("http://httpbin.org/post", json=data)
    print(response.json())
    # Create the assertion helper
    assert_tool = AssertTool(response)

    # Run the assertions
    assert_tool.assert_status_code(200)
    assert_tool.assert_status_message('OK')
    print(assert_tool.get_json_value('$.json.phone'))   # {'中国电信': '1003', '中国移动': '1002', '中国联通': '1001'}
    assert_tool.assert_json_value_exists('$.json.phone."中国移动"')
    # The expected value here is deliberately wrong (the actual value is '1002'), so this raises AssertionError
    assert_tool.assert_json_value('$.json.phone."中国移动"', 666)


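Summary

Python's requests module is a convenient tool for sending HTTP requests and supports GET, POST and many other methods.
It provides an intuitive API for customizing requests, such as setting headers and passing parameters.
It also handles many low-level details automatically, such as cookies and sessions.
In short, requests simplifies network programming in Python and makes sending HTTP requests efficient and simple. It is the library of choice for Python developers handling network requests and is widely used in web scraping, data analysis and API test automation.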
