What is requests?
requests is an HTTP library written in Python, built on top of urllib and released under the Apache2 Licensed open-source license.
It is more convenient than urllib, saves us a great deal of work, and fully covers the needs of HTTP testing.
In one sentence: requests is a simple, easy-to-use HTTP library implemented in Python.
Basic requests usage:
import requests
response = requests.get('https://www.baidu.com/') # send a GET request
print(type(response)) # response type: requests.models.Response
print(response.status_code) # status code
print(type(response.text)) # type of the response body: str
print(response.text) # response body
print(response.cookies) # cookies
Request methods supported by requests:
import requests
# send a POST request
requests.post('http://httpbin.org/post')
# send a PUT request
requests.put('http://httpbin.org/put')
# send a DELETE request
requests.delete('http://httpbin.org/delete')
# send a HEAD request
requests.head('http://httpbin.org/get')
# send an OPTIONS request
requests.options('http://httpbin.org/get')
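All of these helpers funnel into the same request-building machinery. This can be seen offline (nothing is actually sent) by preparing a Request object with any method name; the helper functions are essentially thin wrappers around this:

```python
from requests import Request

# build (but do not send) a request; .prepare() resolves method, URL and headers
prepared = Request('DELETE', 'http://httpbin.org/delete').prepare()
print(prepared.method, prepared.url)  # DELETE http://httpbin.org/delete
```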
Basic GET request
Basic form:
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)
GET request with parameters (1): query string in the URL
import requests
response = requests.get('http://httpbin.org/get?name=germey&age=22')
print(response.text)
The output looks like this:
{
"args": {
"age": "22",
"name": "germey"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.23.0",
"X-Amzn-Trace-Id": "Root=1-5e97b85d-dfc5d314d137f9bc117c77b8"
},
"origin": "123.93.143.40",
"url": "http://httpbin.org/get?name=germey&age=22"
}
GET request with parameters (2): the params argument
import requests
# build the GET query parameters
param = {'name': 'drifter', 'age': 22}
# pass them via the params argument
response = requests.get('http://httpbin.org/get', params=param)
print(response.text)
The output looks like this:
{
"args": {
"age": "22",
"name": "drifter"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.23.0",
"X-Amzn-Trace-Id": "Root=1-5e97b91e-eddb4ff6e19cdd5d2175ccb9"
},
"origin": "123.93.143.40",
"url": "http://httpbin.org/get?name=drifter&age=22"
}
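The params dict is URL-encoded and appended to the query string before the request goes out. This can be checked offline with a prepared request (a sketch; nothing is sent here):

```python
from requests import Request

param = {'name': 'drifter', 'age': 22}
# .prepare() performs the same URL building that requests.get(..., params=...) does
req = Request('GET', 'http://httpbin.org/get', params=param).prepare()
print(req.url)  # http://httpbin.org/get?name=drifter&age=22
```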
Parsing JSON
import requests
response = requests.get('http://httpbin.org/get')
# type of the response body
print(type(response.text))
# if the body is JSON, parse it into Python objects
print(response.json())
# response.json() returns a dict here
print(type(response.json()))
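When the body is not valid JSON, response.json() raises a ValueError (json.JSONDecodeError). A defensive pattern, sketched here offline with the standard json module that requests uses under the hood:

```python
import json

def parse_json_or_none(text):
    # mirrors what a guarded call to response.json() would do
    try:
        return json.loads(text)
    except ValueError:
        return None

print(parse_json_or_none('{"args": {"name": "drifter"}}'))  # {'args': {'name': 'drifter'}}
print(parse_json_or_none('<html>not json</html>'))           # None
```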
Fetching binary data
import requests
response = requests.get('http://github.com/favicon.ico')
# str, bytes
print(type(response.text), type(response.content))
# the decoded text body
print(response.text)
# the raw binary body
print(response.content)
# save the binary data to a local file (the file is closed automatically)
with open('favicon.ico', 'wb') as f:
    f.write(response.content)
Adding headers (disguising ourselves as a browser):
import requests
# set a browser User-Agent
headers = {
"User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
}
# pass the headers with the request
response = requests.get('https://www.zhihu.com/explore',headers=headers)
print(response.text)
Basic POST request
import requests
# form data to send in the POST body
data= {'name':'jyx','age':18}
# set the request headers
headers = {
"User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
}
# send the POST request with form data and headers (data, headers)
response = requests.post('http://httpbin.org/post', data=data, headers=headers)
print(response.text)
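Besides data= (form-encoded), requests also accepts a json= argument, which serializes the dict and sets the Content-Type automatically. The difference is visible offline in the prepared request headers:

```python
from requests import Request

form = Request('POST', 'http://httpbin.org/post', data={'name': 'jyx'}).prepare()
body = Request('POST', 'http://httpbin.org/post', json={'name': 'jyx'}).prepare()
print(form.headers['Content-Type'])  # application/x-www-form-urlencoded
print(body.headers['Content-Type'])  # application/json
```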
Response attributes:
import requests
response = requests.get('http://www.jianshu.com/')
# status code
print(type(response.status_code), response.status_code)
# response headers
print(type(response.headers), response.headers)
# cookies from the response
print(type(response.cookies), response.cookies)
# the URL that was requested
print(type(response.url), response.url)
# redirect history
print(type(response.history), response.history)
Checking the status code
200: the request succeeded, and the expected headers and body are returned with this response.
404: Not Found. The request failed because the requested resource was not found on the server. The condition may be temporary or permanent, so this status should be handled explicitly.
import requests

def get_status(url):
    r = requests.get(url, allow_redirects=False)
    return r.status_code

def main():
    status = get_status('http://www.baidu.com/')
    if status == 200:
        print("Success")
    else:
        print("Failed")

if __name__ == "__main__":
    main()
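Instead of magic numbers, requests exposes named status codes via requests.codes, which makes checks like the one above easier to read:

```python
import requests

# requests.codes maps human-readable names to numeric status codes
print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404
```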
Advanced requests usage:
File upload (uploads always use POST):
import requests
files = {'file':open('1586509745765.6008.jpg','rb')}
response = requests.post("http://httpbin.org/post",files=files)
print(response.text)
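The value in the files dict can also be a (filename, content, content_type) tuple, which controls the uploaded filename and MIME type. The multipart encoding can be inspected offline via a prepared request (a sketch; nothing is uploaded):

```python
from requests import Request

files = {'file': ('hello.txt', b'hello world', 'text/plain')}
req = Request('POST', 'http://httpbin.org/post', files=files).prepare()
# requests picks a multipart content type with a random boundary
print(req.headers['Content-Type'])
```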
Getting cookies:
import requests
response = requests.get("http://www.baidu.com")
print(response.cookies)
for key, value in response.cookies.items():
print(key + '=' + value)
The output looks like this:
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315
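Cookies can also be sent with a request by passing a plain dict via the cookies argument; requests turns it into a Cookie header. Again visible offline with a prepared request:

```python
from requests import Request

req = Request('GET', 'http://httpbin.org/cookies', cookies={'number': '123456789'}).prepare()
print(req.headers['Cookie'])  # number=123456789
```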
Session persistence (simulating login):
Without session persistence (equivalent to using two different browsers):
import requests
requests.get('http://httpbin.org/cookies/set/number/123456789')
response = requests.get('http://httpbin.org/cookies')
print(response.text)
The output shows an empty cookie jar: the cookie set by the first request is not sent with the second, because each top-level requests.get call is independent.
With session persistence (equivalent to staying in one browser):
import requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
response =s.get('http://httpbin.org/cookies')
print(response.text)
This time the output contains the number cookie set by the first request, because the Session object keeps cookies across requests.
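A Session keeps more than cookies: headers set on the session are merged into every request made through it, which is convenient for setting a crawler's User-Agent once. A minimal sketch (the User-Agent string here is just an example):

```python
import requests

s = requests.Session()
# headers set here are sent with every request made through this session
s.headers.update({'User-Agent': 'my-crawler/1.0'})
print(s.headers['User-Agent'])  # my-crawler/1.0
```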
Certificate verification:
import requests
response = requests.get('https://www.python.org')
print(response.status_code)
For a site with an invalid certificate, this raises:
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)
Workaround (skip verification):
import requests
from requests.packages import urllib3
# suppress the InsecureRequestWarning raised when verify=False
urllib3.disable_warnings()
response = requests.get('https://www.12306.cn',verify=False)
print(response.status_code)
Alternatively, point requests at a client certificate (cert):
import requests
response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)
Proxy settings (proxies):
import requests
proxies = {
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9743',
}
response = requests.get("https://www.taobao.com",proxies=proxies)
print(response.status_code)
If the proxy requires a username and password:
import requests
proxies = {
    'http': 'http://user:password@127.0.0.1:9743',
}
response = requests.get("https://www.taobao.com",proxies=proxies)
print(response.status_code)
SOCKS proxies:
pip3 install 'requests[socks]'
import requests
proxies = {
    'http': 'socks5://127.0.0.1:9741',
    'https': 'socks5://127.0.0.1:9741',
}
response = requests.get("https://www.taobao.com",proxies=proxies)
print(response.status_code)
Timeouts:
import requests
response = requests.get("https://www.taobao.com",timeout=1)
print(response.status_code)
If the URL does not respond within the given time, an error like this is raised:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.taobao.com', port=443): Read timed out. (read timeout=1)
Catch ReadTimeout to handle the timeout gracefully:
import requests
from requests.exceptions import ReadTimeout
try:
    response = requests.get("https://www.taobao.com", timeout=0.1)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
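timeout can also be a (connect, read) tuple, e.g. timeout=(3, 5), to bound the two phases separately. The matching exceptions ConnectTimeout and ReadTimeout both derive from Timeout, so catching Timeout covers both cases:

```python
from requests.exceptions import ConnectTimeout, ReadTimeout, Timeout, RequestException

# both timeout flavours share the Timeout base class
print(issubclass(ConnectTimeout, Timeout))    # True
print(issubclass(ReadTimeout, Timeout))       # True
# and everything ultimately derives from RequestException
print(issubclass(Timeout, RequestException))  # True
```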
Authentication (for a site that requires login):
auth=HTTPBasicAuth('user', 'password')
For example, a local Jupyter server that requires a password:
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('http://127.0.0.1:8888', auth=HTTPBasicAuth('', '96PWF@an.'))
print(r.status_code)
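Passing a ('user', 'password') tuple as auth is shorthand for HTTPBasicAuth; requests encodes it into the Authorization header. An offline check with a prepared request (the credentials are placeholders):

```python
from requests import Request

req = Request('GET', 'http://127.0.0.1:8888', auth=('user', 'password')).prepare()
# Basic auth is just base64("user:password")
print(req.headers['Authorization'])  # Basic dXNlcjpwYXNzd29yZA==
```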
Exception handling:
import requests
from requests.exceptions import ReadTimeout,HTTPError,RequestException,ConnectionError
try:
    response = requests.get("https://httpbin.org/get", timeout=0.1)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except ConnectionError:  # disconnect the network to see the ConnectionError branch
    print('Connection error')
except HTTPError:
    print('HTTP error')
except RequestException:  # base class of all requests exceptions, so keep it last
    print('Error')
Try different timeout values and run the script a few times to see the different branches. 💪
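Beyond catching exceptions by hand, transient failures can be retried automatically by mounting an HTTPAdapter with a urllib3 Retry policy on a Session. A minimal sketch (the retry counts and status list are example values):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

s = requests.Session()
# retry up to 3 times with exponential backoff on these server errors
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
s.mount('http://', HTTPAdapter(max_retries=retry))
s.mount('https://', HTTPAdapter(max_retries=retry))
print(s.get_adapter('https://httpbin.org').max_retries.total)  # 3
```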