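III. The Requests Library in Detail
1.Requests
Requests is an HTTP library written in Python, built on top of urllib3, and released under the Apache 2.0 open-source license.
It is more convenient than urllib, saves a great deal of work, and covers the needs of HTTP testing.
In short, Requests is a simple, easy-to-use HTTP library for Python.
2.Installing Requests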
pip install requests
3.requests
1.Introductory example
import requests
response = requests.get('https://www.baidu.com/')
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.text)
print(response.cookies)
2.Request methods
[Test site] http://httpbin.org
import requests
requests.post('http://httpbin.org/post')
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')
4.Making requests
1.Basic GET request
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)
2.GET request with parameters
Method 1
import requests
response = requests.get("http://httpbin.org/get?name=germey&age=22")
print(response.text)
Method 2
import requests
data = {
    "age": "22",
    "name": "germey",
}
response = requests.get("http://httpbin.org/get",params=data)
print(response.text)
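To see what the params argument does without sending anything, the request can be prepared offline. This is a minimal sketch, assuming the same httpbin.org URL as above; Request(...).prepare() builds the final URL but performs no network I/O.

```python
import requests

params = {"age": "22", "name": "germey"}

# Build the request without sending it, so the encoded URL can be inspected
prepared = requests.Request("GET", "http://httpbin.org/get", params=params).prepare()
print(prepared.url)
```

The query string is generated from the dictionary, so Method 2 produces the same kind of URL as Method 1 without manual string concatenation or escaping.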
3.Parsing JSON
import requests
import json
response = requests.get("http://httpbin.org/get")
print(type(response.text))
print(response.json())
print(json.loads(response.text))
print(type(response.json()))
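response.json() raises an error when the body is not valid JSON, so a small guard can be useful. The sketch below is one possible approach, not part of the original notes: json_or_none is a hypothetical helper, and the Response objects are filled in by hand (through the internal _content attribute) purely so the example runs without network access.

```python
import requests

def json_or_none(response):
    """Return the decoded JSON body, or None if the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:  # JSON decode errors raised here derive from ValueError
        return None

# Offline stand-ins for real responses (assumption: _content is set by hand
# only for demonstration; normally requests fills it from the network)
good = requests.Response()
good._content = b'{"args": {}, "url": "http://httpbin.org/get"}'
bad = requests.Response()
bad._content = b"<html>not json</html>"

print(json_or_none(good))
print(json_or_none(bad))
```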
4.Fetching binary data
This is the usual approach when downloading videos or images.
import requests
response = requests.get("https://github.com/favicon.ico")
print(type(response.text),type(response.content))
print(response.text)
print(response.content)
import requests
response = requests.get("https://github.com/favicon.ico")
# Write the raw bytes to disk; the with block closes the file automatically
with open('favicon.ico','wb') as f:
    f.write(response.content)
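For large files such as videos, response.content holds the whole body in memory. A streamed download is a common alternative; the sketch below assumes hypothetical helper names (save_stream, download) and an arbitrary chunk size of 8192 bytes.

```python
import requests

def save_stream(response, path, chunk_size=8192):
    """Write a streamed response body to path chunk by chunk; return bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written

def download(url, path):
    # stream=True defers reading the body until iter_content is consumed
    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()
        return save_stream(response, path)
```

A call such as download("https://github.com/favicon.ico", "favicon.ico") would then write the file without buffering the full body first.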
5.Adding headers
import requests
response = requests.get("https://www.zhihu.com/explore")
print(response.text)
For example, when Zhihu is scraped directly without headers, the request fails with a 500 error, so the headers parameter has to be set.
import requests
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
}
response = requests.get("https://www.zhihu.com/explore",headers=headers)
print(response.text)
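The reason a site can tell a script from a browser is the default User-Agent that requests sends. The following sketch prints that default and shows one way to set a browser-like value once on a Session; it prepares a request against http://httpbin.org/get without sending it, so nothing here touches the network.

```python
import requests

# The User-Agent requests sends when none is supplied
print(requests.utils.default_user_agent())

# A session applies the same headers to every request made through it
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
})
prepared = session.prepare_request(
    requests.Request("GET", "http://httpbin.org/get")
)
print(prepared.headers["User-Agent"])
```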
6.Basic POST request
Passing a dictionary as data is all it takes to make a POST request.
import requests
data = {'name':'germey','age':'22'}
response = requests.post("http://httpbin.org/post",data=data)
print(response.text)
Adding headers
import requests
data = {'name':'germey','age':'22'}
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
}
# Pass headers explicitly; otherwise the dictionary above is never used
response = requests.post("http://httpbin.org/post",data=data,headers=headers)
print(response.json())
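The data argument sends a form-encoded body. Many APIs expect JSON instead, which the json argument provides. This sketch prepares both variants offline to compare the resulting Content-Type and body; the payload reuses the dictionary from the examples above.

```python
import json
import requests

payload = {"name": "germey", "age": "22"}
url = "http://httpbin.org/post"

# data= sends a form-encoded body; json= serializes the dict as JSON
form = requests.Request("POST", url, data=payload).prepare()
as_json = requests.Request("POST", url, json=payload).prepare()

print(form.headers["Content-Type"], form.body)
print(as_json.headers["Content-Type"], as_json.body)
```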
5.Responses
1.Response attributes
Commonly used attributes of a response
import requests
response = requests.get("http://www.jianshu.com")
print(type(response.status_code),response.status_code)
print(type(response.headers),response.headers)
print(type(response.cookies),response.cookies)
print(type(response.url),response.url)
print(type(response.history),response.history)
2.Checking the status code
import requests
response = requests.get("http://www.jianshu.com")
exit() if not response.status_code == requests.codes.ok else print('Request Successfully')
import requests
response = requests.get("http://www.jianshu.com")
exit() if not response.status_code == 200 else print('Request Successfully')
200 can be replaced with any HTTP status code, and 'Request Successfully' with a message describing that status.
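Besides comparing status_code by hand, requests offers named constants in requests.codes and the raise_for_status() method. The sketch below is illustrative only: check is a hypothetical helper, and the Response object has its status_code set by hand so the example runs offline.

```python
import requests

print(requests.codes.ok, requests.codes.not_found)

def check(response):
    """Return True unless raise_for_status reports a 4xx or 5xx response."""
    try:
        response.raise_for_status()  # raises HTTPError for 4xx and 5xx
        return True
    except requests.exceptions.HTTPError as exc:
        print("Request failed:", exc)
        return False

# Offline stand-in (assumption: status_code is set by hand for illustration)
fake = requests.Response()
fake.status_code = 404
print(check(fake))
```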
6.Advanced usage
1.File upload
import requests
files = {'file':open('favicon.ico','rb')}
response = requests.post("http://httpbin.org/post",files=files)
print(response.text)
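To inspect what a file upload looks like on the wire, the request can be prepared without sending it. In this sketch the file content is a few placeholder bytes passed as a (filename, content) tuple, which is an assumption made so the example does not depend on favicon.ico existing on disk.

```python
import requests

# (filename, content) tuples let the upload be built from bytes in memory;
# the bytes here are placeholders for illustration
files = {"file": ("favicon.ico", b"\x00\x00\x01\x00")}
prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

# requests generates the multipart boundary and Content-Type itself
print(prepared.headers["Content-Type"])
```

When uploading from disk, opening the file inside a with block (with open('favicon.ico','rb') as f: ...) ensures the handle is closed after the request.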
2.Cookies
import requests
response = requests.get("https://www.baidu.com")
print(response.cookies)
for key,value in response.cookies.items():
    print(key + '=' + value)
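3.Session persistence
Cookies are what keep a session alive: with cookies the login state is maintained, so session persistence is what you use to simulate a logged-in user.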
import requests
requests.get("http://httpbin.org/cookies/set/number/123456789")
response = requests.get("http://httpbin.org/cookies")
print(response.text)
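The result is {"cookies": {}}, because the two requests.get calls are independent and share no cookies. A Session object keeps them: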
import requests
s = requests.Session()
s.get("http://httpbin.org/cookies/set/number/123456789")
response = s.get("http://httpbin.org/cookies")
print(response.text)
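The difference comes from the cookie jar that each Session owns. The following offline sketch stores a cookie in the jar by hand (an assumption for illustration; normally a Set-Cookie response header fills it) and prepares a request to show the jar being applied.

```python
import requests

session = requests.Session()
# Store a cookie by hand; normally a Set-Cookie response header fills the jar
session.cookies.set("number", "123456789", domain="httpbin.org", path="/")

prepared = session.prepare_request(
    requests.Request("GET", "http://httpbin.org/cookies")
)
print(prepared.headers.get("Cookie"))

# A fresh session starts with an empty jar, which is why separate
# requests.get calls do not see each other's cookies
print(requests.Session().cookies.get("number"))
```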
4.Certificate verification
import requests
response = requests.get("https://www.12306.cn")
print(response.status_code)
When requests fetches an HTTPS site, it first checks whether the certificate is valid; if it is not, an SSLError is raised and the program stops.
import requests
response = requests.get("https://www.12306.cn",verify=False)
print(response.status_code)
Adding verify=False (the default is True) gets around the problem above, but a warning is still emitted advising that the certificate be verified.
import requests
from requests.packages import urllib3
urllib3.disable_warnings()
response = requests.get("https://www.12306.cn",verify=False)
print(response.status_code)
Adding these two urllib3 lines suppresses the warning.
import requests
response = requests.get("https://www.12306.cn",cert=('/path/server.crt','/path/key'))
print(response.status_code)
The cert parameter specifies a client certificate and key manually.
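Turning verification off removes the protection HTTPS provides, so an alternative worth knowing is that verify also accepts a path to a CA bundle. This is a sketch under assumptions: it simply points a session at the bundle requests already ships with, where a real setup might use a private CA file instead.

```python
import os
import requests.certs

# Path to the CA bundle that requests uses by default (shipped by certifi)
ca_bundle = requests.certs.where()
print(ca_bundle, os.path.exists(ca_bundle))

session = requests.Session()
# verify accepts True, False, or a path to a CA bundle file; a private CA's
# certificate file could be used here instead (hypothetical for this sketch)
session.verify = ca_bundle
```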
5.Proxy settings
Write the proxy address as a dictionary (proxy without a password)
import requests
proxies = {
    "http":"http://127.0.0.1:port",
    "https":"https://127.0.0.1:port",
}
response = requests.get("https://www.taobao.com",proxies=proxies)
print(response.status_code)
Include the user and password in the proxy URL (proxy with a password)
import requests
proxies = {
    "http":"http://user:password@127.0.0.1:port/",
}
response = requests.get("https://www.taobao.com",proxies=proxies)
print(response.status_code)
If an http or https proxy is not available, a SOCKS proxy can be used instead
pip install 'requests[socks]'
import requests
proxies = {
    "http":"socks5://127.0.0.1:port",
    "https":"socks5://127.0.0.1:port",
}
response = requests.get("https://www.taobao.com",proxies=proxies)
print(response.status_code)
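Proxy passwords often contain characters such as @ or : that would break the proxy URL if written literally. The helper below is a hypothetical sketch (build_proxies, port 8080, and the sample credentials are all assumptions) that percent-encodes the credentials before building the dictionary.

```python
from urllib.parse import quote

def build_proxies(host, port, user=None, password=None, scheme="http"):
    """Return a proxies dict for requests; credentials are percent-encoded."""
    auth = ""
    if user is not None:
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    proxy_url = f"{scheme}://{auth}{host}:{port}"
    # The same proxy handles both plain and TLS traffic in this sketch
    return {"http": proxy_url, "https": proxy_url}

print(build_proxies("127.0.0.1", 8080, "user", "p@ss:word"))
```

The returned dictionary can be passed as proxies=... exactly like the hand-written ones above.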
6.Timeout settings
Add the timeout parameter; the unit is seconds.
import requests
response = requests.get("http://httpbin.org/get",timeout = 1)
print(response.status_code)
Handling the timeout exception
import requests
from requests.exceptions import ReadTimeout
try:
    response = requests.get("http://httpbin.org/get",timeout = 1)
    print(response.status_code)
except ReadTimeout:
    print("Timeout")
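The timeout can also be given as a (connect, read) pair, and catching the Timeout base class covers both phases. The sketch below assumes a hypothetical helper fetch_status with illustrative values of 3.05 and 10 seconds; the get argument is injectable only so the function can be exercised without network access.

```python
import requests
from requests.exceptions import Timeout

def fetch_status(url, timeout=(3.05, 10), get=requests.get):
    """Return the status code, or None if connecting or reading timed out.

    timeout is a (connect, read) pair in seconds; `get` is injectable so the
    function can be exercised without network access.
    """
    try:
        return get(url, timeout=timeout).status_code
    except Timeout:  # base class of ConnectTimeout and ReadTimeout
        return None
```

In normal use it would be called as fetch_status("http://httpbin.org/get").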
7.Authentication
import requests
from requests.auth import HTTPBasicAuth
r = requests.get("http://120.27.34.24:9001",auth=HTTPBasicAuth('user','123'))
print(r.status_code)
import requests
r = requests.get("http://120.27.34.24:9001",auth=('user','123'))
print(r.status_code)
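Both forms above produce the same Authorization header. This offline sketch prepares a request with the tuple form and compares the header with a hand-built value; the httpbin.org basic-auth URL is an assumption standing in for the IP address used above.

```python
import base64
import requests

url = "http://httpbin.org/basic-auth/user/123"  # assumed httpbin endpoint
prepared = requests.Request("GET", url, auth=("user", "123")).prepare()

# HTTP Basic auth is base64("user:password") in the Authorization header
expected = "Basic " + base64.b64encode(b"user:123").decode("ascii")
print(prepared.headers["Authorization"] == expected)
```

Because base64 is an encoding rather than encryption, Basic auth is normally only safe over HTTPS.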
8.Exception handling
import requests
from requests.exceptions import ReadTimeout,ConnectionError,RequestException
try:
    response = requests.get("http://httpbin.org/get",timeout = 0.5)
    print(response.status_code)
except ReadTimeout:
    print("Timeout")
except ConnectionError:
    print("Connection Error")
except RequestException:
    print("Error")
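The order of the except clauses matters because the exception classes form a hierarchy, with RequestException at the root. A short sketch that prints the direct base classes of the ones used above:

```python
import builtins
from requests.exceptions import (
    ConnectionError, ConnectTimeout, ReadTimeout, RequestException, Timeout,
)

# More specific classes must come first in an except chain
for exc in (ReadTimeout, ConnectTimeout, Timeout, ConnectionError):
    print(exc.__name__, "->", [base.__name__ for base in exc.__bases__])

# This import shadows Python's built-in ConnectionError within the module
print(ConnectionError is builtins.ConnectionError)
```

Since ConnectTimeout derives from both ConnectionError and Timeout, a chain like the one above reports a connection-phase timeout under the ConnectionError branch rather than the ReadTimeout branch.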