[Practical] How to Use requests

Author: 花罚

Article link: https://blog.csdn.net/wuzuodingfeng/article/details/76156777

Requests is an elegant and simple HTTP library for Python, built for human beings.

Two important methods: get and post

requests.get()

Syntax

r = requests.get(url, params={}, headers={}, cookies={}, allow_redirects=True, timeout=None, proxies={}, verify=True)

Parameters

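Parameter	Type	Description
url	str	Request URL (required)
params	dict	Query-string parameters, e.g. {'key1': 'value1', 'key2': 'value2'}
headers	dict	Request headers, e.g. {'user-agent': 'my-app/0.0.1'}
cookies	dict	Cookies to send, e.g. {'key': 'value'}
allow_redirects	bool	Whether to follow redirects (default True)
timeout	float/tuple	Timeout in seconds; a (connect, read) tuple is also accepted; the default None means no timeout
proxies	dict	Proxies per scheme, e.g. {'http': 'http://10.10.1.10:8080'}
verify	bool/str	Whether to verify the server's TLS certificate (default True); can also be the path to a CA bundle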

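Note: with verify=False, urllib3 emits an InsecureRequestWarning. It can be silenced with import requests followed by requests.packages.urllib3.disable_warnings() (import urllib3; urllib3.disable_warnings() does the same).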
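To see how the parameters above turn into an actual request without touching the network, the sketch below builds a GET with requests.Request and prepares it instead of sending it. The httpbin.org URL and the parameter values are illustrative assumptions; nothing is sent. It also shows two encoding rules: a list value repeats the key, and a None value is dropped from the query string.

```python
import requests

# Build (but do not send) a GET request to inspect how `params` and
# `headers` end up in the outgoing request.
req = requests.Request(
    "GET",
    "https://httpbin.org/get",  # illustrative URL, never contacted here
    params={"key1": "value1", "key2": ["a", "b"], "skip": None},
    headers={"user-agent": "my-app/0.0.1"},
)
prepared = req.prepare()

print(prepared.url)                    # list values repeat the key; None is dropped
print(prepared.headers["User-Agent"])  # header lookup is case-insensitive

# Sending it for real needs network access, for example:
# with requests.Session() as s:
#     r = s.send(prepared, timeout=5)
```

timeout, proxies and verify are not part of the prepared request; they apply when the request is sent, which is why they appear in the commented send() call rather than in Request().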

requests.post()

Syntax

r = requests.post(url, data={}, headers={}, cookies={}, json=None, files={}, allow_redirects=True, timeout=None, proxies={}, verify=True)

Parameters

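Parameter	Type	Description
url	str	Request URL (required)
data	dict	Form data, e.g. {'key': 'value'}; a string such as the output of json.dumps() is also accepted and sent as-is
headers	dict	Request headers, e.g. {'user-agent': 'my-app/0.0.1'}
cookies	dict	Cookies to send, e.g. {'key': 'value'}
json	dict/list	Object serialized as the JSON body (Content-Type: application/json is set automatically), e.g. {'key': 'value'}
files	dict	Files to upload, e.g. {'file': open('report.txt', 'rb')}; open files in binary mode
allow_redirects	bool	Whether to follow redirects (default True)
timeout	float/tuple	Timeout in seconds; a (connect, read) tuple is also accepted; the default None means no timeout
proxies	dict	Proxies per scheme, e.g. {'http': 'http://10.10.1.10:8080'}
verify	bool/str	Whether to verify the server's TLS certificate (default True); can also be the path to a CA bundle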
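The difference between data and json is easiest to see in the prepared body and Content-Type header. A minimal offline sketch, assuming an illustrative httpbin.org URL (no request is sent):

```python
import json

import requests

url = "https://httpbin.org/post"  # illustrative; nothing is sent below

# data=dict -> URL-encoded form body
form = requests.Request("POST", url, data={"key": "value", "n": "1"}).prepare()
# json=obj -> JSON body plus Content-Type: application/json
as_json = requests.Request("POST", url, json={"key": "value"}).prepare()
# data=json.dumps(obj) -> same JSON text, but no Content-Type is added
raw = requests.Request("POST", url, data=json.dumps({"key": "value"})).prepare()

for name, p in [("form", form), ("json", as_json), ("raw", raw)]:
    print(name, p.headers.get("Content-Type"), p.body)
```

Passing json.dumps(...) through data sends the same text but leaves Content-Type unset, so a server that checks for application/json may reject it; json= sets the header for you.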

Response attributes

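Attribute	Type	Description
r.url	str	Final URL of the response, with query parameters encoded (and after any redirects)
r.text	str	Body decoded to text using r.encoding
r.content	bytes	Raw body as bytes (binary)
r.json()	dict/list	Body decoded from JSON; raises an exception if the body is not valid JSON
r.status_code	int	HTTP status code of the response
r.raise_for_status()	None	Raises requests.exceptions.HTTPError if the status code indicates an error (4xx or 5xx)
r.headers	dict	Response headers (a case-insensitive dict)
r.cookies	RequestsCookieJar	Cookies set by the response (dict-like)
r.history	list	Response objects from any redirects, ordered from oldest to most recent
r.encoding	str	Encoding used to decode r.text, taken from the response headers
r.apparent_encoding	str	Encoding guessed from the raw bytes in r.content
r.elapsed	timedelta	Time between sending the request and the arrival of the response
r.request.headers	dict	Headers of the request that was actually sent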
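The attributes in the table can be exercised against a throwaway local server, so the example runs without internet access. The DemoHandler class, its /ok and /missing paths and the JSON payload are hypothetical stand-ins for a real site; trust_env = False is set only so that proxy environment variables do not intercept the request to 127.0.0.1.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint: /ok returns JSON, any other path returns 404."""

    def do_GET(self):
        if self.path.startswith("/ok"):
            body = json.dumps({"message": "hello"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json; charset=utf-8")
        else:
            body = b"not found"
            self.send_response(404)
            self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

err = None
try:
    with requests.Session() as s:
        s.trust_env = False  # ignore proxy environment variables for this local demo
        r = s.get(base + "/ok", params={"q": "1"}, timeout=5)
        print(r.url)                       # final URL, query string included
        print(r.status_code, r.encoding)   # encoding comes from the charset in Content-Type
        print(r.headers["content-type"])   # header lookup is case-insensitive
        print(r.json())                    # body decoded from JSON
        print(r.history)                   # empty: no redirects happened
        print(r.elapsed)                   # a datetime.timedelta
        print(r.request.headers["User-Agent"])

        missing = s.get(base + "/missing", timeout=5)
        try:
            missing.raise_for_status()     # 4xx/5xx -> HTTPError
        except requests.exceptions.HTTPError as exc:
            err = exc
        print(missing.status_code, repr(err))
finally:
    server.shutdown()
    server.server_close()
```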

Common helpers in requests.utils

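requests.utils.get_encodings_from_content(r.text): returns a list of the encodings declared in the page's <meta> (or XML) charset declarations; it expects a str such as r.text rather than the bytes in r.content, and it is deprecated;
requests.utils.dict_from_cookiejar(r.cookies): converts a CookieJar to a dict;
requests.utils.cookiejar_from_dict(cookie_dict, cookiejar=None, overwrite=True): converts a dict to a CookieJar;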
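A short round trip through these helpers, using made-up cookie names and a made-up HTML snippet. Because get_encodings_from_content() runs text regexes, the sketch passes a str; its DeprecationWarning is silenced only for the demo.

```python
import warnings

import requests

# dict -> RequestsCookieJar -> dict round trip (cookie names are made up)
cookie_dict = {"sessionid": "abc123", "theme": "dark"}
jar = requests.utils.cookiejar_from_dict(cookie_dict)
back = requests.utils.dict_from_cookiejar(jar)
print(type(jar).__name__, back)

# get_encodings_from_content() scans a str (e.g. r.text) for charset
# declarations; it is deprecated, so its warning is silenced here.
html = '<html><head><meta charset="utf-8"></head></html>'
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    declared = requests.utils.get_encodings_from_content(html)
print(declared)
```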

Problems and solutions

Opening a session to keep cookies

import json
import requests

s = requests.Session()                                   # open a session
cookies = json.loads(result)                             # result: the cookies JSON exported by PhantomJS (a list of cookie objects)
cookie = {k['name']: k['value'] for k in cookies}        # keep each cookie's name and value
s.cookies = requests.utils.cookiejar_from_dict(cookie, cookiejar=None, overwrite=True)       # convert the dict to a CookieJar and attach it to the session
s.get(url.....)                                          # every request in this session now carries these cookies
---------
s.cookies: a CookieJar (RequestsCookieJar) object;
s.cookies.get_dict(): dict of cookie name/value pairs;
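The same idea can be checked offline without PhantomJS. This sketch assumes a hypothetical exported cookie list and uses Session.prepare_request(), which applies the session's cookies to a request without sending it, to show the Cookie header each s.get() would carry. Updating s.cookies with a dict, instead of replacing the jar, is an alternative to the cookiejar_from_dict() assignment above.

```python
import requests

# Hypothetical export: cookie tools usually emit a list of
# {"name": ..., "value": ...} objects.
exported = [{"name": "sessionid", "value": "abc123", "domain": "example.com"}]

s = requests.Session()
s.cookies.update({c["name"]: c["value"] for c in exported})

# prepare_request() merges session state into a request without sending it.
prepared = s.prepare_request(requests.Request("GET", "http://example.com/page"))
print(prepared.headers["Cookie"])
print(s.cookies.get_dict())

# Per-request cookies are merged on top of the session's cookies.
merged = s.prepare_request(
    requests.Request("GET", "http://example.com/page", cookies={"lang": "en"})
)
print(merged.headers["Cookie"])
s.close()
```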

Setting timeouts and the maximum number of retries

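timeout is a parameter of get(), post() and the other request methods, in seconds.
max_retries is configured by building an HTTPAdapter with the desired max_retries and mounting that adapter on a requests Session. The URL prefix passed to mount() is matched by longest prefix, so 'http://' and 'https://' cover the two broad classes of URLs, and a more specific prefix can target an individual site.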

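import requests

requestsSession = requests.Session()                                # open a session
requestsAdapterA = requests.adapters.HTTPAdapter(max_retries=3)     # create an adapter that retries up to 3 times
requestsSession.mount('http://', requestsAdapterA)                  # use it for every http:// request in this session
requestsSession.mount('https://', requestsAdapterA)                 # and for every https:// request
r = requestsSession.get(url, timeout=20)                            # request url with a 20 s timeout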

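Note: per the requests documentation, max_retries applies only to failed DNS lookups, failed socket connections and connection timeouts, never to requests whose data has already reached the server; ordinary HTTP error responses are not retried.
Note: within a session, cookies returned by url1 are saved automatically and sent along when url2 is requested.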
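An integer max_retries is a shortcut; HTTPAdapter also accepts a urllib3 Retry object for finer control, for example retrying on selected status codes with back-off. The numbers and the api.example.com host below are illustrative assumptions, not values from the article. The sketch also checks the longest-prefix rule for mount() through Session.get_adapter(), without sending any request.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Illustrative retry policy (values are assumptions, not recommendations).
retry = Retry(
    total=3,                                # at most 3 retries overall
    backoff_factor=0.5,                     # exponential back-off between attempts
    status_forcelist=[500, 502, 503, 504],  # also retry on these status codes
)
default_adapter = HTTPAdapter(max_retries=retry)
api_adapter = HTTPAdapter(max_retries=0)    # hypothetical host with no retries

s = requests.Session()
s.mount("http://", default_adapter)
s.mount("https://", default_adapter)
s.mount("https://api.example.com/", api_adapter)  # longer prefix wins

print(s.get_adapter("https://www.example.com/") is default_adapter)
print(s.get_adapter("https://api.example.com/v1") is api_adapter)

# A real call would combine the adapter with a per-request timeout, e.g.:
# r = s.get("https://www.example.com/", timeout=(3.05, 20))
s.close()
```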

Uploading files

import requests

url = 'http://httpbin.org/post'
with open('report.xls', 'rb') as f:                                  # binary mode; the file is closed afterwards
    files = {'file': f}
    # files = {'file': ('report.xls', f, 'application/vnd.ms-excel', {'Expires': '0'})}   # explicitly set the filename, content type and extra headers
    r = requests.post(url, files=files)
print(r.text)
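The tuple form in the commented line can be inspected offline by preparing the request instead of sending it. In this sketch an in-memory io.BytesIO object with placeholder bytes stands in for open('report.xls', 'rb'):

```python
import io

import requests

# In-memory stand-in for open('report.xls', 'rb'); the bytes are placeholders.
fake_file = io.BytesIO(b"col1,col2\n1,2\n")

files = {
    # (filename, file object, content type, extra per-part headers)
    "file": ("report.xls", fake_file, "application/vnd.ms-excel", {"Expires": "0"}),
}
prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=...
print(prepared.body.decode("utf-8"))     # the encoded multipart body
```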

Streaming uploads

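import requests

with open('xxx.txt', 'rb') as fp:                    # open in binary mode so requests can determine the length
    requests.post('http://some.url/api', data=fp)    # the file is streamed instead of being read into memory first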
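Besides a file object, data can also be a generator, which helps when the body is produced piece by piece. This sketch, with a hypothetical chunks() generator, prepares (but does not send) both variants: a generator of unknown length is sent with Transfer-Encoding: chunked, while a seekable binary file-like object gets a Content-Length.

```python
import io

import requests


def chunks():
    """Hypothetical body source that yields the payload piece by piece."""
    for i in range(3):
        yield ("line %d\n" % i).encode("utf-8")


url = "http://some.url/api"  # placeholder URL from the article; nothing is sent

# A generator has no known length, so requests falls back to chunked encoding.
from_generator = requests.Request("POST", url, data=chunks()).prepare()
print(from_generator.headers.get("Transfer-Encoding"))
print("Content-Length" in from_generator.headers)

# A seekable binary file-like object (like open(..., 'rb')) gets a Content-Length.
from_file = requests.Request("POST", url, data=io.BytesIO(b"hello")).prepare()
print(from_file.headers["Content-Length"])
```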

Downloading files

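import requests
from io import BytesIO
from PIL import Image

url = 'http://xxx.jpg'
r = requests.get(url)
r.raise_for_status()                     # stop on 4xx/5xx instead of passing an error page to PIL
i = Image.open(BytesIO(r.content))       # the body is bytes, so wrap it in BytesIO (StringIO was the Python 2 way)
i.save('local.jpg')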
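PIL is only needed when the image is processed. To simply save a large binary file, a common alternative (not the method used above) is to stream the body with stream=True and iter_content() and write it in 'wb' mode, so the whole file never sits in memory at once. The local server, the /file path and the 16 KiB payload are hypothetical stand-ins so the sketch runs offline.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = bytes(range(256)) * 64  # 16 KiB of placeholder binary data


class FileHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint standing in for an image/PDF URL."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def download(session, url, path, chunk_size=4096):
    """Stream the response body to `path` in binary mode, chunk by chunk."""
    with session.get(url, stream=True, timeout=10) as r:
        r.raise_for_status()
        with open(path, "wb") as fh:
            for chunk in r.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return path


server = HTTPServer(("127.0.0.1", 0), FileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
target = os.path.join(tempfile.mkdtemp(), "local.bin")
try:
    with requests.Session() as s:
        s.trust_env = False  # ignore proxy environment variables for the local demo
        download(s, "http://127.0.0.1:%d/file" % server.server_port, target)
finally:
    server.shutdown()
    server.server_close()

with open(target, "rb") as fh:
    saved = fh.read()
print(len(saved))
```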

Notes

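Values passed via headers and cookies to get() or post() are merged with the defaults (and, in a Session, with the session-level values), so passing an empty {} is harmless;
For images, PDFs and other binary files, open the output file in 'wb' mode and write r.content;
If a text/* response has no charset in its Content-Type header, r.text is decoded as 'ISO-8859-1' by default;
If timeout is not set explicitly, requests applies no timeout, so a request can in principle wait forever;
To drop a session-level parameter for a single request, pass that key with the value None.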
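Two of these notes can be checked without network access. The sketch below calls requests.utils.get_encoding_from_headers(), the helper requests uses to pick the default r.encoding from the Content-Type header, and uses a hypothetical X-Token session header to show the None-removal rule.

```python
import requests

# The helper that derives the default r.encoding from Content-Type.
plain = requests.utils.get_encoding_from_headers({"content-type": "text/html"})
declared = requests.utils.get_encoding_from_headers({"content-type": "text/html; charset=gbk"})
print(plain)     # ISO-8859-1 (text/* without a charset)
print(declared)  # gbk

# Session-level values are merged with per-request values; None removes a key
# for that one request. X-Token is a hypothetical header name.
s = requests.Session()
s.headers["X-Token"] = "secret"
p = s.prepare_request(
    requests.Request("GET", "http://example.com/", headers={"X-Token": None})
)
print("X-Token" in p.headers)  # False: dropped from this request only
print(s.headers["X-Token"])    # the session itself is unchanged
s.close()
```

When a page's real encoding differs from the header default, setting r.encoding = r.apparent_encoding (or the known encoding) before reading r.text avoids garbled text.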

Documentation resources
