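Requests
The requests library is more convenient to use than urllib.
Difference between response.content and response.text:
response.content returns the raw response body as bytes, undecoded.
response.text returns a str that requests has decoded using the encoding it inferred for the response. That guess can be wrong and produce garbled text, so it is often safer to decode manually with the right encoding, for example response.content.decode('utf-8').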
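The difference can be seen without touching the network. The sketch below builds a Response by hand, which is a testing shortcut and an assumption of this example (the private _content attribute is normally filled in by requests itself), and simulates a UTF-8 body that has been labelled ISO-8859-1.

```python
import requests

# Hand-built response: a testing shortcut, not how responses are normally created.
resp = requests.models.Response()
resp.status_code = 200
resp._content = '中国'.encode('utf-8')  # raw body bytes (private attribute, set only for illustration)
resp.encoding = 'ISO-8859-1'            # pretend the wrong encoding was inferred

raw = resp.content                      # bytes, never decoded
garbled = resp.text                     # str decoded with the wrong encoding
manual = resp.content.decode('utf-8')   # manual decoding with the right encoding

resp.encoding = 'utf-8'                 # alternatively, correct the encoding first
fixed = resp.text                       # .text now decodes with utf-8

print(type(raw), type(garbled))
print(manual, fixed)
```

Setting response.encoding before reading response.text is an alternative to calling decode() by hand; both rely on knowing the page's real encoding.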
- Sending a GET request with requests:
# -*- coding: UTF-8 -*-
import requests
url = 'https://www.baidu.com/s'
# print(type(requests.get(url=url).content)) # bytes
# print(requests.get(url=url).content)
# print(type(requests.get(url=url).text)) # str
# print(requests.get(url=url).text)
params = {
    'wd': '中国'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit'
                  '/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
}
response = requests.get(url, params=params, headers=headers)
with open('baidu.html', 'w', encoding='utf-8') as f:
    f.write(response.content.decode('utf-8'))
print(response.url)
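To see what the params argument does to the URL, the request can be prepared without being sent. This is a minimal sketch using requests.Request and its prepare() method; it reuses the URL and query from the example above and makes no network call.

```python
import requests

# Build the request without sending it, to inspect the final URL.
req = requests.Request('GET', 'https://www.baidu.com/s', params={'wd': '中国'})
prepared = req.prepare()

# The query value is UTF-8 encoded and then percent-escaped.
print(prepared.url)  # https://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD
```

This is the same URL that response.url reports after a real request, provided no redirect takes place.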
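- Sending a POST request with requests:
Sending a POST request with requests is simple: just call requests.post. If the response is JSON, response.json() parses the JSON string directly into a Python object (a dict for a JSON object).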
# -*- coding: UTF-8 -*-
import requests
url = 'https://www.lagou.com/jobs/companyAjax.json?needAddtionalResult=false'
data = {
    'first': 'true',
    'pn': 1,
    'kd': 'python'
}
headers = {
    'Referer': 'https://www.lagou.com/zhaopin/Python/?labelWords=label',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit'
                  '/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
}
response = requests.post(url, data=data, headers=headers)
print(type(response.json()))
print(response.json())
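The two halves of a POST can be inspected offline. The sketch below first prepares the request to show how the data dict is form-encoded, then parses a hand-built response with json(). The response payload is hypothetical, and setting the private _content attribute is a testing shortcut rather than normal usage.

```python
import requests

data = {'first': 'true', 'pn': 1, 'kd': 'python'}

# Prepare the POST without sending it, to inspect what data= produces.
prepared = requests.Request(
    'POST',
    'https://www.lagou.com/jobs/companyAjax.json?needAddtionalResult=false',
    data=data,
).prepare()
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
print(prepared.body)                     # first=true&pn=1&kd=python

# Hand-built response standing in for a server reply (hypothetical payload).
resp = requests.models.Response()
resp.status_code = 200
resp.encoding = 'utf-8'
resp._content = b'{"success": true, "count": 3}'

result = resp.json()                     # JSON object -> Python dict
print(type(result), result['count'])
```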
- Using a proxy with requests
Pass the proxies keyword argument to send a request through a proxy.
# -*- coding: UTF-8 -*-
import requests
url = 'http://httpbin.org/ip'
proxy = {
    # the proxy URL should include its scheme
    'http': 'http://1.198.72.156:9999'
}
response = requests.get(url, proxies=proxy)
print(response.text)
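The keys of the proxies dict are matched against the scheme of the target URL, so a dict with only an 'http' key does not cover https:// URLs. A small sketch of that lookup, using the select_proxy helper from requests.utils (the helper requests itself uses for this; calling it directly is only for illustration) and the proxy address from the example above:

```python
from requests.utils import select_proxy

http_only = {'http': 'http://1.198.72.156:9999'}

both = {
    'http': 'http://1.198.72.156:9999',
    'https': 'http://1.198.72.156:9999',  # assumption: the same proxy also handles HTTPS traffic
}

# Which proxy would be used for each target URL?
print(select_proxy('http://httpbin.org/ip', http_only))   # http://1.198.72.156:9999
print(select_proxy('https://httpbin.org/ip', http_only))  # None, request would go direct
print(select_proxy('https://httpbin.org/ip', both))       # http://1.198.72.156:9999
```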
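- Handling cookies with requests
To share cookies across multiple requests, use a session.
Example:
Log in with an initial POST; later requests to pages that require login then reuse the cookies obtained at login.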
# -*- coding: UTF-8 -*-
import requests
url = 'xxx.xx.xx'
data = {
    'user': 'xxx',
    'password': 'xxx'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit'
                  '/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
}
session = requests.Session()
session.post(url, data=data, headers=headers)
response = session.get('xxx.xxx.xxx')
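How the session carries cookies forward can be checked without a real login. In this sketch the cookie is placed into the session's jar by hand, standing in for what a login response would set; the cookie name, value, and the example.com host are all hypothetical.

```python
import requests

session = requests.Session()

# Stand-in for the cookie a real login response would set (hypothetical name and value).
session.cookies.set('sessionid', 'abc123', domain='example.com', path='/')

# Prepare a later request through the session without sending it.
req = requests.Request('GET', 'http://example.com/profile')
prepared = session.prepare_request(req)

# The session attaches the stored cookie automatically.
print(prepared.headers.get('Cookie'))  # sessionid=abc123
```

A plain requests.get call has no such jar, which is why the login cookie would be lost between independent calls.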
- Handling untrusted SSL certificates with requests
For sites whose SSL certificate is not trusted, pass the verify=False argument.
Example:
resp = requests.get('url', verify=False)
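With verification disabled, urllib3 emits an InsecureRequestWarning for each HTTPS request. One possible way to handle this, sketched below, is to silence that warning and set verify once on a session instead of on every call. The helper name make_unverified_session is hypothetical, and no request is sent here.

```python
import requests
import urllib3

# Silence the warning urllib3 raises for unverified HTTPS requests.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def make_unverified_session():
    """Return a session that skips certificate verification (hypothetical helper)."""
    session = requests.Session()
    session.verify = False  # applies to every request made through this session
    return session


session = make_unverified_session()
print(session.verify)  # False
```

Skipping verification removes protection against man-in-the-middle attacks, so it is best limited to sites you already trust.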