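What is urllib
urllib is an HTTP request library built into Python. It needs no extra installation, and you can use it without knowing how the underlying layers are implemented.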
- urllib.request: the request module
- urllib.error: the exception handling module
- urllib.parse: the URL parsing module
- urllib.robotparser: the robots.txt parsing module
import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
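The parsing submodules can be tried out without any network access. The following is a minimal sketch written for Python 3; the query values and the robots.txt rules are made up for illustration and do not come from any real site.

```python
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.robotparser import RobotFileParser

# Build a query string from a dict and attach it to a base URL.
query = urlencode({'name': 'lt', 'age': 18})
url = 'http://httpbin.org/get?' + query

# Split the URL back into its components and decode the query.
parts = urlparse(url)
params = parse_qs(parts.query)

# Parse robots.txt rules supplied as lines (hypothetical rules, no download).
robots = RobotFileParser()
robots.parse(['User-agent: *', 'Disallow: /private/'])
allowed = robots.can_fetch('*', 'http://example.com/index.html')
blocked = robots.can_fetch('*', 'http://example.com/private/data')

print(url)
print(parts.netloc, parts.path, params)
print(allowed, blocked)
```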
What is Requests
An easy-to-use HTTP request library for Python, built on urllib3.
Usage
- Basic GET request
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)
Result:
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.19.1"
  },
  "origin": "223.104.213.74",
  "url": "http://httpbin.org/get"
}
- GET request with parameters, passed as a dictionary
import requests
data = {
    'name': 'lt',
    'age': 18
}
# The params dict is encoded into the URL's query string
response = requests.get('http://httpbin.org/get', params=data)
print(response.text)
Result:
{
  "args": {
    "age": "18",
    "name": "lt"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.19.1"
  },
  "origin": "223.104.213.74",
  "url": "http://httpbin.org/get?age=18&name=lt"
}
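To see how `params` ends up in the URL without sending anything, a request can be prepared locally. This is a sketch assuming Python 3 and a reasonably recent Requests; `requests.Request(...).prepare()` builds the request but does not send it. The parameter order follows the dictionary's insertion order, so it may differ from the recorded output above.

```python
import requests

data = {'name': 'lt', 'age': 18}

# Build the request object locally; nothing is sent over the network.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=data).prepare()

print(prepared.method)
print(prepared.url)
```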
- Parsing JSON
import requests
data = {
    'name': 'lt',
    'age': 18
}
response = requests.get('http://httpbin.org/get', params=data)
# json() decodes the response body into Python objects
print(response.json())
- Fetching binary data
import requests
response = requests.get('https://ss1.bdstatic.com/kvoZeXSm1A5BphGlnYG/skin_zoom/178.jpg?2')
# response.content holds the raw bytes; the with block closes the file
with open('e:/aaa.jpg', 'wb') as f:
    f.write(response.content)
- Adding headers to disguise the request as a browser
Without headers, the request returns 400:
import requests
response = requests.get('https://www.zhihu.com/')
print(response.status_code)
With headers added, it returns 200:
import requests
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
}
response = requests.get('https://www.zhihu.com/', headers=headers)
print(response.status_code)
- POST request
import requests
data = {
    'aaaa': 'bbbb'
}
# The data dict is sent as a form-encoded request body
response = requests.post('http://httpbin.org/post', data=data)
print(response.json())
Response
Response attributes
- status_code
- headers
- cookies
- url
- history
Checking the status code
import requests
response = requests.post('http://httpbin.org/post')
print(response.status_code == 200)
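Besides comparing against the literal 200, Requests offers `requests.codes` for named status codes and `raise_for_status()` to turn 4xx/5xx responses into exceptions. The sketch below builds `Response` objects by hand purely so that it runs offline; in real use the response would come from `requests.get` or `requests.post`. The helper name `describe` is hypothetical.

```python
import requests

def describe(response):
    """Return 'ok' for a 200 response, otherwise a short description."""
    if response.status_code == requests.codes.ok:
        return 'ok'
    try:
        # Raises HTTPError for 4xx and 5xx status codes, does nothing otherwise.
        response.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return str(exc)
    return 'status %d' % response.status_code

# Hand-built responses stand in for real ones (illustration only).
good = requests.Response()
good.status_code = 200

missing = requests.Response()
missing.status_code = 404

print(describe(good))
print(describe(missing))
```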
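Advanced usage
- File upload
import requests
# Open the file in binary mode; the with block closes it after the upload
with open('e:/aaa.jpg', 'rb') as f:
    files = {'file': f}
    response = requests.post('http://httpbin.org/post', files=files)
print(response.json())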
- Getting cookies
import requests
response = requests.get('http://www.baidu.com')
print(response.cookies)
# Iterate over the cookie jar as name/value pairs
for key, value in response.cookies.items():
    print(key + ' = ' + value)
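- Session persistence (used for login verification)
With two independent requests, the cookie set by the first is not sent with the second:
import requests
requests.get('http://httpbin.org/cookies/set/number/123456')
response = requests.get('http://httpbin.org/cookies')
print(response.text)
Returns:
{"cookies":{}}
Change it to: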
import requests
# A Session keeps cookies between requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456')
response = s.get('http://httpbin.org/cookies')
print(response.text)
Returns:
{"cookies":{"number":"123456"}}
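The difference comes from the cookie jar that a session carries between calls. Here is a sketch of that mechanism that runs offline: the cookie is placed in the jar by hand (standing in for the `Set-Cookie` that the first request would receive), and `prepare_request` shows what the session would send next. Nothing here contacts httpbin.

```python
import requests

s = requests.Session()

# Stand-in for the cookie that /cookies/set/number/123456 would set.
s.cookies.set('number', '123456')

# Prepare (but do not send) the follow-up request through the session.
req = requests.Request('GET', 'http://httpbin.org/cookies')
prepared = s.prepare_request(req)
cookie_header = prepared.headers.get('Cookie')

# The same request prepared outside the session carries no cookies.
bare = req.prepare()
bare_cookie_header = bare.headers.get('Cookie')

print(cookie_header)
print(bare_cookie_header)
```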
- Certificate verification
When a site's certificate is invalid, the request raises
requests.exceptions.SSLError
import requests
response = requests.get('https://www.12306.cn')
print(response.status_code)
Change it to:
import requests
import urllib3
urllib3.disable_warnings()  # suppress the warning printed for unverified requests
# verify=False skips certificate verification
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
- Proxy settings
import requests
# Map each URL scheme to the proxy that should handle it
proxies = {
    'http': 'http://127.0.0.1:8743',
    'https': 'https://127.0.0.1:9743'
}
response = requests.get('https://www.taobao.com', proxies=proxies)
print(response.status_code)
- Timeout settings
import requests
try:
    requests.get('https://www.taobao.com/', timeout=0.1)
except requests.exceptions.ConnectTimeout:
    print('ConnectTimeout')
except requests.exceptions.Timeout:
    print('Timeout')
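`timeout` also accepts a `(connect, read)` tuple so the two phases can be limited separately. Below is a sketch with a hypothetical helper `fetch_status` that reports which limit was hit; the 3.05 and 10 second defaults are arbitrary choices for illustration, not values from the text above.

```python
import requests

def fetch_status(url, connect_timeout=3.05, read_timeout=10):
    """Return the status code, or the name of the timeout that occurred."""
    try:
        response = requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.ConnectTimeout:
        # The connection could not be established in time.
        return 'ConnectTimeout'
    except requests.exceptions.ReadTimeout:
        # Connected, but the server was too slow to send data.
        return 'ReadTimeout'
    return response.status_code
```

It could be called as `fetch_status('https://www.taobao.com/', connect_timeout=0.1)`, which needs network access.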
- Authentication settings
Pass credentials through the auth parameter.
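The `auth` argument accepts a `(username, password)` tuple for HTTP Basic authentication. A minimal sketch follows, prepared locally so that it runs offline; the credentials `user`/`passwd` and the httpbin `/basic-auth` URL are placeholders chosen for illustration.

```python
import requests

url = 'http://httpbin.org/basic-auth/user/passwd'

# A (username, password) tuple is shorthand for HTTP Basic authentication.
prepared = requests.Request('GET', url, auth=('user', 'passwd')).prepare()
auth_header = prepared.headers['Authorization']
print(auth_header)

# Sending it for real would look like this (requires network access):
# response = requests.get(url, auth=('user', 'passwd'))
# print(response.status_code)
```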