Preface
Earlier we looked at Python's built-in urllib module for accessing network resources. It is cumbersome to use, however, and lacks many practical high-level features. The third-party requests library fills that gap and makes working with URLs far more convenient.
1. Installing requests
If you have Anaconda installed, requests is already available. Otherwise, install it with the following command:
pip install requests
If the installation fails with Permission denied, retry the command with sudo.
2. Using requests
To fetch a page with a GET request:
>>> import requests
>>> r = requests.get('https://www.baidu.com/')  # Baidu homepage
>>> r.status_code
200
>>> r.text
'<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/bdorz/baidu.min.css><title>ç\x99¾åº¦ä¸\x80ä¸\x8bï¼\x8cä½\xa0å°±ç\x9f¥é\x81\x93</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> ...'
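Notice the garbled <title> in the output. When the Content-Type header does not declare a charset, requests falls back to ISO-8859-1 for r.text, which mangles this UTF-8 page. A minimal fix is to set the encoding yourself before reading the text (apparent_encoding is requests' own guess derived from the body):
import requests
r = requests.get('https://www.baidu.com/')
# override the ISO-8859-1 fallback before touching r.text
r.encoding = 'utf-8'  # or: r.encoding = r.apparent_encoding
print(r.text[:200])   # the title now renders as readable UTF-8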
requests supports all the common HTTP request methods:
import requests
requests.get('http://httpbin.org/get')
requests.post('http://httpbin.org/post')
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')
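Of these, POST is the one most often paired with a request body. A minimal sketch sending form data to httpbin (the field values are arbitrary); httpbin echoes the submitted fields back under "form":
import requests
# a form-encoded body is passed via the data argument
response = requests.post('http://httpbin.org/post', data={'name': 'tom', 'age': 20})
print(response.text)  # the submitted fields appear under "form" in the echo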
Two ways to pass query parameters:
# Option 1: put the parameters directly in the URL
import requests
response = requests.get('http://httpbin.org/get?name=gemey&age=22')
print(response.text)
# =========================================================
# Option 2: collect the parameters in a dict and pass it as the params argument
import requests
data = {
    'name': 'tom',
    'age': 20
}
response = requests.get('http://httpbin.org/get', params=data)
print(response.text)
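Either way, requests builds the same request; a quick way to confirm how the dict was encoded is to inspect response.url (same httpbin endpoint as above):
import requests
response = requests.get('http://httpbin.org/get', params={'name': 'tom', 'age': 20})
# requests URL-encodes the dict and appends it as the query string
print(response.url)  # http://httpbin.org/get?name=tom&age=20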
Parsing JSON
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)
print(response.json())  # response.json() is equivalent to json.loads(response.text)
print(type(response.json()))
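Since response.json() returns a plain Python dict, its fields can be indexed directly. A small sketch against the same endpoint (the 'url' and 'headers' keys are part of httpbin's echo format):
import requests
response = requests.get('http://httpbin.org/get')
data = response.json()          # parsed into a regular dict
print(data['url'])              # the URL httpbin received
print(data['headers']['Host'])  # nested fields work like any dict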
Saving a binary file
import requests
response = requests.get('http://img.ivsky.com/img/tupian/pre/201708/30/kekeersitao-002.jpg')
b = response.content  # response.content holds the raw bytes of the body
with open('F:/fengjing.jpg', 'wb') as f:
    f.write(b)
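For large files, loading the whole body into memory via response.content is wasteful. A sketch of the streaming alternative, using requests' stream=True together with iter_content (the chunk size is arbitrary):
import requests
# stream=True defers downloading the body until we iterate over it
response = requests.get('http://img.ivsky.com/img/tupian/pre/201708/30/kekeersitao-002.jpg', stream=True)
with open('F:/fengjing.jpg', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):  # 8 KB at a time
        f.write(chunk)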
Certificate verification
import requests
from requests.packages import urllib3
urllib3.disable_warnings()  # suppress urllib3's InsecureRequestWarning
response = requests.get('https://www.12306.cn', verify=False)  # disable certificate verification
print(response.status_code)  # prints 200
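Disabling verification should be a last resort. verify also accepts a path to a CA bundle, which keeps verification on even for servers whose certificate is not in the default store (the .pem path below is a hypothetical placeholder):
import requests
# point verify at a trusted CA bundle instead of turning verification off
response = requests.get('https://www.12306.cn', verify='/path/to/ca-bundle.pem')
print(response.status_code)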
Exception handling
When you are not sure what might go wrong, wrap the request in try...except to catch the exception.
Catching the common requests exceptions:
import requests
from requests.exceptions import ReadTimeout, HTTPError, RequestException
try:
    response = requests.get('http://www.baidu.com', timeout=0.5)
    response.raise_for_status()  # raise HTTPError for 4xx/5xx responses
    print(response.status_code)
except ReadTimeout:
    print('timeout')
except HTTPError:
    print('httperror')
except RequestException:  # base class for all requests exceptions
    print('reqerror')
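timeout can also be given as a (connect, read) tuple so that the two phases get separate limits; a minimal sketch (the values are arbitrary):
import requests
# 3 seconds to establish the connection, 10 seconds to receive the response
response = requests.get('http://www.baidu.com', timeout=(3, 10))
print(response.status_code)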
The requests library has much more to offer and is very flexible; anyone who wants to work as a crawler engineer should master it.