Outline
1. What is Requests?
- Requests is a popular third-party module for making HTTP requests; it is a friendlier wrapper built on top of urllib3, offering a much nicer API than Python's built-in urllib.
- Requests is more convenient to use than urllib and can greatly improve development efficiency; it is the recommended library for crawler development.
- Installing the Requests library:
pip install requests
- Official homepage: http://python-requests.org/
- Chinese documentation: https://2.python-requests.org//zh_CN/latest/index.html
2. Making GET Requests
2.1 Basic usage
- Requests offers two ways to make a GET request: call the get method directly, or call the request method with the HTTP method set to "get".
import requests

# Method 1
response1 = requests.get("http://www.baidu.com/")
print(response1.text)
# Method 2
response2 = requests.request("get", "http://www.baidu.com/")
print(response2.text)
Note:
(1) With response.text, Requests decodes the response body automatically based on the text encoding of the HTTP response; most Unicode character sets are decoded seamlessly.
(2) With response.content, you get the raw binary byte stream of the server's response, which is what you want for saving images and other binary files.
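Since response.content is raw bytes, saving a binary file is just a matter of writing those bytes to disk. A minimal sketch using httpbin.org's /image/png endpoint (the endpoint choice and output filename are illustrative):

```python
import requests

# httpbin.org/image/png responds with a small PNG image.
response = requests.get("http://httpbin.org/image/png")
print(type(response.content))  # response.content is bytes

# Write the raw bytes to disk to save the image.
with open("example.png", "wb") as f:
    f.write(response.content)
```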
- To pass data to the server in the URL, i.e. to carry query string parameters, there are two options: ① build the URL with the parameters embedded, e.g. httpbin.org/get?key1=val1&key2=val2; ② use the params keyword argument of the get method and pass the parameters as a dictionary.
import requests

query_string_parameters = {
    "username": "admin",
    "password": "123456"
}
response = requests.get("http://httpbin.org/get",
                        params=query_string_parameters)
print(response.url)
print(response.text)
2.2 Setting Request Headers
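Custom request headers are passed as a dictionary via the headers keyword argument; a common crawler use is overriding User-Agent so the request looks like it comes from a browser. A sketch (the User-Agent string below is only an illustrative value):

```python
import requests

# Pass custom headers as a dict via the headers keyword argument.
# The User-Agent value is only an illustrative example.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
response = requests.get("http://httpbin.org/get", headers=headers)
# httpbin echoes the request headers back in the response body.
print(response.json()["headers"]["User-Agent"])
```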
2.3 Getting Cookies
- If a response sets cookies, we can retrieve them through the cookies attribute of the response object.
import requests

response = requests.get("http://www.baidu.com")
# The cookies attribute is a CookieJar object:
cookiejar = response.cookies
print(cookiejar)
# Iterate over the cookies:
for key, value in cookiejar.items():
    print(key + ":" + value)
# Convert the CookieJar to a dict:
cookiedict = requests.utils.dict_from_cookiejar(cookiejar)
print(cookiedict)
2.4 Maintaining a Session
- In requests, the Session object is very commonly used; it represents one user session, lasting from when the client connects to the server until it disconnects.
- A session lets us persist certain state across requests; for example, cookies are shared across all requests made from the same Session object.
import requests
sess = requests.Session()
sess.get("http://httpbin.org/cookies/set/number/123456")
response = sess.get("http://httpbin.org/cookies")
print(response.text)
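For contrast, making the same two calls with the module-level requests.get (no Session) shows why the Session matters: each call is an independent request with no shared cookie state, so the cookie set by the first request is not sent with the second:

```python
import requests

# Without a Session, the two calls below share no cookie state.
requests.get("http://httpbin.org/cookies/set/number/123456")
response = requests.get("http://httpbin.org/cookies")
print(response.json())  # the cookies dict comes back empty
```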
2.5 Certificate Verification
- By default, Requests verifies SSL certificates; if a certificate cannot be verified, an SSLError is raised.
- To avoid the SSLError, certificate verification can be skipped by setting
verify=False
import requests
response = requests.get("https://www.12306.cn", verify=False)
print(response.status_code)
# print(response.text)
- When run, the output carries a warning:
D:\Program Files\Python37\lib\site-packages\urllib3\connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
- To suppress the warning, add the following code:
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning

# Disable the insecure-request warning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
response = requests.get("https://www.12306.cn", verify=False)
print(response.status_code)
# print(response.text)
3. Making POST Requests
- Use the post method and pass the form fields via the data parameter.
import requests

if __name__ == '__main__':
    # POST request
    resp = requests.post("http://httpbin.org/post", data={'key': 'value'})
    print(resp)
    print(resp.url)
    print(resp.status_code)  # HTTP response status code
    print(resp.text)     # decoded response body as a string; gzip and deflate are decompressed automatically
    print(resp.content)  # response body as raw bytes
    print(resp.raw)      # the underlying urllib3 response object; read it with resp.raw.read()
    print(resp.headers)  # response headers as a special dict with case-insensitive keys; .get() returns None for missing keys
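Besides form data, a JSON request body can be sent with the json parameter instead of data; a sketch against httpbin:

```python
import requests

# With the json parameter, Requests serializes the dict to JSON
# and sets the Content-Type: application/json header automatically.
resp = requests.post("http://httpbin.org/post", json={"key": "value"})
print(resp.json()["json"])  # httpbin echoes the parsed JSON body back
```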
4. Making Other Requests
import requests

if __name__ == '__main__':
    # DELETE request
    resp3 = requests.delete('http://httpbin.org/delete')
    print(resp3)
    # HEAD request
    resp4 = requests.head('http://httpbin.org/get')
    print(resp4)
    # OPTIONS request
    resp5 = requests.options('http://httpbin.org/get')
    print(resp5)
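A PUT request follows the same pattern, carrying its payload in the data (or json) parameter just like POST; a sketch against httpbin:

```python
import requests

# PUT works like POST: the payload goes in the data (or json) parameter.
resp = requests.put("http://httpbin.org/put", data={"key": "value"})
print(resp.status_code)
print(resp.json()["form"])  # httpbin echoes the form fields back
```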
5. Exception Handling
- Requests' exception classes all live in requests.exceptions; for the source see http://cn.python-requests.org/zh_CN/latest/_modules/requests/exceptions.html#RequestException
- The inheritance relationships can be read off the source below:
(1) RequestException inherits from IOError
(2) HTTPError, ConnectionError, and Timeout inherit from RequestException
(3) ProxyError and SSLError inherit from ConnectionError
(4) ReadTimeout inherits from Timeout
These are the commonly used relationships; for the full details see:
# -*- coding: utf-8 -*-
"""
requests.exceptions
~~~~~~~~~~~~~~~~~~~
This module contains the set of Requests' exceptions.
"""
from urllib3.exceptions import HTTPError as BaseHTTPError


class RequestException(IOError):
    """There was an ambiguous exception that occurred while handling your
    request.
    """

    def __init__(self, *args, **kwargs):
        """Initialize RequestException with `request` and `response` objects."""
        response = kwargs.pop('response', None)
        self.response = response
        self.request = kwargs.pop('request', None)
        if (response is not None and not self.request and
                hasattr(response, 'request')):
            self.request = self.response.request
        super(RequestException, self).__init__(*args, **kwargs)


class HTTPError(RequestException):
    """An HTTP error occurred."""

class ConnectionError(RequestException):
    """A Connection error occurred."""

class ProxyError(ConnectionError):
    """A proxy error occurred."""

class SSLError(ConnectionError):
    """An SSL error occurred."""

class Timeout(RequestException):
    """The request timed out.
    Catching this error will catch both
    :exc:`~requests.exceptions.ConnectTimeout` and
    :exc:`~requests.exceptions.ReadTimeout` errors.
    """

class ConnectTimeout(ConnectionError, Timeout):
    """The request timed out while trying to connect to the remote server.
    Requests that produced this error are safe to retry.
    """

class ReadTimeout(Timeout):
    """The server did not send any data in the allotted amount of time."""

class URLRequired(RequestException):
    """A valid URL is required to make a request."""

class TooManyRedirects(RequestException):
    """Too many redirects."""

class MissingSchema(RequestException, ValueError):
    """The URL schema (e.g. http or https) is missing."""

class InvalidSchema(RequestException, ValueError):
    """See defaults.py for valid schemas."""

class InvalidURL(RequestException, ValueError):
    """The URL provided was somehow invalid."""

class InvalidHeader(RequestException, ValueError):
    """The header value provided was somehow invalid."""

class ChunkedEncodingError(RequestException):
    """The server declared chunked encoding but sent an invalid chunk."""

class ContentDecodingError(RequestException, BaseHTTPError):
    """Failed to decode response content"""

class StreamConsumedError(RequestException, TypeError):
    """The content for this response was already consumed"""

class RetryError(RequestException):
    """Custom retries logic failed"""

class UnrewindableBodyError(RequestException):
    """Requests encountered an error when trying to rewind a body"""

# Warnings

class RequestsWarning(Warning):
    """Base warning for Requests."""
    pass

class FileModeWarning(RequestsWarning, DeprecationWarning):
    """A file was opened in text mode, but Requests determined its binary length."""
    pass

class RequestsDependencyWarning(RequestsWarning):
    """An imported dependency doesn't match the expected version range."""
    pass
- Usage example:
import requests
from requests.exceptions import ReadTimeout
from requests.exceptions import ConnectionError
from requests.exceptions import RequestException

try:
    response = requests.get("http://httpbin.org/get", timeout=0.1)
    print(response.status_code)
except ReadTimeout:
    print("timeout")
except ConnectionError:
    print("connection error")
except RequestException:
    print("error")
- raise_for_status(): raises an HTTPError if the HTTP response status code indicates a client or server error (4xx or 5xx)
import requests

if __name__ == '__main__':
    try:
        resp = requests.get('http://httpbin.org/status/404')
        resp.raise_for_status()  # raises HTTPError for a 4xx/5xx status code
    except requests.RequestException as e:
        print(e)
    else:
        print(resp)
- When run, the except branch prints the HTTPError message for the 404 response.