Advanced usage of the requests library: timeouts, retries, hooks


Request hooks

Check whether the response status is 4XX/5XX and, if it is, raise an exception (requests.HTTPError):

import requests

response = requests.get('https://api.github.com/user/repos?page=1')
# Assert that there were no errors
response.raise_for_status()

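If many requests need this check, you can use the hooks interface that requests provides to make sure every request sent from the same session object is checked.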

# Create a custom requests object, modifying the global module throws an error
http = requests.Session()

assert_status_hook = lambda response, *args, **kwargs: response.raise_for_status()
http.hooks["response"] = [assert_status_hook]

http.get("https://api.github.com/user/repos?page=1")

> HTTPError: 401 Client Error: Unauthorized for url: https://api.github.com/user/repos?page=1
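The effect of the hook can be checked without touching the network. The sketch below assumes a hypothetical StubAdapter (not part of requests) that answers every request with a fixed status code. It shows the session-level hook turning a 401 into an HTTPError even though the calling code never calls raise_for_status() itself.

```python
import requests
from requests.adapters import BaseAdapter


class StubAdapter(BaseAdapter):
    """Hypothetical adapter for illustration: answers every request
    with a fixed status code instead of opening a connection."""

    def __init__(self, status_code):
        super().__init__()
        self.status_code = status_code

    def send(self, request, **kwargs):
        response = requests.Response()
        response.status_code = self.status_code
        response.reason = "Stubbed"
        response.url = request.url
        response.request = request
        response._content = b""  # empty body, nothing left to read
        return response

    def close(self):
        pass


def assert_status_hook(response, *args, **kwargs):
    # Raises requests.HTTPError for any 4XX/5XX response
    response.raise_for_status()


http = requests.Session()
http.hooks["response"].append(assert_status_hook)
http.mount("https://", StubAdapter(status_code=401))

try:
    http.get("https://api.github.com/user/repos?page=1")
    raised = False
except requests.HTTPError as error:
    raised = True
    status = error.response.status_code
    print(f"hook raised HTTPError for status {status}")
```

Appending to http.hooks["response"] rather than replacing the list keeps any hooks that were registered earlier.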

Setting base URLs

requests accepts URLs in two ways:

  1. Pass the full URL on every call

    requests.get('https://api.org/list/')
    requests.get('https://api.org/list/3/item')
    
  2. Install the requests_toolbelt library and use BaseUrlSession to specify a base_url

    from requests_toolbelt import sessions
    http = sessions.BaseUrlSession(base_url="https://api.org")
    http.get("/list")
    http.get("/list/item")
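BaseUrlSession resolves each path against base_url with urljoin semantics, so slashes matter once the base URL has a path of its own. The sketch below uses only the standard library's urljoin; the /v1 prefix is an assumption added for illustration and is not part of the example API above.

```python
from urllib.parse import urljoin

cases = [
    ("https://api.org", "/list"),      # base without a path
    ("https://api.org/v1", "list"),    # no trailing slash on the base
    ("https://api.org/v1/", "list"),   # trailing slash on the base
    ("https://api.org/v1/", "/list"),  # leading slash on the path
]

# Resolve every (base, path) pair the way a relative link would be resolved
resolved = {case: urljoin(*case) for case in cases}

for (base, path), url in resolved.items():
    print(f"{base!r} + {path!r} -> {url}")
```

Only the third combination keeps the /v1 prefix: a base without a trailing slash loses its last path segment, and a path with a leading slash replaces the base path entirely.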
    

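Setting a default timeout

If your Python program is synchronous, forgetting to set a timeout can leave it hanging on a network request, because requests does not apply a timeout by default.

There are likewise two ways to set a timeout:

  1. Pass timeout on every call: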

    requests.get('https://github.com/', timeout=0.001)
    
  2. Use a Transport Adapter to set one timeout for all requests:

    from requests.adapters import HTTPAdapter
    
    DEFAULT_TIMEOUT = 5 # seconds
    
    class TimeoutHTTPAdapter(HTTPAdapter):
        def __init__(self, *args, **kwargs):
            self.timeout = DEFAULT_TIMEOUT
            if "timeout" in kwargs:
                self.timeout = kwargs["timeout"]
                del kwargs["timeout"]
            super().__init__(*args, **kwargs)
    
        def send(self, request, **kwargs):
            timeout = kwargs.get("timeout")
            if timeout is None:
                kwargs["timeout"] = self.timeout
            return super().send(request, **kwargs)
    
    import requests
    
    http = requests.Session()
    
    # Mount it for both http and https usage
    adapter = TimeoutHTTPAdapter(timeout=2.5)
    http.mount("https://", adapter)
    http.mount("http://", adapter)
    
    # Use the default 2.5s timeout
    response = http.get("https://api.twilio.com/")
    
    # Override the timeout as usual for specific requests
    response = http.get("https://api.twilio.com/", timeout=10)
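requests also accepts the timeout as a (connect, read) tuple, which lets you bound the two phases separately. The sketch below is an illustration under stated assumptions: it starts a throwaway local server (SlowHandler is a hypothetical name) that waits one second before answering, then requests it with a 0.2 second read timeout to show which exception is raised.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers only after a one second delay."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

http = requests.Session()
http.trust_env = False  # ignore proxy settings from the environment

try:
    # Up to 3.05 s to connect, but only 0.2 s to wait for data
    http.get(url, timeout=(3.05, 0.2))
    outcome = "completed"
except requests.exceptions.ReadTimeout:
    outcome = "read timeout"
except requests.exceptions.ConnectTimeout:
    outcome = "connect timeout"
finally:
    server.shutdown()

print(outcome)
```

Both exceptions are subclasses of requests.exceptions.Timeout, so catching that single class is enough when the phase does not matter.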
    

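Retrying on failure

A robust program should expect connections to fail and should have a retry strategy. Once again we can use HTTPAdapter, this time passing it a custom retry strategy.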

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry_strategy = Retry(
    total=3,
    status_forcelist=[429, 500, 502, 503, 504],
    # allowed_methods replaced method_whitelist in urllib3 1.26
    allowed_methods=["HEAD", "GET", "OPTIONS"],
)
adapter = HTTPAdapter(max_retries=retry_strategy)
http = requests.Session()
http.mount("https://", adapter)
http.mount("http://", adapter)

response = http.get("https://en.wikipedia.org/w/api.php")

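Other parameters:

  • Maximum number of retries: total=10
  • HTTP status codes that trigger a retry: status_forcelist=[413, 429, 503]
  • Request methods that may be retried: allowed_methods=["HEAD", "GET", "PUT", "DELETE", "OPTIONS", "TRACE"] (named method_whitelist before urllib3 1.26)
  • Factor controlling the delay between retries: backoff_factor=0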
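A Retry object can be inspected directly, which is a convenient way to see what a given set of parameters means before mounting it on a session. The sketch below assumes urllib3 1.26 or newer (for allowed_methods). It asks the strategy which method and status combinations it would retry, then simulates consecutive failures to read back the backoff delays.

```python
from urllib3.util.retry import Retry

retry_strategy = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["HEAD", "GET", "OPTIONS"],
)

# Which (method, status) pairs would be retried?
decisions = {
    ("GET", 503): bool(retry_strategy.is_retry("GET", 503)),
    ("GET", 404): bool(retry_strategy.is_retry("GET", 404)),
    ("POST", 503): bool(retry_strategy.is_retry("POST", 503)),
}

# Simulate four consecutive failures; increment() returns a new Retry
# object each time, and get_backoff_time() reports the sleep before
# the next attempt.
delays = []
state = retry_strategy
for _ in range(4):
    state = state.increment(method="GET", url="/")
    delays.append(state.get_backoff_time())

print(decisions)
print(delays)
```

POST is refused even for a 503 because it is not in allowed_methods, which protects non-idempotent requests from being sent twice.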

Combining timeouts and retries

Putting the pieces together, we can combine timeouts and retries in a single adapter:

retries = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
http.mount("https://", TimeoutHTTPAdapter(max_retries=retries))
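For reuse across a project, the combined setup can be wrapped in a small factory. This is a minimal sketch: build_session and its parameters are hypothetical names, and the adapter is a compact variant of the TimeoutHTTPAdapter idea shown earlier, restated so the block runs on its own.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

DEFAULT_TIMEOUT = 5  # seconds


class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that applies a default timeout when the caller gives none."""

    def __init__(self, *args, timeout=DEFAULT_TIMEOUT, **kwargs):
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


def build_session(timeout=DEFAULT_TIMEOUT, total_retries=3):
    """Hypothetical helper: a session with default timeout and retries."""
    retries = Retry(
        total=total_retries,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = TimeoutHTTPAdapter(timeout=timeout, max_retries=retries)
    session = requests.Session()
    # Mount for both schemes so plain-HTTP URLs get the same behaviour
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


http = build_session(timeout=2.5)
adapter = http.get_adapter("https://api.twilio.com/")
print(adapter.timeout, adapter.max_retries.total)
```

Nothing is sent here; get_adapter only looks up which adapter the session would use for that URL.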

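Debugging HTTP requests

When an HTTP request fails, there are two ways to find out what went wrong:

  • Use the built-in debug logging
  • Use request hooks

Printing HTTP headers

Setting http.client.HTTPConnection.debuglevel to any value greater than 0 prints the HTTP request and response headers. This is handy when the response body is too large, or is a byte stream, and is not suitable for logging.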

import requests
import http.client

http.client.HTTPConnection.debuglevel = 1

requests.get("https://www.google.com/")

# Output
send: b'GET / HTTP/1.1\r\nHost: www.google.com\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Fri, 28 Feb 2020 12:13:26 GMT
header: Expires: -1
header: Cache-Control: private, max-age=0
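The debuglevel output above is written with print() and does not go through the logging module. If you would rather have diagnostics in your log stream, a complementary option is to raise the level of urllib3's own logger, which reports connection setup and one line per request at DEBUG level. A minimal sketch, with the format string chosen arbitrarily:

```python
import logging

# Keep the root logger at INFO so other libraries stay quiet
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Turn on DEBUG only for urllib3, the library requests uses underneath
urllib3_logger = logging.getLogger("urllib3")
urllib3_logger.setLevel(logging.DEBUG)
```

This does not include headers or bodies, so it complements the debuglevel switch but does not replace it.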

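Printing all HTTP content

When the API response is not too large, we can combine request hooks with the dump utilities from requests_toolbelt to print the full content of each HTTP request and response.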

import requests
from requests_toolbelt.utils import dump

def logging_hook(response, *args, **kwargs):
    data = dump.dump_all(response)
    print(data.decode('utf-8'))

http = requests.Session()
http.hooks["response"] = [logging_hook]

http.get("https://api.openaq.org/v1/cities", params={"country": "BA"})

# Output
< GET /v1/cities?country=BA HTTP/1.1
< Host: api.openaq.org

> HTTP/1.1 200 OK
> Content-Type: application/json; charset=utf-8
> Transfer-Encoding: chunked
> Connection: keep-alive
>
{
   "meta":{
      "name":"openaq-api",
      "license":"CC BY 4.0",
      "website":"https://docs.openaq.org/",
      "page":1,
      "limit":100,
      "found":1
   },
   "results":[
      {
         "country":"BA",
         "name":"Goražde",
         "city":"Goražde",
         "count":70797,
         "locations":1
      }
   ]
}

Usage of the dump utilities: https://toolbelt.readthedocs.io/en/latest/dumputils.html
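When the body is too large to dump, a hook can print headers only, using nothing beyond requests itself. The sketch below is an illustration, not the toolbelt's implementation: summarize and headers_hook are hypothetical names. The helper is exercised on a hand-built Response so that no request is sent.

```python
import requests


def summarize(response):
    """Hypothetical helper: request line, status line and headers as text."""
    request = response.request
    lines = [f"< {request.method} {request.path_url}"]
    lines += [f"< {name}: {value}" for name, value in request.headers.items()]
    lines.append(f"> {response.status_code} {response.reason}")
    lines += [f"> {name}: {value}" for name, value in response.headers.items()]
    return "\n".join(lines)


def headers_hook(response, *args, **kwargs):
    # A hook must return None (or a Response): any other return value
    # would replace the response object handed back to the caller.
    print(summarize(response))


# Build a request/response pair by hand to try the helper offline
request = requests.Request(
    "GET", "https://api.openaq.org/v1/cities", params={"country": "BA"}
).prepare()
response = requests.Response()
response.status_code = 200
response.reason = "OK"
response.headers["Content-Type"] = "application/json; charset=utf-8"
response.request = request

summary = summarize(response)
print(summary)
```

To use it for real, register it the same way as the dump hook: http.hooks["response"] = [headers_hook].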

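Testing and mocking

When testing code that uses a third-party API, sending real requests is not always possible (the endpoint may be billed per call, or may not be finished yet =_=). In tests we can use getsentry/responses as a stub that intercepts the requests the program sends and returns predefined data, so the call appears to succeed.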

import unittest

import requests
import responses

class TestAPI(unittest.TestCase):
    @responses.activate  # intercept HTTP calls within this method
    def test_simple(self):
        response_data = {
            "id": "ch_1GH8so2eZvKYlo2CSMeAfRqt",
            "object": "charge",
            "customer": {"id": "cu_1GGwoc2eZvKYlo2CL2m31GRn", "object": "customer"},
        }
        # mock the Stripe API
        responses.add(
            responses.GET,
            "https://api.stripe.com/v1/charges",
            json=response_data,
        )

        response = requests.get("https://api.stripe.com/v1/charges")
        self.assertEqual(response.json(), response_data)

Once interception is active, a request to any URL that has not been registered raises an error (a ConnectionError).
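If adding a dependency is not an option, the standard library's unittest.mock can play a similar role. Note that this is a different technique from the responses library: it replaces requests.get itself, so no request is ever built. In the sketch below, fetch_charges and the payload values are hypothetical names made up for illustration.

```python
import unittest
from unittest import mock

import requests


def fetch_charges():
    """Hypothetical function under test: calls the API and returns its JSON."""
    response = requests.get("https://api.stripe.com/v1/charges", timeout=5)
    response.raise_for_status()
    return response.json()


class TestFetchCharges(unittest.TestCase):
    @mock.patch("requests.get")  # swap requests.get for a Mock in this test
    def test_returns_payload(self, mock_get):
        payload = {"id": "ch_test", "object": "charge"}  # made-up data
        mock_get.return_value.json.return_value = payload

        self.assertEqual(fetch_charges(), payload)
        mock_get.assert_called_once_with(
            "https://api.stripe.com/v1/charges", timeout=5
        )


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFetchCharges)
result = unittest.TextTestRunner().run(suite)
```

The trade-off is that a mock like this does not exercise URL building, headers or adapters, whereas responses intercepts at the transport layer and leaves that machinery in place.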

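Mimicking browser behaviour

Some sites serve different HTML depending on the browser (to block scrapers or to adapt to the device). You can set the User-Agent header when sending requests to present yourself as a particular browser.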

import requests
http = requests.Session()
http.headers.update({
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"
})
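To confirm what a session will actually send, you can prepare a request without sending it and inspect its headers. A minimal sketch: the "my-script/1.0" value is a hypothetical User-Agent used only to show that a per-request header overrides the session default.

```python
import requests

http = requests.Session()
http.headers.update({
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"
})

# prepare_request merges session headers into the request; nothing is sent
prepared = http.prepare_request(
    requests.Request("GET", "https://www.google.com/")
)
user_agent = prepared.headers["User-Agent"]

# Headers given on the request itself take precedence over the session's
overridden = http.prepare_request(
    requests.Request(
        "GET",
        "https://www.google.com/",
        headers={"User-Agent": "my-script/1.0"},  # hypothetical value
    )
)

print(user_agent)
print(overridden.headers["User-Agent"])
```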

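This article is a translation, based on my own understanding, of: Advanced usage of Python requests - timeouts, retries, hooks

My skills are limited, so please bear with any mistakes QAQ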
