Python requests: Timeouts and Retries Explained (Repost)

weixin_30376453

Tags: python, web crawler

Original link: http://www.cnblogs.com/tianleblog/p/11496177.html

Reposted from: https://www.jb51.net/article/152963.htm

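Network requests inevitably run into timeouts. In requests, if you do not set a timeout, your program may hang indefinitely.

Timeouts come in two kinds: connect timeouts and read timeouts.

Connect timeout

The connect timeout is the number of seconds Requests waits for your client to establish a connection to a port on the remote machine (this corresponds to the connect() call).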

 
    import time
    import requests

    url = 'http://www.google.com.hk'

    print(time.strftime('%Y-%m-%d %H:%M:%S'))
    try:
        html = requests.get(url, timeout=5).text
        print('success')
    except requests.exceptions.RequestException as e:
        print(e)
    print(time.strftime('%Y-%m-%d %H:%M:%S'))

Because Google is blocked from my network, the connection cannot be established, and the error message reports a connect timeout.

2018-12-14 14:38:20
HTTPConnectionPool(host='www.google.com.hk', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x00000000047F80F0>, 'Connection to www.google.com.hk timed out. (connect timeout=5)'))
2018-12-14 14:38:25

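Even if you do not set a timeout, the connection attempt still gives up eventually (about 21 seconds in my test). requests itself applies no default timeout, so this limit comes from the operating system's TCP stack and can vary by platform.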

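Read timeout

The read timeout is the number of seconds the client waits for the server to send a response. (Specifically, it is the time the client will wait between bytes sent from the server. In 99.9% of cases this is the time before the server sends the first byte.)

Put simply, the connect timeout is the maximum time allowed between initiating a connection and that connection being established, and the read timeout is the maximum time to wait, once connected, for the server to return its response.

The read timeout has no default value. If you do not set one, the program waits indefinitely. This is why our crawlers so often hang without any error message.

If you specify a single value for timeout, like this: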

    r = requests.get('https://github.com', timeout=5)

that value is used for both the connect and the read timeout. To set them separately, pass a tuple of (connect, read):

    r = requests.get('https://github.com', timeout=(3.05, 27))
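Since a forgotten timeout means waiting forever, one option is to make the timeout a property of the session. The following is only a sketch: `TimeoutSession` and `default_timeout` are names invented here, not part of requests, and the (3.05, 27) default is simply the pair used above.

```python
import requests


class TimeoutSession(requests.Session):
    """Hypothetical helper: a Session that fills in a timeout when none is given."""

    def __init__(self, timeout=(3.05, 27)):
        super().__init__()
        # (connect, read) in seconds; illustrative values, not a recommendation.
        self.default_timeout = timeout

    def request(self, method, url, **kwargs):
        # Only apply the default if the caller did not pass timeout= explicitly.
        kwargs.setdefault("timeout", self.default_timeout)
        return super().request(method, url, **kwargs)


session = TimeoutSession()
# session.get('https://github.com')             # would use (3.05, 27)
# session.get('https://github.com', timeout=5)  # an explicit value still wins
```

Because `get`, `post` and the other helpers all go through `Session.request`, overriding that one method covers every call made on the session.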

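Level 4 of the Heibanke crawler challenge happens to have an artificial 15-second response delay built into the site, which makes it a perfect illustration.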

 
    import time
    import requests

    url_login = 'http://www.heibanke.com/accounts/login/?next=/lesson/crawler_ex03/'

    session = requests.Session()
    session.get(url_login)

    token = session.cookies['csrftoken']
    session.post(url_login, data={'csrfmiddlewaretoken': token, 'username': 'guliang21', 'password': '123qwe'})

    print(time.strftime('%Y-%m-%d %H:%M:%S'))

    url_pw = 'http://www.heibanke.com/lesson/crawler_ex03/pw_list/'
    try:
        html = session.get(url_pw, timeout=(5, 10)).text
        print('success')
    except requests.exceptions.RequestException as e:
        print(e)
    print(time.strftime('%Y-%m-%d %H:%M:%S'))

This time the error message reports a read timeout.

2018-12-14 15:20:47
HTTPConnectionPool(host='www.heibanke.com', port=80): Read timed out. (read timeout=10)
2018-12-14 15:20:57
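The two failures above can also be told apart in code rather than by reading the message, because requests raises `ConnectTimeout` for the first and `ReadTimeout` for the second. A minimal sketch, assuming a helper named `fetch` (hypothetical) that reports which phase failed:

```python
import requests


def fetch(url, connect_timeout=3.05, read_timeout=27):
    """Return (outcome, body); outcome names the phase that failed, if any."""
    try:
        resp = requests.get(url, timeout=(connect_timeout, read_timeout))
        return "ok", resp.text
    except requests.exceptions.ConnectTimeout:
        # The connection was not established within connect_timeout seconds.
        return "connect timeout", None
    except requests.exceptions.ReadTimeout:
        # Connected, but the server sent nothing for read_timeout seconds.
        return "read timeout", None
    except requests.exceptions.RequestException as exc:
        # Anything else: DNS failure, refused connection, invalid URL, ...
        return f"other error: {exc}", None
```

Both timeout classes derive from `requests.exceptions.Timeout`, so code that does not care about the difference can catch that single class instead.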

Retrying on timeout

Usually we do not give up as soon as a request times out; instead we retry, up to three attempts in total.

 
    import requests

    def gethtml(url):
        """Fetch url, making up to three attempts; return None if all fail."""
        i = 0
        while i < 3:
            try:
                html = requests.get(url, timeout=5).text
                return html
            except requests.exceptions.RequestException:
                i += 1
        return None
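The loop above retries immediately and hides the reason for failure from the caller. A possible variation, sketched here with assumed names (`get_with_retry`, `attempts`, `pause` are not from the original), waits a little longer after each failure and re-raises the last error once the attempts are used up:

```python
import time
import requests


def get_with_retry(url, attempts=3, timeout=5, pause=1.0):
    """Try the GET up to `attempts` times, pausing between failures.

    Re-raises the last error instead of silently returning None.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return requests.get(url, timeout=timeout).text
        except requests.exceptions.RequestException as exc:
            last_error = exc
            if attempt < attempts:
                # Wait pause, then 2 * pause, ... before the next attempt.
                time.sleep(pause * attempt)
    raise last_error
```

Whether to return None or raise is a design choice; raising keeps the error message (connect timeout versus read timeout, for example) available to the caller.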

In fact, requests already has this built in. (Though the code seems to get longer...)

 
    import time
    import requests
    from requests.adapters import HTTPAdapter

    s = requests.Session()
    # Apply the same retry limit to both http:// and https:// URLs.
    s.mount('http://', HTTPAdapter(max_retries=3))
    s.mount('https://', HTTPAdapter(max_retries=3))

    print(time.strftime('%Y-%m-%d %H:%M:%S'))
    try:
        r = s.get('http://www.google.com.hk', timeout=5)
        html = r.text
    except requests.exceptions.RequestException as e:
        print(e)
    print(time.strftime('%Y-%m-%d %H:%M:%S'))

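max_retries is the maximum number of retries. Three retries plus the initial request make four attempts in total, each limited to 5 seconds, so the code above takes 20 seconds to run rather than 15: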

2018-12-14 15:34:03
HTTPConnectionPool(host='www.google.com.hk', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000000013269630>, 'Connection to www.google.com.hk timed out. (connect timeout=5)'))
2018-12-14 15:34:23
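Besides an integer, `max_retries` also accepts a urllib3 `Retry` object, which gives finer control, for instance a pause between attempts. A minimal sketch, assuming a helper named `make_retry_session` (hypothetical) and an arbitrary `backoff_factor` of 0.5:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retry_session(retries=3, backoff_factor=0.5):
    """Build a Session whose adapters retry with a growing pause between tries."""
    # total caps the number of retries; backoff_factor scales the sleep
    # urllib3 inserts between them. Both values here are illustrative.
    retry = Retry(total=retries, backoff_factor=backoff_factor)
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


s = make_retry_session()
# r = s.get('http://www.google.com.hk', timeout=5)
```

With a backoff configured, the worst-case running time is longer than the plain four attempts of 5 seconds each, since the pauses are added on top.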

That is all for this article; I hope it helps with your learning.

Reposted at: https://www.cnblogs.com/tianleblog/p/11496177.html
