Python Web Crawler Case Studies: Static Page Crawling: Setting a Timeout

Latest recommended article published on 2024-09-14 19:55:48

andyyah晓波

Article link: https://blog.csdn.net/andyyah/article/details/141099110

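Sometimes a crawler hits a server that takes a very long time to respond. The crawler then waits indefinitely and never finishes its work. To avoid this, pass the timeout parameter to Requests so that it stops waiting for a response after the given number of seconds. In other words, if the server does not answer within timeout seconds, an exception is raised.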
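A minimal sketch of how a crawler might handle this exception instead of crashing. The helper name `fetch_with_timeout` and its "return None on timeout" behavior are assumptions made for illustration; they are not part of Requests or of the original article. It relies only on `requests.get(..., timeout=...)` and `requests.exceptions.Timeout`, which is the base class of both `ConnectTimeout` and `ReadTimeout`.

```python
import requests


def fetch_with_timeout(url, timeout=20):
    """Return the response body as text, or None if the request times out.

    Hypothetical helper for illustration; the name and the return-None
    convention are assumptions, not part of the Requests library.
    """
    try:
        # Stop waiting if the server does not answer within `timeout` seconds.
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout as exc:
        # Timeout covers both ConnectTimeout and ReadTimeout.
        print(f"Request to {url} timed out: {exc}")
        return None
    return response.text
```

Calling `fetch_with_timeout('http://github.com', timeout=20)` would then either return the page text or print a message and return None, so the crawler can move on to the next URL. Other network errors, such as a refused connection, are deliberately not caught here and would still propagate.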

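Set this value to 0.001 seconds and see what exception is raised. This tiny value is chosen only to demonstrate the timeout exception; in practice it is usually set to around 20 seconds.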

>>> import requests
>>> requests.get('http://github.com', timeout=0.001)

Traceback (most recent call last):
  File "D:\Python37\lib\site-packages\requests\adapters.py", line 497, in send
    chunked=chunked,
  File "D:\Python37\lib\site-packages\urllib3\connectionpool.py", line 846, in urlopen
    method, url, error=new_e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "D:\Python37\lib\site-packages\urllib3\util\retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='github.com', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000017A29F6E048>, 'Connection to github.com timed out. (connect timeout=0.001)'))

During handling of the above exception, another exception occurred: