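Scrapy lets us configure a number of settings, such as DOWNLOAD_TIMEOUT. I usually set it to 10, which means a request may spend at most 10 seconds downloading before it is treated as timed out (see the Scrapy documentation for details).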
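As a minimal sketch of the project-wide setting described above, the fragment below shows how it might look in a project's settings.py. The file location assumes a standard Scrapy project layout. The value 10 is the one used in this post; Scrapy's own default is 180 seconds.

```python
# settings.py (sketch; assumes a standard Scrapy project layout)

# Maximum time, in seconds, the downloader waits before a request times out.
# Scrapy's default is 180; 10 is the value preferred in this post.
DOWNLOAD_TIMEOUT = 10
```

A single request can override this value through the download_timeout key in its meta dict.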
If a download times out, Scrapy raises an error. For example:
def start_requests(self):
    # Override the timeout for this request only; 0.1 s is deliberately too short
    yield scrapy.Request('https://www.baidu.com/', meta={
        'download_timeout': 0.1})
With the log level set to DEBUG and retries set to 3, running the spider produces the following log:
2019-05-23 19:38:01 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.baidu.com/> (failed 1 times): User timeout caused connection failure: Getting https://www.baidu.com/ took longer than 0.1 seconds..
2019-05-23 19:38:01 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.baidu.com/> (failed 2 times): User t