Spider repo: https://github.com/Karmenzind/EasyGoSpider
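The requirements here are:
- When the returned JSON contains {"code": 0}, add the request to the retry queue.
- If the JSON indicates that the cookie has been banned, fix up the cookie list accordingly.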
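To make the two requirements concrete, here is a minimal sketch of the decision logic in the shape of a Scrapy downloader middleware. It is not the code from the repo: the class name `JsonRetryMiddleware`, the `cookie_pool` list, and the `cookie_banned` JSON flag are all assumptions made for illustration, since the real ban marker is not stated above. Only `response.text`, `request.cookies`, and `request.replace(dont_filter=True)` are relied on from Scrapy.

```python
import json


def needs_retry(payload):
    """Return True when the decoded JSON body carries {"code": 0}."""
    return isinstance(payload, dict) and payload.get("code") == 0


def cookie_banned(payload):
    # Hypothetical rule: the actual ban marker is not given in the text,
    # so a truthy "cookie_banned" field is assumed here.
    return isinstance(payload, dict) and bool(payload.get("cookie_banned"))


class JsonRetryMiddleware:
    """Illustrative downloader middleware; name and cookie pool are assumptions."""

    def __init__(self, cookie_pool=None):
        # List of cookie dicts the spider may rotate through (assumed design).
        self.cookie_pool = list(cookie_pool or [])

    def process_response(self, request, response, spider):
        try:
            payload = json.loads(response.text)
        except ValueError:
            # Body is not JSON: hand the response on unchanged.
            return response

        if cookie_banned(payload):
            # Drop the cookies used by this request from the pool, if present.
            banned = request.cookies
            if banned in self.cookie_pool:
                self.cookie_pool.remove(banned)

        if needs_retry(payload):
            # Returning a Request reschedules it; dont_filter=True keeps the
            # duplicate filter from discarding the repeated URL.
            return request.replace(dont_filter=True)

        return response
```

A real version would also cap the number of retries per request (for example with a counter in `request.meta`), otherwise a page that always answers {"code": 0} would be rescheduled forever.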
A comment in the source code reads:
Failed pages are collected on the scraping process and rescheduled at the end, once the spider has finished crawling all regular (non failed) pages.
Next, the Scrapy docs describe process_response for a generic downloader middleware as follows:
If it returns a Request object, the middleware ch