Background:
I am running a distributed crawler with Scrapy + scrapy-redis.
Scrapy=2.11.1
scrapy-redis=0.7.3
Problem: when running the spider, it fails with TypeError: ExecutionEngine.crawl() got an unexpected keyword argument 'spider'.
Analysis:
The error occurs because ExecutionEngine.crawl() in recent Scrapy versions no longer accepts spider as a keyword argument. scrapy-redis's schedule_next_requests() method (RedisMixin in scrapy_redis/spiders.py) still passes spider as a keyword argument to ExecutionEngine.crawl(), but Scrapy 2.10 removed that parameter, so the call fails on Scrapy 2.11.1.
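A quick way to confirm what the installed Scrapy actually accepts is to inspect the engine method's signature. A minimal check (the exact annotation text printed may vary by version):

import inspect
from scrapy.core.engine import ExecutionEngine

# On Scrapy 2.11.x this should show only a `request` parameter;
# on Scrapy <= 2.9 it also lists a (deprecated) `spider` parameter.
print(inspect.signature(ExecutionEngine.crawl))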
Release notes — Scrapy 2.11.2 documentation:
- Passing a spider argument to the spider_is_idle(), crawl() and download() methods of scrapy.core.engine.ExecutionEngine, deprecated in Scrapy 2.6, is no longer supported. (issue 5994, issue 5998)
Solutions:
1. Downgrade to Scrapy 2.9.0 (see the version note after this list).
2. Replace the schedule_next_requests method so it no longer passes the spider keyword argument (a subclass-based way to apply this is sketched after the list):
def schedule_next_requests(self):
    """Schedules a request if available"""
    # TODO: While there is capacity, schedule a batch of redis requests.
    for req in self.next_requests():
        # self.crawler.engine.crawl(req, spider=self)
        self.crawler.engine.crawl(req)
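For solution 1, pinning the versions is enough, for example: pip install scrapy==2.9.0 scrapy-redis==0.7.3. Scrapy 2.9 still accepts the spider argument (with a deprecation warning), so scrapy-redis 0.7.3 works unmodified.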
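For solution 2, you can apply the fix without editing the installed scrapy-redis package by overriding the method in your own spider class. A minimal sketch, assuming a hypothetical spider name and redis key:

from scrapy_redis.spiders import RedisSpider


class MySpider(RedisSpider):
    # "myspider" and the redis key below are hypothetical placeholders.
    name = "myspider"
    redis_key = "myspider:start_urls"

    def schedule_next_requests(self):
        """Schedules a request if available."""
        for req in self.next_requests():
            # Scrapy >= 2.10 removed the `spider` keyword argument, so pass
            # only the request. This overrides RedisMixin.schedule_next_requests.
            self.crawler.engine.crawl(req)

    def parse(self, response):
        yield {"url": response.url}

Since only schedule_next_requests is overridden, the rest of RedisSpider's behavior (reading start requests from redis, idle handling) is unchanged.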