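When we use Scrapy to crawl data several times from the same script, we run into the error 'reactor already installed'. This happens, for example, when the script uses: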
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
......
process = CrawlerProcess(get_project_settings())
process.crawl('xxx')
process.start()
There are two ways to solve this:
1. Modify the script following the official documentation (Common Practices — Scrapy 2.6.1 documentation):
from twisted.internet import reactor
import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runner = CrawlerRunner()

d = runner.crawl(MySpider)
d.addBoth(lambda _: reactor.stop())
reactor.run()  # the script will block here until the crawling is finished
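Since the original problem is crawling several times from one script, the CrawlerRunner pattern can be extended to run spiders one after another on a single reactor. The following is only a sketch under stated assumptions: `crawl_in_sequence` is a hypothetical helper name (not part of Scrapy), and it relies only on `runner.crawl()` returning a Deferred that supports `addBoth`, and on `reactor.stop()`.

```python
def crawl_in_sequence(runner, reactor, spiders):
    """Crawl each spider in order on one reactor, then stop the reactor.

    Hypothetical helper: `runner` is expected to behave like
    scrapy.crawler.CrawlerRunner and `reactor` like twisted.internet.reactor.
    """
    remaining = list(spiders)
    if not remaining:
        # Stopping a reactor that never ran would be an error, so refuse early.
        raise ValueError("at least one spider is required")

    def crawl_next(_result=None):
        # Called once at the start and again whenever a crawl finishes
        # (successfully or with a failure, because of addBoth).
        if not remaining:
            reactor.stop()
            return
        runner.crawl(remaining.pop(0)).addBoth(crawl_next)

    crawl_next()


# Intended usage (assumes Scrapy and Twisted are installed; MySpider and
# OtherSpider are placeholder spider classes):
#
#     from twisted.internet import reactor
#     from scrapy.crawler import CrawlerRunner
#
#     crawl_in_sequence(CrawlerRunner(), reactor, [MySpider, OtherSpider])
#     reactor.run()  # blocks until the last crawl has finished
```

Passing the runner and reactor in as arguments keeps the helper free of import-time side effects, so importing it never installs a reactor by itself.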
2. Remove the already-installed reactor from sys.modules before calling process.start():
import sys
if "twisted.internet.reactor" in sys.modules:
    del sys.modules["twisted.internet.reactor"]
process.start()
With this in place, the 'reactor already installed' error goes away.
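If the sys.modules cleanup is needed in more than one place, it can be wrapped in a small function. This is a minimal sketch, not an official Scrapy or Twisted API: `forget_installed_reactor` is a hypothetical name, and the optional `modules` argument is an assumption added only so the function can be tried against a plain dict instead of the real `sys.modules`.

```python
import sys

REACTOR_MODULE = "twisted.internet.reactor"


def forget_installed_reactor(modules=None):
    """Drop the cached reactor module; return True if an entry was removed.

    Hypothetical helper wrapping the `del sys.modules[...]` workaround.
    """
    if modules is None:
        modules = sys.modules
    # pop() with a default avoids a KeyError when no reactor was imported yet.
    return modules.pop(REACTOR_MODULE, None) is not None
```

It would be called right before `process.start()`, in place of the inline `if`/`del` lines.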