Locally run all of the spiders in Scrapy

woshizoe


http://stackoverflow.com/questions/15564844/locally-run-all-of-the-spiders-in-scrapy

Accepted answer (score: 7):

Here is an example that does not run inside a custom command, but runs the Reactor manually and creates a new Crawler for each spider:

from twisted.internet import reactor
from scrapy.crawler import Crawler
# The scrapy.conf.settings singleton was deprecated last year
from scrapy.utils.project import get_project_settings
from scrapy import log

def setup_crawler(spider_name):
    # Create, configure, and start a separate crawler for one spider.
    # Relies on the module-level `settings` assigned below.
    crawler = Crawler(settings)
    crawler.configure()
    spider = crawler.spiders.create(spider_name)
    crawler.crawl(spider)
    crawler.start()

log.start()
settings = get_project_settings()

# This crawler is only used to list the spiders in the project.
crawler = Crawler(settings)
crawler.configure()

for spider_name in crawler.spiders.list():
    setup_crawler(spider_name)

# Blocks until something stops the reactor.
reactor.run()

You will have to design some signal system to stop the reactor when all spiders are finished.
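
One way to approach such a signal system is to count the spiders that are still running and stop the reactor when the count reaches zero. The sketch below is not part of the original answer; the class name `ReactorStopper` and its methods are hypothetical, and it is kept independent of Scrapy so that it can be run on its own. In a real script, `spider_closed` would presumably be called from a handler connected to Scrapy's `spider_closed` signal on each crawler, and `reactor.stop` would be passed as the `stop` callable.

```python
class ReactorStopper:
    """Hypothetical helper: calls `stop` once every tracked spider has closed."""

    def __init__(self, spider_names, stop):
        # Names of the spiders that have not finished yet.
        self.pending = set(spider_names)
        # Callable that ends the run; assumed to be reactor.stop in a real script.
        self.stop = stop
        self.stopped = False

    def spider_closed(self, spider_name):
        # Mark this spider as finished; unknown names are ignored.
        self.pending.discard(spider_name)
        # Stop only once, and only after the last spider has closed.
        if not self.pending and not self.stopped:
            self.stopped = True
            self.stop()


# Stand-alone demonstration with a fake stop function instead of reactor.stop.
calls = []
stopper = ReactorStopper(["spider_a", "spider_b"], stop=lambda: calls.append("stopped"))
stopper.spider_closed("spider_a")   # one spider still pending, nothing happens
stopper.spider_closed("spider_b")   # last spider closed, stop is called
```

Note that if the project contains no spiders at all, `spider_closed` is never called and nothing stops the reactor, so a real script would need to handle that case separately.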

EDIT: And here is how you can run multiple spiders in a custom command:

from scrapy.command import ScrapyCommand
from scrapy.utils.project import get_project_settings
from scrapy.crawler import Crawler

class Command(ScrapyCommand):

    # The command only makes sense inside a Scrapy project.
    requires_project = True

    def syntax(self):
        return '[options]'

    def short_desc(self):
        return 'Runs all of the spiders'

    def run(self, args, opts):
        settings = get_project_settings()

        # Set up one crawler per spider found in the project.
        for spider_name in self.crawler.spiders.list():
            crawler = Crawler(settings)
            crawler.configure()
            spider = crawler.spiders.create(spider_name)
            crawler.crawl(spider)
            crawler.start()

        # Start the command's own crawler.
        self.crawler.start()
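
For Scrapy to find a custom command, the project settings have to name the package that holds the command modules, via the `COMMANDS_MODULE` setting. The fragment below is a minimal sketch and not part of the original answer; the package name `myproject` and the file name `crawlall.py` are hypothetical.

```python
# settings.py of a hypothetical project package called "myproject".
#
# Assumed layout (names are illustrative only):
#   myproject/commands/__init__.py
#   myproject/commands/crawlall.py   <- contains the Command class
#
# COMMANDS_MODULE tells Scrapy which package to search for custom commands.
COMMANDS_MODULE = "myproject.commands"
```

With this layout the command takes its name from the module file, so it would be invoked as `scrapy crawlall` from inside the project directory.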
