Scrapy: Defining a Custom Command to Start Spiders
1. Create a directory at the same level as spiders, for example: commands (see the layout sketch below)
2. Inside it, create a file named crawlall.py (the filename becomes the name of the custom command)
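After steps 1 and 2 the project layout might look like this (myproject is a placeholder project name; the empty __init__.py is required so Python treats commands as an importable package):

```
myproject/
├── scrapy.cfg
└── myproject/
    ├── __init__.py
    ├── settings.py
    ├── commands/
    │   ├── __init__.py
    │   └── crawlall.py
    └── spiders/
        └── ...
```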
```python
# crawlall.py
from scrapy.commands import ScrapyCommand


class Command(ScrapyCommand):
    requires_project = True

    def syntax(self):
        return '[options]'

    def short_desc(self):
        return 'Runs all of the spiders'

    def run(self, args, opts):
        # List the names of all spiders in the project.
        # (Recent Scrapy versions use spider_loader; very old versions
        # exposed the same list as self.crawler_process.spiders.list().)
        spider_list = self.crawler_process.spider_loader.list()
        for name in spider_list:
            # Schedule each spider; opts.__dict__ forwards the parsed
            # command options to the spider as keyword arguments.
            self.crawler_process.crawl(name, **opts.__dict__)
        # Start all scheduled crawls and block until they finish.
        self.crawler_process.start()
```
3. In settings.py, add the setting COMMANDS_MODULE = '<project name>.<directory name>' (a concrete example follows this list)
4. Run the custom command from the project directory: scrapy crawlall
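For concreteness, a minimal sketch of the setting, where myproject and commands are placeholders for your own project name and the directory created in step 1:

```python
# settings.py
# 'myproject' and 'commands' are placeholder names; substitute your own
# project name and the directory created in step 1.
COMMANDS_MODULE = 'myproject.commands'
```

With this in place, running `scrapy -h` inside the project should list crawlall among the available commands, and `scrapy crawlall` starts every spider in the project.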
Starting several spiders or a single spider (create a start.py file)
```python
# start.py
from scrapy.cmdline import execute

if __name__ == '__main__':
    # Start a single spider (here one named "github"); --nolog suppresses logging.
    execute(["scrapy", "crawl", "github", "--nolog"])
    # Start all spiders via the custom command:
    # execute(["scrapy", "crawlall", "--nolog"])
```
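Run the script from the project root (the directory containing scrapy.cfg) with python start.py, so that Scrapy can locate the project settings. execute() behaves like typing the same command in a shell; note that it terminates the process once the command finishes, so only one execute() call takes effect per run.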