Running multiple Scrapy spiders at the same time with CrawlerProcess

最新推荐文章于 2024-06-27 08:37:41 发布

辉辉咯

Category: Scrapy framework

Article link: https://blog.csdn.net/qq_41020281/article/details/82780382

Straight to the example code:

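# coding: utf8
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
# find_modules only exists in older Werkzeug releases
# (it was later deprecated and removed)
from werkzeug.utils import import_string, find_modules


scope = 'all'
# Load the project's settings.py so pipelines, middlewares, etc. apply
process = CrawlerProcess(settings=get_project_settings())

# Every module directly under the spiders package, e.g. 'CheckSpider.spiders.foo'
for module_string in find_modules('CheckSpider.spiders'):
    module = import_string(module_string)
    # Naming convention: module foo.py defines class FooSpider
    class_string = module_string.split('.')[-1].capitalize() + 'Spider'
    spider_class = getattr(module, class_string)
    # Schedule the spider. Passed as a keyword, scope becomes self.scope;
    # with the default scrapy.Spider.__init__ a positional argument
    # would be taken as the spider's name instead.
    process.crawl(spider_class, scope=scope)

# Start the Twisted reactor; blocks until every scheduled spider has finished
process.start()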

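This is a real use case from my work: there are ten spiders in total, and all ten are started at once. Each process.crawl() call only schedules a spider; process.start() then runs all of them concurrently in the same process and returns once every one has finished.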

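The werkzeug library handles the bulk import of the matching spidercls (the class behind each spider): find_modules lists every module in the CheckSpider.spiders package, import_string imports it, and the class name is derived from the module name (foo.py defines FooSpider). CrawlerProcess is initialized with a settings object so the project configuration applies; get_project_settings() loads it from the project's settings.py.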
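Recent Werkzeug releases no longer include find_modules, so the discovery step can also be done with the standard library alone. The following is a minimal sketch, not the author's code: pkgutil.iter_modules lists the modules directly inside a package (non-recursive, sub-packages skipped, mirroring find_modules' defaults) and importlib.import_module takes the place of import_string. The helper names iter_submodule_names, spider_class_name and load_spider_classes are hypothetical, and the FooSpider naming convention is assumed from the example above. Unlike the original getattr call, modules without a matching class are skipped instead of raising AttributeError.

```python
import importlib
import pkgutil


def iter_submodule_names(package_name):
    """List the dotted names of the modules directly inside a package.

    Hypothetical stdlib stand-in for werkzeug.utils.find_modules(package_name):
    non-recursive, and sub-packages are skipped.
    """
    package = importlib.import_module(package_name)
    names = []
    # iter_modules yields ModuleInfo(module_finder, name, ispkg);
    # the prefix turns bare names into fully dotted ones
    for info in pkgutil.iter_modules(package.__path__, prefix=package_name + "."):
        if not info.ispkg:
            names.append(info.name)
    return sorted(names)


def spider_class_name(module_string):
    """'CheckSpider.spiders.foo' -> 'FooSpider' (assumed naming convention)."""
    return module_string.rsplit(".", 1)[-1].capitalize() + "Spider"


def load_spider_classes(package_name):
    """Import every module in the package and collect the class named by convention."""
    classes = []
    for module_string in iter_submodule_names(package_name):
        module = importlib.import_module(module_string)
        spider_class = getattr(module, spider_class_name(module_string), None)
        # Skip helper modules that do not define a matching class
        if spider_class is not None:
            classes.append(spider_class)
    return classes
```

In a real project, each class returned by load_spider_classes('CheckSpider.spiders') would be handed to process.crawl() as in the loop above. Note that str.capitalize() lowercases everything after the first character, so a module named jdShop.py maps to JdshopSpider, not JdShopSpider.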
