Running multiple Scrapy spiders at the same time

飞锡2024

Last modified 2022-02-19 14:48:25


Category: Crawlers. Tags: crawler, python

First published 2021-08-16 15:51:02

Article link: https://blog.csdn.net/weixin_38235865/article/details/119734595


Running the spiders

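import datetime as dt

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Crawl with both spiders at the same time, in one process.
# Each spider writes to <spider name><date>.json, e.g. 爬虫A2021-08-16.json.
today = dt.datetime.now().strftime('%Y-%m-%d')

settings = get_project_settings()
# '-o' is a command-line option of `scrapy crawl`, not a spider argument, so the
# output file is set through the FEEDS setting (Scrapy 2.1+) instead.
# %(name)s is replaced by the name of the spider that produces the feed.
settings.set('FEEDS', {'%(name)s' + today + '.json': {'format': 'json'}})

process = CrawlerProcess(settings)
process.crawl('爬虫A')
process.crawl('爬虫B')

# start() blocks until every spider has finished. Call it only once:
# the Twisted reactor cannot be restarted.
process.start()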
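
When the spiders are launched from a script rather than with `scrapy crawl -o`, the output file is normally configured through Scrapy's `FEEDS` setting, whose keys may contain a `%(name)s` placeholder for the spider name. Below is a minimal sketch of that idea, assuming Scrapy 2.1 or later. `dated_feeds`, `expand_feed_uri` and `run_spiders` are hypothetical helper names, not part of Scrapy, and `expand_feed_uri` only imitates the substitution so the resulting file names can be previewed.

```python
import datetime as dt


def dated_feeds(day):
    """Build a FEEDS setting whose file name is '<spider name><date>.json'."""
    # %(name)s is a placeholder that Scrapy fills in with each spider's name.
    return {'%(name)s' + day.strftime('%Y-%m-%d') + '.json': {'format': 'json'}}


def expand_feed_uri(uri_template, spider_name):
    """Hypothetical helper: imitate the placeholder substitution for a preview."""
    return uri_template % {'name': spider_name}


def run_spiders(spider_names):
    """Run the named spiders in one process, each writing to its own dated file."""
    # Imported here so the helpers above can be tried without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    settings = get_project_settings()
    settings.set('FEEDS', dated_feeds(dt.date.today()))
    process = CrawlerProcess(settings)
    for spider_name in spider_names:
        process.crawl(spider_name)
    process.start()  # blocks until all spiders have finished; call it once


# Preview the file names the two spiders would write to on a given day.
preview = [
    expand_feed_uri(uri_template, spider_name)
    for uri_template in dated_feeds(dt.date(2021, 8, 16))
    for spider_name in ('爬虫A', '爬虫B')
]
```

Calling `run_spiders(['爬虫A', '爬虫B'])` from the project directory would then start both spiders. For the date above, `preview` holds `爬虫A2021-08-16.json` and `爬虫B2021-08-16.json`.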

Spider run parameters
[Image: spider run parameters]

Setting different pipelines for different spiders

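Method 1: check spider.name inside the pipeline.

class CrawlersPipeline:
    def process_item(self, item, spider):
        if spider.name not in ['爬虫A', '爬虫B']:
            # Not one of our spiders: pass the item through unchanged.
            return item
        # ... processing for 爬虫A / 爬虫B goes here ...
        # Always return the item; an implicit None would reach the next pipeline.
        return item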
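
As a minimal sketch of Method 1, the pipeline below only touches items that come from the listed spiders and returns every item, so later pipelines still receive it. The whitespace-stripping step and the `FakeSpider` stand-in are assumptions added for illustration; a real pipeline would receive the actual spider object from Scrapy.

```python
class CrawlersPipeline:
    """Clean items from selected spiders; pass everything else through unchanged."""

    target_spiders = {'爬虫A', '爬虫B'}

    def process_item(self, item, spider):
        if spider.name in self.target_spiders:
            # Hypothetical processing step: strip whitespace from string fields.
            for key, value in list(item.items()):
                if isinstance(value, str):
                    item[key] = value.strip()
        # Return the item in every case so later pipelines still receive it.
        return item


class FakeSpider:
    """Stand-in for a Scrapy spider; only the name attribute is needed here."""

    def __init__(self, name):
        self.name = name


pipeline = CrawlersPipeline()
cleaned = pipeline.process_item({'title': '  aspirin  '}, FakeSpider('爬虫A'))
untouched = pipeline.process_item({'title': '  aspirin  '}, FakeSpider('other'))
```

Here `cleaned` has its title stripped, while `untouched` comes back exactly as it went in.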

Method 2: configure the pipeline per spider with custom_settings.

ITEM_PIPELINES = {
    'medicine_crawlers.pipelines.ACrawlersPipeline': 300,
    'medicine_crawlers.pipelines.BCrawlersPipeline': 300,
}

In spider A's spider file:

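import scrapy

class longyi_spider(scrapy.Spider):
    name = '***'
    allowed_domains = ['***.com']
    start_urls = ['***']
    # ITEM_PIPELINES must be a dict that maps the pipeline's import path
    # (a string) to its order; this overrides the project-level setting.
    custom_settings = {
        'ITEM_PIPELINES': {'medicine_crawlers.pipelines.ACrawlersPipeline': 300},
    }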

Spider B follows the same format.
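
The post does not show the two pipeline classes themselves. The sketch below is one way they could look in `medicine_crawlers/pipelines.py`, together with the `custom_settings` value spider B would use. The class bodies and the `handled_by` field are assumptions made up for illustration, and `SPIDER_B_CUSTOM_SETTINGS` is a hypothetical name for the dict that would be assigned to `custom_settings` in spider B's class.

```python
class ACrawlersPipeline:
    """Hypothetical pipeline used only by spider A."""

    def process_item(self, item, spider):
        # Mark the item so it is visible which pipeline handled it.
        item['handled_by'] = 'ACrawlersPipeline'
        return item


class BCrawlersPipeline:
    """Hypothetical pipeline used only by spider B."""

    def process_item(self, item, spider):
        item['handled_by'] = 'BCrawlersPipeline'
        return item


# Value for spider B's custom_settings: a dict that maps the pipeline's
# import path (a string) to its order, mirroring spider A's setting.
SPIDER_B_CUSTOM_SETTINGS = {
    'ITEM_PIPELINES': {'medicine_crawlers.pipelines.BCrawlersPipeline': 300},
}
```

With per-spider settings like this, each spider's items pass only through the pipeline named in its own `custom_settings`, so no `spider.name` check is needed inside the pipeline.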
