1.开启多个命令行,分别执行scrapy cralw xxxx
2.编写一个脚本,写入以下代码,执行工程下的所有爬虫:
# -*- coding: utf-8 -*-
# @Time : 25/12/2016 5:35 PM
# @Author : ddvv
# @Site :
# @File : run.py
# @Software: PyCharm
from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerProcess
def main():
setting = get_project_settings()
process = CrawlerProcess(setting)
didntWorkSpider = ['sample']
for spider_name in process.spiders.list():
if spider_name in didntWorkSpider :
continue
print("Running spider %s" % (spider_name))
process.crawl(spider_name)
process.start()
3.使用scrapyd,部署爬虫,通过scrapyd的API调用爬虫
4.推荐使用spiderkeeper或者gerapy,这两个提供的WebUI都很好用,个人更喜欢spiderkeeper一些,因为可以定时运行爬虫。如图: