crawl syntax: scrapy crawl <spider_name>
Requires a project: yes
1. Run it in a cmd window: $ scrapy crawl myspider
[ ... myspider starts crawling ... ]
2. Running from PyCharm
When you run a command like scrapy command arg, scrapy is really just a Python script: it takes the arguments and calls the execute() function in scrapy/cmdline.py. With a few configuration steps you can run the same command from PyCharm.
3. Calling from a Python script
Here scrapy.crawler.CrawlerProcess is used to run a spider from inside a script.
# -*- coding: utf-8 -*-
from scrapy.crawler import CrawlerProcess
from scrapy.settings import Settings
# import the spider (placeholder module path and class name)
from project.spiders.spider_name import spider_class
# configure settings; custom settings attributes can be set here
settings = Settings()
# pass the settings in, otherwise they have no effect
process = CrawlerProcess(settings)
# run the spider
process.crawl(spider_class)
process.start()
4. Running a spider with CrawlerRunner
# -*- coding: utf-8 -*-
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.settings import Settings
# import the spider (placeholder module path and class name)
from project.spiders.spider_name import spider_class
# configure settings; custom settings attributes can be set here
settings = Settings()
# run the spider
runner = CrawlerRunner(settings)
d = runner.crawl(spider_class)
# stop the reactor when the crawl finishes, otherwise reactor.run() never returns
d.addBoth(lambda _: reactor.stop())
reactor.run()