https://doc.scrapy.org/en/latest/topics/api.html#crawler-api
Method | Description |
---|---|
crawl(crawler_or_spidercls, *args, **kwargs) | Runs a crawler with the provided arguments. |
crawlers | The set of crawlers started by crawl() and managed by this class. |
create_crawler(crawler_or_spidercls) | Returns a Crawler object. |
join() | Returns a deferred that is fired when all managed crawlers have completed their executions. |
start(stop_after_crawl=True) | Starts a Twisted reactor. If stop_after_crawl is True, the reactor is stopped once all crawlers have finished. |
stop() | Stops all crawlers in progress. |
```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

# 'followall' is the name of one of the spiders of the project.
process.crawl('followall', domain='scrapinghub.com')
process.start()  # the script will block here until the crawling is finished
```