Scrapy spider raises an exception when started more than once

nicajonh

Article link: https://blog.csdn.net/nicajonh/article/details/78071265

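I recently ran into a problem in a Scrapy project: when a spider is scheduled manually through CrawlerProcess more than once, the second start fails with the "Scrapy - Reactor not Restartable" error (Twisted's ReactorNotRestartable exception). The cause is that the Twisted reactor cannot be started again within the same process once it has been stopped.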

Solution:

Start the reactor in a separate process. Example code:

import scrapy
import scrapy.crawler as crawler
from multiprocessing import Process, Queue
from twisted.internet import reactor

# your spider
class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['http://quotes.toscrape.com/tag/humor/']

    def parse(self, response):
        for quote in response.css('div.quote'):
            print(quote.css('span.text::text').extract_first())


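# Runs inside the child process. Defined at module level so that it can be
# pickled when multiprocessing uses the "spawn" start method (Windows, macOS).
def _crawl(q):
    try:
        runner = crawler.CrawlerRunner()
        deferred = runner.crawl(QuotesSpider)
        # stop the reactor once the crawl finishes, whether it succeeded or not
        deferred.addBoth(lambda _: reactor.stop())
        reactor.run()
        q.put(None)
    except Exception as e:
        q.put(e)


# the wrapper that lets the spider run more than once
def run_spider():
    q = Queue()
    p = Process(target=_crawl, args=(q,))
    p.start()
    result = q.get()  # None on success, otherwise the exception from the child
    p.join()

    if result is not None:
        raise result


if __name__ == '__main__':
    print('first run:')
    run_spider()

    print('\nsecond run:')
    run_spider()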

Output:

first run:
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
“A day without sunshine is like, you know, night.”
...

second run:
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
“A day without sunshine is like, you know, night.”
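
The same pattern (do the work in a child process, pass the outcome back through a queue, re-raise failures in the parent) is not specific to Scrapy. Below is a rough, generic sketch of it using only the standard library. The names `run_isolated` and `_child` and the `start_method` parameter are assumptions made for this example, not part of Scrapy or of the code above.

```python
import multiprocessing


def _child(queue, func, args):
    """Runs in the child process: call func and report the outcome."""
    try:
        queue.put(("ok", func(*args)))
    except Exception as exc:  # relay the failure instead of losing it
        queue.put(("error", exc))


def run_isolated(func, *args, start_method=None):
    """Call func(*args) in a fresh process and return its result.

    start_method=None uses the platform default. The result and any raised
    exception must be picklable; with "spawn", func and args must be too.
    """
    ctx = multiprocessing.get_context(start_method)
    queue = ctx.Queue()
    proc = ctx.Process(target=_child, args=(queue, func, args))
    proc.start()
    status, payload = queue.get()  # read before join() so the child can exit
    proc.join()
    if status == "error":
        raise payload
    return payload
```

A module-level function that builds a CrawlerRunner and runs the reactor could then be passed in as `run_isolated(crawl_function)`. Each call gets its own process and therefore a reactor that has never been started. Note that this sketch waits forever if the child dies before putting anything on the queue.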