Python: running multiple spiders in a for loop

I am trying to instantiate multiple spiders. The first one works fine, but the second one gives me an error: ReactorNotRestartable.

feeds = {
    'nasa': {
        'name': 'nasa',
        'url': 'https://www.nasa.gov/rss/dyn/breaking_news.rss',
        'start_urls': ['https://www.nasa.gov/rss/dyn/breaking_news.rss']
    },
    'xkcd': {
        'name': 'xkcd',
        'url': 'http://xkcd.com/rss.xml',
        'start_urls': ['http://xkcd.com/rss.xml']
    }
}

With the feed entries above, I try to run two spiders in a loop, like this:

from scrapy.crawler import CrawlerProcess
from scrapy.spiders import XMLFeedSpider

class MySpider(XMLFeedSpider):

    name = None

    def __init__(self, **kwargs):
        this_feed = feeds[self.name]
        self.start_urls = this_feed.get('start_urls')
        self.iterator = 'iternodes'
        self.itertag = 'items'
        super(MySpider, self).__init__(**kwargs)

    def parse_node(self, response, node):
        pass

def start_crawler():
    process = CrawlerProcess({
        'USER_AGENT': CONFIG['USER_AGENT'],
        'DOWNLOAD_HANDLERS': {'s3': None}  # boto issues
    })

    for feed_name in feeds.keys():
        MySpider.name = feed_name
        process.crawl(MySpider)
        process.start()

On the second iteration of the loop the exception looks like this. The spider is opened, but then:

...
2015-11-22 00:00:00 [scrapy] INFO: Spider opened
2015-11-22 00:00:00 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-11-22 00:00:00 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-11-21 23:54:05 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
Traceback (most recent call last):
  File "env/bin/start_crawler", line 9, in <module>
    load_entry_point('feed-crawler==0.0.1', 'console_scripts', 'start_crawler')()
  File "/Users/bling/py-feeds-crawler/feed_crawler/crawl.py", line 51, in start_crawler
    process.start() # the script will block here until the crawling is finished
  File "/Users/bling/py-feeds-crawler/env/lib/python2.7/site-packages/scrapy/crawler.py", line 251, in start
    reactor.run(installSignalHandlers=False) # blocking call
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1193, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1173, in startRunning
    ReactorBase.startRunning(self)
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/base.py", line 684, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable

Do I have to invalidate the first MySpider somehow, or am I doing something wrong and need to change how this works? Thanks in advance.
