2021-01-29

NameError: Module 'scrapy_redis.dupefilter' doesn't define any object named 'RFPDuperFilter'

Running the RedisCrawlSpider raised the following error:

(crawlvenv) [hdfs@miaoshou1x spiders]$ scrapy runspider zce_spider1.py 
2021-01-29 15:30:38 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: futures_market)
2021-01-29 15:30:38 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.7.0 (default, Jan 28 2021, 23:24:42) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)], pyOpenSSL 20.0.1 (OpenSSL 1.1.1i  8 Dec 2020), cryptography 3.3.1, Platform Linux-3.10.0-1062.el7.x86_64-x86_64-with-centos-7.7.1908-Core
2021-01-29 15:30:38 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2021-01-29 15:30:38 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'futures_market',
 'DOWNLOAD_DELAY': 2,
 'DUPEFILTER_CLASS': 'scrapy_redis.dupefilter.RFPDuperFilter',
 'NEWSPIDER_MODULE': 'futures_market.spiders',
 'SCHEDULER': 'scrapy_redis.scheduler.Scheduler',
 'SPIDER_LOADER_WARN_ONLY': True,
 'SPIDER_MODULES': ['futures_market.spiders']}
2021-01-29 15:30:38 [scrapy.extensions.telnet] INFO: Telnet Password: a0e84c030ce0acdf
2021-01-29 15:30:38 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2021-01-29 15:30:38 [zce_spider1] INFO: Reading start URLs from redis key 'zce:start_urls' (batch size: 16, encoding: utf-8
2021-01-29 15:30:38 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:57145/session {"capabilities": {"firstMatch": [{}], "alwaysMatch": {"browserName": "chrome", "platformName": "any", "goog:chromeOptions": {"extensions": [], "args": ["--headless", "--no-sandbox", "--disable-dev-shm-usage", "blink-settings=imagesEnabled=false", "--disable-gpu"]}}}, "desiredCapabilities": {"browserName": "chrome", "version": "", "platform": "ANY", "goog:chromeOptions": {"extensions": [], "args": ["--headless", "--no-sandbox", "--disable-dev-shm-usage", "blink-settings=imagesEnabled=false", "--disable-gpu"]}}}
2021-01-29 15:30:38 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 127.0.0.1:57145
2021-01-29 15:30:38 [urllib3.connectionpool] DEBUG: http://127.0.0.1:57145 "POST /session HTTP/1.1" 200 716
2021-01-29 15:30:38 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2021-01-29 15:30:38 [scrapy.middleware] INFO: Enabled downloader middlewares:
['futures_market.middlewares.ZCESpiderSeleniumDownloaderMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-01-29 15:30:38 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-01-29 15:30:38 [scrapy.middleware] INFO: Enabled item pipelines:
['scrapy_redis.pipelines.RedisPipeline']
2021-01-29 15:30:38 [scrapy.core.engine] INFO: Spider opened
2021-01-29 15:30:38 [scrapy.core.engine] INFO: Closing spider (shutdown)
2021-01-29 15:30:38 [scrapy.core.engine] ERROR: Scraper close failure
Traceback (most recent call last):
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/scrapy/utils/misc.py", line 65, in load_object
    obj = getattr(mod, name)
AttributeError: module 'scrapy_redis.dupefilter' has no attribute 'RFPDuperFilter'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/scrapy/crawler.py", line 89, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
NameError: Module 'scrapy_redis.dupefilter' doesn't define any object named 'RFPDuperFilter'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/scrapy/core/engine.py", line 325, in <lambda>
    dfd.addBoth(lambda _: self.scraper.close_spider(spider))
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/scrapy/core/scraper.py", line 86, in close_spider
    slot.closing = defer.Deferred()
AttributeError: 'NoneType' object has no attribute 'closing'
2021-01-29 15:30:38 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method CoreStats.spider_closed of <scrapy.extensions.corestats.CoreStats object at 0x7fb346a40860>>
Traceback (most recent call last):
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/scrapy/utils/misc.py", line 65, in load_object
    obj = getattr(mod, name)
AttributeError: module 'scrapy_redis.dupefilter' has no attribute 'RFPDuperFilter'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/scrapy/crawler.py", line 89, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
NameError: Module 'scrapy_redis.dupefilter' doesn't define any object named 'RFPDuperFilter'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/scrapy/utils/defer.py", line 157, in maybeDeferred_coro
    result = f(*args, **kw)
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/scrapy/extensions/corestats.py", line 31, in spider_closed
    elapsed_time = finish_time - self.start_time
TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'NoneType'
2021-01-29 15:30:38 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'log_count/DEBUG': 4, 'log_count/ERROR': 2, 'log_count/INFO': 9}
2021-01-29 15:30:38 [scrapy.core.engine] INFO: Spider closed (shutdown)
Unhandled error in Deferred:
2021-01-29 15:30:38 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/scrapy/crawler.py", line 192, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/scrapy/crawler.py", line 196, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/scrapy/crawler.py", line 89, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
builtins.NameError: Module 'scrapy_redis.dupefilter' doesn't define any object named 'RFPDuperFilter'

2021-01-29 15:30:38 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/scrapy/utils/misc.py", line 65, in load_object
    obj = getattr(mod, name)
AttributeError: module 'scrapy_redis.dupefilter' has no attribute 'RFPDuperFilter'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/hdfs/.virtualenvs/crawlvenv/lib/python3.7/site-packages/scrapy/crawler.py", line 89, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
NameError: Module 'scrapy_redis.dupefilter' doesn't define any object named 'RFPDuperFilter'

The traceback never says which of my own modules is wrong: every path it mentions points into the interpreter's site-packages, as if it were trying to tell me that my program breaks somewhere deep in library code. But

scrapy_redis.dupefilter.RFPDuperFilter

looked familiar, so I went digging around...

Cause:

# Scrapy-Redis settings
# Scheduler: replace Scrapy's default scheduler with the one provided by the
# Scrapy-Redis component, so requests are stored in Redis
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Use the dedupe filter class provided by the Scrapy-Redis component
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDuperFilter"

# Set the item pipeline to the Scrapy-Redis RedisPipeline
ITEM_PIPELINES = {
   # 'fang.pipelines.FangPipeline': 300,
   'scrapy_redis.pipelines.RedisPipeline': 300,
}

# Scheduler persistence: keep the queues used by the Scrapy-Redis component
# in Redis instead of clearing them, so the crawl can be paused and resumed
# (and doesn't re-crawl from the first URL every time)
SCHEDULER_PERSIST = True

# Redis connection info
REDIS_HOST = '192.168.1.21'
REDIS_PORT = 6379

The problem is in this chunk of the settings file, the block copy-pasted in specifically for the scrapy_redis module!

DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDuperFilter"

RFPDuperFilter is misspelled; change it to RFPDupeFilter, like so:

DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

With that fixed, the spider finally ran. These dotted-path strings tolerate no errors at all. Next time I hit a case like this, I'll read the failing name carefully, check for stray spaces or spelling mistakes, and search the modules themselves for the name in the error message.
