Project walkthrough: error analysis for scraping key fields from a bidding website

Project scenario:

We need to scrape several key fields from a public bidding (procurement) website.

Problem description:

Running the spider raises KeyError: 'form_data'.

2020-11-22 15:59:26 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://ss.ebnew.com/tradingSearch/index.htm> (referer: None)
2020-11-22 15:59:28 [scrapy.core.scraper] ERROR: Error downloading <POST https://ss.ebnew.com/tradingSearch/index.htm>
Traceback (most recent call last):
  File "D:\python3.8.6\lib\site-packages\twisted\internet\defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "D:\python3.8.6\lib\site-packages\twisted\python\failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\middleware.py", line 45, in process_request
    return (yield download_func(request=request, spider=spider))
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\defer.py", line 55, in mustbe_deferred
    result = f(*args, **kw)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 75, in download_request
    return handler.download_request(request, spider)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 88, in download_request
    return agent.download_request(request)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 342, in download_request
    agent = self._get_agent(request, timeout)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 301, in _get_agent
    _, _, proxyHost, proxyPort, proxyParams = _parse(proxy)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\webclient.py", line 36, in _parse
    return _parsed_url_args(parsed)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\webclient.py", line 19, in _parsed_url_args
    host = to_bytes(parsed.hostname, encoding="ascii")
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\python.py", line 106, in to_bytes
    raise TypeError('to_bytes must receive a str or bytes '
TypeError: to_bytes must receive a str or bytes object, got NoneType
2020-11-22 15:59:30 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://ss.ebnew.com/tradingSearch/index.htm> (referer: https://ss.ebnew.com/tradingSearch/index.htm)
2020-11-22 15:59:31 [scrapy.core.scraper] ERROR: Spider error processing <POST https://ss.ebnew.com/tradingSearch/index.htm> (referer: https://ss.ebnew.com/tradingSearch/index.htm)
Traceback (most recent call last):
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\defer.py", line 120, in iter_errback
    yield next(it)
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
    return next(self.data)
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
    return next(self.data)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 340, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\爬虫\pythonProject\CSDN热门爬虫\myspider\myspider\spiders\bilian.py", line 96, in parse_page1
    form_data=response.meta['form_data']
KeyError: 'form_data'
2020-11-22 15:59:31 [scrapy.core.engine] INFO: Closing spider (finished)

Cause analysis:

The traceback points to the following lines:

File "D:\爬虫\pythonProject\CSDN热门爬虫\myspider\myspider\spiders\bilian.py", line 96, in parse_page1
    form_data=response.meta['form_data']
KeyError: 'form_data'
    def parse_page1(self, response):
        form_data=response.meta['form_data']
        keyword=form_data.get('key')

After consulting a senior colleague on this codebase:
response.meta only contains data that was attached to the same request object
that produced the response, so the meta key must be set on the request before
it is yielded. See the solution below.

Solution:

        # Attach the form data to the request before yielding it,
        # so the callback can read it back via response.meta.
        request.meta['form_data'] = form_data
        yield request

    def parse_page1(self, response):
        form_data = response.meta['form_data']
        keyword = form_data.get('key')
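The mechanism behind the fix can be shown without running Scrapy at all: in Scrapy, response.meta is simply a shortcut for response.request.meta, so a key is visible in the callback only if it was set on the very request that produced the response. A minimal sketch of that behavior, using hypothetical stand-in classes (not Scrapy's real Request/Response):

```python
# Stand-ins that mimic the relevant part of Scrapy's API:
# Response.meta just forwards to the originating request's meta dict.
class Request:
    def __init__(self, url, callback=None, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta if meta is not None else {}

class Response:
    def __init__(self, request):
        self.request = request

    @property
    def meta(self):
        # Mirrors Scrapy: response.meta is response.request.meta
        return self.request.meta

def parse_page1(response):
    # Raises KeyError if 'form_data' was never set on the request
    form_data = response.meta['form_data']
    return form_data.get('key')

# Attach the payload to the request before it is "sent":
form_data = {'key': '招标'}
req = Request('https://ss.ebnew.com/tradingSearch/index.htm',
              callback=parse_page1,
              meta={'form_data': form_data})
resp = Response(req)
print(parse_page1(resp))  # -> 招标
```

If the request is built without meta={'form_data': ...}, the callback's lookup fails with exactly the KeyError: 'form_data' seen in the log above.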

If you need the full source code, see:
the consulting-firm bidding information collection platform (咨询公司招标信息采集平台)
