Project scenario:
We need to scrape key fields from a bidding/tendering website.
Problem description:
When running the spider, Scrapy raises KeyError: 'form_data'.
2020-11-22 15:59:26 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://ss.ebnew.com/tradingSearch/index.htm> (referer: None)
2020-11-22 15:59:28 [scrapy.core.scraper] ERROR: Error downloading <POST https://ss.ebnew.com/tradingSearch/index.htm>
Traceback (most recent call last):
File "D:\python3.8.6\lib\site-packages\twisted\internet\defer.py", line 1416, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "D:\python3.8.6\lib\site-packages\twisted\python\failure.py", line 512, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\middleware.py", line 45, in process_request
return (yield download_func(request=request, spider=spider))
File "D:\python3.8.6\lib\site-packages\scrapy\utils\defer.py", line 55, in mustbe_deferred
result = f(*args, **kw)
File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 75, in download_request
return handler.download_request(request, spider)
File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 88, in download_request
return agent.download_request(request)
File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 342, in download_request
agent = self._get_agent(request, timeout)
File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 301, in _get_agent
_, _, proxyHost, proxyPort, proxyParams = _parse(proxy)
File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\webclient.py", line 36, in _parse
return _parsed_url_args(parsed)
File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\webclient.py", line 19, in _parsed_url_args
host = to_bytes(parsed.hostname, encoding="ascii")
File "D:\python3.8.6\lib\site-packages\scrapy\utils\python.py", line 106, in to_bytes
raise TypeError('to_bytes must receive a str or bytes '
TypeError: to_bytes must receive a str or bytes object, got NoneType
2020-11-22 15:59:30 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://ss.ebnew.com/tradingSearch/index.htm> (referer: https://ss.ebnew.com/tradingSearch/index.htm)
2020-11-22 15:59:31 [scrapy.core.scraper] ERROR: Spider error processing <POST https://ss.ebnew.com/tradingSearch/index.htm> (referer: https://ss.ebnew.com/tradingSearch/index.htm)
Traceback (most recent call last):
File "D:\python3.8.6\lib\site-packages\scrapy\utils\defer.py", line 120, in iter_errback
yield next(it)
File "D:\python3.8.6\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
return next(self.data)
File "D:\python3.8.6\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
return next(self.data)
File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
for x in result:
File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 340, in <genexpr>
return (_set_referer(r) for r in result or ())
File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
return (r for r in result or () if _filter(r))
File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
return (r for r in result or () if _filter(r))
File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "D:\爬虫\pythonProject\CSDN热门爬虫\myspider\myspider\spiders\bilian.py", line 96, in parse_page1
form_data=response.meta['form_data']
KeyError: 'form_data'
2020-11-22 15:59:31 [scrapy.core.engine] INFO: Closing spider (finished)
Cause analysis:
From the error message, the failure is located at:
File "D:\爬虫\pythonProject\CSDN热门爬虫\myspider\myspider\spiders\bilian.py", line 96, in parse_page1
form_data=response.meta['form_data']
KeyError: 'form_data'
def parse_page1(self, response):
    form_data = response.meta['form_data']
    keyword = form_data.get('key')
After consulting an experienced colleague: the response and the request must refer to the same object. response.meta is a copy of the meta dict set on the Request that produced the response, so the key is only present if it was attached to that very request before it was yielded. The fix is shown below.
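Why the key can go missing: Scrapy initializes response.meta from the meta dict of the Request that produced that response. If the request was yielded without meta, the callback's lookup fails with exactly the KeyError in the traceback. A minimal pure-Python sketch of the mechanism (hypothetical Request/Response stand-ins, not Scrapy's real classes):

```python
class Request:
    """Toy stand-in: carries a user-supplied meta dict, like scrapy.Request."""
    def __init__(self, url, meta=None):
        self.url = url
        self.meta = meta if meta is not None else {}

class Response:
    """Toy stand-in: response.meta is just the originating request's meta."""
    def __init__(self, request):
        self.request = request
        self.meta = request.meta

# Request yielded WITH meta -> the callback finds the key:
ok = Response(Request("https://ss.ebnew.com/tradingSearch/index.htm",
                      meta={"form_data": {"key": "bid"}}))
print(ok.meta["form_data"]["key"])  # prints: bid

# Request yielded WITHOUT meta -> the same lookup raises KeyError,
# exactly like the traceback above:
bad = Response(Request("https://ss.ebnew.com/tradingSearch/index.htm"))
try:
    bad.meta["form_data"]
except KeyError as e:
    print("KeyError:", e)
```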
Solution:
# Before yielding the follow-up request, attach the form data to its meta:
request.meta['form_data'] = form_data
yield request

# The callback of that same request can then read it back:
def parse_page1(self, response):
    form_data = response.meta['form_data']
    keyword = form_data.get('key')
Readers who need the full source code can visit: 咨询公司招标信息采集平台 (the consulting-firm bid-information collection platform).