While developing a Scrapy project, requests fail with the following parse error:
2018-12-21 13:02:19 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://map.haodf.com/hospital/DE4raCNSz6Om-9cfC2nM4CIa/map.htm> (failed 1 times): [<twisted.python.failure.Failure twisted.web._newclient.ParseError: (u'wrong number of parts', 'HTTP/1.1 301')>]
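The cause is a non-compliant server: it sends a status line with no reason phrase ('HTTP/1.1 301' instead of 'HTTP/1.1 301 Moved Permanently'). Twisted's HTTPClientParser splits the status line into version, code, and phrase and raises ParseError('wrong number of parts', ...) when it does not get exactly three pieces. A minimal sketch of that check (paraphrased from Twisted's behavior; the exact source may differ across versions):

```python
# A well-behaved status line splits into three parts: version, code, phrase.
good = "HTTP/1.1 301 Moved Permanently".split(' ', 2)
print(good)   # ['HTTP/1.1', '301', 'Moved Permanently']

# This server omits the reason phrase, leaving only two parts,
# which is what triggers the "wrong number of parts" ParseError.
bad = "HTTP/1.1 301".split(' ', 2)
print(bad)    # ['HTTP/1.1', '301']
```

Appending a dummy phrase (' OK') restores the three-part shape, which is exactly what the patch below does.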
Solution:
Define this method in the spider class:
class CrawlHaodfSpider(BaseSpider):
    ......
    ......
    ......
    def _monkey_patching_HTTPClientParser_statusReceived(self):
        # Patch HTTPClientParser.statusReceived so a status line missing
        # its reason phrase (e.g. 'HTTP/1.1 301') no longer raises ParseError.
        from twisted.web._newclient import HTTPClientParser, ParseError
        old_sr = HTTPClientParser.statusReceived
        def statusReceived(self, status):
            try:
                return old_sr(self, status)
            except ParseError as e:
                if e.args[0] == 'wrong number of parts':
                    # Append a dummy reason phrase and re-parse.
                    return old_sr(self, status + ' OK')
                raise
        statusReceived.__doc__ = old_sr.__doc__
        HTTPClientParser.statusReceived = statusReceived
Then apply the patch in the overridden start_requests method:
    def start_requests(self):
        # Patching once is enough; calling it per request would
        # wrap the already-patched method again and again.
        self._monkey_patching_HTTPClientParser_statusReceived()
        for url in self.start_urls:
            yield Request(url, dont_filter=True)
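The patching pattern itself can be tried outside Scrapy. Below is a self-contained, Twisted-free sketch: Parser here is a hypothetical stand-in for HTTPClientParser, but the wrapper logic (catch the specific error, retry with a repaired status line, rebind the class attribute) is the same as in the spider above:

```python
class ParseError(Exception):
    pass

class Parser:
    """Stand-in for twisted.web._newclient.HTTPClientParser."""
    def statusReceived(self, status):
        parts = status.split(' ', 2)
        if len(parts) != 3:
            raise ParseError('wrong number of parts', status)
        return parts

# Keep a reference to the original, then install the wrapper on the class.
old_sr = Parser.statusReceived

def statusReceived(self, status):
    try:
        return old_sr(self, status)
    except ParseError as e:
        if e.args[0] == 'wrong number of parts':
            # Append a dummy reason phrase and retry once.
            return old_sr(self, status + ' OK')
        raise

statusReceived.__doc__ = old_sr.__doc__
Parser.statusReceived = statusReceived

print(Parser().statusReceived('HTTP/1.1 301'))
# -> ['HTTP/1.1', '301', 'OK']
```

Any other ParseError is re-raised unchanged, so the patch only masks this one malformed-status-line case.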