Problem description:
After deploying a finished distributed Scrapy project to a Linux environment, running the spider.py file raised the following error:
2019-01-20 23:05:08 [boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "/usr/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1228, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
URLError: <urlopen error timed out>
2019-01-20 23:05:08 [boto] ERROR: Unable to read instance data, giving up
Solution:
The error happens because boto tries to read EC2 instance metadata over HTTP, a request that only succeeds inside AWS, so it times out on an ordinary Linux host. The fix commonly suggested online is to add the following to the project's settings.py:
DOWNLOAD_HANDLERS = {'S3': None,}
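As a settings.py fragment, the suggested fix looks like the sketch below. One caveat worth noting: Scrapy keys DOWNLOAD_HANDLERS by URL scheme, and its built-in S3 handler is registered under the lowercase key 's3', so an uppercase 'S3' entry may silently fail to override it. That could be why the settings-only fix did not work here.

```python
# settings.py -- disable the S3 download handler so boto is never
# initialized. The key is the URL scheme; Scrapy's built-in handler is
# registered under lowercase 's3', so use lowercase rather than 'S3'.
DOWNLOAD_HANDLERS = {
    's3': None,
}
```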
But after adding this, the error persisted. What finally worked for me was adding the following to spider.py:
from scrapy import optional_features
optional_features.remove('boto')
Then I ran spider.py again, and it succeeded.
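A slightly more defensive version of that spider.py workaround is sketched below. Note the assumptions: the optional_features set only exists in old Scrapy releases (it was removed around Scrapy 1.1, where the boto-based handler was replaced), so the import guard lets the same file run under newer versions too, and set.discard() is used instead of remove() so a second run does not raise KeyError.

```python
# spider.py -- drop 'boto' from Scrapy's optional-features set before the
# crawler wires up the S3 handler, so boto never probes EC2 metadata.
try:
    from scrapy import optional_features
    optional_features.discard('boto')  # discard(), unlike remove(), never raises
    boto_disabled = True
except ImportError:
    # Newer Scrapy versions no longer expose optional_features;
    # there the boto-based handler is gone and this workaround is unneeded.
    boto_disabled = False
```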
Source: https://blog.csdn.net/u014408532/article/details/72961118