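I have recently been learning Python's Scrapy crawling framework by practicing on a real website, but every crawl fails. The error output is as follows: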
2017-03-09 13:58:34 [scrapy] INFO: Enabled item pipelines:
[]
2017-03-09 13:58:34 [scrapy] INFO: Spider opened
2017-03-09 13:58:34 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-03-09 13:58:34 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-03-09 13:58:34 [scrapy] ERROR: Error downloading <GET http://www.23us.com/robots.txt>: 'float' object is not iterable
Traceback (most recent call last):
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\defer.py", line 1299, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred
result = f(*args, **kw)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request
return handler.download_request(request, spider)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 60, in download_request
return agent.download_request(request)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 285, in download_request
method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1631, in request
parsedURI.originForm)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1408, in _requestWithEndpoint
d = self._pool.getConnection(key, endpoint)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1294, in getConnection
return self._newConnection(key, endpoint)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1306, in _newConnection
return endpoint.connect(factory)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\endpoints.py", line 788, in connect
EndpointReceiver, self._hostText, portNumber=self._port
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\_resolver.py", line 174, in resolveHostName
onAddress = self._simpleResolver.getHostByName(hostName)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\resolver.py", line 21, in getHostByName
d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\base.py", line 276, in getHostByName
timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
2017-03-09 13:58:34 [scrapy] ERROR: Error downloading <GET http://www.23us.com/class/1_1.html>
TypeError: 'float' object is not iterable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\defer.py", line 1299, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred
result = f(*args, **kw)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request
return handler.download_request(request, spider)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 60, in download_request
return agent.download_request(request)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 285, in download_request
method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1631, in request
parsedURI.originForm)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1408, in _requestWithEndpoint
d = self._pool.getConnection(key, endpoint)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1294, in getConnection
return self._newConnection(key, endpoint)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\web\client.py", line 1306, in _newConnection
return endpoint.connect(factory)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\endpoints.py", line 788, in connect
EndpointReceiver, self._hostText, portNumber=self._port
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\_resolver.py", line 174, in resolveHostName
onAddress = self._simpleResolver.getHostByName(hostName)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\scrapy\resolver.py", line 21, in getHostByName
d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
File "D:\greenSoftware\anaconda3-python\lib\site-packages\twisted\internet\base.py", line 276, in getHostByName
timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
2017-03-09 13:58:34 [scrapy] ERROR: Error downloading <GET http://www.23us.com/class/2_1.html>
TypeError: 'float' object is not iterable
During handling of the above exception, another exception occurred:
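The last frame of the traceback above fails on sum(timeout), and the message says the value it received was a float. A minimal sketch of that failure in isolation is below. The helper names total_timeout and describe are made up for illustration, and the sample values (60.0 and the tuple) are assumptions, not values read from the log.

```python
def total_timeout(timeout):
    # Same operation as the failing line in the traceback: sum(timeout).
    # It only works when timeout is an iterable of numbers.
    return sum(timeout)


def describe(timeout):
    # Report either the summed delay or the TypeError that sum() raises.
    try:
        return "ok: %s" % total_timeout(timeout)
    except TypeError as exc:
        return "TypeError: %s" % exc


if __name__ == "__main__":
    # A sequence of per-attempt timeouts can be summed (illustrative values).
    print(describe((1, 3, 11, 45)))
    # A single float cannot, which matches the error in the log.
    print(describe(60.0))
```

So the caller and the callee disagree about the type of timeout, which is why a version mismatch between the two libraries is a plausible cause.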
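After some searching, I found that the cause is the version of the locally installed Twisted library (see this for details). I use the Anaconda Python distribution locally, and installing Scrapy pulled in Twisted 17.1.0 by default. Downgrading Twisted to 16.6.0 fixes the problem (install it with conda install Twisted==16.6.0).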
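Before downgrading, it may help to confirm which Twisted version is actually installed. The following is a rough sketch, not an official check. The names parse_version, is_affected and AFFECTED_FROM are hypothetical, and treating every version from 17.1.0 upward as affected is an assumption: this post only establishes that 17.1.0 fails and 16.6.0 works with the Scrapy version used here.

```python
import re

# Assumption: versions from 17.1.0 upward trigger the error with this Scrapy.
AFFECTED_FROM = "17.1.0"


def parse_version(text):
    # Turn "17.1.0" into (17, 1, 0); stop at the first non-numeric piece.
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def is_affected(version_text):
    # Compare numeric tuples so that "16.6.0" sorts below "17.1.0".
    return parse_version(version_text) >= parse_version(AFFECTED_FROM)


if __name__ == "__main__":
    try:
        import twisted
    except ImportError:
        print("Twisted is not installed in this environment")
    else:
        status = "may be affected" if is_affected(twisted.__version__) else "looks fine"
        print("Twisted %s: %s" % (twisted.__version__, status))
```

If the script reports 17.1.0, the conda command above is the fix that worked for me.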