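Installing and Configuring the Scrapy Crawler Framework
I suddenly got the idea of learning web scraping, so I went to install the Scrapy framework. But... the setup process was really frustrating: Scrapy depends on quite a few components, and those components have their own requirements on the Python version.
After a lot of effort, digging through posts from people who know far more than I do, and several trips through Stack Overflow, I finally got it installed = _ =. Here are the installation steps and what I learned along the way.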
1. Download and install Python
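Not much to say here: just grab an installer from the Python website https://www.python.org/downloads/. Do remember which version you install, though, because the packages downloaded later have to match it. The one I had installed earlier was 3.5.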
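Then set up the Python environment variable (this part is probably redundant, since most people already know how to do it):
Computer -> Properties -> Advanced -> Environment Variables -> append the Python path to the end of Path under System variables and of path under User variables (my install directory is D:\Python)
Open the Windows command line (cmd) and type python. If the interpreter starts and prints its version banner followed by the >>> prompt,
Python is installed and the environment variable is configured correctly.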
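If you would rather check this from inside Python than by eye, here is a minimal sketch. The helper name interpreter_dir_on_path is my own, not something from Python or Scrapy, and it only checks whether the directory of the running interpreter appears in PATH.

```python
import os
import sys


def interpreter_dir_on_path():
    """Return True if the running interpreter's directory is listed in PATH.

    Hypothetical helper for illustration only.
    """
    exe_dir = os.path.normcase(os.path.normpath(os.path.dirname(sys.executable)))
    entries = os.environ.get("PATH", "").split(os.pathsep)
    # Normalize each entry so that case and trailing slashes do not matter.
    normalized = [os.path.normcase(os.path.normpath(e)) for e in entries if e]
    return exe_dir in normalized


# Written with plain print() and no f-strings so it also runs on Python 3.5.
print("Python version:", sys.version.split()[0])
print("Interpreter:", sys.executable)
print("Interpreter directory on PATH:", interpreter_dir_on_path())
```

Save it as a .py file and run it with python from cmd. If the last line prints False, the Path entry from the step above probably points at a different directory than the interpreter that actually started.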
2. Install Scrapy
First, upgrade pip
Open the cmd command line and enter: pip install --upgrade pip
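Try installing Scrapy
Enter: pip install Scrapy
This fails with an error.
The cause is that pip timed out while downloading a third-party library. Enter: pip --default-timeout=100 install -U Pillow to install Pillow with a longer timeout.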
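For anyone who wants to script this step, here is a rough sketch that assembles the same pip command from Python. The names build_pip_command and install are my own, and 100 seconds is simply the value used in this post. It relies on pip accepting --default-timeout before the install subcommand, which is how the command above is written.

```python
import subprocess
import sys


def build_pip_command(package, timeout=100, upgrade=False):
    """Build a pip install command with a longer network timeout.

    Hypothetical helper; it only assembles the argument list.
    """
    cmd = [sys.executable, "-m", "pip",
           "--default-timeout={}".format(timeout), "install"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.append(package)
    return cmd


def install(package, timeout=100, upgrade=False):
    """Run the command and return pip's exit code (0 means success)."""
    return subprocess.call(build_pip_command(package, timeout, upgrade))


# Only print the command here; call install("Scrapy") to actually run it.
print(" ".join(build_pip_command("Scrapy")))
```

Using sys.executable with -m pip makes sure the package goes into the same Python that runs the script, which matters when more than one Python version is installed.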
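Enter again: pip install Scrapy
This time the message says the Twisted version does not match the Python version. I raised the timeout once more. Note that --default-timeout only applies to the pip command it is passed to and is not a persistent setting, so it has to be given together with install:
pip --default-timeout=100 install Scrapy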
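And at this point it failed again... (/(ㄒoㄒ)/~~)
This error is caused by Microsoft Visual C++ 14.0 being missing.
Fix: manually download the matching Twisted build from http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted (the number after cp is the Python version; mine is 3.5, so I need cp35. amd64 means a 64-bit build).
Put the downloaded file in a directory you know, cd to that directory in cmd, and enter: pip install Twisted-17.5.0-cp35-cp35m-win_amd64.whl (this installs the file you just downloaded)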
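If you are not sure which file to pick, the two parts of the file name can be worked out from the interpreter itself. This is just a sketch: wheel_tags is a name I made up, and the platform part assumes you are on Windows.

```python
import struct
import sys


def wheel_tags():
    """Return (python_tag, platform_tag) hints for choosing a Windows wheel.

    Hypothetical helper; the platform tag assumes a Windows machine.
    """
    python_tag = "cp{}{}".format(sys.version_info[0], sys.version_info[1])
    # Pointer size reflects the bitness of the interpreter, not of the OS.
    bits = struct.calcsize("P") * 8
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return python_tag, platform_tag


python_tag, platform_tag = wheel_tags()
print("Look for a file name containing:", python_tag, "and", platform_tag)
```

It is the interpreter's bitness that counts: a 32-bit Python on 64-bit Windows still needs the win32 file.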
Finally run: pip install Scrapy
It prints Successfully installed Scrapy-1.4.0, so the installation finally went through...
Testing Scrapy
With the installation done, let's test Scrapy. In the console, enter: scrapy bench
If you get the following error:
Traceback (most recent call last):
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\twisted\internet\defer.py", line 1260, in _inlineCallbacks
result = g.send(result)
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\crawler.py", line 72, in crawl
self.engine = self._create_engine()
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\crawler.py", line 97, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\core\engine.py", line 68, in __init__
self.downloader = downloader_cls(crawler)
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\core\downloader\__init__.py", line 88, in __init__
self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\utils\misc.py", line 44, in load_object
mod = import_module(module)
File "c:\users\aobo\appdata\local\programs\python\python35\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 986, in _gcd_import
File "<frozen importlib._bootstrap>", line 969, in _find_and_load
File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 662, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\downloadermiddlewares\retry.py", line 23, in <module>
from scrapy.xlib.tx import ResponseFailed
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\scrapy\xlib\tx\__init__.py", line 3, in <module>
from twisted.web import client
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\twisted\web\client.py", line 42, in <module>
from twisted.internet.endpoints import TCP4ClientEndpoint, SSL4ClientEndpoint
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\twisted\internet\endpoints.py", line 36, in <module>
from twisted.internet.stdio import StandardIO, PipeAddress
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\twisted\internet\stdio.py", line 30, in <module>
from twisted.internet import _win32stdio
File "c:\users\aobo\appdata\local\programs\python\python35\lib\site-packages\twisted\internet\_win32stdio.py", line 9, in <module>
import win32api
ImportError: DLL load failed: 找不到指定的模块。
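The last line (import win32api fails with "DLL load failed", and the Windows message means "the specified module could not be found") shows that pywin32 is missing on the system, so let's download it.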
https://sourceforge.net/projects/pywin32/files/
http://www.lfd.uci.edu/~gohlke/pythonlibs/
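Note that the pywin32 build you download must match your Python (same Python version and same 32/64-bit architecture).
If the sites above are too slow and the download fails, you can also search for another download source, for example via Baidu.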
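Once pywin32 is installed, a quick import check tells you whether the DLL problem is gone before you rerun the benchmark. This is a small sketch of my own; can_import is not part of pywin32 or Scrapy.

```python
import importlib


def can_import(module_name):
    """Return True if the module imports cleanly, else print why and return False.

    Hypothetical helper; "DLL load failed" surfaces as an ImportError.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        print("Cannot import {}: {}".format(module_name, exc))
        return False
    return True


# win32api is the module the traceback above failed on.
print("win32api:", "OK" if can_import("win32api") else "MISSING")
```

If this still prints MISSING on Windows, the installed pywin32 most likely does not match the Python version or bitness.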
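After installing it, enter in the console: scrapy bench
The benchmark result shows roughly how many pages per minute your machine can crawl on average.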
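Finally, the Python Extension Packages site (presumably the http://www.lfd.uci.edu/~gohlke/pythonlibs/ page linked above) is a download page for Python extension packages on Windows. It covers most of the extension packages you are likely to need for Python development, so it is worth bookmarking.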