1. Installation
Just follow the installation guide in the official documentation step by step. I installed on Windows:
http://scrapy-chs.readthedocs.org/zh_CN/latest/intro/install.html#windows
2. First steps
Again following the official documentation, I continued with the tutorial:
http://scrapy-chs.readthedocs.org/zh_CN/latest/intro/tutorial.html
However, running the spider printed the following warning:
E:\Python Workspace\tutorial>scrapy crawl dmoz
:0: UserWarning: You do not have a working installation of the service_identity module: 'No module named service_identity'. Please install it from <https://pypi.python.org/pypi/service_identity> and make sure all of its dependencies are satisfied. Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification. Many valid certificate/hostname mappings may be rejected.
2015-05-28 11:23:20+0800 [scrapy] INFO: Scrapy 0.24.6 started (bot: tutorial)
Following the hint, I downloaded and installed service_identity.
Download the whl file from: https://pypi.python.org/pypi/service_identity#downloads
Install it with pip: pip install service_identity-14.0.0-py2.py3-none-any.whl
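Before re-running the crawl, it can help to confirm that the package really is importable from the same Python that runs Scrapy. The following is only a minimal sketch: `missing_modules` is a hypothetical helper name, not part of Scrapy or Twisted, and the list of modules is taken from the warning text (`OpenSSL` is the import name of pyOpenSSL).

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Covers "No module named ..." as well as "DLL load failed" errors.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Modules mentioned in the Twisted warning; adjust the list as needed.
    print(missing_modules(["service_identity", "OpenSSL"]))
```

An empty list means both modules imported successfully; any name that is printed still needs to be installed or repaired.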
Running it again produced another error:
raise ImportError("Error loading object '%s': %s" % (path, e))
ImportError: Error loading object 'scrapy.core.downloader.handlers.s3.S3DownloadHandler': DLL load failed: 找不到指定的模块。
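The "DLL load failed" message (找不到指定的模块, i.e. "The specified module could not be found") means pywin32 needs to be installed. There is one catch here: the installer I downloaded from the official site failed during installation, so I used another copy that I had downloaded earlier. Its version number and file size were exactly the same, yet it installed without problems.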
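Two installers with the same version number and size are not necessarily identical byte for byte, so comparing checksums is one way to tell whether a download was corrupted. This is just a sketch under that assumption; `sha256_of` and `same_content` are hypothetical helper names, and the file paths in the usage comment are placeholders rather than the actual file names.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large installers do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both files have identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)


# Example usage (placeholder paths):
# print(same_content(r"C:\Downloads\new_installer.exe", r"C:\Downloads\old_installer.exe"))
```

If the digests differ, the failing copy was most likely damaged during download, and downloading it again may be enough.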