Following my earlier post on installing Scrapy, I installed Scrapy and its dependencies, created a crawler project with scrapy startproject, and then wrote the simplest possible spider module, saved under the project's spiders directory. The code is equally simple:
#coding:utf-8
import scrapy

class CnblogsSpider(scrapy.Spider):
    name = "cnblogs"
    allowed_domains = ["cnblogs.com"]
    start_urls = ["http://www.cnblogs.com/qiyeboy/default.html?page=1"]

    def parse(self, response):
        pass
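The start URL requests page 1 of the blog list, and later pages follow the same ?page=N pattern, so a real parse callback would typically yield follow-up requests for them. As a minimal pure-Python sketch of just the URL scheme (the page_url helper is mine, and sequential ?page= numbering is an assumption):

```python
def page_url(page):
    # Build the cnblogs list-page URL for a given page number
    # (assumption: pagination simply increments the ?page= parameter).
    return "http://www.cnblogs.com/qiyeboy/default.html?page=%d" % page

# First three list pages, e.g. candidates for follow-up Requests in parse():
urls = [page_url(n) for n in range(1, 4)]
```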
Then I ran scrapy crawl cnblogs, and it kept failing with a pile of errors. The error output looks like this:
D:\cnblogs\cnblogsSpider>scrapy crawl cnblogs
2017-08-02 19:16:34+0800 [scrapy] INFO: Scrapy 0.14.4 started (bot: cnblogsSpider)
2017-08-02 19:16:35+0800 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
Traceback (most recent call last):
File "C:\Python27\Scripts\scrapy", line 4, in <module>
__import__('pkg_resources').run_script('Scrapy==0.14.4', 'scrapy')
File "C:\Python27\lib\site-packages\pkg_resources\__init__.py", line 743, in run_script
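The traceback is cut off here, but the top frames already show pkg_resources.run_script resolving the pinned requirement Scrapy==0.14.4: the scrapy script stub in C:\Python27\Scripts was generated for that exact version, so a plausible cause (an assumption, since the full traceback is missing) is that pkg_resources cannot resolve that pin against what is actually installed. A small sketch of checking what pkg_resources resolves for a pinned requirement (the check_pin helper is mine):

```python
import pkg_resources

def check_pin(requirement):
    # Return the installed version satisfying the requirement string,
    # or the name of the pkg_resources error if resolution fails.
    try:
        return pkg_resources.get_distribution(requirement).version
    except pkg_resources.VersionConflict:
        return "VersionConflict"
    except pkg_resources.DistributionNotFound:
        return "DistributionNotFound"

# e.g. check_pin("Scrapy==0.14.4") on a machine where a different Scrapy
# version is installed would report VersionConflict instead of running.
```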