1. Install scrapyd and scrapyd-client

pip install scrapyd
pip install scrapyd-client
2. Run scrapyd from the command line to start the web service

If output like the above appears, scrapyd started successfully.
If you instead see the following error:
File "/usr/local/lib/python2.7/dist-packages/scrapyd-1.1.0-py2.7.egg/scrapyd/utils.py", line 61, in get_spider_queues
d[project] = SqliteSpiderQueue(dbpath)
File "/usr/local/lib/python2.7/dist-packages/scrapyd-1.1.0-py2.7.egg/scrapyd/spiderqueue.py", line 12, in __init__
self.q = JsonSqlitePriorityQueue(database, table)
File "/usr/local/lib/python2.7/dist-packages/scrapyd-1.1.0-py2.7.egg/scrapyd/sqlite.py", line 95, in __init__
self.conn = sqlite3.connect(self.database, check_same_thread=False)
sqlite3.OperationalError: unable to open database file
Installing sqlite3 resolves this error.
sqlite3 can be downloaded from https://github.com/lgastako/db-sqlite3
Once scrapyd is running, you can open http://localhost:6800 in a browser to view its web interface.
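Besides the browser, you can check the daemon programmatically through scrapyd's daemonstatus.json endpoint (part of scrapyd's standard JSON API). The sketch below starts a small stub HTTP server that imitates that endpoint so it runs without a live scrapyd; against a real deployment you would point `BASE` at http://localhost:6800 instead.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stub that mimics scrapyd's daemonstatus.json response so this sketch is
# self-contained; replace BASE with http://localhost:6800 for a real daemon.
class _StubScrapyd(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"status": "ok", "pending": 0,
                           "running": 0, "finished": 0}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging from the stub.
        pass

server = HTTPServer(("localhost", 0), _StubScrapyd)
threading.Thread(target=server.serve_forever, daemon=True).start()
BASE = f"http://localhost:{server.server_port}"

def daemon_status(base):
    """Fetch and parse the daemon status report."""
    with urllib.request.urlopen(f"{base}/daemonstatus.json") as resp:
        return json.load(resp)

status = daemon_status(BASE)
print(status["status"])  # a healthy daemon reports "ok"
server.shutdown()
```

A real scrapyd also reports counts of pending, running, and finished jobs in the same response.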
3. Open the project's scrapy.cfg
[settings]
default = courtannounce.settings
[deploy]
#url = http://localhost:6800/
project = courtannounce
Remove the comment from the url line under [deploy].
The default target name is default, but it can also be changed, e.g.:
[deploy:scrapy1]
which renames the target to scrapy1.
Then run scrapyd-deploy scrapy1 -p courtannounce to deploy the project to scrapyd for monitoring (scrapy1 is the target, courtannounce is the project name).
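After deploying, you can confirm that scrapyd registered the project by querying its listprojects.json and listspiders.json endpoints (both are part of scrapyd's standard JSON API). A minimal sketch that only builds the verification URLs for this tutorial's project name, so it runs without a live daemon:

```python
from urllib.parse import urlencode

BASE = "http://localhost:6800"  # the scrapyd instance started earlier

def list_projects_url(base=BASE):
    """URL whose JSON response lists every project deployed to this scrapyd."""
    return f"{base}/listprojects.json"

def list_spiders_url(project, base=BASE):
    """URL whose JSON response lists the spiders available in one project."""
    return f"{base}/listspiders.json?{urlencode({'project': project})}"

print(list_projects_url())                # http://localhost:6800/listprojects.json
print(list_spiders_url("courtannounce"))  # http://localhost:6800/listspiders.json?project=courtannounce
```

Fetching either URL (with curl or urllib) against a running daemon returns JSON; the deployed project should appear under the "projects" key of listprojects.json.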
4. curl http://localhost:6800/schedule.json -d project=courtannounce -d spider=courtannouncement

This curl call tells scrapyd to schedule and run the spider.
In the web interface, the Jobs section shows each crawl's status and log output.
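The same schedule.json call can be issued from Python instead of curl, and the resulting job can then be polled through listjobs.json (also part of scrapyd's standard API). A sketch that only constructs the requests, so it runs without a live daemon:

```python
from urllib.parse import urlencode

BASE = "http://localhost:6800"  # the scrapyd instance started earlier

def schedule_request(project, spider):
    """Return the (url, form_body) pair equivalent to the curl call above."""
    return f"{BASE}/schedule.json", urlencode({"project": project, "spider": spider})

def list_jobs_url(project):
    """URL whose JSON response lists pending, running, and finished jobs."""
    return f"{BASE}/listjobs.json?{urlencode({'project': project})}"

url, body = schedule_request("courtannounce", "courtannouncement")
print(url)   # http://localhost:6800/schedule.json
print(body)  # project=courtannounce&spider=courtannouncement
print(list_jobs_url("courtannounce"))
```

POSTing that body to the URL (e.g. with urllib.request or requests) starts the crawl, and the daemon's reply includes a jobid that also appears in the web UI's Jobs section.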