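SpiderKeeper is an open-source spider management tool. It makes it easy to start, pause, and schedule crawlers, and in a distributed setup it lets you view the logs of all crawlers and check how each crawl is running.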
# Installation
Environment
Ubuntu 16.04
Python 3.5
pip3 install scrapy
pip3 install scrapyd
pip3 install scrapyd-client
pip3 install scrapy-redis
pip3 install SpiderKeeper
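As a quick sanity check after the installs above, the following sketch reports which of the five packages pip cannot see. It is a minimal example, not part of the original setup: the helper name `missing_packages` is hypothetical, and it relies on `importlib.metadata`, which needs Python 3.8 or newer. On the Python 3.5 listed above you would have to check another way, for example with `pip3 show`.

```python
from importlib import metadata

# Distribution names exactly as passed to "pip3 install" above.
REQUIRED = ["scrapy", "scrapyd", "scrapyd-client", "scrapy-redis", "SpiderKeeper"]


def missing_packages(names):
    """Return the names that have no installed distribution (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            # version() raises PackageNotFoundError when the package is absent.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All packages are installed.")
```

Any name it prints can be installed again with the matching `pip3 install` line.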
Deploying the spider
## 1. Change into the directory of your finished Scrapy project and start scrapyd
python@ubuntu:~$ scrapyd
Once scrapyd is running locally, open port 6800 on localhost in a browser to see the scrapyd monitoring page.
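If you would rather check from a script than from the browser, scrapyd also exposes a small JSON API, and its `daemonstatus.json` endpoint can serve as a liveness probe. The sketch below is only an illustration: the helper name `scrapyd_status` is hypothetical, and the base URL assumes the default local address on port 6800.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def scrapyd_status(base_url="http://127.0.0.1:6800", timeout=3.0):
    """Return the parsed daemonstatus.json reply, or None if scrapyd is unreachable.

    Hypothetical helper; base_url assumes scrapyd's default local address.
    """
    url = base_url.rstrip("/") + "/daemonstatus.json"
    try:
        with urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply:
        # treat all of these as "not running".
        return None


if __name__ == "__main__":
    status = scrapyd_status()
    if status is None:
        print("scrapyd does not seem to be running on port 6800")
    else:
        print("scrapyd replied:", status)
```

Returning `None` instead of raising keeps the check easy to drop into a startup script that simply waits until scrapyd answers.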
A successful start prints output like the following:
:0: UserWarning: You do not have a working installation of the service_identity module: 'cannot import name 'opentype''. Please install it from <https://pypi.python.org/pypi/service_identity> and make sure all of its dependencies are satisfied. Without the service_identity module, Twisted can perform only rudimentary TLS client hostname verification. Many valid certificate/hostname mappings may be rejected.
2018-08-18T18:55:20+0800 [-] Loading /usr/local/lib/python3.5/dist-packages/scrapyd/txapp.py...
2018-08-18T18:55:20+0800 [-] Scrapyd web console available at http://127.0.0.1:6800/
2018-08-18T18:55:20+0800 [-] Loaded.
2018-08-18T18:55:20+0800 [twisted.scripts._twistd_unix.UnixAppLogger#i