The system is Ubuntu 16.04 LTS.
1. Installation
Use scrapyd-deploy, which ships with scrapyd-client, to deploy a Scrapy project to a scrapyd server.
# Install scrapyd
pip install scrapyd
# Install scrapyd-client
# for Python 2.x
pip install git+https://github.com/scrapy/scrapyd-client
# for Python 3.6
pip install scrapyd-client
2. Usage
a. Configure scrapy.cfg
[settings]
default = njupt.settings
[deploy:server-njupt]
url = http://localhost:6800/
project = njupt
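To make the layout of this file concrete, here is a minimal Python sketch that reads the deploy targets out of a scrapy.cfg using the standard configparser module. The helper name list_deploy_targets is hypothetical, and this only illustrates the file format; it is not meant to show how scrapyd-deploy itself is implemented.

```python
import configparser

# Same content as the scrapy.cfg shown above.
SCRAPY_CFG = """\
[settings]
default = njupt.settings

[deploy:server-njupt]
url = http://localhost:6800/
project = njupt
"""


def list_deploy_targets(cfg_text):
    """Return {target_name: {option: value}} for every [deploy:<name>] section."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        if section.startswith("deploy:"):
            # "deploy:server-njupt" -> "server-njupt"
            name = section.split(":", 1)[1]
            targets[name] = dict(parser[section])
    return targets


targets = list_deploy_targets(SCRAPY_CFG)
print(targets)
```

The name after "deploy:" is the target name, so the section above defines a target called server-njupt.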
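b. Configure scrapyd
See the scrapyd documentation for the available configuration options.
Configuration files are loaded in the following order, with later files taking priority over earlier ones: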
/etc/scrapyd/scrapyd.conf
/etc/scrapyd/conf.d/*
scrapyd.conf
~/.scrapyd.conf
example:
[scrapyd]
eggs_dir = eggs
logs_dir = logs
items_dir =
jobs_to_keep = 5
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 127.0.0.1
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
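The layered loading of configuration files can be illustrated with a small Python sketch. It assumes plain configparser semantics, where files are read in order and a value in a later file replaces the same option from an earlier one. The helper load_layered_config and the temporary file names are hypothetical; this is not scrapyd's own loader.

```python
import configparser
import os
import tempfile


def load_layered_config(paths):
    """Read the given files in order; values in later files override earlier ones."""
    parser = configparser.ConfigParser()
    # ConfigParser.read() silently skips files that do not exist.
    parser.read(paths)
    return parser


# Two temporary files stand in for a system-wide and a user-level config.
with tempfile.TemporaryDirectory() as tmp:
    system_conf = os.path.join(tmp, "system.conf")
    user_conf = os.path.join(tmp, "user.conf")
    with open(system_conf, "w") as f:
        f.write("[scrapyd]\nhttp_port = 6800\nmax_proc_per_cpu = 4\n")
    with open(user_conf, "w") as f:
        f.write("[scrapyd]\nhttp_port = 6900\n")

    missing_conf = os.path.join(tmp, "missing.conf")  # intentionally absent
    config = load_layered_config([system_conf, user_conf, missing_conf])
    http_port = config.getint("scrapyd", "http_port")
    max_proc_per_cpu = config.getint("scrapyd", "max_proc_per_cpu")

print(http_port, max_proc_per_cpu)
```

Here http_port comes from the later file, while max_proc_per_cpu falls back to the earlier one because the later file does not set it.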
c. Start scrapyd
scrapyd
d. Deploy
# Change to the root directory of the Scrapy project
scrapyd-deploy server-njupt -p njupt
# Specify a version; the default is the current timestamp
scrapyd-deploy server-njupt -p njupt --version 1.0
For the other scrapyd-deploy options, see its help output.
e. Run a spider
curl http://localhost:6800/schedule.json -d project=njupt -d spider=njupt
Use scrapyd-client spiders -p njupt to list the spiders in the project njupt.
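The curl call above is a form-encoded POST to schedule.json. As a rough Python equivalent, the sketch below builds the same request with the standard library. The helper name build_schedule_request is hypothetical, and the request is only constructed here; sending it requires a running scrapyd.

```python
import urllib.parse
import urllib.request


def build_schedule_request(base_url, project, spider):
    """Build (but do not send) the POST request for schedule.json."""
    url = urllib.parse.urljoin(base_url, "schedule.json")
    body = urllib.parse.urlencode({"project": project, "spider": spider}).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


request = build_schedule_request("http://localhost:6800/", "njupt", "njupt")
print(request.get_method(), request.full_url, request.data)

# To actually schedule the job, with scrapyd running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```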
3. Security
You can put a reverse proxy in front of scrapyd to add user authentication. Taking nginx as an example, configure it as follows:
server {
    listen 6801;
    location / {
        proxy_pass http://127.0.0.1:6800/;
        auth_basic "Restricted";
        auth_basic_user_file /etc/nginx/htpasswd/user.htpasswd;
    }
}
Set the username and password in /etc/nginx/htpasswd/user.htpasswd (assume both are test), then modify scrapy.cfg as follows:
[settings]
default = njupt.settings
[deploy:server-njupt]
url = http://localhost:6801/
project = njupt
username = test
password = test
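A client that talks to scrapyd through the proxy has to send HTTP Basic credentials with each request. The sketch below shows one way to build such a request in Python. It assumes the proxy port 6801 from the nginx example and the test/test credentials; build_authenticated_request is a hypothetical helper, and the request is only constructed, not sent.

```python
import base64
import urllib.request


def build_authenticated_request(url, username, password):
    """Build (but do not send) a GET request with an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


# Port 6801 is the nginx listener from the example above (an assumption of this sketch).
request = build_authenticated_request(
    "http://localhost:6801/daemonstatus.json", "test", "test"
)
auth_header = request.get_header("Authorization")
print(auth_header)
```

Basic authentication only base64-encodes the credentials, so on an untrusted network the proxy should also be served over HTTPS.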
4. API
See the API section of the official documentation.