Installing pyspider on Python 2.7

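Operating system

CentOS Linux release 7.0.1406 (Core)

Python environment

Installing Python

Install the dependencies:

yum install gcc # required to build Python

yum install zlib # zlib, zlib-devel, openssl and openssl-devel are required by setuptools; if they are installed after Python has been built, Python must be rebuilt with make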

yum install zlib-devel

yum install openssl

yum install openssl-devel

cd Python-2.7.13

./configure --prefix=/python2.7

make

make install

Configure the environment variable

# vi ~/.bash_profile

export PATH=/python2.7/bin:$PATH
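
Once the new interpreter is first on PATH, it may be worth confirming that it was built against the development headers installed earlier. The following is a minimal sketch and not part of the original procedure; the helper name missing_modules is made up for illustration. It tries to import the standard-library modules that depend on zlib-devel and openssl-devel, plus sqlite3, which is the module behind the "No module named _sqlite3" error.

```python
import importlib
import sys


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # zlib and ssl need zlib-devel and openssl-devel at build time;
    # sqlite3 needs the SQLite development headers.
    print("interpreter: %s (%s)" % (sys.executable, sys.version.split()[0]))
    print("missing: %s" % missing_modules(["zlib", "ssl", "sqlite3"]))
```

An empty list means all three extension modules were compiled; anything listed points to a header package that was absent when make ran.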

Installing pip

Dependency: setuptools

Dependencies: six-1.10.0.tar.gz packaging-16.8.tar.gz pyparsing-2.2.0.tar.gz appdirs-1.4.3.tar.gz

cd pip-9.0.1

# python setup.py install

Installing pyspider

Download the latest version of pyspider from GitHub.

Required system packages:

tcl protobuf libcurl-devel libxslt-devel libxml2

Install them with yum install.

cd pyspider

# Install the Python dependencies, then pyspider itself

pip install -r requirements.txt

python setup.py install

The mysql-connector version listed in requirements.txt could not be downloaded, so install a different version of mysql-connector instead:

pip install mysql-connector==2.1.4

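Installing the MySQL database

After installing MySQL with yum, follow http://www.itnose.net/detail/6310643.html to complete the database setup.

# Restart MySQL

service mysqld restart

# mysql -u root

# Change the root password

mysql> use mysql

mysql> update user set password=password('123456') where user='root';

# Create the databases and grant privileges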

mysql> create database taskdb;

mysql> create database projectdb;

mysql> create database resultdb;

mysql> create user 'pyspider'@'%' identified by 'pyspider-pass';

mysql> create user pyspider@'localhost' identified by 'pyspider-pass';

mysql> grant select,insert,update,references,delete,create,drop,alter,index,trigger,create view,show view,execute,alter routine,create routine,create temporary tables,lock tables,event on taskdb.* to 'pyspider'@'%';

mysql> grant select,insert,update,references,delete,create,drop,alter,index,trigger,create view,show view,execute,alter routine,create routine,create temporary tables,lock tables,event on projectdb.* to 'pyspider'@'%';

mysql> grant select,insert,update,references,delete,create,drop,alter,index,trigger,create view,show view,execute,alter routine,create routine,create temporary tables,lock tables,event on resultdb.* to 'pyspider'@'%';

mysql> flush privileges;
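
The three GRANT statements above differ only in the database name. As an illustration only, the sketch below generates them from a list. The function name grant_statements and the option to also grant to 'pyspider'@'localhost' are assumptions rather than part of the original steps; the second host is included because MySQL treats 'pyspider'@'localhost' and 'pyspider'@'%' as separate accounts.

```python
PRIVILEGES = (
    "select,insert,update,references,delete,create,drop,alter,index,"
    "trigger,create view,show view,execute,alter routine,create routine,"
    "create temporary tables,lock tables,event"
)


def grant_statements(databases, user="pyspider", hosts=("%",)):
    """Build one GRANT statement per (database, host) pair."""
    return [
        "grant %s on %s.* to '%s'@'%s';" % (PRIVILEGES, db, user, host)
        for db in databases
        for host in hosts
    ]


if __name__ == "__main__":
    # Granting to 'localhost' as well as '%' is an assumption for illustration.
    for stmt in grant_statements(["taskdb", "projectdb", "resultdb"],
                                 hosts=("%", "localhost")):
        print(stmt)
```

The printed statements can be pasted into the mysql prompt, followed by flush privileges.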

Edit the configuration file (in preparation for a cluster setup)

vi /etc/my.cnf

bind-address = 0.0.0.0

# Restart the database

service mysqld restart

Installing Redis

Download Redis and unpack it under /root/training.

Build and install Redis:

cd /root/training/redis-2.8.12

make

make test

make install

# Prepare for a cluster setup

cd /root/training/redis-3.2.8

cp redis.conf /etc/

vi /etc/redis.conf

bind 0.0.0.0

# Start Redis

redis-server /etc/redis.conf &

Redis has started successfully when the log shows: The server is now ready to accept connections on port 6379
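
Besides reading the log line, the listener can be checked from Python. This is a minimal sketch using only the standard library; port_is_open is a hypothetical helper name, and the host and port are the defaults used in this article.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


if __name__ == "__main__":
    print("redis reachable: %s" % port_is_open("127.0.0.1", 6379))
```

A successful connection only shows that something is listening on the port, not that it is Redis, so treat this as a quick smoke test.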

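Firewall

Check the firewall status:

firewall-cmd --state

Two rules I added myself (they accept connections to Redis on port 6379 only from the local machine):

iptables -A INPUT -s 127.0.0.1 -p tcp --dport 6379 -j ACCEPT

iptables -A INPUT -p tcp --dport 6379 -j DROP

Disable firewalld:

systemctl stop firewalld.service # stop firewalld

systemctl disable firewalld.service # do not start firewalld at boot

If you are not sure how to configure the firewall, the simplest option is to stop it.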

Installing PhantomJS

Download:

wget -O phantomjs-2.1.1-linux-x86_64.tar.bz2 "https://bbuseruploads.s3.amazonaws.com/fd96ed93-2b32-46a7-9d2b-ecbc0988516a/downloads/396e7977-71fd-4592-8723-495ca4cfa7cc/phantomjs-2.1.1-linux-x86_64.tar.bz2?Signature=guF7TAUW11qr9nZXcTBHu7dg1ds%3D&Expires=1488510600&AWSAccessKeyId=AKIAIVFPT2YJYYZY3H4A&versionId=null&response-content-disposition=attachment%3B%20filename%3D%22phantomjs-2.1.1-linux-x86_64.tar.bz2%22"

Download phantomjs-2.1.1-linux-x86_64.tar.bz2 to /root and unpack it.

Copy the phantomjs binary from the phantomjs/bin directory to /python2.7/bin.

Configuration file

====================================================================

The pyspider configuration file is as follows:

{

"taskdb": "mysql+taskdb://pyspider:pyspider-pass@localhost:3306/taskdb",

"projectdb": "mysql+projectdb://pyspider:pyspider-pass@localhost:3306/projectdb",

"resultdb": "mysql+resultdb://pyspider:pyspider-pass@localhost:3306/resultdb",

"message_queue": "redis://localhost:6379/db",

"webui": {

"port":5555,

"username": "pyspider",

"password": "pyspider-pass",

"need-auth": true

}

}

=========================================
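
Each database entry packs several settings into one URL. As the values above suggest, the scheme combines the storage engine and the database role (mysql+taskdb), followed by the credentials, host, port and database name. The sketch below only illustrates that layout with the standard library; split_db_url is a hypothetical helper and is not how pyspider itself parses the value.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def split_db_url(url):
    """Split an 'engine+role://user:pass@host:port/name' URL into parts."""
    parsed = urlparse(url)
    engine, _, role = parsed.scheme.partition("+")
    return {
        "engine": engine,
        "role": role,
        "user": parsed.username,
        "password": parsed.password,
        "host": parsed.hostname,
        "port": parsed.port,
        "database": parsed.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(split_db_url(
        "mysql+taskdb://pyspider:pyspider-pass@localhost:3306/taskdb"))
```

Reading the parts back this way is a quick check that the user, password and database name in config.json match the accounts created in MySQL.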

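# For security, create an unprivileged user to own the configuration file

useradd -md /pyspider pyspider

# Save the configuration file as

/pyspider/config.json

# Set ownership and permissions

chown -R pyspider:pyspider /pyspider

chmod 400 /pyspider/config.json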
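
Since the file holds the database and web UI passwords, it can be useful to confirm the mode that was actually applied. A minimal sketch, assuming the path used above; is_owner_read_only is a hypothetical helper name.

```python
import os
import stat


def is_owner_read_only(path):
    """Return True if the file mode is exactly 0400 (r--------)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == stat.S_IRUSR


if __name__ == "__main__":
    path = "/pyspider/config.json"
    if os.path.exists(path):
        print("%s is owner-read-only: %s" % (path, is_owner_read_only(path)))
```

The check compares the full permission bits, so a file that is also group-readable or writable by its owner is reported as False.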

Starting pyspider

/python2.7/bin/pyspider -c /pyspider/config.json

The output looks like this:

# pyspider -c /pyspider/config.json

[W 170516 17:45:05 __init__:54] redis DB must zero-based numeric index, using 0 instead

[I 170516 17:45:05 result_worker:49] result_worker starting...

[W 170516 17:45:06 __init__:54] redis DB must zero-based numeric index, using 0 instead

[W 170516 17:45:06 __init__:54] redis DB must zero-based numeric index, using 0 instead

[W 170516 17:45:06 __init__:54] redis DB must zero-based numeric index, using 0 instead

[W 170516 17:45:06 __init__:54] redis DB must zero-based numeric index, using 0 instead

[I 170516 17:45:06 processor:211] processor starting...

[W 170516 17:45:07 __init__:54] redis DB must zero-based numeric index, using 0 instead

[W 170516 17:45:07 __init__:54] redis DB must zero-based numeric index, using 0 instead

[I 170516 17:45:07 tornado_fetcher:638] fetcher starting...

[W 170516 17:45:07 __init__:54] redis DB must zero-based numeric index, using 0 instead

[W 170516 17:45:07 __init__:54] redis DB must zero-based numeric index, using 0 instead

[W 170516 17:45:07 __init__:54] redis DB must zero-based numeric index, using 0 instead

[I 170516 17:45:09 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333

[I 170516 17:45:09 scheduler:647] scheduler starting...

phantomjs fetcher running on port 25555

[I 170516 17:45:09 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0

[W 170516 17:45:10 __init__:54] redis DB must zero-based numeric index, using 0 instead

[W 170516 17:45:10 __init__:54] redis DB must zero-based numeric index, using 0 instead

[W 170516 17:45:10 __init__:54] redis DB must zero-based numeric index, using 0 instead

[W 170516 17:45:10 __init__:54] redis DB must zero-based numeric index, using 0 instead

[W 170516 17:45:10 __init__:54] redis DB must zero-based numeric index, using 0 instead

[I 170516 17:45:10 app:76] webui running on 0.0.0.0:5555
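
The repeated warning in this output comes from the message_queue value: "redis://localhost:6379/db" ends in "db", which is not a number, so pyspider falls back to database 0, exactly as the log says. Writing the URL as redis://localhost:6379/0 should make the warning go away. The sketch below reproduces that fallback for illustration; it is not pyspider's code, and redis_db_index is a hypothetical name.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def redis_db_index(url):
    """Return the numeric DB index from a redis:// URL, or 0 if invalid."""
    path = urlparse(url).path.lstrip("/")
    first = path.split("/")[0]
    try:
        index = int(first)
    except ValueError:
        return 0  # same fallback the warning describes
    return index if index >= 0 else 0


if __name__ == "__main__":
    for url in ("redis://localhost:6379/db", "redis://localhost:6379/0"):
        print("%s -> db %d" % (url, redis_db_index(url)))
```

Both URLs end up on database 0, which is why the setup still works despite the warnings.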

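/// This part still has problems at the moment

Installing supervisor to monitor all processes

supervisor monitors the pyspider processes and restarts them immediately if they stop. Download supervisor-3.3.1 to /root and unpack it.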

cd /root/supervisor-3.3.1

python setup.py install

pip install supervisor # alternative to the source install above; either one is sufficient

Create the default configuration file and edit it

# /python2.7/bin/echo_supervisord_conf > /python2.7/conf/supervisor.conf

; Sample supervisor config file.

;

; For more information on the config file, please see:

; http://supervisord.org/configuration.html

;

; Notes:

; - Shell expansion ("~" or "$HOME") is not supported. Environment

; variables can be expanded using this syntax: "%(ENV_HOME)s".

; - Comments must have a leading space: "a=b ;comment" not "a=b;comment".

[unix_http_server]

file=/tmp/supervisor.sock ; (the path to the socket file)

chmod=0700 ; socket file mode (default 0700)

chown=root:root ; socket file uid:gid owner

;username=user ; (default is no username (open server))

;password=123 ; (default is no password (open server))

[inet_http_server] ; inet (TCP) server disabled by default

port=127.0.0.1:9001 ; (ip_address:port specifier, *:port for all iface)

username=supervisor ; (default is no username (open server))

password=123 ; (default is no password (open server))

[supervisord]

logfile=/tmp/supervisord.log ; (main log file;default $CWD/supervisord.log)

logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB)

logfile_backups=10 ; (num of main logfile rotation backups;default 10)

loglevel=info ; (log level;default info; others: debug,warn,trace)

pidfile=/tmp/supervisord.pid ; (supervisord pidfile;default supervisord.pid)

nodaemon=false ; (start in foreground if true;default false)

minfds=1024 ; (min. avail startup file descriptors;default 1024)

minprocs=200 ; (min. avail process descriptors;default 200)

;umask=022 ; (process file creation umask;default 022)

;user=chrism ; (default is current user, required if root)

;identifier=supervisor ; (supervisord identifier, default is 'supervisor')

;directory=/tmp ; (default is not to cd during start)

;nocleanup=true ; (don't clean up tempfiles at start;default false)

;childlogdir=/tmp ; ('AUTO' child log dir, default $TEMP)

;environment=KEY="value" ; (key value pairs to add to environment)

;strip_ansi=false ; (strip ansi escape codes in logs; def. false)

; the below section must remain in the config file for RPC

; (supervisorctl/web interface) to work, additional interfaces may be

; added by defining them in separate rpcinterface: sections

[rpcinterface:supervisor]

supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]

serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket

;serverurl=http://127.0.0.1:9001 ; use an http:// url to specify an inet socket

username=supervisor ; should be same as http_username if set

password=123 ; should be same as http_password if set

prompt=mysupervisor ; cmd line prompt (default "supervisor")

history_file=~/.sc_history ; use readline history if available

; The below sample program section shows all possible program subsection values,

; create one or more 'real' program: sections to be able to control them under

; supervisor.

;[program:theprogramname]

;command=/bin/cat ; the program (relative uses PATH, can take args)

;process_name=%(program_name)s ; process_name expr (default %(program_name)s)

;numprocs=1 ; number of processes copies to start (def 1)

;directory=/tmp ; directory to cwd to before exec (def no cwd)

;umask=022 ; umask for process (default None)

;priority=999 ; the relative start priority (default 999)

;autostart=true ; start at supervisord start (default: true)

;startsecs=1 ; # of secs prog must stay up to be running (def. 1)

;startretries=3 ; max # of serial start failures when starting (default 3)

;autorestart=unexpected ; when to restart if exited after running (def: unexpected)

;exitcodes=0,2 ; 'expected' exit codes used with autorestart (default 0,2)

;stopsignal=QUIT ; signal used to kill process (default TERM)

;stopwaitsecs=10 ; max num secs to wait b4 SIGKILL (default 10)

;stopasgroup=false ; send stop signal to the UNIX process group (default false)

;killasgroup=false ; SIGKILL the UNIX process group (def false)

;user=chrism ; setuid to this UNIX account to run the program

;redirect_stderr=true ; redirect proc stderr to stdout (default false)

;stdout_logfile=/a/path ; stdout log path, NONE for none; default AUTO

;stdout_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB)

;stdout_logfile_backups=10 ; # of stdout logfile backups (default 10)

;stdout_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0)

;stdout_events_enabled=false ; emit events on stdout writes (default false)

;stderr_logfile=/a/path ; stderr log path, NONE for none; default AUTO

;stderr_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB)

;stderr_logfile_backups=10 ; # of stderr logfile backups (default 10)

;stderr_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0)

;stderr_events_enabled=false ; emit events on stderr writes (default false)

;environment=A="1",B="2" ; process environment additions (def no adds)

;serverurl=AUTO ; override serverurl computation (childutils)

; The below sample eventlistener section shows all possible

; eventlistener subsection values, create one or more 'real'

; eventlistener: sections to be able to handle event notifications

; sent by supervisor.

;[eventlistener:theeventlistenername]

;command=/bin/eventlistener ; the program (relative uses PATH, can take args)

;process_name=%(program_name)s ; process_name expr (default %(program_name)s)

;numprocs=1 ; number of processes copies to start (def 1)

;events=EVENT ; event notif. types to subscribe to (req'd)

;buffer_size=10 ; event buffer queue size (default 10)

;directory=/tmp ; directory to cwd to before exec (def no cwd)

;umask=022 ; umask for process (default None)

;priority=-1 ; the relative start priority (default -1)

;autostart=true ; start at supervisord start (default: true)

;startsecs=1 ; # of secs prog must stay up to be running (def. 1)

;startretries=3 ; max # of serial start failures when starting (default 3)

;autorestart=unexpected ; autorestart if exited after running (def: unexpected)

;exitcodes=0,2 ; 'expected' exit codes used with autorestart (default 0,2)

;stopsignal=QUIT ; signal used to kill process (default TERM)

;stopwaitsecs=10 ; max num secs to wait b4 SIGKILL (default 10)

;stopasgroup=false ; send stop signal to the UNIX process group (default false)

;killasgroup=false ; SIGKILL the UNIX process group (def false)

;user=chrism ; setuid to this UNIX account to run the program

;redirect_stderr=false ; redirect_stderr=true is not allowed for eventlisteners

;stdout_logfile=/a/path ; stdout log path, NONE for none; default AUTO

;stdout_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB)

;stdout_logfile_backups=10 ; # of stdout logfile backups (default 10)

;stdout_events_enabled=false ; emit events on stdout writes (default false)

;stderr_logfile=/a/path ; stderr log path, NONE for none; default AUTO

;stderr_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB)

;stderr_logfile_backups=10 ; # of stderr logfile backups (default 10)

;stderr_events_enabled=false ; emit events on stderr writes (default false)

;environment=A="1",B="2" ; process environment additions

;serverurl=AUTO ; override serverurl computation (childutils)

; The below sample group section shows all possible group values,

; create one or more 'real' group: sections to create "heterogeneous"

; process groups.

;[group:thegroupname]

;programs=progname1,progname2 ; each refers to 'x' in [program:x] definitions

;priority=999 ; the relative start priority (default 999)

; The [include] section can just contain the "files" setting. This

; setting can list multiple files (separated by whitespace or

; newlines). It can also contain wildcards. The filenames are

; interpreted as relative to this file. Included files *cannot*

; include files themselves.

;[include]

;files = relative/directory/*.ini

[group:pyspider]

programs=pyspider-fetcher,pyspider-processor

[program:pyspider-fetcher]

command=/python2.7/bin/pyspider -c /pyspider/config.json fetcher

autorestart=true

autostart=true

user=root

group=pyspider

stopasgroup=true

[program:pyspider-processor]

command=/python2.7/bin/pyspider -c /pyspider/config.json processor

autorestart=true

autostart=true

user=root

group=pyspider

stopasgroup=true

stderr_logfile=/var/Spider/Log/Process/spider_process_err.log

stdout_logfile=/var/Spider/Log/Process/spider_process_out.log

Starting supervisor

# supervisord -c /python2.7/conf/supervisor.conf

Note: after changing config.json, reload supervisor:

# supervisorctl reload

At this point pyspider is fully installed.

Logging in to pyspider

http://ip:5555/

Troubleshooting:

ImportError: pycurl: libcurl link-time ssl backend (nss) is different from compile-time ssl backend (none/other)

# pip uninstall pycurl

# export PYCURL_SSL_LIBRARY=nss

# pip install pycurl

ImportError: No module named _sqlite3

# find / -name _sqlite*.so

/usr/lib64/python2.7/lib-dynload/_sqlite3.so

/usr/lib64/python2.7/site-packages/_sqlitecache.so

# cp /usr/lib64/python2.7/lib-dynload/_sqlite3.so /python2.7/lib/python2.7/lib-dynload/
