Dependencies
- Spark
- ZooKeeper
- HBase
- Elasticsearch
Download PredictionIO and extract it
1. Download the release
2. Extract it to the install directory
# change to your install directory first
tar zxvf apache-predictionio-0.10.0-incubating.tar
$ ./make-distribution.sh
(This step downloads pio-assembly-0.10.0-incubating.jar (about 108 MB), which can be very slow. It is recommended to download the jar yourself beforehand and place it in apache-predictionio-0.10.0-incubating/lib after extraction.)
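The manual workaround above can be sketched as follows. The paths are illustrative assumptions; the `touch` stands in for the jar you downloaded yourself from an Apache mirror:

```shell
# Sketch: place a pre-downloaded assembly jar into the extracted tree so
# make-distribution.sh does not have to fetch it over a slow connection.
set -e
PIO_SRC=apache-predictionio-0.10.0-incubating
JAR=pio-assembly-0.10.0-incubating.jar

touch "$JAR"                # stands in for the jar you downloaded yourself
mkdir -p "$PIO_SRC/lib"
cp "$JAR" "$PIO_SRC/lib/"   # drop it into lib/ before running make-distribution.sh
ls "$PIO_SRC/lib/$JAR"
```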
Configuration
Edit the PredictionIO configuration file PredictionIO-0.10.0-incubating/conf/pio-env.sh:
cp pio-env.sh.template pio-env.sh
vim PredictionIO-0.10.0-incubating/conf/pio-env.sh
With HBase as the event store, Elasticsearch as the metadata store, and the local filesystem for models, pio-env.sh is configured as follows:
# PredictionIO Main Configuration
# This section controls core behavior of PredictionIO. It is very likely that
# you need to change these to fit your site.
# SPARK_HOME: Apache Spark is a hard dependency and must be configured.
SPARK_HOME=/home/spark/workspace/spark-1.6.3-bin-hadoop2.6
# ES_CONF_DIR: You must configure this if you have advanced configuration for
# your Elasticsearch setup.
ES_CONF_DIR=/home/spark/workspace/elasticsearch-1.4.4
# HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO
# with Hadoop 2.
HADOOP_CONF_DIR=/home/spark/workspace/hadoop-2.6.0
# HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO
# with HBase on a remote cluster.
HBASE_CONF_DIR=/home/spark/workspace/hbase-1.0.0/conf
# Filesystem paths where PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
# PredictionIO Storage Configuration
#
# This section controls programs that make use of PredictionIO's built-in
# storage facilities. Default values are shown below.
#
# For more information on storage configuration please refer to
# http://predictionio.incubator.apache.org/system/anotherdatastore/
# Storage Repositories
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
# Elasticsearch Example
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch_cluster_name
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=master
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/home/spark/workspace/elasticsearch-1.4.4
# Local File System Example
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
# HBase Example
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=/home/spark/workspace/hbase-1.0.0
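A wrong path in pio-env.sh only surfaces later when `pio` starts. The following sanity check is an assumption of this guide, not part of PredictionIO: it scans a pio-env.sh-style file and reports any `*_HOME` / `*_DIR` entry that does not point to an existing directory (literal values only; it does not expand variables like `$PIO_FS_BASEDIR`):

```shell
# Report *_HOME / *_DIR keys in a pio-env.sh-style file whose value
# is not an existing directory. Returns non-zero if any path is missing.
check_paths() {
  local conf="$1" ok=0
  while IFS='=' read -r key val; do
    case "$key" in
      *_HOME|*_DIR)
        if [ ! -d "$val" ]; then
          echo "MISSING: $key=$val"
          ok=1
        fi ;;
    esac
  done < "$conf"
  return $ok
}

# demo with a throwaway config: one good path, one bad one
tmp=$(mktemp)
printf 'SPARK_HOME=%s\nHBASE_CONF_DIR=/no/such/dir\n' "$PWD" > "$tmp"
check_paths "$tmp" || echo "fix the paths above before running pio"
```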
Set user environment variables
Add the following to ~/.bashrc:
# PredictionIO environment variables
export PIO_HOME=/home/spark/workspace/predictionio-0.10.0
export PATH=$PIO_HOME/bin:$PATH
Reload .bashrc:
$ source ~/.bashrc
Verify that the environment variables are set correctly:
$ pio version
0.10.0-incubating
Edit pio-start-all and pio-stop-all in PredictionIO-0.10.0/bin
Comment out the PostgreSQL-related code in pio-start-all and pio-stop-all.
(PredictionIO's default storage backend is PostgreSQL; since we configured HBase instead, leaving this code active prompts for a password and raises errors.)
#PGSQL
:<<!
pgsqlStatus="$(ps auxwww | grep postgres | wc -l)"
if [[ "$pgsqlStatus" < 5 ]]; then
  # Detect OS
  OS=`uname`
  if [[ "$OS" = "Darwin" ]]; then
    pg_cmd=`which pg_ctl`
    if [[ "$pg_cmd" != "" ]]; then
      pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
    fi
  elif [[ "$OS" = "Linux" ]]; then
    sudo service postgresql start
  else
    echo -e "\033[1;31mYour OS $OS is not yet supported for automatic postgresql startup:(\033[0m"
    echo -e "\033[1;31mPlease do a manual startup!\033[0m"
    ${PIO_HOME}/bin/pio-stop-all
    exit 1
  fi
fi
!
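The `:<<!` … `!` pair used above is a shell block-comment trick: `:` is the no-op builtin, and the heredoc feeds the disabled lines to it as unused input, so none of them execute. A minimal standalone demonstration (quoting the delimiter, e.g. `<<'DISABLED'`, additionally suppresses `$` expansion inside the block, which the unquoted `!` does not):

```shell
# demo: the lines between :<<'DISABLED' and DISABLED are never executed
ran="yes"
: <<'DISABLED'
ran="no"                      # would clobber $ran if this block executed
sudo service postgresql start # the kind of line pio-start-all disables
DISABLED
echo "disabled block executed? $ran"
```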
Start PredictionIO
$ pio-start-all
starting Elasticsearch…
starting HBase…
starting master, logging to /home/abc/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/bin/../logs/hbase-abc-master-yourhost.local.out
waiting 10 seconds for HBase to fully initialize…
starting PredictionIO Event Server…
Check that PredictionIO's processes are up:
$ jps -l
(The output should include the following processes:)
org.apache.hadoop.hbase.master.HMaster
org.apache.predictionio.tools.console.Console
org.elasticsearch.bootstrap.Elasticsearch
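Scanning the `jps -l` listing by eye is error-prone. The helper below is not part of PredictionIO; it is a sketch that reads a `jps -l` listing on stdin and checks for the two long-lived classes above (the Console process is the pio tool itself and may not persist):

```shell
# Check a `jps -l` listing (on stdin) for the processes PredictionIO needs.
required_up() {
  local listing cls
  listing=$(cat)
  for cls in org.apache.hadoop.hbase.master.HMaster \
             org.elasticsearch.bootstrap.Elasticsearch; do
    echo "$listing" | grep -q "$cls" || { echo "DOWN: $cls"; return 1; }
  done
  echo "all required processes running"
}

# demo on a canned listing; on a live machine use: jps -l | required_up
printf '123 org.apache.hadoop.hbase.master.HMaster\n456 org.elasticsearch.bootstrap.Elasticsearch\n' | required_up
```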
Verify that PredictionIO started correctly:
$ pio status