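Apache MADlib tutorial: installing MADlib on CentOS from yum/RPM packages

A test installation of MADlib, the SQL machine learning library, on CentOS.

Installation steps

Add the PostgreSQL 10 repo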

yum install https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm

Install the base PostgreSQL PL/Python packages

yum -y install postgresql10-plpython supervisor

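Install the MADlib dependencies

Mind the Python version: the installation failed for me with Python 2.7, so I switched to python34. The python34 packages come from EPEL, so epel-release is installed first.

yum update -y && yum install -y epel-release

yum install -y \
git \
gcc \
wget \
postgresql10-devel \
openssl \
m4 \
vim \
flex \
bison \
graphviz \
java \
python34-devel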

Install pip

It is usually already installed

yum install -y python34-pip

pg_config setup (environment variable)

PATH="$PATH:/usr/pgsql-10/bin"
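The assignment above only lasts for the current shell, and running it repeatedly appends the same directory again. A minimal sketch of an idempotent version, assuming the PGDG default directory /usr/pgsql-10/bin used in this post; add_to_path is a hypothetical helper name, not part of PostgreSQL or MADlib:

```shell
# Append a directory to PATH only if it is not already present.
# add_to_path is a hypothetical helper name used for illustration.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on PATH, nothing to do
        *) PATH="$PATH:$1" ;;     # otherwise append it
    esac
    export PATH
}

# Directory taken from the step above.
add_to_path /usr/pgsql-10/bin
```

Placing the function and the call in a shell startup file such as ~/.bash_profile is one way to keep the setting across logins.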

Install the Python dependencies (via pip)

pip3 install awscli pygresql paramiko --upgrade

Install apache-madlib

Download the RPM package

wget https://dist.apache.org/repos/dist/release/madlib/1.15.1/apache-madlib-1.15.1-bin-Linux.rpm

Install it

yum install -y apache-madlib-1.15.1-bin-Linux.rpm

Start the PostgreSQL database

/usr/pgsql-10/bin/postgresql-10-setup initdb

systemctl enable postgresql-10

systemctl start postgresql-10

Edit pg_hba.conf to allow access

After editing the file, restart the service with systemctl restart postgresql-10

/var/lib/pgsql/10/data/pg_hba.conf

Change it as follows:

# "local" is for Unix domain socket connections only

local all all trust

# IPv4 local connections:

host all all 127.0.0.1/32 trust

# IPv6 local connections:

host all all ::1/128 trust
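With trust, any local user can connect as any database user without a password, which is convenient for a test machine but worth auditing before going further. A minimal sketch for listing the rules that use trust; list_trust_rules is a hypothetical helper name, and it assumes the auth method is the last field on the line, which holds for the rules shown above:

```shell
# Print the non-comment pg_hba.conf rules whose auth method is "trust".
# list_trust_rules is a hypothetical helper; $1 is the pg_hba.conf to inspect.
# Assumption: the auth method is the last whitespace-separated field.
list_trust_rules() {
    awk '!/^[[:space:]]*#/ && NF > 0 && $NF == "trust"' "$1"
}

# Run it against the file edited above, if it is readable on this machine.
if [ -r /var/lib/pgsql/10/data/pg_hba.conf ]; then
    list_trust_rules /var/lib/pgsql/10/data/pg_hba.conf
fi
```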

Initialize MADlib

/usr/local/madlib/bin/madpack -s madlib -p postgres -c postgres@localhost:5432/postgres install

The installation output is as follows:

madpack.py: INFO : Detected PostgreSQL version 10.6.

madpack.py: INFO : *** Installing MADlib ***

madpack.py: INFO : MADlib tools version = 1.15.1 (/usr/local/madlib/Versions/1.15.1/bin/../madpack/madpack.py)

madpack.py: INFO : MADlib database version = None (host=localhost:5432, db=postgres, schema=madlib)

madpack.py: INFO : Testing PL/Python environment...

madpack.py: INFO : > Creating language PL/Python...

madpack.py: INFO : > PL/Python environment OK (version: 2.7.5)

madpack.py: INFO : > Preparing objects for the following modules:

madpack.py: INFO : > - array_ops

madpack.py: INFO : > - bayes

madpack.py: INFO : > - crf

madpack.py: INFO : > - elastic_net

madpack.py: INFO : > - linalg

madpack.py: INFO : > - pmml

madpack.py: INFO : > - prob

madpack.py: INFO : > - sketch

madpack.py: INFO : > - svec

madpack.py: INFO : > - svm

madpack.py: INFO : > - tsa

madpack.py: INFO : > - stemmer

madpack.py: INFO : > - conjugate_gradient

madpack.py: INFO : > - knn

madpack.py: INFO : > - lda

madpack.py: INFO : > - stats

madpack.py: INFO : > - svec_util

madpack.py: INFO : > - utilities

madpack.py: INFO : > - assoc_rules

madpack.py: INFO : > - convex

madpack.py: INFO : > - glm

madpack.py: INFO : > - graph

madpack.py: INFO : > - linear_systems

madpack.py: INFO : > - recursive_partitioning

madpack.py: INFO : > - regress

madpack.py: INFO : > - sample

madpack.py: INFO : > - summary

madpack.py: INFO : > - kmeans

madpack.py: INFO : > - pca

madpack.py: INFO : > - validation

madpack.py: INFO : Installing MADlib:

madpack.py: INFO : > Created madlib schema

madpack.py: INFO : > Created madlib.MigrationHistory table

madpack.py: INFO : > Wrote version info in MigrationHistory table

madpack.py: INFO : MADlib 1.15.1 installed successfully in madlib schema.

Check the MADlib installation

/usr/local/madlib/bin/madpack -s madlib -p postgres -c postgres@localhost:5432/postgres install-check

The output is as follows:

madpack.py: INFO : Detected PostgreSQL version 10.6.

TEST CASE RESULT|Module: bayes|bayes.ic.sql_in|PASS|Time: 117 milliseconds

TEST CASE RESULT|Module: crf|crf_train_small.ic.sql_in|PASS|Time: 112 milliseconds

TEST CASE RESULT|Module: crf|crf_test_small.ic.sql_in|PASS|Time: 131 milliseconds

TEST CASE RESULT|Module: elastic_net|elastic_net.ic.sql_in|PASS|Time: 123 milliseconds

TEST CASE RESULT|Module: linalg|linalg.ic.sql_in|PASS|Time: 43 milliseconds

TEST CASE RESULT|Module: linalg|svd.ic.sql_in|PASS|Time: 151 milliseconds

TEST CASE RESULT|Module: linalg|matrix_ops.ic.sql_in|PASS|Time: 231 milliseconds

TEST CASE RESULT|Module: prob|prob.ic.sql_in|PASS|Time: 21 milliseconds

TEST CASE RESULT|Module: svm|svm.ic.sql_in|PASS|Time: 151 milliseconds

TEST CASE RESULT|Module: tsa|arima.ic.sql_in|PASS|Time: 130 milliseconds

TEST CASE RESULT|Module: conjugate_gradient|conj_grad.ic.sql_in|PASS|Time: 35 milliseconds

TEST CASE RESULT|Module: knn|knn.ic.sql_in|PASS|Time: 107 milliseconds

TEST CASE RESULT|Module: lda|lda.ic.sql_in|PASS|Time: 109 milliseconds

TEST CASE RESULT|Module: stats|correlation.ic.sql_in|PASS|Time: 93 milliseconds

TEST CASE RESULT|Module: stats|f_test.ic.sql_in|PASS|Time: 25 milliseconds

TEST CASE RESULT|Module: stats|robust_and_clustered_variance_coxph.ic.sql_in|PASS|Time: 116 milliseconds

TEST CASE RESULT|Module: stats|pred_metrics.ic.sql_in|PASS|Time: 98 milliseconds

TEST CASE RESULT|Module: stats|wsr_test.ic.sql_in|PASS|Time: 29 milliseconds

TEST CASE RESULT|Module: stats|mw_test.ic.sql_in|PASS|Time: 26 milliseconds

TEST CASE RESULT|Module: stats|cox_prop_hazards.ic.sql_in|PASS|Time: 101 milliseconds

TEST CASE RESULT|Module: stats|ks_test.ic.sql_in|PASS|Time: 28 milliseconds

TEST CASE RESULT|Module: stats|chi2_test.ic.sql_in|PASS|Time: 28 milliseconds

TEST CASE RESULT|Module: stats|t_test.ic.sql_in|PASS|Time: 26 milliseconds

TEST CASE RESULT|Module: stats|anova_test.ic.sql_in|PASS|Time: 29 milliseconds

TEST CASE RESULT|Module: utilities|minibatch_preprocessing.ic.sql_in|PASS|Time: 104 milliseconds

TEST CASE RESULT|Module: utilities|pivot.ic.sql_in|PASS|Time: 74 milliseconds

TEST CASE RESULT|Module: utilities|path.ic.sql_in|PASS|Time: 88 milliseconds

TEST CASE RESULT|Module: utilities|sessionize.ic.sql_in|PASS|Time: 74 milliseconds

TEST CASE RESULT|Module: utilities|text_utilities.ic.sql_in|PASS|Time: 78 milliseconds

TEST CASE RESULT|Module: utilities|utilities.ic.sql_in|PASS|Time: 77 milliseconds

TEST CASE RESULT|Module: utilities|transform_vec_cols.ic.sql_in|PASS|Time: 93 milliseconds

TEST CASE RESULT|Module: utilities|encode_categorical.ic.sql_in|PASS|Time: 97 milliseconds

TEST CASE RESULT|Module: assoc_rules|assoc_rules.ic.sql_in|PASS|Time: 114 milliseconds

TEST CASE RESULT|Module: convex|mlp.ic.sql_in|PASS|Time: 226 milliseconds

TEST CASE RESULT|Module: convex|lmf.ic.sql_in|PASS|Time: 118 milliseconds

TEST CASE RESULT|Module: glm|glm.ic.sql_in|PASS|Time: 234 milliseconds

TEST CASE RESULT|Module: graph|graph.ic.sql_in|PASS|Time: 218 milliseconds

TEST CASE RESULT|Module: linear_systems|dense_linear_sytems.ic.sql_in|PASS|Time: 88 milliseconds

TEST CASE RESULT|Module: linear_systems|sparse_linear_sytems.ic.sql_in|PASS|Time: 91 milliseconds

TEST CASE RESULT|Module: recursive_partitioning|random_forest.ic.sql_in|PASS|Time: 155 milliseconds

TEST CASE RESULT|Module: recursive_partitioning|decision_tree.ic.sql_in|PASS|Time: 130 milliseconds

TEST CASE RESULT|Module: regress|clustered.ic.sql_in|PASS|Time: 115 milliseconds

TEST CASE RESULT|Module: regress|robust.ic.sql_in|PASS|Time: 99 milliseconds

TEST CASE RESULT|Module: regress|logistic.ic.sql_in|PASS|Time: 101 milliseconds

TEST CASE RESULT|Module: regress|multilogistic.ic.sql_in|PASS|Time: 98 milliseconds

TEST CASE RESULT|Module: regress|marginal.ic.sql_in|PASS|Time: 279 milliseconds

TEST CASE RESULT|Module: regress|linear.ic.sql_in|PASS|Time: 24 milliseconds

TEST CASE RESULT|Module: sample|train_test_split.ic.sql_in|PASS|Time: 76 milliseconds

TEST CASE RESULT|Module: sample|sample.ic.sql_in|PASS|Time: 21 milliseconds

TEST CASE RESULT|Module: sample|stratified_sample.ic.sql_in|PASS|Time: 76 milliseconds

TEST CASE RESULT|Module: sample|balance_sample.ic.sql_in|PASS|Time: 81 milliseconds

TEST CASE RESULT|Module: summary|summary.ic.sql_in|PASS|Time: 86 milliseconds

TEST CASE RESULT|Module: kmeans|kmeans.ic.sql_in|PASS|Time: 155 milliseconds

TEST CASE RESULT|Module: pca|pca_project.ic.sql_in|PASS|Time: 152 milliseconds

TEST CASE RESULT|Module: pca|pca.ic.sql_in|PASS|Time: 242 milliseconds

TEST CASE RESULT|Module: validation|cross_validation.ic.sql_in|PASS|Time: 110 milliseconds
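The install-check output is long, so scanning it by eye for a test that did not pass is error-prone. A minimal sketch that counts the results from the pipe-separated lines shown above; summarize_install_check is a hypothetical helper name, and any status other than PASS is treated as not passed, since this post only shows PASS lines:

```shell
# Summarize madpack install-check output read from stdin.
# summarize_install_check is a hypothetical helper name.
# Field layout, as in the output above:
#   TEST CASE RESULT|Module: <name>|<file>|<status>|Time: <n> milliseconds
summarize_install_check() {
    awk -F'|' '
        /^TEST CASE RESULT/ {
            if ($4 == "PASS") {
                pass++
            } else {
                fail++
                print "not passed: " $2 " " $3
            }
        }
        END { printf "pass=%d fail=%d\n", pass, fail }
    '
}

# Usage (not run here; requires the MADlib installation from this post):
# /usr/local/madlib/bin/madpack -s madlib -p postgres \
#     -c postgres@localhost:5432/postgres install-check | summarize_install_check
```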

Test MADlib functions in the database

Create the table and insert data:

CREATE TABLE array_tbl (

id integer,

array1 integer[],

array2 integer[]

);

INSERT INTO "public"."array_tbl"("id","array1","array2")

VALUES

(1,E'{1,2,3,4,5,6}',E'{6,5,4,3,2,1}'),

(2,E'{1,1,0,0,99,8}',E'{0,0,0,-5,2,1}');

View the inserted rows

select * from array_tbl;

id | array1 | array2

----+----------------+----------------

1 | {1,2,3,4,5,6} | {6,5,4,3,2,1}

2 | {1,1,0,0,99,8} | {0,0,0,-5,2,1}

(2 rows)

Use the MADlib API:

Note: this computes the minimum and maximum value of each array

select id,madlib.array_min(array1) min, madlib.array_max(array1) max from array_tbl;

id | min | max

----+-----+-----

1 | 1 | 6

2 | 0 | 99

(2 rows)
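The same array_ops module also operates on two arrays at once. A small sketch, assuming psql is on PATH, the server from the steps above is running, and the array_tbl table exists; it calls madlib.array_add (element-wise sum) and madlib.array_dot (dot product) on the two columns. The query variable and the connection flags are illustrative choices that mirror the madpack connection string used earlier:

```shell
# Element-wise sum and dot product of array1 and array2 via MADlib array_ops.
# Assumptions: psql is installed, the local server accepts the postgres user
# without a password (trust rules above), and array_tbl has been created.
query='SELECT id,
       madlib.array_add(array1, array2) AS added,
       madlib.array_dot(array1, array2) AS dot
FROM array_tbl;'

if command -v psql >/dev/null 2>&1; then
    # -w: never prompt for a password; report a failure instead of aborting.
    psql -w -U postgres -h localhost -p 5432 -d postgres -c "$query" \
        || echo "psql query failed (is the server running?)" >&2
else
    echo "psql not found on PATH; skipping query" >&2
fi
```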

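Notes

This is only a basic installation; a production setup needs considerably more tuning. Also note that the madlib schema is installed per database. For a test environment, the Docker image can be used directly: it is fairly large, but very convenient.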
