0 Prerequisites
The machine where Mahout will be installed needs a JDK, a working Hadoop environment, and the Maven build tool.
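As a quick sanity check (a minimal sketch; it only verifies that each tool is on the PATH, not versions or configuration), you can confirm the prerequisites like this:

```shell
# Report whether each required tool is on the PATH.
# Presence only -- this does not check versions or configuration.
for tool in java hadoop mvn; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: NOT FOUND -- install it before continuing"
  fi
done
```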
1 Building and Installing Mahout 1.0
Mahout 0.9 does not support Hadoop 2 out of the box; to use it, its dependencies would have to be modified. Here we build Mahout 1.0 (trunk) against Hadoop 2 instead.
Download the source code:
git clone https://github.com/apache/mahout.git
(The SSH URL git@github.com:apache/mahout.git requires registered SSH keys; the HTTPS URL works for anonymous clones.)
Change into the mahout root directory and compile:
mvn -Dhadoop.version=2.5.1 clean compile
Package:
mvn -Dhadoop.version=2.5.1 -DskipTests=true clean package
Install into the local Maven repository (optional):
mvn -Dhadoop.version=2.5.1 clean install
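A successful package step produces a self-contained "job" jar under examples/target. A quick way to confirm it is there (assuming the source tree lives at /usr/mahout, as in the environment variables below; adjust the path to your checkout):

```shell
# Look for the packaged examples job jar; the exact file name carries
# the version (e.g. mahout-examples-1.0-SNAPSHOT-job.jar).
JOB_JAR=$(ls /usr/mahout/examples/target/mahout-examples-*-job.jar 2>/dev/null | head -n 1)
if [ -n "$JOB_JAR" ]; then
  echo "job jar: $JOB_JAR"
else
  echo "job jar not found -- rerun 'mvn -DskipTests=true clean package'"
fi
```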
Add the environment variables (adjust the paths to match your installation):
export JAVA_HOME=/usr/java/jdk1.8.0_25
export M2_HOME=/usr/apache-maven
export M2=$M2_HOME/bin
export HADOOP_HOME=/usr/hadoop
export MAHOUT_HOME=/usr/mahout
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$MAHOUT_HOME/bin:$M2_HOME/bin:/usr/hadoop/bin:/usr/hadoop/sbin:$HIVE_HOME/bin:$JAVA_HOME/bin:$PATH
export CLASSPATH=.:/usr/hadoop/share/hadoop/common/*:/usr/hadoop/share/hadoop/common/lib/*:/usr/hive/lib/*:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
(Note the /* wildcards: a bare directory on the Java classpath only loads .class files; the wildcard is needed to pick up the jars inside it.)
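The exports above last only for the current shell. One way to make them permanent (a sketch; trim the list to the variables you actually need, and keep the paths consistent with your layout) is to append them to your shell profile:

```shell
# Append the Mahout-related exports to ~/.bashrc so new login shells
# inherit them; the paths match the example layout above.
cat >> ~/.bashrc <<'EOF'
export JAVA_HOME=/usr/java/jdk1.8.0_25
export HADOOP_HOME=/usr/hadoop
export MAHOUT_HOME=/usr/mahout
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$MAHOUT_HOME/bin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
EOF
```

Open a new shell (or run `. ~/.bashrc`) for the changes to take effect.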
Test:
[root@hive ~]# mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/hadoop/etc/hadoop
MAHOUT-JOB: /usr/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
An example program must be given as the first argument.
Valid program names are:
arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
buildforest: : Build the random forest classifier
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump: : Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters…
2 Running the k-means Clustering Example
Download the synthetic control sample dataset and upload it to HDFS:
wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
hdfs dfs -mkdir testdata
hdfs dfs -put synthetic_control.data testdata/
Run the bundled synthetic-control k-means example (it reads testdata/ in your HDFS home directory and writes results under output/):
hadoop jar /usr/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
View the results (the name of the final cluster directory, e.g. clusters-10-final, depends on how many iterations k-means ran; check with hdfs dfs -ls output):
mahout clusterdump -i output/clusters-10-final -p output/clusteredPoints -o ~/test
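clusterdump writes a human-readable summary of the clusters to the local file given with -o (~/test above). A quick look at it, guarded so it degrades gracefully if the job has not been run yet:

```shell
# Peek at the clusterdump output written to ~/test above.
if [ -f ~/test ]; then
  head -n 20 ~/test
else
  echo "~/test not found -- run the clusterdump step first"
fi
```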