Compared with the quick install, building from source is a bit more involved. First, make sure the system has JDK 1.6 or later and Maven 3.0 or later installed. Installing those is not covered here:
[root@localhost QiumingLu]# mvn -version
Apache Maven 3.1.1 (0728685237757ffbf44136acec0402957f723d9a; 2013-09-17 23:22:22+0800)
Maven home: /home/QiumingLu/mycloud/maven/apache-maven-3.1.1
Java version: 1.7.0_51, vendor: Oracle Corporation
Java home: /home/QiumingLu/mycloud/jdk/jdk1.7.0_51/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-431.11.2.el6.x86_64", arch: "amd64", family: "unix"
Installing Mahout
Download the latest source
Check out the current Mahout trunk from Apache's SVN repository; it will be downloaded into the current directory:
svn co http://svn.apache.org/repos/asf/mahout/trunk
Run the build
Enter the Mahout root directory and run the install command:
cd trunk
mvn install
This step takes quite a while.
Building the latest source from SVN has its advantages: it ships with more examples than the quick install does.
Configure environment variables
[root@localhost QiumingLu]# cat /etc/profile
# set java environment
export JAVA_HOME=/home/QiumingLu/mycloud/jdk/jdk1.7.0_51
export JRE_HOME=/home/QiumingLu/mycloud/jdk/jdk1.7.0_51/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export HADOOP_HOME=/home/QiumingLu/hadoop-2.4.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_PREFIX=/home/QiumingLu/hadoop-2.4.0
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
export MAHOUT_HOME=/home/QiumingLu/mycloud/trunk
export MAHOUT_CONF_DIR=/home/QiumingLu/mycloud/trunk/src/conf
export PATH=${MAHOUT_HOME}/conf:${MAHOUT_HOME}/bin:$PATH
export MAVEN_HOME=/home/QiumingLu/mycloud/maven/apache-maven-3.1.1
export PATH=$PATH:$MAVEN_HOME/bin
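After editing the profile, source it and sanity-check that Mahout's bin directory actually landed on PATH. A minimal sketch, reusing the example paths from this guide (adjust them to your own install):

```shell
# Example paths from this guide; change them to match your install.
export MAHOUT_HOME=/home/QiumingLu/mycloud/trunk
export PATH=${MAHOUT_HOME}/bin:$PATH

# Confirm the bin directory was prepended to PATH.
case ":$PATH:" in
  *":${MAHOUT_HOME}/bin:"*) echo "MAHOUT_HOME/bin is on PATH" ;;
  *)                        echo "MAHOUT_HOME/bin is missing from PATH" ;;
esac
```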
2. Clustering test
Download link for the synthetic_control.data dataset:
http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
The dataset contains 600 rows of 60 columns each; every 100 rows represent one chart pattern, six patterns in total. For details, see:
Synthetic Control Chart Time Series
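As a quick sanity check on the file layout, you can count the fields per line; each of the 600 lines should hold 60 numeric values. The sketch below generates a tiny stand-in file (since the real one must be downloaded first); point DATA at your downloaded synthetic_control.data instead:

```shell
# Stand-in: 3 lines of 60 space-separated values each, mimicking the
# real layout. Replace with the downloaded synthetic_control.data.
DATA=$(mktemp)
for i in 1 2 3; do
  seq 1 60 | tr '\n' ' ' >> "$DATA"
  echo >> "$DATA"
done

# Count lines whose field count is not 60 (should be zero).
awk 'NF != 60 { bad++ } END { printf "%d malformed line(s) out of %d\n", bad, NR }' "$DATA"
```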
Guide to running the example
Example: Synthetic control data
Note: before testing with synthetic_control.data, first upload it to HDFS at localhost:9000/user/root/testdata.
Command-line usage
$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.${clustering.type}.Job
${clustering.type} can be one of canopy, kmeans, fuzzykmeans, or dirichlet. For example, to run kmeans:
$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
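Note that ${clustering.type} above is placeholder notation, not a real shell variable. A small sketch of how the job class name is assembled from the clustering type (pure shell, no Mahout needed; the variable name clustering_type here is our own):

```shell
# Pick one of: canopy, kmeans, fuzzykmeans, dirichlet.
clustering_type=kmeans

# Build the fully qualified job class name passed to bin/mahout.
job_class="org.apache.mahout.clustering.syntheticcontrol.${clustering_type}.Job"
echo "$job_class"
# You would then run: $MAHOUT_HOME/bin/mahout "$job_class"
```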
[root@localhost hadoop-2.4.0]# bin/hdfs dfs -mkdir -p /user/root/testdata
14/04/18 05:48:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@localhost hadoop-2.4.0]# bin/hdfs dfs -put synthetic_control.data /user/root/testdata/
[root@localhost hadoop-2.4.0]# cd ..
[root@localhost QiumingLu]# cd mycloud/trunk/
[root@localhost trunk]# bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job