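I. A simple Mahout example test
For installing and configuring Mahout, see: Mahout installation and configuration
1. Source of the test data for the k-means clustering algorithm:
URL: http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
2. Download the data and store it on HDFS (Hadoop 2.6.1 is already running)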
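Create the test directory testdata and import the data into it (the directory must be named testdata, because the example job run in step 3 is started without arguments and looks for its input under that name)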
$ hdfs dfs -mkdir testdata
$ hdfs dfs -put /home/lin/hadoop/mahout-distribution-0.10.0/test.data testdata
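The download and upload can also be scripted together. The following is a minimal sketch, assuming `wget` is installed and `hdfs` is on the PATH. The variable names are illustrative, and the file keeps its original name synthetic_control.data instead of the test.data used above.

```shell
#!/bin/sh
# Sketch: fetch the synthetic control data and copy it into HDFS.
# DATA_URL is the address from step 1; the other names are illustrative assumptions.
DATA_URL="http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data"
LOCAL_FILE="$(basename "$DATA_URL")"   # local file name taken from the URL
HDFS_DIR="testdata"                    # relative to the current user's HDFS home directory

if command -v hdfs >/dev/null 2>&1 && command -v wget >/dev/null 2>&1; then
    # Download only if a non-empty local copy does not exist yet,
    # then create the HDFS directory, upload (overwriting), and list it.
    { [ -s "$LOCAL_FILE" ] || wget -O "$LOCAL_FILE" "$DATA_URL"; } &&
        hdfs dfs -mkdir -p "$HDFS_DIR" &&
        hdfs dfs -put -f "$LOCAL_FILE" "$HDFS_DIR" &&
        hdfs dfs -ls "$HDFS_DIR"
else
    echo "hdfs or wget not found on PATH; nothing was downloaded or uploaded" >&2
fi
```

The `-p` flag lets `-mkdir` succeed when testdata already exists, and `-f` lets `-put` overwrite an earlier upload, so the script can be rerun safely.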
3. Run the k-means algorithm and wait for it to finish
$ hadoop jar /home/lin/hadoop/mahout-distribution-0.10.0/mahout-examples-0.10.0-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
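Since the job depends on the testdata directory from step 2, it can help to check for it before launching. The following is a minimal sketch of such a wrapper. `MAHOUT_HOME`, `JOB_JAR`, and `JOB_CLASS` are illustrative variable names built from the paths used above, not settings that Mahout itself reads here.

```shell
#!/bin/sh
# Sketch: verify the prerequisites, then launch the example k-means job.
# All three variable names are illustrative; the values come from the command above.
MAHOUT_HOME="/home/lin/hadoop/mahout-distribution-0.10.0"
JOB_JAR="$MAHOUT_HOME/mahout-examples-0.10.0-job.jar"
JOB_CLASS="org.apache.mahout.clustering.syntheticcontrol.kmeans.Job"

if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs not found on PATH; skipping the run" >&2
elif ! hdfs dfs -test -d testdata; then
    # -test -d returns non-zero when the path is not an existing directory
    echo "HDFS directory testdata is missing; upload the data first" >&2
elif [ ! -f "$JOB_JAR" ]; then
    echo "job jar not found: $JOB_JAR" >&2
else
    hadoop jar "$JOB_JAR" "$JOB_CLASS"
fi
```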
4. After the job completes, check the results
$ hdfs dfs -ls output
Output like the following shows that the run succeeded:
lin@lin162:~/hadoop/hadoop-2.6.1/etc/hadoop$ hdfs dfs -ls output
Found 15 items
-rw-r--r-- 2 lin supergroup 194 2015-12-01 12:27 output/_policy
drwxr-xr-x - lin supergroup 0 2015-12-01 12:27 output/clusteredPoints
drwxr-xr-x - lin supergroup 0 2015-12-01 12:22 output/clusters-0
drwxr-xr-x - lin supergroup 0 2015-12-01 12:23 output/clusters-1
drwxr-xr-x - lin supergroup 0 2015-12-01 12:27 output/clusters-10-final
drwxr-xr-x - lin supergroup 0 2015-12-01 12:23 output/clusters-2
drwxr-xr-x - lin supergroup 0 2015-12-01 12:24 output/clusters-3
drwxr-xr-x - lin supergroup 0 2015-12-01 12:24 output/clusters-4
drwxr-xr-x - lin supergroup 0 2015-12-01 12:25 output/clusters-5
drwxr-xr-x - lin supergroup 0 2015-12-01 12:25 output/clusters-6
drwxr-xr-x - lin supergroup 0 2015-12-01 12:25 output/clusters-7
drwxr-xr-x - lin supergroup 0 2015-12-01 12:26 output/clusters-8
drwxr-xr-x - lin supergroup 0 2015-12-01 12:26 output/clusters-9
drwxr-xr-x - lin supergroup 0 2015-12-01 12:22 output/data
drwxr-xr-x - lin supergroup 0 2015-12-01 12:22 output/random-seeds
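The cluster directories hold binary sequence files, so the listing alone does not show the clusters themselves. The following is a minimal sketch for picking out the final iteration and dumping it as text, assuming the `bin/mahout` launcher sits inside the distribution directory used above and that its `clusterdump` utility is available. `final_cluster_dir`, `MAHOUT_BIN`, and the local output file clusters.txt are illustrative names.

```shell
#!/bin/sh
# Sketch: find the "-final" cluster directory in an `hdfs dfs -ls` listing
# and dump its contents to a local text file.

# Reads a listing on stdin and prints the path whose name ends in clusters-N-final.
final_cluster_dir() {
    awk '$NF ~ /clusters-[0-9]+-final$/ { print $NF }'
}

# Assumed location of the Mahout launcher script.
MAHOUT_BIN="/home/lin/hadoop/mahout-distribution-0.10.0/bin/mahout"

if command -v hdfs >/dev/null 2>&1 && [ -x "$MAHOUT_BIN" ]; then
    FINAL_DIR="$(hdfs dfs -ls output | final_cluster_dir)"
    if [ -n "$FINAL_DIR" ]; then
        # -i: cluster directory, -p: clustered points, -o: local output file
        "$MAHOUT_BIN" clusterdump -i "$FINAL_DIR" -p output/clusteredPoints -o clusters.txt
    else
        echo "no clusters-N-final directory found under output" >&2
    fi
else
    echo "hdfs or the mahout launcher is not available; skipping the dump" >&2
fi
```

For the listing shown above, `final_cluster_dir` prints output/clusters-10-final, the directory written by the last k-means iteration.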