The k-means data set is at http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
1. Start Hadoop
$HADOOP_HOME/bin/start-all.sh
Copy the data into HDFS:
$HADOOP_HOME/bin/hadoop fs -mkdir testdata
$HADOOP_HOME/bin/hadoop fs -put <PATH TO synthetic_control.data> testdata
(The HDFS input directory must be named testdata, because the job reads from that directory when run without arguments.)
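Before uploading, it can be worth checking that the download is intact. A minimal sketch, assuming the file is plain whitespace-separated text; describe_data is a hypothetical helper name, not part of Hadoop or Mahout:

```shell
# Hypothetical helper: print "<rows> <min_cols> <max_cols>" for a
# whitespace-separated text file, ignoring blank lines.
describe_data() {
  awk 'NF > 0 {
         rows++
         if (min == "" || NF < min) min = NF   # smallest column count seen
         if (NF > max) max = NF                # largest column count seen
       }
       END { printf "%d %d %d\n", rows, min, max }' "$1"
}
```

Run it as describe_data <PATH TO synthetic_control.data>. The UCI description of this data set gives 600 series of 60 points each, so a row count other than 600, or differing min and max column counts, suggests a truncated or damaged download.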
2. Run Mahout
$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
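The commands in these notes depend on HADOOP_HOME and MAHOUT_HOME being set. A small pre-flight check can catch a missing variable before the job is launched. This is only a sketch; require_dirs is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if every named environment variable
# is set and points at an existing directory.
require_dirs() {
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "dir=\${$name:-}"
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
      echo "missing or invalid: $name" >&2
      return 1
    fi
  done
}
```

For example, require_dirs HADOOP_HOME MAHOUT_HOME returns a non-zero status and names the offending variable if either one is unset or does not point at a directory.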
3. Inspect the results
$HADOOP_HOME/bin/hadoop fs -ls output
$MAHOUT_HOME/bin/mahout seqdumper -i output/clusters-9/part-r-00000 -o ~/data/aaa/ttt
(This dumps part-r-00000 from the distributed file system into a local text file.)
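Once the dump is on the local disk, ordinary text tools can be used on it. The sketch below counts the records in the dump. It assumes that each record is written as one line beginning with "Key: ", which is an assumption about seqdumper's text output and may differ between Mahout versions; count_records is a hypothetical helper name:

```shell
# Hypothetical helper: count the lines that start with "Key: " in a
# local seqdumper text dump (assumed to be one such line per record).
count_records() {
  grep -c '^Key: ' "$1"
}
```

With the output path used above, count_records ~/data/aaa/ttt would print the number of records in the dumped part file.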