1. Download the sample data:
http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
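The UCI Synthetic Control Chart dataset contains 600 rows, each a time series of 60 space-separated readings. A quick sanity check of the file format can be sketched as below; since the real download may not be at hand, the snippet first generates one sample line in the same layout (the file name sample.data is a stand-in for synthetic_control.data):

```shell
# Generate one sample line in the same format as synthetic_control.data:
# a row of 60 space-separated floating-point readings.
seq 1 60 | awk '{printf "%s%.4f", (NR>1 ? " " : ""), $1*0.5} END {print ""}' > sample.data

# Sanity check: every row must have exactly 60 fields
# (the real file has 600 rows x 60 readings).
awk 'NF != 60 {bad++} END {print (bad ? bad : 0) " malformed rows"}' sample.data
```

Run the same awk check against the downloaded synthetic_control.data before uploading it, to catch a truncated or corrupted download early.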
2. Copy the data file to $MAHOUT_HOME
3. Start Hadoop:
start-all.sh
4. Create the input directory on HDFS (a relative path like testdata resolves under /user/&lt;current user&gt;):
hadoop fs -mkdir testdata
5. Upload the data to HDFS (the path below assumes you are running as the hadoop user):
hadoop fs -put $MAHOUT_HOME/synthetic_control.data /user/hadoop/testdata
6. Run the Mahout k-means example job:
hadoop jar $MAHOUT_HOME/mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
7. When the job finishes, an output directory on HDFS holds the clustering results: typically a series of clusters-N directories, one per k-means iteration, with the last one suffixed -final, plus a clusteredPoints directory containing the point-to-cluster assignments.
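For orientation, the job's HDFS output usually resembles the listing below. This is a sketch from a typical Mahout 0.9 run, not a guaranteed layout; the number of clusters-N iteration directories depends on when k-means converges:

```
output/
├── data/                  input converted to Mahout vectors
├── clusters-0/            initial (randomly seeded) clusters
├── clusters-1/
├── ...
├── clusters-N-final/      clusters from the final iteration
└── clusteredPoints/       point-to-cluster assignments
```

The results are stored as Hadoop sequence files, so they are not directly human-readable; Mahout ships a clusterdump utility for inspecting them (check `$MAHOUT_HOME/bin/mahout clusterdump --help` for the exact options in your version).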