1. Start Hadoop
2. Download the test data
Download synthetic_control.data from http://archive.ics.uci.edu/ml/databases/synthetic_control/
A quick web search (for example on Baidu) will also turn up this sample dataset.
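Before uploading, it can be worth checking that the download is complete. The sketch below counts rows and columns of a whitespace-separated file; `data_shape` is a hypothetical helper name, not part of Hadoop or Mahout. As far as I know, the UCI description of this dataset lists 600 series of 60 values each, so a result of 600 rows with 60 columns in every row would suggest the file is intact.

```shell
# Print "<rows> <min columns> <max columns>" for a whitespace-separated file.
# data_shape is a hypothetical helper used only for this sanity check.
data_shape() {
  awk 'NF > 0 {
         n++
         if (min == "" || NF < min) min = NF
         if (NF > max) max = NF
       }
       END { print n + 0, min + 0, max + 0 }' "$1"
}

# Only run the check if the file is present in the current directory.
if [ -f synthetic_control.data ]; then
  data_shape synthetic_control.data
fi
```

If the minimum and maximum column counts differ, the file was probably truncated or altered during download.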
3. Upload the test data to HDFS
hadoop fs -put synthetic_control.data testdata
4. Run Mahout's k-means clustering algorithm with the following command:
mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
The clustering takes about 9 minutes to complete.
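The command above passes no options, so the job falls back on its built-in defaults. For reference, here is a rough sketch of spelling those settings out explicitly. Everything in it is an assumption recalled from the Mahout 0.9 example job, not something this walkthrough confirms: the option names (`--input`, `--output`, `--numClusters`, `--convergenceDelta`, `--maxIter`, `--distanceMeasure`, `--overwrite`) and the values (6 clusters, convergence delta 0.5, 10 iterations, Euclidean distance). Check them against your own installation before relying on them. `kmeans_cmd` is a hypothetical helper that only prints the command.

```shell
# Compose the k-means job command with explicit options instead of relying on
# the built-in defaults. Option names and default values are assumptions
# recalled from Mahout 0.9; verify them against your installation.
kmeans_cmd() {
  input=${1:-testdata}   # HDFS input path
  output=${2:-output}    # HDFS output path
  k=${3:-6}              # number of clusters (assumed default)
  max_iter=${4:-10}      # maximum number of iterations (assumed default)
  printf '%s\n' "mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job --input $input --output $output --numClusters $k --convergenceDelta 0.5 --maxIter $max_iter --distanceMeasure org.apache.mahout.common.distance.EuclideanDistanceMeasure --overwrite"
}

# Dry run: print the command so it can be reviewed before executing it.
kmeans_cmd
```

Printing rather than executing keeps the sketch harmless; once the options are confirmed for your version, the printed line can be pasted into the shell.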
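5. View the clustering results
Run hadoop fs -ls output to list the results. The relative path resolves to the current user's HDFS home directory: /user/jediael/output in the session below, or /user/root/output when running as root.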
[jediael@master mahout-distribution-0.9]$ hadoop fs -ls output
Found 15 items
-rw-r--r--   2 jediael supergroup        194 2015-03-07 15:07 /user/jediael/output/_policy
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:07 /user/jediael/output/clusteredPoints
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:02 /user/jediael/output/clusters-0
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:02 /user/jediael/output/clusters-1
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:07 /user/jediael/output/clusters-10-final
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:03 /user/jediael/output/clusters-2
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:03 /user/jediael/output/clusters-3
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:04 /user/jediael/output/clusters-4
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:04 /user/jediael/output/clusters-5
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:05 /user/jediael/output/clusters-6
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:05 /user/jediael/output/clusters-7
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:06 /user/jediael/output/clusters-8
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:07 /user/jediael/output/clusters-9
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:02 /user/jediael/output/data
drwxr-xr-x   - jediael supergroup          0 2015-03-07 15:02 /user/jediael/output/random-seeds
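In the listing above, each clusters-N directory holds the cluster state after one iteration, and the one with the -final suffix is the converged (or last) result. The sketch below picks that directory out of the listing automatically. `final_clusters_dir` is a hypothetical helper, and the commented clusterdump call with its local output file name clusters.txt is an assumed follow-up step, not something this walkthrough ran.

```shell
# Read `hadoop fs -ls` output on stdin and print the path of the final
# k-means iteration (the entry whose name ends in "-final").
# final_clusters_dir is a hypothetical helper name.
final_clusters_dir() {
  awk '$NF ~ /\/clusters-[0-9]+-final$/ { print $NF }'
}

# Only query HDFS when the hadoop client is available.
if command -v hadoop >/dev/null 2>&1; then
  final=$(hadoop fs -ls output | final_clusters_dir)
  echo "final clusters: $final"
  # Assumed follow-up: dump the clusters to a local text file for reading.
  # mahout clusterdump -i "$final" -p output/clusteredPoints -o clusters.txt
fi
```

Matching on the -final suffix avoids hard-coding the iteration number, which changes if the job converges earlier.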