Outputting Cluster Centers with ClusterDumper

奥康姆剃刀

Tags: ClusterDumper, Mahout, KMeans, saving clustering results

Article link: https://blog.csdn.net/zhoujianfeng3/article/details/37567037

The code in Mahout that invokes KMeans:

Path directoryContainingConvertedInput = new Path(output, DIRECTORY_CONTAINING_CONVERTED_INPUT);
log.info("Preparing Input");
InputDriver.runJob(input, directoryContainingConvertedInput, "org.apache.mahout.math.RandomAccessSparseVector");
log.info("Running random seed to get initial clusters");
Path clusters = new Path(output, "random-seeds");
clusters = RandomSeedGenerator.buildRandom(conf, directoryContainingConvertedInput, clusters, k, measure);
log.info("Running KMeans with k = "+ k);
KMeansDriver.run(conf, directoryContainingConvertedInput, clusters, output, convergenceDelta,
    maxIterations, true, 0.0, false);
// run ClusterDumper
Path outGlob = new Path(output, "clusters-*-final");
Path clusteredPoints = new Path(output,"clusteredPoints");
log.info("Dumping out clusters from clusters:"+outGlob+" and clusteredPoints: "+ clusteredPoints);

ClusterDumper clusterDumper = new ClusterDumper(outGlob, clusteredPoints);
clusterDumper.printClusters(null);

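Here the initial cluster centers are chosen at random. Calling the run method of KMeansDriver produces the result files clusters-*-final and clusteredPoints. Both are binary files, however, and cannot be read directly. Mahout provides the ClusterDumper class, and calling its print method displays the results. When the sample space being clustered is large, printing everything to the console is not very practical. Is there a way to save the output to a text file instead? There is.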

A solution already available online:

Reference: http://blog.csdn.net/fansy1990/article/details/17589287

While reading the official Mahout documentation, however, I found that it already offers a way to do this:

--seqFileDir <MAHOUT_HOME>/examples/output/clusters-10 
--pointsDir <MAHOUT_HOME>/examples/output/clusteredPoints 
--output <MAHOUT_HOME>/examples/output/clusteranalyze.txt

Official documentation: http://mahout.apache.org/users/clustering/cluster-dumper.html
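To use these options from Java rather than from the shell, the arguments can be assembled into a string array first. The following is only a minimal sketch: `ClusterDumperArgs` and `build` are hypothetical names, not part of Mahout; the sample paths are placeholders; and the flag names are copied from the documentation excerpt above, so they may differ in other Mahout versions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Mahout): assembles the command-line
// arguments from the documentation excerpt into a String[].
final class ClusterDumperArgs {

    // Flag names are taken from the documentation excerpt; other Mahout
    // versions may name them differently (assumption).
    static String[] build(String seqFileDir, String pointsDir, String outputFile) {
        List<String> args = new ArrayList<>();
        args.add("--seqFileDir");
        args.add(seqFileDir);
        args.add("--pointsDir");
        args.add(pointsDir);
        args.add("--output");
        args.add(outputFile);
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Placeholder paths, relative to wherever the job wrote its output.
        String[] args = build(
                "output/clusters-10",
                "output/clusteredPoints",
                "output/clusteranalyze.txt");
        // With Mahout on the classpath, the array would then be handed to the
        // dumper's command-line entry point (check the exact call for your version).
        System.out.println(String.join(" ", args));
    }
}
```

Keeping the argument construction in one place makes it easy to adjust the flag names if the installed Mahout version expects different ones.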

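Looking through the ClusterDumper source code, I found no output property. Its parent class AbstractJob, however, has two relevant properties, outputPath and outputFile. So I decided to set outputFile or outputPath when constructing a ClusterDumper, by adding a new constructor.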

/**
 * Saves the dump to a local file.
 *
 * @param seqFileDir directory containing the final clusters
 * @param pointsDir directory containing the clustered points
 * @param outputPath path of the local text file to write
 */
public ClusterDumper(Path seqFileDir, Path pointsDir, String outputPath) {
    super();
    this.seqFileDir = seqFileDir;
    this.pointsDir = pointsDir;
    try {
        this.outputFile = new File(outputPath);
    } catch (Exception e) {
        // new File(...) throws NullPointerException when outputPath is null
        System.out.println("********************* something was wrong....." + e.getMessage());
    }
    init();
}
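The constructor above only helps if the dumper writes to outputFile when that field is set. The general pattern, "write to the file if one was given, otherwise print to the console", can be sketched in plain Java as follows. `TextDumpSketch` and `print` are hypothetical names used for illustration; this is not ClusterDumper's actual implementation.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical stand-in for a dumper (not Mahout code): writes to outputFile
// when one is set, otherwise to standard output.
final class TextDumpSketch {
    private final File outputFile; // null means "print to the console"

    TextDumpSketch(File outputFile) {
        this.outputFile = outputFile;
    }

    void print(Iterable<String> lines) throws IOException {
        if (outputFile == null) {
            // System.out must stay open, so only flush it.
            Writer console = new OutputStreamWriter(System.out, StandardCharsets.UTF_8);
            write(console, lines);
            console.flush();
            return;
        }
        // Create the parent directory if it does not exist yet.
        File parent = outputFile.getAbsoluteFile().getParentFile();
        if (parent != null && !parent.isDirectory() && !parent.mkdirs()) {
            throw new IOException("Cannot create directory " + parent);
        }
        // try-with-resources closes the file writer even if writing fails.
        try (BufferedWriter writer =
                Files.newBufferedWriter(outputFile.toPath(), StandardCharsets.UTF_8)) {
            write(writer, lines);
        }
    }

    private static void write(Writer writer, Iterable<String> lines) throws IOException {
        for (String line : lines) {
            writer.write(line);
            writer.write(System.lineSeparator());
        }
    }
}
```

With the three-argument constructor in place, the dump call in the driver would presumably become something like `new ClusterDumper(outGlob, clusteredPoints, "clusteranalyze.txt").printClusters(null);`, where the file name is only an example.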