Outputting and Visualizing Mahout Clustering Results

wanghailong000

Category: mahout. Tags: machine learning, mahout

Article link: https://blog.csdn.net/wanghailong000/article/details/53413949

1. In Mahout, the org.apache.mahout.utils.clustering.ClusterDumper class outputs clustering results. To print them to the console, use:

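// sequentialfile and clusterpoints are org.apache.hadoop.fs.Path objects
ClusterDumper clusterdumper = new ClusterDumper(sequentialfile, clusterpoints);
// passing null means no term dictionary is used when printing
clusterdumper.printClusters(null);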

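Here the first argument is a Path pointing to the serialized cluster-center files of the clustering result, and the second argument is a Path pointing to the serialized clustered-points files (the data points assigned to each cluster).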

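To write the output to a file, run ClusterDumper from the command line. To run it in Eclipse instead, add the required program arguments to the run configuration for ClusterDumper.java and run it. The options are: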

--help                                    Print out help
--input (-i) input                        The directory containing Sequence Files
                                          for the Clusters (path to the serialized
                                          cluster-center files of the clustering result)
--output (-o) output                      The output file. If not specified, dumps
                                          to the console (path for the deserialized
                                          output)
--outputFormat (-of) outputFormat         The optional output format to write the
                                          results as. Options: TEXT, CSV, or GRAPH_ML
--substring (-b) substring                The number of chars of the
                                          asFormatString() to print
--pointsDir (-p) pointsDir                The directory containing points sequence
                                          files mapping input vectors to their
                                          cluster. If specified, then the program
                                          will output the points associated with a
                                          cluster (the serialized data-point files
                                          of the clustering result)
--dictionary (-d) dictionary              The dictionary file.
--dictionaryType (-dt) dictionaryType     The dictionary file type
                                          (text|sequencefile)
--distanceMeasure (-dm) distanceMeasure   The classname of the DistanceMeasure.
                                          Default is SquaredEuclidean.
--numWords (-n) numWords                  The number of top terms to print
--tempDir tempDir                         Intermediate output directory
--startPhase startPhase                   First phase to run
--endPhase endPhase                       Last phase to run
--evaluate (-e)                           Run ClusterEvaluator and CDbwEvaluator
                                          over the input. The output will be
                                          appended to the rest of the output at
                                          the end.

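The options highlighted in red in the original post are the required ones. The color is not preserved here; they appear to be the three options carrying a parenthesized note above (--input, --output, --pointsDir).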
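As a rough illustration of what goes into the Eclipse "Program arguments" field (or onto the command line), the sketch below assembles the argument array for the three annotated options. The class ClusterDumperArgs, its build method, and the paths are hypothetical names chosen for this example and are not part of Mahout. Only the option names come from the list above. The sketch treats --input as mandatory and the other two as optional, which is an assumption based on the help text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Mahout): assembles program arguments
// using the option names from the ClusterDumper help listing.
class ClusterDumperArgs {

    // Builds the argument array. output and pointsDir may be null, in which
    // case the corresponding option is left out.
    static String[] build(String input, String output, String pointsDir) {
        if (input == null || input.isEmpty()) {
            throw new IllegalArgumentException("--input is required");
        }
        List<String> args = new ArrayList<>();
        args.add("--input");
        args.add(input);
        if (output != null) {
            args.add("--output");
            args.add(output);
        }
        if (pointsDir != null) {
            args.add("--pointsDir");
            args.add(pointsDir);
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Placeholder paths: substitute the directories written by your own clustering job.
        String[] args = build("path/to/clusters", "path/to/clusters.txt", "path/to/clusteredPoints");
        System.out.println(String.join(" ", args));
        // With the Mahout jars on the classpath, the array could then be handed to
        // org.apache.mahout.utils.clustering.ClusterDumper.main(args);
    }
}
```

The printed line can be pasted into the run configuration as-is. Leaving out the output path should make ClusterDumper dump to the console, as the help text for --output states.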

2. Visualizing the clustering results:

The Mahout source tree provides visualization classes in the org.apache.mahout.clustering.display package. Run them directly to see the results; they are written with Java Swing.
