Weka + EM algorithm + Java: calling Weka's EM algorithm to perform clustering

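This document shows how to use Java with the Weka library to run the EM (expectation-maximization) algorithm for data clustering. The program first loads the data from a file, then removes the class attribute with the Remove filter. It then creates and configures an EM clusterer, setting the maximum number of iterations and the number of clusters, and finally runs the clustering and prints the evaluation results.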

package EMAlg;

import java.io.*;
import weka.core.*;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
import weka.clusterers.*;

public class EMAlg {

    public EMAlg() {
        System.out.println("this is the EMAlg");
    }

    public static void main(String[] args) throws Exception {
        // Path to the ARFF data set shipped with Weka
        String file = "C:\\Program Files/DataMining/Weka-3-6-10/data/labor.arff";
        BufferedReader reader = new BufferedReader(new FileReader(file));
        Instances data = new Instances(reader);
        reader.close();

        // Use the last attribute as the class attribute
        data.setClassIndex(data.numAttributes() - 1);

        Remove filter = new Remove();
        System.out.println("data.classIndex() returns: " + data.classIndex());
        System.out.println("Number of attributes read: " + data.numAttributes());

        // classIndex() is 0-based, but Remove expects 1-based indices, hence the +1
        filter.setAttributeIndices("" + (data.classIndex() + 1));
        /* filter.setAttributeIndices(String rangeList)
         * Set which attributes are to be deleted (or kept if invert is true).
         * Parameters:
         *   rangeList - a string representing the list of attributes. Since the string
         *   will typically come from a user, attributes are indexed from 1,
         *   e.g. first-3,5,6-last
         */
        filter.setInputFormat(data);
        /* public boolean setInputFormat(Instances instanceInfo) throws java.lang.Exception
         * Sets the format of the input instances. If the filter is able to determine the
         * output format before seeing any input instances, it does so here.
         * This default implementation clears the output format and output queue, and the
         * new batch flag is set.
         * Overriders should call super.setInputFormat(Instances).
         * Parameters:
         *   instanceInfo - an Instances object containing the input instance structure
         *   (any instances contained in the object are ignored - only the structure is required).
         * Returns:
         *   true if the outputFormat may be collected immediately
         * Throws:
         *   java.lang.Exception - if the inputFormat can't be set successfully
         */
        Instances dataCluster = Filter.useFilter(data, filter);
        /* public static Instances useFilter(Instances data, Filter filter) throws java.lang.Exception
         * Filters an entire set of instances through a filter and returns the new set.
         * It takes two arguments: the data to be filtered and the filter to apply;
         * the return value is the new data set.
         * Parameters:
         *   data - the data to be filtered
         *   filter - the filter to be used
         * Returns:
         *   the filtered set of data
         * Throws:
         *   java.lang.Exception - if the filter can't be used successfully
         */
        EM clusterer = new EM();
        /* public class EM
         *   extends RandomizableDensityBasedClusterer
         *   implements NumberOfClustersRequestable, WeightedInstancesHandler
         *
         * Simple EM (expectation maximisation) class.
         * EM assigns a probability distribution to each instance which indicates the
         * probability of it belonging to each of the clusters. EM can decide how many
         * clusters to create by cross validation, or you may specify a priori how many
         * clusters to generate.
         *
         * The cross validation performed to determine the number of clusters is done in
         * the following steps:
         *   1. the number of clusters is set to 1
         *   2. the training set is split randomly into 10 folds.
         *   3. EM is performed 10 times using the 10 folds the usual CV way.
         *   4. the loglikelihood is averaged over all 10 results.
         *   5. if loglikelihood has increased the number of clusters is increased by 1
         *      and the program continues at step 2.
         * The number of folds is fixed to 10, as long as the number of instances in the
         * training set is not smaller than 10. If this is the case the number of folds is
         * set equal to the number of instances.
         *
         * Valid options are:
         *   -N  number of clusters. If omitted or -1 specified, then cross validation
         *       is used to select the number of clusters.
         *   -I  max iterations. (default 100)
         *   -V  verbose.
         *   -M  minimum allowable standard deviation for normal density computation
         *       (default 1e-6)
         *   -O  display model in old format (good when there are many clusters)
         *   -S  random number seed. (default 100)
         */
        String[] options = new String[4];
        // Maximum number of iterations
        options[0] = "-I";
        options[1] = "100";
        // Number of clusters
        options[2] = "-N";
        options[3] = "2";
        clusterer.setOptions(options);

        // Train on the data with the class attribute removed
        clusterer.buildClusterer(dataCluster);

        // Evaluate the clusterer
        ClusterEvaluation eval = new ClusterEvaluation();
        eval.setClusterer(clusterer);
        // data still has its class index set, so this is a classes-to-clusters evaluation
        eval.evaluateClusterer(data);

        // Print results
        System.out.println("Number of instances: " + data.numInstances()
                + ", number of attributes: " + data.numAttributes());
        System.out.println(eval.clusterResultsToString());
    }
}
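
The listing above treats EM as a black box. To make the "probability of belonging to each cluster" idea from the class description more concrete, here is a minimal, dependency-free sketch of EM for a two-component Gaussian mixture over a single numeric attribute. It is not Weka's implementation: the class name `GaussianMixtureEm1D`, the crude initialisation at the minimum and maximum values, and the sample data are assumptions made for illustration only. The standard-deviation floor simply reuses the 1e-6 default quoted for the -M option.

```java
import java.util.Arrays;

/** Illustrative EM for a two-component, one-dimensional Gaussian mixture. Not Weka's code. */
class GaussianMixtureEm1D {
    // Floor on the standard deviation, reusing the -M default (1e-6) quoted above.
    static final double MIN_STD_DEV = 1e-6;

    final double[] weight = {0.5, 0.5};   // mixing proportions of the two clusters
    final double[] mean = new double[2];
    final double[] stdDev = new double[2];

    /** Normal density N(x; mean, stdDev). */
    static double density(double x, double mean, double stdDev) {
        double z = (x - mean) / stdDev;
        return Math.exp(-0.5 * z * z) / (stdDev * Math.sqrt(2 * Math.PI));
    }

    /** Runs EM for a fixed number of iterations; returns resp[i][k] = P(cluster k | x[i]). */
    double[][] fit(double[] x, int maxIterations) {
        double min = Arrays.stream(x).min().getAsDouble();
        double max = Arrays.stream(x).max().getAsDouble();
        // Crude initialisation (an assumption of this sketch): one mean at each extreme.
        mean[0] = min;
        mean[1] = max;
        stdDev[0] = stdDev[1] = Math.max((max - min) / 2, MIN_STD_DEV);

        double[][] resp = new double[x.length][2];
        for (int iter = 0; iter < maxIterations; iter++) {
            // E-step: compute each point's membership probability for both clusters.
            for (int i = 0; i < x.length; i++) {
                double p0 = weight[0] * density(x[i], mean[0], stdDev[0]);
                double p1 = weight[1] * density(x[i], mean[1], stdDev[1]);
                double total = p0 + p1;
                resp[i][0] = total > 0 ? p0 / total : 0.5;
                resp[i][1] = 1 - resp[i][0];
            }
            // M-step: re-estimate weight, mean and standard deviation of each cluster.
            for (int k = 0; k < 2; k++) {
                double sum = 0, sumX = 0;
                for (int i = 0; i < x.length; i++) {
                    sum += resp[i][k];
                    sumX += resp[i][k] * x[i];
                }
                if (sum == 0) {
                    continue; // empty cluster: keep its previous parameters
                }
                mean[k] = sumX / sum;
                double sumSq = 0;
                for (int i = 0; i < x.length; i++) {
                    double d = x[i] - mean[k];
                    sumSq += resp[i][k] * d * d;
                }
                stdDev[k] = Math.max(Math.sqrt(sumSq / sum), MIN_STD_DEV);
                weight[k] = sum / x.length;
            }
        }
        return resp;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1}; // made-up sample values
        GaussianMixtureEm1D em = new GaussianMixtureEm1D();
        double[][] resp = em.fit(x, 100);
        for (int i = 0; i < x.length; i++) {
            System.out.printf("x=%.1f  P(c0)=%.3f  P(c1)=%.3f%n", x[i], resp[i][0], resp[i][1]);
        }
    }
}
```

The fixed iteration count plays the same role as the -I option in the Weka program, and the fixed pair of components corresponds to passing -N 2. A real clusterer would also need a convergence check and support for several attributes, which this sketch leaves out.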
