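Compared with machine learning, the Apriori association-rule algorithm leans more toward data mining.
1) The test code calls Weka's Apriori association-rule algorithm as follows: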
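import java.io.File;
import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ArffLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

try {
  // Load the ARFF file and extract the instances
  File file = new File("F:\\tools/lib/data/contact-lenses.arff");
  ArffLoader loader = new ArffLoader();
  loader.setFile(file);
  Instances m_instances = loader.getDataSet();
  // Discretize numeric attributes so that every attribute is nominal
  Discretize discretize = new Discretize();
  discretize.setInputFormat(m_instances);
  m_instances = Filter.useFilter(m_instances, discretize);
  // Build the Apriori model and print the itemsets and rules
  Apriori apriori = new Apriori();
  apriori.buildAssociations(m_instances);
  System.out.println(apriori.toString());
} catch (Exception e) {
  e.printStackTrace();
}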
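Steps
1 Read the data set and extract the instances
2 Discretize the attributes with Discretize
3 Create the Apriori association-rule model
4 Print the large (frequent) itemsets and the association rules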
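To make step 4 more concrete, here is a minimal sketch in plain Java (not Weka code) of the two numbers behind frequent itemsets and rules: support and confidence. The class name, the helper methods, and the toy transactions are all made up for illustration and are not taken from contact-lenses.arff.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SupportConfidenceSketch {

    // Fraction of transactions that contain every item in `items`.
    static double support(List<Set<String>> transactions, Set<String> items) {
        int hits = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(items)) {
                hits++;
            }
        }
        return (double) hits / transactions.size();
    }

    // confidence(premise => consequence)
    //   = support(premise and consequence together) / support(premise)
    static double confidence(List<Set<String>> transactions,
                             Set<String> premise, Set<String> consequence) {
        Set<String> both = new HashSet<>(premise);
        both.addAll(consequence);
        return support(transactions, both) / support(transactions, premise);
    }

    // Small convenience helper for building item sets.
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        // Hypothetical toy transactions, for illustration only.
        List<Set<String>> data = Arrays.asList(
            setOf("a", "b", "c"),
            setOf("a", "b"),
            setOf("a", "c"),
            setOf("b", "c"));
        System.out.println("support({a,b}) = " + support(data, setOf("a", "b")));
        System.out.println("confidence(a => b) = "
            + confidence(data, setOf("a"), setOf("b")));
    }
}
```

An itemset counts as "large" when its support reaches the minimum support, and a rule is kept when its metric (confidence by default) reaches the minimum metric value.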
2) When the Apriori object is created, it calls the method that resets the options to their defaults:
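public void resetOptions() {
  m_removeMissingCols = false;   // do not remove columns with all values missing
  m_verbose = false;             // no verbose output
  m_delta = 0.05;                // step by which the minimum support is decreased
  m_minMetric = 0.90;            // minimum metric score (confidence by default)
  m_numRules = 10;               // number of rules to find
  m_lowerBoundMinSupport = 0.1;  // lower bound for the minimum support
  m_upperBoundMinSupport = 1.0;  // upper bound for the minimum support
  m_significanceLevel = -1;      // -1: no significance test
  m_outputItemSets = false;      // do not print the itemsets
  m_car = false;                 // do not mine class association rules
  m_classIndex = -1;             // -1: use the last attribute as the class
}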
For a detailed explanation of the parameters, see Note 1 later on.
3) Walkthrough of the buildAssociations method. The source is as follows:
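As a rough illustration of how m_lowerBoundMinSupport, m_upperBoundMinSupport, and m_delta might interact, here is a minimal sketch in plain Java. The lower-bound adjustment follows the expression used in buildAssociations. The stepping schedule is an assumption based only on the parameter names and defaults, and it does not claim to reproduce Weka's exact loop. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MinSupportScheduleSketch {

    // Same expression as in buildAssociations: the lower bound must
    // correspond to at least one instance.
    static double lowerBoundToUse(double lowerBound, int numInstances) {
        return (lowerBound * numInstances < 1.0)
            ? 1.0 / numInstances
            : lowerBound;
    }

    // Hypothetical helper (an assumption, not Weka's loop): list the
    // candidate minimum-support values from `upper` down to `lower`,
    // decreasing by `delta` each time.
    static List<Double> schedule(double upper, double lower, double delta) {
        List<Double> thresholds = new ArrayList<>();
        // Use an integer step count to avoid floating-point drift.
        int steps = (int) Math.floor((upper - lower) / delta + 1e-9);
        for (int i = 0; i <= steps; i++) {
            thresholds.add(upper - i * delta);
        }
        return thresholds;
    }

    public static void main(String[] args) {
        // With the defaults above and 24 instances, 0.1 * 24 >= 1,
        // so the lower bound is left unchanged.
        double lower = lowerBoundToUse(0.1, 24);
        for (double t : schedule(1.0, lower, 0.05)) {
            System.out.printf("%.2f%n", t);
        }
    }
}
```

With only 5 instances, 0.1 * 5 is below 1, so the sketch raises the lower bound to 1/5, meaning an itemset has to occur in at least one instance.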
public void buildAssociations(Instances instances) throws Exception {
  double[] confidences, supports;
  int[] indices;
  FastVector[] sortedRuleSet;
  int necSupport = 0;

  instances = new Instances(instances);
  if (m_removeMissingCols) {
    instances = removeMissingColumns(instances);
  }
  if (m_car && m_metricType != CONFIDENCE)
    throw new Exception("For CAR-Mining metric type has to be confidence!");

  // only set class index if CAR is requested
  if (m_car) {
    if (m_classIndex == -1) {
      instances.setClassIndex(instances.numAttributes() - 1);
    } else if (m_classIndex <= instances.numAttributes() && m_classIndex > 0) {
      instances.setClassIndex(m_classIndex - 1);
    } else {
      throw new Exception("Invalid class index.");
    }
  }

  // can associator handle the data?
  getCapabilities().testWithFail(instances);

  m_cycles = 0;

  // make sure that the lower bound is equal to at least one instance
  double lowerBoundMinSupportToUse =
      (m_lowerBoundMinSupport * instances.numInstances() < 1.0)
          ? 1.0 / instances.numInstances()
          : m_lowerBoundMinSupport;

  if (m_car) {
    // m_instances does not contain the class attribute
    m_instances = LabeledItemSet.divide(instances, false);
    // m_onlyClass contains only the class attribute
    m_onlyClass = LabeledItemSet.divide(instances, true);
  } else
    m_instances = instances;

  if (m_car && m_numRules == Integer.MAX_VALUE) {
    // Set desired minimum support
    m_minSupport = lowerBoundMinSupportToUse;
  } else {
    // Decrease minimum support until desired number of rules foun