The Apriori Algorithm

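Contents

Apriori Algorithm Implementation

I. Background

II. Algorithm Description

1. Introduction to Apriori

2. The Join Step and the Prune Step

3. Steps of the Apriori Algorithm

4. Generating Association Rules from Frequent Itemsets

III. Objectives

IV. Requirements

V. Environment

1. Operating system

2. Build environment

3. Programming language

VI. Implementation

1. Program

2. Screenshot of results

VII. Summary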

 

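Apriori Algorithm Implementation

I. Background

Data mining, as an effective way of extracting information from data, is receiving growing attention. In 1993 Agrawal et al. first introduced the concept of association rules, and association rule mining quickly drew wide attention from researchers in the data mining field; the technique has since been developed in considerable depth. Apriori is the classic algorithm for association rule mining. Association rules are an important research direction in data mining; their purpose is to uncover relationships hidden in data. The association rule mining problem is to find, in a transaction database T, the association rules that satisfy a user-specified minimum support MinS and minimum confidence MinC.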

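II. Algorithm Description

1. Introduction to Apriori

Apriori uses prior knowledge about frequent itemsets and an iterative approach known as level-wise search, in which k-itemsets are used to explore (k+1)-itemsets. First, the transaction records are scanned to find all frequent 1-itemsets; this set is denoted L1. L1 is then used to find L2, the set of frequent 2-itemsets, L2 is used to find L3, and so on, until no more frequent k-itemsets can be found. Finally, strong rules are derived from all the frequent itemsets, which yields the association rules of interest to the user.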
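
The first pass described above, finding L1, can be sketched as follows. This is a minimal illustration rather than part of the program in this report: the class name FrequentOneItemsets and the method frequentItems are hypothetical, and items are assumed to be integers instead of the comma-separated strings used in the program.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FrequentOneItemsets {
    // Counts how many transactions contain each item and keeps only the items
    // whose count reaches minSupportCount (the frequent 1-itemsets, L1).
    static Map<Integer, Integer> frequentItems(List<List<Integer>> transactions,
                                               int minSupportCount) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (List<Integer> transaction : transactions) {
            // HashSet removes repeated items so a transaction counts once per item.
            for (Integer item : new HashSet<>(transaction)) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        counts.values().removeIf(count -> count < minSupportCount);
        return counts;
    }

    public static void main(String[] args) {
        // The nine transactions used by the program in this report.
        List<List<Integer>> transactions = List.of(
            List.of(1, 2, 5), List.of(2, 4), List.of(2, 3),
            List.of(1, 2, 4), List.of(1, 3), List.of(1, 2, 3),
            List.of(1, 3), List.of(1, 2, 3, 5), List.of(1, 2, 3));
        System.out.println(frequentItems(transactions, 2));
        // prints {1=7, 2=7, 3=6, 4=2, 5=2}
    }
}
```

With a minimum support count of 2, every item in this data set is frequent, so L1 contains all five items.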

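2. The Join Step and the Prune Step

Association rule mining consists of two stages: finding all frequent itemsets, and generating strong rules from them. The first stage is usually the bottleneck of overall performance. Apriori finds all frequent itemsets by means of a join step and a prune step.

Join step: to find Lk (the set of all frequent k-itemsets), a set of candidate k-itemsets, denoted Ck, is generated by joining Lk-1 (the set of all frequent (k-1)-itemsets) with itself. Let l1 and l2 be members of Lk-1, and let li[j] denote the j-th item of li. Apriori assumes that the items within a transaction or itemset are sorted in lexicographic order, so that for a (k-1)-itemset li, li[1] < li[2] < ... < li[k-1]. When Lk-1 is joined with itself, l1 and l2 are joinable if (l1[1]=l2[1]) && (l1[2]=l2[2]) && ... && (l1[k-2]=l2[k-2]) && (l1[k-1]<l2[k-1]). Joining l1 and l2 produces the itemset {l1[1], l1[2], ..., l1[k-1], l2[k-1]}.

Prune step: Ck is a superset of Lk, that is, its members may or may not be frequent. Scanning all transactions determines the count of each candidate in Ck; a candidate whose count is not less than the minimum support count is frequent. To keep Ck small before this scan, the Apriori property is applied: a candidate k-itemset that has any (k-1)-subset not in Lk-1 cannot be frequent and is removed from Ck.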
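
The join and prune steps can be sketched as below. This is an illustrative sketch under stated assumptions, not the code of this report: the names AprioriGenSketch, joinable, allSubsetsFrequent and generateCandidates are hypothetical, every itemset is assumed to be a sorted list of integers, and all itemsets passed in are assumed to have the same length.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AprioriGenSketch {
    // Join condition: the first k-2 items agree and the last item of a is
    // smaller than the last item of b (both are sorted (k-1)-itemsets).
    static boolean joinable(List<Integer> a, List<Integer> b) {
        int last = a.size() - 1;
        return a.subList(0, last).equals(b.subList(0, last)) && a.get(last) < b.get(last);
    }

    // Prune condition: every subset obtained by dropping one item from the
    // candidate must itself be a frequent (k-1)-itemset.
    static boolean allSubsetsFrequent(List<Integer> candidate, Set<List<Integer>> frequent) {
        for (int skip = 0; skip < candidate.size(); skip++) {
            List<Integer> subset = new ArrayList<>(candidate);
            subset.remove(skip); // removes the element at index skip
            if (!frequent.contains(subset)) {
                return false;
            }
        }
        return true;
    }

    // Builds Ck from Lk-1: join every joinable pair, then prune.
    static List<List<Integer>> generateCandidates(List<List<Integer>> prevFrequent) {
        Set<List<Integer>> lookup = new HashSet<>(prevFrequent);
        List<List<Integer>> candidates = new ArrayList<>();
        for (List<Integer> a : prevFrequent) {
            for (List<Integer> b : prevFrequent) {
                if (!joinable(a, b)) {
                    continue;
                }
                List<Integer> candidate = new ArrayList<>(a);
                candidate.add(b.get(b.size() - 1));
                if (allSubsetsFrequent(candidate, lookup)) {
                    candidates.add(candidate);
                }
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // L2 for the report's data set with a minimum support count of 2.
        List<List<Integer>> l2 = List.of(
            List.of(1, 2), List.of(1, 3), List.of(1, 5),
            List.of(2, 3), List.of(2, 4), List.of(2, 5));
        System.out.println(generateCandidates(l2));
        // prints [[1, 2, 3], [1, 2, 5]]
    }
}
```

In this example the join produces six 3-itemsets, and pruning removes four of them: {1,3,5} and {2,3,5} because {3,5} is not frequent, {2,3,4} because {3,4} is not frequent, and {2,4,5} because {4,5} is not frequent.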

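3. Steps of the Apriori Algorithm

Step 1: Set the minimum support MinS and the minimum confidence MinC.

Step 2: Apriori works with candidate itemsets. It first generates a set of candidate itemsets; a candidate whose support is greater than or equal to the minimum support is a frequent itemset.

Step 3: All transactions are first read from the database and every item is treated as a candidate 1-itemset, which gives the support of each item. Only the frequent 1-itemsets are then used to generate the candidate 2-itemsets, because the Apriori principle guarantees that every superset of an infrequent 1-itemset is itself infrequent.

Step 4: The database is scanned again to obtain the support of each candidate 2-itemset and find the frequent 2-itemsets, which are then used to generate the candidate 3-itemsets.

Step 5: The scan and the comparison with the minimum support are repeated to produce frequent itemsets at higher levels, each of which generates the next level of candidates, until no new candidate itemsets are produced.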
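
The five steps above can be put together into one loop, sketched below. It is a compact illustration only, with hypothetical names (LevelWiseSketch, supportCount, nextCandidates, mine); it assumes integer items and sorted transactions, and it inlines candidate generation in a shortened form so that the block is self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class LevelWiseSketch {
    // Number of transactions that contain every item of the itemset.
    static int supportCount(List<Integer> itemset, List<List<Integer>> transactions) {
        int count = 0;
        for (List<Integer> transaction : transactions) {
            if (transaction.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    // Joins itemsets of one level that share all but the last item, then drops
    // candidates that have a subset missing from that level.
    static List<List<Integer>> nextCandidates(List<List<Integer>> level) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> a : level) {
            for (List<Integer> b : level) {
                int last = a.size() - 1;
                if (!a.subList(0, last).equals(b.subList(0, last)) || a.get(last) >= b.get(last)) {
                    continue;
                }
                List<Integer> candidate = new ArrayList<>(a);
                candidate.add(b.get(last));
                boolean keep = true;
                for (int skip = 0; skip < candidate.size() && keep; skip++) {
                    List<Integer> subset = new ArrayList<>(candidate);
                    subset.remove(skip); // removes the element at index skip
                    keep = level.contains(subset);
                }
                if (keep) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    // Returns every frequent itemset with its support count.
    static Map<List<Integer>, Integer> mine(List<List<Integer>> transactions, int minSupportCount) {
        Map<List<Integer>, Integer> frequent = new LinkedHashMap<>();
        // Candidate 1-itemsets: every distinct item, in ascending order.
        TreeSet<Integer> items = new TreeSet<>();
        transactions.forEach(items::addAll);
        List<List<Integer>> candidates = new ArrayList<>();
        for (Integer item : items) {
            candidates.add(List.of(item));
        }
        while (!candidates.isEmpty()) {
            List<List<Integer>> level = new ArrayList<>();
            for (List<Integer> candidate : candidates) {
                int count = supportCount(candidate, transactions); // one scan per level
                if (count >= minSupportCount) {
                    level.add(candidate);
                    frequent.put(candidate, count);
                }
            }
            candidates = nextCandidates(level);
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<List<Integer>> transactions = List.of(
            List.of(1, 2, 5), List.of(2, 4), List.of(2, 3),
            List.of(1, 2, 4), List.of(1, 3), List.of(1, 2, 3),
            List.of(1, 3), List.of(1, 2, 3, 5), List.of(1, 2, 3));
        mine(transactions, 2).forEach((itemset, count) ->
            System.out.println(itemset + " : " + count));
    }
}
```

On the nine transactions of this report with a minimum support count of 2, the loop stops after the third level: there are five frequent 1-itemsets, six frequent 2-itemsets and two frequent 3-itemsets, and the only candidate 4-itemset, {1,2,3,5}, is pruned.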

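4. Generating Association Rules from Frequent Itemsets

Confidence(X->Y) = P(Y|X) = support(XY)/support(X). Association rules are generated as follows:

Step 1: For each frequent itemset l, generate all of its non-empty proper subsets.

Step 2: For each non-empty proper subset s, if support(l)/support(s) >= MinC, output the rule s->(l-s), where MinC is the minimum confidence threshold.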
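
The two rule-generation steps can be sketched for a single frequent itemset as follows. This is an illustration under assumptions: the names RuleConfidenceSketch and rulesFor are hypothetical, the subsets are enumerated with a bitmask rather than the recursive approach used in the program of this report, and the support counts of the itemset and of all its subsets are assumed to be available in a map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RuleConfidenceSketch {
    // For each non-empty proper subset s of itemset l, emits the rule
    // "s -> (l - s)" when support(l) / support(s) >= minConfidence.
    static Map<String, Double> rulesFor(List<Integer> itemset,
                                        Map<List<Integer>, Integer> supportCounts,
                                        double minConfidence) {
        Map<String, Double> rules = new TreeMap<>();
        int n = itemset.size();
        double supportAll = supportCounts.get(itemset);
        // Each mask from 1 to 2^n - 2 selects one non-empty proper subset.
        for (int mask = 1; mask < (1 << n) - 1; mask++) {
            List<Integer> antecedent = new ArrayList<>();
            List<Integer> consequent = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    antecedent.add(itemset.get(i));
                } else {
                    consequent.add(itemset.get(i));
                }
            }
            double confidence = supportAll / supportCounts.get(antecedent);
            if (confidence >= minConfidence) {
                rules.put(antecedent + " -> " + consequent, confidence);
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        // Support counts of {1,2,5} and its subsets in the report's data set.
        Map<List<Integer>, Integer> supportCounts = Map.of(
            List.of(1), 7, List.of(2), 7, List.of(5), 2,
            List.of(1, 2), 5, List.of(1, 5), 2, List.of(2, 5), 2,
            List.of(1, 2, 5), 2);
        System.out.println(rulesFor(List.of(1, 2, 5), supportCounts, 0.7));
        // prints {[1, 5] -> [2]=1.0, [2, 5] -> [1]=1.0, [5] -> [1, 2]=1.0}
    }
}
```

For {1,2,5} only three of the six possible rules reach the 0.7 threshold; for example, {1,2} -> {5} has confidence 2/5 = 0.4 and is discarded.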

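III. Objectives

1. Learn to use the Apriori algorithm to find frequent itemsets and generate association rules from data, and deepen the understanding of the algorithm.

2. Practice analyzing problems, solving them, and implementing the solutions hands-on.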

IV. Requirements

Implement the Apriori algorithm in a programming language you are familiar with, such as C++ or Java.

V. Environment

1. Operating system:

Windows 10

2. Build environment:

Eclipse IDE, JDK 9

3. Programming language:

Java (object-oriented)

VI. Implementation

1. Program:

package com.apriori;

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Apriori {

    private final static int SUPPORT = 2; // minimum support count
    private final static double CONFIDENCE = 0.7; // minimum confidence

    private final static String ITEM_SPLIT = ","; // separator between items
    private final static String CON = "->"; // separator between antecedent and consequent

    // All transactions. Every item is followed by ITEM_SPLIT and items are listed in ascending order.
    private final static List<String> transList = new ArrayList<String>();

    static { // initialize the transaction records
        transList.add("1,2,5,");
        transList.add("2,4,");
        transList.add("2,3,");
        transList.add("1,2,4,");
        transList.add("1,3,");
        transList.add("1,2,3,");
        transList.add("1,3,");
        transList.add("1,2,3,5,");
        transList.add("1,2,3,");
    }

    // Returns all frequent itemsets together with their support counts.
    public Map<String, Integer> getFC() {
        Map<String, Integer> frequentCollectionMap = new HashMap<String, Integer>(); // all frequent itemsets

        frequentCollectionMap.putAll(getItem1FC());

        Map<String, Integer> itemkFcMap = new HashMap<String, Integer>();
        itemkFcMap.putAll(getItem1FC());
        while (itemkFcMap != null && itemkFcMap.size() != 0) {
            Map<String, Integer> candidateCollection = getCandidateCollection(itemkFcMap);
            Set<String> ccKeySet = candidateCollection.keySet();

            // accumulate the support count of every candidate
            for (String trans : transList) {
                for (String candidate : ccKeySet) {
                    boolean flag = true; // true if the transaction contains every item of the candidate
                    String[] candidateItems = candidate.split(ITEM_SPLIT);
                    for (String candidateItem : candidateItems) {
                        // match whole items only, so that "1," is not found inside "11,"
                        if (!(ITEM_SPLIT + trans).contains(ITEM_SPLIT + candidateItem + ITEM_SPLIT)) {
                            flag = false;
                            break;
                        }
                    }
                    if (flag) {
                        Integer count = candidateCollection.get(candidate);
                        candidateCollection.put(candidate, count + 1);
                    }
                }
            }

            // keep the candidates that meet the support threshold
            itemkFcMap.clear();
            for (String candidate : ccKeySet) {
                Integer count = candidateCollection.get(candidate);
                if (count >= SUPPORT) {
                    itemkFcMap.put(candidate, count);
                }
            }

            // merge them into the set of all frequent itemsets
            frequentCollectionMap.putAll(itemkFcMap);
        }

        return frequentCollectionMap;
    }

    // Generates the candidate (k+1)-itemsets from the frequent k-itemsets (join, then prune).
    private Map<String, Integer> getCandidateCollection(Map<String, Integer> itemkFcMap) {
        Map<String, Integer> candidateCollection = new HashMap<String, Integer>();
        Set<String> itemkSet1 = itemkFcMap.keySet();
        Set<String> itemkSet2 = itemkFcMap.keySet();

        for (String itemk1 : itemkSet1) {
            for (String itemk2 : itemkSet2) {
                // join step
                String[] tmp1 = itemk1.split(ITEM_SPLIT);
                String[] tmp2 = itemk2.split(ITEM_SPLIT);

                String c = "";
                if (tmp1.length == 1) {
                    if (tmp1[0].compareTo(tmp2[0]) < 0) {
                        c = tmp1[0] + ITEM_SPLIT + tmp2[0] + ITEM_SPLIT;
                    }
                } else {
                    boolean flag = true;
                    for (int i = 0; i < tmp1.length - 1; i++) {
                        if (!tmp1[i].equals(tmp2[i])) {
                            flag = false;
                            break;
                        }
                    }
                    if (flag && (tmp1[tmp1.length - 1].compareTo(tmp2[tmp2.length - 1]) < 0)) {
                        c = itemk1 + tmp2[tmp2.length - 1] + ITEM_SPLIT;
                    }
                }

                // prune step: drop the candidate if any of its k-subsets is not frequent
                boolean hasInfrequentSubSet = false;
                if (!c.equals("")) {
                    String[] tmpC = c.split(ITEM_SPLIT);
                    for (int i = 0; i < tmpC.length; i++) {
                        String subC = "";
                        for (int j = 0; j < tmpC.length; j++) {
                            if (i != j) {
                                subC = subC + tmpC[j] + ITEM_SPLIT;
                            }
                        }
                        if (itemkFcMap.get(subC) == null) {
                            hasInfrequentSubSet = true;
                            break;
                        }
                    }
                } else {
                    hasInfrequentSubSet = true;
                }

                if (!hasInfrequentSubSet) {
                    candidateCollection.put(c, 0);
                }
            }
        }

        return candidateCollection;
    }

    // Returns the frequent 1-itemsets together with their support counts.
    private Map<String, Integer> getItem1FC() {
        Map<String, Integer> sItem1FcMap = new HashMap<String, Integer>(); // all candidate 1-itemsets
        Map<String, Integer> rItem1FcMap = new HashMap<String, Integer>(); // frequent 1-itemsets

        for (String trans : transList) {
            String[] items = trans.split(ITEM_SPLIT);
            for (String item : items) {
                Integer count = sItem1FcMap.get(item + ITEM_SPLIT);
                if (count == null) {
                    sItem1FcMap.put(item + ITEM_SPLIT, 1);
                } else {
                    sItem1FcMap.put(item + ITEM_SPLIT, count + 1);
                }
            }
        }

        Set<String> keySet = sItem1FcMap.keySet();
        for (String key : keySet) {
            Integer count = sItem1FcMap.get(key);
            if (count >= SUPPORT) {
                rItem1FcMap.put(key, count);
            }
        }
        return rItem1FcMap;
    }

    // Generates the association rules that meet the confidence threshold.
    public Map<String, Double> getRelationRules(Map<String, Integer> frequentCollectionMap) {
        Map<String, Double> relationRules = new HashMap<String, Double>();
        Set<String> keySet = frequentCollectionMap.keySet();
        for (String key : keySet) {
            double countAll = frequentCollectionMap.get(key);
            String[] keyItems = key.split(ITEM_SPLIT);
            if (keyItems.length > 1) {
                List<String> source = new ArrayList<String>();
                Collections.addAll(source, keyItems);
                List<List<String>> result = new ArrayList<List<String>>();

                buildSubSet(source, result); // all non-empty subsets of source

                for (List<String> itemList : result) {
                    if (itemList.size() < source.size()) { // only proper subsets
                        List<String> otherList = new ArrayList<String>();
                        for (String sourceItem : source) {
                            if (!itemList.contains(sourceItem)) {
                                otherList.add(sourceItem);
                            }
                        }
                        String reasonStr = ""; // antecedent
                        String resultStr = ""; // consequent
                        for (String item : itemList) {
                            reasonStr = reasonStr + item + ITEM_SPLIT;
                        }
                        for (String item : otherList) {
                            resultStr = resultStr + item + ITEM_SPLIT;
                        }

                        double countReason = frequentCollectionMap.get(reasonStr);
                        double itemConfidence = countAll / countReason; // confidence of the rule
                        if (itemConfidence >= CONFIDENCE) {
                            String rule = reasonStr + CON + resultStr;
                            relationRules.put(rule, itemConfidence);
                        }
                    }
                }
            }
        }

        return relationRules;
    }

    // Collects every non-empty subset of sourceSet into result.
    private void buildSubSet(List<String> sourceSet, List<List<String>> result) {
        // With a single element the recursion ends: the only non-empty subset is the set itself
        if (sourceSet.size() == 1) {
            List<String> set = new ArrayList<String>();
            set.add(sourceSet.get(0));
            result.add(set);
        } else if (sourceSet.size() > 1) {
            // With n elements, first collect the subsets of the first n-1 elements into result
            buildSubSet(sourceSet.subList(0, sourceSet.size() - 1), result);
            int size = result.size(); // number of subsets so far, used below when appending the n-th element
            // add the n-th element as a subset on its own
            List<String> single = new ArrayList<String>();
            single.add(sourceSet.get(sourceSet.size() - 1));
            result.add(single);
            // keep the subsets of the first n-1 elements and add a copy of each one
            // extended with the n-th element, so each subset is copied first
            List<String> clone;
            for (int i = 0; i < size; i++) {
                clone = new ArrayList<String>();
                for (String str : result.get(i)) {
                    clone.add(str);
                }
                clone.add(sourceSet.get(sourceSet.size() - 1));

                result.add(clone);
            }
        }
    }

    public static void main(String[] args) {
        Apriori apriori = new Apriori();
        Map<String, Integer> frequentCollectionMap = apriori.getFC();
        System.out.println("----------------------------------------");
        System.out.println("-----------Frequent itemsets------------");
        System.out.println("----------------------------------------");
        Set<String> fcKeySet = frequentCollectionMap.keySet();
        for (String fcKey : fcKeySet) {
            System.out.println("Itemset: {" + fcKey + "}" + "  : " + "support count: " + frequentCollectionMap.get(fcKey));
        }
        Map<String, Double> relationRulesMap = apriori.getRelationRules(frequentCollectionMap);
        System.out.println("----------------------------------------");
        System.out.println("-----------Association rules------------");
        System.out.println("----------------------------------------");
        Set<String> rrKeySet = relationRulesMap.keySet();
        for (String rrKey : rrKeySet) {
            System.out.println(rrKey + "  : " + "confidence: " + relationRulesMap.get(rrKey));
        }
    }
}

 

2. Screenshot of results:

 

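VII. Summary

This experiment gave me a better understanding of data mining and a clearer sense of the problems it has to solve and the difficulties it faces. I also learned the basic Apriori algorithm and how to find frequent itemsets and generate association rules. The basic idea behind association rule mining is simple: find the frequent itemsets level by level, then compute the confidence from the support counts. I also became familiar with the general process of writing a data mining algorithm, improved my hands-on skills, and became more proficient in Java.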

 


 目录

Apriori算法实现... 2

一、实验背景... 2

二、算法描述... 2

1.Apriori介绍... 2

2.连接步和剪枝步... 2

3.Apriori算法的步骤... 3

4. 由频繁项集产生关联规则... 3

三、 实验目的... 4

1.4

2.4

四、 实验要求... 4

五、 实验环境... 4

1.操作系统:... 4

2.编译环境:... 4

3.编程语言:... 4

六、实验实现... 5

1.程序:... 5

2.结果截屏:... 19

七、实验总结... 19

 

  Apriori算法实现 

一、实验背景 

现在, 数据挖掘作为从数据中获取信息的有效方法, 越来越受到人们的重视。1993年,Agrawal等人首先提出关联规则概念,关联规则挖掘便迅速受到数据挖掘领域专家的广泛关注.迄今关联规则挖掘技术得到了较为深入的发展。Apriori算法是关联规则挖掘经典算法。关联规则是数据挖掘的重要研究方向,它是要找出隐藏在数据间的相互关系。关联规则的挖掘问题就是在事务数据库T中找出具有用户给定的满足一定条件的最小支持度MinS和最小置信度MinC的关联规则。

二、算法描述 

1.Apriori介绍 

Apriori算法使用频繁项集的先验知识,使用一种称作逐层搜索的迭代方法,k项集用于探索(k+1)项集。首先,通过扫描事务记录,找出所有的频繁1项集,该集合记做L1,然后利用L1找频繁2项集的集合L2,L2找L3,如此下去,直到不能再找到任何频繁k项集。最后再在所有的频繁集中找出强规则,即产生用户感兴趣的关联规则。

2.连接步和剪枝步 

在上述的关联规则挖掘过程的两个步骤中,第一步往往是总体性能的瓶颈。Apriori算法采用连接步和剪枝步两种方式来找出所有的频繁项集。 

连接步为找出Lk(所有的频繁k项集的集合),通过将Lk-1(所有的频繁k-1项集的集合)与自身连接产生候选k项集的集合。候选集合记作Ck。设l1和l2是Lk-1中的成员。记li[j]表示li中的第j项。假设Apriori算法对事务或项集中的项按字典次序排序,即对于(k-1)项集li,li*1+<li*2+<……….<li*k-1]。将Lk-1与自身连接,如果(l1[1]=l2[1])&&( l1[2]=l2*2+)&&……..&& (l1[k-2]=l2[k-2])&&(l1[k-1]<l2[k-1]),那认为l1和l2是可连接。连接l1和l2 产生的结果是,l1*1+,l1*2+,……,l1*k-1],l2[k-1]}。剪枝步 CK是LK的超集,也就是说,CK的成员可能是也可能不是频繁的。通过扫描所有的事务,确定CK中每个候选的计数,判断是否小于最小支持度计数,如果不是,则认为该候选是频繁的。

3.Apriori算法的步骤 

第一步:设定最小支持度MinS和最小置信度MinC; 

第二步:Apriori算法使用候选项集。首先产生出候选的项的集合,即候选项集,若候选项集的支持度大于或等于最小支持度,则该候选项集为频繁项集; 

第三步:在Apriori算法的过程中,首先从数据库读入所有的事务,每个项都被看作候选1-项集,得出各项的支持度,再使用频繁1-项集集合来产生候选2-项集集合,因为先验原理保证所有非频繁的1-项集的超集都是非频繁的; 

第四步:再扫描数据库,得出候选2-项集集合,再找出频繁2-项集,并利用这些频繁2-项集集合来产生候选3-项集; 

第五步:重复扫描数据库,与最小支持度比较,产生更高层次的频繁项集,再从该集合里产生下一级候选项集,直到不再产生新的候选项集为止。 

4. 由频繁项集产生关联规则 

Confidence(X->Y)=P(B|A)=support(XY)/suppor(X) 关联规则产生步骤如下: 

第一步:对于每个频繁项集l,产生其所有非空真子集; 

第二步:对于每个非空真子集s,如果support(l)/support (s)>=MinC,则输出 s->(l-s),其中,MinC是最小置信度阈值。

三、 实验目的 

1. 学会用Apriori算法对数据进行发现频繁项集和生成关联规则的挖掘,加强对Apriori算法的理解。

2.锻炼分析问题、解决问题并动手实践的能力。

四、 实验要求 

使用一种你熟悉的程序设计语言,如C++或Java,实现Apriori算法。

五、 实验环境 

1.操作系统:

Win10 操作系统

2.编译环境:

编译器eclipse、jdk 9

3.编程语言:

Java面向对象的程序设计语言

六、实验实现

1.程序:

package com.apriori;

 

importjava.util.ArrayList;

importjava.util.Collections;

importjava.util.HashMap;

import java.util.List;

import java.util.Map;

import java.util.Set;

 

publicclass Apriori {

 

         privatefinalstaticintSUPPORT = 2; // 支持度阈值

         privatefinalstaticdoubleCONFIDENCE = 0.7; // 置信度阈值

 

         privatefinalstatic String ITEM_SPLIT=","; // 的分隔符

         privatefinalstatic String CON="->"; // 的分隔符

 

         privatefinalstaticList<String> transList=newArrayList<String>(); //所有交易

 

         static{//初始化交易记录

                   transList.add("1,2,5,");

                   transList.add("2,4,");

                   transList.add("2,3,");

                   transList.add("1,2,4,");

                   transList.add("1,3,");

                   transList.add("1,2,3,");

                   transList.add("1,3,");

                   transList.add("1,2,3,5,");

                   transList.add("1,2,3,");

         }

 

       

    /** Returns every frequent itemset together with its support count. */
    public Map<String, Integer> getFC() {
        Map<String, Integer> frequentCollectionMap = new HashMap<String, Integer>(); // all frequent itemsets

        frequentCollectionMap.putAll(getItem1FC());

        Map<String, Integer> itemkFcMap = new HashMap<String, Integer>(); // frequent k-itemsets of the current level
        itemkFcMap.putAll(getItem1FC());
        while (itemkFcMap != null && itemkFcMap.size() != 0) {
            Map<String, Integer> candidateCollection = getCandidateCollection(itemkFcMap);
            Set<String> ccKeySet = candidateCollection.keySet();

            // count the support of every candidate
            for (String trans : transList) {
                for (String candidate : ccKeySet) {
                    boolean flag = true; // true while the transaction contains every item of the candidate
                    String[] candidateItems = candidate.split(ITEM_SPLIT);
                    for (String candidateItem : candidateItems) {
                        // match whole items only: with a leading separator, "1," cannot match inside "11,"
                        if ((ITEM_SPLIT + trans).indexOf(ITEM_SPLIT + candidateItem + ITEM_SPLIT) == -1) {
                            flag = false;
                            break;
                        }
                    }
                    if (flag) {
                        Integer count = candidateCollection.get(candidate);
                        candidateCollection.put(candidate, count + 1);
                    }
                }
            }

            // keep the candidates that reach the support threshold
            itemkFcMap.clear();
            for (String candidate : ccKeySet) {
                Integer count = candidateCollection.get(candidate);
                if (count >= SUPPORT) {
                    itemkFcMap.put(candidate, count);
                }
            }

            // merge this level into the set of all frequent itemsets
            frequentCollectionMap.putAll(itemkFcMap);
        }

        return frequentCollectionMap;
    }

    /** Builds the candidate (k+1)-itemsets from the frequent k-itemsets: join step, then prune step. */
    private Map<String, Integer> getCandidateCollection(Map<String, Integer> itemkFcMap) {
        Map<String, Integer> candidateCollection = new HashMap<String, Integer>();
        Set<String> itemkSet1 = itemkFcMap.keySet();
        Set<String> itemkSet2 = itemkFcMap.keySet();

        for (String itemk1 : itemkSet1) {
            for (String itemk2 : itemkSet2) {
                // join step
                String[] tmp1 = itemk1.split(ITEM_SPLIT);
                String[] tmp2 = itemk2.split(ITEM_SPLIT);

                String c = "";
                if (tmp1.length == 1) {
                    if (tmp1[0].compareTo(tmp2[0]) < 0) {
                        c = tmp1[0] + ITEM_SPLIT + tmp2[0] + ITEM_SPLIT;
                    }
                } else {
                    boolean flag = true;
                    for (int i = 0; i < tmp1.length - 1; i++) {
                        if (!tmp1[i].equals(tmp2[i])) {
                            flag = false;
                            break;
                        }
                    }
                    if (flag && (tmp1[tmp1.length - 1].compareTo(tmp2[tmp2.length - 1]) < 0)) {
                        c = itemk1 + tmp2[tmp2.length - 1] + ITEM_SPLIT;
                    }
                }

                // prune step: drop the candidate if any of its k-item subsets is not frequent
                boolean hasInfrequentSubSet = false;
                if (!c.equals("")) {
                    String[] tmpC = c.split(ITEM_SPLIT);
                    for (int i = 0; i < tmpC.length; i++) {
                        String subC = "";
                        for (int j = 0; j < tmpC.length; j++) {
                            if (i != j) {
                                subC = subC + tmpC[j] + ITEM_SPLIT;
                            }
                        }
                        if (itemkFcMap.get(subC) == null) {
                            hasInfrequentSubSet = true;
                            break;
                        }
                    }
                } else {
                    hasInfrequentSubSet = true;
                }

                if (!hasInfrequentSubSet) {
                    candidateCollection.put(c, 0);
                }
            }
        }

        return candidateCollection;
    }

    /** Counts every single item and returns the frequent 1-itemsets. */
    private Map<String, Integer> getItem1FC() {
        Map<String, Integer> sItem1FcMap = new HashMap<String, Integer>(); // counts of all candidate 1-itemsets
        Map<String, Integer> rItem1FcMap = new HashMap<String, Integer>(); // frequent 1-itemsets

        for (String trans : transList) {
            String[] items = trans.split(ITEM_SPLIT);
            for (String item : items) {
                Integer count = sItem1FcMap.get(item + ITEM_SPLIT);
                if (count == null) {
                    sItem1FcMap.put(item + ITEM_SPLIT, 1);
                } else {
                    sItem1FcMap.put(item + ITEM_SPLIT, count + 1);
                }
            }
        }

        Set<String> keySet = sItem1FcMap.keySet();
        for (String key : keySet) {
            Integer count = sItem1FcMap.get(key);
            if (count >= SUPPORT) {
                rItem1FcMap.put(key, count);
            }
        }
        return rItem1FcMap;
    }

    /** Generates the association rules that reach the confidence threshold. */
    public Map<String, Double> getRelationRules(Map<String, Integer> frequentCollectionMap) {
        Map<String, Double> relationRules = new HashMap<String, Double>();
        Set<String> keySet = frequentCollectionMap.keySet();
        for (String key : keySet) {
            double countAll = frequentCollectionMap.get(key);
            String[] keyItems = key.split(ITEM_SPLIT);
            if (keyItems.length > 1) {
                List<String> source = new ArrayList<String>();
                Collections.addAll(source, keyItems);
                List<List<String>> result = new ArrayList<List<String>>();

                buildSubSet(source, result); // all non-empty subsets of source

                for (List<String> itemList : result) {
                    if (itemList.size() < source.size()) { // handle proper subsets only
                        List<String> otherList = new ArrayList<String>();
                        for (String sourceItem : source) {
                            if (!itemList.contains(sourceItem)) {
                                otherList.add(sourceItem);
                            }
                        }
                        String reasonStr = ""; // antecedent
                        String resultStr = ""; // consequent
                        for (String item : itemList) {
                            reasonStr = reasonStr + item + ITEM_SPLIT;
                        }
                        for (String item : otherList) {
                            resultStr = resultStr + item + ITEM_SPLIT;
                        }

                        double countReason = frequentCollectionMap.get(reasonStr);
                        double itemConfidence = countAll / countReason; // compute the confidence
                        if (itemConfidence >= CONFIDENCE) {
                            String rule = reasonStr + CON + resultStr;
                            relationRules.put(rule, itemConfidence);
                        }
                    }
                }
            }
        }

        return relationRules;
    }

    /** Collects all non-empty subsets of sourceSet into result. */
    private void buildSubSet(List<String> sourceSet, List<List<String>> result) {
        // With a single element the recursion ends: its only non-empty subset is itself, so add it to result directly
        if (sourceSet.size() == 1) {
            List<String> set = new ArrayList<String>();
            set.add(sourceSet.get(0));
            result.add(set);
        } else if (sourceSet.size() > 1) {
            // With n elements, first recursively collect the subsets of the first n-1 elements into result
            buildSubSet(sourceSet.subList(0, sourceSet.size() - 1), result);
            int size = result.size(); // number of subsets so far, used as the loop bound when appending the n-th element
            // add the n-th element as a subset of its own
            List<String> single = new ArrayList<String>();
            single.add(sourceSet.get(sourceSet.size() - 1));
            result.add(single);
            // Keep the subsets of the first n-1 elements, append the n-th element to each of them,
            // and add the new sets to result; the existing subsets must survive, so copy each one first
            List<String> clone;
            for (int i = 0; i < size; i++) {
                clone = new ArrayList<String>();
                for (String str : result.get(i)) {
                    clone.add(str);
                }
                clone.add(sourceSet.get(sourceSet.size() - 1));

                result.add(clone);
            }
        }
    }

    public static void main(String[] args) {
        Apriori apriori = new Apriori();
        Map<String, Integer> frequentCollectionMap = apriori.getFC();
        System.out.println("----------------------------------------");
        System.out.println("----------------Frequent itemsets" + "----------------");
        System.out.println("----------------------------------------");
        Set<String> fcKeySet = frequentCollectionMap.keySet();
        for (String fcKey : fcKeySet) {
            System.out.println("Frequent itemset: {" + fcKey + "}" + "  : " + "support count: " + frequentCollectionMap.get(fcKey));
        }
        Map<String, Double> relationRulesMap = apriori.getRelationRules(frequentCollectionMap);
        System.out.println("----------------------------------------");
        System.out.println("----------------Association rules" + "----------------");
        System.out.println("----------------------------------------");
        Set<String> rrKeySet = relationRulesMap.keySet();
        for (String rrKey : rrKeySet) {
            // the map value is the rule's confidence, not a support count
            System.out.println(rrKey + "  : " + "confidence: " + relationRulesMap.get(rrKey));
        }
    }

}

 

2. Screenshot of results:

 

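VII. Summary

This experiment gave me a better understanding of data mining and a clearer sense of the problems it addresses and the difficulties it faces. I learned the basic Apriori algorithm and how to discover frequent itemsets and generate association rules. The principle behind the algorithm is straightforward: find the frequent itemsets level by level, then compute the confidence of each rule from the support counts. I also became familiar with the general process of writing a data mining algorithm, improved my hands-on skills, and grew more proficient in Java.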

 

 

