Mahout Association Rules: FPGrowthDriver Source Code Analysis, Part 2

iteye_12675

The previous post ended with this function:

public final void generateTopKFrequentPatterns(Iterator<Pair<List<A>,Long>> transactionStream,
                                                 Collection<Pair<A, Long>> frequencyList,
                                                 long minSupport,
                                                 int k,
                                                 Collection<A> returnableFeatures,
                                                 OutputCollector<A,List<Pair<List<A>,Long>>> output,
                                                 StatusUpdater updater)

It calls an overload with the same name, and that overload is where FPTree comes in. How does the function above call its overload? Like this:

generateTopKFrequentPatterns(new TransactionIterator<A>(transactionStream,
        attributeIdMapping), attributeFrequency, minSupport, k, reverseMapping
        .size(), returnFeatures, new TopKPatternsOutputConverter<A>(output,
            reverseMapping), updater);

Before going further, here are the fields of FPTree:

   int[] attribute;
   int[] childCount;
   int[] conditional;
   long[] headerTableAttributeCount;
   int[] headerTableAttributes;
   int headerTableCount;
   int[] headerTableLookup;
   int[][] headerTableProperties;
   int[] next;
   int[][] nodeChildren;
   long[] nodeCount;
   int nodes;
   int[] parent;
   boolean singlePath;
   final Collection<Integer> sortedSet = new TreeSet<Integer>();

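The fields that do not start with header should have the same meaning as the corresponding fields of TransactionTree (see the previous post). The fields that start with header are specific to FPTree. Their meanings are as follows. headerTableAttributes: the item name (an int, because the items were encoded earlier). headerTableAttributeCount: the number of times the item occurs. headerTableProperties: the id of the node where the item first appears in the tree and the id of the node where it last appears. next: the id of the next node in the tree that holds the same item.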
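To make the relationship between the header table and next more concrete, here is a minimal sketch. It is not Mahout's FPTree: the class HeaderChainSketch and the names headerFirst, headerLast, addNode and nodesOf are hypothetical, and the first/last ids are kept in two plain arrays instead of headerTableProperties. It only illustrates how all nodes of one item can be reached from the header table by following next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified illustration; not Mahout's FPTree class.
final class HeaderChainSketch {
  final int[] attribute;   // attribute[id]: item stored at node id (-1 for the root)
  final int[] next;        // next[id]: next node holding the same item, or -1
  final int[] headerFirst; // headerFirst[item]: first node id of the item, or -1
  final int[] headerLast;  // headerLast[item]: last node id of the item, or -1
  int nodes;

  HeaderChainSketch(int maxNodes, int itemCount) {
    attribute = new int[maxNodes];
    next = new int[maxNodes];
    headerFirst = new int[itemCount];
    headerLast = new int[itemCount];
    Arrays.fill(next, -1);
    Arrays.fill(headerFirst, -1);
    Arrays.fill(headerLast, -1);
    attribute[0] = -1; // node 0 is the root and holds no item
    nodes = 1;
  }

  // Registers a new node for the item and appends it to the item's chain.
  int addNode(int item) {
    int id = nodes++;
    attribute[id] = item;
    if (headerFirst[item] == -1) {
      headerFirst[item] = id;      // first occurrence of the item
    } else {
      next[headerLast[item]] = id; // link the previous last node to the new one
    }
    headerLast[item] = id;
    return id;
  }

  // Collects every node id of the item by following the next links.
  List<Integer> nodesOf(int item) {
    List<Integer> ids = new ArrayList<>();
    for (int id = headerFirst[item]; id != -1; id = next[id]) {
      ids.add(id);
    }
    return ids;
  }
}
```

If nodes are registered for the items 1, 3, 2, 4, 0, 2, 4, 4, 3 in that order (the same id order as the example tree later in this post), nodesOf(4) walks the ids 4, 7 and 8.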

Now for the overload, which is declared as follows:

private Map<Integer,FrequentPatternMaxHeap> generateTopKFrequentPatterns(
    Iterator<Pair<int[],Long>> transactions,
    long[] attributeFrequency,
    long minSupport,
    int k,
    int featureSetSize,
    Collection<Integer> returnFeatures, TopKPatternsOutputConverter<A> topKPatternsOutputCollector,
    StatusUpdater updater)

Inside the function body, building the tree has two parts. First, the headerTable entries are added, in other words the headerTable is initialized:

FPTree tree = new FPTree(featureSetSize);
for (int i = 0; i < featureSetSize; i++) {
  tree.addHeaderCount(i, attributeFrequency[i]);
}

Second, the transactions are added to the FPTree:

while (transactions.hasNext()) {
  Pair<int[],Long> transaction = transactions.next();
  Arrays.sort(transaction.getFirst());
  // attribcount += transaction.length;
  nodecount += treeAddCount(tree, transaction.getFirst(), transaction.getSecond(), minSupport, attributeFrequency);
}

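The main work here is done by treeAddCount(). It not only builds a tree the way TransactionTree did earlier, it also records the related information in the headerTable and in next.
For example, suppose the raw data, after sorting, pruning, encoding and merging the counts of identical transactions, looks like this (each entry is {[item ids]count}):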

{[1]2}
{[1,3]5}
{[2]1}
{[2,4]1}
{[0,2]1}
{[0,2,4]4}
{[0,4]2}
{[3]2}

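Adding the data above to the FPTree produces a tree and a corresponding headerTable (the original post showed both as figures).

Printing the FPTree while debugging gives: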

[FPTree
  -{attr:-1, id: 0, cnt:0}-0->-{attr:1, id: 1, cnt:7}-0->-{attr:3, id: 2, cnt:5}
                          -1->-{attr:2, id: 3, cnt:2}-0->-{attr:4, id: 4, cnt:1}
                          -2->-{attr:0, id: 5, cnt:7}-0->-{attr:2, id: 6, cnt:5}-0->-{attr:4, id: 7, cnt:4}
                                                     -1->-{attr:4, id: 8, cnt:2}
                          -3->-{attr:3, id: 9, cnt:2}

]
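The printed tree can be cross-checked with a small sketch of the insertion step. This is not Mahout's treeAddCount(): the class SimpleFpTreeSketch and its methods are hypothetical, the minSupport and attributeFrequency filtering is left out, and the headerTable and next bookkeeping is omitted. It only shows the prefix sharing: a sorted transaction walks down from the root, adds its count to nodes that already exist, and creates nodes where the path ends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified illustration; not Mahout's FPTree or treeAddCount().
final class SimpleFpTreeSketch {
  final List<Integer> attribute = new ArrayList<>();      // item stored at each node
  final List<Long> count = new ArrayList<>();             // accumulated count of each node
  final List<List<Integer>> children = new ArrayList<>(); // child node ids of each node

  SimpleFpTreeSketch() {
    newNode(-1, 0L); // node 0 is the root: attr -1, cnt 0
  }

  private int newNode(int item, long support) {
    attribute.add(item);
    count.add(support);
    children.add(new ArrayList<>());
    return attribute.size() - 1;
  }

  // Adds one transaction with its count; returns how many nodes were created.
  int add(int[] items, long support) {
    int[] sorted = items.clone();
    Arrays.sort(sorted); // same ordering step as Arrays.sort(transaction.getFirst())
    int current = 0;
    int created = 0;
    for (int item : sorted) {
      int child = -1;
      for (int candidate : children.get(current)) {
        if (attribute.get(candidate) == item) {
          child = candidate;
          break;
        }
      }
      if (child == -1) {
        child = newNode(item, support); // the path ends here: start a new branch
        children.get(current).add(child);
        created++;
      } else {
        count.set(child, count.get(child) + support); // shared prefix: add the count
      }
      current = child;
    }
    return created;
  }

  // Builds the tree from the eight example transactions of this post.
  static SimpleFpTreeSketch buildExample() {
    int[][] items = {{1}, {1, 3}, {2}, {2, 4}, {0, 2}, {0, 2, 4}, {0, 4}, {3}};
    long[] support = {2, 5, 1, 1, 1, 4, 2, 2};
    SimpleFpTreeSketch tree = new SimpleFpTreeSketch();
    for (int i = 0; i < items.length; i++) {
      tree.add(items[i], support[i]);
    }
    return tree;
  }
}
```

With the eight transactions inserted in the listed order, the sketch ends up with ten nodes (ids 0 to 9) whose attr and cnt values agree with the debug output above; for instance, node 1 holds item 1 with count 2 + 5 = 7 and node 5 holds item 0 with count 1 + 4 + 2 = 7.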

At this point the FPTree is built. What remains is to mine each FPTree.

Share, enjoy, grow

Please credit the source when reposting: http://blog.csdn.net/fansy1990