Personal views on FP-Tree algorithms

Amber L

Published on 2020-04-25 13:23:43

Category: data mining

Article link: https://blog.csdn.net/weixin_43227712/article/details/105708927

This is the first time I have learnt about the FP-Tree itemset mining algorithm, in my DATA MINING course. I was really intrigued by how its design reduces the cost of traversing the transactions in a database. The Apriori algorithm makes one pass over the whole database for each itemset size it examines, so it may need up to roughly k passes when no transaction holds more than k items. The FP-tree, by contrast, reads the data only twice: once to count the frequency of each item, and once to build the tree.
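A minimal Python sketch of the two passes, assuming a small hypothetical list of nine transactions (the figure's data is not reproduced here) and an absolute support count of 2. The names `count_items` and `reorder` are illustrative, not from any library. The first pass counts single items; the second re-reads every transaction, drops infrequent items and sorts the rest by descending support, which is the order in which they are inserted into the tree.

```python
from collections import Counter

# Hypothetical transactions, assumed for illustration only.
TRANSACTIONS = [
    ["i1", "i2", "i5"], ["i2", "i4"], ["i2", "i3"],
    ["i1", "i2", "i4"], ["i1", "i3"], ["i2", "i3"],
    ["i1", "i3"], ["i1", "i2", "i3", "i5"], ["i1", "i2", "i3"],
]
MIN_SUPPORT = 2  # absolute count, the same threshold used later in the post


def count_items(transactions, min_support):
    """Pass 1 over the data: support count of every frequent single item."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: n for item, n in counts.items() if n >= min_support}


def reorder(transactions, frequent):
    """Pass 2 over the data: drop infrequent items, sort the rest by support.

    Ties are broken alphabetically so the order is deterministic. These
    reordered transactions are what the tree would be built from.
    """
    return [
        sorted((i for i in set(t) if i in frequent),
               key=lambda i: (-frequent[i], i))
        for t in transactions
    ]


frequent = count_items(TRANSACTIONS, MIN_SUPPORT)
for row in reorder(TRANSACTIONS, frequent):
    print(row)
```

With these assumed transactions, the global order works out to i2, i1, i3, i4, i5 (ties broken alphabetically).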

Making full use of each pass over the transactions is the key to reducing the cost of reading from the database. Initially, I thought a tree might work, but my design was more like a plain dictionary tree (trie). Built that way, the tree would contain many redundant nodes, because transactions that share items but list them in different orders would not share prefixes. Hence, we need to abstract the information and eliminate redundancy in the tree.

Firstly, compared with the Apriori algorithm, the FP-tree is more transaction-oriented: it does not generate candidate itemsets level by level. Instead, it uses the frequency of single items across all transactions to build the tree, and produces the itemsets only at the end.
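For contrast, a rough sketch of Apriori's level-wise search on the same kind of hypothetical transactions. It joins frequent (k-1)-itemsets into k-itemset candidates, prunes candidates that have an infrequent subset, and spends one full pass over the data per level. The function name and the pass counter are illustrative additions, not part of any library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, assumed for illustration only.
TRANSACTIONS = [
    ["i1", "i2", "i5"], ["i2", "i4"], ["i2", "i3"],
    ["i1", "i2", "i4"], ["i1", "i3"], ["i2", "i3"],
    ["i1", "i3"], ["i1", "i2", "i3", "i5"], ["i1", "i2", "i3"],
]
MIN_SUPPORT = 2


def apriori(transactions, min_support):
    """Level-wise Apriori sketch that also counts full passes over the data."""
    db = [frozenset(t) for t in transactions]
    # Pass 1: frequent single items.
    counts = Counter(item for t in db for item in t)
    level = {frozenset([i]): n for i, n in counts.items() if n >= min_support}
    frequent, scans, k = dict(level), 1, 2
    while level:
        # Join frequent (k-1)-itemsets into k-itemset candidates ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune any candidate with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        if not candidates:
            break
        scans += 1  # one more full pass to count this level's candidates
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent, scans


frequent, scans = apriori(TRANSACTIONS, MIN_SUPPORT)
print(len(frequent), "frequent itemsets found in", scans, "passes")
```

On these assumed transactions it needs three passes (the single 4-itemset candidate is pruned before any scan), whereas building an FP-tree always reads the data exactly twice.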
An example of an FP-Tree
As the figure above illustrates, each path from the root represents a transaction read from the database, with its infrequent items removed and its remaining items sorted by frequency; transactions that begin with the same items share those nodes. It is worth mentioning that the value of each node is the number of transactions whose path passes through that node, not the number of children it has. Intuitively, the more frequently a single item appears in transactions, the more transactions can share its node. Thus, if we put such an item (e.g. i2) in a higher position, closer to the root, more transactions share the same prefix and the tree stays compact.
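A minimal sketch of the tree itself, again on hypothetical transactions with a support count of 2. Each node keeps an item, a count and links to its parent and children; the per-item list `header` stands in for the node-links of the header table that FP-tree implementations usually keep. All names here are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical transactions, assumed for illustration only.
TRANSACTIONS = [
    ["i1", "i2", "i5"], ["i2", "i4"], ["i2", "i3"],
    ["i1", "i2", "i4"], ["i1", "i3"], ["i2", "i3"],
    ["i1", "i3"], ["i1", "i2", "i3", "i5"], ["i1", "i2", "i3"],
]
MIN_SUPPORT = 2


class FPNode:
    """One tree node: an item, a count, a parent link and child links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_support):
    """Two-pass FP-tree construction sketch; returns (root, header)."""
    # Pass 1: support of every single item; infrequent items are dropped.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: n for i, n in support.items() if n >= min_support}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every node carrying that item
    # Pass 2: insert each transaction, most frequent item first.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:  # no shared prefix beyond this point
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1  # one more transaction passes through here
            node = child
    return root, header


root, header = build_fp_tree(TRANSACTIONS, MIN_SUPPORT)
top = root.children["i2"]
print("i2:", top.count, "transactions,", len(top.children), "children")
```

With these assumed transactions, the i2 node under the root ends up with a count of 7 but only 3 children, which shows that a node's value counts transactions rather than children.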

Once the tree is constructed, we can find the frequent prefix shared by a target item. For example, to obtain the frequent itemsets containing item ‘i5’, we follow the links to every node labeled ‘i5’ and sum up their values; since the total reaches the support threshold (we set it to 2), i5 itself is frequent. We then look at the path from the root to each ‘i5’ node and count every item on it with that node's value. The items whose summed counts still reach the threshold, here i2 and i1, form the frequent prefix of i5.
Illustration of how to find the longest common frequent prefix
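The search for i5's frequent prefix can be sketched on the same hypothetical data. Following i5's node list, each node contributes its root-to-parent path weighted by its own count (this collection of paths is usually called the conditional pattern base); summing counts per item and applying the threshold leaves the frequent prefix items. The tree builder is a compact version of the usual construction, and all function names are illustrative.

```python
from collections import Counter

# Hypothetical transactions, assumed for illustration only.
TRANSACTIONS = [
    ["i1", "i2", "i5"], ["i2", "i4"], ["i2", "i3"],
    ["i1", "i2", "i4"], ["i1", "i3"], ["i2", "i3"],
    ["i1", "i3"], ["i1", "i2", "i3", "i5"], ["i1", "i2", "i3"],
]
MIN_SUPPORT = 2


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(transactions, min_support):
    """Compact two-pass FP-tree build; returns item -> list of its nodes."""
    support = Counter(i for t in transactions for i in set(t))
    rank = {i: c for i, c in support.items() if c >= min_support}
    root, links = Node(None, None), {}
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank),
                           key=lambda i: (-rank[i], i)):
            if item not in node.children:
                node.children[item] = Node(item, node)
                links.setdefault(item, []).append(node.children[item])
            node = node.children[item]
            node.count += 1
    return links


def conditional_pattern_base(links, item):
    """For each node of `item`: its prefix path (root side first) and count."""
    base = []
    for node in links[item]:
        path, parent = [], node.parent
        while parent.item is not None:  # stop at the root
            path.append(parent.item)
            parent = parent.parent
        base.append((list(reversed(path)), node.count))
    return base


def frequent_prefix_items(base, min_support):
    """Sum each item's counts over the prefix paths; keep those >= min_support."""
    totals = Counter()
    for path, count in base:
        for item in path:
            totals[item] += count
    return {i: c for i, c in totals.items() if c >= min_support}


base = conditional_pattern_base(build_tree(TRANSACTIONS, MIN_SUPPORT), "i5")
print(base)
print(frequent_prefix_items(base, MIN_SUPPORT))
```

Here the two i5 nodes yield the paths [i2, i1] and [i2, i1, i3], each with count 1, so i2 and i1 reach 2 while i3 only reaches 1 and is dropped.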
In the final step, all we need to do is to combine the prefix, and each of its non-empty subsets, with the target item. In this example, we finally get (i1,i5), (i2,i5) and (i1,i2,i5). This step is valid because (i1,i2,i5) is frequent, and any subset of a frequent itemset must also be frequent.
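The final combination step fits in a few lines. This sketch assumes that i5's frequent prefix items are i2 and i1, each with a summed count of 2 (consistent with the threshold of 2 used above), and that they lie on a single path, so each combination's support is the smallest count among the items it uses. `combine_with_suffix` is a hypothetical helper name.

```python
from itertools import combinations


def combine_with_suffix(prefix_counts, suffix):
    """Pair every non-empty subset of a single-path prefix with the suffix item.

    Assumes the prefix items lie on one path, so a combination's support is
    the smallest count among the prefix items it uses.
    """
    items = sorted(prefix_counts)
    patterns = {}
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            support = min(prefix_counts[i] for i in subset)
            patterns[subset + (suffix,)] = support
    return patterns


# Assumed input: the frequent prefix of i5 with its summed counts.
print(combine_with_suffix({"i2": 2, "i1": 2}, "i5"))
```

It returns (i1, i5), (i2, i5) and (i1, i2, i5), each with support 2.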

All in all, with the powerful tree data structure, we are able to rearrange and record the information from the database efficiently, and to access the records we need easily and at a lower cost.

More details to be added in the coming days!
