Personal views on FP-Tree algorithms

This is the first time I have learnt about the FP-Tree item-set mining algorithm, in the Data Mining course. I was really intrigued by its design, which reduces the cost of traversing the transactions in a database. The Apriori algorithm scans the whole database roughly once per item-set size, so about k times when the longest frequent item set contains k items. FP-Tree reads the data only twice: once to build the frequency record of each item, and once to build the tree.
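The first of those two scans can be sketched as below. This is a minimal illustration rather than a reference implementation: the nine transactions, the threshold value and the function names `count_items` and `frequent_items_in_order` are all assumptions made for the example, chosen so that they agree with the items i1, i2 and i5 used later in this post.

```python
from collections import Counter

# Assumed toy database (not taken from the post): nine transactions over i1..i5.
TRANSACTIONS = [
    ["i1", "i2", "i5"],
    ["i2", "i4"],
    ["i2", "i3"],
    ["i1", "i2", "i4"],
    ["i1", "i3"],
    ["i2", "i3"],
    ["i1", "i3"],
    ["i1", "i2", "i3", "i5"],
    ["i1", "i2", "i3"],
]
MIN_SUPPORT = 2  # assumed support threshold


def count_items(transactions):
    """Scan 1: count in how many transactions each item occurs."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # set(): count an item once per transaction
    return counts


def frequent_items_in_order(counts, min_support):
    """Keep the frequent items, most frequent first (ties broken by name)."""
    kept = [item for item, n in counts.items() if n >= min_support]
    return sorted(kept, key=lambda item: (-counts[item], item))


counts = count_items(TRANSACTIONS)
order = frequent_items_in_order(counts, MIN_SUPPORT)
print(order)
```

The resulting order is what the second scan uses to sort the items of each transaction before inserting them into the tree.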

Making full use of every transaction as it is read is the key to reducing the cost of reading from the database. Initially, I thought a tree might work, but my design was closer to a plain dictionary tree (trie). Such a tree would contain many redundant nodes. Hence, we need to abstract the information and eliminate redundancy in the tree.

Firstly, in comparison with the Apriori algorithm, FP-Tree is more transaction-oriented: it does not generate candidate item sets level by level. Instead, it uses the frequency record of single items across all transactions and produces the item sets at the end.
An example of FP-Tree (figure not available)
As the figure above illustrates, each path from the root of this tree represents one or more transactions read from the database. It is worth mentioning that the value of each node is a count: the number of transactions whose sorted items pass through that node. This is not the same as the number of child nodes it has. Intuitively, the more frequently a single item appears in transactions, the more other items it appears together with. Thus, if we put such an item (e.g. i2) in a higher position, it will have more descendants, and more transactions will share its node.
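Here is a small sketch of how such a tree could be built in the second scan. It is only an illustration under stated assumptions: the transactions are an assumed nine-row toy database, and `FPNode` and `build_fp_tree` are hypothetical names, not code from the course.

```python
from collections import Counter

# Assumed toy database (not taken from the post).
TRANSACTIONS = [
    ["i1", "i2", "i5"], ["i2", "i4"], ["i2", "i3"],
    ["i1", "i2", "i4"], ["i1", "i3"], ["i2", "i3"],
    ["i1", "i3"], ["i1", "i2", "i3", "i5"], ["i1", "i2", "i3"],
]


class FPNode:
    """One tree node: an item, a transaction count, and child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0      # transactions passing through this node
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_support):
    # Scan 1: support of every single item.
    counts = Counter(item for t in transactions for item in set(t))
    # Sort key for frequent items: higher support first, then by name.
    rank = {i: (-n, i) for i, n in counts.items() if n >= min_support}

    # Scan 2: insert each transaction with its items in rank order.
    root = FPNode(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, parent=node)
            node = node.children[item]
            node.count += 1  # shared prefixes only bump the counter
    return root


root = build_fp_tree(TRANSACTIONS, min_support=2)
i2 = root.children["i2"]
print(i2.count, len(i2.children))
```

For this assumed data the `i2` node is passed by seven transactions but has only three children, which shows that the count and the number of children are different things.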

Once the tree is constructed, we can find the longest common frequent prefix for a target item. For example, to obtain the frequent item sets containing item ‘i5’, we look up the nodes labeled ‘i5’ and sum their values. If the sum reaches the support threshold (we set the threshold to 2), then the common ancestors of those nodes in the tree, i2 and i1, are the result.
Illustration of how to find the longest common frequent prefix
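The lookup for ‘i5’ can be sketched as follows. The block repeats a compact tree builder so that it runs by itself. The transactions are an assumed toy database, and `FPNode`, `build_fp_tree`, `prefix_paths`, `frequent_prefix_items` and the `header` table (a list of nodes per item) are hypothetical names for illustration.

```python
from collections import Counter, defaultdict

# Assumed toy database (not taken from the post).
TRANSACTIONS = [
    ["i1", "i2", "i5"], ["i2", "i4"], ["i2", "i3"],
    ["i1", "i2", "i4"], ["i1", "i3"], ["i2", "i3"],
    ["i1", "i3"], ["i1", "i2", "i3", "i5"], ["i1", "i2", "i3"],
]
MIN_SUPPORT = 2


class FPNode:
    def __init__(self, item, parent=None):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    counts = Counter(item for t in transactions for item in set(t))
    rank = {i: (-n, i) for i, n in counts.items() if n >= min_support}
    root, header = FPNode(None), defaultdict(list)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, parent=node)
                header[item].append(node.children[item])  # remember each node
            node = node.children[item]
            node.count += 1
    return root, header


def prefix_paths(header, item):
    """For every node labeled `item`, return (ancestors from the top, count)."""
    paths = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        paths.append((path[::-1], node.count))
    return paths


def frequent_prefix_items(paths, min_support):
    """Ancestors whose summed count along the paths reaches the threshold."""
    support = Counter()
    for path, count in paths:
        for ancestor in path:
            support[ancestor] += count
    return {i: n for i, n in support.items() if n >= min_support}


root, header = build_fp_tree(TRANSACTIONS, MIN_SUPPORT)
i5_support = sum(node.count for node in header["i5"])
paths = prefix_paths(header, "i5")
prefix = frequent_prefix_items(paths, MIN_SUPPORT)
print(i5_support, paths, prefix)
```

With this assumed data there are two ‘i5’ nodes, each with value 1. Item i3 lies on only one of the two paths, so it drops out, and i2 and i1 remain as the common frequent prefix.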
In the final step, all we need to do is combine the prefix and its subsets with the target item. In this example, we finally get (i1,i5), (i2,i5) and (i1,i2,i5). This works because any subset of a frequent item set must itself be frequent: since (i1,i2,i5) is frequent, so is every subset of it that contains i5.
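A minimal sketch of this combination step is given below. It assumes the frequent prefix forms a single path, as it does for i5 here. When the prefix paths branch, the full algorithm mines a smaller conditional tree recursively instead. The transactions are an assumed toy database, used only to double-check the supports by brute force, and the function names are hypothetical.

```python
from itertools import combinations

# Assumed toy database (not taken from the post), used only for the check.
TRANSACTIONS = [
    ["i1", "i2", "i5"], ["i2", "i4"], ["i2", "i3"],
    ["i1", "i2", "i4"], ["i1", "i3"], ["i2", "i3"],
    ["i1", "i3"], ["i1", "i2", "i3", "i5"], ["i1", "i2", "i3"],
]


def itemsets_with_suffix(prefix_items, suffix):
    """Every non-empty subset of the prefix, each extended with the suffix."""
    result = []
    for size in range(1, len(prefix_items) + 1):
        for combo in combinations(prefix_items, size):
            result.append(combo + (suffix,))
    return result


def support(itemset, transactions):
    """Brute-force support: number of transactions containing the item set."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t))


itemsets = itemsets_with_suffix(["i1", "i2"], "i5")
for itemset in itemsets:
    print(itemset, support(itemset, TRANSACTIONS))
```

On the assumed data each of the three generated item sets has support 2, which meets the threshold used in this post.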

All in all, with the powerful "tree" data structure, we can rearrange and record information from the database efficiently, and of course access the desired records easily and at lower cost.

More details to be added in the coming days!
