FP-growth Algorithm: An Application Example (in Python)

I covered how FP-Growth works, a Python implementation, and a walkthrough of that code in a separate article, if you are interested.

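This article shows a neat practical application of the algorithm. We will use a dataset called kosarak.dat, which can be downloaded here. It contains nearly one million records, which makes it a good showcase for how fast FP-Growth is. Each line of the file lists the news stories that one user viewed; users and stories are encoded as integers.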
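Reading the data takes only a few lines. The sketch below assumes `kosarak.dat` has been saved in the working directory and that each line holds whitespace-separated integer IDs; `parse_transactions` and `load_kosarak` are hypothetical helper names. Items are kept as strings, which is enough for counting and keeps them mutually comparable when sorting.

```python
from typing import Iterable, List


def parse_transactions(lines: Iterable[str]) -> List[List[str]]:
    """Turn whitespace-separated lines into transactions (lists of item IDs).

    Each non-blank line becomes one transaction; blank lines are skipped.
    """
    transactions = []
    for line in lines:
        items = line.split()
        if items:
            transactions.append(items)
    return transactions


def load_kosarak(path: str = "kosarak.dat") -> List[List[str]]:
    # Assumption: the downloaded file sits in the current working directory
    with open(path, encoding="utf-8") as f:
        return parse_transactions(f)
```

For example, `parse_transactions(["1 2 3\n", "\n", "2 4\n"])` gives `[['1', '2', '3'], ['2', '4']]`.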

For easy reference, here is the Python implementation first:

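```python
# FP-Growth implementation
class treeNode:
    def __init__(self, nameValue, numOccur, parentNode):
        self.name = nameValue
        self.count = numOccur
        self.nodeLink = None      # next node in the tree holding the same item
        self.parent = parentNode
        self.children = {}

    def inc(self, numOccur):
        self.count += numOccur

    def disp(self, ind=1):
        # Print the tree, indenting each node by its depth
        print('  ' * ind, self.name, ' ', self.count)
        for child in self.children.values():
            child.disp(ind + 1)

def updateHeader(nodeToTest, targetNode):
    # Walk to the end of the node-link chain and append targetNode
    while nodeToTest.nodeLink is not None:
        nodeToTest = nodeToTest.nodeLink
    nodeToTest.nodeLink = targetNode

def updateFPtree(items, inTree, headerTable, count):
    if items[0] in inTree.children:
        # The first item already exists as a child node: just add to its count
        inTree.children[items[0]].inc(count)
    else:
        # Create a new branch
        inTree.children[items[0]] = treeNode(items[0], count, inTree)
        # Link the new node into the header table
        # (assumes headerTable[item] = [count, first node in chain])
        if headerTable[items[0]][1] is None:
            headerTable[items[0]][1] = inTree.children[items[0]]
        else:
            updateHeader(headerTable[items[0]][1], inTree.children[items[0]])
    if len(items) > 1:
        # Insert the remaining items one level deeper
        updateFPtree(items[1:], inTree.children[items[0]], headerTable, count)
```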
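The listing stops at the tree-update step, so here is a self-contained sketch of the remaining pieces: building the tree and header table from a transaction set, extracting each item's conditional pattern base, and mining recursively. `treeNode`, `updateHeader`, and `updateFPtree` are restated in Python 3 so the block runs on its own; `createTree`, `findPrefixPath`, `mineTree`, and `fpGrowth` are hypothetical helper names, and the header-table layout `headerTable[item] = [count, first node]` is an assumption rather than necessarily the original article's.

```python
class treeNode:
    def __init__(self, nameValue, numOccur, parentNode):
        self.name = nameValue
        self.count = numOccur
        self.nodeLink = None      # next node in the tree holding the same item
        self.parent = parentNode
        self.children = {}


def updateHeader(nodeToTest, targetNode):
    # Append targetNode to the end of the node-link chain
    while nodeToTest.nodeLink is not None:
        nodeToTest = nodeToTest.nodeLink
    nodeToTest.nodeLink = targetNode


def updateFPtree(items, inTree, headerTable, count):
    first = items[0]
    if first in inTree.children:
        inTree.children[first].count += count
    else:
        # New branch: register the node in the header table's node-link chain
        inTree.children[first] = treeNode(first, count, inTree)
        if headerTable[first][1] is None:
            headerTable[first][1] = inTree.children[first]
        else:
            updateHeader(headerTable[first][1], inTree.children[first])
    if len(items) > 1:
        updateFPtree(items[1:], inTree.children[first], headerTable, count)


def createTree(dataSet, minSup):
    """Build an FP-tree from {frozenset(items): count}; return (root, headerTable)."""
    counts = {}
    for trans, cnt in dataSet.items():
        for item in trans:
            counts[item] = counts.get(item, 0) + cnt
    # Assumed layout: headerTable[item] = [support count, first node in chain]
    headerTable = {item: [c, None] for item, c in counts.items() if c >= minSup}
    if not headerTable:
        return None, None
    root = treeNode('Null Set', 1, None)
    for trans, cnt in dataSet.items():
        # Frequent items only, most frequent first; ties broken by item id
        # (items must be mutually comparable, e.g. all strings)
        ordered = sorted((i for i in trans if i in headerTable),
                         key=lambda i: (-headerTable[i][0], i))
        if ordered:
            updateFPtree(ordered, root, headerTable, cnt)
    return root, headerTable


def findPrefixPath(node):
    """Conditional pattern base of one item: {prefix path: count}."""
    condPats = {}
    while node is not None:               # follow the node-link chain
        path = []
        parent = node.parent
        while parent.parent is not None:  # climb until just below the root
            path.append(parent.name)
            parent = parent.parent
        if path:
            key = frozenset(path)
            condPats[key] = condPats.get(key, 0) + node.count
        node = node.nodeLink
    return condPats


def mineTree(headerTable, minSup, prefix, freqItems):
    """Add every frequent itemset that extends `prefix` to freqItems, with its count."""
    for item in sorted(headerTable, key=lambda i: (headerTable[i][0], i)):
        newFreq = prefix | {item}
        freqItems[frozenset(newFreq)] = headerTable[item][0]
        # Build the conditional FP-tree for newFreq and recurse into it
        _, condHeader = createTree(findPrefixPath(headerTable[item][1]), minSup)
        if condHeader is not None:
            mineTree(condHeader, minSup, newFreq, freqItems)


def fpGrowth(transactions, minSup):
    """transactions: list of item lists; minSup: absolute support count."""
    dataSet = {}
    for t in transactions:
        key = frozenset(t)
        dataSet[key] = dataSet.get(key, 0) + 1
    _, headerTable = createTree(dataSet, minSup)
    freqItems = {}
    if headerTable is not None:
        mineTree(headerTable, minSup, set(), freqItems)
    return freqItems
```

On kosarak.dat, with each line split into a list of item IDs, the call would look like `fpGrowth(transactions, 100000)`; that threshold is only illustrative, and lowering it quickly increases both the number of itemsets and the run time.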
FP-growth is an algorithm for mining frequent patterns, widely used in data mining, market-basket analysis, and recommender systems. Below is a Python implementation of FP-growth.

First, import the required module:

```python
from collections import defaultdict
```

Next, define the FP-tree node and the FP-tree itself, starting with frequent-item counting and the header table:

```python
# FP-tree node
class FPTreeNode:
    def __init__(self, item=None, count=0, parent=None):
        self.item = item
        self.count = count
        self.parent = parent
        self.children = {}  # item -> child FPTreeNode

# FP-tree built from weighted transactions: a list of (items, count) pairs
class FPTree:
    def __init__(self, transactions, min_count, root_value=None, root_count=0):
        self.frequent_items = self.find_frequent_items(transactions, min_count)
        self.headers = self.build_header_table(self.frequent_items)
        self.root = self.build_fptree(transactions, root_value, root_count,
                                      self.frequent_items, self.headers)

    # Count every item and keep those meeting the minimum support count
    @staticmethod
    def find_frequent_items(transactions, min_count):
        items = defaultdict(int)
        for transaction, count in transactions:
            for item in set(transaction):
                items[item] += count
        return {item: c for item, c in items.items() if c >= min_count}

    # Header table: item -> list of every tree node holding that item
    @staticmethod
    def build_header_table(frequent_items):
        return {item: [] for item in frequent_items}
```

Then add the methods that build the FP-tree (they continue the `FPTree` class):

```python
    # Insert each transaction's frequent items, most frequent first
    def build_fptree(self, transactions, root_value, root_count, frequent_items, headers):
        root = FPTreeNode(item=root_value, count=root_count)
        for transaction, count in transactions:
            sorted_items = sorted((item for item in set(transaction) if item in frequent_items),
                                  key=lambda item: (-frequent_items[item], str(item)))
            if sorted_items:
                self.insert_tree(sorted_items, root, headers, count)
        return root

    # Insert one ordered path, reusing existing children where possible
    def insert_tree(self, items, node, headers, count):
        child = node.children.get(items[0])
        if child is None:
            child = FPTreeNode(item=items[0], parent=node)
            headers[items[0]].append(child)
            node.children[items[0]] = child
        child.count += count
        if len(items) > 1:
            self.insert_tree(items[1:], child, headers, count)
```

Finally, implement the main FP-growth routine:

```python
class FPGrowth:
    def __init__(self, min_support=0.5, min_confidence=0.5):
        self.min_support = min_support        # fraction of all transactions
        self.min_confidence = min_confidence  # kept for rule generation; unused below

    # Return {frozenset(itemset): support count} for every frequent itemset
    def find_frequent_patterns(self, transactions):
        if not transactions:
            return None
        min_count = self.min_support * len(transactions)
        weighted = [(transaction, 1) for transaction in transactions]
        fp_tree = FPTree(weighted, min_count, root_value=None, root_count=len(transactions))
        frequent_patterns = {}
        self.mine_patterns(fp_tree, (), min_count, frequent_patterns)
        return frequent_patterns

    # Mine each item's conditional FP-tree recursively, least frequent item first
    def mine_patterns(self, tree, suffix, min_count, frequent_patterns):
        for item in sorted(tree.frequent_items, key=lambda i: (tree.frequent_items[i], str(i))):
            pattern = suffix + (item,)
            frequent_patterns[frozenset(pattern)] = tree.frequent_items[item]
            # Conditional pattern base: the prefix path above every node holding `item`
            base_patterns = [(self.prefix_path(node), node.count) for node in tree.headers[item]]
            conditional_tree = FPTree(base_patterns, min_count)
            if conditional_tree.frequent_items:
                self.mine_patterns(conditional_tree, pattern, min_count, frequent_patterns)

    # Items on the path from a node's parent up to (but excluding) the root
    @staticmethod
    def prefix_path(node):
        path = []
        node = node.parent
        while node is not None and node.parent is not None:
            path.append(node.item)
            node = node.parent
        return path
```

That completes the Python implementation of FP-growth.
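An FP-growth implementation like the one above is easy to get subtly wrong (support counts, path ordering, conditional trees), so it is worth checking it against a brute-force reference on toy data. The sketch below is not FP-growth: it simply enumerates candidate itemsets with `itertools.combinations` and counts their support, stopping once a whole size level has no frequent itemset (the Apriori property). `brute_force_frequent` is a hypothetical name, and `min_support` is assumed to be a fraction of transactions, as in `FPGrowth(min_support=...)`.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def brute_force_frequent(transactions: List[Iterable[str]],
                         min_support: float) -> Dict[FrozenSet[str], int]:
    """Count every candidate itemset directly; exponential, so toy data only."""
    sets = [frozenset(t) for t in transactions]
    min_count = min_support * len(sets)
    items = sorted(set().union(*sets))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for s in sets if candidate <= s)
            if support >= min_count:
                result[candidate] = support
                found_any = True
        if not found_any:
            # Apriori property: if no itemset of this size is frequent,
            # no larger itemset can be frequent either
            break
    return result


if __name__ == "__main__":
    toy = [['a', 'b'], ['b', 'c', 'd'], ['a', 'b', 'c'], ['a', 'b', 'd']]
    for itemset, count in sorted(brute_force_frequent(toy, 0.5).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

Because the work grows exponentially with the number of distinct items, use it only on a handful of transactions, not on the full kosarak.dat.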
