Apriori, FP-Growth, and PrefixSpan: Frequent Itemset Mining Algorithms Explained

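Copyright notice: This is an original article by CSDN blogger 「谷雨逝」, released under the CC 4.0 BY-SA license. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/weixin_43919172/article/details/107018968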

Frequent Itemset Mining

Sequence Mining

Frequent Itemset Mining

Apriori

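Apriori is a classic, early bottom-up (level-wise) algorithm for association rule mining. It scans the dataset iteratively. In each iteration it builds two lists for itemsets of size k (k-itemsets): a candidate list and a frequent list, where the frequent list holds the candidates whose support reaches the minimum support. In the next iteration it increases k and rebuilds both lists from the previous frequent list. This repeats until no candidate reaches the minimum support, that is, until the frequent list is empty. The frequent itemsets collected along the way are then split into rules and the confidence of each rule is computed. Rules that satisfy both the minimum support and the minimum confidence are returned as the result.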
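The level-wise loop described above can be sketched in plain Python. This is a minimal illustration and not the implementation used by any library; the function name `apriori_itemsets` and the choice to express support as a fraction of transactions are assumptions made for this example. It covers only the frequent-itemset phase, not rule generation.

```python
from itertools import combinations

def apriori_itemsets(transactions, min_support):
    """Return {frozenset: support} for every frequent itemset (hypothetical helper)."""
    tx_sets = [frozenset(t) for t in transactions]
    n = len(tx_sets)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(1 for t in tx_sets if itemset <= t) / n

    # k = 1: every distinct item is a candidate
    candidates = {frozenset([item]) for t in tx_sets for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Frequent list for this level: candidates that reach min_support
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)

        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent

transactions = [('eggs', 'bread', 'soup'),
                ('eggs', 'bread', 'apple'),
                ('soup', 'bread', 'banana'),
                ('bread', 'banana', 'jam')]
result = apriori_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what gives the algorithm its name: any itemset with an infrequent subset cannot itself be frequent, so it is discarded before the dataset is scanned for it.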

code:

# Install first (shell command, not Python): pip install efficient_apriori
from efficient_apriori import apriori

transactions = [('eggs', 'bread', 'soup'),
                ('eggs', 'bread', 'apple'),
                ('soup', 'bread', 'banana'),
                ('bread', 'banana', 'jam')]

# Note: min_support and min_confidence must be between 0 and 1
itemsets, rules = apriori(transactions, min_support=0.5,
                          min_confidence=0.8)
print(rules)
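To make the two thresholds concrete, support and confidence can be computed by hand for the same four transactions. The helpers `support` and `confidence` below are hypothetical names written for this illustration and are not part of efficient_apriori.

```python
transactions = [('eggs', 'bread', 'soup'),
                ('eggs', 'bread', 'apple'),
                ('soup', 'bread', 'banana'),
                ('bread', 'banana', 'jam')]

def support(itemset):
    # Fraction of transactions containing all items in the itemset
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # confidence(lhs -> rhs) = support(lhs and rhs together) / support(lhs)
    return support(set(lhs) | set(rhs)) / support(lhs)

print(support({'eggs', 'bread'}))         # share of baskets with both items
print(confidence({'eggs'}, {'bread'}))    # eggs -> bread
print(confidence({'bread'}, {'eggs'}))    # bread -> eggs
```

Here eggs -> bread has confidence 1.0 because every basket with eggs also has bread, while bread -> eggs has confidence 0.5 and falls below a 0.8 threshold.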

FP-Growth

code:

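# Install first (shell command, not Python): pip install pyfpgrowth
import pyfpgrowth

transactions = [('eggs', 'bread', 'soup'),
                ('eggs', 'bread', 'apple'),
                ('soup', 'bread', 'banana'),
                ('bread', 'banana', 'jam')]

# The second argument is the support threshold, given as a transaction count
patterns = pyfpgrowth.find_frequent_patterns(transactions, 2)
print(patterns)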

Eclat

code:

from fim import eclat  # provided by the PyFIM package

transactions = [('eggs', 'bread', 'soup'),
                ('eggs', 'bread', 'apple'),
                ('soup', 'bread', 'banana'),
                ('bread', 'banana', 'jam')]

# zmin=3 keeps only itemsets with at least 3 items; support uses the library default
itemsets = eclat(tracts=transactions, zmin=3)
print(itemsets)

PrefixSpan (ordered sequences):

from prefixspan import PrefixSpan
db = [
 ['a', 'b', 'c', 'd', 'e'],
 ['b', 'b', 'b', 'd', 'e'],
 ['c', 'b', 'c', 'c', 'a'],
 ['b', 'b', 'b', 'c', 'c'],
]
ps = PrefixSpan(db)
# frequent(n): patterns that appear in at least n of the sequences
print(ps.frequent(4))
print("-----------")
print(ps.frequent(3))
print("-----------")
print(ps.frequent(2))
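What "ordered" means here can be checked by hand: a pattern is supported by a sequence when its items appear in that sequence in the same order, though not necessarily next to each other. The sketch below counts this directly on the same `db`. The helpers `is_subsequence` and `sequence_support` are hypothetical names for this illustration, and it is an assumption that the prefixspan package counts support in the same way.

```python
db = [
    ['a', 'b', 'c', 'd', 'e'],
    ['b', 'b', 'b', 'd', 'e'],
    ['c', 'b', 'c', 'c', 'a'],
    ['b', 'b', 'b', 'c', 'c'],
]

def is_subsequence(pattern, sequence):
    # Walk the sequence once; each pattern item must be found after the previous one
    it = iter(sequence)
    return all(item in it for item in pattern)

def sequence_support(pattern, sequences):
    # Number of sequences that contain the pattern in order
    return sum(is_subsequence(pattern, seq) for seq in sequences)

print(sequence_support(['b'], db))
print(sequence_support(['b', 'c'], db))
print(sequence_support(['c', 'b'], db))
```

The pattern b, c is found in three sequences but c, b in only one, which is the difference between sequence mining and the unordered itemset mining in the earlier sections.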