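Copyright notice: This is an original article by CSDN blogger 「谷雨逝」, released under the CC 4.0 BY-SA license. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/weixin_43919172/article/details/107018968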
Frequent Itemset Mining
Sequence Mining
Frequent Itemset Mining
- Apriori
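Apriori is one of the earliest bottom-up algorithms for mining association rules, and it works by scanning the dataset iteratively. In each iteration it builds two lists for itemsets of size k (k-itemsets): a candidate list and a frequent list, where the frequent list keeps only the candidates that meet the minimum support. In the next iteration it increases k and rebuilds both lists from the frequent itemsets of the previous round. This repeats until no itemset of the current size meets the minimum support. The frequent itemsets collected along the way are then used to generate rules and compute their confidence, and the rules that satisfy both the minimum support and the minimum confidence are returned as the result.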
code:
# Install the package first (shell command, not Python):
#   pip install efficient_apriori
from efficient_apriori import apriori

# Each transaction is a tuple of items bought together
transactions = [('eggs', 'bread', 'soup'),
                ('eggs', 'bread', 'apple'),
                ('soup', 'bread', 'banana'),
                ('bread', 'banana', 'jam')]

# Note: support and confidence are fractions between 0 and 1
itemsets, rules = apriori(transactions, min_support=0.5,
                          min_confidence=0.8)
print(rules)
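The level-wise loop described above can also be written out by hand. The following is a minimal pure-Python sketch, not the implementation used by efficient_apriori; the function names `apriori_itemsets` and `association_rules` are made up for illustration, and support is recomputed by a full scan for clarity rather than speed.

```python
from itertools import combinations

def apriori_itemsets(transactions, min_support):
    """Level-wise search: return {frozenset: support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # k = 1: every single item is a candidate
    current = keep_frequent({frozenset([i]) for t in transactions for i in t})
    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = keep_frequent(candidates)
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
                confidence = supp / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules

transactions = [('eggs', 'bread', 'soup'),
                ('eggs', 'bread', 'apple'),
                ('soup', 'bread', 'banana'),
                ('bread', 'banana', 'jam')]
frequent = apriori_itemsets(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.8)
for lhs, rhs, confidence in rules:
    print(set(lhs), '->', set(rhs), confidence)
```

On these four transactions the loop stops at k = 3, because every 3-item candidate has a 2-item subset that is not frequent and is therefore pruned before any counting.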
- FP-Growth
code:
import pyfpgrowth
transactions = [('eggs', 'bread', 'soup'),
                ('eggs', 'bread', 'apple'),
                ('soup', 'bread', 'banana'),
                ('bread', 'banana', 'jam')]

# The second argument is the minimum support as an absolute transaction count
patterns = pyfpgrowth.find_frequent_patterns(transactions, 2)
print(patterns)
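For readers who want to see what happens behind a call like the one above, here is a compact sketch of the FP-Growth idea: compress the transactions into a prefix tree (the FP-tree), then mine it recursively through conditional pattern bases instead of generating candidates. This is an illustrative sketch and not pyfpgrowth's code; `Node`, `build_tree`, and `fp_growth` are hypothetical names, and ties in item frequency are broken alphabetically as an arbitrary choice.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, a count, and a link to its parent."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_tree(weighted, min_count):
    """Build an FP-tree from (items, count) pairs; return (header table, item counts)."""
    freq = defaultdict(int)
    for items, count in weighted:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, count in weighted:
        # Keep frequent items only, most frequent first, so prefixes are shared
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq

def fp_growth(weighted, min_count, suffix=frozenset()):
    """Return {frozenset: support count} for all frequent itemsets."""
    header, freq = build_tree(weighted, min_count)
    patterns = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        patterns[pattern] = freq[item]
        # Conditional pattern base: the path above each node, weighted by its count
        base = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        patterns.update(fp_growth(base, min_count, pattern))
    return patterns

transactions = [('eggs', 'bread', 'soup'),
                ('eggs', 'bread', 'apple'),
                ('soup', 'bread', 'banana'),
                ('bread', 'banana', 'jam')]
fp_patterns = fp_growth([(t, 1) for t in transactions], min_count=2)
for itemset, count in fp_patterns.items():
    print(sorted(itemset), count)
```

The dataset is read only to build the tree; every later step works on the tree's prefix paths, which is why no candidate lists are needed.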
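- Eclat
code:
# `fim` is the module provided by the PyFIM package
from fim import eclat

transactions = [('eggs', 'bread', 'soup'),
                ('eggs', 'bread', 'apple'),
                ('soup', 'bread', 'banana'),
                ('bread', 'banana', 'jam')]

# zmin=3 keeps only itemsets with at least 3 items;
# the minimum support is left at the library default
itemsets = eclat(tracts=transactions, zmin=3)
print(itemsets)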
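Eclat differs from Apriori mainly in its data layout: it stores, for each item, the set of transaction ids that contain it, and obtains the support of a larger itemset by intersecting those sets. The sketch below illustrates that idea in plain Python under simplifying assumptions; it is not the PyFIM implementation, and `eclat_itemsets` is a hypothetical name.

```python
def eclat_itemsets(transactions, min_count):
    """Depth-first search over tid-sets; return {tuple of items: support count}."""
    # Vertical layout: item -> set of ids of the transactions containing it
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that are frequent together with prefix
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            narrowed = []
            for other, other_tids in candidates[i + 1:]:
                # Support of itemset + other is the size of the intersection
                common = tids & other_tids
                if len(common) >= min_count:
                    narrowed.append((other, common))
            extend(itemset, narrowed)

    start = sorted(((item, tids) for item, tids in tidsets.items()
                    if len(tids) >= min_count), key=lambda pair: pair[0])
    extend((), start)
    return frequent

transactions = [('eggs', 'bread', 'soup'),
                ('eggs', 'bread', 'apple'),
                ('soup', 'bread', 'banana'),
                ('bread', 'banana', 'jam')]
eclat_result = eclat_itemsets(transactions, min_count=2)
for itemset, count in eclat_result.items():
    print(itemset, count)
```

Because supports come from set intersections, the original transactions are scanned only once, when the tid-sets are built.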
Sequence Mining
- PrefixSpan (ordered sequences)
code:
from prefixspan import PrefixSpan
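# Each row is one sequence; the order of the items matters
db = [
    ['a', 'b', 'c', 'd', 'e'],
    ['b', 'b', 'b', 'd', 'e'],
    ['c', 'b', 'c', 'c', 'a'],
    ['b', 'b', 'b', 'c', 'c'],
]
ps = PrefixSpan(db)
# frequent(n) returns the patterns that occur in at least n sequences
print(ps.frequent(4))
print("-----------")
print(ps.frequent(3))
print("-----------")
print(ps.frequent(2))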
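The library call hides how the patterns are found. As a rough illustration, the sketch below implements the core PrefixSpan idea for sequences of single items: grow a prefix one item at a time and, for each sequence, keep only the part after the first match (the projected database). It is a simplified sketch and not the prefixspan package's code; `prefixspan_sketch` is a hypothetical name.

```python
def prefixspan_sketch(db, min_count):
    """Return (support, pattern) pairs for all frequent sequential patterns."""
    results = []

    def mine(prefix, projected):
        # projected: (sequence index, start position) pairs, one per sequence
        # that contains the prefix; the suffix from that position is still unread
        occurrences = {}
        for sid, start in projected:
            seen = set()
            sequence = db[sid]
            for pos in range(start, len(sequence)):
                item = sequence[pos]
                if item not in seen:
                    # Only the first occurrence matters for the projection
                    seen.add(item)
                    occurrences.setdefault(item, []).append((sid, pos + 1))
        for item, next_projected in sorted(occurrences.items()):
            if len(next_projected) >= min_count:
                pattern = prefix + [item]
                results.append((len(next_projected), pattern))
                mine(pattern, next_projected)

    mine([], [(sid, 0) for sid in range(len(db))])
    return results

db = [
    ['a', 'b', 'c', 'd', 'e'],
    ['b', 'b', 'b', 'd', 'e'],
    ['c', 'b', 'c', 'c', 'a'],
    ['b', 'b', 'b', 'c', 'c'],
]
print(prefixspan_sketch(db, 4))
print(prefixspan_sketch(db, 3))
```

Unlike the itemset miners above, order is significant here: ['b', 'c'] counts only the sequences in which some 'c' appears after some 'b'.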