The FP-growth algorithm builds on Apriori but uses a more advanced data structure to reduce the number of scans: it needs only two passes over the database, whereas Apriori rescans the dataset for every candidate frequent itemset to decide whether a given pattern is frequent. As a result, FP-growth is much faster than Apriori.
The basic procedure by which FP-growth mines frequent itemsets is as follows:
(1) Build the FP tree.
(2) Mine frequent itemsets from the FP tree.
FP-growth stores the data in a compact structure called an FP tree, where FP stands for "frequent pattern". Figure 11-5 gives an example of an FP tree.
Suppose the database contains the following six shopping records:
·r,z,h,j,p
·z,y,x,w,v,u,t,s
·z
·r,x,n,o,s
·y,r,x,z,q,t,p
·y,z,x,e,q,s,t,m
The FP tree built from these records is shown below.
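The two database scans described above can be illustrated directly on these six records. The following is a minimal sketch of FP-tree construction written for this book's example; it is not pyfpgrowth's internal implementation, and the names `FPNode`, `build_fp_tree`, and the threshold `min_support=3` are chosen here purely for illustration:

```python
from collections import defaultdict

class FPNode:
    """A node in the FP tree: an item, a count, and links to parent/children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 1
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    # First scan: count how often each item occurs in the database.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Second scan: insert each transaction, keeping only frequent items
    # and ordering them by descending frequency (ties broken alphabetically)
    # so that common prefixes share the same branch of the tree.
    root = FPNode(None, None)
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            if item in node.children:
                node.children[item].count += 1
            else:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
    return root, frequent

# The six shopping records from the text.
transactions = [list("rzhjp"), list("zyxwvuts"), ["z"],
                list("rxnos"), list("yrxzqtp"), list("yzxeqstm")]
root, frequent = build_fp_tree(transactions, min_support=3)
print(frequent)  # each frequent item with its count
```

With a support threshold of 3, five of the six reordered transactions share the prefix z, so they are stored along a single branch; this compression is what lets FP-growth avoid rescanning the database for every candidate itemset.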
This example uses the pyfpgrowth package, which can be installed with
pip install pyfpgrowth
You can also install pyfpgrowth from the Aliyun mirror, as shown below:
>pip install pyfpgrowth -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting pyfpgrowth
Downloading https://mirrors.aliyun.com/pypi/packages/d2/4c/8b7cd90b4118ff0286d6584909b99e1ca5642bdc9072fa5a8dd361c864a0/pyfpgrowth-1.0.tar.gz (1.6MB)
|████████████████████████████████| 1.6MB 77kB/s
Building wheels for collected packages: pyfpgrowth
Building wheel for pyfpgrowth (setup.py) ... done
Created wheel for pyfpgrowth: filename=pyfpgrowth-1.0-py2.py3-none-any.whl size=5482 sha256=8a6199bd422266effff519de00bb5794d64eeada76d0aa9b71fa15b22f9810f4
Stored in directory: C:\Users\liujiannan\AppData\Local\pip\Cache\wheels\64\b0\f9\63b97b5a690d0b1e331121e8288c3186a1e9ffa6c7196b3871
Successfully built pyfpgrowth
Installing collected packages: pyfpgrowth
Successfully installed pyfpgrowth-1.0
Suppose we want to mine frequent itemsets from the following data:
transactions = [[1, 2, 5],
[2, 4],
[2, 3],
[1, 2, 4],
[1, 3],
[2, 3],
[1, 3],
[1, 2, 3, 5],
[1, 2, 3]]
The pyfpgrowth wrapper functions are shown below; here support is the (absolute) support threshold and minConf is the minimum confidence:
patterns = pyfpgrowth.find_frequent_patterns(transactions, support)
rules = pyfpgrowth.generate_association_rules(patterns, minConf)
In this example the support threshold is set to 2 and the minimum confidence to 0.7 (both are passed positionally, since those are the second arguments of the two functions):
patterns = pyfpgrowth.find_frequent_patterns(transactions, 2)
rules = pyfpgrowth.generate_association_rules(patterns, 0.7)
The complete code is as follows:
import pyfpgrowth

transactions = [[1, 2, 5],
                [2, 4],
                [2, 3],
                [1, 2, 4],
                [1, 3],
                [2, 3],
                [1, 3],
                [1, 2, 3, 5],
                [1, 2, 3]]
# Find all itemsets that appear in at least 2 transactions.
patterns = pyfpgrowth.find_frequent_patterns(transactions, 2)
# Generate association rules with confidence of at least 0.7.
rules = pyfpgrowth.generate_association_rules(patterns, 0.7)
print(rules)
The output is as follows:
{(5,): ((1, 2), 1.0), (1, 5): ((2,), 1.0), (2, 5): ((1,), 1.0), (4,): ((2,), 1.0)}
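In this result dictionary, each key is the antecedent of a rule and each value is a (consequent, confidence) pair; for example, (4,): ((2,), 1.0) means "whenever 4 is bought, 2 is also bought, with confidence 1.0". A small sketch that reformats the output shown above into explicit rules (the formatting is our own, not part of pyfpgrowth):

```python
# The rules dictionary printed above: antecedent -> (consequent, confidence).
rules = {(5,): ((1, 2), 1.0), (1, 5): ((2,), 1.0),
         (2, 5): ((1,), 1.0), (4,): ((2,), 1.0)}

# Render each entry as a human-readable association rule.
lines = [f"{set(a)} -> {set(c)} (confidence: {conf:.2f})"
         for a, (c, conf) in sorted(rules.items())]
print("\n".join(lines))
```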