Top 10 Classic Data Mining Algorithms (4): The Apriori Algorithm

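Apriori is one of the most influential algorithms for mining the frequent itemsets behind Boolean association rules. At its core it is an iterative algorithm built on a two-phase frequent-itemset approach. The rules it mines are classified as single-dimensional, single-level, Boolean association rules. Here, every itemset whose support is at least the minimum support is called a frequent itemset.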
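The definition of a frequent itemset can be made concrete with a small sketch. The function names and the toy transactions below are illustrative assumptions, not part of the original article; support is measured as a count of transactions.

```python
from typing import FrozenSet, Iterable, List


def support_count(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t)


def is_frequent(itemset: Iterable[str], transactions: List[FrozenSet[str]],
                min_support: int) -> bool:
    """An itemset is frequent when its support count reaches the threshold."""
    return support_count(itemset, transactions) >= min_support


# Hypothetical toy data, used only for illustration.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

print(support_count({"bread", "milk"}, transactions))        # 2
print(is_frequent({"butter", "milk"}, transactions, 2))      # False
```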

 

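The parameters that Apriori takes as input are:

  • Maximum number of items per rule: the largest number of items that an itemset in a rule may contain
  • Minimum support: the minimum number of cases (transactions) in which an item or itemset in a rule must appear
  • Minimum confidence: the lowest confidence threshold that a rule must meet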
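As a rough sketch of how these three parameters act as filters on a candidate rule, the snippet below bundles them in a small dataclass. The names `AprioriParams` and `rule_passes` and the toy data are assumptions made for illustration; confidence is computed as the support count of the whole rule divided by the support count of its left-hand side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AprioriParams:
    max_items: int         # maximum number of items in a rule
    min_support: int       # minimum number of transactions containing the rule's items
    min_confidence: float  # minimum confidence, between 0 and 1


def rule_passes(antecedent, consequent, transactions, params):
    """Check one rule `antecedent -> consequent` against all three thresholds."""
    left = frozenset(antecedent)
    items = left | frozenset(consequent)
    if len(items) > params.max_items:
        return False
    count_items = sum(1 for t in transactions if items <= t)
    count_left = sum(1 for t in transactions if left <= t)
    if count_left == 0 or count_items < params.min_support:
        return False
    return count_items / count_left >= params.min_confidence


# Hypothetical toy data, used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
params = AprioriParams(max_items=2, min_support=2, min_confidence=0.7)

print(rule_passes({"butter"}, {"bread"}, transactions, params))  # True  (2/2 = 100%)
print(rule_passes({"bread"}, {"milk"}, transactions, params))    # False (2/3 is below 70%)
```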

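  The basic idea of the algorithm is as follows. First, find all frequent itemsets, that is, the itemsets that occur at least as often as the predefined minimum support. Second, generate strong association rules from those frequent itemsets; these rules must satisfy both the minimum support and the minimum confidence. The frequent itemsets found in the first step are used to produce the desired rules: all rules that contain only items of the itemset, each with a single item on its right-hand side, which is the definition of a rule adopted here. Once the rules have been generated, only those that meet the user-specified minimum confidence are kept. The frequent itemsets themselves are generated iteratively, level by level: the frequent k-itemsets are used to build the candidate (k+1)-itemsets.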
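The level-wise search for frequent itemsets can be sketched as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: the function name `apriori`, the join-by-union candidate generation, and the toy data are choices made here for clarity.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]

    # Level 1: count single items in one scan.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one scan of the data per level.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data, used only for illustration.
toy = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
result = apriori(toy, min_support=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On this toy data the candidate {bread, butter, milk} is pruned without being counted, because its subset {butter, milk} is not frequent.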
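  Apriori has two main drawbacks: it may generate a very large number of candidate itemsets, and it may need to scan the database repeatedly, once for each itemset size.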
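To give a feel for the first drawback, the sketch below counts how many candidate 2-itemsets arise from a given number of frequent single items, assuming every pair is generated as a candidate. The function names are illustrative.

```python
from math import comb


def candidate_pairs(n_frequent_items: int) -> int:
    """Candidate 2-itemsets produced by joining n frequent 1-itemsets."""
    return comb(n_frequent_items, 2)


def nonempty_subsets(pattern_length: int) -> int:
    """Non-empty subsets of one frequent itemset; each of them is frequent too."""
    return 2 ** pattern_length - 1


for n in (100, 1_000, 10_000):
    print(n, candidate_pairs(n))
# 100 4950
# 1000 499500
# 10000 49995000

print(nonempty_subsets(20))  # 1048575
```

The second drawback follows from the level-wise design: each level needs its own pass over the data to count candidates, so a longest frequent itemset of size k costs at least k scans.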

 

 

Building the FP-tree:

The example uses six transactions:

- Transaction ID #1: apple, banana, coca-cola, doughnut
- Transaction ID #2: banana, coca-cola
- Transaction ID #3: banana, doughnut
- Transaction ID #4: apple, coca-cola
- Transaction ID #5: apple, banana, doughnut
- Transaction ID #6: apple, banana, coca-cola

A first scan counts each item: banana (5), apple (4), coca-cola (4), doughnut (3). With a minimum support of 2, all four items are frequent. Each transaction is then sorted by descending item count (ties broken alphabetically, so apple comes before coca-cola) and inserted into the tree. In the diagrams, a = apple, b = banana, c = coca-cola, d = doughnut, and the number after the colon is the node's count.

Transaction ID #1, inserted as b, a, c, d:

```
root
└─ b:1
   └─ a:1
      └─ c:1
         └─ d:1
```

Transaction ID #2, inserted as b, c:

```
root
└─ b:2
   ├─ a:1
   │  └─ c:1
   │     └─ d:1
   └─ c:1
```

Transaction ID #3, inserted as b, d:

```
root
└─ b:3
   ├─ a:1
   │  └─ c:1
   │     └─ d:1
   ├─ c:1
   └─ d:1
```

Transaction ID #4, inserted as a, c:

```
root
├─ b:3
│  ├─ a:1
│  │  └─ c:1
│  │     └─ d:1
│  ├─ c:1
│  └─ d:1
└─ a:1
   └─ c:1
```

Transaction ID #5, inserted as b, a, d:

```
root
├─ b:4
│  ├─ a:2
│  │  ├─ c:1
│  │  │  └─ d:1
│  │  └─ d:1
│  ├─ c:1
│  └─ d:1
└─ a:1
   └─ c:1
```

Transaction ID #6, inserted as b, a, c:

```
root
├─ b:5
│  ├─ a:3
│  │  ├─ c:2
│  │  │  └─ d:1
│  │  └─ d:1
│  ├─ c:1
│  └─ d:1
└─ a:1
   └─ c:1
```

Using the FP-Growth algorithm to discover frequent itemsets with minimum support of 2:

FP-Growth starts with the least frequent item and works upward, building a conditional pattern base from the prefix paths of each item.

Starting with the least frequent item (d). Its conditional pattern base is {b, a, c}: 1, {b, a}: 1, {b}: 1. Within it, b occurs 3 times, a 2 times, and c once, so c is dropped:

- {d} (3)
- {b, d} (3)
- {a, d} (2)
- {a, b, d} (2)

Next, item c. Its conditional pattern base is {b, a}: 2, {b}: 1, {a}: 1, so b and a each occur 3 times:

- {c} (4)
- {b, c} (3)
- {a, c} (3)
- {a, b, c} (2)

Next, item a. Its conditional pattern base is {b}: 3; the remaining occurrence of a hangs directly off the root:

- {a} (4)
- {a, b} (3)

Finally, the most frequent item (b), whose only prefix is the root:

- {b} (5)

All frequent itemsets with minimum support of 2 are:

- {b} (5)
- {a} (4)
- {c} (4)
- {d} (3)
- {a, b} (3)
- {a, c} (3)
- {b, c} (3)
- {b, d} (3)
- {a, d} (2)
- {a, b, c} (2)
- {a, b, d} (2)

Using the Apriori algorithm to verify the frequent itemsets with minimum support of 2:

Starting with 1-itemsets:

- {apple} (4)
- {banana} (5)
- {coca-cola} (4)
- {doughnut} (3)

Next, the 2-itemsets:

- {apple, banana} (3)
- {apple, coca-cola} (3)
- {apple, doughnut} (2)
- {banana, coca-cola} (3)
- {banana, doughnut} (3)
- {coca-cola, doughnut} (1), not frequent

Finally, the 3-itemsets. {apple, coca-cola, doughnut} and {banana, coca-cola, doughnut} are pruned without counting because they contain the infrequent pair {coca-cola, doughnut}:

- {apple, banana, coca-cola} (2)
- {apple, banana, doughnut} (2)

The only 4-itemset, {apple, banana, coca-cola, doughnut}, is pruned for the same reason, so the search stops.

All frequent itemsets with minimum support of 2 are:

- {banana} (5)
- {apple} (4)
- {coca-cola} (4)
- {doughnut} (3)
- {apple, banana} (3)
- {apple, coca-cola} (3)
- {banana, coca-cola} (3)
- {banana, doughnut} (3)
- {apple, doughnut} (2)
- {apple, banana, coca-cola} (2)
- {apple, banana, doughnut} (2)

The Apriori algorithm generates the same set of frequent itemsets with minimum support of 2 as the FP-Growth algorithm.

Deriving all association rules with 70% minimum confidence for the frequent itemset {Apple, Banana, Doughnut}:

First, find all the non-empty proper subsets of {Apple, Banana, Doughnut}:

- {Apple, Banana}
- {Apple, Doughnut}
- {Banana, Doughnut}
- {Apple}
- {Banana}
- {Doughnut}

Next, calculate the confidence for each rule, that is, the support count of {Apple, Banana, Doughnut} (2) divided by the support count of the rule's left-hand side:

- {Apple, Banana} -> {Doughnut} (2/3 = 67%)
- {Apple, Doughnut} -> {Banana} (2/2 = 100%)
- {Banana, Doughnut} -> {Apple} (2/3 = 67%)
- {Apple} -> {Banana, Doughnut} (2/4 = 50%)
- {Banana} -> {Apple, Doughnut} (2/5 = 40%)
- {Doughnut} -> {Apple, Banana} (2/3 = 67%)

Only one association rule for the frequent itemset {Apple, Banana, Doughnut} meets the minimum confidence of 70%:

- {Apple, Doughnut} -> {Banana}
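The worked example can be checked by brute force. The sketch below recounts support directly from the six transactions listed in the example and derives the rules for {apple, banana, doughnut}. The helper names are illustrative, and the exhaustive enumeration is only practical because the data set is tiny.

```python
from itertools import combinations

# The six transactions from the example.
transactions = [
    {"apple", "banana", "coca-cola", "doughnut"},
    {"banana", "coca-cola"},
    {"banana", "doughnut"},
    {"apple", "coca-cola"},
    {"apple", "banana", "doughnut"},
    {"apple", "banana", "coca-cola"},
]


def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)


def frequent_itemsets(min_support):
    """Enumerate every itemset and keep those reaching min_support."""
    items = sorted(set().union(*transactions))
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            n = support_count(combo)
            if n >= min_support:
                found[combo] = n
    return found


def rules_from(itemset, min_confidence):
    """All rules X -> (itemset - X) whose confidence reaches min_confidence."""
    itemset = tuple(sorted(itemset))
    total = support_count(itemset)
    if total == 0:
        return []
    rules = []
    for k in range(1, len(itemset)):
        for left in combinations(itemset, k):
            right = tuple(i for i in itemset if i not in left)
            conf = total / support_count(left)
            if conf >= min_confidence:
                rules.append((left, right, conf))
    return rules


print(len(frequent_itemsets(2)))  # 11
print(rules_from({"apple", "banana", "doughnut"}, 0.7))
# [(('apple', 'doughnut'), ('banana',), 1.0)]
```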