Python Association Rules: Theory and a Hands-On Apriori Example

Author: 蓝翔厨师长

Category: python. Tags: python

Article link: https://blog.csdn.net/qq_38415758/article/details/108694859

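Association Rules and the Apriori Algorithm: Principles and Practice

Three key concepts in association rules
How the Apriori algorithm works
Applying the Apriori algorithm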

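Three key concepts in association rules

A fruit shop has the following list of orders:

Order no.	Fruit purchased
1	apple, banana, pear
2	apple, banana, pear, mango
3	banana, pear, mango, peach
4	apple, mango
5	apple, peach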

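Support: a ratio, the number of orders that contain an item combination divided by the total number of orders. The higher the support, the more often the combination appears.
Support(A) = N(A)/N
Support of 'apple': Support(A) = 4/5 = 0.8
Support of 'apple, banana': Support(AB) = 2/5 = 0.4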
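
As a quick check of the two numbers above, here is a minimal sketch that computes support directly from the five orders. The names `orders` and `support` are introduced here for illustration only and do not come from any library.

```python
# The five fruit-shop orders from the table above, one set per order.
orders = [
    {"apple", "banana", "pear"},
    {"apple", "banana", "pear", "mango"},
    {"banana", "pear", "mango", "peach"},
    {"apple", "mango"},
    {"apple", "peach"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(support({"apple"}, orders))            # 0.8
print(support({"apple", "banana"}, orders))  # 0.4
```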

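Confidence: a conditional probability, the probability that a customer who bought one item combination also buys another.
Confidence(A=>B) = N(AB)/N(A) = P(AB)/P(A)
Confidence of 'apple=>banana': Confidence(A=>B) = 2/4 = 0.5
Confidence of 'banana=>pear': Confidence(B=>C) = 3/3 = 1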
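
The same two confidence values can be reproduced with a few lines of Python. This is only a sketch on the fruit orders; `count` and `confidence` are illustrative helper names, not a library API.

```python
# The five fruit-shop orders, one set per order.
orders = [
    {"apple", "banana", "pear"},
    {"apple", "banana", "pear", "mango"},
    {"banana", "pear", "mango", "peach"},
    {"apple", "mango"},
    {"apple", "peach"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """N(A and B) / N(A): how often B appears among orders that contain A."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


print(confidence({"apple"}, {"banana"}, orders))  # 0.5
print(confidence({"banana"}, {"pear"}, orders))   # 1.0
```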

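Lift: how much the presence of one item combination raises the probability that another combination appears.
When lift is greater than 1, buying A raises the probability of buying B;
when lift equals 1, buying A neither raises nor lowers it;
when lift is less than 1, buying A lowers it.
Lift(A=>B) = Confidence(A=>B)/Support(B)
Lift of 'apple=>banana': Lift(A=>B) = 0.5/0.6 ≈ 0.83 < 1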
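
To make the lift calculation concrete, the sketch below computes it on the fruit orders. The helper names are illustrative. The second call, 'banana=>pear', is not worked out in the text above; it is added only to show a case where lift is greater than 1.

```python
# The five fruit-shop orders, one set per order.
orders = [
    {"apple", "banana", "pear"},
    {"apple", "banana", "pear", "mango"},
    {"banana", "pear", "mango", "peach"},
    {"apple", "mango"},
    {"apple", "peach"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(antecedent, consequent, transactions):
    """Confidence(A=>B) / Support(B)."""
    a, b = set(antecedent), set(consequent)
    confidence = support(a | b, transactions) / support(a, transactions)
    return confidence / support(b, transactions)


print(round(lift({"apple"}, {"banana"}, orders), 3))  # 0.833, below 1
print(round(lift({"banana"}, {"pear"}, orders), 3))   # 1.667, above 1
```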

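How the Apriori algorithm works

At its core, the Apriori algorithm is a search for frequent itemsets.
Frequent itemset: an itemset whose support is greater than or equal to the minimum support (Min Support).
Infrequent itemset: an itemset whose support is below the minimum support.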

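Apriori algorithm steps

step1: start with k=1 and compute the support of every k-itemset;
step2: prune the itemsets whose support is below the minimum support;
step3: if no itemset is left, take the frequent (k-1)-itemsets as the final result.
Otherwise set k=k+1 and repeat step1-step3.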
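
The three steps above can be sketched in plain Python as follows. This is a simplified illustration, not the implementation used by any library: `apriori_sketch` is a hypothetical name, candidates for level k+1 are built by joining pairs of frequent k-itemsets, and the extra subset-pruning optimisation of the full algorithm is left out (the support count filters those candidates anyway).

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {k: {itemset: support}} for each level k with frequent itemsets."""
    n = len(transactions)
    # k = 1 candidates: every distinct item on its own.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent_by_k = {}
    k = 1
    while candidates:
        # step1: compute the support of every candidate k-itemset.
        supports = {
            c: sum(1 for t in transactions if c <= t) / n for c in candidates
        }
        # step2: prune candidates below the minimum support.
        frequent = {c: s for c, s in supports.items() if s >= min_support}
        # step3: nothing left, so the previous level is the final result.
        if not frequent:
            break
        frequent_by_k[k] = frequent
        # Otherwise move to k+1: join frequent k-itemsets that differ by one item.
        k += 1
        candidates = {
            a | b for a, b in combinations(frequent, 2) if len(a | b) == k
        }
    return frequent_by_k


# Usage on the fruit orders with a minimum support of 0.5.
orders = [
    {"apple", "banana", "pear"},
    {"apple", "banana", "pear", "mango"},
    {"banana", "pear", "mango", "peach"},
    {"apple", "mango"},
    {"apple", "peach"},
]
levels = apriori_sketch(orders, min_support=0.5)
top = max(levels)
print(top, [sorted(s) for s in levels[top]])  # 2 [['banana', 'pear']]
```

Returning every level rather than only the last one keeps the intermediate frequent itemsets available, which is what rule generation needs afterwards.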

Apriori worked example

Rewrite the example above with IDs: apple, banana, pear, mango and peach become item IDs 1, 2, 3, 4 and 5.

Order no.	Fruit purchased
1	1, 2, 3
2	1, 2, 3, 4
3	2, 3, 4, 5
4	1, 4
5	1, 5

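1. Compute the support for k=1.

Itemset	Support
1	4/5
2	3/5
3	3/5
4	3/5
5	2/5

2. Set the minimum support threshold to 0.5 and prune. Item 5 (2/5 = 0.4) is removed:

Itemset	Support
1	4/5
2	3/5
3	3/5
4	3/5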

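3. Compute the support for k=2.

Itemset	Support
1, 2	2/5
1, 3	2/5
1, 4	2/5
2, 3	3/5
2, 4	2/5
3, 4	2/5

4. Prune again:

Itemset	Support
2, 3	3/5

This leaves the frequent 2-itemset {2, 3}, i.e. the combination {banana, pear}. With only one frequent 2-itemset left, no 3-itemset candidate can be formed, so {banana, pear} is the final result.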
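
The two support tables can be double-checked by brute-force counting. The sketch below uses only the standard library; the variable names are illustrative, and the 0.5 threshold is the one chosen in step 2.

```python
from collections import Counter
from itertools import combinations

# The five orders, written with item IDs 1-5.
orders = [{1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4, 5}, {1, 4}, {1, 5}]
min_support = 0.5
n = len(orders)

# k = 1: count how many orders contain each item, then prune.
item_counts = Counter(item for order in orders for item in order)
frequent_items = sorted(i for i, c in item_counts.items() if c / n >= min_support)
print(frequent_items)  # [1, 2, 3, 4]

# k = 2: count only pairs built from the surviving items, then prune.
pair_counts = Counter(
    pair
    for order in orders
    for pair in combinations(sorted(order & set(frequent_items)), 2)
)
frequent_pairs = [p for p, c in pair_counts.items() if c / n >= min_support]
print(frequent_pairs)  # [(2, 3)]
```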

Applying the Apriori algorithm

Dataset: Market_Basket (shopping baskets)
Data source: https://www.kaggle.com/dragonheir/basket-optimisation
1.efficient_apriori

# Import pandas and efficient_apriori
import pandas as pd
from efficient_apriori import apriori
from time import perf_counter  # time.clock was removed in Python 3.8

data = pd.read_csv('./Market_Basket_Optimisation.csv', header=None)
# Preprocess: turn each row into a transaction (a set of item names)
start = perf_counter()
transactions = []
for i in range(data.shape[0]):
    temp = set()
    for j in range(data.shape[1]):
        # Short rows are padded with NaN; skip those cells
        if pd.isna(data.values[i, j]):
            continue
        temp.add(str(data.values[i, j]))
    transactions.append(temp)
# Minimum support 0.04, minimum confidence 0.02
itemsets, rules = apriori(transactions, min_support=0.04, min_confidence=0.02)
end = perf_counter()
print('Frequent itemsets:', itemsets)
print('Association rules:', rules)
print('Elapsed time (s):', end - start)

2.mlxtend

# Import mlxtend
from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import association_rules
import pandas as pd
from time import perf_counter  # time.clock was removed in Python 3.8
data = pd.read_csv('./Market_Basket_Optimisation.csv', header=None)

start = perf_counter()
# Preprocess: turn each row into a transaction (a set of item names)
transactions = []
for i in range(data.shape[0]):
    temp = set()
    for j in range(data.shape[1]):
        # Short rows are padded with NaN; skip those cells
        if pd.isna(data.values[i, j]):
            continue
        temp.add(str(data.values[i, j]))
    transactions.append(temp)

# One-hot encode the transactions
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
transactions_ml = pd.DataFrame(te_ary, columns=te.columns_)

# Frequent itemsets with minimum support 0.03
itemsets = apriori(transactions_ml, min_support=0.03, use_colnames=True)
# Sort by support, descending
itemsets = itemsets.sort_values(by='support', ascending=False)
# Association rules with minimum lift 1.1
rules = association_rules(itemsets, metric='lift', min_threshold=1.1)
# Sort by lift, descending
rules = rules.sort_values(by='lift', ascending=False)

end = perf_counter()
print('Frequent itemsets:', itemsets)
print('Association rules:', rules)
print('Elapsed time (s):', end - start)
