Apriori Algorithm | Association Rules
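My recent course assignment was to work through an Apriori algorithm example. I looked up some material online, and this post is a short summary of it.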
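1. My understanding of association rules:
- Association rule mining is an unsupervised machine learning method used for knowledge discovery, not for prediction.
- Association rules are typically applied to transactional scenarios such as market basket analysis and movie recommendation.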
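The three measures used throughout the rest of this post (support, confidence, and lift) follow directly from their definitions. Below is a minimal Python sketch that computes them by hand. It assumes a small made-up list of baskets that reuses item names from the course data; the numbers are not from the real dataset, and the helper names `support`, `confidence`, and `lift` are hypothetical.

```python
# Support, confidence and lift of a rule X => Y, computed from their definitions
# on a small made-up list of baskets (NOT the course dataset).
from fractions import Fraction

baskets = [
    {"beer", "cannedveg", "frozenmeal"},
    {"beer", "cannedveg", "frozenmeal", "fish"},
    {"cannedveg", "frozenmeal"},
    {"wine", "fish"},
    {"beer", "frozenmeal"},
    {"fruitveg", "fish"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for b in baskets if itemset <= b)
    return Fraction(hits, len(baskets))


def confidence(lhs, rhs, baskets):
    """Estimate of P(rhs | lhs): support(lhs and rhs together) / support(lhs)."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)


def lift(lhs, rhs, baskets):
    """confidence(lhs => rhs) / support(rhs); a value above 1 means positive association."""
    return confidence(lhs, rhs, baskets) / support(rhs, baskets)


lhs, rhs = {"cannedveg", "frozenmeal"}, {"beer"}
print("support    =", support(lhs | rhs, baskets))   # 1/3
print("confidence =", confidence(lhs, rhs, baskets))  # 2/3
print("lift       =", lift(lhs, rhs, baskets))        # 4/3
```

Here {cannedveg, frozenmeal} => {beer} holds in 2 of 6 baskets (support 1/3), in 2 of the 3 baskets that contain the left-hand side (confidence 2/3), and beer alone appears in half of the baskets, so the lift is (2/3)/(1/2) = 4/3: beer shows up with that pair more often than it would if the two were independent.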
2. Implementing the Apriori algorithm
- Python: the mlxtend library (project site: https://rasbt.github.io/mlxtend/)
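# mlxtend expects a one-hot encoded DataFrame: one row per transaction, one boolean column per item
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
te = TransactionEncoder()
basket_sets2 = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)  # transactions (hypothetical name): list of item lists
frequent_itemsets = apriori(basket_sets2, min_support=0.05, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)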
- R: arules + arulesViz
- Note: R is comparatively stronger at visualizing and analyzing Apriori results, so this assignment was done in R, using the "market basket data".
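Both mlxtend and arules hide the search itself. The level-wise idea behind Apriori is: count the frequent single items, join frequent (k-1)-itemsets into k-item candidates, prune any candidate that has an infrequent (k-1)-subset, count the survivors, and repeat until no candidates remain. The plain-Python sketch below follows those steps under stated assumptions: it is a teaching version with naive counting, not the implementation either library uses, and the function names are hypothetical. Rule generation keeps a single item on the right-hand side (arules' apriori also produces only single-item consequents) and only uses itemsets of at least two items, which corresponds to `minlen = 2` in the R code.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for every itemset
    whose support (fraction of transactions containing it) is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Naive counting: check each candidate against every transaction.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items.
    current = keep_frequent({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


def single_consequent_rules(frequent, min_confidence):
    """Rules lhs => {item} built from itemsets with >= 2 items.
    Each rule is (lhs, rhs, support, confidence, lift)."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            rhs = frozenset([item])
            lhs = itemset - rhs
            # lhs is a subset of a frequent itemset, so it is frequent and present in `frequent`.
            conf = supp / frequent[lhs]
            if conf >= min_confidence:
                rules.append((lhs, rhs, supp, conf, conf / frequent[rhs]))
    # Highest lift first, similar to sort(rules, by = "lift") in arules.
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Made-up baskets for illustration only.
baskets = [
    {"beer", "cannedveg", "frozenmeal"},
    {"beer", "cannedveg", "frozenmeal", "fish"},
    {"cannedveg", "frozenmeal"},
    {"wine", "fish"},
    {"beer", "frozenmeal"},
    {"fruitveg", "fish"},
]
freq = apriori_frequent_itemsets(baskets, min_support=0.3)
rules = single_consequent_rules(freq, min_confidence=0.6)
for lhs, rhs, supp, conf, lift in rules:
    print(sorted(lhs), "=>", sorted(rhs), f"supp={supp:.3f} conf={conf:.3f} lift={lift:.3f}")
```

With these baskets and thresholds, the frequent itemsets are four single items, three pairs, and the triple {beer, cannedveg, frozenmeal}. Candidate counting is the expensive part, which is why real implementations rely on smarter data structures than this sketch.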
3. R code
- The data is the market basket dataset
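With `format = "basket"` and `sep = ","`, `read.transactions` treats each line of the file as one transaction and each comma-separated token as an item; `itemFrequency` then reports, for each item, the fraction of transactions that contain it. The minimal Python sketch below mirrors those two steps on a made-up four-line sample. It is a simplification (it does not reproduce every option of `read.transactions`), and `read_basket` and `item_frequency` are hypothetical names. It also shows why `sum(itemFreq)` in the R script equals the average number of items per transaction.

```python
from collections import Counter


def read_basket(lines, sep=","):
    """One transaction per line, items separated by `sep`; blank tokens are dropped
    and duplicate items within a line collapse into one (each transaction is a set)."""
    return [{tok.strip() for tok in line.split(sep) if tok.strip()} for line in lines]


def item_frequency(transactions):
    """Relative frequency (support) of each item: share of transactions containing it."""
    counts = Counter(item for t in transactions for item in t)
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


# Made-up sample in basket format (not the course data file).
sample = """beer,cannedveg,frozenmeal
fish,wine
cannedveg,frozenmeal
beer,frozenmeal,fish,fruitveg
"""
transactions = read_basket(sample.splitlines())
freq = item_frequency(transactions)

# A transaction with s items contributes 1/n to each of its s items, so the
# relative frequencies sum to (total item occurrences) / n = mean basket size.
mean_size = sum(len(t) for t in transactions) / len(transactions)
print(sum(freq.values()), mean_size)  # 2.75 2.75

# Items from most to least frequent, like sort(itemFrequency(data), decreasing = TRUE).
for item, f in sorted(freq.items(), key=lambda kv: kv[1], reverse=True):
    print(item, f)
```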
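## Load packages.
library(arules)
# arules is the package for association rule mining;
# we will use functions such as read.transactions, apriori, and inspect.
library(arulesViz)
# arulesViz is the package for visualizing association rules; we will use its plot function.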
# Read the raw basket file: one transaction per line, items separated by commas
data <- read.transactions("E:\\pythonProject\\购物篮原始数据.txt", format = "basket", sep = ",")
## View a summary of the dataset.
summary(data)
# Inspect the first 6 transactions
inspect(head(data))
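# Bar chart of transaction sizes 0-8 (counts appear to be entered by hand from summary(data))
# m <- c(60,174,227,220,175,81,38,21,4)
# n <- c(0,1,2,3,4,5,6,7,8)
# barplot(m, names.arg = n, ylab = 'Count')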
itemFreq <- itemFrequency(data)
sum(itemFreq)  # sum of relative item frequencies = average number of items per transaction
orderedItemFreq <- sort(itemFreq, decreasing = TRUE)
orderedItemFreq[1:11]  # the 11 most frequent items
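# Visualize the sparse transaction-by-item matrix of the basket data
# Randomly sample 50 transactions and plot them
# image(sample(data, 50))
# Randomly sample 20 transactions and plot them
# image(sample(data, 20))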
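# Plot item support (relative item frequency)
# itemFrequencyPlot(data, support = 0.2)  # only items with support of at least 20%
itemFrequencyPlot(data, topN = 11)  # support of the 11 most frequent items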
# # Rerun the association analysis over a range of minimum-support values
# c1 <- seq(0.03, 0.1, by = 0.01)
# # c2 <- seq(0.3, 0.5, by = 0.1)
# for (m in c1)
# {
#   rules <- apriori(data, parameter = list(support = m, confidence = 0.3, minlen = 2, target = "rules"))
#   print(rules)
#   inspect(sort(rules, by = "lift"))
# }
# Mine rules with support >= 0.03, confidence >= 0.3, and at least 2 items (lhs + rhs)
rules <- apriori(data, parameter = list(support = 0.03,
                                        confidence = 0.3,
                                        minlen = 2,
                                        target = "rules"))
summary(rules)
options(digits = 4)
# Print numbers with 4 significant digits
# Inspect the rules, sorted by lift
inspect(sort(rules, by = "lift"))
## Visualize the association rules
plot(rules)
# Scatter plot of the rules: support vs. confidence, shaded by lift
# Keep only rules with lift > 2 (24 rules)
rules_lift <- subset(rules, lift > 2)
inspect(sort(rules_lift, by = "lift"))
# Rules (lift > 2) involving each of the five items with the highest support
# cannedveg: 19 rules
cannedvegrules <- subset(rules_lift, items %in% c("cannedveg"))
inspect(sort(cannedvegrules, by = "lift"))
# frozenmeal: 20 rules
frozerules <- subset(rules_lift, items %in% c("frozenmeal"))
inspect(sort(frozerules, by = "lift"))
# fruitveg: 4 rules
fruitvegrules <- subset(rules_lift, items %in% c("fruitveg"))
inspect(sort(fruitvegrules, by = "lift"))
# beer: 20 rules
beerules <- subset(rules_lift, items %in% c("beer"))
inspect(sort(beerules, by = "lift"))
# fish: 4 rules
fishrules <- subset(rules_lift, items %in% c("fish"))
inspect(sort(fishrules, by = "lift"))
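# Rules containing all three of the items that appear most often in the rules
# (cannedveg, frozenmeal, beer): 15 rules
f_t_rule <- subset(rules_lift, items %ain% c("cannedveg", "frozenmeal", "beer"))
inspect(sort(f_t_rule, by = "lift"))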
# Look at the rules by category
# beer & wine: 2 rules
brule <- subset(rules_lift, subset = (lhs %in% c("wine", "beer")))
bw_rule <- subset(brule, subset = (rhs %in% c("wine", "beer")))
inspect(sort(bw_rule, by = "lift"))
# canned vegetables & canned meat: 2 rules
cannedrule <- subset(rules_lift, subset = (lhs %in% c("cannedveg", "cannedmeat")))
can_rule <- subset(cannedrule, subset = (rhs %in% c("cannedveg", "cannedmeat")))
inspect(sort(can_rule, by = "lift"))
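# From the 15 three-item rules, drop rules with cannedveg/cannedmeat on both sides
final_rule <- subset(f_t_rule, subset = !(lhs %in% c("cannedveg", "cannedmeat") & rhs %in% c("cannedveg", "cannedmeat")))
inspect(sort(final_rule, by = "lift"))
# Then drop rules with wine/beer on both sides
final2_rule <- subset(final_rule, subset = !(lhs %in% c("wine", "beer") & rhs %in% c("wine", "beer")))
inspect(sort(final2_rule, by = "lift"))
# 13 rules remain after filtering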
# Optionally, split the remaining rules by consequent (rhs):
# beer_rule <- subset(final2_rule, subset = (rhs %in% c("beer")))
# inspect(sort(beer_rule, by = "lift"))
#
# frozenmeal_rule <- subset(final2_rule, subset = (rhs %in% c("frozenmeal")))
# inspect(sort(frozenmeal_rule, by = "lift"))
#
# cannedveg_rule <- subset(final2_rule, subset = (rhs %in% c("cannedveg")))
# inspect(sort(cannedveg_rule, by = "lift"))
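The `subset()` calls in the R script rely on arules' item-matching operators: `items %in% c(...)` keeps rules whose items (lhs and rhs together) include at least one of the listed items, `%ain%` requires all of them, and `!(lhs %in% A & rhs %in% A)` drops rules that have an item from group A on both sides. The small Python sketch below reproduces only that filtering logic. It assumes a hypothetical list of rules with invented lift values (not the output of the R script), and the helper names are made up for illustration.

```python
# Hypothetical rules: (lhs, rhs, lift). Lift values are invented for illustration.
rules = [
    ({"beer", "cannedveg"}, {"frozenmeal"}, 2.9),
    ({"cannedveg", "frozenmeal"}, {"beer"}, 2.8),
    ({"cannedmeat", "frozenmeal", "beer"}, {"cannedveg"}, 2.4),
    ({"wine", "cannedveg", "frozenmeal"}, {"beer"}, 2.2),
    ({"fish"}, {"fruitveg"}, 1.4),
]


def items(rule):
    """All items of a rule, lhs and rhs together (what `items` refers to in subset())."""
    lhs, rhs, _ = rule
    return lhs | rhs


def has_any(itemset, targets):
    """Analogue of %in%: at least one target item is present."""
    return bool(itemset & set(targets))


def has_all(itemset, targets):
    """Analogue of %ain%: every target item is present."""
    return set(targets) <= itemset


def both_sides_in(rule, group):
    """True when lhs and rhs each contain at least one item from `group`."""
    lhs, rhs, _ = rule
    return has_any(lhs, group) and has_any(rhs, group)


strong = [r for r in rules if r[2] > 2]                                # subset(rules, lift > 2)
any_wc = [r for r in strong if has_any(items(r), ["wine", "cannedmeat"])]  # %in%
all_wc = [r for r in strong if has_all(items(r), ["wine", "cannedmeat"])]  # %ain%
all_three = [r for r in strong if has_all(items(r), ["cannedveg", "frozenmeal", "beer"])]
final = [
    r for r in all_three
    if not both_sides_in(r, ["cannedveg", "cannedmeat"])
    and not both_sides_in(r, ["wine", "beer"])
]

# Highest lift first, similar to inspect(sort(final2_rule, by = "lift")).
for lhs, rhs, lift in sorted(final, key=lambda r: r[2], reverse=True):
    print(sorted(lhs), "=>", sorted(rhs), lift)
```

In this toy list, all four strong rules contain cannedveg, frozenmeal, and beer, but the third rule is dropped for having canned items on both sides and the fourth for having wine and beer on opposite sides, leaving two rules.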
References:
1. (Python) https://github.com/mattzheng/python-Apriori
2. (R) https://blog.csdn.net/gjwang1983/article/details/45015203