Tianchi Learning Competition: Mining Frequent Itemsets and Association Rules for Products with the Apriori Algorithm

hyk今天写算法了吗

Category: Python Data Analysis. Tags: algorithm, python, Alibaba Cloud, association analysis, data analysis

Article link: https://blog.csdn.net/m0_52000372/article/details/125562375

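Competition Background

The competition is built around market basket analysis: participants mine frequent itemsets and association rules from a brand's historical order data. The task encourages learners to use order data to suggest sales strategies and product bundles, helping the business raise sales while giving consumers better-suited product recommendations.

Competition Data

Data sources: order.csv, product.csv, customer.csv, and date.csv, which are the order table, product table, customer table, and date table, respectively.

Competition Task

Use association analysis (for example, the Apriori algorithm) to mine frequent itemsets and association rules from the orders.
Notes:
1) Computing frequent itemsets and association rules relies on metrics such as support, confidence, and lift.
2) Frequent itemset: a product or product combination whose support reaches the minimum support threshold.
3) Association rule: a recommendation rule, derived from the frequent itemsets, that meets the minimum confidence or the minimum lift.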
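
The three metrics named in the notes can be illustrated on a handful of made-up baskets. The sketch below is not part of the competition material: the item names and the helper functions `support`, `confidence`, and `lift` are assumptions introduced only to show the standard definitions.

```python
# Toy transactions; the item names are invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """support(lhs and rhs together) divided by support(lhs)."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    """confidence(lhs -> rhs) divided by support(rhs)."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)

sup = support({"bread", "milk"}, transactions)
conf = confidence({"bread"}, {"milk"}, transactions)
lft = lift({"bread"}, {"milk"}, transactions)
print(sup, conf, lft)
```

A lift above 1 suggests the two sides of a rule occur together more often than independence would predict; a lift below 1, as in this toy data, suggests the opposite.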

Solution

from efficient_apriori import apriori
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the four tables; the files are read with GBK encoding
df_order = pd.read_csv("../download/order.csv", encoding='gbk')
df_customer = pd.read_csv("../download/customer.csv", encoding='gbk')
df_date = pd.read_csv("../download/date.csv", encoding='gbk')
df_product = pd.read_csv("../download/product.csv", encoding='gbk')
# Parse the order date column ('订单日期') as datetime
df_order['订单日期'] = pd.to_datetime(df_order['订单日期'])
print(df_order.head())

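# Collect the distinct product names ('产品名称') per customer ID ('客户ID');
# each customer's set of products is treated as one transaction (basket)
df_order = df_order.groupby(['客户ID'])['产品名称'].unique()
print(df_order.head())
# Convert every transaction to a plain list
transactions = [list(value) for value in df_order]
import time
# Mine frequent itemsets and association rules
start = time.time()
itemsets, rules = apriori(transactions, min_support=0.03, min_confidence=0.3)
print("Frequent itemsets:", itemsets)
print("Association rules:", rules)
end = time.time()
print("Elapsed time:", end - start)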
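
For intuition about the frequent-itemset part of the call above, here is a minimal pure-Python sketch of the level-wise Apriori search. It is not the implementation used by efficient_apriori: the function name `frequent_itemsets` and the toy baskets are assumptions made for illustration, and the code favors readability over speed.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {frozenset: support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support_of(candidate):
        # Fraction of baskets that contain the whole candidate itemset
        return sum(candidate <= b for b in baskets) / n

    def keep_frequent(candidates):
        scored = {c: support_of(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items
    items = {item for b in baskets for item in b}
    current = keep_frequent(frozenset([item]) for item in items)
    result = dict(current)

    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        result.update(current)
        k += 1
    return result

# Toy baskets, invented for illustration
toy = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C"], ["A", "B", "C"]]
found = frequent_itemsets(toy, min_support=0.6)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is counted only if all of its smaller subsets were already found to be frequent.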

import operator
# Sort the rules by confidence, support, and lift respectively
confidence = dict()
support = dict()
lift = dict()
for rule in rules:
    confidence[rule] = rule.confidence
    support[rule] = rule.support
    lift[rule] = rule.lift
    # rule.lhs and rule.rhs give the antecedent and consequent of a rule
# Each result is a list of (rule, metric value) pairs in descending order
rules_sortbycon = sorted(confidence.items(), key=operator.itemgetter(1), reverse=True)
rules_sortbysup = sorted(support.items(), key=operator.itemgetter(1), reverse=True)
rules_sortbylift = sorted(lift.items(), key=operator.itemgetter(1), reverse=True)
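
The same ranking can be written without intermediate dictionaries by sorting on an attribute directly. The sketch below is an alternative formulation, not the author's code: `ToyRule`, `top_rules`, and all metric values are hypothetical stand-ins, chosen so the example runs without the competition data. It assumes only that each rule exposes `lhs`, `rhs`, `support`, `confidence`, and `lift` attributes, as the rule objects in the code above do.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for a mined rule; the values below are invented
ToyRule = namedtuple("ToyRule", ["lhs", "rhs", "support", "confidence", "lift"])

toy_rules = [
    ToyRule(("A",), ("B",), 0.10, 0.50, 1.2),
    ToyRule(("B",), ("C",), 0.05, 0.80, 2.5),
    ToyRule(("A",), ("C",), 0.20, 0.40, 0.9),
]

def top_rules(rules, metric, n=2):
    """Return the n rules with the highest value of the named metric."""
    return sorted(rules, key=attrgetter(metric), reverse=True)[:n]

by_lift = top_rules(toy_rules, "lift")
by_conf = top_rules(toy_rules, "confidence")
by_sup = top_rules(toy_rules, "support")

for r in by_lift:
    print(f"{r.lhs} -> {r.rhs}: lift={r.lift}")
```

Sorting the rule objects themselves keeps `lhs` and `rhs` at hand for reporting, which can be convenient when turning the top rules into product recommendations.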