Apriori Association Analysis Model

1. Objective:

Become proficient at implementing the Apriori association analysis model in Python.

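2. Contents:

  • Use the mlxtend package to derive frequent itemsets and rules
  • Define a custom shopping dataset
  • Set the support threshold to select frequent itemsets
  • Compute rules: specify different metrics and minimum thresholds
  • Convert the data to one-hot encoding
  • Output the final result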

3. Implementation:

3.1 Use the mlxtend package to derive frequent itemsets and rules
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
3.2 Define a custom shopping dataset
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
# Six transactions in one-hot form: 1 means the item was bought.
data = {
   'ID': [1, 2, 3, 4, 5, 6],
   'Onion': [1, 0, 0, 1, 1, 1],
   'Potato': [1, 1, 0, 1, 1, 1],
   'Burger': [1, 1, 0, 0, 1, 1],
   'Milk': [0, 1, 1, 1, 0, 1],
   'Beer': [0, 0, 1, 0, 1, 0]}
df = pd.DataFrame(data)
# Fix the column order explicitly.
df = df[['ID', 'Onion', 'Potato', 'Burger', 'Milk', 'Beer']]
print(df)
3.3 Set the support threshold to select frequent itemsets
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
data = {
   'ID': [1, 2, 3, 4, 5, 6],
   'Onion': [1, 0, 0, 1, 1, 1],
   'Potato': [1, 1, 0, 1, 1, 1],
   'Burger': [1, 1, 0, 0, 1, 1],
   'Milk': [0, 1, 1, 1, 0, 1],
   'Beer': [0, 0, 1, 0, 1, 0]}
df = pd.DataFrame(data)
df = df[['ID', 'Onion', 'Potato', 'Burger', 'Milk', 'Beer']]
# Drop the ID column and cast 0/1 to bool, the input type mlxtend recommends.
items = df[['Onion', 'Potato', 'Burger', 'Milk', 'Beer']].astype(bool)
# Keep itemsets that appear in at least 50% of the transactions.
frequent_itemsets = apriori(items, min_support=0.50, use_colnames=True)
print(frequent_itemsets)
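
To make the role of the support threshold concrete, here is a minimal pure-Python sketch of the level-wise search that Apriori is built on. It is an illustration only, not mlxtend's implementation; the names `transactions`, `support`, and `frequent_itemsets_sketch` are hypothetical, and the six transactions are the rows of the DataFrame above rewritten as item sets.

```python
from fractions import Fraction
from itertools import combinations

# The six transactions from the DataFrame above, written as item sets.
transactions = [
    {"Onion", "Potato", "Burger"},
    {"Potato", "Burger", "Milk"},
    {"Milk", "Beer"},
    {"Onion", "Potato", "Milk"},
    {"Onion", "Potato", "Burger", "Beer"},
    {"Onion", "Potato", "Burger", "Milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def frequent_itemsets_sketch(transactions, min_support):
    """Level-wise search: size-(k+1) candidates come only from frequent size-k sets."""
    items = sorted(set().union(*transactions))
    current = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while current:
        # Keep the candidates that meet the support threshold.
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune every
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return result


found = frequent_itemsets_sketch(transactions, Fraction(1, 2))
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), float(s))
```

Beer appears in only 2 of 6 transactions, so it is dropped at the first level and no larger candidate containing it is ever counted. Exact fractions are used here to avoid floating-point noise at the 0.5 boundary.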
3.4 Compute rules: specify different metrics and minimum thresholds
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
data = {
   'ID': [1, 2, 3, 4, 5, 6],
   'Onion': [1, 0, 0, 1, 1, 1],
   'Potato': [1, 1, 0, 1, 1, 1],
   'Burger': [1, 1, 0, 0, 1, 1],
   'Milk': [0, 1, 1, 1, 0, 1],
   'Beer': [0, 0, 1, 0, 1, 0]}
df = pd.DataFrame(data)
df = df[['ID', 'Onion', 'Potato', 'Burger', 'Milk', 'Beer']]
items = df[['Onion', 'Potato', 'Burger', 'Milk', 'Beer']].astype(bool)
frequent_itemsets = apriori(items, min_support=0.50, use_colnames=True)
# Generate every rule whose lift is at least 1.
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1)
# Keep only rules with lift above 1.125 and confidence above 0.8.
strong_rules = rules[(rules['lift'] > 1.125) & (rules['confidence'] > 0.8)]
print(strong_rules)
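
The metrics used for filtering can be checked by hand. The sketch below applies the standard definitions, confidence(A→B) = support(A∪B) / support(A) and lift(A→B) = confidence(A→B) / support(B), to the same six transactions. The helper names `support` and `rule_metrics` are hypothetical and chosen for illustration.

```python
from fractions import Fraction

# The six transactions from the DataFrame above, written as item sets.
transactions = [
    {"Onion", "Potato", "Burger"},
    {"Potato", "Burger", "Milk"},
    {"Milk", "Beer"},
    {"Onion", "Potato", "Milk"},
    {"Onion", "Potato", "Burger", "Beer"},
    {"Onion", "Potato", "Burger", "Milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(antecedent | consequent)
    confidence = both / support(antecedent)
    lift = confidence / support(consequent)
    return both, confidence, lift


onion_potato = rule_metrics(frozenset({"Onion"}), frozenset({"Potato"}))
onion_burger = rule_metrics(frozenset({"Onion"}), frozenset({"Burger"}))
print("Onion -> Potato:", [float(x) for x in onion_potato])
print("Onion -> Burger:", [float(x) for x in onion_burger])
```

In exact arithmetic, Onion → Potato has confidence 1 and lift 6/5, so it passes both filters. Onion → Burger has confidence 3/4 and lift exactly 9/8 = 1.125, so it fails the strict `> 1.125` test. Because mlxtend works in floating point, rules that sit exactly on a cutoff may be worth checking individually.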
3.5 Convert the data to one-hot encoding

import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
retail_shopping_basket = {'ID': [1, 2, 3, 4, 5, 6],
                          'Basket': [['Beer', 'Diaper', 'Pretzels', 'Chips', 'Aspirin'],
                                     ['Diaper', 'Beer', 'Chips', 'Lotion', 'Juice', 'BabyFood', 'Milk'],
                                     ['Soda', 'Chips', 'Milk'],
                                     ['Soup', 'Beer', 'Diaper', 'Milk', 'IceCream'],
                                     ['Soda', 'Coffee', 'Milk', 'Bread'],
                                     ['Beer', 'Chips']
                                     ]
                          }
retail = pd.DataFrame(retail_shopping_basket)
retail = retail[['ID', 'Basket']]
pd.options.display.max_colwidth = 100
# Separate the ID column from the basket lists.
retail_id = retail.drop(columns='Basket')
# Join each basket into one comma-separated string, then expand it into 0/1 columns.
retail_Basket = retail.Basket.str.join(',')
retail_Basket = retail_Basket.str.get_dummies(',')
retail = retail_id.join(retail_Basket)
print(retail)
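
For readers who want to see what the one-hot step produces without pandas, the following sketch builds the same kind of table by hand. It uses three of the six baskets to keep the output short; `baskets`, `columns`, and `rows` are hypothetical names introduced for illustration.

```python
# Three of the baskets above, each a list of purchased items.
baskets = [
    ["Beer", "Diaper", "Pretzels", "Chips", "Aspirin"],
    ["Soda", "Coffee", "Milk", "Bread"],
    ["Beer", "Chips"],
]

# One column per distinct item, in alphabetical order.
columns = sorted({item for basket in baskets for item in basket})

# One row per basket: True where the basket contains the column's item.
rows = [[item in basket for item in columns] for basket in baskets]

print(columns)
for row in rows:
    print([int(flag) for flag in row])
```

Building the rows directly from the lists also sidesteps one limitation of the join-then-split approach: an item name that itself contains a comma would be split into two columns by `str.get_dummies(',')`.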
3.6 Final output
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
retail_shopping_basket = {'ID': [1, 2, 3, 4, 5, 6],
                          'Basket': [['Beer', 'Diaper', 'Pretzels', 'Chips', 'Aspirin'],
                                     ['Diaper', 'Beer', 'Chips', 'Lotion', 'Juice', 'BabyFood', 'Milk'],
                                     ['Soda', 'Chips', 'Milk'],
                                     ['Soup', 'Beer', 'Diaper', 'Milk', 'IceCream'],
                                     ['Soda', 'Coffee', 'Milk', 'Bread'],
                                     ['Beer', 'Chips']
                                     ]
                          }
retail = pd.DataFrame(retail_shopping_basket)
retail = retail[['ID', 'Basket']]
pd.options.display.max_colwidth = 100
retail_id = retail.drop(columns='Basket')
retail_Basket = retail.Basket.str.join(',')
retail_Basket = retail_Basket.str.get_dummies(',')
retail = retail_id.join(retail_Basket)
# Mine frequent itemsets with the default min_support of 0.5.
frequent_itemsets_2 = apriori(retail.drop(columns='ID').astype(bool), use_colnames=True)
# Rank rules by lift, using the default min_threshold of 0.8.
rules_2 = association_rules(frequent_itemsets_2, metric='lift')
print(rules_2)

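Conclusion: {Diaper, Beer} is the most strongly associated pair. Its rules have a lift of 1.5, compared with 1.125 for {Beer, Chips}.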
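
The conclusion can be cross-checked without mlxtend. The sketch below computes the pairwise lift, support(A∪B) / (support(A) × support(B)), over the six baskets. To keep it short, each basket is reduced to the four items that reach 0.5 support (Beer, Chips, Diaper, Milk); that reduction and the names `baskets`, `support`, `lift`, and `pairs` are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

# The six baskets, reduced to the four items with support >= 0.5.
baskets = [
    {"Beer", "Diaper", "Chips"},
    {"Beer", "Diaper", "Chips", "Milk"},
    {"Chips", "Milk"},
    {"Beer", "Diaper", "Milk"},
    {"Milk"},
    {"Beer", "Chips"},
]


def support(items):
    """Fraction of baskets that contain every item in `items`."""
    hits = sum(1 for basket in baskets if items <= basket)
    return Fraction(hits, len(baskets))


def lift(a, b):
    """Symmetric lift of the pair (a, b)."""
    return support({a, b}) / (support({a}) * support({b}))


pairs = {pair: lift(*pair) for pair in combinations(["Beer", "Chips", "Diaper", "Milk"], 2)}
best = max(pairs, key=pairs.get)
print(best, float(pairs[best]))
```

A lift above 1 means the two items occur together more often than independence would predict. Here only the Beer and Diaper pair and the Beer and Chips pair exceed 1, and Beer with Diaper is the larger of the two.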

