Chapter 1, 1.3 Affinity Analysis


TututuXXX


Category: Data Mining

Article link: https://blog.csdn.net/xy88115211/article/details/79112127


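Chapter 1: Getting Started with Data Mining

1.3 Affinity Analysis

1.3.1 What is affinity analysis?

In short, it estimates how likely a customer who buys A is to also buy B. In other words, it looks for links between the purchases of A and B.

This part uses two measures, support and confidence. Both are introduced through the example below.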

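1.3.2 Example

We have the shopping lists of different customers over five products: bread, milk, cheese, apples, and bananas. They are stored as a two-dimensional array:

- Each row is one purchase (transaction) by a customer.
- Each column is one product.
- Each entry records whether that customer bought that product: 1 means bought, 0 means not bought.

From this data we want to work out a sensible promotion plan. Specifically, we want to know which two products have a better chance of selling together if they are placed side by side.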
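The following sketch shows this layout with a small hand-built matrix in the same column order. The rows are toy data copied from the five rows that Example 1 prints, not the full dataset. The names `X_toy` and `counts` are illustrative only. Summing a column counts how many transactions contain that product.

```python
import numpy as np

# Column order follows the text: bread, milk, cheese, apples, bananas
features = ["bread", "milk", "cheese", "apples", "bananas"]

# Toy data: the five rows that Example 1 prints (1 = bought, 0 = not bought)
X_toy = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
])

# Summing down each column counts the transactions that contain that product
counts = X_toy.sum(axis=0)
for name, count in zip(features, counts):
    print(name, int(count))
```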

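PS: I haven't used many of the book's technical terms, for two reasons. First, I'm new to this myself and don't know them well yet. Second, it isn't necessary: you'll remember them by the end anyway, so there's no rush. Experts, please go easy on me.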

1.3.3 Code walkthrough and implementation

Example 1: Load the dataset with NumPy and print its first five rows

Code:

import numpy as np
dataset_filename = "affinity_dataset.txt"
X = np.loadtxt(dataset_filename)
n_samples, n_features = X.shape
print("This dataset has {0} samples and {1} features".format(n_samples, n_features))
print(X[:5])

Output:

[[ 0. 0. 1. 1. 1.]
 [ 1. 1. 0. 1. 0.]
 [ 1. 0. 1. 1. 0.]
 [ 0. 0. 1. 1. 1.]
 [ 0. 1. 0. 0. 1.]]

Summary:

This simply loads the dataset with NumPy and prints the first 5 rows.
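The data file itself isn't shown here. This sketch assumes affinity_dataset.txt holds one transaction per line as whitespace-separated 0/1 values, which is the layout `np.loadtxt` reads by default. It feeds `np.loadtxt` an in-memory stand-in instead of the real file. The result also explains why the output above prints `0.` and `1.`: the values are read as floats.

```python
import io
import numpy as np

# Hypothetical stand-in for affinity_dataset.txt (its real contents are not shown here):
# one transaction per line, five whitespace-separated 0/1 values
text = "0 0 1 1 1\n1 1 0 1 0\n1 0 1 1 0\n"

# np.loadtxt accepts a file-like object as well as a filename;
# by default it splits on whitespace and returns float64 values
X = np.loadtxt(io.StringIO(text))
n_samples, n_features = X.shape
print("This dataset has {0} samples and {1} features".format(n_samples, n_features))
print(X.dtype)  # float64, which is why the output shows "0." and "1."
```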

Example 2: Count how many people bought apples

Code:

# Feature (product) names, in column order
features = ["bread", "milk", "cheese", "apples", "bananas"]
# Count how many people bought apples
num_apple_purchases = 0
for sample in X:
    if sample[3] == 1:  # this person bought apples
        num_apple_purchases += 1
# Print the result
print("{0} people bought Apples".format(num_apple_purchases))

Output:

36 people bought Apples
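NumPy can do the same count without an explicit loop. The sketch below runs on the five rows printed in Example 1 as toy data. On the full `X` it should give the same number as the loop. Looking up `apple_idx` by name is an illustrative choice; the loop above hard-codes column 3.

```python
import numpy as np

features = ["bread", "milk", "cheese", "apples", "bananas"]
# Toy data: the five rows printed in Example 1
X = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
])

apple_idx = features.index("apples")  # column 3, the index hard-coded in the loop
# (X[:, apple_idx] == 1) is a boolean column; summing it counts the True entries
num_apple_purchases = int((X[:, apple_idx] == 1).sum())
print("{0} people bought Apples".format(num_apple_purchases))  # prints: 4 people bought Apples
```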

Example 3: Count the people who bought apples and also bought bananas

Code:

rule_valid = 0
rule_invalid = 0
for sample in X:
    if sample[3] == 1:  # This person bought Apples
        if sample[4] == 1:
            # This person bought both Apples and Bananas
            rule_valid += 1
        else:
            # This person bought Apples, but not Bananas
            rule_invalid += 1
print("{0} cases of the rule being valid were discovered".format(rule_valid))
print("{0} cases of the rule being invalid were discovered".format(rule_invalid))

Output:

21 cases of the rule being valid were discovered
15 cases of the rule being invalid were discovered

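Summary:

This is a rule of the form premise ---> conclusion, here "bought apples ---> bought bananas". The nested if/else logic reads much as it would in Java.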
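The same valid/invalid split can be written with boolean masks instead of nested if/else. This minimal sketch uses the five rows printed in Example 1 as toy data. Note that valid + invalid always equals the number of premise (apple) buyers.

```python
import numpy as np

# Toy data: the five rows printed in Example 1 (bread, milk, cheese, apples, bananas)
X = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
])
premise, conclusion = 3, 4  # apples -> bananas

bought_premise = X[:, premise] == 1
bought_conclusion = X[:, conclusion] == 1

# Valid: premise and conclusion both bought; invalid: premise bought, conclusion not
rule_valid = int(np.sum(bought_premise & bought_conclusion))
rule_invalid = int(np.sum(bought_premise & ~bought_conclusion))
print("{0} cases of the rule being valid were discovered".format(rule_valid))
print("{0} cases of the rule being invalid were discovered".format(rule_invalid))
```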

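Example 4: Compute the support and confidence of the rule from the previous example

Support:

The number of times the rule holds, i.e., the number of transactions that contain both the premise and the conclusion.

In the example above, that count is 21 (the premise is buying apples, the conclusion is buying bananas).

Confidence:

The number of times the rule holds divided by the number of times the premise occurs, i.e., how accurate the rule is.

E.g.: 21/36 ≈ 0.583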
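The two definitions can be wrapped in a small function that works for any premise/conclusion pair. `rule_support_confidence` is a hypothetical helper name, not from the book. Support here follows this article's count-based definition, although support is also often reported as a fraction of all transactions. Returning 0.0 when the premise never occurs is a choice made for this sketch.

```python
import numpy as np

def rule_support_confidence(X, premise, conclusion):
    """Hypothetical helper: support and confidence of premise -> conclusion.

    Support (a count): rows where both the premise and the conclusion are 1.
    Confidence: support / number of rows where the premise is 1.
    """
    has_premise = X[:, premise] == 1
    support = int(np.sum(has_premise & (X[:, conclusion] == 1)))
    n_premise = int(np.sum(has_premise))
    # Confidence is undefined if the premise never occurs; this sketch returns 0.0 then
    confidence = support / n_premise if n_premise > 0 else 0.0
    return support, confidence

# Toy data: the five rows printed in Example 1 (bread, milk, cheese, apples, bananas)
X = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
])
support, confidence = rule_support_confidence(X, 3, 4)  # apples -> bananas
print("support = {0}, confidence = {1:.3f}".format(support, confidence))  # support = 2, confidence = 0.500
```

With the full dataset's counts (21 valid cases, 36 apple buyers), the same formula gives 21 / 36 ≈ 0.583.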

Code:

# Support and confidence
support = rule_valid
confidence = rule_valid / num_apple_purchases
print("The support is {0} and the confidence is {1:.3f}.".format(support, confidence))
# Show the confidence as a percentage
print("As a percentage, that is {0:.1f}%.".format(100 * confidence))

Output:

The support is 21 and the confidence is 0.583.
As a percentage, that is 58.3%.

Worked example 1: Compute the support and confidence of every rule

Code:

from collections import defaultdict

# Valid-rule counts, invalid-rule counts, and premise occurrence counts;
# defaultdict(int) avoids a KeyError for keys that have not been seen yet
valid_rules = defaultdict(int)
invalid_rules = defaultdict(int)
num_occurences = defaultdict(int)

for sample in X:
    # Premise: the item the customer bought
    for premise in range(n_features):
        # Skip if this customer did not buy the premise item
        if sample[premise] == 0:
            continue
        # Record one more occurrence of this premise
        num_occurences[premise] += 1
        # Conclusion: the item we predict they also buy
        for conclusion in range(n_features):
            # "Buys apples -> buys apples" is meaningless (X -> X), so skip it
            if premise == conclusion:
                continue
            if sample[conclusion] == 1:
                # Bought the premise and also the conclusion: the rule holds once
                valid_rules[(premise, conclusion)] += 1
            else:
                # Bought the premise but not the conclusion: the rule fails once
                invalid_rules[(premise, conclusion)] += 1

# Support and confidence
support = valid_rules
confidence = defaultdict(float)
# Look up each rule by its (premise, conclusion) key
for premise, conclusion in valid_rules.keys():
    confidence[(premise, conclusion)] = valid_rules[(premise, conclusion)] / num_occurences[premise]

# Print the results
for premise, conclusion in confidence:
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print("")

Output (excerpt):

Rule: If a person buys bread they will also buy milk
 - Confidence: 0.519
 - Support: 14

Rule: If a person buys milk they will also buy cheese
 - Confidence: 0.152
 - Support: 7

Rule: If a person buys apples they will also buy cheese
 - Confidence: 0.694
 - Support: 25

Rule: If a person buys milk they will also buy apples
 - Confidence: 0.196
 - Support: 9

Rule: If a person buys bread they will also buy apples
 - Confidence: 0.185
 - Support: 5

Summary:

Everything is computed straight from the definitions. The main thing to watch is the Python syntax.
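The nested loops above can also be expressed as a matrix product, a common NumPy idiom for co-occurrence counts. This is my own alternative, not the book's code. For a 0/1 matrix `X`, entry (i, j) of `X.T @ X` counts the transactions that contain both item i and item j, and the diagonal counts how often each item occurs. The sketch uses the five rows printed in Example 1 as toy data. Like the loop version, it keeps only rules that held at least once.

```python
import numpy as np

features = ["bread", "milk", "cheese", "apples", "bananas"]
# Toy data: the five rows printed in Example 1
X = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
])

# co[i, j] = number of transactions containing both item i and item j (X is 0/1)
co = X.T @ X
# The diagonal co[i, i] counts transactions containing item i (num_occurences above)
occurrences = np.diag(co)

support = {}
confidence = {}
for premise in range(len(features)):
    for conclusion in range(len(features)):
        # Skip X -> X, and keep only rules that held at least once
        if premise == conclusion or co[premise, conclusion] == 0:
            continue
        support[(premise, conclusion)] = int(co[premise, conclusion])
        confidence[(premise, conclusion)] = co[premise, conclusion] / occurrences[premise]

print(support[(3, 2)], "{0:.3f}".format(confidence[(3, 2)]))  # apples -> cheese; prints: 3 0.750
```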

Worked example 2: Define a function for printing rules

Code:

def print_rule(premise, conclusion, support, confidence, features):
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    # Print the rule, its confidence, and its support, then a blank line
    print("Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print("")
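Splitting formatting from printing makes the output easy to check or reuse. In the sketch below, `format_rule` is a hypothetical helper that builds the exact text `print_rule` prints, and `print_rule` just prints it. The sample numbers are the apples -> cheese rule reported above (support 25 out of 36 apple buyers).

```python
# Hypothetical helper (not in the book): returns the text that print_rule prints
def format_rule(premise, conclusion, support, confidence, features):
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    return ("Rule: If a person buys {0} they will also buy {1}\n"
            " - Confidence: {2:.3f}\n"
            " - Support: {3}\n").format(premise_name, conclusion_name,
                                        confidence[(premise, conclusion)],
                                        support[(premise, conclusion)])

def print_rule(premise, conclusion, support, confidence, features):
    # The trailing "\n" plus print's own newline leaves one blank line, as before
    print(format_rule(premise, conclusion, support, confidence, features))

features = ["bread", "milk", "cheese", "apples", "bananas"]
# The apples -> cheese rule from the output above: 25 of the 36 apple buyers
support = {(3, 2): 25}
confidence = {(3, 2): 25 / 36}
print_rule(3, 2, support, confidence, features)
```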

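Worked example 3: Sort to find the best rules, by support and by confidence

Notes:

We sort the support dictionary and the confidence dictionary separately.

items() returns all of a dictionary's (key, value) pairs (in Python 3 this is a view, not a list).

itemgetter(1) is used as the sort key. It picks element 1 of each pair, i.e., the support (or the confidence).

reverse=True sorts in descending order.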
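The following small sketch shows the same `sorted(..., key=itemgetter(1), reverse=True)` pattern on a toy dictionary. The keys are (premise, conclusion) pairs, and the values are supports reported in the excerpt above. Python's `sorted` is stable and dictionaries keep insertion order, so rules with equal support stay in the order they were inserted.

```python
from operator import itemgetter

# Toy dictionary: (premise, conclusion) -> support, values from the excerpt above
support = {(0, 1): 14, (1, 2): 7, (3, 2): 25, (1, 3): 9}

# items() yields (key, value) pairs; itemgetter(1) picks the value to sort on
sorted_support = sorted(support.items(), key=itemgetter(1), reverse=True)
print(sorted_support)
```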

Code:

from operator import itemgetter
# Sort the rules by support, highest first
sorted_support = sorted(support.items(), key=itemgetter(1), reverse=True)
# Print the top 5 rules by support
for index in range(5):
    print("Rule #{0}".format(index + 1))
    (premise, conclusion) = sorted_support[index][0]
    print_rule(premise, conclusion, support, confidence, features)

# Sort the rules by confidence, highest first
sorted_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)
# Print the top 5 rules by confidence
for index in range(5):
    print("Rule #{0}".format(index + 1))
    (premise, conclusion) = sorted_confidence[index][0]
    print_rule(premise, conclusion, support, confidence, features)

Output:

By support:

Rule #1
Rule: If a person buys cheese they will also buy bananas
 - Confidence: 0.659
 - Support: 27

Rule #2
Rule: If a person buys bananas they will also buy cheese
 - Confidence: 0.458
 - Support: 27

Rule #3
Rule: If a person buys apples they will also buy cheese
 - Confidence: 0.694
 - Support: 25

Rule #4
Rule: If a person buys cheese they will also buy apples
 - Confidence: 0.610
 - Support: 25

Rule #5
Rule: If a person buys bananas they will also buy apples
 - Confidence: 0.356
 - Support: 21

By confidence:

Rule #1
Rule: If a person buys apples they will also buy cheese
 - Confidence: 0.694
 - Support: 25

Rule #2
Rule: If a person buys cheese they will also buy bananas
 - Confidence: 0.659
 - Support: 27

Rule #3
Rule: If a person buys bread they will also buy bananas
 - Confidence: 0.630
 - Support: 17

Rule #4
Rule: If a person buys cheese they will also buy apples
 - Confidence: 0.610
 - Support: 25

Rule #5
Rule: If a person buys apples they will also buy bananas
 - Confidence: 0.583
 - Support: 21

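1.3.4 Takeaways

Confidence helps us spot products that are often bought together, such as apples and cheese. We could place cheese next to the apples on promotion so that apple buyers also pick up cheese (69.4% of apple buyers in this dataset did). Promoting apples would then lift cheese sales as well.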
