Chapter 1: Getting Started with Data Mining

1.3 Affinity Analysis

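1.3.1 What is affinity analysis?

In short, affinity analysis estimates how likely a customer who buys item A is to also buy item B, based on how the purchases of A and B are related in past records.

This section relies on two concepts, support and confidence, which are introduced through the examples.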

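1.3.2 Case study

The data is a list of purchases covering five products: bread, milk, cheese, apples, and bananas. It is stored as a two-dimensional array in which each row is one customer's purchase record and each column is one product. Each entry says whether that customer bought that product: 1 means bought, 0 means not bought.

The goal is to use this data to work out a sensible promotion plan, that is, to find which two products are most likely to sell together.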
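To make the row/column layout concrete, here is a minimal sketch. The three rows are copied from the sample output shown in Example 1; the names X_demo and baskets are my own and are not from the book.

```python
import numpy as np

# Column order follows the article: bread, milk, cheese, apples, bananas.
features = ["bread", "milk", "cheese", "apples", "bananas"]

# A tiny stand-in for the dataset (X_demo is a name made up for this sketch).
# Each row is one transaction; 1 = bought, 0 = not bought.
X_demo = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
])

# Turn each 0/1 row back into the list of product names that were bought.
baskets = [
    [features[j] for j in range(len(features)) if row[j] == 1]
    for row in X_demo
]

for i, basket in enumerate(baskets):
    print("Transaction {0}: {1}".format(i, ", ".join(basket)))
```

Reading a row this way is a quick sanity check that the column order matches the feature names.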

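PS: I have avoided much of the book's terminology. First, I am new to this myself and have not memorized the terms yet; second, there is no need, since they will stick by the end anyway and there is no rush. Experts, please go easy on me.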

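1.3.3 Code walkthrough and implementation

Example 1: Print the first five rows of the dataset (loading the data with NumPy)

Code:

import numpy as np

# Load the purchase records; each row is one transaction
dataset_filename = "affinity_dataset.txt"
X = np.loadtxt(dataset_filename)
n_samples, n_features = X.shape
print("This dataset has {0} samples and {1} features".format(n_samples, n_features))
print(X[:5])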


Output:

[[ 0.  0.  1.  1.  1.]
 [ 1.  1.  0.  1.  0.]
 [ 1.  0.  1.  1.  0.]
 [ 0.  0.  1.  1.  1.]
 [ 0.  1.  0.  0.  1.]]

Summary:

This simply loads the dataset with NumPy and prints the first five rows.

Example 2: Count the people who bought apples

Code:

# Names of the features (one per column)
features = ["bread", "milk", "cheese", "apples", "bananas"]

# Count the people who bought apples
num_apple_purchases = 0
for sample in X:
    if sample[3] == 1:  # this person bought apples
        num_apple_purchases += 1

# Print the result
print("{0} people bought Apples".format(num_apple_purchases))


Output:

36 people bought Apples
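The same count can also be written without an explicit loop. This is only a sketch of an alternative, not the book's code: X_demo holds the five rows printed in Example 1, so the count here is for those five rows and not for the full dataset.

```python
import numpy as np

# The five rows printed in Example 1 (X_demo is a name made up for this sketch).
X_demo = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

APPLES = 3  # column index of apples

# Comparing the column with 1 gives a boolean array;
# summing it counts the True entries.
num_apple_purchases = int((X_demo[:, APPLES] == 1).sum())

print("{0} people bought Apples".format(num_apple_purchases))
```

On a large array the vectorized form is usually faster than a Python loop, but the loop version is easier to follow when you are starting out.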

Example 3: Count the people who bought apples and also bought bananas

Code:

rule_valid = 0
rule_invalid = 0
for sample in X:
    if sample[3] == 1:  # This person bought Apples
        if sample[4] == 1:
            # This person bought both Apples and Bananas
            rule_valid += 1
        else:
            # This person bought Apples, but not Bananas
            rule_invalid += 1
print("{0} cases of the rule being valid were discovered".format(rule_valid))
print("{0} cases of the rule being invalid were discovered".format(rule_invalid))


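Output:

21 cases of the rule being valid were discovered
15 cases of the rule being invalid were discovered

Summary:

This checks one rule of the form premise -> conclusion: if a person buys apples, they also buy bananas. The nested if/else syntax is similar to Java.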
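The premise/conclusion check can also be expressed with boolean masks. A minimal sketch, assuming the same column order as the article; X_demo is again just the five rows printed in Example 1, so the counts differ from the 21 and 15 obtained on the full dataset.

```python
import numpy as np

# The five rows printed in Example 1 (X_demo is a name made up for this sketch).
X_demo = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

APPLES, BANANAS = 3, 4  # column indices

# One boolean per transaction: was the item bought?
bought_apples = X_demo[:, APPLES] == 1
bought_bananas = X_demo[:, BANANAS] == 1

# Rule "apples -> bananas" holds when both are True,
# and fails when apples is True but bananas is False.
rule_valid = int((bought_apples & bought_bananas).sum())
rule_invalid = int((bought_apples & ~bought_bananas).sum())

print("{0} valid, {1} invalid".format(rule_valid, rule_invalid))
```

Note that transactions without apples count toward neither number, exactly as in the loop version.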

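Example 4: Compute support and confidence from the example above

Support:

The number of times the rule holds, that is, the number of records in which both the premise and the conclusion are true.

In the example above, the rule holds 21 times (the premise is that the person bought apples).

Confidence:

The ratio of the number of times the rule holds to the number of times the premise occurs, that is, the rule's accuracy.

For example: 21/36

Code:

# Support and confidence
support = rule_valid
confidence = rule_valid / num_apple_purchases
print("The support is {0} and the confidence is {1:.3f}.".format(support, confidence))
# Show the confidence as a percentage
print("As a percentage, that is {0:.1f}%.".format(100 * confidence))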

Output:

The support is 21 and the confidence is 0.583.
As a percentage, that is 58.3%.
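The two definitions can be wrapped into one small function. This is a sketch of my own, not code from the book: support_and_confidence is a hypothetical helper, and returning 0.0 when the premise never occurs is an assumption I made to avoid dividing by zero.

```python
import numpy as np

def support_and_confidence(X, premise, conclusion):
    """Return (support, confidence) for the rule premise -> conclusion.

    support    = number of rows where both items were bought
    confidence = support / number of rows where the premise was bought
    """
    premise_mask = X[:, premise] == 1
    premise_count = int(premise_mask.sum())
    support = int((premise_mask & (X[:, conclusion] == 1)).sum())
    # Assumption: report 0.0 when the premise never occurs.
    confidence = support / premise_count if premise_count else 0.0
    return support, confidence

# The five rows printed in Example 1, used here as a small test input.
X_demo = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

# Rule: apples (column 3) -> bananas (column 4)
support_demo, confidence_demo = support_and_confidence(X_demo, 3, 4)
print("support={0}, confidence={1:.3f}".format(support_demo, confidence_demo))
```

On the full dataset the same call should reproduce the 21 and 21/36 computed above, since it follows the same definitions.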

 

Worked example 1: Compute support and confidence for every rule

Code:

from collections import defaultdict

# Counts of valid rules, invalid rules, and premise occurrences.
# defaultdict(int) avoids a KeyError for keys that are not present yet.
valid_rules = defaultdict(int)
invalid_rules = defaultdict(int)
num_occurences = defaultdict(int)

for sample in X:
    # Premise
    for premise in range(n_features):
        # Skip if this person did not buy the premise item
        if sample[premise] == 0:
            continue
        # Record that the premise occurred
        num_occurences[premise] += 1
        # Conclusion
        for conclusion in range(n_features):
            # A rule X -> X (buying apples implies buying apples) is meaningless
            if premise == conclusion:
                continue
            if sample[conclusion] == 1:
                # Bought the premise and the conclusion: the rule holds once
                valid_rules[(premise, conclusion)] += 1
            else:
                # Bought the premise but not the conclusion: the rule fails once
                invalid_rules[(premise, conclusion)] += 1

# Support and confidence
support = valid_rules
confidence = defaultdict(float)
# Look up each rule by its (premise, conclusion) key
for premise, conclusion in valid_rules.keys():
    confidence[(premise, conclusion)] = valid_rules[(premise, conclusion)] / num_occurences[premise]

# Print the results
for premise, conclusion in confidence:
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print("")


Output (excerpt):

Rule: If a person buys bread they will also buy milk
 - Confidence: 0.519
 - Support: 14
 
Rule: If a person buys milk they will also buy cheese
 - Confidence: 0.152
 - Support: 7
 
Rule: If a person buys apples they will also buy cheese
 - Confidence: 0.694
 - Support: 25
 
Rule: If a person buys milk they will also buy apples
 - Confidence: 0.196
 - Support: 9
 
Rule: If a person buys bread they will also buy apples
 - Confidence: 0.185
 - Support: 5

Summary:

Everything is computed directly from the definitions; the main thing to watch is the Python syntax.
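Because the data is a 0/1 matrix, all pairwise counts can also be obtained with one matrix product. This is an alternative sketch and not the book's approach; the names cooccurrence and confidence_matrix are my own, and X_demo is just the five rows printed in Example 1.

```python
import numpy as np

# The five rows printed in Example 1 (X_demo is a name made up for this sketch).
X_demo = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

# Entry (i, j) counts the rows in which items i and j were both bought,
# which is the support of the rule i -> j.
cooccurrence = X_demo.T @ X_demo

# The diagonal holds how many times each item was bought.
num_occurrences = np.diag(cooccurrence)

# Confidence of i -> j: divide row i by the count of item i.
# Rows for items that were never bought are left at 0.
confidence_matrix = np.divide(
    cooccurrence,
    num_occurrences[:, None],
    out=np.zeros_like(cooccurrence, dtype=float),
    where=num_occurrences[:, None] > 0,
)

# Rule apples (3) -> bananas (4)
print(cooccurrence[3, 4], confidence_matrix[3, 4])
```

The diagonal of confidence_matrix corresponds to the meaningless rules X -> X and should be ignored, just as the loop skips the case premise == conclusion.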

Worked example 2: Define a function to print a rule

Code:

def print_rule(premise, conclusion, support, confidence, features):
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    # Simple output code: the rule, then its confidence and support
    print("Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print("")
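To try the output format on its own, here is a small sketch. format_rule is a hypothetical variant I wrote for illustration (it returns the lines so they can be inspected); the dictionaries hold only the apples -> bananas rule, with the values 21 and 21/36 taken from Example 4.

```python
features = ["bread", "milk", "cheese", "apples", "bananas"]

# Only one rule, keyed by (premise index, conclusion index).
# Values come from Example 4: support 21, confidence 21/36.
support = {(3, 4): 21}
confidence = {(3, 4): 21 / 36}

def format_rule(premise, conclusion, support, confidence, features):
    """Build the same lines that print_rule prints, as a list of strings."""
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    return [
        "Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name),
        " - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]),
        " - Support: {0}".format(support[(premise, conclusion)]),
    ]

lines = format_rule(3, 4, support, confidence, features)
print("\n".join(lines))
```

Passing the dictionaries in as arguments, as the function above does, keeps it independent of global variables.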


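Worked example 3: Sort to find the best rules, by support and by confidence

Notes:

These apply to both the support dictionary and the confidence dictionary.

items() returns all of the dictionary's (key, value) pairs.

itemgetter(1) is used as the sort key; it selects the value in each pair, which is the support (or the confidence when sorting the confidence dictionary).

reverse=True sorts in descending order.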
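The sorting call described above can be tried on a small dictionary first. A minimal sketch: support_demo is a made-up name, its four values are taken from the outputs in this article, and I use product names as keys here (the article's code uses column indices).

```python
from operator import itemgetter

# A few (premise, conclusion) -> support entries, in no particular order.
support_demo = {
    ("bread", "milk"): 14,
    ("cheese", "bananas"): 27,
    ("milk", "apples"): 9,
    ("apples", "cheese"): 25,
}

# items() yields (key, value) pairs; itemgetter(1) picks the value,
# and reverse=True puts the largest support first.
sorted_demo = sorted(support_demo.items(), key=itemgetter(1), reverse=True)

for (premise, conclusion), value in sorted_demo:
    print("{0} -> {1}: {2}".format(premise, conclusion, value))
```

Each element of the sorted list is a ((premise, conclusion), value) tuple, which is why the article's code reads the rule with sorted_support[index][0].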

Code:

from operator import itemgetter

# Sort by support, highest first
sorted_support = sorted(support.items(), key=itemgetter(1), reverse=True)
# Print the top five rules by support
for index in range(5):
    print("Rule #{0}".format(index + 1))
    (premise, conclusion) = sorted_support[index][0]
    print_rule(premise, conclusion, support, confidence, features)

# Sort by confidence, highest first
sorted_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)
# Print the top five rules by confidence
for index in range(5):
    print("Rule #{0}".format(index + 1))
    (premise, conclusion) = sorted_confidence[index][0]
    print_rule(premise, conclusion, support, confidence, features)


Output:

By support:
Rule #1
Rule: If a person buys cheese they will also buy bananas
 - Confidence: 0.659
 - Support: 27
 
Rule #2
Rule: If a person buys bananas they will also buy cheese
 - Confidence: 0.458
 - Support: 27
 
Rule #3
Rule: If a person buys apples they will also buy cheese
 - Confidence: 0.694
 - Support: 25
 
Rule #4
Rule: If a person buys cheese they will also buy apples
 - Confidence: 0.610
 - Support: 25
 
Rule #5
Rule: If a person buys bananas they will also buy apples
 - Confidence: 0.356
 - Support: 21

 

By confidence:
Rule #1
Rule: If a person buys apples they will also buy cheese
 - Confidence: 0.694
 - Support: 25

Rule #2
Rule: If a person buys cheese they will also buy bananas
 - Confidence: 0.659
 - Support: 27

Rule #3
Rule: If a person buys bread they will also buy bananas
 - Confidence: 0.630
 - Support: 17

Rule #4
Rule: If a person buys cheese they will also buy apples
 - Confidence: 0.610
 - Support: 25

Rule #5
Rule: If a person buys apples they will also buy bananas
 - Confidence: 0.583
 - Support: 21

 

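1.3.4 Takeaways

Confidence shows that some products are very likely to be bought together, such as apples and cheese. We could therefore place cheese next to the apples that are on promotion, so that people buying apples go on to buy cheese as well (69.4% of them do). Promoting apples would then also raise cheese sales.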
