Affinity Analysis: Product Recommendation (Python Implementation)

Latest recommended article published on 2022-12-11 16:53:01

迷小超

Tags: big data, data analysis

Article link: https://blog.csdn.net/weixin_44153007/article/details/106265564

Affinity analysis:

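A common data-mining scenario: when a customer buys an item, the merchant can use the opportunity to learn what else they want to buy, so that items most customers are willing to buy together can be placed together to boost sales. Once a merchant has collected enough data, they can run an affinity analysis on it to determine which items are suitable to sell together.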

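In essence, affinity analysis determines how closely individual samples (objects) are related, based on the similarity between them.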

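The example below uses 5 products and asks: if a person buys one of them, how likely are they to also buy another?
To uncover these likelihoods, we need to create rules of the form:
If a customer buys product X, they are likely to also buy product Y (this is our rule).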
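A rule like this can be checked against a single transaction. As a minimal sketch, a transaction is a list of 0/1 flags in the column order bread, milk, cheese, apples, bananas, and a rule is a (premise, conclusion) pair of column indices, mirroring the keys used in the code below; the function name evaluate_rule is hypothetical:

```python
features = ["bread", "milk", "cheese", "apples", "bananas"]


def evaluate_rule(transaction, rule):
    """Classify one transaction (a list of 0/1 flags) against a rule.

    Returns "valid" if both premise and conclusion were bought,
    "invalid" if the premise was bought but the conclusion was not,
    and None if the premise was not bought (the rule does not apply).
    """
    premise, conclusion = rule
    if transaction[premise] != 1:
        return None
    return "valid" if transaction[conclusion] == 1 else "invalid"


# Rule: "if a customer buys apples, they will also buy bananas"
apples_to_bananas = (features.index("apples"), features.index("bananas"))
print(evaluate_rule([0, 0, 1, 1, 1], apples_to_bananas))
print(evaluate_rule([1, 0, 1, 1, 0], apples_to_bananas))
print(evaluate_rule([0, 1, 0, 0, 1], apples_to_bananas))
```

Counting the "valid" and "invalid" outcomes over all transactions is exactly what the main loop later does for every possible rule at once.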

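Every rule has a support and a confidence.
Support: how often the rule holds in the dataset. The code below keeps it as a raw count of transactions; it can also be expressed as a fraction of all transactions. For example, if 5 out of 10 people bought apples, the support for apples is 5/10.
Confidence: how accurate the rule is, i.e. among the transactions in which the premise holds, the fraction in which the conclusion also holds.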
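To make the two definitions concrete, here is a minimal sketch on a made-up set of 10 transactions. The data and the helper name count_containing are hypothetical, chosen so that 5 of the 10 customers bought apples as in the example above:

```python
# Hypothetical toy data: each set lists what one customer bought.
transactions = [
    {"apples", "bananas"},
    {"apples", "bananas"},
    {"apples", "cheese"},
    {"apples", "bananas", "milk"},
    {"apples"},
    {"bananas"},
    {"milk"},
    {"bread", "milk"},
    {"cheese"},
    {"bread"},
]


def count_containing(items, transactions):
    """Number of transactions that contain every product in `items`."""
    return sum(1 for t in transactions if items <= t)


n = len(transactions)

# Support of {apples}, as a count and as a fraction of all transactions.
apple_count = count_containing({"apples"}, transactions)
apple_support = apple_count / n

# Rule "apples -> bananas": support counts the transactions where the rule
# holds; confidence divides that by the number of transactions with the premise.
rule_support = count_containing({"apples", "bananas"}, transactions)
rule_confidence = rule_support / apple_count

print("support(apples) = {0}/{1} = {2}".format(apple_count, n, apple_support))
print("support(apples -> bananas) = {0}".format(rule_support))
print("confidence(apples -> bananas) = {0}/{1} = {2}".format(
    rule_support, apple_count, rule_confidence))
```

Here 3 of the 5 apple buyers also bought bananas, so the rule's support is 3 and its confidence is 3/5 = 0.6.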

The code is as follows:

# coding: utf-8
import numpy as np

dataset_filename = "affinity_dataset.txt"  # the sample dataset to load
X = np.loadtxt(dataset_filename)
n_samples, n_features = X.shape  # n_samples: number of transactions; n_features: number of products
# print(n_samples, n_features)

# The names of the features (one column per product), for reference
features = ["bread", "milk", "cheese", "apples", "bananas"]

print(X[:5])

[[0. 0. 1. 1. 1.]
 [1. 1. 0. 1. 0.]
 [1. 0. 1. 1. 0.]
 [0. 0. 1. 1. 1.]
 [0. 1. 0. 0. 1.]]
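The post does not include affinity_dataset.txt. If you want to run the code without it, a minimal sketch like the one below writes a stand-in file. The format (one transaction per line, five space-separated 0/1 values in the column order of `features`) is inferred from the rows printed above; the function name make_fake_dataset, the sample count, the purchase probability and the seed are all arbitrary assumptions, so the rule statistics you get will not match the outputs in this post:

```python
import numpy as np


def make_fake_dataset(path, n_samples=100, n_features=5, p_buy=0.4, seed=14):
    """Write a random 0/1 purchase matrix to `path` (hypothetical helper).

    Each entry is 1 with probability p_buy. The column layout is assumed to
    follow features = ["bread", "milk", "cheese", "apples", "bananas"].
    """
    rng = np.random.default_rng(seed)
    X = (rng.random((n_samples, n_features)) < p_buy).astype(float)
    # Give every empty transaction one randomly chosen product.
    empty = X.sum(axis=1) == 0
    X[empty, rng.integers(0, n_features, size=int(empty.sum()))] = 1.0
    np.savetxt(path, X, fmt="%d")  # one row per line, e.g. "0 0 1 1 1"
    return X


# Use a different file name so a real affinity_dataset.txt is not overwritten.
X = make_fake_dataset("fake_affinity_dataset.txt")
print(np.loadtxt("fake_affinity_dataset.txt")[:5])
```

Pointing dataset_filename at the generated file lets the rest of the code run unchanged.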

# First, count how many rows contain our premise: that a person bought apples
num_apple_purchases = 0
for sample in X:
    if sample[3] == 1:  # this person bought apples
        num_apple_purchases += 1
# support of "apples" as a fraction = num_apple_purchases / n_samples
print("{0} people bought Apples".format(num_apple_purchases))

36 people bought Apples
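The same count can be computed without a Python loop. A minimal NumPy sketch, using only the five rows printed earlier as a stand-in for the full dataset (so it counts 4 apple buyers rather than 36):

```python
import numpy as np

# The five rows printed by X[:5] above, standing in for the full dataset.
X = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)
features = ["bread", "milk", "cheese", "apples", "bananas"]

n_samples = X.shape[0]
apples = features.index("apples")                     # column 3
num_apple_purchases = int((X[:, apples] == 1).sum())  # sum of a boolean mask
apple_support = num_apple_purchases / n_samples       # support as a fraction
print("{0} people bought Apples (support {1:.2f})".format(
    num_apple_purchases, apple_support))
```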

# Store the rule counts and the per-product occurrence counts in int-valued dicts
from collections import defaultdict
valid_rules = defaultdict(int)
invalid_rules = defaultdict(int)
num_occurances = defaultdict(int)

# Loop over the samples, counting valid and invalid rules for each premise
for sample in X:
    for premise in range(n_features):
        if sample[premise] == 0:
            continue
        num_occurances[premise] += 1
        for conclusion in range(n_features):
            if premise == conclusion:  # skip rules like "buys apples -> buys apples"
                continue
            if sample[conclusion] == 1:
                valid_rules[(premise, conclusion)] += 1
            else:
                invalid_rules[(premise, conclusion)] += 1

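# The support of a rule is the number of times it holds (its valid count)
support = valid_rules
confidence = defaultdict(float)

# Confidence = valid count / number of transactions containing the premise
for premise, conclusion in valid_rules.keys():
    rule = (premise, conclusion)  # (premise, conclusion) feature indices
    confidence[rule] = float(valid_rules[rule]) / num_occurances[premise]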
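For 0/1 data, all of these counts can also be read off a single matrix product. The sketch below is a vectorized alternative, not the post's original method: entry [p, c] of X.T @ X counts the transactions containing both p and c, which is the support of rule p -> c, and the diagonal holds the per-product counts that num_occurances tracks. One difference from the loop: rules that never hold get confidence 0 here, whereas the loop never creates an entry for them. The function name rule_tables is hypothetical, and the five printed rows again stand in for the full dataset:

```python
import numpy as np


def rule_tables(X):
    """Return (support, occurrences, confidence) arrays for all rules p -> c.

    support[p, c]     number of transactions containing both p and c
    occurrences[p]    number of transactions containing p
    confidence[p, c]  support[p, c] / occurrences[p]; NaN on the diagonal
    """
    X = np.asarray(X, dtype=float)
    support = X.T @ X                         # co-occurrence counts
    occurrences = np.diag(support).copy()     # each product paired with itself
    with np.errstate(divide="ignore", invalid="ignore"):
        confidence = support / occurrences[:, None]  # divide row p by count of p
    np.fill_diagonal(confidence, np.nan)      # premise == conclusion is not a rule
    return support, occurrences, confidence


X = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
])
support, occurrences, confidence = rule_tables(X)
# cheese (column 2) -> bananas (column 4)
print(support[2, 4], occurrences[2], round(confidence[2, 4], 3))
```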

Print all the rules:


for premise, conclusion in confidence:
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: If a person buys {0}, they will also buy {1}".format(premise_name, conclusion_name))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))

Rule: If a person buys bread, they will also buy milk
 - Support: 14
 - Confidence: 0.519
Rule: If a person buys milk, they will also buy cheese
 - Support: 7
 - Confidence: 0.152
Rule: If a person buys apples, they will also buy cheese
 - Support: 25
 - Confidence: 0.694
Rule: If a person buys milk, they will also buy apples
 - Support: 9
 - Confidence: 0.196
Rule: If a person buys bread, they will also buy apples
 - Support: 5
 - Confidence: 0.185
Rule: If a person buys apples, they will also buy bread
 - Support: 5
 - Confidence: 0.139
Rule: If a person buys apples, they will also buy bananas
 - Support: 21
 - Confidence: 0.583
Rule: If a person buys apples, they will also buy milk
 - Support: 9
 - Confidence: 0.250
Rule: If a person buys milk, they will also buy bananas
 - Support: 19
 - Confidence: 0.413
Rule: If a person buys cheese, they will also buy bananas
 - Support: 27
 - Confidence: 0.659
Rule: If a person buys cheese, they will also buy bread
 - Support: 4
 - Confidence: 0.098
Rule: If a person buys cheese, they will also buy apples
 - Support: 25
 - Confidence: 0.610
Rule: If a person buys cheese, they will also buy milk
 - Support: 7
 - Confidence: 0.171
Rule: If a person buys bananas, they will also buy apples
 - Support: 21
 - Confidence: 0.356
Rule: If a person buys bread, they will also buy bananas
 - Support: 17
 - Confidence: 0.630
Rule: If a person buys bananas, they will also buy cheese
 - Support: 27
 - Confidence: 0.458
Rule: If a person buys milk, they will also buy bread
 - Support: 14
 - Confidence: 0.304
Rule: If a person buys bananas, they will also buy milk
 - Support: 19
 - Confidence: 0.322
Rule: If a person buys bread, they will also buy cheese
 - Support: 4
 - Confidence: 0.148
Rule: If a person buys bananas, they will also buy bread
 - Support: 17
 - Confidence: 0.288

# Print the support and confidence of one specific rule
def print_rule(premise, conclusion, support, confidence, features):
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: If a person buys {0}, they will also buy {1}".format(premise_name, conclusion_name))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))

Print a specific rule (feature 1 is milk, feature 3 is apples):

premise = 1     # milk
conclusion = 3  # apples
print_rule(premise, conclusion, support, confidence, features)

Rule: If a person buys milk, they will also buy apples
 - Support: 9
 - Confidence: 0.196
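Looking rules up by index (1 for milk, 3 for apples) is easy to get wrong. A small hypothetical helper can accept product names instead and return the formatted text. It uses dict.get rather than square-bracket indexing, because indexing a defaultdict with a missing key silently inserts that key. The support and confidence values below are copied from the output above:

```python
features = ["bread", "milk", "cheese", "apples", "bananas"]

# Values for the milk -> apples rule, copied from the output above.
support = {(1, 3): 9}
confidence = {(1, 3): 0.196}


def format_rule_by_name(premise_name, conclusion_name, support, confidence, features):
    """Hypothetical helper: describe a rule given product names, not indices."""
    premise = features.index(premise_name)        # ValueError for unknown names
    conclusion = features.index(conclusion_name)
    rule = (premise, conclusion)
    return ("Rule: If a person buys {0}, they will also buy {1}\n"
            " - Support: {2}\n"
            " - Confidence: {3:.3f}").format(
        premise_name, conclusion_name,
        support.get(rule, 0), confidence.get(rule, 0.0))


print(format_rule_by_name("milk", "apples", support, confidence, features))
```

A rule with no recorded count (for example bread -> milk in this tiny dict) reports support 0 and confidence 0.000 instead of raising KeyError.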

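# Sort the rules by support, from highest to lowest
from operator import itemgetter
sorted_support = sorted(support.items(), key=itemgetter(1), reverse=True)
print(support)
# support.items() yields (rule, support) pairs, e.g. ((0, 1), 14)
# itemgetter(1) sorts by each pair's value (the support), not by its key
# reverse=True sorts in descending order (the default is ascending)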

defaultdict(<type 'int'>, {(0, 1): 14, (1, 2): 7, (3, 2): 25, (1, 3): 9, (3, 0): 5, (4, 1): 19, (3, 1): 9, (1, 4): 19, (0, 2): 4, (2, 0): 4, (2, 3): 25, (2, 1): 7, (4, 3): 21, (0, 4): 17, (1, 0): 14, (4, 2): 27, (0, 3): 5, (3, 4): 21, (2, 4): 27, (4, 0): 17})
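Because sorted() is stable, rules with equal support (such as the two rules with support 27) keep whatever order the dictionary happens to iterate in, which depends on the Python version and on how the dict was built. If you want a reproducible ranking, a minimal sketch is to add the rule itself as a secondary sort key, so ties are broken by ascending (premise, conclusion). The counts here are a subset copied from the output above:

```python
# A subset of the support counts, copied from the output above.
support = {(4, 2): 27, (2, 4): 27, (3, 2): 25, (2, 3): 25, (0, 1): 14}

# Highest support first; ties broken by (premise, conclusion) ascending.
sorted_support = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
for rule, count in sorted_support:
    print(rule, count)
```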

Print the 5 rules with the highest support:

for index in range(5):
    print("Rule #{0}".format(index + 1))
    premise, conclusion = sorted_support[index][0]
    print_rule(premise, conclusion, support, confidence, features)

Rule #1
Rule: If a person buys bananas, they will also buy cheese
 - Support: 27
 - Confidence: 0.458
Rule #2
Rule: If a person buys cheese, they will also buy bananas
 - Support: 27
 - Confidence: 0.659
Rule #3
Rule: If a person buys apples, they will also buy cheese
 - Support: 25
 - Confidence: 0.694
Rule #4
Rule: If a person buys cheese, they will also buy apples
 - Support: 25
 - Confidence: 0.610
Rule #5
Rule: If a person buys bananas, they will also buy apples
 - Support: 21
 - Confidence: 0.356

# Sort the rules by confidence, from highest to lowest
sorted_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)

Print the 5 rules with the highest confidence:

for index in range(5):
    print("Rule #{0}".format(index+1))
    premise,conclusion = sorted_confidence[index][0]
    print_rule(premise,conclusion,support,confidence,features)

Rule #1
Rule: If a person buys apples, they will also buy cheese
 - Support: 25
 - Confidence: 0.694
Rule #2
Rule: If a person buys cheese, they will also buy bananas
 - Support: 27
 - Confidence: 0.659
Rule #3
Rule: If a person buys bread, they will also buy bananas
 - Support: 17
 - Confidence: 0.630
Rule #4
Rule: If a person buys cheese, they will also buy apples
 - Support: 25
 - Confidence: 0.610
Rule #5
Rule: If a person buys apples, they will also buy bananas
 - Support: 21
 - Confidence: 0.583
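Rules that rank near the top of both lists can also be picked out directly by filtering on a minimum support and a minimum confidence. In this minimal sketch the thresholds (25 and 0.65) are arbitrary values picked for illustration, and the (support, confidence) pairs are copied from the outputs above:

```python
# (support, confidence) for a few rules, copied from the outputs above;
# keys are (premise, conclusion) names for readability.
rules = {
    ("apples", "cheese"): (25, 0.694),
    ("cheese", "bananas"): (27, 0.659),
    ("bread", "bananas"): (17, 0.630),
    ("cheese", "apples"): (25, 0.610),
    ("apples", "bananas"): (21, 0.583),
    ("bananas", "cheese"): (27, 0.458),
}

# Hypothetical thresholds, chosen only for illustration.
MIN_SUPPORT = 25
MIN_CONFIDENCE = 0.65

strong = [rule for rule, (sup, conf) in rules.items()
          if sup >= MIN_SUPPORT and conf >= MIN_CONFIDENCE]
for premise, conclusion in strong:
    print("{0} -> {1}".format(premise, conclusion))
```

With these thresholds only apples -> cheese and cheese -> bananas survive; lowering MIN_CONFIDENCE to 0.6 would also admit cheese -> apples.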

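The sorted results show that two rules, "a customer who buys apples also buys cheese" and "a customer who buys cheese also buys bananas", have both high support and high confidence. A supermarket manager could use rules like these to rearrange product placement. For example, if apples are on promotion this week, place cheese next to them. However, promoting bananas and cheese at the same time makes little sense: nearly 66% of the customers who buy cheese also buy bananas even without a promotion, so a promotion would not add much to sales.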
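As a worked check of the 66% figure: in the code, confidence is a rule's support divided by the number of transactions containing its premise, written N(premise) here. Working backwards from the printed values also shows why the two cheese/bananas rules share a support of 27 but differ in confidence: their denominators differ.

```latex
\begin{aligned}
\mathrm{conf}(\text{cheese} \Rightarrow \text{bananas}) &= \frac{27}{N(\text{cheese})} \approx 0.659
  &&\Rightarrow\; N(\text{cheese}) \approx \frac{27}{0.659} \approx 41 \\
\mathrm{conf}(\text{bananas} \Rightarrow \text{cheese}) &= \frac{27}{N(\text{bananas})} \approx 0.458
  &&\Rightarrow\; N(\text{bananas}) \approx \frac{27}{0.458} \approx 59
\end{aligned}
```

So the same 27 transactions are about 66% of the roughly 41 cheese buyers but only about 46% of the roughly 59 banana buyers.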

Reference book: Python数据挖掘入门与实践 (Learning Data Mining with Python)