Affinity Analysis: Product Recommendation (Python Implementation)

Affinity analysis:

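A common data mining application works like this. When a customer buys a product, the merchant can use the moment to learn what else they want to buy, so that items many customers buy together can be displayed and sold together to raise sales. Once a merchant has collected enough data, affinity analysis can determine which products are good candidates to sell together.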

In essence: based on the similarity between individuals (objects) in the samples, determine how closely they are related.

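The following example covers 5 products and estimates how likely a person who bought one of them is to also buy another.
To uncover these likelihoods, we need to create rules:
if a customer buys product X, then they are likely to also buy product Y (this is our rule).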

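Every rule has a support and a confidence.
Support: how often the rule holds in the dataset, either as a raw count or as a proportion of all transactions. For example, if 5 out of 10 customers bought apples, the support of "buys apples" is 5/10. The code below uses raw counts.
Confidence: how accurate the rule is, i.e. among the transactions in which the premise X was bought, the fraction in which Y was also bought: confidence(X -> Y) = count(X and Y) / count(X).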
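To make the two definitions concrete, here is a minimal sketch on ten hypothetical shopping baskets (made-up illustrative data, not the contents of affinity_dataset.txt). The helper names `itemset_support` and `rule_confidence` are illustrative, not from the original code; support is returned as a proportion here, while the main code below keeps raw counts.

```python
# Ten hypothetical shopping baskets (illustrative data only)
transactions = [
    {"apples", "cheese"},
    {"apples", "bananas"},
    {"apples", "cheese", "bananas"},
    {"bread", "milk"},
    {"apples"},
    {"cheese", "bananas"},
    {"bread", "apples", "cheese"},
    {"milk"},
    {"bananas"},
    {"bread"},
]

def itemset_support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def rule_confidence(premise, conclusion, transactions):
    """Among transactions containing `premise`, the fraction that also contain `conclusion`."""
    with_premise = [t for t in transactions if premise in t]
    if not with_premise:
        return 0.0
    both = sum(1 for t in with_premise if conclusion in t)
    return both / len(with_premise)

print(itemset_support({"apples"}, transactions))            # 0.5  (5 of 10 baskets)
print(itemset_support({"apples", "cheese"}, transactions))  # 0.3  (3 of 10 baskets)
print(rule_confidence("apples", "cheese", transactions))    # 0.6  (3 of the 5 apple baskets)
```

The rule "apples -> cheese" thus has support 0.3 (or 3 as a count) and confidence 0.6.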

The code is as follows:

#coding:utf-8
import numpy as np
dataset_filename = "affinity_dataset.txt"  # the sample dataset to load
X = np.loadtxt(dataset_filename)
n_samples, n_features = X.shape  # n_samples: number of samples; n_features: number of features
# print(n_samples, n_features)
# The names of the features, for your reference.
# bread, milk, cheese, apples, bananas
features = ["bread", "milk", "cheese", "apples", "bananas"]
print(X[:5])
[[0. 0. 1. 1. 1.]
 [1. 1. 0. 1. 0.]
 [1. 0. 1. 1. 0.]
 [0. 0. 1. 1. 1.]
 [0. 1. 0. 0. 1.]]
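The post does not show the layout of affinity_dataset.txt. Judging from the X[:5] output, a plausible assumption is one transaction per line with five space-separated 0/1 values in the order bread, milk, cheese, apples, bananas. The sketch below feeds that assumed layout to `np.loadtxt` through `io.StringIO` (a stand-in for the real file) to show what the loaded array looks like.

```python
import io
import numpy as np

# Assumed file layout: one transaction per line, five space-separated 0/1 values.
# These are the five rows printed above; io.StringIO stands in for the real file.
sample_file = io.StringIO(
    "0 0 1 1 1\n"
    "1 1 0 1 0\n"
    "1 0 1 1 0\n"
    "0 0 1 1 1\n"
    "0 1 0 0 1\n"
)

X = np.loadtxt(sample_file)      # whitespace-delimited by default
n_samples, n_features = X.shape
print(n_samples, n_features)     # 5 5
print(X.dtype)                   # float64, which is why the values print as "0." and "1."
```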
# First, how many rows contain our premise: that a person bought apples?
# Count the apple purchases (the support of "buys apples")
num_apple_purchases = 0
for sample in X:
    if sample[3] == 1:  # This person bought apples
        num_apple_purchases += 1
print("{0} people bought Apples".format(num_apple_purchases))  # support of apples = num_apple_purchases / n_samples
36 people bought Apples
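With NumPy the same count needs no explicit loop: compare the apples column with 1 and sum the resulting booleans; dividing by the number of rows turns the count into a proportion. A minimal sketch, run here on the five rows printed above; on the full dataset the same expression reproduces the count of 36.

```python
import numpy as np

# The first five rows printed above; columns: bread, milk, cheese, apples, bananas
X = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

APPLES = 3  # column index of "apples"

# Number of rows in which apples were bought (same result as the loop)
num_apple_purchases = int((X[:, APPLES] == 1).sum())
# Support expressed as a proportion of all transactions
apple_support = num_apple_purchases / X.shape[0]

print(num_apple_purchases)  # 4
print(apple_support)        # 0.8
```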
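# Store the rule counts and each feature's occurrence count in int-valued dictionaries
from collections import defaultdict
valid_rules = defaultdict(int)
invalid_rules = defaultdict(int)
num_occurances = defaultdict(int)
# Iterate over the samples, counting valid and invalid rules for each premise feature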
for sample in X:
    for premise in range(n_features):
        if sample[premise]==0:
            continue
        num_occurances[premise] += 1
        for conclusion in range(n_features):
            if premise == conclusion:  # skip rules whose premise and conclusion are the same, e.g. "buys apples -> buys apples"
                continue
            if sample[conclusion] == 1:
                valid_rules[(premise,conclusion)]+=1
            else:
                invalid_rules[(premise, conclusion)]+=1  
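Because every value is 0 or 1, these nested loops are counting co-occurrences, which a single matrix product also yields: `(X.T @ X)[i, j]` is the number of rows in which features i and j are both 1. Its diagonal corresponds to `num_occurances`, its off-diagonal entries to `valid_rules`, and "premise bought, conclusion not bought" is the premise count minus the co-occurrence. A sketch on the five rows printed above (the variable names mirror the loop, but the matrix layout is this sketch's own choice):

```python
import numpy as np

# The first five rows printed above; columns: bread, milk, cheese, apples, bananas
X = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

# cooc[i, j] = number of rows in which feature i and feature j are both 1
cooc = (X.T @ X).astype(int)

# Diagonal: how many rows contain each feature (num_occurances in the loop)
num_occurances = np.diag(cooc).copy()

# Off-diagonal: valid_rules[(i, j)]; the diagonal (X -> X) is zeroed out
valid = cooc.copy()
np.fill_diagonal(valid, 0)

# invalid[i, j]: rows with premise i bought but conclusion j not bought
invalid = num_occurances[:, None] - cooc
np.fill_diagonal(invalid, 0)

print(valid[3, 2], invalid[3, 2])  # 3 1  (apples -> cheese)
```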

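# The support of a rule is its number of valid occurrences
support = valid_rules
confidence = defaultdict(float)
# Compute the confidence of each rule
for premise, conclusion in valid_rules.keys():
    rule = (premise, conclusion)  # feature indices of the premise and the conclusion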
    confidence[rule] = float(valid_rules[rule]) / num_occurances[premise]
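The confidence formula above, count(premise and conclusion) / count(premise), can also be applied to a whole co-occurrence matrix at once by dividing each row by its diagonal entry. A minimal sketch on the five rows printed above; `np.divide` with `out`/`where` leaves 0 for any premise that never occurs instead of dividing by zero.

```python
import numpy as np

features = ["bread", "milk", "cheese", "apples", "bananas"]
# The first five rows printed above
X = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

cooc = X.T @ X              # cooc[i, j]: rows containing both i and j
occurrences = np.diag(cooc)  # rows containing i (the premise)

# conf[i, j] = cooc[i, j] / occurrences[i], or 0 where the premise never occurs
conf = np.divide(cooc, occurrences[:, None],
                 out=np.zeros_like(cooc), where=occurrences[:, None] > 0)
np.fill_diagonal(conf, 0.0)  # ignore "X -> X" rules

print("{0} -> {1}: {2:.3f}".format(features[3], features[2], conf[3, 2]))  # apples -> cheese: 0.750
```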

Print all the rules:


for premise, conclusion in confidence:
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))
Rule: If a person buys bread they will also buy milk
 - Support: 14
 - Confidence: 0.519
Rule: If a person buys milk they will also buy cheese
 - Support: 7
 - Confidence: 0.152
Rule: If a person buys apples they will also buy cheese
 - Support: 25
 - Confidence: 0.694
Rule: If a person buys milk they will also buy apples
 - Support: 9
 - Confidence: 0.196
Rule: If a person buys bread they will also buy apples
 - Support: 5
 - Confidence: 0.185
Rule: If a person buys apples they will also buy bread
 - Support: 5
 - Confidence: 0.139
Rule: If a person buys apples they will also buy bananas
 - Support: 21
 - Confidence: 0.583
Rule: If a person buys apples they will also buy milk
 - Support: 9
 - Confidence: 0.250
Rule: If a person buys milk they will also buy bananas
 - Support: 19
 - Confidence: 0.413
Rule: If a person buys cheese they will also buy bananas
 - Support: 27
 - Confidence: 0.659
Rule: If a person buys cheese they will also buy bread
 - Support: 4
 - Confidence: 0.098
Rule: If a person buys cheese they will also buy apples
 - Support: 25
 - Confidence: 0.610
Rule: If a person buys cheese they will also buy milk
 - Support: 7
 - Confidence: 0.171
Rule: If a person buys bananas they will also buy apples
 - Support: 21
 - Confidence: 0.356
Rule: If a person buys bread they will also buy bananas
 - Support: 17
 - Confidence: 0.630
Rule: If a person buys bananas they will also buy cheese
 - Support: 27
 - Confidence: 0.458
Rule: If a person buys milk they will also buy bread
 - Support: 14
 - Confidence: 0.304
Rule: If a person buys bananas they will also buy milk
 - Support: 19
 - Confidence: 0.322
Rule: If a person buys bread they will also buy cheese
 - Support: 4
 - Confidence: 0.148
Rule: If a person buys bananas they will also buy bread
 - Support: 17
 - Confidence: 0.288
# Print the support and confidence of one rule
def print_rule(premise, conclusion, support, confidence, features):
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))

Print a specific rule:

premise = 1     # milk
conclusion = 3  # apples
print_rule(premise, conclusion, support, confidence, features)

Rule: If a person buys milk they will also buy apples
 - Support: 9
 - Confidence: 0.196
# Sort the rules by support, highest first
from operator import itemgetter
sorted_support = sorted(support.items(), key=itemgetter(1), reverse=True)
print(support)
# support.items() turns the dict into a list of (key, value) pairs: [(rule, count), ...]
# itemgetter(1) sorts on the value (not the key), i.e. the support
# reverse=True sorts in descending order (the default is ascending)
defaultdict(<type 'int'>, {(0, 1): 14, (1, 2): 7, (3, 2): 25, (1, 3): 9, (3, 0): 5, (4, 1): 19, (3, 1): 9, (1, 4): 19, (0, 2): 4, (2, 0): 4, (2, 3): 25, (2, 1): 7, (4, 3): 21, (0, 4): 17, (1, 0): 14, (4, 2): 27, (0, 3): 5, (3, 4): 21, (2, 4): 27, (4, 0): 17})

Print the 5 rules with the highest support:

for index in range(5):
    print("Rule #{0}".format(index + 1))
    premise, conclusion = sorted_support[index][0]
    print_rule(premise, conclusion, support, confidence, features)
Rule #1
Rule: If a person buys bananas they will also buy cheese
 - Support: 27
 - Confidence: 0.458
Rule #2
Rule: If a person buys cheese they will also buy bananas
 - Support: 27
 - Confidence: 0.659
Rule #3
Rule: If a person buys apples they will also buy cheese
 - Support: 25
 - Confidence: 0.694
Rule #4
Rule: If a person buys cheese they will also buy apples
 - Support: 25
 - Confidence: 0.610
Rule #5
Rule: If a person buys bananas they will also buy apples
 - Support: 21
 - Confidence: 0.356
# Sort the rules by confidence, highest first
sorted_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)

Print the 5 rules with the highest confidence:

for index in range(5):
    print("Rule #{0}".format(index + 1))
    premise, conclusion = sorted_confidence[index][0]
    print_rule(premise, conclusion, support, confidence, features)
Rule #1
Rule: If a person buys apples they will also buy cheese
 - Support: 25
 - Confidence: 0.694
Rule #2
Rule: If a person buys cheese they will also buy bananas
 - Support: 27
 - Confidence: 0.659
Rule #3
Rule: If a person buys bread they will also buy bananas
 - Support: 17
 - Confidence: 0.630
Rule #4
Rule: If a person buys cheese they will also buy apples
 - Support: 25
 - Confidence: 0.610
Rule #5
Rule: If a person buys apples they will also buy bananas
 - Support: 21
 - Confidence: 0.583
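In the support ranking several rules tie (two at 27, two at 25, two at 21). `sorted()` is stable, so tied rules keep whatever order the dictionary yields them in, which is arbitrary in Python 2 and insertion order in Python 3.7+. One option, sketched below, is to break ties explicitly by confidence and take the top k with `heapq.nlargest`. The support and confidence values are copied from the output above (only a subset of rules, keyed by feature index as in the main code).

```python
import heapq

features = ["bread", "milk", "cheese", "apples", "bananas"]

# (premise, conclusion) -> value, copied from the output above (subset of rules)
support = {(4, 2): 27, (2, 4): 27, (3, 2): 25, (2, 3): 25,
           (4, 3): 21, (3, 4): 21, (0, 4): 17}
confidence = {(4, 2): 0.458, (2, 4): 0.659, (3, 2): 0.694, (2, 3): 0.610,
              (4, 3): 0.356, (3, 4): 0.583, (0, 4): 0.630}

# Rank by support; among equal supports, the higher confidence wins
top_by_support = heapq.nlargest(
    5, support, key=lambda rule: (support[rule], confidence[rule]))

for premise, conclusion in top_by_support:
    print(features[premise], "->", features[conclusion])
```

With this tie-break, the fifth slot goes to "apples -> bananas" (support 21, confidence 0.583) rather than "bananas -> apples" (support 21, confidence 0.356).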

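The sorted results show that the rules "a customer who buys apples also buys cheese" and "a customer who buys cheese also buys bananas" both have high support and high confidence. A supermarket manager could use these rules to adjust product placement: for example, if apples are on promotion this week, display cheese next to them. Promoting bananas and cheese at the same time, however, makes little sense, because nearly 66% of the customers who buy cheese buy bananas anyway without a promotion, so a promotion would not add much to sales.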
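Picking out rules that are strong on both measures, as in the paragraph above, is commonly done with a minimum support and a minimum confidence threshold. The sketch below applies that filter to values copied from the output above; the thresholds (support of at least 20, confidence of at least 0.6) and the helper name `strong_rules` are illustrative choices, not from the original.

```python
# (premise, conclusion) -> (support, confidence), copied from the output above
rules = {
    ("apples", "cheese"): (25, 0.694),
    ("cheese", "bananas"): (27, 0.659),
    ("bananas", "cheese"): (27, 0.458),
    ("cheese", "apples"): (25, 0.610),
    ("bread", "bananas"): (17, 0.630),
    ("apples", "bananas"): (21, 0.583),
    ("cheese", "bread"): (4, 0.098),
}

def strong_rules(rules, min_support, min_confidence):
    """Keep rules that clear both thresholds, highest confidence first."""
    kept = [(rule, s, c) for rule, (s, c) in rules.items()
            if s >= min_support and c >= min_confidence]
    return sorted(kept, key=lambda item: item[2], reverse=True)

for (premise, conclusion), s, c in strong_rules(rules, min_support=20, min_confidence=0.6):
    print("{0} -> {1}: support={2}, confidence={3:.3f}".format(premise, conclusion, s, c))
# apples -> cheese: support=25, confidence=0.694
# cheese -> bananas: support=27, confidence=0.659
# cheese -> apples: support=25, confidence=0.610
```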

Reference book: Learning Data Mining with Python (Chinese edition: Python数据挖掘入门与实践)
