educoder Python Data Mining and Analysis: Product Recommendation, Book Data Analysis (a)

Article link: https://blog.csdn.net/weixin_56636204/article/details/121493300

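Task description

Task for this level: find the five rules with the highest support.

Background

To complete this task, you need to know how to sort the support dictionary.

How to sort the support dictionary

Once the support and confidence of every rule have been computed, the rules still have to be ranked by support and by confidence to find the best ones. We look at the two criteria in turn.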
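Before sorting, it helps to see where the two numbers come from. The sketch below follows the same convention as the solution code at the end of this article: support is the number of transactions in which the premise and the conclusion were bought together, and confidence is that count divided by the number of transactions containing the premise. The three-product data set and the name `transactions` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: one row per transaction, one column per product (1 = bought).
features = ["bread", "milk", "apple"]
transactions = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 1],
]

valid_rules = defaultdict(int)      # premise and conclusion bought together
num_occurrences = defaultdict(int)  # premise bought at all

for row in transactions:
    for premise, bought in enumerate(row):
        if not bought:
            continue
        num_occurrences[premise] += 1
        for conclusion, also_bought in enumerate(row):
            if conclusion != premise and also_bought:
                valid_rules[(premise, conclusion)] += 1

# Support is a raw count; confidence is the share of premise purchases
# in which the conclusion was bought as well.
support = dict(valid_rules)
confidence = {rule: count / num_occurrences[rule[0]]
              for rule, count in support.items()}

for (premise, conclusion), count in support.items():
    print("{0} -> {1}: support {2}, confidence {3:.3f}".format(
        features[premise], features[conclusion], count,
        confidence[(premise, conclusion)]))
```

Both dictionaries are keyed by `(premise, conclusion)` index pairs, which is the shape the sorting steps below expect.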

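To find the rules with the highest support, first sort the support dictionary. A dictionary is not ordered by its values, so we sort its items instead: the items() method returns a view of all the dictionary's key-value pairs as (key, value) tuples, and sorted() turns that view into a sorted list. We pass itemgetter() as the key so that these tuples can be sorted by one of their fields. itemgetter(1) uses each item's value (here, the support) as the sort key, and reverse=True sorts in descending order.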

from operator import itemgetter
sorted_support = sorted(support.items(), key=itemgetter(1), reverse=True)

Once the list is sorted, the five rules with the highest support can be printed.
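As a minimal sketch of that step, the example below sorts a small dictionary and prints the top entries. The support counts and the name `top_n` are illustrative assumptions, not values from the exercise data.

```python
from operator import itemgetter

# Hypothetical support counts keyed by (premise, conclusion) index pairs.
support = {(0, 1): 14, (2, 4): 27, (2, 3): 25, (3, 2): 25, (1, 4): 19}

# Each item is a ((premise, conclusion), count) tuple; itemgetter(1) picks the count.
sorted_support = sorted(support.items(), key=itemgetter(1), reverse=True)

top_n = 3
# Slicing is safe even when there are fewer than top_n rules.
for rank, (rule, count) in enumerate(sorted_support[:top_n], start=1):
    premise, conclusion = rule
    print("Rule #{0}: {1} -> {2}, support {3}".format(rank, premise, conclusion, count))
```

Because sorted() is stable, rules with equal support keep the order in which they were inserted into the dictionary, so `(2, 3)` stays ahead of `(3, 2)` here.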

How to sort the confidence dictionary

In the same way, we can print the rules with the highest confidence. First sort by confidence.

sorted_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)

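From the sorted results we can pick out the rules that score high on both support and confidence. A supermarket manager could use these rules to rearrange where products are placed on the shelves.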
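One possible way to pick out such rules is to filter on both measures at once. The sketch below uses the five rules listed in the expected output of this exercise; the thresholds `min_support` and `min_confidence` are arbitrary assumptions chosen for illustration, not part of the task.

```python
# Support and confidence of the five rules shown in the expected output.
support = {
    ("apple", "ham"): 27,
    ("apple", "banana"): 25,
    ("banana", "apple"): 25,
    ("banana", "ham"): 21,
    ("bread", "ham"): 19,
}
confidence = {
    ("apple", "ham"): 0.659,
    ("apple", "banana"): 0.610,
    ("banana", "apple"): 0.694,
    ("banana", "ham"): 0.583,
    ("bread", "ham"): 0.413,
}

# Hypothetical cut-offs; tune them to the data set at hand.
min_support = 20
min_confidence = 0.6

# Keep rules that pass both thresholds, most confident first.
strong = sorted(
    (rule for rule in support
     if support[rule] >= min_support and confidence[rule] >= min_confidence),
    key=lambda rule: confidence[rule],
    reverse=True,
)

for premise, conclusion in strong:
    print("{0} -> {1}".format(premise, conclusion))
```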

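This example shows how much insight data mining can provide. It lets people explore the relationships between the variables in a dataset and look for new findings.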

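Programming requirements

Write code that reads the data from a text file and computes the five rules with the highest support. Confidence is a float and is printed with 3 decimal places.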
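The 3-decimal requirement is easiest to meet with a format specification rather than with round(). The short sketch below shows the difference. The counts 27, 25 and 41 are assumptions picked so that the ratios round to the 0.659 and 0.610 seen in the expected output.

```python
# Assumed counts for illustration: rule occurrences divided by premise occurrences.
conf_a = 27 / 41
conf_b = 25 / 41

# ".3f" always prints exactly three digits after the decimal point.
formatted_a = "{0:.3f}".format(conf_a)
formatted_b = "{0:.3f}".format(conf_b)

# round() returns a float, so a trailing zero is lost when it is printed.
rounded_b = str(round(conf_b, 3))

print(formatted_a, formatted_b, rounded_b)
```

The grader compares text, so a value printed as 0.61 would presumably not match an expected 0.610.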

See the test description for the exact input and output.

Test description

Your code will be tested as follows:

Test input:

step4/input/goods.txt

This is the path of the data file to load.

Expected output:

Rule #1
Rule: If a person buys apple they will also buy ham
- Confidence: 0.659
- Support: 27
Rule #2
Rule: If a person buys apple they will also buy banana
- Confidence: 0.610
- Support: 25
Rule #3
Rule: If a person buys banana they will also buy apple
- Confidence: 0.694
- Support: 25
Rule #4
Rule: If a person buys banana they will also buy ham
- Confidence: 0.583
- Support: 21
Rule #5
Rule: If a person buys bread they will also buy ham
- Confidence: 0.413
- Support: 19

The output is the five rules with the highest support, each with its confidence and support.

Start your task. Good luck!

input_file = input()   # path of the data file to load
import numpy as np 
data_file = input_file
Data = np.loadtxt(data_file,delimiter=" ")
from collections import defaultdict
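features = [ "milk", "bread", "apple", "banana","ham"]  # product names
valid_rules = defaultdict(int)      # times each rule held (premise and conclusion both bought)
invaild_rules = defaultdict(int)    # times each rule failed (premise bought, conclusion not)
num_occurances = defaultdict(int)   # times each premise was bought
#********* Begin *********#
#-----Add the algorithm here: compute the confidence and support of every rule, then print the 5 rules with the highest support-----#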
for sample in Data:           
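    # Only features[0..3] are used as premises; "ham" (index 4) never is,
    # which is consistent with the expected output above.
    for premise in range(4):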
        if sample[premise] == 0:continue 
        num_occurances[premise] += 1      
        for conclusion in range(len(features)):
            if premise == conclusion:continue
            if sample[conclusion] == 1:
                valid_rules[(premise,conclusion)] += 1
            else:
                invaild_rules[(premise,conclusion)] += 1            
support = valid_rules
confidence = defaultdict(float)
for premise,conclusion in valid_rules.keys():
    rule = (premise,conclusion)
    confidence[rule] = valid_rules[rule] / num_occurances[premise]
def print_rule(premise,conclusion,support,confidence,features):
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: If a person buys {0} they will also buy {1}".format(premise_name,conclusion_name))
    print("- Confidence: {0:.3f}".format(confidence[(premise,conclusion)]))  
    print("- Support: {0}".format(support[(premise,conclusion)]))
from operator import itemgetter 
sorted_support = sorted(support.items(), key=itemgetter(1), reverse=True)
for index in range(5): 
    print("Rule #{0}".format(index + 1)) 
    premise, conclusion = sorted_support[index][0] 
    print_rule(premise, conclusion, support, confidence, features) 
#********* End *********#
#-----Do not delete the code framework outside Begin-End-----#