Affinity analysis determines how closely related samples (objects) are, based on the similarity between them.
from collections import defaultdict

# Now compute for all possible rules
valid_rules = defaultdict(int)
invalid_rules = defaultdict(int)
num_occurences = defaultdict(int)

for sample in X:
    for premise in range(n_features):
        if sample[premise] == 0:
            continue
        # Record that the premise was bought in another transaction
        num_occurences[premise] += 1
        for conclusion in range(n_features):
            if premise == conclusion:  # It makes little sense to measure if X -> X.
                continue
            if sample[conclusion] == 1:
                # This person also bought the conclusion item
                valid_rules[(premise, conclusion)] += 1
            else:
                # This person bought the premise, but not the conclusion
                invalid_rules[(premise, conclusion)] += 1

support = valid_rules
confidence = defaultdict(float)
for premise, conclusion in valid_rules.keys():
    confidence[(premise, conclusion)] = valid_rules[(premise, conclusion)] / num_occurences[premise]
Note on defaultdict: looking up a key that does not exist returns a default value instead of raising a KeyError.
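As a small illustration of that behavior, here is a minimal sketch that counts items with `defaultdict(int)`. The `basket` list and the item names are made up for this example and are not part of the original data set.

```python
from collections import defaultdict

# Hypothetical shopping basket, used only for illustration
basket = ["bread", "milk", "bread", "apples", "bread"]

counts = defaultdict(int)  # int() returns 0, so missing keys start at 0
for item in basket:
    counts[item] += 1      # no KeyError on the first occurrence of an item

print(dict(counts))        # {'bread': 3, 'milk': 1, 'apples': 1}

# Reading a missing key returns the default AND stores the key in the dict
was_present = "cheese" in counts   # False
missing_value = counts["cheese"]   # 0
now_present = "cheese" in counts   # True
```

One thing to keep in mind: because a plain lookup inserts the missing key, using `in` or `.get()` is the safer choice when you only want to check for a key without changing the dictionary.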
for premise, conclusion in confidence:
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print("")
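To make the counting concrete, the following sketch runs the same logic on a tiny data set and checks one rule by hand. The `features` list and the matrix `X` here are hypothetical values chosen for illustration; the original data set is not shown in this section.

```python
from collections import defaultdict

# Hypothetical toy data: rows are transactions, columns are items (1 = bought)
features = ["bread", "milk", "cheese"]
X = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 1],
]
n_features = len(features)

valid_rules = defaultdict(int)
num_occurences = defaultdict(int)
for sample in X:
    for premise in range(n_features):
        if sample[premise] == 0:
            continue
        num_occurences[premise] += 1
        for conclusion in range(n_features):
            if premise != conclusion and sample[conclusion] == 1:
                valid_rules[(premise, conclusion)] += 1

# Rule "bread -> milk": bread appears in 3 transactions, 2 of them also contain milk
bread_milk_support = valid_rules[(0, 1)]
bread_milk_confidence = valid_rules[(0, 1)] / num_occurences[0]
print(bread_milk_support, round(bread_milk_confidence, 3))  # 2 0.667
```

Here support is a raw count of transactions in which the rule holds, and confidence is that count divided by the number of transactions containing the premise.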
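We now have a support dictionary and a confidence dictionary, holding the support and the confidence of each rule respectively. Next we declare a function that takes the feature indices of the premise and the conclusion, the support dictionary, the confidence dictionary, and the list of features. It prints each rule together with its support and confidence, formatted so the output is easy to read.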
def print_rule(premise, conclusion, support, confidence, features):
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))