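Formula:
- Let A = [1,2,3,4], with length 4
- Let B = [1,2,5,6], with length 4
- The common part of A and B is then C = [1,2], with length 2
- The similarity of A and B is len(C) / (len(A) + len(B) - len(C)) = 2 / (4 + 4 - 2) ≈ 0.33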
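The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `tanimoto` is hypothetical, and it assumes neither list contains duplicates, so each can be treated as a set.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two collections, treated as sets (hypothetical helper)."""
    set_a, set_b = set(a), set(b)
    common = set_a & set_b
    # Size of the union: |A| + |B| - |A and B|
    union_size = len(set_a) + len(set_b) - len(common)
    if union_size == 0:
        # Both collections are empty; returning 0.0 here is an assumption
        return 0.0
    return float(len(common)) / union_size


A = [1, 2, 3, 4]
B = [1, 2, 5, 6]
print(round(tanimoto(A, B), 2))  # 2 / (4 + 4 - 2), prints 0.33
```

Identical collections give 1.0 and collections with nothing in common give 0.0, so the score always lies between 0 and 1.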
Algorithm:
Method 1:
# -*- coding: utf-8 -*-
# user_data is a nested dictionary of user information,
# e.g. {'fabrice': {'water': 3}}
def sim_tonimoto(user_data, user1, user2):
    common = {}
    # Collect the items that both users have
    for item in user_data[user1]:
        if item in user_data[user2]:
            common[item] = 1
    # If the users share no items, the similarity is 0
    if len(common) == 0:
        return 0
    common_num = len(common)
    user1_num = len(user_data[user1])
    user2_num = len(user_data[user2])
    # shared items / (items of user1 + items of user2 - shared items)
    res = float(common_num) / (user1_num + user2_num - common_num)
    return res
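Method 2:
def sim_tonimoto(user_data, user1, user2):
    common = [item for item in user_data[user1] if item in user_data[user2]]
    if not common:
        # No shared items; this also avoids dividing by zero when both users have no items
        return 0
    return float(len(common)) / (len(user_data[user1]) + len(user_data[user2]) - len(common))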
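As a usage illustration, the same computation can also be written with Python's built-in set operations. The sketch below is an assumption-laden variant, not the original method: the function name `sim_tanimoto_sets` and the sample `user_data` (including the users `alice` and `bob`) are made up for the example.

```python
def sim_tanimoto_sets(user_data, user1, user2):
    """Set-based variant of the similarity above (hypothetical name)."""
    items1 = set(user_data[user1])  # item names of user1 (the dict keys)
    items2 = set(user_data[user2])  # item names of user2
    union = items1 | items2
    if not union:
        # Neither user has any items; treating this as 0.0 is an assumption
        return 0.0
    return float(len(items1 & items2)) / len(union)


# Made-up sample data in the nested-dictionary shape described above
user_data = {
    'fabrice': {'water': 3, 'tea': 1, 'juice': 2},
    'alice': {'water': 5, 'tea': 2, 'coffee': 4},
    'bob': {},
}

print(sim_tanimoto_sets(user_data, 'fabrice', 'alice'))  # 2 shared / 4 distinct = 0.5
print(sim_tanimoto_sets(user_data, 'fabrice', 'bob'))    # nothing shared = 0.0
```

Only the item names are compared; the ratings stored as dictionary values (such as the 3 for 'water') play no part in this similarity.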