A simple recommender system example in Python
Author: 阿俊 Posted: 2011-11-26 16:03, Saturday
Category: Recommender systems
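Python programs take very little code: a few short lines are enough to express a program clearly, and the strict indentation rules make the result easy to read.
Dictionaries and list comprehensions are especially handy and make writing code very fast.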
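As a small illustration of that point, here is a minimal sketch of a list comprehension and a dictionary built in one expression. The `ratings` data and the names `b_scores` and `averages` are made up for this example only.

```python
# Hypothetical ratings, used only for this illustration.
ratings = {'tom': {'a': 3, 'b': 2}, 'bob': {'a': 3, 'b': 4}}

# List comprehension: every user's rating for item 'b'.
b_scores = [prefs['b'] for prefs in ratings.values()]

# dict() over a generator expression: average rating per user.
# float() keeps the division from truncating on Python 2.
averages = dict((user, sum(prefs.values()) / float(len(prefs)))
                for user, prefs in ratings.items())

print(sorted(b_scores))
print(averages['bob'])
```

Each of these would otherwise need an explicit loop and a temporary container.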
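Python executes slowly, though, so the performance-critical functions are better written in C, or in a mix of C and Python: Python for its rich data structures, C for its execution speed.
The code below is a simple user-based kNN recommender system.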
# Python version 2.6
# Dataset: user -> {item: rating}
critics = {'tom': {'a': 3, 'b': 2, 'c': 3, 'd': 1, 'e': 5},
           'bob': {'a': 3, 'b': 4, 'c': 3, 'd': 1, 'e': 5},
           'jean': {'a': 3, 'b': 2, 'c': 3, 'd': 2, 'e': 3},
           'jobs': {'a': 1, 'b': 2, 'c': 4, 'd': 1, 'e': 5},
           'bill': {'a': 0, 'b': 2, 'c': 1, 'd': 0, 'e': 2}}

# Calculate similarity
from math import sqrt

# Euclidean distance
def sim_distance(prefs, person1, person2):
    # Collect the items both people have rated
    si = {}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item] = 1
    if len(si) == 0:
        return 0
    sum_of_squares = sum([pow(prefs[person1][item] - prefs[person2][item], 2)
                          for item in prefs[person1] if item in prefs[person2]])
    # Map the distance into (0, 1]; identical ratings give 1
    return 1 / (1 + sqrt(sum_of_squares))

# print(sim_distance(critics, 'bill', 'tom'))

# Pearson method
def sim_pearson(prefs, person1, person2):
    # Collect the items both people have rated
    si = {}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item] = 1
    # Use a float so the divisions below do not truncate on Python 2
    n = float(len(si))
    if n == 0:
        return 0  # nothing in common, so no evidence of similarity
    sum1 = sum([prefs[person1][it] for it in si])
    sum2 = sum([prefs[person2][it] for it in si])
    sum1Sq = sum([pow(prefs[person1][it], 2) for it in si])
    sum2Sq = sum([pow(prefs[person2][it], 2) for it in si])
    pSum = sum([prefs[person1][it] * prefs[person2][it] for it in si])
    # Calculate the Pearson value
    num = pSum - (sum1 * sum2 / n)
    den = sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))
    if den == 0:
        return 0
    r = num / den
    return r

# print(sim_pearson(critics, 'bill', 'tom'))

# Rank one person's neighborhood
def topmatches(prefs, person, n, similarity=sim_distance):
    scores = [(similarity(prefs, person, other), other)
              for other in prefs if other != person]
    scores.sort()  # sort by similarity
    scores.reverse()  # most similar first
    return scores[0:n]

# print(topmatches(critics, 'bill', 4))

# Recommendation process
def getRecommendations(prefs, person, n, similarity=sim_pearson):  # n: return the top n items
    totals = {}
    simSums = {}
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        # Skip neighbors that are not positively similar
        if sim <= 0:
            continue
        for item in prefs[other]:
            # A missing rating or a rating of 0 counts as "not rated yet"
            if item not in prefs[person] or prefs[person][item] == 0:
                totals.setdefault(item, 0)
                totals[item] += prefs[other][item] * sim
                # Sum of similarities
                simSums.setdefault(item, 0)
                simSums[item] += sim
    # Similarity-weighted average rating for each candidate item
    rankings = [(total / simSums[item], item) for item, total in totals.items()]
    rankings.sort()
    rankings.reverse()
    return rankings[0:n]

print(getRecommendations(critics, 'bill', 2))
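
As a cross-check on the recommendation step, here is a minimal, self-contained sketch that recomputes bill's predicted scores using the mean-centered form of the Pearson correlation. The helper names `pearson` and `predict` are illustrative and not part of the listing above, and the sketch assumes every user has rated every item, which holds for this dataset but not in general.

```python
from math import sqrt

# Ratings copied from the critics dict in the listing above.
ratings = {
    'tom':  {'a': 3, 'b': 2, 'c': 3, 'd': 1, 'e': 5},
    'bob':  {'a': 3, 'b': 4, 'c': 3, 'd': 1, 'e': 5},
    'jean': {'a': 3, 'b': 2, 'c': 3, 'd': 2, 'e': 3},
    'jobs': {'a': 1, 'b': 2, 'c': 4, 'd': 1, 'e': 5},
    'bill': {'a': 0, 'b': 2, 'c': 1, 'd': 0, 'e': 2},
}


def pearson(x, y):
    """Pearson correlation of two equal-length lists (mean-centered form)."""
    n = float(len(x))
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    den = sqrt(var_x * var_y)
    return cov / den if den else 0.0


def predict(ratings, person, item):
    """Similarity-weighted average of the neighbors' ratings for one item."""
    # Assumption: every user has rated every item, so no intersection is needed.
    items = sorted(ratings[person])
    mine = [ratings[person][i] for i in items]
    weighted_sum = 0.0
    weight_total = 0.0
    for other in ratings:
        if other == person:
            continue
        theirs = [ratings[other][i] for i in items]
        sim = pearson(mine, theirs)
        if sim <= 0:  # ignore neighbors without a positive correlation
            continue
        weighted_sum += sim * ratings[other][item]
        weight_total += sim
    return weighted_sum / weight_total if weight_total else None


score_a = predict(ratings, 'bill', 'a')
score_d = predict(ratings, 'bill', 'd')
print(score_a, score_d)
```

With this data, item 'a' comes out at about 2.3 and item 'd' at 1.0, because every positively correlated neighbor rated 'd' as 1. Note that bill's 0 ratings still enter the similarity computation even though they are treated as "not rated" when choosing which items to recommend; the sketch keeps that behavior rather than changing it.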
If you want an item-based recommender instead, you can use the function below.