Recommendation Algorithms

 Collaborative filtering recommendation

Data: ratings that a handful of users gave to a small set of movies

# A dictionary of movie critics and their ratings of a small
# set of movies
critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
      'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
      'The Night Listener': 3.0},
     'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
      'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
      'You, Me and Dupree': 3.5},
     'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
      'Superman Returns': 3.5, 'The Night Listener': 4.0},
     'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
      'The Night Listener': 4.5, 'Superman Returns': 4.0,'You, Me and Dupree': 2.5},
     'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
      'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
      'You, Me and Dupree': 2.0},
     'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
      'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
     'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}
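The data is a nested dictionary: the outer key is a critic, and the inner dictionary maps movie titles to that critic's rating. As a minimal sketch (the `sample` dictionary is only a two-critic excerpt of `critics`, copied so the snippet runs on its own), a single rating and the set of movies both critics have rated can be looked up like this:

```python
# Two-critic excerpt of the critics dictionary (copied so this runs standalone)
sample = {
    'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                         'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
             'Superman Returns': 4.0},
}

# A single rating: outer key is the critic, inner key is the movie
rating = sample['Toby']['Snakes on a Plane']

# Movies rated by both critics; the similarity measures compare only these
shared = sample['Toby'].keys() & sample['Michael Phillips'].keys()

print(rating)
print(sorted(shared))
```

Not every critic has rated every movie, which is why the similarity functions first collect the shared items.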

user-based collaborative filtering

1. Compute the similarity between users

  • Euclidean distance

             

def sim_distance(prefs,person1,person2):
      # Get the list of shared_items
      si={}
      for item in prefs[person1]:
        if item in prefs[person2]:
            si[item]=1
      # if they have no ratings in common, return 0
      if len(si)==0: return 0
      # Add up the squares of all the differences
      sum_of_squares=sum([pow(prefs[person1][item]-prefs[person2][item],2)
                          for item in prefs[person1] if item in prefs[person2]])
      return 1/(1+sum_of_squares)
print(sim_distance(critics,'Lisa Rose','Gene Seymour'))

Result:

0.14814814814814814
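To see where this number comes from, the calculation can be redone by hand. The sketch below copies the ratings of Lisa Rose and Gene Seymour from `critics` (the names `lisa` and `gene` are only for illustration) and applies the same formula, 1 / (1 + sum of squared differences):

```python
# Ratings copied from the critics dictionary
lisa = {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
        'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0}
gene = {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5,
        'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5}

# Rating difference for every movie both critics rated
diffs = {movie: lisa[movie] - gene[movie] for movie in lisa if movie in gene}

# Sum of squared differences, then map it into the range (0, 1]
sum_of_squares = sum(d ** 2 for d in diffs.values())
similarity = 1 / (1 + sum_of_squares)

print(sum_of_squares)
print(similarity)
```

The squared differences add up to 5.75, so the score is 1 / 6.75. Identical ratings give a score of 1, and the score shrinks toward 0 as the ratings diverge.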

 

  • Pearson correlation coefficient

from math import sqrt

def sim_pearson(prefs,p1,p2):
    # Get the list of mutually rated items
    si={}
    for item in prefs[p1]:
        if item in prefs[p2]: si[item]=1
    # Find the number of elements
    n=len(si)
    # if there are no ratings in common, return 0
    if n==0: return 0
    # Add up all the preferences
    sum1=sum([prefs[p1][it] for it in si])
    sum2=sum([prefs[p2][it] for it in si])
    # Sum up the squares
    sum1Sq=sum([pow(prefs[p1][it],2) for it in si])
    sum2Sq=sum([pow(prefs[p2][it],2) for it in si])
    # Sum up the products
    pSum=sum([prefs[p1][it]*prefs[p2][it] for it in si])
    # Calculate Pearson score
    num=pSum-(sum1*sum2/n)
    den=sqrt((sum1Sq-pow(sum1,2)/n)*(sum2Sq-pow(sum2,2)/n))
    # A zero denominator means one user's shared ratings have no variance
    if den==0: return 0
    r=num/den
    return r
print(sim_pearson(critics,'Lisa Rose','Gene Seymour'))

Result:

0.39605901719066977
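Written out, the function computes the Pearson correlation over the n movies both users have rated, where x_i and y_i are the two users' ratings of movie i (these symbols are introduced here for illustration and do not appear in the code):

```latex
r = \frac{\sum_i x_i y_i - \dfrac{\sum_i x_i \, \sum_i y_i}{n}}
         {\sqrt{\left(\sum_i x_i^2 - \dfrac{\left(\sum_i x_i\right)^2}{n}\right)
                \left(\sum_i y_i^2 - \dfrac{\left(\sum_i y_i\right)^2}{n}\right)}}
```

For Lisa Rose and Gene Seymour, n = 6, Σx = 18, Σy = 19.5, Σx² = 55, Σy² = 69.75 and Σxy = 59.5. The numerator is 59.5 − 58.5 = 1 and the denominator is √(1 × 6.375) ≈ 2.525, which gives r ≈ 0.396.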

2. Return a given number of best matches, sorted in descending order

Return the list of most similar users, ranked by similarity:

def topMatches(prefs,person,n=5,similarity=sim_pearson):
    scores=[(similarity(prefs,person,other),other)
            for other in prefs if other!=person]
    # Sort the list so the highest scores appear at the top
    scores.sort(reverse=True)
    return scores[0:n]
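The ranking step relies on Python comparing tuples element by element, so a list of (score, name) pairs sorts by score first. A small sketch, using Toby's Pearson scores against four of the critics rounded to two decimals. `heapq.nlargest` is shown as an optional standard-library alternative to sorting the whole list, not something the original code uses:

```python
import heapq

# (similarity to Toby, critic) pairs, rounded for readability
scores = [(0.99, 'Lisa Rose'), (0.38, 'Gene Seymour'),
          (-1.0, 'Michael Phillips'), (0.89, 'Claudia Puig')]

# Same idea as topMatches: sort descending, then slice off the first n
top_sorted = sorted(scores, reverse=True)[:2]

# Alternative: pick the n largest without sorting everything
top_heap = heapq.nlargest(2, scores)

print(top_sorted)
print(top_heap)
```

Both approaches return the same two pairs here. The heap version mainly pays off when the list of candidates is much longer than n.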

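3. Skip movies the user has already seen and limit the influence of unusual ratings by weighting each rating with the rater's similarity, summing, and normalizing to get the final recommendations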

# Gets recommendations for a person by using a weighted average
# of every other user's rankings
def getRecommendations(prefs,person,similarity=sim_pearson):
    totals={}
    simSums={}
    for other in prefs:
        # don't compare me to myself
        if other==person: continue
        sim=similarity(prefs,person,other)
        # ignore scores of zero or lower
        if sim<=0: continue
        for item in prefs[other]:
            # only score movies I haven't seen yet
            if item not in prefs[person] or prefs[person][item]==0:
                # Similarity * Score
                totals.setdefault(item,0)
                totals[item]+=prefs[other][item]*sim
                # Sum of similarities
                simSums.setdefault(item,0)
                simSums[item]+=sim
    # Create the normalized list
    rankings=[(total/simSums[item],item) for item, total in totals.items()]
    # Return the list sorted from highest to lowest score
    rankings.sort(reverse=True)
    return rankings
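As a worked example of the weighted average, the sketch below predicts Toby's score for 'The Night Listener'. The similarity values are Toby's Pearson scores rounded to five decimals and hard-coded so the snippet runs on its own. Michael Phillips is left out because his similarity to Toby is negative, which the function skips.

```python
# critic -> (similarity to Toby, that critic's rating of 'The Night Listener')
neighbours = {
    'Lisa Rose':     (0.99124, 3.0),
    'Gene Seymour':  (0.38125, 3.0),
    'Claudia Puig':  (0.89341, 4.5),
    'Mick LaSalle':  (0.92447, 3.0),
    'Jack Matthews': (0.66285, 3.0),
}

# Numerator: each rating weighted by how similar the rater is to Toby
total = sum(sim * rating for sim, rating in neighbours.values())

# Denominator: sum of the similarities of everyone who rated the movie
sim_sum = sum(sim for sim, _ in neighbours.values())

predicted = total / sim_sum
print(round(predicted, 2))
```

Dividing by the sum of similarities keeps a movie from ranking high just because more people rated it.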

item-based collaborative filtering

1. The principle is the same as in the user-based approach: first compute, for each item, the list of items most similar to it

 

def calculateSimilarItems(prefs,n=10):
    # Create a dictionary of items showing which other items they
    # are most similar to.
    result={}
    # Invert the preference matrix to be item-centric
    itemPrefs=transformPrefs(prefs)
    c=0
    for item in itemPrefs:
        # Status updates for large datasets
        c+=1
        if c%100==0: print("%d / %d" % (c,len(itemPrefs)))
        # Find the most similar items to this one
        scores=topMatches(itemPrefs,item,n=n,similarity=sim_distance)
        result[item]=scores
    return result
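`calculateSimilarItems` calls `transformPrefs`, which is not defined in this post. A minimal sketch, assuming it simply flips the nested dictionary from {user: {item: rating}} to {item: {user: rating}}:

```python
def transformPrefs(prefs):
    # Assumed behaviour: swap the two key levels so items become the outer keys
    result = {}
    for person, ratings in prefs.items():
        for item, rating in ratings.items():
            result.setdefault(item, {})[person] = rating
    return result

# Tiny illustrative input (an excerpt of critics)
small = {'Toby': {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0},
         'Lisa Rose': {'Snakes on a Plane': 3.5}}

flipped = transformPrefs(small)
print(flipped['Snakes on a Plane'])
```

With the dictionary flipped this way, the same `topMatches` and similarity functions can compare items instead of users.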

2. Make recommendations using a weighted sum

def getRecommendedItems(prefs,itemMatch,user):
    userRatings=prefs[user]
    scores={}
    totalSim={}
    # Loop over items rated by this user
    for (item,rating) in userRatings.items():
        # Loop over items similar to this one
        for (similarity,item2) in itemMatch[item]:
            # Ignore if this user has already rated this item
            if item2 in userRatings: continue
            # Weighted sum of rating times similarity
            scores.setdefault(item2,0)
            scores[item2]+=similarity*rating
            # Sum of all the similarities
            totalSim.setdefault(item2,0)
            totalSim[item2]+=similarity
    # Divide each total score by total weighting to get an average
    rankings=[(score/totalSim[item],item) for item,score in scores.items()]
    # Return the rankings from highest to lowest
    rankings.sort(reverse=True)
    return rankings
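The item-based score is again a weighted average, but here the weights are item-to-item similarities and the ratings are the user's own. A sketch for a single candidate movie; the similarity values are made up for illustration and are not computed from `critics`:

```python
# Toby's own ratings (from critics)
toby = {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0,
        'You, Me and Dupree': 1.0}

# Hypothetical similarity of each rated movie to the candidate
# 'The Night Listener' (illustrative numbers only)
sim_to_candidate = {'Snakes on a Plane': 0.2, 'Superman Returns': 0.1,
                    'You, Me and Dupree': 0.15}

# Weighted sum of Toby's ratings, then normalize by the total similarity
weighted = sum(toby[movie] * sim_to_candidate[movie] for movie in toby)
total_sim = sum(sim_to_candidate[movie] for movie in toby)
predicted = weighted / total_sim

print(round(predicted, 2))
```

Because the result is an average of Toby's own ratings, it always falls between his lowest and highest rating. The item similarity table can be computed ahead of time, so only this cheap step has to run when a recommendation is requested.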

 
