Collaborative Filtering Recommendation Algorithm

Caisi Huang

Category: Python. Tags: data mining, collaborative filtering, recommender systems, python, algorithms

Article link: https://blog.csdn.net/qq_43920024/article/details/115156925

Euclidean Distance

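from math import sqrt  # used by sim_pearson further down


def sim_distance(prefs, person1, person2):
    """Return a distance-based similarity score in (0, 1] for two people."""
    # Items rated by both people
    shared = [item for item in prefs[person1] if item in prefs[person2]]
    # No ratings in common: similarity is 0
    if not shared:
        return 0
    # Sum of squared rating differences over the shared items
    sum_of_squares = sum(pow(prefs[person1][item] - prefs[person2][item], 2)
                         for item in shared)
    # Note: this uses the squared distance; use sqrt(sum_of_squares)
    # here for the true Euclidean distance
    return 1 / (1 + sum_of_squares)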
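
As a quick sanity check, the sketch below works the score out by hand for two raters taken from the data set used later in this post. The helper name `distance_similarity` and its `use_sqrt` flag are illustrative assumptions, not part of the original code; the flag only shows how the score changes if the true Euclidean distance (with the square root) is used instead of the squared distance.

```python
from math import sqrt

# Two raters from the data set in this post; only shared movies affect the score.
prefs = {
    'Lisa Rose': {'Snakes on a Plane': 3.5, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'Just My Luck': 3.0},
    'Toby': {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0,
             'You, Me and Dupree': 1.0},
}


def distance_similarity(prefs, a, b, use_sqrt=False):
    """Hypothetical helper: 1 / (1 + d), with d the squared or true distance."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0
    sum_of_squares = sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared)
    d = sqrt(sum_of_squares) if use_sqrt else sum_of_squares
    return 1 / (1 + d)


# Squared differences on the three shared movies: 1.0 + 0.25 + 2.25 = 3.5
squared = distance_similarity(prefs, 'Lisa Rose', 'Toby')
true_euclid = distance_similarity(prefs, 'Lisa Rose', 'Toby', use_sqrt=True)
print(squared, true_euclid)
```

Both variants rank pairs of people in the same order, since each is a decreasing function of the same sum of squares; only the absolute values differ.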

Pearson Correlation Coefficient

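def sim_pearson(prefs, p1, p2):
    """Return the Pearson correlation coefficient between p1 and p2."""
    # Items rated by both people
    si = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(si)
    # No ratings in common: similarity is 0
    if n == 0:
        return 0

    # Sums of the ratings, of their squares, and of their products
    sum1 = sum(prefs[p1][it] for it in si)
    sum2 = sum(prefs[p2][it] for it in si)
    sum1Sq = sum(pow(prefs[p1][it], 2) for it in si)
    sum2Sq = sum(pow(prefs[p2][it], 2) for it in si)
    pSum = sum(prefs[p1][it] * prefs[p2][it] for it in si)

    # Numerator and denominator of the Pearson formula
    num = pSum - (sum1 * sum2 / n)
    den = sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))

    # Undefined when either person gave the same rating to every shared item
    if den == 0:
        return 0
    return num / den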
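
One property worth seeing in numbers is that the Pearson coefficient ignores a constant offset between two raters, while a distance-based score does not. The sketch below is a minimal illustration with made-up ratings; the `pearson` helper is a hypothetical mean-centred formulation, which is algebraically equivalent to the sums-based formula used above but is not the original code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length rating lists (0 if undefined)."""
    n = len(xs)
    if n == 0:
        return 0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    den = sqrt(var_x * var_y)
    return cov / den if den else 0


# Made-up ratings: the second rater scores every item exactly 1.0 higher.
strict = [1.0, 2.0, 3.0, 4.0]
generous = [2.0, 3.0, 4.0, 5.0]

r = pearson(strict, generous)
# Distance-based score for the same pair, for comparison.
dist_score = 1 / (1 + sum((x - y) ** 2 for x, y in zip(strict, generous)))
print(r, dist_score)
```

The two raters agree perfectly on the relative order and spacing of the items, so the correlation is 1, while the distance-based score is pulled down by the constant one-point gap.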

Top Matches

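def topMatches(prefs, person, n=5, similarity=sim_pearson):
    """Return the n people most similar to person as (score, name) pairs."""
    # Score everyone except the person themselves
    scores = [(similarity(prefs, person, other), other)
              for other in prefs if other != person]
    # Highest similarity first
    scores.sort(reverse=True)
    return scores[0:n]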
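
When only the first few matches are needed, a full sort is not required. The sketch below is one possible variant using `heapq.nlargest` from the standard library; the function name `top_matches` and the made-up score dictionary are assumptions for illustration and stand in for the similarity scores computed above.

```python
import heapq


def top_matches(scores_by_name, n=5):
    """Hypothetical variant: pick the n best (score, name) pairs without a full sort."""
    pairs = [(score, name) for name, score in scores_by_name.items()]
    return heapq.nlargest(n, pairs)


# Made-up similarity scores, standing in for similarity(prefs, person, other).
scores = {'Ann': 0.31, 'Bob': 0.92, 'Cid': -0.40, 'Dee': 0.66}
best = top_matches(scores, n=2)
print(best)
```

For a handful of critics the difference is negligible; the variant mainly matters when the number of people is large and n is small.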

From Similarity to Predicted Ratings

def getRecommendations(prefs, person, similarity=sim_pearson):
    """Rank items person has not rated by a similarity-weighted average."""
    totals = {}
    simSums = {}

    for other in prefs:
        # Do not compare the person with themselves
        if other == person:
            continue

        sim = similarity(prefs, person, other)
        # Skip scores of zero or lower; a zero total weight would
        # cause a division by zero when normalising below
        if sim <= 0:
            continue

        for item in prefs[other]:
            # Only score items the person has not rated yet
            if item not in prefs[person] or prefs[person][item] == 0:
                # Sum of rating * similarity
                totals.setdefault(item, 0)
                totals[item] += prefs[other][item] * sim
                # Sum of similarities, used to normalise
                simSums.setdefault(item, 0)
                simSums[item] += sim

    # Normalised scores, highest first
    rankings = [(total / simSums[item], item) for item, total in totals.items()]
    rankings.sort(reverse=True)
    return rankings
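
The predicted rating for an item is a weighted average: each neighbour's rating is multiplied by that neighbour's similarity, and the sum is divided by the sum of the similarities. The sketch below works through that arithmetic for a single item; the names and numbers are made up for illustration.

```python
# Made-up numbers: three neighbours, their similarity to the target user,
# and their rating for one movie the target user has not rated.
neighbours = [
    ('Ann', 0.9, 4.0),
    ('Bob', 0.5, 2.0),
    ('Cid', 0.1, 5.0),
]

total = sum(sim * rating for _, sim, rating in neighbours)  # weighted sum of ratings
sim_sum = sum(sim for _, sim, _ in neighbours)              # normaliser
predicted = total / sim_sum

# Unweighted mean of the same ratings, for comparison.
plain_mean = sum(rating for _, _, rating in neighbours) / len(neighbours)
print(round(predicted, 3), round(plain_mean, 3))
```

The weighted prediction sits closer to the rating of the most similar neighbour than the plain mean does, which is the point of weighting by similarity.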

Transposing User-Item Data

def transformPrefs(prefs):
    result = {}
    for person in prefs:
        for item in prefs[person]:
            result.setdefault(item, {})
            result[item][person] = prefs[person][item]
    return result
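
To make the transposition concrete, the sketch below applies the same idea to a tiny made-up data set and checks that transposing twice gives back the original. The function name `transpose` and the sample ratings are assumptions for illustration.

```python
def transpose(prefs):
    """Same idea as transformPrefs: swap the outer and inner dictionary keys."""
    result = {}
    for person, ratings in prefs.items():
        for item, score in ratings.items():
            result.setdefault(item, {})[person] = score
    return result


# Made-up ratings for illustration.
by_user = {'Ann': {'A': 4.0, 'B': 2.0}, 'Bob': {'B': 5.0}}
by_item = transpose(by_user)
print(by_item)

# Transposing twice restores the original mapping.
round_trip = transpose(by_item)
```

With the data keyed by item, the same similarity and recommendation functions can be reused unchanged for item-based filtering.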

Data

data = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                      'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
                      'The Night Listener': 3.0},

        'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                         'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
                         'You, Me and Dupree': 3.5},

        'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                             'Superman Returns': 3.5, 'The Night Listener': 4.0},

        'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                         'The Night Listener': 4.5, 'Superman Returns': 4.0,
                         'You, Me and Dupree': 2.5},

        'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                         'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
                         'You, Me and Dupree': 2.0},

        'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                          'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},

        'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0}
        }

Run

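# User-based: recommend movies to a critic
print(getRecommendations(data, 'Jack Matthews'))

# Item-based: transpose the data, then rank critics for a movie
movies = transformPrefs(data)
print(getRecommendations(movies, 'You, Me and Dupree'))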

Summary

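Both similarity measures rest on simple mathematics; different recommendation algorithms simply implement different formulas.
The Pearson correlation coefficient is the recommended choice, because it compensates for users who consistently rate higher or lower than others.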

Reference: Wu Jiansheng, Xu Guiqiu. 数据挖掘与机器学习 (Data Mining and Machine Learning) [M]. 2019.7
