This isn't really a "MapReduce" feature, but it can give you some significant speedups without any hassle.
I would actually use numpy to "vectorize" the operation and make your life easier. From there you just have to loop through the dictionary and apply the vectorized function, comparing each item against all the others.
import numpy as np

bnb_items = list(bnb.values())
for num in range(len(bnb_items) - 1):
    # compare this user against every user that comes after it
    sims = cosSim(bnb_items[num], bnb_items[num + 1:])
def cosSim(User, OUsers):
    """ Determines the cosine similarity between 1 user and all others.
    Returns an array the size of OUsers with the similarity measures.
    User is a single array of the items purchased by a user.
    OUsers is a LIST of such arrays (all the same length), one per other user.
    """
    # dot product between this user and every other user
    num = np.dot(OUsers, User)
    # product of the magnitudes (L2 norms) of this user and all the others
    denom = np.linalg.norm(OUsers, axis=1) * np.linalg.norm(User)
    return num / denom
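As a quick sanity check, with some made-up toy vectors (this assumes every user's purchase vector has the same length, one slot per item):

a = np.array([1, 0, 1, 1])
others = np.array([[1, 0, 1, 1],   # identical vector  -> similarity 1.0
                   [0, 1, 0, 0]])  # no items in common -> similarity 0.0
print(cosSim(a, others))           # prints [1. 0.]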
I haven't tested this code, so there may be some silly mistakes, but the idea should get you 90% of the way there.
This should give you a very significant speedup. If you still need it to be faster, there is a wonderful blog post implementing a "Slope One" recommender system here.
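To give a rough idea of what Slope One involves, here is a minimal sketch of the weighted variant; it is not the code from that post, and the {user: {item: rating}} dictionary format is just an assumption for illustration.

from collections import defaultdict

def slope_one_deviations(ratings):
    """ratings: {user: {item: rating}} (hypothetical format).
    Returns (dev, count): dev[i][j] is the average difference
    rating(i) - rating(j), count[i][j] is how many users rated both.
    """
    dev = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    dev[i][j] += ri - rj
                    count[i][j] += 1
    for i in dev:
        for j in dev[i]:
            dev[i][j] /= count[i][j]
    return dev, count

def slope_one_predict(user_ratings, item, dev, count):
    """Weighted Slope One prediction of `item` for one user."""
    num, den = 0.0, 0
    for j, rj in user_ratings.items():
        if j != item and item in dev and j in dev[item]:
            num += (dev[item][j] + rj) * count[item][j]
            den += count[item][j]
    return num / den if den else None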
Hope that helps, Will