Mahout in Action: recommender system reading notes (5)

Latest recommended article published on 2013-03-08 15:14:38

softwarehe


Category: mahout

Article link: https://blog.csdn.net/softwarehe/article/details/8646925


Chapter 4: Making recommendations

Understanding user-based recommendation

The key idea is similarity between users. The first version of the algorithm is:

for every item i that u has no preference for yet
  for every other user v that has a preference for i
    compute a similarity s between u and v
    incorporate v's preference for i, weighted by s, into a running average
return the top items, ranked by weighted average

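The problem with this computation is that examining every item is far too slow, so a more efficient algorithm is needed. It first computes a neighborhood of the most similar users and then considers only the items those users know about: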

for every other user w
  compute a similarity s between u and w
  retain the top users, ranked by similarity, as a neighborhood n
for every item i that some user in n has a preference for,
    but that u has no preference for yet
  for every other user v in n that has a preference for i
    compute a similarity s between u and v
    incorporate v's preference for i, weighted by s, into a running average

Mahout uses the second algorithm.

Code recap:
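To make the neighborhood-based pseudocode concrete, here is a minimal plain-Java sketch that does not use Mahout. The class name `UserBasedSketch`, the toy preference data, and the similarity function (1 / (1 + Euclidean distance) over co-rated items) are all assumptions made for illustration, and the similarity computed in the first step is reused rather than recomputed in the second.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UserBasedSketch {

    // Toy similarity (an assumption, not Mahout's code): 1 / (1 + Euclidean distance)
    // over the items both users have rated; 0 when they share no items.
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSq = 0.0;
        int common = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSq += diff * diff;
                common++;
            }
        }
        return common == 0 ? 0.0 : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    static List<Long> recommend(Map<Long, Map<Long, Double>> prefs,
                                long u, int neighborhoodSize, int howMany) {
        Map<Long, Double> uPrefs = prefs.get(u);

        // Step 1: similarity between u and every other user; keep the top n as the neighborhood.
        Map<Long, Double> sims = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : prefs.entrySet()) {
            if (e.getKey() != u) {
                sims.put(e.getKey(), similarity(uPrefs, e.getValue()));
            }
        }
        List<Long> others = new ArrayList<>(sims.keySet());
        others.sort((x, y) -> Double.compare(sims.get(y), sims.get(x)));
        List<Long> neighborhood = others.subList(0, Math.min(neighborhoodSize, others.size()));

        // Step 2: for items u has no preference for, keep a similarity-weighted average.
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> weightTotal = new HashMap<>();
        for (Long v : neighborhood) {
            double s = sims.get(v);
            for (Map.Entry<Long, Double> p : prefs.get(v).entrySet()) {
                if (!uPrefs.containsKey(p.getKey())) {
                    weightedSum.merge(p.getKey(), s * p.getValue(), Double::sum);
                    weightTotal.merge(p.getKey(), s, Double::sum);
                }
            }
        }

        // Step 3: rank candidate items by weighted average, highest first.
        List<Long> items = new ArrayList<>(weightedSum.keySet());
        items.sort((x, y) -> Double.compare(
                weightedSum.get(y) / weightTotal.get(y),
                weightedSum.get(x) / weightTotal.get(x)));
        return new ArrayList<>(items.subList(0, Math.min(howMany, items.size())));
    }

    // Made-up preferences: userID -> (itemID -> value).
    static Map<Long, Map<Long, Double>> toyPrefs() {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        prefs.put(1L, Map.of(101L, 5.0, 102L, 3.0));
        prefs.put(2L, Map.of(101L, 5.0, 102L, 3.0, 103L, 4.0, 104L, 1.0));
        prefs.put(3L, Map.of(101L, 4.0, 102L, 3.0, 103L, 2.0, 104L, 5.0));
        prefs.put(4L, Map.of(101L, 1.0, 102L, 1.0, 104L, 5.0, 105L, 5.0));
        return prefs;
    }

    public static void main(String[] args) {
        // Neighborhood of 2 for user 1 is {2, 3}; prints [103, 104]
        System.out.println(recommend(toyPrefs(), 1L, 2, 2));
    }
}
```

With a neighborhood of 2, item 105 is never considered because only user 4, who falls outside the neighborhood, has rated it. That is the point of the second algorithm: the candidate items are limited to what the neighbors know.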

// Load preferences from a CSV file
DataModel model = new FileDataModel(new File("intro.csv"));
// Pearson correlation between users' preference values
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
// Neighborhood made of the 2 most similar users
UserNeighborhood neighborhood =
    new NearestNUserNeighborhood(2, similarity, model);
Recommender recommender =
    new GenericUserBasedRecommender(model, neighborhood, similarity);

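Neighborhood

This raises the question of how large the neighborhood should be. You can pick the size through evaluation, or use ThresholdUserNeighborhood, which selects neighbors by a similarity threshold instead of a fixed count. Either way, the best value still has to be found through evaluation.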
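The two ways of picking a neighborhood can be sketched in plain Java as follows. This is an illustration under assumptions, not Mahout's implementation: the class `NeighborhoodSketch`, its method names, and the similarity values are invented, and the threshold variant here keeps users whose similarity is greater than or equal to the threshold.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NeighborhoodSketch {

    // Fixed-size neighborhood: the n users with the highest similarity.
    static List<Long> nearestN(Map<Long, Double> sims, int n) {
        return sims.entrySet().stream()
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Threshold neighborhood: every user whose similarity is at least the threshold,
    // however many (or few) that turns out to be.
    static List<Long> aboveThreshold(Map<Long, Double> sims, double threshold) {
        return sims.entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up similarities between a target user and users 2..5.
        Map<Long, Double> sims = Map.of(2L, 0.9, 3L, 0.7, 4L, 0.2, 5L, -0.5);
        System.out.println(nearestN(sims, 3));
        System.out.println(aboveThreshold(sims, 0.5));
    }
}
```

A fixed size always returns n neighbors even when some are poor matches; a threshold guarantees quality but may return very few users, or none. Which trade-off works better depends on the data, which is why evaluation is needed in both cases.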

Similarity

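PearsonCorrelationSimilarity: a similarity based on statistical correlation. Its drawbacks: it ignores how many items two users' preferences overlap on; it copes poorly with sparse data (with only one item in common, no correlation can be computed); and it is undefined when all of a user's preference values are identical. PearsonCorrelationSimilarity is not necessarily a good choice, though it is not a terrible one either.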
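The drawbacks listed for the Pearson correlation are easy to reproduce with a small standalone implementation. The sketch below is plain Java written for these notes (class `PearsonSketch` and the sample values are assumptions, not Mahout code); it computes the correlation over co-rated items only and returns NaN where the correlation is undefined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PearsonSketch {

    // Pearson correlation over the items both users have rated; NaN when undefined.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<double[]> pairs = new ArrayList<>();
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                pairs.add(new double[] {e.getValue(), other});
            }
        }
        int n = pairs.size();
        if (n == 0) {
            return Double.NaN;
        }
        double meanA = 0.0;
        double meanB = 0.0;
        for (double[] p : pairs) {
            meanA += p[0];
            meanB += p[1];
        }
        meanA /= n;
        meanB /= n;

        double cov = 0.0;
        double varA = 0.0;
        double varB = 0.0;
        for (double[] p : pairs) {
            double da = p[0] - meanA;
            double db = p[1] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        // Zero variance: a single shared item, or all values identical.
        if (varA == 0.0 || varB == 0.0) {
            return Double.NaN;
        }
        return cov / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        // Only two shared items, yet the result is a "perfect" correlation.
        System.out.println(pearson(Map.of(101L, 1.0, 102L, 2.0),
                                   Map.of(101L, 3.0, 102L, 5.0, 103L, 1.0)));
        // One shared item: undefined.
        System.out.println(pearson(Map.of(101L, 5.0), Map.of(101L, 1.0)));
        // All of one user's values identical: undefined.
        System.out.println(pearson(Map.of(101L, 3.0, 102L, 3.0, 103L, 3.0),
                                   Map.of(101L, 1.0, 102L, 2.0, 103L, 5.0)));
    }
}
```

The first call shows the overlap problem: two users who share just two items can score as high as two users who agree across dozens of items.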

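EuclideanDistanceSimilarity: based on the Euclidean distance d between two users. Since a smaller distance means more similar users, the book describes the returned similarity as 1 / (1 + d).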

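Cosine similarity: not the same thing as the Pearson correlation in general, but the two reduce to the same computation when both series of preference values are centered (mean 0). According to the book, Mahout centers the input, so PearsonCorrelationSimilarity is what you use for the cosine measure.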
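The relationship between the cosine measure and the Pearson correlation can be checked numerically. The following plain-Java sketch (class `CosineVsPearsonSketch` and the sample vectors are assumptions for illustration) computes both on the same pair of preference vectors, once on raw values and once after centering each vector on its mean.

```java
class CosineVsPearsonSketch {

    // Cosine of the angle between two equal-length vectors.
    static double cosine(double[] x, double[] y) {
        double dot = 0.0;
        double normX = 0.0;
        double normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        return dot / Math.sqrt(normX * normY);
    }

    // Subtract the mean so the returned vector has mean 0.
    static double[] center(double[] v) {
        double mean = 0.0;
        for (double d : v) {
            mean += d;
        }
        mean /= v.length;
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] - mean;
        }
        return out;
    }

    // Pearson correlation from raw sums (textbook formula).
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        return (n * sxy - sx * sy)
                / Math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
    }

    public static void main(String[] args) {
        double[] x = {5.0, 3.0, 2.5};
        double[] y = {2.0, 2.5, 5.0};
        System.out.println("cosine (raw)      = " + cosine(x, y));
        System.out.println("cosine (centered) = " + cosine(center(x), center(y)));
        System.out.println("pearson           = " + pearson(x, y));
    }
}
```

On raw, all-positive ratings the cosine is always positive, while the Pearson correlation for these two opposed users is negative. Only after centering do the two numbers coincide.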

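Spearman correlation: implemented by SpearmanCorrelationSimilarity. It is a variant of the Pearson correlation that is computed on the relative ranks of the preference values instead of the raw values.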
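A small sketch of the rank-based idea, in plain Java and independent of Mahout. It assumes there are no tied values, in which case the Pearson correlation of the ranks equals the closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the per-item rank difference. The class `SpearmanSketch` and the sample values are made up for illustration.

```java
import java.util.Arrays;

class SpearmanSketch {

    // Rank 1 for the smallest value, n for the largest (assumes no ties).
    static int[] ranks(double[] v) {
        Integer[] order = new Integer[v.length];
        for (int i = 0; i < v.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (i, j) -> Double.compare(v[i], v[j]));
        int[] rank = new int[v.length];
        for (int pos = 0; pos < order.length; pos++) {
            rank[order[pos]] = pos + 1;
        }
        return rank;
    }

    // Spearman correlation via the closed form for untied ranks.
    static double spearman(double[] x, double[] y) {
        int n = x.length;
        if (n < 2) {
            return Double.NaN;
        }
        int[] rx = ranks(x);
        int[] ry = ranks(y);
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double d = rx[i] - ry[i];
            sumSq += d * d;
        }
        return 1.0 - 6.0 * sumSq / (n * ((double) n * n - 1.0));
    }

    public static void main(String[] args) {
        // Same ordering of items, very different raw values.
        System.out.println(spearman(new double[] {1.0, 2.0, 10.0},
                                    new double[] {3.0, 4.0, 4.5}));
        // Exactly reversed ordering.
        System.out.println(spearman(new double[] {1.0, 2.0, 3.0},
                                    new double[] {9.0, 5.0, 1.0}));
    }
}
```

Because only the ordering matters, two users who rank items identically get the maximum score no matter how far apart their actual values are. The price is that the magnitude information in the preferences is thrown away, and the ranking step adds computation.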

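Tanimoto coefficient: the number of items both users have expressed a preference for (the intersection) divided by the number of items either user has expressed a preference for (the union). It ignores preference values entirely and only looks at whether a preference exists. Also known as the Jaccard coefficient.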
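As a quick illustration, the coefficient can be computed from two sets of item IDs. This is a plain-Java sketch with an invented class name (`TanimotoSketch`) and invented item IDs; returning NaN when both sets are empty is a choice made here.

```java
import java.util.HashSet;
import java.util.Set;

class TanimotoSketch {

    // |A intersect B| / |A union B|; preference values play no role.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? Double.NaN : (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // 2 shared items out of 5 distinct items overall.
        System.out.println(tanimoto(Set.of(101L, 102L, 103L),
                                    Set.of(102L, 103L, 104L, 105L)));
    }
}
```

The result is 1 when both users have preferences for exactly the same items and 0 when they share none.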

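Log-likelihood: similar to the Tanimoto coefficient in that it does not use preference values, but instead of a plain overlap ratio it measures how unlikely it is that the overlap between two users is due to chance.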
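The log-likelihood idea can be sketched with the standard G-squared statistic on a 2x2 table of counts: items both users have, items only one has, and items neither has. The code below is plain Java written for these notes; the class `LogLikelihoodSketch`, the mapping of the ratio into a similarity as 1 - 1 / (1 + ratio), and the NaN result for users with no overlap are assumptions of this sketch rather than a statement of how Mahout implements it.

```java
import java.util.HashSet;
import java.util.Set;

class LogLikelihoodSketch {

    // G^2 = 2 * sum over cells of k * ln(k * N / (rowTotal * colTotal)), with 0 * ln(0) = 0.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double n = k11 + k12 + k21 + k22;
        double row1 = k11 + k12;
        double row2 = k21 + k22;
        double col1 = k11 + k21;
        double col2 = k12 + k22;
        return 2.0 * (term(k11, row1, col1, n) + term(k12, row1, col2, n)
                + term(k21, row2, col1, n) + term(k22, row2, col2, n));
    }

    private static double term(long k, double row, double col, double n) {
        return k == 0 ? 0.0 : k * Math.log(k * n / (row * col));
    }

    // Similarity from item sets only; numItems is the total number of items in the data.
    static double similarity(Set<Long> a, Set<Long> b, long numItems) {
        Set<Long> both = new HashSet<>(a);
        both.retainAll(b);
        long k11 = both.size();                           // items both users have
        if (k11 == 0) {
            return Double.NaN;                            // sketch choice: no overlap, no score
        }
        long k12 = b.size() - k11;                        // items only b has
        long k21 = a.size() - k11;                        // items only a has
        long k22 = numItems - a.size() - b.size() + k11;  // items neither has
        double llr = logLikelihoodRatio(k11, k12, k21, k22);
        return 1.0 - 1.0 / (1.0 + llr);                   // assumed mapping into [0, 1)
    }

    public static void main(String[] args) {
        // Counts exactly as expected under independence give a ratio of 0.
        System.out.println(logLikelihoodRatio(1, 1, 1, 1));
        // Two users with identical item sets out of 4 items in total.
        System.out.println(similarity(Set.of(101L, 102L), Set.of(101L, 102L), 4));
    }
}
```

Unlike the Tanimoto coefficient, the score depends on the total number of items: the same overlap is more surprising, and so scores higher, when the catalog is large and each user has touched only a small part of it.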
