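Mahout in Action: recommender system reading notes (5)

Chapter 4: Making recommendations

Understanding user-based recommendation

The key idea is the similarity between users. The first algorithm is as follows: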

for every item i that u has no preference for yet
  for every other user v that has a preference for i
    compute a similarity s between u and v
    incorporate v's preference for i, weighted by s, into a running average
return the top items, ranked by weighted average

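The problem with the computation above is that examining every item is far too slow. A more efficient algorithm first builds a neighborhood of the most similar users and then considers only the items those users have a preference for: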

for every other user w
  compute a similarity s between u and w
  retain the top users, ranked by similarity, as a neighborhood n
for every item i that some user in n has a preference for,
    but that u has no preference for yet
  for every other user v in n that has a preference for i
    compute a similarity s between u and v
    incorporate v's preference for i, weighted by s, into a running average

Mahout uses the latter algorithm.
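The neighborhood-based pseudocode can be sketched in plain Java as follows. This is only an illustration, not Mahout's implementation: the class `NeighborhoodSketch`, its methods, and the sample data are hypothetical, and the similarity used here (1 / (1 + Euclidean distance) over co-rated items) is an assumption chosen to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NeighborhoodSketch {

    // Assumed similarity: 1 / (1 + Euclidean distance) over co-rated items; 0 if none.
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double sumSq = 0.0;
        int common = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) continue;
            double diff = e.getValue() - other;
            sumSq += diff * diff;
            common++;
        }
        return common == 0 ? 0.0 : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    // Estimates u's preference for every item u has not rated, using the n nearest users.
    static Map<String, Double> estimate(Map<String, Map<String, Double>> prefs, String u, int n) {
        Map<String, Double> mine = prefs.get(u);
        // Step 1: compute the similarity to every other user once.
        Map<String, Double> sims = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> e : prefs.entrySet()) {
            if (!e.getKey().equals(u)) sims.put(e.getKey(), similarity(mine, e.getValue()));
        }
        // Step 2: keep the n most similar users as the neighborhood.
        List<String> ranked = new ArrayList<>(sims.keySet());
        ranked.sort((x, y) -> Double.compare(sims.get(y), sims.get(x)));
        List<String> neighborhood = ranked.subList(0, Math.min(n, ranked.size()));
        // Step 3: similarity-weighted average over the neighbors' items that u lacks.
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> weightTotal = new HashMap<>();
        for (String v : neighborhood) {
            double s = sims.get(v);
            if (s <= 0) continue; // users with no overlap contribute nothing
            for (Map.Entry<String, Double> p : prefs.get(v).entrySet()) {
                if (mine.containsKey(p.getKey())) continue;
                weightedSum.merge(p.getKey(), s * p.getValue(), Double::sum);
                weightTotal.merge(p.getKey(), s, Double::sum);
            }
        }
        Map<String, Double> estimates = new HashMap<>();
        for (Map.Entry<String, Double> e : weightedSum.entrySet()) {
            estimates.put(e.getKey(), e.getValue() / weightTotal.get(e.getKey()));
        }
        return estimates;
    }

    // Made-up preferences: user -> (item -> rating).
    static Map<String, Map<String, Double>> sample() {
        Map<String, Map<String, Double>> prefs = new HashMap<>();
        prefs.put("u", Map.of("a", 1.0, "b", 2.0, "c", 3.0));
        prefs.put("v", Map.of("a", 1.0, "b", 2.0, "c", 3.0, "d", 4.0));
        prefs.put("x", Map.of("a", 1.0, "b", 2.0, "c", 5.0, "d", 2.0));
        prefs.put("w", Map.of("a", 5.0, "b", 5.0, "c", 5.0, "d", 1.0));
        return prefs;
    }

    public static void main(String[] args) {
        // Neighbors of u are v (similarity 1) and x (similarity 1/3); d is estimated near 3.5.
        System.out.println(estimate(sample(), "u", 2));
    }
}
```

Because similarities are computed once per user rather than once per item, the expensive step no longer grows with the size of the item catalog, which is the point of the second algorithm.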

Recap of the code:

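DataModel model = new FileDataModel(new File("intro.csv"));
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
// Neighborhood made up of the 2 most similar users
UserNeighborhood neighborhood =
    new NearestNUserNeighborhood(2, similarity, model);
Recommender recommender =
    new GenericUserBasedRecommender(model, neighborhood, similarity);

Neighborhood

This raises the question of how large the neighborhood should be. The size can be chosen through evaluation. Alternatively, ThresholdUserNeighborhood defines the neighborhood by a similarity threshold instead of a fixed size. Either way, the best value still has to be determined through evaluation.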

Similarity

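PearsonCorrelationSimilarity: a similarity based on statistical correlation. Its drawbacks: it does not take into account how many items the two users have in common, it handles sparse data poorly (it cannot be computed when the users overlap on only one item), and it is undefined when one user's ratings for the common items are all the same. PearsonCorrelationSimilarity is not necessarily a good choice, though it is not a bad one either.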
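The drawbacks of the Pearson correlation are easy to reproduce with a small standalone computation. The sketch below is not Mahout code; `PearsonSketch` and its inputs are hypothetical, and each array holds one user's ratings for the items both users have rated.

```java
final class PearsonSketch {

    // Pearson correlation of two equal-length rating arrays; NaN when undefined.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        // 0.0 / 0.0 yields NaN when either user has zero variance.
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // Perfectly correlated ratings over three items.
        System.out.println(pearson(new double[] {1, 2, 3}, new double[] {2, 4, 6}));
        // Only two common items: the result is always +1 or -1, however weak the agreement.
        System.out.println(pearson(new double[] {5, 1}, new double[] {4.5, 4.4}));
        // One user rated every common item the same: undefined.
        System.out.println(pearson(new double[] {3, 3, 3}, new double[] {1, 2, 3}));
        // A single common item: undefined.
        System.out.println(pearson(new double[] {4}, new double[] {5}));
    }
}
```

The second call shows why the number of common items matters: two users who share only two items look perfectly correlated, while users with a large but imperfect overlap score lower.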

EuclideanDistanceSimilarity: based on the Euclidean distance between two users; the smaller the distance, the higher the similarity.

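Cosine similarity: on centered data (each user's values shifted to have a mean of 0) it produces the same result as the Pearson correlation, so PearsonCorrelationSimilarity covers this case.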

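Spearman correlation: implemented by SpearmanCorrelationSimilarity. It is a variant of the Pearson correlation computed on the relative ranks of the preference values rather than on the values themselves.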

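Tanimoto coefficient: the ratio of the number of items both users have a preference for to the number of items either user has a preference for. It ignores preference values entirely. Also known as the Jaccard coefficient.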
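A minimal sketch of the Tanimoto coefficient over two users' item sets, assuming each user is represented only by the set of item IDs they have a preference for. `TanimotoSketch` and the item IDs are hypothetical, and this is not Mahout's implementation.

```java
import java.util.HashSet;
import java.util.Set;

final class TanimotoSketch {

    // |A intersect B| / |A union B|; defined as 0 when both sets are empty.
    static double tanimoto(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }

    public static void main(String[] args) {
        Set<String> first = Set.of("101", "102", "103");
        Set<String> second = Set.of("101", "103", "104", "105");
        // 2 shared items out of 5 distinct items in total.
        System.out.println(tanimoto(first, second));
    }
}
```

Since only set membership is used, two users who rated the same items with opposite scores are still considered maximally similar, which is the trade-off of ignoring preference values.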

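Log-likelihood: similar to the Tanimoto coefficient in that it ignores preference values, but it measures how unlikely it is that the overlap between two users is due to chance.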
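One common way to compute a log-likelihood ratio for two users is from a 2x2 table of item counts, sketched below. The class `LogLikelihoodSketch`, the count names, and the mapping of the ratio to a similarity via 1 - 1 / (1 + LLR) are assumptions made for illustration, not a statement of how the library computes it.

```java
final class LogLikelihoodSketch {

    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a list of counts.
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            parts += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - parts;
    }

    // k11: items both users have; k12: only the first user;
    // k21: only the second user; k22: items neither user has.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy));
    }

    // Assumed normalization of the ratio into the range [0, 1).
    static double similarity(long k11, long k12, long k21, long k22) {
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }

    public static void main(String[] args) {
        // Overlap no better than chance: the ratio is (close to) 0.
        System.out.println(logLikelihoodRatio(1, 1, 1, 1));
        // Strong association: both users have exactly the same 10 of 20 items.
        System.out.println(logLikelihoodRatio(10, 0, 0, 10));
    }
}
```

Unlike the Tanimoto coefficient, this measure needs the total number of items (through k22), so the same overlap counts as more significant in a large catalog than in a small one.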
