Reading notes on Programming Collective Intelligence

Chapter 2: Making Recommendations

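Almost every website we browse today shows traces of a recommender system. I used to find these systems mysterious, but after reading this chapter the basic idea turns out to be quite simple (although the amount of computation on large datasets is still a real problem).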

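This chapter covers how a typical recommender system is implemented, which is almost always done with collaborative filtering ( http://en.wikipedia.org/wiki/Collaborative_filtering ): find the people or items whose tastes match yours, and infer your preferences from what they already like. The author works from concrete examples and explains things well, which suits someone like me who came to this field midway.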
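To make the idea concrete, here is a minimal sketch of user-based filtering in Python. It is my own illustration, not the book's code: the `ratings` data is made up, the names `sim_distance` and `recommend` are hypothetical, and the similarity is assumed to be 1 / (1 + Euclidean distance) over the items two users have both rated.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: score}); not data from the book.
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.5, "B": 3.5, "C": 4.0, "D": 5.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}

def sim_distance(prefs, p1, p2):
    """Similarity in (0, 1] from Euclidean distance; 0.0 if nothing is shared."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    if not shared:
        return 0.0
    dist = sqrt(sum((prefs[p1][i] - prefs[p2][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + dist)

def recommend(prefs, person, similarity=sim_distance):
    """Rank items the person has not rated by a similarity-weighted average."""
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue
        for item, score in prefs[other].items():
            if item not in prefs[person]:
                totals[item] = totals.get(item, 0.0) + score * sim
                sim_sums[item] = sim_sums.get(item, 0.0) + sim
    ranked = [(totals[i] / sim_sums[i], i) for i in totals]
    return sorted(ranked, reverse=True)

# alice has not rated "D"; bob (similar taste) outweighs carol (dissimilar).
print(recommend(ratings, "alice"))
```

Because bob's ratings are much closer to alice's than carol's are, the predicted score for "D" lands nearer to bob's 5.0 than to carol's 2.0. Any other similarity function with the same signature could be passed in through the `similarity` parameter.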

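The author starts with a movie-review example: from each user's reviews, compute the similarity between every pair of users, then use that similarity to recommend movies to each user. (There are many ways to measure the similarity between two users; see http://en.wikipedia.org/wiki/Metric_%28mathematics%29#Examples .) That is User-Based Filtering: using users' reviews to find similar users. Next, the author shows how to use the same data to find similar movies, which is essentially Item-Based Filtering. On the difference between the two approaches, the author says: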

This will probably work well for a few thousand people or items, but a very large site like Amazon has millions of customers and products—comparing a user with every other user and then comparing every product each user has rated can be very slow. The technique we have used thus far is called user-based collaborative filtering. An alternative is known as item-based collaborative filtering. In cases with very large datasets, item-based collaborative filtering can give better results, and it allows many of the calculations to be performed in advance so that a user needing recommendations can get them more quickly.

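So the latter has to be recomputed offline on a regular basis, to make sure an up-to-date dictionary of item similarities is always available.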
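The offline/online split can be sketched roughly as follows. This is an illustration under my own assumptions, not the book's implementation: the `ratings` data is made up, and `transpose`, `build_item_table`, and `recommend_items` are hypothetical names. The expensive pairwise comparison happens once in `build_item_table`; serving a recommendation only reads the stored table.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: score}); not data from the book.
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.5, "B": 3.5, "C": 4.0, "D": 5.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}

def transpose(prefs):
    """Flip user -> {item: score} into item -> {user: score}."""
    flipped = {}
    for user, items in prefs.items():
        for item, score in items.items():
            flipped.setdefault(item, {})[user] = score
    return flipped

def sim_distance(prefs, k1, k2):
    """Similarity in (0, 1] from Euclidean distance; 0.0 if nothing is shared."""
    shared = [x for x in prefs[k1] if x in prefs[k2]]
    if not shared:
        return 0.0
    dist = sqrt(sum((prefs[k1][x] - prefs[k2][x]) ** 2 for x in shared))
    return 1.0 / (1.0 + dist)

def build_item_table(prefs, n=10):
    """Offline step: for each item, store its n most similar items."""
    by_item = transpose(prefs)
    table = {}
    for item in by_item:
        scores = [(sim_distance(by_item, item, other), other)
                  for other in by_item if other != item]
        table[item] = sorted(scores, reverse=True)[:n]
    return table

def recommend_items(prefs, table, user):
    """Online step: weight the user's own ratings by precomputed similarities."""
    totals, sim_sums = {}, {}
    for item, score in prefs[user].items():
        for sim, other in table[item]:
            if other in prefs[user] or sim <= 0:
                continue
            totals[other] = totals.get(other, 0.0) + sim * score
            sim_sums[other] = sim_sums.get(other, 0.0) + sim
    return sorted(((totals[i] / sim_sums[i], i) for i in totals), reverse=True)

item_table = build_item_table(ratings)   # rerun periodically, offline
print(recommend_items(ratings, item_table, "alice"))
```

Item-to-item similarities tend to change more slowly than an individual user's ratings, which is why a table that is somewhat stale can still be usable between rebuilds.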

User-Based or Item-Based Filtering?
Item-based filtering is significantly faster than user-based when getting a list of recommendations for a large dataset, but it does have the additional overhead of maintaining the item similarity table. Also, there is a difference in accuracy that depends on how “sparse” the dataset is. In the movie example, since every critic has rated nearly every movie, the dataset is dense (not sparse). On the other hand, it would be unlikely to find two people with the same set of del.icio.us bookmarks: most bookmarks are saved by a small group of people, leading to a sparse dataset. Item-based filtering usually outperforms user-based filtering in sparse datasets, and the two perform about equally in dense datasets.

Exercises:

1. Tanimoto score: http://en.wikipedia.org/wiki/Jaccard_index#Tanimoto_coefficient_.28extended_Jaccard_coefficient.29
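As a first attempt at this exercise, here is a small sketch of a Tanimoto similarity function. It assumes the vector form described at the link above, dot(a, b) / (|a|² + |b|² - dot(a, b)), applied to sparse vectors stored as dicts; the function name `tanimoto` and the bookmark data are my own, hypothetical choices. For 0/1 vectors this reduces to the plain Jaccard index (size of the intersection over size of the union), which seems a natural fit for "has it or not" data such as bookmarks.

```python
def tanimoto(v1, v2):
    """Extended Jaccard similarity of two sparse vectors stored as dicts."""
    keys = set(v1) | set(v2)
    dot = sum(v1.get(k, 0.0) * v2.get(k, 0.0) for k in keys)
    norm1 = sum(x * x for x in v1.values())
    norm2 = sum(x * x for x in v2.values())
    denom = norm1 + norm2 - dot
    # Two empty vectors have nothing in common; treat that as similarity 0.
    return dot / denom if denom else 0.0

# Hypothetical bookmark data: 1.0 means the user saved that URL.
user_a = {"x.com": 1.0, "y.com": 1.0, "z.com": 1.0}
user_b = {"y.com": 1.0, "z.com": 1.0, "w.com": 1.0}

print(tanimoto(user_a, user_b))  # 2 shared out of 4 distinct -> 0.5
print(tanimoto(user_a, user_a))  # identical sets -> 1.0
```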

Reposted from: https://www.cnblogs.com/mahatma/archive/2010/12/20/1911831.html
