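This chapter is about making recommendations. The most widely used recommendation algorithms today are Collaborative Filtering, Cluster Models, Search-Based Methods, and Item-to-Item Collaborative Filtering. The first two work by finding similar customers; the last two work by finding similar items. The paper "Amazon.com Recommendations: Item-to-Item Collaborative Filtering" describes all four. This chapter focuses on Collaborative Filtering and Item-to-Item Collaborative Filtering.

Collaborative Filtering: search a large set of customers to find the small group whose tastes are similar to yours. The book's running example is movie reviews: each person rates some of the movies, and these ratings are used to find people with tastes similar to yours and to recommend movies you have not seen yet. The book uses this example to walk through the whole recommendation process.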
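The "recommend movies you haven't seen" step can be sketched in Ruby as follows. This is a minimal illustration, not the book's code: it assumes each other user's ratings are weighted by that user's similarity to you and then averaged, and it takes the similarity measure as a parameter so any measure can be plugged in. The method name `recommendations`, the toy users `A`, `B`, `C`, the movies `m1` to `m4`, and the fixed similarity scores are all hypothetical, chosen only to keep the example small.

```ruby
# Recommend movies `person` has not rated, scoring each one by the
# similarity-weighted average of the other users' ratings.
#   prefs      - Hash of user => { movie => rating }
#   similarity - any callable (prefs, user_a, user_b) -> Float, higher = more alike
# Returns [[score, movie], ...] sorted from best to worst.
def recommendations(prefs, person, similarity)
  weighted_totals = Hash.new(0.0) # movie => sum of similarity * rating
  similarity_sums = Hash.new(0.0) # movie => sum of similarities used

  prefs.each do |other, ratings|
    next if other == person
    sim = similarity.call(prefs, person, other)
    next if sim <= 0 # users with no similarity contribute nothing

    ratings.each do |movie, rating|
      next if prefs[person].key?(movie) # skip movies already rated
      weighted_totals[movie] += sim * rating
      similarity_sums[movie] += sim
    end
  end

  # Divide by the summed similarity so a movie is not favoured merely
  # because many users rated it; ties are broken by movie name.
  weighted_totals
    .map { |movie, total| [total / similarity_sums[movie], movie] }
    .sort_by { |score, movie| [-score, movie] }
end

# Hypothetical toy data and fixed similarity scores, for illustration only.
toy_prefs = {
  'A' => { 'm1' => 4.0, 'm2' => 2.0 },
  'B' => { 'm1' => 4.0, 'm3' => 5.0 },
  'C' => { 'm1' => 1.0, 'm3' => 1.0, 'm4' => 3.0 }
}
toy_similarity = { 'B' => 1.0, 'C' => 0.5 } # hypothetical similarity of each user to 'A'
similarity = ->(_prefs, _a, b) { toy_similarity.fetch(b, 0.0) }

# m3 scores (1.0 * 5.0 + 0.5 * 1.0) / 1.5, about 3.67; m4 scores 3.0.
p recommendations(toy_prefs, 'A', similarity)
```

Taking the similarity as a lambda keeps the ranking step independent of how "similar taste" is measured, so the same method works with whichever similarity score is defined for the real data.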
Preparing the data (the code in these notes is written in Ruby; see the original book for the Python implementation):
```ruby
# Each critic's name maps to a Hash of movie title => rating.
critics = {
  'Lisa Rose' => { 'Lady in the Water' => 2.5, 'Snakes on a Plane' => 3.5,
                   'Just My Luck' => 3.0, 'Superman Returns' => 3.5,
                   'You, Me and Dupree' => 2.5, 'The Night Listener' => 3.0 },
  'Gene Seymour' => { 'Lady in the Water' => 3.0, 'Snakes on a Plane' => 3.5,
                      'Just My Luck' => 1.5, 'Superman Returns' => 5.0,
                      'The Night Listener' => 3.0, 'You, Me and Dupree' => 3.5 },
  'Michael Phillips' => { 'Lady in the Water' => 2.5, 'Snakes on a Plane' => 3.0,
                          'Superman Returns' => 3.5, 'The Night Listener' => 4.0 },
  'Claudia Puig' => { 'Snakes on a Plane' => 3.5, 'Just My Luck' => 3.0,
                      'The Night Listener' => 4.5, 'Superman Returns' => 4.0,
                      'You, Me and Dupree' => 2.5 },
  'Mick LaSalle' => { 'Lady in the Water' => 3.0, 'Snakes on a Plane' => 4.0,
                      'Just My Luck' => 2.0, 'Superman Returns' => 3.0,
                      'The Night Listener' => 3.0, 'You, Me and Dupree' => 2.0 },
  'Jack Matthews' => { 'Lady in the Water' => 3.0, 'Snakes on a Plane' => 4.0,
                       'The Night Listener' => 3.0, 'Superman Returns' => 5.0,
                       'You, Me and Dupree' => 3.5 },
  'Toby' => { 'Snakes on a Plane' => 4.5, 'You, Me and Dupree' => 1.0,
              'Superman Returns' => 4.0 }
}
```
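Because the ratings are a Hash of Hashes, reading one rating takes two lookups, and a movie a critic has not rated simply comes back as `nil`. The short sketch below uses a two-critic subset of the data above; the helper `shared_movies` is a hypothetical name introduced here for illustration, not taken from the book.

```ruby
# A two-critic subset of the ratings above: critic name => { movie title => rating }.
critics = {
  'Lisa Rose' => { 'Lady in the Water' => 2.5, 'Snakes on a Plane' => 3.5,
                   'Just My Luck' => 3.0, 'Superman Returns' => 3.5,
                   'You, Me and Dupree' => 2.5, 'The Night Listener' => 3.0 },
  'Toby' => { 'Snakes on a Plane' => 4.5, 'You, Me and Dupree' => 1.0,
              'Superman Returns' => 4.0 }
}

# Movies both critics have rated: the intersection of their key sets.
# Array#& keeps the order of the first array.
def shared_movies(prefs, person1, person2)
  prefs[person1].keys & prefs[person2].keys
end

p critics['Lisa Rose']['Lady in the Water'] # => 2.5
p critics['Toby']['Just My Luck']           # => nil (Toby has not rated it)
p shared_movies(critics, 'Lisa Rose', 'Toby')
# => ["Snakes on a Plane", "Superman Returns", "You, Me and Dupree"]
```

Since critics rate different sets of movies, a similarity score between two people is usually computed only over the movies both of them have rated, which is what this intersection provides.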
Defining similarity:
Euclidean distance:
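A minimal Ruby sketch of a Euclidean distance similarity score. It assumes the distance d is computed only over movies both critics rated, and that d is turned into a similarity with 1 / (1 + d), so identical ratings score 1.0 and the score falls toward 0 as tastes diverge. Both choices are assumptions of this sketch rather than a quotation of the book's code, and the method name `sim_distance` is used here for illustration.

```ruby
# Euclidean-distance similarity between two critics (sketch).
# Only movies rated by both critics are compared. The distance
#   d = sqrt(sum over shared movies of (rating1 - rating2)**2)
# is small for similar tastes, so it is mapped to 1 / (1 + d):
# 1.0 means identical ratings on every shared movie, and the score
# falls toward 0 as tastes diverge. Adding 1 also avoids dividing by zero.
def sim_distance(prefs, person1, person2)
  shared = prefs[person1].keys & prefs[person2].keys
  return 0.0 if shared.empty? # nothing in common: treat as no similarity

  sum_of_squares = shared.sum { |movie| (prefs[person1][movie] - prefs[person2][movie])**2 }
  1.0 / (1 + Math.sqrt(sum_of_squares))
end

# Two critics copied from the data above.
critics = {
  'Lisa Rose' => { 'Lady in the Water' => 2.5, 'Snakes on a Plane' => 3.5,
                   'Just My Luck' => 3.0, 'Superman Returns' => 3.5,
                   'You, Me and Dupree' => 2.5, 'The Night Listener' => 3.0 },
  'Gene Seymour' => { 'Lady in the Water' => 3.0, 'Snakes on a Plane' => 3.5,
                      'Just My Luck' => 1.5, 'Superman Returns' => 5.0,
                      'The Night Listener' => 3.0, 'You, Me and Dupree' => 3.5 }
}

# Squared differences sum to 5.75, so the score is 1 / (1 + sqrt(5.75)), about 0.294.
puts sim_distance(critics, 'Lisa Rose', 'Gene Seymour')
```

Inverting the distance means that, like other similarity scores, a larger value means more alike, which is the direction a ranking or weighting step expects.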