Machine Learning: Designing a Recommendation System

Gssol

Article link: https://blog.csdn.net/Gssol/article/details/74276140

Collecting Preferences

First we need a way to represent different people and their preferences. In Python, a nested dictionary works well for this.

Example: the ratings several people gave to a handful of films.
Save the following code as recommendations.py:

critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
      'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
      'The Night Listener': 3.0},
     'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
      'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
      'You, Me and Dupree': 3.5},
     'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
      'Superman Returns': 3.5, 'The Night Listener': 4.0},
     'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
      'The Night Listener': 4.5, 'Superman Returns': 4.0,
      'You, Me and Dupree': 2.5},
     'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
      'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
      'You, Me and Dupree': 2.0},
     'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
      'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
     'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

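The dictionary above uses ratings from 1 to 5 to express how much each critic, including ourselves, likes a given film. Whatever form people's preferences take, positive or negative, we need a way to map them to numbers. For example:
- When recording whether someone bought an item, use 1 for bought and 0 for never bought.
- When tracking online shopping, use 0 for not bought, 1 for viewed, and 2 for bought.
In general, every user action is assigned a corresponding numeric score.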
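As a sketch of this idea, assuming a hypothetical log of (user, item, action) events and the 0/1/2 shopping scale above, the following Python 3 code turns raw actions into the same nested-dictionary shape as critics. ACTION_SCORES, build_prefs and the sample events are made up for illustration, and keeping the highest score when a user acts on the same item more than once is our own assumption, not something the scale itself dictates.

```python
# Hypothetical mapping from shopping actions to scores (0/1/2 scale above).
ACTION_SCORES = {'none': 0, 'viewed': 1, 'bought': 2}


def build_prefs(events):
    """Turn (user, item, action) tuples into {user: {item: score}}.

    Assumption: if a user acts on the same item several times, keep the
    highest score, so a later 'bought' outranks an earlier 'viewed'.
    """
    prefs = {}
    for user, item, action in events:
        score = ACTION_SCORES[action]
        user_prefs = prefs.setdefault(user, {})
        user_prefs[item] = max(score, user_prefs.get(item, 0))
    return prefs


# Made-up sample events, for illustration only.
events = [
    ('alice', 'book', 'viewed'),
    ('alice', 'book', 'bought'),
    ('alice', 'lamp', 'viewed'),
    ('bob', 'lamp', 'bought'),
]
prefs = build_prefs(events)
print(prefs)  # {'alice': {'book': 2, 'lamp': 1}, 'bob': {'lamp': 2}}
```

The result has the same {person: {item: score}} layout as critics, so the similarity functions discussed below can be applied to it unchanged.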

Open a terminal and use the Python interpreter to query and modify the data defined above:

Last login: Mon Jul  3 20:15:06 on ttys001
Shashas-MacBook-Pro:~ shasha$ cd desktop/recommended
Shashas-MacBook-Pro:recommended shasha$ python
Python 2.7.10 (default, Oct 23 2015, 19:19:21) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from recommendations import critics
>>> critics['Lisa Rose']['Lady in the Water']
2.5
>>> critics['Toby']['Snakes on a Plane']=4.5
>>> critics['Toby']
{'Snakes on a Plane': 4.5, 'Superman Returns': 4.0, 'You, Me and Dupree': 1.0}
>>>
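The session above reads and writes ratings by direct indexing. Below is a minimal Python 3.7+ sketch, assuming a trimmed copy of the critics data, of two other common queries. The first reads a rating that may not exist: plain indexing would raise KeyError, since Toby never rated 'Lady in the Water'. The second lists the films two critics have both rated, which is what the similarity measures below compare. The helper name shared_items and the 'New User' entry are hypothetical, for illustration only.

```python
# A trimmed copy of two entries from the critics dictionary above.
critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
             'Superman Returns': 4.0},
}

# Toby never rated this film: critics['Toby']['Lady in the Water'] would
# raise KeyError, while dict.get returns a default instead.
print(critics['Toby'].get('Lady in the Water'))       # None
print(critics['Toby'].get('Lady in the Water', 0.0))  # 0.0

# Adding a whole new (hypothetical) critic is a plain assignment.
critics['New User'] = {'Superman Returns': 4.0}


def shared_items(prefs, person1, person2):
    """Return the films rated by both person1 and person2 (hypothetical helper)."""
    return [item for item in prefs[person1] if item in prefs[person2]]


print(shared_items(critics, 'Lisa Rose', 'Toby'))
# ['Snakes on a Plane', 'Superman Returns', 'You, Me and Dupree']
```

The order of the shared films follows Lisa Rose's dictionary, which relies on the insertion-ordered dicts of Python 3.7 and later; under the Python 2.7 interpreter shown above the order may differ.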

Finding Similar Users

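After collecting everyone's preference data, we need a way to determine how similar people's tastes are. This is done with a similarity score; two common choices are Euclidean distance and Pearson correlation.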

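Euclidean Distance
- Treat each rated item as an axis, plot every critic who rated those items on the chart, and look at how close the critics are to one another.
- To compute the distance, take the difference along each axis, square it, add the squares together, and take the square root of the sum. In Python, pow(n,2) squares a number.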

>>> from math import sqrt
>>> sqrt(pow(5-4,2)+pow(4-1,2))
3.1622776601683795
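The interpreter line above hard-codes two axes. A minimal Python 3 sketch that applies the same formula to any number of axes; the function name euclidean_distance is our own, not part of any library:

```python
from math import sqrt


def euclidean_distance(p, q):
    """Euclidean distance between two equal-length sequences of ratings."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of axes")
    # Square the difference on each axis, sum the squares, take the root.
    return sqrt(sum(pow(a - b, 2) for a, b in zip(p, q)))


# The same two points as the interpreter example: (5, 4) and (4, 1).
print(euclidean_distance((5, 4), (4, 1)))  # 3.1622776601683795
```

On Python 3.8 and later, the standard library's math.dist(p, q) computes the same value.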

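The distance computed above is smaller when preferences are more similar. We want the opposite: a function that gives a larger value the more similar two people are. To get this, add 1 to the distance (so that a distance of 0 does not cause a division by zero) and take the reciprocal: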

>>> 1/(1+sqrt(pow(5-4,2)+pow(4-1,2)))
0.2402530733520421

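This transformation has the following effect:

The result always falls between 0 and 1, and the more similar two people are, the larger it is (exactly 1 when the distance is 0).
For example, suppose the computed distances are 0, 0.1, 1 and 10. Adding 1 and taking the reciprocal gives:
1/1, 1/1.1, 1/2, 1/11 (that is, 1, about 0.909, 0.5 and about 0.091)
As the distance grows, the final value shrinks.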
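Putting the two steps together, here is a minimal Python 3 sketch of a distance-based similarity score between two critics: compute the Euclidean distance over the films both have rated, then apply 1/(1+d). The name sim_distance and the choice to return 0 when two people share no films are our assumptions. Only a trimmed copy of three critics is included so that the block runs on its own.

```python
from math import sqrt

# A trimmed copy of three entries from the critics dictionary above.
critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'Superman Returns': 5.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
             'Superman Returns': 4.0},
}


def sim_distance(prefs, person1, person2):
    """Similarity in (0, 1] based on Euclidean distance over shared films.

    Assumption: returns 0.0 when the two people have no films in common.
    """
    shared = [item for item in prefs[person1] if item in prefs[person2]]
    if not shared:
        return 0.0
    sum_of_squares = sum(pow(prefs[person1][item] - prefs[person2][item], 2)
                         for item in shared)
    # Adding 1 keeps the denominator at least 1: identical ratings give 1.0
    # and a distance of 0 never causes a division by zero.
    return 1 / (1 + sqrt(sum_of_squares))


print(sim_distance(critics, 'Lisa Rose', 'Gene Seymour'))  # about 0.294
```

For Lisa Rose and Gene Seymour all six films are shared, the squared differences sum to 5.75, and the score is 1/(1 + sqrt(5.75)), roughly 0.294.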

Pearson Correlation