Recommender Systems: Parametric Collaborative Filtering

Latest recommended article published on 2023-09-05 10:03:08

-柚子皮-

Categories: Machine Learning Open Courses, Recommender Systems (Resys), Machine Learning. Tags: machine learning, recommender systems, andrew ng

Article link: https://blog.csdn.net/pipisorry/article/details/44850971

http://blog.csdn.net/pipisorry/article/details/44850971

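Study notes for the Machine Learning - Andrew NG courses. Machine Learning - XVI. Recommender Systems (Week 9)

Related references: content-based recommendation [Recommender Systems: Content-Based Recommendation] and the nonparametric collaborative filtering algorithm based on cosine similarity [Recommender Systems: Collaborative Filtering].

This post mainly covers the parametric collaborative filtering algorithm based on low rank matrix factorization (a name I have given it for now).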

皮皮blog

Collaborative Filtering

{Another form of the collaborative filtering algorithm (NG)}

Problem Formulation

The task is to look through the data, find all the movie ratings that are missing, and predict what the values at the question marks (?) should be.

Note: in this example there are, loosely speaking, 3 romance or romantic comedy movies and 2 action movies.

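Main Idea

Each item has some features, but we do not know their values (unlike the linear regression approach used in content-based recommendation). Each user j, through the parameter vector θj, tells us how much they like romantic or action movies. When neither θj nor the features are known, collaborative filtering can be used to infer both.

Optimization Algorithm

The idea is similar to the EM algorithm: start from randomly chosen values of θ, use them to learn the features of the different movies, then use those features to re-estimate θ, and keep alternating. To learn distinct features and parameters (i.e., perform symmetry breaking), θ must not be initialized to all zeros.

Collaborative filtering: by rating a few movies myself, I help the system learn better features, and these features can then be used by the system to make better movie predictions for everyone else. So there is a sense of collaboration in which every user helps the system learn better features for the common good. This is collaborative filtering.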

Collaborative Filtering Algorithm

Collaborative Filtering Optimization Objective

Sum only over the user-item pairs that have a rating. That is, in Eq. 1, for each user, sum over all movies that this user has rated; in Eq. 2, for each movie, sum over all users who have rated it.
The learned features are in Rn (feature x0 is not learned; it is dropped altogether rather than fixed at 1): previously we used the convention of a feature x0 = 1 that corresponds to an intercept term. With this formalism, where we are actually learning the features, we do away with x0, so the features x that we learn are in Rn.
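
The two objectives described above (one sum per user, one sum per movie) are not reproduced in the text. As a reference sketch, their standard form from the course can be written as follows, assuming r(i,j) = 1 when user j has rated movie i, y^(i,j) is that rating, n is the number of features, n_m and n_u are the numbers of movies and users, and λ is the regularization strength.

```latex
% Eq. 1: given the features x^{(1)},\dots,x^{(n_m)}, learn the user parameters
\min_{\theta^{(1)},\dots,\theta^{(n_u)}}
  \frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:\,r(i,j)=1}
    \left((\theta^{(j)})^{T}x^{(i)}-y^{(i,j)}\right)^{2}
  +\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta^{(j)}_{k}\right)^{2}

% Eq. 2: given the parameters \theta^{(1)},\dots,\theta^{(n_u)}, learn the features
\min_{x^{(1)},\dots,x^{(n_m)}}
  \frac{1}{2}\sum_{i=1}^{n_m}\sum_{j:\,r(i,j)=1}
    \left((\theta^{(j)})^{T}x^{(i)}-y^{(i,j)}\right)^{2}
  +\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x^{(i)}_{k}\right)^{2}

% Combined objective, minimized over all x and all \theta at once
J(x,\theta)=
  \frac{1}{2}\sum_{(i,j):\,r(i,j)=1}
    \left((\theta^{(j)})^{T}x^{(i)}-y^{(i,j)}\right)^{2}
  +\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x^{(i)}_{k}\right)^{2}
  +\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta^{(j)}_{k}\right)^{2}
```

The squared-error part is the same set of terms in all three expressions; Eq. 1 and Eq. 2 only group it differently, which is why the two can be merged into the single objective J.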

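The Collaborative Filtering Algorithm

{A more efficient algorithm: instead of going back and forth between the x's and the θ's, solve for the x's and the θ's simultaneously.}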
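
The simultaneous update can be sketched in plain Python as below. This is a minimal illustration rather than the course's own implementation: the function names (`cofi_cost`, `cofi_step`, `fit`), the toy ratings in `Y` and `R`, and the hyperparameters (`lam`, `alpha`, `iters`) are all assumptions made for the example. Rows are movies and columns are users.

```python
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cofi_cost(X, Theta, Y, R, lam):
    """Squared error over rated (movie, user) pairs plus L2 regularization."""
    err = sum(0.5 * (dot(X[i], Theta[j]) - Y[i][j]) ** 2
              for i in range(len(X)) for j in range(len(Theta)) if R[i][j])
    reg = sum(v * v for row in X + Theta for v in row)
    return err + 0.5 * lam * reg


def cofi_step(X, Theta, Y, R, lam, alpha):
    """One gradient-descent step updating every x^(i) and theta^(j) together."""
    grad_X = [[lam * v for v in row] for row in X]
    grad_Theta = [[lam * v for v in row] for row in Theta]
    for i, x in enumerate(X):
        for j, theta in enumerate(Theta):
            if R[i][j]:  # only rated pairs contribute to the error term
                e = dot(x, theta) - Y[i][j]
                for k in range(len(x)):
                    grad_X[i][k] += e * theta[k]
                    grad_Theta[j][k] += e * x[k]
    new_X = [[v - alpha * g for v, g in zip(row, g_row)]
             for row, g_row in zip(X, grad_X)]
    new_Theta = [[v - alpha * g for v, g in zip(row, g_row)]
                 for row, g_row in zip(Theta, grad_Theta)]
    return new_X, new_Theta


def fit(Y, R, n_features=2, lam=0.1, alpha=0.01, iters=2000, seed=0):
    """Small random initialization (symmetry breaking), then gradient descent."""
    rng = random.Random(seed)
    X = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in Y]
    Theta = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in Y[0]]
    for _ in range(iters):
        X, Theta = cofi_step(X, Theta, Y, R, lam, alpha)
    return X, Theta


# Hypothetical toy data: 5 movies x 4 users. R[i][j] = 1 means user j rated movie i;
# where R[i][j] = 0 the value in Y is only a placeholder and is ignored.
Y = [[5, 5, 0, 0],
     [5, 0, 0, 0],
     [0, 4, 0, 0],
     [0, 0, 5, 4],
     [0, 0, 5, 0]]
R = [[1, 1, 1, 1],
     [1, 0, 0, 1],
     [0, 1, 1, 0],
     [1, 1, 1, 1],
     [1, 1, 1, 0]]

X, Theta = fit(Y, R)
print("final cost:", cofi_cost(X, Theta, Y, R, 0.1))
print("prediction for user 0 on movie 2:", dot(X[2], Theta[0]))
```

Both gradients are computed from the same old values of X and Theta before either is changed, which is what makes the update simultaneous rather than alternating.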

[Machine Learning - Andrew NG courses]

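Vectorization: Low Rank Matrix Factorization

The collaborative filtering algorithm is also called low rank matrix factorization. The name comes from a linear-algebra property of the matrix X*θ': it is a low rank matrix.

Because users rate only a few items, the observed rating matrix is sparse and incomplete, whereas the prediction matrix X*θ' is low rank (low rank does not imply sparse). This is therefore the mathematical problem of recovering a low rank matrix from an incomplete data matrix, i.e. matrix completion. [矩阵分析与应用 (Matrix Analysis and Applications), Zhang]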

Vectorizing Collaborative Filtering
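
As a sketch of the vectorized form, assume the movie feature vectors are stacked as the rows of X and the user parameter vectors as the rows of Θ (n_m movies, n_u users, n features). All predicted ratings then come from a single matrix product:

```latex
X=\begin{bmatrix}(x^{(1)})^{T}\\ \vdots\\ (x^{(n_m)})^{T}\end{bmatrix}\in\mathbb{R}^{n_m\times n},
\qquad
\Theta=\begin{bmatrix}(\theta^{(1)})^{T}\\ \vdots\\ (\theta^{(n_u)})^{T}\end{bmatrix}\in\mathbb{R}^{n_u\times n}

% Entry (i,j) of the product is the predicted rating of movie i by user j
\left(X\Theta^{T}\right)_{ij}=(\theta^{(j)})^{T}x^{(i)},
\qquad
\operatorname{rank}\left(X\Theta^{T}\right)\le n
```

Since n is usually much smaller than n_m and n_u, the rank bound is what makes the prediction matrix low rank.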

Using the Learned Features to Find Related Movies
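
One common way to do this, as described in the course, is to treat two movies as related when the distance between their learned feature vectors is small. The sketch below assumes Euclidean distance; the function name `related_movies` and the feature values are made up for illustration.

```python
import math


def related_movies(X, i, k=2):
    """Return the indices of the k movies whose features are closest to movie i."""
    dists = [(math.dist(X[i], x), j) for j, x in enumerate(X) if j != i]
    return [j for _, j in sorted(dists)[:k]]


# Hypothetical learned features: movies 0 and 1 are romance-like, 2 and 3 action-like.
X_demo = [[0.9, 0.1],
          [1.0, 0.0],
          [0.1, 0.9],
          [0.0, 1.0]]

print(related_movies(X_demo, 0, k=1))
```

The learned features are not guaranteed to be interpretable, so the distance is only meaningful relative to the other movies in the same model.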

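Implementation Detail: Mean Normalization

For a user with no ratings at all, such as user 5 (Eve), the squared-error part of the cost function contains no terms for that user, so only the regularization term on θ5 remains. Minimizing the cost J then gives θ5 = 0, so every predicted rating for this user is 0 and no movie stands out as a high-scoring recommendation. That is not a useful result.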

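Item Mean Normalization

Subtract from every item its mean rating (the mean over all users who rated it; user-item pairs without a rating are excluded from the mean), and add the mean back after the parameters have been inferred.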
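
The normalization step can be sketched as follows. The helper names `mean_normalize` and `predict` and the sample ratings are assumptions for this example; rows are items and columns are users.

```python
def mean_normalize(Y, R):
    """Subtract each item's mean rating, computed over rated entries only."""
    mu, Y_norm = [], []
    for y_row, r_row in zip(Y, R):
        rated = [y for y, r in zip(y_row, r_row) if r]
        m = sum(rated) / len(rated) if rated else 0.0  # item with no ratings: mean 0
        mu.append(m)
        Y_norm.append([y - m if r else 0.0 for y, r in zip(y_row, r_row)])
    return Y_norm, mu


def predict(x, theta, mu_i):
    """Predicted rating: model output on normalized data plus the item mean."""
    return sum(a * b for a, b in zip(x, theta)) + mu_i


# Hypothetical ratings: the last user (column 4) has rated nothing.
Y_demo = [[5, 5, 0, 0, 0],
          [4, 0, 0, 0, 0]]
R_demo = [[1, 1, 1, 1, 0],
          [1, 0, 0, 1, 0]]

Y_norm, mu = mean_normalize(Y_demo, R_demo)
# A user with no ratings ends up with theta = 0, so the prediction is the item mean.
print(predict([1.0, 2.0], [0.0, 0.0], mu[0]))
```

The model is trained on `Y_norm` instead of the raw ratings, and the mean is added back only at prediction time.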

In my view this addresses the cold start problem: with item mean normalization, the item mean serves as the predicted rating for a newly added user, similar to combining the algorithm with a baseline predictor.

Normalizing the rows (one row per movie) to have mean zero handles a user with no ratings. If some movies have no ratings, you can instead normalize the columns (one column per user) to have mean zero. But if you really have a movie with no ratings, maybe you just should not recommend that movie to anyone (I find this rather unreasonable: it directly causes the first-rater problem, and such movies should still get some recommendations to increase novelty). So taking care of the case of a user who has not rated anything might be more important than taking care of the case of a movie that has not received a single rating.
