Recommender Systems - Implementational detail: Mean normalization

王彩旗 edwardwangcq.com

Categories: Artificial Intelligence # Machine Learning

Article link: https://blog.csdn.net/edward_wang1/article/details/114398658

Let's talk about one implementation detail of the collaborative filtering algorithm: mean normalization. It works as a pre-processing step for collaborative filtering and, depending on your data set, it can sometimes make the algorithm work a little better.

Motivation

To motivate the idea of mean normalization, let's consider the example in figure-1, where user Eve has not rated any movies. Let's say the number of features is $n=2$. We're going to learn a parameter vector $\theta ^{(5)}\in \mathbb{R}^{2}$ for user 5, Eve.

Look at the first term of the optimization objective in figure-1:

$\frac{1}{2}\sum _{(i,j):r(i,j)=1}((\theta ^{(j)})^{T}x^{(i)}-y^{(i,j)})^{2}$

Because Eve hasn't rated any movies, there are no movies $i$ for which $r(i,5)=1$. So the first term plays no role at all in determining $\theta ^{(5)}$.

The only term that affects $\theta ^{(5)}$ is the last term, the regularization on the parameters:

$\frac{\lambda }{2}\sum_{j=1}^{n_{u}} \sum_{k=1} ^{n}(\theta ^{(j)}_{k})^{2}$

In other words, we want to minimize the following:

$\frac{\lambda }{2}\left [ (\theta ^{(5)}_{1})^{2}+ (\theta ^{(5)}_{2})^{2}\right ]$

For $\lambda > 0$, this sum of squares is smallest when both components are zero, so you'll end up with $\theta ^{(5)}=\begin{bmatrix} 0\\ 0 \end{bmatrix}$.

Then, when we go to predict how user 5, Eve, would rate any movie, we have $(\theta ^{(5)})^{T}x^{(i)}= 0$ every time. We will predict that Eve rates every single movie zero stars. But some movies are clearly liked more than others, so predicting 0 stars across the board is not useful. It also leaves us with no good way of recommending any movies to her, because every movie gets the same predicted rating.
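
The argument above can be checked numerically. The following is a minimal sketch, not the course's implementation: it assumes plain gradient descent, and the values of `lam`, `alpha`, the starting point, and the feature vector `x` are all made up for illustration. Because Eve has no ratings, the only gradient acting on $\theta ^{(5)}$ comes from the regularization term.

```python
import numpy as np

lam = 1.0     # regularization strength (assumed value)
alpha = 0.1   # learning rate (assumed value)

# Arbitrary starting point for Eve's parameter vector theta^(5), with n = 2.
theta5 = np.array([0.8, -0.3])

for _ in range(200):
    # Eve rated nothing, so the squared-error term contributes no gradient.
    # Only the regularizer (lam / 2) * ||theta5||^2 remains; its gradient is lam * theta5.
    grad = lam * theta5
    theta5 = theta5 - alpha * grad

# Hypothetical learned feature vector x^(i) for some movie i.
x = np.array([0.9, 0.1])
prediction = theta5 @ x

print(theta5)      # both components are numerically zero
print(prediction)  # so the predicted rating is numerically zero as well
```

Whatever the starting point, each step shrinks $\theta ^{(5)}$ by the factor $1-\alpha\lambda$, so it converges to the zero vector and every prediction for Eve collapses to 0.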

Mean normalization

The idea of mean normalization lets us fix this problem.

As before, we group all the ratings into the matrix $Y$, as in figure-2. To perform mean normalization:

Compute the average rating of each movie and store it in a vector $\mu\in \mathbb{R}^{n_{m}}$ ($n_{m}$ is the number of movies).
For each movie, subtract its average rating from every rating in its row of $Y$, forming a new rating matrix, which we still call $Y$.
Note: this just normalizes each movie to have an average rating of 0. The question marks (missing ratings) are still question marks.
Take the new set of ratings in the new matrix $Y$ and use it with the collaborative filtering algorithm to learn the parameters $\theta ^{(j)}$ and features $x^{(i)}$.
Then, for user $j$ and movie $i$, predict the rating as $(\theta ^{(j)})^{T}x^{(i)}+\mu _{i}$, adding back the mean that was subtracted.
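
The normalization steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions: the matrix `Y` below is a made-up example (rows are movies, columns are users, `np.nan` stands for a question mark), chosen so that its row means match the $\mu$ used in this article, and `mean_normalize` is a hypothetical helper name.

```python
import numpy as np

# Assumed example ratings: 5 movies (rows) x 5 users (columns).
# np.nan marks a missing rating; the last column is a user with no ratings.
Y = np.array([
    [5.0,    5.0,    0.0,    0.0,    np.nan],
    [5.0,    np.nan, np.nan, 0.0,    np.nan],
    [np.nan, 4.0,    0.0,    np.nan, np.nan],
    [0.0,    0.0,    5.0,    4.0,    np.nan],
    [0.0,    0.0,    5.0,    0.0,    np.nan],
])

def mean_normalize(Y):
    """Subtract each movie's mean rating, computed over rated entries only."""
    R = ~np.isnan(Y)                        # R[i, j] is True where r(i, j) = 1
    counts = R.sum(axis=1)                  # number of ratings per movie
    sums = np.where(R, Y, 0.0).sum(axis=1)  # sum of the observed ratings per movie
    # Guard against movies with no ratings: their mean is left at 0.
    mu = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    Y_norm = Y - mu[:, None]                # missing entries stay NaN
    return Y_norm, mu

Y_norm, mu = mean_normalize(Y)
print(mu)
print(Y_norm)
```

Only the observed ratings enter the mean, so a missing entry is never treated as a rating of 0, and each row of `Y_norm` averages to zero over its observed entries.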

For user 5, Eve, the same argument as before still applies: Eve has not rated any movies, so the learned parameter is $\theta ^{(5)}=\begin{bmatrix} 0\\ 0 \end{bmatrix}$. So for any particular movie $i$, we're going to predict for Eve:

$(\theta ^{(5)})^{T}x^{(i)}+\mu _{i}=\mu _{i}$

So Eve's predicted ratings for all the movies are just the movie averages:

$\mu =\begin{bmatrix} 2.5\\ 2.5\\ 2\\ 2.25\\ 1.25 \end{bmatrix}$

This actually makes sense: if Eve hasn't rated any movies and we don't know anything about this new user, then the best we can do is predict, for each movie, the average rating that movie received.
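
As a quick check of this fallback behavior, here is a small sketch. The feature matrix `X` and parameter matrix `Theta` are random placeholders standing in for learned values (an assumption, not results from the article); only `mu` comes from the text. Setting Eve's row of `Theta` to zero mimics what regularization does for a user with no ratings.

```python
import numpy as np

mu = np.array([2.5, 2.5, 2.0, 2.25, 1.25])  # per-movie means from the text

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))      # placeholder movie features x^(i), one row per movie
Theta = rng.normal(size=(5, 2))  # placeholder user parameters theta^(j), one row per user
Theta[4] = 0.0                   # user 5 (Eve): no ratings, so theta^(5) = 0

# Predicted rating for movie i and user j: (theta^(j))^T x^(i) + mu_i
pred = X @ Theta.T + mu[:, None]

eve_pred = pred[:, 4]
print(eve_pred)  # equals mu: each movie's average rating
```

The other users' columns depend on the placeholder values, but Eve's column equals `mu` regardless of what `X` contains.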

Side notes:

You may also have some movies with no ratings, which is analogous to a user who hasn't rated anything. In that case, you can play with other versions of this algorithm where you normalize the columns, rather than the rows, to have mean zero. But this is less important, because if a movie really has no ratings, maybe you just shouldn't recommend it to anyone.
