Recommender Systems - Implementation detail: Mean normalization

Let's look at one implementation detail of the collaborative filtering algorithm: mean normalization. It works as a pre-processing step for collaborative filtering and, depending on your data set, it can sometimes make the algorithm work a little better.

Motivation

Figure-1

To motivate the idea of mean normalization, consider the example in figure-1, where user Eve has not rated any movies. Let's say the number of features is n=2, so we're going to learn a parameter vector \theta ^{(5)}\in \mathbb{R}^{2} for user 5, Eve.

If you look at the first term of the optimization objective in figure-1, as below:

\frac{1}{2}\sum _{(i,j):r(i,j)=1}((\theta ^{(j)})^{T}x^{(i)}-y^{(i,j)})^{2}

because user Eve hasn't rated any movies, there are no movies i for which r(i,5)=1. So the first term plays no role at all in determining \theta ^{(5)}.

The only term that affects \theta ^{(5)} is the last term, the regularization on the user parameters:

\frac{\lambda }{2}\sum_{j=1}^{n_{u}} \sum_{k=1} ^{n}(\theta ^{(j)}_{k})^{2}

In other words, for Eve we want to choose \theta ^{(5)} to minimize the following:

\frac{\lambda }{2}\left [ (\theta ^{(5)}_{1})^{2}+ (\theta ^{(5)}_{2})^{2}\right ]

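Since \lambda > 0 and both squared terms are non-negative, the minimum is reached when both components are zero, so you'll end up with \theta ^{(5)}=\begin{bmatrix} 0\\ 0 \end{bmatrix}.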

Then, when we go to predict how user 5, Eve, would rate any movie, we have (\theta ^{(5)})^{T}x^{(i)}= 0 every time. We end up predicting that Eve will give every single movie zero stars. But some people do like some movies, so predicting that Eve will rate everything 0 stars is not useful. Worse, if every predicted rating is 0, we have no way of ranking the movies and therefore no good way of recommending any of them to her.
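To make the argument above concrete, here is a minimal sketch in Python with NumPy. It assumes plain gradient descent; the regularization strength `lam`, the step size `alpha`, the random starting point and the movie features `X` are all hypothetical values chosen for illustration, not taken from the figures.

```python
import numpy as np

# Hypothetical values (not from the text): regularization strength,
# step size, and a random starting point for Eve's parameters.
lam, alpha = 1.0, 0.1
rng = np.random.default_rng(0)
theta5 = rng.normal(size=2)

# Eve has rated nothing, so the squared-error term contributes no gradient.
# The only gradient on theta5 comes from the regularization term:
#   d/dtheta [ (lam / 2) * ||theta||^2 ] = lam * theta
for _ in range(500):
    theta5 = theta5 - alpha * lam * theta5

# Hypothetical feature vectors x^(i) for three movies (n = 2 features each).
X = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5]])

# (theta5)^T x^(i) for every movie i
predictions = X @ theta5
print(theta5, predictions)
```

Each update shrinks `theta5` by a constant factor, so it decays towards the zero vector, and every predicted rating decays to zero with it, whatever the movie features are.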

Mean normalization

Figure-2

Mean normalization lets us fix this problem.

First, we still group all the ratings into the matrix Y, as in figure-2. Then, to perform mean normalization:

  1. Compute the average rating of each movie (over the users who rated it) and store the averages in a vector \mu\in \mathbb{R}^{n_{m}}, where n_{m} is the number of movies.
  2. Subtract from each row of Y the average rating for that movie, leaving the unrated entries as they are, and call the resulting matrix Y again.
    Note: this just normalizes each movie to have an average rating of 0. The question marks are still question marks.
  3. Use the new matrix Y with the collaborative filtering algorithm to learn the parameters \theta ^{(j)} and features x^{(i)}.
  4. For user j and movie i, predict the rating as (\theta ^{(j)})^{T}x^{(i)}+\mu _{i}, which adds back the mean that was subtracted.
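The steps above can be sketched in Python with NumPy as follows. This is only an illustration under stated assumptions: the function names `mean_normalize` and `predict` are hypothetical, unknown ratings are represented as NaN, and the example matrix is an assumption (figure-2 is not reproduced here), chosen so that its row means agree with the \mu quoted in this text. The collaborative filtering training step itself (step 3) is not shown.

```python
import numpy as np

def mean_normalize(Y, R):
    """Steps 1-2: subtract each movie's mean rating, using rated entries only.

    Y: (n_m, n_u) ratings; values where R is False are ignored.
    R: (n_m, n_u) indicator, True where user j has rated movie i.
    """
    R = R.astype(bool)
    counts = R.sum(axis=1)                    # number of ratings per movie
    sums = np.where(R, Y, 0.0).sum(axis=1)    # sum of ratings per movie
    # Assumption: a movie with no ratings gets mean 0 (avoids dividing by 0).
    mu = np.divide(sums, counts, out=np.zeros(Y.shape[0]), where=counts > 0)
    # Unrated entries are stored as 0 placeholders; R still marks them as
    # unknown, so they must stay excluded from the training objective.
    Y_norm = np.where(R, Y - mu[:, None], 0.0)
    return Y_norm, mu

def predict(Theta, X, mu):
    """Step 4: entry (i, j) is (theta^(j))^T x^(i) + mu_i."""
    return X @ Theta.T + mu[:, None]

# Illustrative ratings (assumed): rows are movies, columns are users,
# NaN marks an unknown rating. The last column is a user with no ratings.
nan = np.nan
Y = np.array([[5.0, 5.0, 0.0, 0.0, nan],
              [5.0, nan, nan, 0.0, nan],
              [nan, 4.0, 0.0, nan, nan],
              [0.0, 0.0, 5.0, 4.0, nan],
              [0.0, 0.0, 5.0, 0.0, nan]])
R = ~np.isnan(Y)

Y_norm, mu = mean_normalize(Y, R)
print(mu)
print(Y_norm)
```

Keeping the indicator matrix `R` separate from `Y_norm` is a design choice: the placeholder zeros in `Y_norm` are not ratings, and the learning algorithm should only sum over entries where `R` is true.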

For user 5, Eve, the earlier argument still applies: she has not rated any movies, so the learned parameter is still \theta ^{(5)}=\begin{bmatrix} 0\\ 0 \end{bmatrix}. For any particular movie i, we therefore predict for Eve:

(\theta ^{(5)})^{T}x^{(i)}+\mu _{i}=\mu _{i}

So Eve's predicted ratings for all the movies will be:

\mu =\begin{bmatrix} 2.5\\ 2.5\\ 2\\ 2.25\\ 1.25 \end{bmatrix}

This makes sense: if Eve hasn't rated any movies and we know nothing else about this new user, we simply predict, for each movie, the average rating that movie received.
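As a quick check of this fallback, the short sketch below plugs the \mu from the text and \theta ^{(5)}=0 into the prediction rule. The movie features `X` are random placeholders, an assumption standing in for whatever features the algorithm would actually learn.

```python
import numpy as np

# Per-movie means as quoted in the text.
mu = np.array([2.5, 2.5, 2.0, 2.25, 1.25])

# Eve's learned parameters: all zeros, because she has no ratings.
theta5 = np.zeros(2)

# Placeholder features x^(i) for the five movies (n = 2); an assumption.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))

# (theta5)^T x^(i) + mu_i for every movie i
eve_predictions = X @ theta5 + mu
print(eve_predictions)
```

Whatever values `X` takes, the first term vanishes and Eve's predictions equal the movie averages.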

Side notes:

If you have movies with no ratings, the situation is analogous to a user who hasn't rated anything. You can also experiment with other versions of this algorithm that normalize the different columns of Y to have mean zero. This is less important, though, because if a movie really has no ratings, maybe you just shouldn't recommend it to anyone.
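One possible reading of the column-wise variant is sketched below, again in Python with NumPy. The function name `mean_normalize_users` and the tiny example matrix are hypothetical. Here each column (user) is centered on that user's own average, so a prediction would add back \mu _{j} for user j rather than \mu _{i} for movie i.

```python
import numpy as np

def mean_normalize_users(Y, R):
    """Subtract each user's mean rating (column-wise), rated entries only.

    Y: (n_m, n_u) ratings; values where R is False are ignored.
    R: (n_m, n_u) indicator, True where user j has rated movie i.
    """
    R = R.astype(bool)
    counts = R.sum(axis=0)                    # number of ratings per user
    sums = np.where(R, Y, 0.0).sum(axis=0)    # sum of ratings per user
    # Assumption: a user with no ratings gets mean 0 (avoids dividing by 0).
    mu_user = np.divide(sums, counts, out=np.zeros(Y.shape[1]), where=counts > 0)
    # Unrated entries become 0 placeholders; R still marks them as unknown.
    Y_norm = np.where(R, Y - mu_user[None, :], 0.0)
    return Y_norm, mu_user

# Hypothetical example: 3 movies, 2 users; the third movie has no ratings.
Y = np.array([[5.0, 1.0],
              [3.0, np.nan],
              [np.nan, np.nan]])
R = ~np.isnan(Y)

Y_norm, mu_user = mean_normalize_users(Y, R)
print(mu_user)
print(Y_norm)
```

By the same reasoning used for Eve, a movie with no ratings would end up with an all-zero feature vector, so under this variant its predicted rating for user j would fall back to that user's average.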

<end>
