Recommender Systems - Collaborative filtering

Latest recommended article published on 2024-04-13 14:30:31

王彩旗 edwardwangcq.com

Categories: Artificial Intelligence # Machine Learning

Article link: https://blog.csdn.net/edward_wang1/article/details/113916787

In this class, we'll talk about an approach to building a recommender system called collaborative filtering. This algorithm has a very interesting property: it does what is called feature learning. By that I mean it can start to learn for itself what features to use.

Problem motivation

Previously, as shown in figure-1, we assumed that for each movie someone had come and told us how romantic that movie was and how much action it contained. But in practice it can be very difficult and time-consuming to get someone to watch each movie and give you this information. And often you'll need even more features than just these two. So where do you get these features from?

So, as in figure-2, let's change the problem a bit and suppose we don't know the values of these features. Instead, let's say we've gone through each of our users and each one has told us how much they like romantic movies and how much they like action-packed movies. For example, $\theta ^{(1)}=\begin{bmatrix} 0\\ 5\\ 0 \end{bmatrix}$ means Alice told us she really likes romantic movies, so there is a "5" associated with feature " $x_{1}$ ". And she really doesn't like action movies, so there's a "0" for " $x_{2}$ ". Similarly for the other users Bob, Carol, and Dave. In general, if we can go to our users and each user tells us the value of $\theta ^{(j)}$ for them, then we can infer the values of the features $x_{1}$ and $x_{2}$ for each movie.

For example, take the first movie, "Love at last", which is associated with feature vector $x^{(1)}$ . Let's ignore its title and pretend we don't know what this movie is about. Because:

Both Alice and Bob rated it 5 and they told us they like romantic movies
Both Carol and Dave rated it 0 and they like action movies and hate romantic movies

We might reasonably conclude that this is probably a romantic movie. Thus it's possible that $x^{(1)}_{1}=1.0$ and $x^{(1)}_{2}=0$ .

This example is mathematically simplified. What we're really asking is what the feature vector $x^{(1)}$ should be so that

$(\theta ^{(1)})^{T}x^{(1)}\approx 5$

Similarly:

$(\theta ^{(2)})^{T}x^{(1)}\approx 5$

$(\theta ^{(3)})^{T}x^{(1)}\approx 0$

$(\theta ^{(4)})^{T}x^{(1)}\approx 0$

From the above, we can infer that $x^{(1)}=\begin{bmatrix} 1\\ 1.0\\ 0 \end{bmatrix}$ , where the leading 1 is the intercept term $x^{(1)}_{0}=1$ .

Similarly, we can infer values for feature vectors of other movies.
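As a quick check of this inference, here is a minimal sketch that recovers $x^{(1)}$ by least squares. Only Alice's $\theta^{(1)}$ is given in the text. The vectors for Bob, Carol, and Dave are assumed here purely for illustration (Bob shares Alice's tastes, while Carol and Dave like action and dislike romance), and the variable names are hypothetical.

```python
import numpy as np

# Preference vectors [theta_0, theta_1 (romance), theta_2 (action)].
# Only Alice's vector comes from the text; the other three are assumptions.
Theta = np.array([
    [0.0, 5.0, 0.0],  # Alice (from the text)
    [0.0, 5.0, 0.0],  # Bob   (assumed)
    [0.0, 0.0, 5.0],  # Carol (assumed)
    [0.0, 0.0, 5.0],  # Dave  (assumed)
])

# Ratings of "Love at last": Alice and Bob gave 5, Carol and Dave gave 0.
y = np.array([5.0, 5.0, 0.0, 0.0])

# Fix the intercept feature x_0 = 1, then solve for x_1 and x_2
# in the least-squares sense so that Theta @ x is close to y.
rhs = y - Theta[:, 0] * 1.0
x12, *_ = np.linalg.lstsq(Theta[:, 1:], rhs, rcond=None)
x = np.concatenate(([1.0], x12))

print("x^(1)       =", x)
print("predictions =", Theta @ x)
```

Under these assumed preferences, the recovered vector matches the one inferred by hand, and the predicted ratings reproduce the four observed ones.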

Optimization algorithm

Let's say our users have given us their preferences as $\theta ^{(1)}, \theta ^{(2)},...,\theta ^{(n_{u})}$ . We can then pose an optimization problem to estimate $x^{(i)}$ , the feature vector for movie $i$ . We sum over all the indices $j$ for which we have a rating for movie $i$ , and choose the features $x^{(i)}$ to minimize the regularized cost function in figure-3. This is how we would learn the features for one specific movie.

To learn the features for all the movies, we sum over all $n_{m}$ movies and minimize the cost function in figure-4. You end up with a reasonable set of features for all the movies.
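Figure-3 is not reproduced in this text. Assuming it takes the usual regularized squared-error form that the paragraph describes, the objective for a single movie $i$ would look like the following, where the regularization parameter $\lambda$ and the number of features $n$ are symbols assumed here:

```latex
\min_{x^{(i)}} \; \frac{1}{2} \sum_{j : r(i,j)=1}
  \left( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right)^{2}
  + \frac{\lambda}{2} \sum_{k=1}^{n} \left( x^{(i)}_{k} \right)^{2}
```

The first sum runs only over users $j$ who have rated movie $i$ , and the second term penalizes large feature values.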
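One way to carry this out is sketched below. It assumes the overall objective is the per-movie regularized squared error summed over all movies, so that each movie can be solved independently. The closed-form ridge solve is a choice made here rather than the method from the lecture, where gradient descent would work as well. The intercept term is dropped for brevity, and `learn_features`, `lam`, and the toy data are hypothetical.

```python
import numpy as np

def learn_features(Y, R, Theta, lam):
    """Estimate movie features from known user preferences.

    Y:     (n_m, n_u) ratings, Y[i, j] is user j's rating of movie i
    R:     (n_m, n_u) indicator, R[i, j] == 1 where a rating exists
    Theta: (n_u, n) user preference vectors (no intercept term here)
    lam:   regularization strength; lam > 0 keeps every solve well-posed
    """
    n_m, n = Y.shape[0], Theta.shape[1]
    X = np.zeros((n_m, n))
    for i in range(n_m):
        rated = R[i] == 1                 # users who rated movie i
        T = Theta[rated]
        A = T.T @ T + lam * np.eye(n)     # regularized normal equations
        b = T.T @ Y[i, rated]
        X[i] = np.linalg.solve(A, b)
    return X

# Toy data (assumed): two romance fans, two action fans, two movies.
Theta = np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 5.0], [0.0, 5.0]])
Y = np.array([[5.0, 5.0, 0.0, 0.0],
              [0.0, 0.0, 5.0, 5.0]])
R = np.array([[1, 0, 1, 1],   # the second user has not rated the first movie
              [1, 1, 1, 1]])

X = learn_features(Y, R, Theta, lam=0.1)
print(X)
```

With this toy data the first movie comes out as almost purely "romance" and the second as almost purely "action", shrunk slightly toward zero by the regularization.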

Collaborative filtering

In the last class, we saw that if we have a set of movie ratings ( $r(i,j)$ , $y^{(i,j)}$ ), then given the features of the different movies ( $x^{(i)}$ , $i=1,..,n_{m}$ ), we can learn the parameters $\theta ^{(j)}, j=1,..,n_{u}$ , which are the preferences of the different users. What we've shown earlier in this class is that if users are willing to give us the parameters $\theta ^{(j)}, j=1,..,n_{u}$ , then we can estimate the features $x^{(i)}$ , $i=1,..,n_{m}$ for the different movies. So this is kind of a chicken-and-egg problem.

Then what we can do is:

1. Randomly guess some initial values of $\theta^{(j)}, j=1,2,...,n_{u}$
2. Use them to estimate the features $x^{(i)}, i=1,2,...,n_{m}$ for the different movies
3. Based on the $x^{(i)}$ from step 2, get even better $\theta ^{(j)}$
4. Based on the better $\theta ^{(j)}$ , get better $x^{(i)}$ again
5. Keep iterating, going back and forth between the two.
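The back-and-forth above can be sketched as an alternating minimization. This is only an illustration under stated assumptions: each half-step uses a closed-form ridge solve (a choice made here, not necessarily what the lecture uses), there is no intercept term, and the rating matrix, `ridge_rows`, `cost`, and `alternate` are all hypothetical.

```python
import numpy as np

def ridge_rows(Y, R, F, lam):
    """For each row i of Y, fit a vector v_i so that F[j] @ v_i is close to
    Y[i, j] for every j with R[i, j] == 1, with an L2 penalty lam on v_i."""
    n = F.shape[1]
    V = np.zeros((Y.shape[0], n))
    for i in range(Y.shape[0]):
        rated = R[i] == 1
        A = F[rated].T @ F[rated] + lam * np.eye(n)
        V[i] = np.linalg.solve(A, F[rated].T @ Y[i, rated])
    return V

def cost(Y, R, X, Theta, lam):
    """Regularized squared error over the rated entries only."""
    err = (X @ Theta.T - Y) * R
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * (np.sum(X ** 2) + np.sum(Theta ** 2))

def alternate(Y, R, n_features=2, lam=0.1, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly guess initial user parameters.
    Theta = rng.normal(scale=0.1, size=(Y.shape[1], n_features))
    history = []
    for _ in range(n_iters):
        X = ridge_rows(Y, R, Theta, lam)       # features from the current theta
        Theta = ridge_rows(Y.T, R.T, X, lam)   # theta from the new features
        history.append(cost(Y, R, X, Theta, lam))
    return X, Theta, history

# Hypothetical ratings: rows are movies, columns are users; R marks rated entries.
Y = np.array([[5.0, 5.0, 0.0, 0.0],
              [5.0, 0.0, 0.0, 0.0],
              [0.0, 4.0, 0.0, 0.0],
              [0.0, 0.0, 5.0, 4.0],
              [0.0, 0.0, 5.0, 0.0]])
R = np.array([[1, 1, 1, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 1, 1, 1],
              [1, 1, 1, 0]])

X, Theta, history = alternate(Y, R)
print("cost after first and last iteration:", history[0], history[-1])
```

Because each half-step exactly minimizes the joint cost over one set of variables while the other is held fixed, the recorded cost cannot increase from one iteration to the next.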

It turns out that this actually works, and it will cause your algorithm to converge to a reasonable set of features for your movies and a reasonable set of parameters for your users. This is a basic collaborative filtering algorithm. We'll be able to improve this later to make it quite a bit more computationally efficient. But hopefully this gives you a sense of how you can formulate a problem where you can simultaneously learn the parameters of different users and the features of different movies.

The term collaborative filtering refers to the observation that when you run this algorithm with a large set of users, all of these users are effectively collaborating to get better movie ratings for everyone. Because every user rates some subset of the movies, every user helps the algorithm a little bit to learn better features. By rating a few movies myself, I help the system learn better features, and the system can then use these features to make better movie predictions for everyone else. So there is a sense of collaboration in which every user helps the system learn better features for the common good. This is collaborative filtering.

Next, we'll try to develop an even better technique for collaborative filtering.

<end>