Recommender Systems - Content-based recommendations

Let's talk about the first approach to building a recommender system: content-based recommendations. Here we have features that capture the content of each movie, and we use those content features to make our predictions.

Content-based recommender systems

Figure-1

Suppose we have two features for each movie:

x_{1}: degree of romance

x_{2}: degree of action

And as usual, we define x_{0}=1

Then, each movie can be represented as a feature vector:

x^{(1)}=\begin{bmatrix} x_{0}\\ x_{1}\\ x_{2} \end{bmatrix}=\begin{bmatrix} 1\\ 0.9\\ 0 \end{bmatrix}

x^{(2)}=\begin{bmatrix} x_{0}\\ x_{1}\\ x_{2} \end{bmatrix}=\begin{bmatrix} 1\\ 1.0\\ 0.01 \end{bmatrix}

...

x^{(5)}=\begin{bmatrix} x_{0}\\ x_{1}\\ x_{2} \end{bmatrix}=\begin{bmatrix} 1\\ 0\\ 0.9 \end{bmatrix}

We can treat predicting the ratings of each user as a separate linear regression problem. For each user j, we learn a parameter vector \theta ^{(j)}\in \mathbb{R}^{n+1}=\mathbb{R}^{3}, and predict user j's rating of movie i as (\theta ^{(j)})^{T}x^{(i)}.

For example, suppose we have somehow already obtained a parameter vector for Alice, \theta ^{(1)}=\begin{bmatrix} 0\\ 5\\ 0 \end{bmatrix}. Then our prediction for the movie "Cute puppies of love", with x^{(3)}=\begin{bmatrix} 1\\ 0.99\\ 0 \end{bmatrix}, is (\theta ^{(1)})^{T}x^{(3)}=\begin{bmatrix} 0 & 5 & 0 \end{bmatrix}\begin{bmatrix} 1\\ 0.99\\ 0 \end{bmatrix}=0\times 1+5\times 0.99+0\times 0=4.95.
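This prediction can be checked numerically. A minimal sketch with NumPy, using the feature values from the example above:

```python
import numpy as np

# Feature vector for "Cute puppies of love": x_0 = 1 (bias),
# x_1 = degree of romance, x_2 = degree of action.
x3 = np.array([1.0, 0.99, 0.0])

# Parameter vector for Alice (user 1), assumed already learned.
theta1 = np.array([0.0, 5.0, 0.0])

# Predicted rating: (theta^(1))^T x^(3)
prediction = theta1 @ x3
print(prediction)  # close to 4.95
```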

 

Problem formulation

Figure-2

In figure-2, we've defined some notation. At the bottom, it gives the cost function that we minimize over \theta ^{(j)} for each user j. This is exactly the regularized cost function for linear regression, except that the constant \frac{1}{m^{(j)}} is dropped, where m^{(j)} is the number of movies rated by user j.
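Figure-2 itself is not reproduced here, but based on the description, the per-user objective takes the standard regularized linear-regression form, where r(i,j)=1 means user j has rated movie i and y^{(i,j)} is that rating:

\min_{\theta ^{(j)}}\ \frac{1}{2}\sum_{i:r(i,j)=1}\left((\theta ^{(j)})^{T}x^{(i)}-y^{(i,j)}\right)^{2}+\frac{\lambda }{2}\sum_{k=1}^{n}\left(\theta _{k}^{(j)}\right)^{2}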

Optimization objective

Figure-3

In building a recommender system, we don't just want to learn parameters for a single user; we want to learn parameters for all of our users. Minimizing the above cost function over all users gives a separate parameter vector for each user, \theta ^{(1)}, \theta ^{(2)},..., \theta ^{(n_{u})}. Then we can use these to make predictions for all of our users.
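Following the same notation, the combined objective sums the per-user cost over all n_{u} users:

J(\theta ^{(1)},...,\theta ^{(n_{u})})=\frac{1}{2}\sum_{j=1}^{n_{u}}\sum_{i:r(i,j)=1}\left((\theta ^{(j)})^{T}x^{(i)}-y^{(i,j)}\right)^{2}+\frac{\lambda }{2}\sum_{j=1}^{n_{u}}\sum_{k=1}^{n}\left(\theta _{k}^{(j)}\right)^{2}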

Get \theta ^{(j)} with GD

 

Figure-4

In figure-4, the upper part recaps the cost function over all users. At the bottom, it gives the gradient descent update. It is the same as what we studied for linear regression; the only difference is the \frac{1}{m^{(j)}} term that was dropped earlier in figure-2. The term in parentheses is the partial derivative \frac{\partial }{\partial \theta ^{(j)}_{k}}J(\theta ^{(1)},...,\theta ^{(n_{u})}).
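As a concrete sketch of these updates in NumPy, using hypothetical toy ratings (the numbers below are made up for illustration): Y holds ratings (movies × users), R[i, j] = 1 iff user j rated movie i, and the bias term \theta _{0} is left unregularized, matching the update rule in figure-4.

```python
import numpy as np

# Toy data: 5 movies x 3 features (x_0 = 1, romance, action),
# 4 users. Y and R are hypothetical.
X = np.array([[1.0, 0.9,  0.0 ],
              [1.0, 1.0,  0.01],
              [1.0, 0.99, 0.0 ],
              [1.0, 0.1,  1.0 ],
              [1.0, 0.0,  0.9 ]])
Y = np.array([[5, 5, 0, 0],
              [5, 0, 0, 0],
              [0, 4, 0, 0],
              [0, 0, 5, 4],
              [0, 0, 5, 0]], dtype=float)
R = np.array([[1, 1, 1, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 1],
              [1, 1, 1, 1],
              [1, 1, 1, 0]], dtype=float)

lam, alpha = 0.1, 0.01
n_users, n_features = Y.shape[1], X.shape[1]
Theta = np.zeros((n_users, n_features))  # one parameter vector per user

for _ in range(5000):
    # Error only on rated entries: ((theta^(j))^T x^(i) - y^(i,j)) * r(i,j)
    err = (X @ Theta.T - Y) * R            # movies x users
    grad = err.T @ X                       # users x features
    grad[:, 1:] += lam * Theta[:, 1:]      # do not regularize theta_0
    Theta -= alpha * grad                  # simultaneous update for all users

preds = X @ Theta.T                        # predicted ratings, movies x users
```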

And using these formulas for the derivatives, you can also plug them into a more advanced optimization algorithm such as conjugate gradient or L-BFGS to minimize the cost function.
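For instance, SciPy's `minimize` accepts a function that returns both the cost and its gradient. A sketch for a single user j, with hypothetical features and ratings:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data for one user: rows of X_j are feature vectors of
# the movies that user rated, y_j the corresponding ratings.
X_j = np.array([[1.0, 0.9, 0.0],
                [1.0, 1.0, 0.01],
                [1.0, 0.0, 0.9]])
y_j = np.array([5.0, 5.0, 0.0])
lam = 0.1

def cost_and_grad(theta):
    err = X_j @ theta - y_j
    cost = 0.5 * err @ err + 0.5 * lam * (theta[1:] @ theta[1:])
    grad = X_j.T @ err
    grad[1:] += lam * theta[1:]            # theta_0 is not regularized
    return cost, grad

# jac=True tells SciPy the function returns (cost, gradient) together.
res = minimize(cost_and_grad, np.zeros(3), jac=True, method='L-BFGS-B')
theta_j = res.x
```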

But for many movies, we don't actually have such features, or it may be very difficult to obtain such features for all of our movies, or for whatever items we're trying to sell. So next we'll start to talk about an approach that is not content based.

<end>

 

 
