Recommender Systems - Content-based recommendations

Let's talk about the first approach to building a recommender system: content-based recommendations. Here we have features that capture the content of each movie, and we use those content features to make our predictions.

Content-based recommender systems

Figure-1

Suppose we have two features for each movie:

x_{1}: degree of romance

x_{2}: degree of action

And as usual, we define x_{0}=1

Then, each movie can be represented as a feature vector:

x^{(1)}=\begin{bmatrix} x_{0}\\ x_{1}\\ x_{2} \end{bmatrix}=\begin{bmatrix} 1\\ 0.9\\ 0 \end{bmatrix}

x^{(2)}=\begin{bmatrix} x_{0}\\ x_{1}\\ x_{2} \end{bmatrix}=\begin{bmatrix} 1\\ 1.0\\ 0.01 \end{bmatrix}

...

x^{(5)}=\begin{bmatrix} x_{0}\\ x_{1}\\ x_{2} \end{bmatrix}=\begin{bmatrix} 1\\ 0\\ 0.9 \end{bmatrix}

We can treat predicting the ratings of each user as a separate linear regression problem. For each user j, we learn a parameter vector \theta ^{(j)}\in \mathbb{R}^{n+1}=\mathbb{R}^{3}, and predict user j's rating of movie i as (\theta ^{(j)})^{T}x^{(i)}.

For example, suppose we have somehow already obtained a parameter vector for Alice, \theta ^{(1)}=\begin{bmatrix} 0\\ 5\\ 0 \end{bmatrix}. Then our prediction for the movie "Cute puppies of love", with x^{(3)}=\begin{bmatrix} 1\\ 0.99\\ 0 \end{bmatrix}, is (\theta ^{(1)})^{T}x^{(3)}=\begin{bmatrix} 0 & 5 & 0 \end{bmatrix}\begin{bmatrix} 1\\ 0.99\\ 0 \end{bmatrix}=0\times 1+5\times 0.99+0\times 0=4.95.
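This prediction can be checked numerically. A minimal sketch with NumPy, using the feature values from the example above:

```python
import numpy as np

# Feature vector for "Cute puppies of love": x_0 = 1 (bias),
# x_1 = degree of romance, x_2 = degree of action.
x3 = np.array([1.0, 0.99, 0.0])

# Parameter vector for Alice (user 1), assumed already learned.
theta1 = np.array([0.0, 5.0, 0.0])

# Predicted rating: (theta^(1))^T x^(3)
prediction = theta1 @ x3
print(prediction)  # close to 4.95
```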

 

Problem formulation

Figure-2

In figure-2, we've defined some notation. At the bottom, it gives the cost function that we minimize over \theta ^{(j)} for each user j. This is exactly the regularized cost function for linear regression, except that the constant \frac{1}{m^{(j)}} is dropped, where m^{(j)} is the number of movies rated by user j.
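Figure-2 itself is not reproduced here, but based on the description, the per-user objective takes the standard regularized linear-regression form, where r(i,j)=1 means user j has rated movie i and y^{(i,j)} is that rating:

\min_{\theta ^{(j)}}\ \frac{1}{2}\sum_{i:r(i,j)=1}\left((\theta ^{(j)})^{T}x^{(i)}-y^{(i,j)}\right)^{2}+\frac{\lambda }{2}\sum_{k=1}^{n}\left(\theta _{k}^{(j)}\right)^{2}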

Optimization objective

Figure-3

In building a recommender system, we don't just want to learn parameters for a single user; we want to learn parameters for all of our users. Minimizing the above cost function over all users gives a separate parameter vector for each user, \theta ^{(1)}, \theta ^{(2)},..., \theta ^{(n_{u})}. Then we can use these to make predictions for all of our users.
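Following the same notation, the combined objective sums the per-user cost over all n_{u} users:

J(\theta ^{(1)},...,\theta ^{(n_{u})})=\frac{1}{2}\sum_{j=1}^{n_{u}}\sum_{i:r(i,j)=1}\left((\theta ^{(j)})^{T}x^{(i)}-y^{(i,j)}\right)^{2}+\frac{\lambda }{2}\sum_{j=1}^{n_{u}}\sum_{k=1}^{n}\left(\theta _{k}^{(j)}\right)^{2}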

Get \theta ^{(j)} with GD

 

Figure-4

In figure-4, the upper part recaps the cost function over all users. At the bottom, it gives the gradient descent update. It is the same as what we studied for linear regression; the only difference is the \frac{1}{m^{(j)}} term that was dropped earlier in figure-2. The term in parentheses is the partial derivative \frac{\partial }{\partial \theta ^{(j)}_{k}}J(\theta ^{(1)},...,\theta ^{(n_{u})}).
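As a concrete sketch of these updates in NumPy, using hypothetical toy ratings (the numbers below are made up for illustration): Y holds ratings (movies × users), R[i, j] = 1 iff user j rated movie i, and the bias term \theta _{0} is left unregularized, matching the update rule in figure-4.

```python
import numpy as np

# Toy data: 5 movies x 3 features (x_0 = 1, romance, action),
# 4 users. Y and R are hypothetical.
X = np.array([[1.0, 0.9,  0.0 ],
              [1.0, 1.0,  0.01],
              [1.0, 0.99, 0.0 ],
              [1.0, 0.1,  1.0 ],
              [1.0, 0.0,  0.9 ]])
Y = np.array([[5, 5, 0, 0],
              [5, 0, 0, 0],
              [0, 4, 0, 0],
              [0, 0, 5, 4],
              [0, 0, 5, 0]], dtype=float)
R = np.array([[1, 1, 1, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 1],
              [1, 1, 1, 1],
              [1, 1, 1, 0]], dtype=float)

lam, alpha = 0.1, 0.01
n_users, n_features = Y.shape[1], X.shape[1]
Theta = np.zeros((n_users, n_features))  # one parameter vector per user

for _ in range(5000):
    # Error only on rated entries: ((theta^(j))^T x^(i) - y^(i,j)) * r(i,j)
    err = (X @ Theta.T - Y) * R            # movies x users
    grad = err.T @ X                       # users x features
    grad[:, 1:] += lam * Theta[:, 1:]      # do not regularize theta_0
    Theta -= alpha * grad                  # simultaneous update for all users

preds = X @ Theta.T                        # predicted ratings, movies x users
```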

And using these formulas for the derivatives, you can also plug them into a more advanced optimization algorithm such as conjugate gradient or L-BFGS to minimize the cost function.
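For instance, SciPy's `minimize` accepts a function that returns both the cost and its gradient. A sketch for a single user j, with hypothetical features and ratings:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data for one user: rows of X_j are feature vectors of
# the movies that user rated, y_j the corresponding ratings.
X_j = np.array([[1.0, 0.9, 0.0],
                [1.0, 1.0, 0.01],
                [1.0, 0.0, 0.9]])
y_j = np.array([5.0, 5.0, 0.0])
lam = 0.1

def cost_and_grad(theta):
    err = X_j @ theta - y_j
    cost = 0.5 * err @ err + 0.5 * lam * (theta[1:] @ theta[1:])
    grad = X_j.T @ err
    grad[1:] += lam * theta[1:]            # theta_0 is not regularized
    return cost, grad

# jac=True tells SciPy the function returns (cost, gradient) together.
res = minimize(cost_and_grad, np.zeros(3), jac=True, method='L-BFGS-B')
theta_j = res.x
```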

But for many movies, we don't actually have such features, or it may be very difficult to obtain such features for all of our movies, or for whatever items we're trying to sell. So next we'll start to talk about an approach that is not content based.

<end>

 

 
