Recommender Systems

Recommender Systems: Predicting Movie Ratings

Example: predicting movie ratings

We are given the following information:

| Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) |
|----------------------|-----------|---------|-----------|----------|
| Love at last         | 5         | 5       | 0         | 0        |
| Romance forever      | 5         | ?       | ?         | 0        |
| Cute puppies of love | ?         | 4       | 0         | ?        |
| Nonstop car chases   | 0         | 0       | 5         | 4        |
| Swords vs. karate    | 0         | 0       | 5         | ?        |


Definitions

nu = number of users

nm = number of movies

r(i, j) = 1 if user j has rated movie i

y(i, j) = the rating (0-5) that user j gave movie i, defined only when r(i, j) = 1

Goal: predict the "?" values (ratings that users have not yet given).
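These definitions can be encoded directly. A minimal NumPy sketch (the names `Y` and `R` are my own, anticipating the matrix notation used later; "?" entries are stored as 0 and masked out via `R`):

```python
import numpy as np

# Ratings table: rows = movies, columns = users (Alice..Dave).
Y = np.array([
    [5, 5, 0, 0],   # Love at last
    [5, 0, 0, 0],   # Romance forever (Bob, Carol: ?)
    [0, 4, 0, 0],   # Cute puppies of love (Alice, Dave: ?)
    [0, 0, 5, 4],   # Nonstop car chases
    [0, 0, 5, 0],   # Swords vs. karate (Dave: ?)
], dtype=float)

# R[i, j] = 1 exactly when r(i, j) = 1, i.e. user j rated movie i.
R = np.array([
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
])

n_m, n_u = Y.shape   # number of movies, number of users
```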

Now suppose each movie has two features:

| Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | x1 (romance) | x2 (action) |
|----------------------|-----------|---------|-----------|----------|--------------|-------------|
| Love at last         | 5         | 5       | 0         | 0        | 0.9          | 0           |
| Romance forever      | 5         | ?       | ?         | 0        | 1.0          | 0.01        |
| Cute puppies of love | ?         | 4       | 0         | ?        | 0.99         | 0           |
| Nonstop car chases   | 0         | 0       | 5         | 4        | 0.1          | 1.0         |
| Swords vs. karate    | 0         | 0       | 5         | ?        | 0            | 0.9         |

This gives a training set of features for each movie; for example, the feature vector for the movie Love at last is

\[{x^{\left( 1 \right)}} = \left[ {\begin{array}{*{20}{c}}
{{x_0}}\\
{{x_1}}\\
{{x_2}}
\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}
1\\
{0.9}\\
0
\end{array}} \right]\]

For each user j, learn a corresponding parameter vector θ(j) ∈ R3, then predict user j's rating of movie i as (θ(j))Tx(i).

Let m(j) denote the number of movies that user j has rated; the learning objective is then

\[\underbrace {\min }_{{\theta ^{\left( j \right)}}}\frac{1}{{2{m^{\left( j \right)}}}}\sum\limits_{i:{r^{\left( {i,j} \right)}} = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)}^2}}  + \frac{\lambda }{{2{m^{\left( j \right)}}}}\sum\limits_{k = 1}^n {{{\left( {\theta _k^{\left( j \right)}} \right)}^2}} \]

In recommender systems the m(j) term is dropped: it is a constant, so removing it does not change the minimizing θ(j).

\[\underbrace {\min }_{{\theta ^{\left( j \right)}}}\frac{1}{2}\sum\limits_{i:{r^{\left( {i,j} \right)}} = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)}^2}}  + \frac{\lambda }{2}\sum\limits_{k = 1}^n {{{\left( {\theta _k^{\left( j \right)}} \right)}^2}} \]

The learning objective over all users is

\[\underbrace {\min }_{{\theta ^{\left( 1 \right)}},...,{\theta ^{\left( {{n_u}} \right)}}}\frac{1}{2}\sum\limits_{j = 1}^{{n_u}} {\left[ {\sum\limits_{i:{r^{\left( {i,j} \right)}} = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)}^2}}  + \lambda \sum\limits_{k = 1}^n {{{\left( {\theta _k^{\left( j \right)}} \right)}^2}} } \right]} \]

Then run gradient descent to obtain the optimal θ:

\[\begin{array}{ll}
\theta_k^{\left( j \right)} := \theta_k^{\left( j \right)} - \alpha \sum\limits_{i:r\left( {i,j} \right) = 1} {\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}{x^{\left( i \right)}} - {y^{\left( {i,j} \right)}}} \right)x_k^{\left( i \right)}} & \left( \text{for } k = 0 \right)\\
\theta_k^{\left( j \right)} := \theta_k^{\left( j \right)} - \alpha \left( {\sum\limits_{i:r\left( {i,j} \right) = 1} {\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}{x^{\left( i \right)}} - {y^{\left( {i,j} \right)}}} \right)x_k^{\left( i \right)}}  + \lambda \theta_k^{\left( j \right)}} \right) & \left( \text{for } k \ne 0 \right)
\end{array}\]
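The update rule above can be sketched for a single user. This is my own NumPy illustration (function name and hyperparameters `lam`, `alpha`, `iters` are assumptions, not from the notes), fitting Alice's ratings against the feature table above:

```python
import numpy as np

def fit_user_theta(X, y, rated, lam=0.1, alpha=0.01, iters=5000):
    """Gradient descent on the per-user objective above.
    X: (n_m, n+1) movie features with intercept column x0 = 1,
    y: (n_m,) this user's ratings, rated: boolean mask (r(i, j) = 1)."""
    theta = np.zeros(X.shape[1])
    Xr, yr = X[rated], y[rated]          # sum only over i with r(i, j) = 1
    for _ in range(iters):
        err = Xr @ theta - yr            # (theta^(j))^T x^(i) - y^(i, j)
        grad = Xr.T @ err
        grad[1:] += lam * theta[1:]      # theta_0 (k = 0) is not regularized
        theta -= alpha * grad
    return theta

# Alice's column of the table, with the (romance, action) features above.
X = np.array([[1, 0.9,  0.0 ],
              [1, 1.0,  0.01],
              [1, 0.99, 0.0 ],
              [1, 0.1,  1.0 ],
              [1, 0.0,  0.9 ]])
y = np.array([5.0, 5.0, 0.0, 0.0, 0.0])
rated = np.array([True, True, False, True, True])   # "?" for movie 3
theta = fit_user_theta(X, y, rated)
pred = theta @ X[2]   # Alice's predicted rating for "Cute puppies of love"
```

The romance weight `theta[1]` comes out much larger than the action weight `theta[2]`, and the predicted rating for the unrated romance movie lands near 5, as the table would suggest.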


Recommender Systems: Collaborative Filtering

For the data

| Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | x1 (romance) | x2 (action) |
|----------------------|-----------|---------|-----------|----------|--------------|-------------|
| Love at last         | 5         | 5       | 0         | 0        | 0.9          | 0           |
| Romance forever      | 5         | ?       | ?         | 0        | 1.0          | 0.01        |
| Cute puppies of love | ?         | 4       | 0         | ?        | 0.99         | 0           |
| Nonstop car chases   | 0         | 0       | 5         | 4        | 0.1          | 1.0         |
| Swords vs. karate    | 0         | 0       | 5         | ?        | 0            | 0.9         |

In practice it is usually hard to know how "romantic" a movie is (say, a romance value of 0.9) or how "action-packed" it is, so the data actually looks like this:

| Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | x1 (romance) | x2 (action) |
|----------------------|-----------|---------|-----------|----------|--------------|-------------|
| Love at last         | 5         | 5       | 0         | 0        | ?            | ?           |
| Romance forever      | 5         | ?       | ?         | 0        | ?            | ?           |
| Cute puppies of love | ?         | 4       | 0         | ?        | ?            | ?           |
| Nonstop car chases   | 0         | 0       | 5         | 4        | ?            | ?           |
| Swords vs. karate    | 0         | 0       | 5         | ?        | ?            | ?           |

However, we can ask each user how much they like "romance" movies and how much they like "action" movies. This gives us data such as

\[{\theta ^{\left( 1 \right)}} = \left[ {\begin{array}{*{20}{c}}
0\\
5\\
0
\end{array}} \right],{\theta ^{\left( 2 \right)}} = \left[ {\begin{array}{*{20}{c}}
0\\
5\\
0
\end{array}} \right],{\theta ^{\left( 3 \right)}} = \left[ {\begin{array}{*{20}{c}}
0\\
0\\
5
\end{array}} \right],{\theta ^{\left( 4 \right)}} = \left[ {\begin{array}{*{20}{c}}
0\\
0\\
5
\end{array}} \right]\]

Analysis: for the movie "Love at last", Alice and Bob like it while Carol and Dave do not; Alice and Bob both like "romance" movies, and Carol and Dave both dislike them. We can therefore infer that this movie is a "romance" movie, not an "action" movie, i.e. (x1 = 1.0, x2 = 0.0).

Expressed as formulas:

\[\begin{array}{l}
{\left( {{\theta ^{\left( 1 \right)}}} \right)^T}{x^{\left( 1 \right)}} \approx 5\\
{\left( {{\theta ^{\left( 2 \right)}}} \right)^T}{x^{\left( 1 \right)}} \approx 5\\
{\left( {{\theta ^{\left( 3 \right)}}} \right)^T}{x^{\left( 1 \right)}} \approx 0\\
{\left( {{\theta ^{\left( 4 \right)}}} \right)^T}{x^{\left( 1 \right)}} \approx 0
\end{array}\]

So, given the θ vectors, we obtain

\[{x^{\left( 1 \right)}} = \left[ {\begin{array}{*{20}{c}}
1\\
{1.0}\\
{0.0}
\end{array}} \right]\]
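This inference is just a least-squares solve of the four approximate equations above. A quick NumPy check (the intercept component x0 = 1 is held fixed and only the two features are solved for):

```python
import numpy as np

# theta^(1)..theta^(4) as rows: (intercept, romance, action).
Theta = np.array([[0, 5, 0],
                  [0, 5, 0],
                  [0, 0, 5],
                  [0, 0, 5]], dtype=float)
y1 = np.array([5, 5, 0, 0], dtype=float)   # everyone rated "Love at last"

# Solve Theta[:, 1:] @ x ≈ y1 in the least-squares sense.
x_feat, *_ = np.linalg.lstsq(Theta[:, 1:], y1, rcond=None)
x1 = np.concatenate(([1.0], x_feat))
# x1 ≈ [1, 1, 0], matching the feature vector above.
```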

-------------------------------------------------------------------------

Now the problem is reversed: given θ(1), ..., θ(nu), learn x(i).

The problem becomes

\[\underbrace {\min }_{{x^{\left( i \right)}}}\frac{1}{2}\sum\limits_{j:r\left( {i,j} \right) = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}{x^{\left( i \right)}} - {y^{\left( {i,j} \right)}}} \right)}^2}}  + \frac{\lambda }{2}\sum\limits_{k = 1}^n {{{\left( {x_k^{\left( i \right)}} \right)}^2}} \]

For all of x(1), ..., x(nm), the problem is

\[\underbrace {\min }_{{x^{\left( 1 \right)}},...,{x^{\left( {{n_m}} \right)}}}\frac{1}{2}\sum\limits_{i = 1}^{{n_m}} {\left[ {\sum\limits_{j:r\left( {i,j} \right) = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}{x^{\left( i \right)}} - {y^{\left( {i,j} \right)}}} \right)}^2}}  + \lambda \sum\limits_{k = 1}^n {{{\left( {x_k^{\left( i \right)}} \right)}^2}} } \right]} \]



So, given x(1), ..., x(nm) and the movie ratings, we can estimate θ(1), ..., θ(nu).

Given θ(1), ..., θ(nu) and the movie ratings, we can estimate x(1), ..., x(nm).

One approach is therefore to start from a random guess and alternate:

θ --> x --> θ --> x --> θ --> x --> ...

-------------------------------------------------------------------------

In the actual "collaborative filtering" algorithm we do not alternate θ --> x --> θ --> x --> ...; instead, the two objectives are combined into one:

\[J\left( {{x^{\left( 1 \right)}},...,{x^{\left( {{n_m}} \right)}},{\theta ^{\left( 1 \right)}},...,{\theta ^{\left( {{n_u}} \right)}}} \right) = \frac{1}{2}\sum\limits_{\left( {i,j} \right):r\left( {i,j} \right) = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}{x^{\left( i \right)}} - {y^{\left( {i,j} \right)}}} \right)}^2}}  + \frac{\lambda }{2}\sum\limits_{i = 1}^{{n_m}} {\sum\limits_{k = 1}^n {{{\left( {x_k^{\left( i \right)}} \right)}^2}} }  + \frac{\lambda }{2}\sum\limits_{j = 1}^{{n_u}} {\sum\limits_{k = 1}^n {{{\left( {\theta _k^{\left( j \right)}} \right)}^2}} } \]

\[\underbrace {\min }_{{x^{\left( 1 \right)}},...,{x^{\left( {{n_m}} \right)}},{\theta ^{\left( 1 \right)}},...,{\theta ^{\left( {{n_u}} \right)}}}J\left( {{x^{\left( 1 \right)}},...,{x^{\left( {{n_m}} \right)}},{\theta ^{\left( 1 \right)}},...,{\theta ^{\left( {{n_u}} \right)}}} \right)\]

-------------------------------------------------------------------------

Summary: the collaborative filtering algorithm

  • Initialize x(1), ..., x(nm), θ(1), ..., θ(nu) to small random values.
  • Minimize J(x(1), ..., x(nm), θ(1), ..., θ(nu)) using gradient descent or another optimization algorithm.
  • For a user with learned parameters θ(j) and a movie with learned features x(i), predict the rating as (θ(j))Tx(i).
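The three steps above can be sketched in NumPy. This is a minimal gradient-descent version (function name, learning rate, iteration count, and the random-initialization scale are my own choices, not from the notes):

```python
import numpy as np

def cofi_train(Y, R, n_features=2, lam=0.1, alpha=0.01, iters=5000, seed=0):
    """Jointly minimize J over X and Theta with plain gradient descent.
    Y: (n_m, n_u) ratings; R: indicator matrix of the r(i, j) = 1 entries."""
    rng = np.random.default_rng(seed)
    n_m, n_u = Y.shape
    X = rng.normal(scale=0.1, size=(n_m, n_features))       # x^(1..n_m)
    Theta = rng.normal(scale=0.1, size=(n_u, n_features))   # theta^(1..n_u)
    for _ in range(iters):
        err = (X @ Theta.T - Y) * R          # errors on rated entries only
        X_grad = err @ Theta + lam * X       # dJ/dX
        Theta_grad = err.T @ X + lam * Theta # dJ/dTheta
        X -= alpha * X_grad
        Theta -= alpha * Theta_grad
    return X, Theta

# The ratings table from above ("?" stored as 0 and masked out via R).
Y = np.array([[5, 5, 0, 0], [5, 0, 0, 0], [0, 4, 0, 0],
              [0, 0, 5, 4], [0, 0, 5, 0]], dtype=float)
R = np.array([[1, 1, 1, 1], [1, 0, 0, 1], [0, 1, 1, 0],
              [1, 1, 1, 1], [1, 1, 1, 0]])
X, Theta = cofi_train(Y, R)
pred = X @ Theta.T   # predicted ratings, including the "?" entries
```

After training, `pred` reproduces the rated entries closely and fills in predictions for the "?" entries.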

-------------------------------------------------------------------------

A matrix implementation of the collaborative filtering algorithm

For the data

| Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) |
|----------------------|-----------|---------|-----------|----------|
| Love at last         | 5         | 5       | 0         | 0        |
| Romance forever      | 5         | ?       | ?         | 0        |
| Cute puppies of love | ?         | 4       | 0         | ?        |
| Nonstop car chases   | 0         | 0       | 5         | 4        |
| Swords vs. karate    | 0         | 0       | 5         | ?        |

if we define

\[Y = \left[ {\begin{array}{*{20}{c}}
5&5&0&0\\
5&?&?&0\\
?&4&0&?\\
0&0&5&4\\
0&0&5&0
\end{array}} \right]\]

\[\text{Predicted ratings} = \left[ {\begin{array}{*{20}{c}}
{{{\left( {{\theta ^{\left( 1 \right)}}} \right)}^T}{x^{\left( 1 \right)}}}&{{{\left( {{\theta ^{\left( 2 \right)}}} \right)}^T}{x^{\left( 1 \right)}}}& \cdots &{{{\left( {{\theta ^{\left( {{n_u}} \right)}}} \right)}^T}{x^{\left( 1 \right)}}}\\
{{{\left( {{\theta ^{\left( 1 \right)}}} \right)}^T}{x^{\left( 2 \right)}}}&{{{\left( {{\theta ^{\left( 2 \right)}}} \right)}^T}{x^{\left( 2 \right)}}}& \cdots &{{{\left( {{\theta ^{\left( {{n_u}} \right)}}} \right)}^T}{x^{\left( 2 \right)}}}\\
 \vdots & \vdots & \ddots & \vdots \\
{{{\left( {{\theta ^{\left( 1 \right)}}} \right)}^T}{x^{\left( {{n_m}} \right)}}}&{{{\left( {{\theta ^{\left( 2 \right)}}} \right)}^T}{x^{\left( {{n_m}} \right)}}}& \cdots &{{{\left( {{\theta ^{\left( {{n_u}} \right)}}} \right)}^T}{x^{\left( {{n_m}} \right)}}}
\end{array}} \right]\]

\[X = \left[ {\begin{array}{*{20}{c}}
{ - {{\left( {{x^{\left( 1 \right)}}} \right)}^T} - }\\
{ - {{\left( {{x^{\left( 2 \right)}}} \right)}^T} - }\\
 \vdots \\
{ - {{\left( {{x^{\left( {{n_m}} \right)}}} \right)}^T} - }
\end{array}} \right],\quad \Theta  = \left[ {\begin{array}{*{20}{c}}
{ - {{\left( {{\theta ^{\left( 1 \right)}}} \right)}^T} - }\\
{ - {{\left( {{\theta ^{\left( 2 \right)}}} \right)}^T} - }\\
 \vdots \\
{ - {{\left( {{\theta ^{\left( {{n_u}} \right)}}} \right)}^T} - }
\end{array}} \right]\]

\[\text{Predicted ratings} = X{\Theta ^T}\]
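The matrix form can be checked with a tiny example (the numbers here are illustrative, taken from the feature and θ vectors discussed above, not learned):

```python
import numpy as np

# X: each row is (x^(i))^T; Theta: each row is (theta^(j))^T.
X = np.array([[1, 1.0, 0.0],    # Love at last: romantic
              [1, 0.1, 1.0]])   # Nonstop car chases: action
Theta = np.array([[0, 5, 0],    # Alice: likes romance
                  [0, 0, 5]])   # Carol: likes action
pred = X @ Theta.T              # entry (i, j) = (theta^(j))^T x^(i)
# pred = [[5, 0], [0.5, 5]]: rows are movies, columns are users.
```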

-------------------------------------------------------------------------

Finding related movies

Suppose the algorithm above has produced features x(i) for each movie.

Given five movies with known features, how do we decide which of them is most closely related to the movie with features x(i)?

Compute the "distance" between that movie and each of the five; the movie at the smallest distance is the most closely related one:

\[\left\| {{x^{\left( i \right)}} - {x^{\left( j \right)}}} \right\|\]
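A small sketch of this lookup, using the (romance, action) feature table from above (function name `most_related` is my own):

```python
import numpy as np

def most_related(X, i, k=1):
    """Indices of the k movies whose features are closest
    (Euclidean distance) to movie i."""
    d = np.linalg.norm(X - X[i], axis=1)   # ||x^(i) - x^(j)|| for every j
    d[i] = np.inf                          # exclude the movie itself
    return np.argsort(d)[:k]

# (romance, action) features from the table; intercept dropped.
X = np.array([[0.9, 0.0], [1.0, 0.01], [0.99, 0.0],
              [0.1, 1.0], [0.0, 0.9]])
titles = ["Love at last", "Romance forever", "Cute puppies of love",
          "Nonstop car chases", "Swords vs. karate"]
j = most_related(X, 0)[0]
closest = titles[j]   # the romance movie nearest to "Love at last"
```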

-------------------------------------------------------------------------

Mean normalization for collaborative filtering

Suppose some user has not rated any movie at all:

| Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | Eve (5) |
|----------------------|-----------|---------|-----------|----------|---------|
| Love at last         | 5         | 5       | 0         | 0        | ?       |
| Romance forever      | 5         | ?       | ?         | 0        | ?       |
| Cute puppies of love | ?         | 4       | 0         | ?        | ?       |
| Nonstop car chases   | 0         | 0       | 5         | 4        | ?       |
| Swords vs. karate    | 0         | 0       | 5         | ?        | ?       |

In this case, when minimizing the cost function

\[\underbrace {\min }_{{x^{\left( 1 \right)}},...,{x^{\left( {{n_m}} \right)}},{\theta ^{\left( 1 \right)}},...,{\theta ^{\left( {{n_u}} \right)}}}\frac{1}{2}\sum\limits_{\left( {i,j} \right):r\left( {i,j} \right) = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}{x^{\left( i \right)}} - {y^{\left( {i,j} \right)}}} \right)}^2}}  + \frac{\lambda }{2}\sum\limits_{i = 1}^{{n_m}} {\sum\limits_{k = 1}^n {{{\left( {x_k^{\left( i \right)}} \right)}^2}} }  + \frac{\lambda }{2}\sum\limits_{j = 1}^{{n_u}} {\sum\limits_{k = 1}^n {{{\left( {\theta _k^{\left( j \right)}} \right)}^2}} } \]

consider the first part of the formula:

\[\frac{1}{2}\sum\limits_{\left( {i,j} \right):r\left( {i,j} \right) = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)}^2}} \]

For the new user (Eve, j = 5) there is no movie with r(i, 5) = 1, so this term places no constraint on θ(5).

As for the regularization term

\[\frac{\lambda }{2}\sum\limits_{j = 1}^{{n_u}} {\sum\limits_{k = 1}^n {{{\left( {\theta _k^{\left( j \right)}} \right)}^2}} } \]

minimizing it simply drives θ(5) to zero (assuming each movie has two features):

\[{\theta ^{\left( 5 \right)}} = \left[ {\begin{array}{*{20}{c}}
0\\
0
\end{array}} \right]\]

which in turn makes every prediction for Eve

\[{\left( {{\theta ^{\left( 5 \right)}}} \right)^T}\left( {{x^{\left( i \right)}}} \right) = 0\]

Clearly this is wrong, or at least meaningless.

Mean normalization fixes this by subtracting each row's mean from Y:

\[Y = \left[ {\begin{array}{*{20}{c}}
5&5&0&0&?\\
5&?&?&0&?\\
?&4&0&?&?\\
0&0&5&4&?\\
0&0&5&0&?
\end{array}} \right],\quad \mu  = \left[ {\begin{array}{*{20}{c}}
{2.5}\\
{2.5}\\
2\\
{2.25}\\
{1.25}
\end{array}} \right]\]

\[Y = Y - \mu = \left[ {\begin{array}{*{20}{c}}
{2.5}&{2.5}&{ - 2.5}&{ - 2.5}&?\\
{2.5}&?&?&{ - 2.5}&?\\
?&2&{ - 2}&?&?\\
{ - 2.25}&{ - 2.25}&{2.75}&{1.75}&?\\
{ - 1.25}&{ - 1.25}&{3.75}&{ - 1.25}&?
\end{array}} \right]\]

Train the model on the new Y to obtain the parameters; when predicting user j's rating of movie i, the prediction is then

\[{\left( {{\theta ^{\left( j \right)}}} \right)^T}{x^{\left( i \right)}} + {\mu _i}\]

For the new user Eve, θ(5) = 0, so each of her predicted ratings is just the mean of the other users' ratings for that movie:

\[{\left( {{\theta ^{\left( 5 \right)}}} \right)^T}{x^{\left( i \right)}} + {\mu _i} = {\mu _i},\quad \mu  = \left[ {\begin{array}{*{20}{c}}
{2.5}\\
{2.5}\\
2\\
{2.25}\\
{1.25}
\end{array}} \right]\]
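The normalization step is a two-liner in NumPy. A quick check against the Y and μ above (Eve's empty column is masked out via R, and "?" entries are stored as 0):

```python
import numpy as np

# Ratings with Eve as a fifth, all-unrated column (matching the Y above).
Y = np.array([[5, 5, 0, 0, 0],
              [5, 0, 0, 0, 0],
              [0, 4, 0, 0, 0],
              [0, 0, 5, 4, 0],
              [0, 0, 5, 0, 0]], dtype=float)
R = np.array([[1, 1, 1, 1, 0],
              [1, 0, 0, 1, 0],
              [0, 1, 1, 0, 0],
              [1, 1, 1, 1, 0],
              [1, 1, 1, 1, 0]])

# Per-movie mean taken over rated entries only.
mu = (Y * R).sum(axis=1) / R.sum(axis=1)   # mu = [2.5, 2.5, 2, 2.25, 1.25]

# Normalized ratings used for training; unrated entries stay masked.
Ynorm = (Y - mu[:, None]) * R
```

After training on `Ynorm`, predictions are `theta @ x + mu[i]`, so a user with θ = 0 (Eve) falls back to the per-movie means `mu`.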

Reposted from: https://www.cnblogs.com/qkloveslife/p/9908364.html
