Matrix Factorization (Rating Prediction)

1. Problem Introduction

Consider the following R(5, 4) rating matrix ("-" means the user has not rated that item). In general, a rating matrix R(n, m) has n rows and m columns, where n is the number of users and m is the number of items.

|       | item1 | item2 | item3 | item4 |
|-------|-------|-------|-------|-------|
| user1 | 5     | 3     | -     | 1     |
| user2 | 4     | -     | -     | 1     |
| user3 | 1     | 1     | -     | 5     |
| user4 | 1     | -     | -     | 4     |
| user5 | -     | 1     | 5     | 4     |

We want to predict ratings for the items that have not yet been rated, based on the observed entries of R. Matrix factorization is one way to solve this problem.

2. Problem Analysis

2.1 Constructing the Loss Function

To estimate the missing values, we factor R(n, m) into P(n, k) × Q(k, m), which gives a prediction $\hat{R}=PQ$ that approximates R. The problem then becomes making $\hat{R}$ as close to R as possible, which leads to a loss function: we want $\hat{R}-R$ to become small, and once it is small enough we can treat $\hat{R}\approx R$. To simplify the computation (and keep the quantity non-negative), we square the error:
$$Loss(P,Q)=\frac{1}{2}(R-\hat{R})^{2}=\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)^{2}$$
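As a quick check of this objective, here is a minimal NumPy sketch (the toy matrices below are illustrative, not the article's data) that evaluates the loss only over the observed entries:

```python
import numpy as np

# Toy data (illustrative): 0 marks an unrated entry
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0]])
P = np.random.rand(2, 2)   # user factors, shape (n, k)
Q = np.random.rand(2, 3)   # item factors, shape (k, m)

mask = R > 0                                 # only observed ratings count
loss = 0.5 * np.sum((R - P @ Q)[mask] ** 2)  # 1/2 * sum of squared errors
print(loss)
```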

2.2 Gradient Descent

The core idea of gradient descent is to repeatedly adjust the parameters in the direction opposite to the derivative of the function, eventually reaching a minimum; a one-dimensional sketch follows below. Some practical notes on gradient descent were covered in the multiple linear regression post, which you can refer to if interested.
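As a minimal illustration (my own sketch, not part of the recommender code), gradient descent on the one-dimensional function f(x) = x², whose derivative is 2x:

```python
# Minimize f(x) = x**2 by stepping against the derivative f'(x) = 2x
x = 5.0      # arbitrary starting point
lr = 0.1     # learning rate (alpha)
for _ in range(100):
    grad = 2 * x         # derivative at the current point
    x = x - lr * grad    # move a small step downhill
print(x)   # ends up very close to 0, the minimizer
```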
To apply gradient descent here, we first take the partial derivatives of the loss:

$$\frac{\partial}{\partial P_{ik}}L(P,Q)=\frac{1}{2}\cdot 2\cdot\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)\cdot(-1)\cdot\frac{\partial\left(\sum_{s=1}^{k}P_{is}Q_{sj}\right)}{\partial P_{ik}}=-\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)Q_{kj}$$

$$\frac{\partial}{\partial Q_{kj}}L(P,Q)=\frac{1}{2}\cdot 2\cdot\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)\cdot(-1)\cdot\frac{\partial\left(\sum_{s=1}^{k}P_{is}Q_{sj}\right)}{\partial Q_{kj}}=-\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)P_{ik}$$

Gradient descent then repeatedly updates the current parameter values:

$$P_{ik}=P_{ik}-\alpha\cdot\left(-\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)Q_{kj}\right)=P_{ik}+\alpha\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)Q_{kj}$$

$$Q_{kj}=Q_{kj}-\alpha\cdot\left(-\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)P_{ik}\right)=Q_{kj}+\alpha\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)P_{ik}$$

(Here $\alpha$ is the learning rate.)
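To double-check the derivation, a small numerical sketch (not from the article) compares the analytic gradient with respect to one row of P against a finite-difference estimate for a single observed entry:

```python
import numpy as np

np.random.seed(0)
k = 2
p_row = np.random.rand(k)   # row i of P
q_col = np.random.rand(k)   # column j of Q
r = 4.0                     # observed rating R_ij

def single_loss(p, q):
    return 0.5 * (r - p @ q) ** 2

# Analytic gradient: dL/dP_is = -(R_ij - sum_s P_is Q_sj) * Q_sj
analytic = -(r - p_row @ q_col) * q_col

# Finite-difference estimate of the same gradient
eps = 1e-6
numeric = np.zeros(k)
for s in range(k):
    p_plus, p_minus = p_row.copy(), p_row.copy()
    p_plus[s] += eps
    p_minus[s] -= eps
    numeric[s] = (single_loss(p_plus, q_col) - single_loss(p_minus, q_col)) / (2 * eps)

print(analytic)
print(numeric)   # the two vectors should agree to several decimal places
```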

2.3 Regularization

When there is not enough training data, or the number of features exceeds the number of samples, overfitting can occur. To address overfitting we can apply regularization, or drop non-essential sample features. The L1 norm and the L2 norm are commonly used.
① The Lp norm is not a single norm but a family of norms:
$$\|x\|_{p}=\left(\sum_{i=1}^{n}|x_{i}|^{p}\right)^{\frac{1}{p}}$$
② When p = 1, this is the L1 norm:
$$\|x\|_{1}=\sum_{i=1}^{n}|x_{i}|$$
③ When p = 2, this is the L2 norm (square each component, sum, then take the square root), which measures the length of a vector:
$$\|x\|_{2}=\left(\sum_{i=1}^{n}|x_{i}|^{2}\right)^{\frac{1}{2}}$$
L1 regularization:
$$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(y^{i}-h_{\theta}(x^{i})\right)^{2}+\lambda\sum_{j=1}^{n}|\theta_{j}|\right]$$
L2 regularization:
$$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(y^{i}-h_{\theta}(x^{i})\right)^{2}+\lambda\sum_{j=1}^{n}\theta_{j}^{2}\right]$$
Here m is the number of samples and n is the number of features.
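A quick NumPy check of the two norms (the vector values are arbitrary):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
l1 = np.sum(np.abs(x))           # L1 norm: sum of absolute values -> 8.0
l2 = np.sqrt(np.sum(x ** 2))     # L2 norm: square, sum, square root -> ~5.10
print(l1, np.linalg.norm(x, 1))  # matches NumPy's built-in norm
print(l2, np.linalg.norm(x, 2))
```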

3. Code Implementation

```python
import numpy as np
import matplotlib.pyplot as plt

# Rating matrix: 0 marks an entry the user has not rated
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]])

M = R.shape[0]              # number of users
N = R.shape[1]              # number of items
K = 2                       # number of latent factors
P = np.random.rand(M, K)    # user factor matrix, shape (M, K)
Q = np.random.rand(K, N)    # item factor matrix, shape (K, N)
print(P)
print(Q)
```

```
[[0.65360868 0.08726216]
 [0.69762591 0.03076026]
 [0.88861613 0.75595254]
 [0.31559948 0.49104854]
 [0.33077934 0.40914307]]
[[0.68124148 0.49563369 0.14058849 0.73546283]
 [0.31494592 0.86798718 0.48578754 0.95628464]]
```

$$Loss=\frac{1}{2}(R-PQ)^{2}=\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)^{2}$$

```python
# Loss function: sum of squared errors over observed (non-zero) entries only
def cost(R, P, Q):
    e = 0
    for i in range(R.shape[0]):
        for j in range(R.shape[1]):
            if R[i, j] > 0:
                e += (R[i, j] - np.dot(P[i, :], Q[:, j])) ** 2
    return e / 2
```
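For example, the loss of the randomly initialized factors can be evaluated directly (the exact value depends on the random initialization above):

```python
print(cost(R, P, Q))   # loss before any training
```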

With L2 regularization:
$$Loss=\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)^{2}+\frac{\lambda}{2}\sum_{s=1}^{k}\left(P_{is}^{2}+Q_{sj}^{2}\right)$$

```python
# Loss function with L2 regularization
def cost_re(R, P, Q, lambda1):
    e = 0
    for i in range(R.shape[0]):
        for j in range(R.shape[1]):
            if R[i][j] > 0:
                e += (R[i, j] - np.dot(P[i, :], Q[:, j])) ** 2
                # Add the L2 penalty on the factors involved in this entry
                for k in range(P.shape[1]):
                    e += lambda1 * (P[i, k] ** 2 + Q[k, j] ** 2) / 2
    return e / 2
```

Taking partial derivatives of the loss function:
$$\frac{\partial}{\partial P_{ik}}L(P,Q)=\frac{1}{2}\cdot 2\cdot\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)\cdot(-1)\cdot\frac{\partial\left(\sum_{s=1}^{k}P_{is}Q_{sj}\right)}{\partial P_{ik}}=-\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)Q_{kj}$$

$$\frac{\partial}{\partial Q_{kj}}L(P,Q)=\frac{1}{2}\cdot 2\cdot\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)\cdot(-1)\cdot\frac{\partial\left(\sum_{s=1}^{k}P_{is}Q_{sj}\right)}{\partial Q_{kj}}=-\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)P_{ik}$$
Gradient descent updates:
$$P_{ik}=P_{ik}-\alpha\cdot\left(-\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)Q_{kj}\right)=P_{ik}+\alpha\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)Q_{kj}$$

$$Q_{kj}=Q_{kj}-\alpha\cdot\left(-\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)P_{ik}\right)=Q_{kj}+\alpha\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)P_{ik}$$

```python
# Gradient descent (without regularization)
def grad(R, P, Q, lr, epochs):
    costList = []
    for s in range(epochs + 1):
        for i in range(R.shape[0]):
            for j in range(R.shape[1]):
                if R[i][j] > 0:
                    # Prediction error for the observed entry (i, j)
                    e = R[i, j] - np.dot(P[i, :], Q[:, j])
                    for k in range(P.shape[1]):
                        grad_p = e * Q[k][j]   # negative gradient w.r.t. P[i, k]
                        grad_q = e * P[i][k]   # negative gradient w.r.t. Q[k, j]

                        P[i][k] = P[i][k] + lr * grad_p
                        Q[k][j] = Q[k][j] + lr * grad_q

        if s % 50 == 0:   # record the loss every 50 epochs
            e = cost(R, P, Q)
            costList.append(e)
            #print(e)
    return P, Q, costList
```
```python
lr = 0.0001
epochs = 10000
p, q, costList = grad(R, P, Q, lr, epochs)
print(np.dot(p, q))
```

```
[[5.06116901 2.87426131 2.43375669 1.00254423]
 [3.96425112 2.26334177 2.08974137 0.96504113]
 [1.0366021  0.90360134 5.30277467 4.91333664]
 [0.99043153 0.81279639 4.2952649  3.93863408]
 [1.64541544 1.19616657 4.78417292 4.23883696]]
```

```python
# Plot the recorded loss (one point every 50 epochs, 201 points in total)
x = np.linspace(0, 10000, 201)
plt.plot(x, costList, 'r')
plt.xlabel('epochs')
plt.ylabel('cost')
plt.show()
```

(Figure: cost vs. epochs for the unregularized model)
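As an aside, the element-wise loops above can also be written in vectorized form, updating all observed entries at once. A minimal sketch (my own variant, not the article's implementation; the learning rate may need retuning because the whole gradient is applied in one step):

```python
def grad_vec(R, P, Q, lr, epochs):
    # Full-batch gradient descent on observed entries, vectorized with a mask
    mask = (R > 0).astype(float)
    costList = []
    for s in range(epochs + 1):
        E = (R - P @ Q) * mask      # errors, zeroed where R is unrated
        grad_P = -E @ Q.T           # dLoss/dP
        grad_Q = -P.T @ E           # dLoss/dQ
        P = P - lr * grad_P
        Q = Q - lr * grad_Q
        if s % 50 == 0:
            costList.append(0.5 * np.sum(E ** 2))
    return P, Q, costList
```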

After adding the regularization term:
$$P_{ik}=P_{ik}-\alpha\cdot\left(-\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)Q_{kj}\right)-\lambda P_{ik}=P_{ik}+\alpha\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)Q_{kj}-\lambda P_{ik}$$

$$Q_{kj}=Q_{kj}-\alpha\cdot\left(-\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)P_{ik}\right)-\lambda Q_{kj}=Q_{kj}+\alpha\left(R_{ij}-\sum_{s=1}^{k}P_{is}Q_{sj}\right)P_{ik}-\lambda Q_{kj}$$

(Matching the code below, the shrinkage terms $-\lambda P_{ik}$ and $-\lambda Q_{kj}$ are applied directly rather than scaled by $\alpha$.)

```python
# Gradient descent with L2 regularization
def grad_re(R, P, Q, lr, epochs, lambda1):
    costList = []
    for s in range(epochs + 1):
        for i in range(R.shape[0]):
            for j in range(R.shape[1]):
                if R[i][j] > 0:
                    # Prediction error for the observed entry (i, j)
                    e = R[i, j] - np.dot(P[i, :], Q[:, j])
                    for k in range(P.shape[1]):
                        grad_p = e * Q[k][j]   # negative gradient w.r.t. P[i, k]
                        grad_q = e * P[i][k]   # negative gradient w.r.t. Q[k, j]

                        # Update P and Q, shrinking each factor by lambda1
                        P[i][k] = P[i][k] + lr * grad_p - lambda1 * P[i][k]
                        Q[k][j] = Q[k][j] + lr * grad_q - lambda1 * Q[k][j]

        if s % 50 == 0:   # record the regularized loss every 50 epochs
            e = cost_re(R, P, Q, lambda1)
            costList.append(e)
            #print(e)
    return P, Q, costList
```
```python
lambda1 = 0.0001
p, q, costList = grad_re(R, P, Q, 0.003, epochs, lambda1)
print(np.dot(p, q))
```

```
[[4.92262848 2.9460198  2.02962105 1.00548811]
 [3.9420148  2.37520018 1.85600337 0.99804788]
 [1.00407789 0.99148638 6.03037773 4.90035828]
 [0.99660486 0.90775439 4.8875762  3.94603309]
 [1.15092516 1.00009806 4.95103649 3.97741499]]
```

```python
x = np.linspace(0, 10000, 201)
plt.plot(x, costList, 'r')
plt.xlabel('epochs')
plt.ylabel('cost')
plt.show()
```

(Figure: cost vs. epochs for the regularized model)

Looking at the cost-versus-epochs curves, the curve for the regularized run is somewhat smoother than the unregularized one, which supports the idea that regularization is indeed one way to deal with overfitting.
Besides regularization, overfitting can also be avoided by collecting more training samples or by dropping unnecessary sample features.
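Finally, coming back to the original goal of predicting the "-" entries, the learned factors can be used to fill in and rank a user's unrated items. A short sketch using the p and q returned above:

```python
R_hat = np.dot(p, q)                    # full predicted rating matrix
user = 0                                # first user, as an example
unrated = np.where(R[user] == 0)[0]     # items this user has not rated
# Rank the unrated items by predicted rating, highest first
ranking = unrated[np.argsort(-R_hat[user, unrated])]
print(ranking)
print(R_hat[user, ranking])
```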
