Recommender Systems: Model-Based Collaborative Filtering with SVD/LFM

Author: Jyx
Description: Artificial intelligence study notes

SVD for Recommender Systems

In a recommender system, we place users and items in a single matrix, where each element $r_{pq}$ is the rating that user $p$ gives to item $q$. To uncover the relationships between users and items, we factorize this matrix:
$$
\mathbf{R}_{N \times D} =
\begin{pmatrix} \mathbf{p}_1 \\ \mathbf{p}_2 \\ \vdots \\ \mathbf{p}_N \end{pmatrix}_{N \times N}
\begin{pmatrix}
\Sigma_{11} & 0 & \cdots & 0 \\
0 & \Sigma_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{pmatrix}_{N \times D}
\begin{pmatrix} \mathbf{q}_1 \\ \mathbf{q}_2 \\ \vdots \\ \mathbf{q}_D \end{pmatrix}_{D \times D}^T
$$
Here $\mathbf{p}_i$ can be viewed as a vector describing user $i$, $\mathbf{q}_j$ as a vector describing item $j$, and $\Sigma$ captures the coupling between users and items. A user's rating of any item is then
$$
r_{ij} = \mathbf{p}_i
\begin{pmatrix}
\Sigma_{11} & 0 & \cdots & 0 \\
0 & \Sigma_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{pmatrix}_{N \times D}
\mathbf{q}_j^T
$$
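As a concrete illustration, here is a minimal NumPy sketch that factorizes a small, fully observed toy rating matrix and verifies that $U \Sigma V^T$ reproduces it exactly. The matrix values and sizes are made up for illustration; note that a full SVD like this requires every entry to be known.

```python
import numpy as np

# A toy fully-observed rating matrix R (N=4 users, D=5 items).
R = np.array([
    [5, 3, 0, 1, 4],
    [4, 0, 0, 1, 3],
    [1, 1, 0, 5, 4],
    [1, 0, 0, 4, 4],
], dtype=float)

# Full SVD: U is N x N, s holds the singular values, Vt is D x D.
U, s, Vt = np.linalg.svd(R, full_matrices=True)

# Rebuild the N x D diagonal Sigma and recover R = U Sigma V^T.
Sigma = np.zeros_like(R)
Sigma[:len(s), :len(s)] = np.diag(s)
R_rebuilt = U @ Sigma @ Vt

print(np.allclose(R, R_rebuilt))  # True: the decomposition is exact
```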
However, this SVD factorization has several drawbacks:

  1. Its time complexity is $O(N^3)$.
  2. The rating matrix is extremely sparse, which makes the factorization hard to carry out.
  3. SVD yields a large number of parameters while the training samples are limited, so each parameter receives too little training and the model easily overfits.
To address this, we simplify the factorization. SVD supplies the basic idea; we strip its process down, keep only a user matrix and an item matrix, and reduce the feature dimension from the original SVD sizes ($N$ dimensions for the $N$ users, $D$ dimensions for the $D$ items) down to $K$, obtaining
$$
\mathbf{R}_{N \times D} =
\begin{pmatrix} \mathbf{p}_1 \\ \mathbf{p}_2 \\ \vdots \\ \mathbf{p}_N \end{pmatrix}_{N \times K}
\begin{pmatrix} \mathbf{q}_1 \\ \mathbf{q}_2 \\ \vdots \\ \mathbf{q}_D \end{pmatrix}_{D \times K}^T,
\qquad
r_{ij} = \mathbf{p}_i \mathbf{q}_j^T = \sum_{k=1}^K p_{ik} q_{jk}
$$
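A quick sketch of what prediction looks like once $P$ and $Q$ are available. Here they are filled with random values purely as stand-ins for learned factors:

```python
import numpy as np

N, D, K = 4, 5, 2          # users, items, latent dimension (illustrative sizes)
rng = np.random.default_rng(0)

# P and Q would normally be learned; random values stand in here.
P = rng.normal(scale=0.1, size=(N, K))   # one K-dim vector per user
Q = rng.normal(scale=0.1, size=(D, K))   # one K-dim vector per item

R_hat = P @ Q.T            # all predicted ratings at once
r_ij = P[0] @ Q[2]         # r_ij = sum_k p_ik * q_jk for user 0, item 2
print(np.isclose(R_hat[0, 2], r_ij))     # True
```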
Using a squared loss, we have
$$
\mathop{\arg\min}_{P, Q} \frac{1}{2} \sum_{i,j} \Big( r_{ij} - \sum_{k=1}^K p_{ik} q_{jk} \Big)^2
$$
Even so, the number of parameters is still too large, so we add regularization. With $L_2$ regularization the optimization becomes
$$
\mathop{\arg\min}_{P, Q} \frac{1}{2} \sum_{i,j} \Big( r_{ij} - \sum_{k=1}^K p_{ik} q_{jk} \Big)^2
+ \frac{\lambda_p}{2} \sum_{i=1}^N \lVert \mathbf{p}_i \rVert_2^2
+ \frac{\lambda_q}{2} \sum_{j=1}^D \lVert \mathbf{q}_j \rVert_2^2
$$

or, written component-wise,

$$
\mathop{\arg\min}_{P, Q} \frac{1}{2} \sum_{i,j} \Big( r_{ij} - \sum_{k=1}^K p_{ik} q_{jk} \Big)^2
+ \frac{\lambda_p}{2} \sum_{i=1}^N \sum_{k=1}^K p_{ik}^2
+ \frac{\lambda_q}{2} \sum_{j=1}^D \sum_{k=1}^K q_{jk}^2
$$
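This objective can be evaluated in a few lines. In the sketch below the sum over $(i, j)$ is restricted to observed ratings via a boolean mask, which is the usual reading in the sparse-rating setting; the function name and the mask argument are my own conventions:

```python
import numpy as np

def regularized_loss(R, mask, P, Q, lam_p, lam_q):
    """Squared loss over observed entries plus L2 penalties.

    R    : N x D rating matrix (unobserved entries arbitrary)
    mask : N x D boolean array, True where r_ij is observed
    """
    E = (R - P @ Q.T) * mask                 # errors on observed entries only
    loss = 0.5 * np.sum(E ** 2)
    loss += 0.5 * lam_p * np.sum(P ** 2)     # (lambda_p / 2) * sum ||p_i||^2
    loss += 0.5 * lam_q * np.sum(Q ** 2)     # (lambda_q / 2) * sum ||q_j||^2
    return loss
```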
In practice we also add several bias terms: a global bias, a per-user bias, and a per-item bias, each with its own regularization. This yields the final prediction function. Because the global bias is a scalar while the user and item biases are vectors, the model can no longer be written in pure matrix form; each user's rating of an item is written as
$$
r_{ij} = \mu + m_i + w_j + \sum_{k=1}^K p_{ik} q_{jk}
$$
The loss function is
$$
\mathop{\arg\min}_{P, Q, m, w} \frac{1}{2} \sum_{i,j} \Big( r_{ij} - \mu - m_i - w_j - \sum_{k=1}^K p_{ik} q_{jk} \Big)^2
+ \frac{\lambda_p}{2} \sum_{i=1}^N \sum_{k=1}^K p_{ik}^2
+ \frac{\lambda_q}{2} \sum_{j=1}^D \sum_{k=1}^K q_{jk}^2
+ \frac{\lambda_m}{2} \sum_i m_i^2
+ \frac{\lambda_w}{2} \sum_j w_j^2
$$
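In code, the biased prediction and the global bias look like this (a minimal sketch; the array names are assumptions):

```python
import numpy as np

def global_bias(R, mask):
    """Mean of the observed ratings; one way to fix mu from the input matrix."""
    return R[mask].mean()

def predict(mu, m, w, P, Q, i, j):
    """r_ij = mu + m_i + w_j + p_i . q_j under the biased model."""
    return mu + m[i] + w[j] + P[i] @ Q[j]
```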
The global bias $\mu$ is a constant that can be computed directly from the input matrix, for example as the mean of the observed ratings. To take the gradient of this loss, define $e_{ij} = r_{ij} - \mu - m_i - w_j - \sum_{k=1}^K p_{ik} q_{jk}$; then
$$
\begin{aligned}
\frac{\partial \, loss}{\partial p_{ik}} &= -\sum_j e_{ij} q_{jk} + \lambda_p p_{ik} \\
\frac{\partial \, loss}{\partial q_{jk}} &= -\sum_i e_{ij} p_{ik} + \lambda_q q_{jk} \\
\frac{\partial \, loss}{\partial m_i} &= -\sum_j e_{ij} + \lambda_m m_i \\
\frac{\partial \, loss}{\partial w_j} &= -\sum_i e_{ij} + \lambda_w w_j
\end{aligned}
$$
Applying gradient descent, the update rules are
$$
\begin{aligned}
p_{ik}^{t+1} &= p_{ik}^{t} + \alpha \Big( \sum_j e_{ij}^t q_{jk}^t - \lambda_p p_{ik}^t \Big) \\
q_{jk}^{t+1} &= q_{jk}^{t} + \alpha \Big( \sum_i e_{ij}^t p_{ik}^t - \lambda_q q_{jk}^t \Big) \\
m_i^{t+1} &= m_i^t + \alpha \Big( \sum_j e_{ij}^t - \lambda_m m_i^t \Big) \\
w_j^{t+1} &= w_j^t + \alpha \Big( \sum_i e_{ij}^t - \lambda_w w_j^t \Big)
\end{aligned}
$$
where $\alpha$ is the learning rate and $\lambda_p$, $\lambda_q$, $\lambda_m$, $\lambda_w$ are the regularization coefficients.
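Putting everything together, here is a minimal training sketch. Note that the update formulas above are full (batch) gradients that sum over all observed ratings; the sketch applies them one observed rating at a time, i.e. stochastic gradient descent, which is the common practical variant. All names and hyperparameter defaults are illustrative:

```python
import numpy as np

def train_lfm(R, mask, K=10, alpha=0.01, lam_p=0.1, lam_q=0.1,
              lam_m=0.1, lam_w=0.1, epochs=100, seed=0):
    """SGD for the biased latent factor model; mask marks observed ratings."""
    N, D = R.shape
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(N, K))   # user latent factors
    Q = rng.normal(scale=0.1, size=(D, K))   # item latent factors
    m = np.zeros(N)                          # user biases
    w = np.zeros(D)                          # item biases
    mu = R[mask].mean()                      # global bias, fixed up front

    users, items = np.nonzero(mask)
    for _ in range(epochs):
        for i, j in zip(users, items):
            e = R[i, j] - (mu + m[i] + w[j] + P[i] @ Q[j])
            p_old = P[i].copy()              # keep pre-update p_i for q_j's step
            P[i] += alpha * (e * Q[j] - lam_p * P[i])
            Q[j] += alpha * (e * p_old - lam_q * Q[j])
            m[i] += alpha * (e - lam_m * m[i])
            w[j] += alpha * (e - lam_w * w[j])
    return mu, m, w, P, Q
```

On the toy matrix from the first sketch one could call, e.g., `mu, m, w, P, Q = train_lfm(R, mask=R > 0, K=2)`, treating zeros as unobserved entries.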