Name: Jyx
Description: AI learning notes
SVD for Recommender Systems
In a recommender system, we place users and items in a single matrix whose entry $r_{pq}$ is the rating that user $p$ gives item $q$. To uncover the relationship between users and items, we factorize this matrix:
$$
\mathbf{R}_{N \times D} = \begin{pmatrix} \mathbf{p}_1 \\ \mathbf{p}_2 \\ \vdots \\ \mathbf{p}_N \end{pmatrix}_{N \times N} \begin{pmatrix} \Sigma_{11} & 0 & \cdots & 0 \\ 0 & \Sigma_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}_{N \times D} \begin{pmatrix} \mathbf{q}_1 \\ \mathbf{q}_2 \\ \vdots \\ \mathbf{q}_D \end{pmatrix}_{D \times D}^T
$$
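As a quick sanity check, the full SVD above can be computed with NumPy on a toy rating matrix (the matrix values here are made up for illustration):

```python
import numpy as np

# A small toy rating matrix: 4 users (N=4) x 3 items (D=3).
R = np.array([[5., 3., 0.],
              [4., 0., 0.],
              [1., 1., 5.],
              [0., 0., 4.]])

# Full SVD: U is N x N, s holds the singular values, Vt is D x D.
U, s, Vt = np.linalg.svd(R, full_matrices=True)

# Rebuild the N x D rectangular diagonal matrix Sigma.
Sigma = np.zeros_like(R)
Sigma[:len(s), :len(s)] = np.diag(s)

# The factorization reproduces R exactly.
print(np.allclose(R, U @ Sigma @ Vt))  # True
```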
Here $\mathbf{p}_i$ can be viewed as a vector describing a user, $\mathbf{q}_j$ as a vector describing an item, and $\Sigma$ as the coupling between users and items. A user's rating of any item is then
$$
r_{ij} = \mathbf{p}_i \begin{pmatrix} \Sigma_{11} & 0 & \cdots & 0 \\ 0 & \Sigma_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}_{N \times D} \mathbf{q}_j^T
$$
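Continuing the NumPy sketch, a single rating $r_{ij}$ can be recovered as $\mathbf{p}_i \Sigma \mathbf{q}_j^T$, where $\mathbf{p}_i$ is the $i$-th row of the left factor and $\mathbf{q}_j$ the $j$-th row of the right factor (toy data again):

```python
import numpy as np

R = np.array([[5., 3., 0.],
              [4., 0., 0.],
              [1., 1., 5.],
              [0., 0., 4.]])
U, s, Vt = np.linalg.svd(R, full_matrices=True)
Sigma = np.zeros_like(R)
Sigma[:len(s), :len(s)] = np.diag(s)

i, j = 0, 1
# p_i is row i of U; q_j is row j of V, i.e. column j of Vt, so q_j^T = Vt[:, j].
r_ij = U[i] @ Sigma @ Vt[:, j]
print(np.isclose(r_ij, R[i, j]))  # True
```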
However, SVD has several drawbacks:
- Its time complexity is $O(N^3)$.
- The rating matrix is very sparse, which makes the factorization hard to carry out.
- SVD introduces a large number of parameters while the training samples are limited, so each parameter receives too little training and overfitting is likely.
We therefore simplify SVD. Treating SVD as a guiding idea, we keep only the user matrix and the item matrix, and reduce the original feature dimensionality (N user vectors of dimension N, D item vectors of dimension D) to K dimensions, which gives
$$
\mathbf{R}_{N \times D} = \begin{pmatrix} \mathbf{p}_1 \\ \mathbf{p}_2 \\ \vdots \\ \mathbf{p}_N \end{pmatrix}_{N \times K} \begin{pmatrix} \mathbf{q}_1 \\ \mathbf{q}_2 \\ \vdots \\ \mathbf{q}_D \end{pmatrix}_{D \times K}^T, \qquad
r_{ij} = \mathbf{p}_i \mathbf{q}_j^T = \sum_{k=1}^K p_{ik} q_{jk}
$$
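A minimal sketch of the simplified K-dimensional factorization (random factors, purely illustrative), showing that the matrix product and the elementwise sum $\sum_k p_{ik} q_{jk}$ agree:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 4, 3, 2
P = rng.normal(size=(N, K))   # user factors, one K-dim row per user
Q = rng.normal(size=(D, K))   # item factors, one K-dim row per item

R_hat = P @ Q.T               # N x D matrix of predicted ratings

# Elementwise form: r_ij = sum_k p_ik q_jk
i, j = 1, 2
r_ij = sum(P[i, k] * Q[j, k] for k in range(K))
print(np.isclose(r_ij, R_hat[i, j]))  # True
```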
With a squared loss, we have
$$
\mathop{\arg\min}_{P, Q} \frac{1}{2} \sum_{i, j} \Big( r_{ij} - \sum_{k=1}^K p_{ik} q_{jk} \Big)^2
$$
Even so, the number of free parameters is still too large, so we add regularization. Using $L_2$ regularization, the optimization becomes
$$
\mathop{\arg\min}_{P, Q} \frac{1}{2} \sum_{i, j} \Big( r_{ij} - \sum_{k=1}^K p_{ik} q_{jk} \Big)^2 + \frac{1}{2} \lambda_p \sum_{i=1}^N \| \mathbf{p}_i \|_2^2 + \frac{1}{2} \lambda_q \sum_{j=1}^D \| \mathbf{q}_j \|_2^2
$$
That is,
$$
\mathop{\arg\min}_{P, Q} \frac{1}{2} \sum_{i, j} \Big( r_{ij} - \sum_{k=1}^K p_{ik} q_{jk} \Big)^2 + \frac{1}{2} \lambda_p \sum_{i=1}^N \sum_{k=1}^K p_{ik}^2 + \frac{1}{2} \lambda_q \sum_{j=1}^D \sum_{k=1}^K q_{jk}^2
$$
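The regularized objective can be written as a short function; `mask` marks the observed entries, and the function and variable names here are my own, not from the text:

```python
import numpy as np

def mf_loss(R, mask, P, Q, lam_p, lam_q):
    """0.5 * squared error on observed entries + 0.5 * L2 penalties on P and Q."""
    E = np.where(mask, R - P @ Q.T, 0.0)   # residuals, zeroed where r_ij is unknown
    return (0.5 * np.sum(E ** 2)
            + 0.5 * lam_p * np.sum(P ** 2)
            + 0.5 * lam_q * np.sum(Q ** 2))

R = np.array([[5., 3.], [4., 1.]])
mask = np.array([[True, True], [True, False]])
P = np.zeros((2, 2))
Q = np.zeros((2, 2))
# With P = Q = 0, the loss is 0.5 * (5^2 + 3^2 + 4^2) = 25.0
print(mf_loss(R, mask, P, Q, 0.1, 0.1))  # 25.0
```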
In practice we also add several bias terms: a global bias, a user bias, and an item bias, each with its own regularization. This yields the final prediction function. Since the global bias is a scalar and the user and item biases are vectors, the model can no longer be written in the matrix form above; each user's rating of an item is written as
$$
r_{ij} = \mu + m_i + w_j + \sum_{k=1}^K p_{ik} q_{jk}
$$
The loss function is
$$
\mathop{\arg\min}_{P, Q} \frac{1}{2} \sum_{i, j} \Big( r_{ij} - \mu - m_i - w_j - \sum_{k=1}^K p_{ik} q_{jk} \Big)^2 + \frac{1}{2} \lambda_p \sum_{i=1}^N \sum_{k=1}^K p_{ik}^2 + \frac{1}{2} \lambda_q \sum_{j=1}^D \sum_{k=1}^K q_{jk}^2 + \frac{1}{2} \lambda_m \sum_i m_i^2 + \frac{1}{2} \lambda_w \sum_j w_j^2
$$
$\mu$ is a constant that can be computed directly from the input matrix. We now take the gradient of the final loss function.
Let $e_{ij} = r_{ij} - \mu - m_i - w_j - \sum_{k=1}^K p_{ik} q_{jk}$. Then
$$
\begin{aligned}
\frac{\partial \, loss}{\partial p_{ik}} &= -\sum_j e_{ij} q_{jk} + \lambda_p p_{ik} \\
\frac{\partial \, loss}{\partial q_{jk}} &= -\sum_i e_{ij} p_{ik} + \lambda_q q_{jk} \\
\frac{\partial \, loss}{\partial m_i} &= -\sum_j e_{ij} + \lambda_m m_i \\
\frac{\partial \, loss}{\partial w_j} &= -\sum_i e_{ij} + \lambda_w w_j
\end{aligned}
$$
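The first gradient formula can be sanity-checked numerically with a finite difference (random toy data; all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, K = 3, 4, 2
R = rng.uniform(1, 5, size=(N, D))
mu = R.mean()
m = rng.normal(size=N)
w = rng.normal(size=D)
P = rng.normal(size=(N, K))
Q = rng.normal(size=(D, K))
lam_p = 0.1

def loss(P):
    # Terms not depending on P (the other regularizers) are constant and omitted.
    E = R - mu - m[:, None] - w[None, :] - P @ Q.T
    return 0.5 * np.sum(E ** 2) + 0.5 * lam_p * np.sum(P ** 2)

# Analytic gradient from the formula: -sum_j e_ij q_jk + lam_p p_ik
E = R - mu - m[:, None] - w[None, :] - P @ Q.T
grad = -E @ Q + lam_p * P

# Forward finite difference on one entry p_{0,1}
eps = 1e-6
P2 = P.copy()
P2[0, 1] += eps
num = (loss(P2) - loss(P)) / eps
print(np.isclose(num, grad[0, 1], atol=1e-4))  # True
```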
Using gradient descent, the update rules are
$$
\begin{aligned}
p_{ik}^{t+1} &= p_{ik}^t + \alpha \Big( \sum_j e_{ij}^t q_{jk}^t - \lambda_p p_{ik}^t \Big) \\
q_{jk}^{t+1} &= q_{jk}^t + \alpha \Big( \sum_i e_{ij}^t p_{ik}^t - \lambda_q q_{jk}^t \Big) \\
m_i^{t+1} &= m_i^t + \alpha \Big( \sum_j e_{ij}^t - \lambda_m m_i^t \Big) \\
w_j^{t+1} &= w_j^t + \alpha \Big( \sum_i e_{ij}^t - \lambda_w w_j^t \Big)
\end{aligned}
$$
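Putting everything together, here is a minimal batch gradient-descent sketch of the biased model. It uses a single λ for all four regularizers for brevity; the function name, hyperparameters, and toy matrix are my own choices, not from the text:

```python
import numpy as np

def fit_biased_mf(R, mask, K=2, alpha=0.01, lam=0.1, epochs=1000, seed=0):
    """Batch gradient descent on the biased matrix-factorization loss.

    Implements the four update rules above (one shared lambda for brevity).
    `mask` marks the observed entries of R.
    """
    rng = np.random.default_rng(seed)
    N, D = R.shape
    mu = R[mask].mean()                 # global bias, computed from the data
    m = np.zeros(N)                     # user biases
    w = np.zeros(D)                     # item biases
    P = 0.1 * rng.normal(size=(N, K))   # user factors
    Q = 0.1 * rng.normal(size=(D, K))   # item factors
    for _ in range(epochs):
        pred = mu + m[:, None] + w[None, :] + P @ Q.T
        E = np.where(mask, R - pred, 0.0)          # e_ij, zero where unobserved
        # Simultaneous updates, matching the t -> t+1 formulas.
        P, Q, m, w = (P + alpha * (E @ Q - lam * P),
                      Q + alpha * (E.T @ P - lam * Q),
                      m + alpha * (E.sum(axis=1) - lam * m),
                      w + alpha * (E.sum(axis=0) - lam * w))
    return mu, m, w, P, Q

R = np.array([[5., 3., 0.],
              [4., 0., 1.],
              [1., 1., 5.],
              [1., 0., 4.]])
mask = R > 0                            # treat zeros as missing ratings
mu, m, w, P, Q = fit_biased_mf(R, mask)
pred = mu + m[:, None] + w[None, :] + P @ Q.T
rmse = np.sqrt(np.mean((pred[mask] - R[mask]) ** 2))
print(round(rmse, 3))
```

After training, the RMSE on the observed entries should be well below that of the constant predictor $\mu$.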
where $\alpha$ is the learning rate and $\lambda_p$, $\lambda_q$, $\lambda_m$, $\lambda_w$ are the regularization coefficients.