1. The ALS Algorithm
ALS (Alternating Least Squares) is a collaborative-filtering recommendation algorithm based on least squares. In UserCF and ItemCF we must compute a user-user or item-item similarity matrix, which becomes hard to handle at large data scales. Can we instead, as with PCA or word embeddings, represent users and items with low-dimensional vectors?
The ALS algorithm factorizes the user-item rating matrix $R$ into two matrices $U$ and $V^T$, where $U_{u.}$ is user $u$'s latent preference vector over $d$ dimensions and $V_{i.}$ is item $i$'s feature vector over $d$ dimensions:
$$U_{u.}=[U_{u1},...,U_{uk},...,U_{ud}] \\ V_{i.}=[V_{i1},...,V_{ik},...,V_{id}]$$
We want to find suitable $U$ and $V$ such that
$$\hat R = UV^T \approx R$$
Finally, we use the learned $U$ and $V^T$ to predict an unknown rating $r_{ui}$:
$$\hat r_{ui}=U_{u.}V_{i.}^T$$
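The prediction is just an inner product of the two latent vectors. A minimal sketch with made-up 3-dimensional vectors ($d=3$; all values below are hypothetical):

```python
import numpy as np

# Hypothetical latent vectors for one user and one item (d = 3).
U_u = np.array([0.5, 1.0, 0.2])   # user u's latent preferences
V_i = np.array([1.0, 0.5, 2.0])   # item i's latent features

# Predicted rating is the inner product U_{u.} V_{i.}^T.
r_ui = U_u @ V_i
print(r_ui)  # 0.5*1.0 + 1.0*0.5 + 0.2*2.0 = 1.4
```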
This is in fact an optimization problem: we need to find $U$ and $V$ that minimize the gap between $R$ and $\hat R$. The objective function is
$$\min_\theta \sum_{u=1}^n\sum_{i=1}^m y_{ui}\left[\frac{1}{2}(r_{ui}-U_{u.}V_{i.}^T)^2+\frac{\alpha_u}{2}||U_{u.}||^2+\frac{\alpha_v}{2}||V_{i.}||^2\right]$$
where $y_{ui}$ is 1 if user $u$ has rated item $i$ and 0 otherwise. Let $f$ denote the objective function; then
$$\nabla U_{u.}=\frac{\partial f}{\partial U_{u.}}=U_{u.}\sum_{i=1}^m y_{ui}(V_{i.}^T V_{i.}+\alpha_u I)-\sum_{i=1}^m y_{ui}r_{ui}V_{i.}$$
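As a quick sanity check, the gradient above can be verified against finite differences on toy data (all values below are made up; $V_{i.}^TV_{i.}$ is the outer product of the row vector $V_{i.}$ with itself):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 4
V = rng.random((m, d))           # item factors; rows are V_{i.}
r_u = rng.random(m) * 5          # user u's ratings of the m items
y_u = np.array([1, 0, 1, 1])     # observation indicator y_{ui}
alpha_u = 0.1
U_u = rng.random(d)

def f(U_u):
    # user u's share of the objective
    return sum(y_u[i] * (0.5 * (r_u[i] - U_u @ V[i]) ** 2
                         + 0.5 * alpha_u * U_u @ U_u) for i in range(m))

# analytic gradient from the derivation above
grad = U_u @ sum(y_u[i] * (np.outer(V[i], V[i]) + alpha_u * np.eye(d))
                 for i in range(m)) \
       - sum(y_u[i] * r_u[i] * V[i] for i in range(m))

# central finite-difference approximation
eps = 1e-6
num = np.array([(f(U_u + eps * e) - f(U_u - eps * e)) / (2 * eps)
                for e in np.eye(d)])
print(np.allclose(grad, num, atol=1e-5))
```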
Setting the gradient to zero gives
$$\begin{aligned} &U_{u.}=b_uA_u^{-1} \qquad (1)\\ &b_u=\sum_{i=1}^m y_{ui}r_{ui}V_{i.} \\ &A_u=\sum_{i=1}^m y_{ui}(V_{i.}^TV_{i.}+\alpha_uI) \end{aligned}$$
Similarly, for $V_{i.}$ we have
$$\nabla V_{i.}=\frac{\partial f}{\partial V_{i.}}=V_{i.}\sum_{u=1}^n y_{ui}(U_{u.}^TU_{u.}+\alpha_vI)-\sum_{u=1}^n y_{ui}r_{ui}U_{u.}$$
Setting this gradient to zero gives
$$\begin{aligned} &V_{i.}=b_iA_i^{-1} \qquad (2)\\ &b_i=\sum_{u=1}^n y_{ui}r_{ui}U_{u.} \\ &A_i=\sum_{u=1}^n y_{ui}(U_{u.}^TU_{u.}+\alpha_vI) \end{aligned}$$
With these closed-form solutions, the algorithm is complete: in each iteration we alternately update the parameters with equations (1) and (2). Since each such sweep traverses all samples, there is also a stochastic gradient descent (SGD) variant that updates with one sample at a time and likewise converges.
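The alternating updates (1) and (2) can be sketched as follows. This is a minimal illustration on a made-up 3×3 rating matrix, not a production implementation; note $A_u$ is symmetric, so `np.linalg.solve` handles $b_uA_u^{-1}$ directly:

```python
import numpy as np

def als(R, Y, d=2, alpha_u=0.01, alpha_v=0.01, iters=30, seed=0):
    """Alternate the closed-form updates (1) and (2).

    R : (n, m) rating matrix; Y : (n, m) 0/1 mask of observed entries.
    Returns the factor matrices U (n, d) and V (m, d)."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.random((n, d))
    V = rng.random((m, d))
    I = np.eye(d)
    for _ in range(iters):
        for u in range(n):                    # eq. (1): U_{u.} = b_u A_u^{-1}
            idx = Y[u] == 1
            if not idx.any():
                continue
            A_u = V[idx].T @ V[idx] + alpha_u * idx.sum() * I
            b_u = R[u, idx] @ V[idx]
            U[u] = np.linalg.solve(A_u, b_u)
        for i in range(m):                    # eq. (2): V_{i.} = b_i A_i^{-1}
            idx = Y[:, i] == 1
            if not idx.any():
                continue
            A_i = U[idx].T @ U[idx] + alpha_v * idx.sum() * I
            b_i = R[idx, i] @ U[idx]
            V[i] = np.linalg.solve(A_i, b_i)
    return U, V

# toy example: 0 marks an unobserved rating
R = np.array([[5., 3., 0.], [4., 0., 1.], [1., 1., 5.]])
Y = (R > 0).astype(int)
U, V = als(R, Y)
rmse = np.sqrt(((Y * (R - U @ V.T)) ** 2).sum() / Y.sum())
print(round(rmse, 3))
```

After a few sweeps the observed entries are reconstructed closely, while the zero (unobserved) cells receive predictions.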
2. The RSVD Algorithm
The ALS algorithm above predicts unknown ratings simply via $\hat R=UV^T$, whereas RSVD additionally accounts for the user bias $b_u$, the item's own bias $b_i$, and the global mean $\mu$. Its prediction function is
$$\hat r_{ui}=U_{u.}V_{i.}^T+b_u+b_i+\mu$$
To prevent overfitting, $b_u$ and $b_i$ are also penalized with regularization terms, and the objective function becomes
$$\min_\theta \sum_{u=1}^n\sum_{i=1}^m y_{ui}\left[\frac{1}{2}(r_{ui}-\hat r_{ui})^2+\frac{\alpha_u}{2}||U_{u.}||^2+\frac{\alpha_v}{2}||V_{i.}||^2+\frac{\beta_u}{2}b_u^2+\frac{\beta_v}{2}b_i^2\right]$$
Since we solve this with stochastic gradient descent, we only need to consider the loss on a single sample:
$$f_{ui}=\frac{1}{2}(r_{ui}-\hat r_{ui})^2+\frac{\alpha_u}{2}||U_{u.}||^2+\frac{\alpha_v}{2}||V_{i.}||^2+\frac{\beta_u}{2}b_u^2+\frac{\beta_v}{2}b_i^2$$
Let $e_{ui}=r_{ui}-\hat r_{ui}$; then
$$\begin{aligned} &\nabla\mu=-e_{ui}\\ &\nabla b_u=-e_{ui}+\beta_ub_u \\ &\nabla b_i= -e_{ui}+\beta_vb_i \\ &\nabla U_{u.}=-e_{ui}V_{i.}+\alpha_uU_{u.} \\ &\nabla V_{i.}=-e_{ui}U_{u.}+\alpha_vV_{i.} \end{aligned}$$
The parameters are then updated as follows:
$$\begin{aligned} &\mu=\mu-\gamma\nabla\mu \\ &b_u=b_u-\gamma\nabla b_u \\ &b_i= b_i-\gamma\nabla b_i \\ &U_{u.}=U_{u.}-\gamma\nabla U_{u.} \\ &V_{i.}=V_{i.}-\gamma\nabla V_{i.} \end{aligned}$$
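One full SGD update for a single observed rating can be sketched as a small function (the hyperparameter values below are placeholders):

```python
import numpy as np

def sgd_step(r_ui, mu, b_u, b_i, U_u, V_i,
             gamma=0.01, alpha_u=0.01, alpha_v=0.01, beta_u=0.01, beta_v=0.01):
    """One RSVD update for a single rating r_ui; returns updated parameters."""
    e = r_ui - (U_u @ V_i + b_u + b_i + mu)          # e_{ui} = r_{ui} - \hat r_{ui}
    mu_new = mu - gamma * (-e)
    b_u_new = b_u - gamma * (-e + beta_u * b_u)
    b_i_new = b_i - gamma * (-e + beta_v * b_i)
    U_u_new = U_u - gamma * (-e * V_i + alpha_u * U_u)
    V_i_new = V_i - gamma * (-e * U_u + alpha_v * V_i)
    return mu_new, b_u_new, b_i_new, U_u_new, V_i_new

# one step on a toy example: the prediction error shrinks
mu, b_u, b_i = 3.0, 0.0, 0.0
U_u, V_i = np.array([0.1, 0.1]), np.array([0.1, 0.1])
before = 5.0 - (U_u @ V_i + b_u + b_i + mu)
mu, b_u, b_i, U_u, V_i = sgd_step(5.0, mu, b_u, b_i, U_u, V_i)
after = 5.0 - (U_u @ V_i + b_u + b_i + mu)
print(abs(after) < abs(before))
```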
For the initial values, the following initialization can be used (where $r$ denotes a random number in $[0,1]$):
$$\begin{aligned} &\mu=\sum_{u=1}^n\sum_{i=1}^my_{ui}r_{ui}/\sum_{u=1}^n\sum_{i=1}^my_{ui} \\ &b_u=\sum_{i=1}^my_{ui}(r_{ui}-\mu)/\sum_{i=1}^my_{ui} \\ &b_i=\sum_{u=1}^ny_{ui}(r_{ui}-\mu)/\sum_{u=1}^ny_{ui} \\ &U_{uk}=(r-0.5)\times 0.01,\quad k=1,2,...,d \\ &V_{ik}=(r-0.5)\times 0.01,\quad k=1,2,...,d \end{aligned}$$
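The $\mu$, $b_u$, $b_i$ formulas above amount to the global observed mean and the per-user/per-item mean deviations. A sketch on a made-up 3×3 matrix:

```python
import numpy as np

# Toy 3-user x 3-item rating matrix; 0 marks an unobserved entry.
R = np.array([[5., 3., 0.], [4., 0., 1.], [0., 1., 5.]])
Y = (R > 0).astype(float)

mu = (Y * R).sum() / Y.sum()                        # global mean of observed ratings
b_u = (Y * (R - mu)).sum(axis=1) / Y.sum(axis=1)    # per-user mean deviation
b_i = (Y * (R - mu)).sum(axis=0) / Y.sum(axis=0)    # per-item mean deviation

d = 4
rng = np.random.default_rng(0)
U = (rng.random((3, d)) - 0.5) * 0.01               # (r - 0.5) * 0.01, r in [0, 1]
V = (rng.random((3, d)) - 0.5) * 0.01
print(round(mu, 3))
```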
We then simply run SGD until the weights converge.
The full code is attached below (note that it uses random initialization rather than the scheme above):
import math
import pandas as pd
import numpy as np


class RSVD():
    def __init__(self, allfile, trainfile, testfile, latentFactorNum=20,
                 alpha_u=0.01, alpha_v=0.01, beta_u=0.01, beta_v=0.01, learning_rate=0.01):
        data_fields = ['user_id', 'item_id', 'rating', 'timestamp']
        # full data file (used only to count users and items)
        allData = pd.read_table(allfile, names=data_fields)
        # training and testing set files
        self.train_df = pd.read_table(trainfile, names=data_fields)
        self.test_df = pd.read_table(testfile, names=data_fields)
        # number of latent factors
        self.latentFactorNum = latentFactorNum
        # numbers of users and items
        self.userNum = len(set(allData['user_id'].values))
        self.itemNum = len(set(allData['item_id'].values))
        # learning rate
        self.learningRate = learning_rate
        # regularization coefficients
        self.alpha_u = alpha_u
        self.alpha_v = alpha_v
        self.beta_u = beta_u
        self.beta_v = beta_v
        # initialize the model parameters
        self.initModel()

    def initModel(self):
        # mu starts at the training-set mean; biases at zero; factors at random
        self.mu = self.train_df['rating'].mean()
        self.bu = np.zeros(self.userNum)
        self.bi = np.zeros(self.itemNum)
        self.U = np.mat(np.random.rand(self.userNum, self.latentFactorNum))
        self.V = np.mat(np.random.rand(self.itemNum, self.latentFactorNum))
        print("Initialization done. User number: %d, item number: %d" % (self.userNum, self.itemNum))

    def train(self, iterTimes=100):
        print("Beginning to train the model......")
        preRmse = 10000.0
        for it in range(iterTimes):
            for index in self.train_df.index:
                if index % 20000 == 0:
                    print("Epoch %s progress: %s%%" % (it, index / len(self.train_df.index) * 100))
                user = int(self.train_df.loc[index]['user_id']) - 1
                item = int(self.train_df.loc[index]['item_id']) - 1
                rating = float(self.train_df.loc[index]['rating'])
                pscore = self.predictScore(self.mu, self.bu[user], self.bi[item], self.U[user], self.V[item])
                eui = rating - pscore
                # SGD updates on mu, the user/item biases, and the latent factors
                self.mu += self.learningRate * eui
                self.bu[user] += self.learningRate * (eui - self.beta_u * self.bu[user])
                self.bi[item] += self.learningRate * (eui - self.beta_v * self.bi[item])
                temp = self.U[user]
                self.U[user] += self.learningRate * (eui * self.V[item] - self.alpha_u * self.U[user])
                self.V[item] += self.learningRate * (temp * eui - self.alpha_v * self.V[item])
            # calculate the current RMSE on the test set
            curRmse = self.test(self.mu, self.bu, self.bi, self.U, self.V)
            print("Iteration %d times, RMSE is: %f" % (it + 1, curRmse))
            if curRmse > preRmse:
                break
            else:
                preRmse = curRmse
        print("Iteration finished!")

    # evaluate on the test set and return the RMSE
    def test(self, mu, bu, bi, U, V):
        cnt = self.test_df.shape[0]
        rmse = 0.0
        buT = bu.reshape(bu.shape[0], 1)
        predict_rate_matrix = mu + np.tile(buT, (1, self.itemNum)) + np.tile(bi, (self.userNum, 1)) + U * V.T
        for i in self.test_df.index:
            user = int(self.test_df.loc[i]['user_id']) - 1
            item = int(self.test_df.loc[i]['item_id']) - 1
            score = float(self.test_df.loc[i]['rating'])
            pscore = predict_rate_matrix[user, item]
            rmse += math.pow(score - pscore, 2)
        return math.sqrt(rmse / cnt)

    def predictScore(self, mu, bu, bi, U, V):
        # predicted rating, clipped into the valid range [1, 5]
        pscore = mu + bu + bi + np.multiply(U, V).sum()
        if pscore < 1:
            pscore = 1
        if pscore > 5:
            pscore = 5
        return pscore


if __name__ == '__main__':
    s = RSVD("../datasets/ml-100k/u.data", "../datasets/ml-100k/u1.base", "../datasets/ml-100k/u1.test")
    s.train()