Split the user's historical behavior sequence into consecutive subsequences of length $T$:
$$S = \{x_1, x_2, \dots, x_{|S|}\} = \{S_n\}_{n=1}^{N}$$
where $S_n = \{x_{n,1}, x_{n,2}, \dots, x_{n,T}\}$ denotes the $n$-th subsequence and $T$ is the subsequence length.
Short-Term Interest Generation Network
$\tilde{H}_{n-1}^l \in R^{T \times D}$ denotes the layer-$l$ hidden state of subsequence $S_{n-1}$. The network is defined as:
$$\tilde{H}_n^l = Atten_{rec}^l(\tilde{Q}_n^l, \tilde{K}_n^l, \tilde{V}_n^l) = softmax\big(\tilde{Q}_n^l (\tilde{K}_n^l)^T\big)\tilde{V}_n^l$$
$$\tilde{Q}_n^l = \tilde{H}_n^{l-1}\tilde{W}_Q^T,\qquad \tilde{K}_n^l = H_n^{l-1}\tilde{W}_K^T,\qquad \tilde{V}_n^l = H_n^{l-1}\tilde{W}_V^T$$
$$H_n^{l-1} = \tilde{H}_n^{l-1} \,||\, StopGradient(\tilde{H}_{n-1}^{l-1})$$
$$\tilde{H}_n^0 = X_n = \{x_{n,1}, x_{n,2}, \dots, x_{n,T}\} \in R^{T \times D}$$
$$\tilde{H}_n = \tilde{H}_n^L$$
where $||$ denotes concatenation and $\tilde{W}_Q, \tilde{W}_K, \tilde{W}_V \in R^{D \times D}$ are model parameters.
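The recurrent attention layer above can be sketched in NumPy (a minimal single-head version without scaling; NumPy does not track gradients, so the stop-gradient on the previous segment is implicit here — in an autodiff framework it would be a `detach`):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def rec_attention_layer(H_cur, H_prev, W_Q, W_K, W_V):
    """One layer of the short-term network.
    H_cur:  (T, D) hidden states of segment S_n at layer l-1
    H_prev: (T, D) hidden states of segment S_{n-1} at layer l-1
            (treated as a constant: the StopGradient in the formulas)
    """
    H_cat = np.concatenate([H_cur, H_prev], axis=0)  # H_n^{l-1}, (2T, D)
    Q = H_cur @ W_Q.T   # queries come only from the current segment
    K = H_cat @ W_K.T   # keys/values also attend over the previous one
    V = H_cat @ W_V.T
    return softmax(Q @ K.T) @ V  # (T, D)

T, D = 4, 8
rng = np.random.default_rng(0)
H_cur, H_prev = rng.normal(size=(T, D)), rng.normal(size=(T, D))
W_Q, W_K, W_V = (rng.normal(size=(D, D)) for _ in range(3))
out = rec_attention_layer(H_cur, H_prev, W_Q, W_K, W_V)
print(out.shape)  # (4, 8)
```

Stacking $L$ such layers (feeding each layer's output as the next layer's `H_cur`) gives $\tilde{H}_n = \tilde{H}_n^L$.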
Long-Term Interest Generation Network
$M^l \in R^{m \times D}$ denotes the layer-$l$ memory matrix, where $m$ is the number of memory slots.
$$\hat{H}_n^l = Atten^l(\hat{Q}_n^l, \hat{K}_n^l, \hat{V}_n^l)$$
$$\hat{Q}_n^l,\ \hat{K}_n^l,\ \hat{V}_n^l = \hat{H}_n^{l-1}\hat{W}_Q^T,\ M^{l-1}\hat{W}_K^T,\ M^{l-1}\hat{W}_V^T$$
$$\hat{H}_n = \hat{H}_n^L$$
where $\hat{W}_Q, \hat{W}_K, \hat{W}_V$ are model parameters.
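One layer of this memory read is the same attention pattern, but with keys and values taken from the memory matrix rather than from the behavior sequence — a minimal NumPy sketch:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def memory_read_layer(H_cur, M, W_Q, W_K, W_V):
    """One layer of the long-term network: the current hidden states
    query the m memory slots (keys and values both come from M).
    H_cur: (T, D), M: (m, D); returns (T, D)."""
    Q = H_cur @ W_Q.T
    K = M @ W_K.T
    V = M @ W_V.T
    return softmax(Q @ K.T) @ V

rng = np.random.default_rng(4)
T, m, D = 4, 3, 8
H, M = rng.normal(size=(T, D)), rng.normal(size=(m, D))
W_Q, W_K, W_V = (rng.normal(size=(D, D)) for _ in range(3))
H_hat = memory_read_layer(H, M, W_Q, W_K, W_V)
print(H_hat.shape)  # (4, 8)
```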
Interest Fusion Network
$$V_n = G_n \odot \tilde{H}_n + (1 - G_n) \odot \hat{H}_n$$
$$G_n = \sigma(\tilde{H}_n W_{short} + \hat{H}_n W_{long}) \in R^{T \times D}$$
where $\odot$ denotes element-wise multiplication and $W_{short}, W_{long} \in R^{D \times D}$ are model parameters.
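The gated fusion is a per-element convex combination of the two representations — a minimal sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_interests(H_short, H_long, W_short, W_long):
    """Gated fusion of short- and long-term interest representations.
    H_short, H_long: (T, D); returns V_n of shape (T, D)."""
    G = sigmoid(H_short @ W_short + H_long @ W_long)  # gate G_n in (0, 1)
    return G * H_short + (1.0 - G) * H_long           # element-wise mix

rng = np.random.default_rng(1)
T, D = 4, 8
H_s, H_l = rng.normal(size=(T, D)), rng.normal(size=(T, D))
W_s, W_l = rng.normal(size=(D, D)), rng.normal(size=(D, D))
V_n = fuse_interests(H_s, H_l, W_s, W_l)
print(V_n.shape)  # (4, 8)
```

Because $G_n \in (0,1)$ element-wise, every entry of $V_n$ lies between the corresponding entries of $\tilde{H}_n$ and $\hat{H}_n$.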
Long-Term Interest Update Network
$$M^l \leftarrow f_{abs}^l(M^l, \tilde{H}_{n-1}^l),\qquad f_{abs}^l: R^{(m+T) \times D} \rightarrow R^{m \times D}$$
$f_{abs}^l$ is implemented with a capsule network:
$$b_{ij} = \overline{x}_j^T W_{ij} x_i$$
$$\alpha_{ij} = \exp(b_{ij}) \Big/ \sum_{j'=1}^{m} \exp(b_{ij'})$$
$$s_j = \sum_{i=1}^{m+T} \alpha_{ij} W_{ij} x_i$$
$$\overline{x}_j = squash(s_j) = \frac{||s_j||^2}{1 + ||s_j||^2}\frac{s_j}{||s_j||}$$
$$M = [\overline{x}_1, \overline{x}_2, \dots, \overline{x}_m]$$
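A NumPy sketch of one such abstraction step, following standard dynamic routing with the logits $b_{ij}$ initialized to zero; the per-pair transforms $W_{ij}$ are stored as one dense tensor here purely for clarity (an assumption of this sketch, not necessarily how the paper parameterizes them):

```python
import numpy as np

def squash(s, eps=1e-9):
    """squash(s) = (||s||^2 / (1 + ||s||^2)) * s / ||s||, row-wise."""
    norm = np.linalg.norm(s, axis=-1, keepdims=True)
    return (norm**2 / (1.0 + norm**2)) * s / (norm + eps)

def memory_abstraction(M, H_prev, W, n_iters=3):
    """One f_abs update: compress (m+T) inputs into m memory slots.
    M: (m, D) memory, H_prev: (T, D) segment hidden states,
    W: (m+T, m, D, D) per-pair transforms W_ij."""
    X = np.concatenate([M, H_prev], axis=0)  # (m+T, D) inputs x_i
    m = M.shape[0]
    # u[i, j] = W_ij @ x_i : transformed "vote" from input i to slot j
    u = np.einsum('ijab,ib->ija', W, X)      # (m+T, m, D)
    b = np.zeros((X.shape[0], m))            # routing logits b_ij
    for _ in range(n_iters):
        a = np.exp(b - b.max(axis=1, keepdims=True))
        alpha = a / a.sum(axis=1, keepdims=True)        # softmax over slots j
        s = np.einsum('ij,ija->ja', alpha, u)           # s_j = sum_i alpha_ij u_ij
        x_bar = squash(s)                               # (m, D) new slots
        b = b + np.einsum('ja,ija->ij', x_bar, u)       # b_ij = x_bar_j . u_ij
    return x_bar

m, T, D = 3, 4, 6
rng = np.random.default_rng(2)
M = rng.normal(size=(m, D))
H_prev = rng.normal(size=(T, D))
W = rng.normal(size=(m + T, m, D, D)) * 0.1
M_new = memory_abstraction(M, H_prev, W)
print(M_new.shape)  # (3, 6)
```

Note that `squash` bounds every slot's norm below 1, which keeps the memory numerically stable across repeated updates.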
Auxiliary loss for training $f_{abs}^l$:
$$\mathcal{L}_{ae} = \sum_{l=1}^{L} \big|\big|Atten_{rec}^l(\tilde{Q}^l, \tilde{K}^l, \tilde{V}^l) - Atten_{rec}^l(\tilde{Q}^l, \hat{K}^l, \hat{V}^l)\big|\big|_F^2$$
$$\tilde{Q}^l = \tilde{H}_n^l,\qquad \tilde{K}^l = \tilde{V}^l = M^l \,||\, \tilde{H}_{n-1}^l,\qquad \hat{K}^l = \hat{V}^l = M^l$$
Loss Function
$$\mathcal{L}_{like} = -\sum_{u \in U}\sum_{t \in S_n} \log\frac{\exp(x_t^T V_{n,t})}{\sum_{j \in V}\exp(x_j^T V_{n,t})}$$
$$\mathcal{L} = \mathcal{L}_{like} + \mathcal{L}_{ae}$$
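$\mathcal{L}_{like}$ is a standard softmax cross-entropy over the item vocabulary, scoring each position $t$ by the inner product $x_t^T V_{n,t}$ — a sketch for a single segment (averaging over positions instead of summing over all users, an assumption for this example):

```python
import numpy as np

def log_likelihood_loss(item_emb, V_n, targets):
    """Negative log-likelihood for one segment.
    item_emb: (|V|, D) item embeddings x_j
    V_n:      (T, D) fused user representations
    targets:  (T,) index of the ground-truth item at each position
    """
    logits = V_n @ item_emb.T                        # (T, |V|) scores x_j^T V_{n,t}
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(3)
n_items, T, D = 50, 4, 8
item_emb = rng.normal(size=(n_items, D))
V_n = rng.normal(size=(T, D))
targets = rng.integers(0, n_items, size=T)
loss = log_likelihood_loss(item_emb, V_n, targets)
```

In training, this term is simply added to $\mathcal{L}_{ae}$ and both are minimized jointly.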