Dynamic Memory based Attention Network for Sequential Recommendation

Split the user's historical behavior sequence into consecutive subsequences of length T:

$$S = \{x_1, x_2, \ldots, x_{|S|}\} = \{S_n\}_{n=1}^{N}$$

where $S_n = \{x_{n,1}, x_{n,2}, \ldots, x_{n,T}\}$ denotes the $n$-th subsequence and $T$ is the subsequence length.
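
A minimal sketch of this segmentation step, assuming leftover items at the head of the sequence are simply dropped (the text does not specify how a remainder is handled):

```python
import torch

def split_into_segments(seq: torch.Tensor, T: int) -> torch.Tensor:
    """Split one user's behavior sequence into consecutive length-T segments.

    Drops the oldest items when the length is not a multiple of T
    (an assumption; the text does not specify remainder handling).
    """
    n_keep = (len(seq) // T) * T
    return seq[len(seq) - n_keep:].view(-1, T)  # (N, T) segment matrix

user_seq = torch.arange(1, 11)                  # items x_1 ... x_10
print(split_into_segments(user_seq, T=3))       # 3 segments of length 3
```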

Short-term interest generation network

Let $\tilde{H}_{n-1}^l \in R^{T \times D}$ denote the layer-$l$ hidden states of sequence $S_{n-1}$. The network is defined as:

$$\tilde{H}_n^l = \mathrm{Atten}_{rec}^l(\tilde{Q}_n^l, \tilde{K}_n^l, \tilde{V}_n^l) = \mathrm{softmax}\big(\tilde{Q}_n^l (\tilde{K}_n^l)^T\big)\,\tilde{V}_n^l$$

$$\tilde{Q}_n^l = \tilde{H}_n^{l-1}\tilde{W}_Q^T$$

$$\tilde{K}_n^l = H_n^{l-1}\tilde{W}_K^T$$

$$\tilde{V}_n^l = H_n^{l-1}\tilde{W}_V^T$$

$$H_n^{l-1} = \tilde{H}_n^{l-1} \,\|\, \mathrm{StopGradient}(\tilde{H}_{n-1}^{l-1})$$

$$\tilde{H}_n^0 = X_n = \{x_{n,1}, x_{n,2}, \ldots, x_{n,T}\} \in R^{T \times D}$$

$$\tilde{H}_n = \tilde{H}_n^L$$

where $\|$ denotes concatenation and $\tilde{W}_Q, \tilde{W}_K, \tilde{W}_V \in R^{D \times D}$ are model parameters.
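
A minimal PyTorch sketch of one such layer, following the equations above: queries come from the current segment only, while keys and values also see the previous segment's hidden states through a detached (stop-gradient) concatenation. The bias-free projections and the absence of scaling or masking mirror the formulas as written, not necessarily the full model:

```python
import torch
import torch.nn as nn

class ShortTermAttentionLayer(nn.Module):
    """One layer of the short-term network (sketch)."""

    def __init__(self, D: int):
        super().__init__()
        self.W_Q = nn.Linear(D, D, bias=False)
        self.W_K = nn.Linear(D, D, bias=False)
        self.W_V = nn.Linear(D, D, bias=False)

    def forward(self, h_cur: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        # h_cur: (T, D) current segment, h_prev: (T, D) previous segment.
        # H_n^{l-1} = H~_n^{l-1} || StopGradient(H~_{n-1}^{l-1})
        h_cat = torch.cat([h_cur, h_prev.detach()], dim=0)   # (2T, D)
        Q = self.W_Q(h_cur)                                  # queries: current only
        K, V = self.W_K(h_cat), self.W_V(h_cat)              # keys/values: both
        attn = torch.softmax(Q @ K.t(), dim=-1)              # (T, 2T)
        return attn @ V                                      # (T, D)
```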

Long-term interest generation network

$$M^l \in R^{m \times D}$$

where $m$ is the number of memory slots and $M^l$ is the layer-$l$ memory matrix.

$$\hat{H}_n^l = \mathrm{Atten}^l(\hat{Q}_n^l, \hat{K}_n^l, \hat{V}_n^l)$$

$$\hat{Q}_n^l,\ \hat{K}_n^l,\ \hat{V}_n^l = \hat{H}_n^{l-1}\hat{W}_Q^T,\ M^{l-1}\hat{W}_K^T,\ M^{l-1}\hat{W}_V^T$$

$$\hat{H}_n = \hat{H}_n^L$$

$\hat{W}_Q, \hat{W}_K, \hat{W}_V$ are model parameters.
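
A matching sketch of one long-term layer: the segment's hidden states query the layer-wise memory, and both keys and values are projected from the memory alone (same hedged simplifications as above):

```python
import torch
import torch.nn as nn

class LongTermAttentionLayer(nn.Module):
    """One layer of the long-term network (sketch)."""

    def __init__(self, D: int):
        super().__init__()
        self.W_Q = nn.Linear(D, D, bias=False)
        self.W_K = nn.Linear(D, D, bias=False)
        self.W_V = nn.Linear(D, D, bias=False)

    def forward(self, h: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # h: (T, D) hidden states, memory: (m, D) layer memory M^{l-1}.
        Q = self.W_Q(h)                                  # (T, D)
        K, V = self.W_K(memory), self.W_V(memory)        # (m, D)
        attn = torch.softmax(Q @ K.t(), dim=-1)          # (T, m)
        return attn @ V                                  # (T, D)
```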

Interest fusion network

$$V_n = G_n \odot \tilde{H}_n + (1 - G_n) \odot \hat{H}_n$$

$$G_n = \sigma(\tilde{H}_n W_{short} + \hat{H}_n W_{long}) \in R^{T \times D}$$

where $\odot$ denotes element-wise multiplication and $W_{short}, W_{long} \in R^{D \times D}$ are model parameters.
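
The gate is a per-element mixture of the two interest representations; a direct sketch of the two equations above:

```python
import torch
import torch.nn as nn

class InterestFusion(nn.Module):
    """Gated fusion of short-term and long-term representations (sketch)."""

    def __init__(self, D: int):
        super().__init__()
        self.W_short = nn.Linear(D, D, bias=False)
        self.W_long = nn.Linear(D, D, bias=False)

    def forward(self, h_short: torch.Tensor, h_long: torch.Tensor) -> torch.Tensor:
        # G_n = sigmoid(H~_n W_short + H^_n W_long), shape (T, D)
        gate = torch.sigmoid(self.W_short(h_short) + self.W_long(h_long))
        # V_n = G_n ⊙ H~_n + (1 - G_n) ⊙ H^_n
        return gate * h_short + (1.0 - gate) * h_long
```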

Long-term interest update network

$$M^l \leftarrow f_{abs}^l(M^l, \tilde{H}_{n-1}^l)$$

$$f_{abs}^l: R^{(m+T) \times D} \rightarrow R^{m \times D}$$

$f_{abs}^l$ is implemented with a capsule network:

$$b_{ij} = \overline{x}_j W_{ij} x_i$$

$$\alpha_{ij} = \exp(b_{ij}) \Big/ \sum_{j'=1}^{m} \exp(b_{ij'})$$

$$s_j = \sum_{i=1}^{m+T} \alpha_{ij} W_{ij} x_i$$

$$\overline{x}_j = \mathrm{squash}(s_j) = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \frac{s_j}{\|s_j\|}$$

$$M = [\overline{x}_1, \overline{x}_2, \ldots, \overline{x}_m]$$
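
Since $b_{ij}$ depends on the output $\overline{x}_j$, these equations are solved by iterative dynamic routing, as in capsule networks. A sketch with a fixed number of routing iterations (the iteration count is an assumption; the text does not state it):

```python
import torch
import torch.nn as nn

def squash(s: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # squash(s) = (||s||^2 / (1 + ||s||^2)) * s / ||s||, applied row-wise
    norm2 = (s * s).sum(dim=-1, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

class MemoryUpdate(nn.Module):
    """f_abs: abstract m old slots plus T new hidden states into m slots."""

    def __init__(self, m: int, T: int, D: int):
        super().__init__()
        # One D x D transform W_ij per (input capsule i, output slot j) pair.
        self.W = nn.Parameter(torch.randn(m + T, m, D, D) * 0.02)

    def forward(self, memory: torch.Tensor, h_prev: torch.Tensor,
                iters: int = 3) -> torch.Tensor:
        x = torch.cat([memory, h_prev], dim=0)           # (m+T, D) inputs
        u = torch.einsum('ijde,ie->ijd', self.W, x)      # votes u_ij = W_ij x_i
        b = torch.zeros_like(u[..., 0])                  # routing logits b_ij
        for _ in range(iters):
            alpha = torch.softmax(b, dim=1)              # normalize over slots j
            s = (alpha.unsqueeze(-1) * u).sum(dim=0)     # s_j = sum_i a_ij u_ij
            x_bar = squash(s)                            # (m, D) new slots
            b = torch.einsum('ijd,jd->ij', u, x_bar)     # b_ij = x̄_j · (W_ij x_i)
        return x_bar                                     # updated memory M
```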

Auxiliary training loss for $f_{abs}^l$:

$$\mathcal{L}_{ae} = \sum_{l=1}^{L} \left\| \mathrm{Atten}_{rec}^l(\tilde{Q}^l, \tilde{K}^l, \tilde{V}^l) - \mathrm{Atten}_{rec}^l(\tilde{Q}^l, \hat{K}^l, \hat{V}^l) \right\|_F^2$$

$$\tilde{Q}^l = \tilde{H}_n^l, \quad \tilde{K}^l = \tilde{V}^l = M^l \,\|\, \tilde{H}_{n-1}^l, \quad \hat{K}^l = \hat{V}^l = M^l$$
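
Intuitively, reading from the memory alone should reproduce what reading from the memory plus the raw previous segment would give, which forces $f_{abs}^l$ to compress $\tilde{H}_{n-1}^l$ into $M^l$. A sketch of this loss, using unprojected attention and treating the substituted $Q$, $K$, $V$ above as given (a simplifying assumption):

```python
import torch

def atten(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    # softmax(Q K^T) V, as in the attention definition above
    return torch.softmax(Q @ K.t(), dim=-1) @ V

def auto_encoding_loss(h_cur_layers, h_prev_layers, memories) -> torch.Tensor:
    """L_ae over L layers; each list entry is a per-layer tensor:
    h_cur (T, D), h_prev (T, D), memory (m, D)."""
    loss = torch.tensor(0.0)
    for h_cur, h_prev, M in zip(h_cur_layers, h_prev_layers, memories):
        full = torch.cat([M, h_prev], dim=0)        # K~ = V~ = M^l || H~_{n-1}^l
        diff = atten(h_cur, full, full) - atten(h_cur, M, M)
        loss = loss + (diff ** 2).sum()             # squared Frobenius norm
    return loss
```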

Loss function

$$\mathcal{L}_{like} = -\sum_{u \in U}\sum_{t \in S_n} \log \frac{\exp(x_t^T V_{n,t})}{\sum_{j \in V} \exp(x_j^T V_{n,t})}$$

$$\mathcal{L} = \mathcal{L}_{like} + \mathcal{L}_{ae}$$
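
For one user and one segment, $\mathcal{L}_{like}$ is a full-softmax cross-entropy between each fused state $V_{n,t}$ and the item embedding table; a sketch (the exact target alignment at step $t$ is an assumption):

```python
import torch
import torch.nn.functional as F

def likelihood_loss(V_n: torch.Tensor, item_emb: torch.Tensor,
                    targets: torch.Tensor) -> torch.Tensor:
    """V_n: (T, D) fused states, item_emb: (|V|, D) item embeddings,
    targets: (T,) ids of the items to predict at each step."""
    logits = V_n @ item_emb.t()                         # x_j^T V_{n,t} for all j
    return F.cross_entropy(logits, targets, reduction='sum')

# Total objective: L = L_like + L_ae, summed over users and segments.
```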
