SDM overview
A recommendation-model paper from Alibaba, published at CIKM '19.
Contributions
Building on prior sequence-based work, it addresses two problems:
- a session contains multiple interest tendencies
- long-term behaviors are varied and complex; to handle this, a long-short term gate is designed to fuse long- and short-term interests
Network architecture
user profile preference
The user vector is $e_u=\mathrm{concat}(\{e^p_u \mid p\in P\})$, where $P=\{age, gender, life\_stage\}$.
short-term preference
- The item vector is $e_i=\mathrm{concat}(\{e^f_i \mid f\in F\})$, where $F=\{id, cate\_first\_level, cate\_leaf\_level, shop, brand\}$. Because of the sparsity caused by the huge number of items, encoding items by item_id alone is far from satisfactory.
- The item vectors are fed into an LSTM, producing $[h^u_1, \ldots, h^u_t]$
- The hidden states are fed into self-attention, producing $[\hat h^u_1, \ldots, \hat h^u_t]$
- user_attention: the weights are given by $\mathrm{softmax}(\langle e_u, \hat h^u_i \rangle)$; unlike self-attention, there is no linear projection into Q/K spaces before the dot product. This step yields the short-term preference $s^u_t$.
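The user-attention step above can be sketched as follows (a minimal NumPy sketch; the dimensions and variable names are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def user_attention(e_u, H):
    """Attention pooling: query = user profile vector, keys/values = hidden states.

    e_u: (d,)   user profile vector
    H:   (t, d) sequence of hidden states (e.g. self-attention outputs)
    Returns the pooled short-term preference, shape (d,).
    """
    logits = H @ e_u                 # <e_u, h_i> for each time step, shape (t,)
    logits = logits - logits.max()   # subtract max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()         # softmax over time steps
    return weights @ H               # weighted sum of hidden states

rng = np.random.default_rng(0)
e_u = rng.standard_normal(8)
H = rng.standard_normal((5, 8))
s_t = user_attention(e_u, H)         # short-term preference s_t^u, shape (8,)
```

Note that the query is the raw profile vector itself; there are no learned Q/K projection matrices in this step.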
long-term preference
- $L^u=\{L^u_f \mid f\in F\}$, with $l^u_k\in L^u_f$, $l^u_k\in \mathbb{R}^d$, where $F$ is the set of fields, same as above. $L^u_f$ is the preference list for one field; entries in the same field share the embedding matrix.
- $z^u_f = \mathrm{user\_attention}(e_u, L^u_f)\in \mathbb{R}^d$, which acts as a pooling step.
- $z^u=\mathrm{concat}(\{z^u_f \mid f\in F\})$, and the long-term preference is $p^u=\tanh(W z^u + b)$.
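The per-field pooling and final projection can be sketched similarly (a NumPy sketch with made-up list lengths and random untrained weights; `attn_pool` is a hypothetical helper mirroring the user-attention formula):

```python
import numpy as np

def attn_pool(e_u, L_f):
    """Pool one field's long-term behavior list with user-profile attention."""
    logits = L_f @ e_u
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ L_f                   # z_f^u, shape (d,)

rng = np.random.default_rng(1)
d = 8
fields = ["id", "cate_first_level", "cate_leaf_level", "shop", "brand"]
e_u = rng.standard_normal(d)
# one behavior embedding list per field; lengths may differ per field
L = {f: rng.standard_normal((int(rng.integers(3, 10)), d)) for f in fields}

z_u = np.concatenate([attn_pool(e_u, L[f]) for f in fields])  # (|F|*d,)
W = rng.standard_normal((d, len(fields) * d)) * 0.1           # projection weights
b = np.zeros(d)
p_u = np.tanh(W @ z_u + b)           # long-term preference p^u, shape (d,)
```

The tanh projection compresses the concatenated per-field summaries back to the same dimension as the short-term preference, so the two can be fused.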
long-short term fusion gate
"we elaborately design a gated neural network":

$G^u=\sigma(W_1 e_u + W_2 s^u_t + W_3 p^u + b), \quad G^u\in \mathbb{R}^d$

This gate controls the proportion of short-term interest in the final representation.
With $\odot$ denoting element-wise multiplication, the fused vector

$o^u_t = G^u \odot s^u_t + (1-G^u)\odot p^u$

is used for retrieval.
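The fusion is a learned per-dimension convex combination of the two preferences; a minimal sketch, with random untrained weights standing in for the learned parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
d = 8
e_u = rng.standard_normal(d)         # user profile vector
s_t = rng.standard_normal(d)         # short-term preference s_t^u
p_u = rng.standard_normal(d)         # long-term preference p^u
W1, W2, W3 = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
b = np.zeros(d)

# gate in (0, 1)^d: per-dimension share of short-term interest
G = sigmoid(W1 @ e_u + W2 @ s_t + W3 @ p_u + b)
# fused user vector used for retrieval
o_t = G * s_t + (1.0 - G) * p_u
```

Because each dimension of $G^u$ lies in $(0,1)$, every coordinate of $o^u_t$ interpolates between the short-term and long-term signals rather than hard-selecting one of them.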
candidate matching
$\mathrm{score}(\mathrm{item}_i)=\langle o^u_t, v_i\rangle$, $\mathrm{score}(\mathrm{item}_i)\in \mathbb{R}$, $v_i\in V$, where $V$ is a separate item embedding matrix.
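Retrieval then reduces to a maximum-inner-product search over the item matrix; a brute-force sketch (in production an approximate nearest-neighbor index would replace the full dot product):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_items = 8, 1000
o_t = rng.standard_normal(d)             # fused user vector o_t^u
V = rng.standard_normal((n_items, d))    # separate item embedding matrix

scores = V @ o_t                         # <o_t^u, v_i> for every item
top_k = np.argsort(-scores)[:20]         # indices of the 20 highest-scoring items
```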
Dataset
Interestingly, the offline experiments use JD's dataset rather than Alibaba's own.
Model comparison and ablation study
- SDMMA. Sequential Deep Matching with Multi-head Attention is our multi-head self-attention enhanced model.
- PSDMMA. Personalized SDMMA adds user attention module to mine fine-grained personalized information.
- PSDMMAL. PSDMMA combines representations of short-term sessions and Long-term behaviors.
- PSDMMAL-N. Based on PSDMMAL; during training, the following N items are taken as target classes at the current time step, as Tang and Wang [24] do. N = 5 in this experiment.
- PSDMMAL-NoS. PSDMMAL does Not contain the embeddings of Side information in short-term sessions and long-term behaviors, except for the ID features of item and user.
Summary
The metrics below are relative improvements.
- Adding user_profile: +1.7%
- Adding side_info: +8%
- Adding long-term interest fusion: +1.8%
- Taking the following N items as targets: achieves SOTA on the Taobao dataset; this version was also deployed online, with pCTR +7%, pGMV +4%, and discovery +24%.
- Gate design: SHAN also uses user_profile and long-term interest, but as att(query=user_profile, keys=[long_interest, short_interest]). That design is less expressive than the gate and worse at capturing the correlation between long- and short-term interests.
Official code
The code is of limited reference value. The authors say it is "only the core code; the data processing is not included because it relies on internal ODPS and was removed." But many core methods are also missing definitions, which makes it hard to use as a reference.