KGAT: Knowledge Graph Attention Network for Recommendation
Paper: Knowledge Graph Attention Network for Recommendation
Contribution
Proposes incorporating entities carrying auxiliary (side) information into a collaborative knowledge graph (CKG), and introduces an attention mechanism over the graph to perform prediction.
Method
The method consists of three main components: an embedding layer, attentive embedding layers, and a prediction layer.
Embedding layer
Uses TransR: for a given triple $(h,r,t)$, the goal is to satisfy the translation assumption $\mathbf{e}_h^r+\mathbf{e}_r\approx\mathbf{e}_t^r$. The training objective is
$$\mathcal{L}_{\mathrm{KG}}=\sum_{(h,r,t,t')\in\mathcal{T}}-\ln\sigma\Big(g(h,r,t')-g(h,r,t)\Big)$$
where $\sigma$ is the sigmoid function and $g(h,r,t)=\|\mathbf{W}_r\mathbf{e}_h+\mathbf{e}_r-\mathbf{W}_r\mathbf{e}_t\|_2^2$. The smaller $g(h,r,t)$ is, the better the triple fits the translation assumption, i.e., the more plausible it is; $t'$ is a randomly sampled negative-tail entity.
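As a minimal sketch, the TransR plausibility score and one term of the pairwise KG loss can be written in plain NumPy (the names `transr_score` and `kg_pair_loss` are illustrative, not from the paper's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def transr_score(e_h, e_r, e_t, W_r):
    # g(h,r,t) = || W_r e_h + e_r - W_r e_t ||_2^2 ; lower = more plausible
    diff = W_r @ e_h + e_r - W_r @ e_t
    return float(diff @ diff)

def kg_pair_loss(e_h, e_r, e_t, e_t_neg, W_r):
    # one term of L_KG: -ln sigmoid(g(h,r,t') - g(h,r,t))
    g_pos = transr_score(e_h, e_r, e_t, W_r)
    g_neg = transr_score(e_h, e_r, e_t_neg, W_r)
    return float(-np.log(sigmoid(g_neg - g_pos)))
```

When the negative tail $t'$ scores worse (larger $g$) than the true tail, the loss falls below $\ln 2$; a perfect translation drives $g(h,r,t)$ to zero.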
Attentive layer
Consists of three components: information propagation, knowledge-aware attention, and information aggregation.
information propagation
$$\mathbf{e}_{\mathcal{N}_h}=\sum_{(h,r,t)\in\mathcal{N}_h}\pi(h,r,t)\,\mathbf{e}_t$$
is the representation propagated from all triples with head $h$, where $\pi(h,r,t)$ is the propagation coefficient.
knowledge-aware attention
$$\pi(h,r,t)=(\mathbf{W}_r\mathbf{e}_t)^{\top}\tanh\big(\mathbf{W}_r\mathbf{e}_h+\mathbf{e}_r\big)$$
Intuitively, entities that are closer in the relation space exchange more information. The coefficients are then normalized with a softmax over the triples headed by $h$:
$$\pi(h,r,t)=\frac{\exp(\pi(h,r,t))}{\sum_{(h,r',t')\in\mathcal{N}_h}\exp(\pi(h,r',t'))}$$
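The two steps above (raw coefficient, then softmax over $\mathcal{N}_h$) can be sketched as follows; for simplicity this assumes all neighbors share one relation $r$, whereas in KGAT each triple uses its own $\mathbf{W}_r$ and $\mathbf{e}_r$:

```python
import numpy as np

def attention_weights(e_h, e_r, e_ts, W_r):
    """pi(h,r,t) = (W_r e_t)^T tanh(W_r e_h + e_r) for each tail t
    (rows of e_ts), followed by a softmax over the neighborhood N_h."""
    query = np.tanh(W_r @ e_h + e_r)     # shared query vector for head h
    scores = (e_ts @ W_r.T) @ query      # one raw coefficient per tail
    scores = scores - scores.max()       # numerically stable softmax
    w = np.exp(scores)
    return w / w.sum()
```

The returned weights are positive and sum to one, so the propagated message $\mathbf{e}_{\mathcal{N}_h}$ is a convex combination of the projected tail embeddings.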
information aggregation
Information aggregation is expressed as $\mathbf{e}_h^{(1)}=f(\mathbf{e}_h,\mathbf{e}_{\mathcal{N}_h})$.
Three aggregators are considered: GCN, GraphSAGE, and Bi-Interaction (hybrid).
GCN aggregator: $f_{\mathrm{GCN}}=\mathrm{LeakyReLU}\big(\mathbf{W}(\mathbf{e}_h+\mathbf{e}_{\mathcal{N}_h})\big)$
GraphSAGE aggregator: $f_{\mathrm{GraphSAGE}}=\mathrm{LeakyReLU}\big(\mathbf{W}(\mathbf{e}_h\,\|\,\mathbf{e}_{\mathcal{N}_h})\big)$
Bi-Interaction aggregator: $f_{\text{Bi-Interaction}}=\mathrm{LeakyReLU}\big(\mathbf{W}_1(\mathbf{e}_h+\mathbf{e}_{\mathcal{N}_h})\big)+\mathrm{LeakyReLU}\big(\mathbf{W}_2(\mathbf{e}_h\odot\mathbf{e}_{\mathcal{N}_h})\big)$
where $\mathbf{W}\in\mathbb{R}^{d'\times d}$ (likewise $\mathbf{W}_1,\mathbf{W}_2$) are learnable transformation matrices, $d'$ is the transformed dimension, $\|$ denotes concatenation, and $\odot$ is the Hadamard (element-wise) product.
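The three aggregators differ only in how they combine $\mathbf{e}_h$ and $\mathbf{e}_{\mathcal{N}_h}$; a minimal NumPy sketch (function names are illustrative):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gcn_agg(e_h, e_nh, W):
    # f_GCN = LeakyReLU(W (e_h + e_Nh)),       W: (d', d)
    return leaky_relu(W @ (e_h + e_nh))

def graphsage_agg(e_h, e_nh, W):
    # f_GraphSAGE = LeakyReLU(W (e_h || e_Nh)), W: (d', 2d)
    return leaky_relu(W @ np.concatenate([e_h, e_nh]))

def bi_interaction_agg(e_h, e_nh, W1, W2):
    # f_Bi = LeakyReLU(W1 (e_h + e_Nh)) + LeakyReLU(W2 (e_h * e_Nh))
    return leaky_relu(W1 @ (e_h + e_nh)) + leaky_relu(W2 @ (e_h * e_nh))
```

Note that GraphSAGE's concatenation doubles the input dimension, so its weight matrix is $\mathbf{W}\in\mathbb{R}^{d'\times 2d}$, while the other two keep $\mathbb{R}^{d'\times d}$.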
Prediction layer
After $L$ attentive layers we obtain $L+1$ embeddings for each user and item (one per layer, including the initial one). The predicted score is $\hat{y}(u,i)={\mathbf{e}_u^*}^{\top}\mathbf{e}_i^*$, where $\|$ is the concatenation operator and
$$\mathbf{e}_u^*=\mathbf{e}_u^{(0)}\|\cdots\|\mathbf{e}_u^{(L)},\quad\mathbf{e}_i^*=\mathbf{e}_i^{(0)}\|\cdots\|\mathbf{e}_i^{(L)}$$
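Concatenation plus inner product makes the prediction step trivial; a sketch (the `predict` name is illustrative):

```python
import numpy as np

def predict(user_layers, item_layers):
    """y_hat(u,i) = (e_u*)^T e_i*, where e_u* / e_i* concatenate
    the L+1 per-layer embeddings of the user and the item."""
    e_u_star = np.concatenate(user_layers)
    e_i_star = np.concatenate(item_layers)
    return float(e_u_star @ e_i_star)
```

Because every layer's embedding survives in the concatenation, the score mixes signals from all propagation depths rather than only the last layer.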
Optimization
$$\mathcal{L}_{\mathrm{CF}}=\sum_{(u,i,j)\in O}-\ln\sigma\big(\hat{y}(u,i)-\hat{y}(u,j)\big)$$
where $O=\{(u,i,j)\mid(u,i)\in\mathcal{R}^+,(u,j)\in\mathcal{R}^-\}$, with $\mathcal{R}^+$ the set of observed (interacted) pairs and $\mathcal{R}^-$ the unobserved ones.
The total loss is
$$\mathcal{L}_{\mathrm{KGAT}}=\mathcal{L}_{\mathrm{KG}}+\mathcal{L}_{\mathrm{CF}}+\lambda\|\Theta\|_2^2$$
where $\Theta=\{\mathbf{E};\,\mathbf{W}_r,\forall r\in\mathcal{R};\,\mathbf{W}_1^{(l)},\mathbf{W}_2^{(l)},\forall l\in\{1,\cdots,L\}\}$ is the set of model parameters; the $L_2$ regularization term guards against overfitting.
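The CF loss is the same pairwise BPR form as the KG loss, and the total objective just sums the two plus an $L_2$ penalty; a sketch with illustrative names:

```python
import numpy as np

def cf_pair_loss(y_ui, y_uj):
    # one BPR term of L_CF: -ln sigmoid(y_hat(u,i) - y_hat(u,j))
    return float(-np.log(1.0 / (1.0 + np.exp(-(y_ui - y_uj)))))

def kgat_loss(l_kg, l_cf, params, lam=1e-5):
    # L_KGAT = L_KG + L_CF + lambda * ||Theta||_2^2
    reg = sum(float(np.sum(p ** 2)) for p in params)
    return l_kg + l_cf + lam * reg
```

When the positive and negative items score equally, each CF term equals $\ln 2$; pushing $\hat{y}(u,i)$ above $\hat{y}(u,j)$ drives the term toward zero.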