Paper Reading: "Self-Supervised Hypergraph Convolutional Networks for Session-based Recommendation"

Overview

This paper was published at AAAI 2021; it tackles session-based recommendation with hypergraph convolutional networks.

Introduction

The authors raise several issues with prior approaches:

  • RNN drawback: user behavior does not follow a strictly sequential order. For example, a user can shuffle an album or play it in order, yet both playback modes play music from the same album. (That said, in datasets such as Diginetica and Yoochoose, the time gaps between items within a session are indeed quite small.)
  • GNN drawback: existing graph models ignore the complex item correlations in session-based data.

To address these issues, the authors propose DHCN (Dual Channel Hypergraph Convolutional Networks), which (1) captures the relations between items as well as cross-session information, and (2) integrates a self-supervised task into network training to enhance hypergraph modeling and improve the recommendation task.

Method


A. Notations and Definitions

Hypergraph definition $G = (V, E)$: $V$ is the set of $N$ items across sessions, and $E$ is the set of $M$ hyperedges. Each hyperedge carries a weight, and together the weights form an $M \times M$ diagonal matrix $W$. The hypergraph can be represented by an $N \times M$ incidence matrix $H$, where each column records which nodes the corresponding hyperedge contains. The degrees of nodes and hyperedges are defined as:

$$D_{ii}=\sum_{\epsilon=1}^{M} W_{\epsilon \epsilon} H_{i \epsilon}, \qquad B_{\epsilon \epsilon}=\sum_{i=1}^{N} H_{i \epsilon}$$
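As a concrete illustration, here is a minimal NumPy sketch of how $H$, $D$, and $B$ can be assembled from these definitions; the toy sessions and variable names are my own, not from the paper.

```python
import numpy as np

# Toy data (hypothetical): three sessions over five items.
sessions = [[0, 1, 2], [1, 3], [2, 3, 4]]
N, M = 5, len(sessions)          # N items, M hyperedges (one per session)

H = np.zeros((N, M))             # N x M incidence matrix
for eps, session in enumerate(sessions):
    H[session, eps] = 1.0        # column eps marks the items of session eps

W = np.eye(M)                    # hyperedge weights, all set to 1 as in the paper
D = np.diag(H @ np.diag(W))      # node degrees:      D_ii = sum_eps W_ee * H_ie
B = np.diag(H.sum(axis=0))       # hyperedge degrees: B_ee = sum_i H_ie
```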

Line graph definition $L(G) = (V_L, E_L)$: the graph whose nodes are the hyperedges, with $V_L = \{v_e : e \in E\}$ and $E_L = \{(v_{e_p}, v_{e_q}) : e_p, e_q \in E, |e_p \cap e_q| \ge 1\}$. Each edge is assigned a weight $W_{p,q}$: the number of nodes shared by the two hyperedges, divided by the number of nodes in their union.

Hypergraph construction: all items in a session are connected by one hyperedge, so the items within a session are all connected to each other.

B. Hypergraph Convolutional Network

The hypergraph convolution is defined as:

$$x_{i}^{(l+1)}=\sum_{j=1}^{N} \sum_{\epsilon=1}^{M} H_{i \epsilon} H_{j \epsilon} W_{\epsilon \epsilon}\, x_{j}^{(l)} \tag{1}$$
With every hyperedge weight in $W$ set to 1, and after row normalization, (1) can be written in matrix form:

$$X_{h}^{(l+1)}=D^{-1} H W B^{-1} H^{T} X_{h}^{(l)} \tag{2}$$
After $L$ layers of hypergraph convolution, the item embeddings obtained at each layer are averaged to produce the final item embeddings:

$$X_{h}=\frac{1}{L+1} \sum_{l=0}^{L} X_{h}^{(l)} \tag{3}$$
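A minimal sketch of Eqs. (2) and (3), reusing the toy `H`, `W`, `D`, `B` built above; this is my reading of the propagation rule, not the authors' released code.

```python
def hypergraph_conv(X0, H, W, D, B, num_layers=3):
    """Eq. (2) applied num_layers times, then the layer average of Eq. (3)."""
    P = np.linalg.inv(D) @ H @ W @ np.linalg.inv(B) @ H.T   # propagation matrix
    layers = [X0]
    for _ in range(num_layers):
        layers.append(P @ layers[-1])
    return np.mean(layers, axis=0)                          # average over L+1 layers

X0 = np.random.randn(N, 8)            # random initial item embeddings (dim 8)
X_h = hypergraph_conv(X0, H, W, D, B)
```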
To improve the final recommendation results, the authors nevertheless incorporate positional information here (despite having just argued against strict ordering).
A learnable position matrix $P_r = [p_1, p_2, p_3, \ldots, p_m]$ fuses reversed position embeddings with the learned item representations, where $m$ is the length of the current session. The embedding of the $t$-th item of session $s$ is:

$$x_{t}^{*}=\tanh \left(W_{1}\left[x_{t} \,\|\, p_{m-t+1}\right]+b\right) \tag{4}$$
With the final item embeddings in hand, the session embedding is computed following SR-GNN: the mean of the session's item embeddings, $\mathbf{x}_{s}^{*}$, serves as the query vector in a soft-attention readout:

$$\alpha_{t}=\mathbf{f}^{\top} \sigma\left(\mathbf{W}_{2} \mathbf{x}_{s}^{*}+\mathbf{W}_{3} \mathbf{x}_{t}^{*}+\mathbf{c}\right), \qquad \theta_{h}=\sum_{t=1}^{m} \alpha_{t} \mathbf{x}_{t}^{*} \tag{5}$$
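Below is a small sketch of Eqs. (4) and (5) for one session; the shapes and parameter names (`W1`, `b`, `W2`, `W3`, `f`, `c`) are my own assumptions for illustration.

```python
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def session_embedding(X_sess, P, W1, b, W2, W3, f, c):
    """Eq. (4) position fusion followed by the Eq. (5) soft-attention readout."""
    m, d = X_sess.shape
    P_rev = P[:m][::-1]                                       # item t gets p_{m-t+1}
    X_star = np.tanh(np.hstack([X_sess, P_rev]) @ W1.T + b)   # (m, d), Eq. (4)
    x_s = X_star.mean(axis=0)                                 # query: mean item embedding
    alpha = sigmoid(x_s @ W2.T + X_star @ W3.T + c) @ f       # attention weights alpha_t
    return (alpha[:, None] * X_star).sum(axis=0)              # theta_h, shape (d,)
```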
Recommendation scores are then produced with an inner product and a softmax over candidate items, and cross-entropy is used as the loss function:

$$\mathcal{L}_{r}=-\sum_{i=1}^{N} \mathbf{y}_{i} \log \left(\hat{\mathbf{y}}_{i}\right)+\left(1-\mathbf{y}_{i}\right) \log \left(1-\hat{\mathbf{y}}_{i}\right) \tag{6}$$
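A sketch of the scoring step for a single session; for simplicity it keeps only the positive-item term of Eq. (6), which is the dominant term for a one-hot target.

```python
def recommendation_loss(theta_h, X_items, target):
    """Inner-product scores, softmax, then cross-entropy against a one-hot target."""
    scores = X_items @ theta_h                     # z_i = theta_h . x_i
    scores -= scores.max()                         # numerical stability
    y_hat = np.exp(scores) / np.exp(scores).sum()  # softmax over all items
    return -np.log(y_hat[target] + 1e-12)          # positive term of Eq. (6)
```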

C. Enhancing SBR with Self-Supervised Learning

In this part, the authors use the line graph for self-supervised learning, since the session embeddings obtained from the line graph and from the hypergraph should be similar: they are two views of the same sessions.
The initial embedding $\Theta_{l}^{(0)}$ of each line-graph node is the mean of the initial embeddings of all the items that make up that node.
The adjacency matrix of $L(G)$ is an $M \times M$ matrix $A$, where $M$ is the number of nodes in the line graph. Self-loops and the degree matrix are defined as:

$$\hat{A}=A+I \tag{7}$$
$$\hat{D}_{p, p}=\sum_{q=1}^{M} \hat{A}_{p, q} \tag{8}$$
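A sketch of how $A$, $\hat{A}$, and $\hat{D}$ can be derived from the incidence matrix `H`, using the intersection-over-union edge weights defined earlier; the function name is mine.

```python
def line_graph_matrices(H):
    """Adjacency A with |e_p ∩ e_q| / |e_p ∪ e_q| weights, then Eqs. (7)-(8)."""
    inter = H.T @ H                                   # pairwise intersection sizes
    sizes = H.sum(axis=0)
    union = sizes[:, None] + sizes[None, :] - inter   # pairwise union sizes
    A = inter / union
    np.fill_diagonal(A, 0.0)                          # self-loops come from Eq. (7)
    A_hat = A + np.eye(len(A))                        # Eq. (7)
    D_hat = np.diag(A_hat.sum(axis=1))                # Eq. (8)
    return A_hat, D_hat
```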
The line-graph convolution is then defined as:

$$\Theta_{l}^{(l+1)}=\hat{D}^{-1} \hat{A}\, \Theta_{l}^{(l)} \tag{9}$$

Because connected line-graph nodes are sessions that share items, this propagation captures cross-session information.
The final session embeddings are likewise obtained by averaging the results of the $L$ layers:

$$\Theta_{l}=\frac{1}{L+1} \sum_{l=0}^{L} \Theta_{l}^{(l)} \tag{10}$$
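The line-graph channel thus mirrors the hypergraph channel; a short sketch of Eqs. (9) and (10), reusing the toy objects from the earlier sketches:

```python
def line_graph_conv(Theta0, A_hat, D_hat, num_layers=3):
    """Eq. (9) applied num_layers times, then the layer average of Eq. (10)."""
    P = np.linalg.inv(D_hat) @ A_hat
    layers = [Theta0]
    for _ in range(num_layers):
        layers.append(P @ layers[-1])
    return np.mean(layers, axis=0)

A_hat, D_hat = line_graph_matrices(H)
Theta0 = np.stack([X0[s].mean(axis=0) for s in sessions])  # node init: mean of its items
Theta_l = line_graph_conv(Theta0, A_hat, D_hat)
```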
A discriminator function $f_D$ scores the agreement between the two views' session embeddings, and the self-supervised loss contrasts the true pair against a corrupted one:

$$\mathcal{L}_{s}=-\log \sigma\left(f_{\mathrm{D}}\left(\theta_{i}^{h}, \theta_{i}^{l}\right)\right)-\log \sigma\left(1-f_{\mathrm{D}}\left(\tilde{\theta}_{i}^{h}, \theta_{i}^{l}\right)\right) \tag{11}$$

Finally, the joint loss is designed as $\mathcal{L}=\mathcal{L}_{r}+\beta \mathcal{L}_{s}$.
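A sketch of Eq. (11) and the joint objective, taking $f_D$ as a dot product and building the negatives $\tilde{\theta}^{h}$ by row-shuffling the hypergraph-view embeddings; the exact corruption scheme is an assumption on my part, matching common practice for this kind of contrastive loss.

```python
def ssl_loss(theta_h, theta_l, seed=0):
    """Eq. (11): f_D = dot product; negatives via row-wise shuffling of theta_h."""
    rng = np.random.default_rng(seed)
    theta_neg = theta_h[rng.permutation(len(theta_h))]
    pos = (theta_h * theta_l).sum(axis=1)              # f_D(theta_h, theta_l)
    neg = (theta_neg * theta_l).sum(axis=1)            # f_D(corrupted, theta_l)
    return -(np.log(sigmoid(pos)) + np.log(sigmoid(1.0 - neg))).sum()

beta = 0.01  # hypothetical trade-off weight
# total_loss = recommendation_loss(...) + beta * ssl_loss(theta_h_batch, Theta_l)
```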

Summary

The authors integrate self-supervised learning into session-based recommendation, and the hypergraph design is also appealing. On closer inspection, though, the hypergraph computation is not very different from an ordinary global item graph; its main role seems to be to motivate the line graph. Moreover, since one hypergraph merges many sessions, the number of nodes in a single graph can become quite large. Finally, the authors' own paper shows that session-based recommendation does in fact depend on item order.
