[Paper Close Reading] Hypergraph Neural Networks

Paper link: Hypergraph neural networks | Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence (acm.org)

The English is typed entirely by hand, summarizing and paraphrasing the original paper. Some spelling and grammar mistakes are unavoidable; if you spot any, feel free to point them out in the comments! This post is closer to personal notes, so read with caution.

1. TL;DR

1.1. Takeaways

(1) No takeaway is the best takeaway, since the first-order approximation turns out to be just GCN all over again. And what is there even to say without released code?

1.2. Paper summary figure

2. Paragraph-by-Paragraph Close Reading

2.1. Abstract

        ①HGNN aims at representing high-order relationships among data

        ②Applicable to multi-modal data and performs excellently

2.2. Introduction

        ①Hypergraph structure in social media:

        ②Comparison between graph and hypergraph:

2.3. Related Work

2.3.1. Hypergraph learning

        ①Transductive inference on a hypergraph focuses on minimizing the difference between strongly connected nodes (a small question: if the hypergraph is constructed from strong correlations in the first place, those nodes may already be very similar)

2.3.2. Neural networks on graph

        ①Introducing related works in the spectral and spatial domains

2.4. Hypergraph Neural Networks

2.4.1. Hypergraph learning statement

        ①They define a hypergraph as G=\left ( V, E,W \right ), where W is a diagonal matrix whose entries are the hyperedge weights

        ②The incidence matrix H can be constructed by:

h(v,e)=\left\{\begin{array}{cc}1,&\text{if}\, v\in e\\0,&\text{if} \, v\not\in e\end{array}\right.

The incidence matrix here is unweighted; the edge degree \delta(e)=\sum_{v\in V}h(v,e) is just a count of the vertices in a hyperedge, while the vertex degree d(v)=\sum_{e\in E}w(e)h(v,e) additionally weights each incident hyperedge by w(e).

        ③\textbf{D}_e and \textbf{D}_v denote the diagonal matrices of the edge degrees \delta(e) and the vertex degrees d\left ( v \right ), respectively
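A minimal NumPy sketch of these degree definitions (the toy H and w below are made up purely for illustration):

```python
import numpy as np

# Toy hypergraph: 4 vertices, 3 hyperedges (values are illustrative only)
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 1, 0]], dtype=float)   # H[v, e] = h(v, e)
w = np.array([1.0, 2.0, 0.5])            # hyperedge weights, W = diag(w)

d_v = H @ w              # vertex degrees d(v) = sum_e w(e) h(v, e)
delta_e = H.sum(axis=0)  # edge degrees delta(e) = sum_v h(v, e)

D_v, D_e = np.diag(d_v), np.diag(delta_e)  # the diagonal degree matrices
```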

        ④The vertex labels should be smooth, which is enforced by the regularized optimization:

\arg\min_{f}\left\{\mathcal{R}_{emp}(f)+\Omega(f)\right\}

where \mathcal{R}_{emp}(f) denotes the supervised empirical loss, \Omega(f) denotes the regularizer, and f denotes the classification function

        ⑤\Omega(f) is computed as:

\begin{gathered} \Omega(f)= \frac{1}{2}\sum_{e\in\mathcal{E}}\sum_{\{u,v\}\in\mathcal{V}}\frac{w(e)h(u,e)h(v,e)}{\delta(e)} \Big(\frac{f(u)}{\sqrt{d(u)}}-\frac{f(v)}{\sqrt{d(v)}}\Big)^{2}, \end{gathered}

        ⑥Defining \Theta=\mathbf{D}_{v}^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_{e}^{-1}\mathbf{H}^{\top}\mathbf{D}_{v}^{-1/2} and \Delta=\mathbf{I}-\Theta

        ⑦So \Omega(f) can be rewritten as (really?!):

\Omega(f)=f^{\top}\Delta f

where \Delta is positive semi-definite and is usually called the hypergraph Laplacian. I did not derive this by hand, but the step is short: expanding the square in ⑤, the two quadratic terms each reduce to \frac{1}{2}f^{\top}f (since \sum_{v}h(v,e)=\delta(e) and \sum_{e}w(e)h(u,e)=d(u)), while the cross term gives -f^{\top}\Theta f, hence \Omega(f)=f^{\top}(\mathbf{I}-\Theta)f=f^{\top}\Delta f.
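The identity is also easy to check numerically. A minimal sketch with a random toy hypergraph (all variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n_v, n_e = 6, 4
H = (rng.random((n_v, n_e)) < 0.5).astype(float)
H[0, :] = 1.0   # vertex 0 joins every hyperedge, so no edge is empty
H[:, 0] = 1.0   # hyperedge 0 contains every vertex, so no vertex is isolated
w = rng.random(n_e) + 0.1    # positive hyperedge weights
W = np.diag(w)
d = H @ w                    # vertex degrees d(v)
delta = H.sum(axis=0)        # edge degrees delta(e)

Dv_isqrt = np.diag(d ** -0.5)
Theta = Dv_isqrt @ H @ W @ np.diag(1 / delta) @ H.T @ Dv_isqrt
Delta = np.eye(n_v) - Theta  # hypergraph Laplacian

f = rng.random(n_v)

# Pairwise-difference form of Omega(f) from ⑤ (sum over ordered pairs u, v)
omega = 0.5 * sum(
    w[e] * H[u, e] * H[v, e] / delta[e]
    * (f[u] / np.sqrt(d[u]) - f[v] / np.sqrt(d[v])) ** 2
    for e in range(n_e) for u in range(n_v) for v in range(n_v)
)

print(np.allclose(omega, f @ Delta @ f))   # True: Omega(f) = f^T Delta f
```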

2.4.2. Spectral convolution on hypergraph

        ①The hypergraph is now written as G=\left ( V,E,\Delta \right ), carrying the Laplacian \Delta

        ②Eigendecomposition: \Delta=\Phi\Lambda\Phi^{\top}, where \Phi=\left ( \phi _1,...,\phi_n \right ) collects the orthonormal eigenvectors as its columns and \Lambda =diag\left ( \lambda _1,...,\lambda _n \right ) contains the eigenvalues

        ③Transforming the original signal x=\left ( x_1,...,x_n \right ) into \hat{x}=\Phi ^{\top}x; the eigenvectors serve as the Fourier basis and \Phi ^{\top} defines the Fourier transform

        ④Spectral convolution with filter \mathbf{g}:

\mathbf{g} \star \mathbf{x}=\mathbf{\Phi}\left(\left(\boldsymbol{\Phi}^{\top} \mathbf{g}\right) \odot\left(\boldsymbol{\Phi}^{\top} \mathbf{x}\right)\right)=\mathbf{\Phi} g(\boldsymbol{\Lambda}) \boldsymbol{\Phi}^{\top} \mathbf{x}

where g\left ( \Lambda \right )=diag\left ( g\left ( \lambda _1 \right ),...,g\left ( \lambda _n \right )\right ) collects the Fourier coefficients of the filter
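A tiny numeric illustration of this identity, reusing the Delta from the sketch above (the filter g here is an arbitrary made-up choice):

```python
import numpy as np

# Reuses Delta (the hypergraph Laplacian) from the previous sketch
lam, Phi = np.linalg.eigh(Delta)   # Delta = Phi diag(lam) Phi^T; columns are eigenvectors
x = np.random.default_rng(1).random(Delta.shape[0])

g = lambda l: np.exp(-l)           # arbitrary filter acting on the eigenvalues
x_hat = Phi.T @ x                  # forward Fourier transform of the signal
y = Phi @ (g(lam) * x_hat)         # filter in the spectral domain, then invert

print(np.allclose(y, Phi @ np.diag(g(lam)) @ Phi.T @ x))   # True
```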

        ⑤They approximate the filter with a truncated Chebyshev polynomial expansion and keep only the first order (K=1), which updates the convolution to:

\mathbf{g} \star \mathbf{x}\approx \sum_{k=0}^{K}\theta _kT_k\left ( \hat{\Delta } \right )\mathbf{x} \approx \theta_0\mathbf{x}-\theta_1\mathbf{D}_{v}^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_{e}^{-1}\mathbf{H}^{\top}\mathbf{D}_{v}^{-1/2}\mathbf{x}

where \theta _0 and \theta _1 are the filter parameters, T_k is the Chebyshev polynomial of order k, and \hat{\Delta}=\frac{2}{\lambda_{max}}\Delta-\mathbf{I} (taking \lambda_{max}\approx2 gives \hat{\Delta}\approx\Delta-\mathbf{I})

        ⑥They reparameterize with a single parameter \theta (wait, can these parameters just be designed by hand?):

\left\{\begin{matrix} \theta_1=-\frac{1}{2}\theta\\ \theta_0=\frac{1}{2}\theta \mathbf{D}_{v}^{-1/2}\mathbf{H}\mathbf{D}_{e}^{-1}\mathbf{H}^{\top}\mathbf{D}_{v}^{-1/2}\end{matrix}\right.

        ⑦The convolution will be:

\begin{gathered} \mathbf{g}\star\mathbf{x} \approx{\frac{1}{2}}\theta\mathbf{D}_{v}^{-1/2}\mathbf{H}(\mathbf{W}+\mathbf{I})\mathbf{D}_{e}^{-1}\mathbf{H}^{\top}\mathbf{D}_{v}^{-1/2}\mathbf{x} \\ \approx\theta\mathbf{D}_{v}^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_{e}^{-1}\mathbf{H}^{\top}\mathbf{D}_{v}^{-1/2}\mathbf{x}, \end{gathered}

The authors say W is initialized as I, so why add another I at every layer? It feels like adding it once at the start should be enough; adding it again later might pile up too many self-loops. Oh, I see: in the first layer (W+I) stacks to 2W, which cancels the 1/2 coefficient.

Thus the final convolution function can be:

\mathbf{Y}=\mathbf{D}_v^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^{\top}\mathbf{D}_v^{-1/2}\mathbf{X}\mathbf{\Theta}

where \mathbf{W}=\mathrm{diag}(w_{1},\ldots,w_{n}) collects the hyperedge weights, \Theta\in\mathbb{R}^{C_1\times C_2} is the filter parameter matrix, \mathbf{X}\in\mathbb{R}^{N\times C_1} is the input signal, and \mathbf{Y}\in\mathbb{R}^{N\times C_2} is the output
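A quick numeric check of the ⑥→⑦ substitution, reusing the matrices from the \Omega(f) sketch above (theta is an arbitrary scalar of my choosing):

```python
import numpy as np

# Reuses H, W, Theta, Dv_isqrt, delta, n_v, n_e from the Omega(f) sketch
theta = 0.7                              # arbitrary single filter parameter
x = np.random.default_rng(2).random(n_v)
De_inv = np.diag(1 / delta)

theta1 = -0.5 * theta
theta0 = 0.5 * theta * Dv_isqrt @ H @ De_inv @ H.T @ Dv_isqrt

lhs = theta0 @ x - theta1 * (Theta @ x)  # theta_0 x - theta_1 Theta x, from ⑤
rhs = 0.5 * theta * Dv_isqrt @ H @ (W + np.eye(n_e)) @ De_inv @ H.T @ Dv_isqrt @ x
print(np.allclose(lhs, rhs))             # True: matches the (W+I) form in ⑦
```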

2.4.3. Hypergraph neural networks analysis

        ①Process of HGNN:

        ②Convolution layer:

\mathbf{X}^{(l+1)}=\sigma(\mathbf{D}_v^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^{\top}\mathbf{D}_v^{-1/2}\mathbf{X}^{(l)}\mathbf{\Theta}^{(l)})

(shape check: [v,f_2]=[v,v]\times[v,e]\times[e,e]\times[e,e]\times[e,v]\times[v,v]\times[v,f_1]\times[f_1,f_2], matching \mathbf{D}_v^{-1/2},\mathbf{H},\mathbf{W},\mathbf{D}_e^{-1},\mathbf{H}^{\top},\mathbf{D}_v^{-1/2},\mathbf{X}^{(l)},\mathbf{\Theta}^{(l)} in order)

where \sigma denotes the nonlinear activation function
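A minimal PyTorch sketch of one such layer (my own reimplementation from the formula above, not the authors' released code; all names are mine):

```python
import torch
import torch.nn as nn

class HGNNConv(nn.Module):
    """One hypergraph convolution layer:
    X^(l+1) = sigma(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X^(l) Theta^(l))."""

    def __init__(self, in_ft: int, out_ft: int):
        super().__init__()
        self.theta = nn.Linear(in_ft, out_ft, bias=False)   # Theta^(l)

    def forward(self, X: torch.Tensor, H: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        # X: [N, in_ft] vertex features, H: [N, E] incidence matrix, w: [E] edge weights
        d_v = (H @ w).clamp(min=1e-12)        # vertex degrees d(v)
        d_e = H.sum(dim=0).clamp(min=1e-12)   # edge degrees delta(e)
        Dv_isqrt = torch.diag(d_v.rsqrt())
        G = Dv_isqrt @ H @ torch.diag(w / d_e) @ H.T @ Dv_isqrt
        return torch.relu(G @ self.theta(X))  # sigma = ReLU, as in the experiments
```

Two such layers stacked (hidden width 16, dropout 0.5 in between, softmax on the output) would match the node-classification configuration listed in the experiments below.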

         ③The details of convolution:

2.4.4. Implementation

        ①Hypergraph construction: they construct the hypergraph from feature similarity. For each vertex (taken as a centroid), they find its K nearest neighbors, so each hyperedge connects K+1 vertices. With N vertices this yields N hyperedges, i.e. \textbf{H}\in \mathbb{R}^{N \times N} (a sketch follows below)
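A sketch of this K-NN construction (Euclidean distance in feature space is my assumption, and the function name is mine):

```python
import numpy as np

def knn_hypergraph(X: np.ndarray, K: int) -> np.ndarray:
    """Each vertex spawns one hyperedge containing itself and its K nearest
    neighbors, so N vertices give an incidence matrix H of shape [N, N]."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    H = np.zeros((len(X), len(X)))
    for e, order in enumerate(np.argsort(dist, axis=1)):
        H[order[:K + 1], e] = 1.0   # the centroid is its own nearest neighbor
    return H
```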

        ②Model for node classification: the output layer uses a softmax classifier

2.5. Experiments

2.5.1. Citation network classification

(1)Datasets

        ①Introducing citation network and visual object datasets

        ②Details of datasets:

(2)Experimental settings

        ①Conv layers: 2

        ②Hidden layer: 16

        ③Dropout rate: 0.5

        ④Activation function: ReLU

        ⑤Optimizer: Adam

        ⑥Learning rate: 0.001

(3)Results and discussion

        ①Their results are averaged over 100 runs

        ②Comparison table:

2.5.2. Visual object classification

(1)Datasets and experimental settings

        ①Introducing each dataset

        ②Constructed dataset:

(2)Hypergraph structure construction on visual datasets

        ①Hypergraph construction and ⭐multi-modal hypergraph construction:

        ②⭐For the multi-modal hypergraph, they generate a separate H for each modality and then concatenate them all (a sketch follows below)
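A short sketch of that fusion, reusing the hypothetical knn_hypergraph from above (the feature matrices are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
X_a = rng.random((100, 512))   # placeholder features from modality A
X_b = rng.random((100, 64))    # placeholder features from modality B

# One incidence matrix per modality, concatenated along the hyperedge axis
H_multi = np.concatenate([knn_hypergraph(X_a, 10), knn_hypergraph(X_b, 10)], axis=1)
# H_multi: [N, 2N] -- each modality contributes its own N hyperedges
```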

(3)Results and discussions 

        ①Comparison table on ModelNet40 dataset:

        ②Comparison table on NTU dataset:

        ③Comparison table on ModelNet40:

2.6. Conclusion

        ①HGNN is a more general framework; GCN can be regarded as its special case where every hyperedge connects exactly two vertices

3. Reference List

Feng, Y. et al. (2019) 'Hypergraph Neural Networks', AAAI. doi: 10.1609/aaai.v33i01.33013558
