[Paper reading notes] 2019_AAAI_Hypergraph neural networks
Paper download: https://deepai.org/publication/hypergraph-neural-networks
Venue: AAAI
Publish time: 2019
Authors and affiliations:
- Yifan Feng,1
- Haoxuan You,3
- Zizhao Zhang,3
- Rongrong Ji,1,2
- Yue Gao,3∗
- 1Fujian Key Laboratory of Sensing and Computing for Smart City, Department of Cognitive Science
- School of Information Science and Engineering, Xiamen University, 361005, China
- 2Peng Cheng Laboratory, China
- 3BNRist, KLISS, School of Software, Tsinghua University, 100084, China.
- {evanfeng97, haoxuanyou}@gmail.com, rrji@xmu.edu.cn, {zz-z14,gaoyue}@tsinghua.edu.cn
Datasets:
- Citation network classification
  - Cora
  - Pubmed (Sen et al. 2008)
- Visual object classification
  - ModelNet40: the Princeton ModelNet40 dataset (Wu et al. 2015)
  - NTU: the National Taiwan University (NTU) 3D model dataset (Chen et al. 2003)
  - On these datasets the hypergraph is built either from a single-modality feature or from multi-modality features.
Code:
Articles by others:
- Paper notes: AAAI 2019 Hypergraph Neural Networks
- Hypergraph Neural Networks
- Hypergraph Neural Network, AAAI, 2019.
- Hypergraph Neural Networks HGNN
- [Paper reading] Hypergraph Neural Networks
Summary of contributions:
- (1) Hypergraphs are already used in computer vision and are a well-known structure in graph theory, so it is time for the recommendation field to explore them as well.
- (2) In this paper, we propose a framework of hypergraph neural networks (HGNN).
- In this method, HGNN generalizes the convolution operation to the hypergraph learning process. The convolution on the spectral domain is conducted with the hypergraph Laplacian and further approximated by truncated Chebyshev polynomials.
- (3) HGNN can explore complex, multi-modal, high-order connections (relations).
- (4) To realize HGNN, the paper proposes a hyperedge convolution operation.
- (5) The experiments also cover a computer vision task: "we have conducted experiments on citation network classification and visual object recognition tasks".
Abstract
- In this paper, we present a hypergraph neural networks (HGNN) framework for data representation learning, which can encode high-order data correlation in a hypergraph structure.
- Confronting the challenges of learning representation for complex data in real practice, we propose to incorporate such data structure in a hypergraph, which is more flexible on data modeling, especially when dealing with complex data.
- In this method, a hyperedge convolution operation is designed to handle the data correlation during representation learning. In this way, traditional hypergraph learning procedure can be conducted using hyperedge convolution operations efficiently.
- HGNN is able to learn the hidden layer representation considering the high-order data structure, which is a general framework considering the complex data correlations.
- We have conducted experiments on citation network classification and visual object recognition tasks and compared HGNN with graph convolutional networks and other traditional methods. Experimental results demonstrate that the proposed HGNN method outperforms recent state-of-the-art methods. We can also reveal from the results that the proposed HGNN is superior when dealing with multi-modal data compared with existing methods.
Introduction
- (1) Graph-based convolutional neural networks (Kipf and Welling 2017), (Defferrard, Bresson, and Vandergheynst 2016) have attracted much attention in recent years. Different from traditional convolutional neural networks, graph convolution is able to encode the graph structure of different input data using a neural network model and it can be used in the semi-supervised learning procedure. Graph convolutional neural networks have shown superiority on representation learning compared with traditional neural networks due to their ability of using data graph structure.
- (2) In traditional graph convolutional neural network methods, the pairwise connections among data are employed. It is noted that the data structure in real practice could be beyond pairwise connections and even far more complicated. Confronting the scenarios with multi-modal data, the situation for data correlation modelling could be more complex.
- (3) Figure 1 provides examples of complex connections on social media data.
- On one hand, the data correlation can be more complex than pairwise relationship, which is difficult to be modeled by a graph structure.
- On the other hand, the data representation tends to be multi-modal, such as the visual connections, text connections and social connections in this example.
- Under such circumstances, traditional graph structure has the limitation to formulate the data correlation, which limits the application of graph convolutional neural networks.
- Under such circumstances, it is important and urgent to further investigate better and more general data structure models to learn representation.
- (4) To tackle this challenging issue, in this paper, we propose a hypergraph neural networks (HGNN) framework, which uses the hypergraph structure for data modeling.
- Compared with simple graph, on which the degree for all edges is mandatory 2, a hypergraph can encode high-order data correlation (beyond pairwise connections) using its degree-free hyperedges, as shown in Figure 2.
- In Figure 2, the graph is represented using the adjacency matrix, in which each edge connects just two vertices. On the contrary, a hypergraph is easy to be expanded for multi-modal and heterogeneous data representation using its flexible hyperedges.
- For example, a hypergraph can jointly employ multi-modal data for hypergraph generation by combining the adjacency matrix, as illustrated in Figure 2.
- Therefore, hypergraph has been employed in many computer vision tasks such as classification and retrieval tasks (Gao et al. 2012).
- However, traditional hypergraph learning methods (Zhou, Huang, and Schölkopf 2007) suffer from their high computation complexity and storage cost, which limits the wide application of hypergraph learning methods.
- (5) In this paper, we propose a hypergraph neural networks framework (HGNN) for data representation learning.
- In this method, the complex data correlation is formulated in a hypergraph structure, and we design a hyperedge convolution operation to better exploit the high-order data correlation for representation learning.
- More specifically, HGNN is a general framework which can incorporate multi-modal data and complicated data correlations.
- Traditional graph convolutional neural networks can be regarded as a special case of HGNN.
- (6) To evaluate the performance of the proposed HGNN framework, we have conducted experiments on citation network classification and visual object recognition tasks. The experimental results on four datasets and comparisons with graph convolutional network (GCN) and other traditional methods have shown better performance of HGNN. These results indicate that the proposed HGNN method is more effective on learning data representation using high-order and complex correlations.
- (7) The main contributions of this paper are two-fold:
- We propose a hypergraph neural networks framework, i.e., HGNN, for representation learning using hypergraph structure.
- HGNN is able to formulate complex and high-order data correlation through its hypergraph structure and can be also efficient using hyperedge convolution operations.
- It is effective on dealing with multi-modal data/features.
- Moreover, GCN (Kipf and Welling 2017) can be regarded as a special case of HGNN, for which the edges in simple graph can be regarded as 2-order hyperedges which connect just two vertices.
- We have conducted extensive experiments on citation network classification and visual object classification tasks. Comparisons with state-of-the-art methods demonstrate the effectiveness of the proposed HGNN framework. Experiments also indicate the better performance of the proposed method when dealing with multi-modal data.
2 Related Work
In this section, we briefly review existing works of hypergraph learning and neural networks on graph.
2.1 Hypergraph learning
(This subsection traces how hypergraphs developed in the literature: which papers proposed them and which problems they were used to solve. The CV community had in fact already started using them.)
- In many computer vision tasks, the hypergraph structure has been employed to model high-order correlation among data.
- Hypergraph learning is first introduced in (Zhou, Huang, and Schölkopf 2007), as a propagation process on hypergraph structure. The transductive inference on hypergraph aims to minimize the label difference among vertices with stronger connections on hypergraph.
- In (Huang, Liu, and Metaxas 2009), hypergraph learning is further employed in video object segmentation. (Huang et al. 2010) used the hypergraph structure to model image relationship and conducted transductive inference process for image ranking.
- To further improve the hypergraph structure, research attention has been attracted for learning the weights of hyperedges, which have great influence on modeling the correlation of data.
- In (Gao et al. 2013), an $l_2$ regularizer on the weights is introduced to learn optimal hyperedge weights.
- In (Hwang et al. 2008), the correlation among hyperedges is further explored by an assumption that highly correlated hyperedges should have similar weights.
- Regarding the multi-modal data, in (Gao et al. 2012), multi-hypergraph structure is introduced to assign weights for different sub-hypergraphs, which correspond to different modalities.
2.2 Neural networks on graph
- (1) Since many irregular data that do not own a grid-like structure can only be represented in the form of graph, extending neural networks to graph structure has attracted great attention from researchers.
- In (Gori, Monfardini, and Scarselli 2005) and (Scarselli et al. 2009), the neural network on graph is first introduced to apply recurrent neural networks to deal with graphs. For generalizing convolution network to graph, the methods are divided into spectral and non-spectral approaches.
- (2) For spectral approaches, the convolution operation is formulated in spectral domain of graph.
- (Bruna et al. 2014) introduces the first graph CNN, which uses the graph Laplacian eigen basis as an analogy of the Fourier transform.
- In (Henaff, Bruna, and LeCun 2015), the spectral filters can be parameterized with smooth coefficients to make them spatial-localized.
- In (Defferrard, Bresson, and Vandergheynst 2016), a Chebyshev expansion of the graph Laplacian is further used to approximate the spectral filters.
- Then, in (Kipf and Welling 2017), the Chebyshev polynomials are simplified into 1-order polynomials to form an efficient layer-wise propagation model.
- (3) For spatial approaches, the convolution operation is defined in groups of spatial close nodes.
- In (Atwood and Towsley 2016), the powers of a transition matrix are employed to define the neighborhood of nodes.
- (Monti et al. 2017) uses the local path operators in the form of Gaussian mixture models to generalize convolution in spatial domain.
- In (Velickovic et al. 2018), the attention mechanisms is introduced into the graph to build attention-based architecture to perform the node classification task on graph.
3 Hypergraph Neural Networks
- In this section, we introduce our proposed hypergraph neural networks (HGNN).
- We first briefly introduce hypergraph learning,
- and then the spectral convolution on hypergraph is provided.
- Following, we analyze the relations between HGNN and existing methods. In the last part of the section, some implementation details will be given.
3.1 Hypergraph learning statement
- (1) We first review the hypergraph analysis theory. Different from simple graph, a hyperedge in a hypergraph connects two or more vertices. A hypergraph is defined as $\mathcal{G} = (\mathcal{V}, \mathcal{E}, W)$, which includes a vertex set $\mathcal{V}$ and a hyperedge set $\mathcal{E}$. Each hyperedge is assigned a weight by $W$, a diagonal matrix of edge weights. The hypergraph $\mathcal{G}$ can be denoted by a $|\mathcal{V}| \times |\mathcal{E}|$ incidence matrix $H$, with entries defined as
$$h(v, e) = \begin{cases} 1, & \text{if } v \in e \\ 0, & \text{if } v \notin e \end{cases} \tag{1}$$
- (2) For a vertex $v \in \mathcal{V}$, its degree is defined as $d(v) = \sum_{e\in \mathcal{E}}\omega(e)h(v,e)$.
- For an edge $e\in\mathcal{E}$, its degree is defined as $\delta(e) = \sum_{v\in\mathcal{V}}h(v,e)$.
- Further, $D_e$ and $D_v$ denote the diagonal matrices of the edge degrees and the vertex degrees, respectively.
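The definitions above can be made concrete on a toy example. The following is a minimal NumPy sketch; the four-vertex hypergraph and the names `hyperedges`, `d_v`, and `delta_e` are made up for illustration and do not come from the paper.

```python
import numpy as np

# Hypothetical toy hypergraph: 4 vertices, 3 hyperedges.
# e0 = {v0, v1, v2}, e1 = {v1, v3}, e2 = {v0, v2, v3}
hyperedges = [[0, 1, 2], [1, 3], [0, 2, 3]]
n_vertices = 4

# Incidence matrix H: h(v, e) = 1 if vertex v belongs to hyperedge e, else 0.
H = np.zeros((n_vertices, len(hyperedges)))
for e, members in enumerate(hyperedges):
    H[members, e] = 1.0

w = np.ones(len(hyperedges))   # hyperedge weights w(e); all equal in this sketch

d_v = H @ w                    # vertex degree d(v) = sum_e w(e) h(v, e)
delta_e = H.sum(axis=0)        # edge degree delta(e) = sum_v h(v, e)

D_v = np.diag(d_v)             # diagonal vertex-degree matrix
D_e = np.diag(delta_e)         # diagonal edge-degree matrix
```

Here every vertex sits in exactly two hyperedges, so all vertex degrees are 2, while the edge degrees are simply the hyperedge sizes (3, 2, 3).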
- (3) Here let us consider the node (vertex) classification problem on hypergraph, where the node labels should be smooth on the hypergraph structure. The task can be formulated as a regularization framework as introduced by (Zhou, Huang, and Schölkopf 2007):
$$\arg\min_f \left\{ R_{emp}(f) + \Omega(f) \right\} \tag{2}$$
- where $\Omega(f)$ is a regularizer on the hypergraph,
- $R_{emp}(f)$ denotes the supervised empirical loss,
- $f(\cdot)$ is a classification function.
- The regularizer $\Omega(f)$ is defined as:
$$\Omega(f) = \frac{1}{2}\sum_{e\in\mathcal{E}}\sum_{\{u,v\}\in\mathcal{V}} \frac{\omega(e)h(u,e)h(v,e)}{\delta(e)} \left(\frac{f(u)}{\sqrt{d(u)}} - \frac{f(v)}{\sqrt{d(v)}}\right)^2 \tag{3}$$
- We let $\Theta = D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2}$ and $\Delta = I - \Theta$. Then, the normalized $\Omega(f)$ can be written as:
$$\Omega(f) = f^{\top} \Delta f \tag{4}$$
- where $\Delta$ is positive semi-definite, and usually called the hypergraph Laplacian.
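As a sanity check on the Laplacian just defined, here is a small NumPy sketch that builds $\Theta$ and $\Delta$ for a toy incidence matrix and inspects the eigenvalues. The function name `hypergraph_laplacian` and the toy `H` are hypothetical, and the code assumes every vertex belongs to at least one hyperedge so that no degree is zero.

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    """Return (Theta, Delta) with Theta = Dv^-1/2 H W De^-1 H^T Dv^-1/2 and Delta = I - Theta."""
    H = np.asarray(H, dtype=float)
    n, m = H.shape
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w                      # vertex degrees (assumed non-zero)
    delta_e = H.sum(axis=0)          # edge degrees
    Hn = H / np.sqrt(d_v)[:, None]   # Dv^-1/2 H
    theta = Hn @ np.diag(w / delta_e) @ Hn.T
    return theta, np.eye(n) - theta

# Hypothetical toy incidence matrix: 4 vertices, 3 hyperedges.
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]], dtype=float)

theta, delta = hypergraph_laplacian(H)
eigvals = np.linalg.eigvalsh(delta)       # real eigenvalues of the symmetric Laplacian

f = np.array([1.0, -1.0, 0.5, 2.0])       # an arbitrary label function on the vertices
omega = f @ delta @ f                     # regulariser value f^T Delta f
```

For this example the eigenvalues are non-negative up to rounding, and the vector of square-rooted vertex degrees is mapped to zero by `delta`, which is the smoothest possible signal under this regulariser.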
3.2 Spectral convolution on hypergraph
- (1) Given a hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \Delta)$ with $n$ vertices, since the hypergraph Laplacian $\Delta$ is an $n \times n$ positive semi-definite matrix, the eigen decomposition $\Delta = \Phi \Lambda \Phi^{\top}$ can be employed to get the orthonormal eigenvectors $\Phi = [\phi_1, \ldots, \phi_n]$ and a diagonal matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ containing the corresponding non-negative eigenvalues. Then, the Fourier transform for a signal $x = (x_1, \ldots, x_n)$ in hypergraph is defined as $\hat{x} = \Phi^{\top}x$, where the eigenvectors are regarded as the Fourier bases and the eigenvalues are interpreted as frequencies. The spectral convolution of signal $x$ and filter $g$ can be denoted as
$$g \star x = \Phi\left((\Phi^{\top} g) \odot (\Phi^{\top} x)\right) = \Phi\, g(\Lambda)\, \Phi^{\top} x \tag{5}$$
- where $\odot$ denotes the element-wise Hadamard product and
- $g(\Lambda) = \mathrm{diag}(g(\lambda_1), \ldots, g(\lambda_n))$ is a function of the Fourier coefficients. However, the computation cost in forward and inverse Fourier transform is $O(n^2)$.
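The spectral view can be illustrated with a direct eigendecomposition on a tiny Laplacian. This is only a sketch: `spectral_conv` is a hypothetical helper, the toy `H` is invented, and the two filters are chosen because their results are easy to verify by hand.

```python
import numpy as np

def spectral_conv(delta, x, g):
    """Filter signal x in the eigenbasis of the Laplacian: Phi g(Lambda) Phi^T x."""
    lam, Phi = np.linalg.eigh(delta)   # delta = Phi diag(lam) Phi^T with orthonormal Phi
    x_hat = Phi.T @ x                  # forward Fourier transform of the signal
    return Phi @ (g(lam) * x_hat)      # scale each frequency, then transform back

# Hypothetical toy hypergraph Laplacian (4 vertices, 3 hyperedges, W = I).
H = np.array([[1, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
Hn = H / np.sqrt(H.sum(axis=1))[:, None]                      # Dv^-1/2 H
delta = np.eye(4) - Hn @ np.diag(1.0 / H.sum(axis=0)) @ Hn.T

x = np.array([1.0, 0.0, -1.0, 2.0])
y_identity = spectral_conv(delta, x, lambda lam: np.ones_like(lam))  # g = 1 leaves x unchanged
y_theta = spectral_conv(delta, x, lambda lam: 1.0 - lam)             # g(lam) = 1 - lam applies I - Delta
```

The explicit eigendecomposition is what becomes costly on large hypergraphs, which motivates the polynomial parametrisation that follows.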
- (2) To solve the problem, we can follow (Defferrard, Bresson, and Vandergheynst 2016) to parametrize $g(\Lambda)$ with $K$ order polynomials.
- Furthermore, we use the truncated Chebyshev expansion as one such polynomial.
- Chebyshev polynomials $T_k(x)$ are recursively computed by $T_k(x) = 2xT_{k-1}(x) - T_{k-2}(x)$, with $T_0(x) = 1$ and $T_1(x) = x$. Thus, $g(\Lambda)$ can be parametrized as
$$g \star x \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{\Delta})\, x \tag{6}$$
- where $T_k(\tilde{\Delta})$ is the Chebyshev polynomial of order $k$ with scaled Laplacian $\tilde{\Delta} = \frac{2}{\lambda_{max}}\Delta - I$.
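The recurrence lets the filter be applied with matrix-vector products only. Below is a minimal sketch under assumptions: `chebyshev_filter` is a hypothetical name, the coefficients `theta` are arbitrary, and $\lambda_{max}$ is fixed to 2. The eigendecomposition appears only to produce a reference value for comparison.

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval

def chebyshev_filter(delta, x, theta, lam_max=2.0):
    """Compute sum_k theta[k] * T_k(delta_tilde) @ x using T_k = 2 x T_{k-1} - T_{k-2}."""
    n = delta.shape[0]
    delta_tilde = (2.0 / lam_max) * delta - np.eye(n)   # scaled Laplacian
    t_prev = x                                          # T_0(delta_tilde) x = x
    out = theta[0] * t_prev
    if len(theta) > 1:
        t_curr = delta_tilde @ x                        # T_1(delta_tilde) x
        out = out + theta[1] * t_curr
        for k in range(2, len(theta)):
            t_next = 2.0 * (delta_tilde @ t_curr) - t_prev
            out = out + theta[k] * t_next
            t_prev, t_curr = t_curr, t_next
    return out

# Hypothetical toy hypergraph Laplacian (W = I).
H = np.array([[1, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
Hn = H / np.sqrt(H.sum(axis=1))[:, None]
delta = np.eye(4) - Hn @ np.diag(1.0 / H.sum(axis=0)) @ Hn.T

x = np.array([1.0, 0.0, -1.0, 2.0])
theta = np.array([0.5, -0.3, 0.2])        # arbitrary coefficients for K = 2
y = chebyshev_filter(delta, x, theta)

# Reference: evaluate the same Chebyshev series on the eigenvalues (lam_max = 2).
lam, Phi = np.linalg.eigh(delta)
reference = Phi @ (chebval(lam - 1.0, theta) * (Phi.T @ x))
```

Both routes give the same vector up to floating-point error, but only the recurrence avoids computing eigenvectors.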
- (3) In Equation 6, the expensive computation of Laplacian eigenvectors is excluded and only matrix powers, additions and multiplications are included, which brings further improvement in computation complexity. We can further let $K = 1$ to limit the order of the convolution operation, because the Laplacian in hypergraph can already well represent the high-order correlation between nodes. It is also suggested in (Kipf and Welling 2017) that $\lambda_{max}\approx 2$ because of the scale adaptability of neural networks. Then, the convolution operation can be further simplified to
$$g \star x \approx \theta_0 x - \theta_1 D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2} x \tag{7}$$
- where $\theta_0$ and $\theta_1$ are parameters of filters over all nodes. We further use a single parameter $\theta$ to avoid the overfitting problem, which is defined as
$$\begin{cases} \theta_1 = -\frac{1}{2}\theta \\ \theta_0 = \frac{1}{2}\theta D_v^{-1/2} H D_e^{-1} H^{\top} D_v^{-1/2} \end{cases} \tag{8}$$
- (4) Then, the convolution operation can be simplified to the following expression
$$g \star x \approx \frac{1}{2}\theta D_v^{-1/2} H (W + I) D_e^{-1} H^{\top} D_v^{-1/2} x \approx \theta D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2} x \tag{9}$$
- where $(W + I)$ can be regarded as the weight of the hyperedges. $W$ is initialized as an identity matrix, which means equal weights for all hyperedges.
- (5) When we have a hypergraph signal $X \in \mathbb{R}^{n \times C_1}$ with $n$ nodes and $C_1$ dimensional features, our hyperedge convolution can be formulated by
$$Y = D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2} X \Theta \tag{10}$$
- where $W = \mathrm{diag}(w_1, \ldots, w_n)$. $\Theta \in \mathbb{R}^{C_1 \times C_2}$ is the parameter to be learned during the training process. The filter $\Theta$ is applied over the nodes in hypergraph to extract features. After convolution, we can obtain $Y \in \mathbb{R}^{n \times C_2}$, which can be used for classification.
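A hyperedge convolution of this form, $Y = D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2} X \Theta$, reduces to a few matrix products. The sketch below is a plain NumPy illustration rather than the authors' implementation; `hgnn_conv`, the toy `H`, and the random `X` and `Theta` are all assumptions made for the example.

```python
import numpy as np

def hgnn_conv(X, H, Theta, w=None):
    """One hyperedge convolution: Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta."""
    H = np.asarray(H, dtype=float)
    w = np.ones(H.shape[1]) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w                              # vertex degrees
    delta_e = H.sum(axis=0)                  # edge degrees
    Hn = H / np.sqrt(d_v)[:, None]           # Dv^-1/2 H
    G = Hn @ np.diag(w / delta_e) @ Hn.T     # n x n propagation matrix
    return G @ X @ Theta

# Hypothetical toy data: 4 nodes, 3 hyperedges, C1 = 5 input and C2 = 3 output channels.
H = np.array([[1, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
Theta = rng.normal(size=(5, 3))

Y = hgnn_conv(X, H, Theta)                   # shape (4, 3)
```

In practice `G` depends only on the hypergraph, so it could be computed once and reused across layers and training steps.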
3.3 Hypergraph neural networks analysis
- (1) Figure 3 illustrates the details of the hypergraph neural networks. Multi-modality datasets are divided into training data and testing data, and each data contains several nodes with features. Then multiple hyperedge structure groups are constructed from the complex correlation of the multi-modality datasets. We concatenate the hyperedge groups to generate the hypergraph adjacent matrix $H$. The hypergraph adjacent matrix $H$ and the node features are fed into the HGNN to get the node output labels. As introduced in the above section, we can build a hyperedge convolutional layer $f(X, W, \Theta)$ in the following formulation
$$X^{(l+1)} = \sigma\left(D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2} X^{(l)} \Theta^{(l)}\right) \tag{11}$$
- where $X^{(l)}\in \mathbb{R}^{N \times C}$ is the signal of hypergraph at layer $l$, $X^{(0)} = X$ and $\sigma$ denotes the nonlinear activation function.
- (2) The HGNN model is based on the spectral convolution on the hypergraph. Here, we further investigate HGNN in the property of exploiting high-order correlation among data. As is shown in Figure 4, the HGNN layer can perform node-edge-node transform, which can better refine the features using the hypergraph structure.
- More specifically,
- at first, the initial node feature $X^{(1)}$ is processed by learnable filter matrix $\Theta^{(1)}$ to extract $C_2$-dimensional feature.
- Then, the node feature is gathered according to the hyperedge to form the hyperedge feature in $\mathbb{R}^{E \times C_2}$, which is implemented by the multiplication of $H^{\top} \in \mathbb{R}^{E \times N}$.
- Finally the output node feature is obtained by aggregating their related hyperedge feature, which is achieved by multiplying matrix $H$.
- Note that $D_v$ and $D_e$ play a role of normalization in Equation 11. Thus, the HGNN layer can efficiently extract the high-order correlation on hypergraph by the node-edge-node transform.
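The node-edge-node reading can be checked by computing the layer in stages and comparing with the single matrix expression. This is a sketch with hypothetical function names, assuming $W = I$ and ReLU as the activation $\sigma$.

```python
import numpy as np

def hgnn_layer_staged(X, H, Theta):
    """Node -> hyperedge -> node transform, one step at a time (W = I)."""
    d_v = H.sum(axis=1)
    delta_e = H.sum(axis=0)
    Z = X @ Theta                          # 1) filter node features: N x C2
    Z = Z / np.sqrt(d_v)[:, None]          #    normalise by vertex degree
    E = H.T @ Z                            # 2) gather nodes into hyperedge features: E x C2
    E = E / delta_e[:, None]               #    normalise by edge degree
    out = H @ E                            # 3) aggregate hyperedges back to nodes: N x C2
    out = out / np.sqrt(d_v)[:, None]
    return np.maximum(out, 0.0)            # ReLU

def hgnn_layer_direct(X, H, Theta):
    """Same layer written as sigma(Dv^-1/2 H De^-1 H^T Dv^-1/2 X Theta)."""
    Hn = H / np.sqrt(H.sum(axis=1))[:, None]
    G = Hn @ np.diag(1.0 / H.sum(axis=0)) @ Hn.T
    return np.maximum(G @ X @ Theta, 0.0)

# Hypothetical toy data.
H = np.array([[1, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 5))
Theta = rng.normal(size=(5, 2))

staged = hgnn_layer_staged(X, H, Theta)
direct = hgnn_layer_direct(X, H, Theta)
```

The staged version never forms the dense $N \times N$ matrix, which may matter when $H$ is sparse and $N$ is large.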
3.4 Relations to existing methods
- When the hyperedges only connect two vertices, the hypergraph is simplified into a simple graph and the Laplacian $\Delta$ is also coincident with the Laplacian of simple graph up to a factor of $\frac{1}{2}$.
- Compared with the existing graph convolution methods, our HGNN can naturally model high-order relationship among data, which is effectively exploited and encoded in forming feature extraction.
- Compared with the traditional hypergraph method, our model is highly efficient in computation without the inverse operation of Laplacian $\Delta$. It should also be noted that our HGNN has great expansibility toward multi-modal feature with the flexibility of hyperedge generation.
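The factor of $\frac{1}{2}$ in the first point can be verified numerically. In the sketch below every hyperedge has exactly two vertices, so $H H^{\top} = D + A$ and the hypergraph Laplacian becomes half of the symmetric normalised graph Laplacian $I - D^{-1/2} A D^{-1/2}$. The edge list is a made-up example.

```python
import numpy as np

# Hypothetical simple graph: every hyperedge connects exactly two vertices.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n = 4

H = np.zeros((n, len(edges)))      # incidence matrix, one column per edge
A = np.zeros((n, n))               # ordinary adjacency matrix
for e, (u, v) in enumerate(edges):
    H[u, e] = H[v, e] = 1.0
    A[u, v] = A[v, u] = 1.0

deg = A.sum(axis=1)                # equals the hypergraph vertex degree when W = I

# Hypergraph Laplacian: I - Dv^-1/2 H De^-1 H^T Dv^-1/2, with De = 2 I here.
Hn = H / np.sqrt(H.sum(axis=1))[:, None]
delta_hyper = np.eye(n) - Hn @ np.diag(1.0 / H.sum(axis=0)) @ Hn.T

# Symmetric normalised graph Laplacian: I - D^-1/2 A D^-1/2.
L_graph = np.eye(n) - A / np.sqrt(np.outer(deg, deg))
```

Comparing `delta_hyper` with `0.5 * L_graph` shows that they agree, which is the sense in which GCN appears as a special case.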
4 Implementation
4.1 Hypergraph construction
- (1) In our visual object classification task, the features of $N$ visual object data can be represented as $X = [x_1, \ldots, x_n]^{\top}$. We build the hypergraph according to the distance between two features.
- More specifically, Euclidean distance is used to calculate $d(x_i, x_j)$.
- In the construction, each vertex represents one visual object, and each hyperedge is formed by connecting one vertex and its $K$ nearest neighbors, which brings $N$ hyperedges that each link $K + 1$ vertices. Thus, we get the incidence matrix $H \in \mathbb{R}^{N \times N}$ with $N \times (K + 1)$ entries equal to 1 and the others equal to 0.
- In the citation network classification, where the data are organized in graph structure, each hyperedge is built by linking one vertex and its neighbors according to the adjacency relation on the graph. So we also get $N$ hyperedges and $H \in \mathbb{R}^{N \times N}$.
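The K-nearest-neighbour construction for visual features can be sketched as follows. `knn_hypergraph` is a hypothetical helper, the random features stand in for real descriptors, and the brute-force distance matrix is only suitable for small $N$.

```python
import numpy as np

def knn_hypergraph(X, k):
    """Incidence matrix where hyperedge e links vertex e and its k nearest neighbours."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    H = np.zeros((n, n))
    for e in range(n):
        d = dist[e].copy()
        d[e] = -1.0                       # force the centroid to rank first
        members = np.argsort(d)[: k + 1]  # centroid plus its k nearest neighbours
        H[members, e] = 1.0
    return H

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 2))        # stand-in for N = 6 object features
H = knn_hypergraph(features, k=2)
```

Each column of `H` then has exactly $K + 1$ ones, so the matrix holds $N \times (K + 1)$ non-zero entries in total.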
4.2 Model for node classification
- (1) In the problem of node classification, we build the HGNN model as in Figure 3. The dataset is divided into training data and test data. Then the hypergraph is constructed as in the section above, which generates the incidence matrix $H$ and the corresponding $D_e$.
- We build a two-layer HGNN model to employ the powerful capacity of HGNN layer.
- And the softmax function is used to generate predicted labels.
- During training, the cross-entropy loss for the training data is back-propagated to update the parameters $\Theta$, and in testing, the labels of the test data are predicted to evaluate the performance.
- When multi-modal information is available, we incorporate it by constructing hyperedge groups, and the various hyperedges are then fused together to model the complex relationship in the data.
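The two-layer model with a softmax output and a cross-entropy loss on the labelled nodes can be sketched as a forward pass. This NumPy version covers only the forward computation and the loss; back-propagation would normally be delegated to an autodiff framework. All names, sizes and the random initialisation are assumptions for illustration.

```python
import numpy as np

def propagation_matrix(H):
    """G = Dv^-1/2 H De^-1 H^T Dv^-1/2 with W = I."""
    Hn = H / np.sqrt(H.sum(axis=1))[:, None]
    return Hn @ np.diag(1.0 / H.sum(axis=0)) @ Hn.T

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def hgnn_forward(X, G, Theta1, Theta2):
    hidden = np.maximum(G @ X @ Theta1, 0.0)   # first HGNN layer with ReLU
    return softmax(G @ hidden @ Theta2)        # second layer followed by softmax

def cross_entropy(probs, labels, train_idx):
    """Mean negative log-likelihood over the labelled training nodes only."""
    return -np.log(probs[train_idx, labels[train_idx]]).mean()

# Hypothetical toy problem: 4 nodes, 6 input features, 16 hidden units, 3 classes.
H = np.array([[1, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
G = propagation_matrix(H)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
Theta1 = rng.normal(scale=0.1, size=(6, 16))
Theta2 = rng.normal(scale=0.1, size=(16, 3))
labels = np.array([0, 2, 1, 0])
train_idx = np.array([0, 1])               # only these nodes contribute to the loss

probs = hgnn_forward(X, G, Theta1, Theta2)
loss = cross_entropy(probs, labels, train_idx)
pred = probs.argmax(axis=1)                # predicted label for every node, including test nodes
```

Because the propagation matrix mixes information across all nodes, the unlabelled test nodes still influence the hidden representations even though they are excluded from the loss.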
5 Experiments
- In this section, we evaluate our proposed hypergraph neural networks on two tasks: citation network classification and visual object recognition. We also compare the proposed method with graph convolutional networks and other state-of-the-art methods.
5.1 Citation network classification
5.1.1 Datasets
- In this experiment, the task is to classify citation data. Here, two widely used citation network datasets, i.e., Cora and Pubmed (Sen et al. 2008) are employed. The experimental setup follows the settings in (Yang, Cohen, and Salakhutdinov 2016). In both of those two datasets, the feature for each data is the bag-of-words representation of documents. The data connection, i.e., the graph structure, indicates the citations among those data.
- To generate the hypergraph structure for HGNN, each time one vertex in the graph is selected as the centroid and its connected vertices are used to generate one hyperedge including the centroid itself. Through this we can obtain the same size incidence matrix compared with the original graph.
- It is noted that as there is no more information for data relationship, the generated hypergraph structure is quite similar to the graph.
- The Cora dataset contains 2708 data and 5% are used as labeled data for training.
- The Pubmed dataset contains 19717 data, and only 0.3% are used for training. The detailed description for the two datasets is listed in Table 1.
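The centroid-plus-neighbours construction for citation graphs amounts to adding self-loops to the adjacency matrix and reading each column as one hyperedge. The following sketch uses a made-up four-node graph; `graph_to_hypergraph` is a hypothetical name, and the graph is treated as undirected for simplicity.

```python
import numpy as np

def graph_to_hypergraph(edges, n):
    """Column e of the result is the hyperedge centred on vertex e: its neighbours plus itself."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0            # undirected citation link (an assumption)
    return np.clip(A + np.eye(n), 0.0, 1.0)

edges = [(0, 1), (0, 2), (2, 3)]           # hypothetical toy citation graph
H = graph_to_hypergraph(edges, 4)

members_of_e2 = np.flatnonzero(H[:, 2]).tolist()   # vertices in the hyperedge centred on vertex 2
```

The result has the same $N \times N$ shape as the adjacency matrix, which is why the gain over GCN on these datasets is expected to be modest.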
5.1.2 Experimental settings
- In this experiment, a two-layer HGNN is applied. (Two layers are a common choice in graph-based models such as GNN, GCN and GAT.)
- The feature dimension of the hidden layer is set as 16 and the dropout (Srivastava et al. 2014) is employed to avoid overfitting with drop rate $p = 0.5$. We choose the ReLU as the nonlinear activation function. During the training process, we use Adam optimizer (Kingma and Ba 2014) to minimize our cross-entropy loss function with a learning rate of 0.001. We have also compared the proposed HGNN with recent methods in these experiments.
5.1.3 Results and discussion
- The experimental results and comparisons on the citation network datasets are shown in Table 2. For our HGNN model, we report the average classification accuracy of 100 runs on Cora and Pubmed, which is 81.6% and 80.1%. As shown in the results, the proposed HGNN model can achieve the best or comparable performance compared with the state-of-the-art methods. Compared with GCN, the proposed HGNN method can achieve a slight improvement on the Cora dataset and 1.1% improvement on the Pubmed dataset. We note that the generated hypergraph structure is quite similar to the graph structure as there is neither extra nor more complex information in these data. Therefore, the gain obtained by HGNN is not very significant.
5.2 Visual object classification
5.2.1 Datasets and experimental settings
- (1) In this experiment, the task is to classify visual objects. Two public benchmarks are employed here, including the Princeton ModelNet40 dataset (Wu et al. 2015) and the National Taiwan University (NTU) 3D model dataset (Chen et al. 2003), as shown in Table 3.
- The ModelNet40 dataset consists of 12,311 objects from 40 popular categories, and the same training/testing split is applied as introduced in (Wu et al. 2015), where 9,843 objects are used for training and 2,468 objects are used for testing.
- The NTU dataset is composed of 2,012 3D shapes from 67 categories, including car, chair, chess, chip, clock, cup, door, frame, pen, plant leaf and so on. In the NTU dataset, 80% data are used for training and the other 20% data are used for testing.
- In this experiment, each 3D object is represented by the extracted features. Here, two recent state-of-the-art shape representation methods are employed, including Multi-view Convolutional Neural Network (MVCNN) (Su et al. 2015) and Group-View Convolutional Neural Network (GVCNN) (Feng et al. 2018). These two methods are selected due to that they have shown satisfactory performance on 3D object representation.
- We follow the experimental settings of MVCNN and GVCNN to generate multiple views of each 3D object. Here, 12 virtual cameras are employed to capture views with an interval angle of 30 degrees, and then both the MVCNN and the GVCNN features are extracted accordingly.
- (2) To compare with the GCN method, it is noted that there is no available graph structure in the ModelNet40 dataset and the NTU dataset. Therefore, we construct a probability graph based on the distance of nodes. Given the features of data, the affinity matrix $A$ is generated to represent the relationship among different vertices, and $A_{ij}$ can be calculated by
$$A_{ij} = \exp\left(-\frac{2 D_{ij}^2}{\Delta}\right) \tag{12}$$
- where $D_{ij}$ indicates the Euclidean distance between node $i$ and node $j$.
- $\Delta$ is the average pairwise distance between nodes.
- For the GCN experiment with simple graphs constructed from two features, we simply average the two modality adjacency matrices to get the fused graph structure for comparison.
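The probability graph used for the GCN baseline can be sketched as below. The code assumes the kernel $A_{ij} = \exp(-2 D_{ij}^2 / \Delta)$ and, as a further assumption, takes the average distance over pairs with $i \neq j$; the function name and the random features are placeholders.

```python
import numpy as np

def affinity_matrix(X):
    """A_ij = exp(-2 * D_ij^2 / mean_dist), with D the pairwise Euclidean distance."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    mean_dist = D.sum() / (n * (n - 1))    # average over i != j (the diagonal is zero)
    return np.exp(-2.0 * D ** 2 / mean_dist)

rng = np.random.default_rng(0)
feat_a = rng.normal(size=(5, 8))           # stand-in for one modality's features
feat_b = rng.normal(size=(5, 12))          # stand-in for a second modality's features

A_a = affinity_matrix(feat_a)
A_b = affinity_matrix(feat_b)
A_fused = 0.5 * (A_a + A_b)                # simple average of the two modality graphs
```

Averaging keeps the fused matrix symmetric with ones on the diagonal, so it can be fed to a GCN in the same way as a single-modality graph.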
5.2.2 Hypergraph structure construction on visual datasets
- In experiments on ModelNet40 and NTU datasets, two hypergraph construction methods are employed.
- The first one is based on single modality feature
- and the other one is based on multi-modality feature.
- In the first case, only one feature is used. Each time one object in the dataset is selected as the centroid, and its 10 nearest neighbors in the selected feature space are used to generate one hyperedge including the centroid itself, as shown in Figure 5.
- Then, a hypergraph $\mathcal{G}$ with $N$ hyperedges can be constructed.
- In the second case, multiple features are used to generate a hypergraph $\mathcal{G}$ modeling complex multi-modality correlation.
- Here, for the $i^{th}$ modality data, a hypergraph adjacent matrix $H_i$ is constructed accordingly.
- After all the hypergraphs from different features have been generated, these adjacent matrices $H_i$ can be concatenated to build the multi-modality hypergraph adjacent matrix $H$.
- In this way, the hypergraphs using single modal feature and multi-modal features can be constructed.
Figure 5: An example of hyperedge generation in the visual object classification task. Left: for each node, we aggregate its N nearest neighbors by Euclidean distance to generate a hyperedge. Right: to generate the multi-modality hypergraph adjacent matrix, we concatenate the matrices of the two modalities.
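The multi-modality case only changes how $H$ is assembled: one incidence matrix per feature, concatenated along the hyperedge axis. The sketch below uses random stand-ins for the two feature types and a compact nearest-neighbour helper; the names and sizes are hypothetical.

```python
import numpy as np

def knn_incidence(X, k):
    """Incidence matrix with one hyperedge per vertex: the vertex plus its k nearest neighbours."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, -1.0)                 # make each centroid its own nearest point
    H = np.zeros((n, n))
    for e in range(n):
        H[np.argsort(dist[e])[: k + 1], e] = 1.0
    return H

rng = np.random.default_rng(0)
feat_a = rng.normal(size=(8, 16))   # stand-in for one modality (e.g. one view-based feature)
feat_b = rng.normal(size=(8, 32))   # stand-in for a second modality

H_a = knn_incidence(feat_a, k=3)
H_b = knn_incidence(feat_b, k=3)
H_multi = np.hstack([H_a, H_b])     # N x 2N: hyperedges from both modalities side by side

d_v = H_multi.sum(axis=1)           # vertex degrees now count hyperedges from both modalities
```

The number of vertices stays the same while the number of hyperedges doubles, so the resulting propagation matrix is still $N \times N$ and the HGNN layers need no change.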
5.3 Results and discussions
- (1) Experiments and comparisons on the visual object recognition task are shown in Table 4 and Table 5, respectively.
- (2) For the ModelNet40 dataset, we have compared the proposed method using two features with recent state-of-the-art methods in Table 6. As shown in the results, we can have the following observations:
- 1. The proposed HGNN method outperforms the state-of-the-art object recognition methods in the ModelNet40 dataset. More specifically, compared with PointCNN and SO-Net, the proposed HGNN method can achieve gains of 4.8% and 3.2%, respectively. These results demonstrate the superior performance of the proposed HGNN method on visual object recognition.
- 2. Compared with GCN, the proposed method achieves better performance in all experiments. As shown in Table 4 and Table 5, when only one feature is used for graph/hypergraph structure generation, HGNN can obtain a slight improvement. For example, when GVCNN is used as the object feature and MVCNN is used for graph/hypergraph structure generation, HGNN achieves gains of 0.3% and 2.0% compared with GCN on the ModelNet40 and the NTU datasets, respectively. When more features, i.e., both GVCNN and MVCNN, are used for graph/hypergraph structure generation, HGNN achieves much better performance compared with GCN. For example, HGNN achieves gains of 8.3%, 10.4% and 8.1% compared with GCN when GVCNN, MVCNN and GVCNN+MVCNN are used as the object features on the NTU dataset, respectively.
- (3) The better performance can be attributed to the employed hypergraph structure.
- The hypergraph structure is able to convey complex and high-order correlations among data, which can better represent the underlying data relationship compared with graph structure or the methods without graph structure.
- Moreover, when multi-modal data/features are available, HGNN has the advantage of combining such multi-modal information in the same structure by its flexible hyperedges. Compared with traditional hypergraph learning methods, which may suffer from the high computational complexity and storage cost, the proposed HGNN framework is much more efficient through the hyperedge convolution operation.
6 Conclusion
- (1) In this paper, we propose a framework of hypergraph neural networks (HGNN).
- In this method, HGNN generalizes the convolution operation to the hypergraph learning process. The convolution on the spectral domain is conducted with the hypergraph Laplacian and further approximated by truncated Chebyshev polynomials.
- HGNN is a more general framework which is able to handle the complex and high-order correlations through the hypergraph structure for representation learning compared with traditional graph.
- (2) We have conducted experiments on citation network classification and visual object recognition tasks to evaluate the performance of the proposed HGNN method. Experimental results and comparisons with the state-of-the-art methods demonstrate better performance of the proposed HGNN model. HGNN is able to take complex data correlation into representation learning and thus lead to potential wide applications in many tasks, such as visual recognition, retrieval and data classification.