Paper: "Graph Neural Networks: A Review of Methods and Applications" - Translation and Commentary


Contents

"Graph Neural Networks: A Review of Methods and Applications" - Translation and Commentary
Abstract
1. Introduction
2. General design pipeline of GNNs
2.1. Find graph structure
2.2. Specify graph type and scale
Table 1. Notations used in this paper.
2.3. Design loss function
2.4. Build model using computational modules
3. Instantiations of computational modules
3.1. Propagation modules - convolution operator
3.1.1. Spectral approaches
3.1.2. Basic spatial approaches
3.1.3. Attention-based spatial approaches
3.1.4. General frameworks for spatial approaches
3.2. Propagation modules - recurrent operator
3.2.1. Convergence-based methods
3.2.2. Gate-based methods
3.3. Propagation modules - skip connection
3.4. Sampling modules
3.4.1. Node sampling
3.4.2. Layer sampling
3.4.3. Subgraph sampling
3.5. Pooling modules
3.5.1. Direct pooling modules
3.5.2. Hierarchical pooling modules
4. Variants considering graph type and scale
4.1. Directed graphs
4.2. Heterogeneous graphs
4.2.1. Meta-path-based methods
4.2.2. Edge-based methods
4.2.3. Methods for relational graphs
4.2.4. Methods for multiplex graphs
4.3. Dynamic graphs
4.4. Other graph types
4.4.1. Hypergraphs
4.4.2. Signed graphs
4.5. Large graphs
5. Variants for different training settings
5.1. Graph auto-encoders
5.2. Contrastive learning
6. A design example of GNN
7. Analyses of GNNs
7.1. Theoretical aspect
7.1.1. Graph signal processing
7.1.2. Generalization
7.1.3. Expressivity
7.1.4. Invariance
7.1.5. Transferability
7.1.6. Label efficiency
7.2. Empirical aspect
7.2.1. Evaluation
7.2.2. Benchmarks
8. Applications
8.1. Structural scenarios
8.1.1. Graph mining
8.1.2. Physics
8.1.3. Chemistry and biology
8.1.4. Knowledge graph
8.1.5. Generative models
8.1.6. Combinatorial optimization
8.1.7. Traffic networks
8.1.8. Recommendation systems
8.1.9. Other applications in structural scenarios
8.2. Non-structural scenarios
8.2.1. Image
8.2.2. Text
9. Open problems: robustness, interpretability, graph pre-training, complex graph structures
10. Conclusion
Acknowledgements
Appendix A. Datasets
Table A.5
Appendix B. Implementations
Table B.6
Table B.7


"Graph Neural Networks: A Review of Methods and Applications" - Translation and Commentary

Authors

Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, Maosong Sun

a Department of Computer Science and Technology, Tsinghua University, Beijing, China
b School of Computer Science, Beijing University of Posts and Telecommunications, China
c Tencent Inc., Shenzhen, China

Date: October 6, 2021
Link: https://arxiv.org/abs/1812.08434

Abstract

Lots of learning tasks require dealing with graph data which contains rich relation information among elements. Modeling physics systems, learning molecular fingerprints, predicting protein interfaces, and classifying diseases demand a model to learn from graph inputs. In other domains, such as learning from non-structural data like texts and images, reasoning on extracted structures (like the dependency trees of sentences and the scene graphs of images) is an important research topic which also needs graph reasoning models. Graph neural networks (GNNs) are neural models that capture the dependence of graphs via message passing between the nodes of graphs. In recent years, variants of GNNs such as the graph convolutional network (GCN), graph attention network (GAT), and graph recurrent network (GRN) have demonstrated ground-breaking performance on many deep learning tasks. In this survey, we propose a general design pipeline for GNN models, discuss the variants of each component, systematically categorize the applications, and propose four open problems for future research.


Keywords: Deep learning, Graph neural network


1. Introduction

Graphs are a kind of data structure which models a set of objects (nodes) and their relationships (edges). Recently, research on analyzing graphs with machine learning has been receiving more and more attention because of the great expressive power of graphs; that is, graphs can be used to denote a large number of systems across various areas, including social science (social networks (Wu et al., 2020)), natural science (physical systems (Sanchez et al., 2018; Battaglia et al., 2016) and protein-protein interaction networks (Fout et al., 2017)), knowledge graphs (Hamaguchi et al., 2017) and many other research areas (Khalil et al., 2017). Graph analysis, which works with this unique non-Euclidean data structure for machine learning, focuses on tasks such as node classification, link prediction, and clustering. Graph neural networks (GNNs) are deep learning based methods that operate on the graph domain. Due to its convincing performance, GNN has become a widely applied graph analysis method in recent years. In the following paragraphs, we will illustrate the fundamental motivations of graph neural networks.


The first motivation of GNNs is rooted in the long-standing history of neural networks for graphs. In the nineties, recursive neural networks were first utilized on directed acyclic graphs (Sperduti and Starita, 1997; Frasconi et al., 1998). Afterwards, recurrent neural networks and feedforward neural networks were introduced into this literature in (Scarselli et al., 2009) and (Micheli, 2009) respectively to tackle cycles. Although successful, the universal idea behind these methods is building state transition systems on graphs and iterating until convergence, which constrains extendability and representation ability. Recent advances in deep neural networks, especially convolutional neural networks (CNNs) (LeCun et al., 1998), resulted in the rediscovery of GNNs. CNNs have the ability to extract multi-scale localized spatial features and compose them to construct highly expressive representations, which led to breakthroughs in almost all machine learning areas and started the new era of deep learning (LeCun et al., 2015). The keys of CNNs are local connections, shared weights and the use of multiple layers (LeCun et al., 2015). These are also of great importance in solving problems on graphs. However, CNNs can only operate on regular Euclidean data like images (2D grids) and texts (1D sequences), even though these data structures can be regarded as instances of graphs. Therefore, it is natural to generalize CNNs to graphs. As shown in Fig. 1, it is hard to define localized convolutional filters and pooling operators on graphs, which hinders the transfer of CNNs from the Euclidean domain to the non-Euclidean domain. Extending deep neural models to non-Euclidean domains, which is generally referred to as geometric deep learning, has been an emerging research area (Bronstein et al., 2017). Under this umbrella term, deep learning on graphs has received enormous attention.


The other motivation comes from graph representation learning (Cui et al., 2018a; Hamilton et al., 2017b; Zhang et al., 2018a; Cai et al., 2018; Goyal and Ferrara, 2018), which learns to represent graph nodes, edges or subgraphs by low-dimensional vectors. In the field of graph analysis, traditional machine learning approaches usually rely on hand-engineered features and are limited by their inflexibility and high cost. Following the idea of representation learning and the success of word embeddings (Mikolov et al., 2013), DeepWalk (Perozzi et al., 2014), regarded as the first graph embedding method based on representation learning, applies the SkipGram model (Mikolov et al., 2013) to generated random walks. Similar approaches such as node2vec (Grover and Leskovec, 2016), LINE (Tang et al., 2015) and TADW (Yang et al., 2015) also achieved breakthroughs. However, these methods suffer from two severe drawbacks (Hamilton et al., 2017b). First, no parameters are shared between nodes in the encoder, which leads to computational inefficiency, since it means the number of parameters grows linearly with the number of nodes. Second, the direct embedding methods lack the ability of generalization, which means they cannot deal with dynamic graphs or generalize to new graphs.

Based on CNNs and graph embedding, variants of graph neural networks (GNNs) are proposed to collectively aggregate information from graph structure. Thus they can model input and/or output consisting of elements and their dependencies.

There exist several comprehensive reviews on graph neural networks. Bronstein et al. (2017) provide a thorough review of geometric deep learning, which presents its problems, difficulties, solutions, applications and future directions. Zhang et al. (2019a) propose another comprehensive overview of graph convolutional networks. However, they mainly focus on convolution operators defined on graphs, while we investigate other computation modules in GNNs such as skip connections and pooling operators.


Papers by Zhang et al. (2018b), Wu et al. (2019a), and Chami et al. (2020) are the most up-to-date survey papers on GNNs, and they mainly focus on models of GNN. Wu et al. (2019a) categorize GNNs into four groups: recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. Zhang et al. (2018b) give a systematic overview of different graph deep learning methods, and Chami et al. (2020) propose a Graph Encoder Decoder Model to unify network embedding and graph neural network models. Our paper provides a taxonomy different from theirs, and we mainly focus on classic GNN models. Besides, we summarize variants of GNNs for different graph types and also provide a detailed summary of GNNs' applications in different domains.

There have also been several surveys focusing on specific graph learning fields. Sun et al. (2018) and Chen et al. (2020a) give detailed overviews of adversarial learning methods on graphs, including graph data attack and defense. Lee et al. (2018a) provide a review of graph attention models. The paper by Yang et al. (2020) focuses on heterogeneous graph representation learning, where nodes or edges are of multiple types. Huang et al. (2020) review existing GNN models for dynamic graphs. Peng et al. (2020) summarize graph embedding methods for combinatorial optimization. We cover GNNs for heterogeneous graphs, dynamic graphs and combinatorial optimization in Section 4.2, Section 4.3, and Section 8.1.6 respectively.


In this paper, we provide a thorough review of different graph neural network models as well as a systematic taxonomy of the applications. To summarize, our contributions are (as stated in the abstract):

• We provide a detailed review of existing graph neural network models and present a general design pipeline, discussing the variants of each component.
• We systematically categorize the applications of GNNs into structural scenarios, non-structural scenarios and other scenarios.
• We propose four open problems for future research.

The rest of this survey is organized as follows. In Section 2, we present a general GNN design pipeline. Following the pipeline, we discuss each step in detail to review GNN model variants. The details are included in Section 3 to Section 6. In Section 7, we revisit research works over theoretical and empirical analyses of GNNs. In Section 8, we introduce several major applications of graph neural networks applied to structural scenarios, non-structural scenarios and other scenarios. In Section 9, we propose four open problems of graph neural networks as well as several future research directions. And finally, we conclude the survey in Section 10.


2. General design pipeline of GNNs

In this paper, we introduce models of GNNs from a designer's perspective. We first present the general design pipeline for designing a GNN model in this section. Then we give details of each step, such as selecting computational modules, considering graph type and scale, and designing the loss function, in Sections 3, 4, and 5 respectively. Finally, we use an example to illustrate the design process of GNN for a specific task in Section 6.

In later sections, we denote a graph as G = (V, E), where |V| = N is the number of nodes in the graph and |E| = N^e is the number of edges. A ∈ ℝ^{N×N} is the adjacency matrix.

In this section, we present the general design pipeline of a GNN model for a specific task on a specific graph type. Generally, the pipeline contains four steps: (1) find the graph structure, (2) specify the graph type and scale, (3) design the loss function, and (4) build the model using computational modules. We give general design principles and some background knowledge in this section. The design details of these steps are discussed in later sections.


 Fig. 1. Left: image in Euclidean space. Right: graph in non-Euclidean space.

2.1. Find graph structure

At first, we have to find out the graph structure in the application. There are usually two scenarios: structural scenarios and non-structural scenarios. In structural scenarios, the graph structure is explicit in the application, such as applications on molecules, physical systems, knowledge graphs and so on. In non-structural scenarios, graphs are implicit, so we have to first build the graph from the task, such as building a fully-connected "word" graph for text or building a scene graph for an image. After we get the graph, the later design process attempts to find an optimal GNN model on this specific graph.


2.2. Specify graph type and scale

After we get the graph in the application, we then have to find out the graph type and its scale.

Graphs with complex types can provide more information on nodes and their connections. Graphs are usually categorized as:

• Directed/Undirected graphs
• Homogeneous/Heterogeneous graphs
• Static/Dynamic graphs


Note these categories are orthogonal, which means these types can be combined; e.g., one can deal with a dynamic directed heterogeneous graph. There are also several other graph types designed for different tasks, such as hypergraphs and signed graphs. We will not enumerate all types here, but the most important idea is to consider the additional information provided by these graphs. Once we specify the graph type, the additional information it provides should be further considered in the design process.

As for the graph scale, there is no clear classification criterion for "small" and "large" graphs. The criterion keeps changing with the development of computation devices (e.g. the speed and memory of GPUs). In this paper, when the adjacency matrix or the graph Laplacian of a graph (with space complexity O(n²)) cannot be stored and processed by the device, we regard the graph as a large-scale graph, and some sampling methods should then be considered.


Table 1. Notations used in this paper.

2.3. Design loss function

In this step we should design the loss function based on our task type and the training setting.

For graph learning tasks, there are usually three kinds of tasks:

• Node-level tasks, which focus on nodes, such as node classification, node regression and node clustering.
• Edge-level tasks, namely edge classification and link prediction.
• Graph-level tasks, including graph classification, graph regression and graph matching.


From the perspective of supervision, we can also categorize graph learning tasks into three different training settings:

• Supervised setting: labeled data is provided for training.
• Semi-supervised setting: a small number of labeled nodes and a large number of unlabeled nodes are given for training.
• Unsupervised setting: only unlabeled data is available for the model to learn from.


With the task type and the training setting, we can design a specific loss function for the task. For example, for a node-level semi-supervised classification task, the cross-entropy loss can be used for the labeled nodes in the training set.
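As a minimal sketch of this step, assuming a semi-supervised node classification setup with hypothetical model outputs, the loss can be computed only on the labeled training nodes via a mask (NumPy; all names are illustrative):

```python
import numpy as np

def masked_cross_entropy(logits, labels, train_mask):
    """Cross-entropy loss over the labeled training nodes only.

    logits:     (N, C) model outputs for all N nodes
    labels:     (N,)   integer class labels (arbitrary for unlabeled nodes)
    train_mask: (N,)   boolean, True for labeled nodes in the training set
    """
    # softmax over classes, numerically stabilized
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # negative log-likelihood of the true class, averaged over labeled nodes
    nll = -log_probs[np.arange(len(labels)), labels]
    return nll[train_mask].mean()
```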


2.4. Build model using computational modules

Finally, we can start building the model using the computational modules. Some commonly used computational modules are:

• Propagation Module. The propagation module is used to propagate information between nodes so that the aggregated information can capture both feature and topological information. In propagation modules, the convolution operator and the recurrent operator are usually used to aggregate information from neighbors, while the skip connection operation is used to gather information from historical representations of nodes and mitigate the over-smoothing problem.

• Sampling Module. When graphs are large, sampling modules are usually needed to conduct propagation on graphs. The sampling module is usually combined with the propagation module.

• Pooling Module. When we need the representations of high-level subgraphs or graphs, pooling modules are needed to extract information from nodes.


With these computation modules, a typical GNN model is usually built by combining them. A typical architecture of the GNN model is illustrated in the middle part of Fig. 2 where the convolutional operator, recurrent operator, sampling module and skip connection are used to propagate information in each layer and then the pooling module is added to extract high-level information. These layers are usually stacked to obtain better representations. Note this architecture can generalize most GNN models while there are also exceptions, for example, NDCN (Zang and Wang, 2020) combines ordinary differential equation systems (ODEs) and GNNs. It can be regarded as a continuous-time GNN model which integrates GNN layers over continuous time without propagating through a discrete number of layers.

An illustration of the general design pipeline is shown in Fig. 2. In later sections, we first give the existing instantiations of computational modules in Section 3, then introduce existing variants which consider different graph types and scales in Section 4. Then we survey variants designed for different training settings in Section 5. These sections correspond to the details of step (4), step (2), and step (3) in the pipeline. Finally, we give a concrete design example in Section 6.


3. Instantiations of computational modules

In this section we introduce existing instantiations of three computational modules: propagation modules, sampling modules and pooling modules. We introduce the three sub-components of propagation modules, namely the convolution operator, the recurrent operator and the skip connection, in Sections 3.1, 3.2, and 3.3 respectively. We then introduce sampling modules and pooling modules in Sections 3.4 and 3.5. An overview of computational modules is shown in Fig. 3.


3.1. Propagation modules - convolution operator

The convolution operators introduced in this section are the most commonly used propagation operators for GNN models. The main idea of convolution operators is to generalize convolutions from other domains to the graph domain. Advances in this direction are often categorized as spectral approaches and spatial approaches.


3.1.1. Spectral approaches

Spectral approaches work with a spectral representation of the graphs. These methods are theoretically based on graph signal processing (Shuman et al., 2013) and define the convolution operator in the spectral domain.

In spectral methods, a graph signal x is first transformed to the spectral domain by the graph Fourier transform F; then the convolution operation is conducted. After the convolution, the resulting signal is transformed back using the inverse graph Fourier transform F^{-1}. These transforms are defined as:

F(x) = U^T x
F^{-1}(x̂) = U x̂

Here U is the matrix of eigenvectors of the normalized graph Laplacian L = I_N − D^{−1/2} A D^{−1/2} (D is the degree matrix and A is the adjacency matrix of the graph). The normalized graph Laplacian is real symmetric positive semidefinite, so it can be factorized as L = U Λ U^T (where Λ is a diagonal matrix of the eigenvalues). Based on the convolution theorem (Mallat, 1999), the convolution operation is defined as:

g ⋆ x = F^{-1}( F(g) ⊙ F(x) ) = U ( U^T g ⊙ U^T x )

where ⊙ denotes element-wise multiplication.


If we simplify the filter U^T g as a learnable diagonal matrix g_w, we obtain the basic form of the spectral methods:

g_w ⋆ x = U g_w U^T x
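To make the spectral filtering operation concrete, here is a minimal NumPy sketch on a hypothetical 4-node path graph (all names and data are illustrative):

```python
import numpy as np

# adjacency matrix of a toy 4-node path graph
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(deg ** -0.5)
L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized graph Laplacian

# eigendecomposition L = U diag(lam) U^T
lam, U = np.linalg.eigh(L)

x = np.random.randn(4)   # a graph signal
w = np.random.randn(4)   # learnable spectral filter g_w = diag(w)

# g_w * x = U g_w U^T x
y = U @ (w * (U.T @ x))
```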

Next we introduce several typical spectral methods, which design different filters g_w.

Spectral Network. The spectral network (Bruna et al., 2014) uses a learnable diagonal matrix as the filter, that is, g_w = diag(w), where w ∈ ℝ^N is the parameter. However, this operation is computationally inefficient and the filter is non-spatially localized. Henaff et al. (2015) attempt to make the spectral filters spatially localized by introducing a parameterization with smooth coefficients.

ChebNet. Hammond et al. (2011) suggest that g_w can be approximated by a truncated expansion in terms of Chebyshev polynomials T_k(x) up to the K-th order. Defferrard et al. (2016) propose ChebNet based on this theory. Thus the operation can be written as:

g_w ⋆ x ≈ Σ_{k=0}^{K} θ_k T_k(L̃) x

where L̃ = (2/λ_max) L − I_N, λ_max denotes the largest eigenvalue of L, and θ_k are the learnable Chebyshev coefficients. The Chebyshev polynomials follow the recurrence T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x), with T_0(x) = 1 and T_1(x) = x, so the operation avoids computing the eigendecomposition of the Laplacian.
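A minimal sketch of the Chebyshev filtering step under the same assumptions as the example above (theta is an assumed coefficient vector):

```python
import numpy as np

def cheb_filter(L, x, theta, lam_max=2.0):
    """Apply a Chebyshev-polynomial spectral filter of order K = len(theta) - 1.

    Uses the recurrence T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x), so no
    eigendecomposition of L is required.
    """
    n = L.shape[0]
    L_tilde = (2.0 / lam_max) * L - np.eye(n)   # rescale spectrum to [-1, 1]
    t_prev, t_curr = x, L_tilde @ x             # T_0(L~)x and T_1(L~)x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out += theta[1] * t_curr
    for k in range(2, len(theta)):
        t_next = 2 * L_tilde @ t_curr - t_prev  # Chebyshev recurrence
        out += theta[k] * t_next
        t_prev, t_curr = t_curr, t_next
    return out
```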


AGCN. All of the above models use the original graph structure to denote relations between nodes. However, there may be implicit relations between different nodes. The Adaptive Graph Convolution Network (AGCN) is proposed to learn these underlying relations (Li et al., 2018a). AGCN learns a "residual" graph Laplacian and adds it to the original Laplacian matrix. It proves effective on several graph-structured datasets.

DGCN. The dual graph convolutional network (DGCN) (Zhuang and Ma, 2018) is proposed to jointly consider local consistency and global consistency on graphs. It uses two convolutional networks to capture the local and global consistency, and adopts an unsupervised loss to ensemble them. The first convolutional network is the same as Eq. (7), and the second network replaces the adjacency matrix with the positive pointwise mutual information (PPMI) matrix:

H' = ρ( D_P^{−1/2} A_P D_P^{−1/2} H W )

where A_P is the PPMI matrix, D_P is the diagonal degree matrix of A_P, and ρ is a non-linear activation function.


GWNN. The graph wavelet neural network (GWNN) (Xu et al., 2019a) uses the graph wavelet transform to replace the graph Fourier transform. It has several advantages: (1) graph wavelets can be obtained quickly without matrix decomposition; (2) graph wavelets are sparse and localized, so the results are better and more explainable. GWNN outperforms several spectral methods on the semi-supervised node classification task.

AGCN and DGCN try to improve spectral methods by augmenting the graph Laplacian, while GWNN replaces the Fourier transform. In conclusion, spectral approaches have a solid theoretical basis, and several theoretical analyses have also been proposed recently (see Section 7.1.1). However, in almost all of the spectral approaches mentioned above, the learned filters depend on the graph structure. That is to say, the filters cannot be applied to a graph with a different structure, and those models can only be applied under the "transductive" setting of graph tasks.


Fig. 2. The general design pipeline for a GNN model.

 

 Fig. 3. An overview of computational modules.

3.1.2. Basic spatial approaches

Spatial approaches define convolutions directly on the graph based on the graph topology. The major challenge of spatial approaches is defining the convolution operation with differently sized neighborhoods and maintaining the local invariance of CNNs.

Neural FPs. Neural FPs (Duvenaud et al., 2015) uses different weight matrices for nodes with different degrees:

x = h_v^t + Σ_{u ∈ N_v} h_u^t
h_v^{t+1} = σ( x W_t^{|N_v|} )

where W_t^{|N_v|} is the weight matrix for nodes with degree |N_v| at layer t, and σ is a non-linear activation function.


DCNN. The diffusion convolutional neural network (DCNN) (Atwood and Towsley, 2016) uses transition matrices to define the neighborhood for nodes. For node classification, the diffusion representations of the nodes in the graph can be expressed as:

H = f( W^c ⊙ P* X )

where X ∈ ℝ^{N×F} is the matrix of input features (F is the dimension). P* is an N × K × N tensor which contains the power series {P, P², …, P^K} of the matrix P, and P is the degree-normalized transition matrix derived from the graph's adjacency matrix A. Each entity is transformed to a diffusion convolutional representation, a K × F matrix defined by K hops of graph diffusion over F features, which is then combined with a K × F weight matrix W^c and a non-linear activation function f.


PATCHY-SAN. The PATCHY-SAN model (Niepert et al., 2016) extracts and normalizes a neighborhood of exactly k nodes for each node. The normalized neighborhood then serves as the receptive field for the traditional convolutional operation.

LGCN. The learnable graph convolutional network (LGCN) (Gao et al., 2018a) also exploits CNNs as aggregators. It performs max pooling on neighborhood matrices of nodes to get top-k feature elements and then applies 1-D CNN to compute hidden representations.

GraphSAGE. GraphSAGE (Hamilton et al., 2017a) is a general inductive framework which generates embeddings by sampling and aggregating features from a node's local neighborhood:

h_{N_v}^{t+1} = AGG_{t+1}( { h_u^t, ∀u ∈ N_v } )
h_v^{t+1} = σ( W^{t+1} [ h_v^t ∥ h_{N_v}^{t+1} ] )

Instead of using the full neighbor set, GraphSAGE uniformly samples a fixed-size set of neighbors to aggregate information. AGG_{t+1} is the aggregation function, and GraphSAGE suggests three aggregators: the mean aggregator, the LSTM aggregator, and the pooling aggregator. GraphSAGE with a mean aggregator can be regarded as an inductive version of GCN, while the LSTM aggregator is not permutation invariant and requires a specified order of the nodes.
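A minimal sketch of one GraphSAGE layer with the mean aggregator and uniform neighbor sampling (NumPy; the neighbor lists, weights and dimensions are illustrative assumptions):

```python
import numpy as np

def graphsage_mean_layer(H, neighbors, W, num_samples=5, rng=None):
    """One GraphSAGE layer with a mean aggregator.

    H:         (N, d) current node representations
    neighbors: list of (non-empty) neighbor-index lists, one per node
    W:         (2d, d_out) weight matrix applied to [self || aggregated]
    """
    rng = rng or np.random.default_rng(0)
    out = np.empty((H.shape[0], W.shape[1]))
    for v, nbrs in enumerate(neighbors):
        # uniformly sample a fixed-size neighbor set (with replacement)
        sampled = rng.choice(nbrs, size=num_samples, replace=True)
        agg = H[sampled].mean(axis=0)          # mean aggregation
        concat = np.concatenate([H[v], agg])   # [h_v || h_N(v)]
        out[v] = np.maximum(concat @ W, 0.0)   # sigma = ReLU
    return out
```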


3.1.3. Attention-based spatial approaches

The attention mechanism has been successfully used in many sequence-based tasks such as machine translation (Bahdanau et al., 2015; Gehring et al., 2017; Vaswani et al., 2017), machine reading (Cheng et al., 2016) and so on. There are also several models which try to generalize the attention operator to graphs (Velickovic et al., 2018; Zhang et al., 2018c). Compared with the operators we mentioned before, attention-based operators assign different weights to neighbors, so that they can alleviate noise and achieve better results.

GAT. The graph attention network (GAT) (Velickovic et al., 2018) incorporates the attention mechanism into the propagation step. It computes the hidden states of each node by attending over its neighbors, following a self-attention strategy. The hidden state of node v can be obtained by:

h_v^{t+1} = ρ( Σ_{u ∈ N_v} α_{vu} W h_u^t )
α_{vu} = softmax_u( LeakyReLU( a^T [ W h_v^t ∥ W h_u^t ] ) )

where W is the weight matrix of the linear transformation applied to each node, and a is the weight vector of a single-layer MLP.


Moreover, GAT utilizes the multi-head attention of Vaswani et al. (2017) to stabilize the learning process. It applies K independent attention heads to compute the hidden states and then concatenates their features (or computes the average), resulting in the following two output representations:

h_v^{t+1} = ∥_{k=1}^{K} σ( Σ_{u ∈ N_v} α_{vu}^k W^k h_u^t )
h_v^{t+1} = σ( (1/K) Σ_{k=1}^{K} Σ_{u ∈ N_v} α_{vu}^k W^k h_u^t )


Here α_{vu}^k is the normalized attention coefficient computed by the k-th attention head. The attention architecture has several properties: (1) the computation of the node-neighbor pairs is parallelizable, so the operation is efficient; (2) it can deal with graph nodes of different degrees by assigning arbitrary weights to neighbors; (3) it can be applied to inductive learning problems easily.
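A sketch of single-head GAT attention for one node (NumPy; W, a and the neighbor set are illustrative, and tanh stands in for the nonlinearity ρ):

```python
import numpy as np

def gat_attention(h_v, h_neighbors, W, a, alpha=0.2):
    """Single-head GAT update for one node.

    h_v:         (d,)   features of the center node
    h_neighbors: (M, d) features of its neighbors (including v itself)
    W:           (d, d_out) shared linear transformation
    a:           (2 * d_out,) attention weight vector of the single-layer MLP
    """
    Wh_v = h_v @ W
    Wh_u = h_neighbors @ W
    # e_vu = LeakyReLU(a^T [W h_v || W h_u]) for every neighbor u
    e = np.concatenate([np.tile(Wh_v, (len(Wh_u), 1)), Wh_u], axis=1) @ a
    e = np.where(e > 0, e, alpha * e)   # LeakyReLU
    att = np.exp(e - e.max())
    att /= att.sum()                    # softmax over the neighborhood
    return np.tanh(att @ Wh_u)          # weighted sum, then nonlinearity
```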

GaAN. The gated attention network (GaAN) (Zhang et al., 2018c) also uses the multi-head attention mechanism. However, it uses a self-attention mechanism to gather information from different heads to replace the average operation of GAT.


3.1.4. General frameworks for spatial approaches

Apart from the different variants of spatial approaches, several general frameworks have been proposed, aiming to integrate different models into one single framework. Monti et al. (2017) propose the mixture model network (MoNet), a general spatial framework for several methods defined on graphs or manifolds. Gilmer et al. (2017) propose the message passing neural network (MPNN), which uses message passing functions to unify several variants. Wang et al. (2018a) propose the non-local neural network (NLNN), which unifies several "self-attention"-style methods (Hoshen, 2017; Vaswani et al., 2017; Velickovic et al., 2018). Battaglia et al. (2018) propose the graph network (GN), which defines a more general framework for learning node-level, edge-level and graph-level representations.

MoNet. The mixture model network (MoNet) (Monti et al., 2017) is a spatial framework that tries to unify models for non-Euclidean domains, including CNNs for manifolds and GNNs. The Geodesic CNN (GCNN) (Masci et al., 2015) and Anisotropic CNN (ACNN) (Boscaini et al., 2016) on manifolds, as well as GCN (Kipf and Welling, 2017) and DCNN (Atwood and Towsley, 2016) on graphs, can be formulated as particular instances of MoNet. In MoNet, each point on a manifold or each vertex on a graph, denoted by v, is regarded as the origin of a pseudo-coordinate system. The neighbors u ∈ N_v are associated with pseudo-coordinates u(v, u). Given two functions f, g defined on the vertices of a graph (or points on a manifold), the convolution operator in MoNet is defined as:

D_j(v) f = Σ_{u ∈ N_v} w_j( u(v, u) ) f(u),  j = 1, …, J

Here w_1(u), …, w_J(u) are the functions assigning weights to neighbors according to their pseudo-coordinates, so D_j(v) f is the aggregated value of the neighbors' functions. By defining different u and w, MoNet can instantiate several methods. For GCN, the functions f, g map the nodes to their features, the pseudo-coordinate for (v, u) is u(v, u), J = 1, and w_1(u(v, u)) = 1. In MoNet's own model, the parameters w are learnable.


MPNN. The message passing neural network (MPNN) (Gilmer et al., 2017) extracts the general characteristics shared by several classic models. The model contains two phases: a message passing phase and a readout phase. In the message passing phase, the model first uses the message function M_t to aggregate the "message" m_v^{t+1} from neighbors and then uses the update function U_t to update the hidden state h_v^{t+1}:

m_v^{t+1} = Σ_{u ∈ N_v} M_t( h_v^t, h_u^t, e_{vu} )
h_v^{t+1} = U_t( h_v^t, m_v^{t+1} )

Here e_{vu} represents the features of the undirected edge (v, u). The readout phase computes a feature vector for the whole graph using the readout function R:

ŷ = R( { h_v^T | v ∈ G } )

where T denotes the total number of time steps. The message function M_t, vertex update function U_t and readout function R may have different settings; hence the MPNN framework can instantiate several different models via different function settings. The specific settings for different models can be found in (Gilmer et al., 2017).
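A skeleton of the two MPNN phases (NumPy; the concrete message, update and readout functions chosen here are simple illustrative settings, not those of any specific model):

```python
import numpy as np

def mpnn_forward(H, edges, edge_feats, T=3):
    """Generic MPNN: T rounds of message passing, then a readout.

    H:          (N, d) initial node states
    edges:      list of undirected edges (v, u)
    edge_feats: dict mapping (v, u) -> edge feature vector of size d
    """
    for _ in range(T):
        M = np.zeros_like(H)
        for v, u in edges:
            e = edge_feats[(v, u)]
            # message function M_t: here, a simple sum of state and edge features
            M[v] += H[u] + e
            M[u] += H[v] + e
        # update function U_t: here, a tanh of state plus aggregated messages
        H = np.tanh(H + M)
    # readout function R: here, a permutation-invariant sum over nodes
    return H.sum(axis=0)
```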


NLNN. The non-local neural network (NLNN) generalizes and extends the classic non-local mean operation (Buades et al., 2005) in computer vision. The non-local operation computes the hidden state at a position as a weighted sum of the features at all possible positions. The potential positions can be in space, time or spacetime. Thus the NLNN can be viewed as a unification of different "self-attention"-style methods (Hoshen, 2017; Vaswani et al., 2017; Velickovic et al., 2018).

Following the non-local mean operation (Buades et al., 2005), the generic non-local operation is defined as:

h_v^{t+1} = (1 / C(h^t)) Σ_{∀u} f( h_v^t, h_u^t ) g( h_u^t )

where u indexes all possible positions for position v, f(h_v^t, h_u^t) computes a scalar representing the relation between v and u, g(h_u^t) denotes a transformation of the input h_u^t, and C(h^t) is a normalization factor. Different variants of NLNN can be defined by different f and g settings; more details can be found in the original paper (Wang et al., 2018a).
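A sketch of the generic non-local operation with an embedded-Gaussian choice of f and a linear g (both choices and all names are illustrative):

```python
import numpy as np

def non_local_op(H, W_g):
    """Generic non-local operation over all positions.

    H:   (N, d) hidden states h^t at all positions
    W_g: (d, d) linear transformation used as g
    """
    S = H @ H.T                                   # pairwise scores h_v . h_u
    f = np.exp(S - S.max(axis=1, keepdims=True))  # f: embedded-Gaussian choice
    C = f.sum(axis=1, keepdims=True)              # normalization factor C(h)
    return (f / C) @ (H @ W_g)                    # weighted sum of g(h_u)
```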


Graph Network. The graph network (GN) (Battaglia et al., 2018) is a more general framework compared to the others, learning node-level, edge-level and graph-level representations. It can unify many variants like MPNN, NLNN, Interaction Networks (Battaglia et al., 2016; Watters et al., 2017), Neural Physics Engine (Chang et al., 2017), CommNet (Sukhbaatar, Fergus et al., 2016), structure2vec (Dai et al., 2016; Khalil et al., 2017), GGNN (Li et al., 2016), Relation Network (Raposo et al., 2017; Santoro et al., 2017), Deep Sets (Zaheer et al., 2017), PointNet (Qi et al., 2017a) and so on.


The core computation unit of GN is called the GN block. A GN block defines three update functions (φ) and three aggregation functions (ρ):

e_k^{t+1} = φ^e( e_k^t, h_{r_k}^t, h_{s_k}^t, u^t ),  ē_v^{t+1} = ρ^{e→h}( E_v^{t+1} )
h_v^{t+1} = φ^h( ē_v^{t+1}, h_v^t, u^t ),  ē^{t+1} = ρ^{e→u}( E^{t+1} )
u^{t+1} = φ^u( ē^{t+1}, h̄^{t+1}, u^t ),  h̄^{t+1} = ρ^{h→u}( H^{t+1} )

Here r_k is the receiver node and s_k is the sender node of edge k. E^{t+1} and H^{t+1} are the matrices of stacked edge vectors and node vectors at time step t+1, respectively, and E_v^{t+1} collects the edge vectors whose receiver node is v. u is the global attribute for the graph representation. The φ and ρ functions can have various settings; the ρ functions must be invariant to input orders and should accept variable numbers of arguments.


3.2. Propagation modules - recurrent operator

Recurrent methods are pioneers in this research line. The major difference between recurrent operators and convolution operators is that layers in convolution operators use different weights, while layers in recurrent operators share the same weights. Early methods based on recursive neural networks focus on dealing with directed acyclic graphs (Sperduti and Starita, 1997; Frasconi et al., 1998; Micheli et al., 2004; Hammer et al., 2004). Later, the concept of graph neural network (GNN) was first proposed in (Scarselli et al., 2009; Gori et al., 2005), which extended existing neural networks to process more graph types. We name this model GNN in this paper to distinguish it from the general term. We first introduce GNN and its later variants, which require convergence of the hidden states, and then we discuss methods based on the gate mechanism.


3.2.1. Convergence-based methods

In a graph, each node is naturally defined by its features and the related nodes. The target of GNN is to learn, for each node, a state embedding h_v ∈ ℝ^s which contains the information of the neighborhood and of the node itself. The state embedding h_v is an s-dimensional vector of node v and can be used to produce an output o_v, such as the distribution of the predicted node label. The computation steps of h_v and o_v are defined as:

h_v = f( x_v, x_co[v], h_{N_v}, x_{N_v} )
o_v = g( h_v, x_v )

where x_v, x_co[v], h_{N_v}, and x_{N_v} are the features of v, the features of its edges, the states of the nodes in the neighborhood of v, and the features of the nodes in the neighborhood of v, respectively. f is a parametric function called the local transition function; it is shared among all nodes and updates the node state according to the input neighborhood. g is the local output function that describes how the output is produced. Note that both f and g can be interpreted as feedforward neural networks.

Let H, O, X, and X_N be the matrices constructed by stacking all the states, all the outputs, all the features, and all the node features, respectively. Then we have a compact form:

H = F( H, X )
O = G( H, X_N )

where F, the global transition function, and G, the global output function, are stacked versions of f and g for all nodes in a graph, respectively. The value of H is the fixed point of Eq. (20) and is uniquely defined under the assumption that F is a contraction map.


Following Banach's fixed point theorem (Khamsi and Kirk, 2011), GNN uses the following classic iterative scheme to compute the state:

H^{t+1} = F( H^t, X )

where H^t denotes the t-th iteration of H. The dynamical system of Eq. (21) converges exponentially fast to the solution for any initial value.
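A sketch of this fixed-point computation, assuming the contraction map F is given as a callable (tolerance and names are illustrative):

```python
import numpy as np

def fixed_point_states(F, X, H0, tol=1e-6, max_iter=1000):
    """Iterate H^{t+1} = F(H^t, X) until convergence.

    If F is a contraction map in H, Banach's theorem guarantees a
    unique fixed point regardless of the initial value H0.
    """
    H = H0
    for _ in range(max_iter):
        H_next = F(H, X)
        if np.linalg.norm(H_next - H) < tol:
            return H_next
        H = H_next
    return H
```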

Though experimental results have shown that GNN is a powerful architecture for modeling structural data, there are still several limitations:

• GNN requires f to be a contraction map, which limits the model's ability, and it is inefficient to update the hidden states of nodes iteratively towards the fixed point.

• It is unsuitable to use the fixed points if we focus on the representation of nodes instead of graphs, because the distribution of representations at the fixed point will be much smoother in value and less informative for distinguishing each node.


GraphESN. Graph echo state network (GraphESN) (Gallicchio and Micheli, 2010) generalizes the echo state network (ESN) (Jaeger, 2001) on graphs. It uses a fixed contractive encoding function, and only trains a readout function. The convergence is ensured by the contractivity of reservoir dynamics. As a consequence, GraphESN is more efficient than GNN.

SSE. Stochastic Steady-state Embedding (SSE) (Dai et al., 2018a) is also proposed to improve the efficiency of GNN. SSE proposes a learning framework which contains two steps. The embedding of each node is updated by a parameterized operator in the update step, and these embeddings are projected to the steady-state constraint space to meet the steady-state conditions.

LP-GNN. Lagrangian Propagation GNN (LP-GNN) (Tiezzi et al., 2020) formalizes the learning task as a constraint optimization problem in the Lagrangian framework and avoids the iterative computations for the fixed point. The convergence procedure is implicitly expressed by a constraint satisfaction mechanism.


3.2.2. Gate-based methods

Several works attempt to use a gate mechanism, such as GRU (Cho et al., 2014) or LSTM (Hochreiter and Schmidhuber, 1997), in the propagation step to diminish the computational limitations of GNN and improve the long-term propagation of information across the graph structure. They run a fixed number of training steps without a guarantee of convergence.

GGNN. The gated graph neural network (GGNN) (Li et al., 2016) is proposed to lift the limitations of GNN. It removes the requirement that the function f be a contraction map and uses Gated Recurrent Units (GRU) in the propagation step. It also uses back-propagation through time (BPTT) to compute gradients. The computation steps of GGNN can be found in Table 2.

A node v first aggregates messages from its neighbors. Then GRU-like update functions incorporate information from the other nodes and from the previous time step to update each node's hidden state. h_{N_v} gathers the neighborhood information of node v, while z and r are the update and reset gates.
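A sketch of one GGNN propagation step for a single node, following the GRU equations (NumPy; the weight matrices and sum aggregation are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(h_v, h_neighbors, params):
    """GRU-style update of one node's hidden state.

    h_v:         (d,)   current state of node v
    h_neighbors: (M, d) states of v's neighbors
    params:      dict of weight matrices W_*, U_* of shape (d, d)
    """
    a = h_neighbors.sum(axis=0)                             # aggregate messages
    z = sigmoid(params["W_z"] @ a + params["U_z"] @ h_v)    # update gate
    r = sigmoid(params["W_r"] @ a + params["U_r"] @ h_v)    # reset gate
    h_tilde = np.tanh(params["W_h"] @ a + params["U_h"] @ (r * h_v))
    return (1 - z) * h_v + z * h_tilde                      # gated state update
```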


LSTMs are also used in a similar way as GRU through the propagation process based on a tree or a graph.

Tree LSTM. Tai et al. (2015) propose two extensions of the basic LSTM architecture to tree structures: the Child-Sum Tree-LSTM and the N-ary Tree-LSTM. They are also extensions of the recursive neural network based models mentioned before. A tree is a special case of a graph, and each node in Tree-LSTM aggregates information from its children. Instead of the single forget gate in traditional LSTM, the Tree-LSTM unit for node v contains one forget gate f_{vk} for each child k. The computation steps of the Child-Sum Tree-LSTM are displayed in Table 2. i_v^t, o_v^t, and c_v^t are the input gate, output gate and memory cell respectively, and x_v^t is the input vector at time t. The N-ary Tree-LSTM is further designed for a special kind of tree where each node has at most K children and the children are ordered. Its equations in Table 2 for the gate-specific neighborhood aggregations introduce separate parameters for each child k, which allows the model to learn more fine-grained representations conditioned on the states of a unit's children than the Child-Sum Tree-LSTM.


Graph LSTM. The two types of Tree-LSTMs can easily be adapted to graphs. The graph-structured LSTM in (Zayats and Ostendorf, 2018) is an example of the N-ary Tree-LSTM applied to graphs. However, it is a simplified version, since each node in the graph has at most 2 incoming edges (from its parent and sibling predecessor). Peng et al. (2017) propose another variant of the Graph LSTM based on the relation extraction task. The edges of graphs in (Peng et al., 2017) have various labels, so Peng et al. (2017) utilize different weight matrices to represent different labels. In Table 2, m(v, k) denotes the edge label between nodes v and k.

Liang et al. (2016) propose a Graph LSTM network to address the semantic object parsing task. It uses a confidence-driven scheme to adaptively select the starting node and determine the node updating sequence. It follows the same idea of generalizing existing LSTMs to graph-structured data, but has a specific updating sequence, while the methods mentioned above are agnostic to the order of nodes.

S-LSTM. Zhang et al. (2018d) propose the Sentence LSTM (S-LSTM) for improving text encoding. It converts text into a graph and utilizes the Graph LSTM to learn the representation. The S-LSTM shows strong representation power in many NLP problems.


3.3. Propagation modules - skip connection

Many applications unroll or stack graph neural network layers, aiming to achieve better results, as more layers (i.e., k layers) let each node aggregate more information from neighbors k hops away. However, it has been observed in many experiments that deeper models do not improve performance and may even perform worse. This is mainly because more layers also propagate noisy information from an exponentially increasing number of expanded neighborhood members. Deeper models also cause the over-smoothing problem, because nodes tend to have similar representations after the aggregation operation as models go deeper. Therefore, many methods add "skip connections" to make GNN models deeper. In this subsection we introduce three kinds of instantiations of skip connections.

Highway GCN. Rahimi et al. (2018) propose a Highway GCN which uses layer-wise gates similar to highway networks (Zilly et al., 2016). The output of a layer is summed with its input with gating weights:

T(h^t) = σ( W^t h^t + b^t )
h^{t+1} = h^{t+1} ⊙ T(h^t) + h^t ⊙ ( 1 − T(h^t) )
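A sketch of the highway gating step (NumPy; h_out stands for the layer's raw output and W, b for the gate parameters, all illustrative):

```python
import numpy as np

def highway_gate(h_in, h_out, W, b):
    """Gated skip connection: blend a layer's output with its input.

    h_in:  (d,) layer input h^t
    h_out: (d,) layer output before gating
    W, b:  gate parameters, shapes (d, d) and (d,)
    """
    T = 1.0 / (1.0 + np.exp(-(W @ h_in + b)))   # transform gate T(h^t)
    return h_out * T + h_in * (1.0 - T)         # gated sum of output and input
```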


By adding the highway gates, the performance peaks at 4 layers in a specific problem discussed in (Rahimi et al., 2018). The column network (CLN) (Pham et al., 2017) also utilizes highway gating, but with different functions to compute the gating weights.

JKN. Xu et al. (2018) study the properties and limitations of neighborhood aggregation schemes. They propose the jump knowledge network (JKN), which can learn adaptive and structure-aware representations. JKN selects from all of the intermediate representations (which "jump" to the last layer) for each node at the last layer, which lets the model adapt the effective neighborhood size for each node as needed. Xu et al. (2018) use three approaches, concatenation, max-pooling and LSTM-attention, in the experiments to aggregate information. JKN performs well in experiments on social, bioinformatics and citation networks. It can also be combined with models like GCN, GraphSAGE and GAT to improve their performance.


DeepGCNs. Li et al. (2019a) borrow ideas from ResNet (He et al., 2016a, 2016b) and DenseNet (Huang et al., 2017). ResGCN and DenseGCN are proposed, incorporating residual connections and dense connections to solve the problems of vanishing gradients and over-smoothing. In detail, the hidden states of nodes in ResGCN and DenseGCN can be computed as:

h_Res^{t+1} = h^{t+1} + h^t
h_Dense^{t+1} = ∥_{τ=0}^{t+1} h^τ

The experiments of DeepGCNs are conducted on the point cloud semantic segmentation task, and the best results are achieved with a 56-layer model.


3.4. Sampling modules

GNN models aggregate messages for each node from its neighborhood in the previous layer. Intuitively, if we track back multiple GNN layers, the size of the supporting neighborhood grows exponentially with the depth. To alleviate this "neighbor explosion" issue, an efficient and effective way is sampling. Besides, when we deal with large graphs, we cannot always store and process all neighborhood information for each node, so a sampling module is needed to conduct the propagation. In this section, we introduce three kinds of graph sampling modules: node sampling, layer sampling, and subgraph sampling.


3.4.1. Node sampling

A straightforward way to reduce the size of the neighborhood is to select a subset from each node's neighborhood. GraphSAGE (Hamilton et al., 2017a) samples a fixed small number of neighbors, ensuring a neighborhood size of 2 to 50 for each node. To reduce sampling variance, Chen et al. (2018a) introduce a control-variate based stochastic approximation algorithm for GCN by utilizing the historical activations of nodes as a control variate. This method limits the receptive field to the 1-hop neighborhood and uses the historical hidden states as an affordable approximation.

PinSage (Ying et al., 2018a) proposes an importance-based sampling method. By simulating random walks starting from target nodes, this approach chooses the top T nodes with the highest normalized visit counts.
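A sketch of PinSage-style importance sampling via random-walk visit counts (NumPy; the walk length, walk count and names are illustrative):

```python
import numpy as np

def importance_neighbors(neighbors, start, T=10, num_walks=100, walk_len=3):
    """Rank nodes around `start` by normalized random-walk visit counts.

    neighbors: list of (non-empty) neighbor-index lists, one per node
    Returns the T nodes visited most often.
    """
    rng = np.random.default_rng(0)
    visits = {}
    for _ in range(num_walks):
        v = start
        for _ in range(walk_len):
            v = int(rng.choice(neighbors[v]))   # one random-walk step
            visits[v] = visits.get(v, 0) + 1
    total = sum(visits.values())
    # sort visited nodes by normalized visit count, highest first
    ranked = sorted(visits, key=lambda n: visits[n] / total, reverse=True)
    return ranked[:T]
```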


3.4.2. Layer sampling

Instead of sampling neighbors for each node, layer sampling retains a small set of nodes for aggregation in each layer to control the expansion factor. FastGCN (Chen et al., 2018b) directly samples the receptive field for each layer. It uses importance sampling, where the important nodes are more likely to be sampled.

In contrast to the fixed sampling methods above, Huang et al. (2018) introduce a parameterized and trainable sampler to perform layer-wise sampling conditioned on the former layer. Furthermore, this adaptive sampler could optimize the sampling importance and reduce variance simultaneously. LADIES (Zou et al., 2019) intends to alleviate the sparsity issue in layer-wise sampling by generating samples from the union of the neighbors of the nodes.

层采样不是对每个节点的邻居进行采样,而是在每层中保留一小组节点进行聚合,以控制扩展因子。FastGCN (Chen et al., 2018b)直接对每一层的感受野进行采样。它使用重要性采样,其中重要节点更有可能被采样。

与上述固定采样方法相比,Huang等人(2018)引入了一种参数化和可训练的采样器,以前一层为条件进行分层采样。此外,该自适应采样器在优化抽样重要性的同时降低了方差。LADIES (Zou等人,2019)打算通过从节点的邻居并集中生成样本来缓解分层采样中的稀疏性问题。
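
FastGCN's layer-wise importance sampling can be sketched as follows (a simplification under assumed dense-matrix shapes; FastGCN derives the proposal q from the squared column norms of the normalized adjacency, and the 1/(n·q) rescaling keeps the layer estimate unbiased):

```python
import numpy as np

def fastgcn_layer_sample(A_hat, num_samples, rng):
    """Sample one node set for an entire layer, with probability
    proportional to the squared column norms of A_hat, so that highly
    connected (important) nodes are drawn more often."""
    col_norms = np.linalg.norm(A_hat, axis=0) ** 2
    q = col_norms / col_norms.sum()
    sampled = rng.choice(A_hat.shape[0], size=num_samples, replace=False, p=q)
    return sampled, q[sampled]

def layer_propagate(A_hat, H, W, sampled, q_sampled):
    # Monte Carlo estimate of ReLU(A_hat @ H @ W) restricted to the
    # sampled layer; dividing each column by (n * q) debiases the sum.
    n = len(sampled)
    A_sub = A_hat[:, sampled] / (n * q_sampled)
    return np.maximum(A_sub @ H[sampled] @ W, 0.0)

# usage: idx, q = fastgcn_layer_sample(A_hat, 256, np.random.default_rng(0))
```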

3.4.3. Subgraph sampling子图抽样

Rather than sampling nodes and edges from the full graph, a fundamentally different way is to sample multiple subgraphs and restrict the neighborhood search within these subgraphs. ClusterGCN (Chiang et al., 2019) samples subgraphs by graph clustering algorithms, while GraphSAINT (Zeng et al., 2020) directly samples nodes or edges to generate a subgraph.

不同于在完整图的基础上对节点和边进行采样,一种根本不同的方法是对多个子图进行采样,并将邻域搜索限制在这些子图内。ClusterGCN (Chiang et al., 2019)通过图聚类算法对子图进行采样,而GraphSAINT (Zeng et al., 2020)则直接对节点或边进行采样以生成子图。
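
A GraphSAINT-style node sampler, heavily simplified (illustrative names; the actual method also computes normalization coefficients that debias the minibatch loss, which are omitted here):

```python
import random

def saint_node_sampler(adj_list, num_nodes):
    """Sample a node set and train on the induced subgraph, so that all
    message passing stays inside the sampled subgraph."""
    nodes = set(random.sample(range(len(adj_list)), num_nodes))
    # Induced subgraph: keep only edges whose endpoints are both sampled.
    sub_edges = [(u, v) for u in nodes for v in adj_list[u] if v in nodes]
    return nodes, sub_edges
```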

 Fig. 4. An overview of variants considering graph type and scale.

3.5. Pooling modules池模块

In the area of computer vision, a convolutional layer is usually followed by a pooling layer to get more general features. Complicated and large-scale graphs usually carry rich hierarchical structures which are of great importance for node-level and graph-level classification tasks. Similar to these pooling layers, a lot of work focuses on designing hierarchical pooling layers on graphs. In this section, we introduce two kinds of pooling modules: direct pooling modules and hierarchical pooling modules.

在计算机视觉领域,卷积层之后通常是池化层,以获得更一般的特征。复杂且大规模的图通常具有丰富的层次结构,这对于节点级和图级分类任务具有重要意义。与这些池化层类似,很多工作都侧重于在图上设计分层池化层。在本节中,我们将介绍两种池化模块:直接池化模块和分层池化模块。

3.5.1. Direct pooling modules直接池化模块

Direct pooling modules learn graph-level representations directly from nodes with different node selection strategies. These modules are also called readout functions in some variants.

Simple Node Pooling. Simple node pooling methods are used by several models. In these models, node-wise max/mean/sum/attention operations are applied on node features to get a global graph representation.

直接池化模块直接从具有不同节点选择策略的节点学习图级表示。这些模块在某些变体中也称为读出函数。

简单节点池。一些模型使用简单的节点池方法。在这些模型中,在节点特征上应用节点的max/mean/sum/attention操作来获得全局图表示。
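
These readouts are one-liners over the node feature matrix; a minimal sketch (the attention scoring vector `a` stands in for a learnable parameter and is purely illustrative):

```python
import numpy as np

def readout(H, mode="mean"):
    """Collapse an (N, d) node feature matrix into one d-dimensional
    graph-level representation."""
    if mode == "sum":
        return H.sum(axis=0)
    if mode == "mean":
        return H.mean(axis=0)
    if mode == "max":
        return H.max(axis=0)
    if mode == "attention":
        a = np.ones(H.shape[1]) / np.sqrt(H.shape[1])  # stand-in for a learned vector
        w = np.exp(H @ a)
        w /= w.sum()                                   # softmax attention weights
        return w @ H
    raise ValueError(f"unknown mode: {mode}")
```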

Set2set. MPNN uses the Set2set method (Vinyals et al., 2015a) as the readout function to get graph representations. Set2set is designed to deal with the unordered set $T = \{(\mathbf{h}_v^T, \mathbf{x}_v)\}$ and uses an LSTM-based method to produce an order-invariant representation after a predefined number of steps.

SortPooling. SortPooling (Zhang et al., 2018e) first sorts the node embeddings according to the structural roles of the nodes and then the sorted embeddings are fed into CNNs to get the representation.

Set2set。MPNN使用Set2set方法(Vinyals et al., 2015a)作为读出函数来获得图表示。Set2set用于处理无序集 $T = \{(\mathbf{h}_v^T, \mathbf{x}_v)\}$,并使用基于LSTM的方法在预定义的步数之后生成顺序不变的表示。

SortPooling。SortPooling (Zhang et al., 2018e)首先根据节点的结构角色对节点嵌入进行排序,然后将排序后的嵌入输入到CNN中得到表示。
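
The sorting step is what makes a CNN applicable: after ordering, "node position" carries structural meaning. A sketch under the common convention of sorting by the last feature channel (all names are illustrative):

```python
import numpy as np

def sort_pooling(H, k):
    """SortPooling-style readout: order nodes by their last feature channel
    (a proxy for structural role), then truncate or zero-pad to a fixed
    size k so a 1-D CNN can consume the result."""
    order = np.argsort(-H[:, -1])      # sort descending by last channel
    H_sorted = H[order][:k]
    if H_sorted.shape[0] < k:          # zero-pad graphs smaller than k
        pad = np.zeros((k - H_sorted.shape[0], H.shape[1]))
        H_sorted = np.vstack([H_sorted, pad])
    return H_sorted                    # shape (k, d), ready for a CNN
```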

3.5.2. Hierarchical pooling modules分层池模块

The methods mentioned before directly learn graph representations from nodes and they do not investigate the hierarchical property of the graph structure. Next we will talk about methods that follow a hierarchical pooling pattern and learn graph representations by layers.

Graph Coarsening. Early methods are usually based on graph coarsening algorithms. Spectral clustering algorithms were used first, but they are inefficient because of the eigendecomposition step. Graclus (Dhillon et al., 2007) provides a faster way to cluster nodes and it is applied as a pooling module. For example, ChebNet and MoNet use Graclus to merge node pairs and further add additional nodes to make sure the pooling procedure forms a balanced binary tree.

前面提到的方法直接从节点学习图表示,它们不研究图结构的层次结构。接下来,我们将讨论遵循分层池模式的方法,并按层学习图形表示。

图粗化。早期的方法通常基于图粗化算法。最早使用的是谱聚类算法,但由于特征分解步骤,其效率较低。Graclus (Dhillon et al., 2007)提供了一种更快的节点聚类方法,并被用作池化模块。例如,ChebNet和MoNet使用Graclus合并节点对,并进一步添加额外的节点,以确保池化过程形成平衡的二叉树。

ECC. Edge-Conditioned Convolution (ECC) (Simonovsky and Komodakis, 2017) designs its pooling module with a recursive downsampling operation. The downsampling method is based on splitting the graph into two components by the sign of the largest eigenvector of the Laplacian.

DiffPool. DiffPool (Ying et al., 2018b) uses a learnable hierarchical clustering module by training an assignment matrix $\mathbf{S}^t$ in each layer:

$$\mathbf{S}^t = \text{softmax}\left(\text{GNN}_{pool}\left(\mathbf{A}^t, \mathbf{H}^t\right)\right),$$

where $\mathbf{H}^t$ is the node feature matrix and $\mathbf{A}^t$ is the coarsened adjacency matrix of layer $t$. $\mathbf{S}^t$ denotes the probabilities that a node in layer $t$ can be assigned to a coarser node in layer $t+1$.

ECC。边条件卷积(ECC) (Simonovsky and Komodakis, 2017)设计了带有递归下采样操作的池化模块。下采样方法是根据拉普拉斯矩阵最大特征向量的符号将图分成两个分量。

DiffPool。DiffPool (Ying等人,2018b)通过在每一层训练赋值矩阵St来使用可学习的层次聚类模块:

其中 $\mathbf{H}^t$ 是节点特征矩阵,$\mathbf{A}^t$ 是第t层的粗化邻接矩阵。$\mathbf{S}^t$ 表示第t层中的节点被分配到第t+1层中某个更粗节点的概率。
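
Given the soft assignments, coarsening the features and the adjacency are two matrix products; a minimal NumPy sketch using single-layer GNNs for both branches (all parameter names are illustrative, and DiffPool additionally uses auxiliary losses such as link prediction and entropy regularization, omitted here):

```python
import numpy as np

def softmax(X, axis=-1):
    e = np.exp(X - X.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def diffpool_coarsen(A, H, W_embed, W_pool):
    """One DiffPool step: S softly assigns each node to a cluster, and the
    graph is coarsened to as many nodes as S has columns."""
    Z = np.maximum(A @ H @ W_embed, 0.0)   # node embeddings Z^t
    S = softmax(A @ H @ W_pool, axis=1)    # assignment S^t, rows sum to 1
    H_next = S.T @ Z                       # coarse node features
    A_next = S.T @ A @ S                   # coarse adjacency matrix
    return A_next, H_next
```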

gPool. gPool (Gao and Ji, 2019) uses a projection vector to learn projection scores for each node and selects the nodes with top-k scores. Compared to DiffPool, it uses a vector instead of a matrix at each layer, thus reducing the storage complexity. But the projection procedure does not consider the graph structure.

EigenPooling. EigenPooling (Ma et al., 2019a) is designed to use the node features and the local structure jointly. It uses the local graph Fourier transform to extract subgraph information, and suffers from the inefficiency of graph eigendecomposition.

SAGPool. SAGPool (Lee et al., 2019) also uses features and topology jointly to learn graph representations. It adopts a self-attention based method with reasonable time and space complexity.

gPool。gPool (Gao and Ji, 2019)使用投影向量为每个节点学习投影分数,并选择得分最高的前k个节点。与DiffPool相比,它在每一层使用向量而不是矩阵,从而降低了存储复杂度。但是投影过程没有考虑图的结构。

EigenPooling。EigenPooling (Ma等人,2019a)旨在联合使用节点特征和局部结构。它利用局部图傅里叶变换提取子图信息,但受制于图特征分解的低效率。

SAGPool。SAGPool (Lee等人,2019)同样提出联合使用特征和拓扑来学习图表示。它采用基于自注意力的方法,具有合理的时间和空间复杂度。
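
gPool's top-k selection can be written in a few lines; a sketch (the projection vector `p` stands in for a learnable parameter, and the sigmoid gate keeps gradients flowing into it):

```python
import numpy as np

def gpool(A, X, p, k):
    """gPool-style pooling: score nodes by projecting features onto p,
    keep the top-k nodes, and gate their features by the scores."""
    y = X @ p / np.linalg.norm(p)            # one projection score per node
    idx = np.argsort(-y)[:k]                 # indices of the k best scores
    gate = 1.0 / (1.0 + np.exp(-y[idx]))     # sigmoid gating
    X_new = X[idx] * gate[:, None]
    A_new = A[np.ix_(idx, idx)]              # adjacency induced on kept nodes
    return A_new, X_new
```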

4、Variants considering graph type and scale考虑图形类型和规模的变体

In the above sections, we assume the graph to be the simplest format. However, many graphs in the real world are complex. In this subsection, we will introduce the approaches which attempt to address the challenges of complex graph types. An overview of these variants is shown in Fig. 4.

在上面的部分中,我们假设图是最简单的形式。然而,现实世界中的许多图是复杂的。在本小节中,我们将介绍尝试解决复杂图类型挑战的方法。这些变体的概述如图4所示。

4.1. Directed graphs有向图

The first type is the directed graphs. Directed edges usually contain more information than undirected edges. For example, in a knowledge graph where a head entity is the parent class of a tail entity, the edge direction offers information about the partial order. Instead of simply adopting an asymmetric adjacency matrix in the convolution operator, we can model the forward and reverse directions of an edge differently. DGP (Kampffmeyer et al., 2019) uses two kinds of weight matrices Wp and Wc for the convolution in forward and reverse directions.
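
A direction-aware convolution in the spirit of DGP can be sketched by propagating along the adjacency matrix and its transpose with separate weights (a simplification under assumed shapes; DGP itself uses a two-stage scheme over ancestor and descendant nodes, and all helper names here are illustrative):

```python
import numpy as np

def directed_conv(A, H, W_p, W_c):
    """Aggregate along forward edges (A) with W_p and along reverse edges
    (A.T) with W_c, then combine, so the two directions of an edge are
    modeled differently."""
    def row_norm(M):
        d = M.sum(axis=1, keepdims=True)
        return M / np.maximum(d, 1.0)       # row-normalize, avoid div-by-0
    H_fwd = row_norm(A) @ H @ W_p           # with the edge direction
    H_rev = row_norm(A.T) @ H @ W_c         # against the edge direction
    return np.maximum(H_fwd + H_rev, 0.0)
```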

4.2. Heterogeneous graphs异构图形

The second variant of graphs is heterogeneous graphs, where the nodes and edges are multi-typed or multi-modal. More specifically, in a heterogeneous graph $G = (V, E, \varphi, \psi)$, each node $v_i$ is associated with a type $\varphi(v_i)$ and each edge $e_j$ with a type $\psi(e_j)$.

图的第二种变体是异构图,其中的节点和边是多类型或多模态的。更具体地说,在异构图 $G = (V, E, \varphi, \psi)$ 中,每个节点 $v_i$ 与类型 $\varphi(v_i)$ 相关联,每条边 $e_j$ 与类型 $\psi(e_j)$ 相关联。

4.2.1. Meta-path-based methods基于元路径的方法

Most approaches toward this graph type utilize the concept of the meta-path. A meta-path is a path scheme which determines the type of node in each position of the path, e.g. $\varphi_1 \xrightarrow{R_1} \varphi_2 \xrightarrow{R_2} \varphi_3 \cdots \xrightarrow{R_L} \varphi_{L+1}$, where $L$ is the length of the meta-path and each $R_i$ denotes a relation between node types. In the training process, the meta-paths are instantiated as node sequences. By connecting the two end nodes of a meta-path instance, the meta-path captures the similarity of two nodes which may not be directly connected. Consequently, one heterogeneous graph can be reduced to several homogeneous graphs, on which graph learning algorithms can be applied. In early work, meta-path based similarity search is investigated (Sun et al., 2011). Recently, more GNN models which utilize the meta-path are proposed. HAN (Wang et al., 2019a) first performs graph attention on the meta-path-based neighbors under each meta-path and then uses a semantic attention over the output embeddings of nodes under all meta-path schemes to generate the final representation of nodes. MAGNN (Fu et al., 2020) proposes to take the intermediate nodes in a meta-path into consideration. It first aggregates the information along the meta-path using a neural module, then performs attention over different meta-path instances associated with a node, and finally performs attention over different meta-path schemes. GTN (Yun et al., 2019) proposes a novel graph transformer layer which identifies new connections between unconnected nodes while learning representations of nodes. The learned new connections can connect nodes which are several hops away from each other but are closely related, and they function as meta-paths.

大多数处理这种图类型的方法都利用元路径的概念。元路径是一种路径方案,它决定了路径每个位置的节点类型,例如 $\varphi_1 \xrightarrow{R_1} \varphi_2 \xrightarrow{R_2} \varphi_3 \cdots \xrightarrow{R_L} \varphi_{L+1}$,其中 $L$ 为元路径的长度。在训练过程中,元路径被实例化为节点序列。通过连接元路径实例的两个端点节点,元路径捕获了可能不直接相连的两个节点之间的相似性。因此,一个异构图可以简化为几个同构图,在这些同构图上可以应用图学习算法。在早期工作中,研究了基于元路径的相似性搜索(Sun et al., 2011)。近年来,越来越多利用元路径的GNN模型被提出。HAN (Wang et al., 2019a)首先对每个元路径下基于元路径的邻居执行图注意力,然后对所有元路径方案下节点的输出嵌入使用语义注意力,以生成节点的最终表示。MAGNN (Fu et al., 2020)提出考虑元路径中的中间节点。它首先使用神经模块聚合元路径上的信息,然后对与节点关联的不同元路径实例执行注意力,最后对不同元路径方案执行注意力。GTN (Yun等人,2019)提出了一种新的图transformer层,可以在学习节点表示的同时识别未连接节点之间的新连接。学习到的新连接可以连接彼此相距数跳但密切相关的节点,其作用相当于元路径。
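
Reducing a heterogeneous graph to homogeneous ones along a meta-path boils down to multiplying type-to-type adjacency matrices; a small sketch (the dictionary layout and names are illustrative):

```python
import numpy as np

def metapath_adjacency(adjs, metapath):
    """Chain the type-to-type adjacency matrices along a meta-path.
    adjs maps a (src_type, dst_type) pair to its adjacency matrix; e.g.
    the Author-Paper-Author meta-path uses [('A', 'P'), ('P', 'A')]."""
    A = adjs[metapath[0]]
    for step in metapath[1:]:
        A = A @ adjs[step]
    # 1 wherever at least one meta-path instance connects the two nodes
    return (A > 0).astype(float)

# usage (hypothetical): A_apa = metapath_adjacency(
#     {('A', 'P'): A_ap, ('P', 'A'): A_ap.T}, [('A', 'P'), ('P', 'A')])
```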

4.2.2. Edge-based methods基于边缘的方法

There are also works which don't utilize meta-paths. These works typically use different functions in terms of sampling, aggregation, etc. for different kinds of neighbors and edges. HetGNN (Zhang et al., 2019b) addresses the challenge by directly treating neighbors of different types differently in the sampling, feature encoding and aggregation steps. HGT (Hu et al., 2020a) defines the meta-relation as the types of two neighboring nodes and their link. It assigns different attention weight matrices to different meta-relations, empowering the model to take type information into consideration.

还有一些工作没有使用元路径。这些工作通常针对不同类型的邻居和边使用不同的采样、聚合等函数。HetGNN (Zhang et al., 2019b)通过在采样、特征编码和聚合步骤中直接区别对待不同类型的邻居来解决这一挑战。HGT (Hu et al., 2020a)将元关系定义为两个相邻节点及其链接的类型,并为不同的元关系分配不同的注意力权重矩阵,使模型能够考虑类型信息。

4.2.3. Methods for relational graphs关系图的方法

The edges of some graphs may contain more information than the type, or the number of types may be too large, making it difficult to apply the meta-path or meta-relation based methods. We refer to this kind of graphs as relational graphs (Schlichtkrull et al., 2018). To handle relational graphs, G2S (Beck et al., 2018) converts the original graph to a bipartite graph where the original edges also become nodes and one original edge is split into two new edges, i.e. there are two new edges between the edge node and the begin/end nodes. After this transformation, it uses a Gated Graph Neural Network followed by a Recurrent Neural Network to convert graphs with edge information into sentences. The aggregation function of the GGNN takes both the hidden representations of nodes and the relations as the input. As another approach, R-GCN (Schlichtkrull et al., 2018) does not need to convert the original graph format. It assigns different weight matrices for the propagation on different kinds of edges. However, when the number of relations is very large, the number of parameters in the model explodes. Therefore, it introduces two kinds of regularizations to reduce the number of parameters for modeling large amounts of relations: basis decomposition and block-diagonal decomposition. With the basis decomposition, each $\mathbf{W}_r$ is defined as follows:

$$\mathbf{W}_r = \sum_{b=1}^{B} a_{rb} \mathbf{V}_b.$$

Here each $\mathbf{W}_r$ is a linear combination of basis transformations $\mathbf{V}_b \in \mathbb{R}^{d_{in} \times d_{out}}$ with coefficients $a_{rb}$. In the block-diagonal decomposition, R-GCN defines each $\mathbf{W}_r$ through the direct sum over a set of low-dimensional matrices, which needs more parameters than the first one.

有些图的边可能包含类型之外的更多信息,或者类型的数量过大,这使得基于元路径或元关系的方法难以应用。我们将这类图称为关系图(Schlichtkrull等人,2018)。为了处理关系图,G2S (Beck等人,2018)将原始图转换为二部图,其中原始边也成为节点,一条原始边被分割为两条新边,即在边节点和起始/结束节点之间有两条新边。在此转换之后,它使用门控图神经网络(GGNN)和循环神经网络将具有边信息的图转换为句子。GGNN的聚合函数将节点的隐藏表示和关系同时作为输入。作为另一种方法,R-GCN (Schlichtkrull et al., 2018)不需要转换原始图格式,而是为不同类型边上的传播分配不同的权重矩阵。然而,当关系的数量非常大时,模型中的参数数量会激增。因此,它引入了两种正则化方法来减少建模大量关系所需的参数数量:基分解和块对角分解。通过基分解,每个 $\mathbf{W}_r$ 定义为 $\mathbf{W}_r = \sum_{b=1}^{B} a_{rb} \mathbf{V}_b$,即每个 $\mathbf{W}_r$ 都是基变换 $\mathbf{V}_b \in \mathbb{R}^{d_{in} \times d_{out}}$ 以 $a_{rb}$ 为系数的线性组合。在块对角分解中,R-GCN通过对一组低维矩阵求直和来定义每个 $\mathbf{W}_r$,这比第一种方法需要更多的参数。
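
A minimal sketch of an R-GCN layer with basis decomposition (dense matrices and illustrative names; the self-loop term and the full normalization of the original model are omitted):

```python
import numpy as np

def rgcn_layer(A_rels, H, V, a):
    """A_rels: list of R relation-specific adjacency matrices (N, N).
    V: (B, d_in, d_out) shared bases; a: (R, B) coefficients, so that
    W_r = sum_b a[r, b] * V[b] and parameters grow with B, not R."""
    out = np.zeros((H.shape[0], V.shape[2]))
    for r, A in enumerate(A_rels):
        W_r = np.tensordot(a[r], V, axes=1)     # combine bases -> (d_in, d_out)
        deg = A.sum(axis=1, keepdims=True)
        out += (A / np.maximum(deg, 1.0)) @ H @ W_r
    return np.maximum(out, 0.0)
```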

4.2.4. Methods for multiplex graphs多重图的方法

In more complex scenarios, a pair of nodes in a graph can be associated with multiple edges of different types. Viewed under different types of edges, the graph can form multiple layers, in which each layer represents one type of relation. Therefore, a multiplex graph can also be referred to as a multi-view graph (multi-dimensional graph). For example, in YouTube, there can be three different relations between two users: sharing, subscription, and comment. Edge types are not assumed independent of each other, therefore simply splitting the graph into subgraphs with one type of edges might not be an optimal solution. mGCN (Ma et al., 2019b) introduces general representations and dimension-specific representations for nodes in each layer of the GNN. The dimension-specific representations are projected from the general representations using different projection matrices and then aggregated to form the next layer's general representations.

在更复杂的场景中,图中的一对节点可以与不同类型的多条边相关联。从不同类型的边来看,图可以形成多层,每层代表一种关系。因此,多重图也可以称为多视图图(多维图)。例如,在YouTube上,两个用户之间可以有三种不同的关系:分享、订阅、评论。边类型并不被假设为相互独立,因此简单地将图划分为每种只含一类边的子图可能不是最佳解决方案。mGCN (Ma et al., 2019b)为GNN每层的节点引入了一般表示和特定维度的表示。特定维度的表示使用不同的投影矩阵从一般表示投影得到,然后聚合形成下一层的一般表示。

 Fig. 5. An overview of methods with unsupervised loss.

4.3. Dynamic graphs动态图

Another variant of graphs is dynamic graphs, in which the graph structure, e.g. the existence of edges and nodes, keeps changing over time. To model the graph structured data together with the time series data, DCRNN (Li et al., 2018b) and STGCN (Yu et al., 2018) first collect spatial information by GNNs, then feed the outputs into a sequence model like sequence-to-sequence models or RNNs. Differently, Structural-RNN (Jain et al., 2016) and ST-GCN (Yan et al., 2018) collect spatial and temporal messages at the same time. They extend static graph structure with temporal connections so they can apply traditional GNNs on the extended graphs. Similarly, DGNN (Manessi et al., 2020) feeds the output embeddings of each node from the GCN into separate LSTMs. The weights of LSTMs are shared between each node. On the other hand, EvolveGCN (Pareja et al., 2020) argues that directly modeling dynamics of the node representation will hamper the model’s performance on graphs where node set keeps changing. Therefore, instead of treating node features as the input to RNN, it feeds the weights of the GCN into the RNN to capture the intrinsic dynamics of the graph interactions. Recently, a survey (Huang et al., 2020) classifies the dynamic networks into several categories based on the link duration, and groups the existing models into these categories according to their specialization. It also establishes a general framework for models of dynamic graphs and fits existing models into the general framework.

图的另一种变体是动态图,其中图的结构(例如边和节点的存在)随时间不断变化。为了将图结构数据与时间序列数据一起建模,DCRNN (Li et al., 2018b)和STGCN (Yu et al., 2018)首先通过GNN收集空间信息,然后将输出输入序列模型,如序列到序列模型或RNN。与之不同,Structural-RNN (Jain et al., 2016)和ST-GCN (Yan et al., 2018)同时收集空间和时间信息。它们用时间连接扩展静态图结构,从而可以将传统GNN应用于扩展图。类似地,DGNN (Manessi et al., 2020)将GCN中每个节点的输出嵌入馈送到单独的LSTM中,LSTM的权重在各节点之间共享。另一方面,EvolveGCN (Pareja et al., 2020)认为,直接对节点表示的动态建模会损害模型在节点集不断变化的图上的性能。因此,它不是将节点特征作为RNN的输入,而是将GCN的权重输入RNN,以捕获图交互的内在动态。最近,一项综述(Huang et al., 2020)根据链路持续时间将动态网络分为几类,并根据现有模型的专长将其归入这些类别。该综述还建立了动态图模型的通用框架,并将现有模型纳入该框架。

4.4. Other graph types其他图形类型

For other variants of graphs, such as hypergraphs and signed graphs, there are also some models proposed to address the challenges.

对于图的其他变体,例如超图和有符号图,也提出了一些模型来解决这些挑战。

4.4.1. Hypergraphs超图

A hypergraph can be denoted by $G = (V, E, \mathbf{W}_e)$, where an edge $e \in E$ connects two or more vertices and is assigned a weight $w \in \mathbf{W}_e$. The structure of a hypergraph can be represented by a $|V| \times |E|$ incidence matrix $\mathbf{H}$:

$$h(v, e) = \begin{cases} 1, & \text{if } v \in e, \\ 0, & \text{otherwise.} \end{cases}$$

HGNN (Feng et al., 2019) proposes hypergraph convolution to process these high-order interactions between nodes:

$$\mathbf{X}^{t+1} = \sigma\left(\mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W}_e \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2} \mathbf{X}^{t} \mathbf{W}^{t}\right),$$

where $\mathbf{D}_v$, $\mathbf{W}_e$, $\mathbf{D}_e$, $\mathbf{X}$ are the node degree matrix, edge weight matrix, edge degree matrix and node feature matrix respectively, and $\mathbf{W}$ is the learnable parameter matrix. This formula is derived by approximating the hypergraph Laplacian using truncated Chebyshev polynomials.

超图可以表示为 $G = (V, E, \mathbf{W}_e)$,其中一条边 $e \in E$ 连接两个或多个顶点,并被赋予权重 $w \in \mathbf{W}_e$。超图的结构可以用一个 $|V| \times |E|$ 的关联矩阵 $\mathbf{H}$ 表示:当 $v \in e$ 时 $h(v, e) = 1$,否则为0。

HGNN (Feng et al., 2019)提出了超图卷积来处理节点之间的这些高阶相互作用:

$$\mathbf{X}^{t+1} = \sigma\left(\mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W}_e \mathbf{D}_e^{-1} \mathbf{H}^\top \mathbf{D}_v^{-1/2} \mathbf{X}^{t} \mathbf{W}^{t}\right),$$

其中 $\mathbf{D}_v$、$\mathbf{W}_e$、$\mathbf{D}_e$、$\mathbf{X}$ 分别为节点度矩阵、边权矩阵、边度矩阵和节点特征矩阵,$\mathbf{W}$ 是可学习参数。该公式由截断切比雪夫多项式逼近超图拉普拉斯算子得到。
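
The convolution above is a direct matrix expression; a NumPy sketch (illustrative names; `Hinc` is the incidence matrix and small epsilons guard against empty rows or columns):

```python
import numpy as np

def hgnn_conv(Hinc, X, w_e, Theta):
    """One HGNN hypergraph convolution:
    X' = ReLU(Dv^{-1/2} H We De^{-1} H^T Dv^{-1/2} X Theta)."""
    We = np.diag(w_e)
    dv = (Hinc * w_e).sum(axis=1)              # weighted node degrees
    de = Hinc.sum(axis=0)                      # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(de, 1e-12))
    G = Dv_inv_sqrt @ Hinc @ We @ De_inv @ Hinc.T @ Dv_inv_sqrt
    return np.maximum(G @ X @ Theta, 0.0)
```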

4.4.2. Signed graphs有符号图

Signed graphs are graphs with signed edges, i.e. an edge can be either positive or negative. Instead of simply treating the negative edges as absent edges or another type of edges, SGCN (Derr et al., 2018) utilizes balance theory to capture the interactions between positive edges and negative edges. Intuitively, balance theory suggests that the friend (positive edge) of my friend is also my friend and the enemy (negative edge) of my enemy is my friend. It therefore provides the theoretical foundation for SGCN to model the interactions between positive edges and negative edges.

有符号图是带有符号边的图,即边可以是正的也可以是负的。SGCN (Derr等人,2018)没有简单地将负边视为缺失边或另一种类型的边,而是利用平衡理论来捕获正边和负边之间的相互作用。直观地说,平衡理论认为,我朋友的朋友(正边)也是我的朋友,我敌人的敌人(负边)也是我的朋友。这为SGCN建模正边和负边之间的相互作用提供了理论基础。

4.5. Large graphs大规模图

As we mentioned in Section 3.4, sampling operators are usually used to process large-scale graphs. Besides sampling techniques, there are also other methods for the scalability problem. Leveraging approximate personalized PageRank, the methods proposed by Klicpera et al. (2019) and Bojchevski et al. (2020) avoid calculating high-order propagation matrices. Rossi et al. (2020) propose a method to precompute graph convolutional filters of different sizes for efficient training and inference. PageRank-based models squeeze multiple GCN layers into one single propagation layer to mitigate the "neighbor explosion" issue, and hence are highly scalable and efficient.

正如我们在第3.4节中提到的,采样算子通常用于处理大规模图。除了采样技术,还有其他方法可以解决可扩展性问题。利用近似的个性化PageRank,Klicpera等人(2019)和Bojchevski等人(2020)提出的方法避免了计算高阶传播矩阵。Rossi等人(2020)提出了一种预先计算不同大小的图卷积滤波器的方法,以实现高效的训练和推理。基于PageRank的模型将多个GCN层压缩到单个传播层中,以缓解"邻居爆炸"问题,因此具有很高的可扩展性和效率。
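
The personalized-PageRank propagation behind Klicpera et al. (2019) can be approximated by a short power iteration instead of materializing any high-order matrix; a sketch (parameter values are illustrative):

```python
import numpy as np

def ppr_propagate(A_hat, H, alpha=0.1, K=10):
    """K power-iteration steps approximating
    Z = alpha * (I - (1 - alpha) * A_hat)^{-1} @ H,
    i.e. personalized-PageRank diffusion of the predictions H."""
    Z = H
    for _ in range(K):
        Z = (1 - alpha) * (A_hat @ Z) + alpha * H
    return Z
```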

5、Variants for different training settings不同训练设置的变体

In this section, we introduce variants for different training settings. For supervised and semi-supervised settings, labels are provided so that loss functions are easy to design for these labeled samples. For unsupervised settings, there are no labeled samples, so loss functions should depend on the information provided by the graph itself, such as input features or the graph topology. In this section, we mainly introduce variants for unsupervised training, which are usually based on the ideas of auto-encoders or contrastive learning. An overview of the methods we mention is shown in Fig. 5.

在本节中,我们将介绍针对不同训练设置的变体。对于监督和半监督设置,提供了标签,以便于为这些标记样本设计损失函数。对于无监督设置,没有标记样本,因此损失函数应该依赖于图本身提供的信息,例如输入特征或图拓扑。在本节中,我们主要介绍用于无监督训练的变体,这些变体通常基于自动编码器或对比学习的思想。我们所提到的方法的概述如图5所示。

5.1. Graph auto-encoders图形自动编码器

For unsupervised graph representation learning, there has been a trend to extend auto-encoder (AE) to graph domains.

Graph Auto-Encoder (GAE) (Kipf and Welling, 2016) first uses GCNs to encode the nodes in the graph. Then it uses a simple decoder to reconstruct the adjacency matrix and computes the loss from the similarity between the original adjacency matrix and the reconstructed matrix:

$$\mathbf{Z} = \text{GCN}(\mathbf{X}, \mathbf{A}), \qquad \widehat{\mathbf{A}} = \rho\left(\mathbf{Z}\mathbf{Z}^\top\right).$$

对于无监督图表示学习,将自动编码器(AE)扩展到图域已成为一种趋势。

图自动编码器(GAE) (Kipf and Welling, 2016)首先使用GCN对图中的节点进行编码:$\mathbf{Z} = \text{GCN}(\mathbf{X}, \mathbf{A})$。然后使用简单的解码器重构邻接矩阵 $\widehat{\mathbf{A}} = \rho(\mathbf{Z}\mathbf{Z}^\top)$,并根据原始邻接矩阵与重构矩阵的相似度计算损失。
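
The inner-product decoder and its reconstruction loss fit in a few lines; a sketch (the GCN encoder is assumed to exist elsewhere, and the class re-weighting usually applied to sparse adjacency matrices is omitted):

```python
import numpy as np

def gae_decode(Z):
    """Reconstruct edge probabilities from node embeddings Z:
    A_hat = sigmoid(Z @ Z.T)."""
    return 1.0 / (1.0 + np.exp(-Z @ Z.T))

def gae_loss(A, Z, eps=1e-12):
    """Binary cross-entropy between the observed adjacency A and its
    reconstruction from Z."""
    A_rec = gae_decode(Z)
    return -np.mean(A * np.log(A_rec + eps)
                    + (1 - A) * np.log(1 - A_rec + eps))
```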

Kipf and Welling (2016) also train the GAE model in a variational manner, and the model is named the variational graph auto-encoder (VGAE).

Adversarially Regularized Graph Auto-encoder (ARGA) (Pan et al., 2018) employs generative adversarial networks (GANs) to regularize a GCN-based graph auto-encoder, which could learn more robust node representations.

Instead of recovering the adjacency matrix, Wang et al. (2017) and Park et al. (2019) try to reconstruct the feature matrix. MGAE (Wang et al., 2017) utilizes a marginalized denoising auto-encoder to get robust node representations. To build a symmetric graph auto-encoder, GALA (Park et al., 2019) proposes Laplacian sharpening, the inverse operation of Laplacian smoothing, to decode hidden states. This mechanism alleviates the over-smoothing issue in GNN training.

Different from the above, AGE (Cui et al., 2020) states that the recovering losses are not compatible with downstream tasks. Therefore, they apply adaptive learning for the measurement of pairwise node similarity and achieve state-of-the-art performance on node clustering and link prediction.

Kipf和Welling(2016)也以变分的方式训练GAE模型,该模型被命名为变分图自动编码器(VGAE)。

对抗正则化图自动编码器(ARGA) (Pan等人,2018)采用生成对抗网络(GAN)来正则化基于GCN的图自动编码器,从而学习更鲁棒的节点表示。

Wang et al. (2017), Park et al.(2019)试图重建特征矩阵,而不是恢复邻接矩阵。MGAE (Wang et al., 2017)利用边缘去噪自动编码器来获得稳健的节点表示。为了构建对称图形自动编码器,GALA (Park等人,2019)提出了拉普拉斯锐化,即拉普拉斯平滑的逆运算,来解码隐藏状态。这种机制缓解了GNN训练中的过平滑问题。

与上述不同,AGE (Cui et al., 2020)认为恢复损失与下游任务不兼容。因此,他们应用自适应学习来测量成对节点相似度,并在节点聚类和链路预测方面实现了最先进的性能。

5.2. Contrastive learning对比学习

Besides graph auto-encoders, contrastive learning paves another way for unsupervised graph representation learning. Deep Graph Infomax (DGI) (Velickovic et al., 2019) maximizes the mutual information between node representations and graph representations. Infograph (Sun et al., 2020) aims to learn graph representations by mutual information maximization between graph-level representations and substructure-level representations of different scales, including nodes, edges and triangles. Multi-view (Hassani and Khasahmadi, 2020) contrasts representations from the first-order adjacency matrix and graph diffusion, and achieves state-of-the-art performance on multiple graph learning tasks.

除了图自动编码器之外,对比学习为无监督图表示学习铺平了另一条道路。Deep Graph Infomax (DGI) (Velickovic等人,2019)最大化节点表示和图表示之间的互信息。Infograph (Sun et al., 2020)旨在通过图级表示和不同尺度的子结构级表示(包括节点、边和三角形)之间的互信息最大化来学习图表示。Multi-view (Hassani和Khasahmadi, 2020)对比来自一阶邻接矩阵和图扩散的表示,在多个图学习任务上实现了最先进的性能。

6、A design example of GNN—GNN的设计实例

In this section, we give an existing GNN model to illustrate the design process. Taking the task of heterogeneous graph pretraining as an example, we use GPT-GNN (Hu et al., 2020b) as the model to illustrate the design process.

在本节中,我们给出了一个现有的GNN模型来说明设计过程。以异构图预训练任务为例,我们使用GPT-GNN (Hu et al., 2020b)作为模型来说明设计过程。

(1)、Find graph structure. The paper focuses on applications on the academic knowledge graph and the recommendation system. In the academic knowledge graph, the graph structure is explicit. In recommendation systems, users, items and reviews can be regarded as nodes and the interactions among them can be regarded as edges, so the graph structure is also easy to construct.

(2)、Specify graph type and scale. The tasks focus on heterogeneous graphs, so the types of nodes and edges should be considered and incorporated in the final model. As the academic graph and the recommendation graph contain millions of nodes, the model should further consider the efficiency problem. In conclusion, the model should focus on large-scale heterogeneous graphs.

(3)、Design loss function. As the downstream tasks in (Hu et al., 2020b) are all node-level tasks (e.g. Paper-Field prediction in the academic graph), the model should learn node representations in the pretraining step. In the pretraining step, no labeled data is available, so a self-supervised graph generation task is designed to learn node embeddings. In the finetuning step, the model is finetuned based on the training data of each task, so the supervised loss of each task is applied.

(4)、Build model using computational modules. Finally the model is built with computational modules. For the propagation module, the authors use the convolution operator HGT (Hu et al., 2020a) that we mentioned before. HGT incorporates the types of nodes and edges into the propagation step of the model, and skip connections are also added in the architecture. For the sampling module, a specially designed sampling method HGSampling (Hu et al., 2020a) is applied, which is a heterogeneous version of LADIES (Zou et al., 2019). As the model focuses on learning node representations, the pooling module is not needed. Multiple HGT layers are stacked to learn better node embeddings.

(1)、寻找图结构。本文重点研究了学术知识图谱和推荐系统的应用。在学术知识图谱中,图谱结构是显式的。在推荐系统中,用户、商品和评论可以看作节点,它们之间的相互作用可以看作边,因此图结构也很容易构造。

(2)、指定图形类型和比例。任务集中在异构图上,因此节点和边的类型应该被考虑并纳入最终模型中。由于学术图和推荐图包含数百万个节点,因此模型需要进一步考虑效率问题。综上所述,该模型应侧重于大尺度异构图。

(3)、设计损失函数。由于(Hu et al., 2020b)中的下游任务都是节点级任务(例如学术图中的Paper-Field预测),因此模型应该在预训练步骤中学习节点表示。在预训练步骤中,没有标记数据可用,因此设计了一个自监督图生成任务来学习节点嵌入。在微调步骤中,根据每个任务的训练数据对模型进行微调,从而应用每个任务的监督损失。

(4)、利用计算模块建立模型。最后利用计算模块建立模型。对于传播模块,作者使用了我们之前提到的卷积算子HGT (Hu et al., 2020a)。HGT将节点和边的类型融入模型的传播步骤,并在架构中加入了跳跃连接。采样模块采用了专门设计的HGSampling采样方法(Hu et al., 2020a),该方法是LADIES的异构版本(Zou et al., 2019)。由于该模型侧重于学习节点表示,因此不需要池化模块。堆叠多层HGT层以学习更好的节点嵌入。

7、Analyses of GNNs—GNN的分析

7.1. Theoretical aspect理论方面

In this section, we summarize the papers about the theoretical foundations and explanations of graph neural networks from various perspectives.

在本节中,我们从各个角度对图神经网络的理论基础和解释进行了总结。

7.1.1. Graph signal processing图信号处理

From the spectral perspective, GCNs perform the convolution operation on the input features in the spectral domain, which follows graph signal processing in theory.

There exist several works analyzing GNNs from the viewpoint of graph signal processing. Li et al. (2018c) first show that graph convolution in graph neural networks is actually Laplacian smoothing, which smooths the feature matrix so that nearby nodes have similar hidden representations. Laplacian smoothing reflects the homophily assumption that nearby nodes are supposed to be similar. The Laplacian matrix serves as a low-pass filter for the input features. SGC (Wu et al., 2019b) further removes the weight matrices and nonlinearities between layers, showing that the low-pass filter is the reason why GNNs work.

从谱的角度来看,GCNs在谱域对输入特征进行卷积运算,理论上遵循图信号处理。

从图信号处理的角度分析GNN已有一些研究。Li等人(2018c)首先指出,图神经网络中的图卷积实际上是拉普拉斯平滑,它平滑特征矩阵,使邻近的节点具有相似的隐藏表示。拉普拉斯平滑反映了同质性假设,即邻近的节点应该是相似的。拉普拉斯矩阵起到输入特征的低通滤波器的作用。SGC (Wu et al., 2019b)进一步去除层之间的权重矩阵和非线性,表明低通滤波器是GNN有效的原因。
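
SGC's observation makes for a particularly compact model: remove the nonlinearities and collapse the weights, and all that remains is repeated application of a fixed low-pass filter. A sketch (K and all names are illustrative):

```python
import numpy as np

def sgc_features(A_hat, X, K=2):
    """SGC-style preprocessing: apply the normalized adjacency (a fixed
    low-pass filter) K times with no weights or nonlinearities in between;
    a plain linear/logistic classifier is then trained on the result."""
    for _ in range(K):
        X = A_hat @ X
    return X
```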

Following the idea of low-pass filtering, Zhang et al. (2019c), Cui et al. (2020), NT and Maehara (Nt and Maehara, 2019) and Chen et al. (2020b) analyze different filters and provide new insights. To achieve low-pass filtering for all the eigenvalues, AGC (Zhang et al., 2019c) designs the graph filter $I - \frac{1}{2}L$ according to the frequency response function. AGE (Cui et al., 2020) further demonstrates that the filter $I - \frac{1}{\lambda_{max}}L$ could get better results, where $\lambda_{max}$ is the maximum eigenvalue of the Laplacian matrix. Beyond linear filters, GraphHeat (Xu et al., 2019a) leverages heat kernels for better low-pass properties. NT and Maehara (Nt and Maehara, 2019) state that graph convolution is mainly a denoising process for input features, and the model performance heavily depends on the amount of noise in the feature matrix. To alleviate the over-smoothing issue, Chen et al. (2020b) present two metrics for measuring the smoothness of node representations and the over-smoothness of GNN models. The authors conclude that the information-to-noise ratio is the key factor for over-smoothing.

遵循低通滤波的思想,Zhang等人(2019c)、Cui等人(2020)、NT和Maehara (Nt and Maehara, 2019)、Chen等人(2020b)分析了不同的滤波器,并提供了新的见解。为了实现对所有特征值的低通滤波,AGC (Zhang et al., 2019c)根据频率响应函数设计了图滤波器 $I - \frac{1}{2}L$。AGE (Cui et al., 2020)进一步证明滤波器 $I - \frac{1}{\lambda_{max}}L$ 可以得到更好的结果,其中 $\lambda_{max}$ 是拉普拉斯矩阵的最大特征值。在线性滤波器之外,GraphHeat (Xu等人,2019a)利用热核获得更好的低通特性。NT和Maehara (Nt and Maehara, 2019)指出,图卷积主要是输入特征的去噪过程,模型性能在很大程度上取决于特征矩阵中的噪声量。为了缓解过平滑问题,Chen等人(2020b)提出了两个度量标准,用于测量节点表示的平滑度和GNN模型的过平滑度。作者得出结论,信息噪声比是过平滑的关键因素。

7.1.2. Generalization泛化

The generalization ability of GNNs has also received attention recently. Scarselli et al. (2018) prove the VC-dimensions for a limited class of GNNs. Garg et al. (2020) further give much tighter generalization bounds based on Rademacher bounds for neural networks.

Verma and Zhang (2019) analyze the stability and generalization properties of single-layer GNNs with different convolutional filters. The authors conclude that the stability of GNNs depends on the largest eigenvalue of the filters. Knyazev et al. (2019) focus on the generalization ability of attention mechanism in GNNs. Their conclusion shows that attention helps GNNs generalize to larger and noisy graphs.

GNN的泛化能力也是近年来备受关注的问题。Scarselli等人(2018)证明了一类受限GNN的VC维。Garg等人(2020)进一步基于神经网络的Rademacher界给出了更紧的泛化界。

Verma和Zhang(2019)分析了具有不同卷积滤波器的单层GNN的稳定性和泛化特性。作者得出结论,GNN的稳定性取决于滤波器的最大特征值。Knyazev等人(2019)关注GNN中注意机制的泛化能力。他们的结论表明,注意力有助于GNN泛化到更大和有噪声的图。

7.1.3. Expressivity表达能力

On the expressivity of GNNs, Xu et al. (2019b) and Morris et al. (2019) show that GCNs and GraphSAGE are less discriminative than the Weisfeiler-Leman (WL) test, an algorithm for graph isomorphism testing. Xu et al. (2019b) also propose GINs for more expressive GNNs. Going beyond the WL test, Barceló et al. (2019) discuss whether GNNs are expressible for FOC2, a fragment of first-order logic. The authors find that existing GNNs can hardly fit the logic. For learning graph topological structures, Garg et al. (2020) prove that locally dependent GNN variants are not capable of learning global graph properties, including diameters, biggest/smallest cycles, or motifs.

Loukas (2020) and Dehmamy et al. (2019) argue that existing works only consider the expressivity when GNNs have infinite layers and units. Their work investigates the representation power of GNNs with finite depth and width. Oono and Suzuki (2020) discuss the asymptotic behaviors of GNNs as the model deepens and model them as dynamic systems.

关于GNN的表达能力,Xu等人(2019b)和Morris等人(2019)表明,GCN和GraphSAGE的判别能力低于Weisfeiler-Leman (WL)检验(一种用于图同构检验的算法)。Xu等人(2019b)还提出了更具表达能力的GIN。在WL检验之外,Barceló等人(2019)讨论了GNN是否能表达一阶逻辑片段FOC2。作者发现现有的GNN很难拟合这种逻辑。对于学习图拓扑结构,Garg等人(2020)证明了局部依赖的GNN变体无法学习全局图属性,包括直径、最大/最小环或模体(motif)。

Loukas(2020)和Dehmamy et al.(2019)认为现有工作只考虑GNN具有无限层和单元时的表达性。他们的工作研究了具有有限深度和宽度的GNN的表示能力。Oono和Suzuki(2020)讨论了GNN在模型深化时的渐近行为,并将其建模为动态系统。

7.1.4. Invariance不变性

As there are no node orders in graphs, the output embeddings of GNNs are supposed to be permutation-invariant or equivariant to the input features. Maron et al. (2019a) characterize permutation-invariant or equivariant linear layers to build invariant GNNs. Maron et al. (2019b) further prove that universal invariant GNNs can be obtained with higher-order tensorization. Keriven and Peyré (2019) provide an alternative proof and extend this conclusion to the equivariant case. Chen et al. (2019) build connections between permutation-invariance and graph isomorphism testing. To prove their equivalence, Chen et al. (2019) leverage sigma-algebras to describe the expressivity of GNNs.

由于图中没有节点顺序,GNN的输出嵌入应当对输入特征是置换不变或置换等变的。Maron等人(2019a)刻画了置换不变或等变的线性层,以构建不变GNN。Maron等人(2019b)进一步证明了通过高阶张量化可以获得通用不变GNN。Keriven和Peyré (2019)提供了另一种证明,并将这一结论推广到等变情形。Chen等人(2019)建立了置换不变性和图同构检验之间的联系,并利用σ代数来描述GNN的表达能力以证明二者的等价性。

7.1.5. Transferability可转移性

A distinctive characteristic of GNNs is that the parameterization is not tied to a particular graph, which suggests the ability to transfer across graphs (so-called transferability) with performance guarantees. Levie et al. (2019) investigate the transferability of spectral graph filters, showing that such filters are able to transfer on graphs in the same domain. Ruiz et al. (2020) analyze GNN behaviour on graphons. A graphon is the limit of a sequence of graphs, which can also be seen as a generator for dense graphs. The authors conclude that GNNs are transferable across graphs obtained deterministically from the same graphon with different sizes.

GNN的一个显著特征是其参数化不依赖于特定的图,这意味着GNN能够在保证性能的前提下跨图迁移(即所谓的可迁移性)。Levie等人(2019)研究了谱图滤波器的可迁移性,表明这类滤波器能够在同一领域的图上迁移。Ruiz等人(2020)分析了GNN在graphon上的行为。graphon是图序列的极限,也可以看作是稠密图的生成器。作者得出结论,GNN可以在由同一graphon确定性生成的不同大小的图之间迁移。

7.1.6. Label efficiency标签效率

(Semi-)supervised learning for GNNs needs a considerable amount of labeled data to achieve satisfying performance. Improving the label efficiency has been studied from the perspective of active learning, in which informative nodes are actively selected to be labeled by an oracle to train the GNNs. Cai et al. (2017), Gao et al. (2018b) and Hu et al. (2020c) demonstrate that by selecting informative nodes such as high-degree nodes and uncertain nodes, the labeling efficiency can be dramatically improved.

GNN的(半)监督学习需要相当数量的标记数据才能达到令人满意的性能。从主动学习的角度研究了提高标注效率的方法,即主动选择信息量大的节点交由专家(oracle)标注来训练GNN。Cai et al. (2017)、Gao et al. (2018b)、Hu et al. (2020c)证明,通过选择高度数节点和不确定节点等信息量大的节点,可以显著提高标注效率。

7.2. Empirical aspect经验方面

Besides theoretical analysis, empirical studies of GNNs are also required for better comparison and evaluation. Here we include several empirical studies for GNN evaluation and benchmarks.

除了理论分析,还需要对GNN进行实证研究,以便更好地进行比较和评价。在这里,我们包括几个关于GNN评估和基准的实证研究。

7.2.1. Evaluation评价

Evaluating machine learning models is an essential step in research. Concerns about experimental reproducibility and replicability have been raised over the years. Do GNN models work, and to what extent? Which parts of the models contribute to the final performance? To investigate such fundamental questions, studies about fair evaluation strategies are urgently needed.

On the semi-supervised node classification task, Shchur et al. (2018a) explore how GNN models perform under the same training strategies and hyperparameter tuning. Their work concludes that different dataset splits lead to dramatically different rankings of models. Also, simple models can outperform complicated ones under proper settings. Errica et al. (2020) review several graph classification models and point out that they are compared improperly. Based on rigorous evaluation, structural information turns out not to be fully exploited for graph classification. You et al. (2020) discuss the architectural designs of GNN models, such as the number of layers and the aggregation function. With a huge amount of experiments, this work provides comprehensive guidelines for GNN design over various tasks.

评估机器学习模型是研究的重要一步。多年来,人们一直在关注实验的可重复性和可复制性。GNN模型是否有效,在多大程度上有效?模型的哪些部分对最终性能有贡献?为了探究这些基本问题,迫切需要对公平评估策略进行研究。

在半监督节点分类任务中,Shchur等人(2018a)探讨了GNN模型在相同训练策略和超参数调优下的表现。他们的研究得出结论,不同的数据集划分会导致模型排名显著不同。此外,在适当的设置下,简单模型可以优于复杂模型。Errica等人(2020)回顾了几种图分类模型,并指出它们的比较方式不恰当。经过严格的评估发现,结构信息并未被图分类充分利用。You等人(2020)讨论了GNN模型的架构设计,如层数和聚合函数。通过大量的实验,这项工作为各种任务的GNN设计提供了全面的指导方针。

7.2.2. Benchmarks基准

High-quality and large-scale benchmark datasets such as ImageNet are significant in machine learning research. However, in graph learning, widely-adopted benchmarks are problematic. For example, most node classification datasets contain only 3000 to 20,000 nodes, which is small compared with real-world graphs. Furthermore, the experimental protocols across studies are not unified, which is hazardous to the literature. To mitigate this issue, Dwivedi et al. (2020) and Hu et al. (2020d) provide scalable and reliable benchmarks for graph learning. Dwivedi et al. (2020) build medium-scale benchmark datasets in multiple domains and tasks, while OGB (Hu et al., 2020d) offers large-scale datasets. Furthermore, both works evaluate current GNN models and provide leaderboards for further comparison.

高质量和大规模的基准数据集(如ImageNet)在机器学习研究中具有重要意义。然而,在图学习中,广泛采用的基准是有问题的。例如,大多数节点分类数据集只包含3000到20000个节点,与现实世界的图相比,这是很小的。此外,跨研究的实验方案不统一,这对文献是危险的。为了缓解这个问题,Dwivedi等人(2020),Hu等人(2020d)为图学习提供了可扩展和可靠的基准。Dwivedi等人(2020)在多个领域和任务中构建了中等规模的基准数据集,而OGB (Hu等人,2020d)提供了大规模数据集。此外,这两项工作都评估了当前的GNN模型,并提供了排行榜供进一步比较。

8、Applications应用

Graph neural networks have been explored in a wide range of domains across supervised, semi-supervised, unsupervised and reinforcement learning settings. In this section, we generally group the applications in two scenarios: (1) Structural scenarios where the data has an explicit relational structure. These scenarios, on the one hand, emerge from scientific research, such as graph mining and modeling physical and chemical systems. On the other hand, they arise from industrial applications such as knowledge graphs, traffic networks and recommendation systems. (2) Non-structural scenarios where the relational structure is implicit or absent. These scenarios generally include image (computer vision) and text (natural language processing), which are two of the most actively developing branches of AI research. A simple illustration of these applications is in Fig. 6. Note that we only list several representative applications instead of providing an exhaustive list. The summary of the applications can be found in Table 3.

图神经网络已经在有监督、半监督、无监督和强化学习设置的广泛领域进行了探索。在本节中,我们通常将应用程序分为两种场景:(1)结构化场景,其中数据具有显式的关系结构。这些场景,一方面来自于科学研究,如图挖掘,物理系统和化学系统建模。另一方面,它们来自于工业应用,如知识图、交通网络和推荐系统。(2)关系结构隐含或不存在的非结构场景。这些场景通常包括图像(计算机视觉)和文本(自然语言处理),这是人工智能研究中发展最活跃的两个分支。这些应用的简单说明如图6所示。请注意,我们只列出了几个有代表性的应用程序,而不是提供一个详尽的列表。应用程序的摘要见表3。

8.1. Structural scenarios结构的场景

In the following subsections, we will introduce GNNs' applications in structural scenarios, where the data are naturally represented in a graph structure.

在接下来的小节中,我们将介绍GNN在结构场景中的应用,其中数据天然地以图结构表示。

8.1.1. Graph mining图挖掘

The first application is to solve the basic tasks in graph mining. Generally, graph mining algorithms are used to identify useful structures for downstream tasks. Traditional graph mining challenges include frequent subgraph mining, graph matching, graph classification, graph clustering, etc. Although with deep learning some downstream tasks can be directly solved without graph mining as an intermediate step, the basic challenges are worth studying from the GNN perspective.

Graph Matching. The first challenge is graph matching. Traditional methods for graph matching usually suffer from high computational complexity. The emergence of GNNs allows researchers to capture the structure of graphs using neural networks, thus offering another solution to the problem. Riba et al. (2018) propose a siamese MPNN model to learn the graph editing distance. The siamese framework consists of two parallel MPNNs with the same structure and shared weights; the training objective is to embed pairs of graphs with small editing distance close together in the latent space. Li et al. (2019b) design similar methods while experimenting on more real-world scenarios such as similarity search in control flow graphs.

第一个应用是解决图挖掘中的基本任务。通常,图挖掘算法用于识别下游任务的有用结构。传统的图挖掘挑战包括频繁的子图挖掘、图匹配、图分类、图聚类等。尽管有了深度学习,一些下游任务可以直接解决,而不需要图挖掘作为中间步骤,但从GNN的角度来看,基本的挑战是值得研究的。

图匹配。第一个挑战是图匹配。传统的图匹配方法通常计算复杂度很高。GNN的出现使研究人员能够使用神经网络捕获图的结构,从而为该问题提供了另一种解决方案。Riba等人(2018)提出了一种孪生MPNN模型来学习图编辑距离。孪生框架由两个结构相同且共享权重的并行MPNN组成,训练目标是将编辑距离较小的图对嵌入到潜空间中彼此邻近的位置。Li等人(2019b)设计了类似的方法,并在控制流图相似性搜索等更贴近现实的场景中进行了实验。

Graph Clustering. Graph clustering is to group the vertices of a graph into clusters based on the graph structure and/or node attributes. Various works (Zhang et al., 2019c) in node representation learning have been developed, and the representations of nodes can be passed to traditional clustering algorithms. Apart from learning node embeddings, graph pooling (Ying et al., 2018b) can be seen as a kind of clustering. More recently, Tsitsulin et al. (2020) directly target the clustering task. They study the desirable properties of a good graph clustering method and propose to optimize the spectral modularity, which is a remarkably useful graph clustering metric.

图聚类。图聚类是根据图的结构和/或节点属性将图的顶点分组成簇。节点表示学习方面已有多种工作(Zhang等人,2019c),节点的表示可以传递给传统的聚类算法。除了学习节点嵌入,图池化(Ying et al., 2018b)也可以被视为一种聚类。最近,Tsitsulin等人(2020)直接针对聚类任务。他们研究了一个好的图聚类方法应具备的性质,并提出优化谱模块度,这是一个非常有用的图聚类度量。

8.1.2. Physics物理

Modeling real-world physical systems is one of the most fundamental aspects of understanding human intelligence. A physical system can be modeled as the objects in the system and the pair-wise interactions between objects. Simulation in the physical system requires the model to learn the law of the system and make predictions about the next state of the system. By modeling the objects as nodes and the pair-wise interactions as edges, the systems can be simplified as graphs. For example, in particle systems, particles can interact with each other via multiple interactions, including collision (Hoshen, 2017), spring connection and electromagnetic force (Kipf et al., 2018), where particles are seen as nodes and interactions are seen as edges. Another example is the robotic system, which is formed by multiple bodies (e.g., arms, legs) connected with joints. The bodies and joints can be seen as nodes and edges, respectively. The model needs to infer the next state of the bodies based on the current state of the system and the principles of physics.

模拟真实世界的物理系统是理解人类智能的最基本方面之一。物理系统可以被建模为系统中的对象和对象之间的成对交互。物理系统的仿真要求模型学习系统的规律,并对系统的下一个状态做出预测。通过将对象建模为节点,将成对交互建模为边,系统可以简化为图。例如,在粒子系统中,粒子之间可以通过多种相互作用相互作用,包括碰撞(Hoshen, 2017)、弹簧连接、电磁力(Kipf et al., 2018)等,其中粒子被视为节点,相互作用被视为边。另一个例子是机器人系统,它由连接关节的多个身体(如手臂、腿)组成。主体和关节可以分别看作节点和边。该模型需要根据系统的当前状态和物理原理来推断物体的下一个状态。

Before the advent of graph neural networks, works processed the graph representation of the systems using the available neural blocks. Interaction Networks (Battaglia et al., 2016) utilize MLPs to encode the incidence matrices of the graph. CommNet (Sukhbaatar et al., 2016) performs node updates using the nodes' previous representations and the average of all nodes' previous representations. VAIN (Hoshen, 2017) further introduces the attention mechanism. VIN (Watters et al., 2017) combines CNNs, RNNs and IN (Battaglia et al., 2016).

The emergence of GNNs lets us perform GNN-based reasoning about objects, relations, and physics in a simplified but effective way. NRI (Kipf et al., 2018) takes the trajectories of objects as input, infers an explicit interaction graph, and learns a dynamic model simultaneously. The interaction graphs are learned from former trajectories, and trajectory predictions are generated by decoding the interaction graphs.

Sanchez et al. (2018) propose a Graph Network-based model to encode the graph formed by the bodies and joints of a robotic system. They further learn the policy of stably controlling the system by combining GNs with reinforcement learning.

在图神经网络出现之前,已有工作使用现有的神经模块来处理系统的图表示。交互网络(Interaction Networks) (Battaglia et al., 2016)利用MLP对图的关联矩阵进行编码。CommNet (Sukhbaatar et al., 2016)使用节点先前的表示和所有节点先前表示的平均值来执行节点更新。VAIN (Hoshen, 2017)进一步引入了注意力机制。VIN (Watters等人,2017)结合了CNN、RNN和IN (Battaglia等人,2016)。

GNN的出现让我们能够以一种简单而有效的方式对对象、关系和物理进行基于GNN的推理。NRI (Kipf et al., 2018)将物体的轨迹作为输入,并推断出一个显式的交互图,并同时学习一个动态模型。从以前的轨迹中学习交互图,并通过解码交互图生成轨迹预测。

Sanchez等人(2018)提出了一个基于图网络(Graph Network)的模型来编码机器人系统的身体和关节所构成的图,并通过将GN与强化学习相结合,进一步学习稳定控制系统的策略。

8.1.3. Chemistry and biology化学与生物

Molecular Fingerprints. Molecular fingerprints serve as a way to encode the structure of molecules. The simplest fingerprint can be a one-hot vector, where each digit represents the existence or absence of a particular substructure. These fingerprints can be used in molecule searching, which is a core step in computer-aided drug design. Conventional molecular fingerprints are hand-made and fixed (e.g., the one-hot vector). However, molecules can be naturally seen as graphs, with atoms being the nodes and chemical bonds being the edges. Therefore, by applying GNNs to molecular graphs, we can obtain better fingerprints.

Duvenaud et al. (2015) propose neural graph fingerprints (Neural FPs), which calculate substructure feature vectors via GCNs and sum them to get overall representations. Kearnes et al. (2016) explicitly model atoms and atom pairs independently to emphasize atom interactions. It introduces edge representations $\mathbf{e}_{uv}^t$ instead of an aggregation function.

分子指纹。分子指纹是一种对分子结构进行编码的方法。最简单的指纹可以是一个one-hot向量,其中每一位代表某个特定子结构的存在与否。这些指纹可以用于分子搜索,这是计算机辅助药物设计的核心步骤。传统的分子指纹是手工制作且固定的(例如one-hot向量)。然而,分子可以天然地被视为图,原子是节点,化学键是边。因此,通过将GNN应用于分子图,我们可以获得更好的指纹。

Duvenaud等人(2015)提出了神经图指纹(Neural FPs),通过GCN计算子结构特征向量并求和以得到整体表示。Kearnes等人(2016)显式地对原子和原子对独立建模,以强调原子间的相互作用,并引入边表示 $\mathbf{e}_{uv}^t$ 来代替聚合函数。

Chemical Reaction Prediction. Chemical reaction product prediction is a fundamental issue in organic chemistry. The Graph Transformation Policy Network (Do et al., 2019) encodes the input molecules and generates an intermediate graph with a node pair prediction network and a policy network.

化学反应预测。化学反应产物预测是有机化学中的一个基本问题。图转换策略网络(Do等人,2019)对输入分子进行编码,并生成带有节点对预测网络和策略网络的中间图。

Protein Interface Prediction. Proteins interact with each other using the interface, which is formed by the amino acid residues from each participating protein. The protein interface prediction task is to determine whether particular residues constitute part of a protein interface. Generally, the prediction for a single residue depends on other neighboring residues. By letting the residues be nodes, the proteins can be represented as graphs, which can leverage GNN-based machine learning algorithms. Fout et al. (2017) propose a GCN-based method to learn ligand and receptor protein residue representations and to merge them for pair-wise classification. MR-GNN (Xu et al., 2019d) introduces a multi-resolution approach to extract and summarize local and global features for better prediction.

蛋白质界面预测。蛋白质通过界面相互作用,界面由每个参与蛋白质的氨基酸残基构成。蛋白质界面预测任务是确定特定的残基是否构成蛋白质界面的一部分。一般来说,对单个残基的预测依赖于相邻的其他残基。通过将残基作为节点,蛋白质可以表示为图,从而可以利用基于GNN的机器学习算法。Fout等人(2017)提出了一种基于GCN的方法来学习配体和受体蛋白的残基表示,并将它们合并进行成对分类。MR-GNN (Xu等人,2019d)引入了一种多分辨率方法来提取和总结局部和全局特征,以便更好地预测。

Biomedical Engineering. With the protein-protein interaction network, Rhee et al. (2018) leverage graph convolution and relation networks for breast cancer subtype classification. Zitnik et al. (2018) also suggest a GCN-based model for polypharmacy side effect prediction. Their work models the drug and protein interaction network and separately deals with edges of different types.

生物医学工程。基于蛋白质-蛋白质相互作用网络,Rhee等人(2018)利用图卷积和关系网络进行乳腺癌亚型分类。Zitnik等人(2018)还提出了一种基于GCN的多药副作用预测模型。他们的工作对药物和蛋白质相互作用网络进行建模,并分别处理不同类型的边。

8.1.4. Knowledge graph知识图谱

The knowledge graph (KG) represents a collection of real-world entities and the relational facts between pairs of the entities. It has wide applications, such as question answering, information retrieval and knowledge-guided generation. Tasks on KGs include learning low-dimensional embeddings which contain rich semantics for the entities and relations, predicting the missing links between entities, and multi-hop reasoning over the knowledge graph. One line of research treats the graph as a collection of triples, and proposes various kinds of loss functions to distinguish the correct triples from false triples (Bordes et al., 2013). The other line leverages the graph nature of KGs, and uses GNN-based methods for various tasks. When treated as a graph, a KG can be seen as a heterogeneous graph. However, unlike other heterogeneous graphs such as social networks, the logical relations are of more importance than the pure graph structure.

知识图(KG)表示现实世界实体的集合以及实体对之间的关系事实。它在问答、信息检索、知识引导生成等方面有着广泛的应用。知识图谱的任务包括学习包含实体和关系丰富语义的低维嵌入,预测实体之间缺失的链接,以及对知识图谱的多跳推理。一个研究方向是将图视为三元组的集合,并提出各种损失函数来区分正确的三元组和错误的三元组(Bordes et al., 2013)。另一条线利用KG的图性质,并使用基于GNN的方法完成各种任务。当把KG作为一个图来处理时,KG可以看作是一个异构图。然而,与其他异构图(如社交网络)不同的是,逻辑关系比纯图结构更重要。

R-GCN (Schlichtkrull et al., 2018) is the first work to incorporate GNNs for knowledge graph embedding. To deal with various relations, R-GCN proposes relation-specific transformation in the message passing steps. Structure-Aware Convolutional Network (Shang et al., 2019) combines a GCN encoder and a CNN decoder together for better knowledge representations.

A more challenging setting is knowledge base completion for out-of-knowledge-base (OOKB) entities. The OOKB entities are unseen in the training set, but directly connect to the observed entities in the training set. The embeddings of OOKB entities can be aggregated from the observed entities. Hamaguchi et al. (2017) use GNNs to solve the problem, achieving satisfying performance both in the standard KBC setting and the OOKB setting.

Besides knowledge graph representation learning, Wang et al. (2018b) utilize GCNs to solve the cross-lingual knowledge graph alignment problem. The model embeds entities from different languages into a unified embedding space and aligns them based on the embedding similarity. To align large-scale heterogeneous knowledge graphs, OAG (Zhang et al., 2019d) uses graph attention networks to model various types of entities. Representing entities by their surrounding subgraphs, Xu et al. (2019c) transfer the entity alignment problem to a graph matching problem and then solve it with graph matching networks.

R-GCN (Schlichtkrull et al., 2018)是第一个将GNN纳入知识图嵌入的工作。为了处理各种关系,R-GCN在消息传递步骤中提出了特定于关系的转换。结构感知卷积网络(Shang等人,2019)将GCN编码器和CNN解码器结合在一起,以获得更好的知识表示。

一个更具挑战性的设置是为知识库外(OOKB)实体完成知识库。OOKB实体在训练集中是看不见的,但是直接连接到训练集中观察到的实体。OOKB实体的嵌入可以从观察到的实体中聚合。Hamaguchi et al.(2017)使用GNN来解决该问题,在标准KBC设置和OOKB设置中都取得了令人满意的性能。

除了知识图表示学习,Wang等人(2018b)利用GCN解决跨语言知识图对齐问题。该模型将来自不同语言的实体嵌入到统一的嵌入空间中,并根据嵌入相似性对它们进行对齐。为了对齐大规模异构知识图,OAG (Zhang et al., 2019d)使用图注意网络来建模各种类型的实体。Xu et al. (2019c)将表示实体作为其周围的子图,将实体对齐问题转化为图匹配问题,然后通过图匹配网络来解决。

8.1.5. Generative models生成模型

Generative models for real-world graphs have drawn significant attention for their important applications, including modeling social interactions, discovering new chemical structures, and constructing knowledge graphs. As deep learning methods have a powerful ability to learn the implicit distribution of graphs, there has been a recent surge in neural graph generative models.

现实世界图的生成模型因其重要应用而受到广泛关注,这些应用包括建模社交互动、发现新的化学结构以及构建知识图谱。由于深度学习方法具有学习图的隐式分布的强大能力,近来神经图生成模型大量涌现。

NetGAN (Shchur et al., 2018b) is one of the first works to build a neural graph generative model, which generates graphs via random walks. It transforms the problem of graph generation into the problem of walk generation, which takes the random walks from a specific graph as input and trains a walk generative model using the GAN architecture. While the generated graph preserves important topological properties of the original graph, the number of nodes cannot change in the generating process and stays the same as in the original graph. GraphRNN (You et al., 2018b) manages to generate the adjacency matrix of a graph by generating the adjacency vector of each node step by step, which can output networks with different numbers of nodes. Li et al. (2018d) propose a model which generates edges and nodes sequentially and utilizes a graph neural network to extract the hidden state of the current graph, which is used to decide the action in the next step during the sequential generative process. GraphAF (Shi et al., 2020) also formulates graph generation as a sequential decision process. It combines the flow-based generation with the autoregressive model. For molecule generation, it also conducts a validity check of the generated molecules using existing chemical rules after each step of generation.

NetGAN (Shchur et al., 2018b)是最早建立神经图生成模型的工作之一,该模型通过随机游走生成图。它将图生成问题转化为行走生成问题,将特定图的随机行走作为输入,并使用GAN架构训练一个行走生成模型。生成的图虽然保留了原始图的重要拓扑性质,但在生成过程中节点数不能改变,与原始图相同。GraphRNN (You et al., 2018b)通过逐级生成每个节点的邻接向量来生成图的邻接矩阵,可以输出节点数量不同的网络。Li等人(2018d)提出了一种模型,该模型顺序生成边和节点,并利用图神经网络提取当前图的隐藏状态,用于在顺序生成过程中决定下一步的动作。GraphAF (Shi等人,2020)还将图生成定义为一个连续的决策过程。它结合了基于流的生成和自进模型。对于分子的生成,在生成的每一步之后,都使用现有的化学规则对生成的分子进行有效性检查。

Instead of generating graphs sequentially, other works generate the adjacency matrix of a graph at once. MolGAN (De Cao and Kipf, 2018) utilizes a permutation-invariant discriminator to solve the node variant problem in the adjacency matrix. Besides, it applies a reward network for RL-based optimization towards desired chemical properties. Furthermore, Ma et al. (2018) propose constrained variational auto-encoders to ensure the semantic validity of generated graphs. And GCPN (You et al., 2018a) incorporates domain-specific rules through reinforcement learning. GNF (Liu et al., 2019) adapts normalizing flows to graph data. A normalizing flow is a kind of generative model which uses an invertible mapping to transform observed data into the latent vector space. Transforming from the latent vector back into the observed data using the inverse mapping serves as the generating process. GNF combines a normalizing flow with a permutation-invariant graph auto-encoder to take graph-structured data as the input and generate new graphs at test time. Graphite (Grover et al., 2019) integrates GNNs into variational auto-encoders to encode the graph structure and features into latent variables. More specifically, it uses isotropic Gaussians as the latent variables and then uses an iterative refinement strategy to decode from the latent variables.

其他工作不是按顺序生成图,而是一次性生成图的邻接矩阵。MolGAN (De Cao和Kipf, 2018)利用置换不变判别器来解决邻接矩阵中的节点变异问题。此外,它还应用奖励网络进行基于强化学习的优化,以获得期望的化学性质。Ma等人(2018)提出了约束变分自动编码器,以确保生成图的语义有效性。GCPN (You等人,2018a)通过强化学习融入特定领域的规则。GNF (Liu et al., 2019)将归一化流应用于图数据。归一化流是一种生成模型,它利用可逆映射将观测数据转换到潜在向量空间,而利用逆映射将潜在向量转换回观测数据即为生成过程。GNF将归一化流与置换不变图自动编码器相结合,以图结构数据作为输入,在测试时生成新的图。Graphite (Grover et al., 2019)将GNN集成到变分自动编码器中,将图结构和特征编码为潜在变量。更具体地说,它采用各向同性高斯分布作为潜在变量,然后采用迭代细化策略从潜在变量解码。

8.1.6. Combinatorial optimization组合优化

Combinatorial optimization problems over graphs are a set of NP-hard problems which attract much attention from scientists of all fields. Some specific problems like the traveling salesman problem (TSP) and minimum spanning trees (MST) have various heuristic solutions. Recently, using a deep neural network to solve such problems has become a hotspot, and some of the solutions further leverage graph neural networks because of their graph structure.

Bello et al. (2017) first propose a deep-learning approach to tackle TSP. Their method consists of two parts: a Pointer Network (Vinyals et al., 2015b) for parameterizing rewards and a policy gradient (Sutton and Barto, 2018) module for training. This work has been shown to be comparable with traditional approaches. However, Pointer Networks are designed for sequential data like texts, while order-invariant encoders are more appropriate for such work.

图上的组合优化问题是一组NP难问题,引起了各领域科学家的广泛关注。一些具体问题如旅行商问题(TSP)和最小生成树(MST)已经有了各种启发式解法。近年来,利用深度神经网络来解决这类问题已成为一个热点,其中一些解决方案由于问题的图结构而进一步利用了图神经网络。

Bello等人(2017)首先提出了一种深度学习方法来解决TSP问题。他们的方法由两部分组成:用于参数化奖励的指针网络(Vinyals等人,2015b)和用于训练的策略梯度模块(Sutton和Barto, 2018)。这项工作已被证明可以与传统方法相比较。然而,指针网络是为像文本这样的顺序数据设计的,而有序不变编码器更适合这类工作。

Khalil et al. (2017) and Kool et al. (2019) improve the above method by including graph neural networks. The former work first obtains the node embeddings from structure2vec (Dai et al., 2016), then feeds them into a Q-learning module for making decisions. The latter one builds an attention-based encoder-decoder system. By replacing the reinforcement learning module with an attention-based decoder, it is more efficient to train. These works achieve better performance than previous algorithms, which proves the representation power of graph neural networks. More generally, Gasse et al. (2019) represent the state of a combinatorial problem as a bipartite graph and utilize a GCN to encode it.

Khalil等人(2017),Kool等人(2019)通过加入图神经网络改进了上述方法。前者首先从structure2vec获取节点嵌入(Dai et al., 2016),然后将它们输入Q-learning模块进行决策。后者构建了一个基于注意的编码器-解码器系统。通过将强化学习模块替换为基于注意的解码器,提高了训练效率。这些工作取得了比以往算法更好的性能,证明了图神经网络的表示能力。更一般地,Gasse et al.(2019)将组合问题的状态表示为二部图,并利用GCN对其进行编码。

For specific combinatorial optimization problems, Nowak et al. (2018) focus on the Quadratic Assignment Problem, i.e. measuring the similarity of two graphs. The GNN-based model learns node embeddings for each graph independently and matches them using an attention mechanism. This method offers intriguingly good performance even in regimes where standard relaxation-based techniques appear to suffer. Zheng et al. (2020a) use a generative graph neural network to model the DAG-structure learning problem, which is also a combinatorial optimization and NP-hard problem. NeuroSAT (Selsam et al., 2019) learns a message passing neural network to classify the satisfiability of SAT problems. It proves that the learned model can generalize to novel distributions of SAT and other problems which can be converted to SAT.

Unlike previous works which try to design specific GNNs to solve combinatorial problems, Sato et al. (2019) provide a theoretical analysis of GNN models on these problems. They establish connections between GNNs and the distributed local algorithms, a group of classical algorithms on graphs for solving these problems. Moreover, they demonstrate the optimal approximation ratios to the optimal solutions that the most powerful GNN can reach, and prove that most existing GNN models cannot exceed this upper bound. Furthermore, they add coloring to the node features to improve the approximation ratios.

8.1.7. Traffic networks

Predicting traffic states is a challenging task since traffic networks are dynamic and have complex dependencies. Cui et al. (2018b) combine GNNs and LSTMs to capture both spatial and temporal dependencies. STGCN (Yu et al., 2018) constructs ST-Conv blocks with spatial and temporal convolution layers and applies residual connections with bottleneck strategies. Zheng et al. (2020b) and Guo et al. (2019) both incorporate attention mechanisms to better model spatial-temporal correlation.
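
As a rough sketch of the "sandwich" structure of such a block (temporal convolution, then spatial graph convolution, then temporal convolution), assuming a precomputed normalized adjacency matrix and input shaped (batch, channels, time, nodes); this is our own minimal illustration, not the STGCN reference code.

```python
import torch
import torch.nn as nn

class STConvBlock(nn.Module):
    """Temporal conv -> spatial graph conv -> temporal conv, in the style of ST-Conv blocks."""

    def __init__(self, a_hat, c_in, c_hid, c_out, kernel_t=3):
        super().__init__()
        self.register_buffer("a_hat", a_hat)             # (n, n) normalized adjacency
        self.temp1 = nn.Conv2d(c_in, c_hid, (kernel_t, 1))
        self.theta = nn.Linear(c_hid, c_hid)             # spatial graph-conv weights
        self.temp2 = nn.Conv2d(c_hid, c_out, (kernel_t, 1))

    def forward(self, x):                                # x: (batch, c_in, time, nodes)
        x = torch.relu(self.temp1(x))                    # temporal convolution
        x = torch.einsum("nm,bctm->bctn", self.a_hat, x) # aggregate over neighbors
        x = torch.relu(self.theta(x.transpose(1, 3))).transpose(1, 3)
        return torch.relu(self.temp2(x))                 # temporal convolution
```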

8.1.8. Recommendation systems

User-item interaction prediction is one of the classic problems in recommendation. By modeling the interaction as a graph, GNNs can be utilized in this area. GC-MC (van den Berg et al., 2017) first applies GCN on user-item rating graphs to learn user and item embeddings. To efficiently adopt GNNs in web-scale scenarios, PinSage (Ying et al., 2018a) builds computational graphs with a weighted sampling strategy on the bipartite graph to reduce repeated computation.
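
The weighted sampling idea can be illustrated with short random walks: the neighbors visited most often form an importance-weighted neighborhood for a node. The stand-alone sketch below is a simplified rendition of that idea, not PinSage's production code.

```python
import random
from collections import Counter

def weighted_neighborhood(adj_list, node, num_walks=200, walk_len=2, top_k=10):
    """Estimate an importance-weighted neighborhood via short random walks.

    adj_list: dict mapping node -> list of neighbors. Returns the top_k most
    visited nodes together with normalized visit counts (importance weights)."""
    visits = Counter()
    for _ in range(num_walks):
        cur = node
        for _ in range(walk_len):
            if not adj_list.get(cur):
                break
            cur = random.choice(adj_list[cur])
        if cur != node:
            visits[cur] += 1
    total = sum(visits.values()) or 1
    return [(v, c / total) for v, c in visits.most_common(top_k)]
```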

Social recommendation tries to incorporate user social networks to enhance recommendation performance. GraphRec (Fan et al., 2019) learns user embeddings from both the item side and the user side. Wu et al. (2019c) go beyond static social effects, attempting to model homophily and influence effects with dual attention mechanisms.

8.1.9. Other Applications in structural scenarios

Because of the ubiquity of graph-structured data, GNNs have been applied to a wider variety of tasks than those introduced above; we briefly list several more scenarios. In financial markets, GNNs are used to model the interaction between different stocks to predict their future trends (Matsunaga et al., 2019; Yang et al., 2019; Chen et al., 2018c; Li et al., 2020). Kim et al. (2019) also predict market index movement by formulating it as a graph classification problem. In Software-Defined Networks (SDN), GNNs are used to optimize routing performance (Rusek et al., 2019). In Abstract Meaning Representation (AMR) graph-to-text generation tasks, Song et al. (2018a) and Beck et al. (2018) use GNNs to encode the graph representation of the abstract meaning.

8.2. Non-structural scenarios

In this section we discuss applications in non-structural scenarios. Generally, there are two ways to apply GNNs in such scenarios: (1) incorporate structural information from other domains to improve performance, for example using information from knowledge graphs to alleviate zero-shot problems in image tasks; (2) infer or assume a relational structure in the task and then apply the model to solve problems defined on graphs, such as the method of Zhang et al. (2018d), which models text as graphs. Common non-structural scenarios include images, text, and programming source code (Allamanis et al., 2018; Li et al., 2016); however, we only give a detailed introduction to the first two.

8.2.1. Image

Few- and Zero-shot Image Classification. Image classification is a basic and important task in computer vision, which attracts much attention and has many famous datasets such as ImageNet (Russakovsky et al., 2015). Recently, zero-shot and few-shot learning have become increasingly popular in image classification. In N-shot learning, to make predictions for test samples of some classes, only N training samples of those classes are provided in the training set. Few-shot learning thus restricts N to be small, while zero-shot learning requires N to be 0. Models must learn to generalize from the limited training data to make predictions for new test data. Graph neural networks can assist the image classification system in these challenging scenarios.

First, knowledge graphs can be used as extra information to guide zero-shot recognition (Wang et al., 2018d; Kampffmeyer et al., 2019). Wang et al. (2018d) make the visual classifiers learn not only from the visual input but also from word embeddings of the categories' names and their relationships to other categories. A knowledge graph is developed to help connect the related categories, and a 6-layer GCN is used to encode it. As the over-smoothing effect happens when the graph convolution architecture becomes deep, the 6-layer GCN used in (Wang et al., 2018d) washes out much useful information in the representation. To alleviate this smoothing problem, Kampffmeyer et al. (2019) use a single-layer GCN with a larger neighborhood that includes both one-hop and multi-hop nodes, which proves effective in building zero-shot classifiers from existing ones. As most knowledge graphs are too large for direct reasoning, Marino et al. (2017) select related entities to build a sub-graph based on object detection results and apply GGNN to the extracted graph for prediction. Besides, Lee et al. (2018b) also leverage the knowledge graph between categories. They further define three types of relations between categories (super-subordinate, positive correlation, and negative correlation) and propagate the confidence of relation labels directly in the graph.
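
The core mechanism here, propagating category word embeddings over the knowledge graph so that each category node predicts the weight vector of its visual classifier, can be sketched with a plain dense GCN; the shapes and the regression objective mentioned in the comments are assumptions in the spirit of these works, not their exact setup.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetrically normalize an adjacency matrix with self-loops: D^-1/2 (A+I) D^-1/2."""
    a = adj + np.eye(adj.shape[0])
    d = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d[:, None] * d[None, :]

def gcn_predict_classifiers(adj, word_emb, weights):
    """Propagate category word embeddings over the knowledge graph.

    word_emb: (num_classes, d_word); weights: list of per-layer weight matrices.
    Returns (num_classes, d_visual) predicted classifier weights; training would
    regress the rows of seen classes onto their pretrained classifier weights."""
    a_hat = normalize_adj(adj)
    h = word_emb
    for i, w in enumerate(weights):
        h = a_hat @ h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)      # ReLU on hidden layers
    return h
```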

Besides knowledge graphs, the similarity between images in the dataset is also helpful for few-shot learning (Garcia and Bruna, 2018). Garcia and Bruna (2018) build a weighted fully-connected image network based on the similarity and perform message passing on the graph for few-shot recognition.
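
A minimal way to build such a similarity graph is shown below, assuming image feature vectors have already been extracted; the Gaussian kernel is our own simplification of the learned distance metric in the paper.

```python
import numpy as np

def similarity_adjacency(features, sigma=1.0):
    """Weighted fully-connected graph from pairwise feature distances.

    features: (n, d) image features; returns a row-normalized (n, n) adjacency
    where closer images get larger edge weights."""
    diff = features[:, None, :] - features[None, :, :]
    dist2 = (diff ** 2).sum(-1)
    adj = np.exp(-dist2 / (2 * sigma ** 2))
    np.fill_diagonal(adj, 0.0)            # no self-edges before normalization
    return adj / adj.sum(axis=1, keepdims=True)
```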

Visual Reasoning. Computer-vision systems usually need to perform reasoning by incorporating both spatial and semantic information, so it is natural to generate graphs for reasoning tasks.

A typical visual reasoning task is visual question answering (VQA). In this task, a model needs to answer questions about an image given the text of the questions. Usually, the answer lies in the spatial relations among objects in the image. Teney et al. (2017) construct an image scene graph and a question syntactic graph, then apply GGNN to train the embeddings for predicting the final answer. Beyond spatial connections among objects, Norcliffebrown et al. (2018) build relational graphs conditioned on the questions. With knowledge graphs, Wang et al. (2018c) and Narasimhan et al. (2018) can perform finer relation exploration and a more interpretable reasoning process.

Other applications of visual reasoning include object detection, interaction detection, and region classification. In object detection (Hu et al., 2018; Gu et al., 2018), GNNs are used to calculate RoI features. In interaction detection (Qi et al., 2018; Jain et al., 2016), GNNs are message-passing tools between humans and objects. In region classification (Chen et al., 2018d), GNNs perform reasoning on graphs that connect regions and classes.

Semantic Segmentation. Semantic segmentation is a crucial step towards image understanding. The task is to assign a unique label (or category) to every pixel in the image, which can be considered a dense classification problem. However, regions in images are often not grid-like and need non-local information, on which traditional CNNs fail. Several works utilize graph-structured data to handle this.

Liang et al. (2016) use Graph-LSTM to model long-term dependency together with spatial connections by building graphs in the form of distance-based superpixel maps and applying LSTM to propagate neighborhood information globally. Subsequent work improves it from the perspective of encoding hierarchical information (Liang et al., 2017).

Furthermore, 3D semantic segmentation (RGBD semantic segmentation) and point cloud classification utilize more geometric information, which is hard to model with a 2D CNN. Qi et al. (2017b) construct a k-nearest-neighbor (KNN) graph and use a 3D GNN as the propagation model. After unrolling for several steps, the prediction model takes the hidden state of each node as input and predicts its semantic label. As there are always too many points in point cloud classification tasks, Landrieu and Simonovsky (2018) solve large-scale 3D point cloud segmentation by building superpoint graphs and generating embeddings for them; to classify supernodes, they leverage GGNN and graph convolution. Wang et al. (2018e) propose to model point interactions through edges. They calculate edge representation vectors from the coordinates of the edges' terminal nodes, and node embeddings are then updated by edge aggregation.
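
A compact sketch of the two building blocks mentioned above, KNN graph construction over point coordinates and edge features derived from the coordinates of an edge's two endpoints; the concrete feature choice (center point plus offset) is our own simplification in the style of edge-based point models.

```python
import numpy as np

def knn_graph(points, k=8):
    """Return (n, k) indices of each point's k nearest neighbors (excluding itself)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = (diff ** 2).sum(-1)
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]

def edge_features(points, neighbors):
    """Edge feature per (i, j): concat of the center point and the offset x_j - x_i."""
    center = np.repeat(points[:, None, :], neighbors.shape[1], axis=1)
    offset = points[neighbors] - center
    return np.concatenate([center, offset], axis=-1)   # (n, k, 2 * dim)

points = np.random.rand(1024, 3)                        # toy point cloud
nbrs = knn_graph(points)
feats = edge_features(points, nbrs)                     # would feed a shared MLP, then
node_emb = feats.max(axis=1)                            # be aggregated over edges
```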

8.2.2. Text

Graph neural networks can be applied to several text-based tasks, both sentence-level tasks (e.g., text classification) and word-level tasks (e.g., sequence labeling). We list several major applications on text in the following.

Text Classification. Text classification is an important and classical problem in natural language processing. Traditional text classification uses bag-of-words features. However, representing a text as a graph of words can further capture semantics between non-consecutive and long-distance words (Peng et al., 2018). Peng et al. (2018) use a graph-CNN-based deep learning model to first convert texts to graphs of words and then convolve the word graph with the graph convolution operations in (Niepert et al., 2016). Zhang et al. (2018d) propose the Sentence LSTM to encode text. They view the whole sentence as a single state, which consists of sub-states for individual words and an overall sentence-level state, and use the global sentence-level representation for classification tasks. These methods view either a document or a sentence as a graph of word nodes. Yao et al. (2019) regard both documents and words as nodes to construct the corpus graph and use the Text GCN to learn embeddings of words and documents. Sentiment classification can also be regarded as a text classification problem, and a Tree-LSTM approach is proposed by Tai et al. (2015).
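
The corpus graph of Text GCN connects document nodes to word nodes and word nodes to each other. A minimal sketch of the document-word part follows; for brevity it uses raw term frequencies where the paper uses TF-IDF for document-word edges and PMI for word-word edges.

```python
from collections import Counter

def corpus_graph(docs):
    """Build doc-word edges for a Text-GCN-style corpus graph.

    docs: list of tokenized documents. Nodes 0..D-1 are documents, the rest are
    words; the edge weight here is the raw term frequency (the paper weights
    doc-word edges by TF-IDF and adds PMI-weighted word-word edges)."""
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    edges = {}
    for doc_id, doc in enumerate(docs):
        for word, count in Counter(doc).items():
            edges[(doc_id, len(docs) + vocab[word])] = float(count)
    return vocab, edges

vocab, edges = corpus_graph([["graph", "neural", "network"],
                             ["neural", "network", "text"]])
```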

Sequence Labeling. Given a sequence of observed variables (such as words), sequence labeling assigns a categorical label to each variable. Typical tasks include POS-tagging, where we label the words in a sentence by their part of speech, and Named Entity Recognition (NER), where we predict whether each word in a sentence belongs to a named entity. If we consider each variable in the sequence as a node and the dependencies as edges, we can utilize the hidden states of GNNs to address the task. Zhang et al. (2018d) utilize the Sentence LSTM to label sequences; they conduct experiments on POS-tagging and NER tasks and achieve promising performance.

Semantic role labeling is another sequence labeling task. Marcheggiani and Titov (2017) present a Syntactic GCN to solve the problem. The Syntactic GCN, which operates on directed graphs with labeled edges, is a special variant of the GCN (Kipf and Welling, 2017). It integrates edge-wise gates which let the model regulate the contributions of individual dependency edges. Syntactic GCNs over syntactic dependency trees are used as sentence encoders to learn latent feature representations of the words in a sentence.
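
An edge-gated graph convolution of this flavor can be sketched as follows; the direction- and label-specific parameters of the Syntactic GCN are collapsed into a single weight matrix here, so this is a simplified rendition rather than the exact model.

```python
import torch
import torch.nn as nn

class EdgeGatedGCN(nn.Module):
    """One graph-convolution layer with scalar edge-wise gates."""

    def __init__(self, dim):
        super().__init__()
        self.w = nn.Linear(dim, dim)    # message transform (label-specific in the paper)
        self.gate = nn.Linear(dim, 1)   # scalar gate for each incoming edge

    def forward(self, h, edges):
        """h: (n, dim) node states; edges: LongTensor of shape (num_edges, 2)."""
        src, dst = edges[:, 0], edges[:, 1]
        gates = torch.sigmoid(self.gate(h[src]))        # how much each edge contributes
        msgs = gates * self.w(h[src])                   # gated messages along arcs
        out = torch.zeros_like(h).index_add(0, dst, msgs)
        return torch.relu(out)
```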

Neural Machine Translation. The neural machine translation (NMT) task is to translate text from a source language to a target language automatically using neural networks. It is usually treated as a sequence-to-sequence task. The Transformer (Vaswani et al., 2017) introduces attention mechanisms and replaces the most commonly used recurrent or convolutional layers. In fact, the Transformer assumes a fully connected graph structure between words; other graph structures can be explored with GNNs.

One popular application of GNNs is to incorporate syntactic or semantic information into the NMT task. Bastings et al. (2017) utilize the Syntactic GCN on syntax-aware NMT tasks. Marcheggiani et al. (2018) incorporate information about the predicate-argument structure of source sentences (namely, semantic-role representations) using the Syntactic GCN and compare the results of incorporating only syntactic information, only semantic information, and both. Beck et al. (2018) utilize the GGNN in syntax-aware NMT. They convert the syntactic dependency graph into a new structure called the Levi graph (Levi, 1942) by turning edges into additional nodes, so that edge labels can be represented as embeddings.
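
The Levi transformation itself is mechanical: every labeled edge becomes a new node carrying the label, wired between the original endpoints. A small self-contained sketch:

```python
def to_levi_graph(nodes, labeled_edges):
    """Turn a graph with labeled edges into a Levi graph with unlabeled edges.

    labeled_edges: list of (src, label, dst) triples. Each edge becomes a new
    node holding the label, connected to src and dst, so edge labels can be
    embedded like ordinary node labels."""
    levi_nodes = list(nodes)
    levi_edges = []
    for i, (src, label, dst) in enumerate(labeled_edges):
        edge_node = f"{label}#{i}"          # unique node per edge occurrence
        levi_nodes.append(edge_node)
        levi_edges.append((src, edge_node))
        levi_edges.append((edge_node, dst))
    return levi_nodes, levi_edges

nodes, edges = to_levi_graph(["saw", "John"], [("saw", "nsubj", "John")])
```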

Relation Extraction. Extracting semantic relations between entities in texts helps to expand existing knowledge bases. Traditional methods use CNNs or RNNs to learn entity features and predict the relation type for a pair of entities. A more sophisticated way is to utilize the dependency structure of the sentence: a document graph can be built where nodes represent words and edges represent various dependencies such as adjacency, syntactic dependencies, and discourse relations. Zhang et al. (2018f) propose an extension of graph convolutional networks tailored for relation extraction and apply a pruning strategy to the input trees.
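
One common pruning strategy of this kind keeps only the dependency-tree nodes within K hops of the path between the two entities. The BFS-based sketch below illustrates that idea under simplified assumptions (an undirected, connected tree given as an adjacency list); it is not the authors' exact procedure.

```python
from collections import deque

def bfs_dist(adj, start):
    """Hop distances from start to every node in an adjacency-list graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def prune_tree(adj, entity1, entity2, k=1):
    """Keep nodes within k hops of the path between the two entities (tree input)."""
    d1, d2 = bfs_dist(adj, entity1), bfs_dist(adj, entity2)
    path_len = d1[entity2]
    # In a tree, d1[v] + d2[v] = path_len + 2 * dist(v, path), so this test
    # keeps exactly the nodes whose detour off the path is at most k hops.
    return {v for v in adj if d1[v] + d2[v] <= path_len + 2 * k}
```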

Cross-sentence N-ary relation extraction detects relations among n entities across multiple sentences. Peng et al. (2017) explore a general framework for cross-sentence n-ary relation extraction by applying graph LSTMs on document graphs. Song et al. (2018b) also use a graph-state LSTM model and speed up computation by allowing more parallelization.

Event Extraction. Event extraction is an important information extraction task that recognizes instances of specified types of events in texts. It is usually conducted by recognizing event triggers and then predicting the arguments for each trigger. Nguyen and Grishman (2018) investigate a convolutional neural network (which is exactly the Syntactic GCN) based on dependency trees to perform event detection. Liu et al. (2018) propose a Jointly Multiple Events Extraction (JMEE) framework to jointly extract multiple event triggers and arguments; it introduces syntactic shortcut arcs to enhance information flow and uses attention-based graph convolution networks to model the graph information.

Fact Verification. Fact verification is a task requiring models to extract evidence to verify given claims. However, some claims require reasoning over multiple pieces of evidence. GNN-based methods like GEAR (Zhou et al., 2019) and KGAT (Liu et al., 2020) are proposed to conduct evidence aggregation and reasoning over a fully connected evidence graph. Zhong et al. (2020) build an inner-sentence graph with information from semantic role labeling and achieve promising results.

Other Applications on Text. GNNs can also be applied to many other text tasks. For example, GNNs are used in question answering and reading comprehension (Song et al., 2018c; De Cao et al., 2019; Qiu et al., 2019; Tu et al., 2019; Ding et al., 2019). Another important direction is relational reasoning, where relational networks (Santoro et al., 2017), interaction networks (Battaglia et al., 2016), and recurrent relational networks (Palm et al., 2018) are proposed to solve relational reasoning tasks on text.

9、Open problems: robustness, interpretability, graph pretraining, and complex graph structures

Although GNNs have achieved great success in different fields, GNN models are still not able to offer satisfying solutions for any graph under any condition. In this section, we list some open problems for further research.

Robustness. As a family of models based on neural networks, GNNs are also vulnerable to adversarial attacks. Compared to adversarial attacks on images or text, which focus only on features, attacks on graphs further consider the structural information. Several works have been proposed to attack existing graph models (Zügner et al., 2018; Dai et al., 2018b), and more robust models have been proposed in defense (Zhu et al., 2019). We refer to (Sun et al., 2018) for a comprehensive review.

Interpretability. Interpretability is also an important research direction for neural models, but GNNs are black boxes and lack explanations. Only a few methods (Ying et al., 2019; Baldassarre and Azizpour, 2019) have been proposed to generate example-level explanations for GNN models. It is important to apply GNN models in real-world applications with trusted explanations. Similar to the fields of CV and NLP, interpretability on graphs is an important direction to investigate.

Graph Pretraining. Neural-network-based models require abundant labeled data, and it is costly to obtain enormous amounts of human-labeled data. Self-supervised methods are proposed to guide models to learn from unlabeled data, which is easy to obtain from websites or knowledge bases. These methods have achieved great success in CV and NLP with the idea of pretraining (Krizhevsky et al., 2012; Devlin et al., 2019). Recently, there have been works focusing on pretraining on graphs (Qiu et al., 2020; Hu et al., 2020b, 2020e; Zhang et al., 2020), but they have different problem settings and focus on different aspects. This field still has many open problems requiring research efforts, such as the design of the pretraining tasks and the effectiveness of existing GNN models on learning structural or feature information.

Complex Graph Structures. Graph structures are flexible and complex in real-life applications. Various works have been proposed to deal with complex graph structures such as dynamic graphs and heterogeneous graphs, as discussed above. With the rapid development of social networks on the Internet, more problems, challenges, and application scenarios will certainly emerge and require more powerful models.

10、Conclusion

Over the past few years, graph neural networks have become powerful and practical tools for machine learning tasks in the graph domain. This progress owes to advances in expressive power, model flexibility, and training algorithms. In this survey, we conduct a comprehensive review of graph neural networks. For GNN models, we introduce their variants categorized by computation modules, graph types, and training types. Moreover, we summarize several general frameworks and introduce several theoretical analyses. In terms of application taxonomy, we divide GNN applications into structural scenarios, non-structural scenarios, and other scenarios, then give a detailed review of the applications in each scenario. Finally, we suggest four open problems indicating the major challenges and future research directions of graph neural networks, including robustness, interpretability, pretraining, and complex structure modeling.

Acknowledgements

This work is supported by the National Key Research and Development Program of China (No. 2018YFB1004503), the National Natural Science Foundation of China (NSFC No. 61772302), and the Beijing Academy of Artificial Intelligence (BAAI). This work is also supported by the 2019 Tencent Marketing Solution Rhino-Bird Focused Research Program FR201908.

Appendix A. Datasets

Many graph-related tasks have been released to test the performance of various graph neural networks. Such tasks are based on the following commonly used datasets, which we list in Table A.4.

Table A.4

Datasets commonly used in graph-related tasks.

Citation Networks: Pubmed (Yang et al., 2016), Cora (Yang et al., 2016), Citeseer (Yang et al., 2016), DBLP (Tang et al., 2008)
Bio-chemical Graphs: MUTAG (Debnath et al., 1991), NCI-1 (Wale et al., 2008), PPI (Zitnik and Leskovec, 2017), D&D (Dobson and Doig, 2003), PROTEIN (Borgwardt et al., 2005), PTC (Toivonen et al., 2003)
Social Networks: Reddit (Hamilton et al., 2017c), BlogCatalog (Zafarani and Liu, 2009)
Knowledge Graphs: FB13 (Socher et al., 2013), FB15K (Bordes et al., 2013), FB15K237 (Toutanova et al., 2015), WN11 (Socher et al., 2013), WN18 (Bordes et al., 2013), WN18RR (Dettmers et al., 2018)

There is also a broader range of open-source dataset repositories containing more graph datasets. We list them in Table A.5.

Table A.5

Popular graph learning dataset collections.

Network Repository, a scientific network data repository with interactive visualization and mining tools: http://networkrepository.com
Graph Kernel Datasets, benchmark datasets for graph kernels: https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets
Relational Dataset Repository, built to support the growth of relational machine learning: https://relational.fit.cvut.cz/
Stanford Large Network Dataset Collection, whose SNAP library is developed to study large social and information networks: https://snap.stanford.edu/data/
Open Graph Benchmark (OGB), a collection of benchmark datasets, data-loaders and evaluators for graph machine learning in PyTorch: https://ogb.stanford.edu

Appendix B. Implementations

We first list several platforms that provide code for graph computing in Table B.6.

Table B.6

Popular platforms for graph computing.

PyTorch Geometric (Fey and Lenssen, 2019): https://github.com/rusty1s/pytorch_geometric
Deep Graph Library (Wang et al., 2019b): https://github.com/dmlc/dgl
AliGraph (Zhu et al., 2019a): https://github.com/alibaba/aligraph
GraphVite (Zhu et al., 2019b): https://github.com/DeepGraphLearning/graphvite
Paddle Graph Learning: https://github.com/PaddlePaddle/PGL
Euler: https://github.com/alibaba/euler
Plato: https://github.com/tencent/plato
CogDL: https://github.com/THUDM/cogdl/
OpenNE: https://github.com/thunlp/OpenNE/tree/pytorch

Next, we list hyperlinks to the current open-source implementations of some famous GNN models in Table B.7:

Table B.7

Source code of the models mentioned in the survey.

GGNN (2015): https://github.com/yujiali/gGNN
Neural FPs (2015): https://github.com/HIPS/neural-fingerprint
ChebNet (2016): https://github.com/mdeff/cnn_graph
DNGR (2016): https://github.com/ShelsonCao/DNGR
SDNE (2016): https://github.com/suanrong/SDNE
GAE (2016): https://github.com/limaosen0/Variational-Graph-Auto-Encoders
DRNE (2016): https://github.com/tadpole/DRNE
Structural RNN (2016): https://github.com/asheshjain399/RNNexp
DCNN (2016): https://github.com/jcatw/dcnn
GCN (2017): https://github.com/tkipf/gcn
CayleyNet (2017): https://github.com/amoliu/CayleyNet
GraphSage (2017): https://github.com/williamleif/GraphSAGE
GAT (2017): https://github.com/PetarV-/GAT
CLN (2017): https://github.com/trangptm/Column_networks
ECC (2017): https://github.com/mys007/ecc
MPNNs (2017): https://github.com/brain-research/mpnn
MoNet (2017): https://github.com/pierrebaque/GeometricConvolutionsBench
JK-Net (2018): https://github.com/ShinKyuY/Representation_Learning_on_Graphs_with_Jumping_Knowledge_Networks
SSE (2018): https://github.com/Hanjun-Dai/steady_state_embedding
LGCN (2018): https://github.com/divelab/lgcn/
FastGCN (2018): https://github.com/matenure/FastGCN
DiffPool (2018): https://github.com/RexYing/diffpool
GraphRNN (2018): https://github.com/snap-stanford/GraphRNN
MolGAN (2018): https://github.com/nicola-decao/MolGAN
NetGAN (2018): https://github.com/danielzuegner/netgan
DCRNN (2018): https://github.com/liyaguang/DCRNN
ST-GCN (2018): https://github.com/yysijie/st-gcn
RGCN (2018): https://github.com/tkipf/relational-gcn
AS-GCN (2018): https://github.com/huangwb/AS-GCN
DGCN (2018): https://github.com/ZhuangCY/DGCN
GaAN (2018): https://github.com/jennyzhang0215/GaAN
DGI (2019): https://github.com/PetarV-/DGI
GraphWaveNet (2019): https://github.com/nnzhan/Graph-WaveNet
HAN (2019): https://github.com/Jhy1993/HAN

As the research field grows rapidly, we recommend the paper list published by our team, GNNPapers (https://github.com/thunlp/GNNpapers), for recent papers.
