[Paper Notes (2)] Introduction to Graph Convolutional Networks: Understanding Convolutions on Graphs


Article link: https://blog.csdn.net/qq_42901861/article/details/121841077

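Ⅰ Paper Information

Paper link: Understanding Convolutions on Graphs

This article on graph convolutional networks was published on Distill in September 2021. I had already watched Mu Li's walkthrough on Bilibili of its companion piece, A Gentle Introduction to Graph Neural Networks, so I decided to read this one on graph convolutions as well to deepen my understanding of graphs.

Ⅱ Paper Outline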

1 Introduction

2 The Challenges of Computation on Graphs

3 Problem Setting and Notation

4 Extending Convolutions to Graphs

5 Polynomial Filters on Graphs

5.1 The Graph Laplacian

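The graph Laplacian $L$ is the square $n \times n$ matrix defined as: $L = D - A$.
- where $D$ is the diagonal degree matrix with $D_v=\sum_{u}A_{vu}$, and $A$ is the 0-1 adjacency matrix
- The Laplacian $L$ depends only on the structure of the graph, not on the node features
The graph Laplacian has many interesting properties and appears in many graph-related mathematical problems; some of them are covered in later sections.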
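
As a quick illustration of the definition, here is a minimal NumPy sketch that builds $A$, $D$ and $L$. The 4-node path graph and the variable names are hypothetical examples, not taken from the article.

```python
import numpy as np

# Hypothetical 4-node path graph 0-1-2-3, given as an undirected edge list.
edges = [(0, 1), (1, 2), (2, 3)]
n = 4

A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = 1.0
    A[v, u] = 1.0  # undirected graph: A is symmetric

D = np.diag(A.sum(axis=1))  # D_v = sum_u A_vu on the diagonal
L = D - A
print(L)
```

Every row of $L$ sums to zero, and no node features were needed to build it.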

5.2 Polynomials of the Laplacian

polynomials of the Laplacian: $p_w(L) = w_0I_n+w_1L+w_2L^2+...+w_dL^d=\sum_{i=0}^dw_iL^i$
- Every polynomial of this form can be represented by its vector of coefficients $w=[w_0,w_1,...,w_d]$
These polynomials can be thought of as the equivalent of ‘filters’ in CNNs, and the coefficients $w$ as the weights of the ‘filters’.
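Once we have constructed the feature vector $x$, we can define its convolution with a polynomial filter $p_w$ as: $x'=p_w(L)x$.
Following the derivation, when only $w_1 = 1$ and all other coefficients are 0, $x'$ is just $Lx$:
$\begin{aligned} x'_v = (Lx)_v &= L_v x \\ &= \sum_{u \in G} L_{vu} x_u \\ &= \sum_{u \in G} (D_{vu} - A_{vu}) x_u \\ &= D_v x_v - \sum_{u \in \mathcal{N}(v)} x_u \end{aligned}$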

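Q: Why does $L_vx$ jump straight to the neighbours $u$ in the second line, and where does $x_u$ come from?
A: In the second line, $u$ ranges over all nodes in the graph. Only after the computation in the third line, because $D$ is diagonal and $A$ is nonzero only for edges, does the $u$ in the fourth line become restricted to the neighbours of $v$.
The feature of each node $v$ is a combination of its own feature and the features of its neighbours $u$.
- The maximum distance between a node and the neighbours that influence it is the degree of localization, which is given by $d$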
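
To make the localization claim concrete, here is a minimal NumPy sketch that applies $p_w(L)$ to a feature vector. The function name `poly_filter`, the 5-node path graph and the coefficient values are illustrative assumptions.

```python
import numpy as np

def poly_filter(L, w, x):
    """Return p_w(L) x = sum_i w[i] * L^i x without forming L^i explicitly."""
    out = np.zeros_like(x, dtype=float)
    Lx = x.astype(float)          # L^0 x
    for w_i in w:
        out += w_i * Lx
        Lx = L @ Lx               # apply one more power of L
    return out

# Hypothetical path graph 0-1-2-3-4.
n = 5
A = np.zeros((n, n))
for u in range(n - 1):
    A[u, u + 1] = A[u + 1, u] = 1.0
L = np.diag(A.sum(axis=1)) - A

x = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # a feature on node 0 only

# w_1 = 1, all other coefficients 0: x'_v = D_v x_v - sum of neighbour features.
x1 = poly_filter(L, [0.0, 1.0], x)
# Degree d = 2: the feature of node 0 reaches nodes at distance <= 2 only.
x2 = poly_filter(L, [0.5, 1.0, 1.0], x)
print(x1, x2)
```

With $d = 2$, nodes 3 and 4 (at distance 3 and 4 from node 0) stay at zero, while node 2 does not.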

5.3 ChebNet

ChebNet redefines the polynomial filters as:
$p_w(L) = \sum_{i=1}^dw_iT_i(\widetilde{L})$

$T_i$ is the degree-$i$ Chebyshev polynomial of the first kind
$\widetilde{L}$ is the normalized Laplacian: $\widetilde{L}=\frac{2L}{\lambda_{max}(L)}-I_n$

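Why the filters are redefined this way:

- $L$ is positive semi-definite, so all of its eigenvalues are at least 0. $\widetilde{L}$ is a scaled-down version of $L$ whose eigenvalues lie in $[-1, 1]$, which keeps the entries of powers of $\widetilde{L}$ from blowing up
- Chebyshev polynomials have certain interesting properties that make interpolation more numerically stable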
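
A minimal sketch of such a filter, assuming the standard recurrence $T_0(y)=1$, $T_1(y)=y$, $T_{i+1}(y)=2yT_i(y)-T_{i-1}(y)$ for Chebyshev polynomials of the first kind. The function name `cheb_filter` and the small example graph are hypothetical.

```python
import numpy as np

def cheb_filter(L, w, x):
    """Apply sum_{i=1}^{d} w[i-1] * T_i(L_tilde) x via the Chebyshev recurrence."""
    n = L.shape[0]
    lam_max = np.linalg.eigvalsh(L).max()        # L is symmetric
    L_tilde = 2.0 * L / lam_max - np.eye(n)      # eigenvalues now lie in [-1, 1]

    t_prev = x.astype(float)                     # T_0(L_tilde) x = x
    t_curr = L_tilde @ t_prev                    # T_1(L_tilde) x
    out = np.zeros_like(t_prev)
    for w_i in w:                                # w_i multiplies T_i, i = 1..d
        out += w_i * t_curr
        # T_{i+1}(y) = 2 y T_i(y) - T_{i-1}(y)
        t_prev, t_curr = t_curr, 2.0 * (L_tilde @ t_curr) - t_prev
    return out, L_tilde

# Hypothetical 4-node graph (it must contain at least one edge so lam_max > 0).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
x = np.array([1.0, 2.0, 3.0, 4.0])
y, L_tilde = cheb_filter(L, [0.3, -0.2], x)
print(np.linalg.eigvalsh(L_tilde))
```

The recurrence only needs matrix-vector products, so neither $T_i(\widetilde{L})$ nor powers of $\widetilde{L}$ are ever built as full matrices.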

5.4 Polynomial Filters are Node-Order Equivariant

The polynomial filters we considered here are actually independent of the ordering of the nodes.
A similar proof follows for higher degree polynomials: the entries in the powers of $L$ are equivariant to the ordering of the nodes.
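
The equivariance can also be checked numerically. In this sketch a permutation matrix $P$ relabels the nodes, so $x \to Px$ and $L \to PLP^T$; the random graph and the coefficient vector are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random undirected graph on 6 nodes (hypothetical example data).
n = 6
upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
A = (upper + upper.T).astype(float)
L = np.diag(A.sum(axis=1)) - A
x = rng.normal(size=n)

def p(L, x, w=(0.5, -1.0, 0.25)):
    """p_w(L) x for a fixed, arbitrary coefficient vector w."""
    return sum(w_i * np.linalg.matrix_power(L, i) @ x for i, w_i in enumerate(w))

# A permutation matrix P relabels the nodes.
P = np.eye(n)[rng.permutation(n)]
lhs = p(P @ L @ P.T, P @ x)   # filter applied to the relabelled graph
rhs = P @ p(L, x)             # relabelled output of the original filter
print(np.allclose(lhs, rhs))
```

Both sides agree because $(PLP^T)^i = PL^iP^T$ for a permutation matrix $P$.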

5.5 Embedding Computation

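ChebNet layers can now be stacked just like CNN layers, with non-linearities in between.
In this network, the same filter weights are shared across all nodes, just as convolution kernel weights are shared in a CNN.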
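
A minimal sketch of such a stack, assuming each layer computes $h^{(k)} = \sigma(p_{w^{(k)}}(L)\,h^{(k-1)})$ with ReLU as $\sigma$. For brevity it uses plain Laplacian polynomials rather than the Chebyshev form, and the names `poly` and `forward` are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def poly(L, w):
    """p_w(L) = sum_i w[i] L^i as a dense matrix (fine for small graphs)."""
    return sum(w_i * np.linalg.matrix_power(L, i) for i, w_i in enumerate(w))

def forward(L, x, weights):
    """Stack polynomial-filter layers: h^(k) = relu(p_{w^(k)}(L) h^(k-1))."""
    h = x
    for w_k in weights:            # one coefficient vector per layer
        h = relu(poly(L, w_k) @ h)
    return h

# Hypothetical triangle graph.
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
x = np.array([1.0, -2.0, 0.5])

weights = [[0.1, 0.5], [0.2, -0.3, 0.1]]      # two layers, degrees 1 and 2
h = forward(L, x, weights)
n_params = sum(len(w_k) for w_k in weights)   # does not depend on the graph size
print(h, n_params)
```

The parameter count is the total number of coefficients, which stays the same however many nodes the graph has.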

6 Modern Graph Neural Networks

As the derivation in 5.2 shows, we can think of this convolution as consisting of two steps:

- Aggregating over immediate neighbour features $x_u$
- Combining with the node’s own feature $x_v$

KEY IDEA: What if we consider different kinds of ‘aggregation’ and ‘combination’ steps, beyond what are possible using polynomial filters?

These convolutions can be thought of as message-passing between adjacent nodes: after each step, every node receives some information from its neighbours.

6.1 Embedding Computation

Message-passing forms the backbone of many GNN architectures today. Here we describe several commonly used ones:

- Graph Convolutional Networks (GCN)
- Graph Attention Networks (GAT)
- Graph Sample and Aggregate (GraphSAGE)
- Graph Isomorphism Network (GIN)

1. GCN

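At each step $k$, the learnable parameters $W, B$ and the function $f$ are shared across all nodes. This lets the GCN model scale well, because the number of parameters does not depend on the size of the graph.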
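
A minimal sketch of one GCN step, assuming the mean-over-neighbours update $h_v^{(k)} = f\big(W \cdot \mathrm{mean}_{u\in\mathcal{N}(v)} h_u^{(k-1)} + B \cdot h_v^{(k-1)}\big)$. This form is my reading of the Distill article rather than something written out in these notes, and `gcn_layer`, the choice of `tanh` for $f$ and the example graph are hypothetical.

```python
import numpy as np

def gcn_layer(A, H, W, B, f=np.tanh):
    """One GCN-style step, written per node.

    A: (n, n) 0-1 adjacency matrix, H: (n, d_in) node embeddings,
    W, B: (d_out, d_in) weights shared by every node.
    """
    H_next = np.zeros((A.shape[0], W.shape[0]))
    for v in range(A.shape[0]):
        nbrs = np.flatnonzero(A[v])
        # Mean of neighbour embeddings; a zero vector for an isolated node.
        agg = H[nbrs].mean(axis=0) if len(nbrs) else np.zeros(H.shape[1])
        H_next[v] = f(W @ agg + B @ H[v])
    return H_next

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H0 = rng.normal(size=(3, 4))
W, B = rng.normal(size=(2, 4)), rng.normal(size=(2, 4))
H1 = gcn_layer(A, H0, W, B)
print(H1.shape)
```

`W` and `B` have shapes that depend only on the embedding sizes, which is why the same layer works on graphs of any size.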

2. GAT

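At each step $k$, the learnable parameters $W, B$ and the function $f$ are shared across all nodes. This lets the GAT model scale well, because the number of parameters does not depend on the size of the graph.

Only single-head attention is used here; the multi-head case is analogous.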
3. GraphSAGE

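The original GraphSAGE paper considers three choices for ${AGG}_{u\in{\mathcal{N}}_v}({h_u^{(k-1)}})$:

- mean (similar to GCN)
- dimension-wise maximum
- LSTM (after ordering the neighbours)

Here we use the RNN aggregator, because it is easier to explain than the LSTM while the idea is similar.

In addition, the original paper uses ‘neighbourhood sampling’: however large a node's neighbourhood is, a fixed-size random sample is drawn from it. This increases the variance of the computed embeddings, but it makes the algorithm usable on large graphs.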

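The learnable parameters are shared across all nodes. This lets the GraphSAGE model scale well, because the number of parameters does not depend on the size of the graph.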
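
Neighbourhood sampling can be sketched in a few lines. The helper `sample_neighbours` is hypothetical, and the rule of sampling with replacement when a node has fewer neighbours than the sample size is an assumption of this sketch.

```python
import numpy as np

def sample_neighbours(A, v, size, rng):
    """Draw a fixed-size random sample from N(v); v must have a neighbour.

    Assumption: sample without replacement when N(v) is large enough,
    and with replacement otherwise, so the result always has `size` entries.
    """
    nbrs = np.flatnonzero(A[v])
    return rng.choice(nbrs, size=size, replace=len(nbrs) < size)

rng = np.random.default_rng(0)
n = 10
A = np.zeros((n, n))
A[0, 1:] = A[1:, 0] = 1.0          # star graph: node 0 is the hub

hub_sample = sample_neighbours(A, 0, 3, rng)    # 3 of the hub's 9 neighbours
leaf_sample = sample_neighbours(A, 1, 3, rng)   # a leaf only has the hub
print(hub_sample, leaf_sample)
```

The cost of aggregating at a node is then bounded by the sample size, not by the node's degree.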
4. GIN

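The learnable parameters are shared across all nodes. This lets the GIN model scale well, because the number of parameters does not depend on the size of the graph.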
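
A minimal sketch of a GIN-style step, assuming the update $h_v^{(k)} = f\big(\sum_{u\in\mathcal{N}(v)} h_u^{(k-1)} + (1+\epsilon)\,h_v^{(k-1)}\big)$ with a small MLP as $f$. The formula is not written out in these notes, so treat it as an assumption; `gin_layer` and the two-layer `mlp` are hypothetical names.

```python
import numpy as np

def gin_layer(A, H, eps, mlp):
    """GIN-style step: h_v <- mlp(sum_{u in N(v)} h_u + (1 + eps) * h_v)."""
    return mlp(A @ H + (1.0 + eps) * H)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 2))

def mlp(Z):
    """Two-layer MLP applied row-wise, with the same weights for every node."""
    return np.maximum(Z @ W1, 0.0) @ W2

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H0 = rng.normal(size=(3, 4))
H1 = gin_layer(A, H0, eps=0.1, mlp=mlp)
print(H1.shape)
```

Multiplying by `A` sums the neighbour embeddings of every node at once, so no explicit loop over nodes is needed.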

6.2 Thoughts

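Evaluating different aggregation functions is an interesting question. The paper How Powerful are Graph Neural Networks? compares them by how well they preserve the features of a node's neighbours.

We have only discussed GNNs whose computation happens on the nodes. More recent GNNs also perform computation on the edges, but the message-passing idea is the same.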

7 Interactive Graph Neural Networks

8 From Local to Global Convolutions

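The methods discussed so far all perform ‘local’ convolutions: the feature of each node is updated using a function of the features of its local neighbours.

Although enough steps of message-passing will eventually propagate information between all nodes in the graph, we would like a more direct way of performing ‘global’ convolutions.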

8.1 Spectral Convolutions

KEY IDEA: Given a feature vector $x$, the Laplacian $L$ allows us to quantify how smooth $x$ is, with respect to $G$.
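
One common way to quantify this is the Rayleigh quotient $R_L(x) = \frac{x^TLx}{x^Tx}$, where $x^TLx = \sum_{(i,j)\in E}(x_i-x_j)^2$. Choosing this measure is an assumption of the sketch, and the graph and feature vectors are hypothetical examples.

```python
import numpy as np

def rayleigh(L, x):
    """R_L(x) = x^T L x / x^T x; smaller means x is smoother on the graph."""
    return (x @ L @ x) / (x @ x)

# Hypothetical path graph 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
A = np.zeros((4, 4))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
L = np.diag(A.sum(axis=1)) - A

smooth = np.array([1.0, 1.1, 1.2, 1.3])    # neighbours have similar values
rough = np.array([1.0, -1.0, 1.0, -1.0])   # neighbours disagree

# x^T L x equals the sum of squared differences across the edges.
edge_sum = sum((rough[u] - rough[v]) ** 2 for u, v in edges)
print(rough @ L @ rough, edge_sum)
print(rayleigh(L, smooth), rayleigh(L, rough))
```

The eigenvectors of $L$ with the smallest eigenvalues minimize this quotient, which is why they are the smoothest ones.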

8.2 Spectral Representations of Natural Images

These visualizations should convince you that the first eigenvectors are indeed smooth, and the smoothness correspondingly decreases as we consider later eigenvectors.

For any image $x$ , we can think of the initial entries of the spectral representation $\hat{x}$ as capturing ‘global’ image-wide trends, which are the low-frequency components, while the later entries as capturing ‘local’ details, which are the high-frequency components.

8.3 Embedding Computation

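Convolution in the spectral domain needs far fewer parameters than direct convolution in the natural domain.
Moreover, because the Laplacian eigenvectors are smooth over the graph, using the spectral representation automatically enforces an inductive bias for neighbouring nodes to get similar representations.

The spectral convolution works by transforming the features into the spectral domain with $U_m^T$, filtering them there, and transforming the result back with $U_m$.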
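
A minimal sketch of these steps for a single feature channel, assuming $U_m$ holds the $m$ eigenvectors of $L$ with the smallest eigenvalues and the filter is an element-wise product with $m$ learnable weights. The non-linearity is left out, and `spectral_conv` and the ring graph are hypothetical.

```python
import numpy as np

def spectral_conv(L, x, w_hat):
    """Filter x using the first m = len(w_hat) Laplacian eigenvectors."""
    m = len(w_hat)
    _, U = np.linalg.eigh(L)       # columns sorted by ascending eigenvalue
    U_m = U[:, :m]                 # the m smoothest eigenvectors
    x_hat = U_m.T @ x              # 1. to the spectral domain
    y_hat = w_hat * x_hat          # 2. element-wise filtering, m parameters
    return U_m @ y_hat             # 3. back to the natural (node) domain

# Hypothetical ring graph on 8 nodes.
n = 8
A = np.zeros((n, n))
for v in range(n):
    A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1.0
L = np.diag(A.sum(axis=1)) - A
x = np.arange(n, dtype=float)

same = spectral_conv(L, x, np.ones(n))   # m = n with an all-ones filter
low = spectral_conv(L, x, np.ones(1))    # m = 1: only the smoothest component
print(same, low)
```

With all $n$ eigenvectors and an all-ones filter the input comes back unchanged, while keeping only the first eigenvector of this connected graph leaves the graph-wide mean of $x$ on every node.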

8.4 Spectral Convolutions are Node-Order Equivariant

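Like Laplacian polynomial filters, spectral convolutions are also node-order equivariant.

Drawbacks of spectral convolutions:

- The eigenvector matrix $U_m$ has to be computed from $L$, which is infeasible for large graphs
- Even once $U_m$ is available, the computation is inefficient because of the repeated multiplications by $U_m$ and $U_m^T$
- The learned filters are specific to the input graph, so they do not transfer to new graphs with a different structure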

8.5 Global Propagation via Graph Embeddings

A simpler way to incorporate graph-level information is to compute embeddings of the entire graph by pooling node (and possibly edge) embeddings, and then use the graph embedding to update node embeddings, following an iterative scheme. However, this approach ignores the underlying topology of the graph.

9 Learning GNN Parameters

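All the embedding computations discussed here, whether spectral or spatial, are completely differentiable. This allows GNNs to be trained end-to-end, given a suitable loss function $\mathcal{L}$:

Node Classification
The standard categorical cross-entropy is used. GNNs also adapt well to the semi-supervised setting, where the loss is computed only over the labelled nodes.
Graph Classification
By aggregating node representations, a vector representation of the entire graph can be built. This graph representation can be used for any graph-level task, including classification.
Link Prediction
Pairs of nodes are sampled from both adjacent and non-adjacent nodes and used as inputs to predict whether an edge exists, with a loss similar to that of logistic regression.
Node Clustering
The learned node representations are simply clustered.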

Another self-supervised approach is to enforce that adjacent nodes get similar embeddings, mimicking random-walk methods such as node2vec and DeepWalk.
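
Two of the losses above can be sketched as follows. The function names are hypothetical, and modelling the edge probability as $\sigma(y_v^Ty_u)$ in the link-prediction loss is an assumption of this sketch.

```python
import numpy as np

def masked_cross_entropy(probs, labels, labelled):
    """Mean categorical cross-entropy over the labelled nodes only.

    probs: (n, C) predicted class probabilities, labels: (n,) class ids,
    labelled: (n,) boolean mask marking the nodes whose label is known.
    """
    picked = probs[np.arange(len(labels)), labels]   # probability of the true class
    return -np.log(picked[labelled]).mean()

def link_loss(y_v, y_u, e_vu):
    """Logistic loss for one node pair; e_vu is 1 if the edge exists, else 0.

    Assumption: the edge probability is sigmoid(y_v . y_u).
    """
    p = 1.0 / (1.0 + np.exp(-(y_v @ y_u)))
    return -(e_vu * np.log(p) + (1 - e_vu) * np.log(1 - p))

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
labels = np.array([0, 1, 0])                 # the label of node 2 is a placeholder
labelled = np.array([True, True, False])     # node 2 is unlabelled and ignored
node_loss = masked_cross_entropy(probs, labels, labelled)

y_v, y_u = np.array([1.0, 0.0]), np.array([1.0, 0.0])
print(node_loss, link_loss(y_v, y_u, 1), link_loss(y_v, y_u, 0))
```

Nodes outside the mask contribute nothing to the loss, although they still take part in message-passing.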

10 Conclusion and Further Reading

Two surveys on graph neural networks are recommended: [29] [30]

GNNs in Practice
The GCN update equation can be rewritten in matrix form, which allows efficient vectorized implementations of GNNs on GPUs.
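
A minimal sketch of the vectorized form, assuming the mean-aggregation GCN update becomes $H^{(k)} = f\big(D^{-1}AH^{(k-1)}W^T + H^{(k-1)}B^T\big)$ when all node embeddings are stacked as rows of $H$. The function name, the use of `tanh` for $f$ and the example data are hypothetical.

```python
import numpy as np

def gcn_step_vectorized(A, H, W, B, f=np.tanh):
    """All nodes at once: f(D^{-1} A H W^T + H B^T).

    Assumes every node has at least one neighbour, so D is invertible.
    """
    deg = A.sum(axis=1, keepdims=True)          # node degrees, shape (n, 1)
    return f(((A @ H) / deg) @ W.T + H @ B.T)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
H = rng.normal(size=(3, 4))
W, B = rng.normal(size=(2, 4)), rng.normal(size=(2, 4))

fast = gcn_step_vectorized(A, H, W, B)
# Per-node reference: mean of neighbour rows, then the shared weights.
slow = np.stack([np.tanh(W @ H[A[v] > 0].mean(axis=0) + B @ H[v])
                 for v in range(3)])
print(np.allclose(fast, slow))
```

On real datasets $A$ is usually sparse, so a practical implementation would use sparse matrix products instead of the dense `A @ H` used here.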

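Regularization techniques such as Dropout can be applied directly to GNNs, and there are also graph-specific regularization techniques such as DropEdge.
Different Kinds of Graphs
This article focuses on undirected graphs, but simple variants of spatial convolutions exist for directed graphs, temporal graphs, and heterogeneous graphs.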
Pooling
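For graph-level tasks, pooling can be used to learn graph representations.
A simple approach is to aggregate the final node representations and pass the result through a predict function.
In addition, there are more powerful pooling methods: SortPool, DiffPool, SAGPool.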
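
The simple approach can be sketched as $h_G = \mathrm{PREDICT}(\mathrm{AGG}_{v\in G}\,h_v)$. In this sketch the aggregator is a mean and the predict function is a single linear map; both are hypothetical stand-ins.

```python
import numpy as np

def graph_readout(H, w_out, agg=np.mean):
    """h_G = predict(AGG_v h_v), with a hypothetical linear `predict`."""
    h_G = agg(H, axis=0)        # aggregate the final node embeddings
    return w_out @ h_G

H = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # final node embeddings
w_out = np.array([1.0, -1.0])

score = graph_readout(H, w_out)
score_reordered = graph_readout(H[::-1], w_out)       # same graph, nodes reordered
print(score, score_reordered)
```

Because the aggregator ignores the order of the rows, the graph-level output does not depend on how the nodes are numbered.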