CS224W: Machine Learning with Graphs
Stanford / Winter 2021
17-scalable
-
Consider training a GNN with standard SGD (node classification as an example)
- When computing the loss, randomly sample some nodes and backpropagate on them
- This treats the sampled nodes as isolated and independent of each other
- But nodes in a GNN depend on each other (they are not independent), so this way of computing the loss does not train GNNs effectively
-
Consider full-batch training
- Not enough GPU memory to load the entire graph
GraphSAGE Neighbor Sampling: Scaling up GNNs
-
Stochastic Training of GNNs
-
Key Insight: To compute the embedding of a single node, all we need is its K-hop neighborhood
-
Given a set of M different nodes in a mini-batch, we can generate their embeddings using M computational graphs
-
This way, we only need to load the nodes inside the mini-batch's computational graphs, which saves GPU memory
-
However, we still have to gather the entire K-hop neighborhood and run aggregation and transformation over it, which is computationally expensive
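To make this concrete, here is a minimal sketch (plain Python over a hypothetical adjacency-list `adj`) of collecting the K-hop neighborhood that one node's computational graph needs; a mini-batch of M nodes needs the union of these sets:

```python
from collections import deque

def k_hop_neighborhood(adj, node, K):
    """Collect all nodes within K hops of `node` by BFS.

    adj: hypothetical adjacency-list format, dict node -> list of neighbors.
    Returns the set of nodes a K-layer GNN must load to compute
    the embedding of `node`.
    """
    visited = {node}
    frontier = deque([(node, 0)])
    while frontier:
        u, depth = frontier.popleft()
        if depth == K:
            continue
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                frontier.append((v, depth + 1))
    return visited

def mini_batch_loaded_nodes(adj, batch, K):
    """A mini-batch of M nodes only needs the union of their K-hop neighborhoods."""
    needed = set()
    for node in batch:
        needed |= k_hop_neighborhood(adj, node, K)
    return needed
```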
-
-
Neighbor Sampling
-
For $k = 1, 2, \dots, K$:
- For each node in the $k$-th hop of the neighborhood, (randomly) sample at most $H_k$ neighbors
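A minimal sketch of this sampling loop, again over a hypothetical adjacency-list `adj`, where `H[k-1]` plays the role of the per-hop budget $H_k$:

```python
import random

def sample_computational_graph(adj, root, H):
    """Neighbor sampling for one node.

    adj: hypothetical adjacency-list format, dict node -> list of neighbors.
    H:   per-hop budgets [H_1, ..., H_K]; at hop k we keep at most H[k-1]
         randomly chosen neighbors of each node on the current frontier.
    Returns a dict mapping each visited node to the (pruned) neighbor list
    actually used when aggregating its messages.
    """
    sampled_edges = {}
    frontier = [root]
    for k in range(len(H)):                    # hops k = 1, ..., K
        next_frontier = []
        for u in frontier:
            neighbors = adj[u]
            kept = random.sample(neighbors, min(H[k], len(neighbors)))
            sampled_edges.setdefault(u, []).extend(kept)
            next_frontier.extend(kept)
        frontier = next_frontier
    return sampled_edges
```

In practice, graph libraries implement this for you; for example, PyTorch Geometric's `NeighborLoader` takes a `num_neighbors` list that plays the role of the $H_k$ budgets.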
-
-
-
How to sample the nodes
-
Random Sampling
- Fast, but often not optimal (may sample many "unimportant" nodes)
-
Random Walk with Restarts
-
Natural graphs are "scale-free": sampling random neighbors yields many low-degree "leaf" nodes
-
Strategy to sample important nodes
-
Compute the Random Walk with Restarts score $R_i$, starting the walks from the node whose embedding we are computing (the green node in the lecture figure)
-
At each level, sample the $H$ neighbors $i$ with the highest $R_i$
-
-
This strategy works much better in practice
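A minimal sketch of this strategy, assuming a hypothetical adjacency-list graph: estimate $R_i$ by Monte Carlo random walks with restarts, then keep the $H$ highest-scoring neighbors:

```python
import random
from collections import Counter

def rwr_scores(adj, start, num_walks=200, walk_len=10, restart_p=0.5):
    """Estimate Random Walk with Restarts visit counts R_i from `start`.

    adj: hypothetical adjacency-list format, dict node -> list of neighbors.
    """
    counts = Counter()
    for _ in range(num_walks):
        u = start
        for _ in range(walk_len):
            if random.random() < restart_p or not adj[u]:
                u = start                      # restart at the starting node
            else:
                u = random.choice(adj[u])      # step to a random neighbor
            counts[u] += 1
    return counts

def important_neighbors(adj, node, H, scores):
    """Keep the H neighbors i of `node` with the highest R_i."""
    return sorted(adj[node], key=lambda i: scores[i], reverse=True)[:H]
```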
-
-
Subgraph Sampling
-
Issues with Neighbor Sampling
-
The size of the computational graph grows exponentially with the number of GNN layers
-
Computation is redundant, especially when nodes in a mini-batch share many neighbors
-
-
Key Idea
- We can sample a small subgraph of the large graph and then perform efficient layer-wise node embedding updates over the subgraph
-
Subgraphs should retain edge connectivity structure of the original graph as much as possible
-
This way, the GNN over the subgraph generates embeddings closer to the GNN over the original graph
Cluster-GCN
-
Key Insight
- Sample a community as a subgraph. Each subgraph retains the essential local connectivity pattern of the original graph
-
Steps
-
Pre-processing
Given a large graph, partition it into groups of nodes (i.e., node-induced subgraphs)
-
We can use any scalable community detection methods, e.g., Louvain, METIS
-
Notice: between-group edges are not included in $G_1, \dots, G_C$
-
-
Mini-batch training
Sample one node group at a time. Apply GNN’s message passing over the induced subgraph
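A minimal sketch of the mini-batch loop, assuming the node groups have already been produced in pre-processing (e.g., by Louvain or METIS) and with `train_step` as a hypothetical placeholder for the GNN forward/backward pass:

```python
import random

def induced_subgraph(adj, group):
    """Node-induced subgraph of `group`: between-group edges are dropped."""
    group = set(group)
    return {u: [v for v in adj[u] if v in group] for u in group}

def cluster_gcn_training(adj, groups, train_step, num_steps):
    """Vanilla Cluster-GCN mini-batch loop.

    adj:        hypothetical adjacency-list format, dict node -> neighbors
    groups:     node groups produced in pre-processing by a scalable
                community detection method (e.g. Louvain, METIS)
    train_step: hypothetical callback that runs the GNN's message passing
                over the induced subgraph and performs one SGD update
    """
    for _ in range(num_steps):
        group = random.choice(groups)            # sample one node group
        sub_adj = induced_subgraph(adj, group)   # message passing stays inside it
        train_step(sub_adj, group)
```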
-
-
Issues with Cluster-GCN
-
The induced subgraph removes between-group links
- As a result, messages from other groups will be lost during message passing, which could hurt the GNN’s performance
-
Graph community detection algorithm puts similar nodes together in the same group
- A sampled node group tends to cover only a small, concentrated portion of the entire data
-
Sampled nodes are not diverse enough to be representative of the entire graph structure
-
As a result, the gradient averaged over the sampled nodes becomes unreliable
-
It fluctuates a lot from one node group to another
-
In other words, the gradient has high variance
-
-
Advanced Cluster-GCN
-
Key Idea
-
Aggregate multiple node groups per mini-batch
-
Partition the graph into relatively-small groups of nodes
-
For each mini-batch
-
Sample and aggregate multiple node groups
-
Construct the induced subgraph of the aggregated node group
-
The rest is the same as vanilla Cluster-GCN
-
-
-
Steps
-
Pre-processing
- Each partition needs to be small, so that even when multiple of them are aggregated, the resulting group is not too large
-
Mini-batch training
-
Randomly sample a set of $q$ node groups
-
Aggregate all nodes across the sampled node groups
-
Extract the induced subgraph (multiple groups together form one induced subgraph, so edges between the sampled clusters are preserved and the gradient is more stable)
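A minimal sketch of the advanced mini-batch construction under the same hypothetical adjacency-list format as the Cluster-GCN sketch above:

```python
import random

def advanced_cluster_batch(adj, groups, q):
    """Build one mini-batch for advanced Cluster-GCN.

    Randomly samples q node groups, aggregates their nodes, and extracts the
    induced subgraph of the union.  Edges *between* the sampled groups are
    kept, so the mini-batch gradient fluctuates less than with a single group.
    """
    sampled = random.sample(groups, q)           # q randomly chosen groups
    batch_nodes = set().union(*sampled)          # aggregate all of their nodes
    return {u: [v for v in adj[u] if v in batch_nodes] for u in batch_nodes}
```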
-
-
Simplifying GNNs
-
Simplify GCN by removing ReLU non-linearity
$\boldsymbol{H}^{(k+1)}=\widetilde{\boldsymbol{A}} \boldsymbol{H}^{(k)} \boldsymbol{W}_{k}^{\mathrm{T}}$
-
Removing ReLU significantly simplifies GCN!
$\boldsymbol{H}^{(K)}=\widetilde{\boldsymbol{A}}^{K} \boldsymbol{X} \boldsymbol{W}^{\mathrm{T}}$
- $\widetilde{\boldsymbol{A}}^{K} \boldsymbol{X}$ does not contain any learnable parameters; hence, it can be pre-computed
-
Let $\widetilde{\boldsymbol{X}}=\widetilde{\boldsymbol{A}}^{K} \boldsymbol{X}$ be the pre-computed matrix
-
Simplified GCN’s final embedding is
$\boldsymbol{H}^{(K)}=\widetilde{\boldsymbol{X}} \boldsymbol{W}^{\mathrm{T}}$
- It's just a linear transformation of the pre-computed matrix
-
Back to the node embedding form
$h_{v}^{(K)}=\boldsymbol{W} \widetilde{\boldsymbol{X}}_{v}$
- The embedding of node $v$ only depends on its own (pre-processed) feature
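A minimal NumPy sketch of the pre-computation, assuming a dense adjacency matrix `A` and feature matrix `X`, with $\widetilde{\boldsymbol{A}}$ taken to be the self-loop-augmented, symmetrically normalized adjacency commonly used by GCN:

```python
import numpy as np

def precompute_features(A, X, K):
    """Pre-compute X_tilde = A_tilde^K X (no learnable parameters involved).

    A: dense adjacency matrix (n x n), X: node feature matrix (n x d).
    """
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_tilde = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    X_tilde = X
    for _ in range(K):                           # K rounds of neighbor averaging
        X_tilde = A_tilde @ X_tilde
    return X_tilde

# The model is then just H = X_tilde @ W.T: a linear layer over the
# pre-computed features, trainable with ordinary mini-batch SGD since each
# row of X_tilde depends only on that node's pre-processed feature.
```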
-
-
Potential Issue of Simplified GCN
- Compared to the original GNN models, simplified GCN’s expressive power is limited due to the lack of non-linearity in generating node embeddings
-
Performance of Simplified GCN
- Surprisingly, on semi-supervised node classification benchmarks, simplified GCN works comparably to the original GNNs despite being less expressive.
-
When does Simplified GCN Work?
- Many node classification tasks exhibit homophily structure, i.e., nodes connected by edges tend to share the same target labels (since simplified GCN's pre-computation repeatedly aggregates information from a node's neighbors, it still performs reasonably well on node classification: neighboring nodes tend to share the same label, and this neighbor aggregation is exactly the mechanism by which label information propagates)