Graph Machine Learning Fundamentals: CS224W (17-scalable)

CS224W: Machine Learning with Graphs

Stanford / Winter 2021

17-scalable

  • Consider training a GNN with standard SGD (using node classification as the example)

    • To compute the loss, randomly pick some nodes and backpropagate through them

    • This treats the sampled nodes as isolated and mutually independent

    • But GNN nodes are inter-dependent through message passing, so this way of computing the loss cannot train GNNs effectively

  • Consider full-batch training

    • GPU memory is not large enough to hold the entire graph

GraphSAGE Neighbor Sampling: Scaling up GNNs

  • Stochastic Training of GNNs

    • Key Insight: To compute embedding of a single node, all we need is the K-hop neighborhood

    • Given a set of M different nodes in a mini-batch, we can generate their embeddings using M computational graphs

    • This way we only need to load the nodes inside each mini-batch’s computational graphs, which saves GPU memory (a sketch follows below)

    • However, we still need the entire K-hop neighborhood and must run aggregation over it, so the computation cost is large

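    To make this key insight concrete, here is a minimal pure-Python sketch (the function name and adjacency-dict format are illustrative assumptions, not from the lecture) that collects every node a $K$-layer GNN needs in order to embed one target node:

    ```python
    from collections import deque

    def k_hop_neighborhood(adj, v, K):
        """Collect all nodes a K-layer GNN needs to embed node v (BFS up to depth K)."""
        seen, frontier = {v}, deque([(v, 0)])
        while frontier:
            u, d = frontier.popleft()
            if d == K:
                continue                 # stop expanding beyond K hops
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    frontier.append((w, d + 1))
        return seen

    # Toy graph: edges 0-1, 0-2, 1-3
    adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
    print(k_hop_neighborhood(adj, 0, K=2))   # {0, 1, 2, 3}
    ```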

  • Neighbor Sampling

    • For $k = 1, 2, \ldots, K$

      • For each node in the $k$-th hop neighborhood

      • (Randomly) sample at most $H_k$ neighbors (see the sampling sketch below)

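    A hedged sketch of this loop in plain Python (the helper name, adjacency-dict format, and fanout list are illustrative assumptions; libraries such as PyTorch Geometric ship production-grade loaders for this):

    ```python
    import random

    def sample_neighbors(adj, seed_nodes, fanouts, rng=random):
        """Layer-wise neighbor sampling for a mini-batch of seed nodes.

        adj:     dict mapping node -> list of neighbors
        fanouts: [H_1, ..., H_K], max neighbors sampled per hop
        Returns one sampled edge list per hop, innermost hop first.
        """
        layers = []
        frontier = set(seed_nodes)
        for H_k in fanouts:
            sampled_edges, next_frontier = [], set()
            for v in frontier:
                # (Randomly) sample at most H_k neighbors of v
                chosen = rng.sample(adj[v], min(H_k, len(adj[v])))
                sampled_edges.extend((u, v) for u in chosen)
                next_frontier.update(chosen)
            layers.append(sampled_edges)
            frontier = next_frontier
        return layers

    # Toy usage: 2-layer GNN with fanouts H_1 = H_2 = 2
    adj = {0: [1, 2, 3], 1: [0, 4], 2: [0, 5], 3: [0], 4: [1], 5: [2]}
    print(sample_neighbors(adj, seed_nodes=[0], fanouts=[2, 2]))
    ```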

  • How to sample the nodes

    • Random Sampling

      • fast, but often not optimal (may sample many “unimportant” nodes)

    • Random Walk with Restarts

      • Natural graphs are “scale-free”, so sampling random neighbors picks many low-degree “leaf” nodes

      • Strategy to sample important nodes

        • Compute the Random Walk with Restarts score $R_i$ starting at the green node

        • At each level, sample the $H$ neighbors $i$ with the highest $R_i$ (sketched below)

      • This strategy works much better in practice

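      A rough sketch of this scoring strategy, estimating $R_i$ by simulating a restarting walk (the restart probability and step count are arbitrary illustrative choices):

      ```python
      import random
      from collections import Counter

      def rwr_scores(adj, start, restart_p=0.15, num_steps=10_000, rng=random):
          """Estimate Random Walk with Restarts scores R_i by simulation:
          R_i = (visits to node i) / (total steps)."""
          counts, v = Counter(), start
          for _ in range(num_steps):
              if rng.random() < restart_p:
                  v = start                # restart at the seed ("green") node
              else:
                  v = rng.choice(adj[v])   # otherwise step to a random neighbor
              counts[v] += 1
          return {i: c / num_steps for i, c in counts.items()}

      def top_h_neighbors(adj, v, scores, H):
          """Keep the H neighbors i of v with the highest R_i."""
          return sorted(adj[v], key=lambda i: scores.get(i, 0.0), reverse=True)[:H]

      # Nodes 1 and 2 form a triangle with 0; node 3 is a low-degree leaf
      adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
      R = rwr_scores(adj, start=0)
      print(top_h_neighbors(adj, 0, R, H=2))   # hubs 1, 2 typically beat leaf 3
      ```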

Subgraph Sampling

  • Issues with Neighbor Sampling

    • The size of the computational graph grows exponentially with the number of GNN layers: with fanout $H$ per hop and $K$ layers, a single node’s computational graph can contain up to $H^{K}$ leaf nodes

    • Computation is redundant, especially when nodes in a mini-batch share many neighbors


  • Key Idea

    • We can sample a small subgraph of the large graph and then perform the efficient layer-wise node embeddings update over the subgraph


    • Subgraphs should retain edge connectivity structure of the original graph as much as possible

    • This way, the GNN over the subgraph generates embeddings close to those it would generate over the original graph


Cluster-GCN

  • Key Insight

    • Sample a community as a subgraph. Each subgraph retains the essential local connectivity patterns of the original graph


  • Steps

    • Pre-processing

      Given a large graph, partition it into groups of nodes (i.e., node-induced subgraphs)

      • We can use any scalable community detection methods, e.g., Louvain, METIS

      • Notice: between-group edges are not included in $G_1, \ldots, G_C$


    • Mini-batch training

      Sample one node group at a time and apply the GNN’s message passing over the induced subgraph (see the sketch below)

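    Both steps might look like the following minimal NumPy sketch; the hard-coded `groups` stand in for the output of a real partitioner such as METIS or Louvain, and `gcn_layer` is a simplified mean-aggregation layer, not the exact Cluster-GCN update:

    ```python
    import numpy as np

    def induced_subgraph(edges, group):
        """Keep only edges with BOTH endpoints in the group
        (between-group edges are dropped, as in vanilla Cluster-GCN)."""
        g = set(group)
        return [(u, v) for u, v in edges if u in g and v in g]

    def gcn_layer(X, sub_edges, node_index, W):
        """One mean-aggregation message-passing layer over the subgraph.
        node_index maps global node id -> row of X."""
        agg, deg = np.zeros_like(X), np.zeros(len(X))
        for u, v in sub_edges:
            agg[node_index[v]] += X[node_index[u]]
            deg[node_index[v]] += 1
        deg = np.maximum(deg, 1)                 # guard isolated nodes
        return np.tanh((agg / deg[:, None]) @ W)

    # Toy graph: communities {0,1,2} and {3,4,5} joined by edge (2,3)
    edges = [(0,1),(1,0),(1,2),(2,1),(0,2),(2,0),
             (3,4),(4,3),(4,5),(5,4),(2,3),(3,2)]
    groups = [[0, 1, 2], [3, 4, 5]]              # pretend METIS/Louvain output
    features, W = np.random.randn(6, 4), np.random.randn(4, 4)

    for group in groups:                         # one node group per mini-batch
        sub_edges = induced_subgraph(edges, group)   # edge (2,3) gets dropped
        idx = {n: i for i, n in enumerate(group)}
        H = gcn_layer(features[group], sub_edges, idx, W)
        # ...compute the loss on labeled nodes in `group` and take an SGD step
    ```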

  • Issues with Cluster-GCN

    • The induced subgraph removes between-group links

      • As a result, messages from other groups will be lost during message passing, which could hurt the GNN’s performance


    • Graph community detection algorithm puts similar nodes together in the same group

      • A sampled node group tends to cover only a small, concentrated portion of the entire graph


    • Sampled nodes are not diverse enough to represent the entire graph structure

      • As a result, the gradient averaged over the sampled nodes becomes unreliable

      • It fluctuates a lot from one node group to another

      • In other words, the gradient has high variance

Advanced Cluster-GCN

  • Key Idea

    • Aggregate multiple node groups per mini-batch

    • Partition the graph into relatively-small groups of nodes

    • For each mini-batch

      • Sample and aggregate multiple node groups

      • Construct the induced subgraph of the aggregated node group

      • The rest is the same as vanilla Cluster-GCN


  • Steps

    • Pre-processing

      • The partitions need to be small so that even when multiple of them are aggregated, the resulting group is not too large

    • Mini-batch training

      • Randomly sample a set of $q$ node groups

      • Aggregate all nodes across the sampled node groups

      • Extract the induced subgraph (multiple groups form one induced subgraph, so the sampled clusters have edges between them and the gradient is more stable; see the sketch below)
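      A small sketch of this aggregated mini-batch construction (names are illustrative):

      ```python
      import random

      def advanced_cluster_batch(groups, q, edges, rng=random):
          """Randomly sample q node groups, aggregate them, and extract the
          induced subgraph; between-group edges among the sampled groups
          survive, which makes the gradient estimate more stable."""
          sampled = rng.sample(groups, q)
          batch_nodes = set().union(*sampled)
          batch_edges = [(u, v) for u, v in edges
                         if u in batch_nodes and v in batch_nodes]
          return batch_nodes, batch_edges

      groups = [[0, 1], [2, 3], [4, 5]]   # many small partitions from pre-processing
      edges  = [(0, 1), (2, 3), (4, 5), (1, 2), (3, 4)]
      nodes, sub = advanced_cluster_batch(groups, q=2, edges=edges)
      print(nodes, sub)                   # between-group edges inside the batch survive
      ```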

Simplifying GNNs

  • Simplify GCN by removing ReLU non-linearity

    $$\boldsymbol{H}^{(k+1)}=\widetilde{\boldsymbol{A}} \boldsymbol{H}^{(k)} \boldsymbol{W}_{k}^{\mathrm{T}}$$


    • Removing ReLU significantly simplifies GCN!

      $$\boldsymbol{H}^{(K)}=\widetilde{\boldsymbol{A}}^{K} \boldsymbol{X} \boldsymbol{W}^{\mathrm{T}}$$

      • $\widetilde{\boldsymbol{A}}^{K} \boldsymbol{X}$ does not contain any learnable parameters; hence, it can be pre-computed

    • Let $\widetilde{\boldsymbol{X}}=\widetilde{\boldsymbol{A}}^{K} \boldsymbol{X}$ be the pre-computed matrix

    • Simplified GCN’s final embedding is

      $$\boldsymbol{H}^{(K)}=\widetilde{\boldsymbol{X}} \boldsymbol{W}^{\mathrm{T}}$$

      • It’s just a linear transformation of the pre-computed matrix

    • Back to the node embedding form

      $$h_{v}^{(K)}=\boldsymbol{W} \widetilde{\boldsymbol{X}}_{v}$$

      • The embedding of node $v$ only depends on its own (pre-processed) feature (see the sketch below)

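    A minimal NumPy sketch of this pre-computation; note that the renormalized adjacency $\widetilde{\boldsymbol{A}}=\boldsymbol{D}^{-1/2}(\boldsymbol{A}+\boldsymbol{I})\boldsymbol{D}^{-1/2}$ used here is the standard GCN choice and is an assumption, since the notes above do not spell $\widetilde{\boldsymbol{A}}$ out:

    ```python
    import numpy as np

    def simplified_gcn_features(A, X, K):
        """Pre-compute X~ = A~^K X, with A~ = D^{-1/2}(A + I)D^{-1/2}
        (assumed renormalized adjacency; no learnable parameters involved)."""
        A_hat = A + np.eye(len(A))               # add self-loops
        d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
        A_tilde = d_inv_sqrt @ A_hat @ d_inv_sqrt
        X_tilde = X
        for _ in range(K):                       # K rounds of neighbor averaging
            X_tilde = A_tilde @ X_tilde
        return X_tilde

    # After pre-computation the "GNN" is just a linear model H = X~ W^T,
    # trainable with ordinary node-independent mini-batch SGD.
    A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
    X = np.random.randn(3, 4)
    X_tilde = simplified_gcn_features(A, X, K=2)
    W = np.random.randn(2, 4)                    # e.g., 2 output classes
    logits = X_tilde @ W.T                       # per-node class scores
    ```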

  • Potential Issue of Simplified GCN

    • Compared to the original GNN models, simplified GCN’s expressive power is limited due to the lack of non-linearity in generating node embeddings

  • Performance of Simplified GCN

    • Surprisingly, on semi-supervised node classification benchmarks, simplified GCN performs comparably to the original GNNs despite being less expressive

  • When does Simplified GCN Work?

    • Many node classification tasks exhibit homophily structure, i.e., nodes connected by edges tend to share the same target labels (since simplified GCN’s pre-computation repeatedly aggregates neighbor features, it still performs well on node classification: neighboring nodes tend to have the same label, and this neighbor aggregation is exactly a way of propagating label information)

