Notes: Survey on GNN Acceleration


Survey on Graph Neural Network Acceleration: An Algorithmic Perspective
Xin Liu, Mingyu Yan, Lei Deng, Guoqi Li, Xiaochun Ye, Dongrui Fan, Shirui Pan and Yuan Xie

1 Graph-level Optimizations

Graph Sampling

Graph sampling methods generally select a subset of nodes, either from a node's neighborhood or from one layer of the graph, to construct subgraphs for subsequent training, which makes the computation of model training efficient. A minimal node-wise sampling sketch follows the list below.

  • Node-wise sampling methods

    • Node-wise sampling is a fundamental sampling method that focuses on each node and its neighbors in a training graph
    • Existing works
      • GraphSAGE - randomly samples a fixed-size set of neighbors for each node, typically within 2 hops
      • VR-GCN - restricts the number of neighbors
  • Layer-wise sampling methods

    • Layer-wise sampling samples a fixed number of nodes in each layer based on pre-computed probability
    • Existing works
      • FastGCN - samples a certain number of nodes per layer and reconstructs connections between two layers
      • AS-GCN - samples nodes according to the parent nodes sampled in the upper layer
  • Subgraph-based sampling methods

    • Subgraph-based sampling generates subgraphs for training in a two-step manner: sampling nodes and constructing connections (edges)
    • Moreover, subgraph sampling can be parallelized at the processing unit level
    • Existing works
      • GraphSAINT - utilizes a node sampler, an edge sampler, and a random walk sampler to construct subgraphs
  • Heterogeneous sampling methods

    • Heterogeneous sampling is designed to accelerate the training and handle the graph heterogeneity, i.e., imbalanced number and type of neighbors in a node’s neighborhood
    • Existing works
      • HetGNN - collects neighbors via random walks, groups them by type, and further samples them based on visit frequency
      • HGT - samples each node type in turn so that the sampled neighborhood has a balanced distribution over types
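
As a concrete illustration of node-wise sampling, the sketch below performs GraphSAGE-style uniform neighbor sampling on a toy adjacency dict. The function name sample_neighborhood, the dict-based graph format, and the fanout values are illustrative assumptions, not part of the survey or of any library API.

```python
import random

def sample_neighborhood(adj, seed_nodes, fanouts, rng=random):
    """Node-wise (GraphSAGE-style) sampling sketch: for each hop, keep at most
    fanouts[hop] randomly chosen neighbors of the current frontier."""
    layers = [sorted(seed_nodes)]          # nodes needed at each hop
    frontier = set(seed_nodes)
    for fanout in fanouts:                 # e.g. fanouts = [3, 2] for 2 hops
        next_frontier = set()
        for node in frontier:
            neighbors = adj.get(node, [])
            if len(neighbors) > fanout:    # sub-sample large neighborhoods
                neighbors = rng.sample(neighbors, fanout)
            next_frontier.update(neighbors)
        layers.append(sorted(next_frontier))
        frontier = next_frontier
    return layers

# Toy usage: 2-hop sampling around node 0 on a hand-made adjacency dict.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 5], 3: [0], 4: [0, 5], 5: [2, 4]}
print(sample_neighborhood(adj, [0], fanouts=[3, 2]))
```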

Graph Sparsification

Graph sparsification methods typically remove task-irrelevant edges from a graph according to a specific optimization goal. A DropEdge-style sketch follows the list below.

  • Heuristic sparsification

    • DropEdge - randomly removes edges in each training epoch
    • FastGAT - presents a resistance-based spectral graph sparsification solution to remove useless edges, reducing the number of attention coefficients in a GAT model
  • Learnable module for sparsification

    • NeuralSparse - utilizes a DNN to learn a sparsification strategy from the feedback of downstream tasks during training, thereby casting graph sparsification as an optimization problem
    • GAUG - utilizes an edge predictor to acquire probabilities of edges in a graph
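
A minimal sketch of the DropEdge idea, assuming edges are stored as a (2, E) NumPy COO array; the helper drop_edge and the toy edge list are hypothetical, and real implementations operate on the framework's own graph and edge structures.

```python
import numpy as np

def drop_edge(edge_index, drop_rate, rng=None):
    """DropEdge-style heuristic sparsification sketch: before each training
    epoch, keep every edge independently with probability 1 - drop_rate."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(edge_index.shape[1]) >= drop_rate
    return edge_index[:, keep]

# Toy usage: edge_index is a (2, E) COO array of (src, dst) pairs.
edges = np.array([[0, 0, 1, 2, 3],
                  [1, 2, 2, 3, 0]])
print(drop_edge(edges, drop_rate=0.3))
```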

Graph Partition

Graph partition is an NP-hard problem whose goal is to divide the original graph into smaller subgraphs built from mutually exclusive groups of nodes. A toy greedy partition sketch follows the list below.

  • Graph partition for generic GNN training

    • DistGNN - applies a vertex-cut graph partition method, which keeps replicas of the nodes that are split apart from their neighbors
    • DistDGL - partitions a graph using METIS, and further colocates node/edge features with the graph partitions
  • Graph partition for sampling-based GNN training

    • BGL - utilizes a graph partition method to minimize cross-partition communication in sampling subgraphs during GNN training, in which multisource BFS and greedy assignment heuristics are fully leveraged
    • Cluster-GCN - partitions a graph into multiple clusters using METIS, after which these clusters are randomly sampled to construct subgraphs
    • PaGraph - proposes a GNN-aware graph partition method to ensure workload-balanced partitions and avoid cross-partition visits during sampling
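
The partitioners above (METIS, DistGNN's vertex-cut scheme, BGL's multi-source BFS) are far more sophisticated; the sketch below only illustrates the shared objective, i.e., balanced partitions with few cross-partition edges, via a simple greedy streaming heuristic. greedy_partition and the toy graph are hypothetical, not any system's actual algorithm.

```python
from collections import defaultdict

def greedy_partition(adj, num_parts):
    """Toy streaming partition heuristic (not METIS): place each node in the
    partition that already holds most of its neighbors, under a size cap,
    so cross-partition edges stay roughly minimized and sizes stay balanced."""
    cap = (len(adj) + num_parts - 1) // num_parts   # balanced size limit
    assignment, sizes = {}, defaultdict(int)
    for node in sorted(adj):                        # stream nodes in fixed order
        scores = defaultdict(int)
        for nbr in adj[node]:
            if nbr in assignment:
                scores[assignment[nbr]] += 1        # prefer parts with neighbors
        candidates = [p for p in range(num_parts) if sizes[p] < cap]
        part = max(candidates, key=lambda p: (scores[p], -sizes[p]))
        assignment[node] = part
        sizes[part] += 1
    return assignment

# Toy usage on a 6-node graph split into 2 partitions.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3, 5], 5: [3, 4]}
print(greedy_partition(adj, num_parts=2))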

2 Model-level Optimizations

GNN Simplification

The forward propagation in the $l$-th layer of GCN training can be formulated as
$$\textbf{H}^l=\sigma(\tilde{\textbf{D}}^{-1/2}\tilde{\textbf{A}}\tilde{\textbf{D}}^{-1/2}\textbf{H}^{l-1}\textbf{W}^{l-1}),$$
in which linear aggregation of neighboring information and a nonlinear activation are combined to update node representations. However, recent literature has argued that the efficiency of GNNs can be further improved by simplifying redundant operations.
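
For reference, a dense NumPy sketch of this forward rule; it mirrors the formula above (self-loops added to form the normalized adjacency, then linear transform and ReLU) but is not an efficient or official implementation, and the toy graph and shapes are made up.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H_out = ReLU(D^{-1/2} (A + I) D^{-1/2} H W),
    written with dense arrays purely for readability."""
    A_tilde = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # degrees of A_tilde
    S = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    return np.maximum(S @ H @ W, 0.0)                # sigma = ReLU here

# Toy usage: 4 nodes, 3-dim input features, 2-dim output features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = np.random.randn(4, 3)
W = np.random.randn(3, 2)
print(gcn_layer(A, H, W).shape)   # (4, 2)
```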

  • Generic simplified GNN

    • SGC - removes the nonlinear activation, i.e., ReLU, between each layer, with only the final Softmax function preserved, so the whole model collapses to (see the pre-computation sketch after this list):
      $$Output=\text{Softmax}(\textbf{S}\cdots\textbf{S}\textbf{S}\textbf{X}\textbf{W}^1\textbf{W}^2\cdots\textbf{W}^l)$$
      • Text and graph classification tasks
  • Special simplified GNN with tasks related

    • LightGCN and UltraGCN aim to simplify the GNN model for learning embeddings from user-item interaction graphs in a recommender system
    • Existing works
      • LightGCN - abandons feature transformation and nonlinear activation, and drops the self-loops in $\textbf{A}$, to simplify the GCN model used in collaborative filtering (CF)
      • UltraGCN - proposes to abandon explicit message passing among multiple layers in LightGCN and get approximate ultimate embeddings directly
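
A sketch of the SGC-style simplification, assuming the same symmetric normalization as in the GCN formula above: because the nonlinearities between layers are gone, the repeated propagation can be pre-computed once, after which training is just a softmax regression on the smoothed features. The function and variable names are illustrative.

```python
import numpy as np

def sgc_precompute(A, X, k):
    """SGC-style pre-computation sketch: with the nonlinearities removed,
    S^k X is computed once up front, so training reduces to a softmax
    (logistic) regression on the smoothed features."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    S = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]  # normalized adjacency
    for _ in range(k):                                       # k rounds of smoothing
        X = S @ X
    return X        # feed into Softmax(X W) with a single collapsed weight W

# Toy usage: pre-smooth features once, reuse them for every training epoch.
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
X = np.random.randn(3, 4)
print(sgc_precompute(A, X, k=2).shape)   # (3, 4)
```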

GNN Compression

Model compression is a technique that compresses a complicated model, such as a DNN, into a lightweight one that typically preserves far fewer parameters.

  • Model quantization

    • Binarized DGCNN and BiGCN - both introduce binarization strategies into GNNs, compressing model parameters and graph features to yield significant acceleration in inference
    • Degree-Quant - proposes a quantization-aware training method for GNNs
  • Knowledge distillation

    • Knowledge distillation (KD) is a technique that extracts knowledge from a teacher model and injects it into a smaller student model that maintains similar performance, which in turn accelerates model inference (a generic soft-target KD loss is sketched after this list)
    • Existing works
      • Yang et al. (2020) - propose to preserve the local structure of a teacher model via a specially designed module
      • Yang et al. (2021) - present an effective KD framework in which a specially built student model jointly benefits from a teacher model and prior knowledge
      • TinyGNN - bridges the gap in extracting neighbor information between a teacher model and a student model via the combined use of a peer-aware module and a neighbor distillation strategy
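
To make the KD idea concrete, here is a generic soft-target distillation loss in PyTorch; it is not the specific objective of Yang et al. or TinyGNN, only the standard teacher-student formulation (temperature-scaled KL divergence plus cross-entropy) that such GNN KD methods build on. The hyperparameter values and toy tensors are placeholders.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Generic soft-target KD loss sketch: the student matches the teacher's
    temperature-softened predictions while still fitting ground-truth labels."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage: 5 nodes, 3 classes; teacher logits would come from a trained GNN.
student = torch.randn(5, 3, requires_grad=True)
teacher = torch.randn(5, 3)
labels = torch.randint(0, 3, (5,))
print(distillation_loss(student, teacher, labels))
```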