[Paper Close Reading] BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks

Version 2, remastered on 2024.4.28

Paper link: BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks | IEEE Journals & Magazine | IEEE Xplore

Paper code: GitHub - HennyJie/BrainGB: Officially Accepted to IEEE Transactions on Medical Imaging (TMI, IF: 11.037) - Special Issue on Geometric Deep Learning in Medical Imaging.


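The English is typed entirely by hand! It is a summary and paraphrase of the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution!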

1. TL;DR

1.1. Paper summary figure

2. Section-by-section close reading

2.1. Abstract

        ①At present, there is still a lack of systematic research on brain network analysis

        ②They propose the Brain Graph Neural Network Benchmark (BrainGB), which summarizes brain network construction pipelines and modularizes the implementation of GNN designs

2.2. Introduction

        ①The interactions between brain regions are decisive factors when analysing neurology and brain diseases

        ②Their contributions are: a) establishing a unified framework and evaluation criteria, b) summarizing the preprocessing and construction pipelines for fMRI and sMRI brain networks, c) providing modular GNN baselines along four dimensions: node features, message passing mechanisms, attention mechanisms, and pooling strategies

        ③Overall framework:


motif  n. a theme (of a literary or musical work); a decorative pattern; a main idea

2.3. Preliminaries

2.3.1. Brain Network Analysis

        ①The brain network dataset is \mathcal{D}=\{\mathcal{G}_{n},y_{n}\}_{n=1}^{N} with N subjects, where \mathcal{G}_{n}=\{\mathcal{V}_{n},\mathcal{E}_{n}\}, y_{n} is the true label, \mathcal{V}_n=\mathcal{V}=\{v_i\}_{i=1}^M denotes the M nodes (ROIs), and \mathcal{E}_{n} denotes the edges. The model outputs the prediction \hat{y}_n

        ②Graph kernels and tensor factorization are too shallow to analyse the complicated brain structure

        ③The adjacency matrix W_{n} \in \mathbb{R}^{M\times M} is weighted (I am not sure whether it directly replaces the adjacency matrix A, but A can also be generated from W)

aberration  n. abnormal behaviour; an anomaly; a departure from the norm

2.3.2. Graph Neural Networks

        ①There are 3 differences between brain networks and other graphs: a) brain networks lack node features, b) connection weights can be positive or negative, c) the set of ROIs is fixed

2.4. Brain Network Dataset Construction

2.4.1. Background: Diverse Modalities of Brain Imaging

        There are many brain scanning technologies: Magnetic-Resonance Imaging (MRI), Electroencephalography (EEG), Magnetoencephalography (MEG), Positron Emission Tomography (PET), Single-Photon Emission Computed Tomography (SPECT), X-ray Computed Tomography (CT), etc.

(1)MRI Data

        ①Functional MRI (fMRI) indicates changes in blood oxygen and blood flow and reveals the functional activities

        ②Diffusion-weighted MRI (dMRI) fits brain structure through molecular (usually water) motion trajectories

trajectory  n. a path; the path of a projectile through the air; a ballistic curve

(2)Challenges in MRI Preprocessing

        ①There are preprocessing tools like SPM, AFNI and FSL. However, learning and using them takes considerable time

        ②No single tool covers all the preprocessing functions needed for dMRI

        ③The limited public availability of datasets is also a big problem

        ④For different modalities, they need different methods of preprocessing

2.4.2. Brain Network Construction From Raw Data

(1)Functional Brain Network Construction

        ①Some preprocessing functions in different tools:

        ②Pairwise connectivity between ROIs can be measured with partial correlations, mutual information, coherence, Granger causality, etc.

(2)Structural Brain Network Construction

        ①Some preprocessing functions in different tools:

2.4.3. Discussions

        The combination of sMRI and fMRI might be more effective than single modality

metabolic  adj. relating to metabolism

2.5. GNN Baselines for Brain Network Analysis

2.5.1. Node Feature Construction

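        ①Identity: give each node a one-hot feature vector

        ②Eigen: eigen-decompose the weighted adjacency matrix and use the top k eigenvectors to give each node a k-dimensional feature vector (similar in spirit to PCA)

        ③Degree: a one-dimensional vector that records the degree of the node

        ④Degree profile:

\boldsymbol{x}_i=[\deg(v_i)\parallel\min(\mathcal{D}_i)\parallel\max(\mathcal{D}_i)\parallel\text{mean}(\mathcal{D}_i)\parallel\text{std}(\mathcal{D}_i)]

where \mathcal{D}_i=\{\deg(v_j)\mid(i,j)\in\mathcal{E}\} collects the degree values of the neighbors of v_i

        ⑤Connection profile: the row of the adjacency matrix that corresponds to a node is used directly as its node feature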
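The five node feature options above can be sketched in a few lines of NumPy. This is only a rough illustration and not the BrainGB implementation: the function name `node_features`, the use of unweighted degrees on the binarized graph, and picking eigenvectors by largest absolute eigenvalue are all my own assumptions.

```python
import numpy as np

def node_features(W, kind="connection", k=2):
    """Build node features from a weighted adjacency matrix W of shape (M, M)."""
    M = W.shape[0]
    A = (W != 0).astype(float)      # assumption: degrees are counted on the binarized graph
    np.fill_diagonal(A, 0.0)        # ignore self-loops
    deg = A.sum(axis=1)
    if kind == "identity":          # one-hot vector per node
        return np.eye(M)
    if kind == "degree":            # one-dimensional degree feature
        return deg[:, None]
    if kind == "degree_profile":    # [deg, min, max, mean, std] over neighbor degrees
        rows = []
        for i in range(M):
            nbr = deg[A[i] > 0]
            if nbr.size == 0:       # isolated node: fall back to zeros
                nbr = np.zeros(1)
            rows.append([deg[i], nbr.min(), nbr.max(), nbr.mean(), nbr.std()])
        return np.array(rows)
    if kind == "eigen":             # assumption: keep the k eigenvectors with largest |eigenvalue|
        vals, vecs = np.linalg.eigh((W + W.T) / 2)
        top = np.argsort(-np.abs(vals))[:k]
        return vecs[:, top]
    if kind == "connection":        # each node's row of W
        return W.copy()
    raise ValueError(f"unknown kind: {kind}")

# Toy 4-node network: node 1 is connected to nodes 0, 2 and 3.
W = np.array([[0.0, 0.5, 0.0, 0.0],
              [0.5, 0.0, 0.2, 0.3],
              [0.0, 0.2, 0.0, 0.0],
              [0.0, 0.3, 0.0, 0.0]])
profile = node_features(W, "degree_profile")
```

For the toy network, node 0 has one neighbor (node 1, with degree 3) and node 1 has three neighbors that each have degree 1, which is what the degree profile rows encode.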

2.5.2. Message Passing Mechanisms

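        ①The node feature \boldsymbol{h}_i^{l} in layer l first aggregates messages from its neighbors through a sum operation:

\boldsymbol{m}_{i}^{l}=\sum_{j\in\mathcal{N}_{i}}\boldsymbol{m}_{ij}=\sum_{j\in\mathcal{N}_{i}}M_{l}\left(\boldsymbol{h}_{i}^{l},\boldsymbol{h}_{j}^{l},w_{ij}\right)

where \mathcal{N}_{i} represents all the neighbors of node v_i, w_{ij} denotes the edge weight between nodes v_i and v_j, and M_l denotes the message function

        ②The node feature is then updated with:

\boldsymbol{h}_{i}^{l+1}=U_{l}\left(\boldsymbol{h}_{i}^{l},\boldsymbol{m}_{i}^{l}\right)

where U_l can be any differentiable function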

        ③The message vector \boldsymbol{m}_{ij} can be designed in the following ways:

Edge weighted: aggregation as in GCN, m_{ij}=\boldsymbol{h}_{j}\cdot w_{ij}, so the value of m_{ij} depends directly on the edge weight
Bin concat: set T buckets, with T tried in [5, 10, 15, 20]. Rank all the edge weights in ascending order and divide them into T buckets; each bucket has its own representation \boldsymbol{b}_t. An MLP then follows: m_{ij}=\mathrm{MLP}(\boldsymbol{h}_{j}\parallel\boldsymbol{b}_{t}). It helps to group similar connections.
Edge weight concat: m_{ij}=\mathrm{MLP}(\boldsymbol{h}_{j}\parallel d\cdot w_{ij}), where d is the dimension of the node features. This scaling increases the influence of the edge feature
Node edge concat: m_{ij}=\mathrm{MLP}(\boldsymbol{h}_{i}\parallel\boldsymbol{h}_{j}\parallel w_{ij}). It can reduce the over-smoothing problem because "each message passed from the local neighbors of each central node is reinforced with its representation from the last time step" (? I do not quite understand this. Isn't it just a concat of the two nodes? What does it have to do with the previous step?)
Node concat: m_{ij}=\mathrm{MLP}(\boldsymbol{h}_{i}\parallel\boldsymbol{h}_{j})
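To make the five message designs concrete, here is a small NumPy sketch for a single edge (i, j), plus the sum aggregation over neighbors. Everything in it is illustrative: the helper names (`make_mlp`, `bucket_ids`, `message`, `aggregate`) are hypothetical, the "MLP" is a single random ReLU layer standing in for a trained network, and the bucket embeddings are random instead of learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(in_dim, out_dim):
    """Stand-in for a trained MLP: one fixed random linear layer followed by ReLU."""
    weights = rng.normal(size=(in_dim, out_dim))
    return lambda x: np.maximum(x @ weights, 0.0)

def bucket_ids(W, T):
    """Rank all edge weights in ascending order and split the ranks into T buckets."""
    flat = W.ravel()
    ranks = flat.argsort().argsort()
    return (ranks * T // flat.size).reshape(W.shape)

def message(kind, h, W, i, j, mlp=None, bucket_emb=None, buckets=None):
    """Message m_ij sent from neighbor j to center node i."""
    d = h.shape[1]
    if kind == "edge_weighted":
        return h[j] * W[i, j]
    if kind == "bin_concat":
        return mlp(np.concatenate([h[j], bucket_emb[buckets[i, j]]]))
    if kind == "edge_weight_concat":
        return mlp(np.concatenate([h[j], [d * W[i, j]]]))
    if kind == "node_edge_concat":
        return mlp(np.concatenate([h[i], h[j], [W[i, j]]]))
    if kind == "node_concat":
        return mlp(np.concatenate([h[i], h[j]]))
    raise ValueError(f"unknown kind: {kind}")

def aggregate(kind, h, W, i, **kwargs):
    """m_i: sum of messages from every neighbor j with a non-zero edge to i."""
    nbrs = [j for j in range(W.shape[0]) if j != i and W[i, j] != 0]
    return sum(message(kind, h, W, i, j, **kwargs) for j in nbrs)

M, d, T = 5, 3, 5
h = rng.normal(size=(M, d))                      # node features in the current layer
W = rng.normal(size=(M, M)); W = (W + W.T) / 2   # signed, symmetric edge weights
buckets = bucket_ids(W, T)
bucket_emb = rng.normal(size=(T, 2))             # one 2-d representation per bucket

m_ew = message("edge_weighted", h, W, 0, 1)
m_bin = message("bin_concat", h, W, 0, 1, mlp=make_mlp(d + 2, d),
                bucket_emb=bucket_emb, buckets=buckets)
m_ewc = message("edge_weight_concat", h, W, 0, 1, mlp=make_mlp(d + 1, d))
m_nec = message("node_edge_concat", h, W, 0, 1, mlp=make_mlp(2 * d + 1, d))
m_nc = message("node_concat", h, W, 0, 1, mlp=make_mlp(2 * d, d))
m_0 = aggregate("edge_weighted", h, W, 0)
```

Note that only the two "node" variants take h[i] as input, so they are the only ones where the center node's own representation from the previous layer enters the message.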

2.5.3. Attention-Enhanced Message Passing

        ①The attention mechanism is useful for focusing on important information

        ②Unlike traditional graph attention mechanisms, such as those used on molecular graphs, attention on brain graphs should rely more on edge features and less on node features

        ③So the attention will be:

Attention weighted

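The original GAT without edge features: m_{ij}=\boldsymbol{h}_{j}\cdot\alpha_{ij}, where \alpha_{ij} denotes the corresponding attention score, computed by a single-layer feed-forward neural network with a LeakyReLU nonlinearity and normalized over the neighborhood:

\alpha_{ij}=\frac{\exp\left(\sigma\left(\boldsymbol{a}^{\top}[\Theta\boldsymbol{x}_{i}\parallel\Theta\boldsymbol{x}_{j}]\right)\right)}{\sum_{k\in\mathcal{N}_{i}\cup\{i\}}\exp\left(\sigma\left(\boldsymbol{a}^{\top}[\Theta\boldsymbol{x}_{i}\parallel\Theta\boldsymbol{x}_{k}]\right)\right)}

The authors do not give values for the learnable linear transformation matrix \Theta and the weight vector \boldsymbol{a} (both are learnable parameters, so they are trained rather than preset)

The authors say \sigma is the LeakyReLU nonlinearity; is it an operation (function) or a value? (It is a function: \sigma(x)=x for x>0 and a small slope times x otherwise)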

Edge weighted w/ attn

enhanced version of "edge weighted" in GCN:

m_{ij}=h_{j}\cdot\alpha_{ij}\cdot w_{ij}

Attention edge sum

another enhanced version of "edge weighted" in GCN:

m_{ij}=\boldsymbol{h}_{j}\cdot(\alpha_{ij}+w_{ij})
Node edge concat w/ attn

enhanced version of "node edge concat" in GCN:

m_{ij}=\mathrm{MLP}(h_{i}\parallel(h_{j}\cdot\alpha_{ij})\parallel w_{ij})

Node concat w/ attn

enhanced version of "node concat" in GCN:

m_{ij}=\mathrm{MLP}(h_i\parallel(h_j\cdot\alpha_{ij}))
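As a sanity check on my understanding, the GAT-style attention score and the three attention variants without an MLP can be sketched in NumPy as below. This is only a sketch under assumptions: \Theta and \boldsymbol{a} are random here instead of trained, the LeakyReLU slope of 0.2 is my own choice, and the "attention edge sum" form h_j(\alpha_{ij}+w_{ij}) is my reading of the paper. The two concat variants would simply feed h_j \cdot \alpha_{ij} into an MLP together with h_i (and w_{ij}).

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """LeakyReLU is a function: identity for x > 0, slope * x otherwise."""
    return np.where(x > 0, x, slope * x)

def attention_scores(X, W, Theta, a):
    """alpha[i, j] = softmax over j in N_i plus i of LeakyReLU(a^T [Theta x_i || Theta x_j])."""
    Z = X @ Theta.T                                  # transformed features, shape (M, d')
    d_out = Z.shape[1]
    # a^T [z_i || z_j] splits into a[:d'] . z_i + a[d':] . z_j
    e = leaky_relu((Z @ a[:d_out])[:, None] + (Z @ a[d_out:])[None, :])
    mask = (W != 0) | np.eye(len(W), dtype=bool)     # neighbors plus the node itself
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)             # numerically stable softmax
    p = np.exp(e)
    return p / p.sum(axis=1, keepdims=True)

def attn_message(kind, h, W, alpha, i, j):
    """Attention-enhanced message m_ij (variants that need no MLP)."""
    if kind == "attention_weighted":
        return h[j] * alpha[i, j]
    if kind == "edge_weighted_attn":
        return h[j] * alpha[i, j] * W[i, j]
    if kind == "attention_edge_sum":
        return h[j] * (alpha[i, j] + W[i, j])
    raise ValueError(f"unknown kind: {kind}")

rng = np.random.default_rng(0)
W = np.array([[0.0, 0.5, 0.0, 0.0],
              [0.5, 0.0, -0.2, 0.3],
              [0.0, -0.2, 0.0, 0.0],
              [0.0, 0.3, 0.0, 0.0]])
X = rng.normal(size=(4, 3))          # node features
Theta = rng.normal(size=(2, 3))      # random stand-in for the learnable matrix
a = rng.normal(size=4)               # random stand-in for the learnable vector
alpha = attention_scores(X, W, Theta, a)
m = attn_message("attention_edge_sum", X, W, alpha, 1, 2)
```

Each row of alpha sums to 1, and pairs that are not connected get a score of exactly 0.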

2.5.4. Pooling Strategies

        ①The pooling (readout) operator R maps the node representations of one graph to a graph-level representation:

g_{n}=R\left(\{h_{k}\mid v_{k}\in\mathcal{G}_{n}\}\right)

        ②Provided pooling methods:

Mean pooling: g_{n}=\frac{1}{M}\sum_{k=1}^{M}\boldsymbol{h}_{k}
Sum pooling: g_{n}=\sum_{k=1}^{M}\boldsymbol{h}_{k}
Concat pooling: g_{n}=\parallel_{k=1}^{M}\boldsymbol{h}_{k}=\boldsymbol{h}_{1}\parallel\boldsymbol{h}_{2}\parallel\ldots\parallel\boldsymbol{h}_{M}

        ③They consider more complex pooling methods, such as hierarchical pooling, learnable pooling and clustering readout, to be independent GNN architectures rather than combinable modules, so these are not provided.
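The three provided readouts are simple enough to write out directly. A minimal NumPy sketch, where `readout` is a hypothetical helper name and H stacks the M node representations of one graph row by row:

```python
import numpy as np

def readout(H, kind="mean"):
    """Pool node representations H of shape (M, d) into one graph-level vector."""
    if kind == "mean":
        return H.mean(axis=0)       # shape (d,)
    if kind == "sum":
        return H.sum(axis=0)        # shape (d,)
    if kind == "concat":
        return H.reshape(-1)        # shape (M * d,): h_1 || h_2 || ... || h_M
    raise ValueError(f"unknown kind: {kind}")

H = np.arange(6, dtype=float).reshape(3, 2)   # 3 nodes, 2 features each
g_mean = readout(H, "mean")
g_sum = readout(H, "sum")
g_concat = readout(H, "concat")
```

Concat pooling only works here because every subject has the same fixed set of M ROIs, so the output dimension M * d is the same for every graph.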

2.6. Experimental Analysis and Insights

2.6.1. Experimental Settings

(1)Datasets

        ①Four basic datasets: fMRI (HIV, PNC, ABCD) and dMRI (PPMI)

        ②Tasks: disease classification in HIV and PPMI, sex classification in PNC and ABCD

        ③Overall information of datasets:

        ④Human Immunodeficiency Virus Infection (HIV): 35 early HIV patients and 35 seronegative controls. Preprocessing procedures are: a) realignment to the first volume, b) slice timing correction, c) normalization, d) spatial smoothing, e) band-pass filtering, f) linear trend removal of the time series. (Strangely, the number of ROIs is 116 but the network only contains 90 brain regions, and how they were selected is not explained)

        ⑤Philadelphia Neuroimaging Cohort (PNC): 289 (57.46%) subjects are female. Preprocessing procedures are: a) slice timing correction, b) motion correction, c) registration, d) normalization, e) removal of linear trends, f) bandpass filtering, g) spatial smoothing. Also, they keep only 232 of the 264 ROIs.

        ⑥Parkinson’s Progression Markers Initiative (PPMI): 596 Parkinson’s disease patients and 158 HC. Preprocessing procedures are: a) alignment to correct for head motion and eddy current distortions, b) removal of non-brain tissue, followed by linear alignment and registration of the skull-stripped images. The number of ROIs is 84. The brain network is reconstructed with the deterministic 2nd-order Runge-Kutta (RK2) whole-brain tractography algorithm.

        ⑦Adolescent Brain Cognitive Development Study (ABCD): subjects are children aged 9-10 from 21 sites. 3961 (50.1%) are female. Preprocessed by the ABCD-HCP BIDS fMRI Pipeline.

        ⑧⭐For sMRI, each edge weight is standardized by dividing it by the maximum edge weight in the sample, so that all values lie in [0,1]. For fMRI, negative values are removed for GCN and kept for GAT (GCN cannot handle them).
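The two edge-weight treatments in ⑧ can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names are hypothetical, and I interpret "removing" a negative weight as setting it to 0.

```python
import numpy as np

def normalize_structural(W):
    """Divide every edge weight by the sample's maximum weight, giving values in [0, 1]."""
    w_max = W.max()
    return W / w_max if w_max > 0 else W.copy()

def drop_negative(W):
    """Assumption: removing negative functional weights means setting them to 0."""
    return np.where(W < 0, 0.0, W)

W_struct = np.array([[0.0, 4.0, 1.0],
                     [4.0, 0.0, 2.0],
                     [1.0, 2.0, 0.0]])
W_func = np.array([[1.0, -0.3, 0.6],
                   [-0.3, 1.0, -0.1],
                   [0.6, -0.1, 1.0]])
W_struct_norm = normalize_structural(W_struct)
W_func_gcn = drop_negative(W_func)      # for GCN; GAT would use W_func as is
```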

seronegative  adj. showing a negative serum reaction     therapeutics  n. therapy; the study of treatment

(2)Baselines

        ①Shallow models: M2E, MPCA and MK-SVM followed by logistic regression classification

        ②Deep models: BrainGNN and BrainNetCNN

(3)Implementation Details

        ①Optimizer: Adam

        ②Epoch: 20

        ③Learning rate: 1e-3

        ④Weight decay: 1e-4 for regularization

        ⑤Sample split: 80% training set and 20% test set

        ⑥Cross validation: 10 fold

        ⑦The mean performance of each model in each dataset:

2.6.2. Performance Report

(1)Node Feature

        ①⭐Using the connection profile (each node's row of the adjacency matrix) as the node feature performs best.

        ②They think this method captures the overall information of the brain network... (although I really think its interpretability is extremely poor...)

(2)Message Passing

        Generally discuss these methods and their performances.

(3)Attention Enhanced Message Passing

        ①⭐Message passing with attention performs better than message passing without it

        ②Generally discuss these methods and their performances.

(4)Pooling Strategies

        Generally discuss these methods and their performances.

(5)Other Baselines

        ①Deep models perform better than shallow models

        ②BrainGNN may run out of memory (OOM) on large datasets

(6)Insights on Density Levels

        ①fMRI graphs are fully connected but sMRI graphs are not: only about 22.64% of all possible edges are present in PPMI

        ②⭐They find that the more complex the model is, the more hidden layers it needs.
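For reference, a density figure like the 22.64% in ① can be computed as the share of non-zero off-diagonal entries of the adjacency matrix. A minimal sketch; the exact definition the authors use is not stated in my notes, so this formula is an assumption:

```python
import numpy as np

def edge_density(W):
    """Fraction of node pairs (excluding self-loops) that have a non-zero edge weight."""
    M = W.shape[0]
    off_diagonal = ~np.eye(M, dtype=bool)
    return np.count_nonzero(W[off_diagonal]) / (M * (M - 1))

# Toy structural network with edges 0-1 and 1-2 only.
W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = 0.8
W[1, 2] = W[2, 1] = 0.4
density = edge_density(W)                       # 4 non-zero entries out of 12
dense = edge_density(np.ones((4, 4)))           # a fully connected graph
```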

2.7. Open Source Benchmark Platform

        Briefly introduce BrainGB.

2.8. Discussion and Extensions

(1)Limitations

        ①They did not provide the graph-level module

        ②They are restricted due to the small sample size of the dataset

(2)Future prospects


        ②Better pretraining

        ③Sharing information across different diseases (I think I have seen a paper comparing ADHD and AD, saying the two share some brain regions)

3. BrainGB library/code

        See my other post: [Code Reproduction] BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks - CSDN Blog

4. Supplementary knowledge

4.1. Out-of-memory (OOM)












4.2. Weight decay

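Weight decay is a regularization technique that suppresses overfitting and thereby improves generalization. It adds a penalty on the L2 norm of the model weights to the loss function so that the weights do not grow too large, which reduces model complexity and suppresses overfitting. The weight decay parameter is set on the optimizer, not on the loss. In the loss function, weight decay is the coefficient in front of the regularization term. The regularization term generally indicates model complexity, so weight decay controls how much model complexity affects the loss: if weight decay is large, a complex model also gets a large loss value.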
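A tiny worked example of the idea: with an L2 penalty (weight_decay / 2) * ||w||^2 added to the loss, its gradient weight_decay * w is added to the data gradient at every step. The sketch below uses plain SGD for simplicity (the paper trains with Adam), and `sgd_step` is a hypothetical helper, not a library function.

```python
import numpy as np

def sgd_step(w, grad, lr=1e-3, weight_decay=1e-4):
    """One SGD step on loss(w) + (weight_decay / 2) * ||w||^2."""
    return w - lr * (grad + weight_decay * w)

w = np.array([1.0, -2.0])
no_data_grad = np.zeros(2)
# With a zero data gradient, the penalty alone shrinks every weight by a factor (1 - lr * weight_decay).
w_next = sgd_step(w, no_data_grad, lr=0.1, weight_decay=0.5)
```

Here each weight is multiplied by 1 - 0.1 * 0.5 = 0.95, which is where the name "decay" comes from.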

4.3. 2nd-order Runge-Kutta (RK2)


(2)Reference 1: Runge-Kutta method | basic idea + second-order scheme + fourth-order scheme - CSDN Blog

(3)Reference 2: 8.03: Runge-Kutta 2nd-Order Method for Solving Ordinary Differential Equations - Mathematics LibreTexts
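As a quick reminder of what a second-order Runge-Kutta step looks like, here is the midpoint variant for a generic ODE dy/dt = f(t, y). This is a generic textbook sketch and not the tractography algorithm used for PPMI; the function name `rk2_midpoint` is my own.

```python
import math

def rk2_midpoint(f, t0, y0, h, n_steps):
    """Integrate dy/dt = f(t, y) from (t0, y0) with n_steps midpoint RK2 steps of size h."""
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = f(t, y)                          # slope at the start of the step
        k2 = f(t + h / 2, y + h / 2 * k1)     # slope at the estimated midpoint
        y = y + h * k2                        # advance using the midpoint slope
        t = t + h
    return y

# Test problem with a known solution: dy/dt = -y, y(0) = 1, so y(1) = exp(-1).
approx = rk2_midpoint(lambda t, y: -y, 0.0, 1.0, 0.1, 10)
error = abs(approx - math.exp(-1))
```

Using the midpoint slope instead of the starting slope is what lifts the method from first order (Euler) to second order accuracy.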

5. Reference List

Cui, H. et al. (2023) 'BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks', IEEE Transactions on Medical Imaging, 42(2), pp. 493-506. doi: 10.1109/TMI.2022.3218745





