[Paper Deep Dive] BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks

v2, remastered on 2024-04-28

Paper: BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks | IEEE Journals & Magazine | IEEE Xplore

Code: GitHub - HennyJie/BrainGB: Officially Accepted to IEEE Transactions on Medical Imaging (TMI, IF: 11.037) - Special Issue on Geometric Deep Learning in Medical Imaging.

BrainGB website: https://braingb.us

The English summaries are typed by hand, summarizing and paraphrasing the original paper. Unavoidable spelling and grammar mistakes may appear; if you spot any, feel free to point them out in the comments! This post is written as study notes, so read with that in mind!

1. TL;DR

1.1. Paper Summary Figure

2. Section-by-Section Close Reading

2.1. Abstract

        ①At present, there is still a lack of systematic research on brain network analysis

        ②They propose the Brain Graph Neural Network Benchmark (BrainGB), which standardizes the brain network analysis pipeline and modularizes its implementation

2.2. Introduction

        ①Interactions between brain regions are key factors in understanding neural function and neurological disorders

        ②Their contributions are: a) establishing a unified framework and evaluation criteria, b) summarizing the preprocessing and network construction pipelines for functional and structural MRI, c) providing modular baselines covering node features, message passing mechanisms, attention mechanisms, and pooling strategies

        ③Overall framework:

(Note that only GCN and GAT are available as backbones; a number of other convolution operators, such as GIN and GraphSAGE, could also be covered.)

motif  n. a recurring theme (in literature or music); decorative pattern; motive; central idea

2.3. Preliminaries

2.3.1. Brain Network Analysis

        ①The brain network dataset is \mathcal{D}=\{\mathcal{G}_{n},y_{n}\}_{n=1}^{N} with N subjects, where \mathcal{G}_{n}=\{\mathcal{V}_{n},\mathcal{E}_{n}\} is the n-th brain network, y_{n} is its ground-truth label, \mathcal{V}_n=\mathcal{V}=\{v_i\}_{i=1}^M denotes the M nodes (ROIs) shared across subjects, and \mathcal{E}_{n} denotes the edges. The model outputs the prediction \hat{y}_n

        ②Shallow models such as graph kernels and tensor factorization are too limited to capture the complex structure of brain networks

        ③The adjacency matrix W_{n} \in \mathbb{R}^{M\times M} is weighted (it is unclear whether W directly replaces the binary adjacency matrix A; in practice, A could also be derived from W, e.g., by thresholding)

aberration  n. a deviation from what is normal or expected; an anomaly

2.3.2. Graph Neural Networks

        ①There are three differences between brain networks and other graphs: a) brain networks lack natural node features, b) connection weights can be positive or negative, c) the set of ROIs (nodes) is fixed and aligned across subjects

2.4. Brain Network Dataset Construction

2.4.1. Background: Diverse Modalities of Brain Imaging

        There are many scanning technologies: Magnetic Resonance Imaging (MRI), Electroencephalography (EEG), Magnetoencephalography (MEG), Positron Emission Tomography (PET), Single-Photon Emission Computed Tomography (SPECT), X-ray Computed Tomography (CT), etc.

(1)MRI Data

        ①Functional MRI (fMRI) measures changes in blood oxygenation and blood flow and thus reveals functional brain activity

        ②Diffusion-weighted MRI (dMRI) infers brain structure from the motion trajectories of molecules (usually water)

trajectory  n. the path followed by a projectile or moving object

(2)Challenges in MRI Preprocessings

        ①There are preprocessing tools such as SPM, AFNI, and FSL; however, they take considerable time to learn and use

        ②No single tool covers all the preprocessing steps required for dMRI

        ③Public availability of datasets is also a major problem

        ④Different modalities require different preprocessing methods

2.4.2. Brain Network Construction From Raw Data

(1)Functional Brain Network Construction

        ①Some preprocessing functions in different tools:

        ②Pairwise correlations between ROIs can be measured with partial correlation, mutual information, coherence, Granger causality, etc. (the simplest choice, Pearson correlation, is sketched below)
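
As a concrete illustration (my own sketch, not the paper's pipeline), Pearson correlation between ROI time series can be computed as follows; the function name and array shapes are assumptions.

```python
import numpy as np

def functional_connectivity(timeseries: np.ndarray) -> np.ndarray:
    """Build an M x M functional connectivity matrix from ROI time series.

    timeseries: hypothetical array of shape (M, T) -- M ROIs, T time points.
    """
    # Pearson correlation between every pair of ROI time series
    W = np.corrcoef(timeseries)
    np.fill_diagonal(W, 0.0)  # remove self-connections
    return W

# Example with synthetic data: 90 ROIs, 200 time points
W = functional_connectivity(np.random.randn(90, 200))
print(W.shape)  # (90, 90)
```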

(2)Structural Brain Network Construction

        ①Some preprocessing functions in different tools:

2.4.3. Discussions

        Combining sMRI and fMRI might be more effective than using a single modality

metabolic  adj. relating to metabolism

2.5. GNN Baselines for Brain Network Analysis

2.5.1. Node Feature Construction

        ①Identity: assign a one-hot identity vector to each node

        ②Eigen: use the eigendecomposition of the connectivity matrix as node features (similar in spirit to PCA)...

        ③Degree: a one-dimensional feature recording each node's degree

        ④Degree profile:

\boldsymbol{x}_i=\left[\deg(v_i)\parallel\min(\mathcal{D}_i)\parallel\max(\mathcal{D}_i)\parallel\operatorname{mean}(\mathcal{D}_i)\parallel\operatorname{std}(\mathcal{D}_i)\right]

where \mathcal{D}_i=\{\deg(v_j)\mid v_j\in\mathcal{N}_i\} denotes the degrees of node v_i's neighbors

        ⑤Connection profile: use each node's row of the connectivity matrix as its node feature (a sketch of several of these options follows)
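
A minimal sketch (my own, not the official BrainGB implementation) of three of these node feature options, assuming a weighted connectivity matrix W of shape (M, M):

```python
import numpy as np

def node_features(W: np.ndarray, kind: str = "connection") -> np.ndarray:
    """Construct node features from a weighted connectivity matrix W (M x M)."""
    M = W.shape[0]
    if kind == "identity":
        return np.eye(M)                      # one-hot identity vector per node
    if kind == "degree":
        return W.sum(axis=1, keepdims=True)   # weighted degree, one value per node
    if kind == "connection":
        return W.copy()                       # connection profile: each node's row of W
    raise ValueError(f"unknown feature kind: {kind}")

X = node_features(np.random.rand(90, 90), kind="connection")
print(X.shape)  # (90, 90)
```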

2.5.2. Message Passing Mechanisms

        ①The node feature h_i^{l} at layer l first aggregates messages from its neighbors via a sum operation:

\boldsymbol{m}_i^l=\sum_{j\in\mathcal{N}_i}\boldsymbol{m}_{ij}=\sum_{j\in\mathcal{N}_i}M_l\left(\boldsymbol{h}_i^l,\boldsymbol{h}_j^l,w_{ij}\right)

where \mathcal{N}_{i} represents the neighbors of node v_i, w_{ij} denotes the edge weight between nodes v_i and v_j, and M_l denotes the message function

        ②The node representation is then updated with:

h_i^{l+1}=U_l\left(\boldsymbol{h}_i^l,\boldsymbol{m}_i^l\right)

where U_l can be any differentiable function

        ③The message \boldsymbol{m}_{ij} can be computed in several ways (a code sketch of one variant follows the list):

Edge weighted: aggregation as in GCN, m_{ij}=\boldsymbol{h}_{j}\cdot w_{ij}, so the message scales directly with the edge weight.
Bin concat: sort all edge weights in ascending order and divide them into T buckets (T tried in [5, 10, 15, 20]); each bucket t has its own learnable representation \boldsymbol{b}_t, and m_{ij}=\mathrm{MLP}(\boldsymbol{h}_{j}\parallel\boldsymbol{b}_{t}). This helps group connections of similar strength.
Edge weight concat: m_{ij}=\mathrm{MLP}(\boldsymbol{h}_{j}\parallel d\cdot w_{ij}), where d is the node feature dimension; scaling by d amplifies the influence of the edge weight.
Node edge concat: m_{ij}=\mathrm{MLP}(\boldsymbol{h}_{i}\parallel\boldsymbol{h}_{j}\parallel w_{ij}). The paper argues this alleviates over-smoothing because each message from a local neighbor is reinforced with the center node's own representation \boldsymbol{h}_{i}^{l} from the previous layer.
Node concat: m_{ij}=\mathrm{MLP}(\boldsymbol{h}_{i}\parallel\boldsymbol{h}_{j}).
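
The following is a minimal PyTorch Geometric sketch of the "node edge concat" variant (my own illustration, not BrainGB's official module; the class name and dimensions are assumptions):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import MessagePassing

class NodeEdgeConcatConv(MessagePassing):
    """m_ij = MLP(h_i || h_j || w_ij), followed by sum aggregation over neighbors."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__(aggr="add")  # sum aggregation, as in the equation above
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim + 1, out_dim), nn.ReLU())

    def forward(self, x, edge_index, edge_weight):
        # x: (M, in_dim) node features; edge_weight: (E,) connectivity weights
        return self.propagate(edge_index, x=x, edge_weight=edge_weight)

    def message(self, x_i, x_j, edge_weight):
        # x_i: center node h_i, x_j: neighbor h_j, edge_weight: scalar w_ij per edge
        return self.mlp(torch.cat([x_i, x_j, edge_weight.unsqueeze(-1)], dim=-1))
```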

2.5.3. Attention-Enhanced Message Passing

        ①Attention mechanisms are useful for aggregating important information

        ②Unlike traditional graph attention used on molecular graphs, brain graphs rely more on edge features and less on node features

        ③The attention-enhanced message variants are listed below (a code sketch of one of them follows the variants):

Attention weighted

the original GAT message without edge features: m_{ij}=\boldsymbol{h}_{j}\cdot\alpha_{ij}, where \alpha_{ij} is the corresponding attention score produced by a single-layer feed-forward network with a LeakyReLU nonlinearity:

\alpha_{ij}=\frac{\exp\left(\sigma\left(\boldsymbol{a}^{\top}\left[\boldsymbol{\Theta}x_{i}\parallel\boldsymbol{\Theta}x_{j}\right]\right)\right)}{\sum_{k\in\mathcal{N}(i)\cup\{i\}}\exp\left(\sigma\left(\boldsymbol{a}^{\top}\left[\boldsymbol{\Theta}x_{i}\parallel\boldsymbol{\Theta}x_{k}\right]\right)\right)}

(The paper does not give explicit values for \Theta and \boldsymbol{a}: they are a learnable linear transformation matrix and a learnable weight vector trained end-to-end. \sigma denotes the LeakyReLU nonlinearity, i.e., an activation function rather than a value.)

Edge weighted w/ attn

the attention-enhanced version of the "edge weighted" GCN-style message:

m_{ij}=h_{j}\cdot\alpha_{ij}\cdot w_{ij}

Attention edge sum

another attention-enhanced version of the "edge weighted" GCN-style message:

m_{ij}=\boldsymbol{h}_{j}\cdot(\alpha_{ij}+w_{ij})

Node edge concat w/ attn

the attention-enhanced version of the "node edge concat" message:

m_{ij}=\mathrm{MLP}(h_{i}\parallel(h_{j}\cdot\alpha_{ij})\parallel w_{ij})

Node concat w/ attn

the attention-enhanced version of the "node concat" message:

m_{ij}=\mathrm{MLP}(\boldsymbol{h}_i\parallel(\boldsymbol{h}_j\cdot\alpha_{ij}))
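As an illustration (my own sketch, assuming a fully connected brain graph so the softmax runs over all nodes), the "edge weighted w/ attn" message m_{ij}=\boldsymbol{h}_{j}\cdot\alpha_{ij}\cdot w_{ij} could be computed densely as follows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeWeightedAttention(nn.Module):
    """Sketch of m_ij = h_j * alpha_ij * w_ij with GAT-style attention scores."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)   # Θ: learnable transform
        self.a = nn.Parameter(torch.randn(2 * out_dim))       # a: learnable weight vector

    def forward(self, h: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
        # h: (M, in_dim) node features, W: (M, M) weighted adjacency matrix
        z = self.theta(h)                                      # (M, out_dim)
        M = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(M, M, -1),    # Θx_i
                           z.unsqueeze(0).expand(M, M, -1)],   # Θx_j
                          dim=-1)
        e = F.leaky_relu(pairs @ self.a)                       # (M, M) raw scores
        alpha = torch.softmax(e, dim=1)                        # normalize over neighbors j
        msg = alpha.unsqueeze(-1) * W.unsqueeze(-1) * z.unsqueeze(0)  # m_ij per pair
        return msg.sum(dim=1)                                  # sum aggregation over j
```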

2.5.4. Pooling Strategies

        ①The pooling (readout) operator has the form:

g_{n}=R\left(\{h_{k}\mid v_{k}\in\mathcal{G}_{n}\}\right)

        ②Provided pooling methods (a minimal sketch follows the list):

Mean pooling: g_{n}=\frac{1}{M}\sum_{k=1}^{M}\boldsymbol{h}_{k}
Sum pooling: g_{n}=\sum_{k=1}^{M}\boldsymbol{h}_{k}
Concat pooling: g_{n}=\parallel_{k=1}^{M}\boldsymbol{h}_{k}=\boldsymbol{h}_{1}\parallel\boldsymbol{h}_{2}\parallel\ldots\parallel\boldsymbol{h}_{M}
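
A minimal sketch of the three readout strategies (my own illustration); concat pooling relies on the fixed ROI ordering mentioned earlier, which is why it yields an (M·d)-dimensional graph embedding.

```python
import torch

def readout(H: torch.Tensor, strategy: str = "mean") -> torch.Tensor:
    """Graph-level embedding g_n from node embeddings H of shape (M, d)."""
    if strategy == "mean":
        return H.mean(dim=0)      # (d,)
    if strategy == "sum":
        return H.sum(dim=0)       # (d,)
    if strategy == "concat":
        return H.reshape(-1)      # (M * d,), valid because the ROI order is fixed
    raise ValueError(f"unknown pooling strategy: {strategy}")

g = readout(torch.randn(90, 16), "concat")
print(g.shape)  # torch.Size([1440])
```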

        ③They consider more complex pooling methods, such as hierarchical pooling, learnable pooling, and clustering readouts, to be standalone GNN architectures rather than composable modules, and therefore do not provide them.

2.6. Experimental Analysis and Insights

2.6.1. Experimental Settings

(1)Datasets

        ①Four basic datasets: fMRI (HIV, PNC, ABCD) and dMRI (PPMI)

        ②Tasks: disease classification in HIV and PPMI, sex classification in PNC and ABCD

        ③Overall information of datasets:

        ④Human Immunodeficiency Virus Infection (HIV): 35 early-stage HIV patients and 35 seronegative controls. Preprocessing procedures: a) realignment to the first volume, b) slice timing correction, c) normalization, d) spatial smoothing, e) band-pass filtering, f) linear trend removal of the time series. (Curiously, the ROI count is 116 but the network size only covers 90 cerebral regions; how they were selected is not stated.)

        ⑤Philadelphia Neuroimaging Cohort (PNC): 289 subjects (57.46%) are female. Preprocessing procedures: a) slice timing correction, b) motion correction, c) registration, d) normalization, e) removal of linear trends, f) bandpass filtering, g) spatial smoothing. Only 232 of the 264 ROIs are used.

        ⑥Parkinson's Progression Markers Initiative (PPMI): 596 Parkinson's disease patients and 158 healthy controls (HC). Preprocessing procedures: a) alignment to correct for head motion and eddy-current distortions, b) removal of non-brain tissue, followed by linear alignment and registration of the skull-stripped images. The number of ROIs is 84. Brain networks are reconstructed with a deterministic 2nd-order Runge-Kutta (RK2) whole-brain tractography algorithm.

        ⑦Adolescent Brain Cognitive Development Study (ABCD): subjects are 9-10-year-old children from 21 sites; 3961 (50.1%) are female. Preprocessed with the ABCD-HCP BIDS fMRI pipeline.

        ⑧⭐For sMRI, each edge weight is standardized by dividing by the maximum edge weight within that sample, so all values lie in [0, 1]. For fMRI, negative values are removed for GCN (which cannot handle them) and kept for GAT. (My reading of this step is sketched below.)
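
A hedged sketch of that normalization (the function and argument names are my assumptions, not the official BrainGB code):

```python
import numpy as np

def normalize_edges(W: np.ndarray, modality: str, backbone: str = "gcn") -> np.ndarray:
    """Normalize a weighted adjacency matrix W before feeding it to a GNN."""
    W = W.copy()
    if modality == "smri":
        W = W / W.max()        # per-sample max scaling so weights lie in [0, 1]
    elif modality == "fmri" and backbone == "gcn":
        W[W < 0] = 0.0         # GCN cannot handle negative correlations; GAT keeps them
    return W
```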

seronegative  adj. giving a negative result in a serological (blood serum) test     therapeutics  n. the branch of medicine concerned with treatment

(2)Baselines

        ①Shallow models: M2E, MPCA, and MK-SVM, each followed by logistic regression for classification

        ②Deep models: BrainGNN and BrainNetCNN

(3)Implementation Details

        ①Optimizer: Adam

        ②Epoch: 20

        ③Learning rate: 1e-3

        ④Weight decay: 1e-4 for regularization

        ⑤Sample split: 80% training set and 20% test set

        ⑥Cross validation: 10 fold

        ⑦The mean performance of each model in each dataset:

2.6.2. Performance Report

(1)Node Feature

        ①⭐Using each node's row of the connectivity matrix (the connection profile) as its node feature performs best.

        ②They think this method captures the overall connectivity information of the brain network... (though I honestly feel its interpretability is about as poor as it gets...)

(2)Message Passing

        A general discussion of these methods and their performance.

(3)Attention Enhanced Message Passing

        ①⭐Attention-enhanced message passing performs better than message passing without attention

        ②A general discussion of these methods and their performance.

(4)Pooling Strategies

        A general discussion of these methods and their performance.

(5)Other Baselines

        ①Deep models perform better than shallow models

        ②BrainGNN may run out of memory (OOM) on large datasets

(6)Insights on Density Levels

        ①fMRI graphs are fully connected but sMRI graphs are not; only about 22.64% of possible edges are present in PPMI

        ②⭐They find that the more complex the model, the more hidden layers it needs.

2.7. Open Source Benchmark Platform

        Briefly introduce BrainGB.

2.8. Discussion and Extensions

(1)Limitations

        ①They did not provide the graph-level module

        ②Their evaluation is restricted by the small sample sizes of the datasets

(2)Future prospects

        ①"Neurology-driven GNN design: designing GNN architectures based on neurological understanding of predictive brain signals, particularly disease-specific ones." (I read this point in translation and did not fully grasp it; presumably such signals would require corresponding datasets?)

        ②Better pretraining

        ③Sharing information across different diseases (I think I have seen a paper comparing ADHD and AD that reported brain regions shared between the two)

3. BrainGB Library / Code

        See the companion article: [Code Reproduction] BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks - CSDN Blog

4. Supplementary Notes

4.1. Out-of-memory (OOM)

(1)When running a deep learning model, out-of-memory problems can have several causes:

        ①High model complexity: deep neural networks usually contain a large number of parameters and layers, which require a lot of memory to store and compute.

        ②Large data volume: training deep models requires large amounts of data, which must be held and processed in memory.

        ③Batch size: data are fed in batches during training; if the batch size is set too large, memory runs out.

        ④Caching requirements: intermediate results must be cached for use in the backward pass, which also consumes a lot of memory.

(2)To mitigate out-of-memory problems, several approaches can be taken (a minimal sketch of two of them follows the list):

        ①Reduce the batch size: a smaller batch size lowers memory usage, at the cost of slower training.

        ②Use a smaller model: fewer parameters and layers mean less memory.

        ③Use a more memory-efficient data format: choosing a lower-precision format such as float16 instead of float32 reduces memory consumption.

        ④Optimize the model structure: removing unnecessary computation and parameters reduces memory usage.

        ⑤Use memory-optimization libraries: they manage host and GPU memory allocation more efficiently and help avoid OOM.
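
A minimal PyTorch sketch of two of these mitigations, small batches plus float16 via automatic mixed precision (the tiny model and synthetic data are hypothetical stand-ins, and a CUDA GPU is assumed):

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

model = nn.Linear(100, 2).cuda()                      # hypothetical tiny model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = GradScaler()

# Small synthetic batches (batch size 8) stand in for a real DataLoader
loader = [(torch.randn(8, 100).cuda(), torch.randint(0, 2, (8,)).cuda()) for _ in range(4)]

for x, y in loader:
    optimizer.zero_grad()
    with autocast():                                   # forward pass in float16 where safe
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()                      # loss scaling avoids fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```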

4.2. Weight decay

Weight decay is a regularization technique that suppresses overfitting and thereby improves generalization. It adds an L2-norm penalty on the model weights to the loss, discouraging large weights and thus reducing model complexity. In practice the weight decay coefficient is set on the optimizer rather than added to the loss by hand. Viewed through the loss function, weight decay is the coefficient in front of the regularization term, which reflects model complexity: with a large weight decay, a complex model incurs a large loss. (A minimal sketch follows.)
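
In PyTorch, for example, this corresponds to the weight_decay argument of the optimizer, mirroring the lr = 1e-3 and weight decay = 1e-4 used in the experiments above (the model here is just a placeholder):

```python
import torch
from torch import nn

model = nn.Linear(10, 2)  # placeholder model
# weight_decay adds an L2 penalty on the weights at every update step
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```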

4.3. 2nd-order Runge-Kutta (RK2)

(1)Overview: Runge-Kutta methods are high-accuracy single-step algorithms widely used in engineering, with solid mathematical underpinnings. Whereas the Euler formula is only first-order accurate, Runge-Kutta methods estimate the slope at several points within each step and use a weighted average of these slopes as an approximation of the mean slope, yielding high-order schemes of high accuracy without computing higher-order derivatives. In particular, a weighted average of the slopes at four points gives the family of fourth-order Runge-Kutta formulas, with fourth-order accuracy. The derivation is based on Taylor expansion and therefore assumes the solution is sufficiently smooth; if the solution is not smooth, a fourth-order Runge-Kutta method may even be less accurate than the improved Euler method. In practice, the algorithm should be chosen to match the characteristics of the problem. (A minimal RK2 sketch appears after the reference links below.)

(2)Reference 1: Runge-Kutta method | basic idea + second-order scheme + fourth-order scheme - CSDN Blog

(3)Reference 2: 8.03: Runge-Kutta 2nd-Order Method for Solving Ordinary Differential Equations - Mathematics LibreTexts
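
A minimal sketch of one RK2 variant, the midpoint method, for an ODE y' = f(t, y) (purely illustrative; the tractography use in PPMI is of course far more involved):

```python
def rk2_step(f, t, y, h):
    """One step of the midpoint (second-order Runge-Kutta) method for y' = f(t, y)."""
    k1 = f(t, y)                         # slope at the start of the step
    k2 = f(t + h / 2, y + h / 2 * k1)    # slope estimated at the midpoint
    return y + h * k2

# Example: y' = y, y(0) = 1, one step of size 0.1 (exact value e^0.1 ≈ 1.10517)
print(rk2_step(lambda t, y: y, 0.0, 1.0, 0.1))  # ≈ 1.105
```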

5. Reference List

Cui, H. et al. (2023) 'BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks', IEEE Transactions on Medical Imaging, 42(2), pp. 493-506. doi: 10.1109/TMI.2022.3218745
