[Paper Deep Dive] BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks

v2, remastered on 2024-04-28

Paper: BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks | IEEE Journals & Magazine | IEEE Xplore

Code: GitHub - HennyJie/BrainGB: Officially Accepted to IEEE Transactions on Medical Imaging (TMI, IF: 11.037) - Special Issue on Geometric Deep Learning in Medical Imaging.

BrainGB website: https://braingb.us

The English summaries are typed by hand, summarizing and paraphrasing the original paper. Unavoidable spelling and grammar mistakes may appear; if you spot any, feel free to point them out in the comments! This post is written as study notes, so read with that in mind!

1. TL;DR

1.1. Paper Summary Figure

2. Section-by-Section Close Reading

2.1. Abstract

        ①At present, there is still a lack of systematic research on brain network analysis

        ②They propose the Brain Graph Neural Network Benchmark (BrainGB), which standardizes the brain network analysis pipeline and modularizes its implementation

2.2. Introduction

        ①Interactions between brain regions are key factors in understanding neural function and neurological disorders

        ②Their contributions are: a) establishing a unified framework and evaluation criteria, b) summarizing the preprocessing and network construction pipelines for functional and structural MRI, c) providing modular baselines covering node features, message passing mechanisms, attention mechanisms, and pooling strategies

        ③Overall framework:

(Note that only GCN and GAT are available as backbones; a number of other convolution operators, such as GIN and GraphSAGE, could also be covered.)

motif  n. a recurring theme (in literature or music); decorative pattern; motive; central idea

2.3. Preliminaries

2.3.1. Brain Network Analysis

        ①The brain network dataset is \mathcal{D}=\{\mathcal{G}_{n},y_{n}\}_{n=1}^{N} with N subjects, where \mathcal{G}_{n}=\{\mathcal{V}_{n},\mathcal{E}_{n}\} is the n-th brain network, y_{n} is its ground-truth label, \mathcal{V}_n=\mathcal{V}=\{v_i\}_{i=1}^M denotes the M nodes (ROIs) shared across subjects, and \mathcal{E}_{n} denotes the edges. The model outputs the prediction \hat{y}_n

        ②Shallow models such as graph kernels and tensor factorization are too limited to capture the complex structure of brain networks

        ③The adjacency matrix W_{n} \in \mathbb{R}^{M\times M} is weighted (it is unclear whether W directly replaces the binary adjacency matrix A; in practice, A could also be derived from W, e.g., by thresholding)

aberration  n. a deviation from what is normal or expected; an anomaly

2.3.2. Graph Neural Networks

        ①There are three differences between brain networks and other graphs: a) brain networks lack natural node features, b) connection weights can be positive or negative, c) the set of ROIs (nodes) is fixed and aligned across subjects

2.4. Brain Network Dataset Construction

2.4.1. Background: Diverse Modalities of Brain Imaging

        There are many scanning technologies: Magnetic Resonance Imaging (MRI), Electroencephalography (EEG), Magnetoencephalography (MEG), Positron Emission Tomography (PET), Single-Photon Emission Computed Tomography (SPECT), X-ray Computed Tomography (CT), etc.

(1)MRI Data

        ①Functional MRI (fMRI) measures changes in blood oxygenation and blood flow and thus reveals functional brain activity

        ②Diffusion-weighted MRI (dMRI) infers brain structure from the motion trajectories of molecules (usually water)

trajectory  n. the path followed by a projectile or moving object

(2)Challenges in MRI Preprocessings

        ①There are preprocessing tools such as SPM, AFNI, and FSL; however, they take considerable time to learn and use

        ②No single tool covers all the preprocessing steps required for dMRI

        ③Public availability of datasets is also a major problem

        ④Different modalities require different preprocessing methods

2.4.2. Brain Network Construction From Raw Data

(1)Functional Brain Network Construction

        ①Some preprocessing functions in different tools:

        ②Pairwise correlations between ROIs can be measured with partial correlation, mutual information, coherence, Granger causality, etc. (the simplest choice, Pearson correlation, is sketched below)
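
As a concrete illustration (my own sketch, not the paper's pipeline), Pearson correlation between ROI time series can be computed as follows; the function name and array shapes are assumptions.

```python
import numpy as np

def functional_connectivity(timeseries: np.ndarray) -> np.ndarray:
    """Build an M x M functional connectivity matrix from ROI time series.

    timeseries: hypothetical array of shape (M, T) -- M ROIs, T time points.
    """
    # Pearson correlation between every pair of ROI time series
    W = np.corrcoef(timeseries)
    np.fill_diagonal(W, 0.0)  # remove self-connections
    return W

# Example with synthetic data: 90 ROIs, 200 time points
W = functional_connectivity(np.random.randn(90, 200))
print(W.shape)  # (90, 90)
```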

(2)Structural Brain Network Construction

        ①Some preprocessing functions in different tools:

2.4.3. Discussions

        Combining sMRI and fMRI might be more effective than using a single modality

metabolic  adj. relating to metabolism

2.5. GNN Baselines for Brain Network Analysis

2.5.1. Node Feature Construction

        ①Identity: assign a one-hot identity vector to each node

        ②Eigen: use the eigendecomposition of the connectivity matrix as node features (similar in spirit to PCA)...

        ③Degree: a one-dimensional feature recording each node's degree

        ④Degree profile:

\boldsymbol{x}_i=\left[\deg(v_i)\parallel\min(\mathcal{D}_i)\parallel\max(\mathcal{D}_i)\parallel\operatorname{mean}(\mathcal{D}_i)\parallel\operatorname{std}(\mathcal{D}_i)\right]

where \mathcal{D}_i=\{\deg(v_j)\mid v_j\in\mathcal{N}_i\} denotes the degrees of node v_i's neighbors

        ⑤Connection profile: use each node's row of the connectivity matrix as its node feature (a sketch of several of these options follows)
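
A minimal sketch (my own, not the official BrainGB implementation) of three of these node feature options, assuming a weighted connectivity matrix W of shape (M, M):

```python
import numpy as np

def node_features(W: np.ndarray, kind: str = "connection") -> np.ndarray:
    """Construct node features from a weighted connectivity matrix W (M x M)."""
    M = W.shape[0]
    if kind == "identity":
        return np.eye(M)                      # one-hot identity vector per node
    if kind == "degree":
        return W.sum(axis=1, keepdims=True)   # weighted degree, one value per node
    if kind == "connection":
        return W.copy()                       # connection profile: each node's row of W
    raise ValueError(f"unknown feature kind: {kind}")

X = node_features(np.random.rand(90, 90), kind="connection")
print(X.shape)  # (90, 90)
```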

2.5.2. Message Passing Mechanisms

        ①The node feature h_i^{l} at layer l first aggregates messages from its neighbors via a sum operation:

\boldsymbol{m}_i^l=\sum_{j\in\mathcal{N}_i}\boldsymbol{m}_{ij}=\sum_{j\in\mathcal{N}_i}M_l\left(\boldsymbol{h}_i^l,\boldsymbol{h}_j^l,w_{ij}\right)

where \mathcal{N}_{i} represents the neighbors of node v_i, w_{ij} denotes the edge weight between nodes v_i and v_j, and M_l denotes the message function

        ②The node representation is then updated with:

h_i^{l+1}=U_l\left(\boldsymbol{h}_i^l,\boldsymbol{m}_i^l\right)

where U_l can be any differentiable function

        ③The message \boldsymbol{m}_{ij} can be computed in several ways (a code sketch of one variant follows the list):

Edge weighted: aggregation as in GCN, m_{ij}=\boldsymbol{h}_{j}\cdot w_{ij}, so the message scales directly with the edge weight.
Bin concat: sort all edge weights in ascending order and divide them into T buckets (T tried in [5, 10, 15, 20]); each bucket t has its own learnable representation \boldsymbol{b}_t, and m_{ij}=\mathrm{MLP}(\boldsymbol{h}_{j}\parallel\boldsymbol{b}_{t}). This helps group connections of similar strength.
Edge weight concat: m_{ij}=\mathrm{MLP}(\boldsymbol{h}_{j}\parallel d\cdot w_{ij}), where d is the node feature dimension; scaling by d amplifies the influence of the edge weight.
Node edge concat: m_{ij}=\mathrm{MLP}(\boldsymbol{h}_{i}\parallel\boldsymbol{h}_{j}\parallel w_{ij}). The paper argues this alleviates over-smoothing because each message from a local neighbor is reinforced with the center node's own representation \boldsymbol{h}_{i}^{l} from the previous layer.
Node concat: m_{ij}=\mathrm{MLP}(\boldsymbol{h}_{i}\parallel\boldsymbol{h}_{j}).
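
The following is a minimal PyTorch Geometric sketch of the "node edge concat" variant (my own illustration, not BrainGB's official module; the class name and dimensions are assumptions):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import MessagePassing

class NodeEdgeConcatConv(MessagePassing):
    """m_ij = MLP(h_i || h_j || w_ij), followed by sum aggregation over neighbors."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__(aggr="add")  # sum aggregation, as in the equation above
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim + 1, out_dim), nn.ReLU())

    def forward(self, x, edge_index, edge_weight):
        # x: (M, in_dim) node features; edge_weight: (E,) connectivity weights
        return self.propagate(edge_index, x=x, edge_weight=edge_weight)

    def message(self, x_i, x_j, edge_weight):
        # x_i: center node h_i, x_j: neighbor h_j, edge_weight: scalar w_ij per edge
        return self.mlp(torch.cat([x_i, x_j, edge_weight.unsqueeze(-1)], dim=-1))
```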

2.5.3. Attention-Enhanced Message Passing

        ①Attention mechanisms are useful for aggregating important information

        ②Unlike traditional graph attention used on molecular graphs, brain graphs rely more on edge features and less on node features

        ③The attention-enhanced message variants are listed below (a code sketch of one of them follows the variants):

Attention weighted

the original GAT message without edge features: m_{ij}=\boldsymbol{h}_{j}\cdot\alpha_{ij}, where \alpha_{ij} is the corresponding attention score produced by a single-layer feed-forward network with a LeakyReLU nonlinearity:

\alpha_{ij}=\frac{\exp\left(\sigma\left(\boldsymbol{a}^{\top}\left[\boldsymbol{\Theta}x_{i}\parallel\boldsymbol{\Theta}x_{j}\right]\right)\right)}{\sum_{k\in\mathcal{N}(i)\cup\{i\}}\exp\left(\sigma\left(\boldsymbol{a}^{\top}\left[\boldsymbol{\Theta}x_{i}\parallel\boldsymbol{\Theta}x_{k}\right]\right)\right)}

(The paper does not give explicit values for \Theta and \boldsymbol{a}: they are a learnable linear transformation matrix and a learnable weight vector trained end-to-end. \sigma denotes the LeakyReLU nonlinearity, i.e., an activation function rather than a value.)

Edge weighted w/ attn

the attention-enhanced version of the "edge weighted" GCN-style message:

m_{ij}=h_{j}\cdot\alpha_{ij}\cdot w_{ij}

Attention edge sum

another attention-enhanced version of the "edge weighted" GCN-style message:

m_{ij}=\boldsymbol{h}_{j}\cdot(\alpha_{ij}+w_{ij})

Node edge concat w/ attn

the attention-enhanced version of the "node edge concat" message:

m_{ij}=\mathrm{MLP}(h_{i}\parallel(h_{j}\cdot\alpha_{ij})\parallel w_{ij})

Node concat w/ attn

the attention-enhanced version of the "node concat" message:

m_{ij}=\mathrm{MLP}(\boldsymbol{h}_i\parallel(\boldsymbol{h}_j\cdot\alpha_{ij}))
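As an illustration (my own sketch, assuming a fully connected brain graph so the softmax runs over all nodes), the "edge weighted w/ attn" message m_{ij}=\boldsymbol{h}_{j}\cdot\alpha_{ij}\cdot w_{ij} could be computed densely as follows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeWeightedAttention(nn.Module):
    """Sketch of m_ij = h_j * alpha_ij * w_ij with GAT-style attention scores."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)   # Θ: learnable transform
        self.a = nn.Parameter(torch.randn(2 * out_dim))       # a: learnable weight vector

    def forward(self, h: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
        # h: (M, in_dim) node features, W: (M, M) weighted adjacency matrix
        z = self.theta(h)                                      # (M, out_dim)
        M = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(M, M, -1),    # Θx_i
                           z.unsqueeze(0).expand(M, M, -1)],   # Θx_j
                          dim=-1)
        e = F.leaky_relu(pairs @ self.a)                       # (M, M) raw scores
        alpha = torch.softmax(e, dim=1)                        # normalize over neighbors j
        msg = alpha.unsqueeze(-1) * W.unsqueeze(-1) * z.unsqueeze(0)  # m_ij per pair
        return msg.sum(dim=1)                                  # sum aggregation over j
```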

2.5.4. Pooling Strategies

        ①The pooling (readout) operator has the form:

g_{n}=R\left(\{h_{k}\mid v_{k}\in\mathcal{G}_{n}\}\right)

        ②Provided pooling methods (a minimal sketch follows the list):

Mean pooling: g_{n}=\frac{1}{M}\sum_{k=1}^{M}\boldsymbol{h}_{k}
Sum pooling: g_{n}=\sum_{k=1}^{M}\boldsymbol{h}_{k}
Concat pooling: g_{n}=\parallel_{k=1}^{M}\boldsymbol{h}_{k}=\boldsymbol{h}_{1}\parallel\boldsymbol{h}_{2}\parallel\ldots\parallel\boldsymbol{h}_{M}
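
A minimal sketch of the three readout strategies (my own illustration); concat pooling relies on the fixed ROI ordering mentioned earlier, which is why it yields an (M·d)-dimensional graph embedding.

```python
import torch

def readout(H: torch.Tensor, strategy: str = "mean") -> torch.Tensor:
    """Graph-level embedding g_n from node embeddings H of shape (M, d)."""
    if strategy == "mean":
        return H.mean(dim=0)      # (d,)
    if strategy == "sum":
        return H.sum(dim=0)       # (d,)
    if strategy == "concat":
        return H.reshape(-1)      # (M * d,), valid because the ROI order is fixed
    raise ValueError(f"unknown pooling strategy: {strategy}")

g = readout(torch.randn(90, 16), "concat")
print(g.shape)  # torch.Size([1440])
```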

        ③They consider more complex pooling methods, such as hierarchical pooling, learnable pooling, and clustering readouts, to be standalone GNN architectures rather than composable modules, and therefore do not provide them.

2.6. Experimental Analysis and Insights

2.6.1. Experimental Settings

(1)Datasets

        ①Four basic datasets: fMRI (HIV, PNC, ABCD) and dMRI (PPMI)

        ②Tasks: disease classification in HIV and PPMI, sex classification in PNC and ABCD

        ③Overall information of datasets:

        ④Human Immunodeficiency Virus Infection (HIV): 35 early-stage HIV patients and 35 seronegative controls. Preprocessing procedures: a) realignment to the first volume, b) slice timing correction, c) normalization, d) spatial smoothing, e) band-pass filtering, f) linear trend removal of the time series. (Curiously, the ROI count is 116 but the network size only covers 90 cerebral regions; how they were selected is not stated.)

        ⑤Philadelphia Neuroimaging Cohort (PNC): 289 subjects (57.46%) are female. Preprocessing procedures: a) slice timing correction, b) motion correction, c) registration, d) normalization, e) removal of linear trends, f) bandpass filtering, g) spatial smoothing. Only 232 of the 264 ROIs are used.

        ⑥Parkinson's Progression Markers Initiative (PPMI): 596 Parkinson's disease patients and 158 healthy controls (HC). Preprocessing procedures: a) alignment to correct for head motion and eddy-current distortions, b) removal of non-brain tissue, followed by linear alignment and registration of the skull-stripped images. The number of ROIs is 84. Brain networks are reconstructed with a deterministic 2nd-order Runge-Kutta (RK2) whole-brain tractography algorithm.

        ⑦Adolescent Brain Cognitive Development Study (ABCD): subjects are 9-10-year-old children from 21 sites; 3961 (50.1%) are female. Preprocessed with the ABCD-HCP BIDS fMRI pipeline.

        ⑧⭐For sMRI, each edge weight is standardized by dividing by the maximum edge weight within that sample, so all values lie in [0, 1]. For fMRI, negative values are removed for GCN (which cannot handle them) and kept for GAT. (My reading of this step is sketched below.)
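
A hedged sketch of that normalization (the function and argument names are my assumptions, not the official BrainGB code):

```python
import numpy as np

def normalize_edges(W: np.ndarray, modality: str, backbone: str = "gcn") -> np.ndarray:
    """Normalize a weighted adjacency matrix W before feeding it to a GNN."""
    W = W.copy()
    if modality == "smri":
        W = W / W.max()        # per-sample max scaling so weights lie in [0, 1]
    elif modality == "fmri" and backbone == "gcn":
        W[W < 0] = 0.0         # GCN cannot handle negative correlations; GAT keeps them
    return W
```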

seronegative  adj. giving a negative result in a serological (blood serum) test     therapeutics  n. the branch of medicine concerned with treatment

(2)Baselines

        ①Shallow models: M2E, MPCA, and MK-SVM, each followed by logistic regression for classification

        ②Deep models: BrainGNN and BrainNetCNN

(3)Implementation Details

        ①Optimizer: Adam

        ②Epoch: 20

        ③Learning rate: 1e-3

        ④Weight decay: 1e-4 for regularization

        ⑤Sample split: 80% training set and 20% test set

        ⑥Cross validation: 10 fold

        ⑦The mean performance of each model in each dataset:

2.6.2. Performance Report

(1)Node Feature

        ①⭐Using each node's row of the connectivity matrix (the connection profile) as its node feature performs best.

        ②They think this method captures the overall connectivity information of the brain network... (though I honestly feel its interpretability is about as poor as it gets...)

(2)Message Passing

        A general discussion of these methods and their performance.

(3)Attention Enhanced Message Passing

        ①⭐Attention-enhanced message passing performs better than message passing without attention

        ②A general discussion of these methods and their performance.

(4)Pooling Strategies

        A general discussion of these methods and their performance.

(5)Other Baselines

        ①Deep models perform better than shallow models

        ②BrainGNN may run out of memory (OOM) on large datasets

(6)Insights on Density Levels

        ①fMRI graphs are fully connected but sMRI graphs are not; only about 22.64% of possible edges are present in PPMI

        ②⭐They find that the more complex the model, the more hidden layers it needs.

2.7. Open Source Benchmark Platform

        Briefly introduce BrainGB.

2.8. Discussion and Extensions

(1)Limitations

        ①They did not provide the graph-level module

        ②Their evaluation is restricted by the small sample sizes of the datasets

(2)Future prospects

        ①"Neurology-driven GNN design: designing GNN architectures based on neurological understanding of predictive brain signals, particularly disease-specific ones." (I read this point in translation and did not fully grasp it; presumably such signals would require corresponding datasets?)

        ②Better pretraining

        ③Sharing information across different diseases (I think I have seen a paper comparing ADHD and AD that reported brain regions shared between the two)

3. BrainGB Library / Code

        See the companion article: [Code Reproduction] BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks - CSDN Blog

4. Supplementary Notes

4.1. Out-of-memory (OOM)

(1)When running a deep learning model, out-of-memory problems can have several causes:

        ①High model complexity: deep neural networks usually contain a large number of parameters and layers, which require a lot of memory to store and compute.

        ②Large data volume: training deep models requires large amounts of data, which must be held and processed in memory.

        ③Batch size: data are fed in batches during training; if the batch size is set too large, memory runs out.

        ④Caching requirements: intermediate results must be cached for use in the backward pass, which also consumes a lot of memory.

(2)To mitigate out-of-memory problems, several approaches can be taken (a minimal sketch of two of them follows the list):

        ①Reduce the batch size: a smaller batch size lowers memory usage, at the cost of slower training.

        ②Use a smaller model: fewer parameters and layers mean less memory.

        ③Use a more memory-efficient data format: choosing a lower-precision format such as float16 instead of float32 reduces memory consumption.

        ④Optimize the model structure: removing unnecessary computation and parameters reduces memory usage.

        ⑤Use memory-optimization libraries: they manage host and GPU memory allocation more efficiently and help avoid OOM.
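
A minimal PyTorch sketch of two of these mitigations, small batches plus float16 via automatic mixed precision (the tiny model and synthetic data are hypothetical stand-ins, and a CUDA GPU is assumed):

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

model = nn.Linear(100, 2).cuda()                      # hypothetical tiny model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = GradScaler()

# Small synthetic batches (batch size 8) stand in for a real DataLoader
loader = [(torch.randn(8, 100).cuda(), torch.randint(0, 2, (8,)).cuda()) for _ in range(4)]

for x, y in loader:
    optimizer.zero_grad()
    with autocast():                                   # forward pass in float16 where safe
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()                      # loss scaling avoids fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```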

4.2. Weight decay

Weight decay is a regularization technique that suppresses overfitting and thereby improves generalization. It adds an L2-norm penalty on the model weights to the loss, discouraging large weights and thus reducing model complexity. In practice the weight decay coefficient is set on the optimizer rather than added to the loss by hand. Viewed through the loss function, weight decay is the coefficient in front of the regularization term, which reflects model complexity: with a large weight decay, a complex model incurs a large loss. (A minimal sketch follows.)
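
In PyTorch, for example, this corresponds to the weight_decay argument of the optimizer, mirroring the lr = 1e-3 and weight decay = 1e-4 used in the experiments above (the model here is just a placeholder):

```python
import torch
from torch import nn

model = nn.Linear(10, 2)  # placeholder model
# weight_decay adds an L2 penalty on the weights at every update step
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```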

4.3. 2nd-order Runge-Kutta (RK2)

(1)Overview: Runge-Kutta methods are high-accuracy single-step algorithms widely used in engineering, with solid mathematical underpinnings. Whereas the Euler formula is only first-order accurate, Runge-Kutta methods estimate the slope at several points within each step and use a weighted average of these slopes as an approximation of the mean slope, yielding high-order schemes of high accuracy without computing higher-order derivatives. In particular, a weighted average of the slopes at four points gives the family of fourth-order Runge-Kutta formulas, with fourth-order accuracy. The derivation is based on Taylor expansion and therefore assumes the solution is sufficiently smooth; if the solution is not smooth, a fourth-order Runge-Kutta method may even be less accurate than the improved Euler method. In practice, the algorithm should be chosen to match the characteristics of the problem. (A minimal RK2 sketch appears after the reference links below.)

(2)Reference 1: Runge-Kutta method | basic idea + second-order scheme + fourth-order scheme - CSDN Blog

(3)Reference 2: 8.03: Runge-Kutta 2nd-Order Method for Solving Ordinary Differential Equations - Mathematics LibreTexts
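
A minimal sketch of one RK2 variant, the midpoint method, for an ODE y' = f(t, y) (purely illustrative; the tractography use in PPMI is of course far more involved):

```python
def rk2_step(f, t, y, h):
    """One step of the midpoint (second-order Runge-Kutta) method for y' = f(t, y)."""
    k1 = f(t, y)                         # slope at the start of the step
    k2 = f(t + h / 2, y + h / 2 * k1)    # slope estimated at the midpoint
    return y + h * k2

# Example: y' = y, y(0) = 1, one step of size 0.1 (exact value e^0.1 ≈ 1.10517)
print(rk2_step(lambda t, y: y, 0.0, 1.0, 0.1))  # ≈ 1.105
```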

5. Reference List

Cui, H. et al. (2023) 'BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks', IEEE Transactions on Medical Imaging, 42(2), pp. 493-506. doi: 10.1109/TMI.2022.3218745
