[Paper Close Reading] Classification of Developmental and Brain Disorders via Graph Convolutional Aggregation

This paper introduces a model called AN-GCN, which improves disease classification via graph convolutional aggregation and skip connections, with a particular focus on the over-smoothing problem. AN-GCN achieves high accuracy on developmental and brain disorder data, and its improvements over GCN include feature selection and a population-graph construction strategy. The post also covers the experimental setup, performance evaluation, and parameter sensitivity analysis.

Paper link: Classification of Developmental and Brain Disorders via Graph Convolutional Aggregation | Cognitive Computation (springer.com)

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Some spelling and grammar mistakes are unavoidable; corrections in the comments are welcome! This post reads more like notes, so take it with a grain of salt.

Contents

1. TL;DR

1.1. Impressions

1.2. Paper Summary Figure

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.4. Method

2.4.1. Preliminaries and Problem Statement

2.4.2. Proposed Model

2.5. Experiments

2.5.1. Experimental Setup

2.5.2. Experimental Results and Analysis

2.5.3. Parameter Sensitivity Analysis

2.5.4. Limitations

2.6. Conclusion

3. Supplementary Knowledge

3.1. GraphSAINT

4. Reference List


1. TL;DR

1.1. Impressions

(1) ...a pretty fresh, hot-off-the-press paper

(2) "50% higher than GCN"... stated right in the abstract... I... would rather not comment

(3) Why does it keep emphasizing GCN?

(4) Is it just me, or does the related-work section read a bit... undifferentiated?

(5) If disease classification were a niche topic, this style of related work would be fine, but the field is crowded now, with countless variants under each method; subdividing it would, in my view, have been the better choice

1.2. Paper Summary Figure

2. Section-by-Section Close Reading

2.1. Abstract

        ①They introduce an aggregator-normalized graph convolutional network whose key components are aggregation, skip connections, and identity mapping. Identity mapping preserves the structural information, and skip connections "enable the direct flow of information from the input features to later layers of the network" (I don't quite get how the skipping works yet; see the main text)

        ②They use both imaging and non-imaging information

2.2. Introduction

        ①They put forward an aggregator normalization graph convolutional network (AN-GCN) to overcome the problem of over-smoothing

        ②The feature selection in AGGREGATE eliminates the bias of mini-batch data and enhances robustness by reducing sensitivity to the data

proliferation  n. rapid increase; multiplication; emergence in large numbers    lieu  n. place, stead (as in "in lieu of")

2.3. Related Work

(1)Graph Convolutional Networks

        ①GCN-based models are typically semi-supervised methods

        ②The cause of GCN over-smoothing is actually explained quite clearly: each layer keeps aggregating from the neighborhood, so after enough rounds every node's value becomes nearly the same

(2)Disease Prediction

        ①The model you propose below is for... disease classification??? Well, that is a bit too "novel"

2.4. Method

2.4.1. Preliminaries and Problem Statement

(1)Basic Notions

        ①Define an undirected graph G=(V,E), where V=\{1,\dots,N\} denotes the N nodes and E\subseteq V\times V denotes the edge set

        ②Define A=(A_{ij})\in\mathbb{R}^{N\times N} as the adjacency matrix, where A_{ij} is the weight of the edge between nodes i and j

        ③Define X=(X_1,\dots,X_N)^T\in\mathbb{R}^{N\times F} as the node feature matrix

        ④They aim to map X=(X_1,\dots,X_N)^T\in\mathbb{R}^{N\times F} to Z=(Z_1,\dots,Z_N)^T\in\mathbb{R}^{N\times P} to reduce the feature dimension, where P\ll N

(2)Problem Statement

        ①They define the labeled set D_l=\{(Z_i,y_i)\}_{i=1}^{N_l} and the unlabeled set D_u=\{Z_i\}_{i=N_l+1}^{N_l+N_u}, where N_l+N_u=N

        ②The authors intend to find the parameters \theta of f_\theta: V_l\rightarrow Y_l that realize precise classification, where V_l\subset V denotes the set of labeled nodes

        ③Each label y_i can be represented as a C-dimensional one-hot encoding, where C is the number of classes (a tiny sketch follows below)
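A tiny NumPy sketch of this label encoding; the function name and shapes are my own illustration, not from the paper:

```python
import numpy as np

def one_hot(y, C):
    """y: integer class ids of the N_l labeled nodes -> (N_l, C) one-hot matrix."""
    Y = np.zeros((len(y), C))
    Y[np.arange(len(y)), y] = 1.0
    return Y

# e.g. one_hot([0, 1, 1], C=2) -> [[1, 0], [0, 1], [0, 1]]
```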

2.4.2. Proposed Model

        A two-stage method is proposed: the authors first construct a population graph and then introduce the classification approach

(1)Population Graph Construction

        ①The construction of the graph:

where the node features are extracted from the correlation matrix (imaging data) and the edge weights come from non-imaging data

        ②Define \{M_1,\dots,M_T\} as the set of T non-imaging phenotypic measures

        ③The A_{ij} can be represented as:

\mathbf{A}_{ij}=K(i,j)\sum_{t=1}^{T} d(M_t(i),M_t(j))

where K(i,j)=\text{similarity}(S_i,S_j) and d denotes the pairwise distance between phenotypic measures (qualitative data such as sex, quantitative data such as age).

For qualitative data, the distance can be:

d(M_t(i),M_t(j))=\begin{cases}1 & \text{if } M_t(i)=M_t(j)\\ 0 & \text{otherwise}\end{cases}

For quantitative data, the distance can be:

d(M_t(i),M_t(j))=\begin{cases}1 & \text{if } |M_t(i)-M_t(j)|<\tau\\ 0 & \text{otherwise}\end{cases}

where \tau denotes the threshold (⭐this looked odd to me at first: why does equal sex give a "distance" of 1 while unequal gives 0, and why does a below-threshold age gap score 1? Despite its name, d here behaves as a similarity indicator: it equals 1 when two subjects agree on a phenotype, so agreement strengthens the edge weight)

        ④The kernel similarity measure:

K(i,j)=\exp\Big(-\frac{\rho(\mathbf{x}_i,\mathbf{x}_j)^2}{2\sigma^2}\Big)

where \sigma denotes a smoothing parameter (i.e., the kernel width/bandwidth, so yes, it also controls how wide the kernel is) and \rho denotes the correlation distance between X_i and X_j:

\rho(\mathbf{x}_{i},\mathbf{x}_{j})=1-\frac{(\mathbf{x}_{i}-\mathbf{\bar{x}}_{i})(\mathbf{x}_{j}-\mathbf{\bar{x}}_{j})^{\mathsf{T}}}{\|\mathbf{x}_{i}-\mathbf{\bar{x}}_{i}\|\|\mathbf{x}_{j}-\mathbf{\bar{x}}_{j}\|}
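Putting ③ and ④ together, here is a minimal NumPy sketch of the population-graph construction; `phenotypes`, `is_qualitative`, and the `tau`/`sigma` values are my own illustrative assumptions, not the paper's settings:

```python
import numpy as np

def correlation_distance(xi, xj):
    """rho(x_i, x_j): 1 minus the Pearson correlation of two feature vectors."""
    xi_c, xj_c = xi - xi.mean(), xj - xj.mean()
    return 1.0 - (xi_c @ xj_c) / (np.linalg.norm(xi_c) * np.linalg.norm(xj_c))

def build_population_graph(X, phenotypes, is_qualitative, tau=2.0, sigma=1.0):
    """X: (N, F) node features; phenotypes: (N, T) phenotypic measures;
    is_qualitative: length-T booleans (True -> exact match, False -> |diff| < tau)."""
    N, T = X.shape[0], phenotypes.shape[1]
    A = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            # kernel similarity K(i, j) on the correlation distance rho
            rho = correlation_distance(X[i], X[j])
            K = np.exp(-rho ** 2 / (2 * sigma ** 2))
            # sum_t d(M_t(i), M_t(j)): count of phenotype agreements
            agree = 0
            for t in range(T):
                if is_qualitative[t]:
                    agree += int(phenotypes[i, t] == phenotypes[j, t])
                else:
                    agree += int(abs(phenotypes[i, t] - phenotypes[j, t]) < tau)
            A[i, j] = A[j, i] = K * agree
    return A
```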

        ⑤Overall framework:

(2)Disease Prediction Model

        ①Feature Diffusion: to add self-loops, they define \tilde{\mathbf{A}}=\mathbf{A}+\mathbf{I}. The layer-wise feature diffusion in an L-layer GCN can then be written as:

\mathbf{S}^{(\ell)}=\mathbf{\hat{A}}\mathbf{H}^{(\ell)},\quad\ell=0,\ldots,L-1

where \hat{\mathbf{A}}=\tilde{\mathbf{D}}^{-\frac{1}{2}}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-\frac{1}{2}} denotes the normalized adjacency matrix with added self-loops, \tilde{\mathbf{D}}=\mathrm{diag}(\tilde{\mathbf{A}}\mathbf{1}) denotes the diagonal degree matrix, and \mathbf{H}^{(\ell)}\in\mathbb{R}^{N\times F_{\ell}} denotes the input feature matrix of the \ell-th layer, with the first layer being \mathbf{H}^{(0)}=\mathbf{X}
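A short sketch of this normalization step, continuing the NumPy style above (my own assumed code, not the authors'):

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}: renormalized adjacency with self-loops."""
    A_tilde = A + np.eye(A.shape[0])       # A~ = A + I (self-loops)
    d = A_tilde.sum(axis=1)                # degree vector, D~ = diag(A~ 1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# one feature-diffusion step: S = normalized_adjacency(A) @ H
```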

        ②Aggregated Feature Diffusion: a layer-wise aggregated feature diffusion rule for the node features in the \ell-th layer:

\mathbf{S}^{(\ell)}=(\hat{\mathbf{A}}\odot\Gamma)\mathbf{H}^{(\ell)}

where \Gamma=(\gamma_{ij})\in\mathbb{R}^{N\times N} and each \gamma_{ij}=\frac{C_i}{C_{ij}} denotes an aggregator normalization constant. C_i and C_{ij} are the numbers of times node i\in V or edge (i,j)\in E appears in the subgraphs of G=(V,E). These subgraphs are obtained by repeatedly running the GraphSAINT sampler before training begins
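A rough sketch of how these normalization constants might be pre-computed by counting occurrences over sampled subgraphs; note I use a uniform random-node sampler as a stand-in for the actual GraphSAINT sampler:

```python
import numpy as np

def estimate_gamma(A, num_subgraphs=200, sample_size=64, seed=0):
    """Estimate gamma_ij = C_i / C_ij by counting node (C_i) and edge (C_ij)
    occurrences over sampled subgraphs."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    C_node = np.zeros(N)
    C_edge = np.zeros((N, N))
    for _ in range(num_subgraphs):
        nodes = rng.choice(N, size=sample_size, replace=False)
        C_node[nodes] += 1
        mask = np.zeros(N, dtype=bool)
        mask[nodes] = True
        # edge (i, j) appears when both endpoints are sampled and A_ij != 0
        C_edge += np.outer(mask, mask) & (A != 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        Gamma = np.where(C_edge > 0, C_node[:, None] / C_edge, 0.0)
    return Gamma
```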

        ③Learning Node Embeddings: the propagation rule is:

\mathbf{H}^{(\ell+1)}=\sigma\Big((1-\alpha_\ell)(\hat{\mathbf{A}}\odot\Gamma)\mathbf{H}^{(\ell)}+\beta_\ell(\hat{\mathbf{A}}\odot\Gamma)\mathbf{H}^{(\ell)}\big(\mathbf{I}+\mathbf{W}^{(\ell)}\big)+\alpha_\ell\mathbf{X}+\beta_\ell\mathbf{X}\big(\mathbf{I}+\mathbf{W}^{(\ell)}\big)\Big)

where \alpha_\ell and \beta_\ell are nonnegative hyper-parameters in [0,1], and \sigma is a point-wise non-linear activation function such as ReLU. Through the skip connections, the final representation retains at least a minimum proportion of the feature information from the input layer, determined by \alpha_\ell
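For concreteness, a sketch of one layer exactly as the rule transcribed above reads, assuming the input and hidden feature dimensions match so that X can be added directly (which holds when H^{(0)}=X):

```python
import numpy as np

def an_gcn_layer(H, X, A_hat_gamma, W, alpha, beta):
    """One propagation step of the rule above.
    H: (N, F) current features; X: (N, F) input features (H^(0) = X);
    A_hat_gamma: (N, N) precomputed A_hat * Gamma; W: (F, F) layer weights."""
    S = A_hat_gamma @ H                    # aggregated feature diffusion
    I_plus_W = np.eye(W.shape[0]) + W      # identity mapping I + W
    out = ((1 - alpha) * S
           + beta * S @ I_plus_W
           + alpha * X                     # skip connection to the input features
           + beta * X @ I_plus_W)
    return np.maximum(out, 0.0)            # ReLU as the point-wise activation
```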

        ④Model Prediction: the classifier applied to the final node representations:

\hat{\mathbf{Y}}=\mathrm{softmax}(\mathbf{Z})\in\mathbb{R}^{N\times C}

which is the matrix of predicted labels for the graph nodes, where C denotes the number of classes

        ⑤Model Training: they minimize the cross-entropy loss function:

\mathcal{L}=-\sum_{i\in\mathcal{V}_{l}}\sum_{c=1}^{C}\mathbf{Y}_{ic}\log\hat{\mathbf{Y}}_{ic}

\mathbf{Y}_{ic}=\begin{cases}1 & \text{if node } i \text{ belongs to class } c\\ 0 & \text{otherwise}\end{cases}

where \hat{\mathbf{Y}}_{ic} is the element in the i-th row and c-th column of matrix \hat{\mathbf{Y}}, namely the probability that the network assigns the i-th node to class c

        ⑥Optimizer: Adam
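A hedged PyTorch-style sketch tying ④–⑥ together: softmax prediction, cross-entropy over the labeled nodes only, and an Adam step. `model`, `labeled_idx`, and `labels` are illustrative placeholders, not names from the paper:

```python
import torch
import torch.nn.functional as F

# `model` maps node features X (with the fixed graph) to logits Z of shape (N, C)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(X, labeled_idx, labels):
    Z = model(X)                                    # final node representations
    # cross_entropy applies log-softmax internally, matching Y_hat = softmax(Z)
    loss = F.cross_entropy(Z[labeled_idx], labels)  # restricted to labeled set V_l
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```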

2.5. Experiments

        The experiments focus on demonstrating the performance of AN-GCN, showing that it alleviates the over-smoothing problem, and testing each hyper-parameter

2.5.1. Experimental Setup

(1)Datasets

        ①ABIDE Dataset: 871 of 1112 subjects with 403 ASD and 468 HC

        ②ADNI Dataset: 573 subjects, with 402 HC and 171 MCI

(2)Data Preprocessing

        ①Pre-processing for ABIDE: Configurable Pipeline for the Analysis of Connectomes (C-PAC), which contains skull stripping, slice timing correction, motion correction, global mean intensity normalization, nuisance signal regression, band-pass filtering (0.01–0.1Hz), and registration of fMRI images to a standard anatomical space

        ②Atlas for ABIDE: Harvard Oxford atlas with z-score normalization

        ③FC for ABIDE: Pearson's correlation with Fisher z-transformation

        ④Phenotypic measures of ABIDE: age, sex and acquisition site

        ⑤Atlas for ADNI: Automated Anatomical Labeling (AAL)

        ⑥Phenotypic measures of ADNI: sex and age

        ⑦Feature vector: the upper-triangular elements of the FC matrix (see the sketch after this list)
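A minimal sketch of items ③ and ⑦: Pearson-correlation FC, Fisher z-transformation, and vectorization of the upper triangle. `ts` is an assumed time-series array; the diagonal handling is my own choice to avoid infinities:

```python
import numpy as np

def fc_features(ts):
    """ts: (timepoints, ROIs) fMRI time series -> flattened FC feature vector."""
    fc = np.corrcoef(ts.T)                # (ROIs, ROIs) Pearson correlation matrix
    np.fill_diagonal(fc, 0.0)             # drop the diagonal (arctanh(1) = inf)
    fc_z = np.arctanh(fc)                 # Fisher z-transformation
    iu = np.triu_indices_from(fc_z, k=1)  # strictly upper-triangular indices
    return fc_z[iu]
```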

(3)Performance Evaluation Metrics

        ①Cross validation: 10-fold

        ②Evaluation metrics: Accuracy (Acc), Area Under Curve (AUC), Recall, Precision, F1 score, Matthews Correlation Coefficient (MCC), and Cohen's kappa (κ) (no real need to introduce each one; a sketch computing them follows below)
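For reference, a scikit-learn sketch computing the listed metrics in the binary case; `y_true`, `y_pred`, and `y_score` are placeholders, with `y_score` the predicted probability of the positive class:

```python
from sklearn import metrics

def evaluate(y_true, y_pred, y_score):
    """All seven metrics reported in the paper, for a binary task."""
    return {
        "Acc":       metrics.accuracy_score(y_true, y_pred),
        "AUC":       metrics.roc_auc_score(y_true, y_score),
        "Recall":    metrics.recall_score(y_true, y_pred),
        "Precision": metrics.precision_score(y_true, y_pred),
        "F1":        metrics.f1_score(y_true, y_pred),
        "MCC":       metrics.matthews_corrcoef(y_true, y_pred),
        "Kappa":     metrics.cohen_kappa_score(y_true, y_pred),
    }
```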

(4)Baseline Methods

        ①The paper introduces several baseline models

(5)Implementation Details

        ①Epochs: 150 for ABIDE and 100 for ADNI

        ②Learning rate: 1e-3

        ③Hyper-parameters: \alpha_\ell=0.1,\ \beta_\ell=0.3 on ABIDE; \alpha_\ell=0.1,\ \beta_\ell=0.2 on ADNI

        ④Number of layers: L=10

        ⑤Training stops when the loss has not decreased for 10 epochs (see the sketch after this list)

        ⑥Training history on ABIDE:
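As referenced in ⑤, a sketch of the stopping rule combined with the `training_step` placeholder from earlier; the epoch budgets and the patience of 10 are the paper's settings, the rest is my scaffolding:

```python
best_loss, patience, wait = float("inf"), 10, 0
for epoch in range(150):                    # 150 epochs on ABIDE (100 on ADNI)
    loss = training_step(X, labeled_idx, labels)
    if loss < best_loss:                    # loss is still decreasing
        best_loss, wait = loss, 0
    else:
        wait += 1
        if wait >= patience:                # no improvement for 10 epochs
            break
```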

2.5.2. Experimental Results and Analysis

        ①Comparison table on ABIDE:

        ②Comparison table on ADNI:

        ③Comparative box plots on ABIDE:

        ④Comparative box plots on ADNI:

        ⑤PR and ROC curves on ABIDE, with other average metrics shown in parentheses:

        ⑥PR and ROC curves on ADNI, with other average metrics shown in parentheses:

        ⑦The time and space complexity of AN-GCN are of the same order as GCN's (I will not elaborate; with classification accuracy still this low, I see little point in chasing extreme speed)

2.5.3. Parameter Sensitivity Analysis

        ①Comparison table with the change of layers on ABIDE:

        ②Comparison table with the change of layers on ADNI:

which shows AN-GCN's robustness against the over-smoothing problem, brought mainly by the residual connections in the aggregation scheme.

        ③Comparison table with different batch sizes on ABIDE and ADNI:

2.5.4. Limitations

        ①Reduced interpretability is brought by the skip connections and identity mapping.

        ②Performance degrades when the training and test data differ significantly. Come on, who trains on ASD and then predicts AD?? Are you kidding me? Would anyone expect that to go well???

2.6. Conclusion

        No need to summarize quite so exhaustively. Also, please do not jump straight to "early intervention"; who goes and gets an MRI scan for no reason?

3. Supplementary Knowledge

3.1. GraphSAINT

Background reading: GraphSAINT, a graph neural network model based on sampled subgraphs - 知乎 (zhihu.com). In short, GraphSAINT trains a GNN on subgraphs drawn by a sampler and corrects the sampling bias with node- and edge-level normalization constants, which is where the C_i and C_{ij} counts in Sect. 2.4.2 come from.

4. Reference List

Salim, I. & Ben Hamza, A. (2024) 'Classification of Developmental and Brain Disorders via Graph Convolutional Aggregation', Cognitive Computation, 16, pp. 701-716. doi: https://doi.org/10.48550/arXiv.2311.07370
