[Paper Close Reading] Classification of Developmental and Brain Disorders via Graph Convolutional Aggregation

This paper introduces a model called AN-GCN, which improves disease classification via graph convolutional aggregation and skip connections, with a particular focus on the over-smoothing problem. AN-GCN achieves high accuracy on developmental and brain disorder data, and its improvements over GCN include feature selection and a population-graph construction strategy. The post also covers the experimental setup, performance evaluation, and parameter sensitivity analysis.

Paper link: Classification of Developmental and Brain Disorders via Graph Convolutional Aggregation | Cognitive Computation (springer.com)

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Some spelling and grammar mistakes are unavoidable; corrections in the comments are welcome! This post reads more like notes, so take it with a grain of salt.

Contents

1. TL;DR

1.1. Impressions

1.2. Paper Summary Figure

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.4. Method

2.4.1. Preliminaries and Problem Statement

2.4.2. Proposed Model

2.5. Experiments

2.5.1. Experimental Setup

2.5.2. Experimental Results and Analysis

2.5.3. Parameter Sensitivity Analysis

2.5.4. Limitations

2.6. Conclusion

3. Supplementary Knowledge

3.1. GraphSAINT

4. Reference List


1. TL;DR

1.1. Impressions

(1) ...a pretty fresh, hot-off-the-press paper

(2) "50% higher than GCN"... stated right in the abstract... I... would rather not comment

(3) Why does it keep emphasizing GCN?

(4) Is it just me, or does the related-work section read a bit... undifferentiated?

(5) If disease classification were a niche topic, this style of related work would be fine, but the field is crowded now, with countless variants under each method; subdividing it would, in my view, have been the better choice

1.2. Paper Summary Figure

2. Section-by-Section Close Reading

2.1. Abstract

        ①They introduce an aggregator-normalized graph convolutional network whose key components are aggregation, skip connections, and identity mapping. Identity mapping preserves the structural information, and skip connections "enable the direct flow of information from the input features to later layers of the network" (I don't quite get how the skipping works yet; see the main text)

        ②They use both imaging and non-imaging information

2.2. Introduction

        ①They put forward an aggregator normalization graph convolutional network (AN-GCN) to overcome the problem of over-smoothing

        ②The feature selection in AGGREGATE eliminates the bias of mini-batch data and enhances robustness by reducing sensitivity to the data

proliferation  n. rapid increase; multiplication; emergence in large numbers    lieu  n. place, stead (as in "in lieu of")

2.3. Related Work

(1)Graph Convolutional Networks

        ①GCN-based models are typically semi-supervised methods

        ②The cause of GCN over-smoothing is actually explained quite clearly: each layer keeps aggregating from the neighborhood, so after enough rounds every node's value becomes nearly the same

(2)Disease Prediction

        ①The model you propose below is for... disease classification??? Well, that is a bit too "novel"

2.4. Method

2.4.1. Preliminaries and Problem Statement

(1)Basic Notions

        ①Define an undirected graph G=(V,E), where V=\{1,\dots,N\} denotes the N nodes and E\subseteq V\times V denotes the edge set

        ②Define A=(A_{ij})\in\mathbb{R}^{N\times N} as the adjacency matrix, where A_{ij} is the weight of the edge between nodes i and j

        ③Define X=(X_1,\dots,X_N)^T\in\mathbb{R}^{N\times F} as the node feature matrix

        ④They aim to map X=(X_1,\dots,X_N)^T\in\mathbb{R}^{N\times F} to Z=(Z_1,\dots,Z_N)^T\in\mathbb{R}^{N\times P} to reduce the feature dimension, where P\ll N

(2)Problem Statement

        ①They define the labeled set D_l=\{(Z_i,y_i)\}_{i=1}^{N_l} and the unlabeled set D_u=\{Z_i\}_{i=N_l+1}^{N_l+N_u}, where N_l+N_u=N

        ②The authors intend to find the parameters \theta of f_\theta: V_l\rightarrow Y_l that realize precise classification, where V_l\subset V denotes the set of labeled nodes

        ③Each label y_i can be represented as a C-dimensional one-hot encoding, where C is the number of classes (a tiny sketch follows below)
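A tiny NumPy sketch of this label encoding; the function name and shapes are my own illustration, not from the paper:

```python
import numpy as np

def one_hot(y, C):
    """y: integer class ids of the N_l labeled nodes -> (N_l, C) one-hot matrix."""
    Y = np.zeros((len(y), C))
    Y[np.arange(len(y)), y] = 1.0
    return Y

# e.g. one_hot([0, 1, 1], C=2) -> [[1, 0], [0, 1], [0, 1]]
```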

2.4.2. Proposed Model

        A two-stage method is proposed: the authors first construct a population graph and then introduce the classification approach

(1)Population Graph Construction

        ①The construction of the graph:

where the node features are extracted from the correlation matrix (imaging data) and the edge weights come from non-imaging data

        ②Define \{M_1,\dots,M_T\} as the set of T non-imaging phenotypic measures

        ③The A_{ij} can be represented as:

\mathbf{A}_{ij}=K(i,j)\sum_{t=1}^{T} d(M_t(i),M_t(j))

where K(i,j)=\text{similarity}(S_i,S_j) and d denotes the pairwise distance between phenotypic measures (qualitative data such as sex, quantitative data such as age).

For qualitative data, the distance can be:

d(M_t(i),M_t(j))=\begin{cases}1 & \text{if } M_t(i)=M_t(j)\\ 0 & \text{otherwise}\end{cases}

For quantitative data, the distance can be:

d(M_t(i),M_t(j))=\begin{cases}1 & \text{if } |M_t(i)-M_t(j)|<\tau\\ 0 & \text{otherwise}\end{cases}

where \tau denotes the threshold (⭐this looked odd to me at first: why does equal sex give a "distance" of 1 while unequal gives 0, and why does a below-threshold age gap score 1? Despite its name, d here behaves as a similarity indicator: it equals 1 when two subjects agree on a phenotype, so agreement strengthens the edge weight)

        ④The kernel similarity measure:

K(i,j)=\exp\Big(-\frac{\rho(\mathbf{x}_i,\mathbf{x}_j)^2}{2\sigma^2}\Big)

where \sigma denotes a smoothing parameter (i.e., the kernel width/bandwidth, so yes, it also controls how wide the kernel is) and \rho denotes the correlation distance between X_i and X_j:

\rho(\mathbf{x}_{i},\mathbf{x}_{j})=1-\frac{(\mathbf{x}_{i}-\mathbf{\bar{x}}_{i})(\mathbf{x}_{j}-\mathbf{\bar{x}}_{j})^{\mathsf{T}}}{\|\mathbf{x}_{i}-\mathbf{\bar{x}}_{i}\|\|\mathbf{x}_{j}-\mathbf{\bar{x}}_{j}\|}
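Putting ③ and ④ together, here is a minimal NumPy sketch of the population-graph construction; `phenotypes`, `is_qualitative`, and the `tau`/`sigma` values are my own illustrative assumptions, not the paper's settings:

```python
import numpy as np

def correlation_distance(xi, xj):
    """rho(x_i, x_j): 1 minus the Pearson correlation of two feature vectors."""
    xi_c, xj_c = xi - xi.mean(), xj - xj.mean()
    return 1.0 - (xi_c @ xj_c) / (np.linalg.norm(xi_c) * np.linalg.norm(xj_c))

def build_population_graph(X, phenotypes, is_qualitative, tau=2.0, sigma=1.0):
    """X: (N, F) node features; phenotypes: (N, T) phenotypic measures;
    is_qualitative: length-T booleans (True -> exact match, False -> |diff| < tau)."""
    N, T = X.shape[0], phenotypes.shape[1]
    A = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            # kernel similarity K(i, j) on the correlation distance rho
            rho = correlation_distance(X[i], X[j])
            K = np.exp(-rho ** 2 / (2 * sigma ** 2))
            # sum_t d(M_t(i), M_t(j)): count of phenotype agreements
            agree = 0
            for t in range(T):
                if is_qualitative[t]:
                    agree += int(phenotypes[i, t] == phenotypes[j, t])
                else:
                    agree += int(abs(phenotypes[i, t] - phenotypes[j, t]) < tau)
            A[i, j] = A[j, i] = K * agree
    return A
```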

        ⑤Overall framework:

(2)Disease Prediction Model

        ①Feature Diffusion: to add self-loops, they define \tilde{\mathbf{A}}=\mathbf{A}+\mathbf{I}. The layer-wise feature diffusion in an L-layer GCN can then be written as:

\mathbf{S}^{(\ell)}=\mathbf{\hat{A}}\mathbf{H}^{(\ell)},\quad\ell=0,\ldots,L-1

where \hat{\mathbf{A}}=\tilde{\mathbf{D}}^{-\frac{1}{2}}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-\frac{1}{2}} denotes the normalized adjacency matrix with added self-loops, \tilde{\mathbf{D}}=\mathrm{diag}(\tilde{\mathbf{A}}\mathbf{1}) denotes the diagonal degree matrix, and \mathbf{H}^{(\ell)}\in\mathbb{R}^{N\times F_{\ell}} denotes the input feature matrix of the \ell-th layer, with the first layer being \mathbf{H}^{(0)}=\mathbf{X}
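A short sketch of this normalization step, continuing the NumPy style above (my own assumed code, not the authors'):

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}: renormalized adjacency with self-loops."""
    A_tilde = A + np.eye(A.shape[0])       # A~ = A + I (self-loops)
    d = A_tilde.sum(axis=1)                # degree vector, D~ = diag(A~ 1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# one feature-diffusion step: S = normalized_adjacency(A) @ H
```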

        ②Aggregated Feature Diffusion: a layer-wise aggregated feature diffusion rule for the node features in the \ell-th layer:

\mathbf{S}^{(\ell)}=(\hat{\mathbf{A}}\odot\Gamma)\mathbf{H}^{(\ell)}

where \Gamma=(\gamma_{ij})\in\mathbb{R}^{N\times N} and each \gamma_{ij}=\frac{C_i}{C_{ij}} denotes an aggregator normalization constant. C_i and C_{ij} are the numbers of times node i\in V or edge (i,j)\in E appears in the subgraphs of G=(V,E). These subgraphs are obtained by repeatedly running the GraphSAINT sampler before training begins
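A rough sketch of how these normalization constants might be pre-computed by counting occurrences over sampled subgraphs; note I use a uniform random-node sampler as a stand-in for the actual GraphSAINT sampler:

```python
import numpy as np

def estimate_gamma(A, num_subgraphs=200, sample_size=64, seed=0):
    """Estimate gamma_ij = C_i / C_ij by counting node (C_i) and edge (C_ij)
    occurrences over sampled subgraphs."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    C_node = np.zeros(N)
    C_edge = np.zeros((N, N))
    for _ in range(num_subgraphs):
        nodes = rng.choice(N, size=sample_size, replace=False)
        C_node[nodes] += 1
        mask = np.zeros(N, dtype=bool)
        mask[nodes] = True
        # edge (i, j) appears when both endpoints are sampled and A_ij != 0
        C_edge += np.outer(mask, mask) & (A != 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        Gamma = np.where(C_edge > 0, C_node[:, None] / C_edge, 0.0)
    return Gamma
```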

        ③Learning Node Embeddings: the propagation rule is:

\mathbf{H}^{(\ell+1)}=\sigma\Big((1-\alpha_\ell)(\hat{\mathbf{A}}\odot\Gamma)\mathbf{H}^{(\ell)}+\beta_\ell(\hat{\mathbf{A}}\odot\Gamma)\mathbf{H}^{(\ell)}\big(\mathbf{I}+\mathbf{W}^{(\ell)}\big)+\alpha_\ell\mathbf{X}+\beta_\ell\mathbf{X}\big(\mathbf{I}+\mathbf{W}^{(\ell)}\big)\Big)

where \alpha_\ell and \beta_\ell are nonnegative hyper-parameters in [0,1], and \sigma is a point-wise non-linear activation function such as ReLU. Through the skip connections, the final representation retains at least a minimum proportion of the feature information from the input layer, determined by \alpha_\ell
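For concreteness, a sketch of one layer exactly as the rule transcribed above reads, assuming the input and hidden feature dimensions match so that X can be added directly (which holds when H^{(0)}=X):

```python
import numpy as np

def an_gcn_layer(H, X, A_hat_gamma, W, alpha, beta):
    """One propagation step of the rule above.
    H: (N, F) current features; X: (N, F) input features (H^(0) = X);
    A_hat_gamma: (N, N) precomputed A_hat * Gamma; W: (F, F) layer weights."""
    S = A_hat_gamma @ H                    # aggregated feature diffusion
    I_plus_W = np.eye(W.shape[0]) + W      # identity mapping I + W
    out = ((1 - alpha) * S
           + beta * S @ I_plus_W
           + alpha * X                     # skip connection to the input features
           + beta * X @ I_plus_W)
    return np.maximum(out, 0.0)            # ReLU as the point-wise activation
```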

        ④Model Prediction: the classifier applied to the final node representations:

\hat{\mathbf{Y}}=\mathrm{softmax}(\mathbf{Z})\in\mathbb{R}^{N\times C}

which is the matrix of predicted labels for the graph nodes, where C denotes the number of classes

        ⑤Model Training: they minimize the cross-entropy loss function:

\mathcal{L}=-\sum_{i\in\mathcal{V}_{l}}\sum_{c=1}^{C}\mathbf{Y}_{ic}\log\hat{\mathbf{Y}}_{ic}

\mathbf{Y}_{ic}=\begin{cases}1 & \text{if node } i \text{ belongs to class } c\\ 0 & \text{otherwise}\end{cases}

where \hat{\mathbf{Y}}_{ic} is the element in the i-th row and c-th column of matrix \hat{\mathbf{Y}}, namely the probability that the network assigns the i-th node to class c

        ⑥Optimizer: Adam
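A hedged PyTorch-style sketch tying ④–⑥ together: softmax prediction, cross-entropy over the labeled nodes only, and an Adam step. `model`, `labeled_idx`, and `labels` are illustrative placeholders, not names from the paper:

```python
import torch
import torch.nn.functional as F

# `model` maps node features X (with the fixed graph) to logits Z of shape (N, C)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(X, labeled_idx, labels):
    Z = model(X)                                    # final node representations
    # cross_entropy applies log-softmax internally, matching Y_hat = softmax(Z)
    loss = F.cross_entropy(Z[labeled_idx], labels)  # restricted to labeled set V_l
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```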

2.5. Experiments

        The experiments focus on demonstrating the performance of AN-GCN, showing that it alleviates the over-smoothing problem, and testing each hyper-parameter

2.5.1. Experimental Setup

(1)Datasets

        ①ABIDE Dataset: 871 of 1112 subjects with 403 ASD and 468 HC

        ②ADNI Dataset: 573 subjects, with 402 HC and 171 MCI

(2)Data Preprocessing

        ①Pre-processing for ABIDE: Configurable Pipeline for the Analysis of Connectomes (C-PAC), which contains skull stripping, slice timing correction, motion correction, global mean intensity normalization, nuisance signal regression, band-pass filtering (0.01–0.1Hz), and registration of fMRI images to a standard anatomical space

        ②Atlas for ABIDE: Harvard Oxford atlas with z-score normalization

        ③FC for ABIDE: Pearson's correlation with Fisher z-transformation

        ④Phenotypic measures of ABIDE: age, sex and acquisition site

        ⑤Atlas for ADNI: Automated Anatomical Labeling (AAL)

        ⑥Phenotypic measures of ADNI: sex and age

        ⑦Feature vector: the upper-triangular elements of the FC matrix (see the sketch after this list)
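A minimal sketch of items ③ and ⑦: Pearson-correlation FC, Fisher z-transformation, and vectorization of the upper triangle. `ts` is an assumed time-series array; the diagonal handling is my own choice to avoid infinities:

```python
import numpy as np

def fc_features(ts):
    """ts: (timepoints, ROIs) fMRI time series -> flattened FC feature vector."""
    fc = np.corrcoef(ts.T)                # (ROIs, ROIs) Pearson correlation matrix
    np.fill_diagonal(fc, 0.0)             # drop the diagonal (arctanh(1) = inf)
    fc_z = np.arctanh(fc)                 # Fisher z-transformation
    iu = np.triu_indices_from(fc_z, k=1)  # strictly upper-triangular indices
    return fc_z[iu]
```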

(3)Performance Evaluation Metrics

        ①Cross validation: 10-fold

        ②Evaluation metrics: Accuracy (Acc), Area Under Curve (AUC), Recall, Precision, F1 score, Matthews Correlation Coefficient (MCC), and Cohen's kappa (κ) (no real need to introduce each one; a sketch computing them follows below)
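For reference, a scikit-learn sketch computing the listed metrics in the binary case; `y_true`, `y_pred`, and `y_score` are placeholders, with `y_score` the predicted probability of the positive class:

```python
from sklearn import metrics

def evaluate(y_true, y_pred, y_score):
    """All seven metrics reported in the paper, for a binary task."""
    return {
        "Acc":       metrics.accuracy_score(y_true, y_pred),
        "AUC":       metrics.roc_auc_score(y_true, y_score),
        "Recall":    metrics.recall_score(y_true, y_pred),
        "Precision": metrics.precision_score(y_true, y_pred),
        "F1":        metrics.f1_score(y_true, y_pred),
        "MCC":       metrics.matthews_corrcoef(y_true, y_pred),
        "Kappa":     metrics.cohen_kappa_score(y_true, y_pred),
    }
```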

(4)Baseline Methods

        ①The paper introduces several baseline models

(5)Implementation Details

        ①Epochs: 150 for ABIDE and 100 for ADNI

        ②Learning rate: 1e-3

        ③Hyper-parameters: \alpha_\ell=0.1,\ \beta_\ell=0.3 on ABIDE; \alpha_\ell=0.1,\ \beta_\ell=0.2 on ADNI

        ④Number of layers: L=10

        ⑤Training stops when the loss has not decreased for 10 epochs (see the sketch after this list)

        ⑥Training history on ABIDE:
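As referenced in ⑤, a sketch of the stopping rule combined with the `training_step` placeholder from earlier; the epoch budgets and the patience of 10 are the paper's settings, the rest is my scaffolding:

```python
best_loss, patience, wait = float("inf"), 10, 0
for epoch in range(150):                    # 150 epochs on ABIDE (100 on ADNI)
    loss = training_step(X, labeled_idx, labels)
    if loss < best_loss:                    # loss is still decreasing
        best_loss, wait = loss, 0
    else:
        wait += 1
        if wait >= patience:                # no improvement for 10 epochs
            break
```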

2.5.2. Experimental Results and Analysis

        ①Comparison table on ABIDE:

        ②Comparison table on ADNI:

        ③Comparative box plots on ABIDE:

        ④Comparative box plots on ADNI:

        ⑤PR and ROC curves on ABIDE, with other average metrics shown in parentheses:

        ⑥PR and ROC curves on ADNI, with other average metrics shown in parentheses:

        ⑦The time and space complexity of AN-GCN are of the same order as GCN's (I will not elaborate; with classification accuracy still this low, I see little point in chasing extreme speed)

2.5.3. Parameter Sensitivity Analysis

        ①Comparison table with the change of layers on ABIDE:

        ②Comparison table with the change of layers on ADNI:

which shows AN-GCN's robustness against the over-smoothing problem, brought mainly by the residual connections in the aggregation scheme.

        ③Comparison table with different batch sizes on ABIDE and ADNI:

2.5.4. Limitations

        ①Reduced interpretability is brought by the skip connections and identity mapping.

        ②Performance degrades when the training and test data differ significantly. Come on, who trains on ASD and then predicts AD?? Are you kidding me? Would anyone expect that to go well???

2.6. Conclusion

        No need to summarize quite so exhaustively. Also, please do not jump straight to "early intervention"; who goes and gets an MRI scan for no reason?

3. Supplementary Knowledge

3.1. GraphSAINT

Background reading: GraphSAINT, a graph neural network model based on sampled subgraphs - 知乎 (zhihu.com). In short, GraphSAINT trains a GNN on subgraphs drawn by a sampler and corrects the sampling bias with node- and edge-level normalization constants, which is where the C_i and C_{ij} counts in Sect. 2.4.2 come from.

4. Reference List

Salim, I. & Ben Hamza, A. (2024) 'Classification of Developmental and Brain Disorders via Graph Convolutional Aggregation', Cognitive Computation, 16, pp. 701-716. doi: https://doi.org/10.48550/arXiv.2311.07370
