The English is typed entirely by hand! It is my summarizing and paraphrasing of the original paper, so some unavoidable spelling and grammar errors may appear; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.
1. TL;DR
1.1. Takeaways
(1) ...a paper that is still hot off the press
(2) 50% higher than GCN... stated right in the abstract... I... would rather not comment
(3) Why does it keep emphasizing GCN?
(4) Is it just me, or is the related-works section written without much to distinguish the cited lines of work from one another?
(5) If disease classification were a niche topic, writing the related works this way would be fine, but there is already a lot of work on disease classification, with countless variants underneath it; subdividing the area would have been the better choice.
1.2. Paper Summary Figure
2. Section-by-Section Close Reading
2.1. Abstract
①They introduced an aggregator normalization graph convolutional network, which combines aggregation, skip connections, and identity mapping. Identity mapping preserves the structural information, and skip connections "enable the direct flow of information from the input features to later layers of the network" (I do not quite understand yet how the skipping works; I will see in the main text)
②They use both image and non-image information
2.2. Introduction
①They put forward an aggregator normalization graph convolutional network (AN-GCN) to overcome the problem of over-smoothing
②The feature selection in AGGREGATE eliminates the bias of small-batch data and enhances robustness by reducing sensitivity to the data
proliferation: n. rapid increase; multiplication; emergence in large numbers    lieu: n. in place of; a place
2.3. Related Work
(1)Graph Convolutional Networks
①GCN based models are typical semi-supervised methods
②The cause of GCN over-smoothing is explained clearly, at least: each layer keeps aggregating from the neighborhood, so after a few rounds every node's values become almost the same
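A toy sketch of this over-smoothing effect on a hypothetical 4-node path graph: repeatedly applying a row-normalized adjacency (with self-loops) averages each node with its neighbors, and the initially distinct features collapse toward a common value.

```python
import numpy as np

# Toy 4-node path graph; repeated neighborhood averaging = many GCN-style
# diffusion layers without weights.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_loop = A + np.eye(4)                            # add self-loops
P = A_loop / A_loop.sum(axis=1, keepdims=True)    # row-normalize

X = np.array([[1.0], [0.0], [0.0], [-1.0]])       # distinct initial features
for _ in range(100):                              # many diffusion steps
    X = P @ X

# After many layers the node features are nearly indistinguishable:
spread = float(X.max() - X.min())
print(spread)
```

The spread shrinks geometrically with depth, which is exactly why deep plain GCNs lose discriminative power.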
(2)Disease Prediction
①The model you propose below... counts as disease classification??? Well, that is a bit too "novel"
2.4. Method
2.4.1. Preliminaries and Problem Statement
(1)Basic Notions
①Defining an undirected graph G = (V, E), where V = {v_1, ..., v_N} denotes the nodes and E denotes the edges
②Defining A ∈ R^{N×N} as the adjacency matrix, where A(i, j) is the weight of the edge between v_i and v_j
③Defining X as the feature matrix of the nodes
④They aim to map X to a low-dimensional embedding Z to reduce the dimension of the features, where the embedding dimension is smaller than the input feature dimension
(2)Problem Statement
①They define a labeled node set V_L and an unlabeled node set V_U, where V_L ∪ V_U = V
②The authors intend to find the parameters of the classification function that realize precise classification over V_L, where V_L denotes the set of labeled nodes
③The labels can be represented as one-hot encodings of dimensionality C, where C is the number of classes
2.4.2. Proposed Model
A two-stage method is proposed: the authors first construct a population graph and then introduce the classification approach
(1)Population Graph Construction
①The construction of the graph: the node features are extracted from the correlation matrix (image data) and the edge weights are derived from non-image data
②Defining M = {M_h}, h = 1, ..., H, as the set of non-imaging phenotypes with H types
③The edge weights A(i, j) can be represented as:
A(i, j) = Sim(x_i, x_j) · Σ_{h=1}^{H} r(M_h(i), M_h(j))
where r(M_h(i), M_h(j)) denotes the pairwise distance between phenotypic measures (qualitative data such as sex, quantitative data such as age).
For qualitative data, the distance can be: r = 1 if M_h(i) = M_h(j), and 0 otherwise
For quantitative data, the distance can be: r = 1 if |M_h(i) − M_h(j)| < θ, and 0 otherwise
where θ denotes the threshold (⭐this confused me at first: why does equal sex give a nonzero "distance" while unequal sex gives none, and why does a difference below the threshold count while a larger one does not? The resolution is that r is really a similarity indicator despite being called a distance: matching sex and close ages increase the edge weight)
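The two phenotypic cases above (qualitative match, quantitative within-threshold) can be sketched as simple indicator functions; the function names and the threshold value here are hypothetical, not from the paper.

```python
# Hypothetical sketch of the phenotypic "distance" r, which behaves as a
# similarity indicator: 1 when two subjects are similar on a phenotype.
def r_qualitative(a, b):
    """Qualitative phenotype (e.g. sex, acquisition site): match -> 1."""
    return 1.0 if a == b else 0.0

def r_quantitative(a, b, theta=2.0):
    """Quantitative phenotype (e.g. age): within threshold theta -> 1."""
    return 1.0 if abs(a - b) < theta else 0.0

print(r_qualitative("F", "F"))      # same sex contributes to the edge
print(r_quantitative(30.0, 31.5))   # ages within theta contribute
print(r_quantitative(30.0, 45.0))   # ages too far apart contribute 0
```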
④The kernel similarity measure:
Sim(x_i, x_j) = exp( −[ρ(x_i, x_j)]² / (2σ²) )
where σ denotes a smoothing parameter (huh? and it also determines the kernel width? — well, that is precisely what a Gaussian kernel's σ does), and ρ(x_i, x_j) denotes the correlation distance between the feature vectors x_i and x_j, i.e. one minus their correlation
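A minimal numpy sketch of this kernel, assuming correlation distance means one minus the Pearson correlation; the σ value is an arbitrary illustration, not the paper's setting.

```python
import numpy as np

def correlation_distance(x, y):
    """Assumed definition: 1 - Pearson correlation of the two vectors."""
    return 1.0 - np.corrcoef(x, y)[0, 1]

def kernel_similarity(x, y, sigma=0.5):
    """Gaussian kernel on the correlation distance; sigma = kernel width."""
    rho = correlation_distance(x, y)
    return float(np.exp(-rho ** 2 / (2.0 * sigma ** 2)))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])   # nearly collinear with x
z = np.array([4.0, 1.0, 3.0, 2.0])   # weakly (negatively) correlated

sim_close = kernel_similarity(x, y)
sim_far = kernel_similarity(x, z)
print(sim_close, sim_far)            # similar pair -> near 1, dissimilar -> near 0
```

A larger σ widens the kernel, so even weakly correlated subjects get non-negligible edge weights.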
⑤Overall framework:
(2)Disease Prediction Model
①Feature Diffusion: for self-loops, they define A + I. And the layer-wise feature diffusion in an L-layer GCN can be:
X^(l+1) = S X^(l),  with S = D^{-1/2} (A + I) D^{-1/2}
where S denotes the normalized adjacency matrix with self-added loops, D denotes the diagonal degree matrix of A + I, X^(l) denotes the input feature matrix of the l-th layer, and the first layer's input is X^(0) = X
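One diffusion step with this symmetric normalization can be sketched as follows (toy 3-node graph; only the standard GCN normalization is assumed):

```python
import numpy as np

def normalized_adjacency(A):
    """S = D^{-1/2} (A + I) D^{-1/2}, symmetric normalization with
    self-added loops as in GCN."""
    A_loop = A + np.eye(A.shape[0])            # self-added loops
    d = A_loop.sum(axis=1)                     # degrees of A + I
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # D^{-1/2}
    return D_inv_sqrt @ A_loop @ D_inv_sqrt

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
S = normalized_adjacency(A)
X = np.random.default_rng(0).normal(size=(3, 4))
X_next = S @ X                                 # one feature-diffusion layer
print(S.round(3))
```

Note that S is symmetric, unlike the row-normalized random-walk variant.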
②Aggregated Feature Diffusion: a layer-wise aggregated feature diffusion rule for the node features in the l-th layer:
X^(l+1) = S_α X^(l),  with (S_α)(u, v) = α(u, v) · S(u, v)
where each α(u, v) = C(u, v) / C(v) denotes an aggregator normalization constant, and C(v) and C(u, v) are the numbers of times the node v or the edge (u, v) appears in the subgraphs of G. These subgraphs were obtained by repeatedly running the GraphSAINT sampler before the training began
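The counting step can be sketched as below, assuming the GraphSAINT-style normalization α(u, v) = C(u, v) / C(v); the hard-coded subgraph list stands in for the output of a real sampler.

```python
from collections import Counter

# Hypothetical pre-sampled subgraphs (in practice produced by repeatedly
# running the GraphSAINT sampler before training).
subgraphs = [
    {"nodes": [0, 1, 2], "edges": [(0, 1), (1, 2)]},
    {"nodes": [1, 2, 3], "edges": [(1, 2), (2, 3)]},
    {"nodes": [0, 1, 2], "edges": [(0, 1), (1, 2)]},
]

C_node = Counter()   # C(v): appearances of each node
C_edge = Counter()   # C(u, v): appearances of each edge
for sg in subgraphs:
    C_node.update(sg["nodes"])
    C_edge.update(sg["edges"])

def alpha(u, v):
    """Aggregator normalization constant for edge (u, v)."""
    return C_edge[(u, v)] / C_node[v]

print(alpha(1, 2), alpha(0, 1))
```

Edges that are sampled often relative to their target node get larger aggregation weights, which is what corrects the sampling bias.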
③Learning Node Embeddings: the propagation rule is:
X^(l+1) = σ( [ (1 − α) S_α X^(l) + α X^(0) ] · [ (1 − β) I + β W^(l) ] )
where α and β are nonnegative hyper-parameters in [0, 1], and σ is a point-wise non-linear activation function such as ReLU. The final representation contains a minimum proportion of feature information from the input layer through the skip connections, and that proportion is determined by α
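A minimal numpy sketch of one such layer, assuming a GCNII-style combination of an initial-residual skip connection (weighted by α) and identity mapping on the weights (weighted by β); the paper's exact rule may differ in details.

```python
import numpy as np

def an_gcn_layer(S, X_l, X_0, W, alpha=0.1, beta=0.5):
    """One propagation step: skip connection to the input features X_0
    (proportion alpha) plus identity mapping on W (proportion beta).
    Sketch under stated assumptions, not the paper's exact rule."""
    H = (1.0 - alpha) * (S @ X_l) + alpha * X_0          # initial residual
    mix = (1.0 - beta) * np.eye(W.shape[0]) + beta * W   # identity mapping
    return np.maximum(H @ mix, 0.0)                      # ReLU

rng = np.random.default_rng(0)
S = np.eye(4)                       # trivially normalized adjacency (toy)
X0 = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 3))
X1 = an_gcn_layer(S, X0, X0, W)
print(X1.shape)
```

Because every layer mixes in α·X_0, even a very deep stack keeps a fixed fraction of the raw input features, which is the anti-over-smoothing mechanism.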
④Model Prediction: the classifier over the final node representations:
Y_hat = softmax(X^(L) W)
where Y_hat ∈ R^{N×C} is the matrix of predicted labels for the graph nodes and C denotes the number of classes
⑤Model Training: they minimize the cross-entropy loss function:
L = − Σ_{i ∈ V_L} Σ_{c=1}^{C} Y(i, c) · ln Y_hat(i, c)
where Y_hat(i, c) is the element in the i-th row and c-th column of the matrix Y_hat, namely the probability that the network associates the i-th node with class c
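The softmax classifier plus a cross-entropy loss restricted to the labeled nodes can be sketched as follows (toy logits; only the standard semi-supervised formulation is assumed):

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def masked_cross_entropy(Y_hat, y, labeled_idx):
    """Mean of -ln Y_hat[i, y_i] over the labeled node set only."""
    probs = Y_hat[labeled_idx, y[labeled_idx]]
    return float(-np.log(probs).mean())

logits = np.array([[2.0, 0.1], [0.2, 1.5], [0.0, 0.0]])
Y_hat = softmax(logits)
y = np.array([0, 1, 0])             # node 2's label is ignored (unlabeled)
loss = masked_cross_entropy(Y_hat, y, labeled_idx=np.array([0, 1]))
print(loss)
```

Only V_L contributes to the loss; the unlabeled nodes still shape the prediction through the graph diffusion.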
⑥Optimizer: Adam
2.5. Experiments
They focus on presenting the strong performance of AN-GCN, its alleviation of the over-smoothing problem, and the sensitivity to each hyper-parameter
2.5.1. Experimental Setup
(1)Datasets
①ABIDE Dataset: 871 of 1112 subjects are used, with 403 ASD and 468 HC
②ADNI Dataset: 573 subjects, with 402 HC and 171 MCI
(2)Data Preprocessing
①Pre-processing for ABIDE: Configurable Pipeline for the Analysis of Connectomes (C-PAC), which contains skull stripping, slice timing correction, motion correction, global mean intensity normalization, nuisance signal regression, band-pass filtering (0.01–0.1Hz), and registration of fMRI images to a standard anatomical space
②Atlas for ABIDE: Harvard Oxford atlas with z-score normalization
③FC for ABIDE: Pearson's correlation with Fisher z-transformation
④Phenotypic measures of ABIDE: age, sex and acquisition site
⑤Atlas for ADNI: Automated Anatomical Labeling (AAL)
⑥Phenotypic measures of ADNI: sex and age
⑦Feature vector: the upper triangular elements of FC
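The feature-vector step above can be sketched with numpy: take a subject's ROI time series, form the Pearson FC matrix, apply the Fisher z-transform, and keep the strictly upper-triangular entries (the sizes here are toy values, not the datasets' actual dimensions).

```python
import numpy as np

# Toy subject: 110 time points x 6 ROIs (illustrative sizes only).
rng = np.random.default_rng(0)
ts = rng.normal(size=(110, 6))

fc = np.corrcoef(ts.T)                  # 6 x 6 Pearson correlation matrix
# Fisher z-transform = arctanh; clip to avoid inf on the diagonal (r = 1).
fc_z = np.arctanh(np.clip(fc, -0.999999, 0.999999))

iu = np.triu_indices_from(fc_z, k=1)    # strictly upper triangle
feature = fc_z[iu]                      # length n*(n-1)/2 = 15 here
print(feature.shape)
```

Using only the upper triangle avoids duplicating the symmetric FC entries and drops the uninformative diagonal.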
(3)Performance Evaluation Metrics
①Cross validation: 10-fold
②Evaluation metrics: Accuracy (Acc), Area Under Curve (AUC), Recall, Precision, F1 score, Matthews Correlation Coefficient (MCC), and Cohen's kappa (κ) (no real need to introduce each of them one by one, though)
(4)Baseline Methods
①Introducing some baseline models
(5)Implementation Details
①Epochs: 150 for ABIDE and 100 for ADNI
②Learning rate: 1e-3
③Hyper-parameters: one setting for ABIDE,
another for ADNI
④Number of layers:
⑤Training stops early when the loss has not decreased for 10 epochs
⑥Training history on ABIDE:
2.5.2. Experimental Results and Analysis
①Comparison table on ABIDE:
②Comparison table on ADNI:
③Comparative box plots on ABIDE:
④Comparative box plots on ADNI:
⑤PR and ROC on ABIDE, while other average metrics are shown in parentheses:
⑥PR and ROC on ADNI, while other average metrics are shown in parentheses:
⑦The time and space complexity of AN-GCN are of the same magnitude as GCN (I will not elaborate; with classification accuracy still this low, there is little point in chasing extreme speed anyway)
2.5.3. Parameter Sensitivity Analysis
①Comparison table with the change of layers on ABIDE:
②Comparison table with the change of layers on ADNI:
this presents the robustness of AN-GCN against the over-smoothing problem, which is mainly brought by the residual connections in the aggregation scheme.
③Comparison table with different batch size on ABIDE and ADNI:
2.5.4. Limitations
①Reduced interpretability is brought by the skip connections and the identity mapping.
②Performance degrades when the training and test data differ significantly. Wait, who trains on ASD and then predicts AD?? Are you kidding me? Would anyone expect that to score high???
2.6. Conclusion
There is no need to summarize quite so comprehensively, either. Also, do not jump straight to early intervention; who gets an MRI scan for no reason?
3. Supplementary Knowledge
3.1. GraphSAINT
Supplementary reading: GraphSAINT — a graph neural network model based on sampled subgraphs - 知乎 (zhihu.com)
4. Reference List
Salim I. & Hamza B. (2024) 'Classification of Developmental and Brain Disorders via Graph Convolutional Aggregation', Cognitive Computation, 16, pp. 701-716. doi: https://doi.org/10.48550/arXiv.2311.07370