Paper code: https://github.com/haojiang1/hi-GCN
1. TL;DR
1.1. Takeaways
(1)The graph kernel used here is quite intriguing; worth a closer look later
1.2. Paper framework diagram
2. Section-by-section close reading
2.1. Abstract
2.1.1. Purpose
Low-dimensional embeddings of brain connectivity networks prevail in detecting and predicting brain diseases from brain structure.
2.1.2. Method
They put forward an end-to-end hierarchical GCN framework (hi-GCN) that takes both the network topology and the relationships between subjects into consideration.
2.1.3. Results
①They choose Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and Autism Brain Imaging Data Exchange (ABIDE) dataset.
②The accuracy they achieve on ABIDE is 73.1% and on ADNI is 78.5%.
③The AUC they obtain on ABIDE is 82.3% and on ADNI is 86.5%.
2.1.4. Conclusion
①They consider the correlation between the individual brain structure and the global population network
②The authors reckon the joint optimization strategy converges faster and more easily than the pre-trained or two-step supervised hi-GCN.
2.2. Introduction
①Resting-state functional MRI (rs-fMRI) captures the brain's default state, and changes in cortical connectivity are observed in subjects with brain disorders such as autism spectrum disorder (ASD) and Alzheimer's disease (AD)
②Appropriate feature extraction is essential; methods such as clustering coefficients, local clustering coefficients, and deep convolutional neural networks are all useful. (But here the authors say hand-crafted features are mid- or low-level, inferior to the high-level features of DNNs. Hmm, graded by what criterion?)
③Network embedding (functional connectivity) only considers the individual, without considering the relationships between subjects
④They build another framework to connect subjects, where nodes represent subjects with features and edges represent associations (computed as pairwise similarities between nodes)
⑤They regard the challenge as how to combine the individual and global levels, so they design an f-GCN for the individual network and a p-GCN for the population network
2.3. Preliminaries
2.3.1. Problem setup
①They build an undirected graph $G = (V, E, A)$ for each subject, where $v \in V$ represents each node, $A$ denotes the adjacency matrix, and $N$ is the number of ROIs (they use 116)
②⭐The authors say "the embedding of each vertex is learned during GCN training, so it is set to 1". What is that supposed to mean? Isn't it a vector?
③Finally they get a group of graphs $\{G_1, G_2, \dots, G_M\}$, one per subject
④In the global population network, they define $G^p = (V^p, E^p, A^p)$, where $A^p$ denotes the adjacency matrix of pairwise similarities between subjects
2.3.2. Functional connectivity network
①⭐They extract ROI time series and then normalize them to zero mean and unit variance
②They calculate Pearson's correlation (PC) between ROIs (see the sketch after this list):
$$A_{ij} = \frac{\mathrm{cov}(x_i, x_j)}{\sigma_{x_i}\,\sigma_{x_j}}$$
where $\mathrm{cov}(x_i, x_j)$ denotes the cross covariance between $x_i$ and $x_j$;
$\sigma_{x_i}$ represents the standard deviation of node $i$.
③Network construction:
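A minimal numpy sketch of steps ① and ② above, assuming the ROI time series are stored as an (n_roi × n_timepoints) array; `functional_connectivity` is an illustrative name, not the authors' code.

```python
import numpy as np

def functional_connectivity(ts: np.ndarray) -> np.ndarray:
    """Build a Pearson-correlation FC matrix from ROI time series.

    ts: array of shape (n_roi, n_timepoints), e.g. n_roi = 116 for AAL.
    """
    # step ①: normalize each ROI's series to zero mean and unit variance
    ts = (ts - ts.mean(axis=1, keepdims=True)) / ts.std(axis=1, keepdims=True)
    # step ②: Pearson correlation = covariance of the standardized series
    fc = np.corrcoef(ts)            # shape (n_roi, n_roi), values in [-1, 1]
    np.fill_diagonal(fc, 0.0)       # drop self-connections
    return fc

# toy usage: 116 ROIs, 200 time points of synthetic data
fc = functional_connectivity(np.random.randn(116, 200))
print(fc.shape)  # (116, 116)
```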
2.3.3. Graph convolutional networks
①Ordinary convolution cannot be generalized to irregular graphs, so they adopt the spectral approach on their graphs
②Each layer in the GCN is (a sketch of this rule follows this list):
$$H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)$$
where $\tilde{A} = A + I_N$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$;
$W^{(l)}$ denotes the learnable weight matrix;
$H^{(l)}$ denotes the node embeddings from the preceding layer.
③GCN can be seen as a Laplacian smoothing operator; each convolutional layer is followed by a ReLU activation, and all the layers share the same adjacency matrix
④To reduce computational complexity, they approximate the convolutional kernels by Chebyshev polynomials
⑤They apply a K-order neighborhood to the convolution
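A minimal numpy sketch of the propagation rule in ②, assuming the standard symmetric normalization written above; shapes follow the note's setup (116 ROIs, scalar initial embeddings per node).

```python
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GCN layer: H' = ReLU(D~^{-1/2} A~ D~^{-1/2} H W), with A~ = A + I."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d = A_tilde.sum(axis=1)                       # degrees of A~ (always >= 1)
    D_inv_sqrt = np.diag(d ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)         # ReLU activation

# toy usage: 116-node graph, 1-dim input features (cf. the note in 2.3.1 ②)
A = (np.random.rand(116, 116) > 0.9).astype(float); A = np.maximum(A, A.T)
H0 = np.ones((116, 1))                            # initial embedding set to 1
W0 = np.random.randn(1, 16)
H1 = gcn_layer(A, H0, W0)                         # shape (116, 16)
```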
2.4. Hierarchical GCN
2.4.1. The network architecture of hierarchical GCN
(1)The components of hi-GCN
①f-GCN: representation for each subject
②Network similarity estimation in $A^p$: they use a graph kernel to calculate the network similarity between different subjects
③p-GCN: combines the f-GCN embeddings with the graph-kernel similarities
④Backpropagation is adopted for training
(2)The overall framework:
(3)Feature embedding
①The figure of embedding:
②The overall transformation: each subject's network is first embedded by the f-GCN, and these embeddings, together with the kernel-based similarities, are fed to the p-GCN,
so the embedding consists of an individual-level stage and a population-level stage
2.4.2. F-GCN
①Coarsened graphs are obtained by average pooling after the convolutions
②They assign nodes to subnetworks by spectral clustering
③$G^{coar}$ represents the coarsened graph; $S$ denotes the assignment matrix, which encodes which subnetwork each node belongs to, with $S \in \{0,1\}^{N \times K}$ where $N$ is the number of ROIs and $K$ the number of subnetworks
④$A^{coar}$ represents the adjacency matrix between the coarsened subnetworks and $A^{(k)}$ represents the adjacency matrix of the edges within subnetwork $k$. And $A^{coar} = S^{\top} A S$
⑤To aggregate the nodes' features and obtain the representations of the supernodes, they apply
$$X^{(c)}_{pool} = \Theta_c^{\top} X$$
where $\Theta_c$ is the pooling operator, which consists of "all the c-th up-sampled eigenvectors from all the subnetworks";
$X^{(c)}_{pool}$ denotes the pooling result;
$c$ indexes the $c$-th embedding dimension.
⑥The authors only use the first pooled result as the final representation, i.e. $X_{pool} = X^{(1)}_{pool}$ (a simplified sketch of steps ③-⑤ follows).
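A simplified sketch of steps ③-⑤, assuming spectral clustering produces the assignment matrix and using each subnetwork's leading Laplacian eigenvector as the pooling vector. This collapses the paper's EigenPooling (which keeps several eigenvectors) to the single-vector case; `coarsen` is an illustrative name.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def coarsen(A: np.ndarray, X: np.ndarray, n_clusters: int):
    """Cluster ROIs into subnetworks, coarsen the adjacency via S^T A S,
    and pool features with each subnetwork's leading Laplacian eigenvector."""
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(np.abs(A))
    N = A.shape[0]
    S = np.zeros((N, n_clusters))                 # assignment matrix (step ③)
    S[np.arange(N), labels] = 1.0
    A_coar = S.T @ A @ S                          # coarsened adjacency (step ④)
    X_coar = np.zeros((n_clusters, X.shape[1]))
    for k in range(n_clusters):
        idx = np.where(labels == k)[0]
        A_k = A[np.ix_(idx, idx)]                 # within-subnetwork adjacency
        L_k = np.diag(A_k.sum(axis=1)) - A_k      # subgraph Laplacian
        _, v = np.linalg.eigh(L_k)
        theta = v[:, 0]                           # first eigenvector as pooling op
        X_coar[k] = theta @ X[idx]                # pooled supernode feature (step ⑤)
    return A_coar, X_coar
```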
2.4.3. p-GCN
①They evaluate the topological similarities between functional connectivity networks by a graph kernel
②They define the graph kernel between networks $G_i$ and $G_j$ as:
$$k(G_i, G_j) = \langle \phi(G_i), \phi(G_j) \rangle$$
where $k(\cdot,\cdot)$ denotes the kernel function and $\phi(\cdot)$ the substructure feature map
③They calculate the distances between instances under the $m$-th kernel function, denoted $d_m(G_i, G_j)$. If the RBF kernel function is adopted, it takes the form $k(G_i, G_j) = \exp\!\big(-d(G_i, G_j)^2 / (2\sigma^2)\big)$, where $\sigma$ is the kernel parameter
④Moreover, they set a threshold $\theta$ on the distance: the similarity is kept when the distance is smaller than $\theta$, and set to 0 otherwise
⑤The similarity between networks is therefore the thresholded kernel value (see the sketch below)
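A minimal sketch of steps ③-④ assuming an RBF kernel; the Frobenius distance between FC matrices stands in for the paper's graph-kernel distance, and `rbf_similarity`, `sigma`, `theta` are illustrative names.

```python
import numpy as np

def rbf_similarity(fcs, sigma=1.0, theta=2.0):
    """Pairwise subject similarity A^p via an RBF kernel over a graph
    distance, thresholded as in step ④. The Frobenius distance between
    FC matrices is a stand-in for the paper's graph-kernel distance."""
    M = len(fcs)
    Ap = np.zeros((M, M))
    for i in range(M):
        for j in range(i + 1, M):
            d = np.linalg.norm(fcs[i] - fcs[j])          # distance d(G_i, G_j)
            if d < theta:                                # keep edge only if close
                Ap[i, j] = Ap[j, i] = np.exp(-d**2 / (2 * sigma**2))
    return Ap
```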
2.4.4. Training scheme
(1)They provide three strategies to optimize the model
①Two-step training, which means the loss functions of f-GCN and p-GCN are optimized independently
②Joint training: f-GCN and p-GCN share the same loss function (a sketch of one joint update follows this list):
③Pre-training: also two steps, but a little more complicated:
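A hedged PyTorch-style sketch of the joint strategy ②: a single classification loss on the p-GCN output is backpropagated through both networks. `f_gcn`, `p_gcn`, and the data layout are placeholders, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def joint_step(f_gcn, p_gcn, graphs, A_pop, labels, optimizer):
    """One update of the joint strategy: a single shared loss on the
    p-GCN output is backpropagated through both p-GCN and f-GCN."""
    # individual level: embed every subject's brain network with f-GCN
    Z = torch.stack([f_gcn(A_i, X_i) for A_i, X_i in graphs])
    # population level: classify subjects on the population graph
    logits = p_gcn(A_pop, Z)
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()   # gradients reach f-GCN through the embeddings Z
    optimizer.step()
    return loss.item()
```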
2.5. Experiment
2.5.1. Databases and preprocessing
①Task: binary classification on two large datasets, ABIDE and ADNI
(1)ABIDE I
①Samples: they choose 866 subjects, of which 402 are ASD patients and 464 are healthy controls
②Reason: the same as in "Benchmarking functional connectome-based predictive models for resting-state fMRI" (ScienceDirect)
(2)ADNI
①Samples: 133 in all, 99 with Mild Cognitive Impairment (MCI) and 34 with Alzheimer's Disease (AD)
②Reason: see 2.5.1 (1) ②, the same article
2.5.2. Performance on hierarchical GCN
(1)Parameter setting
①Common parameter settings:
②Hyper-parameter settings:
③Validation: 10-fold cross validation for evaluating hi-GCN, 10-fold nested cross validation for selecting best parameters, nested 5-fold CV for adjusting hyper-parameters
④They apply Student's t-test (significance level = 0.05) to test the difference between hi-GCN and the other models (a toy example follows)
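A toy illustration of step ④ (using the fold structure from ③) with scipy; the per-fold accuracies are made-up placeholders, not the paper's numbers.

```python
import numpy as np
from scipy import stats

# accuracies over the same 10 folds; numbers are placeholders, not the paper's
hi_gcn   = np.array([0.74, 0.72, 0.75, 0.71, 0.73, 0.74, 0.72, 0.76, 0.73, 0.71])
baseline = np.array([0.70, 0.69, 0.72, 0.68, 0.70, 0.71, 0.69, 0.72, 0.70, 0.68])

# paired Student's t-test across folds at significance level 0.05
t, p = stats.ttest_rel(hi_gcn, baseline)
print(f"t = {t:.2f}, p = {p:.4f}, significant: {p < 0.05}")
```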
(2)Comparison
①They compare hi-GCN with a connectivity-feature-based method, Eigenpooling GCN, and Population GCN
②The ridge classifier is able to select network-based features, but these are low-level features
③Eigenpooling GCN is "an end-to-end trainable GCN with a pooling operator EigenPooling"
④Node features in Population GCN are learned automatically, and its edges are similarities between nodes
⑤BrainNetCNN combines edge-to-edge, edge-to-node, and node-to-graph filters
⑥Additionally, they compare hi-GCN with four SOTA models: the topology-based Clustering Coefficient (CC) and t-BNE, and the subgraph-based Graph Boosting and Ordinal Pattern. They then introduce each of them
⑦Performance on the ABIDE I dataset (mean over 10 runs):
⑧Performance on the ADNI dataset:
⑨From these two comparisons, the authors indicate that subgraph-based methods are somewhat better than topology-based ones
⑩They compare the convergence speed of the three training strategies:
2.5.3. Performance on different construction in population network
(1)The effect of similarity estimation scheme with auxiliary information
①They add non-image data, such as gender and acquisition site for ABIDE, age and gender for ADNI
②The similarity of non-image data is calculated by thresholding: two subjects count as similar when their phenotype values differ by less than a threshold $\theta$,
where $\theta$ is the threshold value and is set to 2 in their model
③The integrated similarity is a weighted combination of the imaging and the non-imaging similarity;
they control the similarity weights through $\beta$. However, in their model they just set $\beta$ to 0.5 (a sketch of ② and ③ follows).
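A small sketch of ② and ③ under stated assumptions: the non-image rule is "similar if the phenotype values differ by less than $\theta$", and the $\beta$-weighted sum is one reading of "integrated similarity" (the note does not give the exact formula); function names are hypothetical.

```python
import numpy as np

def phenotype_similarity(vals, theta=2.0):
    """Step ②: non-imaging similarity, 1 if two subjects' phenotype
    values (e.g. age) differ by less than the threshold theta."""
    v = np.asarray(vals, dtype=float)
    return (np.abs(v[:, None] - v[None, :]) < theta).astype(float)

def integrated_similarity(S_img, S_pheno, beta=0.5):
    """Step ③: blend imaging and non-imaging similarity; the weighted
    sum is an assumption, the note only says beta controls the weights."""
    return beta * S_img + (1 - beta) * S_pheno
```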
(2)The effect of different similarity estimation scheme
①They run another experiment adding non-image information:
in this table, the image and non-image data constitute multiple modalities. Performance with multiple modalities is better than with a single modality.
②The graph similarity metric is learned by a siamese graph convolutional neural network (s-GCN):
③Comparison of network similarity:
2.5.4. The influence of the hyperparameters of Hi-GCN
①The number of clusters ($K$), the threshold ($\theta$) and the kernel parameter ($\sigma$) are three important hyperparameters
②In their experiments they found a best cluster setting $K$, whereas the best $\theta$ and $\sigma$ are unknown.
③They found that increasing the threshold makes the network sparser
④Moreover, they observed that one of the two hyperparameters influences performance more than the other
⑤Performance on different hyperparameter values:
2.5.5. Comparisons with prior works
①Traditional machine learning separates feature extraction from model learning
②Comparison of different classifiers in ABIDE:
③Comparison of different classifiers in ADNI:
2.5.6. Ablation study and discussion
(1)Varying the threshold of the population network
①Population networks for the ABIDE and ADNI datasets:
②The threshold selection table:
(2)Exchanging the strategies of subjects' initial feature and similarity estimation
They provide a comparison of three feature embedding methods (the first attribute indicates the initial feature and the second the similarity estimation):
(3)Fusing the embedding with the graph properties as node features
The authors find that when adding Ordinal Pattern or t-BNE, the embedding may learn a more appropriate feature representation:
(4)Evaluating the embedding with the various traditional classifiers
They compare SVM, Random Forest and Logistic Regression classifiers on the embeddings of f-GCN and hi-GCN respectively:
(5)Evaluating the hierarchical embedding learning with various GCN models
They compare different models:
2.6. Conclusion
The authors recognize the effectiveness and broad prospects of fMRI functional connectivity matrices in disease classification
3. Supplementary Notes
3.1. Graph Kernel
(1)Definition: a graph kernel is an effective similarity measure between graph structures that can be used to compare different kinds of graphs (weighted, directed, undirected, labeled, etc.). Graph kernels build on kernel methods: they recursively decompose a graph into atomic substructures and compare all pairs of those substructures, so that each graph is represented as a vector; the inner product of two such vectors then measures the similarity of the two graphs.
(2)Three types of atomic substructures are commonly defined in graph kernels:
①Graphlets: the graph is decomposed into a set of non-isomorphic subgraphs (graphlets) of size k, each represented as a node set; two graphs are compared through these subgraph sets.
②Subtree patterns: the graph is decomposed into a series of subgraphs of different sizes, each represented as a tree structure; two graphs are compared through these sets.
③Random walks: a random-walk algorithm generates a series of paths in the graph, each represented as a vector; two graphs are compared through these vector sets.
(3)Methods and applications: depending on the task and the data, different graph-kernel methods can be chosen, e.g. kernels based on spectral graph theory, on random walks, or on neural networks. Graph kernels are widely used in computer vision, natural language processing, and bioinformatics for similarity comparison, classification, and clustering.
(4)Since I found no good summary blog or encyclopedia entry, all of the above comes from ERNIE Bot (文心一言); use with caution
(5)Notes on Weisfeiler-Lehman graph kernels: 《Weisfeiler-Lehman Graph Kernels》论文阅读 - 知乎 (zhihu.com); a tiny WL-kernel sketch follows
(6)Graph kernel survey: [1903.11835] A Survey on Graph Kernels (arxiv.org)
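To make the subtree-pattern idea in (2) ② concrete, here is a compact pure-Python sketch of the Weisfeiler-Lehman subtree kernel from (5): nodes are iteratively relabeled by hashing their own label together with their sorted neighbour labels, and two graphs are compared via the inner product of their label histograms. A didactic toy, not a production kernel.

```python
from collections import Counter

def wl_histogram(adj, labels, iterations=2):
    """WL relabeling: each round, a node's new label is a hash of its
    own label plus the sorted labels of its neighbours."""
    hist = Counter(labels)
    for _ in range(iterations):
        labels = [hash((labels[i], tuple(sorted(labels[j] for j in adj[i]))))
                  for i in range(len(adj))]
        hist.update(labels)
    return hist

def wl_kernel(adj1, l1, adj2, l2, iterations=2):
    """WL subtree kernel = inner product of the two label histograms."""
    h1 = wl_histogram(adj1, l1, iterations)
    h2 = wl_histogram(adj2, l2, iterations)
    return sum(h1[k] * h2[k] for k in h1.keys() & h2.keys())

# toy usage: two identical labeled triangles give a high kernel value
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(wl_kernel(tri, ["a", "a", "b"], tri, ["a", "a", "b"]))
```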
3.2. Gram Matrix
Given a kernel function $k$ and graphs $\{G_1, \dots, G_M\}$, the Gram matrix is the $M \times M$ matrix with entries $K_{ij} = k(G_i, G_j)$; in graph-kernel methods it stores all pairwise similarities between the graphs.
3.3. Ridge Classifier
RidgeClassifier is a machine learning model in sklearn.linear_model that builds a classifier on top of the Ridge regression model. It treats the problem as a regression task and fits the classification model with a least-squares loss. Unlike regularized Logistic Regression, the Ridge classifier's loss function is RMSE + an l2 penalty. For binary classification, a prediction greater than 0 is a positive example and less than 0 a negative one. For multi-class problems, a One-vs-Rest model predicts each class and the predictions are combined. In practice, the penalized least-squares loss used by the Ridge classifier allows different numerical solvers with different computational profiles to be chosen, and it is much faster than Logistic Regression because it only needs to compute the projection matrix once.
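A minimal sklearn usage sketch of the classifier described above; the synthetic features are a stand-in for flattened connectivity features (e.g. the upper triangle of a 116×116 FC matrix).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score

# synthetic stand-in for flattened functional-connectivity features
X, y = make_classification(n_samples=200, n_features=500, random_state=0)

# least-squares loss + l2 penalty; decision_function > 0 => positive class
clf = RidgeClassifier(alpha=1.0)
print(cross_val_score(clf, X, y, cv=10).mean())
```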
4. Reference List
Jiang, H. et al. (2020) 'Hi-GCN: A hierarchical graph convolution network for graph embedding learning of brain network and brain disorders prediction', Computers in Biology and Medicine, 127, 104096. doi: 10.1016/j.compbiomed.2020.104096.