[Paper Close Reading] Hi-GCN: A hierarchical graph convolution network for graph embedding learning of brain network

Paper: Hi-GCN: A hierarchical graph convolution network for graph embedding learning of brain network and brain disorders prediction - ScienceDirect

Code: https://github.com/haojiang1/hi-GCN

Table of Contents

1. TL;DR

1.1. Takeaways

1.2. Paper framework diagram

2. Section-by-section close reading

2.1. Abstract

2.1.1. Purpose

2.1.2. Method

2.1.3. Results

2.1.4. Conclusion

2.2. Introduction 

2.3. Preliminaries

2.3.1. Problem setup

2.3.2. Functional connectivity network

2.3.3. Graph convolutional networks

2.4. Hierarchical GCN

2.4.1. The network architecture of hierarchical GCN

2.4.2. F-GCN

2.4.3. p-GCN

2.4.4. Training scheme

2.5. Experiment

2.5.1. Databases and preprocessing

2.5.2. Performance on hierarchical GCN

2.5.3. Performance on different construction in population network

2.5.4. The influence of the hyperparameters of Hi-GCN

2.5.5. Comparisons with prior works

2.5.6. Ablation study and discussion

2.6. Conclusion

3. Background knowledge

3.1. Graph Kernel

3.2. Gram Matrix

3.3. Ridge Classifier

4. Reference List


1. TL;DR

1.1. Takeaways

(1) The graph kernel they use is quite intriguing; worth a closer look later

1.2. Paper framework diagram

2. Section-by-section close reading

2.1. Abstract

2.1.1. Purpose

        Low-dimensional representations of brain connectivity networks are widely used for detecting and predicting diseases from brain structure.

2.1.2. Method

        They propose an end-to-end hierarchical GCN framework (hi-GCN) that takes both the topological structure of individual brain networks and the relationships between subjects into consideration.

2.1.3. Results

        ①They use the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and the Autism Brain Imaging Data Exchange (ABIDE) dataset.

        ②They achieve an accuracy of 73.1% on ABIDE and 78.5% on ADNI.

        ③Their AUC is 82.3% on ABIDE and 86.5% on ADNI.

2.1.4. Conclusion

        ①They consider the correlation between each individual brain network and the global population network

        ②The authors find that the joint optimization strategy converges faster and more easily than the pre-trained or two-step supervised hi-GCN.

2.2. Introduction 

        ①Resting-state functional MRI (rs-fMRI) captures the brain's default state, and cortical connectivity changes in subjects with autism spectrum disorder (ASD) or other brain disorders such as Alzheimer's disease (AD)

        ②Appropriate feature extraction is essential; methods such as clustering coefficients, local clustering coefficients and deep convolutional neural networks are all useful. (The authors claim here that hand-crafted features are only mid- or low-level and inferior to the high-level features of DNNs. Hmm, what criterion defines these levels?)

        ③Network embedding (functional connectivity) only considers each individual, without modeling the relationships between subjects

        ④They build another graph that connects subjects, where nodes represent subjects with their features and edges represent associations (computed as pairwise similarities between nodes)

        ⑤They regard the main challenge as how to combine individual and population-level information, and accordingly design f-GCN for the local (individual) network and p-GCN for the population network

2.3. Preliminaries

2.3.1. Problem setup

        ①They build an undirected graph N_{i}=\left \{ R_{i},A_{i} \right \} for each subject, where R_{i}=\left \{ r^{1}_{i},...,r^{M}_{i} \right \} is the set of nodes, A_{i}\in \mathbb{R}^{M\times M} is the adjacency matrix, and M is the number of ROIs (they use 116)

        ②⭐The authors state that "the embedding R of each vertex is learned during GCN training, so R_i is set to 1". What does that mean? Isn't R_i a vector?

        ③Finally they obtain a group of graphs \left \{ N_1,...,N_D \right \}

        ④For the global population network, they define \widehat{N}=\left \{ \widehat{R},\widehat{A} \right \}, where \widehat{A} is the adjacency matrix of pairwise similarities between subjects

2.3.2. Functional connectivity network

        ①⭐They extract the time series and normalize them to zero mean and unit variance

        ②Calculating Pearson's correlation (PC) between ROIs:

Q\left(r_i,r_j\right)=\frac{Cov(v_i,v_j)}{\sigma_{v_i}\sigma_{v_j}}

where Cov(v_i,v_j) denotes the cross covariance between v_i and v_j;

\sigma_{v} represents the standard deviation of node v's time series.
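A minimal sketch of building the functional connectivity matrix from ROI time series (assuming a NumPy array ts of shape (timepoints, M), one column per ROI; np.corrcoef computes exactly the Pearson correlation above):

```python
import numpy as np

def functional_connectivity(ts):
    """ts: array of shape (n_timepoints, M), one ROI time series per column.
    Returns the M x M Pearson correlation matrix used as the FC network."""
    # normalize each ROI time series to zero mean and unit variance
    ts = (ts - ts.mean(axis=0)) / ts.std(axis=0)
    # Pearson correlation between every pair of ROIs
    return np.corrcoef(ts, rowvar=False)
```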

        ③Network construction:

2.3.3. Graph convolutional networks

        ①Standard convolutions cannot be directly generalized to irregular graphs, so they adopt a spectral approach on their graphs

        ②Each layer in GCN is:

E^{(l+1)}=\mathrm{ReLU}\left ( \widetilde{D}^{-\frac{1}{2}} \widehat{A} \widetilde{D}^{-\frac{1}{2}}E^{(l)}W^{(l)} \right )

where \widehat{A}=A+\mathbf{I}_{n} and \widetilde{D}_{ii}=\sum_{j}\widehat{A}_{ij};

W^{(l)} denotes the learnable weight matrix of the l-th layer;

E^{(l+1)} denotes the node embeddings produced by the (l+1)-th layer from the preceding layer's embeddings E^{(l)}.

        ③GCN can be seen as a Laplacian smoothing operator; each convolutional layer is followed by a ReLU activation and all layers share the same adjacency matrix

        ④To reduce computational complexity, they approximate the convolutional kernels with Chebyshev polynomials

        ⑤They restrict the convolution to the K-order neighborhood of each node
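A minimal sketch of the propagation rule in ② using plain NumPy (illustrative only, not the authors' implementation):

```python
import numpy as np

def gcn_layer(A, E, W):
    """One GCN layer: ReLU(D^{-1/2} (A + I) D^{-1/2} E W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ E @ W, 0.0)    # ReLU activation
```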

2.4. Hierarchical GCN

2.4.1. The network architecture of hierarchical GCN

(1)The components of hi-GCN

        ①f-GCN: learns the representation of each individual subject

        ②Network similarity estimation in \widehat{N}: they use a graph kernel to calculate the network similarity between different subjects

        ③p-GCN: combines the f-GCN embeddings with the graph kernel similarities

        ④Backpropagation is used to train the whole framework

(2)The overall framework:

(3)Feature embedding

        ①Figure of the embedding process:

        ②The overall mapping:

\mathbf{hi}-\mathbf{GCN}:\boldsymbol{N}\to\widehat{\boldsymbol{e}}

which is composed of \mathbf{f}-\mathbf{GCN}\left(\boldsymbol{N}\right)=\boldsymbol{e} and \mathbf{p}-\mathbf{GCN}\left(\boldsymbol{e},\widehat{A}\right)=\left[\widehat{\boldsymbol{e}},\hat{y}\right]
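A schematic sketch of this two-level composition (f_gcn and p_gcn are hypothetical callables standing in for the authors' networks, and the graph-kernel adjacency \widehat{A} is assumed to be precomputed):

```python
import numpy as np

def hi_gcn_forward(subject_graphs, A_hat, f_gcn, p_gcn):
    """Two-level forward pass: f-GCN embeds each subject's brain graph
    (N_i -> e_i); p-GCN then propagates the stacked embeddings over the
    population graph A_hat and returns (e_hat, y_hat)."""
    # individual-level embeddings, one row per subject
    E = np.stack([f_gcn(R_i, A_i) for R_i, A_i in subject_graphs])
    # population-level embedding and prediction
    e_hat, y_hat = p_gcn(E, A_hat)
    return e_hat, y_hat
```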

2.4.2. F-GCN

        ①Coarsened graphs are obtained by average pooling after the convolutions

        ②Nodes are assigned to subnetworks by spectral clustering

        ③N_{coar} represents the coarsened graph; C\in \mathbb{R}^{M\times N} denotes the assignment matrix encoding which subnetwork each node belongs to, where M is the number of ROIs

        ④A_{coar} represents the adjacency matrix of coarsened subnetworks and A_{ext} represents the adjacency matrix of edges in subnetworks. And, 

A_{coar}=\sum_{h=1}^{H}C^{(h)}A_{ext}\left(C^{(h)}\right)^T

        ⑤To aggregate the nodes' features and obtain the representations of the supernodes, they apply:

E_c=\Theta_c^TE

where \Theta_c is the pooling operator, which consists of "all the c-th up-sampled eigenvectors from all the subnetworks";

E_c\in\mathbb{R}^{H\times P_l} denotes the pooling results;

P_l denotes the l-th embedding dimensionality.

        ⑥The authors only keep the first Z pooled results as the final output, i.e. E_{coar}=[E_0,...,E_Z]
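A minimal sketch of the graph coarsening step, assuming scikit-learn's SpectralClustering to assign ROIs to H subnetworks; the simple C^T A C aggregation below is a stand-in illustration, not the paper's exact A_{ext}-based construction or eigenvector pooling:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def coarsen_graph(A, n_subnetworks=7):
    """Assign ROIs to subnetworks by spectral clustering and build a
    coarsened adjacency between the resulting supernodes."""
    M = A.shape[0]
    labels = SpectralClustering(n_clusters=n_subnetworks,
                                affinity="precomputed").fit_predict(np.abs(A))
    # assignment matrix C: C[i, h] = 1 if ROI i belongs to subnetwork h
    C = np.zeros((M, n_subnetworks))
    C[np.arange(M), labels] = 1.0
    # simplified coarsened adjacency between subnetworks
    A_coar = C.T @ A @ C
    return C, A_coar
```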

2.4.3. p-GCN

        ①They evaluate topological similarities between functional connectivity networks with a graph kernel

        ②The graph kernel between networks N_i and N_j is defined as:

S(N_i,N_j)=\left \langle \varphi (N_i),\varphi (N_j) \right \rangle

where \varphi (\cdot ) denotes the feature mapping of the kernel

        ③They calculate the distance between instances with the q-th kernel function, denoted K_q(r^i_a,r^i_b), where r^i_a=\sum_{u=1}^{M}A_i(a,u). If an RBF kernel is adopted, the distance function becomes K_q(r^i_a,r^i_b)=\exp\left(-\frac{\left \| r^i_a-r^i_b \right \|}{2\sigma }\right), where \sigma is the kernel parameter.

        ④Moreover, they set a threshold T on the distance: K=1 when the distance is smaller than T, otherwise K=0

        ⑤Similarities between networks are:

S_I(N_i,N_j)=\frac{\sum_{a=1}^{M}\sum_{b=1}^{M}w^i_a w^j_b K(x^i_a,x^j_b)}{\sum_{a=1}^{M}w^i_a\sum_{b=1}^{M}w^j_b}

where w^i_a=\frac{1}{\sum_{u=1}^{M}K(x^i_a,x^i_u)}
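A minimal sketch of this network similarity S_I, assuming the weighted-degree node descriptor r^i_a from ③ and an RBF node kernel (the binarization in ④ is omitted here; this is an illustration, not the authors' exact implementation):

```python
import numpy as np

def network_similarity(A_i, A_j, sigma=1.0):
    """Node-kernel-based similarity S_I between two FC networks A_i and A_j."""
    r_i = A_i.sum(axis=1)   # node descriptor r^i_a = sum_u A_i(a, u)
    r_j = A_j.sum(axis=1)
    rbf = lambda a, b: np.exp(-np.abs(a[:, None] - b[None, :]) / (2.0 * sigma))
    K_ij = rbf(r_i, r_j)                    # node kernel across the two networks
    w_i = 1.0 / rbf(r_i, r_i).sum(axis=1)   # node weights w^i_a
    w_j = 1.0 / rbf(r_j, r_j).sum(axis=1)
    num = (w_i[:, None] * w_j[None, :] * K_ij).sum()
    return num / (w_i.sum() * w_j.sum())
```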

2.4.4. Training scheme

(1)They describe 3 training strategies for optimizing the framework

        ①Two-step training: f-GCN and p-GCN are trained with independent loss functions

        ②Joint training: f-GCN and p-GCN share the same loss function (see the sketch after this list):

        ③Pre-training: also two steps, but slightly more complicated:
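A plausible form of the shared objective in the joint training scheme, assuming a standard cross-entropy over the p-GCN output (the paper's exact loss terms may differ):

\mathcal{L}_{joint}=-\sum_{i=1}^{D}\sum_{c}y_{i,c}\log\hat{y}_{i,c},\qquad \hat{y}_{i}=\mathbf{p}\text{-}\mathbf{GCN}\left(\mathbf{f}\text{-}\mathbf{GCN}(N_i),\widehat{A}\right)

Because a single loss is used, gradients from the population-level prediction flow back through p-GCN into f-GCN, consistent with the faster convergence reported for the joint scheme.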

2.5. Experiment

2.5.1. Databases and preprocessing

        ①Task: binary classification on two large datasets, ABIDE and ADNI

(1)ABIDE I

        ①Samples: they select 866 subjects, of which 402 are ASD and 464 are healthy controls

        ②Reason: same as in Benchmarking functional connectome-based predictive models for resting-state fMRI - ScienceDirect

(2)ADNI

        ①Samples: 133 in total, 99 with Mild Cognitive Impairment (MCI) and 34 with Alzheimer's Disease (AD)

        ②Reason: see 2.5.1. (1) ②, the same article

2.5.2. Performance on hierarchical GCN

(1)Parameter setting

        ①Common parameter settings:

        ②Hyper-parameter settings:

T\in\left \{ 0.3,0.45,0.6,0.47,0.9 \right \};

\gamma \in\left \{ 1,2,3,4,5 \right \};

H\in\left [ 0,1 \right ]

        ③Validation: 10-fold cross-validation for evaluating hi-GCN, 10-fold nested cross-validation for selecting the best parameters, and nested 5-fold CV for tuning the hyper-parameters

        ④They apply Student's t-test (significance level = 0.05) to test the difference between hi-GCN and the other models

(2)Comparison 

        ①They compare hi-GCN with a connectivity-feature-based method, Eigenpooling GCN and Population GCN

        ②A ridge classifier can select network-based features, but these are low-level features

        ③Eigenpooling GCN is "an end-to-end trainable GCN with a pooling operator EigenPooling"

        ④Node features in Population GCN are learned automatically, and its edges encode similarities between nodes

        ⑤BrainNetCNN combines edge-to-edge, edge-to-node and node-to-node filters

        ⑥Additionally, they compare hi-GCN with 4 SOTA models: the topology-based Clustering Coefficient (CC) and t-BNE, and the subgraph-based Graph Boosting and Ordinal Pattern, each of which they then introduce

        ⑦Performance on the ABIDE I dataset (mean over 10 experiments):

        ⑧Performance on the ADNI dataset:

        ⑨From these two sets of results, the authors indicate that subgraph-based methods are better than topology-based methods to some extent.

        ⑩They compare the convergence speed of the 3 training strategies:

2.5.3. Performance on different construction in population network

(1)The effect of the similarity estimation scheme with auxiliary information

        ①They add non-imaging data: gender and acquisition site for ABIDE, age and gender for ADNI

        ②The similarity of the non-imaging data is calculated by:

S_{NI}\left(M_i,M_j\right)=\begin{cases}1 & if\left|M_i-M_j\right|<\overline{T}\\0 & otherwise\end{cases}

where \overline{T} is a threshold value, set to 2 in their model

        ③The integrated similarity degree will be:

S=\alpha S_I+(1-\alpha )S_{NI}

where \alpha controls the weighting between the two similarities. In their model, they simply set \alpha to 0.5.
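A minimal sketch of this fused similarity for a single pair of subjects (s_image is assumed to come from the graph kernel S_I above; the threshold \overline{T}=2 and \alpha=0.5 follow the values quoted here):

```python
def non_image_similarity(m_i, m_j, t_bar=2.0):
    """S_NI: 1 if the non-imaging measures (e.g. age) are close, else 0."""
    return 1.0 if abs(m_i - m_j) < t_bar else 0.0

def combined_similarity(s_image, m_i, m_j, alpha=0.5, t_bar=2.0):
    """Fused similarity S = alpha * S_I + (1 - alpha) * S_NI."""
    return alpha * s_image + (1 - alpha) * non_image_similarity(m_i, m_j, t_bar)
```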

(2)The effect of different similarity estimation schemes

        ①They run another experiment that adds non-imaging information:

In this table, imaging and non-imaging data together form multiple modalities. Performance with multiple modalities is better than with a single modality.

        ②The graph similarity metric is learned by a siamese graph convolutional neural network (s-GCN):

K_{pdist}\left(\boldsymbol{u},\boldsymbol{v}\right)=\frac{\left(\boldsymbol{u}-\overline{\boldsymbol{u}}\right)\cdot\left(\boldsymbol{v}-\overline{\boldsymbol{v}}\right)}{\left\|\boldsymbol{u}-\overline{\boldsymbol{u}}\right\|_2\left\|\boldsymbol{v}-\overline{\boldsymbol{v}}\right\|_2}
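This K_{pdist} is a centered cosine (Pearson-style) similarity between two embedding vectors; a minimal sketch:

```python
import numpy as np

def pdist_kernel(u, v):
    """Centered cosine similarity between two embedding vectors u and v."""
    u_c, v_c = u - u.mean(), v - v.mean()
    return float(u_c @ v_c / (np.linalg.norm(u_c) * np.linalg.norm(v_c)))
```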

        ③Comparison of network similarity:

2.5.4. The influence of the hyperparameters of Hi-GCN

        ①The number of clusters (H), the threshold (T) and the kernel parameter (\gamma) are three important hyperparameters

        ②In their experiments, they find that H=7 is the best cluster setting, whereas no single best T and \gamma stand out.

        ③They find that increasing T makes the network sparser

        ④Moreover, they observe that T has more influence on performance than \gamma

        ⑤Performance on different hyperparameter values:

2.5.5. Comparisons with prior works

        ①Traditional machine learning separates feature extraction from model learning

        ②Comparison of different classifiers in ABIDE:

        ③Comparison of different classifiers in ADNI:

2.5.6. Ablation study and discussion

(1)Varying the threshold of the population network

        ①Population networks for the ABIDE and ADNI datasets:

        ②The threshold selection table:

(2)Exchanging the strategies for subjects' initial features and similarity estimation

        They provide a comparison of three feature embedding methods (the first attribute is the initial feature and the second is the similarity estimation):

(3)Fusing the embedding with the graph properties as node features

        The authors find that when Ordinal Pattern or t-BNE features are added, the embedding can learn a more appropriate feature representation:

(4)Evaluating the embedding with the various traditional classifiers

        They compare SVM, Random Forest and Logistic Regression classifiers on the embeddings of f-GCN and hi-GCN respectively:

(5)Evaluating the hierarchical embedding learning with various GCN models

        They compare different models:

2.6. Conclusion

        The authors conclude that the functional connectivity matrix derived from fMRI is effective and has broad prospects for disease classification

3. Background knowledge

3.1. Graph Kernel

(1) Definition: a graph kernel is an effective way to measure the similarity of graph structures, and can be used to compare different kinds of graphs (weighted, directed, undirected, labeled, etc.). Graph kernels build on kernel methods from graph theory: a graph is recursively decomposed into atomic substructures, all pairs of these substructures are compared, and each graph is thereby represented as a vector. The similarity of two graphs is then measured by the inner product of their vectors.

(2) Three types of atomic substructures are commonly defined in graph kernels:

        ①Graphlets: the original graph is decomposed into a set of non-isomorphic subgraphs (graphlets) of size k, each represented as a set of nodes; the similarity of two graphs is measured by comparing their collections of graphlets.

        ②Subtree patterns: the graph is decomposed into a series of subgraphs of different sizes, each represented as a tree structure; the similarity of two graphs is measured by comparing these collections.

        ③Random walks: a random walk algorithm generates a series of paths in the graph, each represented as a vector; the similarity of two graphs is measured by comparing these collections of vectors.

(3) Methods and applications: depending on the task and the data, different graph kernel methods can be chosen, e.g. kernels based on spectral graph theory, on random walks, or on neural networks. Graph kernels are widely used in computer vision, natural language processing and bioinformatics, for tasks such as measuring the similarity of data points, classification and clustering.

(4) Since I could not find a good summary blog or encyclopedia entry, all of the above comes from ERNIE Bot (文心一言); use with caution

(5) Notes on Weisfeiler-Lehman graph kernels: 《Weisfeiler-Lehman Graph Kernels》论文阅读 - 知乎 (zhihu.com)

(6) Graph kernel survey: [1903.11835] A Survey on Graph Kernels (arxiv.org)
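A toy illustration of the decompose-count-inner-product idea behind graph kernels (purely hypothetical graphlet counts, not any specific kernel from the literature):

```python
import numpy as np

# hypothetical counts of size-3 substructures (triangle, path, independent set)
phi_g1 = np.array([4, 10, 2])   # "feature vector" of graph 1
phi_g2 = np.array([3, 12, 1])   # "feature vector" of graph 2

# the kernel value is the inner product of the two count vectors
k_value = float(phi_g1 @ phi_g2)   # 4*3 + 10*12 + 2*1 = 134
print(k_value)
```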

3.2. Gram Matrix
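Given samples x_1, ..., x_n and a kernel k with feature map \varphi, the Gram matrix is the n\times n matrix G with G_{ij}=k(x_i,x_j)=\left \langle \varphi (x_i),\varphi (x_j) \right \rangle; it is symmetric and positive semi-definite. In this paper, the subject-by-subject similarity matrix built from the graph kernel S(N_i,N_j) is exactly such a Gram matrix over the D subjects.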

3.3. Ridge Classifier

RidgeClassifier is a machine learning model from sklearn.linear_model that builds a classifier on top of the Ridge regression model. The classifier treats the problem as a regression task and fits the classification model with a least-squares loss. Unlike regularized Logistic Regression, the Ridge classifier's loss function is a squared error plus an l2 penalty. For binary classification, predictions greater than 0 are taken as the positive class and predictions less than 0 as the negative class. For multi-class problems, a One-vs-Rest model makes a prediction for each class and the results are combined into a multi-class prediction. In practice, the penalized least-squares loss used by the Ridge classifier allows a choice among numerical solvers with different computational performance profiles, and it is much faster than Logistic Regression because it only needs to compute the projection matrix once.
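A minimal usage sketch with scikit-learn (synthetic features and labels, for illustration only):

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))            # e.g. flattened connectivity features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic binary labels

clf = RidgeClassifier(alpha=1.0)          # l2-penalized least-squares classifier
clf.fit(X, y)
print(clf.score(X, y))                    # mean accuracy on the training data
```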

4. Reference List

Jiang, H. et al. (2020) 'Hi-GCN: A hierarchical graph convolution network for graph embedding learning of brain network and brain disorders prediction', Computers in Biology and Medicine, 127.
