[Paper Close Reading] TE-HI-GCN: An Ensemble of Transfer Hierarchical Graph Convolutional Networks for Disorder Diagnosis

Full title: TE-HI-GCN: An Ensemble of Transfer Hierarchical Graph Convolutional Networks for Disorder Diagnosis

Paper: TE-HI-GCN: An Ensemble of Transfer Hierarchical Graph Convolutional Networks for Disorder Diagnosis | Neuroinformatics (springer.com)

Code: GitHub - llt1836/TE-HI-GCN

Table of Contents

1. TL;DR

1.1. Thoughts

1.2. Paper Framework Diagram

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.3.1. Preliminaries of Graph Convolutional Networks (GCN)

2.4. An Ensemble of Transfer Hierarchical Graph Convolutional Networks for Disorder Diagnosis, TE-HI-GCN

2.4.1. Overview of Network Architecture of TE-HI-GCN

2.4.2. Sparse Brain Networks Construction

2.5. Hierarchical GCN

2.5.1. f-GCN: Learning the Brain Network Embedding with MGC

2.5.2. p-GCN: Learning the Brain Network Embedding with Subject Correlation

2.5.3. Transfer Learning for GCN

2.5.4. Implementation Details

2.6. Experiment

2.6.1. Databases and Preprocessing

2.6.2. Evaluating the Effectiveness of our TE-HI-GCN

2.6.3. Experiment on Other Parcellations

2.6.4. Experiment on HCP Dataset

2.7. Interpretability

2.8. Ablation Study and Discussion

2.8.1. The Effectiveness of Transfer Learning

2.8.2. The Effectiveness of Ensemble Learning

2.8.3. Comparisons with Prior Works

2.9. Conclusion

3. Background Knowledge

3.1. ImageNet framework

3.2. Structural atlas and functional atlas

3.3. Networks of brain

3.4. Transfer learning

4. Reference List


1. TL;DR

1.1. Thoughts

(1)...Could the authors really not number their headings? It makes my automatic sectioning miserable

(2)A CAS Zone 3 journal paper, apparently. I mainly want to see how they modified hi-GCN (part of my "how should I publish my first paper" series). Did they really change hi-GCN that much? I haven't finished reading yet. Maybe the limited novelty explains the low ranking

(3)The authors set out to reveal a correlation between Alzheimer's disease and autism... emm... fine. On reflection the two seem unrelated, but maybe that's science... sometimes things that look unrelated turn out to be connected... the world is a marvelous place, after all

(4)The authors note that autism is a neurodevelopmental disorder while Alzheimer's is a neurodegenerative one. Then... the only fix is... time stop!

(5)Ahh, what a clearly laid-out challenges section hahahaha, great for beginners

(6)Why spend so long re-introducing plain GCN?

(7)Why did they lift the figure straight from hi-GCN? Turns out it's the same team... er. I can only say this paper really is a bit thin.

1.2. Paper Framework Diagram

2. Section-by-Section Close Reading

2.1. Abstract

        ①High dimensionality, noise, and limited labeled data are the main obstacles for GCN-based diagnosis

        ②Facing these challenges, the authors propose an ensemble framework that combines transfer learning with a hierarchical structure built on sparse brain networks.

        ③They evaluate on the ADNI and ABIDE datasets

        ④⭐The authors claim this is the first work to apply transfer learning between AD and ASD diagnosis.

2.2. Introduction

        ①Autism spectrum disorder (ASD) is a neurodevelopmental disorder and Alzheimer's disease (AD) is a neurodegenerative disorder.

        ②They briefly introduce existing models and approaches

(1)Challenge 1: Noisy correlations in the brain network.

        ①Keeping all pairwise correlations may introduce noise and spurious connections

        ②Although Pearson's correlation coefficient (PCC) is effective, the resulting network is overly dense to some extent.

        ③⭐Such over-dense connectivity causes overfitting and increases computational complexity, so pruning irrelevant or weak connections is necessary

(2)Challenge 2: Limited labeled training data.

        ①Labeled training data is scarce in clinical applications

(3)Challenge 3: Depth limitation of GCN learning.

        ①⭐Deep GCNs suffer from over-smoothing, i.e., node features converge to the same value. Hence GCNs are usually kept to at most 3 or 4 layers

(4)Their contributions:

        ①Removing noisy correlations in the brain network: a multi-scale thresholding scheme removes the weak (unnecessary) connections, and multi-graph clustering (MGC) is then applied to strengthen the essential ones

        ②Exploiting the association between subjects for GCN: the same idea as hi-GCN

        ③Transfer learning across related disorder domains: the usual practice is to take pre-trained weights (ImageNet-style pre-training) and fine-tune them on the target task. Prior studies also report a correlation between AD and ASD (they are both brain disorders, so of course they are somewhat related; by that logic lung cancer, tuberculosis, and pneumonia are all related too). They exploit the commonality of the two domains (which two domains, exactly?) to learn general graph-structure features. Additionally, to avoid negative transfer, they apply transfer learning at multiple levels of the original graphs' topological structure.

(5)They put forward an Ensemble of Transfer HIerarchical Graph Convolutional Networks, called TE-HI-GCN, and claim the model is general, accurate, interpretable, and robust

etiology  n. the cause of a disease; the study of the causes of disease

prognosis  n. the likely course of a disease; a forecast or prediction of its outcome

2.3. Related Work

        They briefly review other relevant works

2.3.1. Preliminaries of Graph Convolutional Networks (GCN)

        ①They define a graph N_i=\left\{R_i,A_i\right\},

where R_i=\left\{r^1_i,...,r^M_i\right\} is the set of M nodes;

A_i\in\mathbb{R}^{M\times M} is the adjacency matrix;

M denotes the number of nodes; they choose M=116 in this article;

AGAIN! They initialize R_i to 1 (but why?? Here it is again. Isn't it a set? How can it be 1? Is it that inner-product value?)

        ②The set of graphs is \left\{N_1,...,N_D\right\}, where D is the number of subjects. Each subject has a label y_i

        ③The graph convolution of GCN is:

\boldsymbol{E}^{(l+1)}=\mathrm{ReLU}(\tilde{\boldsymbol{D}}^{-1/2}\tilde{\boldsymbol{A}}\tilde{\boldsymbol{D}}^{-1/2}\boldsymbol{E}^{(l)}\boldsymbol{W}^{(l)})

where \tilde{\boldsymbol{A}}=\boldsymbol{A}+\boldsymbol{I}_n and \tilde{\boldsymbol{D}}_{ii}=\sum_j\tilde{\boldsymbol{A}}_{ij};

\boldsymbol{W}^{(l)} is a trainable weight matrix;

\boldsymbol{E}^{(l+1)} denotes the node embeddings after l+1 GCN layers

        ④Each GCN layer is followed by a ReLU
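
A minimal numpy sketch of this propagation rule (one layer; the adjacency matrix, feature dimensions, and weights below are made up purely for illustration):

```python
import numpy as np

def gcn_layer(A, E, W):
    """One GCN layer: ReLU(D~^{-1/2} (A + I) D~^{-1/2} E W)."""
    A_tilde = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # D~^{-1/2} as a vector
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # sym. normalization
    return np.maximum(A_hat @ E @ W, 0.0)            # linear map + ReLU

# toy usage: M = 116 ROIs, 8 input features, 4 output features
M = 116
A = (np.random.rand(M, M) > 0.9).astype(float)
A = np.maximum(A, A.T)                               # symmetrize
E = np.random.randn(M, 8)
W = np.random.randn(8, 4)
print(gcn_layer(A, E, W).shape)                      # (116, 4)
```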

2.4. An Ensemble of Transfer Hierarchical Graph Convolutional Networks for Disorder Diagnosis, TE-HI-GCN

2.4.1. Overview of Network Architecture of TE-HI-GCN

        ①The framework of TE-HI-GCN:

it combines HI-GCN with transfer learning units

        ②Different thresholds produce different topologies; the thresholds used in this article lie in \left[0.05,0.5\right]

2.4.2. Sparse Brain Networks Construction

        ①Extract time series from fMRI (116 ROIs) → normalize to zero mean and unit variance → compute the Pearson's correlation coefficient (PCC)

        ②Construction of functional connectivity:

(dude, did you just lift their figure wholesale?)

        ③The thresholding is defined as:

Q(r_i,r_j)=\begin{cases}Q(r_i,r_j), & \text{if } Q(r_i,r_j)\geq\tau\\ 0, & \text{otherwise}\end{cases}
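
A minimal sketch of this construction (PCC plus thresholding; the time-series shape and the threshold grid are my assumptions for illustration):

```python
import numpy as np

def sparse_brain_network(ts, tau):
    """Thresholded functional connectivity from ROI time series.

    ts  : (T, M) array, T frames x M ROIs.
    tau : correlation threshold; entries below tau are zeroed.
    """
    ts = (ts - ts.mean(axis=0)) / ts.std(axis=0)   # zero mean, unit variance
    Q = np.corrcoef(ts.T)                          # (M, M) PCC matrix
    Q[Q < tau] = 0.0                               # drop weak connections
    np.fill_diagonal(Q, 0.0)                       # remove self-loops
    return Q

# one sparse network per threshold in [0.05, 0.5], as in the ensemble
ts = np.random.randn(200, 116)                     # fake fMRI: 200 frames, 116 ROIs
networks = [sparse_brain_network(ts, t) for t in np.arange(0.05, 0.55, 0.05)]
```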

2.5. Hierarchical GCN

        ①They first recap the theory of hi-GCN and its overall role

        ②Moreover, a "new" figure (not so new, heh) is attached below:

2.5.1. f-GCN: Learning the Brain Network Embedding with MGC

        ①They apply multi-graph clustering (MGC), which uses a supergraph, to cluster features. They replace the traditional approximation A\approx\mathcal{F}^T\mathcal{A}^s\mathcal{F} with \mathcal{A}^s=\mathcal{F}\mathcal{A}\mathcal{F}^T (a coarsening sketch appears at the end of this subsection),

where \mathcal{F}\in\mathbb{R}^{M\times C};

\mathcal{A}^s\in\mathbb{R}^{C\times C} and is symmetric;

C denotes the number of clusters;

\mathcal{A}^s denotes the weighted adjacency matrix of the supergraph;

\left\{S_1,S_2,...,S_C\right\} is the set of supernodes (honestly, I didn't fully understand this part haha; are supernodes part of the supergraph? And how is this thing represented, a 3-D matrix?)

        ②They stack 3 convolutional layers

        ③"Unlike traditional clustering, which groups similar nodes, MGC aims to hide noisy connections by grouping them into a supernode, thereby highlighting the indicative edges that connect supernodes. In other words, the functional connectivity weights connecting nodes across different clusters are strengthened, while the nodes within a cluster and their connections are removed." (my translation)

        ④Most works treat clustering and classification as separate processes, but the authors want to combine the two.
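
A tiny coarsening sketch to make the supergraph concrete. Note that with \mathcal{F}\in\mathbb{R}^{M\times C} the dimensionally consistent product is \mathcal{F}^T\mathcal{A}\mathcal{F}, so that is what the code uses (the paper's transpose convention may differ); a random soft assignment stands in for the learned one:

```python
import numpy as np

def coarsen(A, F):
    """Coarsen an M-node graph into a C-supernode supergraph.

    A : (M, M) adjacency matrix.
    F : (M, C) soft cluster-assignment matrix (rows sum to 1).
    """
    return F.T @ A @ F                                 # (C, C) supergraph adjacency

M, C = 116, 8
A = np.abs(np.random.randn(M, M)); A = (A + A.T) / 2   # symmetric toy adjacency
F = np.random.rand(M, C)
F /= F.sum(axis=1, keepdims=True)                      # soft assignment per ROI
A_s = coarsen(A, F)
print(A_s.shape)                                       # (8, 8), still symmetric
```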

2.5.2. p-GCN: Learning the Brain Network Embedding with Subject Correlation

        ①They use the samples and the similarities between samples to construct a new graph over the subjects

        ②The graph kernel is based on the Gram matrix; with an RBF kernel function \mathcal{K}, the similarity is:

S_I(\boldsymbol{N}_i,\boldsymbol{N}_j)=\frac{\sum_{a=1}^M\sum_{b=1}^Mw_a^iw_b^j\mathcal{K}(\boldsymbol{x}_a^i,\boldsymbol{x}_b^j)}{\sum_{a=1}^Mw_a^i\sum_{b=1}^Mw_b^j}

where w_{a}^{i}=\frac{1}{\sum_{u=1}^{M}\mathcal{K}(\boldsymbol{x}_{a}^{i},\boldsymbol{x}_{u}^{i})}
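
A small numpy sketch of this similarity (the node-feature matrices and the RBF bandwidth gamma are illustrative assumptions):

```python
import numpy as np

def rbf_gram(X, Y, gamma=1.0):
    """Gram matrix K[a, b] = exp(-gamma * ||x_a - y_b||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def graph_similarity(X_i, X_j, gamma=1.0):
    """Weighted RBF similarity S_I between two subjects' (M, d) node features."""
    w_i = 1.0 / rbf_gram(X_i, X_i, gamma).sum(axis=1)   # w_a^i
    w_j = 1.0 / rbf_gram(X_j, X_j, gamma).sum(axis=1)   # w_b^j
    K_ij = rbf_gram(X_i, X_j, gamma)
    return (w_i @ K_ij @ w_j) / (w_i.sum() * w_j.sum())

X_i, X_j = np.random.randn(116, 4), np.random.randn(116, 4)
print(graph_similarity(X_i, X_j))
```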

        ③The final loss function is:

\min_{\boldsymbol{W}_f,\boldsymbol{W}_p,\boldsymbol{F}}\boldsymbol{L}=\boldsymbol{L}_{CE}(\boldsymbol{W}_f,\boldsymbol{W}_p,\boldsymbol{F})+\lambda_1\boldsymbol{L}_{otho}(\boldsymbol{F})+\lambda_2\boldsymbol{L}_{bal}(\boldsymbol{F})+\lambda_3\boldsymbol{L}_{pos}(\boldsymbol{F})

where \boldsymbol{W}_f and \boldsymbol{W}_p are the weight parameters of f-GCN and p-GCN respectively;

\lambda_1,\lambda_2,\lambda_3 are positive trade-off parameters;

\boldsymbol{L}_{otho}=\left\|\boldsymbol{F}^T\boldsymbol{F}-diag(diag(\boldsymbol{F}^T\boldsymbol{F}))\right\|_F is the orthogonal regularization, which penalizes the off-diagonal elements;

\boldsymbol{L}_{bal}=Var(diag(\boldsymbol{F}^T\boldsymbol{F})) is the balancing regularization, which balances the cluster sizes;

Var\left(\cdot\right) denotes the variance;

\boldsymbol{L}_{pos} ensures the positivity of \boldsymbol{F}.
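
A torch sketch of the regularizers spelled out above. L_{otho} and L_{bal} follow the given formulas; L_{pos} is not written out in my notes, so the hinge penalty on negative entries below is my own stand-in:

```python
import torch

def assignment_penalties(F):
    """Regularizers on the (M, C) cluster-assignment matrix F."""
    G = F.T @ F                                                  # (C, C) Gram matrix
    L_otho = torch.norm(G - torch.diag(torch.diag(G)), p='fro')  # off-diagonal penalty
    L_bal = torch.var(torch.diag(G))                             # variance of cluster sizes
    L_pos = torch.relu(-F).sum()                                 # assumed: penalize F < 0
    return L_otho, L_bal, L_pos

F = torch.rand(116, 8, requires_grad=True)
L_otho, L_bal, L_pos = assignment_penalties(F)
loss = 0.1 * L_otho + 0.1 * L_bal + 0.1 * L_pos   # plus L_CE in the full model
loss.backward()
```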

2.5.3. Transfer Learning for GCN

        ①Pre-training schemes:

2.5.4. Implementation Details

        ①They optimize the parameters in a fully supervised manner

        ②They split the correlations into a positive and a negative network, perform clustering and graph convolution on each independently, then flatten and feed the two into a fully connected layer (a sketch of the split follows below).
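
A small sketch of this positive/negative split, taking the signed PCC matrix Q as input:

```python
import numpy as np

def split_signed(Q):
    """Split a signed FC matrix into positive and negative networks."""
    pos = np.where(Q > 0, Q, 0.0)    # positive correlations only
    neg = np.where(Q < 0, -Q, 0.0)   # magnitudes of negative correlations
    return pos, neg

Q = np.corrcoef(np.random.randn(116, 200))
pos_net, neg_net = split_signed(Q)
```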

        ③Parameter settings:

2.6. Experiment

2.6.1. Databases and Preprocessing

(1)ABIDE

        ①Pipeline: Configurable Pipeline for the Analysis of Connectomes (CPAC) with band-pass filtering (0.01–0.1 Hz) and global signal correction

        ②They retain 871 of the 1112 subjects: 403 ASD and 468 NC

(2)ADNI

        ①They choose 133 subjects: 99 MCI and 34 AD

pharmaceutical  adj. relating to the making or selling of medicinal drugs  n. a medicinal drug

2.6.2. Evaluating the Effectiveness of our TE-HI-GCN

        ①They regard MCI as the negative class and AD as the positive class

        ②Network connectivity feature (NCF): extract the upper-triangle values of the functional connectivity matrix and flatten them into a feature vector (see the sketch below). Moreover, they choose GCN as the baseline, and denote the transfer-learning variant as T-HI-GCN and the ensemble of hi-GCNs as E-HI-GCN
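
A one-liner sketch of the NCF extraction (the matrix size follows the AAL-116 setting used above):

```python
import numpy as np

def ncf(fc):
    """Flatten the upper triangle of a functional connectivity matrix."""
    return fc[np.triu_indices_from(fc, k=1)]   # strictly above the diagonal

fc = np.corrcoef(np.random.randn(116, 200))    # (116, 116) FC matrix
print(ncf(fc).shape)                           # (6670,) = 116 * 115 / 2
```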

        ③Methods comparison in ABIDE:

        ④Methods comparison in ADNI:

(They probably re-ran the experiments themselves; the numbers differ a little from those reported in hi-GCN, but not by much. Seen this way, TE really does add a lot of performance... how can it add that much?)

        ⑤⭐Plain GCN performs poorly on ADNI; the authors believe it fails to eliminate the interfering noise

        ⑥Pre-training consistently brings better results: it reduces overfitting, shortens training time, and improves generalization. It is especially useful when data is limited

        ⑦The full correlation method (PCC) outperforms the partial correlation method with GLASSO (sparse inverse covariance):

2.6.3. Experiment on Other Parcellations

        ①They evaluate their model on different atlases: the structural atlases TT, HO, and EZ, and the functional atlases CC200 and DOS160. CC200 yields the best result

        ②They develop an ensemble learning strategy named the multi-atlas (MA) method, which predicts on each atlas separately and then integrates the predictions (a voting sketch follows below)
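
The exact combination rule of the MA ensemble is not spelled out in my notes; below is a simple soft-voting sketch under that assumption:

```python
import numpy as np

def multi_atlas_vote(probs):
    """Average per-atlas class probabilities and take the argmax.

    probs : list of (N, 2) arrays, one per atlas (e.g. AAL, CC200, HO, ...).
    """
    return np.mean(probs, axis=0).argmax(axis=1)

per_atlas = [np.random.rand(10, 2) for _ in range(5)]   # 5 atlases, 10 subjects
print(multi_atlas_vote(per_atlas))                      # one label per subject
```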

        ③Accuracy of E-HI-GCN in different atlases:

        ④Accuracy of other models in different atlases:

        ⑤Comparison of different models in ACC:

2.6.4. Experiment on HCP Dataset

        ①They remove 5 of the 1096 subjects whose scans have fewer than 1200 frames, leaving 1091 samples: 498 females and 593 males

        ②Pipeline: the HCP minimal preprocessing pipeline (fMRISurface)

        ③Comparison in HCP:

2.7. Interpretability

        ①The score of the p-th subnetwork is calculated as:

Score_p=\frac{1}{n_p^2}\sum_{i,j\in SN_p\text{ and }m(i)\neq m(j)}f_{i,m(i)}f_{j,m(j)}

where n_p is the number of ROIs in the p-th subnetwork

        ②The correlation score of two subnetworks is:

CorrScore_{p,q}=\frac{1}{n_p^2n_q^2}\sum_{i\in SN_p,j\in SN_q\text{ and }m(i)\neq m(j)}f_{i,m(i)}f_{j,m(j)}
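
A sketch of Score_p, assuming f_{i,m(i)} is ROI i's assignment weight for its cluster m(i) and taking m(i) as the argmax assignment (my reading of the notation; CorrScore_{p,q} is the analogous double sum over two subnetworks):

```python
import numpy as np

def subnetwork_score(F, members):
    """Score_p over the ROIs of one subnetwork.

    F       : (M, C) assignment weights; m(i) = argmax_c F[i, c] is taken
              as ROI i's cluster (an assumption).
    members : indices of the ROIs in subnetwork p.
    """
    m = F.argmax(axis=1)
    s = sum(F[i, m[i]] * F[j, m[j]]
            for i in members for j in members if m[i] != m[j])
    return s / len(members) ** 2

F = np.random.rand(116, 8)
print(subnetwork_score(F, members=range(20)))   # toy subnetwork of 20 ROIs
```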

        ③"5 Top 3 intra-subnetworks and the top 5 cross inter-subnetworks selected and weights optimized by our model": the three are considered meaningful of analysing ASD

        ④The importance of ROI is as essential as classification accuracy.

        ⑤The choose and evaluate 8 networks, including DMN (default mode network), DAN (dorsal attention), AN (auditory network), CN (core network), SN (salience network), SMN (somato-motor network), VN (visual network), CEN (central executive network)

        ⑥The top-3 subnetworks predicted by E-HI-GCN:

        ⑦The top 5 inter-subnetworks predicted by E-HI-GCN:

endogenous  adj. originating from within an organism    exogenous  adj. originating from outside an organism

dysfunction  n. abnormality or impairment of function (of a relationship, behavior, body part, etc.)

posit  vt. to put forward as fact or as a basis for argument; to assume  n. a statement made on the assumption that it will prove true

2.8. Ablation Study and Discussion

2.8.1. The Effectiveness of Transfer Learning

        ①ASD→AD: the f-GCN module may cause negative transfer (i.e., node-level learning on ASD brains cannot simply be transplanted to AD; despite the similarities, much differs, so performance suffers)

        ②AD→ASD: this direction looks better... but the AD sample is too small

        ③Different pre-training strategies in ABIDE:

        ④Different pre-training strategies in ADNI:

2.8.2. The Effectiveness of Ensemble Learning

        ①Different threshold numbers in E-HI-GCN:

        ②Table of threshold values change:

2.8.3. Comparisons with Prior Works

        ①Comparison in ABIDE:

        ②Comparison in ADNI:

2.9. Conclusion

        They achieved relatively good results with transfer learning

3. Background Knowledge

3.1. ImageNet framework

(1)ImageNet framework

        The ImageNet framework is a collection of tools and algorithms for computer vision tasks, covering dataset download, preprocessing, training, and evaluation. It provides researchers with a unified platform for comparing and evaluating the performance of different algorithms.

In addition, ImageNet offers benchmark models such as AlexNet, VGG, and GoogLeNet, which were trained and evaluated on the ImageNet dataset and are widely used in computer vision research and development.

In short, the ImageNet framework is an important tool and resource in computer vision, providing convenient datasets and algorithmic tooling that has driven the field's steady progress.

(2)The pre-training used in this paper

        How am I supposed to know which one they used??? What even is this?

3.2. Structural atlas and functional atlas

(1)Definition: structural atlases parcellate the brain network based on structure, while functional atlases parcellate it based on function.

(2)Basis: structural atlases are built on the brain's structural information; functional atlases are built on its functional information.

(3)Characteristics: structural atlases offer high spatial resolution and clearly show the brain's structural details, whereas functional atlases offer high temporal resolution and clearly show the brain's functional activity.

3.3. Networks of brain

        ①突显网络(Salience Network):负责检测环境中的突出和显著刺激,帮助大脑过滤和选择重要的信息。

        ②听觉网络(Auditory Network):处理与听觉相关的信息,包括声音的识别、定位和理解。

        ③基底神经节网络(Basal Ganglia Network):参与运动控制和习惯形成,与奖赏和动机有关。

        ④高级视觉网络(High-level Visual Network):处理复杂的视觉信息,如物体和场景的识别。

        ⑤视觉空间网络(Visuospatial Network):处理与空间和视觉导航相关的信息,如方向感和空间记忆。

        ⑥默认模式网络(Default Mode Network):在休息状态下活跃,参与自我反思、自传体记忆和社交认知等。

        ⑦语言网络(Language Network):处理与语言相关的信息,包括语音、语法和语义的理解与产生。

        ⑧执行网络(Executive Network):负责高级认知功能,如决策、计划、问题解决和认知灵活性。

        ⑨楔前叶网络(Precuneus Network):参与视觉空间处理、情景记忆和自我相关的认知活动。

        ⑩感觉运动网络(Sensorimotor Network):负责感知物理输入,并将其转换为在整个脑网络中传播的电子信号,然后启动物理反应。

这些网络在大脑中相互连接和交互,共同实现各种复杂的认知和行为功能。不同的任务和信息会在不同的网络中产生特定的活动模式,这些活动模式反映了大脑如何处理、整合和响应外部输入的信息。

3.4. Transfer learning

(1)Here we discuss how a transfer learning model differs from simply applying one model to two datasets.

In transfer learning, whether the model's parameters change on the second dataset depends on several factors, including the transfer strategy, the model architecture, and the characteristics of the datasets.

Generally, the goal of transfer learning is to use knowledge learned in the source domain to help learning in the target domain. A common practice is therefore to freeze some layers of the source model and train only the last few layers (usually the fully connected ones) on the target task, which avoids overfitting and speeds up training. In that case, only the fully connected layers are fine-tuned while the frozen layers stay unchanged (see the sketch at the end of this section).

However, if the target dataset differs considerably from the source, or the target task is not identical to the source task, a more involved transfer strategy may be needed: some previously frozen layers may be unfrozen and fine-tuned, or a new model may be trained to fit the target dataset and task. In that case, the parameters do change on the second dataset.

In addition, some transfer strategies (such as extracting a feature vector) train a new model on top of the source model to fit the target dataset and task; here too the parameters change on the second dataset.

In short, whether the parameters change on the second dataset depends on the transfer strategy, the model architecture, the dataset characteristics, and other factors, and must be judged case by case.
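
To make the freeze-and-fine-tune idea concrete, a minimal PyTorch sketch (the layer sizes and the backbone/head split are hypothetical, not the paper's architecture):

```python
import torch
import torch.nn as nn

# hypothetical source model: a small backbone pre-trained on the source
# domain (say AD) plus a classifier head to be retrained on the target (say ASD)
model = nn.Sequential(
    nn.Linear(116, 64), nn.ReLU(),   # backbone: keeps source-domain knowledge
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 2),                # head: re-trained for the target task
)

for p in model.parameters():         # freeze everything...
    p.requires_grad = False
for p in model[-1].parameters():     # ...except the final head
    p.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)   # fine-tune only the head
```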

4. Reference List

Li, L. et al. (2021) 'TE-HI-GCN: An Ensemble of Transfer Hierarchical Graph Convolutional Networks for Disorder Diagnosis', Neuroinformatics, 20, pp. 353-375.
