[Paper Close Reading] TE-HI-GCN: An Ensemble of Transfer Hierarchical Graph Convolutional Networks for Disorder Diagnosis

Full title: TE-HI-GCN: An Ensemble of Transfer Hierarchical Graph Convolutional Networks for Disorder Diagnosis

Paper: TE-HI-GCN: An Ensemble of Transfer Hierarchical Graph Convolutional Networks for Disorder Diagnosis | Neuroinformatics (springer.com)

Code: GitHub - llt1836/TE-HI-GCN

Table of Contents

1. TL;DR

1.1. Thoughts

1.2. Paper Framework Diagram

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.3.1. Preliminaries of Graph Convolutional Networks (GCN)

2.4. An Ensemble of Transfer Hierarchical Graph Convolutional Networks for Disorder Diagnosis, TE-HI-GCN

2.4.1. Overview of Network Architecture of TE-HI-GCN

2.4.2. Sparse Brain Networks Construction

2.5. Hierarchical GCN

2.5.1. f-GCN: Learning the Brain Network Embedding with MGC

2.5.2. p-GCN: Learning the Brain Network Embedding with Subject Correlation

2.5.3. Transfer Learning for GCN

2.5.4. Implementation Details

2.6. Experiment

2.6.1. Databases and Preprocessing

2.6.2. Evaluating the Effectiveness of our TE-HI-GCN

2.6.3. Experiment on Other Parcellations

2.6.4. Experiment on HCP Dataset

2.7. Interpretability

2.8. Ablation Study and Discussion

2.8.1. The Effectiveness of Transfer Learning

2.8.2. The Effectiveness of Ensemble Learning

2.8.3. Comparisons with Prior Works

2.9. Conclusion

3. Background Knowledge

3.1. ImageNet framework

3.2. Structural atlas and functional atlas

3.3. Networks of brain

3.4. Transfer learning

4. Reference List


1. TL;DR

1.1. Thoughts

(1)...Could the authors really not number their headings? It makes my automatic sectioning miserable

(2)A CAS Zone 3 journal paper, apparently. I mainly want to see how they modified hi-GCN (part of my "how should I publish my first paper" series). Did they really change hi-GCN that much? I haven't finished reading yet. Maybe the limited novelty explains the low ranking

(3)The authors set out to reveal a correlation between Alzheimer's disease and autism... emm... fine. On reflection the two seem unrelated, but maybe that's science... sometimes things that look unrelated turn out to be connected... the world is a marvelous place, after all

(4)The authors note that autism is a neurodevelopmental disorder while Alzheimer's is a neurodegenerative one. Then... the only fix is... time stop!

(5)Ahh, what a clearly laid-out challenges section hahahaha, great for beginners

(6)Why spend so long re-introducing plain GCN?

(7)Why did they lift the figure straight from hi-GCN? Turns out it's the same team... er. I can only say this paper really is a bit thin.

1.2. Paper Framework Diagram

2. Section-by-Section Close Reading

2.1. Abstract

        ①High dimensionality, noise, and limited labeled data are the main obstacles for GCN-based diagnosis

        ②Facing these challenges, the authors propose an ensemble framework that combines transfer learning with a hierarchical structure built on sparse brain networks.

        ③They evaluate on the ADNI and ABIDE datasets

        ④⭐The authors claim this is the first work to apply transfer learning between AD and ASD diagnosis.

2.2. Introduction

        ①Autism spectrum disorder (ASD) is a neurodevelopmental disorder and Alzheimer's disease (AD) is a neurodegenerative disorder.

        ②They briefly introduce existing models and approaches

(1)Challenge 1: Noisy correlations in the brain network.

        ①Keeping all pairwise correlations may introduce noise and spurious connections

        ②Although Pearson's correlation coefficient (PCC) is effective, the resulting network is overly dense to some extent.

        ③⭐Such over-dense connectivity causes overfitting and increases computational complexity, so pruning irrelevant or weak connections is necessary

(2)Challenge 2: Limited labeled training data.

        ①Labeled training data is scarce in clinical applications

(3)Challenge 3: Depth limitation of GCN learning.

        ①⭐Deep GCNs suffer from over-smoothing, i.e., node features converge to the same value. Hence GCNs are usually kept to at most 3 or 4 layers

(4)Their contributions:

        ①Removing noisy correlations in the brain network: a multi-scale thresholding scheme removes the weak (unnecessary) connections, and multi-graph clustering (MGC) is then applied to strengthen the essential ones

        ②Exploiting the association between subjects for GCN: the same idea as hi-GCN

        ③Transfer learning across related disorder domains: the usual practice is to take pre-trained weights (ImageNet-style pre-training) and fine-tune them on the target task. Prior studies also report a correlation between AD and ASD (they are both brain disorders, so of course they are somewhat related; by that logic lung cancer, tuberculosis, and pneumonia are all related too). They exploit the commonality of the two domains (which two domains, exactly?) to learn general graph-structure features. Additionally, to avoid negative transfer, they apply transfer learning at multiple levels of the original graphs' topological structure.

(5)They put forward an Ensemble of Transfer HIerarchical Graph Convolutional Networks, called TE-HI-GCN, and claim the model is general, accurate, interpretable, and robust

etiology  n. the cause of a disease; the study of the causes of disease

prognosis  n. the likely course of a disease; a forecast or prediction of its outcome

2.3. Related Work

        They briefly review other relevant works

2.3.1. Preliminaries of Graph Convolutional Networks (GCN)

        ①They define a graph N_i=\left\{R_i,A_i\right\},

where R_i=\left\{r^1_i,...,r^M_i\right\} is the set of M nodes;

A_i\in\mathbb{R}^{M\times M} is the adjacency matrix;

M denotes the number of nodes; they choose M=116 in this article;

AGAIN! They initialize R_i to 1 (but why?? Here it is again. Isn't it a set? How can it be 1? Is it that inner-product value?)

        ②The set of graphs is \left\{N_1,...,N_D\right\}, where D is the number of subjects. Each subject has a label y_i

        ③The graph convolution of GCN is:

\boldsymbol{E}^{(l+1)}=\mathrm{ReLU}(\tilde{\boldsymbol{D}}^{-1/2}\tilde{\boldsymbol{A}}\tilde{\boldsymbol{D}}^{-1/2}\boldsymbol{E}^{(l)}\boldsymbol{W}^{(l)})

where \tilde{\boldsymbol{A}}=\boldsymbol{A}+\boldsymbol{I}_n and \tilde{\boldsymbol{D}}_{ii}=\sum_j\tilde{\boldsymbol{A}}_{ij};

\boldsymbol{W}^{(l)} is a trainable weight matrix;

\boldsymbol{E}^{(l+1)} denotes the node embeddings after l+1 GCN layers

        ④Each GCN layer is followed by a ReLU
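
A minimal numpy sketch of this propagation rule (one layer; the adjacency matrix, feature dimensions, and weights below are made up purely for illustration):

```python
import numpy as np

def gcn_layer(A, E, W):
    """One GCN layer: ReLU(D~^{-1/2} (A + I) D~^{-1/2} E W)."""
    A_tilde = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # D~^{-1/2} as a vector
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # sym. normalization
    return np.maximum(A_hat @ E @ W, 0.0)            # linear map + ReLU

# toy usage: M = 116 ROIs, 8 input features, 4 output features
M = 116
A = (np.random.rand(M, M) > 0.9).astype(float)
A = np.maximum(A, A.T)                               # symmetrize
E = np.random.randn(M, 8)
W = np.random.randn(8, 4)
print(gcn_layer(A, E, W).shape)                      # (116, 4)
```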

2.4. An Ensemble of Transfer Hierarchical Graph Convolutional Networks for Disorder Diagnosis, TE-HI-GCN

2.4.1. Overview of Network Architecture of TE-HI-GCN

        ①The framework of TE-HI-GCN:

it combines HI-GCN with transfer learning units

        ②Different thresholds produce different topologies; the thresholds used in this article lie in \left[0.05,0.5\right]

2.4.2. Sparse Brain Networks Construction

        ①Extract time series from fMRI (116 ROIs) → normalize to zero mean and unit variance → compute the Pearson's correlation coefficient (PCC)

        ②Construction of functional connectivity:

(dude, did you just lift their figure wholesale?)

        ③The thresholding is defined as:

Q(r_i,r_j)=\begin{cases}Q(r_i,r_j), & \text{if } Q(r_i,r_j)\geq\tau\\ 0, & \text{otherwise}\end{cases}
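
A minimal sketch of this construction (PCC plus thresholding; the time-series shape and the threshold grid are my assumptions for illustration):

```python
import numpy as np

def sparse_brain_network(ts, tau):
    """Thresholded functional connectivity from ROI time series.

    ts  : (T, M) array, T frames x M ROIs.
    tau : correlation threshold; entries below tau are zeroed.
    """
    ts = (ts - ts.mean(axis=0)) / ts.std(axis=0)   # zero mean, unit variance
    Q = np.corrcoef(ts.T)                          # (M, M) PCC matrix
    Q[Q < tau] = 0.0                               # drop weak connections
    np.fill_diagonal(Q, 0.0)                       # remove self-loops
    return Q

# one sparse network per threshold in [0.05, 0.5], as in the ensemble
ts = np.random.randn(200, 116)                     # fake fMRI: 200 frames, 116 ROIs
networks = [sparse_brain_network(ts, t) for t in np.arange(0.05, 0.55, 0.05)]
```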

2.5. Hierarchical GCN

        ①They first recap the theory of hi-GCN and its overall role

        ②Moreover, a "new" figure (not so new, heh) is attached below:

2.5.1. f-GCN: Learning the Brain Network Embedding with MGC

        ①They apply multi-graph clustering (MGC), which uses a supergraph, to cluster features. They replace the traditional approximation A\approx\mathcal{F}^T\mathcal{A}^s\mathcal{F} with \mathcal{A}^s=\mathcal{F}\mathcal{A}\mathcal{F}^T (a coarsening sketch appears at the end of this subsection),

where \mathcal{F}\in\mathbb{R}^{M\times C};

\mathcal{A}^s\in\mathbb{R}^{C\times C} and is symmetric;

C denotes the number of clusters;

\mathcal{A}^s denotes the weighted adjacency matrix of the supergraph;

\left\{S_1,S_2,...,S_C\right\} is the set of supernodes (honestly, I didn't fully understand this part haha; are supernodes part of the supergraph? And how is this thing represented, a 3-D matrix?)

        ②They stack 3 convolutional layers

        ③"Unlike traditional clustering, which groups similar nodes, MGC aims to hide noisy connections by grouping them into a supernode, thereby highlighting the indicative edges that connect supernodes. In other words, the functional connectivity weights connecting nodes across different clusters are strengthened, while the nodes within a cluster and their connections are removed." (my translation)

        ④Most works treat clustering and classification as separate processes, but the authors want to combine the two.
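
A tiny coarsening sketch to make the supergraph concrete. Note that with \mathcal{F}\in\mathbb{R}^{M\times C} the dimensionally consistent product is \mathcal{F}^T\mathcal{A}\mathcal{F}, so that is what the code uses (the paper's transpose convention may differ); a random soft assignment stands in for the learned one:

```python
import numpy as np

def coarsen(A, F):
    """Coarsen an M-node graph into a C-supernode supergraph.

    A : (M, M) adjacency matrix.
    F : (M, C) soft cluster-assignment matrix (rows sum to 1).
    """
    return F.T @ A @ F                                 # (C, C) supergraph adjacency

M, C = 116, 8
A = np.abs(np.random.randn(M, M)); A = (A + A.T) / 2   # symmetric toy adjacency
F = np.random.rand(M, C)
F /= F.sum(axis=1, keepdims=True)                      # soft assignment per ROI
A_s = coarsen(A, F)
print(A_s.shape)                                       # (8, 8), still symmetric
```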

2.5.2. p-GCN: Learning the Brain Network Embedding with Subject Correlation

        ①They use the samples and the similarities between samples to construct a new graph over the subjects

        ②The graph kernel is based on the Gram matrix; with an RBF kernel function \mathcal{K}, the similarity is:

S_I(\boldsymbol{N}_i,\boldsymbol{N}_j)=\frac{\sum_{a=1}^M\sum_{b=1}^Mw_a^iw_b^j\mathcal{K}(\boldsymbol{x}_a^i,\boldsymbol{x}_b^j)}{\sum_{a=1}^Mw_a^i\sum_{b=1}^Mw_b^j}

where w_{a}^{i}=\frac{1}{\sum_{u=1}^{M}\mathcal{K}(\boldsymbol{x}_{a}^{i},\boldsymbol{x}_{u}^{i})}
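
A small numpy sketch of this similarity (the node-feature matrices and the RBF bandwidth gamma are illustrative assumptions):

```python
import numpy as np

def rbf_gram(X, Y, gamma=1.0):
    """Gram matrix K[a, b] = exp(-gamma * ||x_a - y_b||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def graph_similarity(X_i, X_j, gamma=1.0):
    """Weighted RBF similarity S_I between two subjects' (M, d) node features."""
    w_i = 1.0 / rbf_gram(X_i, X_i, gamma).sum(axis=1)   # w_a^i
    w_j = 1.0 / rbf_gram(X_j, X_j, gamma).sum(axis=1)   # w_b^j
    K_ij = rbf_gram(X_i, X_j, gamma)
    return (w_i @ K_ij @ w_j) / (w_i.sum() * w_j.sum())

X_i, X_j = np.random.randn(116, 4), np.random.randn(116, 4)
print(graph_similarity(X_i, X_j))
```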

        ③The final loss function is:

\min_{\boldsymbol{W}_f,\boldsymbol{W}_p,\boldsymbol{F}}\boldsymbol{L}=\boldsymbol{L}_{CE}(\boldsymbol{W}_f,\boldsymbol{W}_p,\boldsymbol{F})+\lambda_1\boldsymbol{L}_{otho}(\boldsymbol{F})+\lambda_2\boldsymbol{L}_{bal}(\boldsymbol{F})+\lambda_3\boldsymbol{L}_{pos}(\boldsymbol{F})

where \boldsymbol{W}_f and \boldsymbol{W}_p are the weight parameters of f-GCN and p-GCN respectively;

\lambda_1,\lambda_2,\lambda_3 are positive trade-off parameters;

\boldsymbol{L}_{otho}=\left\|\boldsymbol{F}^T\boldsymbol{F}-diag(diag(\boldsymbol{F}^T\boldsymbol{F}))\right\|_F is the orthogonal regularization, which penalizes the off-diagonal elements;

\boldsymbol{L}_{bal}=Var(diag(\boldsymbol{F}^T\boldsymbol{F})) is the balancing regularization, which balances the cluster sizes;

Var\left(\cdot\right) denotes the variance;

\boldsymbol{L}_{pos} ensures the positivity of \boldsymbol{F}.
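
A torch sketch of the regularizers spelled out above. L_{otho} and L_{bal} follow the given formulas; L_{pos} is not written out in my notes, so the hinge penalty on negative entries below is my own stand-in:

```python
import torch

def assignment_penalties(F):
    """Regularizers on the (M, C) cluster-assignment matrix F."""
    G = F.T @ F                                                  # (C, C) Gram matrix
    L_otho = torch.norm(G - torch.diag(torch.diag(G)), p='fro')  # off-diagonal penalty
    L_bal = torch.var(torch.diag(G))                             # variance of cluster sizes
    L_pos = torch.relu(-F).sum()                                 # assumed: penalize F < 0
    return L_otho, L_bal, L_pos

F = torch.rand(116, 8, requires_grad=True)
L_otho, L_bal, L_pos = assignment_penalties(F)
loss = 0.1 * L_otho + 0.1 * L_bal + 0.1 * L_pos   # plus L_CE in the full model
loss.backward()
```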

2.5.3. Transfer Learning for GCN

        ①Pre-training schemes:

2.5.4. Implementation Details

        ①They optimize the parameters in a fully supervised manner

        ②They split the correlations into a positive and a negative network, perform clustering and graph convolution on each independently, then flatten and feed the two into a fully connected layer (a sketch of the split follows below).
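
A small sketch of this positive/negative split, taking the signed PCC matrix Q as input:

```python
import numpy as np

def split_signed(Q):
    """Split a signed FC matrix into positive and negative networks."""
    pos = np.where(Q > 0, Q, 0.0)    # positive correlations only
    neg = np.where(Q < 0, -Q, 0.0)   # magnitudes of negative correlations
    return pos, neg

Q = np.corrcoef(np.random.randn(116, 200))
pos_net, neg_net = split_signed(Q)
```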

        ③Parameter settings:

2.6. Experiment

2.6.1. Databases and Preprocessing

(1)ABIDE

        ①Pipeline: Configurable Pipeline for the Analysis of Connectomes (CPAC) with band-pass filtering (0.01–0.1 Hz) and global signal correction

        ②They retain 871 of the 1112 subjects: 403 ASD and 468 NC

(2)ADNI

        ①They choose 133 subjects: 99 MCI and 34 AD

pharmaceutical  adj. relating to the making or selling of medicinal drugs  n. a medicinal drug

2.6.2. Evaluating the Effectiveness of our TE-HI-GCN

        ①They regard MCI as the negative class and AD as the positive class

        ②Network connectivity feature (NCF): extract the upper-triangle values of the functional connectivity matrix and flatten them into a feature vector (see the sketch below). Moreover, they choose GCN as the baseline, and denote the transfer-learning variant as T-HI-GCN and the ensemble of hi-GCNs as E-HI-GCN
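
A one-liner sketch of the NCF extraction (the matrix size follows the AAL-116 setting used above):

```python
import numpy as np

def ncf(fc):
    """Flatten the upper triangle of a functional connectivity matrix."""
    return fc[np.triu_indices_from(fc, k=1)]   # strictly above the diagonal

fc = np.corrcoef(np.random.randn(116, 200))    # (116, 116) FC matrix
print(ncf(fc).shape)                           # (6670,) = 116 * 115 / 2
```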

        ③Methods comparison in ABIDE:

        ④Methods comparison in ADNI:

(They probably re-ran the experiments themselves; the numbers differ a little from those reported in hi-GCN, but not by much. Seen this way, TE really does add a lot of performance... how can it add that much?)

        ⑤⭐Plain GCN performs poorly on ADNI; the authors believe it fails to eliminate the interfering noise

        ⑥Pre-training consistently brings better results: it reduces overfitting, shortens training time, and improves generalization. It is especially useful when data is limited

        ⑦The full correlation method (PCC) outperforms the partial correlation method with GLASSO (sparse inverse covariance):

2.6.3. Experiment on Other Parcellations

        ①They evaluate their model on different atlases: the structural atlases TT, HO, and EZ, and the functional atlases CC200 and DOS160. CC200 yields the best result

        ②They develop an ensemble learning strategy named the multi-atlas (MA) method, which predicts on each atlas separately and then integrates the predictions (a voting sketch follows below)
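
The exact combination rule of the MA ensemble is not spelled out in my notes; below is a simple soft-voting sketch under that assumption:

```python
import numpy as np

def multi_atlas_vote(probs):
    """Average per-atlas class probabilities and take the argmax.

    probs : list of (N, 2) arrays, one per atlas (e.g. AAL, CC200, HO, ...).
    """
    return np.mean(probs, axis=0).argmax(axis=1)

per_atlas = [np.random.rand(10, 2) for _ in range(5)]   # 5 atlases, 10 subjects
print(multi_atlas_vote(per_atlas))                      # one label per subject
```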

        ③Accuracy of E-HI-GCN in different atlases:

        ④Accuracy of other models in different atlases:

        ⑤Comparison of different models in ACC:

2.6.4. Experiment on HCP Dataset

        ①They remove 5 of the 1096 subjects whose scans have fewer than 1200 frames, leaving 1091 samples: 498 females and 593 males

        ②Pipeline: the HCP minimal preprocessing pipeline (fMRISurface)

        ③Comparison in HCP:

2.7. Interpretability

        ①The score of the p-th subnetwork is calculated as:

Score_p=\frac{1}{n_p^2}\sum_{i,j\in SN_p\text{ and }m(i)\neq m(j)}f_{i,m(i)}f_{j,m(j)}

where n_p is the number of ROIs in the p-th subnetwork

        ②The correlation score of two subnetworks is:

CorrScore_{p,q}=\frac{1}{n_p^2n_q^2}\sum_{i\in SN_p,j\in SN_q\text{ and }m(i)\neq m(j)}f_{i,m(i)}f_{j,m(j)}
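
A sketch of Score_p, assuming f_{i,m(i)} is ROI i's assignment weight for its cluster m(i) and taking m(i) as the argmax assignment (my reading of the notation; CorrScore_{p,q} is the analogous double sum over two subnetworks):

```python
import numpy as np

def subnetwork_score(F, members):
    """Score_p over the ROIs of one subnetwork.

    F       : (M, C) assignment weights; m(i) = argmax_c F[i, c] is taken
              as ROI i's cluster (an assumption).
    members : indices of the ROIs in subnetwork p.
    """
    m = F.argmax(axis=1)
    s = sum(F[i, m[i]] * F[j, m[j]]
            for i in members for j in members if m[i] != m[j])
    return s / len(members) ** 2

F = np.random.rand(116, 8)
print(subnetwork_score(F, members=range(20)))   # toy subnetwork of 20 ROIs
```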

        ③"5 Top 3 intra-subnetworks and the top 5 cross inter-subnetworks selected and weights optimized by our model": the three are considered meaningful of analysing ASD

        ④The importance of ROI is as essential as classification accuracy.

        ⑤The choose and evaluate 8 networks, including DMN (default mode network), DAN (dorsal attention), AN (auditory network), CN (core network), SN (salience network), SMN (somato-motor network), VN (visual network), CEN (central executive network)

        ⑥The top-3 subnetworks predicted by E-HI-GCN:

        ⑦The top 5 inter-subnetworks predicted by E-HI-GCN:

endogenous  adj. originating from within an organism    exogenous  adj. originating from outside an organism

dysfunction  n. abnormality or impairment of function (of a relationship, behavior, body part, etc.)

posit  vt. to put forward as fact or as a basis for argument; to assume  n. a statement made on the assumption that it will prove true

2.8. Ablation Study and Discussion

2.8.1. The Effectiveness of Transfer Learning

        ①ASD→AD: the f-GCN module may cause negative transfer (i.e., node-level learning on ASD brains cannot simply be transplanted to AD; despite the similarities, much differs, so performance suffers)

        ②AD→ASD: this direction looks better... but the AD sample is too small

        ③Different pre-training strategies in ABIDE:

        ④Different pre-training strategies in ADNI:

2.8.2. The Effectiveness of Ensemble Learning

        ①Different threshold numbers in E-HI-GCN:

        ②Table of threshold values change:

2.8.3. Comparisons with Prior Works

        ①Comparison in ABIDE:

        ②Comparison in ADNI:

2.9. Conclusion

        They achieved relatively good results with transfer learning

3. Background Knowledge

3.1. ImageNet framework

(1)ImageNet framework

        The ImageNet framework is a collection of tools and algorithms for computer vision tasks, covering dataset download, preprocessing, training, and evaluation. It provides researchers with a unified platform for comparing and evaluating the performance of different algorithms.

In addition, ImageNet offers benchmark models such as AlexNet, VGG, and GoogLeNet, which were trained and evaluated on the ImageNet dataset and are widely used in computer vision research and development.

In short, the ImageNet framework is an important tool and resource in computer vision, providing convenient datasets and algorithmic tooling that has driven the field's steady progress.

(2)The pre-training used in this paper

        How am I supposed to know which one they used??? What even is this?

3.2. Structural atlas and functional atlas

(1)Definition: structural atlases parcellate the brain network based on structure, while functional atlases parcellate it based on function.

(2)Basis: structural atlases are built on the brain's structural information; functional atlases are built on its functional information.

(3)Characteristics: structural atlases offer high spatial resolution and clearly show the brain's structural details, whereas functional atlases offer high temporal resolution and clearly show the brain's functional activity.

3.3. Networks of brain

        ①突显网络(Salience Network):负责检测环境中的突出和显著刺激,帮助大脑过滤和选择重要的信息。

        ②听觉网络(Auditory Network):处理与听觉相关的信息,包括声音的识别、定位和理解。

        ③基底神经节网络(Basal Ganglia Network):参与运动控制和习惯形成,与奖赏和动机有关。

        ④高级视觉网络(High-level Visual Network):处理复杂的视觉信息,如物体和场景的识别。

        ⑤视觉空间网络(Visuospatial Network):处理与空间和视觉导航相关的信息,如方向感和空间记忆。

        ⑥默认模式网络(Default Mode Network):在休息状态下活跃,参与自我反思、自传体记忆和社交认知等。

        ⑦语言网络(Language Network):处理与语言相关的信息,包括语音、语法和语义的理解与产生。

        ⑧执行网络(Executive Network):负责高级认知功能,如决策、计划、问题解决和认知灵活性。

        ⑨楔前叶网络(Precuneus Network):参与视觉空间处理、情景记忆和自我相关的认知活动。

        ⑩感觉运动网络(Sensorimotor Network):负责感知物理输入,并将其转换为在整个脑网络中传播的电子信号,然后启动物理反应。

这些网络在大脑中相互连接和交互,共同实现各种复杂的认知和行为功能。不同的任务和信息会在不同的网络中产生特定的活动模式,这些活动模式反映了大脑如何处理、整合和响应外部输入的信息。

3.4. Transfer learning

(1)Here we discuss how a transfer learning model differs from simply applying one model to two datasets.

In transfer learning, whether the model's parameters change on the second dataset depends on several factors, including the transfer strategy, the model architecture, and the characteristics of the datasets.

Generally, the goal of transfer learning is to use knowledge learned in the source domain to help learning in the target domain. A common practice is therefore to freeze some layers of the source model and train only the last few layers (usually the fully connected ones) on the target task, which avoids overfitting and speeds up training. In that case, only the fully connected layers are fine-tuned while the frozen layers stay unchanged (see the sketch at the end of this section).

However, if the target dataset differs considerably from the source, or the target task is not identical to the source task, a more involved transfer strategy may be needed: some previously frozen layers may be unfrozen and fine-tuned, or a new model may be trained to fit the target dataset and task. In that case, the parameters do change on the second dataset.

In addition, some transfer strategies (such as extracting a feature vector) train a new model on top of the source model to fit the target dataset and task; here too the parameters change on the second dataset.

In short, whether the parameters change on the second dataset depends on the transfer strategy, the model architecture, the dataset characteristics, and other factors, and must be judged case by case.
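
To make the freeze-and-fine-tune idea concrete, a minimal PyTorch sketch (the layer sizes and the backbone/head split are hypothetical, not the paper's architecture):

```python
import torch
import torch.nn as nn

# hypothetical source model: a small backbone pre-trained on the source
# domain (say AD) plus a classifier head to be retrained on the target (say ASD)
model = nn.Sequential(
    nn.Linear(116, 64), nn.ReLU(),   # backbone: keeps source-domain knowledge
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 2),                # head: re-trained for the target task
)

for p in model.parameters():         # freeze everything...
    p.requires_grad = False
for p in model[-1].parameters():     # ...except the final head
    p.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)   # fine-tune only the head
```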

4. Reference List

Li, L. et al. (2021) 'TE-HI-GCN: An Ensemble of Transfer Hierarchical Graph Convolutional Networks for Disorder Diagnosis', Neuroinformatics, 20, pp. 353-375.
