[Paper Close Reading] Graph-in-Graph (GiG): Learning interpretable latent graphs in non-Euclidean domain for biological and healthcare applications

Paper: Graph-in-Graph (GiG): Learning interpretable latent graphs in non-Euclidean domain for biological and healthcare applications - ScienceDirect

Full title: Graph-in-Graph (GiG): Learning interpretable latent graphs in non-Euclidean domain for biological and healthcare applications

Code: GitHub - mullakaeva/GiG: Graph-in-Graph (GiG)

The English is typed entirely by hand, summarizing and paraphrasing the original paper. Some unavoidable spelling and grammar mistakes may remain; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution!

目录

1. TL;DR

1.1. Thoughts

1.2. Paper framework diagram

2. Section-by-section reading

2.1. Abstract

2.2. Introduction

2.3. Related work

2.4. Method

2.4.1. Graph-in-graph model

2.4.2. Node degree distribution loss (NDDL)

2.5. Experiments and results

2.5.1. Datasets

2.5.2. Implementation details

2.5.3. Quantitative results

2.5.4. Knowledge discovery analysis

2.6. Discussion

2.7. Conclusion

3. Background knowledge

3.1. Upstream and downstream

3.2. Soft threshold

3.3. Soft assignment and hard assignment

3.4. CATH

4. Reference List


1. TL;DR

1.1. Thoughts

(1) The opening says graphs suit non-Euclidean spaces, and that is true: edges between ROIs in a brain graph are basically never computed from distance. But doesn't that silently assume the functions of different brain regions are that cleanly separated? What if distance actually matters? Are brain regions really as separate as organs, or do they spread diffusely?

(2) This is a 2023 paper; why keep stressing "non-Euclidean"?

(3) The math part is really... brief. Fine, the emphasis is probably on NDDL.

(4) Goodness, the table analysis is really over-specific.

(5) Again, curious typesetting.

(6) The interpretability figures are very clear.

1.2. Paper framework diagram

2. Section-by-section reading

2.1. Abstract

        ①Graph structures go beyond traditional Euclidean space and distance

        ②Therefore, graph representations are suitable for brain connectome analysis and molecular property prediction

        ③The authors put forward Graph-in-Graph (GiG) with a Node Degree Distribution Loss (NDDL) to classify proteins and brain images

2.2. Introduction

        ①Graph representations have recently been used in CV, CG, physics, chemistry, and medicine

        ②Most graph models take an individual graph as input, then aggregate neighbors (nodes or edges) into a new graph

2.3. Related work

(1)Protein classification (enzymes or non-enzymes)

        ①SVM

        ②C-SVM

        ③GCNNs

        ④GNN

(2)Brain connectome analysis

        ①Brain graph prediction: Kullback–Leibler (KL) divergence

        ②Brain graph integration/fusion: multi-view graph normalizer network (MGN-Net)

        ③Brain graph classification (multi-graph/multi-modal classification): MICNet, GNN selection (RG-Select), DMBN, rs-fMRI-GIN, Graph Isomorphism Network (GIN), BrainNetCNN, ElasticNet, ST-GCN, DECENNT

(3)Molecular toxicity prediction (drug discovery)

        ①DNNs

        ②Tree-based ensemble methods

        ③Simplified Molecular-Input Line-Entry System (SMILES)

        ④Bridging works

        ⑤Graph Multiset Transformer (GMT)

(4)GCNNs in different areas:

        ①Social sciences

        ②CV and CG

        ③Physical

        ④Medical/biological sciences

(5)Methods of GiG:

        ①Fully inductive

        ②Test data is not necessary (at training time)

toxicology  n. the study of poisons    enzyme  n. a protein that catalyzes biochemical reactions    proliferation  n. rapid increase; multiplication

2.4. Method

There are N graphs, G=\{G_{1},G_{2},\ldots,G_{N}\}, and each G_{i}=\left ( V_{i},E_{i},X_{i} \right );

V_{i} denotes the node (vertex) set, E_{i} the edge set, X_{i}\in \mathbb{R}^{\left | V_{i} \right |\times D} the node feature matrix, and D the number of features;

the output for each graph is a vector p\in \mathbb{R}^{C}, which gives the predicted class probabilities;

C denotes the number of possible classes;
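(To make the notation concrete, a minimal sketch of one input graph G_{i} in PyTorch Geometric, the framework the authors use; all sizes here are made up:)

```python
import torch
from torch_geometric.data import Data

x = torch.randn(5, 16)                    # |V_i| = 5 nodes, D = 16 features per node
edge_index = torch.tensor([[0, 1, 2, 3],  # E_i as a 2 x |E_i| list of (source, target) pairs
                           [1, 2, 3, 4]])
g_i = Data(x=x, edge_index=edge_index)    # one graph G_i = (V_i, E_i, X_i)
```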

2.4.1. Graph-in-graph model

         GiG framework:

where F_{1} calculates graph features, F_{2} learns latent connections between graphs, and F_{3} combines the outputs of F_{1} and F_{2} to make predictions

(1)Node-level module F_1

        ①Graph feature vector h_{i}=F_{1}\left ( G_{i} \right ), where F_{1} consists of a GCNN and pooling operators

        ②Graph feature matrix h=\left [ h_{1},\ldots,h_{N} \right ]\in \mathbb{R}^{N\times H} (I never quite got this kind of thing: the paper says h_{i} is 1×H, so isn't that a row vector? Does h then concatenate row vectors horizontally? I stacked them vertically here; I am not sure that is right, as I have not read the code yet. My guess is that each 1×H vector fills one row, since the first dimension N indexes rows. A sketch of F_{1} follows below.)
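(A minimal sketch of how I imagine F_{1} in PyTorch Geometric; the depth, layer sizes, and mean pooling are my assumptions, since the text only says GCNN plus pooling:)

```python
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

class NodeLevelModule(nn.Module):
    """Sketch of F1: a per-graph GCNN followed by pooling, producing h_i."""
    def __init__(self, d: int, h: int):
        super().__init__()
        self.conv1 = GCNConv(d, h)   # D input features -> H hidden
        self.conv2 = GCNConv(h, h)

    def forward(self, x, edge_index, batch):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index).relu()
        return global_mean_pool(x, batch)   # one H-dimensional vector per graph
```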

    

(2)Population-level module F_2

        ①The input of F_{2} is the output of F_{1}

        ②Function in this layer: A_{p}=F_{2}\left ( h \right ), where each entry

a_{ij}=\frac{1}{1+e^{-t\left \| \tilde{h}_{i}-\tilde{h}_{j} \right \|_{2}+\theta }},\qquad \tilde{h}_{i}=MLP\left ( h_{i} \right )

and \theta and t are learnable soft-threshold and temperature parameters (a PyTorch sketch follows after ③)

        ③ A_{p}\in \left ( 0,1 \right )^{N\times N} represents the weighted adjacency matrix
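(A minimal PyTorch sketch of F_{2} following the formula above; the MLP architecture and the initial values of t and \theta are my assumptions:)

```python
import torch
import torch.nn as nn

class PopulationModule(nn.Module):
    """Sketch of F2: learns a weighted adjacency A_p over the N graph embeddings."""
    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, hidden_dim))
        self.t = nn.Parameter(torch.tensor(1.0))      # learnable temperature
        self.theta = nn.Parameter(torch.tensor(0.5))  # learnable soft threshold

    def forward(self, h: torch.Tensor) -> torch.Tensor:   # h: (N, H)
        h_tilde = self.mlp(h)                              # (N, H')
        dist = torch.cdist(h_tilde, h_tilde, p=2)          # pairwise L2 distances, (N, N)
        # a_ij = 1 / (1 + exp(-t * dist + theta)), sign convention as written above
        return torch.sigmoid(self.t * dist - self.theta)
```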

(3)GNN classifier F_3

        ①The final function:

p=F_{3}\left ( h,A_{p} \right )=\left [ p_{1},\ldots,p_{N} \right ], computed by GCN layers with shared weights and ReLU applied to \left ( h,A_{p} \right )

(I wrote this formula myself; the paper only describes it in words.)
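(Correspondingly, a sketch of F_{3} as a single dense graph-convolution layer over A_{p} with shared weights and ReLU, as the text describes; the depth and the row normalization are my assumptions:)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PopulationClassifier(nn.Module):
    """Sketch of F3: a GCN over the learned population graph A_p."""
    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, in_dim)       # shared weights for all nodes
        self.cls = nn.Linear(in_dim, num_classes)

    def forward(self, h, a_p):                     # h: (N, H), a_p: (N, N)
        deg = a_p.sum(dim=1, keepdim=True).clamp(min=1e-6)
        msg = (a_p / deg) @ self.lin(h)            # degree-normalized neighbor aggregation
        return F.softmax(self.cls(F.relu(msg)), dim=-1)   # p: (N, C) class probabilities
```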

2.4.2. Node degree distribution loss (NDDL)

        ①They proposed a regularizer, the Node Degree Distribution Loss (NDDL), based on the Kullback–Leibler divergence between the computed degree distribution and a target distribution (a Gaussian is chosen). The divergence values of LGL versus LGL+NDD:

        ②The overall steps:

        ③A denotes the adjacency matrix of an undirected graph and A_{p} denotes a weighted fully connected graph. They only retain edges whose weights are greater than 0.5:

\bar{A}=A_{p}\, with\, \left ( A_{p}> 0.5 \right )\in \mathbb{R}^{N\times N}

and the node degree vector \bar{d}\in \mathbb{R}^{N} collects the row sums:

\bar{d}_{i}=\sum_{j=1}^{N}\bar{A}_{i,j}

        ④Computing the soft assignment matrix S:

S_{i,j}=\frac{e^{-\Delta_{i,j}^{2}/\sigma^{2}}}{\sum_{k} e^{-\Delta_{k,j}^{2}/\sigma^{2}}}

where \sigma is a hyperparameter set to 0.6;

\Delta_{i,j}=c_i-\bar{d}_j;

c_{i},i\in \left \{ 1,\ldots,N \right \} denotes the possible degrees

        ⑤Node degree distribution q=\left [ q_{1},...,q_{N} \right ] is calculated by:

q_i=\frac{\sum_jS_{i,j}}{\sum_{k,j}S_{k,j}}

(I think this expression has a small issue: it should really be two Σ's, because he is clearly summing over all entries of the matrix. Or is this way of writing it acceptable in mathematics?)

        ⑥The final NDDL step:

NDDL=D_{KL}\left ( q,r \right )

where r denotes the target discrete normal distribution with learnable parameters (a sketch combining steps ③ through ⑥ follows below)
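(Putting steps ③ through ⑥ together, a minimal PyTorch sketch of NDDL as I read it; the hard 0.5 mask and the numerical clamping are my implementation choices:)

```python
import torch

def nddl(a_p: torch.Tensor, r: torch.Tensor, sigma: float = 0.6) -> torch.Tensor:
    """Node Degree Distribution Loss sketch: KL(q || r) for population graph a_p."""
    n = a_p.size(0)
    a_bar = a_p * (a_p > 0.5)                    # step 3: keep edges with weight > 0.5
    d_bar = a_bar.sum(dim=1)                     # node degree vector, shape (N,)
    c = torch.arange(n, dtype=a_p.dtype)         # candidate degrees c_i
    delta = c.unsqueeze(1) - d_bar.unsqueeze(0)  # Delta_{i,j} = c_i - d_bar_j
    s = torch.softmax(-delta.pow(2) / sigma**2, dim=0)   # step 4: soft assignment S
    q = s.sum(dim=1) / s.sum()                   # step 5: empirical degree distribution
    # step 6: KL divergence between q and the target distribution r
    return (q * (q.clamp_min(1e-10) / r.clamp_min(1e-10)).log()).sum()
```

(For r one could use, e.g., a discretized Gaussian over the same candidate degrees, r = torch.softmax(-(c - mu)**2 / (2 * std**2), dim=0), with learnable mu and std.)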

        ⑦The loss function:

loss=CE_{loss}+\alpha NDDL

where CE_{loss}=-\sum_{c=1}^C label_c\,\log\left(p_c\right) is the cross-entropy loss;

label_{c} is the class membership indicator (1 if the sample belongs to class c, 0 otherwise);

p_{c} denotes the predicted probability of class c
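(Illustrative usage with dummy tensors and the nddl sketch from above; note F.cross_entropy expects pre-softmax logits, and the value of \alpha here is made up:)

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 2)             # dummy pre-softmax scores for N = 8 graphs, C = 2
labels = torch.randint(0, 2, (8,))     # dummy class labels
a_p = torch.rand(8, 8)                 # dummy population adjacency from F2
r = torch.full((8,), 1 / 8)            # dummy target degree distribution
alpha = 0.1                            # illustrative regularization weight
loss = F.cross_entropy(logits, labels) + alpha * nddl(a_p, r)
```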

2.5. Experiments and results

 (1)Dataset information

        ①Statistics of dataset:

        ②Class distribution:

(2)Experiment settings

        ①They test GiG in biological, medical and chemical domains

        ②Each sample is a graph

        ③They test 2 variants, LGL and LGL+NDD

2.5.1. Datasets

        ①Predict sex from brain fMRI data in the Human Connectome Project (HCP)

        ②Classify proteins as enzymes or non-enzymes in PROTEINS

        ③Predict a binary 0/1 value (not active/active) for toxicity

(I did not write this in much detail here; the authors actually wrote a lot)

2.5.2. Implementation details

(1)Settings

        ①Optimizer: Adam

        ②Activation: ReLU, except for KNN on HCP, which adopts Sigmoid followed by batch norm

        ③DL framework: PyTorch and PyTorch Geometric

(2)Dataset split

        ①Training set is 72%, validation set is 8%, test set is 20% in HCP

        ②90% training set and 10% test set in PROTEINS_29, with the training set further divided into training and validation sets via 10-fold cross-validation

        ③Predefined scaffold splits are used for Tox21

(3)Importance of batch size

        ①They tested batch sizes ranging from a single batch up to the entire dataset

        ②Larger batches bring better performance, hence they suggest combining test and training samples

scaffold  n. scaffolding; an execution platform; a gallows; a building framework

2.5.3. Quantitative results

        ①Comparison of classifying models:

        ②Comparison of recent models on HCP:

where G is GroupICA, S is Schaefer, and M is multi-modal parcellation

        ③Comparison of different folds:

        ④They test the relationship between test-set size and accuracy:

where triangles denote mean values and the data comes from PROTEINS_3

        ⑤Distribution of degrees before and after adopting the 0.5 threshold:

where the x-axis is the node degree, and the y-axis is the number of nodes with that degree, in PROTEINS_29

2.5.4. Knowledge discovery analysis

        ①Population graph comparison on HCP, with red outlines marking misclassified samples:

it is easy to see that LGL+NDD clusters better

        ②Population graphs comparison in PROTEINS_29:

where the threshold for LGL is 0.01 and for LGL+NDD is 0.5. In addition, (b) and (d) are colored by CATH classes, i.e. "mainly α", "mainly β", and "a combination of α and β"

        ③Population graphs comparison of GiG LGL+NDD in different datasets:

where threshold for LGL+NDD is 0.5

        ④The influence of \theta on PROTEINS_3, where the first row is LGL and the second row is LGL+NDD:

        ⑤Population graph evaluation:

        ⑥Evaluation of the learned \theta:

        ⑦Hyperparameter optimization, where "bs" represents batch size, "k" the number of neighbors in the KNN graph, and "S" the scheduler; DECL denotes DynamicEdgeConv(ReLu(Linear(2*–,–))), "-" denotes a dimension taken from the previous or subsequent layer, "P" represents Reduce Learning Rate on Plateau, and "C" Cosine Annealing:

        ⑧Hyperparameter selection ranges:

2.6. Discussion

        ①They designed two learnable parameters, the temperature t and \theta, of which \theta can significantly influence the classification results

        ②Proper input-graph representations also have a great impact

        ③Limitations: 1) the choice of target distribution in NDDL, 2) the choice of F_{1} and F_{2}

2.7. Conclusion

        They proposed a graph structure learning method that includes a node-level module, a population-level module, and a GCN classifier

3. Background knowledge

3.1. Upstream and downstream

(1)Upstream tasks mainly refer to pre-training

(2)Downstream tasks usually refer to the remaining, task-specific part of the model

3.2. Soft threshold

Related link: 软阈值(Soft Thresholding)函数解读 - CSDN blog
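For reference (a standard definition, not taken from this paper): the soft-thresholding operator shrinks a value toward zero and zeroes out anything whose magnitude is below the threshold \lambda:

S_{\lambda}(x)=sign(x)\cdot \max\left ( \left | x \right |-\lambda ,0 \right )

In F_{2} above, \theta instead acts as a learnable offset inside the sigmoid.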

3.3. Soft assignment and hard assignment

(1)Soft assignment: gives only a probability for each possible class/cluster

(2)Hard assignment: assigns a data point to one specific cluster
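(A tiny illustration with made-up numbers:)

```python
import torch

scores = torch.tensor([1.2, 0.3, -0.5])   # made-up affinities of one sample to 3 clusters
soft = torch.softmax(scores, dim=0)        # soft assignment: a probability per cluster
hard = int(torch.argmax(scores))           # hard assignment: one specific cluster (index 0 here)
```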

3.4. CATH

(1)Explanation: Class(C), Architecture(A), Topology(T) and Homologous superfamily (H) of protein

(2)CATH classes: 

4. Reference List

Orengo, C. et al. (1997) 'CATH – a hierarchic classification of protein domain structures', Structure, vol. 5, issue 8, pp. 1093-1109.

Zaripova, K. et al. (2023) 'Graph-in-Graph (GiG): Learning interpretable latent graphs in non-Euclidean domain for biological and healthcare applications', Medical Image Analysis, vol. 88, 102839.
