[Paper Close Reading] BrainVGAE: End-to-End Graph Neural Networks for Noisy fMRI Dataset

夏莉莉iy

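Category: Paper Close Reading. Tags: artificial intelligence, computer vision, deep learning, machine learning, graph theory, classification, notes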

Article link: https://blog.csdn.net/sherlily/article/details/137026279

Paper link: BrainVGAE: End-to-End Graph Neural Networks for Noisy fMRI Dataset | IEEE Conference Publication | IEEE Xplore

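The English is typed entirely by hand! It summarizes and paraphrases the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.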

1. TL;DR

1.1. Thoughts

(1) Oops, all my questions ended up embedded in the notes

1.2. Paper summary figure

2. Section-by-section close reading

2.1. Abstract

①Little attention has been paid to node connectivity (edges), which degrades performance on noisy datasets

②They designed a Variational Graph Auto-Encoders (VGAE)-based end-to-end framework

2.2. Introduction

①Using time series as features, deep learning (DL) and machine learning (ML) can classify patients versus healthy controls (HC). Modelling the brain as a graph can also be used for classification

②⭐The authors flatly reject the practice of keeping the top 10% of nodes by absolute value, but give no reason beyond saying that it "may harm performance on noisy datasets"

③A plug-and-play edge predictor, BrainVGAE:

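(The authors say the left side is the input graph. Why is it asymmetric? Is it a directed graph? Later they say edges are added or removed according to probability to obtain the "adjacency matrix", which is still asymmetric. So is it directed??)

④"We suggest that BrainVGAE can be used as a general framework for GNN-based fMRI analysis, rather than trying to surpass the SOTA."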

2.3. Related Work

2.3.1. Deep Neural Networks (DNNs) for fMRI analysis

①CNNs, RNNs, and AEs have all been used in fMRI analysis

②They adopt each ROI as node

2.3.2. Variational Graph Auto-Encoders (VGAE)

①Edge augmentation enhances the performance on node classification tasks

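②The method does not change the number of "connectives" of each node (What does this mean, the number of edges? The figure above clearly changed them. Right, below it says the edge set is the set of connections. So you mean the total number in each graph stays the same? Is that necessary?)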

2.4. BrainVGAE

2.4.1. Preliminaries

（1）Brain-to-Graph construction

①Atlas: e.g. CC200 (If the graph is undirected, why on earth is it asymmetric!!)

②Defining the graph $\mathcal{G}=(V,\mathcal{E})$, where $N=|V|$ denotes the number of nodes

③ $\mathrm{X}\in\mathbb{R}^{N\times F}$ is the feature matrix, $F$ denotes the feature dimension

④ $\mathrm{A}\in\{0,1\}^{N\times N}$ denotes adjacency matrix

⑤Node feature: Pearson correlation

⑥Edge feature: partial correlation among vertices
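The construction in ① to ⑥ can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the function name `build_brain_graph`, the per-row top-`k` binarization, and the use of a pseudo-inverse covariance to obtain partial correlations are all assumptions made for illustration.

```python
import numpy as np

def build_brain_graph(ts, k=20):
    """ts: (T, N) array of ROI time series. Returns (X, A, P).

    Hypothetical helper, not taken from the paper.
    """
    # Node features: row i is the Pearson correlation profile of ROI i,
    # so the feature dimension F equals N in this sketch.
    X = np.corrcoef(ts, rowvar=False)
    # Partial correlation derived from the precision (inverse covariance) matrix.
    prec = np.linalg.pinv(np.cov(ts, rowvar=False))
    d = np.sqrt(np.diag(prec))
    P = -prec / np.outer(d, d)
    np.fill_diagonal(P, 0.0)
    # Binary adjacency (assumption): keep the k strongest |partial correlations| per row.
    A = np.zeros_like(P, dtype=int)
    idx = np.argsort(-np.abs(P), axis=1)[:, :k]
    np.put_along_axis(A, idx, 1, axis=1)
    return X, A, P
```

One observation that may bear on the asymmetry question above: `P` is symmetric, but selecting the top `k` entries row by row does not guarantee a symmetric `A`, because ROI j can be among the strongest neighbours of ROI i without the reverse being true. Whether the paper's figure is asymmetric for this reason is not stated in the source.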

（2）Graph Convolutional Network (GCN)

①They choose GCN as the GNN:

$\begin{aligned} \mathbf{H}^{(l+1)}& =\mathrm{GCN}\left(\mathbf{H}^{(l)},\mathbf{A}\right) \\ &=\mathrm{ReLU}\left(\tilde{\mathbf{D}}^{-\frac12}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-\frac12}\mathbf{H}^{(l)}\mathbf{W}^{(l)}\right) \end{aligned}$
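As a rough illustration of the propagation rule, here is a single GCN layer in NumPy. The notes do not define the tilde; this sketch assumes the standard convention of adding self-loops, $\tilde{\mathbf{A}}=\mathbf{A}+\mathbf{I}$, with $\tilde{\mathbf{D}}$ its degree matrix. The function name `gcn_layer` is illustrative.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_tilde = A + np.eye(A.shape[0])        # add self-loops (assumed convention)
    deg = A_tilde.sum(axis=1)               # degrees are >= 1 thanks to self-loops
    D_inv_sqrt = np.diag(deg ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt
    return np.maximum(A_hat @ H @ W, 0.0)   # ReLU
```

For a two-node graph with one edge, identity features, and identity weights, every entry of the normalized matrix is 0.5, so the layer simply averages the two nodes' features.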

2.4.2. BrainVGAE

Overall framework:

（1）VGAE-based Edge Predictor

①Given the original adjacency matrix $\mathbf{A}$ and the feature matrix $\mathbf{X}$, the edge predictor outputs the predicted adjacency matrix:

$\mathbf{A}^{pred}=\sigma(\mathbf{Z}\mathbf{Z}^T)$

where the latent variable:

$\mathbf{Z}=\mathrm{GCN}((\mathrm{GCN}(\mathbf{X},\mathbf{A})),\mathbf{A})$

②They use Binary Cross Entropy (BCE) as the edge prediction loss:

$\mathcal{L}_{ep}=\text{BCE}(\mathbf{A}^{pred},\mathbf{A})$

⭐which can eliminate potential distribution shift

③"The predicted inter-ROIs connectivity $\mathbf{A}^{pred}$ is then further sampled to improve out-of-distribution robustness:" (Why does the adjacency matrix need to be predicted as well?)

$\mathbf{A}_m=f_{sampling}(\mathbf{A}^{pred},\mathbf{A})$

④⭐The number of edges to be added or removed is a hyperparameter. Also, the number of edges of each ROI stays the same

⑤The edge predictor is pre-trained in an unsupervised manner:

$\mathcal{L}_{pre}=\mathbb{E}_{q(\mathbf{Z}\mid\mathbf{X},\mathbf{A})}[\log p(\mathbf{A}\mid\mathbf{Z})]-D_{KL}[q(\mathbf{Z}\mid\mathbf{X},\mathbf{A})\|p(\mathbf{Z})]$

where $p(\mathbf{Z})$ denotes the Gaussian prior $\mathcal{N}(0,I)$ ;

$p(\mathbf{A}\mid\mathbf{Z})$ represents the adjacency matrix likelihood conditioned on the latent variable (presumably with the log in front)
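The pieces of the edge predictor in ① to ⑤ can be sketched in NumPy as below. Several parts are assumptions: the Gaussian posterior parameterized by `mu` and `logvar` follows the standard VGAE formulation rather than anything stated in these notes, and `resample_edges` is a purely hypothetical stand-in for $f_{sampling}$, chosen only because it keeps each ROI's edge count fixed as ④ requires.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(Z):
    """Inner-product decoder: A_pred = sigmoid(Z Z^T)."""
    return sigmoid(Z @ Z.T)

def bce(pred, target, eps=1e-7):
    """Mean binary cross entropy between predicted probabilities and 0/1 targets."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def kl_to_standard_normal(mu, logvar):
    """KL[N(mu, diag(exp(logvar))) || N(0, I)], averaged over nodes."""
    return float(-0.5 * np.mean(np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1)))

def resample_edges(A_pred, A, n_change=1):
    """Hypothetical f_sampling: per node, drop the n_change least likely existing
    edges and add the n_change most likely missing ones (row edge counts unchanged)."""
    A_m = A.copy()
    for i in range(A.shape[0]):
        present = np.flatnonzero(A[i] == 1)
        absent = np.flatnonzero(A[i] == 0)
        absent = absent[absent != i]                 # never add self-loops
        n = min(n_change, len(present), len(absent))
        if n == 0:
            continue
        drop = present[np.argsort(A_pred[i, present])[:n]]
        add = absent[np.argsort(-A_pred[i, absent])[:n]]
        A_m[i, drop] = 0
        A_m[i, add] = 1
    return A_m
```

Under these assumptions, `bce(decode(Z), A)` plays the role of $\mathcal{L}_{ep}$, and `bce(...) + kl_to_standard_normal(mu, logvar)` is a common surrogate for the negative of $\mathcal{L}_{pre}$, up to how the two terms are normalized. Note that a row-wise resampling rule like this one does not preserve symmetry, which is one conceivable reason a sampled adjacency matrix could look directed.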

（2）GNN-based Feature Extractor

①Convolutional layer: GCN

②Node pooling layer: top-k pooling with a keep ratio of $k=0.5$. In addition, the pooling loss is:

$\mathcal{L}_{pool}^{(l)}=\mathrm{BCE}(s_k^{(l)},s_N^{(l)})$

where $s_k$ denotes the total score of the $k$ remaining nodes;

$s_N$ denotes the total nodes at the $l$-th layer

③Applying global average pooling and global max pooling in the readout layer:

$\mathbf{z}^{(l)}=\operatorname{mean}\mathbf{H}^{(l)}\,\|\,\max\mathbf{H}^{(l)}$

and the final aggregated vector is the sum over all readout layers:

$\mathbf{z}=\sum_{l=1}^{L}\mathbf{z}^{(l)}$

④Overview: 2 conv-pool-read layers

⑤Schematic feature extractor:
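With the schematic not reproduced in these notes, the conv-pool-readout stack from ① to ④ can be sketched roughly as follows. This is a sketch under assumptions, not the paper's code: the scoring rule in `topk_pool` (a learnable projection vector `p`, sigmoid gating) follows the common top-k pooling recipe, and all function names are illustrative.

```python
import numpy as np

def gcn_layer(H, A, W):
    """ReLU(D^-1/2 (A + I) D^-1/2 H W), assuming self-loops are added."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = A_tilde.sum(axis=1) ** -0.5
    A_hat = A_tilde * np.outer(d_inv_sqrt, d_inv_sqrt)
    return np.maximum(A_hat @ H @ W, 0.0)

def topk_pool(H, A, p, ratio=0.5):
    """Keep the ceil(ratio * N) highest-scoring nodes and gate their features."""
    scores = 1.0 / (1.0 + np.exp(-(H @ p) / np.linalg.norm(p)))
    k = int(np.ceil(ratio * H.shape[0]))
    keep = np.sort(np.argsort(-scores)[:k])
    return H[keep] * scores[keep, None], A[np.ix_(keep, keep)], keep

def readout(H):
    """Concatenate global mean pooling and global max pooling over nodes."""
    return np.concatenate([H.mean(axis=0), H.max(axis=0)])

def extract_features(X, A, weights, projections, ratio=0.5):
    """Stack conv -> pool -> readout blocks and sum the per-layer readouts."""
    H, z = X, 0.0
    for W, p in zip(weights, projections):
        H = gcn_layer(H, A, W)
        H, A, _ = topk_pool(H, A, p, ratio)
        z = z + readout(H)
    return z
```

Summing the readouts only works if every layer has the same hidden width, which is why the two weight matrices in a usage example would need matching output dimensions.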

（3）Label Classifier

①The label prediction loss:

$\mathcal{L}_{label}=\mathrm{BCE}(\mathbf{y},\hat{\mathbf{y}})$

②The final weighted sum loss:

$\mathcal{L}=\mathcal{L}_{label}+\lambda_1\mathcal{L}_{ep}+\lambda_2\sum_{l=1}^{L}\mathcal{L}_{pool}^{(l)}$
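To make the weighting concrete, the helper below evaluates the combined objective for given component losses. The function name `total_loss` is illustrative; the default coefficients are the values reported in the experimental setup ($\lambda_1=0.3$, $\lambda_2=0.1$), and any input loss values used with it are made up.

```python
def total_loss(l_label, l_ep, l_pools, lambda1=0.3, lambda2=0.1):
    """L = L_label + lambda1 * L_ep + lambda2 * sum of per-layer pooling losses."""
    return l_label + lambda1 * l_ep + lambda2 * sum(l_pools)
```

For example, with hypothetical values `l_label=0.6`, `l_ep=0.5`, and pooling losses `[0.2, 0.3]`, the label term contributes 0.6, the edge term 0.15, and the pooling term 0.05, giving about 0.8 in total.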

2.5. Experimental Setup and Results

2.5.1. ABIDE Dataset

①Dataset: ABIDE I

②Samples: 1035

2.5.2. Experimental Setup

（1）Graph Construction

①Atlas: Craddock 200

②⭐Graph sparsification: top-k sparsification via Graph Diffusion Convolution (GDC) with $k=20$ (the top 10% of the 200 nodes)
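A minimal sketch of diffusion followed by top-k sparsification is given below. The notes do not say which diffusion kernel or teleport probability was used, so the personalized PageRank kernel and `alpha=0.15` are assumptions, as is selecting the top `k` entries per row.

```python
import numpy as np

def ppr_diffusion(A, alpha=0.15):
    """Personalized PageRank diffusion: S = alpha * (I - (1 - alpha) * T)^-1,
    with T the symmetrically normalized adjacency including self-loops."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)
    d = A_tilde.sum(axis=1)
    T = A_tilde / np.sqrt(np.outer(d, d))
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * T)

def topk_sparsify(S, k=20):
    """Keep the k largest entries in each row of S and zero out the rest."""
    out = np.zeros_like(S)
    idx = np.argsort(-S, axis=1)[:, :k]
    np.put_along_axis(out, idx, np.take_along_axis(S, idx, axis=1), axis=1)
    return out
```

With 200 ROIs and `k=20`, each row of the sparsified matrix retains 20 weighted connections, which matches the 10% figure quoted above.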

（2）Experimental Setup

①Dimension:

| | 0th layer | 1st layer |
|---|---|---|
| VGAE | 64×200 | 200×64 |
| GCN feature extractor | 64×200 | 64×64 |

②MLP dimension: 512→1024→2

③Loss coefficients: $\lambda _1=0.3,\lambda _2=0.1$

④Epochs: 500, with a patience of 100

⑤Learning rate: 0.001, halved every 20 epochs

⑥Weight decay: 0.0005

⑦Batch size: 100 (rare to see one that is not a power of 2)

⑧Epochs for unsupervised pre-training of the edge predictor: 100, with a learning rate of 0.0005
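The schedule in ④ and ⑤ can be written down as two small helpers. Both are sketches under assumptions: epochs are counted from 0 so the first halving happens at epoch 20, and patience is interpreted as early stopping on a validation loss, which the notes do not specify.

```python
def learning_rate(epoch, base_lr=0.001, step=20, factor=0.5):
    """Step decay: multiply the learning rate by `factor` every `step` epochs."""
    return base_lr * factor ** (epoch // step)

def should_stop(val_losses, patience=100):
    """Stop once the best validation loss is at least `patience` epochs old."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch >= patience
```

Under this reading, the learning rate is 0.001 for epochs 0 to 19, 0.0005 for epochs 20 to 39, and so on, and training ends early if 100 epochs pass without a new best validation loss.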

2.5.3. Results

①Comparison table:

2.6. Conclusion

They proposed a novel framework BrainVGAE

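3. Supplementary knowledge

3.1. Margin Sampling

(1) Definition:

Margin sampling is an active learning strategy that selects samples near the model's decision boundary for labeling. These samples are harder for the model because they lie in the boundary region between classes, where its predictions tend to be highly uncertain. Labeling them and adding them to the training set helps the model learn the decision boundary better and thus improves classification performance.

Concretely, margin sampling is usually implemented by computing the model's predicted probability or score for each unlabeled sample and then selecting the samples whose predicted probability or score is closest to the classification threshold (in the binary case, usually 0.5); these are the samples near the decision boundary. Once they are labeled and added to the training set, the model can learn from their features to locate the decision boundary more precisely.

In practice, margin sampling can be combined with other active learning strategies for more efficient sample selection. For example, one can first use uncertainty sampling to pick a batch of highly uncertain samples, and then apply margin sampling within that batch to choose the samples near the decision boundary for labeling. This helps ensure that the selected samples are both representative and challenging, which improves training.

Note that the effectiveness of margin sampling depends on model quality and on the data distribution. If the model itself performs poorly or the data are unevenly distributed, the selected samples may be unrepresentative or overly concentrated in certain regions. Margin sampling therefore needs to be adjusted to the situation at hand to remain effective.

(2) Formula: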

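$x_M^*=\operatorname*{arg\,min}_x\left(P_\theta(\hat{y}_1\mid x)-P_\theta(\hat{y}_2\mid x)\right)$

where $\hat{y}_1$ and $\hat{y}_2$ are the most probable and second most probable labels for $x$ under the model $\theta$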
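A small NumPy sketch of this selection rule follows. The function name `margin_sampling` and the `n_select` parameter are illustrative; the input is assumed to be an array of per-class predicted probabilities for the unlabeled pool.

```python
import numpy as np

def margin_sampling(probs, n_select=1):
    """probs: (n_samples, n_classes) predicted class probabilities.

    Returns the indices of the n_select samples with the smallest gap between
    their two most probable classes, plus the margin of every sample.
    """
    ordered = np.sort(probs, axis=1)            # ascending within each row
    margins = ordered[:, -1] - ordered[:, -2]   # top-1 minus top-2 probability
    return np.argsort(margins)[:n_select], margins
```

In the binary case the margin equals $|2p-1|$ for a positive-class probability $p$, so the smallest margin corresponds to the prediction closest to 0.5, consistent with the description above.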