[Paper Close Reading] Spatio-temporal directed acyclic graph learning with attention mechanisms on brain functional time series and connectivity

Paper link: Spatio-temporal directed acyclic graph learning with attention mechanisms on brain functional time series and connectivity - ScienceDirect

Full title: Spatio-temporal directed acyclic graph learning with attention mechanisms on brain functional time series and connectivity

Contents

1. TL;DR

1.1. Takeaways

1.2. Paper framework figure

2. Section-by-section close reading

2.1. Abstract

2.2. Introduction

2.3. Related work

2.3.1. Deep learning on functional time series

2.3.2. Deep learning on functional connectivity

2.4. Methods

2.4.1. Spatio-temporal graph convolutional network (ST-graph-conv)

2.4.2. Functional connectivity convolutional (FC-conv) network

2.4.3. Functional connectivity based spatial attention (FC-SAtt)

2.4.4. Spatial attention graph pooling

2.4.5. Directed acyclic graph for multi-scale analysis on functional signals and connectivity

2.4.6. Implementation

2.4.7. Evaluation metrics and cross-validation

2.5. Datasets and MRI preprocessing

2.5.1. Adolescent brain cognitive development (ABCD)

2.5.2. Open access series of imaging study-3 (OASIS-3)

2.5.3. MRI preprocessing

2.6. Results

2.6.1. Spatio-temporal directed acyclic graph learning

2.6.2. Fluid intelligence prediction via leave-one-site-out cross-validation

2.6.3. Age prediction

2.6.4. Comparisons with BrainNetCNN and SVR in the prediction of fluid intelligence and age

2.6.5. Comparisons with elastic net’s mixture with random forest, spatio-temporal graph convolution, and BrainNetCNN

2.7. Discussion

3. Reference List


1. TL;DR

1.1. Takeaways

        ①The model combines time series and functional connectivity matrices, but as a whole it feels overly complex and cluttered; it is not an easy model to learn from or borrow

        ②A knowledge-plus-data-driven paper

1.2. Paper framework figure

2. Section-by-section close reading

2.1. Abstract

        ①They developed a spatio-temporal directed acyclic graph with attention mechanisms (ST-DAG-Att)

        ②The authors adopt this model in functional magnetic resonance imaging (fMRI)

        ③ST-DAG-Att has a feed-forward structure organized as a directed acyclic graph (DAG), with two parts: a spatio-temporal graph convolutional network (ST-graph-conv) and a functional connectivity convolutional network (FC-conv)

        ④This framework also contains functional connectivity-based spatial attention (FC-SAtt)

        ⑤They used two large datasets: Adolescent Brain Cognitive Development (ABCD, n=7693) and Open Access Series of Imaging Study-3 (OASIS-3, n=1786)

        ⑥Task: generalizing from cognition prediction to age prediction

2.2. Introduction

        ①Brief introduction to fMRI and its use in disease diagnosis and in predicting individual demographic information and cognitive ability

        ②Models like RNN, LSTM, GRU are for temporal analysis. (They list others as well)

        ③Information and connections between brain regions may be ignored when using only functional time series

        ④Their model is based on directed acyclic graph (DAG)

        ⑤ST-DAG-Att outperforms other models in accuracy

        ⑥This model contains a) signal and network processing, b) spatial, temporal and functional connectivity information, c) spatial attention pooling

Vocabulary: schizophrenia (n.)

2.3. Related work

2.3.1. Deep learning on functional time series

        ①RNN, LSTM and GRU all operate on time series data

        ②GCNs can also be used for fMRI analysis; there, the time series serve as node features while the functional connections (FC) define the graph

(My guess at the difference: ① feeds an ROI × time-points matrix as input, while ② builds the graph from the ROI × ROI matrix and uses the ROI × time-points matrix as node features)

2.3.2. Deep learning on functional connectivity

        ①In CNNs and DNNs, FC matrices are treated as images

        ②BrainNetCNN includes edge-to-edge, edge-to-node, and node-to-graph layers to represent topological relationships

        ③In such GCNs, each node is a subject and each edge encodes inter-subject similarity

2.4. Methods

        ①ST-DAG-Att framework figure, where the blue blocks are ST-graph-conv networks and the green blocks are FC-conv networks

        ②They define the node as ROI and edge as functional connection

2.4.1. Spatio-temporal graph convolutional network (ST-graph-conv)

        ①G=\left \{ V,E \right \} is the brain graph, where V represents the set of nodes and E the set of edges

        ②For node x and time point t, f\left ( x,t \right ) denotes the functional time series (why represent it as a function?)

        ③ST-graph-conv figure:

where n denotes the number of ROIs;

T denotes the number of time points;

C', C'', \ldots denote the numbers of filter channels;

w denotes the kernel size of the filters;

p_{t} and p_{s} denote the temporal and spatial pooling strides respectively

        ④There are "8 filters in the temporal convolution and 8 spectral filters designed by the Chebyshev polynomials of order 4 in the spectral graph convolution" in each ST-graph-conv layer

        ⑤The stride of temporal and spatial pooling is 2

(1)Temporal convolution

        ①The function of temporal convolution:

f_j^{\prime}\left(x,\frac{t-w+1}{p_t}\right)=Tpool\left(\sigma\left(\sum_{i=1}^Ch_j\left(t,i\right)*f_i\left(x,t\right)\right)\right)

where w denotes the temporal filter size;

Tpool denotes temporal average pooling;

h_{j} represents the j-th filter with the size of 1\times w;

\sigma denotes leaky ReLU

        ②Then \mathbf{f}=\left\{f_i\left(x,t\right)\right\}_{i=1,2,\ldots,C}\in\mathbb{R}^{n\times T\times C} is transformed into \mathbf{f}^{\prime}=\left\{f_i^{\prime}\left(x,t\right)\right\}_{i=1,2,\ldots,C^{\prime}}\in\mathbb{R}^{n\times\frac{T-w+1}{p_t}\times C^{\prime}}

Vocabulary: entangle (v.)
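As a minimal sketch (not the authors' implementation), the temporal convolution with leaky ReLU and temporal average pooling above might look as follows in NumPy; the filter layout h of shape (C', C, w) and the function names are assumptions for illustration:

```python
import numpy as np

def leaky_relu(x, alpha=0.33):
    # the paper uses leaky ReLU with leak rate 0.33
    return np.where(x > 0, x, alpha * x)

def temporal_conv(f, h, p_t=2, alpha=0.33):
    """Per-ROI temporal convolution followed by temporal average pooling.
    f: (n, T, C) functional time series
    h: (C_out, C, w) temporal filters (hypothetical layout)
    returns: (n, (T - w + 1) // p_t, C_out)
    """
    n, T, C = f.shape
    C_out, _, w = h.shape
    L = T - w + 1                       # "valid" convolution length
    out = np.zeros((n, L, C_out))
    for j in range(C_out):
        for t in range(L):
            # sum over input channels i and the w filter taps
            out[:, t, j] = sum(f[:, t:t + w, i] @ h[j, i] for i in range(C))
    out = leaky_relu(out, alpha)
    # temporal average pooling with stride p_t
    L2 = L // p_t
    return out[:, :L2 * p_t].reshape(n, L2, p_t, C_out).mean(axis=2)
```

With n=4 ROIs, T=10 time points, C=2 channels, w=3 and p_t=2, the output has shape (4, 4, C_out), matching the shape change described above.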

(2)Spatial graph convolution

        ①They adopt spectral filter g in the graph Fourier domain:

g\left(\lambda\right)=\sum_{k=0}^{K-1}\theta_{k}T_{k}\left(\lambda\right)

where K is the order of Chebyshev polynomials;

\lambda denotes an eigenvalue of the graph Laplacian \Delta, which represents the brain functional network of graph G;

\theta _{k} denotes the shape parameter;

T_{k} denotes the Chebyshev polynomial T_k\left(\lambda\right)=\cos\left(k\cos^{-1}\lambda\right);

        ②The spectral graph convolution is then applied at each time point:

f_j''\left(x,t\right)=\sum_{i=1}^{C'}\sum_{k=0}^{K-1}\theta_k^{ij}T_k\left(\Delta\right)f_i'\left(x,t\right)

where all time points share the same filters;

\Delta=I-D^{-\frac12}AD^{-\frac12}, where I denotes the identity matrix, D the degree matrix, and A the adjacency matrix (?? aren't the degree matrix and the adjacency matrix of a directed graph different things?)

        ③The authors say that \sum_{k=0}^{K-1}\theta_{k}^{ij}T_{k}\left(\Delta\right) transforms f_{i}^{\prime}(x,t) into the graph Fourier domain, filters it there according to the shape of the Chebyshev polynomials, and transforms it back into the time domain (sounds impressive, though I don't fully follow)

        ④Then, there is a transform from \mathbf{f'}=\begin{Bmatrix}f'_i\left(x,t\right)\end{Bmatrix}\in\mathbb{R}^{n\times\frac{T-w+1}{p_t}\times C'} to \mathbf{f}^{\prime\prime}\in\mathbb{R}^{n\times\frac{T-w+1}{p_t}\times C^{\prime\prime}}
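The Chebyshev filtering in ① and ② can be sketched as below (an assumed illustration, not the authors' code); note that in practice the Laplacian is often rescaled so its eigenvalues fall in [-1, 1] before the polynomials are applied, a detail the notes above do not cover. A single output channel is shown for brevity:

```python
import numpy as np

def normalized_laplacian(A):
    """Delta = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    return np.eye(len(A)) - (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

def cheb_graph_conv(fp, A, theta):
    """Chebyshev spectral graph convolution, single output channel sketch.
    fp: (n, T, C') temporally convolved features; A: (n, n) adjacency
    theta: (C', K) coefficients theta_k^{i}
    returns: (n, T); the same filter is shared by all time points
    """
    n = A.shape[0]
    C_in, K = theta.shape
    L = normalized_laplacian(A)
    # Chebyshev recurrence: T_0 = I, T_1 = L, T_k = 2 L T_{k-1} - T_{k-2}
    Tk = [np.eye(n), L]
    for _ in range(2, K):
        Tk.append(2 * L @ Tk[-1] - Tk[-2])
    out = np.zeros(fp.shape[:2])
    for i in range(C_in):
        for k in range(K):
            out += theta[i, k] * (Tk[k] @ fp[:, :, i])
    return out
```

Because T_k(Δ) is a polynomial in the Laplacian, the filter is K-hop localized on the graph, which is the usual motivation for the Chebyshev parameterization.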

        ⑤Furthermore, spatial pooling is taken into account:

\mathbf{f}^{^{\prime\prime\prime}}=Spool\left(\mathbf{f}^{\prime\prime}\otimes\mathbf{s}\right)

where \otimes represents element-wise multiplication;

\mathbf{s}\in{\mathbb{R}}^{n\times1\times1} is the spatial attention map (?? where does this come from; it is produced by the FC-SAtt module in Section 2.4.3);

Spool denotes the spatial pooling operator;

        ⑥Lastly, \mathbf{f}^{\prime\prime}\in\mathbb{R}^{n\times\frac{T-w+1}{p_t}\times C^{\prime\prime}} is converted to \mathbf{f}^{\prime\prime\prime}\in\mathbb{R}^{\frac n{p_s}\times\frac{T-w+1}{p_t}\times C^{\prime\prime}}

(3)Spatio-temporal aggregation

        ①To expand the area from local to global, they aggregate global spatial and temporal information:

\mathbf{y}=\sigma\left(h_s*\begin{bmatrix}Tavg(\mathbf{f}^{\prime\prime\prime})\\Tsd(\mathbf{f}^{\prime\prime\prime})\end{bmatrix}\right)

where Tavg denotes the temporal global average and Tsd the temporal standard deviation;

h_{s} denotes the spatial filters, whose kernel size is \frac{n}{p_{s}}\times1;

\sigma denotes leaky ReLU;

        ②Then, the \mathbf{y}\in{\mathbb{R}}^{1\times1\times C^{\prime\prime}}
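A rough NumPy sketch of this aggregation step, under the assumption that the spatial filter tensor Ws acts on the stacked [Tavg; Tsd] maps (the layout and names are hypothetical):

```python
import numpy as np

def st_aggregate(f3, Ws, alpha=0.33):
    """Global spatio-temporal aggregation sketch.
    f3: (m, L, C) spatially pooled features, m = n / p_s nodes
    Ws: (C_out, 2 * m, C) spatial filters acting on the stacked
        [Tavg; Tsd] maps (hypothetical layout)
    returns: y of shape (C_out,)
    """
    tavg = f3.mean(axis=1)                         # (m, C) temporal global average
    tsd = f3.std(axis=1)                           # (m, C) temporal standard deviation
    stacked = np.concatenate([tavg, tsd], axis=0)  # (2m, C)
    y = np.einsum('omc,mc->o', Ws, stacked)        # collapse space and channels
    return np.where(y > 0, y, alpha * y)           # leaky ReLU
```

Using both the mean and the standard deviation keeps second-order temporal information that a plain global average would discard.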

2.4.2. Functional connectivity convolutional (FC-conv) network

        ①FC-conv network figure:

where the input is the functional time series \mathbf{f}\in\mathbb{R}^{n\times T\times C} ;

Pearson's correlation yields the functional connectivity matrix \mathbf{F}\in\mathbb{R}^{n\times n\times C} ;

the edge convolution is \mathbf{Z}=\sigma\left(h_e*\mathbf{F}\right) with filters h_{e} of kernel size 1\times n ;

the node convolution is \mathbf{Z}^{\prime}=\sigma\left(h_n*\left(\mathbf{Z}\otimes\mathbf{s}\right)\right) with filters h_{n} of kernel size n\times 1 ;

the output is \mathbf{Z}^{\prime}\in\mathbb{R}^{1\times1\times C^{\prime\prime}} .

        ②There are 128 edge filters and 256 node filters in each layer

        ③The bottleneck ratio in MLP is 4
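The FC-conv pipeline above (Pearson FC, then edge and node convolutions) might be sketched as follows for a single input channel; the weight layouts We and Wn are assumptions for illustration, not the paper's exact parameterization:

```python
import numpy as np

def leaky_relu(x, alpha=0.33):
    return np.where(x > 0, x, alpha * x)

def fc_conv(f, We, Wn, s=None, alpha=0.33):
    """FC-conv sketch: Pearson FC, then row-wise (edge) and column-wise (node) convs.
    f:  (n, T) one channel of functional time series
    We: (C_e, n) 1-by-n edge filters; Wn: (C_n, C_e, n) n-by-1 node filters
    s:  optional (n,) spatial attention map applied before the node conv
    returns: (C_n,) graph-level features
    """
    F = np.corrcoef(f)                  # (n, n) Pearson functional connectivity
    # edge conv: each 1-by-n filter slides over the rows of F -> (n, C_e)
    Z = leaky_relu(F @ We.T, alpha)
    if s is not None:
        Z = Z * s[:, None]              # element-wise spatial attention
    # node conv: each n-by-1 filter collapses the node dimension -> (C_n,)
    return leaky_relu(np.einsum('kcn,nc->k', Wn, Z), alpha)
```

The edge filters mix all connections of one ROI, and the node filters then summarize across ROIs, mirroring BrainNetCNN's edge-to-node and node-to-graph idea.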

2.4.3. Functional connectivity based spatial attention (FC-SAtt)

        ①Functional connectivity based spatial attention (FC-SAtt) figure:

        ②They first apply a channel average pooling layer (Cavg):

Cavg\left(\mathbf{Z}\right)=\frac{1}{C^{\prime}}\sum_{i=1}^{C^{\prime}}\mathbf{Z}_i\in\mathbb{R}^{n\times1\times1}

which generates channel-wise statistics

        ③Then do a series of operators:

\mathbf{s}=Sigmoid[Fully_2(ReLU(Fully_1(Cavg(\mathbf{Z}))))]

where r denotes the bottleneck (reduction) ratio of the two fully connected layers (set to 4, per Section 2.4.2)
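FC-SAtt is essentially squeeze-and-excitation style attention over nodes. A sketch, assuming channel average pooling divides by C' and omitting biases (W1 and W2 are hypothetical weight names):

```python
import numpy as np

def fc_satt(Z, W1, W2):
    """FC-SAtt sketch: squeeze-and-excitation style spatial attention.
    Z:  (n, C) node features from the FC-conv edge layer
    W1: (n // r, n) and W2: (n, n // r) fully connected weights,
        with bottleneck ratio r (the paper uses r = 4); biases omitted
    returns: s in (0, 1)^n, one attention weight per node
    """
    cavg = Z.mean(axis=1)                       # (n,) channel average pooling
    hidden = np.maximum(W1 @ cavg, 0)           # ReLU after the bottleneck layer
    return 1.0 / (1.0 + np.exp(-(W2 @ hidden))) # sigmoid -> attention map s
```

The sigmoid keeps every weight in (0, 1), so s rescales rather than zeroes out node features before the pooling step.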

2.4.4. Spatial attention graph pooling

        ①Spatial attention pooling operation figure:

it may generate a common spatial mask to apply to all the samples

        ②In the binary masks, the top M=\frac{n}{p_{s}} nodes with the highest attention values are set to 1 and the others to 0

        ③Then the new nodes will be:

V'=\left\{x:TopM\left(\sum_{i=1}^N\mathbf{m}_i\right)\right\}\subset V

where N denotes the number of samples

        ④The graph is then updated to G{}'=\left \{ V' ,E'\right \} . In this new graph, redundant nodes and their corresponding edges are removed, reducing dimensionality and computational cost and increasing the proportion of informative signal
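The top-M mask construction in ② and ③ can be sketched as:

```python
import numpy as np

def top_m_pool(attn_maps, M):
    """Spatial attention graph pooling sketch.
    attn_maps: (N, n) attention values for N samples over n nodes
    M: number of nodes to keep (n / p_s)
    returns: sorted indices of the kept nodes, the common mask for all samples
    """
    summed = attn_maps.sum(axis=0)           # sum the N per-sample attention maps
    keep = np.argsort(summed)[::-1][:M]      # top-M nodes by summed attention
    return np.sort(keep)

def subgraph(A, keep):
    """Restrict adjacency A to the kept nodes, dropping redundant edges."""
    return A[np.ix_(keep, keep)]
```

Because the mask is computed from attention summed over all samples, every subject shares the same pruned graph G', which is what makes the pooled node set interpretable as "the most relevant brain regions".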

2.4.5. Directed acyclic graph for multi-scale analysis on functional signals and connectivity

        They use both ROI signals and FC matrices to learn temporal-sequence and connectivity features

2.4.6. Implementation

        ①⭐For FC, they threshold the connectivity matrix to keep the top 20% of connections

        ②The numbers of filters in the convolutional layers are selected from \left \{ 8,16,32,64,128,256 \right \}

        ③The output head has three fully connected layers, with 256, 256 and 1 hidden nodes respectively

        ④Dropout rate: 0.2

        ⑤Leak rate in Leaky ReLU: 0.33 (the authors say this is because negative values in both the time series and the functional connectivity are meaningful. Then the question: is the top-20% thresholding based on absolute values? If not, wouldn't most negative connections be discarded?)

        ⑥Batch size: 32

        ⑦Stochastic gradient descent is adopted

        ⑧Training usually converges after about 10 epochs

2.4.7. Evaluation metrics and cross-validation

        ①Root mean square error (RMSE) is their quantitative comparison method

        ②Mean absolute error (MAE) and Pearson's correlation between ground truth and predicted values are adopted as well
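The three metrics can be computed directly with NumPy:

```python
import numpy as np

def evaluate(y_true, y_pred):
    """RMSE, MAE, and Pearson's r between ground truth and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))        # root mean square error
    mae = np.mean(np.abs(err))               # mean absolute error
    r = np.corrcoef(y_true, y_pred)[0, 1]    # Pearson's correlation
    return rmse, mae, r
```

Note that a constant offset in the predictions inflates RMSE and MAE but leaves Pearson's r unchanged, which is why the paper reports all three.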

2.5. Datasets and MRI preprocessing

        ①Adolescent Brain Cognitive Development (ABCD) dataset is for predicting fluid intelligence

        ②Open Access Series of Imaging Study-3 (OASIS-3) dataset is for predicting age

2.5.1. Adolescent brain cognitive development (ABCD)

        ①Site: ABCD Study (abcdstudy.org)

        ②The authors select T1 and rs-fMRI images with 2.4 mm isotropic voxels and an 800 ms TR

        ③They exclude erroneously scanned images and one site with only 24 subjects, leaving 18 sites with 7693 subjects

        ④The fluid intelligence scores of the 7693 subjects range from 64 to 123, with a mean ± standard deviation of 95.3±7.3

Vocabulary: isotropic (adj.)

2.5.2. Open access series of imaging study-3 (OASIS-3)

        ①It is a dataset of Alzheimer’s disease (AD)

        ②Site: OASIS Brains - Open Access Series of Imaging Studies

        ③Samples: 468

2.5.3. MRI preprocessing

        ①They use FreeSurfer 5.3.0 to segment brain images into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF)

        ②rs-fMRI scans whose mean framewise displacement (FD) from head motion exceeds 0.5 mm are excluded

Vocabulary: pediatric (adj.)

2.6. Results

        ①5-fold and leave-one-site-out cross-validation are used on ABCD, and 5-fold cross-validation is used on OASIS-3

        ②They compared their model with BrainNetCNN and support vector regression (SVR) on both datasets

2.6.1. Spatio-temporal directed acyclic graph learning

        ①They adopt 5-fold cross-validation, 4 folds for training and 1 fold for validation

        ②Learning rate: 1e-3

        ③5-fold cross-validation is repeated 10 times

        ④Figure of (A) the ST-graph-conv network and (B) the FC-conv network:

        ⑤Then, they get scatter plots in fluid intelligence prediction

        ⑥The accuracy of the three models:

2.6.2. Fluid intelligence prediction via leave-one-site-out cross-validation

        ①The ABCD is used for predicting fluid intelligence via leave-one-site-out cross-validation

        ②Correlation, MAE and RMSE between actual and predicted fluid intelligence:

        ③Attention maps for fluid intelligence and age, built from the first block after the computation of the spatial attention graph pooling module; they show the brain regions most relevant to the prediction

2.6.3. Age prediction

        ①OASIS-3 dataset is used for predicting age

        ②In 5-fold cross-validation, 4 for training and 1 for validation as well

        ③Learning rate: 1e-2

        ④l_{2}-norm regularization rate: 1e-4

        ⑤Correlation, MAE and RMSE between actual and predicted age:

2.6.4. Comparisons with BrainNetCNN and SVR in the prediction of fluid intelligence and age

        They analyse the performance of the different models

2.6.5. Comparisons with elastic net’s mixture with random forest, spatio-temporal graph convolution, and BrainNetCNN

        ①Their model aims to analyse functional time signals

        ②BrainNetCNN is for FC networks

2.7. Discussion

        ①They proposed ST-DAG-Att to predict cognition and age from functional time series and connectivity

        ②They analyzed the brain regions that play a major role in both predictions

3. Reference List

Huang, S. et al. (2022) 'Spatio-temporal directed acyclic graph learning with attention mechanisms on brain functional time series and connectivity', Medical Image Analysis, vol. 77.
