Paper title: Characterizing functional brain networks via Spatio-Temporal Attention 4D Convolutional Neural Networks (STA-4DCNNs)
Paper code: GitHub - 西江实验室UESTC/STA-4DCNN
The English here is all hand-typed; it is my summarizing and paraphrasing of the original paper. Some unavoidable typos and grammar slips may appear, and corrections in the comments are welcome! This post leans toward personal notes, so read with caution!
Table of contents
2.3.2. Data description and preprocessing
2.3.3. 4D convolution/deconvolution and attention copy
2.3.4. Model architecture of STA-4DCNN
2.3.5. Model setting and training scheme
2.3.6. Model evaluation and validation
2.4.1. Spatio-temporal pattern characterization of DMN in emotion T-fMRI
2.4.3. Effectiveness of spatio-temporal pattern characterization of other FBNs
2.4.5. Characterization of abnormal spatio-temporal patterns of FBNs in ASD patients
3.2. Deconvolution/Transposed convolution/Fractionally-strided convolution
1. TL;DR
1.1. Takeaways
(1) The stated contributions read as if nothing was contributed; everything went into "novelty", and I can hardly point to a single concrete contribution
(2) Only individual-level patterns are considered, not group-level ones
(3) Huh, so what exactly counts as shallow-layer data and what counts as deep-layer data?
1.2. Paper framework diagram
2. Close reading of the paper, section by section
2.1. Abstract
①There is still broad unexplored space in the analysis of 4D fMRI
②They proposed a Spatio-Temporal Attention 4D Convolutional Neural Network (STA-4DCNN) model for functional brain networks (FBNs) which includes Spatial Attention 4D CNN (SA-4DCNN) and Temporal Guided Attention Network (T-GANet) subnetworks.
③Introduce the dataset they adopted
2.2. Introduction
①4D fMRI comes from combining 3D brain images with a 1D time series
②Characterization methods for FBNs, such as the general linear model (GLM) in 1D, principal component analysis (PCA), sparse representation (SR), and independent component analysis (ICA) in 2D, and 3D convolution in 3D, are all limited for spatio-temporal analysis
③Briefly introduce how they experimented
④Fine, they regard their contributions as combining U-Net with the attention mechanism, plus the fact that the model works...
2.3. Method and materials
2.3.1. Method overview
The overview of SA-4DCNN:
where SA-4DCNN receives the 4D data and converts it into a 3D spatial output, and T-GANet receives both that 3D spatial output and the 4D data and converts them into a 1D temporal output.
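As a shape-level sketch of these two interfaces (the function bodies below are my own placeholders, not the paper's 4D U-Net or T-GANet; only the input/output shapes follow the description above):

```python
import numpy as np

def sa_4dcnn(fmri_4d):
    """(D, H, W, T) 4D fMRI -> (D, H, W) spatial pattern. Stand-in body."""
    return fmri_4d.mean(axis=-1)  # placeholder for the real 4D U-Net

def t_ganet(spatial_3d, fmri_4d):
    """(D, H, W) spatial pattern + 4D fMRI -> (T,) temporal pattern."""
    w = spatial_3d / spatial_3d.sum()  # spatial map as voxel weights
    # weighted-average time series over voxels (placeholder for T-GANet)
    return np.tensordot(fmri_4d, w, axes=([0, 1, 2], [0, 1, 2]))
```

With the HCP preprocessing sizes, the shapes would be 48\*56\*48\*88 in and 48\*56\*48 / 88 out.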
2.3.2. Data description and preprocessing
(1)Data selection
①Types of data: seven t-fMRI tasks (emotion, gambling, language, motor, relational, social, and working memory) and resting-state fMRI (rs-fMRI)
②Dataset: HCP S900 release (both types) and ABIDE I (only rs-fMRI, only for evaluating generalizability)
③Sample: 200 randomly selected subjects in HCP; 64 ASD and 83 typically developing (TD) subjects in ABIDE I
④Preprocessing: FSL FEAT toolbox, intensities normalized to 0–1, down-sampled to 48*56*48 (spatial size) and 88 (temporal size) for HCP data; the standard Configurable Pipeline for the Analysis of Connectomes (CPAC) for ABIDE I
(2)Data processing
①Training labels: dictionary learning and sparse representation (SR)
②Presenting the 4D fMRI as a 2D matrix X of size t×n, where t denotes the number of time points and n denotes the total number of brain voxels
③Decompose X as X = D×α + ε with dictionary learning and sparse representation (SR), where D is the t×k dictionary matrix, α is the k×n sparse coefficient matrix, ε is the error term, and k is the predefined dictionary size
④Each column of the dictionary matrix D is the temporal pattern of an FBN
⑤Each row of the sparse coefficient matrix α is the spatial pattern of an FBN
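The label generation above can be sketched with scikit-learn's `DictionaryLearning`; every size and hyperparameter below is a toy assumption, not the paper's setting:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

t, n, k = 30, 200, 5                      # toy sizes: time points, voxels, atoms
rng = np.random.default_rng(0)
X = rng.standard_normal((t, n))           # 2D matrix form of the 4D fMRI

# sklearn treats rows as samples, so voxels (columns of X) become samples
dl = DictionaryLearning(n_components=k, alpha=1.0, max_iter=20, random_state=0)
code = dl.fit_transform(X.T)              # (n, k): sparse coefficients, transposed
D = dl.components_.T                      # (t, k): columns are temporal patterns
A = code.T                                # (k, n): rows are spatial patterns

X_hat = D @ A                             # X ≈ D·α + error term
```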
2.3.3. 4D convolution/deconvolution and attention copy
(1)Convolution
①4D convolution (a) and 4D deconvolution (b) figure:
②The authors explain that they decompose the 4D filter into separate 3D convolution kernels along the temporal dimension, use these 3D kernels to perform 3D convolutions on the input 4D data, and sum the 3D results to obtain the final 4D output.
③⭐To turn a D*H*W*C (spatial size * temporal size) 4D volume into 2D*2H*2W*2C, the first large "bandage" block (2D*2H*2W) is obtained by padding a small block (the authors don't explain how the padding is done, they only cite another paper). Then each large block is in turn padded with a new blank block, which yields the 2D*2H*2W*2C array. (I think this matters, so I'm explaining it in my own words so that it jumps out next time I reread these notes.)
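The temporal decomposition described above can be sketched naively as below, assuming "same" zero-padded 3D convolutions spatially and no temporal padding (the paper does not spell out its padding scheme):

```python
import numpy as np
from scipy.ndimage import convolve  # 3D convolution of each temporal slice

def conv4d(x, kernel):
    """Naive 4D convolution: split the 4D filter into 3D kernels along the
    temporal axis, 3D-convolve the matching temporal slices, sum the results.
    x: (D, H, W, T) input; kernel: (kd, kh, kw, kt) 4D filter."""
    kt, T = kernel.shape[-1], x.shape[-1]
    out = np.zeros(x.shape[:3] + (T - kt + 1,))
    for t in range(T - kt + 1):
        for j in range(kt):  # sum of kt 3D convolutions
            out[..., t] += convolve(x[..., t + j], kernel[..., j], mode="constant")
    return out
```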
(2)Attention
①They concatenate the shallow- and deep-layer features into a 5D tensor (D*H*W*C*L, where L is the channel dimension):
②The specific operation is:
where the reshape operation re-orders the 4D matrix of size D*H*W*C into a 2D one of size (D*H*W)*C
③The 4D attention output is:
where the scaling factor denotes the number of features
④Then restore the dimension through reverse operations:
where the 4D matrices are recovered by splitting along the last dimension (honestly I'm not sure about the wording here; I think it just means the output of attention()?)
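The attention copy can be sketched as scaled dot-product attention over the reshaped features; which of the shallow/deep features supply the query, key, and value is my own assumption, while scaling by the number of features follows the note above:

```python
import numpy as np

def attention_copy(shallow, deep):
    """Reorder D*H*W*C 4D features into (D*H*W) x C matrices, apply
    attention, then reverse the reshape. Q/K/V roles are assumptions."""
    D, H, W, C = shallow.shape
    q = shallow.reshape(-1, C)                       # (D*H*W, C)
    k = deep.reshape(-1, C)
    v = deep.reshape(-1, C)
    scores = q @ k.T / C                             # scaled by feature count
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                # row-wise softmax
    return (w @ v).reshape(D, H, W, C)               # reverse reshape -> 4D
```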
2.3.4. Model architecture of STA-4DCNN
The whole framework of STA-4DCNN:
(1)Spatial attention 4D CNN (SA-4DCNN)
①The original channel number is 1, and it becomes 2 after two 4D convolutional layers
②The red arrows are maxpooling layers
③Because the raw output would be 4D like the input, they adopt a 3D CNN to reduce the dimension
(2)Temporal guided attention network (T-GANet)
①Firstly they combine the spatial pattern and the 4D fMRI input IN:
shown by the top orange line, where the input of size S*S*S*C is reshaped to P*C (P = S*S*S)
②Then the two features are combined:
The formulas in (2) feel really confusing to me; the figure is already clear, but I can't quite match the formulas to it
2.3.5. Model setting and training scheme
①They designed two separate loss functions for SA-4DCNN and T-GANet
②Loss function in SA-4DCNN:
where the first term is the characterized spatial pattern (then dude, why don't you just use … ?) and the second denotes the training label
③Loss function in T-GANet:
④Training curve:
⑤Data split: 160 subjects for training and 40 for testing
⑥Parameter: the same as their previous paper (Yan et al., 2021)
⑦Learning rate: 0.0001 in the first and 0.0005 in the second training stage
⑧Epoch: 150 in spatial and 20 in temporal
⑨Optimizer: Adam in both stages
⑩Convolutional kernels: 3*3*3*3 for 4D and 3*3*3 for 3D convolutions
⑪Activation: each convolutional layer is followed by a batch normalization (BN) and a rectified linear unit (ReLU)
⑫When the parameter of the Fast Down-sampling Block is set to 12, they achieve the highest accuracy
2.3.6. Model evaluation and validation
①Brain network: default mode network (DMN)
②Evaluate spatio-temporal patterns through overlapping parts:
where blue lines represent STA-4DCNN, red is ST-CNN and green is SR.
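The comparison can be sketched with two simple similarity measures; the exact metrics (IoU of thresholded maps for space, Pearson correlation for time) and the 0.5 threshold are my assumptions of what "overlapping parts" means here:

```python
import numpy as np

def spatial_similarity(map_a, map_b, thr=0.5):
    """Overlap (intersection over union) of thresholded spatial maps."""
    a, b = map_a > thr, map_b > thr
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def temporal_similarity(ts_a, ts_b):
    """Pearson correlation between two temporal patterns."""
    return np.corrcoef(ts_a, ts_b)[0, 1]
```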
2.4. Results
2.4.1. Spatio-temporal pattern characterization of DMN in emotion T-fMRI
①Dataset: 40 subjects of the emotion t-fMRI
②Averaged spatial pattern similarity values of DMN (this low?):
③Averaged temporal pattern similarity values of DMN:
2.4.2. Generalizability of spatio-temporal pattern characterization of DMN in other six T-fMRI and One Rs-fMRI
①Dataset: 200 subjects in the other six t-fMRI and one rs-fMRI datasets
②Characterized spatio-temporal patterns of DMN in three example subjects of rs-fMRI sample:
③Characterized spatio-temporal patterns of DMN in one example subject of the other six t-fMRI samples:
2.4.3. Effectiveness of spatio-temporal pattern characterization of other FBNs
①Comparison on three other FBNs:
Obviously, the similarity is higher and there are fewer scattered noise points
②Averaged spatial pattern similarity values table for other 3 FBNs:
2.4.4. Ablation study
①Reduce one layer in 4D U-Net
②Drop 4D convolution layers after 4D convolution
③Remove attention copy operation
2.4.5. Characterization of abnormal spatio-temporal patterns of FBNs in ASD patients
①Averaged spatial and temporal pattern similarity values of ASD and TD in the DMN and auditory networks:
②Characterized spatio-temporal patterns of DMN with STA-4DCNN in five example ASD and TD subjects:
③Characterized spatio-temporal patterns of the auditory network with STA-4DCNN in five example ASD and TD subjects:
2.5. Conclusion
Their trained results are highly similar to the true labels. Moreover, the model can be generalized to other situations.
3. Background knowledge
3.1. Boltzmann machine
Too complicated, so no further explanation here: 机器学习笔记之深度玻尔兹曼机(一)玻尔兹曼机系列整体介绍_静静的喝酒的博客-CSDN博客
3.2. Deconvolution/Transposed convolution/Fractionally-strided convolution
【深度学习反卷积】反卷积详解, 反卷积公式推导和在Tensorflow上的应用 - 知乎 (zhihu.com)
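A minimal 1D sketch of transposed ("fractionally strided") convolution: each input element scatters a scaled copy of the kernel into the output, `stride` positions apart:

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """1D transposed convolution by scattering: the reverse of a strided
    convolution, used here to up-sample x by roughly `stride`."""
    kernel = np.asarray(kernel, dtype=float)
    out = np.zeros(stride * (len(x) - 1) + len(kernel))
    for i, v in enumerate(x):
        out[i * stride : i * stride + len(kernel)] += v * kernel
    return out
```

For example, `transposed_conv1d([1, 2], [1, 1, 1], stride=2)` scatters two overlapping kernel copies into a length-5 output.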
4. Reference List
Xi, J. et al. (2022) 'Characterizing functional brain networks via Spatio-Temporal Attention 4D Convolutional Neural Networks (STA-4DCNNs)', Neural Networks, vol. 158, pp. 00-110.
Yan, J. et al. (2021) 'A Guided Attention 4D Convolutional Neural Network for Modeling Spatio-Temporal Patterns of Functional Brain Networks', The 4th Chinese Conference on Pattern Recognition and Computer Vision (PRCV).