[Paper close reading] A novel 5D brain parcellation approach based on spatio-temporal encoding of resting fMRI data

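Thus, they put forward a spatio-temporal-network (5D) brain parcellation scheme that uses a deep residual network to predict the probability that each voxel in the brain belongs to a certain network. (⭐Yes, note that this is not what I usually write about. I used to take notes on disease classification papers, and this is not one. I think this is about building a custom atlas. Hahaha. I hope my advisor doesn't kill me for slacking off into another area. I was just really curious about where the 5D comes from.)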

2.1.3. Conclusion

Their network is independent of individuals and variable of time

2.1.4. Significance

They have... new finding...

2.2. Introduction

①The brain consists of copious subparts.

②Computational neuroimaging is an interdisciplinary field that includes visual neuroscience, computer science and psychology.

③Atlas-based methods still have drawbacks: they rely on a given template, do not consider the natural variance of the brain, and are expensive in spatial registration.

④Existing parcellation methods fail to model the complete 4D nature of the brain. Ergo, the authors propose a new approach based on a deep residual network.

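⑤⭐So the five dimensions the authors have in mind are the 4D brain networks (one per subject?) stacked together? That is all the fifth dimension is; I thought it was something else.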

copious adj. abundant; plentiful; ample

cytoarchitectonic (≈ cytoarchitectural) adj. relating to the cellular architecture of tissue

2.3. Related work

①Predefined templates and mathematical similarity metrics are both effective brain parcellation methods.

②They list some other works.

agglomerative adj. tending to cluster or gather into a mass

precuneus n. a region of the medial parietal lobe

2.4. Theoretical background

①They deploy ICA to obtain prior components, and then train the residual network in a supervised way

②They define the fMRI image as $X=AS$, where $A$ has shape time points × components and $S$ is the spatial mapping matrix with shape components × voxels. Hence, the shape of $X$ is time points × voxels
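To make the shapes in $X=AS$ concrete, here is a minimal NumPy sketch. The sizes are illustrative: 490 time points and 53 components match numbers mentioned later in these notes, while the voxel count of 1000 and the random matrices are toy assumptions, not the authors' data.

```python
import numpy as np

rng = np.random.default_rng(0)

# time points, components, voxels (n_voxels is a toy value, not from the paper)
n_time, n_comp, n_voxels = 490, 53, 1000

A = rng.standard_normal((n_time, n_comp))    # time courses: time points x components
S = rng.standard_normal((n_comp, n_voxels))  # spatial maps: components x voxels

X = A @ S  # mixed signal: time points x voxels

# X is also the sum over components of (time course) outer (spatial map)
X_sum = sum(np.outer(A[:, i], S[i]) for i in range(n_comp))

print(X.shape)                # (490, 1000)
print(np.allclose(X, X_sum))  # True
```

The second check shows why the per-network outer products used in the label generation step are natural building blocks: each one is the contribution of a single component to $X$.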

2.5. Method

①By encoding spatio-temporal dynamics, their model predicts the brain parcellation voxel by voxel

②The input is a time series. They train a deep residual network in a supervised manner:

2.5.1. Label generation

①They obtain $k$ time courses and spatial maps, then calculate their outer product:

$N_i=TC_i\otimes SM_i$

where $TC_i$ denotes the time course, $SM_i$ denotes the spatial map and $N_i\in\mathbb{R}^{t\times voxel\, size}$

②Probabilistic maps are produced by:

$PM_i=\frac{|N_i|}{\sum_{j=1}^k|N_j|}$

which maps all the elements into $\left [ 0,1 \right ]$

③They then introduce a predefined mask of the same shape to convert the probabilistic networks into 4D tensors (I am not quite sure how this conversion works)
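The three label generation steps can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function names `probabilistic_maps` and `to_4d` are hypothetical, the small `eps` guard is my addition, and the way the mask is applied (scattering the in-mask voxels back into a 3D grid for each time point) is one plausible reading, since the notes above do not say how the authors do it.

```python
import numpy as np

def probabilistic_maps(TC, SM, eps=1e-12):
    """TC: (k, t) time courses, SM: (k, v) spatial maps -> PM: (k, t, v)."""
    # N_i = TC_i (outer product) SM_i, one (t, v) matrix per network
    N = np.einsum("kt,kv->ktv", TC, SM)
    abs_N = np.abs(N)
    # Normalise across the k networks so every element lies in [0, 1]
    # (eps only guards against division by zero; it is an assumption)
    return abs_N / (abs_N.sum(axis=0, keepdims=True) + eps)

def to_4d(PM_i, mask):
    """Scatter a (t, v) map into a (t, X, Y, Z) tensor.

    Assumes mask is boolean with exactly v True entries (the in-mask voxels).
    """
    out = np.zeros((PM_i.shape[0],) + mask.shape, dtype=PM_i.dtype)
    out[:, mask] = PM_i
    return out

# Toy example: 3 networks, 5 time points, 8 in-mask voxels in a 4x4x4 grid
rng = np.random.default_rng(0)
k, t = 3, 5
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True
v = int(mask.sum())

TC = rng.standard_normal((k, t))
SM = rng.standard_normal((k, v))

PM = probabilistic_maps(TC, SM)
vol = to_4d(PM[0], mask)
print(PM.shape, vol.shape)  # (3, 5, 8) (5, 4, 4, 4)
```

At every time point and voxel the $k$ probabilistic maps sum to 1, so each value can be read as the share of that voxel's activity attributed to network $i$.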

2.5.2. Model configuration

①Model framework:

②The Conv 3D layer in (a) has 64 output channels and kernel size 3, followed by a Sigmoid function;

the 3 res-encoding blocks all have the same number of output channels, 32;

the padding of the last Conv is 1;

every Conv + batch norm layer uses a Sigmoid activation.

③Each encoding block is followed by a 3D max pooling layer with stride=1 and kernel size=3; the output channels are 16 and 8. The output channels of the decoding layers are 16, 32 and 64.

④The final transposed 3D Conv layer has 1 output channel and kernel size=3.

⑤Mean squared error is the loss function:

$L(h,(x,y))=\frac1n\times \sum_{i=1}^n\left ( h\left ( x_i \right )-y_i \right )^{2}$

where $h$ represents hypothesis
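As a quick check of the loss formula, here is a minimal NumPy sketch of mean squared error on a toy example. The function name `mse_loss` and the input values are illustrative assumptions; in training, `pred` would be the network output $h(x_i)$ and `target` the probabilistic label $y_i$.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error: (1/n) * sum_i (h(x_i) - y_i)^2."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

# Squared errors are 0.04, 0.00 and 0.01, so the mean is 0.05 / 3
loss = mse_loss([0.2, 0.5, 0.9], [0.0, 0.5, 1.0])
print(round(loss, 4))  # 0.0167
```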

2.5.3. Model input templates

①Datasets: genomics superstructure project (GSP) and human connectome project (HCP)

②Template: independent components (ICs) are obtained with ICA, then 53 functionally relevant resting-state networks (RSNs) are identified for each subject. Every RSN has size 53×63×52, and the RSNs are grouped into subcortical (SC), auditory (AU), sensory motor (SM), visual (VI), cognitive control (CC), default mode (DM), and cerebellar (CB) domains.

2.5.4. fMRI data: acquisition and preprocessing

①They test their model on data from the UK Biobank study, using 6 subjects to evaluate variance and 40 subjects to evaluate age-related variance

②Method of scanning: "Participants were scanned once by a 3-Tesla (3 T) Siemens Skyra scanner with a 32-channel receive head coil, acquired all in one site. A gradient-echo echo planar imaging (GE-EPI) paradigm was used to collect/obtain resting-state fMRI scans. The EPI-based acquisition parameters include multiband acceleration factor of 8 (i.e., eight slices were acquired simultaneously), no iPAT, fat saturation, flip angle (FA)= 52°, spatial resolution = 2.4 × 2.4 × 2.4 mm, repeat time (TR)= 0.735 s, echo time (TE)= 39 ms, and 490 volumes. Subjects were instructed to stare at a crosshair passively and remain relaxed, not thinking about anything, during the six minutes and ten second resting-state scanning period."

③The intra-modal motion correction tool MCFLIRT is used to minimize the distortions caused by head motion.

④Adopting grand-mean intensity normalization to increase comparability of subjects

⑤Deploying high-pass temporal filter (Gaussian-weighted least squares straight line fitting, with σ = 50.0 s) to remove residual temporal drift

⑥Utilizing FSL’s Topup tool to correct geometric distortions of EPI scans
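To illustrate what grand-mean intensity normalization in step ④ does, here is a minimal NumPy sketch. It is not FSL's implementation: the function name `grand_mean_normalize`, the target value of 10000 and the use of the plain mean over all voxels and time points are assumptions made for illustration. The key idea is that each subject's whole 4D dataset is scaled by a single factor, so subjects become comparable while the relative signal changes within a subject are untouched.

```python
import numpy as np

def grand_mean_normalize(data, target=10000.0):
    """Scale a 4D array (X, Y, Z, T) by one factor so its overall mean equals target."""
    data = np.asarray(data, dtype=float)
    grand_mean = data.mean()
    if grand_mean == 0:
        raise ValueError("grand mean is zero; cannot normalize")
    return data * (target / grand_mean)

# Two toy "subjects" with the same signal but different scanner scaling
rng = np.random.default_rng(0)
base = rng.uniform(500.0, 1500.0, size=(4, 4, 4, 10))
subj_a = base
subj_b = 3.0 * base

norm_a = grand_mean_normalize(subj_a)
norm_b = grand_mean_normalize(subj_b)
print(np.allclose(norm_a, norm_b))  # True
```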

2.6. Results

①Optimizer: Adam

②Learning rate: 0.00001

③Step size of optimizing the loss function: 5

④Epochs of training phase: 200

⑤Adopting early stopping to avoid overfitting and to improve the generalization

⑥The training, validation and test sets take 70%, 10% and 20% of the data respectively, without overlap

⑦Mean squared error in training set and validation set and averaged MSE loss in 53 models:

⑧The model's ability to capture dynamics, shown for the sensory-motor network:

The picture illustrates the original differences between 6 subjects at a random time point

⑨They average over time points and apply z-scoring to depict the outputs of all 53 networks for a random subject, which better represents brain regions:

2.6.1. Spatial variability

①They use z-scoring, summarize differences between timepoints and capture the variances of voxels.

②The spatial variability for the sensory motor, default mode, auditory, subcortical, cerebellar and visual networks:

They concluded from this picture that the absolute changes in the sensory motor network highlight the sensory motor bands and the default mode network shows very high variability in the posterior cingulate gyrus, but not in the front part of the network (medial frontal)
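A spatial variability map of the kind described in ① could be computed roughly as follows. This is a sketch under assumptions: the notes do not give the exact procedure, so here "summarize differences between timepoints" is read as summing the absolute change between consecutive time points of the z-scored output, and the function name `spatial_variability` is hypothetical.

```python
import numpy as np

def spatial_variability(net):
    """net: (t, X, Y, Z) network output -> (X, Y, Z) variability map.

    Z-scores the whole tensor, then sums the absolute differences between
    consecutive time points at each voxel (an assumed reading of the method).
    """
    net = np.asarray(net, dtype=float)
    std = net.std()
    if std == 0:
        return np.zeros(net.shape[1:])
    z = (net - net.mean()) / std
    return np.abs(np.diff(z, axis=0)).sum(axis=0)

# Toy example: one voxel oscillates over 4 time points, the rest are constant
net = np.zeros((4, 2, 2, 2))
net[:, 0, 0, 0] = [0.0, 1.0, 0.0, 1.0]

var_map = spatial_variability(net)
print(var_map.shape)             # (2, 2, 2)
print(var_map[0, 0, 0] > 0)      # True
print(float(var_map[1, 1, 1]))   # 0.0
```

Voxels whose probability barely changes over time get values near zero, while voxels such as those in the posterior cingulate region of the default mode network would stand out with large values.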

2.6.2. Age effects

①They choose 20 subjects younger than 40 and 20 subjects older than 60

②The K-means algorithm is used for the analysis, and they find that the cluster magnitudes of the young subjects are higher than those of the elderly subjects

③How age affects the young and old subjects:

2.6.3. Temporal variability

①They encode different time points to study functional dynamics in brain networks

②"oscillations of magnitude (blue), mean (green) and standard deviation (red) of active region in a visual network for a given subject over 490 time points":

③Correlation heatmaps of 53 networks at all time points:

④Calculating the sliding window cross-correlation between all networks within 50 time points, and obtaining 12 heatmaps that vary over time:
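The sliding-window analysis in ④ can be sketched with NumPy. Several things here are assumptions: the function name `sliding_window_corr` is hypothetical, each network is reduced to a single summary time series (for example the mean over its active voxels), and the stride of 40 is my guess, chosen because a window of 50 over 490 time points with stride 40 yields exactly 12 windows. The notes do not state the stride the authors used.

```python
import numpy as np

def sliding_window_corr(ts, window=50, stride=40):
    """ts: (t, n_networks) -> (n_windows, n_networks, n_networks).

    Computes one Pearson correlation matrix per window of `window` time points.
    """
    t = ts.shape[0]
    starts = range(0, t - window + 1, stride)
    # np.corrcoef expects variables in rows, hence the transpose
    return np.stack([np.corrcoef(ts[s:s + window].T) for s in starts])

# Toy data: 490 time points, 53 networks (random values, not real fMRI)
rng = np.random.default_rng(0)
ts = rng.standard_normal((490, 53))

heatmaps = sliding_window_corr(ts)
print(heatmaps.shape)  # (12, 53, 53)
```

Each of the 12 matrices is symmetric with ones on the diagonal, and comparing them over time shows how the coupling between networks changes.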

2.7. Discussion

Their proposed model can capture the differences between subjects, spatial variability within the network, and temporal coupling between different networks

2.8. Conclusion

The 5D brain parcellation method proposed by the authors can capture the spatiotemporal dynamics of the brain and provide output that is sensitive to individual changes

3. Reference List

Kazemivash B. & Calhoun V. D. (2022) 'A novel 5D brain parcellation approach based on spatio-temporal encoding of resting fMRI data from deep residual learning', Journal of Neuroscience Methods, 369.