[ICLR 2025] CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding

Paper: arXiv 2412.07236

Code: github.com

The English here is typed entirely by hand! It is my summarizing and paraphrasing of the original paper, so unavoidable spelling and grammar mistakes may appear; if you spot any, comments and corrections are welcome! This post leans toward personal notes, so read with caution.

Contents

1. Thoughts

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Method

2.4. Experiments

2.4.1. Pre-training

2.4.2. Experiment Setup of Downstream BCI Tasks

2.4.3. Results

2.5. Conclusion

1. Thoughts

(1) Silence is tonight's Cambridge.

(2) Writing in English, I can't be a keyboard warrior, so I decided to write in Chinese.

2. Section-by-Section Close Reading

2.1. Abstract

        ①The spatial and temporal features of EEG signals are heterogeneous, so they need to be modelled independently

        ②They propose CBraMod to address the problems of modeling spatial-temporal dependencies and of heterogeneous EEG data formats

        ③Datasets: 12 public with 10 downstream tasks

 criss  adj. smart, stylish  n. Criss (a given name)

criss-cross adj. crossing back and forth, crisscrossed

2.2. Introduction

        ①Existing EEG processing methods:

        ②The authors state that the correlations between channels and those between time points are different, so vanilla global attention is not suitable for EEG signals

        ③CBraMod is pretrained on Temple University Hospital EEG Corpus (TUEG)

2.3. Method

        ①Overall framework:

(1)Patching & Masking

        ①Input EEG sample: S\in\mathbb{R}^{C\times T} with C channel and T timestamps

        ②Patch segmentation: for window length t, they reshape S into X\in\mathbb{R}^{C\times n\times t} with n=\lfloor\frac{T}{t}\rfloor patches per channel and X=\{x_{i,j}|i\in[1,2,...,C],j\in[1,2,...,n]\}

        ③A representation of a patch: x\in\mathbb{R}^t

        ④Total number of patches: |X|=Cn

        ⑤Mask: \mathcal{M}=\{m_{i,j}|i\in[1,2,...,C],j\in[1,2,...,n]\} sampled from a Bernoulli distribution with proportion r, where m_{i,j}\in\{0,1\} is the mask indicator of x_{i,j}

        ⑥Masked EEG patches:

\tilde{x}_{i,j} = \begin{cases} x_{i,j},\quad m_{i,j}=0 \\ x_{M},\quad m_{i,j}=1 \end{cases} \\ \tilde{X}=\{\tilde{x}_{i,j}|i\in[1,2,...,C],j\in[1,2,...,n]\}

where x_M\in\mathbb{R}^t denotes the mask token and \tilde{X}\in\mathbb{R}^{C\times n\times t} denotes the masked EEG patch set
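
The patching-and-masking step above can be sketched in a few lines of NumPy. This is a minimal sketch: the shapes C=19, T=6000, t=200 and mask proportion r=0.5 are assumed values, and the learnable mask token x_M is replaced by zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
C, T, t = 19, 6000, 200             # channels, timestamps, patch length (assumed)
S = rng.standard_normal((C, T))     # one EEG sample S ∈ R^{C×T}

n = T // t                          # n = floor(T / t) patches per channel
X = S[:, :n * t].reshape(C, n, t)   # X ∈ R^{C×n×t}

r = 0.5                             # mask proportion (hypothetical)
M = rng.binomial(1, r, size=(C, n)) # Bernoulli mask indicators m_{i,j} ∈ {0, 1}

x_M = np.zeros(t)                   # learnable mask token in the paper; zeros here
X_tilde = np.where(M[..., None] == 1, x_M, X)  # replace masked patches by x_M
```

Broadcasting M over the last axis applies one indicator per whole patch, matching the per-patch (not per-sample) masking described above.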

(2)Time-Frequency Patch Encoding

        ①Time-domain processing: they use a one-dimensional convolution layer, a group normalization layer, and a GELU activation function to process input \tilde{x}_{i,j} and obtain the time-domain embedding e_{i,j}^t\in\mathbb{R}^d with d dimensions

        ②Frequency-domain branch: they use fast Fourier transform (FFT) and a fully-connected layer to get frequency-domain embedding e_{i,j}^f\in\mathbb{R}^d

        ③Embedding fusion:

\begin{array} {c}e_{i,j}=e_{i,j}^t+e_{i,j}^f \\ E=\{e_{i,j}|i\in[1,2,...,C],j\in[1,2,...,n]\} \end{array}

where e_{i,j}\in\mathbb{R}^d is patch embedding, E\in\mathbb{R}^{C\times n\times d} is the set of patch embeddings
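
A minimal sketch of the time-frequency patch encoding. For brevity the learned conv1d + GroupNorm + GELU time branch is replaced by a single linear map, and both branches use random stand-in weights, not the paper's trained layers:

```python
import numpy as np

rng = np.random.default_rng(0)
t, d = 200, 200                     # patch length and embedding dim (assumed)

# Random stand-ins for the learned layers (paper: conv1d+GroupNorm+GELU for
# time; FFT followed by a fully-connected layer for frequency).
W_time = rng.standard_normal((t, d)) / np.sqrt(t)
W_freq = rng.standard_normal((t // 2 + 1, d)) / np.sqrt(t // 2 + 1)

def patch_embedding(x):
    """x ∈ R^t, one EEG patch -> fused embedding e_{i,j} ∈ R^d."""
    e_t = x @ W_time                          # time-domain embedding e^t
    e_f = np.abs(np.fft.rfft(x)) @ W_freq     # frequency-domain embedding e^f
    return e_t + e_f                          # fusion by element-wise sum

x = rng.standard_normal(t)
e = patch_embedding(x)
```

The one-sided `rfft` gives t/2+1 frequency bins for a real signal, which is why W_freq has that input size.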

(3)Asymmetric Conditional Positional Encoding

        ①ACPE: a convolution layer with kernel (k_s,k_t) and (\frac{k_{s}-1}{2},\frac{k_{t}-1}{2}) zero paddings (k_s> k_t). (The authors feel that because the kernel is rectangular, the encoding is "asymmetric" and can attend to spatial and positional information at the same time = =|||. Their solution really is... uh, simple and easy to understand.)

        ②A residual-like structure: feeding E into the ACPE yields:

E^{p}=\{e_{i,j}^{p}|i\in[1,2,...,C],j\in[1,2,...,n]\}

where E^{p}\in\mathbb{R}^{C\times n\times d} and e_{i,j}^p\in\mathbb{R}^d; then E and E^p are added together:

E^o=E+E^p=\{e_{i,j}+e_{i,j}^p|i\in[1,2,...,C],j\in[1,2,...,n]\}

where E^{o}\in\mathbb{R}^{C\times n\times d}
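
The ACPE step can be sketched as a convolution with an asymmetric kernel over the (C, n) patch grid, plus the residual sum. The kernel sizes k_s=7, k_t=3 are assumed for illustration; the real layer is learned, and here one shared scalar kernel is applied across all d embedding dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
C, n, d = 19, 30, 200
k_s, k_t = 7, 3                     # asymmetric kernel, k_s > k_t (assumed sizes)
E = rng.standard_normal((C, n, d))  # patch embeddings
K = rng.standard_normal((k_s, k_t)) / (k_s * k_t)  # random stand-in kernel

# Zero padding of ((k_s-1)/2, (k_t-1)/2) keeps the (C, n) grid size unchanged.
pad_s, pad_t = (k_s - 1) // 2, (k_t - 1) // 2
Ez = np.pad(E, ((pad_s, pad_s), (pad_t, pad_t), (0, 0)))

Ep = np.zeros_like(E)               # ACPE output E^p
for i in range(C):
    for j in range(n):
        window = Ez[i:i + k_s, j:j + k_t, :]           # (k_s, k_t, d) neighborhood
        Ep[i, j] = np.tensordot(K, window, axes=([0, 1], [0, 1]))

Eo = E + Ep                          # residual connection: E^o = E + E^p
```

Because k_s > k_t, each position aggregates a taller spatial window than temporal window, which is the whole "asymmetric" idea.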

(4)Criss-Cross Transformer

        ①Pipeline of Criss-Cross Transformer Block:

The E^o above passes through LayerNorm to become \tilde{E}\in\mathbb{R}^{C\times n\times d}. I charitably guess that page limits are the reason the following formulas and explanations are not very detailed.

        ②First, the authors split \tilde{E} into a first-half group and a second-half group along the channels (by "channels" I mean the last hidden dimension, not the electrode channels C). In the upper path, attention is applied to each column of the first-half group:

F_k^j=\mathrm{Attention}(\tilde{E}^jW_k^Q,\tilde{E}^jW_k^K,\tilde{E}^jW_k^V)

The columns that were split apart at the beginning are, after attention is applied to each separately, concatenated back together:

\text{S-Attention}_k(\tilde{E})=[F_k^1,F_k^2,...,F_k^n]

The lower path works the same way, except attention there is applied to the second-half group along rows. Finally the column-attention and row-attention heads are concatenated:

\mathrm{Criss-Cross-Attention}(\tilde{E})=\mathrm{Concat}(\mathrm{head}_1,\mathrm{head}_2,...,\mathrm{head}_K)

\mathrm{head}_k= \begin{cases} \text{S-Attention}_k(\tilde{E}), & \quad k\in[1,2,...,K/2] \\ \text{T-Attention}_k(\tilde{E}), & \quad k\in[K/2+1,K/2+2,...,K] \end{cases}

After concatenation this thing is E^{r}\in\mathbb{R}^{C\times n\times d}. (At this moment I'd like to know where exactly the criss-cross is. The two parts are simply stacked on top of each other and feel completely unrelated; they are not even interleaved the way cards cross between two hands in a riffle shuffle. What is the point?)

        ③Why not mention the shape of F? Why does it never appear anywhere else in the context?
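
My reading of the criss-cross attention above, sketched in NumPy: the first K/2 heads attend over channels within each time column (S-Attention), the remaining heads attend over time patches within each channel row (T-Attention), and all head outputs are concatenated. Dimensions and weights are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
C, n, d, K = 19, 30, 64, 8          # grid, hidden dim, head count (assumed)
d_h = d // K                        # per-head dimension
E = rng.standard_normal((C, n, d))  # \tilde{E} after LayerNorm

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attend(x, Wq, Wk, Wv):
    """Scaled dot-product attention over a (L, d) sequence -> (L, d_h)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    return softmax(q @ k.T / np.sqrt(d_h)) @ v

heads = []
for h in range(K):
    Wq, Wk, Wv = (rng.standard_normal((d, d_h)) / np.sqrt(d) for _ in range(3))
    out = np.zeros((C, n, d_h))
    if h < K // 2:                  # S-Attention: over the C channels, per column j
        for j in range(n):
            out[:, j] = attend(E[:, j], Wq, Wk, Wv)
    else:                           # T-Attention: over the n patches, per row i
        for i in range(C):
            out[i] = attend(E[i], Wq, Wk, Wv)
    heads.append(out)

Er = np.concatenate(heads, axis=-1)  # E^r ∈ R^{C×n×d}
```

So each position only ever attends along its own row or its own column, never over the full C×n grid, which is where the quadratic-cost saving over global attention comes from.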

(5)Masked EEG Reconstruction

        ①"A reconstruction head composed of fully-connected layers"? Did the authors major in literature? I really can't hold my fire. From now on everything in my own papers should be called "a detector composed of fully-connected layers".

        ②E^r is passed through the fully-connected layers to produce the final prediction \hat{X}\in\mathbb{R}^{C\times n\times t}

        ③The authors really love writing those extremely long set notations:

        ④MSE loss:

\mathcal{L}=\|\hat{X}^M-X^M\|^2
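
The reconstruction head and the masked MSE loss, sketched with a single random linear map standing in for the fully-connected head; only patches selected by the mask indicators M contribute to the loss:

```python
import numpy as np

rng = np.random.default_rng(0)
C, n, t, d = 19, 30, 200, 200
Er = rng.standard_normal((C, n, d))   # Criss-Cross Transformer output E^r
X = rng.standard_normal((C, n, t))    # original (unmasked) patches
M = rng.binomial(1, 0.5, size=(C, n)).astype(bool)  # mask indicators m_{i,j}

W = rng.standard_normal((d, t)) / np.sqrt(d)  # stand-in FC reconstruction head
X_hat = Er @ W                                # \hat{X} ∈ R^{C×n×t}

# MSE computed only over masked positions: L = ||X_hat^M - X^M||^2 (mean form)
loss = np.mean((X_hat[M] - X[M]) ** 2)
```

Boolean indexing with M gathers exactly the masked patches from both tensors, so unmasked patches never receive reconstruction gradient.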

2.4. Experiments

2.4.1. Pre-training

(1)Pre-training Dataset

        ①Dataset: Temple University Hospital EEG corpus (TUEG)

        ②Data: 69,652 clinical EEG recordings from 14,987 subjects across 26,846 sessions, with a total duration of 27,062 hours

(2)Preprocessing

        ①Screening: remove recordings whose total duration is no more than 5 minutes or whose absolute amplitude exceeds 100 µV

        ②Cropping: the first and last minute of each recording are discarded

        ③Electrode choosing: 19, including Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2

        ④Band-pass filter: 0.3 Hz–75 Hz

        ⑤Notch filter: 60 Hz

        ⑥Resampling: 200 Hz

        ⑦Segmentation: 30 s

        ⑧Normalization: 100 µV

        ⑨Remaining samples: 1,109,545

(3)Pre-training Settings 

        ①Patch duration: 1 s (200 data points)

        ②Layers of Criss-Cross Transformer Blocks: 12, with 200 hidden dimensions, 800 inner dimensions, and 8 heads

        ③Batch size: 128

        ④Optimizer: AdamW

        ⑤Learning rate: 5e-4

        ⑥Weight decay: 5e-2

2.4.2. Experiment Setup of Downstream BCI Tasks

        ①Statistics of datasets:

2.4.3. Results

        ①Emotion recognition performance:

        ②Motor Imagery Classification performance:

        ③Attention block ablation:

        ④Positional encoding ablation:

        ⑤Pre-training ablation:

where 1) w/o pre-training: directly training CBraMod on the downstream datasets; 2) dirty pre-training: pre-training CBraMod on the TUEG corpus without dropping bad samples; 3) clean pre-training: pre-training CBraMod on the TUEG corpus with bad samples dropped.

2.5. Conclusion

        ~
