[Paper Close Reading] Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results

Paper page: Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results - ScienceDirect

Full title: Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results

Code: https://github.com/xxlya/Fed_ABIDE

The English is typed entirely by hand! It is my summarizing and paraphrasing of the original paper, so unavoidable spelling and grammar mistakes may appear; if you spot any, feel free to point them out in the comments! This post is closer to personal notes, so read with caution.

Contents

1. Takeaways

2. Section-by-section close reading

2.1. Abstract

2.2. Introduction

2.3. Related work

2.3.1. Federated learning

2.3.2. Domain adaptation

2.4. Methods

2.4.1. Basic privacy-preserving federated learning setup

2.4.2. Boosting multi-site learning with domain adaptation

2.5. Experiments and results

2.5.1. Data

2.5.2. Federated training setup and hyper-parameters discussion

2.5.3. Comparisons with different strategies

2.5.4. Evaluate model from interpretation perspective

2.5.5. Limitation and discussion

2.6. Conclusion

3. Supplementary knowledge

3.1. Differential privacy

3.2. L1 sensitivity

4. Reference


1. Takeaways

(1)To be fair, the FL comparison table is really satisfying...

(2)Maybe it is just that the paper is fairly early, but the method feels like simply an MLP (6105-16-2) plus domain-adversarial learning with 2 losses plus a cross-entropy loss

(3)Using FC directly to represent biomarkers, seriously? Where does it show that these are actually learned? I feel the authors did not explain this in much detail

2. Section-by-section close reading

2.1. Abstract

        ①They propose two domain adaptation methods

2.2. Introduction

        ①Healthcare systems are unwilling to share data, partly out of concern about competitors poaching customers and losing patients (really?)

        ②Data distributions differ across sites:

2.3. Related work

2.3.1. Federated learning

        ①Two families of FL methods: a) only model parameters are shared, b) information is transferred between parties via encryption techniques. The authors adopt the first one

2.3.2. Domain adaptation

        ①Lists some domain adaptation methods by citation and points out that they do not use FL

2.4. Methods

2.4.1. Basic privacy-preserving federated learning setup

(1)Problem definition

        ①The data at the i-th site are denoted by the matrix D_i (it is fMRI data, hence a matrix)

        ②N sites \left \{ F_1,...,F_N \right \}, each institution owning its own private fMRI data

        ③The feature space is denoted by X (extracted fMRI features), the label space by Y (the diagnosis or phenotype to be predicted), and the sample ID space by I

        ④Data distribution:

X_i=X_j,Y_i=Y_j,I_i\neq I_j,\forall D_i,D_j,i\neq j

        ⑤FL:

(2)Privacy-preserving decentralized training

        ①Cross entropy loss in this FL:

\mathcal{L}_{ce}^n=-\sum_{n_i}\left[y_{n_i}\log\left(p_{n_i}\right)+\left(1-y_{n_i}\right)\log\left(1-p_{n_i}\right)\right]

where y_{n_i} denotes the label of the i-th subject in the n-th site, Y_n=\left \{ y_{n_1},...,y_{n_{\left | Y_n \right |}} \right \}, and p_{n_i} is the predicted probability

        ②Training process:
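As a concrete illustration of this training process, here is a minimal FedAvg-style sketch (my own code, not the released Fed_ABIDE implementation; the per-site DataLoaders and the round count are assumptions). Each site minimizes the cross-entropy loss above on its private data, and only model weights ever leave a site:

```python
# FedAvg-style sketch of the decentralized training described above.
# Assumptions (not from the paper's code): each site holds its own DataLoader,
# all sites share the same MLP architecture, and only weights are exchanged.
import copy
import torch
import torch.nn as nn

class SiteMLP(nn.Module):
    """The 6105-16-2 MLP mentioned later in Section 2.5.2."""
    def __init__(self, in_dim=6105, hidden=16, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_classes))

    def forward(self, x):
        return self.net(x)

def local_update(model, loader, lr=1e-5, steps=60):
    """One round of local training at a single site (cross-entropy loss)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    it = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(it)
        except StopIteration:
            it = iter(loader)
            x, y = next(it)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model.state_dict()

def federated_average(state_dicts):
    """Server side: average the weights uploaded by all sites."""
    avg = copy.deepcopy(state_dicts[0])
    for k in avg:
        avg[k] = torch.stack([sd[k].float() for sd in state_dicts]).mean(0)
    return avg

# usage (loaders_per_site is a hypothetical list of per-site DataLoaders):
# global_model = SiteMLP()
# for rnd in range(50):
#     states = [local_update(copy.deepcopy(global_model), ld) for ld in loaders_per_site]
#     global_model.load_state_dict(federated_average(states))
```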

(3)Randomized mechanism for privacy protection

        ①For a deterministic real-valued function h:D\to\mathbb{R}^m, the L1 sensitivity s_h of h is:

s_h=\max_{\left\|D-D'\right\|_1=1}\|h\left(D\right)-h\left(D^{\prime}\right)\|_{1}

where \left \| D-D' \right \|_1=1 means D and D' differ in exactly one data point. (Why is this written with D and D' rather than something like D_i and D_j? Are these still the datasets of two sites? In standard differential privacy the notation refers to two datasets that differ in a single record, but I am not sure that is what the authors mean here. Looking up differential privacy online, everyone uses the D and D' notation, so I suspect the authors carried it over without adapting it to their setting.)

        ②They define h in their model to be the m weight parameters

        ③Differential privacy:

Pr\left[h\left(D\right)\in S\right]\leq e^{\epsilon}Pr\left[h\left(D^{\prime}\right)\in S\right]

or

Pr\left[h\left(D\right)\in S\right]\leq e^{\epsilon}Pr\left[h\left(D^{\prime}\right)\in S\right]+\delta

The authors never explain what is going on here... What is S? What is Pr? (In standard differential privacy, S is any subset of the possible outputs of h, and Pr is taken over the randomness of the mechanism. A paper need not read like popular science, but symbol definitions should at least be given...)

        ④Gaussian mechanism: add N\left(0,s_{h}^{2}\sigma^{2}\right) noise to h; this gives (\epsilon,\delta)-differential privacy provided \delta\geq\frac{4}{5}\mathrm{exp}\left(-\left(\sigma\epsilon\right)^{2}/2\right) and \epsilon< 1

        ⑤Laplace mechanism: (I originally inserted a few figures of the Laplace distribution here; you can see it is somewhat similar to the Gaussian, just pointier, it also has two parameters, and its formula looks a bit simpler than the Gaussian's:)

They employ the Laplace distribution with scale b:

Lap\left(b\right):=Lap\left(x|b\right)=\frac{1}{2b}\mathrm{exp}\left(-\frac{|x|}{b}\right)

with variance \sigma^{2}=2b^{2}. Lap(s_{h}/\epsilon) noise is added to h, and the resulting differential privacy is (\epsilon,0) (so the authors effectively use only one parameter~). Then, to simplify the discussion, they assume the sensitivity s_h is 1??? Can that simply be assumed? Isn't this the difference between two datasets? My guess is the authors mean that the difference in parameters between two sites is 1?
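A minimal sketch of how the two mechanisms might be applied to the weights a site uploads (not the authors' code; the weights are treated as a flat numpy array, and the sensitivity=1.0 default simply mirrors the simplification questioned above):

```python
# Hedged sketch of the randomized mechanisms: noise is added to the weight
# vector before it leaves the site. sensitivity=1.0 mirrors the paper's
# simplifying assumption discussed above; nothing here is the authors' code.
import numpy as np

def gaussian_mechanism(weights, sigma, sensitivity=1.0):
    """Add N(0, (sensitivity * sigma)^2) noise to every weight."""
    rng = np.random.default_rng()
    return weights + rng.normal(0.0, sensitivity * sigma, size=weights.shape)

def laplace_mechanism(weights, epsilon, sensitivity=1.0):
    """Add Lap(sensitivity / epsilon) noise, i.e. (epsilon, 0)-DP for that sensitivity."""
    rng = np.random.default_rng()
    return weights + rng.laplace(0.0, sensitivity / epsilon, size=weights.shape)

# usage with a hypothetical flattened weight vector w:
# w_noisy = laplace_mechanism(w, epsilon=0.5)
```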

2.4.2. Boosting multi-site learning with domain adaptation

(1)Mixture of experts (MoE) domain adaptation

        ①Experts: the individual deep learning models

        ②MoE: a trainable gating network used with feed-forward neural networks

        ③Domain adaptation strategies with FL:

        ④The final output of their network:

\hat{y}_i=a_i\left(x\right)y_G+\left(1-a_i\left(x\right)\right)y_P

where a_i\left ( x \right ) is the gating function in the MoE; they implement it with a non-linear layer a_{i}\left(x\right)=\sigma\left(\psi_{i}^{T}\cdot x+b_{i}\right), where \sigma is the sigmoid function and \psi _i and b_i are learnable weights (y_G and y_P are the predictions of the global shared model and the site's private local model, respectively)
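A minimal PyTorch sketch of this gated combination (the module name and the two "experts" are placeholders standing in for the global shared model and the site's private model; this is not the released code):

```python
# Sketch of the MoE combination y_hat = a(x) * y_G + (1 - a(x)) * y_P.
# MoEGate and the expert names are illustrative, not the paper's identifiers.
import torch
import torch.nn as nn

class MoEGate(nn.Module):
    def __init__(self, in_dim=6105):
        super().__init__()
        self.gate = nn.Linear(in_dim, 1)          # a(x) = sigmoid(psi^T x + b)

    def forward(self, x, y_global, y_private):
        a = torch.sigmoid(self.gate(x))           # per-sample gating value in (0, 1)
        return a * y_global + (1 - a) * y_private

# usage with two hypothetical expert models producing class scores:
# gate = MoEGate()
# y_hat = gate(x, global_expert(x), private_expert(x))
```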

(2)Adversarial domain alignment

        ①They trained a local feature extractor G_s for the source site D_s

        ②They also trained a local feature generator G_t for the target site D_t

        ③They align the distributions of D_s and D_t by training an adversarial domain discriminator D

        ④G_s and G_t aim to confuse D by adding noise (generating M\circ G_s(x^s) and M\circ G_t(x^t), where M denotes the noise generator), while D aims to identify the domain

        ⑤To discriminate domain:

\begin{aligned}\mathcal{L}_{advD}\left(\mathbf{X}^{S},\mathbf{X}^{T},G_{s},G_{t}\right)&=-\mathbb{E}_{x^{s}\sim\mathbf{X}^{S}}\left[\log D_{s}\left(G_{s}\left(x^{s}\right)\right)\right]\\&\quad-\mathbb{E}_{x^{t}\sim\mathbf{X}^{T}}\left[\log\left(1-D_{s}\left(M\circ G_{t}\left(x^{t}\right)\right)\right)\right]\end{aligned}

        ⑥The loss in the "second step"?? What is the second step? \mathcal{L}_{advD} stays fixed while \mathcal{L}_{advG} is updated? This term did not appear before, so how can it be called an "update"?

\begin{aligned}\mathcal{L}_{advG}\left(\mathbf{X}^{S},\mathbf{X}^{T},G_{s},G_{t}\right)&=-\mathbb{E}_{x^{s}\sim\mathbf{X}^{S}}\left[\log D_{s}\left(G_{s}\left(x^{s}\right)\right)\right]\\&\quad-\mathbb{E}_{x^{t}\sim\mathbf{X}^{T}}\left[\log D_{s}\left(M\circ G_{t}\left(x^{t}\right)\right)\right]\end{aligned}

        ⑦Algorithm (from line 8 of the algorithm it looks like the two losses in ⑤ and ⑥ are simply both used, but the authors do not give a particular explanation, and I am not sure where this was adapted from):
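Below is a rough sketch of how the two losses in ⑤ and ⑥ could alternate within one training step (this is my reading of the algorithm, with assumed MLP architectures and an assumed Gaussian stand-in for the noise generator M; it is not the released code, and G_s is kept frozen here, so only the target-side term of \mathcal{L}_{advG} actually produces gradients):

```python
# Rough sketch of the adversarial alignment step: D_s first learns to tell
# source from target features (L_advD), then G_t is updated to fool it (L_advG).
# Architectures, sizes and the Gaussian "noise generator" M are assumptions.
import torch
import torch.nn as nn

feat_dim = 16
bce = nn.BCELoss()
G_s = nn.Sequential(nn.Linear(6105, feat_dim), nn.ReLU())   # source feature extractor
G_t = nn.Sequential(nn.Linear(6105, feat_dim), nn.ReLU())   # target feature extractor
D_s = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())   # domain discriminator

opt_D = torch.optim.Adam(D_s.parameters(), lr=1e-5)
opt_G = torch.optim.Adam(G_t.parameters(), lr=1e-5)

def apply_noise(h):
    """Stand-in for the noise generator M applied to the target features."""
    return h + 0.01 * torch.randn_like(h)

def align_step(x_src, x_tgt):
    # step 1: update the discriminator with L_advD (source -> 1, target -> 0)
    opt_D.zero_grad()
    d_src = D_s(G_s(x_src).detach())
    d_tgt = D_s(apply_noise(G_t(x_tgt)).detach())
    loss_D = bce(d_src, torch.ones_like(d_src)) + bce(d_tgt, torch.zeros_like(d_tgt))
    loss_D.backward()
    opt_D.step()
    # step 2: update the target extractor with L_advG (make D_s output 1 on target)
    opt_G.zero_grad()
    d_tgt = D_s(apply_noise(G_t(x_tgt)))
    loss_G = bce(d_tgt, torch.ones_like(d_tgt))
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```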

(3)Evaluate model by interpreting biomarkers

        ①Gradient based method:

g_k^c=ReLU\left(\frac{\partial\hat{y}^c}{\partial x_k}\right)

where c \in \left \{ 0,...,C-1 \right \} denotes the class of interest, \hat{y}^c is the score of class c before the softmax, x_k is the k-th feature of the input, and g^c_k denotes the importance of feature k for classifying class c
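A minimal sketch of this importance score using plain input gradients (the paper uses guided back-propagation, which additionally zeroes out negative gradients at every ReLU in the backward pass; that masking is not reproduced here, and the model/input names are placeholders):

```python
# Sketch of the gradient-based importance score g_k^c = ReLU(d y_hat^c / d x_k).
# Plain input gradients only; guided back-propagation would also mask negative
# gradients at each ReLU during the backward pass.
import torch

def feature_importance(model, x, target_class):
    """ReLU of the gradient of the pre-softmax class score w.r.t. each input feature."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[:, target_class].sum()    # class score before softmax
    score.backward()
    return torch.relu(x.grad)                  # same shape as x: one value per feature

# usage with the SiteMLP sketch above and a hypothetical FC vector fc_vec:
# g_asd = feature_importance(global_model, fc_vec.unsqueeze(0), target_class=1)
```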

2.5. Experiments and results

2.5.1. Data

(1)Participants

        ①Sites chosen: the four largest, UM1, NYU, USM, and UCLA1, with 106, 175, 72, and 71 subjects respectively; after eliminating incomplete samples, 88, 167, 52, and 63 remain

        ②Atlas: Harvard-Oxford (HO) with 111 ROIs

        ③Sliding window: size 32 with stride 1, used to crop the original time series

        ④Sample statistics:

        ⑤Demographic data:

(2)Data preprocessing

        ①FC: Pearson correlation between the averaged ROI time series

        ②The Fisher transformation is applied to the FC matrix

        ③Only the upper triangle of the matrix is kept and flattened into a vector, which is later fed to the MLP (111*(111-1)/2 = 6105 dimensions)
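A small sketch of this preprocessing with a made-up array name (`roi_ts` is assumed to be the (time points × 111) matrix of mean ROI time series for one subject or sliding-window crop):

```python
# Sketch of the FC feature construction in ①-③: Pearson correlation of the
# mean ROI time series, Fisher z-transform, upper triangle flattened to 6105-d.
import numpy as np

def fc_features(roi_ts):
    fc = np.corrcoef(roi_ts.T)                  # (111, 111) Pearson correlation matrix
    fc = np.clip(fc, -0.999999, 0.999999)       # keep arctanh finite (diagonal is 1)
    fc = np.arctanh(fc)                         # Fisher z-transformation
    iu = np.triu_indices_from(fc, k=1)          # upper triangle, excluding the diagonal
    return fc[iu]                               # 111 * 110 / 2 = 6105-dimensional vector

# feats = fc_features(roi_ts)   # this vector is what gets fed to the MLP
```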

2.5.2. Federated training setup and hyper-parameters discussion

        ①MLP: 6105-16-2 (buddy, that is quite a dive in dimensionality)

        ②Cross validation: 5 fold

        ③They define m instances for each subject; if more than m/2 instances are tagged 'ASD', the subject is classified as ASD (where do these instances come from?? Isn't the MLP output 2-dimensional, i.e. a single probability, so how can there be multiple heads? Presumably each sliding-window crop from 2.5.1 is one instance and the per-instance predictions are majority-voted; see the sketch after this list)

        ④Learning rate: 1e-5, halved every 20 epochs, training stops at epoch 50

        ⑤Optimizer: Adam

        ⑥Steps per epoch: 60 (what is this supposed to be?)

        ⑦Batch size: 60

        ⑧Local updating within each epoch depends on the communication pace \tau (and what is this? Does it mean the local model syncs with the server after every \tau training steps? And does \tau need to be a divisor of 60? See the sketch after this list)

        ⑨\tau ablation:

no significant difference

          ⑩Accuracy when adding different amounts of noise under the Gaussian mechanism (L2 norm, \varepsilon_{n}{\sim}N(0,\alpha\sigma)):

the varied quantity is \alpha

        ⑪Adding Laplace noise \varepsilon_{n}\sim Lap\left(\alpha\sigma/\sqrt{2}\right) to the local weights and varying \alpha:
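To make the communication pace in ⑧ and the instance-level voting in ③ concrete, here is a sketch of my interpretation (not the released code); it reuses `local_update` and `federated_average` from the sketch in Section 2.4.1:

```python
# Sketch (my interpretation, not the released code) of (a) the communication
# pace tau: each site runs tau local steps, then the server averages weights,
# repeated until the 60 steps of an epoch are used up; and (b) the subject-level
# majority vote over sliding-window instances.
# Reuses local_update / federated_average from the sketch in Section 2.4.1.
import copy
import numpy as np

def train_epoch_with_pace(global_model, site_loaders, tau=5, steps_per_epoch=60):
    """Alternate tau local steps per site with one round of weight averaging."""
    for _ in range(steps_per_epoch // tau):
        states = [local_update(copy.deepcopy(global_model), loader, steps=tau)
                  for loader in site_loaders]
        global_model.load_state_dict(federated_average(states))
    return global_model

def subject_label(instance_probs_asd, threshold=0.5):
    """Call a subject ASD if more than half of its instances are predicted ASD."""
    votes = np.asarray(instance_probs_asd) > threshold
    return int(votes.sum() > len(votes) / 2)

# subject_label([0.7, 0.4, 0.9])  -> 1 (two of the three window instances voted ASD)
```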

2.5.3. Comparisons with different strategies

        ①Evaluation methods:

        ②Comparison result:

2.5.4. Evaluate model from interpretation perspective

(1)Aligned feature embedding

        ①Visualizing fully connected layer embedding:

(2)MoE Gating value

        ①Gate value in different sites:

(the gate value is a learnable parameter) (buddy, what kind of plot is this? Couldn't it be drawn in 3D? The bars in the back are all blocked)

(3)Neural patterns: Connectivity in the autistic brain

        ①They define "informativeness" as the difference in functional representation between the ASD and HC groups, and "robustness" as the consistency of a biomarker across the 4 sites

        ②They applied the guided back-propagation method to detect the robust biomarkers of HC under the federated model:

        ③ASD biomarker:

        ④Function correlation:

2.5.5. Limitation and discussion

        ①"Although, according to our empirical investigation, the communication pace that controls how frequently weight information is exchanged between the local and global models does not affect classification performance, we cannot conclude that the pace parameter is irrelevant"

        ②The sensitivity of a deep learning model is hard to define

2.6. Conclusion

        ~

3. Supplementary knowledge

3.1. Differential privacy

(1)Definition: Differential privacy is a mathematical framework for quantifying how well individual privacy is protected when data are published or processed by an algorithm. It introduces randomness so that, even if the output is released or analyzed, no particular individual's information can be identified. Concretely, differential privacy requires that for two datasets differing in only one data point (neighboring datasets), the query results have similar probability distributions, so that the presence or absence of any single data point cannot be inferred from observing the output.

(2)It basically feels like adding noise so that individual records become hard to identify. But I still wonder why transmitting neural-network parameters would allow anyone to recover a single person's information in the first place? (The standard Laplace-mechanism guarantee is derived briefly after this list.)

(3)Further reading: a CSDN blog post on the differences between global, local, and smooth sensitivity in differential privacy (全局敏感度,局部敏感度和平滑敏感度到底有什么区别?【差分隐私】)
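As a supplement, here is the standard one-line argument from the differential-privacy literature (not from this paper) for why adding coordinate-wise Lap(s_h/\epsilon) noise \eta to h gives (\epsilon,0)-differential privacy, which is the guarantee quoted in 2.4.1(3): for any fixed output z,

\frac{\Pr\left[h\left(D\right)+\eta=z\right]}{\Pr\left[h\left(D'\right)+\eta=z\right]}=\prod_{j=1}^{m}\exp\left(\frac{\epsilon\left(\left|h\left(D'\right)_j-z_j\right|-\left|h\left(D\right)_j-z_j\right|\right)}{s_h}\right)\leq\exp\left(\frac{\epsilon\left\|h\left(D\right)-h\left(D'\right)\right\|_1}{s_h}\right)\leq e^{\epsilon}

by the triangle inequality and the definition of s_h; summing over any output set S recovers the (\epsilon,0) guarantee.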

3.2. L1 sensitivity

(1)Definition: L1 sensitivity usually refers to how sensitive a parameter or system is to changes in its input, measured in the L1 norm (also known as the Manhattan distance, or the sum of absolute values). It quantifies how much the output changes, in the L1 norm, when the input changes slightly.

(2)Computation: for a single vector, the L1 norm is the sum of the absolute values of its elements; for two vectors, it is the sum of the absolute differences of their corresponding elements:
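\left\|x\right\|_1=\sum_k\left|x_k\right|,\qquad\left\|x-y\right\|_1=\sum_k\left|x_k-y_k\right|

A small worked example (standard, not from the paper): for the counting query h(D) = number of ASD subjects in D, adding or removing one subject changes the count by at most 1, so

s_h=\max_{\left\|D-D'\right\|_1=1}\left|h\left(D\right)-h\left(D'\right)\right|=1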

4. Reference

Li, X. et al. (2020) 'Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results', Medical Image Analysis, 65. doi: https://doi.org/10.1016/j.media.2020.101765
