Notes | Correcting Multi-Site MRI Data

Latest recommended article published on 2024-11-09 21:37:15

懒麻蛇

Article link: https://blog.csdn.net/lazysnake666/article/details/122404574

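One advantage of multi-site collaboration is that it overcomes the underpowered small-sample problem while avoiding the prohibitive cost of collecting a large sample at a single site. Multi-site collaboration is becoming a trend in neuroimaging research: ENIGMA, ABCD, and IMAGEN are all multi-site datasets. As a result, correcting for site/scanner effects in multi-site data has become an active research topic. This post gives a rough overview and recommends ComBat.

Some multi-site datasets, such as IMAGEN and ABCD, accounted for differences between scanners from the design stage, so the acquisition parameters at each site were optimized to minimize scanner-related effects. Site effects would seem to matter more for ENIGMA data, but a recent ABCD paper showed that site effects persist even with optimized acquisition parameters, and that correcting them well can effectively increase statistical significance.

Regression

The most common way to handle a site effect is to include site as a fixed effect in the group analysis. This is probably the simplest approach, and SPM is enough to do it.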

Brain_data ~ sex + age + other_cov + dummy_site
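As a minimal sketch of the fixed-effect model above, the following fits it by ordinary least squares with NumPy on synthetic data. The variable names, sample sizes, and effect sizes are all made up for illustration; in practice SPM or any regression package does the same job.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: every number here is invented for illustration.
n_per_site, n_sites = 50, 3
site = np.repeat(np.arange(n_sites), n_per_site)   # site label per subject
age = rng.uniform(14, 22, size=site.size)
sex = rng.integers(0, 2, size=site.size)
site_shift = np.array([0.0, 0.3, -0.2])            # additive site effect
true_age_slope = -0.02
brain = (2.5 + true_age_slope * age + 0.05 * sex
         + site_shift[site] + rng.normal(0, 0.1, site.size))

# Dummy-code site, dropping site 0 as the reference level.
site_dummies = (site[:, None] == np.arange(1, n_sites)).astype(float)

# Design matrix: intercept, sex, age, site dummies.
X = np.column_stack([np.ones(site.size), sex, age, site_dummies])
beta, *_ = np.linalg.lstsq(X, brain, rcond=None)

age_slope = beta[2]
site_effects = beta[3:]   # shifts of sites 1 and 2 relative to site 0
print("age slope:", round(float(age_slope), 3))
print("site effects vs. site 0:", np.round(site_effects, 3))
```

With k sites, only k-1 dummy columns are entered so that the design matrix stays full rank alongside the intercept.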

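Another common approach is to include site as a random effect in a linear mixed-effects (LME) model. Colleagues in my group tell me that only AFNI can add random effects, though. I analyzed the IMAGEN data in R and found no obvious difference between the two approaches.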

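Brain_data ~ sex + age + other_cov + (1|site)

ComBat

Johnson et al. first proposed the method in 2007 for gene expression microarray data; the original paper has been cited 3725 times. ComBat stands for Combining Batches.

In 2017, Fortin et al. applied ComBat to DTI data from two sites and compared several methods (scaling, Funnorm, RAVEL, SVA, and ComBat). They concluded: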

ComBat both preserves biological variability and removes the unwanted variation introduced by site.

In 2018, Fortin et al. applied the method to cortical thickness data from 11 sites, comparing ComBat, residuals, and phenotype-adjusted residuals, and also tried combining data from different age groups. They concluded:

ComBat removes unwanted sources of scan variability while simultaneously increasing the power and reproducibility of subsequent statistical analyses. We also show that ComBat can be used to combine datasets across multiple sites for the study of life-span trajectories.

In 2020, Radua et al. applied the method to ENIGMA-schizophrenia data from 33 sites (ENIGMA papers had in fact already started using ComBat before this). They concluded:

The findings were more similar when comparing ComBat with mixed-effects mega-analysis, although ComBat still slightly increased the statistical significance. ComBat also showed increased statistical power when we repeated the analyses with fewer sites.

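(Image source: Radua et al., 2020, NeuroImage)

The top two panels show that ComBat did not change the effect size; the bottom two show that ComBat increased statistical power, although the gain over an LME mega-analysis is not large. Interestingly, the authors also provide what is said to be a better ComBat implementation: it can handle missing values and, for machine learning, it can be fit on the training set and then used to transform the test set. This feature is of limited use, though, because it requires the training and test sets to have the same number of sites.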
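The fit-then-transform idea can be sketched as follows. This is not Radua et al.'s code and not ComBat itself: it is a deliberately simplified per-site location/scale adjustment with no empirical Bayes shrinkage and no covariates, and the class name `SiteScaler` is hypothetical. It only illustrates why site parameters are learned on the training set alone, and why a site absent from training cannot be transformed.

```python
import numpy as np

class SiteScaler:
    """Per-site location/scale adjustment (simplified; not ComBat)."""

    def fit(self, X, site):
        X, site = np.asarray(X, float), np.asarray(site)
        sites = np.unique(site)
        self.grand_mean_ = X.mean(axis=0)
        self.site_mean_, self.site_sd_ = {}, {}
        resid = X.copy()
        for s in sites:
            rows = site == s
            self.site_mean_[s] = X[rows].mean(axis=0)
            self.site_sd_[s] = X[rows].std(axis=0, ddof=1)
            resid[rows] -= self.site_mean_[s]
        # Pooled within-site standard deviation per feature.
        self.pooled_sd_ = np.sqrt((resid ** 2).sum(axis=0) / (len(X) - len(sites)))
        return self

    def transform(self, X, site):
        X, site = np.asarray(X, float), np.asarray(site)
        unseen = set(np.unique(site)) - set(self.site_mean_)
        if unseen:
            names = sorted(str(s) for s in unseen)
            raise ValueError(f"sites not seen during fit: {names}")
        out = np.empty_like(X)
        for s in np.unique(site):
            rows = site == s
            z = (X[rows] - self.site_mean_[s]) / self.site_sd_[s]
            out[rows] = z * self.pooled_sd_ + self.grand_mean_
        return out

rng = np.random.default_rng(1)

def simulate(n):
    """Two synthetic sites; site B has a different offset and scale."""
    site = np.repeat(["A", "B"], n)
    X = rng.normal(0, 1, (2 * n, 4))
    X[site == "B"] = X[site == "B"] * 2.0 + 3.0
    return X, site

X_train, site_train = simulate(200)
X_test, site_test = simulate(100)

scaler = SiteScaler().fit(X_train, site_train)   # learn from training data only
X_test_h = scaler.transform(X_test, site_test)   # apply to held-out data

gap_before = abs(X_test[site_test == "A"].mean() - X_test[site_test == "B"].mean())
gap_after = abs(X_test_h[site_test == "A"].mean() - X_test_h[site_test == "B"].mean())
print("site gap before:", round(float(gap_before), 2))
print("site gap after:", round(float(gap_after), 2))
```

Estimating the site parameters on the training set only keeps information from the test set out of the model, which is the reason for splitting fit and transform in the first place.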

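Why fit and transform need to be separate

懒麻蛇, official account (公众号) 懒麻蛇: Myths | How neuroimagers using machine learning can avoid getting criticized

The paper also compared harmonizing cortical thickness, surface area, and subcortical volume separately versus jointly, and found little difference.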

Results were nearly identical when we applied the ComBat harmonization separately for cortical thickness, cortical surface area and subcortical volumes.

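At this point you may be thinking the same thing: it seems no one has done this for functional data yet.

Nielson et al. (2018) had in fact already compared the effect of scanner on activation and functional connectivity in the ABCD data, and found that ComBat removes site effects well. They also presented this at OHBM in 2019.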

We further demonstrate that these differences can be harmonized using an empirical Bayes approach known as ComBat. We argue that accounting for scanner variance, including even minor differences in scanner hardware or software, is crucial for any analysis.

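All of the above used cross-sectional data, so longitudinal data might seem to be the remaining gap. Nielson and colleagues have already preregistered their plan for ABCD release 2 on OSF, and it presumably involves a longitudinal analysis.

How to use ComBat

Leek et al. (2012) developed the R package sva, which comes with a detailed tutorial
Fortin's GitHub has R/MATLAB/Python code
Radua et al. (2020) also provide a new R function

The simplest route is to follow the sva tutorial: about three lines of code will do it, and you get to learn SVA along the way.

You can of course also start from the code and work out how the empirical Bayes step operates; I recommend reading Fortin's MATLAB code. FYI, if you read the code to understand the whole procedure, you will find one place that differs from the formula in Johnson et al. (2007). That turns out to be a typo in the paper; the code is fine.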
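For readers who prefer Python, here is a rough sketch of the parametric empirical Bayes step for a single site, written from the posterior formulas in Johnson et al. (2007). It is not Fortin's code. It assumes the data have already been standardized, ignores covariates, and uses a plain absolute-change stopping rule; the function name `eb_one_site` and all simulated numbers are hypothetical.

```python
import numpy as np

def eb_one_site(z, tol=1e-6, max_iter=1000):
    """Empirical Bayes site-effect estimates for one site.

    z: standardized data of a single site, shape (n_features, n_subjects).
    Returns the raw and the shrunken additive (g) and multiplicative (d) effects.
    """
    n = z.shape[1]
    g_hat = z.mean(axis=1)          # raw additive effect per feature
    d_hat = z.var(axis=1, ddof=1)   # raw multiplicative effect (a variance)

    # Prior hyperparameters, estimated across features by method of moments.
    g_bar, t2 = g_hat.mean(), g_hat.var(ddof=1)
    m, s2 = d_hat.mean(), d_hat.var(ddof=1)
    a = (2 * s2 + m ** 2) / s2      # inverse-gamma shape
    b = (m * s2 + m ** 3) / s2      # inverse-gamma scale

    g_star, d_star = g_hat.copy(), d_hat.copy()
    for _ in range(max_iter):
        # Posterior mean of the additive effect given the current variance.
        g_new = (n * t2 * g_hat + d_star * g_bar) / (n * t2 + d_star)
        # Posterior mean of the variance given the updated additive effect.
        sum2 = ((z - g_new[:, None]) ** 2).sum(axis=1)
        d_new = (0.5 * sum2 + b) / (n / 2.0 + a - 1.0)
        change = max(np.abs(g_new - g_star).max(), np.abs(d_new - d_star).max())
        g_star, d_star = g_new, d_new
        if change < tol:
            break
    return g_hat, d_hat, g_star, d_star

# Simulated standardized data for one small site (values are invented).
rng = np.random.default_rng(0)
n_features, n_subjects = 200, 10
true_g = rng.normal(0.5, 0.2, n_features)
z = true_g[:, None] + 1.3 * rng.normal(size=(n_features, n_subjects))

g_hat, d_hat, g_star, d_star = eb_one_site(z)
g_bar = g_hat.mean()
print("mean |g_hat  - g_bar|:", np.abs(g_hat - g_bar).mean())
print("mean |g_star - g_bar|:", np.abs(g_star - g_bar).mean())
```

Each shrunken estimate is a weighted average of the feature's own estimate and the across-feature mean, so noisy per-feature estimates from small sites are pulled toward the shared prior. In ComBat the site's data would then be adjusted as (z - g_star) / sqrt(d_star) before being rescaled to the original units.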

In the first formula, V should be squared. I solved the equations myself and confirmed that this is a typo in the paper.
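The check can be reproduced with exact arithmetic. The sketch below assumes that the formula in question is the method-of-moments estimate of the inverse-gamma hyperparameters (the original figure is not available here, so this is an assumption), with V standing for the mean and S² for the variance of the per-feature variance estimates. An inverse-gamma distribution with shape λ and scale θ has mean θ/(λ-1) and variance θ²/((λ-1)²(λ-2)); solving these two equations for λ and θ gives the squared version.

```python
from fractions import Fraction

def inv_gamma_moments(shape, scale):
    """Mean and variance of an inverse-gamma distribution (needs shape > 2)."""
    mean = scale / (shape - 1)
    var = scale ** 2 / ((shape - 1) ** 2 * (shape - 2))
    return mean, var

def hyper_from_moments(m, s2):
    """Method-of-moments shape and scale, with the mean squared in the shape."""
    shape = (m ** 2 + 2 * s2) / s2
    scale = (m ** 3 + m * s2) / s2
    return shape, scale

def shape_without_square(m, s2):
    """The variant in which the mean is not squared, for comparison."""
    return (m + 2 * s2) / s2

# Start from known hyperparameters and try to recover them from the moments.
true_shape, true_scale = Fraction(5), Fraction(3)
m, s2 = inv_gamma_moments(true_shape, true_scale)   # m = 3/4, s2 = 3/16

recovered = hyper_from_moments(m, s2)
print(recovered == (true_shape, true_scale))        # True
print(shape_without_square(m, s2) == true_shape)    # False (gives 6, not 5)
```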

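Other methods

ComBat's results seem fairly convincing and it is very easy to implement, so I recommend ComBat. Beyond ComBat, there are other methods worth considering, for example: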

Hierarchical Bayes methods

The authors argue

CovBat

Deep-learning-based methods

—END—

References

Johnson, W. E., Li, C., & Rabinovic, A. (2007). Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8(1), 118-127.

Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E., & Storey, J. D. (2012). The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics, 28(6), 882-883.

Fortin, J. P., Parker, D., Tunç, B., Watanabe, T., Elliott, M. A., Ruparel, K., ... & Schultz, R. T. (2017). Harmonization of multi-site diffusion tensor imaging data. NeuroImage, 161, 149-170.

Fortin, J. P., Cullen, N., Sheline, Y. I., Taylor, W. D., Aselcioglu, I., Cook, P. A., ... & McInnis, M. (2018). Harmonization of cortical thickness measurements across scanners and sites. NeuroImage, 167, 104-120.

Radua, J., Vieta, E., Shinohara, R., Kochunov, P., Quidé, Y., Green, M., ... & Nenadic, I. (2020). Increased power by harmonizing structural MRI site differences with the ComBat batch adjustment method in ENIGMA. NeuroImage, 116956.

Nielson, D. M., Pereira, F., Zheng, C. Y., Migineishvili, N., Lee, J. A., Thomas, A. G., & Bandettini, P. A. (2018). Detecting and harmonizing scanner differences in the ABCD study-annual release 1.0. bioRxiv, 309260.