Single SparCC test on a large dataset

雪垆

Article link: https://blog.csdn.net/qq_43611382/article/details/125142713

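〔8th day of the 5th lunar month｜Grain in Ear (Mangzhong)｜Renyin (Tiger) year, Bingwu month, Gengyin day〕〔Overcast, 30-24℃〕〔CNGB, Shenzhen • Dapeng〕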

█ Single SparCC test on a large dataset

〉Paper Title 〉〉A randomized controlled trial for response of microbiome network to exercise and diet intervention in patients with nonalcoholic fatty liver disease
〉Link〉〉https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9091228/
〉Code〉〉https://github.com/crtsjtu/AELC

█ S=1500, D=600, B=1000

S: number of samples; D: number of species/OTUs; B: number of bootstraps (iterations).
Result: the run failed.
▣ Locally (Windows 10, 16 GB RAM):

numpy.core._exceptions.MemoryError: Unable to allocate 4.02 GiB for an array with shape (1500, 600, 600) and data type float64

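▣ On the cluster: requested 32 GB, waited in the queue for a day, then the process was killed while running.
▣ On a 32 GB Tencent Cloud server: the same error.
[Screenshot: the same MemoryError on the Tencent Cloud server]

Killed:
[Screenshot: the killed process on the cluster]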
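The 4.02 GiB in the error message is just S·D·D float64 values: 1500 × 600 × 600 × 8 bytes. The Python sketch below (my own illustration, not AELC code; the function names are hypothetical) checks that arithmetic for both settings tried in this post. It also shows one way to avoid one possible (S, D, D) intermediate: SparCC's log-ratio variation matrix T_ij = Var(log(x_i / x_j)) can be built from the D × D covariance of the log-fractions, because Var(a - b) = Var(a) + Var(b) - 2Cov(a, b). Whether the allocation in the traceback comes from this step or from storing one D × D network per sample is an assumption, not confirmed here.

```python
import numpy as np


def dense_float64_gib(*shape: int) -> float:
    """Memory in GiB of a dense float64 array with the given shape."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * 8 / 2**30


def variation_matrix_naive(log_frac: np.ndarray) -> np.ndarray:
    """T[i, j] = Var(log x_i - log x_j); materializes an (S, D, D) array."""
    diffs = log_frac[:, :, None] - log_frac[:, None, :]  # shape (S, D, D)
    return diffs.var(axis=0, ddof=1)


def variation_matrix_lowmem(log_frac: np.ndarray) -> np.ndarray:
    """Same T using only D x D memory: Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b)."""
    cov = np.cov(log_frac, rowvar=False, ddof=1)  # (D, D) covariance of log-fractions
    var = np.diag(cov)
    return var[:, None] + var[None, :] - 2.0 * cov


if __name__ == "__main__":
    # The two (S, D, D) sizes from this post
    print(f"{dense_float64_gib(1500, 600, 600):.2f} GiB")  # 4.02 GiB
    print(f"{dense_float64_gib(1500, 300, 300):.2f} GiB")  # 1.01 GiB

    # Small synthetic check that both variation matrices agree
    rng = np.random.default_rng(0)
    counts = rng.poisson(20, size=(50, 8)) + 1          # S=50 samples, D=8 OTUs
    frac = counts / counts.sum(axis=1, keepdims=True)   # relative abundances
    log_frac = np.log(frac)
    print(np.allclose(variation_matrix_naive(log_frac),
                      variation_matrix_lowmem(log_frac)))  # True
```

The low-memory version trades the (S, D, D) array for one S × D pass plus a D × D result, which is the usual way to keep pairwise statistics within RAM.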

█ S=1500, D=300, B=100

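Next, I tried a smaller dataset.
This one ran:
[Screenshot: the run in progress]
However, one iteration is estimated to take 14 minutes, so 100 iterations for a single sample take about 23 hours, and 1500 samples would take about 4 years.
In practice 1000 iterations are needed. The real samples may not contain this many OTUs, but even with multithreading, the memory and time required are unacceptable.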
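The estimate above is plain arithmetic. A minimal sketch, assuming runtime grows linearly with the number of samples and iterations and taking the measured 14 minutes per iteration at D=300 as the constant (the names are hypothetical, and the 32-thread case assumes perfect scaling):

```python
MINUTES_PER_ITERATION = 14   # estimate from the S=1500, D=300 run
HOURS_PER_YEAR = 24 * 365


def single_sparcc_hours(n_samples: int, n_iterations: int,
                        minutes_per_iteration: float = MINUTES_PER_ITERATION) -> float:
    """Serial wall-clock hours, assuming cost is linear in samples and iterations."""
    return n_samples * n_iterations * minutes_per_iteration / 60


if __name__ == "__main__":
    print(f"1 sample, B=100: {single_sparcc_hours(1, 100):.1f} h")  # 23.3 h
    print(f"1500 samples, B=100: "
          f"{single_sparcc_hours(1500, 100) / HOURS_PER_YEAR:.1f} years")  # 4.0 years
    print(f"1500 samples, B=1000: "
          f"{single_sparcc_hours(1500, 1000) / HOURS_PER_YEAR:.1f} years")  # 40.0 years
    # Hypothetical: perfect speedup over 32 threads
    print(f"B=1000 on 32 threads: "
          f"{single_sparcc_hours(1500, 1000) / 32 / HOURS_PER_YEAR:.2f} years")  # 1.25 years
```

Under this linear model, even an ideal 32-way parallel run at B=1000 would still take more than a year, which is why the faster implementations below matter.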

█ Two faster SparCC implementations

▣ Python version: I found only the repository, no paper, and it does not say how large the speedup is.
fast_sparCC
https://github.com/shafferm/fast_sparCC

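▣ C++ version
FastSpar: rapid and scalable correlation estimation for compositional data
https://github.com/scwatts/FastSpar
The paper describes it in detail; it greatly reduces both memory use and runtime.
[Figure: FastSpar runtime and memory benchmark from the paper]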

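▣ Single SparCC combines SSN (sample-specific networks) with SparCC.
Besides faster SparCC implementations, there are many other good methods, but probably none of them is optimized for memory or runtime.

CCLasso: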
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4693003/
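Returning to Single SparCC: a rough sketch of why it scales so badly. Assuming an SSN-style construction, each sample's network is the difference between SparCC on a reference set and SparCC on that set with the sample removed, so every sample costs a full SparCC run and keeps its own D × D matrix. The Python sketch below is my own simplification, not the AELC code: the leave-one-out reference, the single-pass SparCC (median over Dirichlet draws, no iterative exclusion of strongly correlated pairs, no bootstrap p-values) and all function names are assumptions. The returned array has shape (S, D, D), the same shape as the array in the MemoryError above.

```python
import numpy as np


def sparcc_basic(counts, n_draws=20, rng=None, v_min=1e-10):
    """Simplified SparCC: median correlation over Dirichlet draws.

    Omits SparCC's iterative exclusion of strongly correlated pairs.
    Requires at least 3 OTUs (columns).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_otus = counts.shape[1]
    # Sparsity assumption (rho_ij ~ 0) gives t = M @ omega2 with M = (D-2) I + 1 1^T
    m = np.ones((n_otus, n_otus)) + (n_otus - 2) * np.eye(n_otus)
    draws = []
    for _ in range(n_draws):
        # Draw fractions for each sample (row) from Dirichlet(counts + 1)
        frac = np.vstack([rng.dirichlet(row + 1) for row in counts])
        cov = np.cov(np.log(frac), rowvar=False)
        var = np.diag(cov)
        # Log-ratio variation matrix T_ij = Var(log x_i - log x_j), D x D memory only
        t_mat = var[:, None] + var[None, :] - 2.0 * cov
        omega2 = np.maximum(np.linalg.solve(m, t_mat.sum(axis=1)), v_min)  # basis variances
        cov_basis = 0.5 * (omega2[:, None] + omega2[None, :] - t_mat)
        corr = cov_basis / np.sqrt(np.outer(omega2, omega2))
        draws.append(np.clip(corr, -1.0, 1.0))  # keep estimates in [-1, 1]
    return np.median(draws, axis=0)


def single_sparcc_sketch(counts, n_draws=20, seed=0):
    """Hypothetical SSN-style per-sample networks: reference minus leave-one-out."""
    n_samples, n_otus = counts.shape
    reference = sparcc_basic(counts, n_draws, np.random.default_rng(seed))
    # One D x D matrix per sample: (S, D, D) float64, i.e. S * D * D * 8 bytes
    deltas = np.empty((n_samples, n_otus, n_otus))
    for q in range(n_samples):
        without_q = np.delete(counts, q, axis=0)
        # Each sample costs one full SparCC run (n_draws Dirichlet draws)
        deltas[q] = reference - sparcc_basic(without_q, n_draws,
                                             np.random.default_rng(seed + 1 + q))
    return deltas


if __name__ == "__main__":
    data_rng = np.random.default_rng(42)
    counts = data_rng.poisson(30, size=(40, 10))  # synthetic counts: S=40, D=10
    deltas = single_sparcc_sketch(counts, n_draws=10)
    print(deltas.shape)  # (40, 10, 10)
```

In this sketch the work is roughly S + 1 SparCC runs; if p-values are also obtained by bootstrapping, as SparCC does, that cost is multiplied again by B.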
