Paper reading (65): Kernel-penalized regression for analysis of microbiome data

Paper title: Kernel-penalized regression for analysis of microbiome data

Google Scholar citations: 15

Pages: 29

Publication date: 2018.03

Published in: Institute of Mathematical Statistics

Authors: Timothy W. Randolph, Sen Zhao, ..., and Ali Shojaie

Abstract:

The analysis of human microbiome data is often based on dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample. Common to these ordination methods is the use of biologically motivated definitions of similarity. Principal coordinate analysis, in particular, is often performed using ecologically defined distances, allowing analyses to incorporate context-dependent, non-Euclidean structure. In this paper, we go beyond dimension-reduced ordination methods and describe a framework of high-dimensional regression models that extends these distance-based methods. In particular, we use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized regression models that estimate taxon-specific associations with a phenotype or clinical outcome. Further, we show how this regression framework can be used to address the compositional nature of multivariate predictors comprised of relative abundances; that is, vectors whose entries sum to a constant. We illustrate this approach with several simulations using data from two recent studies on gut and vaginal microbiomes. We conclude with an application to our own data, where we also incorporate a significance test for the estimated coefficients that represent associations between microbial abundance and a percent fat.

Paper outline:

1. Introduction

2. Kernel Penalized Regression for Microbiome Data

2.1 Background for PCoA and principal component regression

2.2 Penalized regression and DPCoA

2.3 Kernel-based regression with two kernels

2.4 Regression with compositional data

3. Numerical Experiments

3.1 Regression and DPCoA

3.2 Regression and PCoA with respect to a UniFrac kernel

3.3 Regression and PCoA using an edge-matrix kernel

4. Application to an observational study

5. Discussion

Selected excerpts from the paper:

1. Biological Problem: What biological problems have been solved in this paper?

  • The analysis of human microbiome data

2. Main discoveries: What are the main discoveries in this paper?

  • use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized regression models that estimate taxon-specific associations with a phenotype or clinical outcome.
  • how this regression framework can be used to address the compositional nature of multivariate predictors comprised of relative abundances; that is, vectors whose entries sum to a constant.
  • An interesting feature of the proposed kernel-penalized regression framework is its ability to sidestep some of the problems inherent in compositional data analysis. 
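
To make the compositional constraint concrete, here is a minimal sketch of the centered log-ratio (CLR) transform, a standard way to map relative abundances (rows summing to a constant) into unconstrained coordinates. The toy counts and the pseudocount of 0.5 are assumptions for illustration only; this is not the paper's exact preprocessing.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform, one sample per row.

    The pseudocount (an assumed value) avoids log(0) for absent taxa.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    comp = x / x.sum(axis=1, keepdims=True)      # relative abundances, rows sum to 1
    logs = np.log(comp)
    return logs - logs.mean(axis=1, keepdims=True)  # subtract the per-sample mean log

# Toy count table: 2 samples x 4 taxa (made-up numbers)
counts = np.array([[10, 0, 5, 85],
                   [3, 7, 40, 50]])
z = clr(counts)
```

Each row of `z` sums to zero, which is the linear constraint that a regression on compositional predictors has to account for.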

3. ML(Machine Learning) Methods: What are the ML methods applied in this paper?

  • describe a framework of high-dimensional regression models that extends these distance-based methods.
  • A primary motivation for PCoA graphical displays is the ability to incorporate biologically-inclined measures of (dis)similarity. 
  • Proposed method: kernel penalized regression
  • We show how phylogenetic and other structure can be incorporated via kernel penalized regression in either the primal (p-dimensional) feature space or the dual (n-dimensional) samples space
  • Previous methods: PCoA; standard (Euclidean-based) statistical models
  • Dataset: We apply our kernel-penalized regression framework to data from 16S rRNA gene collected in a study of premenopausal women (Hullar et al., 2015). This study investigated aspects of gut microbial communities in stool samples from premenopausal women using 454 pyrosequencing of the 16S rRNA gene. The abundances of 127 species were zero for more than 90% of the subjects and were removed from our analysis. The data set we consider consists of p = 128 species sampled from n = 102 women.
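
The primal/dual remark above can be illustrated with a small sketch. It assumes a simplified objective, ||y - Zβ||² + λ βᵀQ⁻¹β, with a single feature-space kernel Q and an identity sample-space kernel; the random data, the kernel `Q`, and the value of `lam` are all hypothetical, and the paper's two-kernel formulation is more general than this.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50                       # more features than samples
Z = rng.standard_normal((n, p))     # toy design matrix (hypothetical data)
y = rng.standard_normal(n)

# Hypothetical p x p feature kernel, built to be symmetric positive definite
A = rng.standard_normal((p, p))
Q = A @ A.T / p + np.eye(p)
lam = 2.0                           # assumed penalty weight

def kpr_primal(Z, y, Q, lam):
    """Solve the p-dimensional system (Z'Z + lam * Q^{-1}) beta = Z'y."""
    return np.linalg.solve(Z.T @ Z + lam * np.linalg.inv(Q), Z.T @ y)

def kpr_dual(Z, y, Q, lam):
    """Solve the n-dimensional system, then map back: beta = Q Z' (Z Q Z' + lam I)^{-1} y."""
    alpha = np.linalg.solve(Z @ Q @ Z.T + lam * np.eye(Z.shape[0]), y)
    return Q @ Z.T @ alpha

beta_primal = kpr_primal(Z, y, Q, lam)
beta_dual = kpr_dual(Z, y, Q, lam)
```

Both routes give the same coefficient vector under these assumptions; the dual form only needs an n x n solve, which is convenient when p is much larger than n.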

4. ML Advantages: Why are these ML methods better than the traditional methods in these biological problems?

  • Traditional methods: dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample; principal coordinate analysis (PCoA)
  • none of these analyses proceed to estimate the individual associations
  • In contrast, we focus on estimating the coefficient vector, which is a key aspect of any approach used to draw scientific conclusions based on the association of microbial communities with an outcome or phenotype.
  • Our approach, which differs somewhat from that of Li (2015), may also be viewed as a penalized version of the low-dimensional linear model for compositions by Tolosana-Delgado and van den Boogaart (2011), who use the isometric log-ratio (ILR) coordinates.
  • for addressing well-known problems that arise from applying standard (Euclidean-based) statistical models to compositional data
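
For reference, the PCoA baseline named above can be sketched as classical multidimensional scaling: double-center the squared distance matrix to obtain a kernel (Gram) matrix, then take its leading eigenvectors as sample coordinates. The function name `pcoa` and the toy Euclidean distances are assumptions for illustration; in practice the distance matrix would come from an ecological distance such as UniFrac.

```python
import numpy as np

def pcoa(D, k=2):
    """Classical PCoA: coordinates from an n x n distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K = -0.5 * J @ (D ** 2) @ J                  # Gower-centered kernel matrix
    evals, evecs = np.linalg.eigh(K)             # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]          # indices of the k largest
    evals, evecs = evals[order], evecs[:, order]
    coords = evecs * np.sqrt(np.clip(evals, 0.0, None))  # clip tiny negatives
    return coords, K

# Toy check with Euclidean distances among 6 random points in 2-D
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
coords, K = pcoa(D, k=2)
D_hat = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
```

With Euclidean input the two-dimensional coordinates reproduce the original pairwise distances. PCoA stops at this ordination, whereas the regression framework summarized here goes on to estimate a coefficient for each taxon.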

5. Biological Significance: What is the biological significance of these ML methods’ results?

  • In this analysis, we obtain estimates of associations between microbial species and percent fat measured in premenopausal women, and also provide inference for these estimates by applying a recent significance test in our kernel-penalized regression (KPR) framework.

6. Prospect: What are the potential applications of these machine learning methods in biological science?

  • the proposed framework also allows us to use existing inference frameworks for high-dimensional regression, and in particular the Grace test (Zhao and Shojaie, 2016), to assess the significance of estimated regression coefficients.