Paper reading (83): Composition Analysis and FS of the Oral Microbiota Associated with Periodontal Disease

Paper title: Composition Analysis and Feature Selection of the Oral Microbiota Associated with Periodontal Disease

Scholar citations: 4

Pages: 14

Published: November 2018

Journal: BioMed Research International

Authors: Wen-Pei Chen, Shih-Hao Chang, ..., Yaw-Ling Lin

Abstract:

Periodontitis is an inflammatory disease involving complex interactions between oral microorganisms and the host immune response. Understanding the structure of the microbiota community associated with periodontitis is essential for improving the classification and diagnosis of the various types of periodontal disease and will facilitate clinical decision-making. In this study, we used a 16S rRNA metagenomics approach to investigate and compare the compositions of the microbiota communities in 76 subgingival plaque samples: 26 from healthy individuals and 50 from patients with periodontitis. Furthermore, we propose a novel feature selection algorithm that selects the more informative features from many variables; combinations of these features were then used with machine learning methods to construct models that predict the health status of patients with periodontal disease. We identified a total of 12 phyla, 124 genera, and 355 species and observed differences between health- and periodontitis-associated bacterial communities at all phylogenetic levels. We discovered that the genera Porphyromonas, Treponema, Tannerella, Filifactor, and Aggregatibacter were more abundant in patients with periodontal disease, whereas Streptococcus, Haemophilus, Capnocytophaga, Gemella, Campylobacter, and Granulicatella were found at higher levels in healthy controls. Using our feature selection algorithm, random forests outperformed the other methods in predictive power and required the least computational time.

Organization of the main text:

 

Excerpts from the main text:

1. Biological Problem: What biological problems have been solved in this paper?

  •  

2. Main discoveries: What are the main discoveries in this paper?

  • We propose a novel feature selection algorithm that selects the more informative features from many variables; combinations of these features were used with machine learning methods to construct models that predict the health status of patients with periodontal disease.
  • We identified a total of 12 phyla, 124 genera, and 355 species and observed differences between health- and periodontitis-associated bacterial communities at all phylogenetic levels.
  • Using our feature selection algorithm, random forests performed better in terms of predictive power than other methods and consumed the least amount of computational time.

3. ML(Machine Learning) Methods: What are the ML methods applied in this paper?

  • Data: 76 subgingival plaque samples, 26 from healthy individuals and 50 from patients with periodontitis.
  • In this study, we proposed a feature selection method that selects informative microbes for predicting whether an individual has periodontal disease. First, microbes whose relative abundance was below 0.5% in all samples were ignored, and nonparametric Kruskal–Wallis tests were used to detect microorganisms with significantly different abundance between healthy individuals and patients with periodontal disease. Microbes with more significant differential scores (smaller p values) were considered more informative features. Then, the prioritized feature-combination generation algorithm shown in Algorithm 1 of the paper was used to produce feature combinations composed of these more informative features.
  • In prioritized order, the feature combinations were used to build classifiers with four machine learning algorithms: deep learning, support vector machine (SVM), random forests, and logistic regression. In each run, 80% of the healthy samples and 80% of the disease samples were picked to train the prediction model, and the remaining samples were used for testing. The prediction ability of each feature combination was evaluated as the average accuracy over 10 predictions, each using a different training/testing split. Here, we selected the 10 most significant features, with p values between 3.27 × 10^-11 and 7.77 × 10^-9. In total, 1,023 feature combinations (all 2^10 - 1 non-empty subsets of these 10 features) were evaluated for their prediction ability using deep learning, SVM, random forests, and logistic regression, implemented with the R packages H2O, e1071, randomForest, and stats, respectively. For SVM, we used the radial basis function (RBF) kernel. The parameters of each algorithm were tuned by grid search, and the parameter values giving the best accuracy were adopted for training the prediction models.
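The filtering and ranking step in the excerpts above (drop taxa below 0.5% relative abundance in every sample, run a Kruskal–Wallis test per taxon, keep the 10 smallest p values) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code, which was written in R and is not shown in these notes: abundances are assumed to be fractions (so 0.5% = 0.005), labels are assumed to be 0 = healthy and 1 = periodontitis, and the names `rank_with_ties`, `kruskal_wallis_2groups`, and `select_features` are hypothetical. `scipy.stats.kruskal` provides the same test if SciPy is available; it is reimplemented here only to keep the example dependency-free.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def kruskal_wallis_2groups(a, b):
    """Kruskal-Wallis H statistic (tie-corrected) and p value for two groups."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = rank_with_ties(pooled)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    # Tie correction factor: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    c = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if c == 0:
        return 0.0, 1.0  # every value identical: no evidence of a difference
    h = max(h / c, 0.0)  # clamp tiny negative values caused by float rounding
    # With two groups, H is compared to chi-square with 1 df: P(X > h) = erfc(sqrt(h / 2))
    return h, math.erfc(math.sqrt(h / 2))


def select_features(abundance, labels, min_rel=0.005, top_k=10):
    """Drop rare taxa, test each remaining taxon, return the top_k by ascending p value.

    abundance: {taxon: [relative abundance per sample]}; labels: 0 = healthy, 1 = periodontitis.
    """
    scored = []
    for taxon, values in abundance.items():
        if all(v < min_rel for v in values):
            continue  # below 0.5% relative abundance in every sample
        healthy = [v for v, y in zip(values, labels) if y == 0]
        disease = [v for v, y in zip(values, labels) if y == 1]
        _, p = kruskal_wallis_2groups(healthy, disease)
        scored.append((p, taxon))
    scored.sort()  # most significant (smallest p) first
    return [taxon for _, taxon in scored[:top_k]]


# Toy data with hypothetical taxa, not data from the paper
labels = [0, 0, 0, 1, 1, 1]
abundance = {
    "taxon_A": [0.001, 0.002, 0.000, 0.08, 0.12, 0.10],
    "taxon_B": [0.30, 0.05, 0.28, 0.06, 0.07, 0.04],
    "taxon_C": [0.001, 0.000, 0.002, 0.001, 0.003, 0.000],
}
print(select_features(abundance, labels))  # ['taxon_A', 'taxon_B']
```

In the toy run, `taxon_C` never reaches 0.5% and is dropped before testing, and `taxon_A` (perfect separation, p ≈ 0.0495) ranks ahead of `taxon_B` (p ≈ 0.275). The tie correction matters for microbiome data, where many samples share a relative abundance of exactly zero.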
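The evaluation protocol described above (enumerate combinations of the 10 top-ranked features, then score each by the mean accuracy of 10 stratified 80/20 splits) can be sketched as below. Algorithm 1 itself is not reproduced in these notes, so the priority order used here (smaller subsets first, and within one size, subsets of higher-ranked features first) is an assumption; it only reproduces the count of 2^10 - 1 = 1,023 combinations. The classifier is a hypothetical nearest-centroid stand-in, not one of the paper's four methods (deep learning, SVM, random forests, logistic regression), and all function names are illustrative.

```python
import itertools
import random


def feature_combinations(ranked_features):
    """Yield every non-empty subset of the ranked features (2^k - 1 of them).

    ASSUMED priority order (hypothetical stand-in for Algorithm 1): smaller subsets
    first; within one size, subsets of higher-ranked features first.
    """
    for size in range(1, len(ranked_features) + 1):
        yield from itertools.combinations(ranked_features, size)


def stratified_split(labels, train_frac, rng):
    """Pick train_frac of each class for training; the rest of that class is for testing."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def nearest_centroid(train_x, train_y, test_x):
    """Hypothetical stand-in classifier; the paper used deep learning, SVM, RF and LR."""
    centroids = {}
    for cls in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in test_x]


def mean_accuracy(x, y, columns, fit_predict, repeats=10, train_frac=0.8, seed=0):
    """Average test accuracy of one feature combination over repeated 80/20 splits."""
    rng = random.Random(seed)

    def take(rows):
        # Keep only the columns (feature indices) in this combination
        return [[x[i][c] for c in columns] for i in rows]

    scores = []
    for _ in range(repeats):
        train, test = stratified_split(y, train_frac, rng)
        pred = fit_predict(take(train), [y[i] for i in train], take(test))
        scores.append(sum(p == y[i] for p, i in zip(pred, test)) / len(test))
    return sum(scores) / len(scores)


ranked = [f"feature_{k}" for k in range(1, 11)]  # hypothetical names for the top 10 taxa
print(len(list(feature_combinations(ranked))))  # 1023
```

To score every combination, pass column indices rather than names, e.g. iterate `feature_combinations(list(range(10)))` and call `mean_accuracy(x, y, combo, nearest_centroid)` for each `combo`; any function with the same `(train_x, train_y, test_x) -> predictions` shape can replace the stand-in classifier. With the paper's 26 healthy and 50 periodontitis samples, `stratified_split` puts 21 + 40 samples in training and 5 + 10 in testing.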

4. ML Advantages: Why are these ML methods better than the traditional methods in these biological problems?

  •  

5. Biological Significance: What is the biological significance of these ML methods’ results?

  •  

6. Prospect: What are the potential applications of these machine learning methods in biological science?

  •  

7. My Questions (Optional)
