Paper title: Gene expression profiling gut microbiota in different races of humans
Google Scholar citations: 57
Pages: 11
Publication date: 2016.03
Journal: Scientific Reports
Authors: Lei Chen, Yu-Hang Zhang, Tao Huang & Yu-Dong Cai
Abstract:
The gut microbiome is shaped and modified by the polymorphisms of microorganisms in the intestinal tract. Its composition shows strong individual specificity and may play a crucial role in the human digestive system and metabolism. In the present study, we studied the gut microbiomes of three different races, including individuals of Asian, European and American races. The gut microbiome and the expression levels of gut microbiome genes were analyzed in these individuals. Advanced feature selection methods (minimum redundancy maximum relevance and incremental feature selection) and four machine-learning algorithms (random forest, nearest neighbor algorithm, sequential minimal optimization, Dagging) were employed to capture key differentially expressed genes. As a result, sequential minimal optimization was found to yield the best performance using the 454 genes, which could effectively distinguish the gut microbiomes of different races. Our analyses of extracted genes support the widely accepted hypotheses that eating habits, living environments and metabolic levels in different races can influence the characteristics of the gut microbiome.
Main text structure:
1. Introduction
2. Materials and methods
2.1 Materials
2.2 mRMR method
2.3 Machine-learning algorithm
2.4 Cross-validation method
2.5 Accuracy measurement
2.6 The IFS method
3. Results
3.1 Findings of the mRMR method
3.2 Findings of the IFS method
4. Discussion
5. Conclusion
Excerpts from the main text:
1. Biological Problem: What biological problems have been solved in this paper?
- identify key differentially expressed genes
2. Main discoveries: What are the main discoveries in this paper?
- Sequential minimal optimization was found to yield the best performance using the 454 genes, which could effectively distinguish the gut microbiomes of different races.
- Our analyses of extracted genes support the widely accepted hypotheses that eating habits, living environments and metabolic levels in different races can influence the characteristics of the gut microbiome.
3. ML (Machine Learning) Methods: What are the ML methods applied in this paper?
- Best-performing classifier, sequential minimal optimization (SMO): SMO is a type of support vector machine (SVM) trained by John Platt's sequential minimal optimization algorithm.
- Advanced feature selection methods (minimum redundancy maximum relevance and incremental feature selection) and four machine-learning algorithms (random forest, nearest neighbor algorithm, sequential minimal optimization, Dagging) were employed to capture key differentially expressed genes.
- The SMO was the best one to identify key differentially expressed genes that may represent optimal functional genes that could reflect differences among different races.
- Dataset: The expression levels of 9,879,896 gut microbial genes in 1,267 samples of three different races, which included 139 Americans, 368 Chinese and 760 Europeans [14], were retrieved from http://meta.genomics.cn/metagene/meta/dataTools.
- The mRMR method, which was proposed by Peng et al. [15], is a popular feature selection method that has been widely applied to the analysis of various biological problems.
- For a given dataset, two feature lists can be produced using the mRMR method: the MaxRel and mRMR feature lists. (That is, the output of feature selection is two ranked feature lists.)
- ten-fold cross-validation method
- incremental feature selection (IFS)
- The IFS method uses the mRMR feature list and a basic machine-learning algorithm (e.g., the random forest, SMO, etc.) to extract an optimal combination of features and to build an optimal prediction model.
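The IFS idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a ranked feature list is already available (the ranking here is hypothetical, not produced by mRMR), uses a plain 1-nearest-neighbor classifier as the basic algorithm, and scores each nested feature set by overall accuracy under unstratified ten-fold cross-validation. The function names and the toy data are invented for the example.

```python
import random

def predict_1nn(train_X, train_y, x):
    """Return the label of the training sample closest to x (squared Euclidean distance)."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[nearest]

def cross_val_accuracy(X, y, n_folds=10, seed=0):
    """Overall accuracy of 1-NN under n_folds-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(predict_1nn(train_X, train_y, X[i]) == y[i] for i in fold)
    return correct / len(X)

def incremental_feature_selection(X, y, ranked_features, n_folds=10):
    """Score nested feature sets F_1, F_2, ... taken from a ranked list.

    F_i contains the first i features of ranked_features.
    Returns (best_i, best_score, scores); ties go to the smaller set.
    """
    scores = []
    for i in range(1, len(ranked_features) + 1):
        subset = ranked_features[:i]
        X_sub = [[row[j] for j in subset] for row in X]
        scores.append(cross_val_accuracy(X_sub, y, n_folds))
    best_i = max(range(len(scores)), key=scores.__getitem__) + 1
    return best_i, scores[best_i - 1], scores

# Toy data: feature 0 separates the three classes, features 1 and 2 are noise.
rng = random.Random(1)
y = [c for c in range(3) for _ in range(10)]
X = [[10 * c + rng.uniform(-1, 1), rng.uniform(-50, 50), rng.uniform(-50, 50)] for c in y]
ranked = [0, 2, 1]  # hypothetical ranking, standing in for an mRMR feature list
best_i, best_score, scores = incremental_feature_selection(X, y, ranked)
print(best_i, best_score, scores)
```

In the paper the candidate sets are scored with a base learner such as SMO or random forest; swapping the scoring function for another metric (for example MCC) only changes `cross_val_accuracy`.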
4. ML Advantages: Why are these ML methods better than the traditional methods in these biological problems?
- The optimization problem of this type of SVM is always broken into a series of the smallest possible sub-problems. And they are solved analytically. Similar to ordinary SVM, pairwise coupling was applied to tackle multi-class problems.
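To make "smallest possible sub-problems solved analytically" concrete, the two-multiplier update from Platt's SMO can be written out as below. This is the standard textbook formulation for a binary soft-margin SVM, not something quoted from the paper. The symbols $C$ (box constraint), $K_{ij} = K(x_i, x_j)$ (kernel values) and $E_i = f(x_i) - y_i$ (prediction error) are notation introduced here.

```latex
% Dual problem of the soft-margin SVM
\max_{\alpha}\; \sum_{i} \alpha_i
  - \frac{1}{2} \sum_{i}\sum_{j} \alpha_i \alpha_j y_i y_j K_{ij}
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_{i} \alpha_i y_i = 0

% SMO optimizes two multipliers (\alpha_1, \alpha_2) at a time, all others fixed.
\eta = K_{11} + K_{22} - 2K_{12}, \qquad
\alpha_2^{\text{new}} = \alpha_2 + \frac{y_2\,(E_1 - E_2)}{\eta}

% Clip to the feasible segment [L, H]
L = \begin{cases}
  \max(0,\ \alpha_2 - \alpha_1) & y_1 \ne y_2 \\
  \max(0,\ \alpha_1 + \alpha_2 - C) & y_1 = y_2
\end{cases}
\qquad
H = \begin{cases}
  \min(C,\ C + \alpha_2 - \alpha_1) & y_1 \ne y_2 \\
  \min(C,\ \alpha_1 + \alpha_2) & y_1 = y_2
\end{cases}

\alpha_2^{\text{clipped}} = \min\bigl(H,\ \max(L,\ \alpha_2^{\text{new}})\bigr), \qquad
\alpha_1^{\text{new}} = \alpha_1 + y_1 y_2 \bigl(\alpha_2 - \alpha_2^{\text{clipped}}\bigr)
```

Because the equality constraint ties the two multipliers to a line segment, each sub-problem is a one-dimensional quadratic with a closed-form solution, so no numerical QP solver is needed in the inner loop.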
5. Biological Significance: What is the biological significance of these ML methods’ results?
- To evaluate the performance of a certain prediction model, we can calculate the accuracies for three classes and overall prediction accuracy.
- The European samples were more than five times as many as the American samples. For a two-class classification problem, the Matthews correlation coefficient (MCC) [22] is always used to evaluate the performance of a prediction model because it is a balanced measure even if the classes are of very different sizes.
- For each set F_i and a prediction engine, we calculated the MCC, overall prediction accuracy, and the accuracy for each of the three races.
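The measures listed above can be computed from a confusion matrix, as in the sketch below. It is an illustration under assumptions: the MCC here is the common multi-class generalization (which reduces to the usual two-class MCC when there are two classes), and whether the paper used exactly this form is not stated in these notes. The function names are invented for the example. The demo uses the class sizes from the dataset (139, 368, 760) with a deliberately trivial predictor, to show why accuracy alone is misleading on imbalanced classes.

```python
import math

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_accuracy(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[k] / sum(row) if sum(row) else 0.0 for k, row in enumerate(matrix)]

def overall_accuracy(matrix):
    """Fraction of all samples that were predicted correctly."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[k][k] for k in range(len(matrix))) / total

def mcc(matrix):
    """Multi-class Matthews correlation coefficient (0.0 if undefined)."""
    n = len(matrix)
    s = sum(sum(row) for row in matrix)                          # all samples
    c = sum(matrix[k][k] for k in range(n))                      # correct predictions
    t = [sum(matrix[k]) for k in range(n)]                       # true count per class
    p = [sum(matrix[r][k] for r in range(n)) for k in range(n)]  # predicted count per class
    numerator = c * s - sum(tk * pk for tk, pk in zip(t, p))
    denominator = math.sqrt(
        (s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t))
    )
    return numerator / denominator if denominator else 0.0

labels = ["American", "Chinese", "European"]
y_true = ["American"] * 139 + ["Chinese"] * 368 + ["European"] * 760
y_pred = ["European"] * len(y_true)  # trivial predictor: always the majority class
m = confusion_matrix(y_true, y_pred, labels)
print(per_class_accuracy(m), overall_accuracy(m), mcc(m))
```

The trivial predictor reaches an overall accuracy of 760/1267 (about 0.60) while its MCC is 0, which is the imbalance effect the notes refer to.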
6. Prospect: What are the potential applications of these machine learning methods in biological science?
- We hope that the new findings presented in this study may yield new insights into studies of the gut microbiome.
7. My Questions (Optional)
- They (the machine-learning algorithms) were all executed using their default parameters.