Scholar citations: 9
Pages: 12
Published: 2018.01
Venue: American Society for Microbiology Journals
Authors: Sumayah F. Rahman, Matthew R. Olm, ..., Jillian F. Banfield
Abstract:
Antibiotic resistance in pathogens is extensively studied, and yet little is known about how antibiotic resistance genes of typical gut bacteria influence microbiome dynamics. Here, we leveraged genomes from metagenomes to investigate how genes of the premature infant gut resistome correspond to the ability of bacteria to survive under certain environmental and clinical conditions. We found that formula feeding impacts the resistome. Random forest models corroborated by statistical tests revealed that the gut resistome of formula-fed infants is enriched in class D beta-lactamase genes. Interestingly, Clostridium difficile strains harboring this gene are at higher abundance in formula-fed infants than C. difficile strains lacking this gene. Organisms with genes for major facilitator superfamily drug efflux pumps have higher replication rates under all conditions, even in the absence of antibiotic therapy. Using a machine learning approach, we identified genes that are predictive of an organism’s direction of change in relative abundance after administration of vancomycin and cephalosporin antibiotics. The most accurate results were obtained by reducing annotated genomic data to five principal components classified by boosted decision trees. Among the genes involved in predicting whether an organism increased in relative abundance after treatment are those that encode subclass B2 beta-lactamases and transcriptional regulators of vancomycin resistance. This demonstrates that machine learning applied to genome-resolved metagenomics data can identify key genes for survival after antibiotics treatment and predict how organisms in the gut microbiome will respond to antibiotic administration.
IMPORTANCE
The process of reconstructing genomes from environmental sequence data (genome-resolved metagenomics) allows unique insight into microbial systems. We apply this technique to investigate how the antibiotic resistance genes of bacteria affect their ability to flourish in the gut under various conditions. Our analysis reveals that strain-level selection in formula-fed infants drives enrichment of beta-lactamase genes in the gut resistome. Using genomes from metagenomes, we built a machine learning model to predict how organisms in the gut microbial community respond to perturbation by antibiotics. This may eventually have clinical applications.
Paper outline:
1. Introduction
2. Results and discussion
2.1 Antibiotic resistance of the premature infant microbiome
2.2 Formula feeding influences the gut resistome through strain-level selection
2.3 Major facilitator superfamily (MFS) pumps are associated with increased replication
2.4 A model that predicts an organism’s response to vancomycin and cephalosporins
3. Materials and methods
3.1 Sample collection, sequencing, assembly, and gene prediction
3.2 Genome recovery and calculation of relative abundances
3.3 iRep calculation
3.4 Annotation
3.5 Statistical and computational analysis
3.6 Data availability
Excerpts from the paper:
1. Biological Problem: What biological problems have been solved in this paper?
- predicting whether an organism increased in relative abundance after treatment
- identify key genes for survival after antibiotics treatment and predict how organisms in the gut microbiome will respond to antibiotic administration.
- classify resistomes as belonging to either a formula-fed baby or a breast-fed baby
2. Main discoveries: What are the main discoveries in this paper?
- Our analysis reveals that strain-level selection in formula-fed infants drives enrichment of beta-lactamase genes in the gut resistome.
- Using genomes from metagenomes, we built a machine learning model to predict how organisms in the gut microbial community respond to perturbation by antibiotics.
- we used genome-resolved metagenomics coupled with statistical and machine learning approaches to investigate the gut resistome of 107 longitudinally sampled premature infants.
- We show that certain antibiotic resistance genes in particular genomes affect how clinical factors influence the gut microbiome and, in turn, how the antibiotic resistance capabilities of a gut organism influence its growth and relative abundance.
3. ML (Machine Learning) Methods: What are the ML methods applied in this paper?
- Random forest models were used to classify resistomes as belonging to either a formula-fed baby or a breast-fed baby, and we used the feature importance scores of the trained models to select resistance genes for further study
- Principal-component analysis (PCA) was performed on Resfams and KEGG annotations to generate a low-dimensional representation of each organism’s metabolic potential and resistance potential. The first five principal components (PCs) cumulatively explained 48% of the variation in the data set.
- Using these PCs as input, the AdaBoost-SAMME algorithm was applied, with decision tree classifiers as base estimators. The model, trained on 70% of the data, performed extremely well on the validation set, with a precision value of 1.0 and a recall value of 1.0, indicating that every genome was correctly classified. Because the validation set was utilized for testing during the preliminary stages of model development, the model was also evaluated with a final test set, with which it achieved 0.9 precision and 0.7 recall.
- The dataset used was comprised of 597 previously reported samples (55–57) and 305 new samples. These samples are available at NCBI under accession number SRP114966. The code for the analysis, along with all the data and metadata used in the analysis, is hosted at https://github.com/SumayahR/antibiotic-resistance.
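The boosting and evaluation steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the paper used scikit-learn, while the sketch below hand-rolls a two-class SAMME loop with one-split decision stumps as base estimators. The toy "principal component" scores, the labels (+1 = increased in relative abundance, -1 = decreased), and all function names are hypothetical, and the hold-out split only mirrors the 70% training fraction.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best one-split stump."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] <= t else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def stump_predict(stump, row):
    _, j, t, sign = stump
    return sign if row[j] <= t else -sign

def fit_samme(X, y, n_rounds=10, n_classes=2):
    """SAMME boosting; with two classes the log(K - 1) term is zero."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        if stump[0] >= 1 - 1 / n_classes:
            break  # base estimator is no better than chance
        err = max(stump[0], 1e-10)  # avoid division by zero
        alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
        ensemble.append((alpha, stump))
        if stump[0] == 0:
            break  # a perfect stump leaves nothing to reweight
        # Upweight the samples this stump misclassified, then renormalize.
        w = [wi * math.exp(alpha) if stump_predict(stump, row) != yi else wi
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def precision_recall(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical PC scores for ten genomes (two PCs instead of five, for brevity).
X = [[1.2, 0.3], [0.8, -0.5], [0.5, 0.9], [-0.7, 0.4], [-1.1, -0.2],
     [-0.8, 1.0], [-0.9, -0.8], [0.9, 0.1], [-0.6, -0.4], [0.3, 0.7]]
y = [1, 1, 1, -1, -1, 1, -1, 1, -1, 1]

n_train = len(X) * 7 // 10  # 70% of the genomes for training
model = fit_samme(X[:n_train], y[:n_train])
held_out_pred = [predict(model, row) for row in X[n_train:]]
precision, recall = precision_recall(y[n_train:], held_out_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision here is the fraction of genomes predicted to increase that actually did, and recall is the fraction of genomes that increased that the model caught, which is how the reported 0.9 and 0.7 on the final test set should be read.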
4. ML Advantages: Why are these ML methods better than the traditional methods in these biological problems?
- Previous studies have utilized data from 16S rRNA gene amplicon sequencing or read-based metagenomics of the human microbiome to predict life events and disease states of the human host using machine learning or other modeling techniques.
- Difficulty of the current problem: read-based metagenomics lacks resolution at the genomic level, and, due to strain-level differences in antibiotic resistance, taxonomy data from marker gene studies cannot be used to predict how particular organisms in a community will respond to antibiotics.
- Using scikit-learn, development of a machine learning model to predict the direction of change in relative abundance for each genome based on its Resfams and KEGG metabolism data was attempted, and yet an adequate model could not be developed, presumably due to variations in the ways in which organisms respond to different antibiotic combinations.
5. Biological Significance: What is the biological significance of these ML methods’ results?
- Mann-Whitney U tests were performed on Resfams genes that had feature importance scores above 0.07 in the random forest model, as calculated by the Gini importance metric.
- the model that exhibited the best results with regard to precision and recall was selected.
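The follow-up testing described above (keep only Resfams genes whose random forest importance exceeds 0.07, then run a Mann-Whitney U test on each) can be sketched as follows. This is an illustrative sketch under stated assumptions: the gene names, importance scores, and per-infant gene counts are invented, and comparing formula-fed against breast-fed infants is an assumption based on what the random forest was trained to classify. Only the U statistic is computed; a p-value would normally come from a statistics library such as `scipy.stats.mannwhitneyu`.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two independent samples."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

# Hypothetical Gini importance scores from a trained random forest.
importances = {"gene_A": 0.12, "gene_B": 0.03, "gene_C": 0.08}

# Hypothetical gene counts per infant: (formula-fed, breast-fed).
counts = {
    "gene_A": ([3, 4, 2, 5], [1, 0, 2, 1]),
    "gene_B": ([1, 1, 1, 1], [1, 1, 1, 1]),
    "gene_C": ([1, 1, 0, 2], [1, 0, 1, 2]),
}

IMPORTANCE_CUTOFF = 0.07  # threshold quoted in the notes above
selected = [gene for gene, score in importances.items() if score > IMPORTANCE_CUTOFF]
results = {gene: mann_whitney_u(*counts[gene]) for gene in selected}

for gene, (u_formula, u_breast) in results.items():
    print(gene, u_formula, u_breast)
```

Using the importance score as a filter and the rank test as confirmation is what the abstract means by random forest models "corroborated by statistical tests": the model nominates candidate genes, and the nonparametric test checks each one independently.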
6. Prospect: What are the potential applications of these machine learning methods in biological science?
- This may eventually have clinical applications.
- This has tremendous potential for application in the fields of medicine and microbial ecology.
- For example, such a model can be used before administering drugs to a patient to verify that a particular combination of antibiotics will not lead to overgrowth of an undesirable microbe.
- Our report serves as a proof of concept for this application of machine learning used in conjunction with genome-resolved metagenomics to derive biological insight.
7. My Questions (Optional)