Paper reading (56): Using convolutional neural networks to explore the microbiome

Paper title: Using convolutional neural networks to explore the microbiome

Google Scholar citations: 10

Pages: 4

Published: 2017.07

Venue: IEEE Engineering in Medicine and Biology Society (EMBC)

Authors: Derek Reiman; Ahmed Metwally; Yang Dai

Abstract:

The microbiome has been shown to have an impact on the development of various diseases in the host. Being able to make an accurate prediction of the phenotype of a genomic sample based on its microbial taxonomic abundance profile is an important problem for personalized medicine. In this paper, we examine the potential of using a deep learning framework, a convolutional neural network (CNN), for such a prediction. To facilitate the CNN learning, we explore the structure of abundance profiles by creating the phylogenetic tree and by designing a scheme to embed the tree to a matrix that retains the spatial relationship of nodes in the tree and their quantitative characteristics. The proposed CNN framework is highly accurate, achieving a 99.47% of accuracy based on the evaluation on a dataset 1967 samples of three phenotypes. Our result demonstrated the feasibility and promising aspect of CNN in the classification of sample phenotype.

Paper outline:

1. Introduction

2. Models and Methods

2.1 The Phylogenetic Tree

2.2 Populating the Phylogenetic Tree

2.3 Matrix Construction

2.4 The CNN Architecture

2.5 Dataset

2.6 Experimental Procedure

3. Results

4. Discussion

5. Conclusion

Excerpts from the paper:

1. Biological Problem: What biological problems have been solved in this paper?

  • prediction of the phenotype of a genomic sample

2. Main discoveries: What are the main discoveries in this paper?

  • The proposed CNN framework is highly accurate, achieving a 99.47% of accuracy based on the evaluation on a dataset 1967 samples of three phenotypes.
  • We developed a CNN model for classification of a microbiome sample based on its microbial taxonomic abundance profile.

3. ML (Machine Learning) Methods: What are the ML methods applied in this paper?

  • CNN
  • This tree is further populated with the observed microbial abundance of taxa for individual samples and then is embedded in a 2D matrix which preserves most of the spatial relationship between the nodes in the tree. These matrices are input to our CNN model. 
  • Our CNN model has three convolutional layers. The tangent hyperbolic activation function and max pooling subsampling were used in each layer. The numbers of feature maps were 20, 40, and 60 respective to the layers. Then we used a fully connected layer of 100 neurons with the tangent hyperbolic activation function, and lastly a softmax layer with three output neurons (skin, gut, and oral cavity). The desired outputs from the ground truth were constructed as a 1 for that output neuron and zeroes in the other two.
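To make the layer stack above concrete, here is a minimal sketch that traces output shapes and parameter counts through it. The feature-map counts (20, 40, 60), the 100-unit dense layer, and the three output classes come from the excerpt; the 3x3 kernel, the 2x2 pooling window, the unpadded convolution, and the 32x64 input used in the usage note are assumptions made only for illustration, and the function names are hypothetical.

```python
import math

# From the excerpt: three conv layers (20, 40, 60 feature maps) with tanh and
# max pooling, a 100-unit tanh dense layer, and a 3-way softmax output.
FEATURE_MAPS = [20, 40, 60]
DENSE_UNITS = 100
CLASSES = ["skin", "gut", "oral cavity"]

# Assumptions (not stated in these notes): kernel size and pooling window.
KERNEL = (3, 3)
POOL = (2, 2)


def trace_shapes(height, width, channels=1):
    """Return a list of (layer name, output shape, parameter count)."""
    rows = []
    for i, maps in enumerate(FEATURE_MAPS, start=1):
        # Unpadded ("valid") convolution: each spatial dim shrinks by kernel - 1.
        height = height - KERNEL[0] + 1
        width = width - KERNEL[1] + 1
        if height < 1 or width < 1:
            raise ValueError("input too small for this layer stack")
        params = (KERNEL[0] * KERNEL[1] * channels + 1) * maps  # weights + biases
        rows.append((f"conv{i}+tanh", (maps, height, width), params))
        # Non-overlapping max pooling: floor division of each spatial dim.
        height, width = height // POOL[0], width // POOL[1]
        if height < 1 or width < 1:
            raise ValueError("input too small for this layer stack")
        rows.append((f"maxpool{i}", (maps, height, width), 0))
        channels = maps
    flat = channels * height * width
    rows.append(("dense+tanh", (DENSE_UNITS,), (flat + 1) * DENSE_UNITS))
    rows.append(("softmax", (len(CLASSES),), (DENSE_UNITS + 1) * len(CLASSES)))
    return rows


def one_hot(label):
    """Target vector: 1 for the true class, 0 for the other two."""
    return [1.0 if c == label else 0.0 for c in CLASSES]


def softmax(logits):
    """Numerically stable softmax over the three output neurons."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Calling `trace_shapes(32, 64)` on the assumed 32x64 single-channel input shows how quickly the spatial size shrinks under these assumptions, which is one reason the real input matrix size and kernel shape matter when reproducing the paper.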

4. ML Advantages: Why are these ML methods better than the traditional methods in these biological problems?

  • Since microbial taxonomic abundance profiles imply structure information, we take advantage of the CNN modeling approach to explore this structure by constructing a phylogenetic tree. 
  •  Alternative approaches based on predictive models have been proposed using Random Forest Classifier (RFC) and Support Vector Machine (SVM) [4]. The difficulty in establishing these prediction models is the selection of features relative to the phenotypical response from a large number of microbial taxa.
  • Identifying the structure between taxa and preserving the spatial relationship are key to the effectiveness of the CNN. 

5. Biological Significance: What is the biological significance of these ML methods’ results?

  • The proposed CNN model exploits the topological structure of the phylogenetic tree constructed from the abundance data. This approach can be readily applicable to the microbial from the whole genome shotgun sequencing study. 
  • Our tree embedding scheme can be thought of using a rectangular filter sliding through the phylogenetic tree to observe multiple clades as well as a combination of clades and children of other clades.
  • The created matrix contains spatial information between nodes based on their locations in the tree. 
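The idea of populating a tree with abundances and writing it into a matrix can be sketched as follows. This is one plausible scheme, not necessarily the authors' exact embedding: it assumes an internal node's abundance is the sum of its children's, puts each tree depth on its own row, and fills each row left to right with zero padding. The toy tree, the abundance values, and the function names are all hypothetical.

```python
from collections import deque

# Hypothetical toy tree, parent -> children (not taken from the paper).
TREE = {
    "root": ["A", "B"],
    "A": ["a1", "a2"],
    "B": ["b1"],
}


def populate(tree, node, leaf_abundance, out):
    """Fill `out` with an abundance for every node.

    Assumption: an internal node's abundance is the sum over its children.
    """
    children = tree.get(node, [])
    if not children:
        out[node] = float(leaf_abundance.get(node, 0.0))
    else:
        out[node] = sum(populate(tree, c, leaf_abundance, out) for c in children)
    return out[node]


def embed(tree, root, abundance):
    """Embed the populated tree in a 2D list: row = depth, left-to-right order."""
    levels = []
    queue = deque([(root, 0)])
    while queue:  # breadth-first traversal keeps siblings adjacent
        node, depth = queue.popleft()
        if depth == len(levels):
            levels.append([])
        levels[depth].append(abundance[node])
        for child in tree.get(node, []):
            queue.append((child, depth + 1))
    width = max(len(row) for row in levels)
    # Pad shorter rows with zeros so every row has the same width.
    return [row + [0.0] * (width - len(row)) for row in levels]


abundance = {}
populate(TREE, "root", {"a1": 2.0, "a2": 3.0, "b1": 5.0}, abundance)
matrix = embed(TREE, "root", abundance)
# matrix == [[10.0, 0.0, 0.0],
#            [5.0, 5.0, 0.0],
#            [2.0, 3.0, 5.0]]
```

In this toy output, `b1` lands in the third column while its parent `B` sits in the second, a small instance of the parent/child misalignment that the notes list under Prospect.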

6. Prospect: What are the potential applications of these machine learning methods in biological science?

  • there may be a misalignment of parents and children. Exploring other methods of the matrix representation of the phylogenetic tree may allow the CNN to find more relevant patterns for optimized performance.
  • how to extract feature maps predictive to phenotype from convolution layers is yet to be solved
  • The future direction is the development of an effective way to extract and interpret the predictive feature maps learned from the CNN in order to reveal the biological relevance to the host phenotype.

7. My Question (Optional)

Table: "Accuracy of CNN and other previously studied methods (the numbers in parentheses are the number of neurons)". The paper evaluates accuracy only; is that really enough?
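On the question of whether accuracy alone is enough, a small sketch shows why per-class metrics can tell a different story when classes are imbalanced. The confusion matrix below is invented purely for illustration (it is not data from the paper), and `metrics` is a hypothetical helper.

```python
def metrics(confusion):
    """Compute accuracy, per-class recall, and macro-F1.

    `confusion[i][j]` is the number of samples of true class i predicted as j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total
    recalls, f1s = [], []
    for i in range(n):
        tp = confusion[i][i]
        actual = sum(confusion[i])                        # row sum
        predicted = sum(confusion[r][i] for r in range(n))  # column sum
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    return accuracy, recalls, sum(f1s) / n


# Made-up counts for illustration only: rows are true skin, gut, oral cavity.
confusion = [
    [90, 0, 0],
    [0, 5, 5],    # a small class with half of its samples misclassified
    [0, 0, 100],
]
accuracy, recalls, macro_f1 = metrics(confusion)
```

With these invented counts, overall accuracy stays high while recall on the smallest class is only one half, so reporting per-class recall or macro-F1 alongside accuracy would give a fuller picture.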
