Paper reading (58): Phylogenetic convolutional neural networks in metagenomics

Paper title: Phylogenetic convolutional neural networks in metagenomics

Google Scholar citations: 21

Pages: 13

Publication date: 2018.03

Journal: BMC Bioinformatics

Authors: Diego Fioravanti, Ylenia Giarratano, ..., Cesare Furlanello

Abstract:

Background
Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case of pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on the Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree being used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space.
Results
Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided in 6 subclasses. Classification performance is promising when compared to classical algorithms like Support Vector Machines and Random Forest and a baseline fully connected neural network, e.g. the Multi-Layer Perceptron.
Conclusion
Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer taking care of passing to the following convolutional layer not only the data but also the ranked list of neighbourhood of each sample, thus mimicking the case of image data, transparently to the user.

Paper outline:

1. Background

2. Methods

2.1 Ph-CNN

2.2 Experimental setup

2.3 The IBD dataset

2.4 The synthetic datasets

3. Results and discussion

4. Conclusions

Excerpts from the main text:

1. Biological Problem: What biological problems have been solved in this paper?

  • classification of metagenomics data
  • Operational Taxonomic Units (OTUs)
  • patristic distance, i.e., the sum of the lengths of all branches connecting two OTUs on the phylogenetic tree
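The patristic distance defined above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the toy tree, its node names, and its branch lengths are invented, and the parent-pointer representation is an assumption chosen for brevity.

```python
# Hypothetical toy tree: each node maps to (parent, branch length to parent).
# Node names and lengths are invented for illustration only.
TREE = {
    "root": (None, 0.0),
    "A": ("root", 0.5),
    "B": ("root", 0.3),
    "otu1": ("A", 0.2),
    "otu2": ("A", 0.4),
    "otu3": ("B", 0.7),
}


def path_to_root(tree, node):
    """Map each ancestor of `node` (itself included) to its distance from `node`."""
    dist, total = {}, 0.0
    while node is not None:
        dist[node] = total
        parent, length = tree[node]
        total += length
        node = parent
    return dist


def patristic_distance(tree, a, b):
    """Sum of the branch lengths on the unique path between nodes a and b."""
    up_a = path_to_root(tree, a)
    node, total = b, 0.0
    # Climb from b until reaching an ancestor of a (the lowest common ancestor).
    while node not in up_a:
        parent, length = tree[node]
        total += length
        node = parent
    return up_a[node] + total


if __name__ == "__main__":
    print(patristic_distance(TREE, "otu1", "otu2"))  # siblings under A
    print(patristic_distance(TREE, "otu1", "otu3"))  # path goes through the root
```

For real trees, a phylogenetics library would normally provide this computation; the sketch only shows what "sum of the lengths of all branches connecting two OTUs" means operationally.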

2. Main discoveries: What are the main discoveries in this paper?

  • Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided in 6 subclasses.
  • Classification performance is promising when compared to classical algorithms like Support Vector Machines and Random Forest and a baseline fully connected neural network, e.g. the Multi-Layer Perceptron.

3. ML(Machine Learning) Methods: What are the ML methods applied in this paper?

  • Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on the Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree being used as the proximity measure.
  • The choice of the software and its version in the whole metagenomic pipeline plays a critical role.
  • Ph-CNN consists of a stack of Phylo-Conv layers first flattened then terminating with a Fully Connected (Dense) and a final classification layer. 
  • We demonstrate Ph-CNN characteristics with experiments on both synthetic and real omics data.
  • Real dataset: Sokol’s lab data [28] of microbiome information for 38 healthy subjects (HS) and 222 inflammatory bowel disease (IBD) patients. The bacterial composition was analysed using 16S sequencing and a total of 306 different OTUs was found.
  • Because the direct use of Ph-CNN on the IBD dataset leads to overfitting after a few epochs due to the small sample size, the IBD dataset is used in a transfer learning (domain adaptation) task.
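To make the idea of a convolution over phylogenetic neighbours concrete, here is a minimal pure-Python sketch. It is one plausible reading of the description above, not the authors' Keras layer: the function names `k_nearest` and `phylo_conv`, the single filter, the ReLU activation, and the toy distance matrix are all assumptions made for illustration.

```python
def k_nearest(dist, k):
    """For each feature i, return the indices of its k closest features.

    Closest first; ties are broken by index. Feature i is normally first,
    since its distance to itself is zero.
    """
    n = len(dist)
    return [sorted(range(n), key=lambda j: (dist[i][j], j))[:k] for i in range(n)]


def phylo_conv(sample, dist, weights, bias=0.0):
    """Apply one filter to the neighbourhood of every feature.

    sample  : abundance per OTU for one subject (list of floats)
    dist    : symmetric matrix of pairwise distances between OTUs
    weights : filter of length k, applied to neighbours in order of closeness
    """
    k = len(weights)
    out = []
    for neigh in k_nearest(dist, k):
        patch = [sample[j] for j in neigh]  # gather the neighbourhood
        act = sum(w * x for w, x in zip(weights, patch)) + bias
        out.append(max(0.0, act))  # ReLU, an assumed activation
    return out


if __name__ == "__main__":
    # Invented 4-OTU example: OTUs 0 and 1 are close, as are OTUs 2 and 3.
    D = [
        [0, 1, 4, 5],
        [1, 0, 3, 6],
        [4, 3, 0, 2],
        [5, 6, 2, 0],
    ]
    x = [1.0, 2.0, 3.0, 4.0]
    print(phylo_conv(x, D, weights=[1.0, 0.5]))
```

The point of the gather step is that the filter always sees features in order of phylogenetic closeness, which plays the role that pixel adjacency plays for images.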

4. ML Advantages: Why are these ML methods better than the traditional methods in these biological problems?

  • Extensions of the Ph-CNN architecture are addressing the testing of different tree distances, optimization of neighbours detection and of the number of Phylo-Conv layers.
  • On both data types, the Ph-CNN architecture is then compared with state-of-the-art shallow algorithms such as Support Vector Machines (SVMs) and Random Forest (RF), and with alternative neural network methods such as Multi-Layer Perceptron (MLPNN).

5. Biological Significance: What is the biological significance of these ML methods’ results?

  • Model performance is computed for an increasing number of best-ranking features using the Matthews Correlation Coefficient (MCC), the measure that best conveys the confusion matrix of a classification task in a single value, even in the multiclass case.
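As a reference for how MCC condenses a confusion matrix into one number, the following sketch uses the standard multiclass generalisation of the coefficient. The function name `mcc`, the rows-are-true-classes convention, and the choice to return 0.0 when the denominator vanishes are assumptions of this sketch, not details taken from the paper.

```python
import math


def mcc(confusion):
    """Multiclass MCC from a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    Returns 0.0 when the denominator is zero (an assumed convention).
    """
    n = len(confusion)
    s = sum(sum(row) for row in confusion)            # total samples
    c = sum(confusion[k][k] for k in range(n))        # correctly classified
    t = [sum(confusion[k]) for k in range(n)]         # true count per class
    p = [sum(confusion[i][k] for i in range(n)) for k in range(n)]  # predicted count
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt(
        (s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t))
    )
    return num / den if den else 0.0


if __name__ == "__main__":
    print(mcc([[5, 0], [0, 5]]))   # perfect prediction
    print(mcc([[6, 2], [1, 3]]))   # imperfect binary case
```

In the binary case this reduces to the familiar (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).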

6. Prospect: What are the potential applications of these machine learning methods in biological science?

  • Ph-CNN represents a novel deep learning approach for the classification of metagenomics data.
  • the algorithm has been implemented as a custom Keras layer taking care of passing to the following convolutional layer not only the data but also the ranked list of neighbourhood of each sample, thus mimicking the case of image data, transparently to the user.
  • different feature selection algorithms, either generic or DL-specific can be adopted
  • Improvements are expected on the transfer learning and domain adaptation procedures, such as learning on synthetic data and testing on metagenomics, and applying to larger datasets.
  • Ph-CNN is a general purpose algorithm, whose use can be extended to other data for which the concept of nearest features can be defined
  • As an example, we are currently investigating the transcriptomics case
  • The metagenomics and transcriptomics cases represent just the first steps towards a more general strategy for effectively exploiting the potential of CNNs, especially for omics data.
  • Ph-CNN can be applied to every metagenomics dataset whose features are associated with a taxonomy and thus with a tree structure, as in the case of metagenomics of relatively large eukaryotes now appearing in the literature.

7. My Questions (Optional)

I could not see a clear advantage here. Ph-CNN and LSVM seem to perform about the same?
