Paper reading (3): Machine Learning for integrating data in biology and medicine

Paper title: Machine Learning for integrating data in biology and medicine: Principles, practice, and opportunities

Scholar citations: 26

Pages: 21

Published: 2019.01

Venue: Information Fusion 50 (2019)

Authors: Marinka Zitnik, Francis Nguyen, Bo Wang, Jure Leskovec, Anna Goldenberg, Michael M. Hoffman (affiliations: Department of Computer Science, Stanford University, Stanford, CA, USA, etc.)

Abstract:

Keywords: Computational biology, Personalized medicine, Systems biology, Heterogeneous data, Machine learning

  • The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. 
  • In this Review, we describe the principles of data integration and discuss current methods and available implementations.
  • We provide examples of successful data integration in biology and medicine. 
  • We discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.

Conclusions:

  • No single method will perform best for all problems.
  • Approaches thus need to be selected according to different types of domain-specific models, specific types of data, and different types of biomedical outcomes. 
  • In this Review, we described various approaches that can currently be implemented to perform powerful integrative analyses.
  • As integrative approaches become more readily available, systems biology and systems medicine are likely to become a central computational strategy to generate new knowledge in biology and medicine.

Introduction:

In this Review, we describe the principles of data integration, and provide a taxonomy of machine learning methods presently in use to integrate biomedical data. We discuss current methods, implementations of these methods, and their successful applications in biology and medicine. Furthermore, we discuss challenges in optimally combining and interpreting data from multiple sources and the advantages of integrating multiple data types. For example, one technology may address shortcomings of another to provide a more precise insight into human disease. In addition, we provide our perspective on how integrative data analysis might develop in the future.

Organization of the main text:

2. Challenges in data integration for biology and medicine
3. Conceptual organization of methods for data integration
4. Focus of this Review
5. Epigenomic variation and gene regulation

  1. Semi-automated genome annotation
  2. Transcription factor binding site prediction
  3. Topologically associated domain prediction
  4. Histone modification and DNA methylation prediction

6. Noncoding variant effects
7. Integrative single-cell analysis

  1. Cell type discovery and exploration
  2. Single-cell multi-omics analysis
  3. Large-scale single-cell bioinformatics

8. Cellular phenotype and function

  1. Protein function prediction
  2. Protein-protein interaction prediction

9. Computational pharmacology

  1. Drug-target interaction prediction
  2. Drug-drug interaction and drug combination prediction
  3. Drug repurposing

10. Disease subtyping and biomarker discovery
11. Challenges and future directions

  1. Combining mixed-technology data
  2. Multi-scale and higher-order approaches
  3. Interpretability and explainability
  4. Integration of self-reported, lifestyle, and ecological data

Excerpts from the main text:

  • 2. Challenges in data integration for biology and medicine
    (1) It is especially challenging to deploy machine learning systems to support decision making in risk-sensitive discovery and clinical practice.
    (2) It is thus critical to integrate diverse sources of information to gain a comprehensive understanding of biology and medicine.
  • 3. Conceptual organization of methods for data integration
    Data integration methods:
    (1) vertical data integration
    (2) horizontal data integration
    Strategies for implementing data integration:
    (1) Early integration: roughly, given for example two data tables, merge them directly, use the merged table as the input to a machine learning model, and train the model. This can work because a machine learning model can, in principle, learn dependencies among any of the features. Commonly used approaches are automatic feature learning, such as dimensionality reduction and representation learning, which then combine these low-dimensional representations through concatenation or other simple aggregation techniques.
    (2) Intermediate integration: the datasets are combined inside a model, such as multiple kernel learning, collective matrix factorization, or a deep neural network. This requires development of a new algorithm.
    (3) Late integration: a first-level model is built for each dataset or data type independently. These first-level models are then combined by training a second-level model that uses predictions of the first-level models as features or via a meta-predictor that takes a majority vote or combines prediction weights of the first-level models.
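
The difference between early and late integration described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not a method from the Review: the three tables (expression, methylation, mutation), the labels, and all numbers are made up, and a nearest-centroid classifier stands in for whatever model one would actually use. Intermediate integration is not shown because it needs a purpose-built joint model.

```python
from collections import Counter

def fit_centroids(X, y):
    """Nearest-centroid 'training': return {label: mean feature vector}."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, row):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=dist)

def concat(rows):
    """Join the feature vectors of one sample across all tables."""
    return [value for row in rows for value in row]

def early_integration(train_views, y_train, test_views):
    """Merge the tables column-wise, then train a single model on the result."""
    model = fit_centroids([concat(rows) for rows in zip(*train_views)], y_train)
    return [predict(model, concat(rows)) for rows in zip(*test_views)]

def late_integration(train_views, y_train, test_views):
    """Train one first-level model per table, then combine by majority vote."""
    models = [fit_centroids(X, y_train) for X in train_views]
    predictions = []
    for rows in zip(*test_views):
        votes = Counter(predict(m, r) for m, r in zip(models, rows))
        predictions.append(votes.most_common(1)[0][0])
    return predictions

# Hypothetical toy data: three tables describing the same four training samples.
y_train = ["healthy", "healthy", "disease", "disease"]
train_views = [
    [[0.1, 0.2], [0.0, 0.3], [2.9, 3.1], [3.2, 2.8]],  # "expression"
    [[0.2], [0.1], [3.0], [2.7]],                      # "methylation"
    [[0.0, 0.1], [0.3, 0.0], [3.1, 3.0], [2.8, 3.3]],  # "mutation"
]
# Two test samples; the "mutation" measurement of the first one is noisy.
test_views = [
    [[0.2, 0.1], [3.0, 3.0]],
    [[0.0], [2.9]],
    [[2.0, 2.0], [3.0, 2.9]],
]

early = early_integration(train_views, y_train, test_views)
late = late_integration(train_views, y_train, test_views)
print(early, late)
```

In this toy case the noisy "mutation" table alone would misclassify the first test sample, but the other tables outweigh it under both strategies: the distance on the concatenated features (early) and the two-to-one vote (late) both favor "healthy". An odd number of tables is used so that the majority vote cannot tie on two labels.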
  • 4. Focus of this Review
    (1) Lists the focus of many other reviews.
    (2) In this Review, we survey advances in data integration at multiple biomedical levels.
    (3) Introduces the main content of each of the later sections.
  • 5. Epigenomic variation and gene regulation
    (1) Since epigenomic data might bear only an indirect connection to biological phenomena of interest, machine learning appeals as an aid for interpretation.
    (2) Hidden Markov models are mentioned
    (3) random forest
    (4) FactorNet
  • 6. Noncoding variant effects
    (1) support vector machine (SVM)
    (2) deep learning
  • 7. Integrative single-cell analysis
    (1) dimension-reduction techniques
    (2) multiple clustering results
  • One needs to discover not only information shared across various omics data but also complementary signals that are specific to a particular omics data type.
  • We highlight outstanding problems and opportunities that need to be addressed to fully realize the potential of machine learning for integrating biomedical data.
  • 8.1. Protein function prediction
    (1) unsupervised similarity-based methods using a principle that similar proteins share similar functions
    (2) supervised methods using a classification of protein functions in the Gene Ontology
  • Bayesian latent factor models
  • kernelized matrix factorization
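
The similarity-based principle noted under 8.1 (similar proteins share similar functions) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the protein identifiers, the two similarity matrices (labeled "sequence" and "co-expression"), the annotation terms, and the choice to integrate the two sources by simple averaging. It is not a method taken from the Review.

```python
def average_networks(networks):
    """Integrate several protein-protein similarity matrices by element-wise mean."""
    n = len(networks[0])
    return [
        [sum(net[i][j] for net in networks) / len(networks) for j in range(n)]
        for i in range(n)
    ]

def rank_functions(query, proteins, similarity, annotations):
    """Rank function terms for `query` by summed similarity to annotated proteins."""
    q = proteins.index(query)
    scores = {}
    for j, other in enumerate(proteins):
        if other == query:
            continue
        for term in annotations.get(other, ()):
            scores[term] = scores.get(term, 0.0) + similarity[q][j]
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical proteins and similarity data (symmetric, values in [0, 1]).
proteins = ["P1", "P2", "P3", "P4"]
sequence_sim = [
    [1.0, 0.8, 0.1, 0.7],
    [0.8, 1.0, 0.2, 0.6],
    [0.1, 0.2, 1.0, 0.3],
    [0.7, 0.6, 0.3, 1.0],
]
coexpression_sim = [
    [1.0, 0.6, 0.2, 0.5],
    [0.6, 1.0, 0.1, 0.8],
    [0.2, 0.1, 1.0, 0.1],
    [0.5, 0.8, 0.1, 1.0],
]
# Known annotations; P4 is the unannotated protein we want to characterize.
annotations = {
    "P1": {"kinase activity"},
    "P2": {"kinase activity", "ATP binding"},
    "P3": {"DNA binding"},
}

combined = average_networks([sequence_sim, coexpression_sim])
ranking = rank_functions("P4", proteins, combined, annotations)
print(ranking)
```

Here P4 is most similar to P1 and P2 in the averaged network, so the term they share is ranked first. A supervised alternative would instead train a classifier per Gene Ontology term.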