Paper reading (18): Machine learning applications in genetics and genomics

Title: Machine learning applications in genetics and genomics

Google Scholar citations: 528

Pages: 12

Published: May 2015

Venue: Nature Reviews Genetics

Authors: Maxwell W. Libbrecht and William Stafford Noble

Abstract:

The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.

Conclusions:

  • Genomics has generated vast amounts of data through several large projects, and machine learning is particularly strong at handling large, complex datasets, so applying machine learning to genomics is a natural and inevitable trend.
  • Naively throwing machine learning methods at genomics problems is unlikely to give good results. In general, using these algorithms well requires combining theory and practical experience from both the ML field and the relevant application domain.
  • both machine learning itself and scientists proficient in these applications are likely to become increasingly important to advancing genetics and genomics.

Introduction:

  • ML is very useful for the interpretation of large genomic datasets and has also been applied to annotate a wide variety of genomic sequence elements.
  • As well as learning to recognize patterns in DNA sequences, machine learning algorithms can use input data generated by other genomic assays.

  • Machine learning applications have also been extensively used to assign functional annotations to genes.

  • a wide variety of machine learning methods have been developed to help to understand the mechanisms underlying gene expression.

  • machine learning researchers have tended to focus on a subset of problems within statistics, emphasizing in particular the analysis of large heterogeneous data sets.

  • we begin by explaining several key distinctions in the main types of machine learning and then outlining some of the major challenges in applying machine learning methods to practical problems in genomics.

Paper outline:

1. Introduction

2. Stages of machine learning

3. Supervised versus unsupervised learning

4. Generative versus discriminative modelling

5. Incorporating prior knowledge

6. Handling heterogeneous data

7. Feature selection

8. Imbalanced class sizes

9. Handling missing data

10. Modelling dependence among examples

11. Conclusions

Excerpts from the main text:

2. Stages of machine learning

  • the design–learn–test process provides a principled way to test a hypothesis about machine learning

  • the algorithm itself can be used to generate hypotheses

  • key question: whether the model is interpretable, and how to interpret it

3. Supervised versus unsupervised learning

  • Unsupervised learning: the machine learning algorithm uses only the unlabelled data and the desired number of different labels to assign as input

  • Semi-supervised learning: a machine-learning method that requires labels but that also makes use of unlabelled examples.

  • The learning procedure begins by constructing an initial gene-finding model on the basis of the labelled subset of the training data alone. Next, the model is used to scan the genome, and tentative labels are assigned throughout the genome.

  • These tentative labels can then be used to improve the learned model, and the procedure iterates until no new genes are found.
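To make the iterative procedure above concrete, here is a minimal self-training sketch (my own toy example, not the paper's actual gene-finding model): a one-dimensional nearest-class-mean classifier is fit on the labelled points, confidently classified unlabelled points are promoted to tentative labels, and the loop repeats until no new labels are found.

```python
import statistics

def self_train(labelled, unlabelled, margin=2.0, max_iter=10):
    """Toy self-training: fit per-class means on labelled points, promote
    confidently classified unlabelled points, repeat until nothing changes."""
    labelled = dict(labelled)              # {feature_value: class_label}
    pool = set(unlabelled)
    for _ in range(max_iter):
        means = {cls: statistics.mean(x for x, c in labelled.items() if c == cls)
                 for cls in set(labelled.values())}
        newly = {}
        for x in pool:
            dists = sorted((abs(x - m), cls) for cls, m in means.items())
            # only promote points that sit clearly nearer one class mean
            if len(dists) == 1 or dists[1][0] - dists[0][0] >= margin:
                newly[x] = dists[0][1]
        if not newly:                      # no new confident labels: stop
            break
        labelled.update(newly)
        pool -= set(newly)
    return labelled
```

Seeding with {0.0: "A", 10.0: "B"} and pool [1.0, 2.0, 8.5, 9.0, 5.0] labels the points near each seed and leaves the ambiguous midpoint 5.0 unlabelled, mirroring how tentative labels accumulate only where the current model is confident.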

  • When no labels are available, unsupervised learning is the only option; when labels are available, supervised learning is not necessarily the best choice, because every supervised learning method rests on the implicit assumption that the distribution responsible for generating the training data set is the same as the distribution responsible for generating the test data set.

  • Semi-supervised learning requires making certain assumptions about the data set [22] and, in practice, assessing these assumptions can often be very difficult.

  • a good rule of thumb is to use semi-supervised learning only when there is a small amount of labelled data and a large amount of unlabelled data.

4. Generative versus discriminative modelling

  • ML is used for essentially one of two purposes: prediction or interpretation.
  • There are trade-offs between accomplishing these two goals — methods that optimize prediction accuracy often do so at the cost of interpretability. (Does higher accuracy really mean worse interpretability? I don't fully understand this yet.)

  • The paper's explanation of this point, which I only half understand: perhaps optimizing for accuracy discards some features and so sacrifices part of the picture? A researcher applying a machine learning method to this problem may either want to understand what properties of a sequence are the most important for determining whether a transcription factor will bind (that is, interpretation), or simply want to predict the locations of transcription factor binding as accurately as possible (that is, prediction).

  • A generative approach builds a full model of the distribution of features in each of the two classes and then compares how the two distributions differ; by contrast, a discriminative approach focuses on accurately modelling only the boundary between the two classes.

  • the generative description of the data implies that the model parameters have well-defined semantics relative to the generative process; generative models are frequently stated in terms of probabilities, and the probabilistic framework provides a principled way to handle problems like missing data.

  • generative approaches can sometimes perform better with limited training data.

  • when the amount of labelled training data is reasonably large, the discriminative approach will tend to find a better solution
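A minimal sketch of the generative side of this distinction (my toy illustration, not from the paper): fit one Gaussian per class and classify by comparing likelihoods. The learned parameters, each class's mean and standard deviation, describe the data directly, which is exactly the interpretability a generative model buys; a discriminative method would instead fit only the decision boundary.

```python
import math
import statistics

def fit_generative(pos, neg):
    """Fit one Gaussian per class; (mean, stdev) have clear semantics
    relative to the assumed generative process."""
    return {"pos": (statistics.mean(pos), statistics.stdev(pos)),
            "neg": (statistics.mean(neg), statistics.stdev(neg))}

def log_likelihood(x, mu, sigma):
    """Log-density of x under a Gaussian with the given mean and stdev."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma ** 2)

def classify(model, x):
    # compare full class-conditional likelihoods, not just a boundary
    lp = log_likelihood(x, *model["pos"])
    ln = log_likelihood(x, *model["neg"])
    return "pos" if lp > ln else "neg"
```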

5. Incorporating prior knowledge

  • the selection of an approach that matches the researcher’s prior knowledge about the problem is crucial to the success of the analysis.

  • Implicit prior knowledge: prior knowledge may be implicitly encoded in the learning algorithm itself, in which some types of solutions are preferred over others.

  • in general, the choice of input data sets, their representations and any pre-processing must be guided by prior knowledge about data and application.

  • Probabilistic priors: pseudocounts. Roughly, a pseudocount is added to the observed data so that an event known to be possible, but never actually observed, is assigned a negligibly small probability rather than zero.

  • A particularly successful example is the use of Dirichlet mixture priors in protein modelling.
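The pseudocount idea can be sketched in a few lines (my add-one smoothing toy; Dirichlet mixture priors generalize this by choosing context-dependent pseudocounts rather than one constant):

```python
def smoothed_frequencies(counts, alphabet, pseudocount=1.0):
    """Estimate symbol probabilities with a pseudocount added to every
    symbol, so unseen-but-possible symbols get a small non-zero probability."""
    total = sum(counts.get(s, 0) for s in alphabet) + pseudocount * len(alphabet)
    return {s: (counts.get(s, 0) + pseudocount) / total for s in alphabet}
```

For a DNA alignment column with counts {"A": 7, "C": 3}, the unobserved G and T each get probability 1/14 instead of 0.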

  • Prior information in non-probabilistic models. Kernel methods are algorithms that use a general class of mathematical functions called kernels in place of a simple similarity function (specifically, the cosine of the angle between two input vectors)

6. Handling heterogeneous data

  • The most straightforward way to solve this problem is to transform each type of data into vector format before processing

  • each type of data can be encoded using a kernel function, with one kernel for each data type.
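A minimal sketch of the one-kernel-per-data-type idea (my assumption of cosine kernels and toy tuples; a real genomics application might use a string kernel for sequence and a linear kernel for expression): a weighted sum of valid kernels is itself a valid kernel, so heterogeneous data types can be combined without first flattening everything into one vector.

```python
import math

def cosine_kernel(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def combined_kernel(kernels, weights=None):
    """Build one kernel from a list of per-data-type kernels by summation."""
    weights = weights or [1.0] * len(kernels)
    def k(x, y):
        # x and y are tuples holding one representation per data type
        return sum(w * kf(xi, yi)
                   for w, kf, xi, yi in zip(weights, kernels, x, y))
    return k
```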

  • probability models provide a very different method for handling heterogeneous data.

  • In practice, an alternative method for handling heterogeneous data in a probability model is to make use of the general probabilistic mechanism for handling prior knowledge by treating one type of data before another.

7. Feature selection

  • it is important to distinguish among three distinct motivations for carrying out feature selection: (1) identify a very small set of features that yield the best possible classifier; (2) use the classifier to understand the underlying biology; (3) train the most accurate possible classifier.
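One common way to pursue the first motivation is greedy forward selection; a minimal sketch (the additive scoring function below is a toy stand-in for something like cross-validated classifier accuracy):

```python
def forward_select(features, score, k):
    """Greedy forward selection: repeatedly add the feature whose addition
    most improves score(subset); stop when nothing helps or k is reached."""
    chosen = frozenset()
    while len(chosen) < k:
        candidates = [f for f in features if f not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda f: score(chosen | {f}))
        if score(chosen | {best}) <= score(chosen):
            break                  # no remaining feature improves the score
        chosen = chosen | {best}
    return chosen
```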

8. Imbalanced class sizes

  • The most straightforward solution to this problem is to select a random, smaller subset of the data.

  • it is more appropriate to separately evaluate sensitivity and precision. (Related: the MCC discussed in the previous paper note.)

  • the most appropriate performance measure depends on the intended application of the classifier.
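A quick sketch of why plain accuracy misleads with imbalanced classes: if 99% of examples are negative, a classifier that predicts "negative" everywhere scores 99% accuracy but has zero sensitivity. Evaluating sensitivity and precision separately exposes this:

```python
def sensitivity_precision(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); precision = TP / (TP + FP)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision
```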

9. Handling missing data

  • The simplest way to deal with data that are missing at random is to impute the missing values.

  • Another method for dealing with missing data is to include in the model information about the ‘missingness’ of each data point.

  • probability models can explicitly model missing data by considering all the potential missing values.
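A minimal sketch combining two of the approaches above (mean imputation plus an explicit "missingness" indicator per column), assuming missing values are encoded as None:

```python
def impute_with_indicator(rows):
    """Mean-impute each column's missing values (None) and append one 0/1
    'missingness' indicator column per original column."""
    ncols = len(rows[0])
    means = []
    for j in range(ncols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    out = []
    for r in rows:
        filled = [means[j] if r[j] is None else r[j] for j in range(ncols)]
        flags = [1 if r[j] is None else 0 for j in range(ncols)]
        out.append(filled + flags)
    return out
```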

10. Modelling dependence among examples

  • The most straightforward way to infer the relationships among examples is to consider each pair independently.

  • methods that infer a network as a whole are more biologically interpretable because they remove these indirect correlations.
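A toy illustration of the indirect-correlation problem (my example, assuming a simple chain A -> B -> C): even though A influences C only through B, a naive pairwise analysis still reports a strong A-C correlation, which is exactly the kind of spurious edge that whole-network methods are designed to remove.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(0)
a = [random.gauss(0, 1) for _ in range(2000)]    # upstream gene
b = [x + random.gauss(0, 0.3) for x in a]        # regulated by A
c = [x + random.gauss(0, 0.3) for x in b]        # regulated by B only
# pairwise analysis sees a strong A-C correlation despite no direct link
```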
