Paper reading (20): Deep Learning for Genomics: A Concise Overview

盲人骑瞎马5555

Category: Paper Reading. Tags: deep learning, genomics

Article link: https://blog.csdn.net/wxw060709/article/details/101620881

Paper title: Deep Learning for Genomics: A Concise Overview

Google Scholar citations: 19

Pages: 40

Publication date: 2018.05

Venue: Genomics

Authors: Tianwei Yue, Haohan Wang

Abstract:

This data explosion driven by advancements in genomic research, such as high-throughput sequencing techniques, is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning since we are expecting from deep learning a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, as well as pointing out current challenges and potential research directions for future genomics applications.

Conclusion:

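"we have limited abilities to interpret the genomic information but expect from deep learning a superhuman intelligence that explores beyond our knowledge." I used to think this was unrealistic. Looking at where artificial intelligence is heading, though, I now think the human role is to offer the initial ideas, and the computer really can arrive at things that are beyond our knowledge.
"deep learning applications slightly lag behind traditional statistical inferences in terms of interpretation." In the long run, deep learning should surpass traditional methods in interpretability, which is why so many people are working in this direction.
"current applications have not brought about a watershed revolution in genomic research." In other words, there has been no breakthrough yet, such as a method that changes the research paradigm. Computer vision, by contrast, has already gone through a paradigm shift: almost everything there now seems to be done with deep learning.
"The predictive performances in most problems have not reached the expectation for real-world applications, neither have the interpretations of these abstruse models elucidated insightful knowledge." This is the black-box problem of deep learning. Many people around the world are studying it, so it is worth following their work to see whether interpretability has any breakthroughs. Borrowing their ideas may also help explain biological problems.
"By careful selection of data sources and features, or appropriate design of model structures, deep learning can be driven towards a bright direction." In practice, this means combining traditional methods with machine learning more effectively: discard the dross and keep the essence.
"we need to bear in mind numerous challenges beyond simply improving predictive accuracy." I think keeping these challenges in mind is the most basic requirement for a researcher.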

Introduction:

Genomic research aims to understand the genomes of different species.
In addition to recognizing these patterns in DNA sequences, models can take other genetic and genomic information as input to build systems to help understand the biological mechanisms of underlying genes.
Drug-related applications and possible follow-up research directions: precision medicine, pharmacy
medicine: medical research and its applications such as gene therapies, molecular diagnostics, and personalized medicine could be revolutionized by tailoring high-performance computing methods to analyzing available genomic datasets.
match the candidate protein identified by researchers with their known drug molecules.
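
To make the point about recognizing patterns in DNA sequences more concrete, here is a minimal sketch of how a sequence might be turned into numeric model input. One-hot encoding is my own assumption here, not something these notes quote from the paper, and the name `one_hot_encode` is hypothetical.

```python
# Illustrative sketch only: one-hot encoding is an assumed input format,
# and one_hot_encode is a hypothetical helper, not code from the paper.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return one row of four 0/1 values per base, in A, C, G, T order.

    Symbols outside A/C/G/T (for example N) become an all-zero row.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == nucleotide else 0 for nucleotide in NUCLEOTIDES])
    return rows


if __name__ == "__main__":
    sequence = "ACGTN"
    # Print each base next to its encoded row.
    for base, row in zip(sequence, one_hot_encode(sequence)):
        print(base, row)
```

Each base becomes a row of length four, so a sequence of length n becomes an n x 4 matrix that a model could scan for patterns. Other genomic information would need its own encoding.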

Outline of the paper:

1. Introduction

2. Deep Learning Architectures: Genomic Perspective

2.1 Convolutional Neural Networks

2.2 Recurrent Neural Networks

2.3 Autoencoders

2.4 Emergent Deep Architectures

2.4.1 Beyond Classic Models

2.4.2 Hybrid Architectures

3. Deep Learning Architectures: Insights and Remarks

3.1 Model Interpretation

3.2 Transfer Learning and Multitask

3.3 Multi-view Learning

4. Genomic Applications

4.1 Gene Expression

4.1.1 Gene Expression Characterization

4.1.2 Gene Expression Prediction

4.2 Regulatory Genomics

4.2.1 Promoters and Enhancers

4.2.2 Splicing

4.2.3 Transcription Factors and RNA-binding Proteins

4.3 Functional Genomics

4.3.1 Mutations and Functional Activities

4.3.2 Subcellular Localization

4.4 Structural Genomics

4.4.1 Structural Classification of Proteins

4.4.2 Protein Secondary Structure

4.4.3 Protein Tertiary Structure and Quality Assessment

4.4.4 Contact Map

5. Challenges and Opportunities

5.1 The Nature of Data

5.1.1 Class-Imbalanced Data

5.1.2 Various Data Types

5.1.3 Heterogeneity and Confounding Correlations

5.2 Feature Extraction

5.2.1 Mathematical Feature Extraction

5.2.2 Feature Representation

6. Conclusion and Outlook

Excerpts from the main text: