Paper reading (23): Machine Learning for Detecting Gene-Gene Interactions: A Review

盲人骑瞎马5555

Category: Paper Reading. Tags: Gene-Gene Interactions, machine learning

Article link: https://blog.csdn.net/wxw060709/article/details/101749956

Paper title: Machine Learning for Detecting Gene-Gene Interactions: A Review

Scholar citations: 215

Pages: 21

Published: 2006.06

Journal: Applied Bioinformatics

Authors: Brett A. McKinney, David M. Reif, Marylyn D. Ritchie, and Jason H. Moore

Abstract:

Keywords: Hidden layer, Genetic programming, Multifactor dimensionality reduction, Traditional statistical method, Genomewide association study

Complex interactions among genes and environmental factors are known to play a role in common human disease aetiology. There is a growing body of evidence to suggest that complex interactions are 'the norm' and, rather than amounting to a small perturbation to classical Mendelian genetics, interactions may be the predominant effect. Traditional statistical methods are not well suited for detecting such interactions, especially when the data are high dimensional (many attributes or independent variables) or when interactions occur between more than two polymorphisms. In this review, we discuss machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases. We focus on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction. We conclude with some ideas about how these methods and others can be integrated into a comprehensive and flexible framework for data mining and knowledge discovery in human genetics.

Conclusion:

New methods are needed to analyse genetic data that not only address the usual challenges posed by real-world data, but that also recognise interactions as an important effect rather than a perturbation to independent main effects.
We discussed evolution-optimised NNs and CAs, as well as MDR and RFs, machine-learning models that have been successfully used to detect gene-gene interactions. (These methods are worth a closer look. This is a 2006 paper, though, written before machine learning took off, so they may well have been superseded by newer methods; it is still worth seeing how people thought about the problem at the time.)
MDR (multifactor dimensionality reduction) is a deterministic and conceptually simple constructive induction method that exhaustively considers every possible combination of variables up to a given order.
For higher-order interactions, it would then be necessary to implement an RF approach or a stochastic optimisation method to attempt to traverse the vast search space.
Perhaps a similar underlying order waits to be discovered in genetics through the collaborative efforts of geneticists, epidemiologists, bioinformaticists, computer scientists, physicians and others. It feels as if every field now has big data and expects machine learning or deep learning to apply, so experts from many disciplines will have to work together.
The conclusion of this paper is long and reads almost like part of the main text, which makes me wonder: why write it at such length? Is this good scientific writing?
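
To make the MDR description in the conclusion more concrete, here is a minimal Python sketch of the exhaustive search it describes. It is an illustration under stated assumptions, not the paper's implementation: the names `mdr_score` and `mdr_search`, the case/control `threshold` of 1.0, and the toy data set are all hypothetical. It also scores each combination by training accuracy only, whereas MDR as usually described selects models by cross-validation.

```python
from collections import Counter
from itertools import combinations, product


def mdr_score(genotypes, labels, snps, threshold=1.0):
    """Training accuracy of the high/low-risk rule for one SNP combination.

    Each multilocus genotype cell is labelled high-risk when its number of
    cases is at least `threshold` times its number of controls (assumed rule).
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in snps)
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    correct = 0
    for cell in set(cases) | set(controls):
        n_case, n_ctrl = cases[cell], controls[cell]
        high_risk = n_case >= threshold * n_ctrl
        # High-risk cells predict "case", low-risk cells predict "control".
        correct += n_case if high_risk else n_ctrl
    return correct / len(labels)


def mdr_search(genotypes, labels, max_order=2):
    """Exhaustively score every SNP combination up to `max_order`."""
    n_snps = len(genotypes[0])
    best = (0.0, ())
    for order in range(1, max_order + 1):
        for snps in combinations(range(n_snps), order):
            score = mdr_score(genotypes, labels, snps)
            if score > best[0]:
                best = (score, snps)
    return best


# Toy data (hypothetical): 4 SNPs coded 0/1/2, one row per genotype combination.
# Disease status depends only on SNP 0 and SNP 1 jointly (XOR on heterozygosity).
genotypes = [list(g) for g in product(range(3), repeat=4)]
labels = [int((g[0] == 1) != (g[1] == 1)) for g in genotypes]

best_score, best_snps = mdr_search(genotypes, labels, max_order=2)
print(best_score, best_snps)  # → 1.0 (0, 1)
```

Because training accuracy can only stay the same or improve when a variable is added to a combination, a search like this favours higher orders; that is one reason cross-validation matters in practice.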

Introduction:

In fact, there are reasons to believe that the effect of gene-gene interactions, or epistasis, plays a more important role than the independent main effect of any one gene in the susceptibility to common human diseases.
embraces the complexity of genetic architecture
Traditional parametric statistical methods are limited in their ability to identify interacting susceptibility genes in small sample sizes because of the sparseness of the data in high dimensions.
Another drawback of traditional statistical methods for identifying interactions is the need to specify a model for the interaction.
One of the advantages of logistic regression is the simple physical interpretation of the model and its parameters as they relate genotypes to probability of disease. However, the advantage of interpretability is nullified if the method is unable to determine which variables interact.
Classic applications of machine learning include speech and handwriting recognition, game playing and data mining. In 2006 machine learning was still applied mainly in these areas; after 13 years of development, its applications have grown enormously!
This review focuses on four models: neural networks (NNs), cellular automata (CAs), random forests (RFs) and multifactor dimensionality reduction (MDR).
A brief discussion of principal component analysis and factor analysis
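
The sparseness argument in the Introduction excerpts can be checked with a quick count. The sketch below assumes biallelic SNPs with three genotypes each, so a k-way interaction spans 3^k genotype cells; the sample size of 400 and the panel of 100 SNPs are hypothetical numbers chosen only for illustration, and the function names are mine.

```python
from math import comb


def genotype_cells(order, genotypes_per_snp=3):
    """Number of multilocus genotype cells for an interaction of this order."""
    return genotypes_per_snp ** order


def combinations_to_test(n_snps, order):
    """Number of distinct SNP combinations of this order."""
    return comb(n_snps, order)


n_samples = 400  # hypothetical study size
n_snps = 100     # hypothetical number of genotyped SNPs

for order in range(1, 6):
    cells = genotype_cells(order)
    print(
        f"order={order} cells={cells} "
        f"samples_per_cell={n_samples / cells:.1f} "
        f"combinations={combinations_to_test(n_snps, order)}"
    )
```

Under these assumptions the average number of samples per cell falls below two by order 5, while the number of combinations to examine grows into the tens of millions. That illustrates both the sparseness problem for parametric models and why the excerpt in the conclusion says an exhaustive search has to give way to RFs or stochastic optimisation at higher orders.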

Outline of the main text:

1. Introduction

2. Optimisation and Evolution

3. Neural Networks

4. Cellular Automata

5. Random Forest

6. Multifactor Dimensionality Reduction

7. A Flexible Strategy for Data Mining and Knowledge Discovery

8. Conclusion

Excerpts from the main text: