This is a master's thesis from Iowa State University, USA (author: Riley Mitchell Mcdowell), 50 pages in total.
Reduced costs for DNA marker technology have generated a huge amount of molecular data and made it economically feasible to generate dense genome-wide marker maps of lines in a breeding program. Increased data density and volume have driven an exploration of tools and techniques to analyze these data for cultivar improvement. Data science theory and application have experienced a resurgence of research into techniques to detect or "learn" patterns in noisy data in a variety of technical applications. Several variants of machine learning have been proposed for analyzing large DNA marker data sets to aid in phenotype prediction and genomic selection.
Here, we present a review of the genomic prediction and machine learning literature. We apply deep learning techniques from machine learning research to six phenotypic prediction tasks using published reference datasets. Because regularization frequently improves neural network prediction accuracy, we included regularization methods in the neural network models. The neural network models are compared to a selection of regularized Bayesian and linear regression techniques commonly employed for phenotypic prediction and genomic selection. On three of the phenotype prediction tasks, regularized neural networks were the most accurate of the models evaluated. Surprisingly, for these data sets, the depth of the network architecture did not affect the accuracy of the trained model. We also find that concerns about the computer processing time needed to train neural network models to perform well in genomic prediction tasks may not apply when Graphics Processing Units are used for model training.
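The comparison described in the abstract, a regularized neural network against a regularized linear baseline on a marker matrix, can be illustrated with a small sketch. This is not the thesis code (that is in Appendices B and C). Everything below is an assumption chosen for illustration:

- The data are simulated, not the published reference datasets. Markers are coded 0/1/2 with 10 hypothetical QTL of equal effect.
- Ridge regression stands in for the regularized linear models.
- An L2 weight penalty on a one-hidden-layer tanh network stands in for "regularization methods".
- Accuracy is measured as the Pearson correlation between predicted and observed phenotypes on held-out lines.
- The sizes, penalties, learning rate, and epoch count are arbitrary.

```python
import numpy as np


def simulate(n=400, p=100, n_qtl=10, noise_sd=1.0, seed=0):
    """Hypothetical data: 0/1/2 marker genotypes and an additive phenotype."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
    beta = np.zeros(p)
    beta[rng.choice(p, size=n_qtl, replace=False)] = 1.0  # assumed QTL effects
    y = X @ beta + rng.normal(0.0, noise_sd, size=n)
    return X, y


def ridge_fit(X, y, lam):
    """Closed-form ridge regression (no intercept; inputs are centered): (X'X + lam*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def mlp_fit(X, y, hidden=32, lam=1e-3, lr=0.05, epochs=1000, seed=1):
    """One-hidden-layer tanh network trained by full-batch gradient descent with an L2 penalty."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(p), size=(p, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)  # hidden activations, shape (n, hidden)
        err = H @ W2 + b2 - y     # residuals of the linear output unit
        # Penalized loss: half mean squared error plus L2 penalty on the weights
        losses.append(0.5 * np.mean(err ** 2)
                      + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2)))
        d = err / n                               # dLoss/dyhat
        dH = np.outer(d, W2) * (1.0 - H ** 2)     # backprop through tanh (uses old W2)
        W2 = W2 - lr * (H.T @ d + lam * W2)
        b2 = b2 - lr * d.sum()
        W1 = W1 - lr * (X.T @ dH + lam * W1)
        b1 = b1 - lr * dH.sum(axis=0)
    return (W1, b1, W2, b2), losses


def mlp_predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2


X, y = simulate()
n_train = 320
Xc = X - X[:n_train].mean(axis=0)                # center markers with training means
ys = (y - y[:n_train].mean()) / y[:n_train].std()  # standardize phenotype
Xtr, Xte = Xc[:n_train], Xc[n_train:]
ytr, yte = ys[:n_train], ys[n_train:]

beta_hat = ridge_fit(Xtr, ytr, lam=10.0)
r_ridge = np.corrcoef(Xte @ beta_hat, yte)[0, 1]

params, losses = mlp_fit(Xtr, ytr)
r_mlp = np.corrcoef(mlp_predict(params, Xte), yte)[0, 1]
print(f"ridge r = {r_ridge:.3f}, MLP r = {r_mlp:.3f}")
```

Ridge is solved in closed form because its penalized objective is quadratic. The network has no closed form and is trained iteratively. That iterative training is where processing time becomes a concern. This sketch runs on the CPU with NumPy and does not reproduce the GPU training discussed in the thesis.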
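1 Introduction
1.1 Overview
1.2 Thesis organization
1.3 Literature review
2 Genomic prediction using deep neural networks
2.1 Abstract
2.2 Introduction
2.3 Materials and methods
2.4 Results and discussion
3 Conclusions
3.1 General discussion
3.2 Future research directions
Appendix A Raw data
Appendix B Analysis code
Appendix C Software code for this thesis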