An Improved k-Nearest Neighbor (KNN) Algorithm for the Haplotype Reconstruction Problem
DOI: 10.12677/csa.2012.21004, PP. 17-21
Keywords: Single Nucleotide Polymorphism (SNP); Haplotype Reconstruction (Haplotype Assembly); Clustering; k-Nearest Neighbor (KNN) Algorithm; Particle Swarm Optimization (PSO)
Abstract:
Single nucleotide polymorphisms (SNPs) are single base-pair positions in genomic DNA at which different nucleotide variants exist within a population, and they are considered the most frequent form of human genetic variation. Haplotypes, which are composed of SNPs, have been found to carry more genetic information than individual SNPs, so studying haplotypes plays an important role in disease diagnosis and drug development. Haplotype reconstruction assembles the SNP fragments obtained from short genome fragments in order to recover the original pair of haplotypes. This paper proposes a clustering algorithm for the haplotype reconstruction problem that is based on the k-nearest neighbor (KNN) and particle swarm optimization (PSO) algorithms. The proposed algorithm is tested on simulated data and on real biological data, and the results show that it is feasible.
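To make the clustering view of haplotype reconstruction concrete, the following minimal Python sketch shows how a pair of haplotypes can be read off once the SNP fragments have been split into two clusters. It is not the KNN/PSO algorithm of the paper, whose details the abstract does not give. The 0/1 allele encoding, the "-" gap symbol, the majority-vote consensus, and the names `consensus`, `distance`, and `reconstruct` are all assumptions made for illustration.

```python
from typing import List, Tuple

GAP = "-"  # assumed symbol for an SNP site that a fragment does not cover


def consensus(fragments: List[str]) -> str:
    """Majority-vote haplotype for one cluster of aligned fragments."""
    if not fragments:
        return ""
    haplotype = []
    for j in range(len(fragments[0])):
        ones = sum(1 for f in fragments if f[j] == "1")
        zeros = sum(1 for f in fragments if f[j] == "0")
        if ones == 0 and zeros == 0:
            haplotype.append(GAP)  # no fragment covers this site
        else:
            haplotype.append("1" if ones > zeros else "0")  # ties go to "0"
    return "".join(haplotype)


def distance(fragment: str, haplotype: str) -> int:
    """Number of covered sites where the fragment disagrees with the haplotype."""
    return sum(
        1
        for a, b in zip(fragment, haplotype)
        if a != GAP and b != GAP and a != b
    )


def reconstruct(fragments: List[str], labels: List[int]) -> Tuple[str, str, int]:
    """Build two haplotypes from a given 2-clustering and count the mismatches."""
    groups: Tuple[List[str], List[str]] = ([], [])
    for fragment, label in zip(fragments, labels):
        groups[label].append(fragment)
    haplotypes = (consensus(groups[0]), consensus(groups[1]))
    errors = sum(
        distance(fragment, haplotypes[label])
        for fragment, label in zip(fragments, labels)
    )
    return haplotypes[0], haplotypes[1], errors


if __name__ == "__main__":
    # Hypothetical toy data: six aligned fragments over four SNP sites.
    frags = ["0100", "0110", "-110", "10-1", "1001", "-001"]
    labels = [0, 0, 0, 1, 1, 1]
    print(reconstruct(frags, labels))
```

In this sketch the cluster labels are supplied by hand. A clustering method such as the one proposed in the paper would be responsible for choosing them, for example so that the mismatch count is small.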