利用python处理dna序列_DNA序列比对,Needleman-Wunsch算法的python实现

此博客介绍了使用动态规划方法计算NW Distance(Needleman-Wunsch距离),用于比较两个生物序列的相似性。通过种子序列和候选序列的匹配、插入和删除操作来确定最短距离。算法详细步骤包括构建动态规划表格并根据规则更新。最后返回的是两个序列的编辑距离作为相似度指标。
摘要由CSDN通过智能技术生成

def NWDistance(seedSequence, candidateSequence):

s = -1 # a mismatch would deduce 1 point.

m = 1 # plus 1 point for one match.

g = -3 # deduce 2 point for one gap.

seedSequence = seedSequence.strip()

candidateSequence = candidateSequence.strip()

if len(seedSequence) == 0:

print "Error, seed sequence length equal zero."

sys.exit(1)

elif len(candidateSequence) == 0:

print "Error, candidate sequence length equal zero."

sys.exit(1)

sLen = len(seedSequence)

cLen = len(candidateSequence)

table = []

for m in range(0, len(seedSequence) + 1):

table.append([m * g])

table[0] = []

for n in range(0, len(candidateSequence) + 1):

table[0].append(n * g)

for i in range(sLen):

for j in range(cLen):

table[i + 1].append(

max(

table[i][j] + (m if seedSequence[i] == candidateSequence[j] else s),

table[i][j + 1] + g,

table[i + 1][j] + g,

)

)

#

i = sLen - 1

j = cLen - 1

NewSeed = seedSequence[i]

NewCandidate = candidateSequence[j]

if len(seedSequence) <= 1 or len(candidateSequence) <= 1:

print "Error, too short!"

sys.exit(1)

while True:

if i == 0 and j == 0:

break

if seedSequence[i] == candidateSequence[j]:

if table[i - 1][j - 1] + 1 > table[i - 1][j] - 2 and table[i - 1][j - 1] + 1 > table[i][j - 1] - 2:

i = i - 1

j = j - 1

NewSeed = u"%s%s" % (seedSequence[i], NewSeed)

NewCandidate = u"%s%s" % (candidateSequence[j], NewCandidate)

else:

if table[i][j + 1] > table[i + 1][j]:

i = i - 1

NewSeed = u"%s%s" % (seedSequence[i], NewSeed)

NewCandidate = u"%s%s" % ('-', NewCandidate)

else:

j = j - 1

NewSeed = u"%s%s" % ('-', NewSeed)

NewCandidate = u"%s%s" % (candidateSequence[j], NewCandidate)

else:

if table[i - 1][j - 1] + 1 > table[i - 1][j] - 2 and table[i - 1][j - 1] + 1 > table[i][j - 1] - 2:

i = i - 1

j = j - 1

NewSeed = u"%s%s" % (seedSequence[i], NewSeed)

NewCandidate = u"%s%s" % (candidateSequence[j], NewCandidate)

else:

if table[i][j + 1] > table[i + 1][j]:

i = i - 1

NewSeed = u"%s%s" % (seedSequence[i], NewSeed)

NewCandidate = u"%s%s" % ('-', NewCandidate)

else:

j = j - 1

NewSeed = u"%s%s" % ('-', NewSeed)

NewCandidate = u"%s%s" % (candidateSequence[j], NewCandidate)

print NewSeed

print NewCandidate

# distance

mismath = 0

math = 0

gap = 0

charZipList = zip(NewSeed, NewCandidate)

# delete the head gap

for n in range(len(charZipList)):

if "-" in charZipList[n]:

del charZipList[0]

else:

break

# delete the tail gap

while True:

lastTuple = charZipList.pop()

if "-" in lastTuple:

continue

else:

charZipList.append(lastTuple)

break

#

for n in range(len(charZipList)):

charTuple = charZipList[n]

if charTuple[0] == charTuple[1]:

math += 1

elif "-" in charTuple:

gapLoc = charTuple.index("-")

if charZipList[n + 1][gapLoc] == "-":

continue

else:

gap += 1

else:

mismath += 1

distance = round(1.0 - float(math) / float(mismath + math + gap), 4)

return distance

全局比对是一种常用的序列比对方法,它可以比对两个序列的整个长度,并给出它们的相似程度。其中 Needleman-Wunsch 是一种经典的全局比对算法Needleman-Wunsch 算法的本质是动态规划,它将全局比对的问题拆分成若干个子问题,并通过计算每个子问题的得分来求解最终的比对结果。具体实现步骤如下: 1. 初始化得分矩阵(score matrix)并计算第一行和第一列的得分; 2. 逐行逐列计算得分矩阵中的每个元素的得分,根据得分矩阵中每个元素的左上、上、左三个元素的得分值,计算当前元素的得分; 3. 回溯得分矩阵,得到最优比对结果。 以下是 Needleman-Wunsch 算法Python 实现: ```python # Needleman-Wunsch 算法实现 import numpy as np def needleman_wunsch(seq1, seq2, match_score=1, mismatch_score=-1, gap_score=-1): # 初始化得分矩阵 m, n = len(seq1), len(seq2) score_matrix = np.zeros((m+1, n+1)) for i in range(1, m+1): score_matrix[i,0] = score_matrix[i-1,0] + gap_score for j in range(1, n+1): score_matrix[0,j] = score_matrix[0,j-1] + gap_score # 填充得分矩阵 for i in range(1, m+1): for j in range(1, n+1): score1 = score_matrix[i-1,j-1] + (match_score if seq1[i-1] == seq2[j-1] else mismatch_score) score2 = score_matrix[i-1,j] + gap_score score3 = score_matrix[i,j-1] + gap_score score_matrix[i,j] = max(score1, score2, score3) # 回溯得分矩阵,得到最优比对结果 align1, align2 = '', '' i, j = m, n while i > 0 and j > 0: score, score1, score2, score3 = score_matrix[i,j], score_matrix[i-1,j-1], score_matrix[i-1,j], score_matrix[i,j-1] if score == score1 + (match_score if seq1[i-1] == seq2[j-1] else mismatch_score): align1 = seq1[i-1] + align1 align2 = seq2[j-1] + align2 i, j = i-1, j-1 elif score == score2 + gap_score: align1 = seq1[i-1] + align1 align2 = '-' + align2 i -= 1 else: align1 = '-' + align1 align2 = seq2[j-1] + align2 j -= 1 while i > 0: align1 = seq1[i-1] + align1 align2 = '-' + align2 i -= 1 while j > 0: align1 = '-' + align1 align2 = seq2[j-1] + align2 j -= 1 return align1, align2 ``` 在上面的代码中,我们使用了 `numpy` 库来创建得分矩阵,这使得代码更加简洁和高效。在实现中,我们还可以通过调整 `match_score`、`mismatch_score` 和 `gap_score` 的值来控制比对的结果。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值