元素比对_序列比对算法简单实现

最新推荐文章于 2022-02-09 16:57:33 发布

AAAsuan

最新推荐文章于 2022-02-09 16:57:33 发布

阅读量244

点赞数

文章标签：元素比对

本文链接：https://blog.csdn.net/weixin_32047493/article/details/112593834

版权

因为我们主要是对矩阵进行操作，首先导入numpy包：

import numpy as np

输入两段需要比对的序列：

sequence_1_list = input("输入序列1，以逗号间隔：")sequence_2_list = input("输入序列2，以逗号间隔：")sequence_1_list = sequence_1_list.split(",")sequence_2_list = sequence_2_list.split(",")x_length = int(len(sequence_1_list))y_length = int(len(sequence_2_list))

我们导入序列的目的有两个：其一是按照序列长度建立合适大小的矩阵，二是把序列中元素分开并一一比对，从而按照一定的规则填满我们的矩阵。以逗号为间隔的目的是在之后将输入的字符串转化为列表，从而能够对每个元素单独比对。

建立我们的矩阵，此时矩阵中为随机数，也就相当于矩阵中的数字未填写:

compare_matrix = np.random.rand((y_length + 1) * (x_length + 1)).reshape(y_length + 1, x_length + 1)

然后我们进行矩阵数值填充规则的设定，这些打分规则是人为规定的：

score_12 = int(input("请输入AC mismatch score："))score_13 = int(input("请输入AG mismatch score："))score_14 = int(input("请输入AT mismatch score："))score_23 = int(input("请输入CG mismatch score："))score_24 = int(input("请输入CT mismatch score："))score_34 = int(input("请输入GT mismatch score："))score_gap = int(input("请输入gap mismatch score："))score_match = int(input("请输入match score："))

这些规则包括以下几类：

match score --匹配情况下打分规则

gap mismatch score --跳过一个碱基的打分规则

base mismatch score --碱基不匹配的打分规则

然后我们进行第0行第0列位置的填写，这个位置一般固定为0：

compare_matrix[0, 0] = 0

然后将橙色部分的第0行和第0列全部填补完，平移是按gap match score规则进行：

for x in range(1, x_length + 1, 1):    compare_matrix[0, x] = compare_matrix[0, x - 1] + score_gapfor y in range(1, y_length + 1, 1):    compare_matrix[y, 0] = compare_matrix[y - 1, 0] + score_gap

然后我们就可以通过评分规则对蓝色区域进行填写，填写规则如下，黄色是我们要填补的矩阵区域，蓝色是我们已经填补好的区域，平移方向(down or right)我们只能进行gap，而斜向(oblique)我们则可以判定match还是base mismatch，并根据相应情况进行分数加和，三个方向同时进行传播，选取最大数值作为我们黄色填补部分的取值。

for y in range(1, y_length + 1, 1):    for x in range(1, x_length + 1, 1):        # 定义移动方式计算        move_down = compare_matrix[y - 1, x] + score_gap        move_right = compare_matrix[y, x - 1] + score_gap        # move_oblique        if sequence_1_list[x - 1] == sequence_2_list[y - 1]:            move_oblique = compare_matrix[y - 1, x - 1] + score_match        elif sequence_1_list[x - 1] == "a":            if sequence_2_list[y - 1] == "c":                move_oblique = compare_matrix[y - 1, x - 1] + score_12            elif sequence_2_list[y - 1] == "g":                move_oblique = compare_matrix[y - 1, x - 1] + score_13            elif sequence_2_list[y - 1] == "t":                move_oblique = compare_matrix[y - 1, x - 1] + score_14        elif sequence_1_list[x - 1] == "c":            if sequence_2_list[y - 1] == "a":                move_oblique = compare_matrix[y - 1, x - 1] + score_12            elif sequence_2_list[y - 1] == "g":                move_oblique = compare_matrix[y - 1, x - 1] + score_23            elif sequence_2_list[y - 1] == "t":                move_oblique = compare_matrix[y - 1, x - 1] + score_24        elif sequence_1_list[x - 1] == "g":            if sequence_2_list[y - 1] == "a":                move_oblique = compare_matrix[y - 1, x - 1] + score_13            elif sequence_2_list[y - 1] == "c":                move_oblique = compare_matrix[y - 1, x - 1] + score_23            elif sequence_2_list[y - 1] == "t":                move_oblique = compare_matrix[y - 1, x - 1] + score_34        elif sequence_1_list[x - 1] == "t":            if sequence_2_list[y - 1] == "a":                move_oblique = compare_matrix[y - 1, x - 1] + score_14            elif sequence_2_list[y - 1] == "c":                move_oblique = compare_matrix[y - 1, x - 1] + score_24            elif sequence_2_list[y - 1] == "g":                move_oblique = compare_matrix[y - 1, x - 1] + score_34        compare_matrix[y, x] = max(move_down, move_right, move_oblique)

最后进行结果输出及用户交互实现：

print('=' * 50)print("请检查下列数据，并默念三声zzh最帅！")print("sequence_1:", sequence_1_list)print("sequence_2:", sequence_2_list)print("AC mismatch score:", score_12)print("AG mismatch score:", score_13)print("AT mismatch score:", score_14)print("CG mismatch score:", score_23)print("CT mismatch score:", score_24)print("GT mismatch score:", score_34)print("gap score:", score_gap, "match score:", score_match)print("-" * 50)print("请仔细检查上述数据，如果正确，则评分矩阵如下：")print(compare_matrix)print("=" * 50)

输出：