因为我们主要是对矩阵进行操作,首先导入numpy包:
import numpy as np
输入两段需要比对的序列:
sequence_1_list = input("输入序列1,以逗号间隔:")sequence_2_list = input("输入序列2,以逗号间隔:")sequence_1_list = sequence_1_list.split(",")sequence_2_list = sequence_2_list.split(",")x_length = int(len(sequence_1_list))y_length = int(len(sequence_2_list))
我们导入序列的目的有两个:其一是按照序列长度建立合适大小的矩阵,二是把序列中元素分开并一一比对,从而按照一定的规则填满我们的矩阵。以逗号为间隔的目的是在之后将输入的字符串转化为列表,从而能够对每个元素单独比对。
建立我们的矩阵,此时矩阵中为随机数,也就相当于矩阵中的数字未填写:
compare_matrix = np.random.rand((y_length + 1) * (x_length + 1)).reshape(y_length + 1, x_length + 1)
然后我们进行矩阵数值填充规则的设定,这些打分规则是人为规定的:
score_12 = int(input("请输入AC mismatch score:"))score_13 = int(input("请输入AG mismatch score:"))score_14 = int(input("请输入AT mismatch score:"))score_23 = int(input("请输入CG mismatch score:"))score_24 = int(input("请输入CT mismatch score:"))score_34 = int(input("请输入GT mismatch score:"))score_gap = int(input("请输入gap mismatch score:"))score_match = int(input("请输入match score:"))
这些规则包括以下几类:
match score --匹配情况下打分规则
gap mismatch score --跳过一个碱基的打分规则
base mismatch score --碱基不匹配的打分规则
然后我们进行第0行第0列位置的填写,这个位置一般固定为0:
compare_matrix[0, 0] = 0
然后将橙色部分的第0行和第0列全部填补完,平移是按gap match score规则进行:
for x in range(1, x_length + 1, 1): compare_matrix[0, x] = compare_matrix[0, x - 1] + score_gapfor y in range(1, y_length + 1, 1): compare_matrix[y, 0] = compare_matrix[y - 1, 0] + score_gap
然后我们就可以通过评分规则对蓝色区域进行填写,填写规则如下,黄色是我们要填补的矩阵区域,蓝色是我们已经填补好的区域,平移方向(down or right)我们只能进行gap,而斜向(oblique)我们则可以判定match还是base mismatch,并根据相应情况进行分数加和,三个方向同时进行传播,选取最大数值作为我们黄色填补部分的取值。
for y in range(1, y_length + 1, 1): for x in range(1, x_length + 1, 1): # 定义移动方式计算 move_down = compare_matrix[y - 1, x] + score_gap move_right = compare_matrix[y, x - 1] + score_gap # move_oblique if sequence_1_list[x - 1] == sequence_2_list[y - 1]: move_oblique = compare_matrix[y - 1, x - 1] + score_match elif sequence_1_list[x - 1] == "a": if sequence_2_list[y - 1] == "c": move_oblique = compare_matrix[y - 1, x - 1] + score_12 elif sequence_2_list[y - 1] == "g": move_oblique = compare_matrix[y - 1, x - 1] + score_13 elif sequence_2_list[y - 1] == "t": move_oblique = compare_matrix[y - 1, x - 1] + score_14 elif sequence_1_list[x - 1] == "c": if sequence_2_list[y - 1] == "a": move_oblique = compare_matrix[y - 1, x - 1] + score_12 elif sequence_2_list[y - 1] == "g": move_oblique = compare_matrix[y - 1, x - 1] + score_23 elif sequence_2_list[y - 1] == "t": move_oblique = compare_matrix[y - 1, x - 1] + score_24 elif sequence_1_list[x - 1] == "g": if sequence_2_list[y - 1] == "a": move_oblique = compare_matrix[y - 1, x - 1] + score_13 elif sequence_2_list[y - 1] == "c": move_oblique = compare_matrix[y - 1, x - 1] + score_23 elif sequence_2_list[y - 1] == "t": move_oblique = compare_matrix[y - 1, x - 1] + score_34 elif sequence_1_list[x - 1] == "t": if sequence_2_list[y - 1] == "a": move_oblique = compare_matrix[y - 1, x - 1] + score_14 elif sequence_2_list[y - 1] == "c": move_oblique = compare_matrix[y - 1, x - 1] + score_24 elif sequence_2_list[y - 1] == "g": move_oblique = compare_matrix[y - 1, x - 1] + score_34 compare_matrix[y, x] = max(move_down, move_right, move_oblique)
最后进行结果输出及用户交互实现:
print('=' * 50)print("请检查下列数据,并默念三声zzh最帅!")print("sequence_1:", sequence_1_list)print("sequence_2:", sequence_2_list)print("AC mismatch score:", score_12)print("AG mismatch score:", score_13)print("AT mismatch score:", score_14)print("CG mismatch score:", score_23)print("CT mismatch score:", score_24)print("GT mismatch score:", score_34)print("gap score:", score_gap, "match score:", score_match)print("-" * 50)print("请仔细检查上述数据,如果正确,则评分矩阵如下:")print(compare_matrix)print("=" * 50)
输出: