Problem
An affine gap penalty is written as $a + b \cdot (L - 1)$, where $L$ is the length of the gap, $a$ is a positive constant called the gap opening penalty, and $b$ is a positive constant called the gap extension penalty.
We can view the gap opening penalty as charging for the first gap symbol, and the gap extension penalty as charging for each subsequent symbol added to the gap.
For example, if $a = 11$ and $b = 1$, then a gap of length 1 would be penalized by 11 (for an average cost of 11 per gap symbol), whereas a gap of length 100 would be penalized by 110 (for an average cost of 1.10 per gap symbol).
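The arithmetic above can be checked with a short sketch. The function name `affine_gap_penalty` and its parameter names are illustrative choices, not part of the problem statement; the formula assumed is opening penalty plus extension penalty times (length - 1), which matches both figures quoted above.

```python
def affine_gap_penalty(length: int, gap_open: int = 11, gap_extend: int = 1) -> int:
    """Cost of one contiguous gap of `length` symbols (hypothetical helper)."""
    if length < 1:
        raise ValueError("a gap must contain at least one symbol")
    # The first symbol pays the opening penalty, every later one the extension penalty.
    return gap_open + gap_extend * (length - 1)


for length in (1, 100):
    cost = affine_gap_penalty(length)
    print(length, cost, cost / length)  # gap length, total cost, average cost per symbol
```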
Consider the strings "PRTEINS" and "PRTWPSEIN". If we use the BLOSUM62 scoring matrix and an affine gap penalty with $a = 11$ and $b = 1$, then we obtain the following optimal alignment.
PRT---EINS
|||   |||
PRTWPSEIN-
Matched symbols contribute a total of 32 to the calculation of the alignment's score, and the gaps cost 13 and 11 respectively, yielding a total score of 8.
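As a check on this bookkeeping, the sketch below scores an already-aligned pair of strings column by column. The names `score_alignment` and `example_score` are hypothetical. Only the six BLOSUM62 diagonal entries that this alignment touches are included, so the lookup deliberately fails on any mismatched pair instead of guessing a value.

```python
def score_alignment(top, bottom, score, gap_open=11, gap_extend=1):
    """Score two equal-length gapped strings under an affine gap penalty."""
    assert len(top) == len(bottom), "aligned strings must have equal length"
    total = 0
    prev_gap_row = None  # which row held '-' in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            # Continuing a gap in the same row costs the extension penalty;
            # anything else starts a new gap and costs the opening penalty.
            total -= gap_extend if row == prev_gap_row else gap_open
            prev_gap_row = row
        else:
            total += score(a, b)
            prev_gap_row = None
    return total


# BLOSUM62 diagonal entries for the residues matched in this example only.
BLOSUM62_DIAGONAL_SUBSET = {"P": 7, "R": 5, "T": 5, "E": 5, "I": 4, "N": 6}


def example_score(a, b):
    if a != b:
        raise KeyError(f"no entry for mismatched pair {a}/{b} in this subset")
    return BLOSUM62_DIAGONAL_SUBSET[a]


print(score_alignment("PRT---EINS", "PRTWPSEIN-", example_score))  # prints 8
```

The matches contribute 7 + 5 + 5 + 5 + 4 + 6 = 32, the three-symbol gap costs 11 + 1 + 1 = 13, and the single-symbol gap costs 11.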
Given: Two protein strings $s$ and $t$ in FASTA format (each of length at most 100 aa).
Return: The maximum alignment score between $s$ and $t$, followed by two augmented strings $s'$ and $t'$ representing an optimal alignment of $s$ and $t$. Use:
- The BLOSUM62 scoring matrix.
- Gap opening penalty equal to 11.
- Gap extension penalty equal to 1.
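One common way to approach this task is a three-state dynamic program (often attributed to Gotoh), sketched below under stated assumptions. The function name `affine_global_align`, the state labels `M`, `X`, `Y`, and the `score` callback are all illustrative. The demo uses a toy match/mismatch scheme rather than BLOSUM62, which has to be supplied separately, so its result should not be expected to equal the sample output.

```python
NEG_INF = float("-inf")


def affine_global_align(s, t, score, gap_open=11, gap_extend=1):
    """Return (best score, aligned s, aligned t) for a global affine-gap alignment."""
    n, m = len(s), len(t)

    def new_table():
        return [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    # M: last column pairs s[i-1] with t[j-1]
    # X: last column pairs s[i-1] with '-'   (gap in t)
    # Y: last column pairs '-' with t[j-1]   (gap in s)
    M, X, Y = new_table(), new_table(), new_table()
    M[0][0] = 0
    table = {"M": M, "X": X, "Y": Y}

    def candidates(state, i, j):
        """(value, predecessor state) options for cell (i, j) of `state`."""
        if state == "M":
            sub = score(s[i - 1], t[j - 1])
            return [(table[k][i - 1][j - 1] + sub, k) for k in "MXY"]
        if state == "X":
            return [(M[i - 1][j] - gap_open, "M"),
                    (X[i - 1][j] - gap_extend, "X"),
                    (Y[i - 1][j] - gap_open, "Y")]
        return [(M[i][j - 1] - gap_open, "M"),
                (X[i][j - 1] - gap_open, "X"),
                (Y[i][j - 1] - gap_extend, "Y")]

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                M[i][j] = max(candidates("M", i, j))[0]
            if i > 0:
                X[i][j] = max(candidates("X", i, j))[0]
            if j > 0:
                Y[i][j] = max(candidates("Y", i, j))[0]

    # Traceback: re-derive the best predecessor of each visited cell.
    state = max("MXY", key=lambda k: table[k][n][m])
    best = table[state][n][m]
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        _, prev = max(candidates(state, i, j))
        if state == "M":
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif state == "X":
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
        state = prev
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def toy_score(a, b):
    """Assumed stand-in for a real substitution matrix: +5 match, -4 mismatch."""
    return 5 if a == b else -4


result = affine_global_align("PRTEINS", "PRTWPSEIN", toy_score)
print(*result, sep="\n")
```

To reproduce the sample, `score` would need to look up real BLOSUM62 entries. The tables hold O(nm) cells each, which is comfortable for strings of at most 100 residues.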
Sample Dataset
>Rosalind_49
PRTEINS
>Rosalind_47
PRTWPSEIN
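Reading input of this shape can be sketched as follows. `parse_fasta` is a hypothetical helper that assumes each record starts with a `>` header line and that sequence data may wrap over several lines.

```python
def parse_fasta(text):
    """Return a list of (label, sequence) pairs from FASTA-formatted text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append((line[1:], []))  # start a new record
        elif records:
            records[-1][1].append(line)  # sequence may span several lines
    return [(label, "".join(parts)) for label, parts in records]


sample = ">Rosalind_49\nPRTEINS\n>Rosalind_47\nPRTWPSEIN\n"
(_, s), (_, t) = parse_fasta(sample)
print(s, t)  # prints: PRTEINS PRTWPSEIN
```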
Sample Output
8
PRT---EINS
PRTWPSEIN-