Problem
An alignment of two strings $s$ and $t$ is defined by two strings $s'$ and $t'$ satisfying the following three conditions:
1. $s'$ and $t'$ must be formed by adding gap symbols "-" to $s$ and $t$, respectively; as a result, $s$ and $t$ will form subsequences of $s'$ and $t'$.
2. $s'$ and $t'$ must have the same length.
3. Two gap symbols may not be aligned; that is, if $s'[j]$ is a gap symbol, then $t'[j]$ cannot be a gap symbol, and vice versa.
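The three conditions can be checked mechanically. The sketch below is illustrative only. The function name `is_alignment`, the parameter names `s_aug`/`t_aug` (standing for $s'$ and $t'$) and the `GAP` constant are hypothetical. It also assumes that `s` and `t` contain no "-" characters of their own, which holds for protein strings.

```python
GAP = "-"  # gap symbol used in the augmented strings


def is_alignment(s, t, s_aug, t_aug):
    """Return True if s_aug and t_aug form a valid alignment of s and t."""
    # Condition 1: deleting the gaps must give back exactly s and t
    # (assumes s and t themselves contain no GAP characters).
    if s_aug.replace(GAP, "") != s or t_aug.replace(GAP, "") != t:
        return False
    # Condition 2: both augmented strings have the same length.
    if len(s_aug) != len(t_aug):
        return False
    # Condition 3: no column pairs a gap with another gap.
    return not any(a == GAP and b == GAP for a, b in zip(s_aug, t_aug))


print(is_alignment("PRETTY", "PRTTEIN", "PRETTY--", "PR-TTEIN"))    # True
print(is_alignment("PRETTY", "PRTTEIN", "PRETTY---", "PR-TTEIN-"))  # False: last column is gap over gap
```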
We say that $s'$ and $t'$ augment $s$ and $t$. Writing $s'$ directly over $t'$ so that symbols are aligned provides us with a scenario for transforming $s$ into $t$. Mismatched non-gap symbols in the same column of $s'$ and $t'$ correspond to symbol substitutions; a gap symbol $s'[j]$ aligned with a non-gap symbol $t'[j]$ implies the insertion of the symbol $t'[j]$; a gap symbol $t'[j]$ aligned with a non-gap symbol $s'[j]$ implies the deletion of the symbol $s'[j]$ from $s$.
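Reading an alignment column by column as described above yields the list of edit operations. In the minimal sketch below, `edit_operations` is a hypothetical helper and "-" is assumed to be the gap symbol. It reports each costly column as a `(column, operation, s_symbol, t_symbol)` tuple and skips matching columns, because those require no edit.

```python
def edit_operations(s_aug, t_aug, gap="-"):
    """List the edit operations implied by each column of an alignment."""
    ops = []
    for j, (a, b) in enumerate(zip(s_aug, t_aug)):
        if a == b:
            continue  # matching symbols: no edit needed
        if a == gap:
            ops.append((j, "insert", a, b))  # b is inserted
        elif b == gap:
            ops.append((j, "delete", a, b))  # a is deleted from s
        else:
            ops.append((j, "substitute", a, b))  # a is replaced by b
    return ops


for op in edit_operations("PRETTY--", "PR-TTEIN"):
    print(op)
# (2, 'delete', 'E', '-')
# (5, 'substitute', 'Y', 'E')
# (6, 'insert', '-', 'I')
# (7, 'insert', '-', 'N')
```

The number of operations returned is exactly the number of mismatched columns, which is the score defined in the next paragraph.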
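Thus, an alignment represents a transformation of $s$ into $t$ via edit operations. We define the corresponding edit alignment score of $s'$ and $t'$ as $d_{\mathrm{H}}(s', t')$, the Hamming distance between the augmented strings. Hamming distance applies here because the gap symbol has been introduced to represent insertions and deletions, so $s'$ and $t'$ have equal length. It follows that $d_{\mathrm{E}}(s, t) = \min_{s', t'} d_{\mathrm{H}}(s', t')$, where the minimum is taken over all alignments of $s$ and $t$. We call such a minimum-score alignment an optimal alignment (with respect to edit distance).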
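One standard way to find an optimal alignment is the unit-cost edit-distance dynamic program (Levenshtein/Wagner-Fischer), followed by a traceback that rebuilds $s'$ and $t'$ column by column. The sketch below takes that approach. The function names `hamming` and `optimal_alignment` are hypothetical, and the traceback prefers substitution/match, then deletion, then insertion when several moves tie.

```python
def hamming(a, b):
    """Edit alignment score d_H: count of mismatched columns (gaps count as symbols)."""
    if len(a) != len(b):
        raise ValueError("augmented strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def optimal_alignment(s, t, gap="-"):
    """Return (d_E(s, t), s_aug, t_aug) using the unit-cost edit-distance DP."""
    m, n = len(s), len(t)
    # d[i][j] holds the edit distance between the prefixes s[:i] and t[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # delete all of s[:i]
    for j in range(1, n + 1):
        d[0][j] = j  # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + (s[i - 1] != t[j - 1]),  # match or substitution
                d[i - 1][j] + 1,                          # deletion: s[i-1] over a gap
                d[i][j - 1] + 1,                          # insertion: gap over t[j-1]
            )
    # Trace back from d[m][n], emitting alignment columns right to left.
    s_cols, t_cols = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            s_cols.append(s[i - 1])
            t_cols.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            s_cols.append(s[i - 1])
            t_cols.append(gap)
            i -= 1
        else:
            s_cols.append(gap)
            t_cols.append(t[j - 1])
            j -= 1
    return d[m][n], "".join(reversed(s_cols)), "".join(reversed(t_cols))


dist, s_aug, t_aug = optimal_alignment("PRETTY", "PRTTEIN")
print(dist)                             # 4
print(hamming(s_aug, t_aug) == dist)    # True
```

Every traceback step consumes at least one real symbol, so the result never places two gaps in the same column. Because ties can be broken in different ways, the returned pair may differ from another optimal alignment with the same score. For strings of at most 1000 aa the table has roughly $10^6$ cells, which is manageable in plain Python.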
Given: Two protein strings $s$ and $t$ in FASTA format (each string having length at most 1000 aa).
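Reading the input requires a small FASTA parser. The sketch below is hypothetical (`parse_fasta` is not a library function). It assumes the usual FASTA layout, in which each ">" header sits on its own line and is followed by one or more sequence lines that are concatenated.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (label, sequence) pairs."""
    records = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


sample = ">Rosalind_43\nPRETTY\n>Rosalind_97\nPRTTEIN\n"
(_, s), (_, t) = parse_fasta(sample)
print(s, t)  # PRETTY PRTTEIN
```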
Return: The edit distance $d_{\mathrm{E}}(s, t)$ followed by two augmented strings $s'$ and $t'$ representing an optimal alignment of $s$ and $t$.
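The answer consists of three lines: the distance, then $s'$, then $t'$. The hypothetical helper `format_answer` below renders them in that order. It also runs a consistency check: for an optimal alignment the reported distance must equal $d_{\mathrm{H}}(s', t')$. The check confirms only that the numbers agree. It does not prove that the alignment is optimal.

```python
def format_answer(distance, s_aug, t_aug):
    """Render d_E(s, t), s', t' on three lines after a consistency check."""
    if len(s_aug) != len(t_aug):
        raise ValueError("augmented strings must have equal length")
    score = sum(a != b for a, b in zip(s_aug, t_aug))  # d_H(s', t')
    if score != distance:
        raise ValueError(f"alignment scores {score}, not {distance}")
    return f"{distance}\n{s_aug}\n{t_aug}"


print(format_answer(4, "PRETTY--", "PR-TTEIN"))
# 4
# PRETTY--
# PR-TTEIN
```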
Sample Dataset
>Rosalind_43
PRETTY
>Rosalind_97
PRTTEIN
Sample Output
4
PRETTY--
PR-TTEIN