Problem
Figure 1. Global, local, and fitting alignments of strings v = GTAGGCTTAAGGTTA and w = TAGATA with respect to mismatch score. Note that in the fitting alignment, a substring of v must be aligned against all of w. Taken from Jones & Pevzner, An Introduction to Bioinformatics Algorithms
Given a string s and a motif t, an alignment of a substring of s against all of t is called a fitting alignment. Our aim is to find a substring s' of s that maximizes an alignment score with respect to t.
Note that more than one such substring of s may exist, depending on the particular strings and alignment score used. One candidate scoring function is the one derived from edit distance; in this problem, we will consider a slightly different alignment score, in which every matched symbol counts as +1 and every mismatched symbol (including insertions and deletions) receives a cost of -1. Let's call this scoring function the mismatch score. See Figure 1 for a comparison of global, local, and fitting alignments with respect to mismatch score.
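To make the mismatch score concrete, here is a minimal sketch that scores an alignment already written as two gapped rows of equal length. The function name mismatch_score and the use of '-' as the gap symbol are assumptions made for illustration; they are not part of the problem statement.

```python
def mismatch_score(row_a, row_b, gap="-"):
    """Score two gapped alignment rows: +1 per matched column, -1 otherwise.

    A column counts as a match only when both symbols are equal and
    neither is the gap symbol; substitutions, insertions, and deletions
    all cost -1.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    score = 0
    for a, b in zip(row_a, row_b):
        if a == b and a != gap:
            score += 1
        else:
            score -= 1
    return score


if __name__ == "__main__":
    # 4 matched columns, 1 substitution, 1 gap: 4 - 2 = 2
    print(mismatch_score("ACGT-A", "ACCTTA"))
```

Under this scoring, a column pairing a symbol with a gap is treated exactly like a substitution.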
Given: Two DNA strings s and t, where s has length at most 10 kbp and t represents a motif of length at most 1 kbp.
Return: An optimal fitting alignment score with respect to the mismatch score defined above, followed by an optimal fitting alignment of a substring of s against t. If multiple such alignments exist, then you may output any one.
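One common way to approach this task is a dynamic program in which the substring of s may start and end anywhere at no cost, while every symbol of t must be consumed. The following is a sketch under that assumption, not a reference solution; the name fitting_alignment and the tie-breaking order (diagonal, then up, then left) are choices made here for illustration.

```python
def fitting_alignment(s, t):
    """Return (score, gapped substring of s, gapped t) under mismatch score."""
    n, m = len(s), len(t)
    DIAG, UP, LEFT = 0, 1, 2
    # back[i][j] records the move that produced the best score at cell (i, j).
    back = [bytearray(m + 1) for _ in range(n + 1)]
    # Row 0: t[:j] aligned against nothing costs j gaps.
    prev = [-j for j in range(m + 1)]
    for j in range(1, m + 1):
        back[0][j] = LEFT
    best_score, best_i = prev[m], 0
    for i in range(1, n + 1):
        # Column 0 stays 0: the substring of s may start anywhere for free.
        cur = [0] * (m + 1)
        row_back = back[i]
        ch = s[i - 1]
        for j in range(1, m + 1):
            diag = prev[j - 1] + (1 if ch == t[j - 1] else -1)
            up = prev[j] - 1        # s[i-1] aligned to a gap
            left = cur[j - 1] - 1   # t[j-1] aligned to a gap
            if diag >= up and diag >= left:
                cur[j], row_back[j] = diag, DIAG
            elif up >= left:
                cur[j], row_back[j] = up, UP
            else:
                cur[j], row_back[j] = left, LEFT
        # The substring may end anywhere: take the best value in the last column.
        if cur[m] > best_score:
            best_score, best_i = cur[m], i
        prev = cur
    # Trace back from the best end point until all of t has been consumed.
    i, j = best_i, m
    top, bottom = [], []
    while j > 0:
        move = back[i][j]
        if move == DIAG:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif move == UP:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return best_score, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, row_s, row_t = fitting_alignment("AAACGTAAA", "CGT")
    print(score)
    print(row_s)
    print(row_t)
```

The sketch keeps only two score rows plus one byte per cell for traceback, so at the stated limits the traceback table takes roughly 10 MB. The pure-Python double loop will be slow on full-size inputs. Because any optimal alignment is accepted, the rows returned may differ from a published sample alignment while having the same score.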
Sample Dataset
>Rosalind_54
GCAAACCATAAGCCCTACGTGCCGCCTGTTTAAACTCGCGAACTGAATCTTCTGCTTCACGGTGAAAGTACCACAATGGTATCACACCCCAAGGAAAC
>Rosalind_46
GCCGTCAGGCTGGTGTCCG
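The dataset is given in FASTA format. A minimal reader might look like the sketch below, assuming the usual layout in which each record has a header line beginning with > followed by one or more sequence lines. The name parse_fasta is hypothetical.

```python
def parse_fasta(text):
    """Return a list of (label, sequence) pairs in file order."""
    records = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:], []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


if __name__ == "__main__":
    sample = ">Rosalind_1\nACGT\nTTGA\n>Rosalind_2\nGGC\n"
    for name, seq in parse_fasta(sample):
        print(name, seq)
```

The first record is then taken as s and the second as t.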
Sample Output
5
ACCATAAGCCCTACGTG-CCG
GCCGTCAGGC-TG-GTGTCCG