Rosalind Problem 102: Finding a Motif with Modifications

Problem

Figure 1. Global, local, and fitting alignments of strings v = GTAGGCTTAAGGTTA and w = TAGATA with respect to the mismatch score. Note that in the fitting alignment, a substring of v must be aligned against all of w. Taken from Jones & Pevzner, An Introduction to Bioinformatics Algorithms

Given a string s and a motif t, an alignment of a substring of s against all of t is called a fitting alignment. Our aim is to find a substring s' of s that maximizes an alignment score with respect to t.
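
The definition can also be written as a formula, together with the usual dynamic-programming recurrence for it. This is a sketch under assumed notation that does not appear in the original problem: δ(x, y) is a per-column substitution score, σ is a per-symbol gap penalty, n = |s|, m = |t|, and F(i, j) is the best score for aligning some suffix of s[1..i] (possibly empty) against t[1..j]. For the mismatch score defined in the next paragraph, δ is +1 on a match and -1 on a mismatch, and σ = 1. The zero first column allows the substring of s to start anywhere at no cost. Taking the maximum over the last column allows it to end anywhere.

$$
\begin{aligned}
\operatorname{fit}(s,t) &= \max_{0 \le a \le b \le n} \operatorname{global}\big(s[a+1..b],\ t\big) = \max_{0 \le i \le n} F_{i,m},\\
F_{i,0} &= 0 \quad (0 \le i \le n), \qquad F_{0,j} = -j\sigma \quad (1 \le j \le m),\\
F_{i,j} &= \max\{\, F_{i-1,j-1} + \delta(s_i, t_j),\ F_{i-1,j} - \sigma,\ F_{i,j-1} - \sigma \,\}.
\end{aligned}
$$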

Note that more than one such substring of s may exist, depending on the particular strings and alignment score used. One candidate for a scoring function is the one derived from edit distance; in this problem, we will consider a slightly different alignment score, in which every matched symbol counts as +1 and every mismatched symbol (including insertions and deletions) counts as -1. Let's call this scoring function the mismatch score. See Figure 1 for a comparison of global, local, and fitting alignments with respect to the mismatch score.
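
The following Python sketch scores an already-aligned pair of rows one column at a time, which makes the mismatch score concrete. The function name `mismatch_score` is our own and is not part of Rosalind. When applied to the two rows of the sample output further down, it gives the reported score of 5: 13 matching columns contribute +13 and 8 mismatch or gap columns contribute -8.

```python
def mismatch_score(top: str, bottom: str) -> int:
    """Score a pairwise alignment under the mismatch score:
    +1 for each column with identical symbols, -1 for every other
    column (substitutions, insertions and deletions alike)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot consist of two gaps")
        score += 1 if a == b else -1
    return score


# The two rows of the sample output.
print(mismatch_score("ACCATAAGCCCTACGTG-CCG", "GCCGTCAGGC-TG-GTGTCCG"))  # 5
```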

Given: Two DNA strings s and t, where s has length at most 10 kbp and t represents a motif of length at most 1 kbp.
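
The two strings arrive in FASTA format, as the sample dataset shows. Below is a minimal parser sketch. It assumes that the records appear in the order s, then t, and that a sequence may be wrapped over several lines. `read_fasta` is a hypothetical helper name.

```python
def read_fasta(text: str) -> list[str]:
    """Return the sequences of a FASTA-formatted string, in file order.
    Header lines (starting with '>') separate records; the sequence
    lines of one record are concatenated."""
    sequences: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current:
                sequences.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences
```

For example, `s, t = read_fasta(text)` unpacks the dataset once `text` holds the downloaded file contents.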

Return: An optimal fitting alignment score with respect to the mismatch score defined above, followed by an optimal fitting alignment of a substring of s against t. If multiple such alignments exist, you may output any one.
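
The answer can be computed with the fitting-alignment dynamic program. The first column of the (|s|+1) x (|t|+1) table is set to zero, so the substring of s may start anywhere. The best cell of the last column is then selected, so the substring may end anywhere, and the alignment is recovered by tracing back from that cell. The sketch below assumes the mismatch score (+1 for a match, -1 for a mismatch or gap). It keeps only two score rows plus a compact table of traceback moves, and it breaks ties in favour of diagonal moves. For that reason, it may return a different but equally scoring alignment than the one in the sample output. The function name is our own.

```python
def fitting_alignment(s: str, t: str) -> tuple[int, str, str]:
    """Optimal fitting alignment of a substring of s against all of t
    under the mismatch score. Returns (score, top_row, bottom_row)."""
    n, m = len(s), len(t)
    # Traceback moves per cell: 0 = diagonal, 1 = up (s symbol vs gap),
    # 2 = left (gap vs t symbol). One compact bytearray row per prefix of s.
    back = [bytearray(m + 1) for _ in range(n + 1)]
    # Row 0: t[:j] aligned against the empty string costs 1 per symbol.
    prev = [-j for j in range(m + 1)]
    for j in range(1, m + 1):
        back[0][j] = 2
    best_score, best_end = prev[m], 0
    for i in range(1, n + 1):
        si = s[i - 1]
        row = [0] * (m + 1)  # row[0] = 0: the substring may start here for free
        brow = back[i]
        for j in range(1, m + 1):
            diag = prev[j - 1] + (1 if si == t[j - 1] else -1)
            up = prev[j] - 1
            left = row[j - 1] - 1
            best, move = diag, 0
            if up > best:
                best, move = up, 1
            if left > best:
                best, move = left, 2
            row[j] = best
            brow[j] = move
        # The substring may end anywhere: remember the best last-column cell.
        if row[m] > best_score:
            best_score, best_end = row[m], i
        prev = row
    # Trace back from the best end cell until all of t has been consumed.
    top: list[str] = []
    bottom: list[str] = []
    i, j = best_end, m
    while j > 0:
        move = back[i][j]
        if move == 0:
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i -= 1
            j -= 1
        elif move == 1:
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    return best_score, "".join(reversed(top)), "".join(reversed(bottom))
```

Printing the three returned values on separate lines gives output in the required format. At the stated limits, the inner loop runs about 10^7 times, so this pure-Python version may be slow. A vectorized or compiled implementation would be preferable if speed matters.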

Sample Dataset

>Rosalind_54
GCAAACCATAAGCCCTACGTGCCGCCTGTTTAAACTCGCGAACTGAATCTTCTGCTTCACGGTGAAAGTACCACAATGGTATCACACCCCAAGGAAAC
>Rosalind_46
GCCGTCAGGCTGGTGTCCG

Sample Output

5
ACCATAAGCCCTACGTG-CCG
GCCGTCAGGC-TG-GTGTCCG
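
Because any optimal alignment is accepted, it can help to check a candidate answer mechanically. The following sketch is a hypothetical helper, not part of Rosalind. It verifies that the rows have equal length, that the bottom row spells all of t, that the top row spells a contiguous substring of s, that no column consists of two gaps, and that the rows score exactly the reported value under the mismatch score. It checks validity only. Confirming that the score is optimal still requires the dynamic program.

```python
def is_valid_fitting_output(s: str, t: str, score: int, top: str, bottom: str) -> bool:
    """Return True if (top, bottom) is a fitting alignment of a substring
    of s against all of t whose mismatch score equals `score`."""
    if len(top) != len(bottom):
        return False
    # Bottom row must be all of t; top row must be a contiguous piece of s.
    if bottom.replace("-", "") != t or top.replace("-", "") not in s:
        return False
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            return False  # an all-gap column is not allowed
        total += 1 if a == b else -1
    return total == score
```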