Rosalind Problem 63: Finding the Longest Multiple Repeat

Problem

Figure 1. The suffix tree for s = GTCCGAAGCTCCGG. A dollar sign ($) is appended to s to mark its end, so it appears at the end of every edge label that leads to a leaf. Every path from the root to a leaf spells a unique suffix of GTCCGAAGCTCCGG$, and each leaf is labeled with the starting position in s of the suffix that ends at that leaf.

A repeated substring of a string s of length n is simply a substring that appears in more than one location of s. More specifically, a k-fold substring appears in at least k distinct locations.
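The definition can be checked directly with a brute-force scan. The Python sketch below counts overlapping occurrences and finds the longest k-fold substring by trying lengths from longest to shortest. The function names are hypothetical and are not part of the problem. The method takes roughly cubic time, so it is only practical as a cross-check on small inputs, not on a 20 kbp string.

```python
def occurrence_positions(s, t):
    """Return every 1-based start position of t in s, counting overlapping occurrences."""
    return [i + 1 for i in range(len(s) - len(t) + 1) if s[i:i + len(t)] == t]


def longest_k_fold_brute_force(s, k):
    """Longest substring of s that starts at >= k distinct positions (brute force).

    Tries every length from len(s) down to 1 and returns the first substring
    whose occurrence count reaches k; returns "" if none does.
    """
    for length in range(len(s), 0, -1):
        counts = {}
        for i in range(len(s) - length + 1):
            sub = s[i:i + length]
            counts[sub] = counts.get(sub, 0) + 1
        for sub, count in counts.items():
            if count >= k:
                return sub
    return ""
```

For example, CATAC occurs at positions 1 and 5 of CATACATAC. The two occurrences overlap, but they start at distinct locations, so CATAC is a 2-fold substring.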

The suffix tree of s, denoted T(s), is defined as follows:

  • T(s) is a rooted tree having exactly n + 1 leaves, one for each suffix of s$ (the last suffix being $ alone).
  • Every edge of T(s) is labeled with a substring of s$, the string formed by adding a placeholder symbol $ to the end of s.
  • Every internal node of T(s) other than the root has at least two children; i.e., it has degree at least 3.
  • The substring labels for the edges leading from a node to its children must begin with different symbols.
  • Concatenating the substring labels along the edges of any path from the root to a leaf spells a unique suffix of s$.
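The properties above can be seen in a naive quadratic-time construction. It inserts the suffixes of s$ one at a time and splits an edge whenever a new suffix diverges partway along it. This is a plain illustration under stated assumptions, not Ukkonen's linear-time algorithm. The function names and the "nodeN" naming scheme are hypothetical, chosen to mimic this problem's input format. A second helper concatenates edge labels along every root-to-leaf path, which makes the last property directly checkable.

```python
def build_suffix_tree(text):
    """Naive O(n^2) suffix tree for a string ending in a unique '$'.

    Returns edges (parent, child, loc, length) with 1-based loc; node index 0
    is the root, and node v is named f"node{v + 1}" (hypothetical naming).
    """
    assert text.endswith("$") and text.count("$") == 1
    children = [{}]  # children[v]: first symbol of an edge label -> child node
    edge = [None]    # edge[v]: (0-based start, length) of the label on the edge into v
    for i in range(len(text)):  # insert suffix text[i:]
        node, pos = 0, i
        while True:
            first = text[pos]
            if first not in children[node]:
                # No edge starts with this symbol: hang a new leaf here.
                children.append({})
                edge.append((pos, len(text) - pos))
                children[node][first] = len(children) - 1
                break
            child = children[node][first]
            start, length = edge[child]
            j = 0
            while j < length and text[start + j] == text[pos + j]:
                j += 1
            if j == length:  # consumed the whole edge; keep walking from the child
                node, pos = child, pos + length
                continue
            # Mismatch inside the edge: split it with a new internal node.
            mid = len(children)
            children.append({text[start + j]: child})
            edge.append((start, j))
            edge[child] = (start + j, length - j)
            children[node][first] = mid
            children.append({})  # new leaf for the rest of the suffix
            edge.append((pos + j, len(text) - pos - j))
            children[mid][text[pos + j]] = len(children) - 1
            break
    return [
        (f"node{v + 1}", f"node{c + 1}", edge[c][0] + 1, edge[c][1])
        for v in range(len(children))
        for c in children[v].values()
    ]


def leaf_path_labels(text, edges):
    """Concatenate edge labels from the root down to every leaf."""
    kids, is_child = {}, set()
    for parent, child, loc, length in edges:
        kids.setdefault(parent, []).append((child, text[loc - 1:loc - 1 + length]))
        is_child.add(child)
    root = next(p for p in kids if p not in is_child)
    labels, stack = [], [(root, "")]
    while stack:  # iterative DFS carrying the label spelled so far
        node, prefix = stack.pop()
        if node not in kids:
            labels.append(prefix)
        for child, label in kids.get(node, []):
            stack.append((child, prefix + label))
    return labels
```

For s = CATACATAC, the tree has 17 nodes and 16 edges, and its ten leaf paths are exactly the ten suffixes of CATACATAC$. The node numbering generally differs from that of the sample dataset, because any consistent labeling describes the same tree.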

See Figure 1 for an example of a suffix tree.

Given: A DNA string s (of length at most 20 kbp) with $ appended, a positive integer k, and a list of edges defining the suffix tree of s. Each edge is represented by four components:

  1. the label of its parent node in T(s);
  2. the label of its child node in T(s);
  3. the location (1-based, as in the sample dataset) of the substring t of s$ assigned to the edge; and
  4. the length of t.

Return: The longest substring of s that occurs at least k times in s. Occurrences may overlap. (If multiple solutions exist, you may return any single solution.)
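One way to use the tree is sketched below. It assumes the edges arrive as (parent, child, loc, length) tuples with a 1-based loc. The path label of an internal node occurs in s exactly as many times as there are leaves below that node. A substring that ends partway along an edge occurs exactly as often as the node just below it, and that node's label is longer. The answer is therefore the path label of the deepest node, measured by string depth, that has at least k leaves. The root is assumed to be the one node that never appears as a child, and the function name is hypothetical. The traversal is iterative because a highly repetitive 20 kbp string can produce a tree thousands of levels deep, which exceeds Python's default recursion limit.

```python
def longest_multiple_repeat(s, k, edges):
    """Longest substring occurring at least k times in s (s ends with '$').

    edges: (parent, child, loc, length) tuples, loc 1-based as in the input.
    Returns "" if no non-root node has k or more leaves below it.
    """
    children = {}   # node -> list of child nodes
    parent_of = {}  # child -> (parent, loc, length) of its incoming edge
    for parent, child, loc, length in edges:
        children.setdefault(parent, []).append(child)
        parent_of[child] = (parent, loc, length)
    # Assumption: the root is the one node that never appears as a child.
    root = next(node for node in children if node not in parent_of)

    # Pre-order walk computing string depth (total label length from the root).
    order, depth, stack = [], {root: 0}, [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for child in children.get(node, []):
            depth[child] = depth[node] + parent_of[child][2]
            stack.append(child)

    # Reverse pre-order visits every child before its parent, so leaf counts add up.
    leaves, best, best_len = {}, None, 0
    for node in reversed(order):
        kids = children.get(node, [])
        leaves[node] = sum(leaves[c] for c in kids) if kids else 1
        # A leaf's path label ends with '$', which is not part of the answer.
        usable = depth[node] if kids else depth[node] - 1
        if node != root and leaves[node] >= k and usable > best_len:
            best, best_len = node, usable

    # Rebuild only the winning label by walking back up to the root.
    parts, node = [], best
    while node is not None and node != root:
        parent, loc, length = parent_of[node]
        parts.append(s[loc - 1:loc - 1 + length])
        node = parent
    return "".join(reversed(parts))[:best_len]
```

On the sample dataset this returns CATAC for k = 2. Only the winning path is reconstructed. This avoids storing a full path string for every node, which could total O(n²) characters on a repetitive input.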


Sample Dataset

CATACATAC$
2
node1 node2 1 1
node1 node7 2 1
node1 node14 3 3
node1 node17 10 1
node2 node3 2 4
node2 node6 10 1
node3 node4 6 5
node3 node5 10 1
node7 node8 3 3
node7 node11 5 1
node8 node9 6 5
node8 node10 10 1
node11 node12 6 5
node11 node13 10 1
node14 node15 6 5
node14 node16 10 1
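Below is a minimal parser for this input layout. It assumes the string is on the first line, k is on the second, and each remaining line holds one whitespace-separated edge. The function names are hypothetical.

```python
def parse_dataset(text):
    """Split a dataset into (s, k, edges); loc stays 1-based as in the input."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    s, k = lines[0], int(lines[1])
    edges = []
    for line in lines[2:]:
        parent, child, loc, length = line.split()
        edges.append((parent, child, int(loc), int(length)))
    return s, k, edges


def edge_label(s, loc, length):
    """Substring t assigned to an edge: `length` symbols starting at 1-based `loc`."""
    return s[loc - 1:loc - 1 + length]
```

For instance, the edge "node1 node14 3 3" carries the label TAC, which is the three symbols starting at position 3 of CATACATAC$.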

Sample Output

CATAC