Rosalind Problem 98: Genome Assembly with Perfect Coverage and Repeats

Problem

Recall that a directed cycle is a cycle in a directed graph in which the head of one edge is equal to the tail of the following edge.

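In a de Bruijn graph of $k$-mers, the circular string $s$ constructed from a directed cycle $s_1 \rightarrow s_2 \rightarrow \ldots \rightarrow s_i \rightarrow s_1$ is given by $s_1 + s_2[k] + \ldots + s_{i-k}[k] + s_{i-k+1}[k]$. That is, because the final $k-1$ symbols of $s_1$ overlap with the first $k-1$ symbols of $s_2$, we simply tack on the $k$-th symbol of $s_2$ to $s$, then iterate the process.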

For example, the circular string assembled from the cycle "AC" → "CT" → "TA" → "AC" is simply (ACT). Note that this string only has length three because the 2-mers "wrap around" in the string.
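The assembly rule above can be sketched in a few lines of Python. This is only an illustration: the function name `circular_string` and the convention that the cycle is passed as a list of nodes without repeating the first node at the end are assumptions, not part of the problem statement.

```python
def circular_string(cycle):
    """Assemble the circular string spelled by a directed cycle of k-mers.

    `cycle` is a list of nodes in visiting order; the first node is NOT
    repeated at the end (an assumption made for this sketch).
    """
    if not cycle:
        raise ValueError("cycle must contain at least one k-mer")
    # Each node's last k-1 symbols must match the next node's first k-1
    # symbols, including the wrap-around step back to the first node.
    for current, following in zip(cycle, cycle[1:] + cycle[:1]):
        if current[1:] != following[:-1]:
            raise ValueError(f"{current!r} does not overlap {following!r}")
    # Reading the first symbol of every node yields the same string as
    # starting from the first node and tacking on last symbols.
    return "".join(node[0] for node in cycle)


print(circular_string(["AC", "CT", "TA"]))
```

For the cycle in the example, the call returns `ACT`, one symbol per node, because the overlaps wrap around.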

If every $k$-mer in a collection of reads occurs as an edge in a de Bruijn graph cycle the same number of times as it appears in the reads, then we say that the cycle is "complete."
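One way to check completeness, sketched under the assumption that a cycle is represented by its circular string and that every read corresponds to one edge: slide a window of the read length around the string and compare the resulting multiset with the multiset of reads. The name `is_complete` is hypothetical.

```python
from collections import Counter


def is_complete(circular, reads):
    """Return True if the circular string uses every read exactly as often
    as it appears in `reads` (all reads are assumed to share one length)."""
    size = len(reads[0])
    n = len(circular)
    # Collect one window per starting position, wrapping around the end.
    windows = Counter(
        "".join(circular[(start + offset) % n] for offset in range(size))
        for start in range(n)
    )
    return windows == Counter(reads)


print(is_complete("ACT", ["ACT", "CTA", "TAC"]))
```

Comparing `Counter` objects compares multiplicities, so a read that occurs twice in the input must also be traversed twice by the cycle.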

Given: A list $S_{k+1}$ of error-free DNA $(k+1)$-mers ($k \leq 5$) taken from the same strand of a circular chromosome (of length $\leq 50$).

Return: All circular strings assembled by complete cycles in the de Bruijn graph $B_k$ of $S_{k+1}$. The strings may be given in any order, but each one should begin with the first $(k+1)$-mer provided in the input.
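As a minimal sketch of the graph the problem refers to, assume the usual construction in which each read contributes one edge from its prefix (all but the last symbol) to its suffix (all but the first symbol). The function name `de_bruijn_graph` and the nested-counter representation are choices made for this illustration; repeated reads become edges with multiplicity greater than one.

```python
from collections import Counter, defaultdict


def de_bruijn_graph(reads):
    """Map each prefix node to a Counter of suffix nodes with multiplicities."""
    graph = defaultdict(Counter)
    for read in reads:
        # Edge: prefix of the read -> suffix of the read.
        graph[read[:-1]][read[1:]] += 1
    return graph


graph = de_bruijn_graph(["GTT", "GTT", "TTC"])
for node, successors in graph.items():
    print(node, dict(successors))
```

Keeping multiplicities matters here because the reads contain repeats, and a complete cycle has to traverse a repeated edge once per occurrence.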

Sample Dataset

CAG
AGT
GTT
TTT
TTG
TGG
GGC
GCG
CGT
GTT
TTC
TCA
CAA
AAT
ATT
TTC
TCA

Sample Output

CAGTTCAATTTGGCGTT
CAGTTCAATTGGCGTTT
CAGTTTCAATTGGCGTT
CAGTTTGGCGTTCAATT
CAGTTGGCGTTCAATTT
CAGTTGGCGTTTCAATT
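
A possible approach, not necessarily the intended one, is a backtracking search over the reads: start from the first read, repeatedly extend the string by one symbol whenever the resulting read is still unused, and accept the result once every read has been consumed and the walk has returned to its starting node. The sketch below assumes reads of equal length of at least two symbols; the name `assemble_all` is hypothetical.

```python
from collections import Counter


def assemble_all(reads):
    """Return all circular strings spelled by complete cycles that begin
    with the first read, in sorted order."""
    n = len(reads)
    k = len(reads[0]) - 1          # node length; assumed to be >= 1
    remaining = Counter(reads)     # unused edges with multiplicities
    first = reads[0]
    remaining[first] -= 1
    start = first[:k]              # node at which the cycle must close
    results = set()

    def extend(path, used):
        node = path[-k:]           # node reached by the walk so far
        if used == n:
            if node == start:      # every edge used and the cycle closes
                results.add(path[:n])
            return
        for base in "ACGT":
            edge = node + base
            if remaining[edge] > 0:
                remaining[edge] -= 1
                extend(path + base, used + 1)
                remaining[edge] += 1   # backtrack

    extend(first, 1)
    return sorted(results)


SAMPLE_READS = ("CAG AGT GTT TTT TTG TGG GGC GCG CGT GTT "
                "TTC TCA CAA AAT ATT TTC TCA").split()

for assembled in assemble_all(SAMPLE_READS):
    print(assembled)
```

On the sample dataset this prints the same six strings as the sample output, in sorted rather than listed order, which the problem permits. Because identical reads are tracked by count rather than as separate edges, swapping two copies of the same read does not produce duplicate strings. The search is exponential in the worst case, which is tolerable only because the chromosome is short.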