Problem
Consider a set S of (k+1)-mers of some unknown DNA string. Let S^rc denote the set containing all reverse complements of the elements of S (recall from “Counting Subsets” that sets are not allowed to contain duplicate elements).
The de Bruijn graph B_k of order k corresponding to S ∪ S^rc is a digraph defined in the following way:
- Nodes of B_k correspond to all k-mers that are present as a substring of a (k+1)-mer from S ∪ S^rc.
- Edges of B_k are encoded by the (k+1)-mers of S ∪ S^rc in the following way: for each (k+1)-mer r in S ∪ S^rc, form a directed edge (r[1:k], r[2:k+1]). For example, with k = 3 the 4-mer TGAT yields the edge (TGA, GAT).
Given: A collection of up to 1000 (possibly repeating) DNA strings of equal length (not exceeding 50 bp) corresponding to a set S of (k+1)-mers.
Return: The adjacency list of the de Bruijn graph corresponding to S ∪ S^rc.
Sample Dataset
TGAT
CATG
TCAT
ATGC
CATC
CATC
Sample Output
(ATC, TCA)
(ATG, TGA)
(ATG, TGC)
(CAT, ATC)
(CAT, ATG)
(GAT, ATG)
(GCA, CAT)
(TCA, CAT)
(TGA, GAT)
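One possible way to build this adjacency list is sketched below in Python. It is an illustration under stated assumptions, not a reference solution. The function names reverse_complement and de_bruijn_edges are made up for this example. The input is assumed to contain only the letters A, C, G and T, and the edges are sorted lexicographically only because that happens to match the order of the sample output.

```python
# Sketch: de Bruijn adjacency list from a collection of (k+1)-mers.
# Function names are illustrative; they do not come from the problem statement.

# Translation table mapping each base to its complement (assumes A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def de_bruijn_edges(reads):
    """Return the sorted edge list of the de Bruijn graph of the reads
    together with their reverse complements."""
    kmers = set(reads)  # the set of (k+1)-mers; repeated reads collapse
    kmers |= {reverse_complement(r) for r in kmers}  # add reverse complements
    # Each (k+1)-mer r contributes the edge (first k symbols, last k symbols).
    return sorted((r[:-1], r[1:]) for r in kmers)


if __name__ == "__main__":
    sample = ["TGAT", "CATG", "TCAT", "ATGC", "CATC", "CATC"]
    for left, right in de_bruijn_edges(sample):
        print(f"({left}, {right})")
```

Run on the sample dataset, this sketch prints the nine edges listed in the sample output. Using a set handles both the repeated read CATC and reads such as CATG that equal their own reverse complement.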