Problem
A graph whose nodes have all been labeled can be represented by an adjacency list, in which each row of the list contains the two node labels corresponding to a unique edge.
A directed graph (or digraph) is a graph containing directed edges, each of which has an orientation. That is, a directed edge is represented by an arrow instead of a line segment; the starting and ending nodes of an edge form its tail and head, respectively. The directed edge with tail v and head w is represented by (v, w) (but not by (w, v)). A directed loop is a directed edge of the form (v, v).
For a collection of strings and a positive integer k, the overlap graph for the strings is a directed graph O_k in which each string is represented by a node, and string s is connected to string t with a directed edge when there is a length-k suffix of s that matches a length-k prefix of t, as long as s ≠ t; we demand s ≠ t to prevent directed loops in the overlap graph (although directed cycles may be present).
Given: A collection of DNA strings in FASTA format having total length at most 10 kbp.
Return: The adjacency list corresponding to O_3. You may return edges in any order.
Sample Dataset
>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG
Sample Output
Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323
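To see why an edge appears in the sample output, it can help to check the overlap condition by hand for one pair of sample strings. The snippet below is a minimal sketch that assumes k = 3; the variable names are illustrative only.

```python
# Check the overlap condition for one pair of sample strings, assuming k = 3.
s = "AAATAAA"  # Rosalind_0498
t = "AAATTTT"  # Rosalind_2391
k = 3

suffix = s[-k:]  # last k characters of s
prefix = t[:k]   # first k characters of t

# An edge s -> t exists when the strings differ and the suffix matches the prefix.
has_edge = (s != t) and (suffix == prefix)
print(suffix, prefix, has_edge)

# The reverse direction t -> s is tested separately, because edges are directed.
reverse_edge = (t != s) and (t[-k:] == s[:k])
print(reverse_edge)
```

Here the suffix AAA of Rosalind_0498 matches the prefix AAA of Rosalind_2391, so the edge goes from Rosalind_0498 to Rosalind_2391 and not the other way around.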
Python solution
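# coding=utf-8
# Method 1: compare every pair of strings in both directions.
import itertools

# Sample dataset, keyed by FASTA label.
data = {'Rosalind_0442': 'AAATCCC',
        'Rosalind_0498': 'AAATAAA',
        'Rosalind_2323': 'TTTTCCC',
        'Rosalind_2391': 'AAATTTT',
        'Rosalind_5013': 'GGGTGGG'}


def is_k_overlap(s1, s2, k):
    """Return True if the length-k suffix of s1 equals the length-k prefix of s2."""
    return s1[-k:] == s2[:k]


def k_edges(data, k):
    """Return the directed edges (u, v) of the k-overlap graph."""
    edges = []
    # Take every unordered pair of labels from data and test both directions.
    for u, v in itertools.combinations(data, 2):
        u_dna, v_dna = data[u], data[v]
        if is_k_overlap(u_dna, v_dna, k):
            edges.append((u, v))
        if is_k_overlap(v_dna, u_dna, k):
            edges.append((v, u))
    return edges


print(k_edges(data, 3))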
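The solution above hard-codes the sample strings in a dictionary and prints a list of tuples, while the problem supplies FASTA input and asks for one edge per line. The following is a rough sketch of how those two gaps might be closed. The helpers parse_fasta and overlap_edges are hypothetical names introduced here, not part of the original solution, and the sketch assumes k = 3 and that every sequence has at least k characters.

```python
import itertools

SAMPLE = """>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG
"""


def parse_fasta(text):
    """Parse FASTA text into a dict mapping each label to its sequence."""
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            label = line[1:]
            records[label] = ""
        elif label is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[label] += line
    return records


def overlap_edges(records, k=3):
    """Return edges (u, v) where the k-suffix of u's string equals the k-prefix of v's."""
    edges = []
    # permutations yields every ordered pair of distinct labels.
    for u, v in itertools.permutations(records, 2):
        if records[u][-k:] == records[v][:k]:
            edges.append((u, v))
    return edges


records = parse_fasta(SAMPLE)
edges = overlap_edges(records, k=3)
for u, v in edges:
    print(u, v)  # one edge per line, as in the adjacency list format
```

Using itertools.permutations instead of combinations tests each direction in a single condition, at the cost of visiting every pair twice. For input with a total length of at most 10 kbp, this quadratic loop should be fast enough.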