Rosalind Problem 12: Overlap Graphs

Problem

A graph whose nodes have all been labeled can be represented by an adjacency list, in which each row of the list contains the two node labels corresponding to a unique edge.

A directed graph (or digraph) is a graph containing directed edges, each of which has an orientation. That is, a directed edge is represented by an arrow instead of a line segment; the starting and ending nodes of an edge form its tail and head, respectively. The directed edge with tail $v$ and head $w$ is represented by $(v, w)$ (but not by $(w, v)$). A directed loop is a directed edge of the form $(v, v)$.
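To make the ordered-pair convention concrete, here is a minimal Python sketch that stores a digraph as a list of (tail, head) tuples, which is exactly the adjacency-list form the problem asks for. The names `edges` and `is_loop` are illustrative only and do not come from the original post.

```python
# A directed edge is an ordered (tail, head) pair, so (v, w) and (w, v) differ.
edges = [("v", "w"), ("w", "v"), ("u", "u")]


def is_loop(edge):
    """A directed loop has the same node as tail and head, i.e. (v, v)."""
    tail, head = edge
    return tail == head


# Print the adjacency list: one edge per row, tail first, then head
for tail, head in edges:
    print(tail, head)

print(("v", "w") == ("w", "v"))          # False: orientation matters
print([e for e in edges if is_loop(e)])  # [('u', 'u')]
```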

For a collection of strings and a positive integer $k$, the overlap graph for the strings is a directed graph $O_k$ in which each string is represented by a node, and string $s$ is connected to string $t$ with a directed edge when there is a length $k$ suffix of $s$ that matches a length $k$ prefix of $t$, as long as $s \neq t$; we demand $s \neq t$ to prevent directed loops in the overlap graph (although directed cycles may be present).
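The definition translates almost word for word into code: test every ordered pair of distinct nodes $(s, t)$ and keep the pair when the length $k$ suffix of $s$ equals the length $k$ prefix of $t$. The sketch below is a direct transcription of that rule, not the post's own method; `overlap_graph` is a hypothetical name, and it assumes $k \geq 1$ because in Python `s[-0:]` returns the whole string.

```python
from itertools import permutations


def overlap_graph(strings, k):
    """Build the edge list of O_k for a dict {label: string}.

    Edge (s, t) exists when the length-k suffix of s equals the length-k
    prefix of t and s != t (distinct nodes), so no directed loops appear.
    Assumes k >= 1.
    """
    edges = []
    # permutations yields every ordered pair of distinct labels exactly once
    for s, t in permutations(strings, 2):
        if strings[s][-k:] == strings[t][:k]:
            edges.append((s, t))
    return edges


# Three strings whose overlaps form a directed cycle a -> b -> c -> a
print(overlap_graph({"a": "AAATTT", "b": "TTTGGG", "c": "GGGAAA"}, 3))
# [('a', 'b'), ('b', 'c'), ('c', 'a')]
```

The example shows why the definition only forbids loops: cycles through several distinct nodes are allowed.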

Given: A collection of DNA strings in FASTA format having total length at most 10 kbp.

Return: The adjacency list corresponding to $O_3$. You may return edges in any order.
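Because the input arrives in FASTA format, it first has to be turned into a mapping from record ID to sequence. The following is a minimal plain-Python sketch (the function name `parse_fasta` is hypothetical) assuming each record starts with a `>` header line and that a sequence may be wrapped over several lines.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict {record ID: sequence}.

    Assumes each record starts with a '>' header line and that the
    sequence may be wrapped over several lines.
    """
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            label = line[1:].strip()  # record ID without the '>'
            records[label] = []
        elif label is not None:
            records[label].append(line)
    # Join wrapped sequence lines into one string per record
    return {label: "".join(parts) for label, parts in records.items()}


sample = """>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
"""
print(parse_fasta(sample))
# {'Rosalind_0498': 'AAATAAA', 'Rosalind_2391': 'AAATTTT'}
```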

 

Sample Dataset

>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG

Sample Output

Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323

Python solution

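# coding=utf-8
import itertools

# method1: the sample dataset hard-coded as {FASTA ID: DNA string}
data = {'Rosalind_0442': 'AAATCCC',
        'Rosalind_0498': 'AAATAAA',
        'Rosalind_2323': 'TTTTCCC',
        'Rosalind_2391': 'AAATTTT',
        'Rosalind_5013': 'GGGTGGG'}


def is_k_overlap(s1, s2, k):
    # True if the length-k suffix of s1 equals the length-k prefix of s2
    return s1[-k:] == s2[:k]


def k_edges(data, k):
    edges = []
    # Take every unordered pair of distinct IDs once and test both directions;
    # u and v are always different nodes, so no directed loops are produced
    for u, v in itertools.combinations(data, 2):
        u_dna, v_dna = data[u], data[v]

        if is_k_overlap(u_dna, v_dna, k):
            edges.append((u, v))

        if is_k_overlap(v_dna, u_dna, k):
            edges.append((v, u))

    return edges


# Print the adjacency list in the required format: one "tail head" pair per line
for u, v in k_edges(data, 3):
    print(u, v)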
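`k_edges` compares every pair of strings, so its work grows quadratically with the number of records; under the 10 kbp limit that is perfectly adequate. As an alternative (not the original author's method), one can index the strings by their length-k prefix so that each string only looks up the candidates whose prefix matches its suffix, giving work roughly proportional to the number of strings plus the number of edges. A minimal sketch, with the hypothetical name `k_edges_indexed`:

```python
from collections import defaultdict


def k_edges_indexed(data, k):
    """Return the edges of O_k by looking up each suffix in a prefix index."""
    by_prefix = defaultdict(list)  # length-k prefix -> IDs whose string starts with it
    for label, dna in data.items():
        by_prefix[dna[:k]].append(label)

    edges = []
    for u, dna in data.items():
        # Every string whose prefix equals u's suffix is a head for u
        for v in by_prefix.get(dna[-k:], []):
            if u != v:  # skip directed loops
                edges.append((u, v))
    return edges


data = {"Rosalind_0498": "AAATAAA", "Rosalind_2391": "AAATTTT",
        "Rosalind_2323": "TTTTCCC", "Rosalind_0442": "AAATCCC",
        "Rosalind_5013": "GGGTGGG"}
for u, v in k_edges_indexed(data, 3):
    print(u, v)
# Rosalind_0498 Rosalind_2391
# Rosalind_0498 Rosalind_0442
# Rosalind_2391 Rosalind_2323
```

Note that `Rosalind_5013` (GGGTGGG) finds itself in the index, and the `u != v` check is what keeps that directed loop out of the graph.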

 
