Rosalind Problem 12: Overlap Graphs

automan_huyaoge

Original problem: http://rosalind.info/problems/grph/

Problem

A graph whose nodes have all been labeled can be represented by an adjacency list, in which each row of the list contains the two node labels corresponding to a unique edge.

A directed graph (or digraph) is a graph containing directed edges, each of which has an orientation. That is, a directed edge is represented by an arrow instead of a line segment; the starting and ending nodes of an edge form its tail and head, respectively. The directed edge with tail $v$ and head $w$ is represented by $(v, w)$ (but not by $(w, v)$). A directed loop is a directed edge of the form $(v, v)$.

For a collection of strings and a positive integer $k$, the overlap graph for the strings is a directed graph $O_k$ in which each string is represented by a node, and string $s$ is connected to string $t$ with a directed edge when there is a length $k$ suffix of $s$ that matches a length $k$ prefix of $t$, as long as $s \neq t$; we demand $s \neq t$ to prevent directed loops in the overlap graph (although directed cycles may be present).
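The edge condition in this definition comes down to two string slices. The snippet below is a minimal sketch that checks it for one pair of strings taken from the sample dataset of this problem; the variable names are only illustrative. Note that the definition requires a positive k: in Python, `s[-0:]` is the whole string, so k = 0 would not behave as an empty suffix.

```python
# Illustrative strings taken from the sample dataset of this problem.
s = "AAATAAA"
t = "AAATTTT"
k = 3

suffix = s[-k:]  # last k characters of s
prefix = t[:k]   # first k characters of t

# s != t rules out directed loops; the slices must match for an edge s -> t.
has_edge = s != t and suffix == prefix

# The reverse direction is checked separately, because edges are directed.
has_reverse_edge = t != s and t[-k:] == s[:k]

print(suffix, prefix, has_edge, has_reverse_edge)
```

Here the edge from s to t exists, while the reverse edge does not, because the suffix of t is `TTT` and the prefix of s is `AAA`.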

Given: A collection of DNA strings in FASTA format having total length at most 10 kbp.

Return: The adjacency list corresponding to $O_3$. You may return edges in any order.

In other words: every DNA string is a node, and there is a directed edge from string s to string t when the last k characters of s equal the first k characters of t and s is not t. The input is a FASTA file with a total length of at most 10 kbp, and the answer is the list of edges for k = 3, one pair of labels per line, in any order.

Sample Dataset

>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG
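The dataset is given in FASTA format, where each record starts with a `>` label line followed by one or more sequence lines. Below is a minimal sketch of turning such text into a `{label: sequence}` dictionary; `parse_fasta` is a hypothetical helper written for illustration, not part of the original post, and it assumes well-formed input.

```python
def parse_fasta(text):
    """Parse FASTA text into {label: sequence}.

    Hypothetical helper for illustration; assumes well-formed input.
    """
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            label = line[1:]
            records[label] = ""
        elif label is not None:
            # A sequence may span several lines, so append each chunk.
            records[label] += line
    return records


sample = """>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
"""
print(parse_fasta(sample))
```

For a real dataset, the text could be read from the downloaded file and passed to the same function.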

Sample Output

Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323

Python solution

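# coding=utf-8
import itertools

# Method 1: the sample dataset, hard-coded as {label: DNA string}
data = {'Rosalind_0442': 'AAATCCC',
        'Rosalind_0498': 'AAATAAA',
        'Rosalind_2323': 'TTTTCCC',
        'Rosalind_2391': 'AAATTTT',
        'Rosalind_5013': 'GGGTGGG'}


def is_k_overlap(s1, s2, k):
    """Return True if the length-k suffix of s1 equals the length-k prefix of s2."""
    return s1[-k:] == s2[:k]


def k_edges(data, k):
    """Return the directed edges (u, v) of the overlap graph O_k as label pairs."""
    edges = []
    # Take every unordered pair of distinct labels and test both directions
    for u, v in itertools.combinations(data, 2):
        u_dna, v_dna = data[u], data[v]

        if is_k_overlap(u_dna, v_dna, k):
            edges.append((u, v))

        if is_k_overlap(v_dna, u_dna, k):
            edges.append((v, u))

    return edges


# Print the adjacency list, one edge per line, as the problem expects
for u, v in k_edges(data, 3):
    print(u, v)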
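The pairwise loop above compares every pair of strings, which is fine for a 10 kbp dataset. As an alternative, and not the method used in this post, the labels can be grouped by their length-k prefix so that each suffix is looked up directly. The sketch below assumes the same `{label: DNA string}` layout; `k_edges_indexed` is a hypothetical name.

```python
from collections import defaultdict


def k_edges_indexed(data, k):
    """Hypothetical variant of k_edges that indexes labels by length-k prefix."""
    by_prefix = defaultdict(list)
    for label, dna in data.items():
        by_prefix[dna[:k]].append(label)

    edges = []
    for u, dna in data.items():
        # Every label whose prefix equals this suffix is a candidate head.
        for v in by_prefix.get(dna[-k:], []):
            if u != v:  # skip directed loops
                edges.append((u, v))
    return edges


sample = {'Rosalind_0442': 'AAATCCC',
          'Rosalind_0498': 'AAATAAA',
          'Rosalind_2323': 'TTTTCCC',
          'Rosalind_2391': 'AAATTTT',
          'Rosalind_5013': 'GGGTGGG'}

for u, v in sorted(k_edges_indexed(sample, 3)):
    print(u, v)
```

On the sample dataset this yields the same three edges as the sample output, though possibly in a different order, which the problem allows.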