12 Overlap Graphs

Problem

A graph whose nodes have all been labeled can be represented by an adjacency list, in which each row of the list contains the two node labels corresponding to a unique edge.
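As a small illustration of this representation, the sketch below stores a graph as a list of edges and prints one row per edge. The labels `A`, `B`, `C` and the helper name `format_adjacency_list` are made up for the example and do not come from the problem.

```python
# Hypothetical edge list for a small labeled graph; each tuple is one edge.
edges = [("A", "B"), ("A", "C"), ("B", "C")]


def format_adjacency_list(edges):
    """Return one 'label label' row per edge, in the order given."""
    return ["{} {}".format(u, v) for u, v in edges]


for row in format_adjacency_list(edges):
    print(row)
```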

A directed graph (or digraph) is a graph containing directed edges, each of which has an orientation. That is, a directed edge is represented by an arrow instead of a line segment; the starting and ending nodes of an edge form its tail and head, respectively. The directed edge with tail $v$ and head $w$ is represented by $(v, w)$ (but not by $(w, v)$). A directed loop is a directed edge of the form $(v, v)$.

For a collection of strings and a positive integer $k$, the overlap graph for the strings is a directed graph $O_k$ in which each string is represented by a node, and string $s$ is connected to string $t$ with a directed edge when there is a length $k$ suffix of $s$ that matches a length $k$ prefix of $t$, as long as $s \neq t$; we demand $s \neq t$ to prevent directed loops in the overlap graph (although directed cycles may be present).
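The edge condition can be checked on a single pair of strings. The following is a minimal sketch, not part of the original post: the name `has_edge` is hypothetical, and treating strings shorter than $k$ as having no edges is an assumption the problem statement does not spell out.

```python
def has_edge(s, t, k):
    """Return True when the overlap graph O_k would contain the edge (s, t)."""
    # Assumption: a string shorter than k has no length-k suffix or prefix,
    # so it cannot take part in an edge.
    if len(s) < k or len(t) < k:
        return False
    # The s != t guard is what rules out directed loops.
    return s != t and s[-k:] == t[:k]


print(has_edge("AAATAAA", "AAATTTT", 3))  # suffix AAA against prefix AAA
print(has_edge("AAATTTT", "AAATAAA", 3))  # suffix TTT against prefix AAA
print(has_edge("AAATAAA", "AAATAAA", 3))  # same string, so no loop
```

The third call shows why the $s \neq t$ rule matters: `AAATAAA` ends and begins with `AAA`, so without the guard it would be connected to itself.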

Given: A collection of DNA strings in FASTA format having total length at most 10 kbp.

Return: The adjacency list corresponding to $O_3$. You may return edges in any order.

Sample Dataset

>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG
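Input in this format has to be turned into label/sequence pairs before any overlaps can be tested. One possible way to do that is sketched below; `parse_fasta` is a hypothetical helper name, and it works on a string rather than a file so that it can be tried without a dataset on disk.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping label -> sequence."""
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record; drop the leading '>'.
            label = line[1:]
            records[label] = ""
        elif label is not None:
            # A sequence may span several lines, so append each piece.
            records[label] += line.upper()
    return records


sample = """>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
"""
print(parse_fasta(sample))
```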

Sample Output

Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323


Method 1
# coding=utf-8

# method1
data ={'Rosalind_0442': 'AAATCCC',
 'Rosalind_0498': 'AAATAAA',
 'Rosalind_2323': 'TTTTCCC',
 'Rosalind_2391': 'AAATTTT',
 'Rosalind_5013': 'GGGTGGG'}

def is_k_overlap(s1, s2, k):
    return s1[-k:] == s2[:k]


import itertools
def k_edges(data, k):
    edges = []
    for u, v in itertools.combinations(data, 2):  # compare every unordered pair of labels in data
        u_dna, v_dna = data[u], data[v]
        # each pair is visited once, so test both directions
        if is_k_overlap(u_dna, v_dna, k):
            edges.append((u, v))

        if is_k_overlap(v_dna, u_dna, k):
            edges.append((v, u))

    return edges

# print one edge per line, matching the adjacency list format
for u, v in k_edges(data, 3):
    print(u, v)

Method 2

# coding=utf-8
### 12. Overlap Graphs ###
from collections import OrderedDict
import re


def overlap_graph(dna, n):
    edges = []
    for ke1, val1 in dna:
        for ke2, val2 in dna:
            if ke1 != ke2 and val1[-n:] == val2[:n]:
                edges.append(ke1 + '\t' + ke2)
    return edges


dna = OrderedDict()
with open('12.txt') as f:
    for line in f:
        line = line.rstrip()
        if line.startswith('>'):
            seqName = re.sub('>', '', line)
            dna[seqName] = ''
            continue
        dna[seqName] += line.upper()

# the with block closes the output file even if writing fails
with open('rosalind_grph_output.txt', 'wt') as fh:
    for x in overlap_graph(dna.items(), 3):
        fh.write(x + '\n')

Method 3

# coding=utf-8
seq_list = []
stseq = ''
with open('12.txt') as f:
    for line in f:
        line = line.strip()  # also handles a missing final newline and '\r\n'
        if not line:
            continue
        if line[0] == '>':
            if stseq != '':
                seq_list.append([stname, stseq])
                stseq = ''
            stname = line[1:]
        else:
            stseq = stseq + line
seq_list.append([stname, stseq])
l = len(seq_list)

for i in range(0, l):
    for j in range(0, i):
        # identical strings are skipped because the problem requires s != t
        if seq_list[i][1] == seq_list[j][1]:
            continue
        if seq_list[i][1][0:3] == seq_list[j][1][-3:]:
            print(seq_list[j][0], seq_list[i][0])
        if seq_list[i][1][-3:] == seq_list[j][1][0:3]:
            print(seq_list[i][0], seq_list[j][0])
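All three methods compare every pair of strings. As a possible alternative that is not from the original post, the strings can first be grouped by their length-k prefix, so that each suffix is looked up in a dictionary instead of being compared against every other string. The function name `overlap_edges_indexed` is hypothetical, and skipping strings shorter than k is an assumption.

```python
from collections import defaultdict


def overlap_edges_indexed(records, k):
    """Return the edges of O_k as (tail, head) label pairs via a prefix index."""
    # Group labels by the length-k prefix of their sequence.
    by_prefix = defaultdict(list)
    for label, seq in records.items():
        if len(seq) >= k:  # assumption: shorter strings take no part in edges
            by_prefix[seq[:k]].append(label)

    edges = []
    for label, seq in records.items():
        if len(seq) < k:
            continue
        # Every string whose prefix equals this suffix is a candidate head.
        for other in by_prefix.get(seq[-k:], []):
            # Skip equal strings to avoid directed loops (s != t).
            if records[other] != seq:
                edges.append((label, other))
    return edges


records = {
    "Rosalind_0498": "AAATAAA",
    "Rosalind_2391": "AAATTTT",
    "Rosalind_2323": "TTTTCCC",
    "Rosalind_0442": "AAATCCC",
    "Rosalind_5013": "GGGTGGG",
}
for tail, head in overlap_edges_indexed(records, 3):
    print(tail, head)
```

On the sample dataset this yields the same three edges as the sample output. The pairwise loops are simpler and fast enough for the 10 kbp limit in this problem, so the index mainly helps on larger collections.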

  

Reposted from: https://www.cnblogs.com/think-and-do/p/7277822.html
