Computing GC Content

1haotian

Category: Latex. Tags: ROSALIND, python

Article link: https://blog.csdn.net/u013683789/article/details/53466302

In Rosalind’s implementation, a string in FASTA format will be labeled by the ID “Rosalind_xxxx”, where “xxxx” denotes a four-digit code between 0000 and 9999.

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated.
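The statement above does not spell out how GC-content is computed. A minimal sketch, assuming the usual definition (the percentage of symbols in the string that are "G" or "C") and reading the 0.001 allowance as an absolute error; the names gc_content and within_tolerance are made up for this illustration and are not part of Rosalind:

```python
def gc_content(dna):
    """Percentage of symbols in dna that are 'G' or 'C' (assumed definition)."""
    if not dna:
        return 0.0
    return 100.0 * sum(base in "GC" for base in dna) / len(dna)


def within_tolerance(answer, expected, tol=0.001):
    """True when answer is within the absolute error allowed by the problem."""
    return abs(answer - expected) <= tol


# "AGCTATAG" has 3 symbols out of 8 that are G or C.
print(gc_content("AGCTATAG"))
```

Under this reading, an answer only needs to agree with the expected value to about three decimal places, so the number of digits printed matters less than the arithmetic.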

Sample Dataset

>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT

Sample Output

Rosalind_0808
60.919540
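The sample can be checked by hand: Rosalind_0808 is 87 bases long and 53 of them are G or C, and 53/87 is about 60.9195%. The sketch below reproduces that check in memory, without reading a file. It is only an illustration; parse_fasta and gc_content are hypothetical helper names, and the parser assumes every record starts with a ">" header line.

```python
SAMPLE = """\
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
"""


def parse_fasta(text):
    """Map each FASTA ID to its sequence, joining wrapped sequence lines."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = line[1:]
            records[current] = ""
        elif line and current is not None:
            records[current] += line
    return records


def gc_content(dna):
    """GC-content of a non-empty DNA string, as a percentage."""
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


records = parse_fasta(SAMPLE)
best_id = max(records, key=lambda rid: gc_content(records[rid]))
print(best_id)
print("%.6f" % gc_content(records[best_id]))
```

Keeping the records in a dictionary keyed by ID avoids having to keep two parallel lists in step, at the cost of assuming the IDs are unique.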

Code

def calcuGC(seq):
    # GC-content as a percentage: share of symbols that are G or C.
    noCG = seq.count("G") + seq.count("C")
    return 100.0 * noCG / len(seq) if seq else 0.0

index = []     # FASTA IDs, in file order
seqlist = []   # sequences joined across lines, parallel to index

# rosalind_gc.txt is expected in the current working directory.
with open("rosalind_gc.txt") as infile:
    for line in infile:
        line = line.strip()
        if line.startswith(">"):
            # Header line: start a new record, dropping the leading ">".
            index.append(line[1:])
            seqlist.append("")
        elif line and seqlist:
            # Sequence line: append it to the current record.
            seqlist[-1] += line

result = [calcuGC(seq) for seq in seqlist]

SeqGC = max(result)
SeqID = index[result.index(SeqGC)]

# Write the ID and the GC-content on separate lines, six decimals as in the sample.
with open("result.txt", "w") as outfile:
    outfile.write(SeqID + "\n")
    outfile.write("%.6f\n" % SeqGC)