Computing GC Content
In Rosalind’s implementation, a string in FASTA format will be labeled by the ID “Rosalind_xxxx”, where “xxxx” denotes a four-digit code between 0000 and 9999.
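The label format described above can be checked with a short regular expression. This is only a sketch: the helper name `is_rosalind_id` is hypothetical, and it assumes the leading ">" of the FASTA header has already been stripped.

```python
import re

# "Rosalind_" followed by exactly four ASCII digits (0000 to 9999).
ROSALIND_ID = re.compile(r"Rosalind_[0-9]{4}")


def is_rosalind_id(label):
    """Return True if label looks like a Rosalind FASTA ID."""
    return ROSALIND_ID.fullmatch(label) is not None


print(is_rosalind_id("Rosalind_6404"))   # True
print(is_rosalind_id("Rosalind_640"))    # False: only three digits
```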
Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).
Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.
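As a minimal sketch of what is being asked: the GC-content of a DNA string is the percentage of its symbols that are "C" or "G", and a decimal answer is taken here to be acceptable when its absolute error (the positive difference from the true value) is at most 0.001. The helper names `gc_content` and `within_tolerance` are hypothetical and used only for illustration.

```python
def gc_content(dna):
    """Return the percentage of symbols in dna that are 'C' or 'G'."""
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def within_tolerance(computed, expected, tol=0.001):
    """Compare a decimal answer with the expected value by absolute error."""
    return abs(computed - expected) <= tol


print(gc_content("AGCTATAG"))                 # 3 of 8 symbols are G or C: 37.5
print(within_tolerance(60.9195, 60.919540))   # True
print(within_tolerance(60.9, 60.919540))      # False
```

Under this reading, printing the result with four or more decimal places is enough to stay inside the allowed error.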
Sample Dataset
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
Sample Output
Rosalind_0808
60.919540
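The sample can be reproduced without touching any files, which is a convenient check before running against a downloaded dataset. The sketch below embeds the sample dataset as a string; `parse_fasta` and `gc_content` are hypothetical helper names, and the parser assumes every record starts with a ">" header line.

```python
SAMPLE = """\
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
"""


def parse_fasta(text):
    """Map each FASTA ID to its sequence, joining wrapped sequence lines."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = line[1:]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def gc_content(dna):
    """Return the percentage of symbols in dna that are 'C' or 'G'."""
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


records = parse_fasta(SAMPLE)
best_id = max(records, key=lambda name: gc_content(records[name]))
print(best_id)                                         # Rosalind_0808
print("{:.6f}".format(gc_content(records[best_id])))   # 60.919540
```

Rosalind_0808 has 53 G or C symbols out of 87, which gives 53/87 ≈ 60.919540%.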
Code
import os

# Working directory that holds the downloaded dataset (the author's local path).
os.chdir("/home/owht/R/Rosalind")


def calcuGC(seq):
    """Return the GC-content of seq as a percentage."""
    noCG = seq.count("G") + seq.count("C")
    return 100.0 * noCG / len(seq)


index = []    # FASTA IDs, in file order
seqlist = []  # sequences, in the same order as index
longseq = ""  # sequence lines collected for the current record

with open("rosalind_gc.txt") as file:
    for line in file:
        line = line.strip()
        if line.startswith(">"):
            # A header starts a new record, so store the previous one first.
            if index:
                seqlist.append(longseq)
            index.append(line[1:])
            longseq = ""
        else:
            longseq += line

# The last record has no header after it, so store it here.
if index:
    seqlist.append(longseq)

result = [calcuGC(seq) for seq in seqlist]
SeqGC = max(result)
SeqID = index[result.index(SeqGC)]

with open("result.txt", "w") as file:
    file.write(SeqID)
    file.write("\n")
    file.write(str(SeqGC))