Computing GC Content

In Rosalind’s implementation, a string in FASTA format will be labeled by the ID “Rosalind_xxxx”, where “xxxx” denotes a four-digit code between 0000 and 9999.

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.
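The quantity asked for here can be sketched in a few lines. This is a minimal illustration, assuming GC-content means the percentage of symbols in the string that are "G" or "C". The names `gc_content` and `within_tolerance` are hypothetical helpers introduced for this example and are not part of Rosalind.

```python
def gc_content(dna: str) -> float:
    """Return the percentage of symbols in `dna` that are 'G' or 'C'."""
    if not dna:
        raise ValueError("GC-content is undefined for an empty string")
    gc = dna.count("G") + dna.count("C")
    return 100.0 * gc / len(dna)


def within_tolerance(answer: float, expected: float, tol: float = 0.001) -> bool:
    """Absolute-error check using the default tolerance of 0.001 quoted above."""
    return abs(answer - expected) <= tol


# 3 of the 8 symbols are G or C, so this prints 37.5
print(gc_content("AGCTATAG"))
```

Because answers are accepted within an absolute error of 0.001, printing the full floating-point value or rounding it to six decimal places should both be fine.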

Sample Dataset

>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT

Sample Output

Rosalind_0808
60.919540
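As a quick check, the sample output can be reproduced directly from the sample dataset. The sketch below is one possible approach and not necessarily the shortest: it assumes every record starts with a ">" header line and that a sequence may span several lines. `parse_fasta` and `gc_content` are hypothetical helper names used only for this example.

```python
SAMPLE = """\
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
"""


def parse_fasta(text):
    """Map each FASTA ID to its sequence, joining multi-line sequences."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]          # drop the leading ">"
            records[current] = ""
        elif current is not None:
            records[current] += line    # extend the current record
    return records


def gc_content(dna):
    """Percentage of symbols that are 'G' or 'C'."""
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


records = parse_fasta(SAMPLE)
best_id = max(records, key=lambda name: gc_content(records[name]))
print(best_id)                                  # Rosalind_0808
print(f"{gc_content(records[best_id]):.6f}")    # 60.919540
```

Rosalind_0808 has 53 G or C symbols out of 87, which gives 60.919540 when rounded to six decimal places.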

Code

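import os
# Work from the directory that holds the Rosalind input file
# (this is the author's local path; adjust it for your machine)
os.chdir("/home/owht/R/Rosalind")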

def calcuGC(seq):
    # GC-content as a percentage: (count of G + count of C) / length * 100
    noCG = seq.count("G") + seq.count("C")
    GCcon = float(noCG) / len(seq)
    return GCcon * 100

index = []
seqlist = []
longseq = ""

# Read every line of the FASTA file; the with-block closes the file
with open("rosalind_gc.txt") as file:
    line = file.readlines()

# A ">" header starts a new record; every other line extends the
# sequence of the current record.
for seq in line:
    if seq.startswith(">"):
        index.append(seq)
        seqlist.append(longseq)
        longseq = ""
    else:
        longseq = longseq + seq.strip()
# Store the sequence of the last record
seqlist.append(longseq)

# Drop the empty placeholder that was stored before the first header
seqlist = seqlist[1:]
result = []
for longseq in seqlist:
    result.append(calcuGC(longseq))

SeqID = index[result.index(max(result))].replace(">","")
SeqID = SeqID.replace("\n","")

SeqGC = max(result)

# Write the ID and its GC-content on separate lines
with open("result.txt", "w") as file:
    file.write(SeqID)
    file.write("\n")
    file.write(str(SeqGC))