我是Python新手。我试图以包含8个染色体序列的基因组fasta文件作为输入,对一个查询序列进行爆炸,并提取前50个命中率。
我的代码是:from Bio import SeqIO
from Bio.Seq import Seq
from Bio.Blast import NCBIXML
from Bio.Blast.Applications import NcbiblastnCommandline
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC
import os
db = list(SeqIO.parse('genome.fasta', 'fasta'))
for i in range(len(db)):
print "Chromosome_"+str(i+1)
print('Doing the BLAST and retrieving the results...')
output_handle = open("dbn.fasta", "w")
nseq=db[i]
SeqIO.write(nseq, output_handle, "fasta")
output_handle.close()
os.system("makeblastdb -in dbn.fasta -dbtype nucl -out dbn")
result_handle= NcbiblastnCommandline(query="sequence.fasta", db = "dbn", outfmt= 5, out="my_blast_"+str(i)+".xml", evalue = 0.00001, task = "megablast")
stderr, stdout = result_handle()
E_VALUE_THRESH = 1e-100
c=0
print "Extracting hits for Chromosome"+str(i+1)+"in another file"
for record in NCBIXML.parse(open("my_blast_"+str(i)+".xml")):
if record.alignments:
for align in record.alignments:
for hsp in align.hsps:
if hsp.expect < E_VALUE_THRESH:
if c>50: break
start=hsp.sbjct_start
end=hsp.sbjct_end
newSeq = db[i].seq[:end]
newSeq = newSeq[start:]
ids=db[i].id+str(c)
myseq[c]=SeqRecord(Seq(newSeq,IUPAC.DNA))
myseq[c].id=str(ids)
c=c+1
output_handler = open("example_"+str(i)+".fasta", "w")
SeqIO.write(myseq, output_handler, "fasta")
output_handler.close()
第33行出现错误:
^{pr2}$
我试着移除IUPAC.DNA,做出一条线:myseq[c]=SeqRecord(Seq(newSeq)
它给了我一个类型错误:TypeError: The sequence data given to a Seq object should be a string (not another Seq object etc)
有没有解决这个问题的方法,还有没有其他方法可以创建一个多fasta列表变量?在