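# Convert a GenBank file to FASTA: the ACCESSION becomes the header line,
# and the sequence lines after ORIGIN become the record body.
with open("sequence.gbk") as file_r, open("sequence.fasta", "w") as file_w:
    flag = False  # True while reading sequence lines after ORIGIN
    for line in file_r:
        if line.startswith('ACCESSION'):
            # split() breaks the line on whitespace: [0] is 'ACCESSION',
            # [1] is the accession number that follows it
            AC = line.split()[1]
            file_w.write('>' + AC + '\n')
        elif line.startswith('ORIGIN'):
            flag = True
        elif line.startswith('//'):
            # '//' ends the record; without this check it would be
            # written out as an empty sequence line
            flag = False
        elif flag:
            fields = line.split()
            # Skip blank lines (split() returns an empty list for them)
            if fields:
                # Drop the first field (the base position, index 0) and join the rest
                seq = ''.join(fields[1:])
                # upper() converts the lowercase bases to uppercase
                file_w.write(seq.upper() + '\n')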
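The same line-by-line logic can be wrapped in a generator, which makes it testable without touching the disk and shows what happens with more than one record. This is a minimal sketch, not the script above: the name `genbank_to_fasta` is hypothetical, the second record (`TOY00001`) is made up for illustration, and the reset at the `//` terminator is an assumption about how multi-record files should be handled.

```python
def genbank_to_fasta(lines):
    """Yield FASTA lines from an iterable of GenBank lines (hypothetical helper)."""
    in_origin = False  # True while inside the ORIGIN block of a record
    for line in lines:
        if line.startswith("ACCESSION"):
            # The first accession number follows the keyword
            yield ">" + line.split()[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: stop treating lines as sequence data
            in_origin = False
        elif in_origin:
            fields = line.split()
            if fields:
                # fields[0] is the base position; the rest are sequence chunks
                yield "".join(fields[1:]).upper()

sample = (
    "ACCESSION   DQ199648\n"
    "ORIGIN\n"
    "      361 gtggagaaga ttcagatact tcaaactttg ttggtctag\n"
    "//\n"
    "LOCUS       TOY00001\n"   # made-up record, for illustration only
    "ACCESSION   TOY00001\n"
    "ORIGIN\n"
    "        1 atgc\n"
    "//\n"
)
fasta_lines = list(genbank_to_fasta(sample.splitlines()))
print("\n".join(fasta_lines))
```

Because the flag is cleared at `//`, the `LOCUS` line of the second record is skipped instead of being written out as sequence.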
GenBank (.gbk) input file:
LOCUS DQ199648 399 bp mRNA linear PLN 01-OCT-2006
DEFINITION Astragalus sinicus isolate AsE246 LTP-like protein 1 mRNA, complete
cds.
ACCESSION DQ199648
VERSION DQ199648.1
KEYWORDS .
SOURCE Astragalus sinicus
ORGANISM Astragalus sinicus
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae;
Pentapetalae; rosids; fabids; Fabales; Fabaceae; Papilionoideae; 50
kb inversion clade; NPAAA clade; Hologalegina; IRL clade; Galegeae;
Astragalus.
REFERENCE 1 (bases 1 to 399)
AUTHORS Chou,M.-X., Wei,X.-Y. and Zhou,J.-C.
TITLE Identification of new nodulin cDNAs from Astragalus sinicus by SSH
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 399)
AUTHORS Chou,M.-X., Wei,X.-Y. and Zhou,J.-C.
TITLE Direct Submission
JOURNAL Submitted (11-SEP-2005) National Key Laboratory of Agricultural
Microbiology, Huazhong Agricultural University, Wuhan, Hubei
430070, China
FEATURES Location/Qualifiers
source 1..399
/organism="Astragalus sinicus"
/mol_type="mRNA"
/isolate="AsE246"
/db_xref="taxon:47065"
CDS 1..399
/codon_start=1
/product="LTP-like protein 1"
/protein_id="ABB13623.1"
/translation="MKFAYVVVVMCIMVVLNPSMTEAETISCREVVVTLTPCFPYLLS
GYGPSQSCCEAIKSFKIVFKNKINGQIACNCMKKAAFFGLSNANAEALPEKCNVKMHY
KINTSFDCTSIQDLKNVNVEKIQILQTLLV"
ORIGIN
1 atgaaatttg catatgtggt tgtggtgatg tgcatcatgg tagtgttgaa tccatccatg
61 actgaggcag aaacaattag ttgccgtgaa gtggtggtga cgctcactcc ttgcttccca
121 tatttgctta gtggttatgg tccatcccaa tcttgttgtg aagcaattaa gagtttcaaa
181 attgtcttta aaaacaaaat taacggtcaa atcgcctgta attgtatgaa aaaagcagcg
241 ttttttgggt tgagcaacgc taatgctgaa gcactccctg aaaaatgcaa tgtcaaaatg
301 cactacaaga tcaacacatc cttcgactgt accagcatac aagatctaaa gaacgtgaat
361 gtggagaaga ttcagatact tcaaactttg ttggtctag
//
Resulting FASTA file:
>DQ199648
ATGAAATTTGCATATGTGGTTGTGGTGATGTGCATCATGGTAGTGTTGAATCCATCCATG
ACTGAGGCAGAAACAATTAGTTGCCGTGAAGTGGTGGTGACGCTCACTCCTTGCTTCCCA
TATTTGCTTAGTGGTTATGGTCCATCCCAATCTTGTTGTGAAGCAATTAAGAGTTTCAAA
ATTGTCTTTAAAAACAAAATTAACGGTCAAATCGCCTGTAATTGTATGAAAAAAGCAGCG
TTTTTTGGGTTGAGCAACGCTAATGCTGAAGCACTCCCTGAAAAATGCAATGTCAAAATG
CACTACAAGATCAACACATCCTTCGACTGTACCAGCATACAAGATCTAAAGAACGTGAAT
GTGGAGAAGATTCAGATACTTCAAACTTTGTTGGTCTAG
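
A quick sanity check on this result: the LOCUS line of the input reports 399 bp, so the sequence lines of the FASTA output should add up to the same length. A minimal sketch, with the output pasted in as a string (the variable names are illustrative; in practice the text would be read from sequence.fasta):

```python
fasta_text = """>DQ199648
ATGAAATTTGCATATGTGGTTGTGGTGATGTGCATCATGGTAGTGTTGAATCCATCCATG
ACTGAGGCAGAAACAATTAGTTGCCGTGAAGTGGTGGTGACGCTCACTCCTTGCTTCCCA
TATTTGCTTAGTGGTTATGGTCCATCCCAATCTTGTTGTGAAGCAATTAAGAGTTTCAAA
ATTGTCTTTAAAAACAAAATTAACGGTCAAATCGCCTGTAATTGTATGAAAAAAGCAGCG
TTTTTTGGGTTGAGCAACGCTAATGCTGAAGCACTCCCTGAAAAATGCAATGTCAAAATG
CACTACAAGATCAACACATCCTTCGACTGTACCAGCATACAAGATCTAAAGAACGTGAAT
GTGGAGAAGATTCAGATACTTCAAACTTTGTTGGTCTAG
"""

# Join every line that is not a '>' header into one sequence string
sequence = "".join(
    line for line in fasta_text.splitlines() if not line.startswith(">")
)

expected_length = 399  # taken from the LOCUS line of the GenBank record
print(len(sequence), len(sequence) == expected_length)
```

The sequence should also begin with the start codon ATG and end with the stop codon TAG, since the CDS feature spans 1..399.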