提取fasta文件genome_test.fa中第14号染色体的序列,其内容如下:
>chr1
ATATATATAT
>chr2
ATATATATATCGCGCGCGCG
>chr3
ATATATATATCGCGCGCGCGATATATATAT
>chr4
ATATATATATCGCGCGCGCGATATATATATCGCGCGCGCG
>chr5
ATATATATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATAT
>chr6
ATCGATGCAGCATG
>chr7
TATCGCGCGCGCGATATAT
>chr8
ATATCGCGCGCGCGATATATATATCGCG
>chr9
ATCGCGCGCGCGATATATATATCGCG
>chr10
GCGCGCGATATAT
>chr11
CGCGATATATATATC
>chr12
ATATATCGCGCGCGCGATATAT
>chr13
ATATATCGCGCGCGCGATATATGCGATATATATATC
>chr15
ATATATGCGAT
>chr14
GCGCGCGCGATATATGCGAT
>chr16
GCGATATATGCGATATATATATC
>chr17
GCGCGCGCGATATATATATCGCGCGCGCGATATATATAT
>chr18
GCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGATATATATATC
>chr19
ATATGCGATATATATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGA
>chr20
TATGCGATATATATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGA
>chr21
TATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGA
>chr22
ATATATATCGCGCGCGCGATATATATATATATGCGA
>chrX
CGCGCGCGATATATATATATATGCGA
>chrY
CGCGCGCGATATATATATATATGCGACGCGCGCGATATATATATATATGCGACGCGCGCGATATATATATATATGCGA
用python以及命令行参数实现
新建.py文件“”GetSeqFromChrID.py”,
python脚本如下:
1 importargparse2
3 defread_fasta(input):4
5 with open(input, ‘r‘) as f:6 fasta ={}7 for line inf:8 line =line.strip()9 if line[0] == ‘>‘:10 header = line[1:]11 else:12 sequence =line13 fasta[header] = fasta.get(header, ‘‘) +sequence14
15 returnfasta16
17
18 if __name__ == ‘__main__‘:19 #read arguments
20 parser = argparse.ArgumentParser(description="this program is used to extract a single"
21 "sequence from genome")22 parser.add_argument(‘--input‘, ‘-i‘,23 type=str,24 help=‘input file in fasta format‘)25 parser.add_argument(‘--output‘, ‘-o‘,26 type=str,27 help=‘output file‘)28 parser.add_argument(‘seq_id‘,29 type=str,30 help=‘sequence id‘)31 args =parser.parse_args()32
33 fasta =read_fasta(args.input)34 with open(args.output, ‘w‘) as f:35 f.write(‘>{:s}\n{:s}\n‘.format(args.seq_id,fasta.get(args.seq_id, ‘can not found this sequence‘)))
命令行参数输入如下:红色字体是输入部分
1 (base) e:\15_python\DEBUG>python GetSeqFromChrID.py -i genome_test.fa -o chr14.fa chr14
结果如下: