用python提取fasta序列_python学习——通过命令行参数根据fasta文件中染色体id提取染色体序列...

提取fasta文件genome_test.fa中第14号染色体的序列,其内容如下:

>chr1

ATATATATAT

>chr2

ATATATATATCGCGCGCGCG

>chr3

ATATATATATCGCGCGCGCGATATATATAT

>chr4

ATATATATATCGCGCGCGCGATATATATATCGCGCGCGCG

>chr5

ATATATATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATAT

>chr6

ATCGATGCAGCATG

>chr7

TATCGCGCGCGCGATATAT

>chr8

ATATCGCGCGCGCGATATATATATCGCG

>chr9

ATCGCGCGCGCGATATATATATCGCG

>chr10

GCGCGCGATATAT

>chr11

CGCGATATATATATC

>chr12

ATATATCGCGCGCGCGATATAT

>chr13

ATATATCGCGCGCGCGATATATGCGATATATATATC

>chr15

ATATATGCGAT

>chr14

GCGCGCGCGATATATGCGAT

>chr16

GCGATATATGCGATATATATATC

>chr17

GCGCGCGCGATATATATATCGCGCGCGCGATATATATAT

>chr18

GCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGATATATATATC

>chr19

ATATGCGATATATATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGA

>chr20

TATGCGATATATATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGA

>chr21

TATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGA

>chr22

ATATATATCGCGCGCGCGATATATATATATATGCGA

>chrX

CGCGCGCGATATATATATATATGCGA

>chrY

CGCGCGCGATATATATATATATGCGACGCGCGCGATATATATATATATGCGACGCGCGCGATATATATATATATGCGA

用python以及命令行参数实现

新建.py文件“”GetSeqFromChrID.py”,

python脚本如下:

1 importargparse2

3 defread_fasta(input):4

5 with open(input, ‘r‘) as f:6 fasta ={}7 for line inf:8 line =line.strip()9 if line[0] == ‘>‘:10 header = line[1:]11 else:12 sequence =line13 fasta[header] = fasta.get(header, ‘‘) +sequence14

15 returnfasta16

17

18 if __name__ == ‘__main__‘:19 #read arguments

20 parser = argparse.ArgumentParser(description="this program is used to extract a single"

21 "sequence from genome")22 parser.add_argument(‘--input‘, ‘-i‘,23 type=str,24 help=‘input file in fasta format‘)25 parser.add_argument(‘--output‘, ‘-o‘,26 type=str,27 help=‘output file‘)28 parser.add_argument(‘seq_id‘,29 type=str,30 help=‘sequence id‘)31 args =parser.parse_args()32

33 fasta =read_fasta(args.input)34 with open(args.output, ‘w‘) as f:35 f.write(‘>{:s}\n{:s}\n‘.format(args.seq_id,fasta.get(args.seq_id, ‘can not found this sequence‘)))

命令行参数输入如下:红色字体是输入部分

1 (base) e:\15_python\DEBUG>python GetSeqFromChrID.py -i genome_test.fa -o chr14.fa chr14

结果如下:

20190515093743672314.png

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值