Extracting sequences with a GTF file in Python (without wrapping the code in a class)
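1. Prepare the files: a GTF file and the Arabidopsis thaliana genome file.
2. In most cases what we actually get is the Arabidopsis GFF3 file, so it first has to be converted to GTF format: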
$ gffread -T Athaliana_447_Araport11.gene.gff3 -o Athaliana_447_Araport11.gene.gtf
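Since the rest of the workflow is in Python, the conversion step can also be driven from a script. The following is only a minimal sketch: it assumes `gffread` is installed and on the PATH, and the helper names `build_gffread_cmd` and `convert_gff3_to_gtf` are made up for illustration. It runs the same command as above, nothing more.

```python
import shutil
import subprocess


def build_gffread_cmd(gff3_path, gtf_path):
    """Return the argument list for: gffread -T <gff3> -o <gtf>."""
    return ["gffread", "-T", gff3_path, "-o", gtf_path]


def convert_gff3_to_gtf(gff3_path, gtf_path):
    """Run gffread to convert GFF3 to GTF (hypothetical helper).

    Raises FileNotFoundError if gffread is not on PATH, and
    subprocess.CalledProcessError if gffread exits with an error.
    """
    if shutil.which("gffread") is None:
        raise FileNotFoundError("gffread was not found on PATH")
    subprocess.run(build_gffread_cmd(gff3_path, gtf_path), check=True)
    return gtf_path


if __name__ == "__main__":
    # Same file names as in the shell command above.
    convert_gff3_to_gtf(
        "Athaliana_447_Araport11.gene.gff3",
        "Athaliana_447_Araport11.gene.gtf",
    )
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if a file path contains spaces.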
3. The GTF format looks like this, nine columns in total:
Chr1 phytozomev12 exon 3631 3913 . + . transcript_id "AT1G01010.1.Araport11.447"; gene_id "AT1G01010.Araport11.447"; gene_name "NAC001";
Chr1 phytozomev12 exon 3996 4276 . + . transcript_id "AT1G01010.1.Araport11.447"; gene_id "AT1G01010.Araport11.447"; gene_name "NAC001";
Chr1 phytozomev12 exon 4486 4605 . + . transcript_id "AT1G01010.1.Araport11.447"; gene_id "AT1G01010.Araport11.447"; gene_name "NAC001";
Chr1 phytozomev12 exon 4706 5095 . + . transcript_id "AT1G01010.1.Araport11.447"; gene_id "AT1G01010.Araport11.447"; gene_name "NAC001";
Chr1 phytozomev12 exon 5174 5326 . + . transcript_id "AT1G01010.1.Araport11.447"; gene_id "AT1G01010.Araport11.447"; gene_name "NAC001";
Chr1 phytozomev12 exon 5439 5899 . + . transcript_id "AT1G01010.1.Araport11.447"; gene_id "AT1G01010.Arapor