Extracting protein sequences from exonerate result files

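I have recently been using exonerate to align proteins to a genome. I couldn't find a convenient script for extracting sequences from its result log, so I wrote one in Python. It is not optimized, and suggestions for improvement are welcome in the comments.
Usage: python script.py <log file>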

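The approach: pull the target-translation rows out of each alignment in the log to build a preliminary protein file in three-letter codes, then convert that three-letter file into a one-letter protein file.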
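The second step, three-letter to one-letter conversion, can be tried on its own. This is a minimal sketch, not the author's script. `three_to_one` is a hypothetical helper name, and mapping unrecognized codes to `X` is an assumption. It splits a concatenated string at each capital letter and looks each residue up in the same table.

```python
import re

# Three-letter -> one-letter table for the 20 standard amino acids
AA_CODES = {
    'Ala': 'A', 'Cys': 'C', 'Asp': 'D', 'Glu': 'E',
    'Phe': 'F', 'Gly': 'G', 'His': 'H', 'Lys': 'K',
    'Ile': 'I', 'Leu': 'L', 'Met': 'M', 'Asn': 'N',
    'Pro': 'P', 'Gln': 'Q', 'Arg': 'R', 'Ser': 'S',
    'Thr': 'T', 'Val': 'V', 'Tyr': 'Y', 'Trp': 'W',
}


def three_to_one(seq3: str, unknown: str = 'X') -> str:
    """Convert e.g. 'MetThrLeu' to 'MTL' (hypothetical helper)."""
    # Each residue is one uppercase letter followed by lowercase letters
    residues = re.findall(r'[A-Z][a-z]*', seq3)
    # Codes missing from the table are replaced by `unknown` (assumed 'X')
    return ''.join(AA_CODES.get(r, unknown) for r in residues)


print(three_to_one('MetThrLeuSerGlyAspIleLys'))  # MTLSGDIK
```

Because the regex only splits at capitals, a residue broken across two alignment rows (`...LeuAr` + `gAsp...`) is recovered correctly once the rows are concatenated.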

The log file looks like this:

C4 Alignment:
------------
         Query: test
        Target: Chr09a
         Model: protein2genome:local
     Raw score: 10528
   Query range: 0 -> 2034
  Target range: 4150993 -> 4157095

       1 : MetThrLeuSerGlyAspIleLysAlaLeuValAspAsnProGluSerPheLeuAr :      19
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           MetThrLeuSerGlyAspIleLysAlaLeuValAspAsnProGluSerPheLeuAr
 4150994 : ATGACTCTCTCTGGCGATATTAAAGCGTTGGTGGACAATCCAGAATCCTTTTTAAG : 4151048

      20 : gAspAsnArgLeuGlyPheAsnLeuAsnArgAsnIleAlaArgLysAspGlnLeuV :      38
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           gAspAsnArgLeuGlyPheAsnLeuAsnArgAsnIleAlaArgLysAspGlnLeuV
 4151049 : GGATAATCGTCTGGGCTTCAACCTCAATCGCAACATAGCGAGGAAAGACCAGCTTG : 4151105

      39 : alLysLeuValArgValThrAlaAsnSerTyrAspLeuLysPheSerGluThrGlu :      56
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           alLysLeuValArgValThrAlaAsnSerTyrAspLeuLysPheSerGluThrGlu
 4151106 : TAAAACTGGTTCGAGTCACAGCGAACTCGTACGATCTTAAATTTTCCGAGACAGAG : 4151159

      57 : SerGluGluAsnThrIleSerSerTyrIleLeuGlyTyrLysThrAsnGluAlaAs :      75
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           SerGluGluAsnThrIleSerSerTyrIleLeuGlyTyrLysThrAsnGluAlaAs
 4151160 : TCAGAGGAAAACACGATATCCAGCTACATCCTTGGATACAAGACGAACGAAGCAAA : 4151216

      76 : nAspAlaValPheLeuAspIleProSerArgGlyValLysGluGlyThrPheLeuP :      94
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           nAspAlaValPheLeuAspIleProSerArgGlyValLysGluGlyThrPheLeuP
 4151217 : TGATGCCGTGTTTCTGGACATCCCGAGCAGAGGCGTGAAGGAGGGAACATTTTTGT : 4151273

      95 : heThrSerGluLeuSerGlyCysSerLeuValValThrArgLeuLysAspAspThr :     112
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           heThrSerGluLeuSerGlyCysSerLeuValValThrArgLeuLysAspAspThr
 4151274 : TCACATCTGAACTCTCCGGCTGCTCCCTCGTCGTCACACGGCTGAAAGATGATACA : 4151327
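The script relies on this layout. Each alignment has a header (`Query:`, `Target:`, `Target range:`), followed by four-row groups: query protein, match symbols, target translation, and target DNA. The sketch below assumes exactly that layout. It treats the row right after a match row (one containing `|` or `!`) as the target translation and joins its letters. `parse_exonerate_log` is a hypothetical name, not part of exonerate or of the original script.

```python
import re


def parse_exonerate_log(lines):
    """Return one dict per alignment, with the target's translated
    protein in three-letter codes (hypothetical helper)."""
    records = []
    next_is_translation = False
    for line in lines:
        if next_is_translation:
            # Row after the match row: target translation, letters only
            records[-1]['seq3'] += re.sub(r'[^A-Za-z]', '', line)
            next_is_translation = False
        elif '|' in line or '!' in line:
            next_is_translation = True  # match row
        elif 'Query:' in line:
            records.append({'query': line.split()[1], 'target': '',
                            'range': ('', ''), 'seq3': ''})
        elif 'Target:' in line:
            records[-1]['target'] = line.split(': ', 1)[1].strip()
        elif 'Target range:' in line:
            fields = line.split()  # ['Target', 'range:', start, '->', end]
            records[-1]['range'] = (fields[2], fields[4])
    return records


sample = """\
         Query: test
        Target: Chr09a
  Target range: 4150993 -> 4157095

       1 : MetThrLeuSerGlyAspIleLysAlaLeuValAspAsnProGluSerPheLeuAr :      19
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           MetThrLeuSerGlyAspIleLysAlaLeuValAspAsnProGluSerPheLeuAr
 4150994 : ATGACTCTCTCTGGCGATATTAAAGCGTTGGTGGACAATCCAGAATCCTTTTTAAG : 4151048
"""
rec = parse_exonerate_log(sample.splitlines())[0]
print(rec['query'], rec['target'], rec['range'], rec['seq3'][:12])
# test Chr09a ('4150993', '4157095') MetThrLeuSer
```

A block whose match row contains neither `|` nor `!` would be skipped by this rule, just as in the original script.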

The code:

import re
import sys

# Three-letter -> one-letter amino-acid codes
aa_codes = {
    'Ala': 'A', 'Cys': 'C', 'Asp': 'D', 'Glu': 'E',
    'Phe': 'F', 'Gly': 'G', 'His': 'H', 'Lys': 'K',
    'Ile': 'I', 'Leu': 'L', 'Met': 'M', 'Asn': 'N',
    'Pro': 'P', 'Gln': 'Q', 'Arg': 'R', 'Ser': 'S',
    'Thr': 'T', 'Val': 'V', 'Tyr': 'Y', 'Trp': 'W',
}

# Step 1: read the log; one record per alignment holding
# [query, target, target range, target translation in three-letter codes]
records = []
target_rows = set()  # line numbers of the target-translation rows
with open(sys.argv[1], 'r') as f:
    for num, line in enumerate(f):
        if '|' in line or '!' in line:
            # match row: the next row is the target's translation
            target_rows.add(num + 1)
        elif 'Query:' in line:
            records.append([line.split()[1], '', '', ''])
        elif 'Target:' in line:
            # "Target: Chr09a" -> "Chr09a" (split only at the first ': ')
            records[-1][1] = line.split(': ', 1)[1].strip()
        elif 'Target range:' in line:
            fields = line.split()  # ['Target', 'range:', start, '->', end]
            records[-1][2] = fields[2] + '->' + fields[4]
        elif num in target_rows:
            # keep letters only; residues split across rows rejoin here
            records[-1][3] += re.sub(r'[^A-Za-z]', '', line)

# Step 2: write the three-letter file and the converted one-letter file
with open('pro-three.fa', 'w', encoding='utf-8') as three, \
        open('pro-one.fa', 'w', encoding='utf-8') as one:
    for query, target, trange, seq3 in records:
        header = '>{} {} {}\n'.format(query, target, trange)
        # 'MetThrLeu' -> ['Met', 'Thr', 'Leu']
        residues = re.findall(r'[A-Z][a-z]*', seq3)
        # residues not in the table become 'X'
        seq1 = ''.join(aa_codes.get(r, 'X') for r in residues)
        three.write(header + seq3 + '\n')
        one.write(header + seq1 + '\n')

print("Done.\npro-one.fa: one-letter amino-acid sequences\n"
      "pro-three.fa: three-letter amino-acid sequences")
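A quick sanity check on the two output files: each residue is three letters in one file and one letter in the other, so every one-letter sequence should be exactly one third the length of its three-letter counterpart, under the same header. This is a minimal sketch assuming both files list records in the same order, as the script writes them. `parse_fasta`, `compare` and `check_outputs` are hypothetical helpers.

```python
def parse_fasta(lines):
    """Return (header, sequence) pairs; blank lines are skipped and
    multi-line sequences are joined."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            records.append([line[1:], ''])
        elif records:
            records[-1][1] += line
    return [tuple(r) for r in records]


def compare(three, one):
    """Return (header, ok) per record; ok means headers match and
    len(three-letter) == 3 * len(one-letter)."""
    if len(three) != len(one):
        raise ValueError(f'{len(three)} vs {len(one)} records')
    return [(h1, h3 == h1 and len(s3) == 3 * len(s1))
            for (h3, s3), (h1, s1) in zip(three, one)]


def check_outputs(three_path='pro-three.fa', one_path='pro-one.fa'):
    """Compare the two files written by the script (default names assumed)."""
    with open(three_path, encoding='utf-8') as f3, \
            open(one_path, encoding='utf-8') as f1:
        return compare(parse_fasta(f3), parse_fasta(f1))
```

Run `check_outputs()` in the directory holding the output files. A record reported as `False` usually means a residue code was not in the conversion table.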

Feedback is welcome!
Author's email: Luanxins@163.com

This builds on a script by msw521sg:

"exonerate结果整理,获取target序列" (Tidying exonerate results to get the target sequence), msw521sg's blog, CSDN
