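The ALN file that hhblits produces when converting from A3M to ALN format is wrong: the ALN should simply be the A3M file with all lowercase letters removed.
When hhblits performs this conversion it inserts too many gaps, which makes the MSA extremely sparse.
A correct conversion can be done with code like the following: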
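def convert_a3m_to_aln(a3m_path, aln_path):
    """Convert an A3M file to ALN by dropping headers and lowercase (insertion) characters."""
    aln = []
    with open(a3m_path, mode='r') as obj_a3m:
        for line in obj_a3m:
            # Skip header lines and blank lines.
            if line.startswith('>') or not line.strip():
                continue
            # Keep uppercase residues, gaps, and the trailing newline.
            aln.append("".join(ch for ch in line if not ch.islower()))
    with open(aln_path, 'w') as obj_aln:
        obj_aln.writelines(aln)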
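
To check that removing lowercase letters really yields a valid alignment, here is a small sketch that applies the same rule to an in-memory string instead of files. The helper name `strip_insertions` and the toy sequences are made up for illustration, and it assumes each sequence sits on a single line. After stripping, every row should have the same length as the query.

```python
def strip_insertions(a3m_text):
    """Return aligned rows from A3M text (hypothetical helper, not part of hhblits).

    Header lines (starting with '>') and blank lines are dropped, and
    lowercase characters (insertions relative to the query) are removed.
    """
    rows = []
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line or line.startswith('>'):
            continue
        rows.append("".join(ch for ch in line if not ch.islower()))
    return rows


# Toy A3M input: lowercase letters mark insertions, '-' marks gaps.
example_a3m = """>query
MKTAYIAK
>hit1
MKTaaAYI-K
>hit2
M-TAYgIAK
"""

rows = strip_insertions(example_a3m)
lengths = {len(row) for row in rows}

# All rows must share one length, otherwise the result is not a valid alignment.
assert len(lengths) == 1, "rows have different lengths"
for row in rows:
    print(row)
```

If the rows come out with different lengths, the input most likely wraps sequences over several lines or is not A3M, and the line-by-line rule above does not apply as is.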