Building a species tree from single-copy orthologous genes with ASTRAL

练习时长两年半的生信生

Category: python

Article link: https://blog.csdn.net/liuninghua521/article/details/112999243

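This post describes a Python script that runs on Linux and automates building a phylogenetic tree from single-copy orthologous proteins with proteinortho, muscle, and ASTRAL. The script first preprocesses the sequence files by replacing spaces with underscores, then runs the orthology search, filters the result down to single-copy proteins, merges the protein sequences into one file, aligns each ortholog set with muscle, and finally builds the species tree with ASTRAL.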

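Everything I wanted to introduce is already in my earlier post, "构建单拷贝同源蛋白系统发育树，一条命令提序列!" ("Building a phylogenetic tree from single-copy orthologous proteins: extract the sequences with one command!").
The difference is in how the tree is built. The earlier script aligned the single-copy orthologous genes and concatenated them, so each species ended up with one very long sequence, and the tree was built from that. This version does not concatenate: each single-copy ortholog set is aligned and gets its own gene tree, and ASTRAL then builds the species tree from those gene trees. The script extends the earlier one, and the command you type is unchanged. The only visible difference is one extra output file, *.gene.species.tree, which is the finished species tree.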
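
As a rough sketch of the final step described above, per-gene Newick trees can be merged into one file (one tree per line) and handed to ASTRAL. The helper names, the `astral.jar` path, and the file names here are assumptions made for illustration and are not taken from the original script. `-i` and `-o` are ASTRAL's input and output options. Each gene-tree file is assumed to hold a single one-line Newick tree.

```python
import subprocess
from pathlib import Path


def merge_gene_trees(tree_files, merged_path):
    """Write one Newick gene tree per line into merged_path; return the tree count."""
    count = 0
    with open(merged_path, "w") as out:
        for tree_file in tree_files:
            newick = Path(tree_file).read_text().strip()
            if newick:  # skip empty files, e.g. from a failed gene-tree run
                out.write(newick + "\n")
                count += 1
    return count


def astral_command(merged_path, species_tree_path, astral_jar="astral.jar"):
    """Build the ASTRAL command line (astral_jar is a hypothetical local path)."""
    return ["java", "-jar", astral_jar,
            "-i", str(merged_path), "-o", str(species_tree_path)]


def run_astral(merged_path, species_tree_path, astral_jar="astral.jar"):
    """Run ASTRAL; raises CalledProcessError if it exits with an error."""
    subprocess.run(astral_command(merged_path, species_tree_path, astral_jar),
                   check=True)
```

Keeping the command construction separate from the call that runs it makes it possible to inspect the command without having Java or ASTRAL installed.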

##Build single-copy orthologous proteins, python3
##Runs on Linux; requires proteinortho and muscle. proteinortho: http://www.bioinf.uni-leipzig.de/Software/proteinortho/ ; muscle: http://www.drive5.com/muscle/
##Also requires ASTRAL: https://github.com/smirarab/ASTRAL
##Author: Liu Ninghua (刘宁华), Shandong University, Qingdao Campus, 1039438318@qq.com

#1. Preprocess the sequence files
def chuli_seq(in_file_0):
	import os
	# Folder with the input sequence files, relative to the working directory
	path = os.path.join(os.getcwd(), in_file_0)
	for file in os.listdir(path):
		# Replace every space with an underscore, editing the file in place
		cmd = "sed -i 's/ /_/g' " + os.path.join(path, file)
		os.system(cmd)
	print("1. Sequence preprocessing done")
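
The same preprocessing can be sketched in pure Python, without calling `sed`, which sidesteps problems with file names that contain spaces or shell metacharacters. This is an alternative to the step above, not part of the original script, and the function name `replace_spaces_in_dir` is hypothetical.

```python
import os


def replace_spaces_in_dir(directory):
    """Replace every space with an underscore in each regular file of directory.

    Returns the names of the files that were actually modified.
    """
    changed = []
    for name in sorted(os.listdir(directory)):
        file_path = os.path.join(directory, name)
        if not os.path.isfile(file_path):
            continue  # skip subdirectories
        with open(file_path, "r") as handle:
            text = handle.read()
        new_text = text.replace(" ", "_")
        if new_text != text:
            with open(file_path, "w") as handle:
                handle.write(new_text)
            changed.append(name)
    return changed
```

Like the `sed` command, this rewrites every space in the file, including any inside header lines, so a header such as `>seq1 some protein` becomes `>seq1_some_protein`.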


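#2. Run the orthology search
def protein_ortho(file):
	import os
	# file is the folder holding the *.faa protein files; results are written under the project name <file>_work
	cmd = "proteinortho6.pl -project=" + file + "_work -e=1e-5 -cov=50 -identity=50 -clean " + file + "/*.faa"
	os.system(cmd)
	print("2. Orthology search done")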
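
A hedged variant of this step builds the command as an argument list and runs it with `subprocess.run`, so a failed proteinortho run raises an error instead of passing silently. The options are the ones used in the command above. The function names are hypothetical, and the `*.faa` pattern is expanded with `glob` because no shell is involved.

```python
import glob
import os
import subprocess


def proteinortho_command(folder):
    """Build the proteinortho argument list for all *.faa files in folder."""
    faa_files = sorted(glob.glob(os.path.join(folder, "*.faa")))
    if not faa_files:
        raise FileNotFoundError("no .faa files found in " + folder)
    return ["proteinortho6.pl", "-project=" + folder + "_work",
            "-e=1e-5", "-cov=50", "-identity=50", "-clean"] + faa_files


def run_proteinortho(folder):
    """Run proteinortho; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(proteinortho_command(folder), check=True)
```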


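#3. Filter the orthology table, keeping only the single-copy groups
def guoLv_result(in_file_1, num):
	# num is the number of species; compare as text, because the table fields are strings
	num = str(num)
	with open(in_file_1, 'r') as in_f, open("work_ortho", 'w') as out_f:
		for line in in_f:
			fields = line.split('\t')
			# Column 1 is the species count and column 2 the gene count;
			# both equal to num means exactly one copy in every species
			if len(fields) > 1 and fields[0] == fields[1] == num:
				out_f.write(line)
	print("3. Filtering done")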
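
To make the filter rule concrete, the sketch below applies it to a small made-up table in the proteinortho layout (species count, gene count, connectivity, then one column per species). The table contents and the function name `single_copy_rows` are invented for illustration.

```python
def single_copy_rows(lines, num_species):
    """Return rows whose species count and gene count both equal num_species."""
    target = str(num_species)
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1 and fields[0] == fields[1] == target:
            kept.append(fields)
    return kept


table = [
    "# Species\tGenes\tAlg.-Conn.\tA.faa\tB.faa\tC.faa\n",
    "3\t3\t1\ta1\tb1\tc1\n",       # one copy in all three species: kept
    "3\t4\t0.8\ta2\tb2,b3\tc2\n",  # two copies in species B: dropped
    "2\t2\t1\ta3\t*\tc3\n",        # absent from species B: dropped
]

rows = single_copy_rows(table, 3)
print([row[3:] for row in rows])  # prints [['a1', 'b1', 'c1']]
```

Passing the species count as an integer works here because it is converted to a string before the comparison. Comparing the text fields against an integer directly would never match.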


#4. Merge the whole-genome protein sequences into a single file, whole_protein
def get_whole_protein(file_name):
	import os
	cmd = "cd " +
