Removing Duplicate Sequences with Python

showro_isme

Last modified 2022-12-12 19:39:50

Category: Biopython. Tags: python, programming languages

First published 2022-12-12 19:35:53

Article link: https://blog.csdn.net/weixin_48406854/article/details/128256337

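1-Scenario: a txt file contains many protein sequences in FASTA format. The duplicate sequences need to be removed so that only a non-redundant set of sequences remains.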
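To make the scenario concrete, here is a minimal sketch of reading FASTA-formatted text into (header, sequence) pairs. The function name `parse_fasta` and the sample records are hypothetical and are not taken from the original script. The sketch assumes each record starts with a `>` header line and that a sequence may wrap over several lines.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined so wrapped sequences compare equal.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical sample: protA and protB carry the same sequence.
sample = """>protA
MKV
LLA
>protB
MKVLLA
>protC
MSTNP
"""
records = list(parse_fasta(sample.splitlines()))
```

Joining wrapped lines before comparing matters here: two records with the same residues but different line widths should count as duplicates.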

2-The Python data structure used is the dictionary

Using the Python set data structure
Operations on the Python dictionary data structure
Python dictionaries: getting a key from its value
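The dictionary operations listed above can be sketched as follows. This is an illustration under stated assumptions, not the author's script: the helper names `deduplicate` and `keys_for_value` and the sample records are hypothetical, and keying the dictionary by sequence (rather than by header) is one possible design.

```python
def deduplicate(records):
    """Split (header, sequence) pairs into unique and duplicate entries.

    Assumption: two records are duplicates when their sequences are equal.
    """
    unique = {}      # sequence -> header of its first occurrence
    duplicates = []  # later occurrences of an already-seen sequence
    for header, seq in records:
        if seq in unique:
            duplicates.append((header, seq))
        else:
            unique[seq] = header
    return unique, duplicates


def keys_for_value(mapping, value):
    """Return every key whose value equals `value` (reverse lookup)."""
    return [k for k, v in mapping.items() if v == value]


# Hypothetical records; protA and protB share one sequence.
records = [("protA", "MKVLLA"), ("protB", "MKVLLA"), ("protC", "MSTNP")]
unique, duplicates = deduplicate(records)

# Reverse lookup on a header -> sequence dictionary.
by_header = dict(records)
same_seq_headers = keys_for_value(by_header, "MKVLLA")
```

Because dictionary membership tests are hash-based, checking `seq in unique` stays fast even for large files. The reverse lookup scans every item, so it is better suited to occasional queries than to an inner loop.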

3-The code is as follows

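# Open the FASTA file newfile and write it to outfile1; write the redundant part to outfile2; find the entries in outfile1 that match those in outfile2 and write them to outfile

import os
path = "C:/Users/luo/Desktop/P/1 SeqDeal-replace/Nr blast/"
file = open(path + "sum.txt", 'r')   # input file, opened for reading
newfile = path + "0.txt"
outfile = open(newfile, "w")         # output file, opened for writing
seq1 = dict()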
seq2=<