Processing txt file contents with Python

野指针_01

Category: Practical. Tags: python

Article link: https://blog.csdn.net/chetttt/article/details/113855153

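This article shows how to use Python to deduplicate and sort a bioinformatics data file, keeping a single copy of each sequence while recording the original identifiers that belong to it. First, the file is read and a set is used to detect duplicates; each retained copy is then written to a new file together with its original identifiers. Finally, a file of unique sequences is created by comparing records and excluding every sequence that occurs more than once.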

Summary generated automatically by CSDN.

Processing txt file contents in Python (deduplicating and sorting sequences, finding unique sequences)

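The following bioinformatics data contains sequences whose contents are exactly identical. Judging from the scripts in this article, each record spans two lines: an identifier line followed by a sequence line.

[Image: sample of the input data]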

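Requirement 1: Find identical sequences, keep only one copy of each in the file, and keep the list of original identifiers that belong to each copy.
Requirement 2: Save the unique sequences to a separate file.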
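Both requirements operate on two-line records, so it can help to see the record structure in isolation first. The following is a minimal sketch, assuming each record is one identifier line followed by one sequence line (a format inferred from the scripts in this article). The helper name `read_records` and the sample data are made up for illustration.

```python
def read_records(lines):
    """Yield (identifier, sequence) pairs from an iterable of text lines.

    Assumption: every record is exactly two lines, identifier then sequence.
    """
    it = iter(lines)
    for identifier in it:
        sequence = next(it, None)
        if sequence is None:
            break  # dangling identifier with no sequence line
        # Strip line endings so a missing final newline cannot make
        # two otherwise identical sequences compare as different.
        yield identifier.rstrip("\r\n"), sequence.rstrip("\r\n")


# Hypothetical sample data; the last line deliberately has no newline.
sample = ["id1\n", "ACGT\n", "id2\n", "TTGA\n", "id3\n", "ACGT"]
for identifier, sequence in read_records(sample):
    print(identifier, sequence)
```

An open file object can be passed in place of the list, because iterating over a text file yields its lines.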

Requirement 1

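# Deduplicate sequences and group the identifiers of identical sequences
lines_seen = set()  # sequences that have already been written
readDir = "123.txt"
writeDir = "456.txt"

# "with" closes both files even if an error occurs
with open(readDir, "r") as f, open(writeDir, "w") as outfile:
    while True:
        line1 = f.readline()  # identifier line
        line2 = f.readline()  # sequence line
        if not line2:
            break  # EOF
        if line2 not in lines_seen:
            outfile.write(line1)
            lines_seen.add(line2)
            # Rescan the file for other identifiers that share this sequence
            with open(readDir, "r") as e:
                while True:
                    line3 = e.readline()
                    line4 = e.readline()
                    if not line4:
                        break  # EOF
                    if (line4 == line2) and (line3 != line1):
                        outfile.write(line3)
            outfile.write(line2)

print("success")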

Result:
[Image: output of the script above]
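The script above rescans the input file once for every distinct sequence, which becomes slow on large files. As an alternative, here is a minimal single-pass sketch that groups identifiers in a dictionary keyed by sequence. It is not the author's method. The function names `group_by_sequence` and `write_groups` and the sample data are hypothetical, and the two-line record format is assumed.

```python
import io


def group_by_sequence(lines):
    """Map each sequence to the identifiers that carry it, in first-seen order."""
    groups = {}
    it = iter(lines)
    for identifier in it:
        sequence = next(it, None)
        if sequence is None:
            break  # dangling identifier with no sequence line
        key = sequence.rstrip("\r\n")
        groups.setdefault(key, []).append(identifier.rstrip("\r\n"))
    return groups


def write_groups(groups, out):
    """Write all identifiers of a group, followed by one copy of the sequence."""
    for sequence, identifiers in groups.items():
        for identifier in identifiers:
            out.write(identifier + "\n")
        out.write(sequence + "\n")


# Hypothetical in-memory demo; with real files one might write:
#   with open("123.txt") as src, open("456.txt", "w") as dst:
#       write_groups(group_by_sequence(src), dst)
sample = ["id1\n", "ACGT\n", "id2\n", "TTGA\n", "id3\n", "ACGT"]
buffer = io.StringIO()
write_groups(group_by_sequence(sample), buffer)
print(buffer.getvalue(), end="")
```

One behavioural difference: the original script skips a repeated identifier line that is identical to the first one in its group, whereas this sketch keeps every identifier it encounters. The sketch relies on dictionaries preserving insertion order, which Python guarantees from version 3.7.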

Requirement 2

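# Find unique sequences (those that occur exactly once)
readDir = "123.txt"
writeDir = "789.txt"
lines_remove = set()  # sequences that occur more than once
seen = set()          # sequences encountered so far

# First pass: collect every sequence that is duplicated
with open(readDir, "r") as f:
    while True:
        line1 = f.readline()  # identifier line
        line2 = f.readline()  # sequence line
        if not line2:
            break  # EOF
        if line2 in seen:
            lines_remove.add(line2)
        else:
            seen.add(line2)

# Second pass: write only the records whose sequence was never duplicated
with open(readDir, "r") as e, open(writeDir, "w") as outfile:
    while True:
        line1 = e.readline()
        line2 = e.readline()
        if not line2:
            break  # EOF
        if line2 not in lines_remove:
            outfile.write(line1)
            outfile.write(line2)

print("success")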

Result:
[Image: output of the script above]
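The two-pass idea above (first count, then filter) can also be expressed with `collections.Counter` from the standard library. This is a minimal sketch rather than the author's code. The function name `unique_records` and the sample data are hypothetical, the two-line record format is assumed, and all records are held in memory, which may not suit very large files.

```python
from collections import Counter


def unique_records(lines):
    """Return (identifier, sequence) pairs whose sequence occurs exactly once."""
    records = []
    it = iter(lines)
    for identifier in it:
        sequence = next(it, None)
        if sequence is None:
            break  # dangling identifier with no sequence line
        records.append((identifier.rstrip("\r\n"), sequence.rstrip("\r\n")))
    # Count how often each sequence appears, then keep the singletons.
    counts = Counter(sequence for _, sequence in records)
    return [(i, s) for i, s in records if counts[s] == 1]


# Hypothetical sample data: "ACGT" appears twice, "TTGA" once.
sample = ["id1\n", "ACGT\n", "id2\n", "TTGA\n", "id3\n", "ACGT"]
for identifier, sequence in unique_records(sample):
    print(identifier)
    print(sequence)
```

To write the result to a file such as `789.txt`, the `print` calls could be replaced with writes to a file opened in `"w"` mode.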
