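Two Python scripts for txt files: one removes duplicate lines within a single file, the other deduplicates by comparing two files. Both handle files in the gigabyte range and above without problems.
In testing, single-file deduplication of a 5 GB txt file took less than one minute. The two-file comparison was not timed, but it is also fast.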
Single-file deduplication code
#! python2
# -*- coding:utf-8 -*-
a = 0  # number of unique lines written
readDir = "E:/1.txt"   # old (input) file
writeDir = "E:/2.txt"  # new (deduplicated) file
# txtDir = "/home/Administrator/Desktop/1"
lines_seen = set()  # every distinct line seen so far
outfile = open(writeDir, "w")
f = open(readDir, "r")
for line in f:  # iterate line by line instead of loading the whole file
    if line not in lines_seen:
        a += 1
        outfile.write(line)
        lines_seen.add(line)
f.close()
outfile.close()
print(a)  # total number of unique lines
print("success")
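The set in the script above keeps every distinct line in memory, so memory use grows with the number of unique lines. As a minimal sketch of one possible variant (not the original script's method), the function below stores a 16-byte BLAKE2b digest of each line instead of the line itself. It assumes Python 3.6 or later, and the name `dedupe_file` is hypothetical. Two different lines could in theory share a digest and one of them would then be dropped, although this is extremely unlikely in practice.

```python
import hashlib


def dedupe_file(read_path, write_path):
    """Copy read_path to write_path, keeping only the first copy of each line.

    Hypothetical helper for illustration; not part of the original script.
    Returns the number of unique lines written.
    """
    seen = set()  # 16-byte digests of the lines seen so far
    unique = 0
    # Binary mode skips text decoding, so any file encoding is accepted.
    with open(read_path, "rb") as src, open(write_path, "wb") as dst:
        for line in src:  # streams the file one line at a time
            digest = hashlib.blake2b(line, digest_size=16).digest()
            if digest not in seen:
                seen.add(digest)
                dst.write(line)
                unique += 1
    return unique
```

Calling `dedupe_file("E:/1.txt", "E:/2.txt")` would mirror the paths used in the script above. As in that script, lines are compared exactly, so a final line without a trailing newline is treated as different from the same text followed by a newline.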
Multi-file comparison deduplication
#!/usr/bin/env python
# -*- coding:utf-8 -*-
def file_qc():
    str1 = []
    file_1 = open("1.txt","r",encoding="utf-8")
    for line in file_1.readlines():
        str1.append(line.replace("\n",""))
    str2 = []
    file_2 = open("2.txt", "