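There is no advantage in using a set unless you actually want to keep one row for each recurring value. If you want to keep only the rows that are unique, you first have to find which values occur once across the whole file, which a Counter dict will do:

    from collections import Counter
    from csv import reader, writer

    with open("test.csv", encoding="utf-8", newline="") as f, \
            open("file_out.csv", "w", encoding="utf-8", newline="") as out:
        wr = writer(out)
        rdr = reader(f)
        next(rdr)  # skip the header on the first pass
        # count each first/last name pair, lowercasing each string
        # (filter(None, ...) skips blank rows, which cannot be unpacked)
        counts = Counter((a.lower(), b.lower()) for a, b, *_ in filter(None, rdr))
        f.seek(0)  # rewind to the start of the file
        rdr = reader(f)
        wr.writerow(next(rdr))  # write the header
        # iterate over the file again, keeping only rows whose
        # first and last name pair occurs exactly once
        wr.writerows(row for row in filter(None, rdr)
                     if counts[row[0].lower(), row[1].lower()] == 1)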
Input:
^{pr2}$
Output file:

    FirstName,LastName,id,id2,id3
    Jacob,Smith,456,372,383
    Contractor,#1,8dh,28j,153s
    Testing2,Contrator,7463,99999,0283
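`counts` records how many times each name pair occurs after lowercasing. We then reset the file pointer and write only the rows whose first two column values appear exactly once in the whole file.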
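To make the difference from a set concrete, here is a minimal sketch that works on an in-memory list instead of a file. The sample rows and the helper names `name_key`, `dedupe_with_set` and `keep_unique_only` are made up for illustration and are not taken from the question.

```python
from collections import Counter

# Hypothetical sample rows (first name, last name, id); not the question's data.
rows = [
    ["Jacob", "Smith", "456"],
    ["john", "doe", "111"],
    ["John", "Doe", "222"],
    ["Testing2", "Contrator", "7463"],
]

def name_key(row):
    """Case-insensitive key built from the first two columns."""
    return row[0].lower(), row[1].lower()

def dedupe_with_set(rows):
    """Set approach: keeps the first row seen for every key."""
    seen = set()
    kept = []
    for row in rows:
        key = name_key(row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

def keep_unique_only(rows):
    """Counter approach: keeps only rows whose key occurs exactly once."""
    counts = Counter(name_key(row) for row in rows)
    return [row for row in rows if counts[name_key(row)] == 1]

print(dedupe_with_set(rows))   # still contains one of the two "John Doe" rows
print(keep_unique_only(rows))  # both "John Doe" rows are gone
```

The set version needs only one pass but cannot know, when it sees a row, whether a duplicate will turn up later. That is why the Counter version counts first and filters second.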
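Or without the csv module, which may be faster if there are many columns:

    from collections import Counter

    with open("test.csv") as f, open("file_out.csv", "w") as out:
        header = next(f)  # get header
        next(f)  # skip blank line (assumes one follows the header)
        # split off only the first two fields and lowercase them
        counts = Counter(tuple(map(str.lower, line.split(",", 2)[:2])) for line in f)
        f.seek(0)  # back to start of file
        next(f), next(f)  # skip header and blank line again
        out.write(header)  # write original header
        # the lookup key must be a tuple: a bare map object never matches a stored key
        out.writelines(line for line in f
                       if counts[tuple(map(str.lower, line.split(",", 2)[:2]))] == 1)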
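The csv-free version depends on two details that are easy to get wrong. First, `split(",", 2)` stops after the second comma, so the rest of the line is never split. Second, the Counter key has to be a tuple, not a `map` object. A small sketch of both points, using a hypothetical helper `name_key` and a made-up sample line:

```python
from collections import Counter

def name_key(line):
    """Lowercased (first, last) tuple from the first two comma-separated fields."""
    return tuple(map(str.lower, line.split(",", 2)[:2]))

line = "Jacob,Smith,456,372,383\n"  # made-up sample line

# maxsplit=2 leaves everything after the second comma in one piece
parts = line.split(",", 2)
print(parts)           # ['Jacob', 'Smith', '456,372,383\n']
print(name_key(line))  # ('jacob', 'smith')

counts = Counter([name_key(line)])
print(counts[name_key(line)])  # 1

# A map object hashes by identity, so it never equals the stored tuple
# and the Counter reports 0 for it.
print(counts[map(str.lower, parts[:2])])  # 0
```

Note that a plain `split` does not understand quoted fields that contain commas. If the first two columns may be quoted, the csv module version is the safer choice.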