Python: Deduplicating and Sorting Text

yeah_go

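The text:

Each line contains a number after "promotion". Lines with the same number are treated as the same line, and only one of them is kept.

Approach:

Use a dictionary together with string splitting.

Create an empty dictionary.

Read the text and split off the first part of each line (the part containing the promotion number). While reading, look that part up in the dictionary. If it is not found, store the line in the dictionary. If it is found, the line has already been stored (it is a duplicate), so it is not stored again. This keeps exactly one line out of each group of duplicates.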
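The lookup described above can be sketched roughly as follows. This is a minimal illustration rather than the author's program: the function name `dedupe_by_promotion` and the regular expression are assumptions, and the key is taken to be the digits that follow `/promotion/`.

```python
import re

# Hypothetical pattern: the digits right after "/promotion/" form the key.
PROMOTION_RE = re.compile(r"/promotion/(\d+)")

def dedupe_by_promotion(lines):
    """Keep the first line seen for each promotion number, in input order."""
    seen = {}  # promotion number -> first line that carried it
    for line in lines:
        match = PROMOTION_RE.search(line)
        if match is None:
            continue  # assumption: lines without a promotion number are skipped
        key = match.group(1)
        if key not in seen:
            seen[key] = line.strip()
    return list(seen.values())

sample = [
    "/promotion/25113, LandingPage/mhd",
    "/promotion/25199, com/LandingPage",
    "/promotion/25113, LandingPage/mhd",
    "/promotion/25199 ,com/LandingPage",
]
for line in dedupe_by_promotion(sample):
    print(line)
```

Because the key is only the number, the last sample line counts as a duplicate even though its spacing differs. A dictionary preserves insertion order in Python 3.7 and later, so the surviving lines come out in the order they were first seen.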

The text is as follows:

 
/promotion/232，   utm_source
/promotion/237 ，  LandingPage/borrowExtend/? ;
/promotion/25113， LandingPage/mhd
/promotion/25113， LandingPage/mhd
/promotion/25199， com/LandingPage
/promotion/254  ， LandingPage/mhd/mhd4/? ;
/promotion/259  ， LandingPage/ydy/? ;
/promotion/25113， LandingPage/mhd
/promotion/25199 ，com/LandingPage
/promotion/25199， com/LandingPage

The program is as follows:

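unique = {}  # maps each promotion key to the first line seen with that key
with open("G:\\python\\PycharmProjects\\spider\\tempdata.csv", 'r') as listfile:
    for line in listfile:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The key is the part before the first comma, e.g. "/promotion/25113".
        # The sample text shows a full-width comma, so accept either form.
        key = line.replace("，", ",").split(",")[0].strip()
        if key not in unique:
            unique[key] = line  # first occurrence: keep it
# No listfile.close() is needed: the with block has already closed the file.
listrenew = list(unique.values())
print(listrenew)

# This prints the unique lines (each duplicated line appears only once).
# In practice you would then write this result to a file;
# the code for writing the file is omitted here.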
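The omitted write step might look something like the sketch below. The helper name `write_unique_lines` and the output file name are hypothetical, UTF-8 encoding is an assumption, and the key is assumed to be the text before the first comma (ASCII or full-width).

```python
import os
import tempfile

def write_unique_lines(src_path, dst_path):
    """Copy src to dst, keeping only the first line for each key."""
    seen = set()  # keys that have already been written
    kept = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Assumption: the key is everything before the first comma.
            key = line.replace("，", ",").split(",")[0].strip()
            if key in seen:
                continue  # duplicate key: do not write it again
            seen.add(key)
            dst.write(line + "\n")
            kept += 1
    return kept

# Usage with a throwaway directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "tempdata.csv")
    dst_path = os.path.join(tmp, "deduped.csv")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("/promotion/25113, LandingPage/mhd\n"
                "/promotion/25113, LandingPage/mhd\n"
                "/promotion/25199, com/LandingPage")
    count = write_unique_lines(src_path, dst_path)
    with open(dst_path, encoding="utf-8") as f:
        written = f.read()
    print(count)
    print(written, end="")
```

Writing to a separate output file, rather than overwriting the input, leaves the original data intact if something goes wrong partway through.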

1. The file X.lab contains the following:

hello,world

ni,hao

bu,hao

hai,hai

no,no

After sorting, the contents are:

bu,hao

hai,hai

hello,world

ni,hao

no,no

Basic idea: read the file's contents into a list, sort the list, then write the sorted elements from the list back to the same file.

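#!/usr/bin/env python
# Filename: sortForFile
# Read every line of the file into a list.
# Iterating over a file object yields one line at a time.
with open('/home/sundy/X.lab') as f:
    result = list(f)
# Sort the lines in place (plain string comparison).
result.sort()
# Reopen the same file for writing and write the sorted lines back.
with open('/home/sundy/X.lab', 'w') as f:
    f.writelines(result)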
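One edge case is worth a sketch: if the last line of the file has no trailing newline, sorting can move it into the middle of the list, where `writelines` would glue it to the line that follows. The variant below strips line endings before sorting and adds them back when writing. The function name `sort_file_lines` is hypothetical, and a temporary file stands in for X.lab.

```python
import os
import tempfile

def sort_file_lines(path):
    """Sort the lines of a text file in place and return the sorted lines."""
    with open(path) as f:
        lines = f.read().splitlines()  # line endings are removed here
    lines.sort()
    with open(path, "w") as f:
        # Add exactly one newline per line, including the last one.
        f.writelines(line + "\n" for line in lines)
    return lines

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X.lab")
    with open(path, "w") as f:
        # Same sample as in the article, deliberately without a final newline.
        f.write("hello,world\nni,hao\nbu,hao\nhai,hai\nno,no")
    sorted_lines = sort_file_lines(path)
    with open(path) as f:
        content = f.read()
    print(content, end="")
```

The default sort compares strings character by character, which is what produces the order shown in the article's expected output.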