Finding Duplicate Elements in a File with Python

阿吉的CV之路

Article link: https://blog.csdn.net/qq_35140742/article/details/120101265

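This post shows how to read a file with Python, find the repeated elements, and count them. A nested loop walks through the file contents, detects and records repeated values, and the elements are deduplicated at the end. The focus is on the logic for detecting repeated values and on the deduplicated set of elements.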

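Approach:

Use a nested loop to look for repeated values: compare each element read from the file with every element that comes after it. If a later element is equal, append the current element to repeat_name, record its index in repeat_index, and add 1 to count. The last occurrence of a repeated value has no equal element after it, so it needs two extra checks:

1. The last occurrence is not at the end of the file: if no later element equals it and it is already in repeat_name, append it to the end of repeat_name and add 1 to count.
2. The last occurrence is the final element of the file: the inner loop does not run for it, so check after the loop whether it is already in repeat_name. If it is, append it to the end of repeat_name and add 1 to count.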
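The rule described above can be condensed into a short sketch. This is not the author's code: `find_repeats` and its variable names are hypothetical, and the sample string is hard-coded instead of being read from a file. It records an element whenever an equal element appears later, or the value has already been recorded as a repeat.

```python
def find_repeats(items):
    """Return (names, indexes) for every occurrence of a value that appears more than once."""
    names = []    # repeated values, one entry per occurrence
    indexes = []  # position of each recorded occurrence
    for i, item in enumerate(items):
        has_later_match = item in items[i + 1:]  # an equal element follows
        already_recorded = item in names         # covers the last occurrence
        if has_later_match or already_recorded:
            names.append(item)
            indexes.append(i)
    return names, indexes


# Hypothetical sample: the same content the post writes to the file
sample = "a、b、c、d、c、c、f、c、d、e".split("、")
names, indexes = find_repeats(sample)
print(names)       # ['c', 'd', 'c', 'c', 'c', 'd']
print(indexes)     # [2, 3, 4, 5, 7, 8]
print(len(names))  # 6
```

Like the nested loop, this compares each element against the rest of the list, so it is quadratic in the number of elements. That is fine for a small file like this one.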

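# Path of the sample file (a Windows path; change it to suit your system)
paths = "D:/duplicate.txt"
# Accumulates the file contents as a string, then is replaced by the list of elements
num_list = ""
# Write "a、b、c、d、c、c、f、c、d、e" to duplicate.txt; the file is created if it does not exist
with open(paths, 'w', encoding='utf-8') as f:
    f.write("a、b、c、d、c、c、f、c、d、e")
# Read duplicate.txt line by line to check that the write succeeded
with open(paths, 'r', encoding='utf-8') as f:
    line = f.readlines()
    print(f"total_line:{line}")
# Read duplicate.txt again, strip leading and trailing whitespace from each line,
# join the lines into one string, then split it on '、' (this returns a list)
with open(paths, 'r', encoding='utf-8') as f:
    for line in f.readlines():
        num_list += line.strip()
    num_list = num_list.split('、')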

print(f'num_list:{num_list}')
print(len(num_list))
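# Number of repeated occurrences
count = 0
# Repeated values, one entry per occurrence
repeat_name = []
# Indexes of the repeated occurrences
repeat_index = []
for i in range(len(num_list)):
    for j in range(i + 1, len(num_list)):
        # A later element equals this one: record the element and its index,
        # add 1 to the count, and stop the j loop so it is not recorded twice
        if num_list[i] == num_list[j]:
            repeat_name.append(num_list[i])
            repeat_index.append(i)
            count += 1
            break
        # No later element matched: if the value is already in repeat_name,
        # this is its last occurrence and it is not at the end of the list
        if num_list[i] in repeat_name and j == len(num_list) - 1:
            count += 1
            repeat_index.append(i)
            repeat_name.append(num_list[i])
    # The inner loop does not run for the last element, so check it separately
    if i == len(num_list) - 1:
        if num_list[i] in repeat_name:
            count += 1
            # Record the index here too, so repeat_index stays aligned with repeat_name
            repeat_index.append(i)
            repeat_name.append(num_list[i])
print(f'repeat_count:{count}')
print(f'repeat_name:{repeat_name}')
print(f'repeat_index:{repeat_index}')
print(f'count_repeat:{count}')
# Deduplicate with a set (a set is unordered, so the printed order can vary)
num_list = set(num_list)
print(f'deduplication_list:{num_list}')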

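Output (the order of the elements in deduplication_list can differ between runs, because a set is unordered):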
total_line:['a、b、c、d、c、c、f、c、d、e']
num_list:['a', 'b', 'c', 'd', 'c', 'c', 'f', 'c', 'd', 'e']
10
repeat_count:6
repeat_name:['c', 'd', 'c', 'c', 'c', 'd']
repeat_index:[2, 3, 4, 5, 7, 8]
count_repeat:6
deduplication_list:{'d', 'a', 'c', 'b', 'f', 'e'}
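As a cross-check of the numbers above, here is a minimal sketch that swaps the nested loop for the standard library's `collections.Counter` and uses `dict.fromkeys` for deduplication that keeps first-seen order. This is a different technique from the one in the post. The variable names are hypothetical, and the sample string is hard-coded instead of being read from the file.

```python
from collections import Counter

# Hypothetical sample: the same content the post writes to the file
items = "a、b、c、d、c、c、f、c、d、e".split("、")

# Count every value, then keep only those that occur more than once
counts = Counter(items)
repeated = {item: n for item, n in counts.items() if n > 1}
print(repeated)                # {'c': 4, 'd': 2}
print(sum(repeated.values()))  # 6, the same total as repeat_count

# dict keys keep insertion order (Python 3.7+), so this removes
# duplicates while preserving the order of first appearance
unique_in_order = list(dict.fromkeys(items))
print(unique_in_order)         # ['a', 'b', 'c', 'd', 'f', 'e']
```

The per-value counts add up to 6, which matches repeat_count. Unlike the set, the deduplicated list prints in the same order on every run.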