Checking whether a list contains duplicate elements, and removing duplicates while preserving element order

LanceLeng

Article link: https://blog.csdn.net/lanceleng/article/details/8707668

Originally published at: http://qing.weibo.com/2059598087/7ac2f907330025yy.html

How do you test whether a list has no duplicate elements? (From the blog post "Python Tips, Tricks, and Hacks", http://www.siafoo.net/article/52)

numbers = [1,2,3,3,4,1]
set(numbers)
# returns {1, 2, 3, 4} (Python 2 displays it as set([1, 2, 3, 4]))


if len(numbers) == len(set(numbers)):
    print('List is unique!')
else:
    print("List is not unique!")
# In this case, this prints 'List is not unique!'
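
The length comparison above always builds the full set, even when a duplicate appears early in the list. As a minimal sketch (the name `has_duplicates` is illustrative and not from the original post, and elements are assumed to be hashable), the check can stop at the first repeated element:

```python
def has_duplicates(items):
    """Return True as soon as a repeated element is found (items must be hashable)."""
    seen = set()
    for item in items:
        if item in seen:
            return True  # stop early: no need to scan the rest
        seen.add(item)
    return False


print(has_duplicates([1, 2, 3, 3, 4, 1]))  # True
print(has_duplicates([1, 2, 3, 4]))        # False
```

For short lists the `len(numbers) == len(set(numbers))` one-liner is likely just as good; the early exit only matters when lists are long and duplicates are common.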

[Author's own addition] How do you remove duplicate elements while preserving the order of the list?

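numbers = [1,2,3,3,4,1]
d = {}
for i, key in enumerate(numbers):
    # remember the index of the first occurrence of each element
    d.setdefault(key, i)
# order the distinct elements by the index of their first occurrence
result = sorted(d, key=d.__getitem__)
# result is [1, 2, 3, 4]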
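
As an aside on the dictionary approach above: on Python 3.7 and later, where dictionaries are guaranteed to keep insertion order, the index bookkeeping and the sort are not needed. The following is a sketch under that version assumption, not part of the original post:

```python
numbers = [1, 2, 3, 3, 4, 1]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
# (guaranteed by the language from Python 3.7 onward).
result = list(dict.fromkeys(numbers))
print(result)  # [1, 2, 3, 4]
```

Like the `setdefault` version, this requires the elements to be hashable.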

Another approach:

result = [item for i,item in enumerate(numbers[::-1]) if item not in numbers[:-i-1]]
result.reverse()

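Note 1: If you need [1, 2, 1] --> [2, 1] instead (that is, keep the last occurrence of each element rather than the first), then in the first approach d.setdefault(key, i) should become d[key] = i.
The second approach then no longer needs numbers[::-1], although the related logic has to change accordingly.
Note 2: When the items in the original list are immutable (more precisely, hashable) objects and the list is long, the first approach is much faster.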
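
Note 1 can be made concrete with a small sketch. The function names below are illustrative and not from the original post; both keep the last occurrence of each element, the first by adapting the dictionary approach and the second by adapting the list-comprehension approach:

```python
def unique_keep_last(items):
    """Dictionary approach: later indices overwrite earlier ones."""
    last_index = {}
    for i, key in enumerate(items):
        last_index[key] = i  # d[key] = i instead of d.setdefault(key, i)
    return sorted(last_index, key=last_index.__getitem__)


def unique_keep_last_quadratic(items):
    """List approach: keep an item only if it does not occur again later."""
    return [item for i, item in enumerate(items) if item not in items[i + 1:]]


print(unique_keep_last([1, 2, 1]))            # [2, 1]
print(unique_keep_last_quadratic([1, 2, 1]))  # [2, 1]
```

The second function needs neither the reversed copy nor the final `reverse()`, but it still rescans a slice of the list for every element, so it is only reasonable for short lists.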

Inspired by Recipe 17.5 in the Python Cookbook: when the list elements are hashable, the following code can be used:

def uniquer(seq):
    def g(seq):
        s = set()
        for i in seq:
            if i not in s:
                s.add(i)
                yield i
    result = list(g(seq))
    return result

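When the data set is large, this function is much faster than the dictionary-based approach above. The main reason is that it avoids the cost of sorting, although it does pay for some extra function calls.
In a test on strings taken from Chinese articles, this approach took roughly 1/3 of the time of the approach above. It is also worth noting that when seq has tens of thousands or even hundreds of millions of elements, the enumerate call itself becomes fairly expensive in both time and space. (One option is to drop enumerate(seq) and use zip(seq, itertools.count(step=0.015625)) instead; 0.015625 is a number that binary floating point represents exactly, the drawback being that the floating-point counter may overflow.)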
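
The ratio quoted above comes from the author's own test data. To measure the two approaches on your own input, a minimal timing sketch could look like the following; the function names and the synthetic `data` list are assumptions made for illustration, and the printed times will vary by machine and Python version:

```python
import timeit


def unique_by_dict_sort(seq):
    """First approach: record first-occurrence indices, then sort by them."""
    d = {}
    for i, key in enumerate(seq):
        d.setdefault(key, i)
    return sorted(d, key=d.__getitem__)


def unique_by_set(seq):
    """Set-based approach: keep an element the first time it is seen."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Synthetic input (an assumption): 100,000 items with 1,000 distinct values.
data = [i % 1000 for i in range(100000)]

# Both approaches should agree before their timings are compared.
assert unique_by_dict_sort(data) == unique_by_set(data)

for func in (unique_by_dict_sort, unique_by_set):
    seconds = timeit.timeit(lambda f=func: f(data), number=5)
    print("%s: %.4f s for 5 runs" % (func.__name__, seconds))
```

The share of distinct values in the input affects the result, because the sort in the first approach only runs over the distinct elements.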