Practical tips worth a look: 5 ways to remove duplicates from a Python list

Seon塞翁

Article link: https://blog.csdn.net/zohan134/article/details/109307694

These methods were collected from experienced members of a Python chat group.

Test data

lst1 = [1,2,2,3,4,3,5,2]
lst2 = [1,6,5,6,4,3,2,1]

1. Simple-logic version

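Uses not in to skip elements that are already in the result. It is short, readable, very Pythonic, and preserves the original order. However, not in on a list is a linear scan, so the cost grows with the number of distinct elements: O(n²) in the worst case, when most elements are distinct.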

def remove_duplicate_1(lst):  # simple-logic version
    uni_lst = []
    for i in lst:
        if i not in uni_lst:  # linear scan of the result so far
            uni_lst.append(i)
    return uni_lst

Output:

[1, 2, 3, 4, 5]
[1, 6, 5, 4, 3, 2]
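Because not in rescans the result list for every element, a widely used order-preserving variant keeps a separate set of the elements already seen, so each membership test is an average O(1) hash lookup. This is a sketch, not one of the five methods collected here; the function name remove_duplicate_seen is hypothetical, and it assumes the elements are hashable.

```python
def remove_duplicate_seen(lst):
    """Order-preserving dedup; a set makes each membership test O(1) on average."""
    seen = set()     # elements already added to the result
    uni_lst = []     # result in first-occurrence order
    for item in lst:
        if item not in seen:  # hash lookup instead of a linear list scan
            seen.add(item)
            uni_lst.append(item)
    return uni_lst


lst1 = [1, 2, 2, 3, 4, 3, 5, 2]
lst2 = [1, 6, 5, 6, 4, 3, 2, 1]
print(remove_duplicate_seen(lst1))  # [1, 2, 3, 4, 5]
print(remove_duplicate_seen(lst2))  # [1, 6, 5, 4, 3, 2]
```

The simple-logic version above still has one advantage: it only needs ==, so it also works when the elements are unhashable (for example, lists of lists).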

2. Set version

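Relies on the fact that a set cannot contain duplicates. Building the set is fast, but a set has no defined order, so the result is then sorted by each element's index in the original list to restore the original order.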

def remove_duplicate_2(lst):  # set version
    uni_lst = list(set(lst))      # drop duplicates (order is lost)
    uni_lst.sort(key=lst.index)   # restore first-occurrence order
    return uni_lst

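Output without the sort step (set order is an implementation detail; in CPython, small integers like these happen to come out in ascending order):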

[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, 6]

Output with the sort step:

[1, 2, 3, 4, 5]
[1, 6, 5, 4, 3, 2]
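The set-and-sort steps can also be written as a single expression with the built-in sorted(), which takes the same key argument as list.sort(). A minimal sketch; the function name remove_duplicate_2_oneliner is hypothetical. Note that lst.index scans the original list from the start for every distinct value, so this sort step costs roughly (number of distinct values) × (list length).

```python
def remove_duplicate_2_oneliner(lst):
    # set() drops duplicates; sorting by lst.index restores first-occurrence order
    return sorted(set(lst), key=lst.index)


lst1 = [1, 2, 2, 3, 4, 3, 5, 2]
lst2 = [1, 6, 5, 6, 4, 3, 2, 1]
print(remove_duplicate_2_oneliner(lst1))  # [1, 2, 3, 4, 5]
print(remove_duplicate_2_oneliner(lst2))  # [1, 6, 5, 4, 3, 2]
```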

3. Dict version

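Uses dict.setdefault(), which adds a key only if it is not already present, so each element keeps the index of its first occurrence. Dicts preserve insertion order (guaranteed since Python 3.7), so the keys come out in the original order.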

def remove_duplicate_3(lst):  # dict version
    dic = {}
    for index, key in enumerate(lst):
        dic.setdefault(key, index)  # only the first occurrence is stored
    uni_lst = list(dic.keys())      # keys are in insertion order
    return uni_lst

The dictionaries created (element: index of first occurrence):

{1: 0, 2: 1, 3: 3, 4: 4, 5: 6}
{1: 0, 6: 1, 5: 2, 4: 4, 3: 5, 2: 6}

Output:

[1, 2, 3, 4, 5]
[1, 6, 5, 4, 3, 2]
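Because dicts keep insertion order, the same result can be had without an explicit loop via dict.fromkeys(), which creates one key per distinct element in first-seen order. A minimal sketch; the function name remove_duplicate_fromkeys is hypothetical, and unlike the setdefault version the dict values here are all None rather than the original indices.

```python
def remove_duplicate_fromkeys(lst):
    # dict.fromkeys keeps the first occurrence of each key, in order
    return list(dict.fromkeys(lst))


lst1 = [1, 2, 2, 3, 4, 3, 5, 2]
lst2 = [1, 6, 5, 6, 4, 3, 2, 1]
print(remove_duplicate_fromkeys(lst1))  # [1, 2, 3, 4, 5]
print(remove_duplicate_fromkeys(lst2))  # [1, 6, 5, 4, 3, 2]
```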

4. NumPy version

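np.unique returns the deduplicated values as a NumPy array sorted in ascending order. With return_index=True it also returns the index of each value's first occurrence in the input, and applying np.argsort to those indices restores the original order. The result is a numpy.ndarray; wrap it in list() if a plain list is needed.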

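def remove_duplicate_4(lst):  # NumPy version
    import numpy as np
    # sorted unique values, plus the index of each value's first occurrence
    uni_lst, index = np.unique(lst, return_index=True)
    # reorder the unique values by first occurrence
    uni_lst = uni_lst[np.argsort(index)]
    return uni_lst  # numpy.ndarray, not a list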

Output:

[1 2 3 4 5]
[1 6 5 4 3 2]
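If a plain Python list is needed, ndarray.tolist() converts the result and also turns the NumPy integer scalars back into Python ints, which list() alone does not do (it yields a list of NumPy scalars). A sketch assuming NumPy is installed; the function name remove_duplicate_4_list is hypothetical.

```python
import numpy as np


def remove_duplicate_4_list(lst):
    # values: sorted unique values; first_idx: index of each value's first occurrence
    values, first_idx = np.unique(lst, return_index=True)
    # reorder by first occurrence, then convert to a list of Python ints
    return values[np.argsort(first_idx)].tolist()


lst1 = [1, 2, 2, 3, 4, 3, 5, 2]
lst2 = [1, 6, 5, 6, 4, 3, 2, 1]
print(remove_duplicate_4_list(lst1))  # [1, 2, 3, 4, 5]
print(remove_duplicate_4_list(lst2))  # [1, 6, 5, 4, 3, 2]
```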

5. pandas version

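Uses the drop_duplicates method of a pandas Series. The keep parameter chooses which occurrence of each duplicated value survives: "first" keeps the first occurrence and "last" keeps the last one (keep=False drops every value that appears more than once). It is flexible, but building a Series adds overhead, so it is comparatively slow for a plain list.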

def remove_duplicate_5(lst):  # pandas version
    import pandas as pd
    s = pd.Series(lst)
    uni_lst = list(s.drop_duplicates(keep="first"))  # keep first occurrences
    return uni_lst

Output:

[1, 2, 3, 4, 5]
[1, 6, 5, 4, 3, 2]

Output with keep="last":

[1, 4, 3, 5, 2]
[5, 6, 4, 3, 2, 1]
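For comparison, the keep="last" behaviour can be reproduced without pandas by deduplicating the reversed list and reversing the result back. This is a pure-Python sketch that matches the outputs above, not pandas' own implementation; the function name remove_duplicate_keep_last is hypothetical.

```python
def remove_duplicate_keep_last(lst):
    # Walk the list backwards so each value's last occurrence is seen first,
    # dedupe with dict.fromkeys, then reverse back to positional order.
    return list(dict.fromkeys(reversed(lst)))[::-1]


lst1 = [1, 2, 2, 3, 4, 3, 5, 2]
lst2 = [1, 6, 5, 6, 4, 3, 2, 1]
print(remove_duplicate_keep_last(lst1))  # [1, 4, 3, 5, 2]
print(remove_duplicate_keep_last(lst2))  # [5, 6, 4, 3, 2, 1]
```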

Performance comparison of the 5 methods

Generate a test list of 100,000 random integers from 0 to 19, and define a custom timing decorator.

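import functools
import random
import time

# 100,000 random integers from 0 to 19
test_list = random.choices(range(20), k=100000)

def get_time(func):
    """Decorator that prints how long each call to func takes."""
    @functools.wraps(func)  # keep func.__name__ for the printout
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()  # high-resolution clock for timing
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        print(f"{func.__name__} took: {round(end_time - start_time, 3)} s")
        return result
    return wrapper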

Example of applying the decorator:

@get_time
def remove_duplicate_1(lst):  # simple-logic version
    uni_lst = []
    for i in lst:
        if i not in uni_lst:
            uni_lst.append(i)
    return uni_lst

Run all 5 methods (each one decorated with @get_time in the same way):

remove_duplicate_1(test_list)
remove_duplicate_2(test_list)
remove_duplicate_3(test_list)
remove_duplicate_4(test_list)
remove_duplicate_5(test_list)

Output:

remove_duplicate_1 took: 0.017 s
remove_duplicate_2 took: 0.002 s
remove_duplicate_3 took: 0.014 s
remove_duplicate_4 took: 0.157 s
remove_duplicate_5 took: 0.513 s

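The set-based method is by far the fastest here, beating all the others. Keep in mind that this test list has only 20 distinct values, so the not in check in method 1 never scans more than 20 elements, and method 2 calls lst.index only 20 times. With many distinct values, the cost of both methods grows with (number of distinct values) × (list length), while the dict version stays linear. This is Seon塞翁, see you in the next post.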
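Since the test list above has only 20 distinct values, it flatters the methods that scan a list once per distinct value. Below is a sketch of a check with many distinct values, using the standard-library timeit module instead of the decorator. The list size (5,000 draws from 0 to 4,999) is an arbitrary assumption, and absolute timings depend on the machine, so no numbers are shown here.

```python
import random
import timeit


def remove_duplicate_1(lst):  # method 1: list membership test
    uni_lst = []
    for i in lst:
        if i not in uni_lst:
            uni_lst.append(i)
    return uni_lst


def remove_duplicate_2(lst):  # method 2: set, then restore order via lst.index
    uni_lst = list(set(lst))
    uni_lst.sort(key=lst.index)
    return uni_lst


def remove_duplicate_3(lst):  # method 3: dict.setdefault keeps the first index
    dic = {}
    for index, key in enumerate(lst):
        dic.setdefault(key, index)
    return list(dic.keys())


# Many distinct values: thousands instead of 20
test_list = random.choices(range(5000), k=5000)

# All three methods must agree before we time them
assert remove_duplicate_1(test_list) == remove_duplicate_2(test_list) == remove_duplicate_3(test_list)

for func in (remove_duplicate_1, remove_duplicate_2, remove_duplicate_3):
    seconds = timeit.timeit(lambda: func(test_list), number=3)
    print(f"{func.__name__}: {seconds:.3f} s for 3 runs")
```

With thousands of distinct values, methods 1 and 2 should fall well behind method 3, because both do a linear scan per distinct value; run it to see the actual numbers on your machine.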
