Chapter 03 你真的理解字典、集合吗？

最新推荐文章于 2024-08-14 23:43:29 发布

他是只猫

最新推荐文章于 2024-08-14 23:43:29 发布

阅读量328

点赞数

文章标签：开发语言 python

本文链接：https://blog.csdn.net/suwuzs/article/details/126773543

版权

文章目录

前言
字典基础
集合基础
字典和集合的性能

前言

今天我们来学习python的字典和集合，也是非常常见的数据结构。字典和集合在python中被广泛使用，并且性能进行了高度优化，十分重要。

字典基础

字典是由键值对组成的元素的集合，字典是无序的，其长度大小可变，元素可以任意地删减和改变。

相比于列表和元组，字典地性能更优，特别是对于查找、添加和删除操作，字典都可以在常数时间复杂度内完成。

字典中的键和值都是混合类型的；
字典访问可以直接索引键，如果不存在，则会抛出异常；
字典也可以使用get(key, default)函数来进行索引，如果键不存在，调用get()函数可以返回一个默认值；

d = {}  # 空字典
d = {'name': 'Bob', 'score': 99}
e = {'ceo': {'name': 'Tom', 'age': 40}} # 嵌套
d = dict(name='Bob', age=40)
d = dict([('name', 'Bob'), ('age', 40)]) # 键值对
d = dict(zip(keyslist, valslist)) # 拉链式键值对
d = dict.fromkeys(['a', 1]) # 键列表
d['name'] # 通过键索引
d['ceo']['age'] # 嵌套索引
'age' in d
'age' in d.keys()
d.values() # 所有值
d.items() # 所有键值元组
d.update(d1) # 通过键合并
d.get(key, default?) # 通过键获取，如果不存在默认返回None
d.pop(key, default?) # 通过键删除，如果不存在返回错误
d.setdefault(key, default?) # 通过键获取，如果不存在默认设置为None
d.popitem() # 删除/返回所有的（键，值）对
len(d) # 长度
d[key] = 42 # 新增/修改键，删除键
del d[key] # 根据键删除条目
list(d.keys())
d1.keys() & d2.keys()
Dictionary views # 查看字典键
D = {x: x*2 for x in range(10)} # 字典推导

对于字典，我们通常可以根据键或值，进行升序或降序的排序；

d = {'Tom': 90, 'Bob': 100, 'Cat': 88}
d_sorted_by_keys = sorted(d.items(), key=lambda  x:x[0])
print(d_sorted_by_keys)
d_sorted_by_values = sorted(d.items(), key=lambda  x:x[1])
print(d_sorted_by_values)

返回的则是一个列表，列表中的元素则是由原字典的键和值组成的元组。

输出：
[('Bob', 100), ('Cat', 88), ('Tom', 90)]
[('Cat', 88), ('Tom', 90), ('Bob', 100)]

集合基础

集合和字典基本相同，唯一的区别，就是集合没有键值对，是一系列无序的、唯一的元素组合。也就是说集合只能包含不可变的对象类型。因此，列表和字典是不能嵌入到集合中的，但如果你需要存储复合对象的话，元组是可以嵌入集合的。

s = set('spam')
print(s)
print(s[0])

在这里插入图片描述
集合并不支持索引操作，因为集合本质上是一个哈希表，和列表不一样。当然，集合和字典一样，也支持增删改查。

x = set('abcd')
y = set('cdxy')
print('x:', x)
print('y:', y)
# 集合通过表达式支持一般的数学集合运算
print(x-y)
print(x.difference(y))
print(x|y)
print(x.union(y))
print(x&y)
print(x.intersection(y))
print(x^y)
print(x.symmetric_difference(y))

输出：
x: {'b', 'd', 'a', 'c'}
y: {'d', 'x', 'y', 'c'}
{'b', 'a'}
{'b', 'a'}
{'x', 'a', 'c', 'b', 'd', 'y'}
{'x', 'a', 'c', 'b', 'd', 'y'}
{'d', 'c'}
{'d', 'c'}
{'x', 'a', 'b', 'y'}
{'x', 'a', 'b', 'y'}

x in s # 存在
x not in s # 不存在
len(s)
s.add(x) # 如果set s 中还没有元素x 把x添加进来
s.remove(x) # 移除x，如果没有x则会抛出异常
s.discard(x) # 如果元素在s中，则移除x

对于集合，也可以排序，结果则是返回一个有序的列表。

s = {'c', 'a', 'd', 'b'}
print(sorted(s))

输出:
['a', 'b', 'c', 'd']

字典和集合的性能

字典和集合都是进行过性能高度优化的数据结构，特别是对于查找、添加和删除操作。

def find_product(products, product_id):
    for id, price in products:
        if product_id == id:
            return price
    return None

products = [(1, 100), (2, 200), (3, 300)]
print(f'the price of product 2 is {find_product(products, 2)}')

像这个例子就是用列表来存储这些数据，并进行查找。假设列表有n个元素，而查找的过程就是要遍历列表，那么时间复杂度就为O（n）。即使我们先对列表排序，然后用二分查找，也会需要O（logn）的时间复杂度。更别提列表排序也需要时间复杂度了。

products = {1: 100, 2: 200, 3: 300}

如果我们使用字典来存储这些数据，那么查找则会非常便捷、高效，只需要O（1）的时间复杂度即可完成。原因就是字典的内部组成了一张哈希表，可以直接通过键的哈希值，找到其对应的值。

# 列表实现
def find_unique_price(products):
    u_price_ls = list()
    for _, price in products:
        if price not in u_price_ls:
            u_price_ls.append(price)
    return u_price_ls

products = [
    (1, 100),
    (2, 200),
    (3, 300),
    (4, 100)
]
print(find_unique_price(products))

# 集合实现
def find_unique_price(products):
    u_price_ls = set()
    for _, price in products:
        u_price_ls.add(price)
    return u_price_ls

products = [
    (1, 100),
    (2, 200),
    (3, 300),
    (4, 100)
]
print(find_unique_price(products))

列表实现版本有两层循环，若有n个元素在最差情况下，需要O（n^2）的时间复杂度；
使用集合的数据结构，由于集合是高度优化的哈希表，且里面的元素不能重复，添加和查找操作的时间复杂度为O（1），那么总的时间复杂度为O（n）。

import time
import random

def find_unique_price(products):
    u_price_ls = list()
    for _, price in products:
        if price not in u_price_ls:
            u_price_ls.append(price)
    return u_price_ls

id_ls = random.sample(range(0, 200000), 50000)
price = random.sample(range(200000, 400000), 50000)
products = list(zip(id_ls, price))
start_time = time.perf_counter()
find_unique_price(products)
end_time = time.perf_counter()
print('the cost time of using list is {}'.format(end_time - start_time))

def find_unique_price(products):
    u_price_ls = set()
    for _, price in products:
        u_price_ls.add(price)
    return u_price_ls

start_time = time.perf_counter()
find_unique_price(products)
end_time = time.perf_counter()
print('the cost time of using set is {}'.format(end_time - start_time))

the cost time of using list is 14.018777699908242
the cost time of using set is 0.007197699975222349

可以看到，5万的数据量，就可以看出两者的速度差异如此之大。事实上，企业的业务的后台数据有着大量的亿级数量级，如果使用不当的数据结构，则会很容易造成服务器崩溃。

字典在python3.7+是有序的数据结构，而集合是无序的，其内部的哈希表存储结构，保证了其查找、插入、删除操作的高效性。所以字典和集合通常运用到对元素的高效查找、去重等场景。

对于二者的内部存储结构的具体细节，大家可以查看资料，了解一下。

他是只猫

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
打赏
0
评论
Chapter 03 你真的理解字典、集合吗？

今天我们来学习python的字典和集合，也是非常常见的数据结构。字典和集合在python中被广泛使用，并且性能进行了高度优化，十分重要。
复制链接

扫一扫