python字典和集合双向索引_Python字典和集合

最新推荐文章于 2024-07-04 17:08:25 发布

weixin_39588983

最新推荐文章于 2024-07-04 17:08:25 发布

阅读量718

点赞数

文章标签： python字典和集合双向索引

字典和集合基础字典是一系列无序元素的组合，其长度大小可变，元素可以任意的删减和改变。不过，这里的元素是一堆键（key）和值（value）的配对。

集合没有键和值的配对，是一系列无序的、唯一的元素组合。

字典和集合的创建：

d1 = {'name': 'jason', 'age': 20, 'gender': 'male'}

d2 = dict({'name': 'jason', 'age': 20, 'gender': 'male'})

d3 = dict([('name', 'jason'), ('age', 20), ('gender', 'male')])

d4 = dict(name='jason', age=20, gender='male')

d1 == d2 == d3 ==d4

True

s1 = {1, 2, 3}

s2 = Set([1, 2, 3])

s1 == s2

True

set 并不支持索引操作，因为 set 本质上是一个哈希表，和列表不一样

s = {1, 2, 3}

s[0]

Traceback (most recent call last):

File "", line 1, in

TypeError: 'set' object does not support indexing

对于字典，我们通常会根据键和值，进行升序或降序排序：

d = {'b': 1, 'a': 2, 'c': 10}

d_sorted_by_key = sorted(d.items(), key=lambda x: x[0]) # 根据字典键的升序排序

d_sorted_by_value = sorted(d.items(), key=lambda x: x[1]) # 根据字典值的升序排序

d_sorted_by_key

[('a', 2), ('b', 1), ('c', 10)]

d_sorted_by_value

[('b', 1), ('a', 2), ('c', 10)]

字典和集合性能

import time

# list version

def find_unique_price_using_list(products):

unique_price_list = []

for _, price in products: # A

if price not in unique_price_list: #B

unique_price_list.append(price)

return len(unique_price_list)

def find_unique_suing_set(products):

unique_price_set = set()

for _, price in products:

unique_price_set.add(price)

return len(unique_price_set)

id = [x for x in range(0, 100000)]

price = [x for x in range(2000000, 3000000)]

products = list(zip(id, price))

# 计算列表版本时间

start_using_list = time.perf_counter()

find_unique_price_using_list(products)

end_using_list = time.perf_counter()

print("time elapse using list : {}".format(end_using_list-start_using_list))

# 输出

time elapse using list : 44.804451168

# 计算集合版本的时间

start_using_set = time.perf_counter()

find_unique_suing_set(products)

end_using_set = time.perf_counter()

print("time elapse using set: {}".format(end_using_set-start_using_set))

# 输出

time elapse using set: 0.010326210999998864

你可以看到，仅仅十万的数据量，两者的速度差异就如此之大。更不用说上亿乃至十亿数据量了。

字典和集合的工作原理

字典和集合内部结构都是一张哈希表对于字典而言，这张表存储了哈希值（hash），键和值这 3 个元素。对于集合而言，区别就是哈希表内没有键和值的配对，只有单一的元素。

老版本的 python 的哈希结构如下所示：

entries = [

['--', '--', '--']

[-230273521, 'dob', '1999-01-01'],

['--', '--', '--'],

[1231236123, 'name', 'mike'],

['--', '--', '--'],

[9371539127, 'gender', 'male']

]

不难想象，随着哈希表的扩张，它会变得越来越稀疏。（为什么会越来越稀疏：哈希表为了保证其操作的有效性（查找、添加、删除等等），都会 overallocate（保证留至少1/3的剩余空间）,但很多空间其实都没有被利用，因此很稀疏。）

举个例子，我们有这样一个字典：

{'name': 'mike', 'dob': '1999-01-01', 'gender': 'male'}

那么它的存储为类似下面的形式：

entries = [

['--', '--', '--']

[-230273521, 'dob', '1999-01-01'],

['--', '--', '--'],

[1231236123, 'name', 'mike'],

['--', '--', '--'],

[9371539127, 'gender', 'male']

]

这样的设计结构，很显然非常浪费存储空间。为了提高存储空间的利用率，现在的哈希表除了字典本身的结构，会把索引和哈希值、键、值单独分开，也就是下面的新结构：

Indices

----------------------------------------------------

----------------------------------------------------

Entries

--------------------

hash0 key0 value0

---------------------

hash1 key1 value1

---------------------

hash2 key2 value2

---------------------

...

---------------------

那么刚刚的这个例子，在新的哈希表结构下，就会变成下面这样：

indices = [None, 1, None, None, 0, None, 2]

entries = [

[1231236123, 'name', 'mike'],

[-230273521, 'dob', '1999-01-01'],

[9371539127, 'gender', 'male']

] # 索引 indices 中的值，对应的是 entries 数组的位置

我们可以看到，空间利用率得到很大的提高。

插入操作

每次向字典或集合插入一个元素时，python 会首先计算键的哈希值（hash（key））,在和 mask = PyDicMinSize - 1做与操作，计算这个元素应该插入哈希表的位置 index = hash(key)&mask，如果哈希表此位置是空的，那么这个元素就会被插入其中。

如果此位置已被占用，Python 便会比较两个元素的哈希值和键是否相等。如果两者都相等，则表明两个元素已经存在，如果值不同，则更新值。若两者有一个不等，这种情况我们称之为哈希冲突，意思是两个元素的键不相等，但是哈希值相同。这种情况下，python 便会继续寻找表中空余的位置，直到找到位置为止。

删除操作，

python 会暂时对这个位置的元素，赋一个特殊的值，等到重新调整哈希表的大小时，再将其删除。

不难理解，哈希冲突的发生，往往会降低字典和集合操作的速度。因此，为了保证其高效性，字典和集合内的哈希表，通常会保证其至少留有 1/3 的剩余空间。随着元素的不停插入，当剩余空间不足 1/3 时，python 会重新获取更大的内存空间，扩充哈希表。不过在这种情况下，表内的所有元素的位置都会被重新排放。

虽然哈希冲突和哈希表大小的调整，都会导致速度减缓，但是这种情况发生的次数极少，所以平均情况下，这仍能保证插入、查找和删除的时间复杂度为 O(1)。

思考下面初始化字典的方式，哪一种更高效？

# Option A

d = {'name': 'jason', 'age': 20, 'gender': 'male'}

# Option B

d = dict({'name': 'jason', 'age': 20, 'gender': 'male'})

第一种更快，{} 是一个内置函数，不需要涉及 dict() 函数调用时的 stack 操作。字典的键可以是一个列表吗，下面的代码是否正确？

d = {'name': 'jason', ['education']: ['Tsinghua University', 'Stanford University']}

用列表作为 key 在 dict 中是不被允许的，因为列表是一个动态变化的数据结构，字典中的key要求是不可变的。如果是可变的 key ，那么随着 key 的变化，这里就有可能有重复的 key，那么这就和字典的定义相违背。这里换成 tuple 是可行的。

weixin_39588983

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
python字典和集合双向索引_Python字典和集合

字典和集合基础字典是一系列无序元素的组合，其长度大小可变，元素可以任意的删减和改变。不过，这里的元素是一堆键（key）和值（value）的配对。集合没有键和值的配对，是一系列无序的、唯一的元素组合。字典和集合的创建：d1 = {'name': 'jason', 'age': 20, 'gender': 'male'}d2 = dict({'name': 'jason', 'age': 20, 'g...
复制链接

扫一扫