[Fluent Python] Chapter 3: Dictionaries and Sets

J_caicaicai

Article link: https://blog.csdn.net/apple_50678962/article/details/113663787

Generic Mapping Types

The dict type lives in __builtins__.__dict__.
Because of dict's crucial role, Python dicts are highly optimized. Hash tables are the engines behind Python's high-performance dicts. set is also implemented with hash tables.

The collections.abc module provides the Mapping and MutableMapping ABCs to formalize the interfaces of dict and similar types.

Implementations of specialized mappings often extend dict or collections.UserDict.

All mapping types in the standard library use the basic dict in their implementation, so they share the limitation that the keys must be hashable.
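A quick sketch of the two points above: checking "is this a mapping?" against the ABC instead of against dict, and what happens when a key is not hashable. The variable names are only for illustration.

```python
from collections import OrderedDict, abc, defaultdict

my_dict = {}

# isinstance against the ABC accepts any mapping type, not only dict.
is_mapping = isinstance(my_dict, abc.Mapping)
ordered_is_mutable_mapping = isinstance(OrderedDict(), abc.MutableMapping)
default_is_mutable_mapping = isinstance(defaultdict(list), abc.MutableMapping)

# Keys must be hashable: a list key is rejected with TypeError.
try:
    {[1, 2]: 'value'}
except TypeError as exc:
    unhashable_error = str(exc)

print(is_mapping, ordered_is_mutable_mapping, default_is_mutable_mapping)
print(unhashable_error)
```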

What is Hashable?

An object is hashable if it has a hash value that never changes during its lifetime (it needs a __hash__() method) and can be compared to other objects (it needs an __eq__() method). Hashable objects that compare equal must have the same hash value.

Hashable types include:

The atomic immutable types: str, bytes, numeric types
frozenset
tuple, if all its items are hashable

Use hash(x) to see the hash value of x.
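A small sketch of the tuple rule: a tuple is hashable only when everything inside it is. The helper is_hashable is a made-up name for this example, not a standard function.

```python
def is_hashable(obj):
    """Return True if hash(obj) works, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


tt = (1, 2, (30, 40))             # tuple of hashable items
tl = (1, 2, [30, 40])             # contains a list, which is unhashable
tf = (1, 2, frozenset([30, 40]))  # frozenset is hashable

print(is_hashable(tt), is_hashable(tl), is_hashable(tf))
```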

User-defined types

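User-defined types are hashable by default because their hash value is derived from their id() and they all compare not equal.
If an object implements a custom __eq__ that takes its internal state into account, it may be hashable only if all its attributes are immutable. (A class that defines __eq__ without defining __hash__ gets its __hash__ set to None, so its instances are unhashable.)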
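To make this concrete, here is a sketch with three hypothetical classes (Plain, EqOnly, and Point are names invented for this example): one with the defaults, one with only __eq__, and one with __eq__ plus a matching __hash__.

```python
class Plain:
    """No __eq__ or __hash__: hashable by default, instances compare unequal."""


class EqOnly:
    """Defines __eq__ only, so Python sets __hash__ to None."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.x == other.x


class Point:
    """Defines __eq__ and a __hash__ built from the same state."""

    def __init__(self, x, y):
        self._x, self._y = x, y  # treated as read-only after construction

    def __eq__(self, other):
        return isinstance(other, Point) and (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        return hash((self._x, self._y))


seen = {Point(1, 2)}
print(Point(1, 2) in seen)  # equal points hash the same, so this is True
```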

dict Comprehensions

A dictcomp builds a dict instance by producing key:value pairs from any iterable.
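A short sketch of a dict comprehension. The DIAL_CODES sample data is made up for illustration.

```python
# Sample (code, country) pairs, invented for this example.
DIAL_CODES = [
    (86, 'China'),
    (91, 'India'),
    (1, 'United States'),
    (55, 'Brazil'),
]

# Build country -> code by swapping each pair.
country_code = {country: code for code, country in DIAL_CODES}

# Build code -> COUNTRY, keeping only the codes below 66.
small_codes = {code: country.upper()
               for country, code in country_code.items()
               if code < 66}

print(country_code)
print(small_codes)
```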

Overview of Common Mapping Methods

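collections.defaultdict() – an old friend from LeetCode; pass the type you want for the values inside the parentheses, e.g. list or set
collections.OrderedDict – I end up looking it up again every time it shows up in a weekly contest 😂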

Mappings with Flexible Key Lookup

sometimes it is convenient to have mappings that return some made-up value when a missing key is searched.

two main approaches:

use a defaultdict instead of a plain dict
subclass dict or any other mapping type and add a __missing__ method.

`defaultdict`: Another Take on Missing Keys

how it works:
when instantiating a defaultdict, you provide a callable that is used to produce a default value whenever __getitem__ is passed a nonexistent key argument.
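A minimal sketch of this behavior, using a made-up sentence to build a word index. Note that only __getitem__ (the d[key] form) triggers the default; get() and in do not.

```python
from collections import defaultdict

index = defaultdict(list)  # list() is called to make each missing value

for position, word in enumerate('the cat saw the other cat'.split()):
    index[word].append(position)  # a missing word gets a fresh list first

# get() and "in" do not call the default factory.
dog_via_get = index.get('dog')
dog_present_before = 'dog' in index

# index['dog'] does call it, and the new key is stored.
dog_via_getitem = index['dog']
dog_present_after = 'dog' in index

print(dict(index))
```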

The `__missing__` Method

Underlying the way mappings deal with missing keys is the aptly named __missing__ method.
If you subclass dict and provide a __missing__ method, the standard dict.__getitem__ will call it whenever a key is not found, instead of raising KeyError.
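The original screenshot is gone, so here is a sketch in the spirit of the book's example: a dict subclass that retries a missing non-string key as a string. The class name StrKeyDict0 follows the book; treat the details as a reconstruction, not the exact code from the original post.

```python
class StrKeyDict0(dict):
    """dict subclass that falls back to str(key) when a key is missing."""

    def __missing__(self, key):
        if isinstance(key, str):
            # Already a string and still missing: give up.
            # Without this check, self[str(key)] would recurse forever.
            raise KeyError(key)
        return self[str(key)]

    def get(self, key, default=None):
        # Delegate to __getitem__ so that __missing__ gets a chance to run.
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        return key in self.keys() or str(key) in self.keys()


d = StrKeyDict0([('2', 'two'), ('4', 'four')])
print(d[2], d.get(4), d.get(1, 'N/A'), 2 in d, 1 in d)
```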


Variations of dict

collections.OrderedDict

maintains keys in insertion order (ah, and here I was re-sorting every single time)
allows iteration over items in a predictable order
popitem() pops the last item added by default (last=True)
but you can use popitem(last=False) to pop the first item instead
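A quick sketch of popitem in both directions, plus move_to_end, on a tiny made-up OrderedDict.

```python
from collections import OrderedDict

od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])

last_item = od.popitem()             # default last=True: pops ('c', 3)
first_item = od.popitem(last=False)  # pops from the front: ('a', 1)

od['d'] = 4
od.move_to_end('d', last=False)      # move an existing key to the front
order = list(od)

print(last_item, first_item, order)
```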

collections.Counter

Same as hand-writing your own freq dict: it keeps an integer count for each hashable item.
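A small sketch of Counter doing the frequency counting for us, with a made-up input string.

```python
from collections import Counter

ct = Counter('abracadabra')
a_count_before = ct['a']  # 'a' appears 5 times

ct.update('aaaaazzz')     # counts are added to the existing ones
top_two = ct.most_common(2)

print(ct)
print(top_two)
```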

collections.ChainMap

holds a list of mappings that can be searched as one
the lookup is performed on each mapping in order, and succeeds if the key is found in any of them.
useful to interpreters for languages with nested scopes, where each mapping represents a scope context.
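A minimal sketch of the lookup order, using two made-up dicts as an "inner" and an "outer" scope. Writes through a ChainMap only touch the first mapping.

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}  # outer scope
overrides = {'user': 'admin'}                 # inner scope, searched first

settings = ChainMap(overrides, defaults)

user = settings['user']    # found in overrides
color = settings['color']  # falls through to defaults

settings['color'] = 'blue'  # stored in overrides; defaults is untouched

print(user, color, overrides, defaults)
```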

collections.UserDict

a pure Python implementation of a mapping that works like a standard dict
Huh??? Huh, huh, huh??

Subclassing `UserDict`

It is almost always easier to create a new mapping type by extending UserDict rather than dict.

Its value can be appreciated as we extend the earlier strKeyDict to make sure that any keys added to the mapping are stored as str.
strKeyDict always converts non-string keys to str on insertion, update, and lookup.
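A sketch of what such a class can look like, following the book's approach. It is written as StrKeyDict here to match Python class naming; the sample data is made up.

```python
import collections


class StrKeyDict(collections.UserDict):
    """Mapping that stores every key as a str."""

    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def __contains__(self, key):
        # All stored keys are str, so one check against self.data is enough.
        return str(key) in self.data

    def __setitem__(self, key, item):
        # Writing to self.data avoids recursing back into __setitem__.
        self.data[str(key)] = item


d = StrKeyDict([(2, 'two')])  # __init__ goes through update -> __setitem__
d.update({3: 'three'})
d[4] = 'four'

print(sorted(d.keys()), d[2], d.get(3), 4 in d)
```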

Why subclass UserDict rather than dict?

The built-in dict has some implementation shortcuts that end up forcing us to override methods that we can simply inherit from UserDict with no problems.

UserDict does not inherit from dict, but has an internal dict instance, called data, which holds the actual items.

This avoids undesired recursion when coding special methods like __setitem__, and simplifies the coding of __contains__.

Because UserDict subclasses MutableMapping, the remaining methods that make strKeyDict a full-fledged mapping are inherited from UserDict, MutableMapping, or Mapping.

MutableMapping.update
- a powerful method that can be called directly, but is also used by __init__ to load the instance from other mappings, from iterables of (key, value) pairs, and from keyword arguments.
- because it uses self[key] = value to add items, it ends up calling our implementation of __setitem__.
Mapping.get
- inherited as is, so strKeyDict needs no get method of its own.

Immutable Mappings

The types module provides a wrapper class called MappingProxyType, which, given a mapping, returns a mappingproxy instance that is a read-only but dynamic view of the original mapping (i.e. updates to the original mapping can be seen in the mappingproxy, but changes cannot be made through it).
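A short sketch of the "read-only but dynamic" part, on a tiny made-up dict.

```python
from types import MappingProxyType

d = {1: 'A'}
d_proxy = MappingProxyType(d)

# Reading through the proxy works.
first = d_proxy[1]

# Writing through the proxy is rejected.
try:
    d_proxy[2] = 'x'
except TypeError as exc:
    write_error = str(exc)

# Changes to the original mapping show up in the proxy.
d[2] = 'B'
second = d_proxy[2]

print(first, second, write_error)
```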

Set Theory

A set is a collection of unique objects.
A basic use case is removing duplications.

Set elements must be hashable.
A set itself is not hashable.
A frozenset is hashable.
So the elements of a set can be frozensets.

infix operator	meaning
`a | b`	union
`a & b`	intersection
`a - b`	difference
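The operators above in action, plus the "remove duplicates" use case, on small made-up sets.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b         # everything in a or b
intersection = a & b  # only what is in both
difference = a - b    # what is in a but not in b

# Removing duplicates from a list (the original order is not preserved).
unique = set(['spam', 'spam', 'eggs', 'spam'])

print(union, intersection, difference, unique)
```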

set Literals

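There is no literal notation for the empty set, so we must remember to write set().

Writing s = {1, 2, 3} directly is faster than s = set([1, 2, 3]), because, well, isn't that obvious: the second form has to build a list first and then call the constructor.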

There is no special syntax to represent frozenset literals – they must be created by calling the constructor.
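A sketch of the literal rules: {} is an empty dict, not an empty set, and frozensets always go through the constructor. It also shows a set holding frozensets, which a set of sets cannot do.

```python
empty_dict = {}    # this is a dict, not a set
empty_set = set()  # the only way to spell an empty set

s = {1, 2, 3}                 # set literal
fs = frozenset(range(3))      # no literal syntax, call the constructor

# A set can contain frozensets because frozenset is hashable...
nested = {frozenset({1, 2}), frozenset({3})}

# ...but it cannot contain plain sets.
try:
    {{1, 2}, {3}}
except TypeError as exc:
    nested_error = str(exc)

print(type(empty_dict).__name__, type(empty_set).__name__, fs, nested_error)
```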

set Comprehensions

Same as a list comprehension, just swap the [] for {}.
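A tiny sketch comparing the two forms on the same made-up expression: the set version keeps only the unique results.

```python
remainders_list = [n % 3 for n in range(10)]  # list comprehension
remainders_set = {n % 3 for n in range(10)}   # set comprehension

print(remainders_list)
print(remainders_set)
```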

dict and set Under the Hood

Python's dict and set are implemented using hash tables.

How efficient are Python dict and set?

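The longer the container, the bigger the difference: dict and set lookups stay fast while list lookups keep getting slower.

Why are they unordered?

Why does the order of the dict keys or set elements depend on insertion order, and why may it change during the lifetime of the structure?

Because, first, when each key is hashed and stored into a bucket, the storage is sparse. Second, an insert may make Python decide that the hash table is too crowded and rebuild it as a bigger table. When that happens the positions of the keys change, so there is no stable order to rely on.
(This is the picture for sets and for the dicts of the Python versions the book covers. In CPython 3.6 and later, dict keeps insertion order, and the language guarantees it since Python 3.7.)

Why can't we use any Python object as a dict key or set element?

Keys and elements must be hashable, which rules out mutable objects such as list.
Also, although dict and set are faster than arrays, they have their drawbacks too.
Space efficiency is an important thing to consider: it comes down to whether you want to trade space for time.

Why is it bad to add items to a dict or set while iterating through it?

If you are iterating over the dictionary keys and changing them at the same time, your loop may not scan all the items expected, not even the items that were already in the dictionary before you added to it.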
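A sketch of the problem and one common way around it. In current CPython, adding keys while looping over a dict fails with RuntimeError rather than silently skipping items; the safer pattern is to collect the additions in a second dict and update afterwards. The stock data is made up.

```python
stock = {'apple': 1, 'pear': 2}

# Adding keys while iterating: CPython notices the size change and raises.
try:
    for name in stock:
        stock[name.upper()] = 0
except RuntimeError as exc:
    message = str(exc)

# Safer: scan first, collect the additions, then update in one go.
stock = {'apple': 1, 'pear': 2}
additions = {name.upper(): 0 for name in stock}
stock.update(additions)

print(message)
print(stock)
```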

A Performance Experiment

an array of 10 million distinct double-precision floats: the haystack
an array of needles: 1,000 floats, with 500 picked from the haystack and 500 verified not to be in it
a dict with 1,000 floats
timed with the timeit module

If your program does any kind of I/O, the lookup time for keys in a dict or set is negligible, regardless of the dict or set size (as long as it fits in RAM).
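A scaled-down sketch of this kind of experiment, so it runs in well under a second. The sizes (10,000 floats in the haystack, 100 needles), the seed, and the helper count_found are assumptions made for this example, not the book's setup, and the timings will differ from machine to machine.

```python
import random
import timeit

random.seed(0)  # arbitrary seed so the sample is repeatable
SIZE = 10_000

# sample() draws without replacement, so all values are distinct.
values = random.sample(range(SIZE * 10), SIZE + 50)
haystack_list = [float(v) for v in values[:SIZE]]
missing = [float(v) for v in values[SIZE:]]  # guaranteed not in the haystack
needles = haystack_list[:50] + missing       # 50 present, 50 absent

haystack_dict = dict.fromkeys(haystack_list)
haystack_set = set(haystack_list)


def count_found(haystack):
    """Count how many needles are in the given container."""
    return sum(1 for n in needles if n in haystack)


for name, haystack in [('list', haystack_list),
                       ('dict', haystack_dict),
                       ('set', haystack_set)]:
    seconds = timeit.timeit(lambda: count_found(haystack), number=3)
    print(f'{name:4} {seconds:.6f}s')
```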

Hash Tables in Dictionaries

a hash table is a sparse array
In standard data structure texts, the cells in a hash table are often called “buckets”.
In a dict hash table, there is a bucket for each item, and it contains two fields:

a reference to the key
a reference to the value of the item

because all buckets have the same size, access to an individual bucket is done by offset.

Python tries to keep at least 1/3 of the buckets empty; if the hash table becomes too crowded, it is copied to a new location with room for more buckets.

Hash collisions

To put an item in a hash table:

step 1: calculate the hash value of the item key, done with the hash() built-in function.
step 2: use part of the hash to locate a bucket in the hash table.
step 3 (insert): when an empty bucket is located, the new item is put there.
step 3 (update): when a bucket with a matching key is found, the value in that bucket is overwritten with the new value.
step 3 (collision): when the bucket holds a different key, that is a hash collision; other bits of the hash are used to pick another bucket, and the search repeats.

Python may determine that the hash table is too crowded and rebuild it in a new location with more room. As the hash table grows, so does the number of hash bits used as bucket offsets, and this keeps the rate of collisions low.
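The steps above can be sketched as a toy hash table. This is only an illustration of the idea and not how CPython lays out dicts: the class name ToyHashTable, the linear probing, and the starting size of 8 are all assumptions chosen to keep the code short.

```python
class ToyHashTable:
    """Toy open-addressing hash table (illustration only, not CPython's layout)."""

    def __init__(self, size=8):
        self.buckets = [None] * size  # each bucket is None or a (key, value) pair
        self.used = 0

    def _probe(self, key):
        # Use the low bits of the hash as the first bucket offset.
        mask = len(self.buckets) - 1
        index = hash(key) & mask
        while True:
            yield index
            # On a collision, try the next bucket (simple linear probing;
            # CPython uses a different probing sequence).
            index = (index + 1) & mask

    def __setitem__(self, key, value):
        for index in self._probe(key):
            bucket = self.buckets[index]
            if bucket is None:                 # insert into an empty bucket
                self.buckets[index] = (key, value)
                self.used += 1
                break
            if bucket[0] == key:               # update a matching key
                self.buckets[index] = (key, value)
                return
        # Keep at least 1/3 of the buckets empty.
        if self.used * 3 > len(self.buckets) * 2:
            self._grow()

    def __getitem__(self, key):
        for index in self._probe(key):
            bucket = self.buckets[index]
            if bucket is None:
                raise KeyError(key)
            if bucket[0] == key:
                return bucket[1]

    def _grow(self):
        # Rebuild into a table twice as big; items may land in new buckets.
        items = [b for b in self.buckets if b is not None]
        self.buckets = [None] * (len(self.buckets) * 2)
        self.used = 0
        for key, value in items:
            self[key] = value


table = ToyHashTable()
table['a'] = 1
table['a'] = 2  # same key: the value is overwritten, not added
print(table['a'], table.used, len(table.buckets))
```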

Practical Consequences of How dict Works

keys must be hashable objects
dicts have significant memory overhead
- because a dict uses a hash table internally, and hash tables must be sparse to work, they are not space efficient.
key search is very fast
- we could search more than 2 million keys per second in a dict with 10 million items
key ordering depends on insertion order
adding items to a dict may change the order of existing keys
- Python decides whether the hash table needs to grow
- and if it does grow, the positions of the keys will very likely change
- (the last two points hold for the dicts the book describes; in CPython 3.6 and later, dict keeps insertion order)

How sets Work - Practical Consequences

set elements must be hashable objects
sets have a significant memory overhead
membership testing is very efficient
element ordering depends on insertion order
adding elements to a set may change the order of other elements
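A rough sketch of the memory overhead point for both dict and set. sys.getsizeof only measures the container itself, not the objects it refers to, and the exact byte counts depend on the Python version, so only the comparison matters here.

```python
import sys

keys = list(range(1000))

as_tuple = tuple(keys)
as_dict = dict.fromkeys(keys)
as_set = set(keys)

tuple_size = sys.getsizeof(as_tuple)
dict_size = sys.getsizeof(as_dict)
set_size = sys.getsizeof(as_set)

# The hash-table containers need noticeably more room for the same 1,000 keys.
print(f'tuple: {tuple_size} bytes')
print(f'dict:  {dict_size} bytes')
print(f'set:   {set_size} bytes')
```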
