[Fluent Python] Chapter 3: Dictionaries and Sets

Generic Mapping Types

The dict type lives in __builtins__.__dict__.
Because of dict's crucial role, Python dicts are highly optimized. Hash tables are the engines behind Python's high-performance dicts. set is also implemented with hash tables.


the collections.abc module provides the Mapping and MutableMapping ABCs to formalize the interfaces of dict and similar types.
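A minimal sketch of how these ABCs can be used for type checks. The helper name is_mapping is made up for this illustration; isinstance against abc.Mapping is the standard-library part.

```python
from collections import abc, OrderedDict, defaultdict


def is_mapping(obj):
    """Return True if obj provides the Mapping interface (hypothetical helper)."""
    return isinstance(obj, abc.Mapping)


# dict and the specialized mappings all register as Mapping; a list does not.
for candidate in ({}, OrderedDict(), defaultdict(list), [], "text"):
    print(type(candidate).__name__, is_mapping(candidate))
```

Checking against the ABC rather than against dict itself keeps the check open to alternative mapping types.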

Implementations of specialized mappings often extend dict or collections.UserDict.

All mapping types in the standard library use the basic dict in their implementation, so they share the limitation that the keys must be hashable.

What is Hashable?

An object is hashable if it has a hash value that never changes during its lifetime (it needs a __hash__() method) and it can be compared to other objects (it needs an __eq__() method). Hashable objects that compare equal must have the same hash value.


hashable includes:

  • The atomic immutable types – str, bytes, numeric types
  • frozenset
  • tuple if all its items are hashable

You can call hash(x) to inspect the hash value of x.
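A short sketch of the tuple rule above. The helper is_hashable is a made-up name for this example; it simply tries hash() and reports whether it worked.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


tt = (1, 2, (30, 40))             # tuple of hashable items: hashable
tl = (1, 2, [30, 40])             # contains a list: not hashable
tf = (1, 2, frozenset([30, 40]))  # frozenset is fine

print(is_hashable(tt), is_hashable(tl), is_hashable(tf))
print(hash(1) == hash(1.0))       # objects that compare equal hash equal
```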


User-defined types

User-defined types are hashable by default because their hash value is derived from their id() and they all compare not equal.
If an object implements a custom __eq__ that takes into account its internal state, it may be hashable only if all its attributes are immutable.
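A minimal sketch of the three cases, with made-up class names (Plain, Point, EqOnly). One detail worth knowing: in Python 3, a class that defines __eq__ without defining __hash__ gets __hash__ set to None, so its instances become unhashable.

```python
class Plain:
    """No __eq__ or __hash__: hashable by default, instances compare unequal."""


class Point:
    """Custom __eq__ plus a matching __hash__ over state treated as immutable."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        return hash((self._x, self._y))


class EqOnly:
    """Custom __eq__ but no __hash__: Python sets __hash__ to None."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.value == other.value


print(len({Point(1, 2), Point(1, 2)}))  # equal points collapse into one element
print(EqOnly.__hash__ is None)
```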

dict Comprehensions

A dictcomp (dict comprehension) builds a dict instance by producing key:value pairs from any iterable.
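A small example of the idea, using a handful of sample dial codes as illustrative data (the variable names are chosen for this sketch):

```python
# Sample (code, country) pairs used only for illustration.
dial_codes = [(86, "China"), (91, "India"), (1, "United States"), (55, "Brazil")]

# Swap each pair so the country becomes the key.
country_code = {country: code for code, country in dial_codes}

# A second dictcomp can filter and transform at the same time.
small_codes = {code: country.upper()
               for country, code in country_code.items()
               if code < 66}

print(country_code)
print(small_codes)
```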

Overview of Common Mapping Methods

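collections.defaultdict() – an old friend from LeetCode; pass the type you want for the values inside the parentheses, e.g. list, set
collections.OrderedDict – I end up looking this one up every time it shows up in a weekly contest 😂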

Mappings with Flexible Key Lookup

sometimes it is convenient to have mappings that return some made-up value when a missing key is searched.

two main approaches:

  • use a defaultdict instead of a plain dict
  • subclass dict or any other mapping type and add a __missing__ method.

defaultdict: Another Take on Missing Keys

how it works:
when instantiating a defaultdict, you provide a callable that is used to produce a default value whenever __getitem__ is passed a nonexistent key argument.
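As a rough sketch of that behavior, here is a word index built with defaultdict(list). The function name index_words and the sample text are made up for illustration. Note that the default factory is only triggered by __getitem__ (the d[key] syntax); get() and the in operator do not create entries.

```python
from collections import defaultdict


def index_words(text):
    """Map each word to the list of positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(text.split()):
        # A missing word triggers list() and stores the new empty list first.
        index[word].append(position)
    return index


index = index_words("a b a c b a")
print(dict(index))

print(index.get("zzz"))   # get() does not call the factory
print("zzz" in index)     # and no key was created
print(index["zzz"])       # __getitem__ does: creates and returns []
```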

The __missing__ Method

underlying the way mappings deal with missing keys is the aptly named __missing__ method.
If you subclass dict and provide a __missing__ method, the standard dict.__getitem__ will call it whenever a key is not found, instead of raising KeyError. Only __getitem__ (the d[k] syntax) calls __missing__; get() and the in operator do not.
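A minimal sketch of a dict subclass that uses __missing__ to retry a failed lookup with the key converted to str. The class name StrKeyDict0 is an assumption for this example, and get and __contains__ are overridden here only so that they behave consistently with d[k].

```python
class StrKeyDict0(dict):
    """Lookups fall back to str(key) when the key itself is missing."""

    def __missing__(self, key):
        if isinstance(key, str):
            # Already a str and still missing: give up (avoids infinite recursion).
            raise KeyError(key)
        return self[str(key)]

    def get(self, key, default=None):
        try:
            return self[key]          # delegate to __getitem__, so __missing__ runs
        except KeyError:
            return default

    def __contains__(self, key):
        # Search keys() explicitly; "key in self" would call __contains__ again.
        return key in self.keys() or str(key) in self.keys()


d = StrKeyDict0([("2", "two"), ("4", "four")])
print(d[2], d.get(4), d.get(1, "N/A"))
print(2 in d, 1 in d)
```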


Variations of dict

collections.OrderedDict

  • maintains keys in insertion order (ah, and I have been re-sorting every time)
  • allows iteration over items in a predictable order
  • popitem() pops the last item by default
  • but you can use popitem(last=False) to pop the first item added
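A quick check of the popitem behavior: with no arguments it removes the most recently added item, and with last=False it removes the oldest one. The sample keys are arbitrary.

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

newest = od.popitem()            # default last=True: most recently added item
oldest = od.popitem(last=False)  # first item added

print(newest, oldest, list(od))
```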

collections.Counter

  • the same as writing your own freq (frequency) table by hand
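For comparison, a sketch of a hand-written frequency table next to Counter. The function name manual_freq is made up for this example.

```python
from collections import Counter


def manual_freq(items):
    """Hand-rolled frequency table, the kind Counter replaces."""
    freq = {}
    for item in items:
        freq[item] = freq.get(item, 0) + 1
    return freq


ct = Counter("abracadabra")
print(ct == manual_freq("abracadabra"))  # same counts either way

ct.update("aaaaazzz")                    # add more observations
print(ct.most_common(2))                 # Counter also ranks for you
```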

collections.ChainMap

  • holds a list of mappings that can be searched as one
  • the lookup is performed on each mapping in order, and succeeds if the key is found in any of them.
  • useful to interpreters for languages with nested scopes, where each mapping represents a scope context.
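A small sketch of the nested-scope idea, with three made-up dicts standing in for local, global and built-in scopes. Lookups walk the mappings in order; writes go to the first mapping only.

```python
from collections import ChainMap

# Hypothetical scopes, innermost first.
local_scope = {"x": "local x"}
global_scope = {"x": "global x", "print": "global print"}
builtin_scope = {"print": "builtin print", "len": "builtin len"}

scopes = ChainMap(local_scope, global_scope, builtin_scope)

print(scopes["x"])      # found in the first mapping
print(scopes["print"])  # first mapping that has it wins
print(scopes["len"])    # falls through to the last mapping

scopes["y"] = "new"     # assignments only touch the first mapping
print(local_scope)
```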

collections.UserDict

  • a pure Python implementation of a mapping that works like a standard dict
  • huh??? wait, what??

Subclassing UserDict

It is almost always easier to create a new mapping type by extending UserDict rather than dict.


Its value can be appreciated as we extend the strKeyDict shown earlier to make sure that any keys added to the mapping are stored as str.
strKeyDict always converts non-string keys to str: on insertion, update and lookup.

Why prefer subclassing UserDict over subclassing dict?
  • the built-in dict has some implementation shortcuts that end up forcing us to override methods that we can just inherit from UserDict with no problem.

UserDict does not inherit from dict, but has an internal dict instance, called data, which holds the actual items.

  • this avoids undesired recursion when coding special methods like __setitem__, and simplifies the coding of __contains__.

Because UserDict subclasses MutableMapping, the remaining methods that make strKeyDict a full-fledged mapping are inherited from UserDict, MutableMapping, or Mapping.

  • MutableMapping.update
    • this powerful method can be called directly, but it is also used by __init__ to load the instance from other mappings, from iterables of (key, value) pairs, and from keyword arguments.
    • because it uses self[key] = value to add items, it ends up calling our implementation of __setitem__
  • Mapping.get
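A minimal sketch of such a UserDict subclass, under the assumption that the class is named StrKeyDict. Keys are converted to str on insertion, and __contains__ can check self.data directly because every stored key is already a str. The constructor and update() go through our __setitem__, so no extra code is needed for them.

```python
import collections


class StrKeyDict(collections.UserDict):
    """Mapping that stores every key as str."""

    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def __contains__(self, key):
        # All stored keys are str, so one check on the inner dict is enough.
        return str(key) in self.data

    def __setitem__(self, key, item):
        # Write to self.data directly: no recursion into __setitem__.
        self.data[str(key)] = item


d = StrKeyDict({2: "two"})   # __init__ loads items through update()
d.update({4: "four"})        # update() uses self[key] = value
d[6] = "six"

print(list(d.data))          # every key ended up as str
print(d[2], d.get(4), d.get(1, "N/A"), 6 in d)
```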

Immutable Mappings

The types module provides a wrapper class called MappingProxyType, which, given a mapping, returns a mappingproxy instance that is a read-only but dynamic view of the original mapping (i.e. updates to the original mapping can be seen in the mappingproxy, but changes cannot be made through it).
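A short demonstration of the read-only but dynamic view. The variable names are arbitrary.

```python
from types import MappingProxyType

d = {1: "A"}
d_proxy = MappingProxyType(d)

print(d_proxy[1])             # reads work

try:
    d_proxy[2] = "x"          # writes through the proxy are rejected
except TypeError as exc:
    print("TypeError:", exc)

d[2] = "B"                    # change the original mapping instead
print(d_proxy[2])             # the proxy sees the update
```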


Set Theory

A set is a collection of unique objects.
A basic use case is removing duplications.

  • set element must be hashable
  • set itself is not hashable
  • frozenset is hashable
  • so the elements of a set can be frozensets

infix operator    meaning
a | b             union
a & b             intersection
a - b             difference
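A few lines exercising the points above: removing duplicates, the three infix operators, and a frozenset as a set element. The sample values are arbitrary.

```python
words = ["spam", "spam", "eggs", "spam"]
unique = set(words)             # duplicates removed

a = {1, 2, 3, 4}
b = {3, 4, 5}
print(a | b)                    # union
print(a & b)                    # intersection
print(a - b)                    # difference

nested = {frozenset(a), frozenset(b)}   # frozensets can be set elements
try:
    {a, b}                              # plain sets cannot
except TypeError as exc:
    print("TypeError:", exc)
```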

set Literals

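There is no literal notation for the empty set, so we must remember to write set().

Writing s = {1, 2, 3} directly is faster than s = set([1, 2, 3]). Well, obviously: the second form has to look up the set name, build a list first, and then pass it to the constructor.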

There is no special syntax to represent frozenset literals – they must be created by calling the constructor.
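A quick illustration of these points about literals:

```python
empty_set = set()         # the only way to spell an empty set
empty_braces = {}         # this is an empty dict, not a set

s = {1, 2, 3}             # set literal
fs = frozenset({1, 2, 3}) # frozenset has no literal: call the constructor

print(type(empty_set).__name__, type(empty_braces).__name__)
print(s, fs)
```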

set Comprehensions

Same as a list comprehension, just swap the [] for {}.
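For example, with an arbitrary word list:

```python
words = ["apple", "kiwi", "plum", "pear", "fig"]

lengths_list = [len(w) for w in words]   # list comprehension keeps duplicates
lengths_set = {len(w) for w in words}    # set comprehension keeps unique values

print(lengths_list)
print(lengths_set)
```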

dict and set Under the Hood

Python's dict and set are implemented using hash tables.

  • How efficient are Python dict and set?

    The larger len gets, the wider the performance gap becomes.

  • Why are they unordered?

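  • Why does the order of the dict keys or set elements depend on insertion order, and why may it change during the lifetime of the structure?

    Two reasons. First, the hash of each key decides which bucket it is stored in, and the table is sparse. Second, an insert may make Python decide that the hash table is too crowded and rebuild it as a larger table. The keys then land in different buckets, so no order can be relied on.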

  • Why can’t we use any Python object as a dict key or set element?

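    Although dict and set are faster than an array, they have their downsides too.
    Space efficiency is an important point to consider: it comes down to whether you want to trade space for time.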

  • Why is it bad to add items to a dict or set while iterating through it?

    If you are iterating over the dictionary keys and changing them at the same time, your loop may not scan all the items expected, not even the items that were already in the dictionary before you added to it.
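A sketch of the problem and one common workaround. In CPython, adding keys while looping over a dict raises RuntimeError as soon as the size change is noticed; the safer pattern is to scan first, collect the additions, and apply them afterwards. Both function names are made up for this example.

```python
def add_while_iterating(d):
    """Try to add keys during iteration; return the error message, if any."""
    try:
        for key in d:
            d[key * 10] = None
    except RuntimeError as exc:
        return str(exc)
    return None


def add_after_scanning(d):
    """Scan first, collect the new keys, then update in a second step."""
    additions = {key * 10: None for key in d}
    d.update(additions)
    return d


print(add_while_iterating({1: None, 2: None}))
print(add_after_scanning({1: None, 2: None}))
```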

A Performance Experiment

  • an array of 10 million distinct double-precision floats - the haystack
  • an array of needles - 1,000 floats, with 500 picked from the haystack and 500 verified not to be in it
  • a dict with 1,000 floats
  • timings taken with the timeit module
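A scaled-down sketch of this kind of experiment, not the book's actual benchmark script. The haystack is shrunk to 10,000 floats so that the list version finishes quickly, and the helper names (make_data, count_found) are made up here. The absent needles are built by adding 1.0 to values from random(), which always returns numbers below 1.0, so they cannot be in the haystack.

```python
import random
import timeit


def make_data(haystack_size=10_000, needle_count=1_000, seed=0):
    """Build a haystack of floats and needles, half present and half absent."""
    rng = random.Random(seed)
    haystack = [rng.random() for _ in range(haystack_size)]
    present = rng.sample(haystack, needle_count // 2)
    # random() is always < 1.0, so x + 1.0 is guaranteed not to be in the haystack.
    absent = [x + 1.0 for x in rng.sample(haystack, needle_count // 2)]
    return haystack, present + absent


def count_found(needles, container):
    """Count how many needles are in the container using the in operator."""
    return sum(1 for needle in needles if needle in container)


haystack_list, needles = make_data()
containers = {
    "dict": dict.fromkeys(haystack_list),
    "set": set(haystack_list),
    "list": haystack_list,
}

for name, container in containers.items():
    seconds = timeit.timeit(lambda: count_found(needles, container), number=1)
    print(f"{name:4} found={count_found(needles, container)} time={seconds:.6f}s")
```

The absolute timings depend on the machine; the point to look for is how the list time grows with the haystack size while the dict and set times barely move.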

If your program does any kind of I/O, the lookup time for keys in dict or set is negligible, regardless of the dict or set size (as long as it does fit in RAM)

Hash Tables in Dictionaries

a hash table is a sparse array
In standard data structure texts, the cells in a hash table are often called “buckets”.
In a dict hash table, there is a bucket for each item, and it contains two fields:

  • a reference to the key
  • a reference to the value of the item

because all buckets have the same size, access to an individual bucket is done by offset.

Python tries to keep at least 1/3 of the buckets empty; if the hash table becomes too crowded, it is copied to a new location with room for more buckets.

[Figure: hash collisions]

To put an item in a hash table:

  1. Calculate the hash value of the item key, using the hash() built-in function.
  2. Use part of the hash as an offset to locate a bucket in the hash table.
  3. Insert: when an empty bucket is located, the new item is put there.
  4. Update: when a bucket with a matching key is found, the value in that bucket is overwritten with the new value. If the bucket holds a different key, that is a hash collision, and other bits of the hash are used to pick another bucket to try.
  5. Python may determine that the hash table is too crowded and rebuild it in a new location with more room. As the hash table grows, so does the number of hash bits used as bucket offsets, and this keeps the rate of collisions low.
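The steps above can be sketched with a toy table. This is a teaching sketch under simplifying assumptions, not CPython's actual algorithm: it resolves collisions with plain linear probing (try the next bucket), supports no deletion, and doubles the table once more than 2/3 of the buckets are used. The class name ToyHashTable is made up.

```python
class ToyHashTable:
    """Open-addressing hash table with linear probing (illustration only)."""

    def __init__(self, size=8):
        self._buckets = [None] * size   # each bucket is None or a (key, value) pair
        self._used = 0

    def _probe(self, key):
        """Yield bucket indexes to try, starting where the hash points."""
        size = len(self._buckets)
        start = hash(key) % size        # steps 1 and 2: hash, then reduce to an offset
        for step in range(size):
            yield (start + step) % size # collision: move on to the next bucket

    def __setitem__(self, key, value):
        for i in self._probe(key):
            bucket = self._buckets[i]
            if bucket is None:          # insert into an empty bucket
                self._buckets[i] = (key, value)
                self._used += 1
                break
            if bucket[0] == key:        # update the matching key
                self._buckets[i] = (key, value)
                break
        if self._used * 3 > len(self._buckets) * 2:
            self._resize()              # too crowded: rebuild with more room

    def __getitem__(self, key):
        for i in self._probe(key):
            bucket = self._buckets[i]
            if bucket is None:          # an empty bucket ends the search
                raise KeyError(key)
            if bucket[0] == key:
                return bucket[1]
        raise KeyError(key)

    def _resize(self):
        items = [b for b in self._buckets if b is not None]
        self._buckets = [None] * (len(self._buckets) * 2)
        self._used = 0
        for key, value in items:        # every key is placed again in the new table
            self[key] = value


table = ToyHashTable()
for n in range(20):
    table[n] = n * n
table[7] = "updated"
print(table[3], table[7], len(table._buckets))
```

Because the resize re-inserts every key, keys can end up in different buckets afterwards, which is why iteration order over such a table cannot be relied on.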

Practical Consequences of How dict Works

  • keys must be hashable objects
  • dicts have significant memory overhead
    • because a dict uses a hash table internally, and hash tables must be sparse to work, they are not space efficient.
  • key search is very fast
    • we could search more than 2 million keys per second in a dict with 10 million items
  • key ordering depends on insertion order
  • adding items to a dict may change the order of existing keys
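    • Python decides whether the hash table needs to grow
    • if it does grow, the keys will very likely end up in different positions
    • this describes the dict of the Python versions the book covers; since Python 3.7, dict is guaranteed to keep insertion order, while set still makes no ordering promise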

How sets Work - Practical Consequences

  • set elements must be hashable objects
  • sets have a significant memory overhead
  • membership testing is very efficient
  • element ordering depends on insertion order
  • adding elements to a set may change the order of other elements