Python Study Notes (9): Dict

Tnix

Code and content are taken from Fluent Python by Luciano Ramalho.

A dict can be built in any of the following ways:

>>> a = dict(one=1,two=2,three=3)
>>> b = {'one':1,'two':2,'three':3}
>>> c = dict(zip(['one','two','three'],[1,2,3]))
>>> d = dict([('two',2),('one',1),('three',3)])
>>> e = dict({'three':3,'one':1,'two':2})
>>> a == b == c == d == e

True

>>> DIAL_CODES = [
    (86,'China'),
    (91,'India'),
    (1,'United States'),
    (62,'Indonesia'),
    (55,'Brazil'),
    (92,'Pakistan'),
    (880,'Bangladesh'),
    (234,'Nigeria'),
    (7,'Russia'),
    (81,'Japan')
]

Dict comprehensions, available since Python 2.7, build dict instances from any iterable of key-value pairs. The example below builds a dict from a list of tuples:

>>> country_code = {country:code for code,country in DIAL_CODES}
>>> country_code

{'Bangladesh': 880,
 'Brazil': 55,
 'China': 86,
 'India': 91,
 'Indonesia': 62,
 'Japan': 81,
 'Nigeria': 234,
 'Pakistan': 92,
 'Russia': 7,
 'United States': 1}

>>> {code:country.upper() for country,code in country_code.items() if code<66}

{1: 'UNITED STATES', 7: 'RUSSIA', 55: 'BRAZIL', 62: 'INDONESIA'}

Handling missing keys with setdefault

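Accessing a dict with d[k] raises KeyError when k is not an existing key. d.get(k, default) is an alternative that returns a default instead. However, when the value found also has to be updated, this approach is clumsy and inefficient, as in the following example: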

"""index0.py"""

import sys
import re

# raw string, so that \w reaches the regex engine unchanged
WORD_RE = re.compile(r'\w+')

index = {}
with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            # this is ugly; coded like this to make a point
            occurrences = index.get(word, [])
            occurrences.append(location)
            index[word] = occurrences

# print in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])

>>> import this  # its output is the source of zen.txt

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

>>> %run index0.py zen.txt

a [(19, 48), (20, 53)]
Although [(11, 1), (16, 1), (18, 1)]
ambiguity [(14, 16)]
and [(15, 23)]
are [(21, 12)]
aren [(10, 15)]
at [(16, 38)]
bad [(19, 50)]
be [(15, 14), (16, 27), (20, 50)]
beats [(11, 23)]
Beautiful [(3, 1)]
better [(3, 14), (4, 13), (5, 11), (6, 12), (7, 9), (8, 11), (17, 8), (18, 25)]
......

In the example above, the three lines that deal with occurrences can be replaced by a single line using dict.setdefault. The improved index0.py looks like this:

"""index0.py"""

import sys
import re

# raw string, so that \w reaches the regex engine unchanged
WORD_RE = re.compile(r'\w+')

index = {}
with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            # improved: fetch the list for word (creating it if missing) and append
            index.setdefault(word, []).append(location)

# print in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])

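setdefault returns the value stored under the key, inserting the default first if the key is missing, so the value can be updated in place without a second lookup.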
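As a minimal sketch of why one call is enough, the snippet below builds the same index twice: once with setdefault and once with an explicit membership test. The sample words and locations are made up for illustration and are not taken from zen.txt.

```python
# Hypothetical sample data, made up for this illustration.
words = [("better", (3, 14)), ("than", (3, 21)), ("better", (4, 13))]

index = {}
for word, location in words:
    # One call: fetch the list for `word`, creating it if missing,
    # then append to that same list object.
    index.setdefault(word, []).append(location)

# The same result without setdefault takes a membership test,
# a possible assignment, and another lookup.
index_verbose = {}
for word, location in words:
    if word not in index_verbose:
        index_verbose[word] = []
    index_verbose[word].append(location)

print(index)
print(index == index_verbose)
```

Both loops produce the same dict; the setdefault version simply avoids searching for the key more than once.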

defaultdict: another way to handle missing keys

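The idea is similar to setdefault. If the key being looked up does not exist, defaultdict(list) creates a new empty list and assigns it to that key, which avoids a KeyError and lets the value be updated right away. For example: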

"""index_default.py"""

import sys
import re
import collections

# raw string, so that \w reaches the regex engine unchanged
WORD_RE = re.compile(r'\w+')

# list is the default_factory: it is called to build the value for a missing key
index = collections.defaultdict(list)
with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            index[word].append(location)

# print in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])
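One detail worth checking by hand is that the default factory runs only for subscript access such as index[word]. The sketch below, using made-up keys, suggests that get and the in operator leave a defaultdict unchanged.

```python
import collections

index = collections.defaultdict(list)

# Subscript access on a missing key calls list() and stores the result.
index["new"].append((1, 1))

# get() and `in` do not call the default factory, so no key is created.
missing = index.get("absent")
present = "absent" in index

print(dict(index))
print(missing, present)
```

After this runs, "new" is the only key in index; looking up "absent" through get or in did not add it.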

The __missing__ method

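What actually makes defaultdict work is a special method named __missing__, which all standard mapping types support. The base dict class does not define it, but if you subclass dict and provide a __missing__ method, the standard dict.__getitem__ calls it whenever a key is not found, instead of raising KeyError. Because only __getitem__ consults __missing__, the subclass below also overrides get and __contains__ so that they behave consistently.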

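class StrKeyDict0(dict):

    def __missing__(self, key):
        # a missing str key is a genuine miss
        if isinstance(key, str):
            raise KeyError(key)
        # otherwise retry the lookup with the key converted to str
        return self[str(key)]

    def get(self, key, default=None):
        # delegate to self[key] so that __missing__ gets a chance to run
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        # search keys() explicitly; `key in self` would call __contains__ again
        return key in self.keys() or str(key) in self.keys()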

>>> d = StrKeyDict0([('2','two'),('4','four')])
>>> d['2']

'two'

>>> d[4]

'four'

>>> d[1]

---------------------------------------------------------------------------

KeyError                  Traceback (most recent call last)

...    

KeyError: '1'
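The session above exercises only d[k]. As a further sketch, the block below repeats the StrKeyDict0 definition so that it runs on its own, then tries the overridden get and __contains__ with the same two entries.

```python
class StrKeyDict0(dict):

    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        return key in self.keys() or str(key) in self.keys()


d = StrKeyDict0([("2", "two"), ("4", "four")])

results = [
    d.get("2"),       # key stored as str
    d.get(4),         # int key, converted to "4" by __missing__
    d.get(1, "N/A"),  # no "1" key, so the default is returned
    2 in d,           # __contains__ also tries str(key)
    1 in d,
]
print(results)  # ['two', 'four', 'N/A', True, False]
```

The membership test goes through self.keys() rather than `key in self`, since the latter would call __contains__ again and recurse without end.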