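Python ships with many standard modules. Some are fairly obscure, but others provide very useful functions and classes. This article covers the collections module, which extends Python's built-in containers (dict, list, set, and tuple).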
Counter([iterable-or-mapping])
Counter is a subclass of dict that counts hashable objects.
from collections import Counter
c = Counter("bcdcd")
print(c) # Counter({'c': 2, 'd': 2, 'b': 1})
print(c.most_common(2)) # [('c', 2), ('d', 2)]
It can also be initialized from a dict or a list:
c = Counter({'red': 4, 'blue': 2})
c = Counter(["a", "b", "a"])
For an element that is not in the Counter, a lookup does not raise KeyError as a plain dict would; it returns a count of 0:
c = Counter({'red': 4, 'blue': 2})
c["orange"] # 0
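A small sketch of how this differs from a plain dict. The variable names are only illustrative. Note that reading a missing key returns 0 without adding that key to the Counter.

```python
from collections import Counter

inventory = Counter({'red': 4, 'blue': 2})
plain = {'red': 4, 'blue': 2}

# Reading a missing key returns 0 and does not insert the key.
missing_count = inventory['orange']
still_absent = 'orange' not in inventory

# The same lookup on a plain dict raises KeyError.
try:
    plain['orange']
    raised = False
except KeyError:
    raised = True

print(missing_count, still_absent, raised)  # 0 True True
```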
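Common methods:
- most_common([n]): returns a list of the n most common elements together with their counts, ordered from most to least common. If n is omitted, all elements are returned.
- subtract([iterable-or-mapping]): subtracts the counts of another iterable or mapping from the current Counter, in place. Counts may become zero or negative.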
c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)
c.subtract(d)
print(c) #Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})
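The two methods can also be exercised as follows. This is a minimal sketch with made-up input: it calls most_common() without an argument and passes a plain string, rather than another Counter, to subtract().

```python
from collections import Counter

letters = Counter("abracadabra")   # a: 5, b: 2, r: 2, c: 1, d: 1

# Without an argument, most_common() returns every element, highest count first.
ranking = letters.most_common()
top = ranking[0]

# subtract() also accepts a plain iterable and modifies the counter in place.
letters.subtract("aaaaaa")         # six 'a' characters
a_after = letters['a']

print(top)      # ('a', 5)
print(a_after)  # -1
```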
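- total(): returns the sum of all counts (added in Python 3.10).
- Addition, subtraction, intersection, union, and comparison: the operations that return a Counter exclude elements whose count is zero or negative from the result.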
Several mathematical operations are provided for combining Counter objects to produce multisets (counters that have counts greater than zero). Addition and subtraction combine counters by adding or subtracting the counts of corresponding elements. Intersection and union return the minimum and maximum of corresponding counts. Equality and inclusion compare corresponding counts. Each operation can accept inputs with signed counts, but the output will exclude results with counts of zero or less.
c = Counter(a=3, b=1)
d = Counter(a=1, b=2)
c + d # add two counters together: c[x] + d[x] (keeping only positive counts)
# Counter({'a': 4, 'b': 3})
c - d # subtract (keeping only positive counts)
# Counter({'a': 2})
c & d # intersection: min(c[x], d[x]) (keeping only positive counts)
# Counter({'a': 1, 'b': 1})
c | d # union: max(c[x], d[x]) (keeping only positive counts)
# Counter({'a': 3, 'b': 2})
c == d # equality: c[x] == d[x]
# False
c <= d # inclusion: c[x] <= d[x]
# False
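Unary plus and minus are shorthand for adding the counter to an empty Counter and subtracting it from an empty Counter, respectively:
c = Counter(a=2, b=-4)
+c # keep only elements with positive counts
# Counter({'a': 2})
-c # negate every count, then keep only the positive ones
# Counter({'b': 4})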
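As one possible use of these multiset operations, the sketch below checks whether a word can be spelled from a pool of letters. The helper names can_build and total_count are hypothetical, not part of the collections module. It relies only on the fact that subtraction drops non-positive counts, so it should also work on Python versions older than 3.10, where the comparison operators and total() are unavailable.

```python
from collections import Counter

def can_build(word, letters):
    """Return True if `word` can be spelled using the characters in `letters`."""
    needed = Counter(word)
    available = Counter(letters)
    # Subtraction drops counts that are zero or negative, so an empty
    # result means every needed letter is covered.
    return not (needed - available)

def total_count(counter):
    """Sum of all counts; equivalent to counter.total() on Python 3.10+."""
    return sum(counter.values())

print(can_build("dad", "add"))          # True
print(can_build("daddy", "add"))        # False
print(total_count(Counter(a=3, b=1)))   # 4
```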
defaultdict(default_factory=None)
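defaultdict is a subclass of the built-in dict. Its main purpose is to handle missing keys automatically. With a plain dict, accessing an undefined key raises KeyError; with defaultdict, default_factory specifies how a missing key is handled: it is called with no arguments to produce the initial value for that key.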
from collections import defaultdict
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)
sorted(d.items())
# [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
In the example above, default_factory=list, so when the dictionary meets a new key it initializes an empty list for it, and append can be called directly.
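The sketch below illustrates when default_factory is actually invoked. The variable names are only illustrative. Indexing a missing key creates it, while get() and the in operator do not; with default_factory=None a missing key raises KeyError, as in a plain dict.

```python
from collections import defaultdict

groups = defaultdict(list)

# Indexing a missing key calls default_factory() and stores the result.
groups['a'].append(1)
created_by_index = 'a' in groups      # True

# get() and `in` do not call default_factory, so no key is created.
value_from_get = groups.get('b')      # None
created_by_get = 'b' in groups        # False

# With default_factory=None, a missing key raises KeyError like a plain dict.
strict = defaultdict(None)
try:
    strict['x']
    raised = False
except KeyError:
    raised = True

print(dict(groups), value_from_get, raised)  # {'a': [1]} None True
```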
Now a second example, with default_factory=int:
s = 'mississippi'
d = defaultdict(int)
for k in s:
    d[k] += 1
sorted(d.items())
# [('i', 4), ('m', 1), ('p', 2), ('s', 4)]
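This example is mainly used for counting: for a new key, the value is initialized to 0. If counting is all you need, though, the Counter introduced above is more convenient.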
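To make the comparison concrete, here is a minimal sketch that counts the same string both ways and checks that the resulting counts agree. The variable names are illustrative.

```python
from collections import Counter, defaultdict

s = 'mississippi'

# Manual counting with defaultdict(int).
by_hand = defaultdict(int)
for ch in s:
    by_hand[ch] += 1

# The same result in one call with Counter.
by_counter = Counter(s)

# Convert both to plain dicts to compare only their contents.
same = dict(by_hand) == dict(by_counter)
print(same)             # True
print(by_counter['s'])  # 4
```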
A third example, with default_factory=set, removes duplicate values:
s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
d = defaultdict(set)
for k, v in s:
    d[k].add(v)
sorted(d.items())
# [('blue', {2, 4}), ('red', {1, 3})]
A custom default_factory:
def constant_factory(value):
    return lambda: value
d = defaultdict(constant_factory('<missing>'))
print(d[1])
# <missing>
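Because default_factory can be any zero-argument callable, it can itself return another defaultdict. The sketch below uses that to build two-level totals. The sales records are made-up sample data, not from the original article.

```python
from collections import defaultdict

# Hypothetical data: (city, product, units) records.
sales = [('Paris', 'tea', 3), ('Paris', 'coffee', 1),
         ('Oslo', 'tea', 2), ('Paris', 'tea', 4)]

# Each missing city gets its own inner defaultdict(int).
totals = defaultdict(lambda: defaultdict(int))
for city, product, units in sales:
    totals[city][product] += units

print(totals['Paris']['tea'])   # 7
print(totals['Oslo']['tea'])    # 2
```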
To be continued.
class | utility |
---|---|
namedtuple() | factory function for creating tuple subclasses with named fields |
deque | list-like container with fast appends and pops on either end |
ChainMap | dict-like class for creating a single view of multiple mappings |
Counter | dict subclass for counting hashable objects |
OrderedDict | dict subclass that remembers the order entries were added |
defaultdict | dict subclass that calls a factory function to supply missing values |
UserDict | wrapper around dictionary objects for easier dict subclassing |
UserList | wrapper around list objects for easier list subclassing |
UserString | wrapper around string objects for easier string subclassing |
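The table above lists the container classes in the collections module. For example, namedtuple extends tuple, and deque generalizes stacks and queues into a double-ended queue that supports adding and removing elements at both ends. I will cover them in more detail as the need arises.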
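Until then, a brief sketch of the two classes mentioned above may help. The type name Point and the sample values are only illustrative.

```python
from collections import deque, namedtuple

# namedtuple: a tuple subclass whose fields can also be read by name.
Point = namedtuple('Point', ['x', 'y'])
p = Point(x=1, y=2)
print(p.x + p[1])             # 3
print(isinstance(p, tuple))   # True

# deque: append and pop at either end.
q = deque([1, 2, 3])
q.appendleft(0)               # deque([0, 1, 2, 3])
q.append(4)                   # deque([0, 1, 2, 3, 4])
first = q.popleft()           # 0
last = q.pop()                # 4
print(first, last, list(q))   # 0 4 [1, 2, 3]
```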