Python Standard Modules: collections

Latest recommended article published on 2023-06-05 13:47:35

云中君不见

Article link: https://blog.csdn.net/cendrier/article/details/129904213

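Python has many standard modules. Some are fairly obscure, but others provide very useful functions and classes. This article introduces the collections module, which mainly extends Python's built-in containers (dict, list, set, and tuple).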

Counter([iterable-or-mapping])

Counter is a subclass of dict, used mainly to count hashable objects.

from collections import Counter
c = Counter("bcdcd")
print(c) # Counter({'c': 2, 'd': 2, 'b': 1})
print(c.most_common(2)) # [('c', 2), ('d', 2)]

It can also be initialized from a list or a dict:

c = Counter({'red': 4, 'blue': 2})
c = Counter(["a", "b", "a"])

For an element that is not in the Counter, it does not raise a KeyError the way dict does. It returns a count of 0 instead:

c = Counter({'red': 4, 'blue': 2})
c["orange"] # 0
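The difference from a plain dict can be checked directly. The following is a minimal sketch; the variable names (`plain`, `missing_count`, and so on) are made up for illustration. It also shows that looking up a missing element does not add it to the Counter.

```python
from collections import Counter

c = Counter({'red': 4, 'blue': 2})
plain = dict(c)  # an ordinary dict with the same contents

# Looking up a missing element on a Counter returns 0 ...
missing_count = c['orange']
# ... and does not add the element to the Counter.
still_absent = 'orange' not in c

# The same lookup on a plain dict raises KeyError.
try:
    plain['orange']
    raised = False
except KeyError:
    raised = True

print(missing_count, still_absent, raised)  # 0 True True
```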

Common methods:

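most_common([n]): returns a list of the n most common elements and their counts, ordered from most to least common. If n is omitted, all elements are returned.
subtract([iterable-or-mapping]): subtracts the counts of another iterable or mapping from the current Counter, in place (counts may become zero or negative).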

c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)
c.subtract(d)
print(c) #Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})
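subtract() also accepts a plain iterable rather than a mapping. A small sketch, using a hypothetical `stock` counter, shows this together with update(), the standard Counter method that adds counts:

```python
from collections import Counter

stock = Counter(apple=3, pear=1)

# subtract() accepts a plain iterable: each occurrence lowers the count by one.
stock.subtract(['apple', 'pear', 'pear'])
print(stock)  # Counter({'apple': 2, 'pear': -1})

# update() is the counterpart: it adds to existing counts instead of replacing them.
stock.update({'pear': 3})
print(stock['pear'])  # 2
```

Both methods modify the Counter in place and return None, unlike the operators, which return a new Counter.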

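total(): the sum of all counts (added in Python 3.10).
Addition, subtraction, intersection, union, and comparison: the operators that return a new Counter drop elements whose resulting count is zero or negative.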

Several mathematical operations are provided for combining Counter objects to produce multisets (counters that have counts greater than zero). Addition and subtraction combine counters by adding or subtracting the counts of corresponding elements. Intersection and union return the minimum and maximum of corresponding counts. Equality and inclusion compare corresponding counts. Each operation can accept inputs with signed counts, but the output will exclude results with counts of zero or less.

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)
c + d                       # add two counters together:  c[x] + d[x] (keeping only positive counts)
# Counter({'a': 4, 'b': 3})
c - d                       # subtract (keeping only positive counts)
# Counter({'a': 2})
c & d                       # intersection:  min(c[x], d[x]) (keeping only positive counts)
# Counter({'a': 1, 'b': 1})
c | d                       # union:  max(c[x], d[x]) (keeping only positive counts)
# Counter({'a': 3, 'b': 2})
c == d                      # equality:  c[x] == d[x]
# False
c <= d                      # inclusion:  c[x] <= d[x]
# False
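One way to put these multiset operations to use is to check whether one collection of letters covers another. This is only a sketch: `can_build` is a hypothetical helper, not part of collections. It relies on subtraction keeping only positive counts, so it also runs on Python versions older than 3.10, where the `<=` comparison is not available.

```python
from collections import Counter

def can_build(word, letters):
    """Return True if `word` can be spelled using the characters in `letters`."""
    needed = Counter(word)
    available = Counter(letters)
    # Subtraction keeps only positive counts, so an empty result means
    # every needed letter is covered by what is available.
    shortfall = needed - available
    return not shortfall

print(can_build('cab', 'abcabc'))  # True
print(can_build('book', 'bokeh'))  # False (only one 'o' is available)
```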

Addition to and subtraction from an empty counter can be abbreviated as unary operations:

c = Counter(a=2, b=-4)
+c # drop elements with zero or negative counts
# Counter({'a': 2})
-c # negate every count, then drop zero or negative results
# Counter({'b': 4})
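The total() method listed among the common methods can be sketched as follows. Because it only exists on Python 3.10 and later, this example computes the sum by hand and calls total() only when it is available. Combining it with unary plus gives the sum of the positive counts alone.

```python
from collections import Counter

c = Counter(a=4, b=2, c=0, d=-2)

# Summing the values works on every Python 3 version.
manual_total = sum(c.values())
print(manual_total)  # 4

# Unary plus drops zero and negative counts before summing.
positive_total = sum((+c).values())
print(positive_total)  # 6

# Counter.total() was added in Python 3.10, so guard the call on older versions.
if hasattr(c, 'total'):
    print(c.total() == manual_total)  # True
```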

defaultdict(default_factory=None)

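defaultdict is a subclass of the built-in dict type. Its main feature is automatic handling of missing keys. With a plain dict, accessing an undefined key raises a KeyError. With defaultdict, the default_factory argument specifies how a missing key is handled.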

from collections import defaultdict

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)

sorted(d.items())
# [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

In the example above, default_factory=list: when the dictionary encounters a new key, it initializes an empty list, so append can be called directly.
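For comparison, the same grouping can be written with a plain dict and setdefault(). The sketch below (variable names `plain` and `grouped` are illustrative) checks that both give the same result, and shows that merely reading a missing key on a defaultdict also inserts it.

```python
from collections import defaultdict

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

# Plain dict: setdefault() creates the empty list the first time a key is seen.
plain = {}
for k, v in s:
    plain.setdefault(k, []).append(v)

# defaultdict: the factory is called automatically for missing keys.
grouped = defaultdict(list)
for k, v in s:
    grouped[k].append(v)

print(plain == grouped)  # True

# Merely reading a missing key also inserts it with the default value.
grouped['green']
print('green' in grouped)  # True
```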

Now a second example, with default_factory=int:

s = 'mississippi'
d = defaultdict(int)
for k in s:
    d[k] += 1

sorted(d.items())
# [('i', 4), ('m', 1), ('p', 2), ('s', 4)]

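This example is mainly for counting: for a new key, the corresponding value is initialized to 0. If counting is all you need, though, the Counter described above is more convenient.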
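As a quick check of that claim, the sketch below counts the same string both ways and confirms the results agree. Counter needs no loop and also provides most_common().

```python
from collections import Counter, defaultdict

s = 'mississippi'

# Counting by hand with defaultdict(int).
d = defaultdict(int)
for ch in s:
    d[ch] += 1

# Counter counts every character in one call.
c = Counter(s)

print(sorted(c.items()) == sorted(d.items()))  # True
print(c.most_common(2))  # [('i', 4), ('s', 4)]
```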

A third example, with default_factory=set, removes duplicate elements:

s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
d = defaultdict(set)
for k, v in s:
    d[k].add(v)

sorted(d.items())
# [('blue', {2, 4}), ('red', {1, 3})]

A custom default_factory:

def constant_factory(value):
    return lambda: value
d = defaultdict(constant_factory('<missing>'))
print(d[1])
# <missing>
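One detail worth knowing is that default_factory is only invoked by subscript access (`d[key]`). The sketch below, which uses a lambda in place of constant_factory, shows that get() and the `in` operator leave the dictionary untouched.

```python
from collections import defaultdict

d = defaultdict(lambda: '<missing>')

# get() and the `in` operator do not call default_factory ...
print(d.get('a'))  # None
print('a' in d)    # False

# ... only subscript access does, and it stores the default as a side effect.
print(d['a'])      # <missing>
print('a' in d)    # True
```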

To be continued

class	utility
namedtuple()	factory function for creating tuple subclasses with named fields
deque	list-like container with fast appends and pops on either end
ChainMap	dict-like class for creating a single view of multiple mappings
Counter	dict subclass for counting hashable objects
OrderedDict	dict subclass that remembers the order entries were added
defaultdict	dict subclass that calls a factory function to supply missing values
UserDict	wrapper around dictionary objects for easier dict subclassing
UserList	wrapper around list objects for easier list subclassing
UserString	wrapper around string objects for easier string subclassing
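Until those sections are written, here is a brief sketch of three of the classes in the table, based only on the one-line descriptions above. The names `Point`, `defaults`, `overrides`, and `settings` are made up for illustration.

```python
from collections import ChainMap, deque, namedtuple

# namedtuple: a tuple subclass whose fields can be read by name.
Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
print(p.x + p.y)  # 3

# deque: appends and pops at both ends.
q = deque([1, 2, 3])
q.appendleft(0)  # add on the left
q.pop()          # remove from the right
print(list(q))   # [0, 1, 2]

# ChainMap: looks keys up in several mappings; the first match wins.
defaults = {'color': 'red', 'size': 'M'}
overrides = {'color': 'blue'}
settings = ChainMap(overrides, defaults)
print(settings['color'], settings['size'])  # blue M
```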