Introduction to Python's collections Module (Part 1)

phoenix_wangxd

Published 2022-10-23 07:58:00

Article link: https://blog.csdn.net/u013391094/article/details/127470426

0. References

Official Python documentation (Chinese): collections — 容器数据类型
Official Python documentation (English): collections — Container datatypes
Source code of the collections module: collections source path

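Python's built-in data types (int, float, str, list, tuple, dict) cover most everyday needs. The standard library's collections module adds several more specialized container types:

Named tuple (namedtuple): an enhanced tuple [very useful]
Double-ended queue (deque): a list-like container with fast operations at both ends
Ordered dictionary (OrderedDict): a dictionary that remembers insertion order
Default dictionary (defaultdict): a dictionary that supplies default values for missing keys
Counter (Counter): a dictionary specialized for counting

To use these five types, first import the standard-library module collections (for example, import collections). The sections below introduce them one by one.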

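1. Named tuple (namedtuple): an extended tuple type

namedtuple is a factory function that creates subclasses of tuple, so instances of those classes are still tuples and inherit all tuple behavior. What makes them special is that you can access elements by name, much like looking up a value by key in a dictionary.

1.1. Using `namedtuple`

1.1.1. The old way to access a tuple

With a plain tuple, elements have to be accessed by index:

# Example: accessing tuple elements by index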
>>> x, y = 1,2
>>> point = (x, y)
>>> point[0]
1
>>> point[1]
2

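Of course, you can also iterate over the tuple, but iteration is still positional, so it is essentially the same as index access.

1.1.2. Attribute access with `namedtuple`

With a namedtuple, you gain a new way to access elements, by attribute name: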

from collections import namedtuple

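# First define a namedtuple class.
# The class is named "Point" and has two fields, x and y. The first "Point" is the variable
# the class is bound to; the second is the class's own name. By convention the two match.
Point = namedtuple('Point', ['x', 'y'])

# The simplest way to create an instance is `p = Point(11, 22)`, but keyword arguments are easier to maintain:
p = Point(x=11, y=22)

# Like a plain tuple, it supports index access, just as with p = (11, 22)
print(f"Access used index: {p[0] + p[1]}")
# In addition, a namedtuple supports access by field name, which plain tuples do not:
print(f"Access used attribute: {p.x + p.y}")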

Output:

Access used index: 33
Access used attribute: 33

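That is attribute access with namedtuple: it takes slightly more setup, but it gives you a new way to read elements, by attribute.

1.2. `namedtuple` syntax and attributes

1.2.1. Syntax overview

The full signature for creating a namedtuple: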

collections.namedtuple(typename, field_names, *, rename=False, defaults=None, module=None)

Notes on the syntax:

Return value: a new tuple subclass named typename. The new subclass is used to create tuple-like objects whose fields are accessible by attribute lookup as well as being indexable and iterable.
Instances of the subclass also have a helpful docstring (with typename and field_names) and a helpful __repr__() method that lists the tuple contents in name=value format.

Syntax reference:

https://docs.python.org/3/library/collections.html#collections.namedtuple
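
The optional parameters in the signature above are easiest to understand by example. The following is a minimal sketch of defaults and rename; the names Point3D and Row are hypothetical and chosen only for illustration, and defaults assumes Python 3.7 or later.

```python
from collections import namedtuple

# defaults are applied to the rightmost fields, so here only z gets a default.
Point3D = namedtuple('Point3D', ['x', 'y', 'z'], defaults=[0])
p = Point3D(1, 2)
print(p)  # Point3D(x=1, y=2, z=0)

# rename=True replaces invalid field names (keywords, duplicates)
# with positional names of the form _<index>.
Row = namedtuple('Row', ['id', 'class', 'id'], rename=True)
print(Row._fields)  # ('id', '_1', '_2')
```

Without rename=True, the second call would raise ValueError because 'class' is a keyword and 'id' is repeated.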

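The examples below go into more detail.

1.2.2. More `namedtuple` examples

The example above showed briefly how to use namedtuple. Continuing from it, this section looks more closely at the type, attributes, and methods of a namedtuple: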

from collections import namedtuple
from pprint import pprint

Point = namedtuple('Point', ['x', 'y'])
p = Point(x=11, y=22)

print(f"{type(Point) = }")
print(f"{type(p) = }")
print(f"{isinstance(p, tuple) = }")
print(f"{isinstance(p, Point) = }")
print(f"{isinstance(Point, tuple) = }")
print(f"{'*' * 80}")
print(f"attributes compare: {dir(p) == dir(Point)}")
print(f"{'*' * 80}")
pprint(f"{dir(Point) = }")
print(f"{'*' * 80}")
pprint(f"{dir(tuple) = }")
print(f"{'*' * 80}")
print(f"{len(dir(Point)) = }, {len(dir(tuple)) = }")
print(f"{'*' * 80}")
pprint(f"{set(dir(Point)).difference(set(dir(tuple)))}")

Output:

type(Point) = <class 'type'>
type(p) = <class '__main__.Point'>
isinstance(p, tuple) = True
isinstance(p, Point) = True
isinstance(Point, tuple) = False
********************************************************************************
attributes compare: True
********************************************************************************
("dir(Point) = ['__add__', '__class__', '__contains__', '__delattr__', "
 "'__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', "
 "'__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', "
 "'__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', "
 "'__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', "
 "'__repr__', '__rmul__', '__setattr__', '__sizeof__', '__slots__', '__str__', "
 "'__subclasshook__', '_asdict', '_field_defaults', '_fields', "
 "'_fields_defaults', '_make', '_replace', 'count', 'index', 'x', 'y']")
********************************************************************************
("dir(tuple) = ['__add__', '__class__', '__contains__', '__delattr__', "
 "'__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', "
 "'__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', "
 "'__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', "
 "'__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', "
 "'__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count', "
 "'index']")
********************************************************************************
len(dir(Point)) = 43, len(dir(tuple)) = 33
********************************************************************************
("{'_replace', '__module__', '_make', '_asdict', '_fields_defaults', "
 "'__slots__', '_field_defaults', '_fields', 'x', 'y'}")

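Conclusions:

The class Point created by namedtuple has type <class 'type'>
The object p instantiated from Point has type <class '__main__.Point'>
The object p is indeed an instance of both tuple and Point
The class Point itself is not an instance of tuple (it is a subclass of tuple)
Both the object p and the class Point have the attributes x and y, which is why access by attribute works; by contrast, the built-in tuple has no x or y attribute.
Besides the attributes inherited from plain tuples, named tuples have some of their own. In addition to x and y, Point has 8 more attributes than the built-in tuple in this run; the exact set depends on the Python version.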
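
The distinction between "instance of tuple" and "subclass of tuple" in the conclusions can be checked directly. This is a small sketch reusing the Point class from the example; it only uses built-in introspection functions.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(x=11, y=22)

# The generated class inherits directly from tuple.
print(issubclass(Point, tuple))  # True
print(Point.__bases__)           # (<class 'tuple'>,)

# isinstance() tests objects, so the class itself is not a tuple instance.
print(isinstance(Point, tuple))  # False

# Because p is a real tuple, ordinary tuple operations work unchanged.
print(p == (11, 22))             # True
print(len(p), list(p))           # 2 [11, 22]
```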

1.2.3. What the additional `namedtuple` attributes do

Continuing the example above, print the values of the attributes that namedtuple adds:

from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(x=11, y=22)

attr_set = set(dir(Point)).difference(set(dir(tuple)))
attr_tuple = tuple(attr_set)
print(f"{len(attr_tuple) = }")
print(f"{attr_tuple = }")

for attr_str in attr_tuple:
    print(f"attribute:{attr_str} = {getattr(p, attr_str)}")

Output:

len(attr_tuple) = 10
attr_tuple = ('y', '_field_defaults', '_make', '_replace', '_fields', '_fields_defaults', 'x', '__slots__', '_asdict', '__module__')
attribute:y = 22
attribute:_field_defaults = {}
attribute:_make = <bound method Point._make of <class '__main__.Point'>>
attribute:_replace = <bound method Point._replace of Point(x=11, y=22)>
attribute:_fields = ('x', 'y')
attribute:_fields_defaults = {}
attribute:x = 11
attribute:__slots__ = ()
attribute:_asdict = <bound method Point._asdict of Point(x=11, y=22)>
attribute:__module__ = __main__

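Each of these attributes is documented in detail in the official Python documentation, so they are not covered in depth here; some concrete usage is shown later.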
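
One of the listed methods, _replace, is worth a quick illustration because named tuples are immutable like any tuple. The sketch below reuses the Point class from the example; the variable names moved and assignment_failed are hypothetical.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(x=11, y=22)

# Fields are read-only: assigning to one raises AttributeError.
try:
    p.x = 100
    assignment_failed = False
except AttributeError:
    assignment_failed = True
print(assignment_failed)  # True

# _replace() returns a new instance with the given fields changed.
moved = p._replace(x=100)
print(moved)  # Point(x=100, y=22)
print(p)      # Point(x=11, y=22)
```

The original p is left untouched, which is the usual way to "update" a named tuple.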

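1.3. Once more: why use `namedtuple`?

On reflection, namedtuple takes more effort (you have to define a namedtuple class first) than a plain tuple, where p = (11, 22) is all it takes. So when would you use one?

The answer: when a plain tuple is not readable enough, but you do not want to write a full class.

Consider this piece of data:

bob = ('Bob', 30, 'male')

Looking at the values alone, you cannot know what the three elements mean. You might guess, but it is only a guess. How can we make it more readable? One option is to define a class to model the data: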

class Person:

    def __init__(self, name, age, gender):
        self.name = name
        self.age = age
        self.gender = gender

bob = Person('Bob', 30, 'male')

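With the Person class it is clear at a glance that Bob is the name, 30 is the age, and male is the gender.

This is more readable, but it takes more code. More importantly, an instance of an ordinary class carries a per-instance attribute dictionary, so creating one typically costs more memory than a plain tuple.

namedtuple solves this problem: it keeps the lightweight behavior of tuple while adding readability: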

from collections import namedtuple

# In the declaration below, Python splits "name age gender" on whitespace into three fields:
Person = namedtuple("Person", "name age gender")
# A common way to create an instance:
bob = Person(name='Bob', age=30, gender='male')

>>> bob[0]
'Bob'
>>> bob.name
'Bob'
>>> bob.age
30
>>> bob[1]
30
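
Because a named tuple is still a tuple, it can be dropped in wherever a plain tuple was used before. The following sketch illustrates this with the Person example; the dictionary records is hypothetical.

```python
from collections import namedtuple

Person = namedtuple("Person", "name age gender")
bob = Person(name='Bob', age=30, gender='male')

# It compares equal to a plain tuple holding the same values.
print(bob == ('Bob', 30, 'male'))  # True

# It unpacks like a plain tuple.
name, age, gender = bob
print(name, age, gender)  # Bob 30 male

# It hashes like a plain tuple, so either form finds the same dict entry.
records = {bob: 'first record'}
print(records[('Bob', 30, 'male')])  # first record
```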

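1.4. More usage scenarios

Here are a few typical scenarios.

1.4.1. Objects with attributes but no methods

Suppose we need an object that describes a city. A city record has only attributes and no methods:

City name: name
Country the city belongs to: country
Population of the city: population
Geographic coordinates of the city: coordinates

namedtuple handles this well. Below, we create an instance for Tokyo: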

from collections import namedtuple

# A single statement defines a "miniature class", City (it meets the requirement and is very simple)
City = namedtuple('City', 'name country population coordinates')
# A single statement creates an instance of this "miniature City class": tokyo
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))

print(tokyo)
print(tokyo.population)
print(tokyo.coordinates)
print(tokyo[1])

Output:

City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))
36.933
(35.689722, 139.691667)
JP

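As you can see, this approach is simple and elegant.

1.4.2. Other `namedtuple` attributes

Besides the attributes inherited from plain tuples, named tuples have some of their own. The most useful ones are shown below: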

from collections import namedtuple

City = namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
print(f"{City._fields = }")

LatLong = namedtuple('LatLong', 'lat long')
delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))
delhi = City._make(delhi_data)
print(f"{delhi._asdict() = }")

for key, value in delhi._asdict().items():
    print(key + ':', value)

Output:

City._fields = ('name', 'country', 'population', 'coordinates')
delhi._asdict() = {'name': 'Delhi NCR', 'country': 'IN', 'population': 21.935, 'coordinates': LatLong(lat=28.613889, long=77.208889)}
name: Delhi NCR
country: IN
population: 21.935
coordinates: LatLong(lat=28.613889, long=77.208889)

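Explanation:

Class attribute: _fields --> a tuple containing all the field names of the class.
Class method: _make(iterable) --> builds an instance of the class from an iterable; it does the same thing as City(*delhi_data).
Instance method: _asdict() --> returns the named tuple as a dict that maps field names to values (a collections.OrderedDict in Python 3.1 through 3.7, a regular dict since Python 3.8, as in the output above); it is handy for presenting the tuple's contents in a readable way.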
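
A common way to combine _make() and _asdict() is to load rows of raw data into named tuples and then serialize them. The sketch below assumes a simplified three-field City and a hypothetical in-memory list rows standing in for data read from a file or database.

```python
import json
from collections import namedtuple

City = namedtuple('City', 'name country population')

# Hypothetical raw rows, e.g. as they might come from a CSV reader.
rows = [
    ('Tokyo', 'JP', 36.933),
    ('Delhi NCR', 'IN', 21.935),
]

# _make() builds one City per row.
cities = [City._make(row) for row in rows]
print(cities[0].name)  # Tokyo

# _asdict() yields field-name/value mappings, which serialize as JSON objects.
payload = json.dumps([city._asdict() for city in cities])
print(payload)
```

Passing a named tuple straight to json.dumps() would produce a JSON array without field names, so converting with _asdict() first keeps the names.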

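1.5. `namedtuple` summary

Similarly, to represent a circle by its center coordinates and radius, you can define a namedtuple:

from collections import namedtuple

# Syntax: namedtuple('name', [list of fields]):
Circle = namedtuple('Circle', ['x', 'y', 'r'])

As you may have noticed, a namedtuple can be seen as a simple custom class: you specify its fields, but the namedtuple() call itself gives you no place to define methods the way a class statement does.

So whenever you would define a class that needs no methods, consider using a namedtuple instead.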
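
If a method turns out to be needed later, one option is to subclass the generated class. This is a sketch of that pattern rather than something the article itself covers; the area property is a hypothetical addition.

```python
import math
from collections import namedtuple


class Circle(namedtuple('Circle', ['x', 'y', 'r'])):
    # Empty __slots__ avoids creating a per-instance __dict__,
    # so instances stay as lightweight as the base named tuple.
    __slots__ = ()

    @property
    def area(self):
        """Area of the circle, computed from the radius field."""
        return math.pi * self.r ** 2


c = Circle(x=0, y=0, r=2)
print(c)       # Circle(x=0, y=0, r=2)
print(c.area)  # roughly 12.566
```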

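2. Double-ended queue (deque): like a list, but with operations at both ends

A list gives fast access by index, but inserting or deleting elements anywhere except the end is slow: a list is stored contiguously, so the elements after the insertion point have to be shifted, which becomes expensive for large lists.

2.1. Basic usage of deque

deque is a double-ended queue designed for efficient insertion and deletion at both ends, which makes it well suited for queues and stacks: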

>>> from collections import deque
>>> q = deque(['a', 'b', 'c'])
>>> q.append('x')
>>> q.appendleft('y')
>>> q
deque(['y', 'a', 'b', 'c', 'x'])

Besides the append() and pop() methods that list provides, deque also supports appendleft() and popleft(), so you can add or remove elements at the head very efficiently.
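
To make the "queues and stacks" remark concrete, here is a minimal sketch of both patterns with a deque. The item names are hypothetical.

```python
from collections import deque

# FIFO queue: add on the right, remove from the left.
queue = deque()
queue.append('job-1')
queue.append('job-2')
queue.append('job-3')
served = [queue.popleft(), queue.popleft()]
print(served)  # ['job-1', 'job-2']
print(queue)   # deque(['job-3'])

# LIFO stack: add and remove on the same (right) end.
stack = deque()
stack.append(1)
stack.append(2)
stack.append(3)
top = stack.pop()
print(top)    # 3
print(stack)  # deque([1, 2])
```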

2.2. Attributes and methods that deque adds

The attributes and methods that deque has and list does not:

from collections import deque

attr_set = set(dir(deque)).difference(set(dir(list)))
attr_tuple = tuple(attr_set)
print(f"{len(attr_tuple) = }")
print(f"{attr_tuple = }")
q = deque(['a', 'b', 'c'])
for attr_str in attr_tuple:
    print(f"attribute:{attr_str} = {getattr(q, attr_str)}")

Output:

len(attr_tuple) = 7
attr_tuple = ('__bool__', 'maxlen', 'rotate', 'popleft', 'appendleft', '__copy__', 'extendleft')
attribute:__bool__ = <method-wrapper '__bool__' of collections.deque object at 0x7f003e243520>
attribute:maxlen = None
attribute:rotate = <built-in method rotate of collections.deque object at 0x7f003e243520>
attribute:popleft = <built-in method popleft of collections.deque object at 0x7f003e243520>
attribute:appendleft = <built-in method appendleft of collections.deque object at 0x7f003e243520>
attribute:__copy__ = <built-in method __copy__ of collections.deque object at 0x7f003e243520>
attribute:extendleft = <built-in method extendleft of collections.deque object at 0x7f003e243520>
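
Two entries in this list, maxlen and rotate, are easier to grasp with a short example. The following sketch uses hypothetical variable names recent and ring.

```python
from collections import deque

# maxlen bounds the deque: once it is full, appending on one end
# silently discards an item from the other end.
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)
print(recent)  # deque([2, 3, 4], maxlen=3)

# rotate(n) moves n items from the right end to the left end;
# a negative n rotates in the opposite direction.
ring = deque([1, 2, 3, 4, 5])
ring.rotate(2)
print(ring)  # deque([4, 5, 1, 2, 3])
ring.rotate(-2)
print(ring)  # deque([1, 2, 3, 4, 5])
```

A bounded deque like recent is a convenient way to keep only the last few items of a stream.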

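2.3. Summary of common methods

A summary of commonly used deque methods:

from collections import deque

d = deque()  # create an empty deque
d.append('b')  # add an item to the right end (the tail) of d
d.appendleft('a')  # add an item to the left end (the head) of d
d.extend(['c', 'd'])  # add every item of the iterable to the right end of d
d.extendleft(['y', 'x'])  # add every item of the iterable to the left end of d, one at a time, so their order ends up reversed
d.pop()  # remove and return the last (rightmost) item; raises IndexError if d is empty
d.popleft()  # remove and return the first (leftmost) item; raises IndexError if d is empty
d.rotate(1)  # rotate d n steps to the right (to the left if n < 0); n defaults to 1 and must be passed positionally
d.count('a')  # count the elements of d that are equal to the given value
d.remove('a')  # remove the first occurrence of the given value; raises ValueError if it is not found
d.reverse()  # reverse the elements of d in place
d.clear()  # empty the deque, i.e. remove all items from d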
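
Two behaviors in this summary tend to surprise people: extendleft() reverses the order of the items it adds, and popping from an empty deque raises an exception. The sketch below demonstrates both; the flag pop_failed is hypothetical.

```python
from collections import deque

d = deque(['a', 'b'])

# extendleft() pushes items one at a time onto the left end,
# so 'x' goes in first and 'y' ends up in front of it.
d.extendleft(['x', 'y'])
print(d)  # deque(['y', 'x', 'a', 'b'])

# Popping from an empty deque raises IndexError.
empty = deque()
try:
    empty.popleft()
    pop_failed = False
except IndexError:
    pop_failed = True
print(pop_failed)  # True
```

To add items on the left while preserving their order, one option is to pass reversed(items) to extendleft().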
