Sequence Hacking, Hashing, and Slicing

阿拉辉

Article link: https://blog.csdn.net/weixin_38492159/article/details/107400099

Vector: a user-defined sequence type
Vector take #1: compatible with Vector2d
Protocols and duck typing
Vector take #2: a sliceable sequence
- How slicing works
- A slice-aware __getitem__

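Vector: a user-defined sequence type

The Vector class is implemented with composition rather than inheritance. The components of a vector are stored in an array of floats, and the class implements the methods expected of an immutable flat sequence.
Before implementing the sequence methods, though, we make sure Vector is compatible with the Vector2d class defined in the previous chapter, except where such compatibility would make no sense.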

Vector take #1: compatible with Vector2d

# Example 1: implementation of the Vector class
from array import array
import reprlib
import math
class Vector:
    typecode = 'd'
    def __init__(self, components):  
        self._components = array(self.typecode, components)  #➊  
    def __iter__(self):
        return iter(self._components)  #➋
    def __repr__(self):
        components = reprlib.repr(self._components)   #➌ 
        components = components[components.find('['):-1]   #➍
        return 'Vector({})'.format(components)  
    def __str__(self):
        return str(tuple(self))
    def __bytes__(self):
        return (bytes([ord(self.typecode)]) +
                bytes(self._components))  #➎ 
    def __eq__(self, other):
        return tuple(self) == tuple(other)
    def __abs__(self):
        return math.sqrt(sum(x * x for x in self))   #➏ 
    def __bool__(self):
        return bool(abs(self))
    @classmethod
    def frombytes(cls, octets):
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(memv)   #➐

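➊ self._components is a "protected" instance attribute that holds the Vector components in an array.
➋ To allow iteration, we return an iterator built from self._components.
➌ Use reprlib.repr() to get a limited-length representation of self._components (e.g., array('d', [0.0, 1.0, 2.0, 3.0, 4.0, ...])).
➍ Remove the leading array('d', and the trailing ) before plugging the string into a Vector constructor call.
➎ Build a bytes object directly from self._components.
➏ The two-argument hypot used by Vector2d no longer fits, so we sum the squares of the components and then take the square root with sqrt. (Since Python 3.8, math.hypot accepts any number of coordinates and could be used instead.)
➐ The only change needed from Vector2d.frombytes is in the last line: we pass the memoryview directly to the constructor, without unpacking it with * as we did before.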
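
The byte-level round trip described in ➎ and ➐ can be tried without the class. The sketch below is only an illustration: it repeats the same steps on a bare array, and the variable names (octets, restored and so on) are chosen here for the example.

```python
from array import array

typecode = 'd'  # double-precision float, the same typecode Vector uses
components = array(typecode, [3.0, 4.0, 5.0])

# Serialize: one byte for the typecode, followed by the raw array bytes.
octets = bytes([ord(typecode)]) + bytes(components)

# Deserialize: read the typecode back, then view the remaining bytes as doubles.
restored_typecode = chr(octets[0])
memv = memoryview(octets[1:]).cast(restored_typecode)
restored = array(restored_typecode, memv)  # a memoryview is iterable, so no * unpacking

print(len(octets) == 1 + len(components) * components.itemsize)  # True
print(restored == components)                                     # True
```

Because the array constructor accepts any iterable of numbers, the memoryview can be passed as is, which is why Vector.frombytes ends with cls(memv).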

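The use of reprlib.repr deserves some explanation. This function produces safe representations of large or recursive structures by limiting the length of the output string and marking the cut with '...'.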
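
As a small illustration of what __repr__ does with that string, the following sketch compares repr with reprlib.repr on a ten-element array and then applies the same trimming step used in Example 1. The names full, short and trimmed are introduced here only for the demonstration.

```python
import reprlib
from array import array

data = array('d', range(10))

full = repr(data)            # lists all ten items
short = reprlib.repr(data)   # abbreviated after the first few items

# Same trimming as Vector.__repr__: keep the text from '[' up to, but not
# including, the final ')'.
trimmed = short[short.find('['):-1]

print(short)                         # array('d', [0.0, 1.0, 2.0, 3.0, 4.0, ...])
print(trimmed)                       # [0.0, 1.0, 2.0, 3.0, 4.0, ...]
print('Vector({})'.format(trimmed))  # Vector([0.0, 1.0, 2.0, 3.0, 4.0, ...])
```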

# Example 2: tests of Vector.__init__ and Vector.__repr__
>>> Vector([3.1, 4.2]) 
Vector([3.1, 4.2]) 
>>> Vector((3, 4, 5)) 
Vector([3.0, 4.0, 5.0]) 
>>> Vector(range(10)) 
Vector([0.0, 1.0, 2.0, 3.0, 4.0, ...])

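Apart from the new constructor signature, every test written for Vector2d (e.g., Vector2d(3, 4)) passes and produces the same result when run against a two-component Vector (e.g., Vector([3, 4])).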
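
A few of those compatibility checks might look like the sketch below. To keep the snippet runnable on its own, it uses a condensed copy of the Example 1 class with only the methods being exercised; the specific checks are chosen here for illustration and are not the original test suite.

```python
from array import array
import math


class Vector:
    """Condensed copy of Example 1: only the methods exercised below."""
    typecode = 'd'

    def __init__(self, components):
        self._components = array(self.typecode, components)

    def __iter__(self):
        return iter(self._components)

    def __eq__(self, other):
        return tuple(self) == tuple(other)

    def __abs__(self):
        return math.sqrt(sum(x * x for x in self))

    def __bool__(self):
        return bool(abs(self))


v = Vector([3, 4])
x, y = v                              # unpacking works because of __iter__
print(x, y)                           # 3.0 4.0
print(abs(v))                         # 5.0
print(bool(v), bool(Vector([0, 0])))  # True False
print(v == Vector([3.0, 4.0]))        # True
```

Note that __eq__ compares the operands as tuples, so v == (3, 4) is also True with this implementation.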

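Protocols and duck typing

In object-oriented programming, a protocol is an informal interface, defined only in documentation and not in code. For example, the sequence protocol in Python requires just the __len__ and __getitem__ methods. Any class (say, Spam) that implements those two methods with the standard signature and semantics can be used anywhere a sequence is expected.

# Example 3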
import collections 
 
Card = collections.namedtuple('Card', ['rank', 'suit']) 
 
class FrenchDeck: 
    ranks = [str(n) for n in range(2, 11)] + list('JQKA') 
    suits = 'spades diamonds clubs hearts'.split() 
 
    def __init__(self): 
        self._cards = [Card(rank, suit) for suit in self.suits 
                                        for rank in self.ranks] 
 
    def __len__(self): 
        return len(self._cards) 
 
    def __getitem__(self, position): 
        return self._cards[position]

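The FrenchDeck class in Example 3 takes advantage of many Python facilities because it implements the sequence protocol, even though nothing in the code declares that.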
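
To make "many Python facilities" concrete, here is a minimal sketch with a hypothetical Weekdays class (not part of the original article) that implements only __len__ and __getitem__ and still works with indexing, slicing, the in operator, reversed and random.choice.

```python
import random


class Weekdays:
    """Hypothetical class implementing only the two sequence-protocol methods."""
    _names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']

    def __len__(self):
        return len(self._names)

    def __getitem__(self, position):
        return self._names[position]


days = Weekdays()
print(len(days))                    # 5
print(days[0], days[-1])            # Mon Fri
print(days[1:3])                    # ['Tue', 'Wed']
print('Wed' in days)                # True: falls back to iterating via __getitem__
print(list(reversed(days)))         # ['Fri', 'Thu', 'Wed', 'Tue', 'Mon']
print(random.choice(days) in days)  # True
```

The class never inherits from or registers with any sequence base class; behaving like a sequence is enough, which is the point of duck typing.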

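Vector take #2: a sliceable sequence

As the FrenchDeck class shows, supporting the sequence protocol is very easy when you can delegate to a sequence attribute in the object, such as the self._components array. These one-line __len__ and __getitem__ methods are a good start: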

class Vector: 
    # many lines omitted
    # ... 
 
    def __len__(self): 
        return len(self._components) 
 
    def __getitem__(self, index): 
        return self._components[index]

Add these two methods to the Vector class:

# Example 4: the Vector class with basic slicing support
from array import array
import reprlib
import math
class Vector:
    typecode = 'd'
    def __init__(self, components):
        self._components = array(self.typecode, components)  
    def __iter__(self):
        return iter(self._components)  
    def __repr__(self):
        components = reprlib.repr(self._components)  
        components = components[components.find('['):-1]  
        return 'Vector({})'.format(components)
    def __str__(self):
        return str(tuple(self))
    def __bytes__(self):
        return (bytes([ord(self.typecode)]) +
                bytes(self._components))  
    def __eq__(self, other):
        return tuple(self) == tuple(other)
    def __abs__(self):
        return math.sqrt(sum(x * x for x in self))  
    def __bool__(self):
        return bool(abs(self))
    @classmethod
    def frombytes(cls, octets):
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(memv)
    def __len__(self):
        return len(self._components)
    def __getitem__(self, index):
        return self._components[index]

>>> v1 = Vector([3, 4, 5]) 
>>> len(v1) 
3 
>>> v1[0], v1[-1] 
(3.0, 5.0) 
>>> v7 = Vector(range(7)) 
>>> v7[1:4] 
array('d', [1.0, 2.0, 3.0])

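As you can see, even slicing is supported now, but not very well: the slice of a Vector comes back as a plain array, not as a Vector, so it loses all of Vector's behavior.

How slicing works

Let's see how Python turns the syntax my_seq[1:3] into arguments for my_seq.__getitem__(...).

# Example 5: checking out the behavior of __getitem__ and slices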
>>> class MySeq: 
...     def __getitem__(self, index): 
...         return index  # ➊ 
... 
>>> s = MySeq() 
>>> s[1]  # ➋ 
1 
>>> s[1:4]  # ➌ 
slice(1, 4, None) 
>>> s[1:4:2]  # ➍ 
slice(1, 4, 2) 
>>> s[1:4:2, 9]  # ➎ 
(slice(1, 4, 2), 9) 
>>> s[1:4:2, 7:9]  # ➏ 
(slice(1, 4, 2), slice(7, 9, None))

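➊ In this example, __getitem__ simply returns whatever is passed to it.
➋ A single index, nothing new.
➌ The notation 1:4 becomes slice(1, 4, None).
➍ slice(1, 4, 2) means start at 1, stop at 4, step by 2.
➎ Surprise: if there are commas inside the [], __getitem__ receives a tuple.
➏ The tuple may even hold several slice objects.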
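
Since the slice notation is only syntax for building a slice object, the same object can be created by hand and passed to any sequence. A short sketch using a built-in str (the variable names are chosen for this example):

```python
letters = 'ABCDEFG'

middle = slice(1, 4)                   # what the notation 1:4 produces
print(letters[1:4] == letters[middle]) # True
print(letters.__getitem__(middle))     # BCD

every_other = slice(None, None, 2)     # what the notation ::2 produces
print(letters[every_other])            # ACEG

# Omitted parts of the notation show up as None attributes.
print(middle.start, middle.stop, middle.step)  # 1 4 None
```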

# Example 6: inspecting the attributes of the slice class
>>> slice  # ➊ 
<class 'slice'> 
>>> dir(slice)  # ➋ 
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', 
 '__format__', '__ge__', '__getattribute__', '__gt__', 
 '__hash__', '__init__', '__le__', '__lt__', '__ne__', 
 '__new__', '__reduce__', '__reduce_ex__', '__repr__', 
 '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 
 'indices', 'start', 'step', 'stop']

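➊ slice is a built-in type.
➋ Inspecting slice, we find the data attributes start, stop and step, and an indices method.

According to help(slice.indices), S.indices(len) returns (start, stop, stride): assuming a sequence of length len, it calculates the start and stop indices and the stride length of the extended slice described by S. Out-of-bounds indices are clipped in a manner consistent with the handling of normal slices.

The indices method exposes the tricky logic that the built-in sequences implement to gracefully handle missing or negative indices and slices that are longer than the target sequence. It produces "normalized" tuples of start, stop and stride integers adjusted to fit within the bounds of a sequence of the given length. For slices with a positive stride all three values are nonnegative; with a negative stride the stride stays negative and stop can be -1.

# Example 7: slice.indices applied to a sequence of length 5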
>>> slice(None, 10, 2).indices(5)  # ➊ 
(0, 5, 2) 
>>> slice(-3, None, None).indices(5)  # ➋ 
(2, 5, 1)

➊ 'ABCDE'[:10:2] is the same as 'ABCDE'[0:5:2].
➋ 'ABCDE'[-3:] is the same as 'ABCDE'[2:5:1].
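
One way to see what the normalized tuple means is to feed it to range, which yields the positions the slice would select. The helper name positions below is made up for this sketch.

```python
def positions(s, length):
    """Return the indexes that slice s selects in a sequence of the given length."""
    return list(range(*s.indices(length)))


print(positions(slice(None, 10, 2), 5))     # [0, 2, 4]
print(positions(slice(-3, None), 5))        # [2, 3, 4]

# With a negative stride, stop is normalized to -1 so that index 0 is included.
print(slice(None, None, -1).indices(5))     # (4, -1, -1)
print(positions(slice(None, None, -1), 5))  # [4, 3, 2, 1, 0]
```

Vector does not need to call indices itself, because it delegates slicing to the underlying array; the method matters when a sequence type cannot rely on such a delegate.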

A slice-aware __getitem__

# Example 8: the __len__ and __getitem__ methods added to the Vector class
    def __len__(self): 
        return len(self._components) 
 
    def __getitem__(self, index): 
        cls = type(self)  # ➊
        if isinstance(index, slice):  # ➋
            return cls(self._components[index])  # ➌
        elif isinstance(index, numbers.Integral):  # ➍ needs `import numbers` at module level
            return self._components[index]  # ➎
        else:
            msg = '{cls.__name__} indices must be integers'
            raise TypeError(msg.format(cls=cls))  # ➏

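➊ Get the class of the instance (i.e., Vector) for later use.
➋ If the index argument is a slice...
➌ ...invoke the class to build another Vector instance from a slice of the _components array.
➍ If the index is an int or some other kind of integer...
➎ ...just return the corresponding item from _components.
➏ Otherwise, raise an exception.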

Excessive use of isinstance may be a sign of bad object-oriented design, but handling slices in __getitem__ is a justified use case.
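
The same pattern can be tried on a small stand-in class. In the sketch below, Playlist and Favorites are hypothetical names invented for the illustration, and a list takes the place of the array. It shows two consequences of the design: using type(self) means a slice of a subclass instance is an instance of that subclass, and checking against numbers.Integral accepts any integer type (including bool) while rejecting floats.

```python
import numbers


class Playlist:
    """Hypothetical stand-in for Vector that wraps a list instead of an array."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        return len(self._songs)

    def __getitem__(self, index):
        cls = type(self)  # the actual class of the instance, possibly a subclass
        if isinstance(index, slice):
            return cls(self._songs[index])
        elif isinstance(index, numbers.Integral):
            return self._songs[index]
        else:
            msg = '{cls.__name__} indices must be integers'
            raise TypeError(msg.format(cls=cls))


class Favorites(Playlist):
    pass


favs = Favorites(['a', 'b', 'c', 'd'])
print(type(favs[1:3]).__name__)  # Favorites, not Playlist
print(favs[True])                # b, because bool is an Integral type
try:
    favs[1.0]
except TypeError as exc:
    print(exc)                   # Favorites indices must be integers
```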

# Example 9: the Vector class after adding Example 8
from array import array
import reprlib
import math
import numbers
class Vector:
    typecode = 'd'
    def __init__(self, components):
        self._components = array(self.typecode, components)  
    def __iter__(self):
        return iter(self._components)  
    def __repr__(self):
        components = reprlib.repr(self._components)  
        components = components[components.find('['):-1]  
        return 'Vector({})'.format(components)
    def __str__(self):
        return str(tuple(self))
    def __bytes__(self):
        return (bytes([ord(self.typecode)]) +
                bytes(self._components))  
    def __eq__(self, other):
        return tuple(self) == tuple(other)
    def __abs__(self):
        return math.sqrt(sum(x * x for x in self))  
    def __bool__(self):
        return bool(abs(self))
    @classmethod
    def frombytes(cls, octets):
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(memv)
    def __len__(self):
        return len(self._components)
    def __getitem__(self, index):
        cls = type(self)  
        if isinstance(index, slice):  
            return cls(self._components[index])  
        elif isinstance(index, numbers.Integral):  
            return self._components[index]  
        else:
            msg = '{cls.__name__} indices must be integers'
            raise TypeError(msg.format(cls=cls))

Testing the improved Vector.__getitem__ method from Example 9:
    >>> v7 = Vector(range(7)) 
    >>> v7[-1]  #➊ 
    6.0 
    >>> v7[1:4]  #➋ 
    Vector([1.0, 2.0, 3.0]) 
    >>> v7[-1:]  #➌ 
    Vector([6.0]) 
    >>> v7[1,2]  #➍ 
    Traceback (most recent call last): 
      ... 
    TypeError: Vector indices must be integers