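Preface
Using the Array data structure as an example, this article shows how to implement a bag interface and develop abstract collections in Python, with detailed coverage of how to write the documentation and of the design thinking behind a real development workflow. It is aimed at readers who already have some grounding in OOP; we recommend reading the article <Object-Oriented Programming> first. If concepts such as interface functions, modules, packages, and libraries are still hazy, the article Python Data Structures: Basic Concepts Explained (Part 1) gives a detailed, concrete explanation.
Note: this series draws on the book Fundamentals of Python Data Structure. If you find any errors in the post, please point them out 🙏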
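Introduction
In OOP we met an important concept, abstraction, and by writing code that instantiates classes we saw how much inheritance and polymorphism help with reusing interface functions. Building on that experience of designing interface functions as class methods, in this series you will work with the author, using a top-down design approach, through the whole process from developing an array-based bag interface to designing a set of abstract classes that applies to every data structure object.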
👉 An overview of the top-down design process:
👉 The directory layout is as follows:
DATA_STRUCTURES
    arrays.py
    arrays_test.py
    arraybag.py
    arraybag_test.py
    baginterface.py
    arraysortedbag.py
    arrayset.py
    testset.py
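With the design framework in place, and before writing the interface functions themselves, we need to be clear about what each function's documentation should contain:
- Before creating the interface, list every method header in the documentation and end each method with a single pass or return statement. (Note: pass is used for mutator methods, which return no value, whereas each accessor method returns a simple default value such as False, 0, or None.)
- A brief docstring should include, but is not limited to:
  - what output is expected when the method is called under normal conditions
  - what is output in exceptional cases, together with the error message shown (exception handling)
- Detailed documentation should include, but is not limited to:
  - the method's parameters
  - the precondition: a statement that must be true for the method to perform its operation correctly
  - the postcondition: what is true once the method has finished executing (assuming the precondition was true)
  - the consequences of an unmet precondition (try…except…)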
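The checklist above can be sketched as a small stub. This is only an illustration: the class name BagStub is hypothetical and is not one of the files in this series. The accessor returns a simple default value, the mutator ends with pass, and the docstring lists the parameter, precondition, postcondition, and the exception raised when the precondition is not met.

```python
class BagStub(object):
    """Hypothetical interface stub, used only to illustrate the checklist."""

    def __len__(self):
        """Returns the number of items in self."""
        return 0  # accessor: returns a simple default value for now

    def remove(self, item):
        """Removes one occurrence of item from self.

        Parameters: item, the object to remove.
        Precondition: item is in self.
        Postcondition: item is removed from self.
        Raises: KeyError if item is not in self.
        """
        pass  # mutator: no return value, body still to be written


bag = BagStub()
print(len(bag))        # the accessor's default value
print(bag.remove(42))  # a method that ends with pass returns None
```

Because every method already has a header and a docstring, client code and tests can be written against the stub before any real implementation exists.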
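Hardware / environment: MacBookPro12,1 / VS Code
Note: for the Python exception handling used below 👉 see the article Getting Started with Python: Exception Handling, Text Progress Bar Updates, and Date-Time Operations. Readers should study the syntax of the yield keyword and the ADT operations on arrays on their own.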
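1. Implementing the Array Class
As analyzed in the earlier article, the array is the data structure that sits underneath Python's sequences. One could equally point to the built-in list type, which is itself backed by an array, but the array is the more broadly applicable notion, which is also why data science libraries such as NumPy and Pandas use arrays as their basic data structure for processing data. So which operations should an array class support? The following table is a useful reference:
User's array operation | Method in the Array class |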
---|---|
a = Array(10) | __init__(capacity, fillValue=None) |
len(a) | __len__() |
str(a) | __str__() |
for item in a | __iter__() |
a[index] | __getitem__() |
a[index] = newItem | __setitem__() |
Based on the table above, we can design the Array class shown here 👇 (saved in the file arrays.py)
"""
File: arrays.py
"""

class Array(object):
    """Represents an array."""

    def __init__(self, capacity, fillValue = None):
        """Capacity is the static size of the array.
        fillValue is placed at each position."""
        self.items = list()
        for count in range(capacity):
            self.items.append(fillValue)

    def __len__(self):
        """-> The capacity of the array."""
        return len(self.items)

    def __str__(self):
        """-> The string representation of the array."""
        return str(self.items)

    def __iter__(self):
        """Supports traversal with a for loop."""
        return iter(self.items)

    def __getitem__(self, index):
        """Subscript operator for access at index."""
        return self.items[index]

    def __setitem__(self, index, newItem):
        """Subscript operator for replacement at index."""
        self.items[index] = newItem
Now, in the file arrays_test.py, we create an instance, call the class's methods, and run the file from the terminal, as follows:
"""
File: arrays_test.py
Exercises the Array class.
"""
from arrays import Array

a = Array(5)
print("Len of a is: ", a.__len__())
print("\nBefore Initializing: ", a.__str__())
for i in range(len(a)):
    a.__setitem__(i, i)
print("\nAfter Initializing: ", a.__str__())
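The array-handling methods encapsulated in the Array class are equivalent to Python's subscript operator [], the len function, the str function, and the for loop: Python calls the matching special method whenever one of these is applied to an Array, so a[i] does the same thing as a.__getitem__(i). Compared with operating directly on a plain Python list, this makes the code considerably more reusable. And whenever we want to use the data structure's methods, we only need to import the module that contains it with from arrays import Array, create an instance, and then call instanceName.method() to work with the data structure.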
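A short sketch of that equivalence, using the operators instead of the explicit special-method calls. The Array class is repeated here in compact form only so that the example runs on its own; in the project you would write from arrays import Array instead. The squares stored in the array are arbitrary sample data.

```python
class Array(object):
    """Compact copy of the Array class above, repeated so this runs alone."""

    def __init__(self, capacity, fillValue=None):
        self.items = [fillValue] * capacity

    def __len__(self):
        return len(self.items)

    def __str__(self):
        return str(self.items)

    def __iter__(self):
        return iter(self.items)

    def __getitem__(self, index):
        return self.items[index]

    def __setitem__(self, index, newItem):
        self.items[index] = newItem


a = Array(5)
for i in range(len(a)):   # len(a) calls a.__len__()
    a[i] = i * i          # a[i] = x calls a.__setitem__(i, x)

print(str(a))             # str(a) calls a.__str__()
print(a[2])               # a[2] calls a.__getitem__(2)

total = 0
for item in a:            # the for loop calls a.__iter__()
    total += item
print(total)
```

Client code normally uses the operator forms; the explicit a.__len__() style in the test file is useful mainly for seeing which method is responsible for which operation.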
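2. Implementing the ArrayBag Class
With the Array data structure designed, we now build an array-based bag that implements the basic ADT operations. An array is stored contiguously in memory. The source file arrays.py already covers initializing the array, taking its length, producing its string representation, iterating over it, and getting and setting items, all by delegating to a list. On top of that we still need to add items, remove items (raising an exception if the item is absent), test whether the bag is empty, clear it, and test whether two bags are equal.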
The code in the source file arraybag.py is as follows:
"""
File: arraybag.py
"""
from arrays import Array

class ArrayBag(object):
    """An array-based bag implementation."""

    # Class variable: the default array capacity
    DEFAULT_CAPACITY = 10

    # Constructor: creates the array and adds the items from sourceCollection
    def __init__(self, sourceCollection = None):
        """Sets the initial state of self, which includes the
        contents of sourceCollection, if it's present."""
        self.items = Array(ArrayBag.DEFAULT_CAPACITY)
        self.size = 0
        self.modCount = 0
        if sourceCollection:
            for item in sourceCollection:
                self.add(item)

    # Accessor methods of the bag interface
    def isEmpty(self):
        """Returns True if len(self) == 0, or False otherwise."""
        return len(self) == 0

    def __len__(self):
        """Returns the number of items in self."""
        return self.size

    def __str__(self):
        """Returns the string representation of self."""
        return "{" + ", ".join(map(str, self)) + "}"

    def __iter__(self):
        """Supports iteration over a view of self.
        Raises AttributeError if mutation occurs
        within the loop."""
        modCount = self.modCount
        cursor = 0
        while cursor < len(self):
            yield self.items[cursor]
            if modCount != self.modCount:
                raise AttributeError("Mutation not allowed in loop")
            cursor += 1

    def __add__(self, other):
        """Returns a new bag containing the contents
        of self and other."""
        # type(self) keeps + working for subclasses such as ArraySet
        result = type(self)(self)
        for item in other:
            result.add(item)
        return result

    def clone(self):
        """Returns a copy of self."""
        return type(self)(self)

    def __eq__(self, other):
        """Returns True if self equals other,
        or False otherwise."""
        if self is other: return True
        if type(self) != type(other) or \
           len(self) != len(other):
            return False
        for item in self:
            if self.count(item) != other.count(item):
                return False
        return True

    def count(self, item):
        """Returns the number of instances of item in self."""
        total = 0
        for nextItem in self:
            if nextItem == item:
                total += 1
        return total

    # Mutator methods: these extend the basic bag interface and manage the array's memory
    def clear(self):
        """Makes self become empty."""
        self.size = 0
        self.modCount += 1
        self.items = Array(ArrayBag.DEFAULT_CAPACITY)

    def add(self, item):
        """Adds item to self."""
        # Check array memory here and increase it if necessary
        if len(self) == len(self.items):
            temp = Array(2 * len(self))
            for i in range(len(self)):
                temp[i] = self.items[i]
            self.items = temp
        self.items[len(self)] = item
        self.size += 1
        # Update the attribute (not a local copy) so that __iter__ can detect the change
        self.modCount += 1

    def remove(self, item):
        """Precondition: item is in self.
        Raises: KeyError if item is not in self.
        Postcondition: item is removed from self."""
        # Check precondition and raise if necessary
        if item not in self:
            raise KeyError(str(item) + " not in bag")
        # Search for the index of the target item
        targetIndex = 0
        for targetItem in self:
            if targetItem == item:
                break
            targetIndex += 1
        # Shift the items to the right of the target left by one position
        for i in range(targetIndex, len(self) - 1):
            self.items[i] = self.items[i + 1]
        # Decrement logical size
        self.size -= 1
        self.modCount += 1
        # Check array memory here and decrease it if necessary
        if len(self) <= len(self.items) // 4 and \
           2 * len(self) >= ArrayBag.DEFAULT_CAPACITY:
            temp = Array(len(self.items) // 2)
            for i in range(len(self)):
                temp[i] = self.items[i]
            self.items = temp
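The modCount check in __iter__ is easy to overlook, so here is a minimal, self-contained sketch of the idea. TinyBag is a hypothetical, list-backed stand-in for ArrayBag and is not part of the project files. The iterator takes a snapshot of modCount when iteration starts and raises as soon as the bag has been mutated inside the loop. This only works if the mutators update the instance attribute self.modCount rather than a local variable.

```python
class TinyBag(object):
    """Hypothetical list-backed stand-in for ArrayBag (illustration only)."""

    def __init__(self):
        self.items = []
        self.modCount = 0          # counts structural changes to the bag

    def add(self, item):
        self.items.append(item)
        self.modCount += 1         # update the attribute, not a local copy

    def __iter__(self):
        expected = self.modCount   # snapshot taken when iteration starts
        cursor = 0
        while cursor < len(self.items):
            yield self.items[cursor]
            if expected != self.modCount:
                raise AttributeError("Mutation not allowed in loop")
            cursor += 1


bag = TinyBag()
for value in (1, 2, 3):
    bag.add(value)

seen = []
try:
    for item in bag:
        seen.append(item)
        bag.add(item * 10)         # mutating the bag while iterating over it
except AttributeError as error:
    message = str(error)
else:
    message = "no error"

print(seen, message)
```

The loop body runs once, and the error is raised when the generator resumes after the first yield, so only the first item is visited.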
The test code in the source file arraybag_test.py is as follows:
"""
File: arraybag_test.py
"""
from arraybag import ArrayBag

a = ArrayBag([1, 2, 3, 4])
b = a  # Not a copy: b is a second name for the same object, so every change to a is visible through b
c = ArrayBag([1, 2, 3, 4])  # A separate object: Python allocates new space on the heap for c
print("Setting items when class being created: ", a)
a.__init__([22, 33, 44])  # Re-running __init__ resets the bag and discards the list passed at creation
print("\nAfter Initializing: ", a)
a.__add__(b)  # Returns a new bag and leaves a unchanged; the result is discarded here
print("\n\'b __Add__ a?\' Now a: ", a)
print("\nCan b Add into a? Now a: ", a.__add__(b))
print("\na Equals To b: ", a.__eq__(b))
print("\na Equals To c: ", a.__eq__(c))
a.add(4)
print("\nAfter adding【4】to a: ", a)
# remove raises KeyError when the item is absent, so catch it and show the message
try:
    c.remove(5)
except KeyError as error:
    print("\nRemove an item not existing: \n", error)
The output is as follows:
3. Implementing the BagInterface Class
OK, let's pause the array-based bag development here. From the process so far we can pick out a number of operations that apply to every underlying data structure: initialization, testing whether the bag is empty, taking its length, producing its string representation, iteration, combining two bags, testing two bags for equality, and clearing, adding, and removing items. We can gather these into a BagInterface class, as shown here:
"""
File: baginterface.py
"""

class BagInterface(object):
    """Interface for all bag types."""

    # Constructor
    def __init__(self, sourceCollection = None):
        """Sets the initial state of self, which includes the
        contents of sourceCollection, if it's present."""
        pass

    # Accessor methods
    def isEmpty(self):
        """Returns True if len(self) == 0, or False otherwise."""
        return True

    def __len__(self):
        """Returns the number of items in self."""
        return 0

    def __str__(self):
        """Returns the string representation of self."""
        return ""

    def __iter__(self):
        """Supports iteration over a view of self."""
        return None

    def __add__(self, other):
        """Returns a new bag containing the contents
        of self and other."""
        return None

    def __eq__(self, other):
        """Returns True if self equals other,
        or False otherwise."""
        return False

    def count(self, item):
        """Returns the number of instances of item in self."""
        return 0

    # Mutator methods
    def clear(self):
        """Makes self become empty."""
        pass

    def add(self, item):
        """Adds item to self."""
        pass

    def remove(self, item):
        """Precondition: item is in self.
        Raises: KeyError if item is not in self.
        Postcondition: item is removed from self."""
        pass
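BagInterface documents the methods but does not force an implementation to provide them. As a possible alternative, not used in this series, Python's standard abc module can enforce that. The sketch below is an assumption-laden illustration: the names AbstractBagInterface and ListBag are hypothetical, and only two of the interface's methods are shown to keep it short.

```python
from abc import ABC, abstractmethod


class AbstractBagInterface(ABC):
    """Hypothetical abc-based variant of BagInterface (illustration only)."""

    @abstractmethod
    def add(self, item):
        """Adds item to self."""

    @abstractmethod
    def __len__(self):
        """Returns the number of items in self."""

    def isEmpty(self):
        """Returns True if len(self) == 0, or False otherwise."""
        return len(self) == 0


class ListBag(AbstractBagInterface):
    """Hypothetical list-backed bag that fulfils the interface."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def __len__(self):
        return len(self.items)


try:
    AbstractBagInterface()     # fails: the abstract methods are not implemented
except TypeError:
    can_instantiate = False
else:
    can_instantiate = True

bag = ListBag()
bag.add("x")
print(can_instantiate, bag.isEmpty(), len(bag))
```

The trade-off is that the plain stub class above can be instantiated and exercised immediately, while the abc variant refuses to be instantiated until a subclass supplies every abstract method.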
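Having designed an ADT bag interface that covers every data structure, a natural next question is whether there is an even more basic set of operations that applies to all collections, and whether it too can be abstracted out and packaged the same way. The answer is yes. However, given the length of this post, and because so far we have only finished the array-based bag and the foundations still need firming up, developing the ADT operations for all collections is left until after linked lists.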
4. Implementing the ArraySortedBag Class
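The array-based bag is now essentially complete. As a small extension, for a sorted array we can use binary search to speed up searching for items and adding elements. The listing from arraysortedbag.py below covers the search part (__contains__):
"""
File: arraysortedbag.py
Adds a binary-search __contains__ to ArrayBag, with O(log n) running time.
"""
from arrays import Array
from arraybag import ArrayBag

class ArraySortedBag(ArrayBag):
    """An array-based sorted bag implementation."""

    # Constructor
    def __init__(self, sourceCollection = None):
        """Sets the initial state of self, which includes the
        contents of sourceCollection, if it's present."""
        ArrayBag.__init__(self, sourceCollection)

    # Accessor methods
    def __contains__(self, item):
        """Returns True if item is in self, or False otherwise.
        Binary search: assumes the items are kept in ascending order.
        The add inherited from ArrayBag appends at the end, so it must
        be overridden to insert in order for this search to be valid."""
        left = 0
        right = len(self) - 1
        while left <= right:
            midPoint = (left + right) // 2
            if self.items[midPoint] == item:
                return True
            elif self.items[midPoint] > item:
                right = midPoint - 1
            else:
                left = midPoint + 1
        return False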
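Binary search is only correct if the items stay in ascending order, which means add has to insert each new item at its sorted position. The listing above does not show that method, so here is a minimal sketch of the idea under stated assumptions: SortedListBag is a hypothetical, list-backed class rather than the article's array-based one, and it uses the standard bisect module in place of a hand-written search.

```python
import bisect


class SortedListBag(object):
    """Hypothetical list-backed sorted bag (illustration only)."""

    def __init__(self, sourceCollection=None):
        self.items = []
        if sourceCollection:
            for item in sourceCollection:
                self.add(item)

    def add(self, item):
        """Inserts item at the position that keeps self.items sorted."""
        bisect.insort(self.items, item)

    def __contains__(self, item):
        """Binary search; valid only because add keeps the items sorted."""
        index = bisect.bisect_left(self.items, item)
        return index < len(self.items) and self.items[index] == item

    def __len__(self):
        return len(self.items)


bag = SortedListBag([2013, 61, 1973])
print(bag.items)                    # items are stored in ascending order
print(1973 in bag, 2012 in bag)     # membership tests use the binary search
```

In the array-based version the same effect would come from locating the insertion index, shifting the later items one position to the right, and then storing the new item, resizing the array first if it is full.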
5. Building and Testing a Set
Implement the array-based set, ArraySet, in the file arrayset.py:
"""
File: arrayset.py
"""
from arraybag import ArrayBag

class ArraySet(ArrayBag):
    """An array-based set implementation."""

    # Constructor
    def __init__(self, sourceCollection = None):
        """Sets the initial state of self, which includes the
        contents of sourceCollection, if it's present."""
        ArrayBag.__init__(self, sourceCollection)

    # Mutator methods
    def add(self, item):
        """Adds item to self, unless it is already present."""
        if item not in self:
            ArrayBag.add(self, item)
Implement the tests for ArraySet in the file testset.py:
"""
File: testset.py
A tester program for set implementations.
"""
from arrayset import ArraySet

def test(setType):
    """Expects a set type as an argument and runs some tests
    on objects of that type."""
    lyst = [2013, 61, 1973]
    print("The list of items added is:", lyst)
    s1 = setType(lyst)
    print("Expect 3:", len(s1))
    print("Expect the set's string:", s1)
    print("Expect True:", 2013 in s1)
    print("Expect False:", 2012 in s1)
    print("Expect the items on separate lines:")
    for item in s1:
        print(item)
    s1.clear()
    print("Expect {}:", s1)
    s1.add(25)
    s1.remove(25)
    print("Expect {}:", s1)
    s1 = setType(lyst)
    s2 = setType(s1)
    print("Expect True:", s1 == s2)
    print("Expect False:", s1 is s2)
    print("Expect one of each item:", s1 + s2)
    for item in lyst:
        s1.remove(item)
    print("Expect {}:", s1)

test(ArraySet)
The output is as follows: