Python Cookbook: Chapter 4 (Personal Notes)

Latest recommended article published on 2024-04-22 12:54:46

Gozen Sanji

Article link: https://blog.csdn.net/jaychang9/article/details/108535483

Chap 4 Iterators and Generators

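4.1 Manually Consuming an Iterator: next(), iter(), StopIteration

You want to process every item in an iterable, but without using a for loop:

# To consume an iterable manually, call next() and catch the StopIteration exception.
# If you would rather not handle the exception, pass a default (such as None) as the second positional argument to next().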

def manual_iter():
    """
    Read every line of a file by iterating manually (two implementations).
    """
    with open('./h.txt', encoding='utf-8') as f:
        try:
            while True:
                # next() returns the iterator's next item and raises StopIteration when none is left
                line = next(f)
                print(line, end='')
        # StopIteration signals the end of the iteration
        except StopIteration:
            pass

    print(format('Another way to do the same thing', '*^60'))

    with open('./h.txt', encoding='utf-8') as f:
        while True:
            # The second argument to next() is the default returned when no item is left
            line = next(f, None)
            if line is None:
                break
            print(line, end='')


manual_iter()

The following shows what happens during iteration:

items = [1, 2, 3]

# 1. Get the iterator
it = iter(items)    # calls items.__iter__() to obtain the iterator

# 2. Run the iterator
print(next(it))    # calls it.__next__() to return the iterator's next item
print(next(it))
print(next(it))
print(next(it))    # raises StopIteration here because no items are left
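
The steps above are roughly what a for loop does on your behalf. The sketch below spells that out with a hypothetical helper, manual_for() (my own name, not something from the book), that collects items using only iter(), next() and StopIteration:

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next() explicitly."""
    result = []
    it = iter(iterable)          # first step of a for loop: iterable.__iter__()
    while True:
        try:
            item = next(it)      # each pass of the loop: it.__next__()
        except StopIteration:    # the iterator has no more items
            break
        result.append(item)      # stands in for the loop body
    return result


print(manual_for([1, 2, 3]))     # [1, 2, 3]
```

The real for loop is implemented by the interpreter, so this only approximates it, but the order of calls is the same.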

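4.2 Delegating Iteration: __iter__()

You have built a custom container object that holds a list, tuple, or other iterable internally, and you want to iterate over the new container directly. To do that, define an __iter__() method that delegates iteration to the object held inside the container: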

class Node:
    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return f'Node({self._value})'

    def add_child(self, node):
        self._children.append(node)

    def __iter__(self):
        # __iter__() forwards the iteration request to the internal _children attribute
        return iter(self._children)


# Example
if __name__ == '__main__':
    root = Node(0)
    child1 = Node(1)
    child2 = Node(2)

    # Add the child1 and child2 instances to root's _children attribute (a list)
    root.add_child(child1)
    root.add_child(child2)

    # root can be iterated because __iter__() above forwards the iteration request to its _children list
    for ch in root:
        print(ch)

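4.3 Creating New Iteration Patterns with Generators: create a generator through a yield statement

You want to implement a custom iteration pattern that differs from built-ins such as range() and reversed():

# 1. To implement a new iteration pattern, define it with a generator function: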

def frange(start, stop, increment):
    x = start
    while x < stop:
        yield x    # the yield statement makes this function a generator
        x += increment

# 2. To use the function, iterate over it with a for loop, or pass it to any function that consumes an iterable (such as sum() or list()):
for n in frange(0, 4, 0.5):
    print(format(n, '->10'))

print(list(frange(0, 1, 0.125)))
print(sum(frange(0, 1, 0.125)))

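A single yield statement in a function body is enough to turn it into a generator function. Unlike a normal function, a generator only runs in response to iteration.

# The following shows how a generator works under the hood: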

def countdown(n):
    print('Starting to count from', n)
    while n > 0:
        yield n
        n -= 1
    print('Done!')

# 1. Create the generator, notice no output appears:
c = countdown(3)
print(c)

# 2. Run to first yield and emit a value
print(next(c))

# 3. Run to the next yield
print(next(c))

# 4. Run to next yield
print(next(c))

# Run to next yield (iteration stops)
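print(next(c))    # prints 'Done!' and then raises StopIteration

The key feature of a generator is that it only runs in response to the next() operations carried out during iteration. Once the generator function returns, StopIteration is raised and the iteration ends. The for loop normally used for iteration handles these details automatically, so you rarely need to worry about them.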
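
One consequence worth seeing in code: a generator object can only be run through once. The sketch below uses quiet_countdown(), a hypothetical variant of the countdown example without the print() calls, to show that a second pass over the same object yields nothing:

```python
def quiet_countdown(n):
    """Same logic as the countdown example, without the print() calls."""
    while n > 0:
        yield n
        n -= 1


c = quiet_countdown(3)
first_pass = list(c)        # drives the generator until it returns
second_pass = list(c)       # the same generator object is now exhausted
fallback = next(c, 'done')  # a default avoids StopIteration on an exhausted generator

print(first_pass)   # [3, 2, 1]
print(second_pass)  # []
print(fallback)     # done
```

To iterate again, call the generator function again to get a fresh generator object.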

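4.4 Implementing the Iterator Protocol: __iter__(), __next__(), yield from

I did not understand the example of a generator that traverses tree nodes depth-first, nor its use of yield from.

Python's iterator protocol requires __iter__() to return a special iterator object; that object implements __next__() and signals the end of iteration by raising StopIteration. I could not follow this more tedious way of implementing it at all 0.0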
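
To make the two approaches concrete, here is a minimal sketch of both, reusing the Node class from 4.2. The depth_first() generator follows the idea of the book's recipe as I understand it; DepthFirstIterator is my own simplified, stack-based class (not the book's version) and exists only to show what implementing __iter__() and __next__() by hand looks like:

```python
class Node:
    """Same idea as the Node class in 4.2, plus a depth_first() generator."""

    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return f'Node({self._value})'

    def add_child(self, node):
        self._children.append(node)

    def __iter__(self):
        return iter(self._children)

    def depth_first(self):
        yield self                          # visit this node first
        for child in self:
            yield from child.depth_first()  # then everything below each child


class DepthFirstIterator:
    """Hand-written iterator object (hypothetical, stack-based) doing the same walk."""

    def __init__(self, start_node):
        self._stack = [start_node]

    def __iter__(self):
        return self                         # an iterator returns itself

    def __next__(self):
        if not self._stack:
            raise StopIteration             # nothing left: end of iteration
        node = self._stack.pop()
        # Push children in reverse so the first child is popped (visited) first
        self._stack.extend(reversed(list(node)))
        return node


root = Node(0)
child1, child2 = Node(1), Node(2)
root.add_child(child1)
root.add_child(child2)
child1.add_child(Node(3))
child1.add_child(Node(4))
child2.add_child(Node(5))

print(list(root.depth_first()))
print(list(DepthFirstIterator(root)))
```

Both lines print the nodes in the order 0, 1, 3, 4, 2, 5. The generator version keeps its position in the tree implicitly, while the class has to store that state (here a stack) itself, which is why the book calls the manual approach tedious.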

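4.5 Iterating in Reverse: reversed() & __reversed__() in class definition

Sometimes you want to iterate over an object in reverse order:

# 1. To iterate over a sequence in reverse, use the built-in reversed() function: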
a = [x for x in range(1, 6)]

for x in reversed(a):    # reversed() iterates in reverse order
    print(x)

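reversed() only works if the object has a size that can be determined in advance (it supports len() and integer indexing) or implements the __reversed__() method. If neither condition holds, you must convert the object to a list first:

# 2. Print a file backwards
with open('./h.txt', encoding='utf-8') as f:
    # Without converting f to a list this fails: TypeError: '_io.TextIOWrapper' object is not reversible
    for line in reversed(list(f)):
        print(line, end='')

Remark: if the iterable has many items, converting it to a list first consumes a lot of memory!

In fact, you can support reverse iteration on a custom class by implementing the __reversed__() method: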

class Countdown:
    def __init__(self, start):
        self.start = start

    # Forward iterator
    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1

    # Reverse iterator
    def __reversed__(self):
        # For this class, reversed() accepts an instance only because this method is defined
        n = 1
        while n <= self.start:
            yield n
            n += 1

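# 3. Iterate over Countdown in reverse, i.e. count up
for c in reversed(Countdown(10)):    # the argument to reversed() is a Countdown instance
    print(c)

# 4. Iterate over Countdown forward, i.e. count down
for c in Countdown(10):
    print(c)

Defining a reverse iterator makes the code more efficient, because the data no longer has to be copied into a list before that list is iterated in reverse.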

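4.6 Defining Generator Functions with Extra State: yield in __iter__(self) in class definition

You want to define a generator function that exposes some extra state to the user:

# 1. To let a generator expose extra state to the user, implement it as a class and put the generator code in the __iter__() method: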
from collections import deque

class linehistory:
    def __init__(self, lines, histlen=3):
        self.lines = lines
        self.history = deque(maxlen=histlen)

    def __iter__(self):
        """ Put the generator code in __iter__() """
        for lineno, line in enumerate(self.lines, 1):    # the optional second argument is the starting index
            self.history.append((lineno, line))
            yield line

    def clear(self):
        self.history.clear()    # empty the deque

# An instance of this class can be used like a normal generator;
# because it is an instance, its attributes and methods (such as history and clear()) are also accessible
with open('./h.txt', encoding='utf-8') as f:
    lines = linehistory(f, 4)
    # Iterating over the linehistory instance works because __iter__() contains a yield statement
    for line in lines:
        if '鬼灭之刃' in line:    # search keyword expected to appear in h.txt
            # Iterate over the instance's history attribute (a deque)
            for lineno, hline in lines.history:
                print(f"{lineno}: {hline}", end='')


# 2. Note: if you iterate without a for loop, you must call iter() first:
with open('./h.txt', encoding='utf-8') as f:
    lines = linehistory(f)
    # Calling next() directly on the instance fails with
    # TypeError: 'linehistory' object is not an iterator
    # next(lines)

    # Call iter() to get an iterator first, then start iterating:
    it = iter(lines)
    print(next(it), end='')
    print(next(it), end='')
    print(next(it), end='')
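
Since the example above depends on a local h.txt, here is a small file-free sketch of the same idea. The sample list and the 'python' keyword are made up for illustration; the class is a trimmed copy of linehistory:

```python
from collections import deque


class linehistory:
    """Trimmed copy of the class above: iterate lines while remembering recent ones."""

    def __init__(self, lines, histlen=3):
        self.lines = lines
        self.history = deque(maxlen=histlen)

    def __iter__(self):
        for lineno, line in enumerate(self.lines, 1):
            self.history.append((lineno, line))   # recorded before the line is handed out
            yield line


# Hypothetical in-memory data standing in for the file
sample = ['alpha\n', 'beta\n', 'gamma\n', 'python rocks\n', 'delta\n']

lines = linehistory(sample, histlen=2)
context = None
for line in lines:
    if 'python' in line:
        context = list(lines.history)   # extra state read while iteration is in progress
        break

print(context)   # [(3, 'gamma\n'), (4, 'python rocks\n')]
```

The point is that lines.history is readable from outside the generator at any moment, which a plain generator function would not allow.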

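4.7 Taking a Slice of an Iterator: itertools.islice()

You want to take a slice of the data produced by an iterator, but the normal slicing operator does not work.

# itertools.islice() is designed for slicing iterators and generators: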

def count(n):
    """ A generator that counts up from n """
    while True:
        yield n
        n += 1

c = count(0)
# A generator cannot be sliced directly:
# print(c[10:20])    # TypeError: 'generator' object is not subscriptable

import itertools

# Slice the generator with itertools.islice(iterable, start, stop):
for x in itertools.islice(c, 10, 20):
    print(x)

Remark: islice() consumes data from the iterator passed to it, and an iterator cannot be rewound. If you need to go back to earlier items, put the data into a list first.
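
A short sketch of what "consumes" means in practice: after one islice() call, a second call on the same generator continues from where the first one stopped instead of starting over. The variable names are my own:

```python
from itertools import islice


def count(n):
    """A generator that counts up from n."""
    while True:
        yield n
        n += 1


c = count(0)
first = list(islice(c, 10, 15))   # skips items 0-9, then takes items 10-14
second = list(islice(c, 0, 3))    # "index 0" is now whatever c produces next

print(first)    # [10, 11, 12, 13, 14]
print(second)   # [15, 16, 17]
```

The skipped items 0 through 9 are gone for good, which is why the remark suggests saving the data in a list when it is needed again.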

4.8 Skipping the First Part of an Iterable: itertools.dropwhile() & itertools.islice()

You want to skip the first few items while iterating:

# 1. itertools.dropwhile() handles this:

# 1.1 g.txt starts with a few comment lines
with open('./g.txt', encoding='utf-8') as f:
    for line in f:
        print(line, end='')

from itertools import dropwhile

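# 1.2 Use dropwhile() to skip the leading comment lines while iterating
with open('./g.txt', encoding='utf-8') as f:
    # The lambda is applied to each item of f; items are dropped while it returns True.
    # Once it returns False for the first time, that item and all later items are yielded unfiltered.
    for line in dropwhile(lambda line: line.startswith('#'), f):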
        print(line, end='')
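
It is easy to mistake dropwhile() for a filter, so here is a small comparison on made-up in-memory lines (not the contents of g.txt). Only the leading comment lines are dropped; a comment that appears later survives:

```python
from itertools import dropwhile

# Hypothetical sample data standing in for a file
lines = ['# header 1\n', '# header 2\n', 'data 1\n', '# inline comment\n', 'data 2\n']

# dropwhile() stops testing after the first line that is not a comment
kept = list(dropwhile(lambda line: line.startswith('#'), lines))

# A comprehension removes every comment line, wherever it appears
filtered = [line for line in lines if not line.startswith('#')]

print(kept)       # ['data 1\n', '# inline comment\n', 'data 2\n']
print(filtered)   # ['data 1\n', 'data 2\n']
```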

If you know exactly how many items to skip, you can use itertools.islice() instead:

from itertools import islice

items = ['a', 'b', 'c', 1, 4, 10, 15]

# 2. Skip the first three items (the strings) while iterating:
for item in islice(items, 3, None):    # passing None as stop means "up to the last item" (like items[3:])
    print(item)

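4.9 Iterating Over All Possible Combinations or Permutations: permutations, combinations, combinations_with_replacement

You want to iterate over all possible permutations or combinations of the items in a collection: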

from itertools import permutations, combinations, combinations_with_replacement

items = ['a', 'b', 'c']

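# 1. itertools.permutations() iterates over all possible permutations of the items:
for item in permutations(items):
    print(item)

for p in permutations(items, 2):    # the optional second argument sets the length of each permutation
    print(p)

# 2. itertools.combinations() iterates over all possible combinations of the items (the length argument is required):
for c in combinations(items, 1):
    print(c)

# 3. itertools.combinations_with_replacement() iterates over all possible combinations in which the same item may be chosen more than once: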
for c in combinations_with_replacement(items, 3):
    print(c)

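4.10 Iterating Over the Index-Value Pairs of a Sequence: enumerate()

You want to iterate over a sequence while keeping track of the index of the item being processed:

# 1. The built-in enumerate() function returns each item of an iterable together with its index: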

my_list = ['a', 'b', 'c']
for idx, val in enumerate(my_list, 1):    # the optional second argument sets the starting index
    print(idx, val)

When reading a file, you may want to use the line number in error messages to locate the problem:

def parse_data(filename):
    """
    Report the line number in error messages while reading a file.
    """
    with open(filename, 'r', encoding='utf-8') as f:
        for line_no, line in enumerate(f, 1):
            fields = line.split()
            try:
                count = int(fields[1])
                # Some parsing work ...
            except ValueError as e:
                print(f'Line: {line_no} Parse error: {e}')

Example: enumerate() is handy for tracking where certain values appear in a list. If you want to map each word in a file to the lines it appears on, enumerate() makes that easy:

from collections import defaultdict

# 1. Create a defaultdict
word_summary = defaultdict(list)

# 2. Read the file contents
with open('myfile.txt', 'r') as f:
    lines = f.readlines()

# 3. Build a {word: [line i, line j]} dictionary (idx is zero-based here; use enumerate(lines, 1) for one-based line numbers)
for idx, line in enumerate(lines):
    # Create a list of words in current line
    words = [w.strip().lower() for w in line.split()]
    for word in words:
        # Record the line index of every word in the defaultdict
        word_summary[word].append(idx)

print(word_summary)

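enumerate() returns an enumerate object, an iterator that yields successive tuples each containing a counter and a value; the value is obtained by calling next() on an iterator over the sequence passed in.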
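
That description can be sketched as a generator. my_enumerate() is a hypothetical name for a pure-Python approximation; the real enumerate() is a built-in implemented in C, so this only mirrors its observable behaviour:

```python
def my_enumerate(iterable, start=0):
    """Approximation of enumerate(): pair a running counter with each item."""
    n = start
    for item in iterable:     # the for loop calls next() on iter(iterable) for us
        yield n, item         # one (count, value) tuple per item
        n += 1


print(list(my_enumerate(['a', 'b', 'c'], 1)))   # [(1, 'a'), (2, 'b'), (3, 'c')]
```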

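4.11 Iterating Over Multiple Sequences Simultaneously: zip(), itertools.zip_longest()

You want to iterate over several sequences at once, taking one item from each sequence on every step.

# 1. To iterate over several sequences at once, use zip():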
x_list = [x for x in range(1, 6)]
y_list = [y for y in reversed(range(1, 6))]

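for x, y in zip(x_list, y_list):    # each tuple returned by zip is unpacked here
    print(x, '~~~', y)

# 2. zip(a, b) creates an iterator that yields tuples (x, y), with x taken from a and y from b.
# Iteration stops as soon as one of the sequences is exhausted, so the result is as long as the shortest input.
a = [1, 2, 3]
b = 'abcd'

for i in zip(a, b):
    print(i)    # the character 'd' from b never appears

# 3. With itertools.zip_longest(), the result is as long as the longest input: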
from itertools import zip_longest

a = [1, 2, 3]
b = 'abcd'

for i in zip_longest(a, b):
    print(i)    # last item: (None, 'd')

for i in zip_longest(a, b, fillvalue='n/a'):    # fillvalue sets the padding value
    print(i)    # last item: ('n/a', 'd')

# 4. Examples of processing data in pairs:
headers = ['name', 'shares', 'price']
values = ['ACME', 100, 490.1]

# 4.1 Build a dict from two lists with zip() and dict():
s = dict(zip(headers, values))
print(s)

# 4.2 Print the pairs:
for name, val in zip(headers, values):
    print(name, '=', val)

# 5. zip() in fact accepts more than two sequences:
a, b, c = [1, 2, 3], ['x', 'y', 'z'], ['α', 'β', 'γ']
for i in zip(a, b, c):
    print(i)

# 6. Note: zip() returns an iterator. To store the paired values in a list, use list():
print(zip(a, b))
print(list(zip(a, b)))
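
Because zip() returns an iterator, its result can only be walked once. The sketch below (variable names are my own) shows that, and also the common trick of "unzipping" a list of pairs with zip(*pairs):

```python
pairs = zip([1, 2, 3], 'abc')

first_pass = list(pairs)     # consumes the zip iterator
second_pass = list(pairs)    # nothing is left the second time

# zip(*...) transposes the list of pairs back into two tuples
numbers, letters = zip(*first_pass)

print(first_pass)    # [(1, 'a'), (2, 'b'), (3, 'c')]
print(second_pass)   # []
print(numbers)       # (1, 2, 3)
print(letters)       # ('a', 'b', 'c')
```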

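4.12 Iterating on Items in Separate Containers: itertools.chain()

You want to perform the same operation on many objects, but the objects live in different containers, and you want to avoid writing repeated loops without losing readability:

from itertools import chain

# 1. itertools.chain() takes one or more iterables as arguments and creates an iterator that yields the items of each iterable in turn, hiding the fact that several containers are involved: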
a = [1, 2, 3, 4]    # list
b = {'x', 'y', 'z'}    # set
c = ('Natsume Soseki', 'Haruki Murakami')    # tuple

for x in chain(a, b, c):
    print(x, end='\t')

# 2. chain() is a good choice when you want to apply an operation to every item of several collections:
odds = {1, 3, 5, 7, 9}
evens = {2, 4, 6, 8, 0}

# Iterate over odd numbers & even numbers
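for item in sorted(chain(odds, evens)):    # the combined items can be sorted as well
    print(str(item) + ' is an integer')

""" This is more elegant than writing two separate loops """

# 3. chain() uses far less memory than concatenating the sequences first and then iterating: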
a = [1, 2, 3, 4]
b = [985, 211, 0, 'XXX', ('hello', 'world')]

for x in a + b:    # Inefficient
    print(x, end='\t')

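for x in chain(a, b):    # Better
    print(x, end='\t')

"""
In the first version, a + b creates a brand-new sequence (and requires a and b to be of the same type); chain() skips that step, so it saves a lot of memory when the input sequences are very large.
"""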

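
A small sketch of the type restriction on +, using made-up containers: concatenating a list and a set fails, while chain() happily walks both. chain.from_iterable() is the variant to use when the containers themselves arrive as one iterable:

```python
from itertools import chain

numbers = [1, 2, 3]
letters = {'x'}              # a set, so numbers + letters is not allowed
more = ('y', 'z')

try:
    numbers + letters
except TypeError as exc:
    error = str(exc)
    print('a + b failed:', error)

# chain() only needs each argument to be iterable
combined = list(chain(numbers, letters, more))
print(combined)              # [1, 2, 3, 'x', 'y', 'z']

# from_iterable() takes a single iterable of iterables
nested = [[1, 2], [3], [4, 5]]
flat = list(chain.from_iterable(nested))
print(flat)                  # [1, 2, 3, 4, 5]
```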
4.13 Creating Data Processing Pipelines: ???

I do not really understand this section yet 0.0
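
As far as I can tell, the core idea is to chain several small generator functions so that each one pulls items from the previous one, one at a time. The sketch below is my own simplified version on made-up in-memory log lines; the function names gen_grep, gen_fields and gen_ints and the log format are assumptions for illustration, not the book's code:

```python
import re


def gen_grep(pattern, lines):
    """Stage 1: pass through only the lines that match a regex."""
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line


def gen_fields(lines, index):
    """Stage 2: extract one whitespace-separated column from each line."""
    for line in lines:
        yield line.split()[index]


def gen_ints(values):
    """Stage 3: convert to int, skipping '-' placeholders."""
    for value in values:
        if value != '-':
            yield int(value)


# Hypothetical log lines: method, path, status, bytes sent
log_lines = [
    'GET /index.html 200 1024',
    'GET /python/intro.html 200 2048',
    'GET /python/missing.html 404 -',
    'GET /python/iter.html 200 512',
]

# Nothing runs while the pipeline is being assembled ...
python_lines = gen_grep(r'python', log_lines)
byte_columns = gen_fields(python_lines, -1)
sizes = gen_ints(byte_columns)

# ... until sum() starts pulling items through all three stages
total = sum(sizes)
print(total)   # 2560
```

Each line flows through every stage before the next line is read, so the whole input never has to sit in memory at once.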

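4.14 Flattening a Nested Sequence: a "recursive generator" with yield from

How do you flatten a sequence nested several levels deep into a single flat list?

# A "recursive generator" containing a yield from statement solves this:

# (Iterable must be imported from collections.abc; importing it from collections fails on Python 3.10+)
from collections.abc import Iterable

def flatten(items, ignore_types=(str, bytes)):
    for x in items:
        # If x is iterable and is not one of the ignored types (str/bytes by default):
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            yield from flatten(x, ignore_types)    # recurse, passing ignore_types along
        else:
            yield x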

items1 = [1, 2, [3, 4, [5, 6], 7], 8]
items2 = ['Dave', 'Paula', ['Thomas', 'Lewis']]

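# Flatten the nested list with flatten() and print the items
for x in flatten(items1):
    print(x)

# str and bytes objects are not expanded further into characters
for x in flatten(items2):
    print(x)

I still do not fully understand the yield from statement here.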
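
For the simple case used in flatten(), yield from behaves like a for loop that re-yields every item of another generator. The sketch below compares the two spellings with made-up function names; yield from also forwards send() and return values, which this example does not cover:

```python
def inner():
    yield 1
    yield 2


def with_loop():
    for x in inner():        # re-yield each item by hand
        yield x
    yield 3


def with_yield_from():
    yield from inner()       # same effect, delegated in one statement
    yield 3


print(list(with_loop()))         # [1, 2, 3]
print(list(with_yield_from()))   # [1, 2, 3]
```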

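4.15 Iterating in Sorted Order Over Merged Sorted Iterables: heapq.merge()

You have several sorted sequences and want to iterate over a single sorted sequence of them merged together: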

import heapq
from itertools import chain

# 1. heapq.merge() solves this:
a = [x for x in range(1, 11) if (x % 2) == 1]
b = [x for x in range(1, 11) if (x % 2) == 0]

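for c in heapq.merge(a, b):    # a and b must already be sorted
    print(c, end='\t')

''' heapq.merge() is lazy: it never reads all the input sequences at once, so it can be used on very long sequences with little memory overhead '''

# 2. itertools.chain() + sorted() gives the same result and can be faster for small in-memory inputs, but sorted() loads every item into a list, so the memory advantage of heapq.merge() is lost.
for d in sorted(chain(a, b)):
    print(d, end='\t')

# 3. The following example merges two already sorted files:
with open('r1.txt', 'r') as file1, \
     open('r2.txt', 'r') as file2, \
     open('merged_file', 'w') as merged_file:    # a single with statement can open several files at once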

    for line in heapq.merge(file1, file2):
        merged_file.write(line)
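
To see the laziness of heapq.merge() without needing files, the sketch below merges two infinite sorted streams, something sorted() could never finish. The variable names are my own; the key argument used at the end is available in Python 3.5 and later:

```python
import heapq
from itertools import count, islice

evens = count(0, 2)      # 0, 2, 4, ... never ends
odds = count(1, 2)       # 1, 3, 5, ... never ends

merged = heapq.merge(evens, odds)       # returns immediately, nothing is read in bulk
first_ten = list(islice(merged, 10))    # pull only the first ten merged items
print(first_ten)         # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# key= merges by a computed value; each input must already be sorted by that key
by_length = list(heapq.merge(['a', 'ccc'], ['bb', 'dddd'], key=len))
print(by_length)         # ['a', 'bb', 'ccc', 'dddd']
```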