[python] cook_book Chapter 4: Manually Traversing an Iterator

Article link: https://blog.csdn.net/weixin_38805083/article/details/136706909

4.1 Manually Traversing an Iterator

def manual_iter():
    with open('./pwd') as f:
        try:
            while True:
                line=next(f)
                print(line,end='')
        except StopIteration:
            pass

StopIteration signals the end of iteration.
If you call next() manually as shown above, you can instead have it return a designated value, such as None, to mark the end.

with open('./pwd') as f:
    while True:
        line=next(f,None)
        if line is None:
            break
        print(line,end = '')

items=[1,2,3]
it=iter(items)
for item in it:
    if item is  None:
        break    
    print(item)

1
2
3

4.2 Delegating Iteration

Define an __iter__() method that delegates iteration to the object held inside the container.

In format(), {!r} is a placeholder that inserts the repr() of self._value into the string.

class Node:
    def __init__(self,value) -> None:
      self._value=value
      self._children=[]
    
    def __repr__(self):
      return 'Node({!r})'.format(self._value)
    def add_child(self, node):
      self._children.append(node)
    def __iter__(self):
        return iter(self._children)
root=Node(0)
child1=Node(1)
child2=Node(2)
root.add_child(child1)
root.add_child(child2)

for ch in root:
   print(ch)

Node(1)
Node(2)

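Python's iterator protocol requires __iter__() to return an iterator object that implements a __next__() method.
iter(s) simply returns the underlying iterator by calling s.__iter__(), in the same way that len(s) calls s.__len__().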
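
As a rough illustration of the protocol just described, the sketch below writes the iterator by hand instead of delegating to a list. The names DownFrom and DownFromIterator are made up for this example and are not part of the original notes.

```python
class DownFromIterator:
    """Iterator object: implements __next__() and returns itself from __iter__()."""
    def __init__(self, start):
        self._current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self._current <= 0:
            raise StopIteration  # signals the end of iteration
        value = self._current
        self._current -= 1
        return value


class DownFrom:
    """Iterable container: __iter__() hands out a fresh iterator on every call."""
    def __init__(self, start):
        self._start = start

    def __iter__(self):
        return DownFromIterator(self._start)


d = DownFrom(3)
it = iter(d)        # equivalent to calling d.__iter__()
first = next(it)    # equivalent to calling it.__next__()
rest = list(it)     # consumes whatever is left in this iterator
again = list(d)     # a new iterator starts from the beginning
print(first, rest, again)
```

Keeping the container and the iterator as separate objects is one possible design; it lets the same container be traversed several times.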

4.3 Creating New Iteration Patterns with Generators

Implement a custom iteration pattern that differs from the usual built-in functions such as range() and reversed().

def dfragne(start,end,incremental):
    x=start
    while x <end:
        yield x
        x+=incremental

for dot_n in dfragne(0,4,0.5):
    print(dot_n)

0
0.5
1.0
1.5
2.0
2.5
3.0
3.5

list(dfragne(0,4,0.5))

[0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]

Unlike a normal function, a generator only runs in response to iteration.

def countdown(n):
    print('start countdown')
    while n>0:
        yield n
        n-=1
    print('done')

c=countdown(3)
next(c)

start countdown
3

next(c)

2

next(c)

1

next(c)

done
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
----> 1 next(c)

StopIteration:

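The key feature of a generator function is that it only runs in response to next() operations during iteration. Once the generator function returns, iteration stops. The for statement normally used for iteration takes care of these details automatically, so you don't need to worry about them.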
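
To make the "for handles these details" point concrete, here is a minimal sketch of roughly what a for loop does on your behalf. The countdown() below is a simplified copy of the one above with the print calls removed; the variable names are only illustrative.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

# Roughly what "for x in countdown(3): collected.append(x)" does by hand.
collected = []
it = iter(countdown(3))
while True:
    try:
        x = next(it)
    except StopIteration:
        break               # the generator function has returned
    collected.append(x)

print(collected)
```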

4.4 Implementing the Iterator Protocol

You are building a custom object that should support iteration, and you want a simple way to implement the iterator protocol.

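Use the Node class to represent a tree structure. You might want to implement a generator that traverses the tree nodes depth-first.
The depth_first() method is simple and direct. It first yields the node itself, then iterates over each child and yields the items produced by that child's depth_first() method (using the yield from statement).
yield from passes the values of the sub-generator straight through to the current generator.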

class Node:
    def __init__(self,value) -> None:
      self._value=value
      self._children=[]
    
    def __repr__(self):
      return 'Node({!r})'.format(self._value)
    def add_child(self, node):
      self._children.append(node)
    def __iter__(self):
        return iter(self._children)
    def depth_first(self):
       yield self
       for c in self:
          yield from c.depth_first()

root=Node(0)
node1=Node(1)
node2=Node(2)
root.add_child(node1)
root.add_child(node2)
node1.add_child(Node(3))
node1.add_child(Node(4))
node2.add_child(Node(5))

for ch in root.depth_first():
   print(ch)

Node(0)
Node(1)
Node(3)
Node(4)
Node(2)
Node(5)

4.5 Iterating in Reverse

Iterate over a sequence in reverse order.

a=[1,2,3,4 ]
for x in reversed(a):
    print(x)

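Reverse iteration only works if the object's size can be determined in advance or if the object implements the __reversed__() special method. If neither is true, you must first convert the object into a list.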

with open('./pwd') as f:
    # list(f) reads the whole file into memory before reversing it
    for line in reversed(list(f)):
        print(line,end='')

Implement the __reversed__() method to support reverse iteration.

class CountDown:
    def __init__(self,start) -> None:
        self._start=start
    
    def __iter__(self):
        n=self._start
        while n>0:
            yield n
            n-=1
    def __reversed__(self):
        n=1
        while n<=self._start:
            yield n
            n +=1


for rr in reversed(CountDown(30)):
    print(rr)

for rr in CountDown(30):
    print(rr)

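Defining a reversed iterator makes the code much more efficient, because the data no longer has to be copied into a list before that list is iterated in reverse.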

4.6 Generator Functions with Extra State

from collections import deque
class line_history:
    def __init__(self,lines,histlen=3) -> None:
        self._lines=lines
        self.history=deque(maxlen=histlen)
    
    def __iter__(self):
        for line_no,line in enumerate(self._lines,1):
            self.history.append((line_no,line))
            yield line
    def clear(self):
        self.history.clear()
with open('./pwd') as f:
    lines=line_history(f)
    for line in lines:
        if 'python' in line:
            for lineno, hline in lines.history:
                print('{}:{}'.format(lineno,hline),end='')

2:2
3:3
4:python

Defining your generator inside the __iter__() method does not change any of your algorithm's logic.

One small thing to note: if you drive the iteration without a for loop, you have to call iter() first.

f =open('./pwd')
lines=line_history(f)
next(lines)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
      1 f =open('./pwd')
      2 lines=line_history(f)
----> 3 next(lines)

TypeError: 'line_history' object is not an iterator

it = iter(lines)
next(it)

'1\n'

next(it)

'2\n'

4.7 Taking a Slice of an Iterator

You want to take a slice of the data produced by an iterator, but the normal slicing operator does not work.

The itertools.islice() function is perfectly suited to slicing iterators and generators.

def count(n):
    while True:
        yield n 
        n+=1

c=count(0)
c[10:20]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
      4         n+=1
      6 c=count(0)
----> 7 c[10:20]

TypeError: 'generator' object is not subscriptable

import itertools
 
for x in itertools.islice(c,10,20):
    print(x)

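Iterators and generators cannot be sliced with the normal slicing operator, because their length is not known in advance (and they do not implement indexing).
islice() returns an iterator that produces the requested items.
It does this by consuming and discarding every item up to the start index of the slice.
It then produces items one by one until the end index is reached.
islice() consumes data from the iterator passed to it.
Keep in mind that an iterator cannot be rewound.

If you need to access that data again later, put it into a list first.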
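
The consuming behavior is easy to miss, so here is a small sketch of it. It reuses the count() generator from above; the slice bounds and variable names are chosen only for illustration.

```python
from itertools import islice

def count(n):
    while True:
        yield n
        n += 1

c = count(0)
first_slice = list(islice(c, 10, 13))   # discards 0..9, then yields 10, 11, 12
next_value = next(c)                    # the generator resumes right after the slice

# To look at the same data more than once, materialize it and slice the list.
saved = list(islice(count(0), 20))
window = saved[10:13]
same_window = saved[10:13]              # a list can be sliced repeatedly
print(first_slice, next_value, window)
```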

4.8 Skipping the First Part of an Iterable

You want to iterate over an iterable, but the first few items are of no interest and you want to skip them.

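Use the itertools.dropwhile() function. You pass it a function and an iterable.
It returns an iterator that discards items from the start of the sequence for as long as the function returns True; from the first item for which it returns False onward, every item is produced.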

with open('./pwd') as f:
    for line in f:
        print(line, end='')

##
# User Database
#
# Note that this file is consulted directly only when the system is running
# in single-user mode. At other times, this information is provided by
# Open Directory.
...
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh

If you want to skip the comment lines at the beginning:

from itertools import dropwhile
with open('./pwd') as f:
    # Predicate: True while the line is a comment (starts with '#')
    condition = lambda line: line.startswith('#')
    
    # dropwhile discards the leading lines for which the predicate is True,
    # i.e. the initial comment block
    filtered_lines = dropwhile(condition, f)
    
    # Print the remaining lines
    for line in filtered_lines:
        print(line, end='')

nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh

from itertools import islice
items=['a', 'b', 'c', 1, 4, 10, 15]
for x in islice(items,3,None):
    print(x)

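dropwhile() and islice() are really just helper functions that spare you from writing clumsier code by hand. Note that the simple filter below is not an equivalent replacement: it discards every line starting with '#' throughout the file, not only the ones at the beginning.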

with open('./pwd') as f:
    lines = (line for line in f if not line.startswith('#'))
    for line in lines:
        print(line, end='')

nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
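
With the sample file both approaches print the same lines, so the difference is easier to see on data that has a comment in the middle. The sketch below uses a made-up in-memory list instead of a file; is_comment is a hypothetical helper name.

```python
from itertools import dropwhile

# Hypothetical data: note the comment that appears after the first real line.
lines = [
    "# header comment\n",
    "# another comment\n",
    "root:*:0:0\n",
    "# comment in the middle\n",
    "nobody:*:-2:-2\n",
]

def is_comment(line):
    return line.startswith("#")

# dropwhile only skips the leading run of comments.
after_header = list(dropwhile(is_comment, lines))

# A filter removes comments wherever they occur.
no_comments = [line for line in lines if not is_comment(line)]

print(len(after_header), len(no_comments))
```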

4.9 Iterating Over Permutations and Combinations

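The itertools module provides three functions for this kind of problem. The first is itertools.permutations(), which takes a collection and produces a sequence of tuples, each one being a possible permutation of all the items in the collection.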

items = ['a', 'b', 'c']
from itertools import permutations
for p in permutations(items):
    print(p)

('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')

for  p in permutations(items,2):
    print(p)

('a', 'b')
('a', 'c')
('b', 'a')
('b', 'c')
('c', 'a')
('c', 'b')

Use itertools.combinations() to produce all combinations of items taken from the input.

from itertools import combinations
for c in combinations(items,3):
    print(c)

('a', 'b', 'c')

for c in combinations(items,2):
    print(c)

('a', 'b')
('a', 'c')
('b', 'c')

for c in combinations(items,1):
    print(c)

('a',)
('b',)
('c',)

For combinations(), the order of the items is not considered.

for c in itertools.combinations_with_replacement(items, 3):
    print(c)

('a', 'a', 'a')
('a', 'a', 'b')
('a', 'a', 'c')
('a', 'b', 'b')
('a', 'b', 'c')
('a', 'c', 'c')
('b', 'b', 'b')
('b', 'b', 'c')
('b', 'c', 'c')
('c', 'c', 'c')
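
To compare the three functions side by side, the following sketch counts what each produces for pairs drawn from the same three items. The variable names are illustrative only.

```python
from itertools import combinations, combinations_with_replacement, permutations

items = ['a', 'b', 'c']

perms = list(permutations(items, 2))                      # order matters
combs = list(combinations(items, 2))                      # order ignored
combs_rep = list(combinations_with_replacement(items, 2)) # items may repeat

# ('a', 'b') and ('b', 'a') are two permutations but a single combination.
has_ba_perm = ('b', 'a') in perms
has_ba_comb = ('b', 'a') in combs

print(len(perms), len(combs), len(combs_rep))
```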

4.10 Iterating Over Index-Value Pairs of a Sequence

You want to iterate over a sequence while keeping track of the index of the item being processed.

my_list = ['a', 'b', 'c']
for idx,val in enumerate(my_list):
    print(idx,val)

0 a
1 b
2 c

To print with conventional line numbers (starting from 1), pass a start argument.

my_list = ['a', 'b', 'c']
for idx,val in enumerate(my_list,1):
    print(idx,val)

1 a
2 b
3 c

This is especially useful when iterating over a file and you want line numbers in error messages.
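
A minimal sketch of that error-reporting use, assuming each line should hold a single integer. parse_lines and the sample input are hypothetical, and a list of strings stands in for an open file.

```python
def parse_lines(lines):
    """Convert each line to int; collect an error message with the line number otherwise."""
    values, errors = [], []
    for lineno, line in enumerate(lines, 1):   # line numbers start at 1
        try:
            values.append(int(line.strip()))
        except ValueError as e:
            errors.append('Line {}: parse error: {}'.format(lineno, e))
    return values, errors

values, errors = parse_lines(['10\n', 'abc\n', '30\n'])
print(values)
print(errors)
```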

enumerate() is also handy for tracking where certain values appear in a list.
If you want to map each word in a file to the lines on which it occurs, enumerate() makes that easy.

from collections import defaultdict
word_summary = defaultdict(list)

with open('./pwd', 'r') as f:
    lines = f.readlines()

for idx, line in enumerate(lines):
    # Create a list of words in current line
    words = [w.strip().lower() for w in line.split()]
    for word in words:
        word_summary[word].append(idx)

Whenever you would otherwise maintain an extra counter variable like the one below, using enumerate() is simpler.

lineno=1
for line in f:
    lineno+=1

data = [ (1, 2), (3, 4), (5, 6), (7, 8) ]

# Correct!
for n, (x, y) in enumerate(data):
    print(n)
    print(x,y)

# error
for n, x, y in enumerate(data):
    print(n)
    print(x)
    print(y)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
      1 # error
----> 2 for n, x, y in enumerate(data):
      3     print(n)
      4     print(x)

ValueError: not enough values to unpack (expected 3, got 2)

4.11 Iterating Over Multiple Sequences Simultaneously

You want to iterate over several sequences at the same time, taking one item from each on every step.

To do this, use the zip() function.

xpts = [1, 5, 4, 2, 10, 7]
ypts = [101, 78, 37, 15, 62, 99]
for x, y in zip(xpts, ypts):
    print(x)
    print(y)

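zip(a, b) creates an iterator that produces tuples (x, y), where x comes from a and y comes from b.
Iteration stops when the shortest input sequence is exhausted.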

a=[1,2,3,4]
b=['w','x','y','z']

for i in zip(a,b):
    print(i)

(1, 'w')
(2, 'x')
(3, 'y')
(4, 'z')

If the two lists differ in length, only the items that can be paired are produced.

a=[1,2,3]
b=['w','x','y','z']

for i in zip(a,b):
    print(i)

(1, 'w')
(2, 'x')
(3, 'y')

If you want the unpaired items as well, use itertools.zip_longest().

from itertools import zip_longest
a=[1,2,3]
b=['w','x','y','z']

for i in zip_longest(a,b):
    print(i)

(1, 'w')
(2, 'x')
(3, 'y')
(None, 'z')
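
If None is not a convenient placeholder, zip_longest() also accepts a fillvalue argument. A short sketch with the same two lists (the choice of 0 as the filler is arbitrary):

```python
from itertools import zip_longest

a = [1, 2, 3]
b = ['w', 'x', 'y', 'z']

# Missing items from the shorter input are replaced by fillvalue instead of None.
padded = list(zip_longest(a, b, fillvalue=0))
print(padded)
```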

Another use of zip(): pairing headers with values to build a dictionary

headers = ['name', 'shares', 'price']
values = ['ACME', 100, 490.1]

dict_data=dict(zip(headers,values))
for k,v in dict_data.items():
    print(k,v)

name ACME
shares 100
price 490.1

zip() returns an iterator as its result. If you need the paired values stored in a list, use list().

zip_data=zip(a,b)
list_data=list(zip_data)
print(list_data)

[(1, 'w'), (2, 'x'), (3, 'y')]

4.12 Iterating Over Items in Separate Containers

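You need to perform the same operation on many objects, but the objects live in different containers, and you want to avoid writing duplicate loops without losing readability.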

from itertools import chain
a=[1,2,3,4]
b=['x','y','z']
for x in chain(a,b):
    print(x)

1
2
3
4
x
y
z

A common use of chain() is when you want to perform some operation on all the items from several different collections.

# Various working sets of items
active_items = set()
inactive_items = set()

# Iterate over all items
for item in chain(active_items, inactive_items):
    # Process item
    ...

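itertools.chain() accepts one or more iterables as arguments. It then creates an iterator that produces the items of each iterable in turn, one after another.
This is much more efficient than first combining the sequences and then iterating.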

# Inefficient
for x in a + b:
    ...

# Better
for x in chain(a, b):
    ...

The a + b operation creates an entirely new sequence and also requires a and b to be of the same type.
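
The type restriction can be seen in a short sketch: concatenating a list and a tuple fails, while chain() accepts any mix of iterables. The variable names here are illustrative.

```python
from itertools import chain

a = [1, 2, 3]       # list
b = ('x', 'y')      # tuple
s = {'z'}           # set with a single item

try:
    a + b                       # list + tuple is not allowed
except TypeError as exc:
    concat_error = type(exc).__name__
else:
    concat_error = None

merged = list(chain(a, b, s))   # no intermediate combined sequence is built
print(concat_error, merged)
```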

4.13 Creating Data Processing Pipelines

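You want to process data iteratively in the style of a data pipeline (similar to Unix pipes). For instance, you have a huge amount of data to process that cannot fit into memory all at once.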

Define a collection of small generator functions, each performing one specific, self-contained task.

import os
import fnmatch
import gzip
import bz2
import re

def gen_find(filepat,top):
    """Find all filenames in a directory tree that match a shell wildcard pattern."""
    for path,dirlist,filelist in os.walk(top=top):
        for name in fnmatch.filter(filelist,filepat):
            yield os.path.join(path,name)

def gen_opener(filenames):
    '''
    Open a sequence of filenames one at a time producing a file object.
    The file is closed immediately when proceeding to the next iteration.
    '''
    for filename in filenames:
        if filename.endswith('.gz'):
            f=gzip.open(filename,'rt')
        elif filename.endswith('.bz2'):
            f=bz2.open(filename,'rt')
        else:
            f=open(filename,'rt')
        yield f
        f.close()

def gen_concatenate(iterators):
    '''
    Chain a sequence of iterators together into a single sequence.
    '''
    for it in iterators:
        yield from it

def gen_grep(pattern,lines):
    '''
    Look for a regex pattern in a sequence of lines
    '''
    pat=re.compile(pattern=pattern)
    for line in lines:
        # search() finds the pattern anywhere in the line; match() only tests the start
        if pat.search(line):
            yield line

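Now you can easily chain these functions together to build a processing pipeline.

For example, to find all log lines that contain the word python: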

lognames=gen_find('access-log*','www')
files=gen_opener(lognames)
lines = gen_concatenate(files)
pylines=gen_grep('(?i)python',lines)
for line in pylines:
    print(line)

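Processing data as a pipeline works well for many other kinds of problems, including parsing, reading from real-time data sources, and periodic polling.
The key is to understand that the yield statement acts as a producer of data, while the for loop acts as a consumer.
Each yield passes a single item of data on to the next stage of the pipeline.
A nice feature of this approach is that each generator function is small and self-contained.
Because the processing is iterative, the code needs very little memory while running.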
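
The producer/consumer idea can be tried without any log files. The sketch below feeds made-up log lines through two of the stages and adds one more stage that sums a byte count; the sample data and the assumption that the last field is the size are purely illustrative.

```python
import re

def gen_concatenate(iterators):
    """Chain several iterables of lines into one stream."""
    for it in iterators:
        yield from it

def gen_grep(pattern, lines):
    """Yield only the lines in which the regex pattern occurs."""
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line

# Stand-ins for opened log files (hypothetical data, last field = bytes sent).
fake_files = [
    ['GET /index.html 200 1024', 'GET /python.html 200 2048'],
    ['GET /Python/doc 200 512', 'GET /ruby 404 -'],
]

lines = gen_concatenate(fake_files)
pylines = gen_grep('(?i)python', lines)
# Extra stage: a generator expression pulling the size column from each match.
sizes = (int(line.rsplit(None, 1)[1]) for line in pylines)
total = sum(sizes)      # nothing runs until sum() starts consuming
print('Total', total)
```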

4.14 Flattening a Nested Sequence

You want to flatten a nested sequence into a single flat list of values.

from collections.abc import Iterable

def flatten(items, ignore_types=(str, bytes)):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            # Pass ignore_types along so nested levels honor a custom setting
            yield from flatten(x, ignore_types)
        else:
            yield x

items = [1, 2, [3, 4, [5, 6], 7], 8]
# Produces 1 2 3 4 5 6 7 8
for x in flatten(items):
    print(x)

isinstance(x, Iterable) checks whether an item is iterable.

Method 2: the same logic written with an explicit inner loop instead of yield from

def flatten(items, ignore_types=(str,bytes)):
    for x in items:
        if isinstance(x,Iterable) and not isinstance(x,ignore_types):
            for i in flatten(items=x, ignore_types=ignore_types):
                yield i

        else:
            yield x

items = [1, 2, [3, 4, [5, 6], 7], 8]
# Produces 1 2 3 4 5 6 7 8
for x in flatten(items):
    print(x)

4.15 Iterating in Sorted Order Over Merged Sorted Iterables

You have several sorted sequences and want to iterate over a single sorted sequence that merges them.

import heapq
a=[1,4,7,10]
b=[2,5,6,11]
for c in heapq.merge(a,b):
    print(c)

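The iterative nature of heapq.merge() means that it never reads any of the input sequences all at once. You can therefore use it on very long sequences with very little overhead.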
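
One way to see the lazy behavior is to merge two endless sorted streams, which would be impossible if the inputs were read up front. This sketch uses itertools.count as a stand-in for very long inputs; keep in mind that heapq.merge() expects every input to be sorted already and does not check this.

```python
import heapq
from itertools import count, islice

evens = count(0, 2)     # 0, 2, 4, ... (endless, already sorted)
odds = count(1, 2)      # 1, 3, 5, ... (endless, already sorted)

merged = heapq.merge(evens, odds)       # returns immediately, nothing is exhausted
first_six = list(islice(merged, 6))     # pull only as many items as needed
print(first_six)
```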

4.16 Replacing Infinite while Loops with an Iterator

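A little-known feature of the iter() function is that it optionally accepts a zero-argument callable and a sentinel (terminating) value as inputs.
When used in this way, it creates an iterator that repeatedly calls the callable until it returns the sentinel value.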

If you read data in chunks from a socket or file, you usually have to call read() or recv() repeatedly, each call followed by an end-of-file test to decide whether to stop.

A call to iter() folds the two together. The lambda is there to create a callable that takes no arguments while still supplying the desired size argument to recv() or read().
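
A minimal sketch of the two-argument form of iter(), using io.BytesIO as a stand-in for a socket or file. CHUNKSIZE and the sample bytes are arbitrary choices for this example.

```python
import io

CHUNKSIZE = 4
f = io.BytesIO(b'abcdefghij')   # stands in for a file or socket

chunks = []
# The lambda takes no arguments and returns the next chunk;
# iteration stops as soon as read() returns the sentinel b''.
for chunk in iter(lambda: f.read(CHUNKSIZE), b''):
    chunks.append(chunk)

print(chunks)
```

The equivalent hand-written version would be a while True loop that calls f.read(CHUNKSIZE), tests the result against b'', and breaks.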