A Practical Guide to Python's itertools Module

Latest recommended article published on 2025-05-20 18:44:14

cugleem

Article link: https://blog.csdn.net/qq_17275369/article/details/148085863

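itertools is a module in the Python standard library that provides a set of iterator building blocks for efficient looping. These tools are designed to be fast and memory-efficient, and they help you write more concise and efficient Python code.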

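1. Infinite iterators

itertools provides three infinite iterators that can generate unbounded sequences:

1.1 count(start=0, step=1)

Starts at start and adds step on every iteration, without end: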

import itertools

counter = itertools.count(start=10, step=2)
print(next(counter))  # 10
print(next(counter))  # 12
print(next(counter))  # 14
# continues indefinitely...
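
Because count never ends, it is usually paired with something that does. The following is a minimal sketch, assuming a hypothetical list of task names, that numbers items by zipping them with a counter:

```python
import itertools

# Hypothetical sample data, used only for illustration.
tasks = ['parse', 'validate', 'store']

# zip stops at the shortest input, so the infinite counter is cut off safely.
numbered = list(zip(itertools.count(start=1), tasks))
print(numbered)  # [(1, 'parse'), (2, 'validate'), (3, 'store')]
```

For this particular case the built-in enumerate(tasks, start=1) does the same job; count becomes more useful when a non-integer or non-unit step is needed.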

1.2 cycle(iterable)

Cycles through the given iterable indefinitely:

colors = ['red', 'green', 'blue']
color_cycle = itertools.cycle(colors)

print(next(color_cycle))  # 'red'
print(next(color_cycle))  # 'green'
print(next(color_cycle))  # 'blue'
print(next(color_cycle))  # 'red' (starts over from the beginning)
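
One common use of cycle is round-robin assignment. The sketch below assumes hypothetical worker and job names and relies on zip stopping at the shorter input:

```python
import itertools

workers = ['w1', 'w2']             # hypothetical worker names
jobs = ['a', 'b', 'c', 'd', 'e']   # hypothetical job names

# Each job is paired with the next worker; the worker list repeats as needed.
assignment = list(zip(jobs, itertools.cycle(workers)))
print(assignment)
# [('a', 'w1'), ('b', 'w2'), ('c', 'w1'), ('d', 'w2'), ('e', 'w1')]
```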

1.3 repeat(object[, times])

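Repeats the given object; the optional times argument limits the number of repetitions (without it, the object is repeated forever):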

repeater = itertools.repeat('hello', 3)

print(next(repeater))  # 'hello'
print(next(repeater))  # 'hello'
print(next(repeater))  # 'hello'
print(next(repeater, 'done'))  # 'done' (exhausted after 3 repetitions; a bare next(repeater) would raise StopIteration)
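
repeat is often used to supply a constant argument to map. A minimal sketch, with the variable name squares chosen only for illustration:

```python
import itertools

# map stops when range(5) is exhausted, so the endless repeat(2) is safe here.
squares = list(map(pow, range(5), itertools.repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```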

2. Finite iterators

2.1 accumulate(iterable[, func])

Computes running results; by default it produces running sums:

numbers = [1, 2, 3, 4, 5]
accumulated = itertools.accumulate(numbers)
print(list(accumulated))  # [1, 3, 6, 10, 15]

# A custom two-argument function can also be supplied
import operator
accumulated_mul = itertools.accumulate(numbers, operator.mul)
print(list(accumulated_mul))  # [1, 2, 6, 24, 120]
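
Any two-argument function works, not only arithmetic operators. The sketch below uses made-up sample values to show a running maximum, and also the keyword-only initial argument (available from Python 3.8), which prepends a starting value to the output:

```python
import itertools

readings = [3, 1, 4, 1, 5, 9, 2]   # made-up sample values

# Running maximum: each output is the largest value seen so far.
running_max = list(itertools.accumulate(readings, max))
print(running_max)  # [3, 3, 4, 4, 5, 9, 9]

# With initial=100 the output has one more element than the input.
balances = list(itertools.accumulate([50, -20, 30], initial=100))
print(balances)  # [100, 150, 130, 160]
```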

2.2 chain(*iterables)

Joins multiple iterables into a single sequence:

chained = itertools.chain([1, 2], ['a', 'b'], [True, False])
print(list(chained))  # [1, 2, 'a', 'b', True, False]

2.3 chain.from_iterable(iterable)

Like chain, but takes a single iterable whose elements are themselves iterables:

lists = [[1, 2], [3, 4], [5, 6]]
flattened = itertools.chain.from_iterable(lists)
print(list(flattened))  # [1, 2, 3, 4, 5, 6]

2.4 compress(data, selectors)

Keeps the elements of data whose corresponding value in selectors is truthy:

data = ['a', 'b', 'c', 'd']
selectors = [1, 0, 1, 0]  # 1 is truthy, 0 is falsy
compressed = itertools.compress(data, selectors)
print(list(compressed))  # ['a', 'c']

2.5 dropwhile(predicate, iterable)

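Drops elements while predicate is true; as soon as it returns false, that element and all remaining elements are returned without further testing: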

numbers = [1, 4, 6, 2, 1, 7, 4, 2]
filtered = itertools.dropwhile(lambda x: x < 5, numbers)
print(list(filtered))  # [6, 2, 1, 7, 4, 2]

2.6 filterfalse(predicate, iterable)

Returns the elements for which predicate is false:

numbers = [1, 2, 3, 4, 5, 6]
filtered = itertools.filterfalse(lambda x: x % 2, numbers)
print(list(filtered))  # [2, 4, 6]

2.7 groupby(iterable, key=None)

Groups consecutive elements that share the same key (without a key function, the elements themselves are compared):

data = ['a', 'a', 'b', 'c', 'c', 'c', 'd']
grouped = itertools.groupby(data)
for key, group in grouped:
    print(f"{key}: {list(group)}")
# Output:
# a: ['a', 'a']
# b: ['b']
# c: ['c', 'c', 'c']
# d: ['d']
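
Because groupby only merges adjacent elements, unsorted input can produce the same key more than once. A minimal sketch, assuming a made-up word list and a helper named first_letter, that compares unsorted and sorted input:

```python
import itertools

words = ['apple', 'bob', 'avocado', 'cherry', 'banana']  # made-up sample data

def first_letter(word):
    return word[0]

# Unsorted input: 'a' and 'b' each appear as two separate groups.
unsorted_groups = [(k, list(g)) for k, g in itertools.groupby(words, key=first_letter)]
print(unsorted_groups)
# [('a', ['apple']), ('b', ['bob']), ('a', ['avocado']), ('c', ['cherry']), ('b', ['banana'])]

# Sorting by the same key first brings equal keys together.
ordered = sorted(words, key=first_letter)
sorted_groups = [(k, list(g)) for k, g in itertools.groupby(ordered, key=first_letter)]
print(sorted_groups)
# [('a', ['apple', 'avocado']), ('b', ['bob', 'banana']), ('c', ['cherry'])]
```

Each group is itself an iterator tied to the underlying data, so it should be consumed (for example with list) before moving on to the next key.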

2.8 islice(iterable, start, stop[, step])

Slices an iterator lazily; unlike list slicing, negative values for start, stop, and step are not supported:

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
sliced = itertools.islice(numbers, 2, 8, 2)
print(list(sliced))  # [2, 4, 6]
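
islice also accepts the shorter form islice(iterable, stop), which is a convenient way to take a finite prefix of an infinite iterator. A small sketch:

```python
import itertools

evens = itertools.count(start=0, step=2)   # 0, 2, 4, ... without end

# Take the first five values; islice consumes only what it returns.
first_five = list(itertools.islice(evens, 5))
print(first_five)  # [0, 2, 4, 6, 8]

# The underlying iterator resumes where the previous slice stopped.
next_three = list(itertools.islice(evens, 3))
print(next_three)  # [10, 12, 14]
```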

2.9 starmap(function, iterable)

Unpacks each element of iterable and passes the pieces to function as separate arguments:

data = [(2, 5), (3, 2), (10, 3)]
result = itertools.starmap(pow, data)
print(list(result))  # [32, 9, 1000]

2.10 takewhile(predicate, iterable)

The opposite of dropwhile: yields elements while predicate is true and stops as soon as it returns false:

numbers = [1, 4, 6, 2, 1, 7, 4, 2]
filtered = itertools.takewhile(lambda x: x < 5, numbers)
print(list(filtered))  # [1, 4]

2.11 tee(iterable, n=2)

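Splits one iterator into n independent iterators. After calling tee, the original iterator should no longer be used directly: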

data = [1, 2, 3, 4]
iter1, iter2, iter3 = itertools.tee(data, 3)

print(list(iter1))  # [1, 2, 3, 4]
print(list(iter2))  # [1, 2, 3, 4]
print(list(iter3))  # [1, 2, 3, 4]
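
tee matters most when the source is a one-shot iterator such as a generator, since a list could simply be iterated again. The sketch below assumes a hypothetical generator named squares:

```python
import itertools

def squares(n):
    """Hypothetical one-shot source: yields 0, 1, 4, ..."""
    for i in range(n):
        yield i * i

source = squares(5)
a, b = itertools.tee(source)

total = sum(a)      # consumes copy a: 0 + 1 + 4 + 9 + 16
largest = max(b)    # copy b still sees every value
print(total, largest)  # 30 16

# The original generator was drained through the copies.
leftover = list(source)
print(leftover)  # []
```

tee buffers values that one copy has read and another has not, so if one copy is fully consumed before the others start, converting the source to a list is usually the simpler choice.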

2.12 zip_longest(*iterables, fillvalue=None)

Like the built-in zip, but continues until the longest iterable is exhausted, padding the shorter ones with fillvalue:

a = [1, 2, 3]
b = ['a', 'b']
zipped = itertools.zip_longest(a, b, fillvalue='-')
print(list(zipped))  # [(1, 'a'), (2, 'b'), (3, '-')]

3. Combinatoric iterators

3.1 product(*iterables, repeat=1)

Computes the Cartesian product:

colors = ['red', 'green']
sizes = ['S', 'M', 'L']
products = itertools.product(colors, sizes)
print(list(products))
# Output:
# [('red', 'S'), ('red', 'M'), ('red', 'L'),
#  ('green', 'S'), ('green', 'M'), ('green', 'L')]

# Product of an iterable with itself
result = itertools.product([0, 1], repeat=3)
print(list(result))
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
#  (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]

3.2 permutations(iterable, r=None)

Returns all possible permutations of length r (r defaults to the length of the iterable):

items = ['a', 'b', 'c']
perms = itertools.permutations(items, 2)
print(list(perms))
# [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]

3.3 combinations(iterable, r)

Returns all possible combinations of length r (order does not matter, no element is repeated):

items = ['a', 'b', 'c']
combs = itertools.combinations(items, 2)
print(list(combs))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]

3.4 combinations_with_replacement(iterable, r)

Returns all possible combinations of length r (order does not matter, elements may be repeated):

items = ['a', 'b', 'c']
combs = itertools.combinations_with_replacement(items, 2)
print(list(combs))
# [('a', 'a'), ('a', 'b'), ('a', 'c'),
#  ('b', 'b'), ('b', 'c'), ('c', 'c')]
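
The four combinatoric iterators differ in whether order matters and whether elements may repeat. As a quick check, the sketch below counts their results for n = 4 and r = 2 and compares them with the standard counting formulas (math.perm and math.comb require Python 3.8 or later):

```python
import itertools
import math

items = ['a', 'b', 'c', 'd']
n, r = len(items), 2

n_product = len(list(itertools.product(items, repeat=r)))                  # order matters, repeats allowed
n_perm = len(list(itertools.permutations(items, r)))                       # order matters, no repeats
n_comb = len(list(itertools.combinations(items, r)))                       # order ignored, no repeats
n_cwr = len(list(itertools.combinations_with_replacement(items, r)))       # order ignored, repeats allowed

print(n_product, n_perm, n_comb, n_cwr)  # 16 12 6 10

assert n_product == n ** r
assert n_perm == math.perm(n, r)
assert n_comb == math.comb(n, r)
assert n_cwr == math.comb(n + r - 1, r)
```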

4. Practical examples

4.1 Flattening a nested list (one level)

nested = [[1, 2], [3, 4], [5, 6]]
flattened = itertools.chain.from_iterable(nested)
print(list(flattened))  # [1, 2, 3, 4, 5, 6]

4.2 Processing data in batches

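def batch(iterable, n=1):
    it = iter(iterable)  # one shared iterator, so each islice call resumes where the last stopped
    while True:
        chunk = list(itertools.islice(it, n))  # take up to n items
        if not chunk:  # an empty chunk means the input is exhausted
            return
        yield chunk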

data = range(10)
for batch_data in batch(data, 3):
    print(batch_data)
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9]

4.3 Sliding window

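def sliding_window(iterable, n=2):
    iters = itertools.tee(iterable, n)  # n independent copies of the input
    for i, it in enumerate(iters):
        for _ in range(i):  # advance the i-th copy by i positions
            next(it, None)
    return zip(*iters)  # zip stops when the most advanced copy runs out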

numbers = [0, 1, 2, 3, 4, 5]
windows = sliding_window(numbers, 3)
print(list(windows))  # [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
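
An alternative way to build the same windows is a bounded deque, which keeps only the last n items instead of n tee copies. This is a sketch along the lines of the recipe in the itertools documentation; the name sliding_window_deque is chosen here only to avoid clashing with the function above:

```python
import collections
import itertools

def sliding_window_deque(iterable, n):
    it = iter(iterable)
    # Pre-fill with the first n - 1 items; maxlen=n discards the oldest item on append.
    window = collections.deque(itertools.islice(it, n - 1), maxlen=n)
    for item in it:
        window.append(item)
        yield tuple(window)

deque_windows = list(sliding_window_deque(range(6), 3))
print(deque_windows)  # [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
```

If the input has fewer than n items, no window is produced.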

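4.4 Operations on multiple collections

# Interleave two equal-length lists; the result is sorted only because the values of a and b happen to alternate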
a = [1, 3, 5, 7]
b = [2, 4, 6, 8]
merged = list(itertools.chain.from_iterable(zip(a, b)))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8]
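
Note that chain.from_iterable(zip(a, b)) interleaves its inputs and stops at the shorter one; it does not compare values. For sorted inputs in general, heapq.merge from the standard library performs a true lazy merge. A minimal sketch with made-up lists of unequal length:

```python
import heapq
import itertools

a = [1, 2, 9]
b = [3, 4, 5, 10]

# Interleaving ignores the values and drops the unmatched tail of b.
interleaved = list(itertools.chain.from_iterable(zip(a, b)))
print(interleaved)  # [1, 3, 2, 4, 9, 5]

# heapq.merge assumes each input is already sorted and yields a sorted stream.
merged_sorted = list(heapq.merge(a, b))
print(merged_sorted)  # [1, 2, 3, 4, 5, 9, 10]
```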

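5. Notes

The functions in the itertools module return iterators, which means:

Memory-efficient: results are not generated all at once but produced on demand
Lazy evaluation: computation happens only when the next element is requested
Single use: most itertools iterators can be consumed only once and must be recreated to be used again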
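
The single-use behavior is easy to overlook. A short sketch that iterates over the same combinations object twice:

```python
import itertools

pairs = itertools.combinations('abc', 2)

first_pass = list(pairs)    # consumes the iterator
second_pass = list(pairs)   # nothing is left

print(first_pass)   # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(second_pass)  # []
```

When the results are needed more than once, either store them in a list or call the itertools function again.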

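6. Summary

The itertools module provides powerful tools for building iterators. With it you can:

Work with infinite sequences
Generate combinations and permutations of data
Filter and transform data efficiently
Build memory-efficient pipelines for processing large datasets

By combining these basic building blocks, developers can create complex data-processing pipelines while keeping the code concise and efficient.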
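
As an illustration of such a pipeline, the sketch below chains several of the functions covered above over an endless source. The readings generator is a hypothetical stand-in for a real data stream, and no stage runs until the final list call pulls values through:

```python
import itertools

def readings():
    """Hypothetical endless data source: 0, 1, 2, ..."""
    yield from itertools.count()

even = itertools.filterfalse(lambda x: x % 2, readings())     # 0, 2, 4, ...
squared = map(lambda x: x * x, even)                          # 0, 4, 16, 36, ...
below_100 = itertools.takewhile(lambda x: x < 100, squared)   # stops at the first value >= 100
running_totals = list(itertools.accumulate(below_100))

print(running_totals)  # [0, 4, 20, 56, 120]
```

Only takewhile makes this pipeline finite; without it, list would never return.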