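A Practical Guide to Python's itertools Module
itertools is a module in the Python standard library that provides building blocks for constructing iterators for efficient looping. These tools are optimized for both memory use and speed, and they help developers write more concise and more efficient Python code.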
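1. Infinite iterators
itertools provides three iterators that can produce an unbounded sequence:
1.1 count(start=0, step=1)
Starts at start and adds step each time a value is requested, counting forever: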
import itertools
counter = itertools.count(start=10, step=2)
print(next(counter)) # 10
print(next(counter)) # 12
print(next(counter)) # 14
# continues indefinitely...
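Because count never stops on its own, it is usually paired with something that bounds it. The sketch below shows two common ways to do that; the names tasks, numbered, and first_evens are illustrative and not part of the module.

```python
import itertools

# zip stops at its shortest input, so the infinite counter is bounded by `tasks`.
tasks = ['parse', 'validate', 'save']
numbered = list(zip(itertools.count(start=1), tasks))
print(numbered)  # [(1, 'parse'), (2, 'validate'), (3, 'save')]

# islice takes a fixed number of items from the infinite sequence.
first_evens = list(itertools.islice(itertools.count(0, 2), 5))
print(first_evens)  # [0, 2, 4, 6, 8]
```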
1.2 cycle(iterable)
Repeats the elements of the given iterable in an endless loop:
colors = ['red', 'green', 'blue']
color_cycle = itertools.cycle(colors)
print(next(color_cycle)) # 'red'
print(next(color_cycle)) # 'green'
print(next(color_cycle)) # 'blue'
print(next(color_cycle)) # 'red' (starts over from the beginning)
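One typical use of cycle is round-robin assignment. The following is a minimal sketch with made-up worker and job names; zip ends the loop when the finite list of jobs runs out. Note that cycle keeps a copy of every element it has seen, so it is best suited to short inputs.

```python
import itertools

workers = ['alice', 'bob']                        # hypothetical names
jobs = ['job1', 'job2', 'job3', 'job4', 'job5']   # hypothetical names

# Pair each job with the next worker, wrapping around as needed.
assignment = list(zip(jobs, itertools.cycle(workers)))
print(assignment)
# [('job1', 'alice'), ('job2', 'bob'), ('job3', 'alice'),
#  ('job4', 'bob'), ('job5', 'alice')]
```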
1.3 repeat(object[, times])
Repeats the given object. Without times it repeats forever; with times it stops after that many repetitions:
repeater = itertools.repeat('hello', 3)
print(next(repeater)) # 'hello'
print(next(repeater)) # 'hello'
print(next(repeater)) # 'hello'
print(next(repeater)) # raises StopIteration (only 3 repetitions were requested)
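Two points above can be made concrete: an unbounded repeat is handy as a constant argument stream for map, and an exhausted repeat raises StopIteration rather than returning a value. A short sketch, with illustrative variable names:

```python
import itertools

# map stops when range(5) is exhausted, so the infinite repeat(2) is safe here.
squares = list(map(pow, range(5), itertools.repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]

# A bounded repeat raises StopIteration once its repetitions are used up.
repeater = itertools.repeat('hello', 2)
next(repeater)
next(repeater)
try:
    next(repeater)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True
```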
2. Finite iterators
2.1 accumulate(iterable[, func])
Computes running results; the default operation is addition:
numbers = [1, 2, 3, 4, 5]
accumulated = itertools.accumulate(numbers)
print(list(accumulated)) # [1, 3, 6, 10, 15]
# A custom two-argument function can be passed as well
import operator
accumulated_mul = itertools.accumulate(numbers, operator.mul)
print(list(accumulated_mul)) # [1, 2, 6, 24, 120]
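Any two-argument function works with accumulate, not only arithmetic operators. The sketch below uses the built-in max for a running maximum and the keyword-only initial argument (available in Python 3.8 and later) to seed a running balance. The sample data is invented for illustration.

```python
import itertools

readings = [3, 1, 4, 1, 5, 9, 2, 6]

# Running maximum: each output is the largest value seen so far.
running_max = list(itertools.accumulate(readings, max))
print(running_max)  # [3, 3, 4, 4, 5, 9, 9, 9]

# `initial` is emitted first, so the output has one more item than the input.
balances = list(itertools.accumulate([50, -20, 30], initial=100))
print(balances)  # [100, 150, 130, 160]
```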
2.2 chain(*iterables)
Joins several iterables into a single sequence:
chained = itertools.chain([1, 2], ['a', 'b'], [True, False])
print(list(chained)) # [1, 2, 'a', 'b', True, False]
2.3 chain.from_iterable(iterable)
Like chain, but takes a single iterable whose elements are themselves iterables:
lists = [[1, 2], [3, 4], [5, 6]]
flattened = itertools.chain.from_iterable(lists)
print(list(flattened)) # [1, 2, 3, 4, 5, 6]
2.4 compress(data, selectors)
Keeps the elements of data whose corresponding entry in selectors is truthy:
data = ['a', 'b', 'c', 'd']
selectors = [1, 0, 1, 0] # 1 is truthy, 0 is falsy
compressed = itertools.compress(data, selectors)
print(list(compressed)) # ['a', 'c']
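2.5 dropwhile(predicate, iterable)
Drops elements while predicate is true. Once predicate returns false for the first time, that element and every remaining element are yielded without further checks: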
numbers = [1, 4, 6, 2, 1, 7, 4, 2]
filtered = itertools.dropwhile(lambda x: x < 5, numbers)
print(list(filtered)) # [6, 2, 1, 7, 4, 2]
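The fact that dropwhile stops testing after the first failure is easy to overlook. The sketch below skips a leading comment header in some invented configuration lines; a later line that also starts with '#' is kept, because the predicate is no longer consulted.

```python
import itertools

# Hypothetical file contents, used only for illustration.
lines = ['# generated file', '# do not edit', 'name=demo', '# trailing note', 'debug=true']

# Only the leading run of comment lines is dropped.
body = list(itertools.dropwhile(lambda line: line.startswith('#'), lines))
print(body)  # ['name=demo', '# trailing note', 'debug=true']
```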
2.6 filterfalse(predicate, iterable)
Returns the elements for which predicate is false:
numbers = [1, 2, 3, 4, 5, 6]
filtered = itertools.filterfalse(lambda x: x % 2, numbers)
print(list(filtered)) # [2, 4, 6]
2.7 groupby(iterable, key=None)
Groups consecutive elements that share the same key. When key is None, the element itself is used as the key:
data = ['a', 'a', 'b', 'c', 'c', 'c', 'd']
grouped = itertools.groupby(data)
for key, group in grouped:
    print(f"{key}: {list(group)}")
# Output:
# a: ['a', 'a']
# b: ['b']
# c: ['c', 'c', 'c']
# d: ['d']
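Since groupby only merges adjacent elements, unsorted input produces repeated keys. A common remedy is to sort by the same key first. The sketch below contrasts the two; the word list and the helper first_letter are illustrative.

```python
import itertools

words = ['apple', 'bob', 'avocado', 'cherry', 'banana']

def first_letter(word):
    return word[0]

# Without sorting, 'a' and 'b' each appear as two separate groups.
unsorted_groups = [(k, list(g)) for k, g in itertools.groupby(words, key=first_letter)]
print(unsorted_groups)
# [('a', ['apple']), ('b', ['bob']), ('a', ['avocado']),
#  ('c', ['cherry']), ('b', ['banana'])]

# Sorting by the same key first brings equal keys together.
by_letter = sorted(words, key=first_letter)
sorted_groups = [(k, list(g)) for k, g in itertools.groupby(by_letter, key=first_letter)]
print(sorted_groups)
# [('a', ['apple', 'avocado']), ('b', ['bob', 'banana']), ('c', ['cherry'])]
```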
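2.8 islice(iterable, start, stop[, step])
Slices an iterator lazily. The shorter form islice(iterable, stop) is also accepted. Unlike list slicing, negative values for start, stop, or step are not supported: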
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
sliced = itertools.islice(numbers, 2, 8, 2)
print(list(sliced)) # [2, 4, 6]
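islice is most useful on inputs that cannot be indexed, such as generators. The following sketch slices an infinite generator; squares is a hypothetical helper written for this example.

```python
import itertools

def squares():
    # Infinite generator of 0, 1, 4, 9, ...
    n = 0
    while True:
        yield n * n
        n += 1

# One-argument form: take the first five items.
first_five = list(itertools.islice(squares(), 5))
print(first_five)  # [0, 1, 4, 9, 16]

# start=1, stop=10, step=3 selects positions 1, 4, and 7.
every_third = list(itertools.islice(squares(), 1, 10, 3))
print(every_third)  # [1, 16, 49]
```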
2.9 starmap(function, iterable)
Unpacks each element of iterable and passes it to function as positional arguments:
data = [(2, 5), (3, 2), (10, 3)]
result = itertools.starmap(pow, data)
print(list(result)) # [32, 9, 1000]
2.10 takewhile(predicate, iterable)
The opposite of dropwhile: yields elements while predicate is true and stops at the first element for which it is false:
numbers = [1, 4, 6, 2, 1, 7, 4, 2]
filtered = itertools.takewhile(lambda x: x < 5, numbers)
print(list(filtered)) # [1, 4]
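2.11 tee(iterable, n=2)
Splits one iterable into n independent iterators. Once tee has been called, the original iterator should not be advanced directly, or the copies may miss values: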
data = [1, 2, 3, 4]
iter1, iter2, iter3 = itertools.tee(data, 3)
print(list(iter1)) # [1, 2, 3, 4]
print(list(iter2)) # [1, 2, 3, 4]
print(list(iter3)) # [1, 2, 3, 4]
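tee matters most when the source is a one-shot iterator that cannot simply be iterated twice. The sketch below computes differences between neighbouring values of a generator; the variable names are illustrative. tee buffers items that one copy has consumed and another has not, so memory use grows if the copies drift far apart.

```python
import itertools

source = (n * n for n in range(5))  # one-shot generator: 0, 1, 4, 9, 16

left, right = itertools.tee(source)
next(right, None)  # advance the second copy by one item

# From here on, only `left` and `right` are used, never `source`.
diffs = [b - a for a, b in zip(left, right)]
print(diffs)  # [1, 3, 5, 7]
```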
2.12 zip_longest(*iterables, fillvalue=None)
Like the built-in zip, but runs until the longest iterable is exhausted and pads the shorter ones with fillvalue:
a = [1, 2, 3]
b = ['a', 'b']
zipped = itertools.zip_longest(a, b, fillvalue='-')
print(list(zipped)) # [(1, 'a'), (2, 'b'), (3, '-')]
3. Combinatoric iterators
3.1 product(*iterables, repeat=1)
Computes the Cartesian product:
colors = ['red', 'green']
sizes = ['S', 'M', 'L']
products = itertools.product(colors, sizes)
print(list(products))
# Output:
# [('red', 'S'), ('red', 'M'), ('red', 'L'),
# ('green', 'S'), ('green', 'M'), ('green', 'L')]
# Product of an iterable with itself
result = itertools.product([0, 1], repeat=3)
print(list(result))
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
# (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
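product yields the same tuples, in the same order, as nested for loops with the rightmost iterable varying fastest. The sketch below checks that equivalence and then builds a small parameter grid; the grid keys and values are invented for illustration.

```python
import itertools

colors = ['red', 'green']
sizes = ['S', 'M', 'L']

# Equivalent nested loops: the last iterable changes fastest.
nested = [(c, s) for c in colors for s in sizes]
print(nested == list(itertools.product(colors, sizes)))  # True

# Expanding a dict of options into every combination of settings.
grid = {'lr': [0.1, 0.01], 'batch': [16, 32]}  # hypothetical options
configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
print(configs)
# [{'lr': 0.1, 'batch': 16}, {'lr': 0.1, 'batch': 32},
#  {'lr': 0.01, 'batch': 16}, {'lr': 0.01, 'batch': 32}]
```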
3.2 permutations(iterable, r=None)
Returns every possible ordering of length r. When r is None, it defaults to the length of the iterable:
items = ['a', 'b', 'c']
perms = itertools.permutations(items, 2)
print(list(perms))
# [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
3.3 combinations(iterable, r)
Returns every combination of length r (order does not matter, and no element is repeated):
items = ['a', 'b', 'c']
combs = itertools.combinations(items, 2)
print(list(combs)) # [('a', 'b'), ('a', 'c'), ('b', 'c')]
3.4 combinations_with_replacement(iterable, r)
Returns every combination of length r (order does not matter, but an element may be repeated):
items = ['a', 'b', 'c']
combs = itertools.combinations_with_replacement(items, 2)
print(list(combs))
# [('a', 'a'), ('a', 'b'), ('a', 'c'),
# ('b', 'b'), ('b', 'c'), ('c', 'c')]
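The sizes of these results follow the standard counting formulas, which is worth checking before materializing them for large inputs. A small sketch using math.perm and math.comb (both available in Python 3.8 and later):

```python
import itertools
import math

items = ['a', 'b', 'c']
n, r = len(items), 2

n_perms = len(list(itertools.permutations(items, r)))
n_combs = len(list(itertools.combinations(items, r)))
n_cwr = len(list(itertools.combinations_with_replacement(items, r)))

print(n_perms, math.perm(n, r))          # 6 6
print(n_combs, math.comb(n, r))          # 3 3
print(n_cwr, math.comb(n + r - 1, r))    # 6 6
```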
4. Practical examples
4.1 Flattening a nested list
nested = [[1, 2], [3, 4], [5, 6]]
flattened = itertools.chain.from_iterable(nested)
print(list(flattened)) # [1, 2, 3, 4, 5, 6]
4.2 Processing data in batches
def batch(iterable, n=1):
    # Yield successive lists of up to n items from iterable.
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk
data = range(10)
for batch_data in batch(data, 3):
    print(batch_data)
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9]
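4.3 Sliding window
def sliding_window(iterable, n=2):
    # Make n copies of the iterator and advance the i-th copy by i items.
    iters = itertools.tee(iterable, n)
    for i, it in enumerate(iters):
        for _ in range(i):
            next(it, None)
    # zip stops as soon as the most-advanced copy is exhausted.
    return zip(*iters)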
numbers = [0, 1, 2, 3, 4, 5]
windows = sliding_window(numbers, 3)
print(list(windows)) # [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
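An alternative that avoids creating n tee copies keeps the current window in a bounded deque. This is a sketch of that approach, not a replacement for the version above; sliding_window_deque is a name chosen for this example. For the special case n=2, Python 3.10 and later also provide itertools.pairwise.

```python
import collections
import itertools

def sliding_window_deque(iterable, n=2):
    # Pre-fill the window with the first n-1 items, then slide one item at a time.
    it = iter(iterable)
    window = collections.deque(itertools.islice(it, n - 1), maxlen=n)
    for item in it:
        window.append(item)  # maxlen drops the oldest item automatically
        yield tuple(window)

deque_windows = list(sliding_window_deque([0, 1, 2, 3, 4, 5], 3))
print(deque_windows)  # [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]

# Inputs shorter than the window produce no output.
print(list(sliding_window_deque([0, 1], 3)))  # []
```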
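4.4 Working with multiple collections
# Interleave two equal-length sequences element by element.
# The result is sorted here only because the values of a and b alternate;
# for arbitrary sorted inputs, use heapq.merge instead.
a = [1, 3, 5, 7]
b = [2, 4, 6, 8]
merged = list(itertools.chain.from_iterable(zip(a, b)))
print(merged) # [1, 2, 3, 4, 5, 6, 7, 8]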
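The zip-based approach interleaves by position and truncates to the shorter input, so it does not merge sorted data in general. The sketch below shows where it breaks and how heapq.merge from the standard library handles the same inputs lazily. The sample lists are invented for illustration.

```python
import heapq
import itertools

a = [1, 2, 9]
b = [3, 4, 5, 6]

# Positional interleaving: not sorted, and the unmatched 6 is dropped.
interleaved = list(itertools.chain.from_iterable(zip(a, b)))
print(interleaved)  # [1, 3, 2, 4, 9, 5]

# heapq.merge assumes each input is already sorted and yields a sorted stream.
merged_sorted = list(heapq.merge(a, b))
print(merged_sorted)  # [1, 2, 3, 4, 5, 6, 9]
```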
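5. Things to keep in mind
The functions in the itertools module return iterators, which means they are:
- Memory efficient: results are produced on demand rather than all at once (tee and cycle are exceptions that buffer items internally)
- Lazily evaluated: no work is done until the next element is requested
- Single use: most itertools iterators can be consumed only once and must be recreated to iterate again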
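The last two properties can be observed directly. In the sketch below, traced_square is a hypothetical helper that records each call, which shows that nothing runs until a value is requested; the second part shows that a consumed iterator yields nothing on a second pass.

```python
import itertools

calls = []

def traced_square(x):
    calls.append(x)  # record every call so laziness is visible
    return x * x

lazy = itertools.starmap(traced_square, [(1,), (2,), (3,)])
print(calls)   # [] (nothing has been computed yet)
next(lazy)
print(calls)   # [1] (only the first element was computed)

evens = itertools.filterfalse(lambda x: x % 2, range(10))
first_pass = list(evens)
second_pass = list(evens)
print(first_pass)   # [0, 2, 4, 6, 8]
print(second_pass)  # [] (the iterator is already exhausted)
```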
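6. Summary
The itertools module provides powerful tools for building iterators. With them you can:
- Work with infinite sequences
- Generate combinations and permutations of data
- Filter and transform data efficiently
- Build memory-efficient pipelines for processing large datasets
By combining these basic building blocks, developers can create complex data-processing pipelines while keeping the code concise and efficient.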
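As a closing illustration, here is a minimal sketch of such a pipeline. Every stage is lazy, so only as many input items are parsed as the final islice needs. The parse helper and the sample data are assumptions made for this example.

```python
import itertools

def parse(lines):
    # Hypothetical first stage: convert text lines to integers one at a time.
    for line in lines:
        yield int(line)

raw = ['3', '8', '15', '4', '23', '42', '7']

numbers = parse(raw)
large = itertools.filterfalse(lambda n: n < 5, numbers)  # drop values below 5
running = itertools.accumulate(large)                    # running total
top3 = list(itertools.islice(running, 3))                # take the first three totals

print(top3)  # [8, 23, 46]
```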