Python Cookbook Study Notes

癫狂的兔子

Last modified: 2024-09-30 16:25:14

Tags: python

First published: 2024-09-18 15:17:55

Article link: https://blog.csdn.net/qq_45800521/article/details/142332850

1. Data Structures and Algorithms

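This chapter covers common problems such as searching, sorting, ordering, and filtering, along with common data structures and the algorithms that operate on them. The collections module also provides ready-made solutions for a variety of data structures.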

1.1 Unpacking a Sequence into Separate Variables

# Unpacking assignment: the number and structure of the variables must match the sequence
name, shares, price, date = ['ACME', 50, 91.1, (2012, 12, 21)]
name, shares, price, (year, mon, day) = ['ACME', 50, 91.1, (2012, 12, 21)]
# Example use: discard values you do not need with a throwaway name such as _
_, shares, price, _ = ['ACME', 50, 91.1, (2012, 12, 21)]
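The matching rule above applies to any iterable, not only lists and tuples. The following is a minimal sketch with made-up values; it shows unpacking a string and a generator, and what happens when the counts do not match.

```python
# Unpacking works with any iterable.
a, b, c = 'abc'                 # a string yields its characters
x, y = (n * n for n in (2, 3))  # a generator works too

# A count mismatch raises ValueError.
try:
    p, q = [1, 2, 3]
except ValueError as exc:
    error_message = str(exc)

print(a, b, c)        # a b c
print(x, y)           # 4 9
print(error_message)  # the exact wording depends on the Python version
```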

1.2 Unpacking Elements from Iterables of Arbitrary Length

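# Star expressions (*name) collect the remaining items into a list
# Use case: average only the middle grades, dropping the first and the last
grades = [55, 80, 90, 70, 100]  # sample data for illustration
first, *middle, last = grades
grade_avg = sum(middle) / len(middle)
# Use case: one person with several phone numbers
persons = [('Dave', '773-555-1212', '847-555-1212'), ('Ann', '555-0100')]  # sample data for illustration
for name, *phones in persons:
    pass
# Use case: discard values you do not need
name, *ign, (*ign, year) = ('ACME', 50, 123.45, (12, 18, 2012))
# Possible but not a good fit: recursion. Recursion is not one of Python's strengths because of its built-in recursion limit.
def recursive_sum(items):  # renamed so it does not shadow the built-in sum()
    head, *tail = items
    return head + recursive_sum(tail) if tail else head
recursive_sum(grades)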
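Star unpacking also combines well with string splitting. The sketch below uses a made-up passwd-style line (the field values are illustrative only) and also shows that the starred name is always a list, even when it captures nothing.

```python
line = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false'
uname, *fields, homedir, sh = line.split(':')
print(uname)    # nobody
print(homedir)  # /var/empty
print(sh)       # /usr/bin/false
print(fields)   # ['*', '-2', '-2', 'Unprivileged User']

# The starred variable is a list even when nothing is left over.
only, *rest = [1]
print(rest)     # []
```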

1.3 Keeping the Last N Items

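Appending to or popping from either end of a deque is O(1). Lists are different: inserting or removing an element at the front of a list is O(N).
deque(maxlen=N) creates a fixed-length queue. When a new item is added and the queue is already full, the oldest item is removed automatically.
If you do not specify a size, you get an unbounded queue that supports appends and pops at both ends.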

from collections import deque
# Bounded queue
q = deque(maxlen=3)
q.append(1)
q.append(2)
q.append(3)
q.append(4)		# deque([2, 3, 4], maxlen=3)
# Unbounded queue
q = deque()
q.append(1)
q.append(2)		# deque([1, 2])
q.appendleft(4)	# deque([4, 1, 2])
print(q.pop())	# 2
print(q)		# deque([4, 1])
print(q.popleft())	# 4
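A typical use of a bounded deque is keeping a short history while scanning a stream. The following is a minimal sketch under that assumption; the function name search, its parameters, and the sample text are all illustrative.

```python
from collections import deque

def search(lines, pattern, history=3):
    """Yield (matching_line, previous_lines) for each line containing pattern."""
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line:
            # Copy the deque so later appends do not change what was yielded.
            yield line, list(previous_lines)
        previous_lines.append(line)

text = ['alpha', 'beta', 'gamma', 'python one', 'delta', 'python two']
matches = list(search(text, 'python', history=2))
print(matches)
# [('python one', ['beta', 'gamma']), ('python two', ['python one', 'delta'])]
```

Because maxlen is set, the deque never holds more than history items, so memory use stays constant regardless of how long the input is.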

1.4 Finding the Largest or Smallest N Items

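When N = 1, min() and max() are faster.
When N > 1 but small relative to the collection, nlargest() and nsmallest() are the best fit (internally they work with a list arranged in heap order).
When N is about the same size as the collection itself, it is usually faster to sort first and then slice (for example, sorted(items)[:N] or sorted(items)[-N:]).

The heapq module implements a min-heap.

heapify(x): converts the list x into a heap (min-heap) in place.
heappush(heap, item): pushes item onto the heap (heap is the list that stores the heap).
heappop(heap): pops the smallest item off the heap and returns it.

heapq.heappush(pri_que, (freq, key)) keeps entries ordered by ascending freq. The smallest freq sits at the top of the heap, meaning that element occurs least often. If two entries have the same freq, the second element (key) is compared.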
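The (freq, key) ordering described above can be used to find the k most frequent items with a heap that never grows beyond k entries. This is only a sketch: top_k_frequent is a hypothetical helper name, and it assumes the keys are mutually comparable, since ties on freq fall back to comparing key.

```python
import heapq
from collections import Counter

def top_k_frequent(items, k):
    """Return the k most frequent items, most frequent first."""
    pri_que = []  # min-heap of (freq, key); the least frequent entry is at index 0
    for key, freq in Counter(items).items():
        heapq.heappush(pri_que, (freq, key))
        if len(pri_que) > k:
            heapq.heappop(pri_que)  # drop the entry with the smallest freq
    # Popping yields ascending freq, so reverse for most-frequent-first order.
    ordered = [heapq.heappop(pri_que) for _ in range(len(pri_que))]
    return [key for freq, key in reversed(ordered)]

print(top_k_frequent(['a', 'b', 'a', 'c', 'b', 'a'], 2))  # ['a', 'b']
```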

import heapq
nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
print(heapq.nlargest(3, nums)) # Prints [42, 37, 23]
print(heapq.nsmallest(3, nums)) # Prints [-4, 1, 2]
# Using the key parameter
portfolio = [
{'name': 'IBM', 'shares': 100, 'price': 91.1},
{'name': 'AAPL', 'shares': 50, 'price': 543.22},
{'name': 'FB', 'shares': 200, 'price': 21.09},
{'name': 'HPQ', 'shares': 35, 'price': 31.75},
{'name': 'YHOO', 'shares': 45, 'price': 16.35},
{'name': 'ACME', 'shares': 75, 'price': 115.65}
]
cheap = heapq.nsmallest(3, portfolio, key=lambda s:s['price'])
''' cheap:
[{'name': 'YHOO', 'shares': 45, 'price': 16.35}, 
{'name': 'FB', 'shares': 200, 'price': 21.09}, 
{'name': 'HPQ', 'shares': 35, 'price': 31.75}]
'''
expensive = heapq.nlargest(3, portfolio, key=lambda s:s['price'])
''' expensive:
[{'name': 'AAPL', 'shares': 50, 'price': 543.22}, 
{'name': 'ACME', 'shares': 75, 'price': 115.65}, 
{'name': 'IBM', 'shares': 100, 'price': 91.1}]
'''
# A list arranged in heap order
nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
heap = list(nums)
heapq.heapify(heap)
print(heap)	# [-4, 2, 1, 23, 7, 2, 18, 23, 42, 37, 8]
heapq.heappop(heap)	# -4
print(heap)	# [1, 2, 2, 23, 7, 8, 18, 23, 42, 37]
heapq.heappop(heap)	# 1
print(heap)	# [2, 2, 8, 23, 7, 37, 18, 23, 42]

1.6 Mapping Keys to Multiple Values in a Dictionary (defaultdict)

from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)

d = defaultdict(set)
d['a'].add(1)

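# No need to worry about initializing the first value
pairs = [('a', 1), ('a', 2), ('b', 3)]  # sample data for illustration
d = defaultdict(list)
for key, value in pairs:
    d[key].append(value)
# The same logic with a regular dict
d = {}
for key, value in pairs:
    if key not in d:
        d[key] = []
    d[key].append(value)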

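defaultdict automatically creates an entry for any key that is accessed, even if that key is not yet in the dictionary. If you do not want this behavior, you can call setdefault() on a regular dict instead. setdefault() is less convenient, though: every call creates a new instance of the initial value (the empty list [] in the example), whether or not it ends up being used.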

d = {} # A regular dictionary
d.setdefault('a', []).append(1)
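The difference in side effects is easy to see side by side. The following small sketch (key names are arbitrary) shows that merely reading a missing key from a defaultdict inserts it, while a regular dict only changes when setdefault() is called.

```python
from collections import defaultdict

d = defaultdict(list)
_ = d['missing']              # merely reading the key creates an empty entry
print(dict(d))                # {'missing': []}

plain = {}
print(plain.get('missing'))   # None, and nothing is inserted
print('missing' in plain)     # False

# setdefault() inserts only when called, and returns the stored value.
plain.setdefault('a', []).append(1)
plain.setdefault('a', []).append(2)
print(plain)                  # {'a': [1, 2]}
```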

1.7 Keeping Dictionaries in Order (OrderedDict)

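An OrderedDict iterates strictly in the order in which items were first added, which is useful when you later serialize the data or encode it into another format (such as JSON).
Internally, OrderedDict maintains a doubly linked list that orders the keys by insertion. A newly inserted item is placed at the end of this list. Reassigning an existing key afterwards does not change its position.
An OrderedDict is more than twice as large as a regular dict because of the extra linked list it creates.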

from collections import OrderedDict
import json
d = OrderedDict()
d['foo'] = 1
d['bar'] = 2
json.dumps(d)	# '{"foo": 1, "bar": 2}'
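The ordering rules above can be checked directly. This sketch also uses move_to_end() and popitem(last=False), two OrderedDict methods for reordering and for popping from the front. Note that since Python 3.7 a regular dict also preserves insertion order, so OrderedDict is mainly useful when these extra methods are needed.

```python
from collections import OrderedDict

d = OrderedDict()
d['foo'] = 1
d['bar'] = 2
d['spam'] = 3

d['foo'] = 100                  # reassignment keeps the original position
print(list(d))                  # ['foo', 'bar', 'spam']

d.move_to_end('foo')            # explicitly move a key to the end
print(list(d))                  # ['bar', 'spam', 'foo']

oldest = d.popitem(last=False)  # pop from the front (FIFO order)
print(oldest)                   # ('bar', 2)
```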

1.8 Calculating with Dictionaries (Value-Key Pairs via zip())

Operating on a dictionary directly works on its keys by default, not its values.

prices = {
	'ACME': 45.23,
	'AAPL': 612.78,
	'IBM': 205.55,
	'HPQ': 37.20,
	'FB': 10.75
}
print(min(prices))			# 'AAPL'
print(min(prices.keys()))	# 'AAPL'
print(min(prices.values()))	# 10.75

Use a lambda to relate keys to their values:

min(prices, key=lambda k: prices[k]) 			# 'FB'
prices[min(prices, key=lambda k: prices[k])]	# 10.75

Use zip() to build (value, key) pairs so the key stays attached to its value:

min_price = min(zip(prices.values(), prices.keys()))	# min_price is (10.75, 'FB')
max_price = max(zip(prices.values(), prices.keys()))	# max_price is (612.78, 'AAPL')
prices_sorted = sorted(zip(prices.values(), prices.keys()))	# prices_sorted is [(10.75, 'FB'), (37.2, 'HPQ'), (45.23, 'ACME'), (205.55, 'IBM'), (612.78, 'AAPL')]

Note that zip() creates an iterator whose contents can be consumed only once.

prices_and_names = zip(prices.values(), prices.keys())
print(min(prices_and_names)) # (10.75, 'FB')
print(max(prices_and_names)) # ValueError: max() arg is an empty sequence
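Two simple ways around the one-shot iterator are to store the pairs in a list or to build a fresh zip() each time. The sketch below shows both; value_key_pairs is a hypothetical helper name.

```python
prices = {'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.20, 'FB': 10.75}

# Option 1: store the pairs in a list so they can be iterated repeatedly.
pairs = list(zip(prices.values(), prices.keys()))
print(min(pairs))  # (10.75, 'FB')
print(max(pairs))  # (612.78, 'AAPL')

# Option 2: build a fresh zip() each time one is needed.
def value_key_pairs(mapping):
    return zip(mapping.values(), mapping.keys())

print(min(value_key_pairs(prices)))  # (10.75, 'FB')
print(max(value_key_pairs(prices)))  # (612.78, 'AAPL')
```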

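When the tuples produced by zip() are compared, their elements are compared in order: the values first, and the keys only when the values are equal.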

prices = {'AAA': 45.23, 'ZZZ': 45.23}
print(min(zip(prices.values(), prices.keys())))  # (45.23, 'AAA')
print(max(zip(prices.values(), prices.keys())))  # (45.23, 'ZZZ')

1.9 Finding Commonalities in Two Dictionaries (Set Behavior of Dictionary Keys)

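Dictionary keys support common set operations such as union, intersection, and difference. The keys() method returns a keys-view object and the items() method returns an items-view object. Because keys are unique (and therefore so are (key, value) pairs), both views can be used in set operations directly without first converting them to sets. The values() method does not support set operations, since values are not guaranteed to be unique.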

a = {
    'x': 1,
    'y': 2,
    'z': 3
}
b = {
    'w': 10,
    'x': 11,
    'y': 2
}
# Intersection
print(a.keys() & b.keys())  # {'y', 'x'} (set display order may vary)
# Difference
print(a.keys() - b.keys())  # {'z'}
# Build a new dict with selected keys removed
c = {key: a[key] for key in a.keys() - {'z', 'w'}}  # {'x': 1, 'y': 2}
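The items view supports the same operators, which makes it possible to find (key, value) pairs two dictionaries have in common. A short sketch using the same a and b as above; the variable names common_items, all_keys, and changed are illustrative.

```python
a = {'x': 1, 'y': 2, 'z': 3}
b = {'w': 10, 'x': 11, 'y': 2}

# (key, value) pairs present in both dictionaries
common_items = a.items() & b.items()
print(common_items)        # {('y', 2)}

# Union of the keys
all_keys = a.keys() | b.keys()
print(sorted(all_keys))    # ['w', 'x', 'y', 'z']

# Shared keys whose values differ
changed = {k for k in a.keys() & b.keys() if a[k] != b[k]}
print(changed)             # {'x'}
```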

1.10 Removing Duplicates from a Sequence While Preserving Order

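If an object is hashable, its hash value must never change during its lifetime, and it needs a __hash__() method. In practice this means the data used for hashing and equality must be immutable.

Hashable:

Immutable built-in types: int, float, str, bool, and tuple (a tuple is hashable only if all of its elements are hashable)
Immutable set type: frozenset
frozenset() -> empty frozenset object
frozenset(iterable) -> frozenset object
Build an immutable unordered collection of unique elements.
Custom immutable objects: if instances of a user-defined class are immutable and the class implements __hash__() and __eq__(), instances of that class are hashable as well.

Not hashable:

Mutable built-in types: list, set, dict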
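To make the custom-object case concrete, here is a minimal sketch of a hashable value class. The Point class is hypothetical; the key idea is that __hash__() hashes exactly the data that __eq__() compares, and that this data is never modified after construction.

```python
class Point:
    """A small value object: equal coordinates mean equal points."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Hash the same data that __eq__ compares.
        return hash((self._x, self._y))

points = {Point(1, 2), Point(1, 2), Point(3, 4)}
print(len(points))  # 2

# Mutable built-ins such as list refuse to be hashed.
try:
    hash([1, 2])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
print(list_is_hashable)  # False
```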

a = [1, 5, 2, 1, 9, 1, 5, 10]
b = [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]

# Unordered dedupe -> set
set(a)

# Dedupe while preserving the order of the items
# Hashable items
def dedupe_hashable(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)
print(dedupe_hashable(a))  # <generator object dedupe_hashable at 0x...>
print(list(dedupe_hashable(a)))  # [1, 5, 2, 9, 10]

# Also handles unhashable items, via a key function that returns a hashable value
def dedupe_unhashable(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)
print(list(dedupe_unhashable(b, key=lambda d: (d['x'], d['y']))))
# [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]

Extension: removing duplicate lines from a file

with open(somefile, 'r') as f:  # somefile: path of the file to read
    for line in dedupe_hashable(f):
        pass

1.11 Naming a Slice (slice)

Usage: slice(stop) or slice(start, stop[, step])

items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cut = slice(2, 9, 3)	# items[cut] == items[2:9:3]
print(cut.start)  	# 2
print(cut.stop)  	# 9
print(cut.step)  	# 3
print(items[cut]) 	# [2, 5, 8]
del items[cut]
print(items)  		# [0, 1, 3, 4, 6, 7, 9, 10]
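Named slices are most useful for replacing hard-coded index ranges, for example when pulling fields out of a fixed-width record. The column layout below is an assumption made up for this sketch, and the record is built with an f-string so that the columns are guaranteed to line up.

```python
# Hypothetical layout: name in columns 0-9, shares in 10-13, price in 15-21.
record = f"{'ACME':<10}{50:>4} {91.5:>7.2f}"

NAME = slice(0, 10)
SHARES = slice(10, 14)
PRICE = slice(15, 22)

name = record[NAME].strip()
shares = int(record[SHARES])
price = float(record[PRICE])
cost = shares * price

print(name, shares, price)  # ACME 50 91.5
print(cost)                 # 4575.0
```

Compared with record[10:14] scattered through the code, the named constants document what each range means and only need to be changed in one place.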

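The indices(size) method maps a slice onto a sequence of a given size and returns a (start, stop, step) tuple in which all values have been clamped to fit within the bounds (so indexing with them avoids an IndexError). Some of the results below may look surprising, but they follow from two rules: negative start and stop values are first offset by size, and values that are still out of range are then clamped to [0, size] for a positive step or to [-1, size - 1] for a negative step.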

s = '1234567890'
cut = slice(2, 8, 3)
print(cut.indices(len(s)))  # (2, 8, 3)
cut = slice(2, 8, -3)
print(cut.indices(len(s)))  # (2, 8, -3)
cut = slice(2, 18, 3)
print(cut.indices(len(s)))  # (2, 10, 3)
cut = slice(2, 18, -3)
print(cut.indices(len(s)))  # (2, 9, -3)
cut = slice(12, 8, 3)
print(cut.indices(len(s)))  # (10, 8, 3)
cut = slice(12, 8, -3)
print(cut.indices(len(s)))  # (9, 8, -3)
cut = slice(2, 8, 33)
print(cut.indices(len(s)))  # (2, 8, 33)
cut = slice(2, 8, -33)
print(cut.indices(len(s)))  # (2, 8, -33)
cut = slice(2, -8, 3)
print(cut.indices(len(s)))  # (2, 2, 3)
cut = slice(-2, 8, 3)
print(cut.indices(len(s)))  # (8, 8, 3)
cut = slice(-2, -8, 3)
print(cut.indices(len(s)))  # (8, 2, 3)
cut = slice(-2, -8, -3)
print(cut.indices(len(s)))  # (8, 2, -3)

# In practice, unpack the tuple with * when passing it to range()
for i in range(*cut.indices(len(s))):
    print(s[i])
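One way to build confidence in the indices() results is to check that walking range(*cut.indices(len(s))) visits exactly the elements that s[cut] selects. A small sketch; chars_via_indices is a hypothetical helper name.

```python
s = '1234567890'

def chars_via_indices(seq, cut):
    """Rebuild seq[cut] by walking the indices that cut.indices() reports."""
    return ''.join(seq[i] for i in range(*cut.indices(len(seq))))

# The two approaches agree, including for out-of-range and negative values.
for cut in (slice(2, 8, 3), slice(2, 18, 3), slice(-2, -8, -3), slice(12, 8, 3)):
    assert chars_via_indices(s, cut) == s[cut]

print(chars_via_indices(s, slice(2, 18, 3)))    # 369
print(chars_via_indices(s, slice(-2, -8, -3)))  # 96
```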

1.12 Finding the Most Frequently Occurring Items in a Sequence (Counter)

from collections import Counter

words = ['a', 'b', 'c', 'd', 'd', 'b', 'b', 'e', 'c', 'f', 'a']
morewords = ['c', 'c']

word_counts = Counter(words)
print(word_counts)  # Counter({'b': 3, 'a': 2, 'c': 2, 'd': 2, 'e': 1, 'f': 1})
print(word_counts.most_common(5))  # [('b', 3), ('a', 2), ('c', 2), ('d', 2), ('e', 1)]
print(word_counts.most_common(3))  # [('b', 3), ('a', 2), ('c', 2)]

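# Under the hood, a Counter is a dict subclass whose key-value pairs are element-count pairs. Counts can be any integer (including zero and negative values).
print(word_counts['a'])  # 2
print(word_counts['z'])  # 0 (looking up a missing key does not raise an error)

# Counts can be modified directly with arithmetic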
word_counts['a'] = word_counts['a'] + 5
print(word_counts['a'])  # 7
print(word_counts.most_common(3))  # [('a', 7), ('b', 3), ('c', 2)]

# dict.update() replaces values, whereas Counter.update() adds to the existing counts
word_counts.update(morewords)
print(word_counts.most_common(3))  # [('a', 7), ('c', 4), ('b', 3)]

# Arithmetic with Counters
morewords_count = Counter(morewords)
print(word_counts)  # Counter({'a': 7, 'c': 4, 'b': 3, 'd': 2, 'e': 1, 'f': 1})
print(morewords_count)  # Counter({'c': 2})
print(word_counts + morewords_count)  # Counter({'a': 7, 'c': 6, 'b': 3, 'd': 2, 'e': 1, 'f': 1})
print(word_counts - morewords_count)  # Counter({'a': 7, 'b': 3, 'c': 2, 'd': 2, 'e': 1, 'f': 1})

# Creating a new Counter instance resets the counts
word_counts = Counter(words)
print(word_counts)  # Counter({'b': 3, 'a': 2, 'c': 2, 'd': 2, 'e': 1, 'f': 1})
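One detail of Counter arithmetic is worth a separate look: the - operator drops any result that is zero or negative, while the subtract() method works in place and keeps such counts. A small sketch with made-up counts:

```python
from collections import Counter

stock = Counter(a=3, b=1)
sold = Counter(a=1, b=2, c=1)

# The - operator keeps only positive counts.
print(stock - sold)   # Counter({'a': 2})

# subtract() works in place and keeps zero and negative counts.
remaining = Counter(stock)
remaining.subtract(sold)
print(remaining)      # Counter({'a': 2, 'b': -1, 'c': -1})

# Unary + strips entries whose count is zero or negative.
print(+remaining)     # Counter({'a': 2})
```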

1.13 Sorting a List of Dictionaries by a Common Key (itemgetter)

In terms of performance, itemgetter() usually runs a bit faster than a lambda expression.

rows = [
    {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
    {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
    {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
    {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
]

# Using lambda
rows_by_fname = sorted(rows, key=lambda x: x['fname'])
rows_by_uid = sorted(rows, key=lambda x: x['uid'])
rows_by_lfname = sorted(rows, key=lambda x: (x['lname'], x['fname'],))
print(rows_by_fname)
print(rows_by_uid)
print(rows_by_lfname)
'''
[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]
[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]
[{'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}]
'''

# Using itemgetter, which usually runs a bit faster
from operator import itemgetter
rows_by_fname = sorted(rows, key=itemgetter('fname'))
rows_by_uid = sorted(rows, key=itemgetter('uid'))
rows_by_lfname = sorted(rows, key=itemgetter('lname', 'fname'))
print(rows_by_fname)
print(rows_by_uid)
print(rows_by_lfname)
'''
[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]
[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]
[{'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}]
'''

# Also works with other functions that accept a key argument, such as max() and min()
print(min(rows, key=itemgetter('uid')))  # {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}
print(max(rows, key=itemgetter('uid')))  # {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
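It can help to see what itemgetter() returns on its own: a callable that looks up one item, or a tuple of items when several keys are given. The sketch below uses one row from above plus a made-up list of tuples to show that integer indexes work as well.

```python
from operator import itemgetter

row = {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}

get_uid = itemgetter('uid')
get_name = itemgetter('lname', 'fname')
print(get_uid(row))   # 1004
print(get_name(row))  # ('Jones', 'Big')

# Integer indexes work too, for example to sort tuples by their second field.
pairs = [('b', 2), ('a', 3), ('c', 1)]
by_second = sorted(pairs, key=itemgetter(1))
print(by_second)      # [('c', 1), ('b', 2), ('a', 3)]
```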