S1 & S2 - Benchmarking & Pure-Python Optimization
This is my reading notes for Python High Performance. The code accompanying the book is available in Repository.
This article covers chapter 1 (benchmarking) and chapter 2 (pure-Python optimization). First published on this CSDN blogger's personal blog: Timing is Fun
S1 - Benchmark
Related reading: Timing
time & timeit - script-level benchmarking
- time is only available in a Unix shell
time simul.py
- timeit in IPython, bash, or inside Python
# Ipython
from simul import benchmark
%timeit benchmark()
# bash
python -m timeit -s 'from simul import benchmark' 'benchmark()'
# python
import timeit
result = timeit.repeat('benchmark()', setup='from simul import benchmark', number=10, repeat=3)
print(result)
result = timeit.timeit('benchmark()', setup='from simul import benchmark', number=10)
print(result)
pytest & pytest-benchmark - test-level benchmarking
- add a benchmark argument to the test function, e.g. test_evolve in test_simul.py
# bash
pytest test_simul.py::test_evolve
cProfile - function-level profiling
- function analysis in bash
# bash
python -m cProfile simul.py
python -m cProfile -s tottime simul.py
python -m cProfile -s tottime -o prof.out simul.py  # writes a file that can be parsed with the pstats module
- function analysis in .py
# bash
# code show in cprofile.py
python cprofile.py
- function analysis in Ipython
# Ipython
from simul import benchmark
%prun benchmark()
- analysis result
- ncalls: number of times the function was called
- tottime: total time spent inside the function, excluding calls to other functions
- cumtime: total time spent, including calls to other functions
- percall: time per call, excluding calls to other functions
- filename:lineno: file name and corresponding line number
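The prof.out file written with `-o` can be loaded back in Python with the standard-library pstats module. A minimal sketch, where `work()` is a made-up stand-in workload for simul.py:

```python
import cProfile
import pstats

def work():
    # stand-in workload for simul.py
    return sum(i * i for i in range(10000))

# equivalent of `python -m cProfile -o prof.out simul.py`
prof = cProfile.Profile()
prof.enable()
work()
prof.disable()
prof.dump_stats('prof.out')

# load the dump and print the 5 costliest functions by tottime
stats = pstats.Stats('prof.out')
stats.sort_stats('tottime').print_stats(5)
```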
- Visualizing results - KCachegrind (with pyprof2calltree)
# Bash
python -m cProfile -o prof.out taylor.py
pyprof2calltree -i prof.out -o prof.calltree
qcachegrind prof.calltree # ??? Call Graph not usable
line_profiler - line-level profiling
- in a .py file + command line
# .py file
@profile
def evolve(self, dt):
    # method body
# bash
kernprof -l -v simul.py
- in IPython
# Ipython
%load_ext line_profiler
from simul import benchmark, ParticleSimulator
%lprun -f ParticleSimulator.evolve benchmark()
- analysis result
- Line #: line number
- Hits: number of times the line was executed
- Time: total execution time of the line, in µs
- Per Hit: Time / Hits
- % Time: percentage of total time
- Line Contents: the source line
dis - the disassembly module; disassembles Python functions into bytecode
- in the Python interpreter
# python
import dis
from simul import ParticleSimulator
dis.dis(ParticleSimulator.evolve)
memory_profiler - memory usage
- in IPython
# Ipython
%load_ext memory_profiler
from simul import benchmark_memory, ParticleSimulator
%mprun -f ParticleSimulator.evolve benchmark_memory()
- __slots__: saves memory by not storing instance attributes in a per-instance dict, but attributes not listed in __slots__ cannot be added
class Particle:
    __slots__ = ('x', 'y', 'ang_vel')

    def __init__(self, x, y, ang_vel):
        self.x = x
        self.y = y
        self.ang_vel = ang_vel
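A quick sketch contrasting a plain class with the __slots__ version: slotted instances carry no per-instance __dict__ (which is where the memory saving comes from) and reject attributes that are not listed. PlainParticle/SlotParticle are illustrative names, not from the book:

```python
class PlainParticle:
    def __init__(self, x, y, ang_vel):
        self.x = x
        self.y = y
        self.ang_vel = ang_vel

class SlotParticle:
    __slots__ = ('x', 'y', 'ang_vel')

    def __init__(self, x, y, ang_vel):
        self.x = x
        self.y = y
        self.ang_vel = ang_vel

p = PlainParticle(0.0, 0.0, 1.0)
s = SlotParticle(0.0, 0.0, 1.0)

p.extra = 1  # plain instances accept new attributes via their __dict__
try:
    s.extra = 1  # slotted instances raise AttributeError
except AttributeError as exc:
    print("AttributeError:", exc)

# the per-instance __dict__ is exactly what __slots__ removes
print(hasattr(p, '__dict__'), hasattr(s, '__dict__'))  # True False
```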
S2 - Pure-Python Optimization
S2.1 Useful data structures & algorithms
list & deque - lists and double-ended queues
- list
- access: O(1)
- insert/delete at the tail (append(1), pop()): O(1) (if every slot of the underlying array is occupied, a reallocation is triggered and the operation becomes O(N))
- insert/delete at the head or middle (insert(0, 1), pop(0)): O(N)
- search: O(N)
- if the list is sorted, use bisect (binary search): O(log(N))
import bisect
collection = [1, 2, 3, 4, 5, 6]
bisect.bisect(collection, 3)  # returns 3 (the insertion point to the right of the existing 3)

def index_bisect(a, x):
    # locate the leftmost value exactly equal to x
    i = bisect.bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

index_bisect(collection, 3)  # returns 2
- deque (collections.deque)
- access: O(N) (hence rarely used when random access is needed)
- insert/delete at the tail (pop(), append(1)): O(1)
- insert/delete at the head (popleft(), appendleft(1)): O(1)
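The deque bullets above have no snippet, so here is a minimal sketch of the O(1) operations at both ends:

```python
from collections import deque

d = deque([1, 2, 3])
d.append(4)         # O(1) insert at the right end
d.appendleft(0)     # O(1) insert at the left end
print(list(d))      # [0, 1, 2, 3, 4]
print(d.popleft())  # 0 - O(1) delete at the left end
print(d.pop())      # 4 - O(1) delete at the right end
# random access like d[len(d) // 2] works, but costs O(N), unlike list
```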
dict - dictionaries
- access, insert, delete: O(1)
- demo
- counting occurrences of unique values
def counter_dict(items):
    counter = {}
    for item in items:
        if item not in counter:
            counter[item] = 1
        else:
            counter[item] += 1
    return counter

from collections import defaultdict
def counter_defaultdict(items):
    counter = defaultdict(int)  # missing keys default to 0, but this is not as fast as the first version
    for item in items:
        counter[item] += 1
    return counter

from collections import Counter
counter = Counter(items)  # items is a list; this is the most efficient
- indexed (inverted-index) lookup (O(1) queries, but higher space cost and lower flexibility)
docs = ["the cat is under the table",
        "the dog is under the table",
        "cats and dogs smell roses",
        "Carla eats an apple"]
matches = [doc for doc in docs if "table" in doc]  # linear scan - O(N)

index = {}
for i, doc in enumerate(docs):
    for word in doc.split():
        if word not in index:
            index[word] = [i]
        else:
            index[word].append(i)

results = index["table"]
result_documents = [docs[i] for i in results]  # lookup - O(1)
set - sets
- insert, delete, membership test: O(1)
- union, intersection, difference
- union: s.union(t) - O(S+T)
- intersection: s.intersection(t) - O(min(S, T))
- difference: s.difference(t) - O(S)
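A small sketch of the three operations and their results:

```python
s = {1, 2, 3, 4}
t = {3, 4, 5}

print(s.union(t))         # {1, 2, 3, 4, 5} - O(S+T)
print(s.intersection(t))  # {3, 4}          - O(min(S, T))
print(s.difference(t))    # {1, 2}          - O(S)
```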
- demo
- removing duplicate elements from a collection - O(N)
x = list(range(1000)) + list(range(500))
x_unique = set(x)
- boolean queries: a set-valued inverted index that supports intersection, union, and difference - O(1)
index = {}
for i, doc in enumerate(docs):
    for word in doc.split():
        if word not in index:
            index[word] = {i}   # create a set
        else:
            index[word].add(i)  # sets use add(), not append()
# advanced queries can now combine several keywords
# via set intersection, union, and difference
heapq - heaps
- used to find minimum/maximum values
- with a sorted list: extracting the maximum (pop) is O(1); insertion is O(N); search (bisect) is O(log(N))
- with a heap: insertion and extraction of the minimum/maximum are both O(log(N))
- demo
- heapq
import heapq
collection = [10, 3, 3, 4, 5, 6]
heapq.heapify(collection)
heapq.heappop(collection)      # returns the minimum, 3
heapq.heappush(collection, 1)  # pushes 1
- queue.PriorityQueue - thread-safe
from queue import PriorityQueue

queue = PriorityQueue()
for element in collection:
    queue.put(element)  # push
queue.get()  # returns the smallest element; to get the largest, insert negated values

# associate a number with an object using (number, object) tuples
queue1 = PriorityQueue()
queue1.put((3, "priority 3"))
queue1.put((2, "priority 2"))
queue1.put((1, "priority 1"))
queue1.get()  # returns (1, "priority 1")
trie - tries (prefix trees)
- used to find the strings in a collection that match a prefix
- requires pip-installing patricia-trie (for more speed there are datrie and marisa-trie, written in C)
- demo
from random import choice
from string import ascii_uppercase

def random_string(length):
    return ''.join(choice(ascii_uppercase) for i in range(length))

strings = [random_string(32) for i in range(10000)]
matches = [s for s in strings if s.startswith('AA')]  # linear scan - O(N)
# %timeit [s for s in strings if s.startswith('AA')]

from patricia import trie  # prefix tree
strings_dict = {s: 0 for s in strings}   # a dict with every value set to 0
strings_trie = trie(**strings_dict)      # build the trie
matches = list(strings_trie.iter('AA'))  # iterator lookup - O(S), S being the longest string in the set
# %timeit list(strings_trie.iter('AA'))
S2.2 Caching and memoization
- Memoization: store and reuse the results of previous function calls - dynamic programming
- in-memory caching - functools.lru_cache
- demo1
from functools import lru_cache

@lru_cache(maxsize=16)
def sum2(a, b):
    print("Calculating {} + {}".format(a, b))
    return a + b

print(sum2(1, 2))
# output:
# Calculating 1 + 2
# 3
print(sum2(1, 2))  # cached: no "Calculating" line this time
# output:
# 3
sum2.cache_info()
# output:
# CacheInfo(hits=1, misses=1, maxsize=16, currsize=1)
sum2.cache_clear()
- demo2: the Fibonacci sequence
# without memoization - O(2^N)
def fibonacci(n):
    if n < 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

%timeit fibonacci(20)
# output: 5.57 ms per loop

# with memoization - O(N)
import timeit
setup_code = '''
from functools import lru_cache
from __main__ import fibonacci
fibonacci_memoized = lru_cache(maxsize=None)(fibonacci)
'''
results = timeit.repeat('fibonacci_memoized(20)',
                        setup=setup_code,
                        repeat=1000,
                        number=1)
# results are in seconds; convert to microseconds for printing
print("Fibonacci took {:.2f} us".format(min(results) * 1e6))
# output: Fibonacci took 0.01 us
- on-disk caching - joblib (requires pip install)
- uses a smart hashing algorithm for the arguments
- demo
from joblib import Memory

memory = Memory(cachedir='/path/to/cachedir')

@memory.cache
def sum2(a, b):
    return a + b
S2.3 Comprehensions and generators
- list/dict comprehensions and generator expressions are faster than explicit loops
- demo1 - list comprehension and generator
def loop():  # explicit loop
    res = []
    for i in range(100000):
        res.append(i * i)
    return sum(res)

def comprehension():  # list comprehension
    return sum([i * i for i in range(100000)])

def generator():  # generator expression
    return sum(i * i for i in range(100000))

%timeit loop()           # 100 loops, best of 3: 16.1 ms per loop
%timeit comprehension()  # 100 loops, best of 3: 10.1 ms per loop
%timeit generator()      # 100 loops, best of 3: 12.4 ms per loop
- demo2 - dict comprehension
def loop():  # explicit loop
    res = {}
    for i in range(100000):
        res[i] = i
    return res

def comprehension():  # dict comprehension
    return {i: i for i in range(100000)}

%timeit loop()           # 100 loops, best of 3: 13.2 ms per loop
%timeit comprehension()  # 100 loops, best of 3: 12.8 ms per loop
- combining iterators with functions such as filter and map is more memory-efficient
- demo
def map_comprehension(numbers):  # numbers is an iterable
    a = [n * 2 for n in numbers]
    b = [n ** 2 for n in a]
    c = [n ** 0.33 for n in b]
    return max(c)

def map_normal(numbers):
    a = map(lambda n: n * 2, numbers)
    b = map(lambda n: n ** 2, a)
    c = map(lambda n: n ** 0.33, b)
    return max(c)

%load_ext memory_profiler
numbers = range(1000000)
%memit map_comprehension(numbers)
# peak memory: 166.33 MiB, increment: 102.54 MiB
%memit map_normal(numbers)
# peak memory: 71.04 MiB, increment: 0.00 MiB
- note: more functions that return iterators are available in the itertools module
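For example, islice and chain from itertools produce lazy iterators without materializing intermediate lists (a minimal sketch):

```python
from itertools import islice, chain, count

# count() is an infinite iterator; islice lazily takes the first n items
squares = (i * i for i in count())
first_five = list(islice(squares, 5))
print(first_five)  # [0, 1, 4, 9, 16]

# chain concatenates iterables without building an intermediate list
merged = list(chain([1, 2], (3, 4)))
print(merged)  # [1, 2, 3, 4]
```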