S1 & S2 - Benchmarking & Pure-Python Optimization
This is my reading notes for Python High Performance. The code accompanying the book is available in Repository.
This article covers chapter 1 (benchmarking) and chapter 2 (pure-Python optimization). First published on this CSDN blogger's personal blog: Timing is Fun
S1 - Benchmark
Related reading: Timing
time & timeit - script-level benchmarking
- time is only available in a Unix shell
time simul.py
- timeit in IPython, bash, or inside Python
# Ipython
from simul import benchmark
%timeit benchmark()
# bash
python -m timeit -s 'from simul import benchmark' 'benchmark()'
# python
import timeit
result = timeit.repeat('benchmark()', setup='from simul import benchmark', number=10, repeat=3)
print(result)
result = timeit.timeit('benchmark()', setup='from simul import benchmark', number=10)
print(result)
pytest & pytest-benchmark - test-level benchmarking
- add a benchmark argument to the test function, e.g. test_evolve in test_simul.py
# bash
pytest test_simul.py::test_evolve
cProfile - function-level profiling
- function analysis in bash
# bash
python -m cProfile simul.py
python -m cProfile -s tottime simul.py
python -m cProfile -s tottime -o prof.out simul.py  # writes a file that can be parsed with the pstats module
- function analysis in .py
# bash
# code show in cprofile.py
python cprofile.py
- function analysis in Ipython
# Ipython
from simul import benchmark
%prun benchmark()
- analysis result
- ncalls: number of times the function was called
- tottime: total time spent inside the function, excluding calls to other functions
- cumtime: total time spent, including calls to other functions
- percall: time per call, excluding calls to other functions
- filename:lineno: file name and corresponding line number
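The prof.out file written with `-o` can be loaded back in Python with the standard-library pstats module. A minimal sketch, where `work()` is a made-up stand-in workload for simul.py:

```python
import cProfile
import pstats

def work():
    # stand-in workload for simul.py
    return sum(i * i for i in range(10000))

# equivalent of `python -m cProfile -o prof.out simul.py`
prof = cProfile.Profile()
prof.enable()
work()
prof.disable()
prof.dump_stats('prof.out')

# load the dump and print the 5 costliest functions by tottime
stats = pstats.Stats('prof.out')
stats.sort_stats('tottime').print_stats(5)
```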
- Visualizing results - KCachegrind (with pyprof2calltree)
# Bash
python -m cProfile -o prof.out taylor.py
pyprof2calltree -i prof.out -o prof.calltree
qcachegrind prof.calltree # ??? Call Graph not usable
line_profiler - line-level profiling
- in a .py file + command line
# .py file
@profile
def evolve(self, dt):
    # method body
# bash
kernprof -l -v simul.py
- in IPython
# Ipython
%load_ext line_profiler
from simul import benchmark, ParticleSimulator
%lprun -f ParticleSimulator.evolve benchmark()
- analysis result
- Line #: line number
- Hits: number of times the line was executed
- Time: total execution time of the line, in µs
- Per Hit: Time / Hits
- % Time: percentage of total time
- Line Contents: the source line
dis - the disassembly module; disassembles Python functions into bytecode
- in the Python interpreter
# python
import dis
from simul import ParticleSimulator
dis.dis(ParticleSimulator.evolve)
memory_profiler - memory usage
- in IPython
# Ipython
%load_ext memory_profiler
from simul import benchmark_memory, ParticleSimulator
%mprun -f ParticleSimulator.evolve benchmark_memory()
- __slots__: saves memory by not storing instance attributes in a per-instance dict, but attributes not listed in __slots__ cannot be added
class Particle:
    __slots__ = ('x', 'y', 'ang_vel')

    def __init__(self, x, y, ang_vel):
        self.x = x
        self.y = y
        self.ang_vel = ang_vel
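A quick sketch contrasting a plain class with the __slots__ version: slotted instances carry no per-instance __dict__ (which is where the memory saving comes from) and reject attributes that are not listed. PlainParticle/SlotParticle are illustrative names, not from the book:

```python
class PlainParticle:
    def __init__(self, x, y, ang_vel):
        self.x = x
        self.y = y
        self.ang_vel = ang_vel

class SlotParticle:
    __slots__ = ('x', 'y', 'ang_vel')

    def __init__(self, x, y, ang_vel):
        self.x = x
        self.y = y
        self.ang_vel = ang_vel

p = PlainParticle(0.0, 0.0, 1.0)
s = SlotParticle(0.0, 0.0, 1.0)

p.extra = 1  # plain instances accept new attributes via their __dict__
try:
    s.extra = 1  # slotted instances raise AttributeError
except AttributeError as exc:
    print("AttributeError:", exc)

# the per-instance __dict__ is exactly what __slots__ removes
print(hasattr(p, '__dict__'), hasattr(s, '__dict__'))  # True False
```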
S2 - Pure-Python Optimization
S2.1 Useful data structures & algorithms
list & deque - lists and double-ended queues
- list
- access: O(1)
- insert/delete at the tail (append(1), pop()): O(1) (if every slot of the underlying array is occupied, a reallocation is triggered and the operation becomes O(N))
- insert/delete at the head or middle (insert(0, 1), pop(0)): O(N)
- search: O(N)
- if the list is sorted, use bisect (binary search): O(log(N))
import bisect
collection = [1, 2, 3, 4, 5, 6]
bisect.bisect(collection, 3)  # returns 3 (the insertion point to the right of the existing 3)

def index_bisect(a, x):
    # locate the leftmost value exactly equal to x
    i = bisect.bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

index_bisect(collection, 3)  # returns 2
- deque (collections.deque)
- access: O(N) (hence rarely used when random access is needed)
- insert/delete at the tail (pop(), append(1)): O(1)
- insert/delete at the head (popleft(), appendleft(1)): O(1)
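The deque bullets above have no snippet, so here is a minimal sketch of the O(1) operations at both ends:

```python
from collections import deque

d = deque([1, 2, 3])
d.append(4)         # O(1) insert at the right end
d.appendleft(0)     # O(1) insert at the left end
print(list(d))      # [0, 1, 2, 3, 4]
print(d.popleft())  # 0 - O(1) delete at the left end
print(d.pop())      # 4 - O(1) delete at the right end
# random access like d[len(d) // 2] works, but costs O(N), unlike list
```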
dict - dictionaries
- access, insert, delete: O(1)
- demo
- counting occurrences of unique values
def counter_dict(items):
    counter = {}
    for item in items:
        if item not in counter:
            counter[item] = 1
        else:
            counter[item] += 1
    return counter

from collections import defaultdict
def counter_defaultdict(items):
    counter = defaultdict(int)  # missing keys default to 0, but this is not as fast as the first version
    for item in items:
        counter[item] += 1
    return counter

from collections import Counter
counter = Counter(items)  # items is a list; this is the most efficient
- indexed (inverted-index) lookup (O(1) queries, but higher space cost and lower flexibility)
docs = ["the cat is under the table",
        "the dog is under the table",
        "cats and dogs smell roses",
        "Carla eats an apple"]
matches = [doc for doc in docs if "table" in doc]  # linear scan - O(N)

index = {}
for i, doc in enumerate(docs):
    for word in doc.split():
        if word not in index:
            index[word] = [i]
        else:
            index[word].append(i)

results = index["table"]
result_documents = [docs[i] for i in results]  # lookup - O(1)
set - sets
- insert, delete, membership test: O(1)
- union, intersection, difference
- union: s.union(t) - O(S+T)
- intersection: s.intersection(t) - O(min(S, T))
- difference: s.difference(t) - O(S)
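A small sketch of the three operations and their results:

```python
s = {1, 2, 3, 4}
t = {3, 4, 5}

print(s.union(t))         # {1, 2, 3, 4, 5} - O(S+T)
print(s.intersection(t))  # {3, 4}          - O(min(S, T))
print(s.difference(t))    # {1, 2}          - O(S)
```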
- demo
- removing duplicate elements from a collection - O(N)
x = list(range(1000)) + list(range(500))
x_unique = set(x)
- boolean queries: a set-valued inverted index that supports intersection, union, and difference - O(1)
index = {}
for i, doc in enumerate(docs):
    for word in doc.split():
        if word not in index:
            index[word] = {i}   # create a set
        else:
            index[word].add(i)  # sets use add(), not append()
# advanced queries can now combine several keywords
# via set intersection, union, and difference
heapq - heaps
- used to find minimum/maximum values
- with a sorted list: extracting the maximum (pop) is O(1); insertion is O(N); search (bisect) is O(log(N))
- with a heap: insertion and extraction of the minimum/maximum are both O(log(N))
- demo
- heapq
import heapq
collection = [10, 3, 3, 4, 5, 6]
heapq.heapify(collection)
heapq.heappop(collection)      # returns the minimum, 3
heapq.heappush(collection, 1)  # pushes 1
- queue.PriorityQueue - thread-safe
from queue import PriorityQueue

queue = PriorityQueue()
for element in collection:
    queue.put(element)  # push
queue.get()  # returns the smallest element; to get the largest, insert negated values

# associate a number with an object using (number, object) tuples
queue1 = PriorityQueue()
queue1.put((3, "priority 3"))
queue1.put((2, "priority 2"))
queue1.put((1, "priority 1"))
queue1.get()  # returns (1, "priority 1")
trie - tries (prefix trees)
- used to find the strings in a collection that match a prefix
- requires pip-installing patricia-trie (for more speed there are datrie and marisa-trie, written in C)
- demo
from random import choice
from string import ascii_uppercase

def random_string(length):
    return ''.join(choice(ascii_uppercase) for i in range(length))

strings = [random_string(32) for i in range(10000)]
matches = [s for s in strings if s.startswith('AA')]  # linear scan - O(N)
# %timeit [s for s in strings if s.startswith('AA')]

from patricia import trie  # prefix tree
strings_dict = {s: 0 for s in strings}   # a dict with every value set to 0
strings_trie = trie(**strings_dict)      # build the trie
matches = list(strings_trie.iter('AA'))  # iterator lookup - O(S), S being the longest string in the set
# %timeit list(strings_trie.iter('AA'))
S2.2 Caching and memoization
- Memoization: store and reuse the results of previous function calls - dynamic programming
- in-memory caching - functools.lru_cache
- demo1
from functools import lru_cache

@lru_cache(maxsize=16)
def sum2(a, b):
    print("Calculating {} + {}".format(a, b))
    return a + b

print(sum2(1, 2))
# output:
# Calculating 1 + 2
# 3
print(sum2(1, 2))  # cached: no "Calculating" line this time
# output:
# 3
sum2.cache_info()
# output:
# CacheInfo(hits=1, misses=1, maxsize=16, currsize=1)
sum2.cache_clear()
- demo2: the Fibonacci sequence
# without memoization - O(2^N)
def fibonacci(n):
    if n < 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

%timeit fibonacci(20)
# output: 5.57 ms per loop

# with memoization - O(N)
import timeit
setup_code = '''
from functools import lru_cache
from __main__ import fibonacci
fibonacci_memoized = lru_cache(maxsize=None)(fibonacci)
'''
results = timeit.repeat('fibonacci_memoized(20)',
                        setup=setup_code,
                        repeat=1000,
                        number=1)
# results are in seconds; convert to microseconds for printing
print("Fibonacci took {:.2f} us".format(min(results) * 1e6))
# output: Fibonacci took 0.01 us
- on-disk caching - joblib (requires pip install)
- uses a smart hashing algorithm for the arguments
- demo
from joblib import Memory

memory = Memory(cachedir='/path/to/cachedir')

@memory.cache
def sum2(a, b):
    return a + b
S2.3 Comprehensions and generators
- list/dict comprehensions and generator expressions are faster than explicit loops
- demo1 - list comprehension and generator
def loop():  # explicit loop
    res = []
    for i in range(100000):
        res.append(i * i)
    return sum(res)

def comprehension():  # list comprehension
    return sum([i * i for i in range(100000)])

def generator():  # generator expression
    return sum(i * i for i in range(100000))

%timeit loop()           # 100 loops, best of 3: 16.1 ms per loop
%timeit comprehension()  # 100 loops, best of 3: 10.1 ms per loop
%timeit generator()      # 100 loops, best of 3: 12.4 ms per loop
- demo2 - dict comprehension
def loop():  # explicit loop
    res = {}
    for i in range(100000):
        res[i] = i
    return res

def comprehension():  # dict comprehension
    return {i: i for i in range(100000)}

%timeit loop()           # 100 loops, best of 3: 13.2 ms per loop
%timeit comprehension()  # 100 loops, best of 3: 12.8 ms per loop
- combining iterators with functions such as filter and map is more memory-efficient
- demo
def map_comprehension(numbers):  # numbers is an iterable
    a = [n * 2 for n in numbers]
    b = [n ** 2 for n in a]
    c = [n ** 0.33 for n in b]
    return max(c)

def map_normal(numbers):
    a = map(lambda n: n * 2, numbers)
    b = map(lambda n: n ** 2, a)
    c = map(lambda n: n ** 0.33, b)
    return max(c)

%load_ext memory_profiler
numbers = range(1000000)
%memit map_comprehension(numbers)
# peak memory: 166.33 MiB, increment: 102.54 MiB
%memit map_normal(numbers)
# peak memory: 71.04 MiB, increment: 0.00 MiB
- note: more functions that return iterators are available in the itertools module
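For example, islice and chain from itertools produce lazy iterators without materializing intermediate lists (a minimal sketch):

```python
from itertools import islice, chain, count

# count() is an infinite iterator; islice lazily takes the first n items
squares = (i * i for i in count())
first_five = list(islice(squares, 5))
print(first_five)  # [0, 1, 4, 9, 16]

# chain concatenates iterables without building an intermediate list
merged = list(chain([1, 2], (3, 4)))
print(merged)  # [1, 2, 3, 4]
```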