How to optimize Python code: profile and optimize your Python code

Latest recommended article published on 2022-03-15 15:10:02

weixin_39623805


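This article uses a worked example to show how the Python profiling tools cProfile and line_profiler locate a program's bottlenecks, and how a better algorithm and the numpy library speed the program up. It walks through the analysis, from the suspected disk access to the random.choice function, and brings the running time down from 11.362 seconds to about 1 second. It also stresses that optimization should be targeted and lists a few further performance tips.

Summary generated by CSDN using intelligent technology.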

[Are you losing your time in a loop?](https://popkey.co/u/q4omg?ref=embed)

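Profiling

As long as you find the bottlenecks and use better algorithms and the right tools, Python is fast enough for production needs in most cases.

Reading the source to find out why a program is slow is inefficient. Even code as trivial as the example below can be a puzzle: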

    """Sorting a large, randomly generated string and writing it to disk"""
    import random

    def write_sorted_letters(nb_letters=10**7):
        random_string = ''
        for i in range(nb_letters):
            random_string += random.choice('abcdefghijklmnopqrstuvwxyz')
        sorted_string = sorted(random_string)
        with open("sorted_text.txt", "w") as sorted_text:
            for character in sorted_string:
                sorted_text.write(character)

    write_sorted_letters()

The bottleneck is obviously disk access, right? Well, let's check with a profiler.

Run from the command line:

python -m cProfile -s tottime your_program.py

The result:

         40000054 function calls in 11.362 seconds

   Ordered by: internal time

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   10000000    4.137    0.000    5.166    0.000 random.py:273(choice)
          1    3.442    3.442   11.337   11.337 sort.py:5(write_sorted_letters)
          1    1.649    1.649    1.649    1.649 {sorted}
   10000000    0.960    0.000    0.960    0.000 {method 'write' of 'file' objects}
   10000000    0.547    0.000    0.547    0.000 {method 'random' of '_random.Random' objects}
   10000000    0.482    0.000    0.482    0.000 {len}
          1    0.121    0.121    0.121    0.121 {range}
          1    0.021    0.021   11.362   11.362 sort.py:1(<module>)
   ...

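-s tottime sorts the results by the time spent inside each function itself, excluding the functions it calls. The first few entries are the biggest consumers.

Looking at the tottime column, the choice() function of the random module accounts for roughly a third of the total running time (4.137 s out of 11.362 s).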
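The same kind of report can be produced from inside Python, which makes it easier to experiment with the sort order. The following is a minimal sketch, not the original program: build_random_string and its 20,000-letter size are hypothetical, scaled down so the run finishes quickly.

```python
import cProfile
import io
import pstats
import random


def build_random_string(nb_letters=20000):
    """Hypothetical, scaled-down stand-in for the string-building loop."""
    random_string = ''
    for _ in range(nb_letters):
        random_string += random.choice('abcdefghijklmnopqrstuvwxyz')
    return random_string


profiler = cProfile.Profile()
result = profiler.runcall(build_random_string)

# Sort by internal time, like `-s tottime` on the command line,
# and print only the five most expensive entries.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats('tottime').print_stats(5)
print(report.getvalue())
```

Sorting by 'cumtime' instead ranks functions by the time spent in them including everything they call, which is often a better starting point for finding the expensive call path.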

Before optimizing, let's dig a little deeper.

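Targeted profiling

The command above profiles the whole program. To be more precise, wrap the part you want to profile between the following two snippets:

    import cProfile
    cp = cProfile.Profile()
    cp.enable()

and

    cp.disable()
    cp.print_stats()

The output is similar to the previous one, with less noise.

Since it is hard to know in advance where a program spends its time, the usual strategy is to profile the whole program first and then narrow the profiled region step by step.

For more about the cProfile and profile modules, see the Python standard library documentation.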
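Put together, the enable/disable pattern looks roughly like the sketch below. The functions load_values and hot_path are hypothetical placeholders for "code we do not care about" and "code we want to measure"; the report is written to a string buffer only so that it can be inspected afterwards.

```python
import cProfile
import io
import pstats


def load_values():
    """Hypothetical cheap step that should stay out of the report."""
    return list(range(1000))


def hot_path(values):
    """Hypothetical expensive step that we want to measure."""
    return sorted(values, key=lambda v: -v)


values = load_values()        # runs before profiling starts

cp = cProfile.Profile()
cp.enable()
ordered = hot_path(values)    # only this call is recorded
cp.disable()

buffer = io.StringIO()
pstats.Stats(cp, stream=buffer).sort_stats('tottime').print_stats()
report = buffer.getvalue()
print(report)
```

Only the calls made between enable() and disable() appear in the report, so load_values is absent from it.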

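Line-by-line profiling

Sometimes we need to profile the code line by line. line_profiler does that. Install it with:

pip install line_profiler

Then decorate the function you want to profile with @profile: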

    @profile
    def write_sorted_letters(nb_letters=10**7):
        ...

Then run from the command line:

kernprof -l -v your_program.py

-l enables line-by-line profiling

-v displays the results immediately

The result:

Total time: 21.4412 s
File: ./sort.py
Function: write_sorted_letters at line 5

Line #      Hits       Time    Per Hit  % Time  Line Contents
================================================================
     5                                          @profile
     6                                          def write_sorted_letters(nb_letters=10**7):
     7         1          1        1.0     0.0      random_string = ''
     8  10000001    3230206        0.3    15.1      for _ in range(nb_letters):
     9  10000000    9352815        0.9    43.6          random_string += random.choice('abcdefghijklmnopqrstuvwxyz')
    10         1    1647254  1647254.0     7.7      sorted_string = sorted(random_string)
    12         1       1334     1334.0     0.0      with open("sorted_text.txt", "w") as sorted_text:
    13  10000001    2899712        0.3    13.5          for character in sorted_string:
    14  10000000    4309926        0.4    20.1              sorted_text.write(character)

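Note that the profiler itself made the program nearly twice as slow (21.4 s instead of 11.4 s), but in exchange we can see the cost of every line.

Profiling large multithreaded web applications

The tools above are simple and effective for profiling single-threaded code during local development. Large multithreaded applications are a different matter, and for those we need the excellent profiling module.

Install it with sudo pip install profiling and run it with profiling your_program.py. Remember to remove @profile, which only works under line_profiler.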

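When the program finishes, it presents a detailed, interactive tree view.

For a long-running program such as a web server, start it like this to watch the profile in real time: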

profiling live-profile your_server_program.py
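As noted above, the @profile decorator exists only when the script runs under line_profiler, so any other way of running the file fails with a NameError. Instead of deleting the decorator each time, one option is a do-nothing fallback. This is a sketch that relies on kernprof -l placing profile in builtins; count_letters is an illustrative stand-in for the function being profiled.

```python
import builtins

# kernprof -l injects `profile` into builtins. When the script is run any
# other way, fall back to a decorator that returns the function unchanged.
if not hasattr(builtins, 'profile'):
    def profile(func):
        return func


@profile
def count_letters(text):
    """Illustrative stand-in for the function being profiled."""
    counts = {}
    for character in text:
        counts[character] = counts.get(character, 0) + 1
    return counts


print(count_letters('banana'))
```

With the fallback in place the same file runs under plain python, under kernprof, and under other profilers without edits.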


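Optimization

Now that we know how the program spends its CPU time, we can optimize accordingly.

A small warning:

Optimize only when necessary, because optimized code is usually harder to read and to maintain.

Optimization trades maintainability for performance.

numpy to the rescue

The random.choice function appears to slow us down considerably.

Let's replace it with the similar function from the well-known numpy library. Its arguments are slightly different: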

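    """Sorting a large, randomly generated string and writing it to disk"""
    import numpy as np

    def write_sorted_letters(nb_letters=10**7):
        # One-byte letters, so the sorted array converts directly to text
        letters = np.frombuffer(b'abcdefghijklmnopqrstuvwxyz', dtype='S1')
        # A single call draws all the letters at once
        random_letters = np.random.choice(letters, nb_letters)
        random_letters.sort()
        # tobytes() replaces the deprecated tostring() seen in the profile below
        sorted_string = random_letters.tobytes().decode('ascii')
        with open("sorted_text.txt", "w") as sorted_text:
            for character in sorted_string:
                sorted_text.write(character)

    write_sorted_letters()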

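numpy's numerical functions are powerful and fast, and some of them can even run in parallel. If you do not have it yet, install it with pip install numpy.

Let's look at the new profile: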

         10011861 function calls (10011740 primitive calls) in 3.357 seconds

   Ordered by: internal time

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   10000000    1.272    0.000    1.272    0.000 {method 'write' of 'file' objects}
          1    1.268    1.268    3.321    3.321 numpy_sort.py:5(write_sorted_letters)
          1    0.657    0.657    0.657    0.657 {method 'sort' of 'numpy.ndarray' objects}
          1    0.120    0.120    0.120    0.120 {method 'choice' of 'mtrand.RandomState' objects}
          4    0.009    0.002    0.047    0.012 __init__.py:1(<module>)
          1    0.003    0.003    0.003    0.003 {method 'tostring' of 'numpy.ndarray' objects}
   ...

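Great, about 3 times faster (3.357 s vs 11.362 s).

Now, in the tottime column, writing the file is the biggest bottleneck. Let's fix it by replacing

    with open("sorted_text.txt", "w") as sorted_text:
        for character in sorted_string:
            sorted_text.write(character)

with

    with open("sorted_text.txt", "w") as sorted_text:
        sorted_text.write(sorted_string)

This writes the whole string in one call instead of character by character. It removes the overhead of ten million separate write() calls and lets the file buffer and the disk cache work on large chunks.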
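The effect of that change can be measured in isolation with timeit. The sketch below is an illustration rather than the article's benchmark: the 104,000-character text, the temporary directory, and the function names are all assumptions chosen to keep the run short.

```python
import os
import tempfile
import timeit

text = 'abcdefghijklmnopqrstuvwxyz' * 4000   # 104,000 characters


def write_per_character(path):
    """One write() call per character, as in the first version."""
    with open(path, 'w') as handle:
        for character in text:
            handle.write(character)


def write_at_once(path):
    """A single write() call for the whole string."""
    with open(path, 'w') as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sorted_text.txt')
    slow = timeit.timeit(lambda: write_per_character(path), number=5)
    fast = timeit.timeit(lambda: write_at_once(path), number=5)
    with open(path) as handle:
        written = handle.read()

print('per character: %.4f s' % slow)
print('single write:  %.4f s' % fast)
```

Both versions produce the same file, so the difference in the two timings comes from how the writes are issued, not from what is written.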

Finally, simply time the whole program:

time python your_program.py

The output:

real 0m0.874s
user 0m0.852s
sys 0m0.280s

Less than one second!
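The time command measures the whole process, including interpreter start-up and imports. To time a single step from inside the program, a small helper built on time.perf_counter is usually enough. This is a sketch: the stopwatch helper and the sorted workload are hypothetical and not part of the original program.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results):
    """Store the wall-clock duration of the enclosed block in `results`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with stopwatch('sort', timings):
    sorted_letters = sorted('profiling' * 1000)

print('sort took %.6f s' % timings['sort'])
```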

Other performance tips

Keep these latency numbers in mind:

Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                        14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                        20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us            ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms    ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms    20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms    80x memory, 20X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms
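The ratios matter more than the absolute values. As a small worked example, the snippet below copies four rows from the table and computes how many of the cheaper operation fit into one of the more expensive one. The dictionary and the ratio helper are illustrative names.

```python
# Latencies in nanoseconds, copied from the table above.
LATENCY_NS = {
    'L1 cache reference': 0.5,
    'Main memory reference': 100,
    'Read 4K randomly from SSD': 150000,
    'Disk seek': 10000000,
}


def ratio(slow, fast):
    """How many `fast` operations fit into one `slow` operation."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


print(ratio('Main memory reference', 'L1 cache reference'))  # 200.0
print(ratio('Disk seek', 'Main memory reference'))           # 100000.0
```

A single disk seek costs as much as one hundred thousand main memory references, which is why reducing the number of separate I/O operations tends to pay off.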
