OpenCV-Python Official Documentation Translation 12: Performance Measurement and Improvement Techniques

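This article explains how to measure and optimize code performance in image processing. It measures execution time with OpenCV's cv.getTickCount and cv.getTickFrequency functions and compares this approach with Python's time and profile modules. It stresses the importance of optimization, particularly the choice between OpenCV functions and Numpy operations, and gives tips such as avoiding loops, vectorizing code, and exploiting the cache to improve efficiency.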

Performance Measurement and Improvement Techniques

Goal

In image processing, since you are dealing with a large number of operations per second, your code must not only produce the correct result but also produce it as fast as possible. So in this chapter, you will learn:

To measure the performance of your code.
Some tips to improve the performance of your code.
You will see these functions: cv.getTickCount, cv.getTickFrequency, etc.

Apart from OpenCV, Python also provides the time module, which is helpful for measuring execution time. Another module, profile, produces a detailed report on the code, such as how much time each function took and how many times it was called. If you are using IPython, all these features are integrated in a user-friendly manner. We will see some important ones, and for more details, check the links in the Additional Resources section.
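
As a rough illustration of the kind of report described above, the sketch below profiles a small workload with cProfile, the C-implemented sibling of profile in the standard library. The functions slow_square_sum and run_workload are hypothetical stand-ins invented for this example; they are not part of the original tutorial.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Sum of squares with an explicit Python loop (deliberately slow)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_workload():
    """Hypothetical workload: call the slow function a few times."""
    return [slow_square_sum(20000) for _ in range(5)]


# Collect timing data only while the workload runs.
profiler = cProfile.Profile()
profiler.enable()
results = run_workload()
profiler.disable()

# Format the report into a string, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats()
report = buffer.getvalue()
print(report)
```

Each row of the printed report lists the number of calls (ncalls) and the time spent in one function, which is usually enough to spot the first bottleneck.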

Measuring Performance with OpenCV

The cv.getTickCount function returns the number of clock-cycles after a reference event (like the moment the machine was switched ON) to the moment this function is called. So if you call it before and after the function execution, you get the number of clock-cycles used to execute a function.

The cv.getTickFrequency function returns the frequency of clock-cycles, or the number of clock-cycles per second. So to find the time of execution in seconds, you can do the following:

import cv2 as cv

e1 = cv.getTickCount()
# your code execution
e2 = cv.getTickCount()
elapsed = (e2 - e1) / cv.getTickFrequency()  # seconds; the name avoids shadowing the time module

We will demonstrate with the following example, which applies median filtering with odd kernel sizes from 5 to 47 (range(5,49,2) stops before 49). Don't worry about what the result will look like - that is not our goal:

import cv2 as cv

img1 = cv.imread("messi5.jpg")
e1 = cv.getTickCount()
for i in range(5,49,2):
    img1 = cv.medianBlur(img1,i)  # odd kernel sizes 5, 7, ..., 47
e2 = cv.getTickCount()
t = (e2-e1)/cv.getTickFrequency()
print(t)
# Result I got is 0.521107655 seconds

Note

You can do the same thing with the time module. Instead of cv.getTickCount, use the time.time() function. Then take the difference of the two times.
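
A minimal sketch of the time-module variant follows. The function busy_work is a hypothetical stand-in for the code being measured. The sketch also shows time.perf_counter(), a standard-library clock intended for measuring short durations, which the original note does not mention.

```python
import time


def busy_work(n):
    """Hypothetical stand-in for the code being measured."""
    return sum(i * i for i in range(n))


# Variant from the note: take the difference of two time.time() readings.
start = time.time()
result = busy_work(200000)
elapsed = time.time() - start  # wall-clock seconds as a float

# time.perf_counter() is a monotonic clock with the highest available resolution.
start_pc = time.perf_counter()
busy_work(200000)
elapsed_pc = time.perf_counter() - start_pc

print(f"time.time():         {elapsed:.6f} s")
print(f"time.perf_counter(): {elapsed_pc:.6f} s")
```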

Default Optimization in OpenCV

Many OpenCV functions are optimized using SSE2, AVX, etc. The library also contains the unoptimized code. So if our system supports these features, we should exploit them (almost all modern processors support them). Optimization is enabled by default at compile time, so OpenCV runs the optimized code when it is enabled and the unoptimized code otherwise. You can use cv.useOptimized() to check whether it is enabled and cv.setUseOptimized() to enable or disable it. Let's see a simple example.

# check if optimization is enabled
In [5]: cv.useOptimized()
Out[5]: True
In [6]: %timeit res = cv.medianBlur(img,49)
10 loops, best of 3: 34.9 ms per loop
# Disable it
In [7]: cv.setUseOptimized(False)
In [8]: cv.useOptimized()
Out[8]: False
In [9]: %timeit res = cv.medianBlur(img,49)
10 loops, best of 3: 64.1 ms per loop

As you can see, optimized median filtering is nearly 2x faster than the unoptimized version. If you check its source, you can see that median filtering is SIMD optimized. So you can call cv.setUseOptimized(True) at the top of your code to enable optimization (remember it is enabled by default).

Measuring Performance in IPython

Sometimes you may need to compare the performance of two similar operations. IPython gives you the magic command %timeit for this. It runs the code several times to get more accurate results. Once again, it is suitable for measuring single lines of code.

For example, do you know which of the following operations is better: x = 5; y = x**2, x = 5; y = x*x, x = np.uint8([5]); y = x*x, or y = np.square(x)? We will find out with %timeit in the IPython shell.

In [10]: x = 5
In [11]: %timeit y=x**2
10000000 loops, best of 3: 58.3 ns per loop
In [12]: %timeit y=x*x
10000000 loops, best of 3: 58.3 ns per loop
In [15]: z = np.uint8([5])
In [17]: %timeit y=z*z
1000000 loops, best of 3: 1.25 us per loop
In [19]: %timeit y=np.square(z)
1000000 loops, best of 3: 1.16 us per loop

You can see that x = 5; y = x*x is fastest, around 20x faster than the Numpy version. If you also count the array creation, the gap may reach 100x. Cool, right? (Numpy devs are working on this issue.)

Note

Python scalar operations are faster than Numpy scalar operations. So for operations including one or two elements, Python scalar is better than Numpy arrays. Numpy has the advantage when the size of the array is a little bit bigger.
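
In a plain Python script, the standard-library timeit module can play the role of the %timeit magic. The sketch below repeats the four candidates; the loop count and the dictionary labels are choices made for this illustration, and the lambda wrappers add a small constant overhead to every measurement.

```python
import timeit

import numpy as np

x = 5
z = np.uint8([5])

candidates = {
    "x**2 (Python int)": lambda: x ** 2,
    "x*x (Python int)": lambda: x * x,
    "z*z (Numpy array)": lambda: z * z,
    "np.square(z)": lambda: np.square(z),
}

number = 100000  # calls per timing run
timings = {}
for label, func in candidates.items():
    # repeat() returns one total per run; the minimum is the least noisy.
    best = min(timeit.repeat(func, number=number, repeat=3))
    timings[label] = best / number  # seconds per call
    print(f"{label:20s} {timings[label] * 1e9:8.1f} ns per call")
```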

We will try one more example. This time, we will compare the performance of cv.countNonZero() and np.count_nonzero() for the same image.

In [35]: %timeit z = cv.countNonZero(img)
100000 loops, best of 3: 15.8 us per loop
In [36]: %timeit z = np.count_nonzero(img)
1000 loops, best of 3: 340 us per loop

See, the OpenCV function is more than 20x faster than the Numpy function (15.8 us versus 340 us).

Note

Normally, OpenCV functions are faster than Numpy functions. So for the same operation, OpenCV functions are preferred. But there can be exceptions, especially when Numpy works with views instead of copies.

More IPython magic commands

There are several other magic commands for measuring performance, profiling, line profiling, memory measurement, etc. They are all well documented, so only links to those docs are provided here. Interested readers are encouraged to try them out.

Performance Optimization Techniques

There are several techniques and coding methods to exploit maximum performance of Python and Numpy. Only relevant ones are noted here and links are given to important sources. The main thing to be noted here is, first try to implement the algorithm in a simple manner. Once it is working, profile it, find the bottlenecks, and optimize them.

Avoid using loops in Python as much as possible, especially double/triple loops etc. They are inherently slow.
Vectorize the algorithm/code to the maximum extent possible, because Numpy and OpenCV are optimized for vector operations.
Exploit the cache coherence.
Never make copies of an array unless it is necessary. Try to use views instead. Array copying is a costly operation.
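
The first, second, and fourth tips above can be sketched with Numpy alone. The thresholding task, the function names, and the region coordinates below are hypothetical choices made for this illustration; the point is only the contrast between a double Python loop and a whole-array expression, and between a view and a copy.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(120, 160), dtype=np.uint8)


def threshold_loop(image, level):
    """Binary threshold with a double Python loop (slow)."""
    out = np.zeros_like(image)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            if image[r, c] > level:
                out[r, c] = 255
    return out


def threshold_vectorized(image, level):
    """Same threshold written as one whole-array expression."""
    return np.where(image > level, 255, 0).astype(np.uint8)


looped = threshold_loop(img, 127)
vectorized = threshold_vectorized(img, 127)
print("same result:", np.array_equal(looped, vectorized))

# Basic slicing returns a view: no pixel data is copied.
roi_view = img[10:50, 20:80]
# .copy() allocates new memory and duplicates the data.
roi_copy = img[10:50, 20:80].copy()
print("view shares memory:", np.shares_memory(img, roi_view))
print("copy shares memory:", np.shares_memory(img, roi_copy))

# Writing through the view changes the original array.
roi_view[:] = 0
print("original changed via view:", int(img[10, 20]) == 0)
```

Because a view aliases the original data, it is cheap to create but any write through it is visible in the source array; take a copy only when that aliasing would be a problem.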

If your code is still slow after doing all of these operations, or if the use of large loops is inevitable, use additional libraries like Cython to make it faster.
