Python code profiling and accelerating your calculations with numba

Just in case you can’t read it here, you can also find this post on my personal blog.

You wrote up your excellent idea as a Python program/module, but you are unsatisfied with its performance. Chances are high that most of us have been there at least once. I was there last week.

I found an excellent method for outlier detection, the Extended Isolation Forest (eIF). It was initially written in Python and later optimised in Cython (using C++). The C++ version is ~40x faster than the vanilla Python version, but it lacks the possibility of saving the model (which is crucial for my project). Since adding model saving to the C++ version is a rather complicated business, I decided to optimise the Python code instead. Initially, I hoped for a ~5-10x speed improvement. The final effect surprised me: the rewritten Python code was ~40x faster than the initial version, matching the C++ version's performance!

Code profiling

Traditionally, I’ve been relying on %timeit, which reports the precise execution time of Python expressions.

%timeit F3.fit(X)
# 1.25 s ± 792 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

As awesome as %timeit is, it won’t really tell you which parts of your code are time-consuming. At least not directly. For that you’ll need something more advanced.

Code profiling became easier thanks to line_profiler. You can install, load and use it in a Jupyter notebook as follows:

# install line_profiler in your system
!pip install line_profiler
# load the module into current Jupyter notebook
%load_ext line_profiler
# evaluate populate_nodes function of F3.fit program
%lprun -f F3.populate_nodes F3.fit(X)

The example above tells you that although line 134 takes just 11.7 µs per single execution, overall it accounts for 42.5% of the execution time, as it's executed over 32k times. So starting optimisation of the code from this single line could have a dramatic effect on overall execution time.

Code optimisation

The first thing I noticed in the original Python code was that, in order to calculate the outlier score, individual samples were streamed through the individual trees of the iForest.

for i in range(len(X_in)):
    h_temp = 0
    for j in range(self.ntrees):
        h_temp += PathFactor(X_in[i], self.Trees[j]).path*1.0  # compute path length for each point
    Eh = h_temp/self.ntrees  # average path length travelled by the point across all trees
    S[i] = 2.0**(-Eh/self.c)  # anomaly score
return S

Since these are operations on arrays, a lot of time can be saved if either all samples are processed by each tree in turn, or each sample is processed by all trees at once. Implementing this wasn't difficult and, combined with cleaning unnecessary variables & classes out of the code, resulted in a ~6-7x speed-up.

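
Restructured this way, the scoring loop streams the whole sample array through one tree at a time. A minimal sketch of the idea (here `compute_paths` is a hypothetical stand-in for eIF's per-tree path-length computation, vectorised over all samples):

```python
import numpy as np

def anomaly_scores(X, trees, c, compute_paths):
    """Score all samples at once: stream the whole array through each
    tree in turn, instead of one sample through one tree at a time."""
    Eh = np.zeros(len(X))
    for tree in trees:
        # compute_paths(X, tree) returns the path length of every sample for one tree
        Eh += compute_paths(X, tree)
    Eh /= len(trees)          # average path length across all trees
    return 2.0 ** (-Eh / c)   # anomaly score, as in the original loop
```
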
Speeding up array operations with numba

Further improvements were more modest and required detailed code profiling. As mentioned above, a single line took 42% of the overall execution time. Upon closer inspection, I realised that calling X.min(axis=0) and X.max(axis=0) was really time-consuming.

x = np.random.random(size=(256, 12))
%timeit x.min(axis=0), x.max(axis=0)
# 15.6 µs ± 43.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Python code can be optimised with numba. For example, calculating the min and max simultaneously using the numba just-in-time compiler results in over 7x faster execution!

from numba import jit

@jit
def minmax(x):
    """np.min(x, axis=0), np.max(x, axis=0) for a 2D array, but faster"""
    m, n = len(x), len(x[0])
    mi, ma = np.empty(n), np.empty(n)
    mi[:] = ma[:] = x[0]
    for i in range(1, m):
        for j in range(n):
            if x[i, j] > ma[j]: ma[j] = x[i, j]
            elif x[i, j] < mi[j]: mi[j] = x[i, j]
    return mi, ma
%timeit minmax(x)
# 2.19 µs ± 4.61 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# make sure the results are the same
np.allclose(minmax(x), (x.min(axis=0), x.max(axis=0)))

Apart from that, there were several other parts that could be optimised with numba. You can have a look at eif_new.py and compare it with the older and C++ versions using this notebook. If you want to know the details, just comment below — I'll be more than happy to discuss them.

If you're looking for ways of speeding up array operations, definitely check out numexpr besides numba. The eIF case didn't really need numexpr optimisations, but it's a really impressive project and I can imagine many people could benefit from it. So spread the word!
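
For completeness, a minimal numexpr sketch (not taken from the eIF code): numexpr compiles an expression string and evaluates it over the arrays in multithreaded, cache-sized chunks, which often beats the equivalent chain of NumPy temporaries:

```python
import numpy as np
import numexpr as ne

a = np.random.random(10**6)
b = np.random.random(10**6)

# one pass over the data, no intermediate temporary arrays
r = ne.evaluate("2*a + 3*b")
```
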

Original post: https://medium.com/@lpryszcz/python-code-profiling-and-accelerating-your-calculations-with-numba-50d9e5535b08
