Numba User Manual
Improving performance
------------------------------------------------------------------------------------------------------------------------------------------------------
# Example 1: Loops
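import numpy as np
from numba import njit

@njit
def fun_np(x):
    # NumPy vectorized operations are already efficient; reported time: 0.581 s
    return np.cos(x) ** 2 + np.sin(x) ** 2

@njit
def fun_loops(x):
    # Numba also compiles explicit loops efficiently; reported time: 0.670 s
    r = np.empty_like(x)
    n = len(x)
    for i in range(n):
        r[i] = np.cos(x[i]) ** 2 + np.sin(x[i]) ** 2
    return r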
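Below is a minimal benchmarking sketch for the two functions above. It assumes Numba is installed. If it is not, the script falls back to a hypothetical no-op `njit` stand-in so that it still runs as plain Python. Timings will differ from the figures above, depending on the machine and the array size. Note the warm-up call: the first call to an `@njit` function includes JIT compilation time, so that call should not be timed.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Hypothetical fallback for this sketch: run as plain Python without Numba
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]       # used as bare @njit
        return lambda f: f       # used as @njit(...)

@njit
def fun_np(x):
    return np.cos(x) ** 2 + np.sin(x) ** 2

@njit
def fun_loops(x):
    r = np.empty_like(x)
    for i in range(len(x)):
        r[i] = np.cos(x[i]) ** 2 + np.sin(x[i]) ** 2
    return r

def best_time(f, x, repeat=3):
    f(x)  # warm-up: the first call triggers JIT compilation, which we don't want to time
    times = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        f(x)
        times.append(time.perf_counter() - t0)
    return min(times)

x = np.random.rand(100_000)
assert np.allclose(fun_np(x), fun_loops(x))  # both compute cos^2 + sin^2 == 1
print(f"fun_np:    {best_time(fun_np, x):.4f} s")
print(f"fun_loops: {best_time(fun_loops, x):.4f} s")
```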
------------------------------------------------------------------------------------------------------------------------------------------------------
# Example 2: fastmath (fastmath=False keeps strict IEEE 754 semantics)
@njit(fastmath=False)
def sum_0(A):
    acc = 0.
    for x in A:
        # Without fastmath, this loop must accumulate in strict order
        acc += np.sqrt(x)
    return acc  # 35.2 ms

@njit(fastmath=True)
def sum_0_fast(A):
    acc = 0.
    for x in A:
        # With fastmath, the reduction can be vectorized because floating-point reassociation is allowed
        acc += np.sqrt(x)
    return acc  # 17.8 ms
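The reason fastmath=False forces strict order is that floating-point addition is not associative, so reordering a sum can change its result. The sketch below compares a strict left-to-right sum with a hypothetical four-lane model of a vectorized reduction, in which four partial sums are combined at the end. The lane count and the function names are assumptions for illustration. They do not describe exactly how LLVM vectorizes this loop.

```python
import numpy as np

def sum_sequential(a):
    # Strict left-to-right accumulation: the order required without fastmath
    acc = 0.0
    for v in a:
        acc += v
    return acc

def sum_four_lanes(a):
    # Hypothetical model of a vectorized reduction: four independent partial
    # sums (one per SIMD lane), combined at the end, which reorders the additions
    lanes = [0.0, 0.0, 0.0, 0.0]
    for i, v in enumerate(a):
        lanes[i % 4] += v
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3])

# Floating-point addition is not associative
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6

a = np.sqrt(np.random.rand(100_000))
s1, s2 = sum_sequential(a), sum_four_lanes(a)
print(s1, s2, abs(s1 - s2))  # the two sums typically differ only in the last few bits
```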
------------------------------------------------------------------------------------------------------------------------------------------------------
# Example 3: fastmath can also be given a set of LLVM fast-math flags:
def add_llvm(x, y):
    return (x - y) + y
print(njit(fastmath=False)(add_llvm)(0, np.inf)) # nan
print(njit(fastmath=True) (add_llvm)(0, np.inf)) # 0.0
print(njit(fastmath={'reassoc', 'nsz'})(add_llvm)(0, np.inf)) # 0.0
print(njit(fastmath={'reassoc'}) (add_llvm)(0, np.inf)) # nan
print(njit(fastmath={'nsz'}) (add_llvm)(0, np.inf)) # nan
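The printed results follow from IEEE 754 arithmetic, which plain Python floats also use. Under strict semantics, 0 - inf is -inf and -inf + inf is nan. When reassociation is allowed, LLVM may fold (x - y) + y into x, which gives 0.0. The sketch below also suggests why the fold seems to need 'nsz' (no signed zeros) as well. For x = -0.0 and y = 0.0 the strict result is +0.0, so replacing the expression with x would flip the sign of zero. This explanation is an inference that is consistent with the printed results above. It is not stated in the original notes.

```python
import math

def add_llvm(x, y):
    return (x - y) + y

# Strict IEEE 754: 0 - inf = -inf, then -inf + inf = nan
print(add_llvm(0.0, math.inf))  # nan

# Folding (x - y) + y into x also changes signed zeros:
x, y = -0.0, 0.0
print(add_llvm(x, y))  # 0.0 (strict result is +0.0)
print(x)               # -0.0 (what the folded expression would return)
```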
------------------------------------------------------------------------------------------------------------------------------------------------------
# Example 4: with parallel=True, operations in the code that can be parallelized run in parallel on multiple native threads (without the GIL)
@njit(parallel=True)
def fun_parallel(x):
    # Roughly 5x faster than the equivalent NumPy code and 6x faster than standard @njit
    return np.cos(x) ** 2 + np.sin(x) ** 2  # 112 ms
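The idea of running the work "in parallel on multiple native threads" can be made concrete with a rough analogue built from NumPy and `concurrent.futures`. This is not Numba's actual implementation. The array is split into chunks, and each chunk is evaluated on its own thread. The sketch assumes that NumPy ufuncs release the GIL while they compute. `fun_serial`, `fun_threaded` and `n_workers` are hypothetical names. With parallel=True, Numba performs this kind of splitting automatically inside compiled code.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def fun_serial(x):
    return np.cos(x) ** 2 + np.sin(x) ** 2

def fun_threaded(x, n_workers=None):
    # Rough analogue only: split x into contiguous chunks, one per worker thread
    n_workers = n_workers or os.cpu_count() or 1
    out = np.empty_like(x)
    bounds = np.linspace(0, len(x), n_workers + 1, dtype=int)

    def work(k):
        lo, hi = bounds[k], bounds[k + 1]
        out[lo:hi] = fun_serial(x[lo:hi])  # each thread writes its own slice

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(work, range(n_workers)))  # wait for all chunks to finish
    return out

x = np.random.rand(1_000_000)
print(np.allclose(fun_threaded(x), fun_serial(x)))  # True
```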
------------------------------------------------------------------------------------------------------------------------------------------------------
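# Example 5: explicit parallel loops with prange
Numba's parallel execution also supports explicit parallel loop declarations, similar to OpenMP. The prange function behaves like range. If parallel=True is not set, prange is simply an alias for range.
Assuming it is safe to accumulate the sum out of order, the loop over n can be parallelized with prange: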
from numba import njit, prange

@njit(parallel=True)
def fun_parallel_loops(A):
    # Each thread accumulates its own partial sum, then a cross-thread
    # reduction combines them into the returned result
    n = len(A)
    acc = 0.
    for i in prange(n):
        acc += np.sqrt(A[i])
    return acc  # 9.81 ms

@njit(parallel=True, fastmath=True)
def fun_parallel_fast(A):
    n = len(A)
    acc = 0.
    for i in prange(n):
        acc += np.sqrt(A[i])
    return acc  # 5.37 ms
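The partial-sum scheme described in the comment in fun_parallel_loops can be modeled in plain NumPy. The model splits A into one chunk per thread, sums each chunk, and then combines the partial sums. This is a sequential simulation, and the names parallel_style_sum and n_threads are hypothetical. It only shows why the result can differ slightly from a strict left-to-right sum. That difference is why the loop is valid only when out-of-order accumulation is acceptable.

```python
import numpy as np

def parallel_style_sum(A, n_threads=4):
    # Sequential model of a parallel reduction (hypothetical helper):
    # each "thread" gets one contiguous chunk and accumulates its own partial sum
    partials = []
    for chunk in np.array_split(A, n_threads):
        acc = 0.0
        for v in chunk:
            acc += np.sqrt(v)
        partials.append(acc)
    # Cross-thread reduction: combine the per-thread partial sums
    total = 0.0
    for p in partials:
        total += p
    return total

A = np.random.rand(100_000)
print(parallel_style_sum(A), np.sqrt(A).sum())  # equal up to rounding differences
```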
------------------------------------------------------------------------------------------------------------------------------------------------------
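Intel SVML
Intel provides a Short Vector Math Library (SVML) that contains a large number of optimized transcendental functions, which can be used as compiler intrinsics. To use it with Numba, install the icc_rt package: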
conda install -c numba icc_rt
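After installing the package, you can check whether Numba picked up SVML. The sketch below assumes the attribute `numba.config.USING_SVML`, which recent Numba versions appear to expose; this has not been verified here. The `getattr` fallback returns None if the attribute does not exist, and the function also returns None if Numba is not installed. Running `numba -s` on the command line prints a system report that also includes SVML information.

```python
def svml_status():
    """Return Numba's reported SVML usage, or None if it cannot be determined."""
    try:
        import numba
    except ImportError:
        return None  # Numba is not installed
    # Assumption: recent Numba versions expose config.USING_SVML
    return getattr(numba.config, "USING_SVML", None)

print("SVML in use:", svml_status())
```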
------------------------------------------------------------------------------------------------------------------------------------------------------