No speedup from jit in Python: does numba @jit slow down pure Python?

So I need to improve the execution time of a script I have been working on. I started with the numba jit decorator to try parallel computing, but it throws me:

KeyError: "Does not support option: 'parallel'"

So I decided to test nogil to see whether it unlocks the full capabilities of my CPU, but it was slower than pure Python. I don't understand why this happened, and if someone can help me or guide me I will be very grateful.

import numpy as np
from numba import *

@jit(['float64[:,:],float64[:,:]'], '(n,m),(n,m)->(n,m)', nogil=True)
def asd(x, y):
    return x + y

u = np.random.random(100)
w = np.random.random(100)

%timeit asd(u, w)
%timeit u + w

10000 loops, best of 3: 137 µs per loop
The slowest run took 7.13 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.75 µs per loop

Solution

You cannot expect numba to outperform numpy on such a simple vectorized operation. Also, your comparison isn't exactly fair, since the numba timing includes the cost of the outer function call. If you sum a larger array, you'll see that the performance of the two converges, and what you are seeing is just overhead on a very fast operation:

import numpy as np
import numba as nb

@nb.njit
def asd(x, y):
    return x + y

def asd2(x, y):
    return x + y

u = np.random.random(10000)
w = np.random.random(10000)

%timeit asd(u, w)
%timeit asd2(u, w)

The slowest run took 17796.43 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.06 µs per loop
The slowest run took 29.94 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.11 µs per loop

As far as parallel functionality goes, for this simple operation you can use nb.vectorize:

@nb.vectorize([nb.float64(nb.float64, nb.float64)], target='parallel')
def asd3(x, y):
    return x + y

u = np.random.random((100000, 10))
w = np.random.random((100000, 10))

%timeit asd(u, w)
%timeit asd2(u, w)
%timeit asd3(u, w)

But again, if you operate on small arrays, you are going to be paying the overhead of thread dispatch. For the array sizes above, I see the parallel version giving me a 2x speedup.
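To make that dispatch overhead concrete, here is a minimal sketch (my addition, not part of the original answer; the name add_parallel and the array sizes are illustrative assumptions) that times the parallel ufunc against plain numpy on a small and a large input:

import numpy as np
import numba as nb

@nb.vectorize([nb.float64(nb.float64, nb.float64)], target='parallel')
def add_parallel(x, y):
    return x + y

small = np.random.random(100)          # thread dispatch dominates at this size
large = np.random.random(10_000_000)   # actual compute dominates at this size

add_parallel(small, small)  # warm-up call so compilation isn't timed

%timeit add_parallel(small, small)   # expect this to lose to plain numpy
%timeit small + small
%timeit add_parallel(large, large)   # expect this to be competitive or ahead
%timeit large + large

On the small input, the fixed cost of handing work to the thread pool swamps the addition itself; only on the large input can the parallel target pay for that cost.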

Where numba really shines is in operations that are difficult to express with numpy broadcasting, or operations that would result in a lot of temporary intermediate array allocations.
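As a concrete illustration of that last point (my own sketch, not from the original answer; numpy_version and numba_version are hypothetical names): a compound expression such as np.sin(x)**2 + np.cos(x)**2 makes numpy allocate a temporary array for every intermediate result, while an njit-compiled loop fuses the whole computation into a single pass with one output allocation:

import numpy as np
import numba as nb

def numpy_version(x):
    # sin, cos, both squares, and the sum each materialize a temporary array
    return np.sin(x)**2 + np.cos(x)**2

@nb.njit
def numba_version(x):
    out = np.empty_like(x)
    # one fused loop: no intermediate arrays are allocated
    for i in range(x.shape[0]):
        s = np.sin(x[i])
        c = np.cos(x[i])
        out[i] = s * s + c * c
    return out

x = np.random.random(1_000_000)
numba_version(x)  # warm-up call triggers compilation

%timeit numpy_version(x)
%timeit numba_version(x)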
