How to use numpy in Python: how can I use numpy.where() to speed up my numpy loop

Latest recommended article published on 2024-01-02 10:28:20

weixin_39774044


Article tags: how to use numpy in Python

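Asking how to implement this using only np.where is a bit of an X/Y problem.

So I will instead try to explain how I would go about optimizing this function.

My first instinct is to get rid of the for loop, which was the pain point anyway: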

import numpy as np
from scipy.stats import logistic

def func1(y, X, thresholds):
    ll = 0.0
    for row in zip(y, X):
        # row[0] is the class label, row[1] is the corresponding X value
        if row[0] == 0:
            ll += logistic.logcdf(thresholds[0] - row[1])
        elif row[0] == len(thresholds):
            ll += logistic.logcdf(row[1] - thresholds[-1])
        else:
            diff_prob = logistic.cdf(thresholds[row[0]] - row[1]) - \
                        logistic.cdf(thresholds[row[0] - 1] - row[1])
            # Clamp to 1e-5 so the logarithm stays finite
            diff_prob = 10 ** -5 if diff_prob < 10 ** -5 else diff_prob
            ll += np.log(diff_prob)
    return ll

y = np.array([0, 1, 2])
X = [2, 2, 2]
thresholds = np.array([2, 3])

print(func1(y, X, thresholds))

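I only replaced i with row[0], without changing the semantics of the loop. So that is one for loop fewer.

Next, I want the statements in the different branches of the if-else to have the same form. To that end: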

import numpy as np
from scipy.stats import logistic

def func2(y, X, thresholds):
    ll = 0.0
    for row in zip(y, X):
        if row[0] == 0:
            ll += logistic.logcdf(thresholds[0] - row[1])
        elif row[0] == len(thresholds):
            ll += logistic.logcdf(row[1] - thresholds[-1])
        else:
            # np.maximum replaces the explicit clamp on diff_prob
            ll += np.log(
                np.maximum(
                    10 ** -5,
                    logistic.cdf(thresholds[row[0]] - row[1]) -
                    logistic.cdf(thresholds[row[0] - 1] - row[1])
                )
            )
    return ll

y = np.array([0, 1, 2])
X = [2, 2, 2]
thresholds = np.array([2, 3])

print(func2(y, X, thresholds))

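Now the statement in every branch has the form ll += expr.

At this point, the optimization can go down several different paths. You could try to optimize the loop by writing it as a comprehension, but I suspect that would not give you much of a speed-up.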
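For illustration, here is a minimal sketch of what the comprehension variant might look like. The helper names `cdf`, `logcdf`, `term` and `func2_comprehension` are hypothetical and not part of the original answer, and the two logistic helpers are plain-NumPy stand-ins for `logistic.cdf` and `logistic.logcdf`, used here only to keep the sketch free of the SciPy dependency.

```python
import numpy as np

def cdf(x):
    # Logistic CDF; assumed NumPy stand-in for scipy.stats.logistic.cdf.
    return 1.0 / (1.0 + np.exp(-x))

def logcdf(x):
    # Log of the logistic CDF, written as -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def term(yi, xi, thresholds):
    # Hypothetical helper: log-likelihood contribution of one (y, x) pair,
    # mirroring the three branches of func2.
    if yi == 0:
        return logcdf(thresholds[0] - xi)
    if yi == len(thresholds):
        return logcdf(xi - thresholds[-1])
    diff = cdf(thresholds[yi] - xi) - cdf(thresholds[yi - 1] - xi)
    return np.log(np.maximum(10 ** -5, diff))

def func2_comprehension(y, X, thresholds):
    # The same loop as func2, expressed as a generator expression inside sum().
    return sum(term(yi, xi, thresholds) for yi, xi in zip(y, X))

y = np.array([0, 1, 2])
X = [2, 2, 2]
thresholds = np.array([2, 3])

result = func2_comprehension(y, X, thresholds)
print(result)
```

The work per element is unchanged and still runs in the Python interpreter, which is why little speed-up should be expected from this form.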

Another approach is to pull the if conditions out of the loop. That is also what you were aiming for with np.where:

import numpy as np
from scipy.stats import logistic

def func3(y, X, thresholds):
    # One boolean mask per branch of the original if-else
    y_0 = y == 0
    y_end = y == len(thresholds)
    y_rest = ~(y_0 | y_end)

    ll_1 = logistic.logcdf(thresholds[0] - X[ y_0 ])
    ll_2 = logistic.logcdf(X[ y_end ] - thresholds[-1])
    ll_3 = np.log(
        np.maximum(
            10 ** -5,
            logistic.cdf(thresholds[y[ y_rest ]] - X[ y_rest ]) -
            logistic.cdf(thresholds[ y[y_rest] - 1 ] - X[ y_rest])
        )
    )
    return np.sum(ll_1) + np.sum(ll_2) + np.sum(ll_3)

y = np.array([0, 1, 2])
X = np.array([2, 2, 2])
thresholds = np.array([2, 3])

print(func3(y, X, thresholds))

Note that I converted X to an np.array so that I can use fancy indexing on it.
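As a small illustration of the indexing that func3 relies on, the sketch below uses made-up sample values (not from the original answer) to show a boolean mask selecting elements of X, and an integer array picking one threshold per selected element.

```python
import numpy as np

# Made-up sample data for illustration only.
y = np.array([0, 1, 2, 0])
X = np.array([2.0, 3.0, 4.0, 5.0])
thresholds = np.array([2.0, 3.0])

# Boolean mask: True where the label is 0.
y_0 = y == 0
first = X[y_0]  # the elements of X whose label is 0

# Labels that are neither 0 nor len(thresholds).
y_rest = ~(y_0 | (y == len(thresholds)))

# Integer-array indexing: one threshold per remaining label.
upper = thresholds[y[y_rest]]

print(first, upper)
```

This kind of indexing needs NumPy arrays on both sides, which is why X can no longer be a plain Python list in func3.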

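At this point, I would bet it is fast enough for my purposes. Depending on your requirements, though, you could stop at an earlier step or keep optimizing.

On my machine, I get the following results: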

# np.random.random_integers is deprecated; randint(0, 11) draws from the same inclusive range 0..10
y = np.random.randint(0, 11, size=(10000,))
X = np.random.randint(0, 11, size=(10000,))
thresholds = np.cumsum(np.random.rand(10))

%timeit func(y, X, thresholds) # Original
1 loops, best of 3: 1.51 s per loop

%timeit func1(y, X, thresholds) # Removed for-loop
1 loops, best of 3: 1.46 s per loop

%timeit func2(y, X, thresholds) # Standardized if statements
1 loops, best of 3: 1.5 s per loop

%timeit func3(y, X, thresholds) # Vectorized ~ 500x improvement
100 loops, best of 3: 2.74 ms per loop
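Since the question asked about np.where specifically, the following is a hedged sketch of how the same computation might be expressed with it. It is not from the original answer: `func_where`, `func3_masks`, `cdf` and `logcdf` are hypothetical names, and the logistic helpers are plain-NumPy stand-ins for the SciPy calls. Because np.where receives all three candidate arrays fully evaluated, the threshold indices are clipped so the branches that end up unused still index in range.

```python
import numpy as np

def cdf(x):
    # Logistic CDF; assumed NumPy stand-in for scipy.stats.logistic.cdf.
    return 1.0 / (1.0 + np.exp(-x))

def logcdf(x):
    # Log of the logistic CDF, written as -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def func3_masks(y, X, thresholds):
    # Mask-based version, following func3 above.
    y_0 = y == 0
    y_end = y == len(thresholds)
    y_rest = ~(y_0 | y_end)
    ll_1 = logcdf(thresholds[0] - X[y_0])
    ll_2 = logcdf(X[y_end] - thresholds[-1])
    ll_3 = np.log(np.maximum(
        10 ** -5,
        cdf(thresholds[y[y_rest]] - X[y_rest]) -
        cdf(thresholds[y[y_rest] - 1] - X[y_rest])))
    return np.sum(ll_1) + np.sum(ll_2) + np.sum(ll_3)

def func_where(y, X, thresholds):
    n = len(thresholds)
    # Clip indices so every element can be evaluated in every branch.
    hi = np.clip(y, 0, n - 1)
    lo = np.clip(y - 1, 0, n - 1)
    middle = np.log(np.maximum(
        10 ** -5,
        cdf(thresholds[hi] - X) - cdf(thresholds[lo] - X)))
    # Pick, per element, the value from the branch its label belongs to.
    ll = np.where(y == 0, logcdf(thresholds[0] - X),
                  np.where(y == n, logcdf(X - thresholds[-1]), middle))
    return np.sum(ll)

rng = np.random.default_rng(0)
y = rng.integers(0, 11, size=1000)
X = rng.integers(0, 11, size=1000)
thresholds = np.cumsum(rng.random(10))

a = func3_masks(y, X, thresholds)
b = func_where(y, X, thresholds)
print(np.isclose(a, b))
```

The np.where form computes every branch for every element, so the mask-based version does less work; which of the two is faster in practice would have to be measured.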
