Python for loop 100 times: improving the performance of a for loop in Python (possibly using numpy or numba)


I want to improve the performance of the for loop in this function.

import numpy as np
import random

def play_game(row, n=1000000):
    """Play the game! This game is a kind of random walk.

    Arguments:
        row (int[]): row index to use in the p matrix for each step in the
            walk. The length of this array is the same as n.
        n (int): number of steps in the random walk
    """
    p = np.array([[ 0.499, 0.499, 0.499],
                  [ 0.099, 0.749, 0.749]])
    X0 = 100
    Y0 = X0 % 3
    X = np.zeros(n)
    tempX = X0
    Y = Y0
    for j in range(n):
        tempX = X[j] = tempX + 2 * (random.random() < p.item(row.item(j), Y)) - 1
        Y = tempX % 3
    return np.r_[X0, X]

The difficulty lies in the fact that the value of Y is computed at each step based on the value of X and that Y is then used in the next step to update the value for X.

I wonder if there is some numpy trick that could make a big difference. Using Numba is fair game (I tried it but without much success). However, I do not want to use Cython.

Solution

A quick observation shows that there is a data dependency between iterations of the loop. There are different kinds of data dependencies; the one here is an indexing dependency, meaning that which data gets selected at any iteration depends on the result of the previous iteration. This dependency seemed difficult to trace across iterations, so this post isn't really a vectorized solution. Rather, we pre-compute as many of the values used within the loop as possible. The basic idea is to do minimum work inside the loop.
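To make the dependency concrete, here is a minimal sketch of the contrasting case: a walk whose step probability does not depend on the current state. The fixed probability 0.5, the seed, and the variable names are assumptions made for illustration only. In that case the whole walk collapses to a cumulative sum, which is exactly what the Y-dependent probability in the question prevents.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, only for repeatability
n = 8
X0 = 100

# State-independent walk: every step uses the same probability (0.5 here),
# so all steps can be drawn up front and accumulated in one call.
steps = 2 * (rng.random(n) < 0.5) - 1          # array of +1 / -1
walk_vectorized = X0 + np.cumsum(steps)

# The equivalent explicit loop, for comparison.
walk_loop = np.empty(n, dtype=np.int64)
x = X0
for j in range(n):
    x += steps[j]
    walk_loop[j] = x

print(np.array_equal(walk_vectorized, walk_loop))
```

In the question's function the probability for step j is looked up with Y, which is only known once step j-1 has finished, so the steps cannot be drawn as one independent array like this.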

Here's a brief explanation of how we can proceed with the pre-calculations and thus get a more efficient solution:

Given the relatively small shape of p, from which rows are to be extracted based on the input row, you can pre-select all those rows from p with p[row].

For each iteration, you are calculating a random number. You can replace this with a random array that you set up before the loop, so those random values are precalculated as well.

Based on the precalculated values so far, you would have the column indices for all rows in p. Note that these column indices form a large ndarray containing all possible column indices (one for each possible value of Y), and inside the loop only one of them is chosen per iteration. Using the per-iteration column index, you would increment or decrement the running value, which starts at X0, to get the per-iteration output.
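The three precomputation steps above can be sketched on a tiny input. The values of row and randarr below are made up for illustration (they are not from the question); p is the matrix used in this answer.

```python
import numpy as np

p = np.array([[0.499, 0.419, 0.639],
              [0.099, 0.749, 0.319]])
row = np.array([0, 1, 1, 0])                   # hypothetical row indices
randarr = np.array([0.45, 0.20, 0.90, 0.60])   # hypothetical fixed "random" draws

# Step 1: pre-select the row of p used at every step -> shape (n, 3)
p_rows = p[row]

# Step 2: compare each precomputed draw against all three candidate
# probabilities at once (broadcasting), giving +1 / -1 for every possible Y
signvals = 2 * (randarr[:, None] < p_rows) - 1

# Step 3: for every possible current Y (0, 1, 2), the next Y is (Y + step) % 3
col_idx = (signvals + np.arange(3)) % 3

print(signvals)
print(col_idx)
```

Row j of signvals holds the step that would be taken at iteration j for each possible Y, and row j of col_idx holds the Y that would follow. The loop then only has to pick one column per iteration.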

The implementation would look like this:

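# n and row are assumed to be defined already (see the runtime tests)
randarr = np.random.rand(n)
p = np.array([[ 0.499, 0.419, 0.639],
              [ 0.099, 0.749, 0.319]])

def play_game_partvect(row,n,randarr,p):
    X0 = 100
    Y0 = X0 % 3

    # +1 / -1 step for every iteration and every possible value of Y
    signvals = 2*(randarr[:,None] < p[row]) - 1
    # Next value of Y for every iteration and every possible current Y
    col_idx = (signvals + np.arange(3)) % 3

    Y = Y0
    currval = X0
    out = np.empty(n+1)
    out[0] = X0
    for j in range(n):
        # Only lookups remain inside the loop
        currval = currval + signvals[j,Y]
        out[j+1] = currval
        Y = col_idx[j,Y]
    return out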

To verify against the original code, modify the original so that it takes the same precomputed inputs:

def play_game(row,n,randarr,p):
    X0 = 100
    Y0 = X0 % 3
    X = np.zeros(n)
    tempX = X0
    Y = Y0
    for j in range(n):
        tempX = X[j] = tempX + 2 * (randarr[j] < p.item(row.item(j), Y)) - 1
        Y = tempX % 3
    return np.r_[X0, X]

Please note that because this modified version also uses precomputed random values, it should already be considerably faster than the code in the question.
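One way to get a feel for that effect is to time the two ways of drawing random numbers on their own. This is only a rough sketch using the standard timeit module; the size n and the helper names draw_one_by_one and draw_batch are hypothetical, and the actual numbers will vary by machine.

```python
import random
import timeit

import numpy as np

n = 100_000  # hypothetical number of draws

def draw_one_by_one(n):
    # One Python-level call per value, as in the loop from the question
    return [random.random() for _ in range(n)]

def draw_batch(n):
    # A single NumPy call that fills the whole array
    return np.random.rand(n)

t_scalar = timeit.timeit(lambda: draw_one_by_one(n), number=5)
t_batch = timeit.timeit(lambda: draw_batch(n), number=5)
print(f"one by one: {t_scalar:.4f} s, batch: {t_batch:.4f} s")
```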

Runtime tests and output verification:

In [2]: # Inputs
   ...: n = 1000
   ...: row = np.random.randint(0,2,(n))
   ...: randarr = np.random.rand(n)
   ...: p = np.array([[ 0.499, 0.419, 0.639],
   ...:               [ 0.099, 0.749, 0.319]])
   ...:

In [3]: np.allclose(play_game_partvect(row,n,randarr,p),play_game(row,n,randarr,p))
Out[3]: True

In [4]: %timeit play_game(row,n,randarr,p)
100 loops, best of 3: 11.6 ms per loop

In [5]: %timeit play_game_partvect(row,n,randarr,p)
1000 loops, best of 3: 1.51 ms per loop

In [6]: # Inputs
   ...: n = 10000
   ...: row = np.random.randint(0,2,(n))
   ...: randarr = np.random.rand(n)
   ...: p = np.array([[ 0.499, 0.419, 0.639],
   ...:               [ 0.099, 0.749, 0.319]])
   ...:

In [7]: np.allclose(play_game_partvect(row,n,randarr,p),play_game(row,n,randarr,p))
Out[7]: True

In [8]: %timeit play_game(row,n,randarr,p)
10 loops, best of 3: 116 ms per loop

In [9]: %timeit play_game_partvect(row,n,randarr,p)
100 loops, best of 3: 14.8 ms per loop

Thus, we are seeing a speedup of a little over 7.5x in both cases (11.6 ms vs 1.51 ms for n = 1000, and 116 ms vs 14.8 ms for n = 10000), not bad!
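Since the question says Numba is fair game, the loop that remains after the precomputation could in principle also be compiled. The following is a sketch only, not something that was timed for this answer: it assumes Numba is installed (and falls back to plain Python if it is not), and the names walk_compiled and play_game_numba are hypothetical. Because the precomputed arrays already determine n, the n argument is dropped here.

```python
import numpy as np

try:
    from numba import njit  # assumption: Numba is available
except ImportError:
    def njit(func):
        # Fallback so the sketch still runs (uncompiled) without Numba
        return func

@njit
def walk_compiled(signvals, col_idx, X0):
    # Same lookup-only loop as in play_game_partvect
    n = signvals.shape[0]
    out = np.empty(n + 1)
    out[0] = X0
    currval = X0
    Y = X0 % 3
    for j in range(n):
        currval = currval + signvals[j, Y]
        out[j + 1] = currval
        Y = col_idx[j, Y]
    return out

def play_game_numba(row, randarr, p):
    # The precomputation stays in NumPy; only the loop is handed to Numba
    signvals = 2 * (randarr[:, None] < p[row]) - 1
    col_idx = (signvals + np.arange(3)) % 3
    return walk_compiled(signvals, col_idx, 100)
```

The first call includes compilation time, so any timing comparison should be made on a second call.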
