Python nested for-loops: vectorizing nested for-loops over combinations with NumPy

Given an n×n array A of positive real numbers, I'm trying to find the minimum, over all combinations of three rows of the 2-D array, of the maximum of their element-wise minimum. Using for-loops, that comes out to something like this:

import numpy as np

n = 100
np.random.seed(2)
A = np.random.rand(n, n)

global_best = np.inf

for i in range(n-2):
    for j in range(i+1, n-1):
        for k in range(j+1, n):
            # find the maximum of the element-wise minimum of the three vectors
            local_best = np.amax(np.array([A[i,:], A[j,:], A[k,:]]).min(0))
            # if local_best is lower than global_best, update global_best
            if (local_best < global_best):
                global_best = local_best
                save_rows = [i, j, k]

print(global_best, save_rows)

In the case for n = 100, the output should be this:

Out[]: 0.492652949593 [6, 41, 58]
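To make the objective concrete, here is a minimal brute-force sketch on a tiny array that can be checked by hand. The function name `min_max_min_rows` and the 4x3 array `A_small` are made up for illustration; they are not part of the original question.

```python
from itertools import combinations

import numpy as np


def min_max_min_rows(A):
    """Brute force over all row triples; returns (best value, row triple)."""
    best, best_rows = np.inf, None
    for rows in combinations(range(A.shape[0]), 3):
        # element-wise minimum of the three rows, then the maximum of that vector
        local = A[list(rows)].min(axis=0).max()
        if local < best:
            best, best_rows = local, rows
    return float(best), best_rows


# Hypothetical hand-checkable input (4 rows, 3 columns)
A_small = np.array([[0.9, 0.2, 0.5],
                    [0.1, 0.8, 0.6],
                    [0.7, 0.3, 0.4],
                    [0.3, 0.9, 0.1]])

print(min_max_min_rows(A_small))  # -> (0.2, (0, 1, 3))
```

By hand: rows (0, 1, 2) give the element-wise minimum [0.1, 0.2, 0.4] with maximum 0.4; rows (0, 1, 3) give [0.1, 0.2, 0.1] with maximum 0.2; rows (0, 2, 3) and (1, 2, 3) both give a maximum of 0.3. The smallest of these maxima is 0.2, reached by rows (0, 1, 3).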

I have a feeling, though, that I could do this much faster using NumPy vectorization, and would certainly appreciate any help with doing this. Thanks.

Solution

Don't try to vectorize loops that are not simple to vectorize. Instead, use a JIT compiler such as Numba, or use Cython. Vectorized solutions are good when the resulting code is more readable, but in terms of performance a compiled solution is usually faster than, or in the worst case as fast as, a vectorized one (BLAS routines excepted).
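To illustrate what "not simple to vectorize" means for the question's triple loop, here is a hedged sketch that vectorizes only the innermost k-loop and keeps i and j as Python loops. It is not part of the original answer, the name `min_max_min_partial` is invented here, and no timing is claimed for it.

```python
import numpy as np


def min_max_min_partial(A):
    """Vectorize only the innermost loop; i and j remain Python loops."""
    n = A.shape[0]
    best, best_rows = np.inf, None
    for i in range(n - 2):
        for j in range(i + 1, n - 1):
            pair_min = np.minimum(A[i], A[j])  # element-wise min of rows i and j
            # min with every row k > j (broadcast), then max over the columns
            local = np.minimum(pair_min, A[j + 1:]).max(axis=1)
            k_off = int(local.argmin())  # first minimum, like the strict '<' in the loop
            if local[k_off] < best:
                best = float(local[k_off])
                best_rows = (i, j, j + 1 + k_off)
    return best, best_rows


# Hypothetical hand-checkable input (4 rows, 3 columns)
A_small = np.array([[0.9, 0.2, 0.5],
                    [0.1, 0.8, 0.6],
                    [0.7, 0.3, 0.4],
                    [0.3, 0.9, 0.1]])

print(min_max_min_partial(A_small))  # -> (0.2, (0, 1, 3))
```

Vectorizing the outer loops as well would mean materializing an intermediate array for every row combination, which is where readability and memory use start to suffer.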

Single-threaded example

import numba as nb
import numpy as np

# Min and max library calls may be costly for only 3 values
@nb.njit()
def max_min_3(A, B, C):
    max_of_min = -np.inf
    for i in range(A.shape[0]):
        # element-wise minimum of the three vectors at position i
        loc_min = A[i]
        if (B[i] < loc_min):
            loc_min = B[i]
        if (C[i] < loc_min):
            loc_min = C[i]
        # keep the largest of these minima
        if (max_of_min < loc_min):
            max_of_min = loc_min
    return max_of_min

@nb.njit()
def your_func(A):
    n = A.shape[0]
    save_rows = np.zeros(3, dtype=np.uint64)
    global_best = np.inf

    for i in range(n):
        for j in range(i+1, n):
            for k in range(j+1, n):
                # find the maximum of the element-wise minimum of the three vectors
                local_best = max_min_3(A[i,:], A[j,:], A[k,:])
                # if local_best is lower than global_best, update global_best
                if (local_best < global_best):
                    global_best = local_best
                    save_rows[0] = i
                    save_rows[1] = j
                    save_rows[2] = k

    return global_best, save_rows

Performance of single-threaded version

n      your_version    compiled_version    speedup
100    1.56s           0.0168s             92x
150    5.41s           0.08122s            66x
500    283s            8.86s               31x

The first call has a constant overhead of about 0.3-1s, because Numba compiles the function the first time it is called. To measure the calculation time itself, call the function once and only then measure performance.
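The warm-up advice can be sketched as a small timing helper. The helper name `time_after_warmup` and the stand-in workload are assumptions made for illustration; in practice a Numba-jitted function would be passed in place of the lambda.

```python
import time

import numpy as np


def time_after_warmup(func, *args, repeats=3):
    """Call func once untimed, then return (result, best wall time in seconds)."""
    func(*args)  # warm-up call; for a jitted function this is where compilation happens
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return result, best


# Stand-in workload (hypothetical); replace the lambda with the jitted function.
A = np.random.RandomState(2).rand(100, 100)
result, seconds = time_after_warmup(lambda a: a.min(axis=0).max(), A)
print(f"best of 3 runs: {seconds:.6f} s")
```

Taking the best of several repeats is one common way to reduce noise from other processes; it is a design choice of this sketch, not something the measurements above are known to have used.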

With a few code changes this task can also be parallelized.

Multi-threaded example

# uses the imports and max_min_3 from the single-threaded example
@nb.njit(parallel=True)
def your_func(A):
    n = A.shape[0]
    all_global_best = np.inf
    rows = np.empty((3), dtype=np.uint64)

    # one result slot per j, so parallel iterations never write to the same element
    save_rows = np.empty((n,3), dtype=np.uint64)
    global_best_Temp = np.empty((n), dtype=A.dtype)
    global_best_Temp[:] = np.inf

    for i in range(n):
        for j in nb.prange(i+1, n):
            row_1 = 0
            row_2 = 0
            row_3 = 0
            global_best = np.inf
            for k in range(j+1, n):
                # find the maximum of the element-wise minimum of the three vectors
                local_best = max_min_3(A[i,:], A[j,:], A[k,:])
                # if local_best is lower than global_best, update global_best
                if (local_best < global_best):
                    global_best = local_best
                    row_1 = i
                    row_2 = j
                    row_3 = k

            save_rows[j,0] = row_1
            save_rows[j,1] = row_2
            save_rows[j,2] = row_3
            global_best_Temp[j] = global_best

        # reduce the per-j results for this i
        ind = np.argmin(global_best_Temp)
        if (global_best_Temp[ind] < all_global_best):
            rows[0] = save_rows[ind,0]
            rows[1] = save_rows[ind,1]
            rows[2] = save_rows[ind,2]
            all_global_best = global_best_Temp[ind]

    return all_global_best, rows

Performance of multi-threaded version

n      your_version    compiled_version    speedup
100    1.56s           0.0078s             200x
150    5.41s           0.0282s             191x
500    283s            2.95s               96x

Edit

With a newer Numba version (installed through the Anaconda Python distribution), I had to install tbb manually to get working parallelization.
