Python memory tuning: Optimizing Python for large arrays and memory issues

I'm having a speed problem running some Python / NumPy code. I don't know how to make it faster; maybe someone else does?

Assume there is a surface with two triangulations: a fine one (..._fine) with M points, and a coarse one with N points. There is also data on the coarse mesh at every point (N floats). I'm trying to do the following:

For every point on the fine mesh, find the k closest points on the coarse mesh and take the mean of their data values. In short: interpolate the data from the coarse mesh to the fine one.

My code currently looks like the snippet below. With large data (in my case M = 2e6, N = 1e4) it runs for about 25 minutes, presumably because the explicit for loop never drops into vectorized NumPy. Any ideas how to solve this with smart indexing? Building the full M x N distance array would blow the RAM.

import numpy as np

# p_fine.shape == (m, 3)    fine-mesh points
# p.shape == (n, 3)         coarse-mesh points
# data_coarse.shape == (n,) values on the coarse mesh

data_fine = np.empty((m,))

for i, ps in enumerate(p_fine):
    # distance from this fine point to every coarse point, then the
    # mean of the data at the k nearest coarse points
    data_fine[i] = np.mean(data_coarse[np.argsort(np.linalg.norm(ps - p, axis=1))[:k]])

Cheers!

Solution

First of all, thanks for the detailed help.

Divakar, your chunked solutions gave a substantial speed-up: with my data, the code ran for just under 2 minutes, depending a bit on the chunk size.
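Divakar's code isn't reproduced here, but the core idea is to compute the distance matrix for a limited block of fine points at a time, so memory use stays bounded. A minimal sketch of such a chunked brute-force approach (the function name and chunk_size parameter are illustrative, not from the original answer):

import numpy as np

def chunked_mean_knn(p, p_fine, data_coarse, k, chunk_size=2000):
    # Process the fine points in chunks so that only a (chunk_size, n)
    # distance matrix is in memory at once, never the full (m, n) one.
    m = p_fine.shape[0]
    data_fine = np.empty(m)
    for start in range(0, m, chunk_size):
        chunk = p_fine[start:start + chunk_size]
        # squared distances between this chunk and all coarse points
        d2 = ((chunk[:, None, :] - p[None, :, :]) ** 2).sum(axis=2)
        # np.argpartition avoids a full sort: the first k columns hold the
        # indices of the k nearest coarse points (in arbitrary order)
        idx = np.argpartition(d2, k, axis=1)[:, :k]
        data_fine[start:start + chunk_size] = data_coarse[idx].mean(axis=1)
    return data_fine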

I also tried my way around with sklearn and ended up with the following:

from sklearn.neighbors import NearestNeighbors

def sklearnSearch_v3(p, p_fine, k):
    neigh = NearestNeighbors(n_neighbors=k)  # index on the coarse points
    neigh.fit(p)
    # kneighbors returns (distances, indices); data_coarse comes from the
    # enclosing scope, averaged over the k neighbours of each fine point
    return data_coarse[neigh.kneighbors(p_fine)[1]].mean(axis=1)

which ended up being quite fast. For my data sizes, the following benchmark setup

import numpy as np
from sklearn.neighbors import NearestNeighbors

m, n = 2000000, 20000
p_fine = np.random.rand(m, 3)
p = np.random.rand(n, 3)
data_coarse = np.random.rand(n)
k = 3

yields

%timeit sklearnSearch_v3(p, p_fine, k)

1 loop, best of 3: 7.46 s per loop
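
Not part of the original thread, but for reference: scipy's cKDTree answers the same k-nearest-neighbour query and can stand in if sklearn isn't available. A minimal equivalent sketch using the same variable names as the benchmark above:

from scipy.spatial import cKDTree

tree = cKDTree(p)                  # k-d tree over the coarse points
_, idx = tree.query(p_fine, k=k)   # indices of the k nearest coarse points, shape (m, k)
data_fine = data_coarse[idx].mean(axis=1)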
