Python metrics: the fastest pairwise distance metric in Python

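This post discusses how to efficiently compute all pairwise Euclidean distances for a one-dimensional array of more than 10,000 values using NumPy, SciPy, and Scikit-Learn. The author compares several approaches, including broadcasting and `scipy.spatial.distance.pdist`, and finds that the latter is faster than the plain NumPy operation in timings on a 1,000-element array. The post also notes that the distance matrix is symmetric, which may allow a further speedup.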

I have a 1D array of numbers, and want to calculate all pairwise Euclidean distances. I have a method (thanks to SO) of doing this with broadcasting, but it's inefficient because it calculates each distance twice. It also doesn't scale well.

Here's an example that gives me what I want with an array of 1000 numbers.

import random
import numpy as np

r = np.array([random.randrange(1, 1000) for _ in range(0, 1000)])
# Broadcasting builds the full n x n matrix of absolute differences.
dists = np.abs(r - r[:, None])

What's the fastest implementation in scipy/numpy/scikit-learn that I can use to do this, given that it has to scale to situations where the 1D array has >10k values?

Note: the matrix is symmetric, so I'm guessing that it's possible to get at least a 2x speedup by addressing that, I just don't know how.

Solution

Neither of the other answers quite answered the question: one was in Cython, the other was slower. But both provided very useful hints. Following up on them suggests that scipy.spatial.distance.pdist is the way to go.

Here's some code:

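import random
import numpy as np
import scipy.spatial.distance
import sklearn.metrics.pairwise

r = np.array([random.randrange(1, 1000) for _ in range(0, 1000)])
# pdist and manhattan_distances expect 2D input of shape (n_points, n_dims).
c = r[:, None]

def option1(r):
    # Broadcasting on the 1D array: full n x n matrix, each distance computed twice.
    dists = np.abs(r - r[:, None])
    return dists

def option2(r):
    # Condensed output: each pair appears once. In 1D, cityblock equals Euclidean.
    dists = scipy.spatial.distance.pdist(r, 'cityblock')
    return dists

def option3(r):
    # Full n x n matrix via scikit-learn.
    dists = sklearn.metrics.pairwise.manhattan_distances(r)
    return dists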
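Note that option2 returns a different shape than the other two: pdist gives a condensed vector holding each unordered pair once, n * (n - 1) / 2 entries, rather than the full n x n matrix. A minimal sketch, using a small array whose values are picked arbitrarily for illustration, of how the condensed result can be expanded with scipy.spatial.distance.squareform and checked against the broadcasting result:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Small illustrative input (values chosen arbitrarily for this sketch).
r = np.array([1, 4, 9, 16])

# pdist expects a 2D array of shape (n_points, n_dims).
condensed = pdist(r[:, None], 'cityblock')

# Each unordered pair appears once: n * (n - 1) / 2 entries.
n = len(r)
assert condensed.shape == (n * (n - 1) // 2,)

# Expand to the full symmetric n x n matrix when that layout is needed.
full = squareform(condensed)

# Same values as the broadcasting approach.
assert np.array_equal(full, np.abs(r - r[:, None]))
```

If only the distances themselves are needed, staying with the condensed form avoids allocating the full matrix, which matters more as the array grows.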

Timing with IPython:

In [36]: timeit option1(r)
100 loops, best of 3: 5.31 ms per loop

In [37]: timeit option2(c)
1000 loops, best of 3: 1.84 ms per loop

In [38]: timeit option3(c)
100 loops, best of 3: 11.5 ms per loop

I didn't try the Cython implementation (I can't use it for this project), but comparing my results to the other answer that did, it looks like scipy.spatial.distance.pdist is roughly a third slower than the Cython implementation (taking into account the different machines by benchmarking on the np.abs solution).
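For reference, the symmetry mentioned in the question can also be exploited in plain NumPy by indexing only the pairs above the diagonal. This is a sketch I have not benchmarked, and the helper name upper_triangle_dists is made up for illustration. It builds two index arrays of length n * (n - 1) / 2, so it may well be slower than pdist; it mainly shows what "computing each distance once" looks like.

```python
import numpy as np

def upper_triangle_dists(r):
    """Return |r[i] - r[j]| for every pair i < j (hypothetical helper)."""
    # k=1 skips the diagonal, so only pairs with i < j are selected.
    i, j = np.triu_indices(len(r), k=1)
    return np.abs(r[i] - r[j])

# Small illustrative input (values chosen arbitrarily for this sketch).
r = np.array([1, 4, 9, 16])
d = upper_triangle_dists(r)
```

The pairs come out in row-major order, (0, 1), (0, 2), ..., which is the same ordering pdist uses for its condensed output.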
