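The scipy.spatial package contains a distance module that makes it easy to compute distances.
The distance.pdist function
Computes the distance between every pair of rows in a 2-D array. The default metric is the Euclidean distance.
Euclidean distance formula. For an n-dimensional vector x (here, the difference between two rows), the length is: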
$$x = [x_1, x_2, \ldots, x_n]$$
$$distance = \sqrt{\sum_{i=1}^n x_i^2} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$
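As a quick check of the formula, here is a minimal sketch that computes the distance between two points by hand. The helper name euclidean_distance and the sample points are assumptions made for illustration; they are not part of SciPy.

```python
import numpy as np

def euclidean_distance(u, v):
    """Length of the difference vector x = u - v (hypothetical helper)."""
    x = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    # Square each component, sum, then take the square root
    return np.sqrt(np.sum(x ** 2))

# The difference vector is [3, 4], so the distance is sqrt(9 + 16)
d = euclidean_distance([1, 1], [4, 5])
print(d)  # 5.0
```

np.linalg.norm applied to the difference vector should give the same value.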
def pdist(X, metric='euclidean', *, out=None, **kwargs):
    """
    Pairwise distances between observations in n-dimensional space.
    """
Example:
import numpy as np
from scipy.spatial import distance
coords = [(1, 1),
          (2, 2),
          (3, 3),
          (4, 4)]
coords = np.array(coords)
# Distance between every pair of rows in a 2-D array (Euclidean by default)
distance.pdist(coords)
# This yields the same values as the following loop:
for i in range(len(coords)):
    for j in range(i + 1, len(coords)):
        print(np.linalg.norm(coords[i] - coords[j]))
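Result:
array([1.41421356, 2.82842712, 4.24264069, 1.41421356, 2.82842712, 1.41421356])
The entries are ordered as follows: first the Euclidean distances from row 1 to row 2, row 1 to row 3, ..., row 1 to the last row;
then row 2 to row 3, row 2 to row 4, ..., row 2 to the last row;
and so on.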
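The ordering described above is a condensed (flattened upper-triangle) layout. The sketch below shows two ways to look up the distance between rows i and j: expanding the result with distance.squareform, or computing the position directly. The helper condensed_index is a hypothetical name introduced here for illustration.

```python
import numpy as np
from scipy.spatial import distance

coords = np.array([(1, 1), (2, 2), (3, 3), (4, 4)])
condensed = distance.pdist(coords)        # length n*(n-1)/2 = 6
square = distance.squareform(condensed)   # symmetric 4x4 matrix, zero diagonal

def condensed_index(i, j, n):
    """Position of pair (i, j), with i < j, in the condensed vector (hypothetical helper)."""
    return n * i - i * (i + 1) // 2 + (j - i - 1)

n = len(coords)
k = condensed_index(1, 3, n)
# Both lookups refer to the distance between row index 1 and row index 3
print(k, condensed[k], square[1, 3])
```

squareform is convenient for small inputs. For large n, the condensed form uses roughly half the memory, which is when an index helper like this may be preferable.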
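The distance.cdist function
Computes the distance between every row vector of one 2-D array and every row vector of another 2-D array. The default metric is the Euclidean distance.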
import numpy as np
from scipy.spatial import distance
arr1 = np.random.randint(0, 10, (5, 3))
arr2 = np.random.randint(0, 10, (5, 3))
distance.cdist(arr1, arr2)
# The first three entries of the first row are equivalent to:
print(np.linalg.norm(arr1[0] - arr2[0]))
print(np.linalg.norm(arr1[0] - arr2[1]))
print(np.linalg.norm(arr1[0] - arr2[2]))
Result (the inputs are random, so the values differ from run to run):
array([[ 9.        , 12.4498996 ,  4.69041576,  5.09901951,  5.09901951],
       [ 8.48528137,  7.61577311,  3.        ,  6.08276253,  3.31662479],
       [ 6.32455532,  4.24264069,  4.12310563,  6.70820393,  4.35889894],
       [ 7.21110255,  7.07106781,  5.74456265,  8.77496439,  5.19615242],
       [ 1.41421356,  6.164414  ,  5.38516481,  4.12310563,  6.40312424]])
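The rows are ordered as follows: the first row holds the Euclidean distances from row 1 of the first array to every row of the second array,
the second row holds the Euclidean distances from row 2 of the first array to every row of the second array,
and so on.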
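To make the row and column layout concrete, the following sketch uses two arrays with different numbers of rows and compares cdist against a NumPy broadcasting computation. The seed, the array shapes, and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(0)        # fixed seed so the run is repeatable
arr1 = rng.integers(0, 10, (5, 3))    # 5 row vectors
arr2 = rng.integers(0, 10, (4, 3))    # 4 row vectors

result = distance.cdist(arr1, arr2)   # shape (5, 4): result[i, j] = dist(arr1[i], arr2[j])

# Same computation via broadcasting: pair every row of arr1 with every row of arr2
diff = arr1[:, None, :] - arr2[None, :, :]
manual = np.linalg.norm(diff, axis=-1)

# pdist(arr1) is the strict upper triangle of cdist(arr1, arr1), read row by row
upper = distance.cdist(arr1, arr1)[np.triu_indices(len(arr1), k=1)]

print(result.shape)
print(np.allclose(result, manual), np.allclose(upper, distance.pdist(arr1)))
```

The broadcasting version builds a full (5, 4, 3) intermediate array, so for large inputs cdist is likely the more memory-friendly choice.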