Python欧式距离矩阵算法
根据欧式距离矩阵(Euclidean Distance Matrix, EDM) 的定义,它是一个n × \times ×n的矩阵,表示欧式空间中一组n个点的空间距离,每个元素为点之间的距离平方。假设 { x i ∈ R d : i ∈ { 1 , . . . , n } } \{x_i \in \mathbb{R}^d : i \in \{1,...,n\}\} {xi∈Rd:i∈{1,...,n}}
X = ( ∣ ∣ ∣ x 1 x 2 . . . x n ∣ ∣ ∣ ) d × n X =\begin{pmatrix} | & | & & |\\ x_1 & x_2 &... & x_n\\ | & | & & |\\ \end{pmatrix}_{d\times n} X=⎝⎛∣x1∣∣x2∣...∣xn∣⎠⎞d×n
距离矩阵,
D
i
j
=
∣
∣
x
i
−
x
j
∣
∣
2
2
D_{ij} = ||x_i-x_j||^2_2
Dij=∣∣xi−xj∣∣22
=
(
x
i
−
x
j
)
T
(
x
i
−
x
j
)
=(x_i - x_j)^T(x_i - x_j)
=(xi−xj)T(xi−xj)
=
x
i
T
x
i
+
x
j
x
j
T
−
2
x
i
T
x
j
=x_i^Tx_i + x_jx_j^T-2x_i^Tx_j
=xiTxi+xjxjT−2xiTxj
=
(
0
d
12
2
d
13
2
…
d
1
n
2
d
21
2
0
d
23
2
…
d
2
n
2
d
31
2
d
32
2
0
…
d
3
n
2
⋮
⋮
⋮
⋱
⋮
d
n
1
2
d
n
2
2
d
1
n
2
…
0
)
=\begin{pmatrix} 0 &d_{12}^2 & d_{13}^2 & \dots & d_{1n}^2\\ d_{21}^2&0 & d_{23}^2 & \dots & d_{2n}^2\\ d_{31}^2 & d_{32}^2 & 0&\dots & d_{3n}^2\\ \vdots&\vdots&\vdots &\ddots & \vdots\\ d_{n1}^2 & d_{n2}^2 & d_{1n}^2 & \dots& 0\\ \end{pmatrix}
=⎝⎜⎜⎜⎜⎜⎛0d212d312⋮dn12d1220d322⋮dn22d132d2320⋮d1n2………⋱…d1n2d2n2d3n2⋮0⎠⎟⎟⎟⎟⎟⎞
欧式距离矩阵,edm(X)定义为
e
d
m
(
X
)
=
d
e
f
d
i
a
g
(
G
)
+
d
i
a
g
(
G
)
T
−
2
G
edm(X) \overset{\mathrm{def}}{=} diag(G) + diag(G)^T - 2G
edm(X)=defdiag(G)+diag(G)T−2G
其中,格拉姆矩阵(Gram matrix)
G
=
X
T
X
G=X^TX
G=XTX,
d
i
a
g
(
G
)
diag(G)
diag(G)为含有对角线元素的列向量。
在计算两个矩阵
X
d
×
m
X_{d\times m}
Xd×m和
Y
d
×
n
Y_{d\times n}
Yd×n之间的欧式距离矩阵时,可从上式推导出两个矩阵的欧式距离矩阵公式:
G
X
=
X
T
X
G_X = X^TX
GX=XTX
G
Y
=
Y
T
Y
G_Y = Y^TY
GY=YTY
e
d
m
(
X
)
=
d
i
a
g
(
G
X
)
+
d
i
a
g
(
G
Y
)
T
−
2
X
T
Y
edm(X) = diag(G_X) + diag(G_Y)^T - 2X^TY
edm(X)=diag(GX)+diag(GY)T−2XTY
=
(
x
11
x
22
⋮
x
m
m
)
m
×
1
+
(
y
11
y
22
⋮
y
n
n
)
n
×
1
T
−
2
X
T
Y
=\begin{pmatrix} x_{11}\\ x_{22}\\ \vdots\\ x_{mm}\\ \end{pmatrix}_{m\times1} + \begin{pmatrix} y_{11}\\ y_{22}\\ \vdots\\ y_{nn}\\ \end{pmatrix}^T_{n\times1} - 2X^TY
=⎝⎜⎜⎜⎛x11x22⋮xmm⎠⎟⎟⎟⎞m×1+⎝⎜⎜⎜⎛y11y22⋮ynn⎠⎟⎟⎟⎞n×1T−2XTY
这里的 G X G_X GX和 G Y G_Y GY的对角线元素会经过numpy数组的广播机制(broadcasting),分别广播成 n × n n\times n n×n和 m × m m\times m m×m的矩阵,进行相加。EDM矩阵的大小应为 m × n m\times n m×n。
例子:
X
=
(
1
2
3
2
3
4
0
1
2
)
3
×
3
T
;
Y
=
(
1
2
3
4
3
2
)
2
×
3
T
X = \begin{pmatrix} 1&2&3\\ 2&3&4\\ 0&1&2\\ \end{pmatrix}^T_{3\times 3}; Y = \begin{pmatrix} 1&2&3\\ 4&3&2\\ \end{pmatrix}^T_{2\times 3}
X=⎝⎛120231342⎠⎞3×3T;Y=(142332)2×3T
Python代码:
import numpy as np
def euclidean_dist(x, y=None):
"""
Compute all pairwise distances between vectors in X and Y matrices.
:param x: numpy array, with size of (d, m)
:param y: numpy array, with size of (d, n)
:return: EDM: numpy array, with size of (m,n).
Each entry in EDM_{i,j} represents the distance between row i in x and row j in y.
"""
if y is None:
y = x
# calculate Gram matrices
G_x = np.matmul(x.T, x)
G_y = np.matmul(y.T, y)
# convert diagonal matrix into column vector
diag_Gx = np.reshape(np.diag(G_x), (-1, 1))
diag_Gy = np.reshape(np.diag(G_y), (-1, 1))
# Compute Euclidean distance matrix
EDM = diag_Gx + diag_Gy.T - 2*np.matmul(x.T, y) # broadcasting
print('This is matrix EDM: ')
print(EDM)
return EDM
x = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [0.0, 1.0, 2.0]]).T
y = np.array([[1.0, 2.0, 3.0], [4.0, 3.0, 2.0]]).T
euclidean_dist(x,y)
输出: E D M = ( 0 11 3 8 3 20 ) 3 × 2 EDM = \begin{pmatrix} 0&11\\ 3&8\\ 3&20\\ \end{pmatrix}_{3\times 2} EDM=⎝⎛03311820⎠⎞3×2
备注: 欧式距离矩阵中的元素为距离的平方,若需要距离,可用np.sqrt(EDM)开方。
参考文献:
- https://www.dabblingbadger.com/blog/2020/2/27/implementing-euclidean-distance-matrix-calculations-from-scratch-in-python
- Dokmanic, Ivan, et al. “Euclidean distance matrices: essential theory, algorithms, and applications.” IEEE Signal Processing Magazine 32.6 (2015): 12-30.
- Albanie, Samuel. “Euclidean Distance Matrix Trick.” Retrieved from Visual Geometry Group, University of Oxford (2019).
- https://ccrma.stanford.edu/~dattorro/EDM.pdf