Multidimensional Scaling
Multidimensional Scaling (MDS) is a classic dimensionality reduction method that requires the distances between samples in the original space to be preserved in the low-dimensional space.
Derivation
Suppose the distance matrix of n samples in the original space is $D \in R^{n \times n}$, whose element $dist_{ij}$ in row i, column j is the distance from sample $x_i$ to sample $x_j$. Our goal is to obtain a representation of the samples in d'-dimensional space, $Z \in R^{d' \times n}$ with $d' \leq d$, such that the Euclidean distance between any two samples in the d'-dimensional space equals their distance in the original space, i.e., $||z_i - z_j|| = dist_{ij}$.
Following the setup above, consider 3 samples, (1, 1, 1), (2, 2, 2), and (3, 3, 3). Their distance matrix in the original space is
Sample | 1 | 2 | 3 |
---|---|---|---|
1 | 0 | $\sqrt{3}$ | $2\sqrt{3}$ |
2 | $\sqrt{3}$ | 0 | $\sqrt{3}$ |
3 | $2\sqrt{3}$ | $\sqrt{3}$ | 0 |
Our goal is now to obtain a 2-dimensional representation of these samples such that the Euclidean distance between any two samples in 2-dimensional space equals their distance in the original (3-dimensional) space.
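The distance matrix above can be reproduced with `scipy.spatial.distance.cdist` (the same function the implementation later in this post relies on); a minimal check:

```python
import numpy as np
from scipy.spatial.distance import cdist

# the three 3-D samples from the text
X = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3]])

# pairwise Euclidean distances; entry (i, j) is ||x_i - x_j||
D = cdist(X, X, 'euclidean')
print(D)  # off-diagonals: sqrt(3) ≈ 1.732 and 2*sqrt(3) ≈ 3.464
```

The printed matrix matches the table: $\sqrt{3} \approx 1.732$ and $2\sqrt{3} \approx 3.464$.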
Let $B = Z^TZ \in R^{n \times n}$, where B is the inner product matrix of the samples after dimensionality reduction, with $b_{ij} = z_i^Tz_j$. Then
$$dist_{ij}^2 = ||z_i - z_j||^2 = ||z_i||^2 + ||z_j||^2 - 2z_i^Tz_j = b_{ii} + b_{jj} - 2b_{ij} \quad (1)$$
For convenience of discussion, assume the reduced samples Z are centered, i.e., $\sum_{i=1}^n z_i = 0$. Clearly, each row and column of B then sums to zero, i.e., $\sum_{i=1}^n b_{ij} = \sum_{j=1}^n b_{ij} = 0$. It is then easy to see that
$$\sum_{i=1}^n dist_{ij}^2 = tr(B) + nb_{jj} \quad (2)$$
$$\sum_{j=1}^n dist_{ij}^2 = tr(B) + nb_{ii} \quad (3)$$
$$\sum_{i=1}^n\sum_{j=1}^n dist_{ij}^2 = 2n \, tr(B) \quad (4)$$
where $tr(\cdot)$ denotes the trace of a matrix: $tr(B) = \sum_{i=1}^n ||z_i||^2$, the sum of the diagonal elements of B.
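Identities (2)–(4) are easy to verify numerically for any centered configuration; the snippet below is a sketch using an arbitrary random `Z`:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((2, 5))        # d' = 2, n = 5
Z = Z - Z.mean(axis=1, keepdims=True)  # center: sum_i z_i = 0

B = Z.T @ Z                            # inner product matrix, n x n
n = B.shape[0]
# squared distances via eq. (1): dist_ij^2 = b_ii + b_jj - 2 b_ij
d2 = np.diag(B)[:, None] + np.diag(B)[None, :] - 2 * B

# eq. (2): sums over i; eq. (3): sums over j; eq. (4): grand sum
assert np.allclose(d2.sum(axis=0), np.trace(B) + n * np.diag(B))
assert np.allclose(d2.sum(axis=1), np.trace(B) + n * np.diag(B))
assert np.allclose(d2.sum(), 2 * n * np.trace(B))
```

The row and column versions coincide here because the distance matrix is symmetric.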
[Derivation of the formulas above]:
$$dist_{ij}^2 = ||z_i||^2 + ||z_j||^2 - 2z_i^Tz_j = b_{ii} + b_{jj} - 2b_{ij}$$
$$dist_{1j}^2 = b_{11} + b_{jj} - 2b_{1j} \quad dist_{2j}^2 = b_{22} + b_{jj} - 2b_{2j} \quad \cdots$$
$$\sum_{i=1}^n dist_{ij}^2 = b_{11} + \cdots + b_{nn} + nb_{jj} - 2(b_{1j} + \cdots + b_{nj})$$
Because of the centering, $\sum_{i=1}^n b_{ij} = 0$, and $tr(B) = b_{11} + \cdots + b_{nn}$, so
$$\sum_{i=1}^n dist_{ij}^2 = tr(B) + nb_{jj}$$
Similarly,
$$\sum_{j=1}^n dist_{ij}^2 = tr(B) + nb_{ii}$$
When $i = 1$: $\sum_{j=1}^n dist_{1j}^2 = tr(B) + nb_{11}$; when $i = n$: $\sum_{j=1}^n dist_{nj}^2 = tr(B) + nb_{nn}$. Summing over all i,
$$\sum_{i=1}^n\sum_{j=1}^n dist_{ij}^2 = n \, tr(B) + n(b_{11} + \cdots + b_{nn}) = n \, tr(B) + n \, tr(B) = 2n \, tr(B)$$
Next, define:
$$dist_{i\cdot}^2 = \frac{1}{n}\sum_{j=1}^n dist_{ij}^2 \quad (5)$$
$$dist_{\cdot j}^2 = \frac{1}{n}\sum_{i=1}^n dist_{ij}^2 \quad (6)$$
$$dist_{\cdot\cdot}^2 = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n dist_{ij}^2 \quad (7)$$
From the equations above we obtain
$$b_{ij} = -\frac{1}{2}(dist_{ij}^2 - dist_{i\cdot}^2 - dist_{\cdot j}^2 + dist_{\cdot\cdot}^2) \quad (8)$$
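Equation (8) is a double-centering of the squared distance matrix and vectorizes directly; the sketch below (the helper name `b_from_distances` is ours) checks it by starting from a known centered Z, so that $B = Z^TZ$ is known exactly:

```python
import numpy as np

def b_from_distances(D):
    # eq. (8), vectorized: b_ij = -1/2 (d_ij^2 - d_i.^2 - d_.j^2 + d_..^2)
    D2 = D ** 2
    return -0.5 * (D2 - D2.mean(axis=1, keepdims=True)
                      - D2.mean(axis=0, keepdims=True)
                      + D2.mean())

# sanity check: build D from a centered Z, then recover B = Z^T Z
rng = np.random.default_rng(1)
Z = rng.standard_normal((2, 6))
Z -= Z.mean(axis=1, keepdims=True)          # center the samples
B = Z.T @ Z
D2 = np.diag(B)[:, None] + np.diag(B)[None, :] - 2 * B   # eq. (1)
D = np.sqrt(np.maximum(D2, 0))              # clip tiny negative round-off
assert np.allclose(b_from_distances(D), B)
```

The row means, column means, and grand mean of `D ** 2` are exactly the quantities defined in (5)–(7).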
[Derivation of the formula for $b_{ij}$]:
- From equation (1):
$$b_{ij} = -\frac{1}{2}(dist_{ij}^2 - b_{ii} - b_{jj}) \quad (9)$$
- From equations (2)–(4), the expressions for $b_{ii}$, $b_{jj}$, and $tr(B)$ follow:
$$b_{ii} = \frac{1}{n}\sum_{j=1}^n dist_{ij}^2 - \frac{1}{n}tr(B) \quad b_{jj} = \frac{1}{n}\sum_{i=1}^n dist_{ij}^2 - \frac{1}{n}tr(B) \quad tr(B) = \frac{1}{2n}\sum_{i=1}^n\sum_{j=1}^n dist_{ij}^2$$
- Equations (5)–(7) simplify these further:
$$b_{ii} = dist_{i\cdot}^2 - \frac{1}{n}tr(B) \quad b_{jj} = dist_{\cdot j}^2 - \frac{1}{n}tr(B) \quad tr(B) = \frac{n}{2}dist_{\cdot\cdot}^2$$
- Finally, substituting $b_{ii}$, $b_{jj}$, and $tr(B)$ into equation (9) gives
$$b_{ij} = -\frac{1}{2}(dist_{ij}^2 - dist_{i\cdot}^2 - dist_{\cdot j}^2 + dist_{\cdot\cdot}^2)$$
Thus the inner product matrix B can be computed from the distance matrix D, which the dimensionality reduction keeps unchanged. For the running example:
$$b_{11} = -\frac{1}{2}(dist_{11}^2 - dist_{1\cdot}^2 - dist_{\cdot 1}^2 + dist_{\cdot\cdot}^2) = -\frac{1}{2}(0 - 5 - 5 + 4) = 3$$
$$b_{12} = 0 \quad b_{13} = -3 \quad b_{21} = 0 \quad b_{22} = 0 \quad b_{23} = 0 \quad b_{31} = -3 \quad b_{32} = 0 \quad b_{33} = 3$$
The inner product matrix B is therefore

Sample | 1 | 2 | 3 |
---|---|---|---|
1 | 3 | 0 | -3 |
2 | 0 | 0 | 0 |
3 | -3 | 0 | 3 |
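As a numerical check (a sketch reusing `cdist`), double-centering the squared distances of the three samples reproduces this matrix:

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
D2 = cdist(X, X, 'euclidean') ** 2

# eq. (8): b_ij = -1/2 (dist_ij^2 - dist_i.^2 - dist_.j^2 + dist_..^2)
B = -0.5 * (D2 - D2.mean(axis=1, keepdims=True)
               - D2.mean(axis=0, keepdims=True)
               + D2.mean())
print(B)  # [[ 3.  0. -3.] [ 0.  0.  0.] [-3.  0.  3.]]
```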
Perform an eigendecomposition of B, $B = V\Lambda V^T$, where $\Lambda = diag(\lambda_1, \lambda_2, \cdots, \lambda_d)$ is the diagonal matrix of eigenvalues with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$, and V is the matrix of eigenvectors. Suppose $d^*$ of the eigenvalues are nonzero; they form the diagonal matrix
$\Lambda_* = diag(\lambda_1, \lambda_2, \cdots, \lambda_{d^*})$. Letting $V_*$ denote the corresponding eigenvector matrix, Z can be expressed as
$$Z = \Lambda_*^{1/2} V_*^T \in R^{d^* \times n}$$
In practical applications, for effective dimensionality reduction it usually suffices that the distances after reduction are as close as possible to the original distances, rather than strictly equal. In that case one can take the $d' \ll d$ largest eigenvalues to form the diagonal matrix $\hat{\Lambda} = diag(\lambda_1, \lambda_2, \cdots, \lambda_{d'})$. Letting $\hat{V}$ denote the corresponding eigenvector matrix, Z can be expressed as
$$Z = \hat{\Lambda}^{1/2} \hat{V}^T \in R^{d' \times n}$$
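This recovery step can be sketched with `numpy.linalg.eigh` (appropriate since B is symmetric; it returns eigenvalues in ascending order, so we reverse them before truncating to d'):

```python
import numpy as np

B = np.array([[3., 0., -3.],
              [0., 0., 0.],
              [-3., 0., 3.]])

d_prime = 1
w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
w, V = w[::-1], V[:, ::-1]               # reorder to descending
lam = np.clip(w[:d_prime], 0.0, None)    # guard against tiny negative noise
Z = np.sqrt(lam)[:, None] * V[:, :d_prime].T   # Z = Lambda_hat^{1/2} V_hat^T

# the 1-D embedding preserves the original pairwise distances
D_emb = np.sqrt(((Z.T[:, None, :] - Z.T[None, :, :]) ** 2).sum(axis=2))
D_orig = np.array([[0, np.sqrt(3), 2 * np.sqrt(3)],
                   [np.sqrt(3), 0, np.sqrt(3)],
                   [2 * np.sqrt(3), np.sqrt(3), 0]])
assert np.allclose(D_emb, D_orig)
```

The sign of each eigenvector is arbitrary, so the recovered coordinates may be mirrored; the pairwise distances are unaffected.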
[Algorithm]:
- Input: distance matrix $D \in R^{n \times n}$; target dimensionality d'.
- Procedure:
  - Compute $dist_{i\cdot}^2, dist_{\cdot j}^2, dist_{\cdot\cdot}^2$ according to equations (5)–(7);
  - Compute the matrix B according to equation (8);
  - Perform an eigendecomposition of B;
  - Take $\hat{\Lambda}$ as the diagonal matrix of the d' largest eigenvalues and $\hat{V}$ as the matrix of the corresponding eigenvectors.
- Output: the matrix $Z^T = \hat{V}\hat{\Lambda}^{1/2} \in R^{n \times d'}$, each row of which is the low-dimensional coordinate of one sample.
Continuing the example: having obtained the matrix B, we now perform its eigendecomposition. The characteristic matrix is
$$B - \lambda I = \begin{bmatrix} 3 - \lambda & 0 & -3 \\ 0 & -\lambda & 0 \\ -3 & 0 & 3 - \lambda \end{bmatrix}$$
Setting its determinant to zero gives $\lambda^2(6 - \lambda) = 0$, i.e., $\lambda_1 = 6, \lambda_2 = \lambda_3 = 0$. Thus $\Lambda = diag(6, 0, 0)$ and $\Lambda_* = diag(6)$. The unit eigenvector for $\lambda_1 = 6$ gives $V_*^T = [1/\sqrt{2}, 0, -1/\sqrt{2}]$ (the eigenvector must be normalized to unit length so that $Z^TZ = B$ holds). Based on these results, Z can be expressed as
$$Z = [\sqrt{6}] \, [1/\sqrt{2}, 0, -1/\sqrt{2}] = [\sqrt{3}, 0, -\sqrt{3}]$$
$$Z^T = \begin{bmatrix} \sqrt{3} \\ 0 \\ -\sqrt{3} \end{bmatrix}$$
We have thus converted 3 three-dimensional samples into 3 one-dimensional ones, achieving the dimensionality reduction. Note that the distances are preserved exactly: $|\sqrt{3} - 0| = \sqrt{3}$ and $|\sqrt{3} - (-\sqrt{3})| = 2\sqrt{3}$, matching the original distance matrix.
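The unit-norm requirement on the eigenvectors in $V_*$ is what makes $Z^TZ = B$ hold (an unnormalized $[1, 0, -1]$ would inflate every coordinate by $\sqrt{2}$); a quick check of the example values:

```python
import numpy as np

B = np.array([[3., 0., -3.], [0., 0., 0.], [-3., 0., 3.]])
v = np.array([1., 0., -1.]) / np.sqrt(2)   # unit eigenvector for lambda = 6
Z = np.sqrt(6) * v                          # coordinates: sqrt(3), 0, -sqrt(3)

assert np.allclose(np.outer(Z, Z), B)                 # Z^T Z = B holds
assert np.isclose(abs(Z[0] - Z[1]), np.sqrt(3))       # matches dist_12
assert np.isclose(abs(Z[0] - Z[2]), 2 * np.sqrt(3))   # matches dist_13
```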
Code Implementation
[Steps]:
- Compute the distance matrix in the original space

from scipy.spatial.distance import cdist
original_matrix = cdist(dataset, dataset, 'euclidean')

This uses the cdist function from the scipy.spatial.distance package; the code here uses the Euclidean distance.
- Compute $dist_{ij}^2, \ dist_{i\cdot}^2, \ dist_{\cdot j}^2, \ dist_{\cdot\cdot}^2$ for each $b_{ij}$, and build the matrix B

def _cal_dist(matrix, row, col):
    if row == '*' and col == '*':
        return np.sum(matrix**2)
    elif row == '*' and col != '*':
        return np.sum(matrix[:, col]**2)
    elif row != '*' and col == '*':
        return np.sum(matrix[row, :]**2)
    else:
        return matrix[row, col]**2

# compute dist_i., dist_.j and dist_..
length = original_matrix.shape[0]
dist_matrix = np.zeros(original_matrix.shape)
rows, cols = dist_matrix.shape
# build the matrix B
for row in range(rows):
    for col in range(cols):
        distij = _cal_dist(original_matrix, row, col)
        dist_i = _cal_dist(original_matrix, row, '*') / length
        dist_j = _cal_dist(original_matrix, '*', col) / length
        dist_all = _cal_dist(original_matrix, '*', '*') / (length**2)
        dist_matrix[row, col] = -(distij - dist_i - dist_j + dist_all) / 2
- Compute the eigenvalues and eigenvectors of the matrix B

# compute eigenvalues and eigenvectors
feature_values, feature_vectors = np.linalg.eig(dist_matrix)
- From the eigenvectors, select those corresponding to nonzero eigenvalues to construct $V_*$

select_feature_values = []
for i in range(len(feature_values) - 1, -1, -1):
    if np.round(feature_values[i]) != 0:
        select_feature_values.append(feature_values[i])
    else:
        feature_vectors = np.delete(feature_vectors, i, axis=1)
# the loop runs backwards, so restore the original column order
select_feature_values.reverse()
eye_matrix = np.eye(len(select_feature_values))
for i in range(len(select_feature_values)):
    eye_matrix[i, i] = select_feature_values[i]
- Return $Z^T = \hat{V}\hat{\Lambda}^{1/2}$

return np.dot(feature_vectors, eye_matrix**0.5)
[Complete code]:

import numpy as np
from scipy.spatial.distance import cdist

class MDS:
    def _cal_dist(self, matrix, row, col):
        if row == '*' and col == '*':
            return np.sum(matrix**2)
        elif row == '*' and col != '*':
            return np.sum(matrix[:, col]**2)
        elif row != '*' and col == '*':
            return np.sum(matrix[row, :]**2)
        else:
            return matrix[row, col]**2

    def fit(self, dataset):
        length = dataset.shape[0]
        # compute the distance matrix in the original space
        original_matrix = cdist(dataset, dataset, 'euclidean')
        # dist_matrix will hold the inner product matrix B
        dist_matrix = np.zeros(original_matrix.shape)
        rows, cols = dist_matrix.shape
        # build the matrix B via eq. (8)
        for row in range(rows):
            for col in range(cols):
                distij = self._cal_dist(original_matrix, row, col)
                dist_i = self._cal_dist(original_matrix, row, '*') / length
                dist_j = self._cal_dist(original_matrix, '*', col) / length
                dist_all = self._cal_dist(original_matrix, '*', '*') / (length**2)
                dist_matrix[row, col] = -(distij - dist_i - dist_j + dist_all) / 2
        # compute eigenvalues and eigenvectors of B
        feature_values, feature_vectors = np.linalg.eig(dist_matrix)
        # keep only the eigenvectors of nonzero eigenvalues
        select_feature_values = []
        for i in range(len(feature_values) - 1, -1, -1):
            if np.round(feature_values[i]) != 0:
                select_feature_values.append(feature_values[i])
            else:
                feature_vectors = np.delete(feature_vectors, i, axis=1)
        # the loop runs backwards, so restore the original column order
        select_feature_values.reverse()
        eye_matrix = np.eye(len(select_feature_values))
        for i in range(len(select_feature_values)):
            eye_matrix[i, i] = select_feature_values[i]
        # Z^T = V * Lambda^{1/2}
        return np.dot(feature_vectors, eye_matrix**0.5)
mds = MDS()
mds.fit(np.array([
[1, 1, 1],
[2, 2, 2],
[3, 3, 3]
]))
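As a sanity check for the implementation above, here is a compact functional equivalent (a sketch: `mds_fit` is our own name, and `numpy.linalg.eigh` with a positive-eigenvalue cutoff replaces the explicit selection loops); for exact Euclidean input, the embedding it returns preserves all pairwise distances:

```python
import numpy as np
from scipy.spatial.distance import cdist

def mds_fit(dataset):
    """Compact MDS sketch: returns Z^T, one row per sample."""
    D2 = cdist(dataset, dataset, 'euclidean') ** 2
    # eq. (8) as double-centering of the squared distances
    B = -0.5 * (D2 - D2.mean(axis=1, keepdims=True)
                   - D2.mean(axis=0, keepdims=True) + D2.mean())
    w, V = np.linalg.eigh(B)
    keep = w > 1e-9                      # keep only positive eigenvalues
    return V[:, keep] * np.sqrt(w[keep])

X = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
Z = mds_fit(X)
# distances in the embedding match the original space
assert np.allclose(cdist(Z, Z, 'euclidean'), cdist(X, X, 'euclidean'))
```

The positive-eigenvalue cutoff also guards against taking square roots of tiny negative values introduced by floating-point round-off.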
References
- 《机器学习》 (Machine Learning)
- Solving matrix eigenvalues and eigenvectors with numpy: https://blog.csdn.net/appleyuchi/article/details/80117398
- Generating diagonal matrices with numpy.eye(): https://blog.csdn.net/chixujohnny/article/details/51011931