Multidimensional Scaling (MDS)

Multidimensional Scaling

Multidimensional Scaling (MDS) is a classical dimensionality reduction method that requires the distances between samples in the original space to be preserved in the low-dimensional space.

Derivation

Assume the distance matrix of $n$ samples in the original space is $D \in R^{n \times n}$, where the element in row $i$ and column $j$, $dist_{ij}$, is the distance between samples $x_i$ and $x_j$. Our goal is to obtain a representation of the samples in $d'$-dimensional space, $Z \in R^{d' \times n}$ with $d' \leq d$, such that the Euclidean distance between any two samples in the $d'$-dimensional space equals their distance in the original space, i.e., $||z_i - z_j|| = dist_{ij}$.

Following the description above, let's create 3 samples, (1, 1, 1), (2, 2, 2), and (3, 3, 3). The distance matrix of these 3 samples in the original space is

| Sample | 1 | 2 | 3 |
| --- | --- | --- | --- |
| 1 | 0 | $\sqrt{3}$ | $2\sqrt{3}$ |
| 2 | $\sqrt{3}$ | 0 | $\sqrt{3}$ |
| 3 | $2\sqrt{3}$ | $\sqrt{3}$ | 0 |

Our goal now is to obtain a 2-dimensional representation of the samples such that the Euclidean distance between any two samples in the new space equals their distance in the original (3-dimensional) space.
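As a quick check on the table above, these distances can be reproduced with a few lines of numpy (a minimal verification sketch, separate from the implementation later in this post):

```python
import numpy as np

# The three samples from the example above.
X = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3]])

# Pairwise Euclidean distances; expected values are 0, sqrt(3) ~ 1.732
# and 2*sqrt(3) ~ 3.464, matching the table.
for i in range(3):
    for j in range(3):
        print(i + 1, j + 1, round(np.linalg.norm(X[i] - X[j]), 3))
```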

B = Z T Z ∈ R n × n B = Z^TZ \in R^{n \times n} B=ZTZRn×n,其中 B 为降维后样本的内积矩阵, b i j = z i T z j b_{ij} = z_i^Tz_j bij=ziTzj,有
d i s t i j = ∣ ∣ z i − z j ∣ ∣ d i s t i j 2 = ∣ ∣ z i ∣ ∣ 2 + ∣ ∣ z j ∣ ∣ 2 − 2 z i T z j = b i i + b j j − 2 b i j ( 1 ) dist_{ij} = ||z_i - z_j|| \\ dist_{ij}^2 = ||z_i||^2 + ||z_j||^2 - 2z_i^Tz_j = b_{ii} + b_{jj} - 2b_{ij} \quad (1) distij=zizjdistij2=zi2+zj22ziTzj=bii+bjj2bij(1)
为便于讨论,令降维后的样本 Z 被中心化,即 ∑ i = 1 n z i = 0 \sum_{i=1}^n z_i = 0 i=1nzi=0。显然,矩阵 B 的行与列之和均为零,即 ∑ i = 1 n b i j = ∑ j = 1 n b i j = 0 \sum_{i=1}^n b_{ij} = \sum_{j=1}^n b_{ij} = 0 i=1nbij=j=1nbij=0。易知
∑ i = 1 n d i s t i j 2 = t r ( B ) + n b j j ( 2 ) ∑ j = 1 n d i s t i j 2 = t r ( B ) + n b i i ( 3 ) ∑ i = 1 n ∑ j = 1 n d i s t i j 2 = 2 n   t r ( B ) ( 4 ) \sum_{i=1}^n dist_{ij}^2 = tr(B) + nb_{jj} \quad (2) \\ \sum_{j=1}^n dist_{ij}^2 = tr(B) + nb_{ii} \quad (3) \\ \sum_{i=1}^n\sum_{j=1}^n dist_{ij}^2 = 2n \ tr(B) \quad (4) \\ i=1ndistij2=tr(B)+nbjj(2)j=1ndistij2=tr(B)+nbii(3)i=1nj=1ndistij2=2n tr(B)(4)
其中 tr(.) 表示矩阵的迹(trace), t r ( B ) = ∑ i = 1 n ∣ ∣ z i ∣ ∣ 2 tr(B) = \sum_{i=1}^n ||z_i||^2 tr(B)=i=1nzi2,矩阵 B 的主对角上元素之和。

[Derivation of the above formulas]:

$$dist_{ij}^2 = ||z_i||^2 + ||z_j||^2 - 2z_i^Tz_j = b_{ii} + b_{jj} - 2b_{ij}$$
$$dist_{1j}^2 = b_{11} + b_{jj} - 2b_{1j} \qquad dist_{2j}^2 = b_{22} + b_{jj} - 2b_{2j} \qquad \cdots$$
$$\sum_{i=1}^n dist_{ij}^2 = b_{11} + \cdots + b_{nn} + nb_{jj} - 2(b_{1j} + \cdots + b_{nj})$$

Because the samples are centered, $\sum_{i=1}^n b_{ij} = 0$, and $tr(B) = b_{11} + \cdots + b_{nn}$, so

$$\sum_{i=1}^n dist_{ij}^2 = tr(B) + nb_{jj}$$

Similarly, $\sum_{j=1}^n dist_{ij}^2 = tr(B) + nb_{ii}$; for $i = 1$ this reads $\sum_{j=1}^n dist_{1j}^2 = tr(B) + nb_{11}$, and for $i = n$ it reads $\sum_{j=1}^n dist_{nj}^2 = tr(B) + nb_{nn}$. Summing over $i$ gives

$$\sum_{i=1}^n\sum_{j=1}^n dist_{ij}^2 = n \, tr(B) + n(b_{11} + \cdots + b_{nn}) = n \, tr(B) + n \, tr(B) = 2n \, tr(B)$$

Next, define:

$$dist_{i\cdot}^2 = \frac{1}{n}\sum_{j=1}^n dist_{ij}^2 \quad (5)$$
$$dist_{\cdot j}^2 = \frac{1}{n}\sum_{i=1}^n dist_{ij}^2 \quad (6)$$
$$dist_{\cdot\cdot}^2 = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n dist_{ij}^2 \quad (7)$$

From the equations above we obtain

$$b_{ij} = -\frac{1}{2}\left(dist_{ij}^2 - dist_{i\cdot}^2 - dist_{\cdot j}^2 + dist_{\cdot\cdot}^2\right) \quad (8)$$
b i j b_{ij} bij 计算公式推导过程】:

  • 根据式子(1)可知
    b i j = − 1 2 ( d i s t i j 2 − b i i − b j j ) ( 9 ) b_{ij} = -\frac{1}{2}(dist_{ij}^2 - b_{ii} - b_{jj}) \quad (9) bij=21(distij2biibjj)(9)
  • 再根据式子(2)~(4)可分别得出 b i i b_{ii} bii b j j b_{jj} bjj 以及 tr(B) 的表达式
    b i i = 1 n ∑ j = 1 n d i s t i j 2 − 1 n t r ( B ) b j j = 1 n ∑ i = 1 n d i s t i j 2 − 1 n t r ( B ) t r ( B ) = 1 2 n ∑ i = 1 n ∑ j = 1 n d i s t i j 2 b_{ii} = \frac{1}{n}\sum_{j=1}^ndist_{ij}^2 - \frac{1}{n}tr(B) \\ b_{jj} = \frac{1}{n}\sum_{i=1}^ndist_{ij}^2 - \frac{1}{n}tr(B) \\ tr(B) = \frac{1}{2n}\sum_{i=1}^n\sum_{j=1}^n dist_{ij}^2 bii=n1j=1ndistij2n1tr(B)bjj=n1i=1ndistij2n1tr(B)tr(B)=2n1i=1nj=1ndistij2
  • 通过式子(5)~(7)可进一步化简
    b i i = d i s t i ⋅ 2 − 1 n t r ( B ) b j j = d i s t ⋅ j 2 − 1 n t r ( B ) t r ( B ) = n 2 d i s t ⋅ ⋅ 2 b_{ii} = dist_{i\cdot}^2 - \frac{1}{n}tr(B) \\ b_{jj} = dist_{\cdot j}^2 - \frac{1}{n}tr(B) \\ tr(B) = \frac{n}{2}dist_{\cdot\cdot}^2 bii=disti2n1tr(B)bjj=distj2n1tr(B)tr(B)=2ndist2
  • 最后将 b i i b_{ii} bii b j j b_{jj} bjj 以及 tr(B) 代入式子(9)中可得
    b i j = − 1 2 ( d i s t i j 2 − d i s t i ⋅ 2 − d i s t ⋅ j 2 + d i s t ⋅ ⋅ 2 ) b_{ij} = -\frac{1}{2}(dist_{ij}^2 - dist_{i\cdot}^2 - dist_{\cdot j}^2 + dist_{\cdot\cdot}^2) bij=21(distij2disti2distj2+dist2)
    由此即可通过降维前后保持不变的距离矩阵 D 求取内积矩阵 B。
    b 11 = − 1 2 ( d i s t 11 2 − d i s t 1 ⋅ 2 − d i s t ⋅ 1 2 + d i s t ⋅ ⋅ 2 ) = − 1 2 ( 0 − 5 − 5 + 4 ) = 3 b 12 = 0 b 13 = − 3 b 21 = 0 b 22 = 0 b 23 = 0 b 31 = − 3 b 32 = 0 b 33 = 3 b_{11} = -\frac{1}{2}(dist_{11}^2 - dist_{1\cdot}^2 - dist_{\cdot 1}^2 + dist_{\cdot\cdot}^2) = -\frac{1}{2}(0 - 5 - 5 + 4) = 3 \\ b_{12} = 0 \quad b_{13} = -3 \quad b_{21} = 0 \quad b_{22} = 0 \\ b_{23} = 0 \quad b_{31} = -3 \quad b_{32} = 0 \quad b_{33} = 3 b11=21(dist112dist12dist12+dist2)=21(055+4)=3b12=0b13=3b21=0b22=0b23=0b31=3b32=0b33=3
The resulting inner product matrix $B$ is

| Sample | 1 | 2 | 3 |
| --- | --- | --- | --- |
| 1 | 3 | 0 | -3 |
| 2 | 0 | 0 | 0 |
| 3 | -3 | 0 | 3 |
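For this small example, $B$ can also be checked in one step by double-centering the squared distance matrix, which is simply equation (8) written in matrix form with the centering matrix $J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$ (a verification sketch only, not part of the derivation above):

```python
import numpy as np

# Squared distances from the example's distance table.
D2 = np.array([[0.0,  3.0, 12.0],
               [3.0,  0.0,  3.0],
               [12.0, 3.0,  0.0]])

n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
B = -0.5 * J @ D2 @ J                 # double centering, i.e. equation (8)
print(B.round(6))
# Expected (approximately):
# [[ 3.  0. -3.]
#  [ 0.  0.  0.]
#  [-3.  0.  3.]]
```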

Now perform an eigendecomposition of $B$: $B = V\Lambda V^T$, where $\Lambda = diag(\lambda_1, \lambda_2, \cdots, \lambda_d)$ is the diagonal matrix of eigenvalues with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$, and $V$ is the matrix of eigenvectors. Suppose $d_*$ of the eigenvalues are nonzero; they form the diagonal matrix $\Lambda_* = diag(\lambda_1, \lambda_2, \cdots, \lambda_{d_*})$, and let $V_*$ denote the corresponding eigenvector matrix. Then $Z$ can be expressed as

$$Z = \Lambda_*^{1/2}V_*^T \in R^{d_* \times n}$$

In practice, for effective dimensionality reduction it is usually enough for the distances after reduction to be as close as possible to the original distances rather than strictly equal. In that case we can take the $d' \ll d$ largest eigenvalues to form the diagonal matrix $\hat{\Lambda} = diag(\lambda_1, \lambda_2, \cdots, \lambda_{d'})$, let $\hat{V}$ denote the corresponding eigenvector matrix, and express $Z$ as

$$Z = \hat{\Lambda}^{1/2}\hat{V}^T \in R^{d' \times n}$$

[Algorithm]:

  • Input: distance matrix $D \in R^{n \times n}$; dimensionality $d'$ of the low-dimensional space.
  • Procedure:
  1. Compute $dist_{i\cdot}^2$, $dist_{\cdot j}^2$, and $dist_{\cdot\cdot}^2$ using equations (5)-(7);
  2. Compute the matrix $B$ using equation (8);
  3. Perform an eigendecomposition of $B$;
  4. Take $\hat{\Lambda}$ as the diagonal matrix of the $d'$ largest eigenvalues and $\hat{V}$ as the matrix of the corresponding eigenvectors.
  • Output: the matrix $Z^T = \hat{V}\hat{\Lambda}^{1/2} \in R^{n \times d'}$, where each row holds the low-dimensional coordinates of one sample. (A compact sketch of these steps is shown right below.)
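As a reference, the four steps above can also be written in a compact, vectorized form using double centering and numpy.linalg.eigh (a sketch assuming a symmetric, Euclidean distance matrix; the loop-based implementation in the next section follows the step-by-step derivation instead):

```python
import numpy as np

def classical_mds(D, d):
    """Classical MDS: D is an (n, n) matrix of pairwise distances,
    d is the target dimensionality. Returns the (n, d) matrix Z^T."""
    n = D.shape[0]
    D2 = D ** 2                                   # squared distances
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ D2 @ J                         # equation (8) in matrix form
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:d]           # indices of the d largest eigenvalues
    lam = np.clip(eigvals[idx], 0, None)          # clip tiny negatives from rounding
    return eigvecs[:, idx] * np.sqrt(lam)         # Z^T = V_hat @ Lambda_hat^{1/2}
```

Applied to the example's distance matrix with d = 1, this sketch returns approximately [[1.732], [0.], [-1.732]] (up to sign), which matches the hand calculation that follows.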

Continuing the example, we have already obtained the matrix $B$; next we perform an eigendecomposition of it. Solving $\det(B - \lambda I) = 0$ with

$$B - \lambda I = \begin{bmatrix} 3 - \lambda & 0 & -3 \\ 0 & -\lambda & 0 \\ -3 & 0 & 3 - \lambda \end{bmatrix}$$

gives $\lambda^2(6 - \lambda) = 0$, i.e., $\lambda_1 = 6,\ \lambda_2 = \lambda_3 = 0$. Although we aimed for a 2-dimensional representation, only one eigenvalue is nonzero (the three samples are collinear), so the distance-preserving representation is in fact one-dimensional. We have $\Lambda = diag(6, 0, 0)$, $\Lambda_* = diag(6)$, and the unit-length eigenvector for $\lambda_1 = 6$ gives $V_*^T = \left[\tfrac{1}{\sqrt{2}}, 0, -\tfrac{1}{\sqrt{2}}\right]$. Based on these results, $Z$ can be expressed as $Z = [\sqrt{6}]\left[\tfrac{1}{\sqrt{2}}, 0, -\tfrac{1}{\sqrt{2}}\right] = [\sqrt{3}, 0, -\sqrt{3}]$, i.e.

$$Z^T = \begin{bmatrix} \sqrt{3} \\ 0 \\ -\sqrt{3} \end{bmatrix}$$

As we can see, the 3 three-dimensional samples have been converted into 3 one-dimensional coordinates, achieving the dimensionality reduction. The pairwise distances are preserved: $|\sqrt{3} - 0| = \sqrt{3}$ and $|\sqrt{3} - (-\sqrt{3})| = 2\sqrt{3}$, matching the original distance matrix.

Code Implementation

[Steps]:

  1. Compute the distance matrix in the original space

```python
from scipy.spatial.distance import cdist

original_matrix = cdist(dataset, dataset, 'euclidean')
```

This can be done directly with the cdist function from the scipy.spatial.distance package; the code here uses the Euclidean distance. The snippets in the following steps are excerpts from the MDS class given in the complete code at the end.

  2. For each $b_{ij}$, compute the corresponding $dist_{ij}^2$, $dist_{i\cdot}^2$, $dist_{\cdot j}^2$, and $dist_{\cdot\cdot}^2$, then $b_{ij}$ itself, to obtain the matrix $B$

```python
def _cal_dist(self, matrix, row, col):
    # Sum of squared distances; '*' means "sum over that index".
    if row == '*' and col == '*':
        return np.sum(matrix**2)              # sum over all i and j
    elif row == '*' and col != '*':
        return np.sum(matrix[:, col]**2)      # sum over i for a fixed column j
    elif row != '*' and col == '*':
        return np.sum(matrix[row, :]**2)      # sum over j for a fixed row i
    else:
        return matrix[row, col]**2            # a single squared distance

# Compute dist_i., dist_.j and dist_.. for every entry and build matrix B;
# length is the number of samples n (set in fit below).
dist_matrix = np.zeros(original_matrix.shape)
rows, cols = dist_matrix.shape
for row in range(rows):
    for col in range(cols):
        distij = self._cal_dist(original_matrix, row, col)
        dist_i = self._cal_dist(original_matrix, row, '*') / length
        dist_j = self._cal_dist(original_matrix, '*', col) / length
        dist_all = self._cal_dist(original_matrix, '*', '*') / (length**2)
        # Equation (8): b_ij = -(dist_ij^2 - dist_i.^2 - dist_.j^2 + dist_..^2) / 2
        dist_matrix[row, col] = -(distij - dist_i - dist_j + dist_all) / 2
```
  3. Compute the eigenvalues and eigenvectors of the matrix $B$

```python
# Eigendecomposition of B (stored in dist_matrix)
feature_values, feature_vectors = np.linalg.eig(dist_matrix)
```
  4. From the eigenvectors, keep those corresponding to nonzero eigenvalues to construct $V_*$ and the diagonal matrix $\Lambda_*$

```python
select_feature_values = []
# Walk the eigenvalues from the last index to the first so that deleting
# columns does not shift the indices still to be visited.
for i in range(len(feature_values) - 1, -1, -1):
    if np.round(feature_values[i]) != 0:
        # Insert at the front so the kept eigenvalues stay in the same
        # order as the remaining eigenvector columns.
        select_feature_values.insert(0, feature_values[i])
    else:
        # Drop the eigenvector column of a (numerically) zero eigenvalue.
        feature_vectors = np.delete(feature_vectors, i, axis=1)
# Diagonal matrix of the retained eigenvalues (Lambda_*).
eye_matrix = np.diag(select_feature_values)
```
  5. Return $Z^T = \hat{V}\hat{\Lambda}^{1/2}$

```python
# Element-wise square root of the diagonal eigenvalue matrix; for a
# Euclidean distance matrix the retained eigenvalues are non-negative.
return np.dot(feature_vectors, eye_matrix**0.5)
```

[Complete code]:

```python
import numpy as np
from scipy.spatial.distance import cdist


class MDS:

    def _cal_dist(self, matrix, row, col):
        # Sum of squared distances; '*' means "sum over that index".
        if row == '*' and col == '*':
            return np.sum(matrix**2)
        elif row == '*' and col != '*':
            return np.sum(matrix[:, col]**2)
        elif row != '*' and col == '*':
            return np.sum(matrix[row, :]**2)
        else:
            return matrix[row, col]**2

    def fit(self, dataset):
        length = dataset.shape[0]  # number of samples n
        # Distance matrix in the original space
        original_matrix = cdist(dataset, dataset, 'euclidean')
        # Build the inner product matrix B (stored in dist_matrix) from equation (8)
        dist_matrix = np.zeros(original_matrix.shape)
        rows, cols = dist_matrix.shape
        for row in range(rows):
            for col in range(cols):
                distij = self._cal_dist(original_matrix, row, col)
                dist_i = self._cal_dist(original_matrix, row, '*') / length
                dist_j = self._cal_dist(original_matrix, '*', col) / length
                dist_all = self._cal_dist(original_matrix, '*', '*') / (length**2)
                dist_matrix[row, col] = -(distij - dist_i - dist_j + dist_all) / 2
        # Eigendecomposition of B
        feature_values, feature_vectors = np.linalg.eig(dist_matrix)
        # Keep only the eigenvectors of (numerically) nonzero eigenvalues
        select_feature_values = []
        for i in range(len(feature_values) - 1, -1, -1):
            if np.round(feature_values[i]) != 0:
                # Insert at the front so the eigenvalue order matches the
                # order of the remaining eigenvector columns.
                select_feature_values.insert(0, feature_values[i])
            else:
                feature_vectors = np.delete(feature_vectors, i, axis=1)
        # Diagonal matrix of the retained eigenvalues (Lambda_*)
        eye_matrix = np.diag(select_feature_values)
        # Z^T = V_* Lambda_*^{1/2}; each row is one sample's low-dimensional coordinates
        return np.dot(feature_vectors, eye_matrix**0.5)


mds = MDS()
Z_T = mds.fit(np.array([
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3]
]))
```
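For the example data, the returned coordinates should agree with the hand-derived result $Z^T = [\sqrt{3}, 0, -\sqrt{3}]^T$, up to the sign of the eigenvector chosen by numpy.linalg.eig:

```python
print(Z_T)
# Expected (approximately, up to sign):
# [[ 1.732]
#  [ 0.   ]
#  [-1.732]]
```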
