Multidimensional Scaling
Multidimensional Scaling (MDS) is a classic dimensionality reduction method that requires the distances between samples in the original space to be preserved in the low-dimensional space.
Derivation
Suppose the distance matrix of n samples in the original space is $D \in R^{n \times n}$, whose element $dist_{ij}$ in row i, column j is the distance from sample $x_i$ to sample $x_j$. Our goal is to obtain a representation of the samples in d'-dimensional space, $Z \in R^{d' \times n}$ with $d' \leq d$, such that the Euclidean distance between any two samples in the d'-dimensional space equals their distance in the original space, i.e., $||z_i - z_j|| = dist_{ij}$.
Following the setup above, consider 3 samples, (1, 1, 1), (2, 2, 2), and (3, 3, 3). Their distance matrix in the original space is
Sample | 1 | 2 | 3 |
---|---|---|---|
1 | 0 | $\sqrt{3}$ | $2\sqrt{3}$ |
2 | $\sqrt{3}$ | 0 | $\sqrt{3}$ |
3 | $2\sqrt{3}$ | $\sqrt{3}$ | 0 |
Our goal is now to obtain a 2-dimensional representation of these samples such that the Euclidean distance between any two samples in 2-dimensional space equals their distance in the original (3-dimensional) space.
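The distance matrix above can be reproduced with `scipy.spatial.distance.cdist` (the same function the implementation later in this post relies on); a minimal check:

```python
import numpy as np
from scipy.spatial.distance import cdist

# the three 3-D samples from the text
X = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3]])

# pairwise Euclidean distances; entry (i, j) is ||x_i - x_j||
D = cdist(X, X, 'euclidean')
print(D)  # off-diagonals: sqrt(3) ≈ 1.732 and 2*sqrt(3) ≈ 3.464
```

The printed matrix matches the table: $\sqrt{3} \approx 1.732$ and $2\sqrt{3} \approx 3.464$.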
Let $B = Z^TZ \in R^{n \times n}$, where B is the inner product matrix of the samples after dimensionality reduction, with $b_{ij} = z_i^Tz_j$. Then
$$dist_{ij}^2 = ||z_i - z_j||^2 = ||z_i||^2 + ||z_j||^2 - 2z_i^Tz_j = b_{ii} + b_{jj} - 2b_{ij} \quad (1)$$
For convenience of discussion, assume the reduced samples Z are centered, i.e., $\sum_{i=1}^n z_i = 0$. Clearly, each row and column of B then sums to zero, i.e., $\sum_{i=1}^n b_{ij} = \sum_{j=1}^n b_{ij} = 0$. It is then easy to see that
$$\sum_{i=1}^n dist_{ij}^2 = tr(B) + nb_{jj} \quad (2)$$
$$\sum_{j=1}^n dist_{ij}^2 = tr(B) + nb_{ii} \quad (3)$$
$$\sum_{i=1}^n\sum_{j=1}^n dist_{ij}^2 = 2n \, tr(B) \quad (4)$$
where $tr(\cdot)$ denotes the trace of a matrix: $tr(B) = \sum_{i=1}^n ||z_i||^2$, the sum of the diagonal elements of B.
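Identities (2)–(4) are easy to verify numerically for any centered configuration; the snippet below is a sketch using an arbitrary random `Z`:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((2, 5))        # d' = 2, n = 5
Z = Z - Z.mean(axis=1, keepdims=True)  # center: sum_i z_i = 0

B = Z.T @ Z                            # inner product matrix, n x n
n = B.shape[0]
# squared distances via eq. (1): dist_ij^2 = b_ii + b_jj - 2 b_ij
d2 = np.diag(B)[:, None] + np.diag(B)[None, :] - 2 * B

# eq. (2): sums over i; eq. (3): sums over j; eq. (4): grand sum
assert np.allclose(d2.sum(axis=0), np.trace(B) + n * np.diag(B))
assert np.allclose(d2.sum(axis=1), np.trace(B) + n * np.diag(B))
assert np.allclose(d2.sum(), 2 * n * np.trace(B))
```

The row and column versions coincide here because the distance matrix is symmetric.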
[Derivation of the formulas above]:
$$dist_{ij}^2 = ||z_i||^2 + ||z_j||^2 - 2z_i^Tz_j = b_{ii} + b_{jj} - 2b_{ij}$$
$$dist_{1j}^2 = b_{11} + b_{jj} - 2b_{1j} \quad dist_{2j}^2 = b_{22} + b_{jj} - 2b_{2j} \quad \cdots$$
$$\sum_{i=1}^n dist_{ij}^2 = b_{11} + \cdots + b_{nn} + nb_{jj} - 2(b_{1j} + \cdots + b_{nj})$$
Because of the centering, $\sum_{i=1}^n b_{ij} = 0$, and $tr(B) = b_{11} + \cdots + b_{nn}$, so
$$\sum_{i=1}^n dist_{ij}^2 = tr(B) + nb_{jj}$$
Similarly,
$$\sum_{j=1}^n dist_{ij}^2 = tr(B) + nb_{ii}$$
When $i = 1$: $\sum_{j=1}^n dist_{1j}^2 = tr(B) + nb_{11}$; when $i = n$: $\sum_{j=1}^n dist_{nj}^2 = tr(B) + nb_{nn}$. Summing over all i,
$$\sum_{i=1}^n\sum_{j=1}^n dist_{ij}^2 = n \, tr(B) + n(b_{11} + \cdots + b_{nn}) = n \, tr(B) + n \, tr(B) = 2n \, tr(B)$$
Next, define:
$$dist_{i\cdot}^2 = \frac{1}{n}\sum_{j=1}^n dist_{ij}^2 \quad (5)$$
$$dist_{\cdot j}^2 = \frac{1}{n}\sum_{i=1}^n dist_{ij}^2 \quad (6)$$
$$dist_{\cdot\cdot}^2 = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n dist_{ij}^2 \quad (7)$$
From the equations above we obtain
$$b_{ij} = -\frac{1}{2}(dist_{ij}^2 - dist_{i\cdot}^2 - dist_{\cdot j}^2 + dist_{\cdot\cdot}^2) \quad (8)$$
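Equation (8) is a double-centering of the squared distance matrix and vectorizes directly; the sketch below (the helper name `b_from_distances` is ours) checks it by starting from a known centered Z, so that $B = Z^TZ$ is known exactly:

```python
import numpy as np

def b_from_distances(D):
    # eq. (8), vectorized: b_ij = -1/2 (d_ij^2 - d_i.^2 - d_.j^2 + d_..^2)
    D2 = D ** 2
    return -0.5 * (D2 - D2.mean(axis=1, keepdims=True)
                      - D2.mean(axis=0, keepdims=True)
                      + D2.mean())

# sanity check: build D from a centered Z, then recover B = Z^T Z
rng = np.random.default_rng(1)
Z = rng.standard_normal((2, 6))
Z -= Z.mean(axis=1, keepdims=True)          # center the samples
B = Z.T @ Z
D2 = np.diag(B)[:, None] + np.diag(B)[None, :] - 2 * B   # eq. (1)
D = np.sqrt(np.maximum(D2, 0))              # clip tiny negative round-off
assert np.allclose(b_from_distances(D), B)
```

The row means, column means, and grand mean of `D ** 2` are exactly the quantities defined in (5)–(7).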
[Derivation of the formula for $b_{ij}$]:
- From equation (1):
$$b_{ij} = -\frac{1}{2}(dist_{ij}^2 - b_{ii} - b_{jj}) \quad (9)$$
- From equations (2)–(4), the expressions for $b_{ii}$, $b_{jj}$, and $tr(B)$ follow:
$$b_{ii} = \frac{1}{n}\sum_{j=1}^n dist_{ij}^2 - \frac{1}{n}tr(B) \quad b_{jj} = \frac{1}{n}\sum_{i=1}^n dist_{ij}^2 - \frac{1}{n}tr(B) \quad tr(B) = \frac{1}{2n}\sum_{i=1}^n\sum_{j=1}^n dist_{ij}^2$$
- Equations (5)–(7) simplify these further:
$$b_{ii} = dist_{i\cdot}^2 - \frac{1}{n}tr(B) \quad b_{jj} = dist_{\cdot j}^2 - \frac{1}{n}tr(B) \quad tr(B) = \frac{n}{2}dist_{\cdot\cdot}^2$$
- Finally, substituting $b_{ii}$, $b_{jj}$, and $tr(B)$ into equation (9) gives
$$b_{ij} = -\frac{1}{2}(dist_{ij}^2 - dist_{i\cdot}^2 - dist_{\cdot j}^2 + dist_{\cdot\cdot}^2)$$
Thus the inner product matrix B can be computed from the distance matrix D, which the dimensionality reduction keeps unchanged. For the running example:
$$b_{11} = -\frac{1}{2}(dist_{11}^2 - dist_{1\cdot}^2 - dist_{\cdot 1}^2 + dist_{\cdot\cdot}^2) = -\frac{1}{2}(0 - 5 - 5 + 4) = 3$$
$$b_{12} = 0 \quad b_{13} = -3 \quad b_{21} = 0 \quad b_{22} = 0 \quad b_{23} = 0 \quad b_{31} = -3 \quad b_{32} = 0 \quad b_{33} = 3$$
The inner product matrix B is therefore

Sample | 1 | 2 | 3 |
---|---|---|---|
1 | 3 | 0 | -3 |
2 | 0 | 0 | 0 |
3 | -3 | 0 | 3 |
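As a numerical check (a sketch reusing `cdist`), double-centering the squared distances of the three samples reproduces this matrix:

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
D2 = cdist(X, X, 'euclidean') ** 2

# eq. (8): b_ij = -1/2 (dist_ij^2 - dist_i.^2 - dist_.j^2 + dist_..^2)
B = -0.5 * (D2 - D2.mean(axis=1, keepdims=True)
               - D2.mean(axis=0, keepdims=True)
               + D2.mean())
print(B)  # [[ 3.  0. -3.] [ 0.  0.  0.] [-3.  0.  3.]]
```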
Perform an eigendecomposition of B, $B = V\Lambda V^T$, where $\Lambda = diag(\lambda_1, \lambda_2, \cdots, \lambda_d)$ is the diagonal matrix of eigenvalues with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$, and V is the matrix of eigenvectors. Suppose $d^*$ of the eigenvalues are nonzero; they form the diagonal matrix
$\Lambda_* = diag(\lambda_1, \lambda_2, \cdots, \lambda_{d^*})$. Letting $V_*$ denote the corresponding eigenvector matrix, Z can be expressed as
$$Z = \Lambda_*^{1/2} V_*^T \in R^{d^* \times n}$$
In practical applications, for effective dimensionality reduction it usually suffices that the distances after reduction are as close as possible to the original distances, rather than strictly equal. In that case one can take the $d' \ll d$ largest eigenvalues to form the diagonal matrix $\hat{\Lambda} = diag(\lambda_1, \lambda_2, \cdots, \lambda_{d'})$. Letting $\hat{V}$ denote the corresponding eigenvector matrix, Z can be expressed as
$$Z = \hat{\Lambda}^{1/2} \hat{V}^T \in R^{d' \times n}$$
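This recovery step can be sketched with `numpy.linalg.eigh` (appropriate since B is symmetric; it returns eigenvalues in ascending order, so we reverse them before truncating to d'):

```python
import numpy as np

B = np.array([[3., 0., -3.],
              [0., 0., 0.],
              [-3., 0., 3.]])

d_prime = 1
w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
w, V = w[::-1], V[:, ::-1]               # reorder to descending
lam = np.clip(w[:d_prime], 0.0, None)    # guard against tiny negative noise
Z = np.sqrt(lam)[:, None] * V[:, :d_prime].T   # Z = Lambda_hat^{1/2} V_hat^T

# the 1-D embedding preserves the original pairwise distances
D_emb = np.sqrt(((Z.T[:, None, :] - Z.T[None, :, :]) ** 2).sum(axis=2))
D_orig = np.array([[0, np.sqrt(3), 2 * np.sqrt(3)],
                   [np.sqrt(3), 0, np.sqrt(3)],
                   [2 * np.sqrt(3), np.sqrt(3), 0]])
assert np.allclose(D_emb, D_orig)
```

The sign of each eigenvector is arbitrary, so the recovered coordinates may be mirrored; the pairwise distances are unaffected.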
[Algorithm]:
- Input: distance matrix $D \in R^{n \times n}$; target dimensionality d'.
- Procedure:
  - Compute $dist_{i\cdot}^2, dist_{\cdot j}^2, dist_{\cdot\cdot}^2$ according to equations (5)–(7);
  - Compute the matrix B according to equation (8);
  - Perform an eigendecomposition of B;
  - Take $\hat{\Lambda}$ as the diagonal matrix of the d' largest eigenvalues and $\hat{V}$ as the matrix of the corresponding eigenvectors.
- Output: the matrix $Z^T = \hat{V}\hat{\Lambda}^{1/2} \in R^{n \times d'}$, each row of which is the low-dimensional coordinate of one sample.
Continuing the example: having obtained the matrix B, we now perform its eigendecomposition. The characteristic matrix is
$$B - \lambda I = \begin{bmatrix} 3 - \lambda & 0 & -3 \\ 0 & -\lambda & 0 \\ -3 & 0 & 3 - \lambda \end{bmatrix}$$
Setting its determinant to zero gives $\lambda^2(6 - \lambda) = 0$, i.e., $\lambda_1 = 6, \lambda_2 = \lambda_3 = 0$. Thus $\Lambda = diag(6, 0, 0)$ and $\Lambda_* = diag(6)$. The unit eigenvector for $\lambda_1 = 6$ gives $V_*^T = [1/\sqrt{2}, 0, -1/\sqrt{2}]$ (the eigenvector must be normalized to unit length so that $Z^TZ = B$ holds). Based on these results, Z can be expressed as
$$Z = [\sqrt{6}] \, [1/\sqrt{2}, 0, -1/\sqrt{2}] = [\sqrt{3}, 0, -\sqrt{3}]$$
$$Z^T = \begin{bmatrix} \sqrt{3} \\ 0 \\ -\sqrt{3} \end{bmatrix}$$
We have thus converted 3 three-dimensional samples into 3 one-dimensional ones, achieving the dimensionality reduction. Note that the distances are preserved exactly: $|\sqrt{3} - 0| = \sqrt{3}$ and $|\sqrt{3} - (-\sqrt{3})| = 2\sqrt{3}$, matching the original distance matrix.
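The unit-norm requirement on the eigenvectors in $V_*$ is what makes $Z^TZ = B$ hold (an unnormalized $[1, 0, -1]$ would inflate every coordinate by $\sqrt{2}$); a quick check of the example values:

```python
import numpy as np

B = np.array([[3., 0., -3.], [0., 0., 0.], [-3., 0., 3.]])
v = np.array([1., 0., -1.]) / np.sqrt(2)   # unit eigenvector for lambda = 6
Z = np.sqrt(6) * v                          # coordinates: sqrt(3), 0, -sqrt(3)

assert np.allclose(np.outer(Z, Z), B)                 # Z^T Z = B holds
assert np.isclose(abs(Z[0] - Z[1]), np.sqrt(3))       # matches dist_12
assert np.isclose(abs(Z[0] - Z[2]), 2 * np.sqrt(3))   # matches dist_13
```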
Code Implementation
[Steps]:
- Compute the distance matrix in the original space

from scipy.spatial.distance import cdist
original_matrix = cdist(dataset, dataset, 'euclidean')

This uses the cdist function from the scipy.spatial.distance package; the code here uses the Euclidean distance.
- Compute $dist_{ij}^2, \ dist_{i\cdot}^2, \ dist_{\cdot j}^2, \ dist_{\cdot\cdot}^2$ for each $b_{ij}$, and build the matrix B

def _cal_dist(matrix, row, col):
    if row == '*' and col == '*':
        return np.sum(matrix**2)
    elif row == '*' and col != '*':
        return np.sum(matrix[:, col]**2)
    elif row != '*' and col == '*':
        return np.sum(matrix[row, :]**2)
    else:
        return matrix[row, col]**2

# compute dist_i., dist_.j and dist_..
length = original_matrix.shape[0]
dist_matrix = np.zeros(original_matrix.shape)
rows, cols = dist_matrix.shape
# build the matrix B
for row in range(rows):
    for col in range(cols):
        distij = _cal_dist(original_matrix, row, col)
        dist_i = _cal_dist(original_matrix, row, '*') / length
        dist_j = _cal_dist(original_matrix, '*', col) / length
        dist_all = _cal_dist(original_matrix, '*', '*') / (length**2)
        dist_matrix[row, col] = -(distij - dist_i - dist_j + dist_all) / 2
- Compute the eigenvalues and eigenvectors of the matrix B

# compute eigenvalues and eigenvectors
feature_values, feature_vectors = np.linalg.eig(dist_matrix)
- From the eigenvectors, select those corresponding to nonzero eigenvalues to construct $V_*$

select_feature_values = []
for i in range(len(feature_values) - 1, -1, -1):
    if np.round(feature_values[i]) != 0:
        select_feature_values.append(feature_values[i])
    else:
        feature_vectors = np.delete(feature_vectors, i, axis=1)
# the loop runs backwards, so restore the original column order
select_feature_values.reverse()
eye_matrix = np.eye(len(select_feature_values))
for i in range(len(select_feature_values)):
    eye_matrix[i, i] = select_feature_values[i]
- Return $Z^T = \hat{V}\hat{\Lambda}^{1/2}$

return np.dot(feature_vectors, eye_matrix**0.5)
[Complete code]:

import numpy as np
from scipy.spatial.distance import cdist

class MDS:
    def _cal_dist(self, matrix, row, col):
        if row == '*' and col == '*':
            return np.sum(matrix**2)
        elif row == '*' and col != '*':
            return np.sum(matrix[:, col]**2)
        elif row != '*' and col == '*':
            return np.sum(matrix[row, :]**2)
        else:
            return matrix[row, col]**2

    def fit(self, dataset):
        length = dataset.shape[0]
        # compute the distance matrix in the original space
        original_matrix = cdist(dataset, dataset, 'euclidean')
        # dist_matrix will hold the inner product matrix B
        dist_matrix = np.zeros(original_matrix.shape)
        rows, cols = dist_matrix.shape
        # build the matrix B via eq. (8)
        for row in range(rows):
            for col in range(cols):
                distij = self._cal_dist(original_matrix, row, col)
                dist_i = self._cal_dist(original_matrix, row, '*') / length
                dist_j = self._cal_dist(original_matrix, '*', col) / length
                dist_all = self._cal_dist(original_matrix, '*', '*') / (length**2)
                dist_matrix[row, col] = -(distij - dist_i - dist_j + dist_all) / 2
        # compute eigenvalues and eigenvectors of B
        feature_values, feature_vectors = np.linalg.eig(dist_matrix)
        # keep only the eigenvectors of nonzero eigenvalues
        select_feature_values = []
        for i in range(len(feature_values) - 1, -1, -1):
            if np.round(feature_values[i]) != 0:
                select_feature_values.append(feature_values[i])
            else:
                feature_vectors = np.delete(feature_vectors, i, axis=1)
        # the loop runs backwards, so restore the original column order
        select_feature_values.reverse()
        eye_matrix = np.eye(len(select_feature_values))
        for i in range(len(select_feature_values)):
            eye_matrix[i, i] = select_feature_values[i]
        # Z^T = V * Lambda^{1/2}
        return np.dot(feature_vectors, eye_matrix**0.5)
mds = MDS()
mds.fit(np.array([
[1, 1, 1],
[2, 2, 2],
[3, 3, 3]
]))
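As a sanity check for the implementation above, here is a compact functional equivalent (a sketch: `mds_fit` is our own name, and `numpy.linalg.eigh` with a positive-eigenvalue cutoff replaces the explicit selection loops); for exact Euclidean input, the embedding it returns preserves all pairwise distances:

```python
import numpy as np
from scipy.spatial.distance import cdist

def mds_fit(dataset):
    """Compact MDS sketch: returns Z^T, one row per sample."""
    D2 = cdist(dataset, dataset, 'euclidean') ** 2
    # eq. (8) as double-centering of the squared distances
    B = -0.5 * (D2 - D2.mean(axis=1, keepdims=True)
                   - D2.mean(axis=0, keepdims=True) + D2.mean())
    w, V = np.linalg.eigh(B)
    keep = w > 1e-9                      # keep only positive eigenvalues
    return V[:, keep] * np.sqrt(w[keep])

X = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
Z = mds_fit(X)
# distances in the embedding match the original space
assert np.allclose(cdist(Z, Z, 'euclidean'), cdist(X, X, 'euclidean'))
```

The positive-eigenvalue cutoff also guards against taking square roots of tiny negative values introduced by floating-point round-off.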
References
- 《机器学习》 (Machine Learning)
- Solving matrix eigenvalues and eigenvectors with numpy: https://blog.csdn.net/appleyuchi/article/details/80117398
- Generating diagonal matrices with numpy.eye(): https://blog.csdn.net/chixujohnny/article/details/51011931