目录
Isomap(isometric feature mapping)
step3 MDS Algorithm(Multiple Dimensional Scaling )
Isomap(isometric feature mapping)
在前一节我们介绍了 LLE 算法及在瑞士卷上的应用,详情见
https://blog.csdn.net/waitingwinter/article/details/105467074
这一节,我们介绍一下与 LLE 同时发表在science上的另一篇文章Isomap(isometric feature mapping), 论文地址为
J. B. Tenenbaum, V . de Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, pp.2319–2323, 2000.
基本思想
给定数据集,通过最近邻等方式构造一个数据图(data graph)。然后,计算任意两个点之间的最
短路径(即测地距离)。对于所有的任意两个点对,期望在低维空间中保持其测地距离。
step1 最近邻确定与权重矩阵
与 LLE 中的步骤相似, 仍然采用欧式距离来确定k个近邻,有所不同的是
与k个近邻点之间的距离设定为欧氏距离,与非近邻点的距离设置为无穷大。
step2 确定点对之间的测地距离
我们要求一个矩阵元素之间的测地距离,也就是最短距离。所谓的测地距离请看下图(https://blog.csdn.net/qq_30683589/article/details/80371438):
常见的求解最短路径、最短距离的方法有以下两种
- 确定起点的最短路径问题- 即已知起始结点,求最短路径的问题。适合使用Dijkstra算法;
- 全局最短路径问题- 求图中所有的最短路径。适合使用Floyd-Warshall算法。
因为我们要求所有点对之间的最短路径,故我们采用 Floyd 算法。具体想法细节参照博客
https://blog.csdn.net/u012366767/article/details/81562482
在求得最短距离后,我们便可以构造一个距离矩阵,此矩阵的每个元素
.
step3 MDS Algorithm(Multiple Dimensional Scaling )
MDS 为多维缩放算法,我们简要介绍其思想
我们已知m维空间中的n个样本之间的距离矩阵,但不清楚样本的坐标,我们试图获得这n个样本在d(<m)维中的表示,满足
降维后的样本仍保持两两之间的距离
假设我们原始样本为 降维后的样本为 我们试图寻找 之间的关系。
我们引入中心化矩阵
不失一般性, 令降维后的样本是零均值的, 即 .
令 为内积矩阵,简单计算有
令
则有
也即为已知的(原始的距离矩阵),经计算有 由此我们可以算出矩阵 , 之后
- 对矩阵B进行特征值分解,即
- 其中
(最大的前d个特征值、对应的特征向量)。
算法步骤
MATLAB程序实现
仍旧是之前的瑞士卷数据
N=1000;
rand('state',123456789)
%Gaussian noise
noise = 0.001*randn(1,N);
%standard swiss roll data
tt = (3*pi/2)*(1+2*rand(1,N)); height = 21*rand(1,N);
X = [(tt+ noise).*cos(tt); height; (tt+ noise).*sin(tt)];
%show the picture
point_size = 20;
figure(1)
cla
scatter3(X(1,:),X(2,:),X(3,:), point_size,tt,'filled');
view([12 12]); grid off; axis off; hold on;
axis off;
axis equal;
drawnow;
X = X';
k = 20;
d = 2;
可视化为
计算k近邻矩阵
% step1: Calculate the k nearest distance
[m, ~] = size(X); %d<~
D = zeros(m);
for i =1 : m
xx = repmat(X(i, :), m, 1);
diff = xx - X;
dist = sum(diff.* diff, 2);
[dd, pos] = sort(dist);
index = pos(1 : k + 1)';
index2 = pos(k + 2 : m);
D(i,index) = sqrt(dd(index));
D(i, index2) = inf;
end
计算最短路径
%step2: recalculate shortest distant matrix
for k=1:m
for i=1:m
for j=1:m
if D(i,j)>D(i,k)+D(k,j)
D(i,j)=D(i,k)+D(k,j);
end
end
end
end
需要注意的是: 此算法的复杂度为O(n^3), 故运行时间会比较长。
最后调用MDS算法降维
% step3: adapt MDS algorithm to reduce dimension
Z = MDS(D, d);
function Z = MDS(D, d)
%% Multiple Dimensional Scailing Method
% Suppose there exists n samples in m-dim, try to reduce dimension
% D: distence matric
% d: dimension
% Z: new sample \in R^(d*n)
[n1, n2] = size(D);
if n1~=n2
printf('D is not a square matrix');
else
n = n1;
e = ones(n,1);
H = eye(n) - 1 / n * (e *e');
D2 = D .* D;
B =-1/2 * H' * D2 * H;
[eigenvector, eigenvalue] = eig(B);
eigenvalue = diag(eigenvalue);
[~, pos] = sort(eigenvalue,'descend');
index = pos(1:d);
eigenvector = eigenvector(:, index);
Z = diag(eigenvalue(index))^(0.5) * eigenvector';
end
显示分类图形
% show the distribution
figure(2)
scatter(Z(1, :), Z(2, :), point_size, tt, 'filled');
grid off; axis off;
可视化图形为
总的MATLAB程序
function X = isomap(X, k, d)
%% minifold learning: isometric feacture mapping
% k : neighbors
% x : data set
% d : low dimension;
if nargin < 1
rand('state',123456789)
N=1000;
%Gaussian noise
noise = 0.001*randn(1,N);
%standard swiss roll data
tt = (3*pi/2)*(1+2*rand(1,N)); height = 21*rand(1,N);
X = [(tt+ noise).*cos(tt); height; (tt+ noise).*sin(tt)];
%show the picture
point_size = 20;
figure(1)
cla
scatter3(X(1,:),X(2,:),X(3,:), point_size,tt,'filled');
view([12 12]); grid off; axis off; hold on;
axis off;
axis equal;
drawnow;
X = X';
k = 30;
d = 2;
end
% step1: Calculate the k nearest distance
[m, ~] = size(X); %d<~
D = zeros(m);
for i =1 : m
xx = repmat(X(i, :), m, 1);
diff = xx - X;
dist = sum(diff.* diff, 2);
[dd, pos] = sort(dist);
index = pos(1 : k + 1)';
index2 = pos(k + 2 : m);
D(i,index) = sqrt(dd(index));
D(i, index2) = inf;
end
%step2: recalculate shortest distant matrix
for k=1:m
for i=1:m
for j=1:m
if D(i,j)>D(i,k)+D(k,j)
D(i,j)=D(i,k)+D(k,j);
end
end
end
end
% for i = 1 : m
% for j = 1 : m
% if D(i,j) == inf
% D(i,j) = D(j,i);
% end
% end
% end
% step3: adapt MDS algorithm to reduce dimension
Z = MDS(D, d);
% show the distribution
figure(2)
scatter(Z(1, :), Z(2, :), point_size, tt, 'filled');
grid off; axis off;
function Z = MDS(D, d)
%% Multiple Dimensional Scailing Method
% Suppose there exists n samples in m-dim, try to reduce dimension
% D: distence matric
% d: dimension
% Z: new sample \in R^(d*n)
[n1, n2] = size(D);
if n1~=n2
printf('D is not a square matrix');
else
n = n1;
e = ones(n,1);
H = eye(n) - 1 / n * (e *e');
D2 = D .* D;
B =-1/2 * H' * D2 * H;
[eigenvector, eigenvalue] = eig(B);
eigenvalue = diag(eigenvalue);
[~, pos] = sort(eigenvalue,'descend');
index = pos(1:d);
eigenvector = eigenvector(:, index);
Z = diag(eigenvalue(index))^(0.5) * eigenvector';
end
下一篇我们将介绍 LE 算法的思想及MATLAB实现
https://blog.csdn.net/waitingwinter/article/details/105471484