LE (Laplacian Eigenmaps)
In the previous two sections we introduced the LLE and Isomap algorithms and applied them to dimensionality reduction on the Swiss roll; see
Isomap: https://blog.csdn.net/waitingwinter/article/details/105469511
LLE: https://blog.csdn.net/waitingwinter/article/details/105467074
In this section we introduce the LE algorithm. The original paper is
M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Computation, vol. 15, no. 6, pp. 1373–1396, 2003.
Basic idea
Given a data set, we first build a data graph, e.g. by connecting nearest neighbors. Then, within each local region, we compute the affinity (similarity) between pairs of points, and we seek a low-dimensional embedding in which these pairwise affinities are preserved.
Step 1: Determine nearest neighbors and weights
Suppose the $k$ nearest neighbors of each point have been found. The affinity between $x_i$ and $x_j$ is computed with the heat kernel:

$$W_{ij} = \begin{cases} \exp\left(-\dfrac{\|x_i - x_j\|^2}{2\sigma^2}\right), & x_j \in N_k(x_i), \\ 0, & \text{otherwise}, \end{cases}$$

where $N_k(x_i)$ denotes the $k$ nearest neighbors of $x_i$. The affinity matrix is $W = (W_{ij})_{m \times m}$.

Note that $W$ may not be symmetric ($x_j$ being a neighbor of $x_i$ does not imply the converse), so let

$$W \leftarrow \frac{1}{2}\left(W + W^{\mathsf T}\right).$$
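As a sketch, the neighbor search and heat-kernel affinity of Step 1 can be written in Python/NumPy as follows (the function name `knn_affinity` and its signature are my own, not from the paper):

```python
import numpy as np

def knn_affinity(X, k, sigma):
    """Heat-kernel affinity on a k-nearest-neighbor graph.

    X     : (m, n) data matrix, one sample per row
    k     : number of neighbors
    sigma : heat-kernel bandwidth
    """
    m = X.shape[0]
    # pairwise squared Euclidean distances
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.zeros((m, m))
    for i in range(m):
        # k nearest neighbors of point i (position 0 is i itself)
        idx = np.argsort(sq[i])[1:k + 1]
        W[i, idx] = np.exp(-sq[i, idx] / (2 * sigma ** 2))
    # symmetrize: j being a neighbor of i need not be mutual
    return 0.5 * (W + W.T)
```

The final symmetrization averages the two directed weights, matching the $(W + W^{\mathsf T})/2$ step above.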
Step 2: Construct the objective function
How do we preserve the affinities in the low-dimensional space? Construct the following objective function:

$$\min_Y \; \frac{1}{2}\sum_{i,j} W_{ij}\,\|y_i - y_j\|^2 .$$

Let $D = \operatorname{diag}(d_1, \dots, d_m)$ with $d_i = \sum_j W_{ij}$, and let $M = D - W$. A simple verification shows that $M$ is positive semidefinite. Writing $Y = (y_1, \dots, y_m)^{\mathsf T} \in \mathbb{R}^{m \times d}$, we have

$$\frac{1}{2}\sum_{i,j} W_{ij}\,\|y_i - y_j\|^2 = \operatorname{tr}\!\left(Y^{\mathsf T} M Y\right).$$

Hence the optimization problem is

$$\min_Y \; \operatorname{tr}\!\left(Y^{\mathsf T} M Y\right) \quad \text{s.t.} \quad Y^{\mathsf T} Y = I .$$
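The identity between the pairwise sum and the trace form can be checked numerically; here is a small Python/NumPy sketch with random data (all variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 30, 2
A = rng.random((m, m))
W = 0.5 * (A + A.T)              # symmetric affinity matrix
np.fill_diagonal(W, 0)
Y = rng.standard_normal((m, d))  # candidate low-dim embedding

D = np.diag(W.sum(axis=1))
M = D - W                        # graph Laplacian

# left-hand side: (1/2) * sum_ij W_ij * ||y_i - y_j||^2
lhs = 0.5 * np.sum(W * np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2))
# right-hand side: tr(Y^T M Y)
rhs = np.trace(Y.T @ M @ Y)
assert np.isclose(lhs, rhs)      # the two quantities agree
```

The nonnegativity of the left-hand side for every $Y$ is also exactly why $M$ is positive semidefinite.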
Step 3: Solve the optimization problem
Let M = D - W. Then:
- M has a zero eigenvalue, whose eigenvector is the constant all-ones vector;
- perform an eigendecomposition of M;
- take the eigenvectors corresponding to the 2nd through (d+1)-th smallest eigenvalues as the low-dimensional embedding (the constant eigenvector carries no information, so it is discarded).
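The eigen-solution step can be sketched in Python/NumPy (a minimal illustration, not the author's MATLAB code; the function name `laplacian_eigenmap` is my own):

```python
import numpy as np

def laplacian_eigenmap(W, d):
    """Embed a graph with symmetric affinity matrix W into R^d.

    Solves min tr(Y^T M Y) s.t. Y^T Y = I with M = D - W, then
    discards the trivial constant eigenvector (eigenvalue ~ 0).
    """
    D = np.diag(W.sum(axis=1))
    M = D - W                       # graph Laplacian, PSD
    vals, vecs = np.linalg.eigh(M)  # eigh returns ascending eigenvalues
    # column 0 is the constant eigenvector for eigenvalue ~0;
    # keep the 2nd through (d+1)-th smallest
    return vecs[:, 1:d + 1]
```

On a connected graph the smallest eigenvalue is 0 with a constant eigenvector, matching the property listed above, so the returned columns are automatically orthogonal to the all-ones vector.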
MATLAB program and example
We again use the Swiss-roll data:
N = 2000;
rand('state', 123456789);
% Gaussian noise
noise = 0.001 * randn(1, N);
% standard Swiss-roll data
tt = (3*pi/2) * (1 + 2*rand(1, N));
height = 21 * rand(1, N);
X = [(tt + noise).*cos(tt); height; (tt + noise).*sin(tt)];
% show the picture
point_size = 20;
figure(1)
cla
scatter3(X(1,:), X(2,:), X(3,:), point_size, tt, 'filled');
view([12 12]); grid off; axis off; hold on;
axis equal;
drawnow;
X = X';          % one sample per row
k = 40;          % number of neighbors
d = 2;           % target dimension
The visualization is shown in figure 1.
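For reference, the same Swiss-roll data can be generated in Python/NumPy (a sketch mirroring the MATLAB snippet; the variable names follow it, and the random streams differ between the two languages):

```python
import numpy as np

N = 2000
rng = np.random.default_rng(123456789)
noise = 0.001 * rng.standard_normal(N)            # Gaussian noise
tt = (3 * np.pi / 2) * (1 + 2 * rng.random(N))    # roll angle, also used for coloring
height = 21 * rng.random(N)
# standard Swiss-roll data: one sample per row, shape (N, 3)
X = np.column_stack([(tt + noise) * np.cos(tt),
                     height,
                     (tt + noise) * np.sin(tt)])
```

The angle `tt` doubles as the color value when plotting, so points nearby along the roll share similar colors.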
Compute the k nearest neighbors and the weight matrix:
% step 1: compute the k nearest neighbors and the heat-kernel weights
[m, ~] = size(X);
W = zeros(m);
sigma = 10;
for i = 1 : m
    xx = repmat(X(i, :), m, 1);
    diff = xx - X;
    dist = sum(diff .* diff, 2);      % squared distances to all points
    [~, pos] = sort(dist);
    index = pos(1 : k + 1)';          % k+1 closest, including i itself
    index(index == i) = [];           % drop i, leaving its k neighbors
    W(i, index) = exp(-dist(index)' / (2 * sigma ^ 2));
end
W = 0.5 * (W + W');                   % symmetrize
Compute the matrix M and solve the optimization problem:
D = diag(sum(W, 2));
M = D - W;                            % graph Laplacian
[eigenvector, eigenvalue] = eig(M);
[~, pos] = sort(diag(eigenvalue));    % ascending eigenvalues
% skip the smallest eigenvalue (~0, constant eigenvector) and keep
% the eigenvectors of the 2nd through (d+1)-th smallest eigenvalues
X = eigenvector(:, pos(2 : d + 1));
Display the embedded data:
figure(2)
scatter(X(:,1),X(:,2),point_size, tt,'filled');
grid off; axis off;
In the program, parameters such as k and sigma are tunable; you can change their values and explore the effect on the embedding.
Complete MATLAB program
function X = LE(X, k, d)
%% Laplacian Eigenmaps algorithm (manifold learning)
% X : data set, one sample per row
% k : number of neighbors
% d : target (low) dimension
if nargin < 1
    % default demo: Swiss-roll data
    N = 2000;
    rand('state', 123456789);
    % Gaussian noise
    noise = 0.001 * randn(1, N);
    % standard Swiss-roll data
    tt = (3*pi/2) * (1 + 2*rand(1, N));
    height = 21 * rand(1, N);
    X = [(tt + noise).*cos(tt); height; (tt + noise).*sin(tt)];
    % show the picture
    point_size = 20;
    figure(1)
    cla
    scatter3(X(1,:), X(2,:), X(3,:), point_size, tt, 'filled');
    view([12 12]); grid off; axis off; hold on;
    axis equal;
    drawnow;
    X = X';
    k = 30;
    d = 2;
end
% step 1: compute the k nearest neighbors and the heat-kernel weights
[m, ~] = size(X);
W = zeros(m);
sigma = 10;
for i = 1 : m
    xx = repmat(X(i, :), m, 1);
    diff = xx - X;
    dist = sum(diff .* diff, 2);      % squared distances to all points
    [~, pos] = sort(dist);
    index = pos(1 : k + 1)';          % k+1 closest, including i itself
    index(index == i) = [];           % drop i, leaving its k neighbors
    W(i, index) = exp(-dist(index)' / (2 * sigma ^ 2));
end
W = 0.5 * (W + W');                   % symmetrize
% steps 2-3: build the graph Laplacian and solve the eigenproblem
D = diag(sum(W, 2));
M = D - W;
[eigenvector, eigenvalue] = eig(M);
[~, pos] = sort(diag(eigenvalue));    % ascending eigenvalues
% discard the constant eigenvector (eigenvalue ~0); keep the next d
X = eigenvector(:, pos(2 : d + 1));
% plot the 2-D embedding (colored by tt in the demo case)
if d == 2
    figure(2)
    if exist('tt', 'var')
        scatter(X(:,1), X(:,2), point_size, tt, 'filled');
    else
        scatter(X(:,1), X(:,2), 20, 'filled');
    end
    grid off; axis off;
end