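The LLE algorithm boils down to three steps:
(1) find the k nearest neighbors of each sample point;
(2) compute each sample point's local reconstruction weights from its neighbors;
(3) compute each sample point's output (its low-dimensional coordinates) from its local reconstruction weights and its neighbors.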
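Steps (2) and (3) each minimize a quadratic cost. As a compact summary, following the formulation of Roweis and Saul (reference 1), and assuming the convention that $x_i$ are the input points, $y_i$ the output points (columns of $Y$), and $W_{ij}$ the weight of neighbor $j$ in the reconstruction of point $i$:

```latex
% Step (2): reconstruction error, minimized over W with X fixed
\varepsilon(W) = \sum_{i=1}^{N} \Bigl\| x_i - \sum_{j} W_{ij}\, x_j \Bigr\|^2,
\qquad \sum_{j} W_{ij} = 1,
\quad W_{ij} = 0 \ \text{if } x_j \text{ is not a neighbor of } x_i

% Step (3): embedding cost, minimized over Y with W fixed
\Phi(Y) = \sum_{i=1}^{N} \Bigl\| y_i - \sum_{j} W_{ij}\, y_j \Bigr\|^2
        = \operatorname{tr}\bigl( Y M Y^{\top} \bigr),
\qquad M = (I - W)^{\top} (I - W)
```

With the outputs constrained to be centered and to have unit covariance, the minimizer of $\Phi$ is given by the eigenvectors of $M$ with the smallest eigenvalues, after discarding the constant eigenvector whose eigenvalue is 0.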
Main Matlab LLE function:
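% LLE ALGORITHM (using K nearest neighbors)
% [Y] = lle(X,K,dmax)
% X    = data as D x N matrix (D = dimensionality, N = #points)
% K    = number of neighbors
% dmax = max embedding dimensionality
% Y    = embedding as dmax x N matrix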
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Y] = lle(X,K,d)
[D,N] = size(X);
fprintf(1,'LLE running on %d points in %d dimensions\n',N,D);
% Step 1: compute pairwise distances & find neighbours
fprintf(1,'-->Finding %d nearest neighbours.\n',K);
X2 = sum(X.^2,1);
distance = repmat(X2,N,1)+repmat(X2',1,N)-2*X'*X;
[sorted,index] = sort(distance);
neighborhood = index(2:(1+K),:);
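% Step 2: solve for reconstruction weights
fprintf(1,'-->Solving for reconstruction weights.\n');
if(K>D)
  fprintf(1,'   [note: K>D; regularization will be used]\n');
  tol = 1e-3; % regularizer in case constrained fits are ill conditioned
else
  tol = 0;
end
W = zeros(K,N);
for ii=1:N
  z = X(:,neighborhood(:,ii))-repmat(X(:,ii),1,K); % shift ith pt to origin
  C = z'*z;                                        % local covariance
  C = C + eye(K,K)*tol*trace(C);                   % regularization (K>D)
  W(:,ii) = C\ones(K,1);                           % solve Cw=1
  W(:,ii) = W(:,ii)/sum(W(:,ii));                  % enforce sum(w)=1
end;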
% Step 3: compute embedding from eigenvects of cost matrix M=(I-W)'(I-W)
fprintf(1,'-->Computing embedding.\n');
% M=eye(N,N); % use a sparse matrix with storage for 4KN nonzero elements
M = sparse(1:N,1:N,ones(1,N),N,N,4*K*N);
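for ii=1:N
  w = W(:,ii);                    % weights of point ii
  jj = neighborhood(:,ii);        % indices of its K neighbours
  M(ii,jj) = M(ii,jj) - w';
  M(jj,ii) = M(jj,ii) - w;
  M(jj,jj) = M(jj,jj) + w*w';
end;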
% calculation of embedding
options.disp = 0;
options.isreal = 1;
options.issym = 1;
[Y,eigenvals] = eigs(M,d+1,0,options);
Y = Y(:,2:d+1)'*sqrt(N); % bottom evect is [1,1,1,1...] with eval 0
fprintf(1,'Done.\n');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% other possible regularizers for K>D
%   C = C + tol*diag(diag(C));                       % regularization
%   C = C + eye(K,K)*tol*trace(C)*K;                 % regularization
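The vectorized distance computation in Step 1 relies on the identity ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2*xi'*xj. As a small sanity check, the sketch below compares that expression with a plain double loop on random data. The names Xs, Dfast, Dslow and maxerr are made up for this example and are not part of lle.m.

```matlab
% Sanity check of the squared-distance identity used in Step 1.
% Xs, Dfast, Dslow and maxerr are illustrative names, not part of lle.m.
D = 3; N = 50;
Xs = rand(D,N);                          % D x N data, one point per column

% vectorized form, same expression as in lle.m
X2 = sum(Xs.^2,1);
Dfast = repmat(X2,N,1) + repmat(X2',1,N) - 2*(Xs'*Xs);

% direct form, one pair of points at a time
Dslow = zeros(N,N);
for i = 1:N
    for j = 1:N
        dij = Xs(:,i) - Xs(:,j);
        Dslow(i,j) = dij'*dij;
    end
end

maxerr = max(abs(Dfast(:) - Dslow(:)));
fprintf(1,'max abs difference: %g\n',maxerr);

% sort each column: row 1 is the point itself (distance ~0), which is
% why lle.m takes rows 2..K+1 of index as the K nearest neighbours
[sorted,index] = sort(Dfast);
```

The two matrices should agree up to rounding error. Building the full N x N matrix costs O(N^2) memory, so for much larger N a neighbor search that avoids it would probably be a better fit.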
Test case (the Swiss roll, which sounds rather tasty):
clear all,clc
N = 2000;
K = 12;
d = 2;
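% Plot true manifold
tt0 = (3*pi/2)*(1+2*[0:0.02:1]); hh = [0:0.125:1]*30;
xx = (tt0.*cos(tt0))'*ones(size(hh));
yy = ones(size(tt0))'*hh;
zz = (tt0.*sin(tt0))'*ones(size(hh));
cc = tt0'*ones(size(hh));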
subplot(1,3,1); cla;
surf(xx,yy,zz,cc);
view([12 20]); grid off; axis off; hold on;
lnx=-5*[3,3,3;3,-4,3]; lny=[0,0,0;32,0,0]; lnz=-5*[3,3,3;3,3,-3];
lnh=line(lnx,lny,lnz);
set(lnh,'Color',[1,1,1],'LineWidth',2,'LineStyle','-','Clipping','off');
axis([-15,20,0,32,-15,15]);
% generate sampled data
tt = (3*pi/2)*(1+2*rand(1,N));
height = 21*rand(1,N);
X = [tt.*cos(tt); height; tt.*sin(tt)];
%scatter plot of sampled data
subplot(1,3,2); cla;
scatter3(X(1,:),X(2,:),X(3,:),12,tt,'+');
view([12 20]); grid off; axis off; hold on;
lnh=line(lnx,lny,lnz);
set(lnh,'Color',[1,1,1],'LineWidth',2,'LineStyle','-','Clipping','off');
axis([-15,20,0,32,-15,15]); drawnow;
%run LLE algorithm
Y=lle(X,K,d);
%scatterplot of embedding
subplot(1,3,3); cla;
scatter(Y(1,:),Y(2,:),12,tt,'+');
grid off;
set(gca,'XTick',[]); set(gca,'YTick',[]);
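I also tested an S-curve, shown below:
As the two figures above show, LLE is a dimensionality reduction method for nonlinear data, and in both cases the low-dimensional result preserves the topological relationships of the original data.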
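For readers who want to try the S-curve test themselves, here is a minimal sketch of how such data could be generated and passed to lle. The parameterization is an assumption (a commonly used S-curve form), not necessarily the one behind the figure in this post, and the strip width of 5 is arbitrary. The LLE call and the plots are skipped if lle.m is not on the path.

```matlab
% S-curve test data (illustrative sketch; the parameterization is an
% assumption, not necessarily the one used for the figure in this post)
N = 2000; K = 12; d = 2;
t = 3*pi*(rand(1,N) - 0.5);                   % position along the curve, in [-1.5*pi, 1.5*pi]
height = 5*rand(1,N);                         % position across the strip (width 5 is arbitrary)
X = [sin(t); height; sign(t).*(cos(t) - 1)];  % 3 x N, one point per column

% run LLE and plot only if lle.m is on the path
if exist('lle','file') == 2
    Y = lle(X,K,d);
    subplot(1,2,1); scatter3(X(1,:),X(2,:),X(3,:),12,t,'+');
    subplot(1,2,2); scatter(Y(1,:),Y(2,:),12,t,'+');
end
```

Coloring both plots by t makes it easy to see whether points that are close along the curve stay close in the 2-D embedding.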
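LLE is now widely used for classification and clustering of image data, character recognition, visualization of multidimensional data, bioinformatics, and other fields. My current work is on super resolution based on sparse representation, and one of the papers I read describes using this algorithm for dimensionality reduction. I will keep writing about it when I have time.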
References:
1. Roweis S T, Saul L K. Nonlinear dimensionality reduction by locally linear embedding[J]. Science, 2000, 290(5500): 2323-2326.
2. Code: http://www.cs.nyu.edu/~roweis/lle/code.html