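Implementation steps for normalized spectral clustering
- Build the normalized Laplacian matrix L_sym from the adjacency matrix W
- Compute the eigenvectors of L_sym that belong to its k smallest eigenvalues
- Cluster the rows of the resulting eigenvector matrix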
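For reference, the standard definitions behind the first step can be written out as follows, assuming W is symmetric with nonnegative weights and every node has positive degree:

```latex
D_{ii} = \sum_{j} W_{ij}, \qquad
L = D - W, \qquad
L_{\mathrm{sym}} = D^{-1/2} \, L \, D^{-1/2} = I - D^{-1/2} \, W \, D^{-1/2}
```

Both factors carry the exponent -1/2, which is what makes L_sym symmetric whenever W is.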
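Three methods for the clustering step
- Kmeans: sensitive to initialization; running Kmeans several times and keeping the best result mitigates this
- Discretize: less sensitive to initialization
- cluster_qr: extracts the clusters directly from the eigenvectors, with no iterations and no tuning parameters; according to [1], it can outperform the other two methods in both speed and quality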
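As a rough illustration of the "run Kmeans several times" remedy, the sketch below uses the 'Replicates' option of kmeans from the Statistics and Machine Learning Toolbox. The matrix X is synthetic data invented for this example; in practice it would be the eigenvector matrix from the previous step.

```matlab
% Synthetic stand-in for the rows of an eigenvector matrix (illustration only).
rng(0);                                        % fix the seed for reproducibility
X = [0.05*randn(20,2); 0.05*randn(20,2) + 3];  % two well-separated groups of rows
k = 2;                                         % number of clusters

% 'Replicates' reruns kmeans from 10 different initializations and
% returns the solution with the lowest total within-cluster distance.
labels = kmeans(X, k, 'Replicates', 10);
```

The number of replicates trades run time for robustness; 10 is an arbitrary choice here.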
MATLAB code
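dim = length(W); %dimension of the adjacency matrix W, which is assumed to be given
D = zeros(dim,dim); %initialize the degree matrix D with zeros
L = zeros(dim,dim); %initialize the Laplacian matrix L
L_sym = zeros(dim,dim); %initialize the normalized Laplacian matrix L_sym
for row = 1:1:dim
D(row,row) = sum(W(row,:)); %diagonal entries of D are the row sums of W
end
L = D-W; %unnormalized Laplacian matrix L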
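L_sym = D^(-1/2)*L*D^(-1/2); %normalized Laplacian L_sym; requires every node to have positive degree
L_sym = (L_sym+L_sym')/2; %remove round-off asymmetry so that L_sym is exactly symmetric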
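[V,~] = eigs(L_sym,k,'smallestabs'); %eigenvectors of the k smallest eigenvalues of L_sym, where k (the number of clusters) must be set beforehand; eigs returns normalized eigenvectors, i.e. each has 2-norm 1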
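Clu_V = clusterQR_random(V,4); %call clusterQR_random with oversampling factor gamma = 4
[~, Clu] = max(abs(Clu_V),[],2); %cluster label of each node = column index of the largest absolute entry in its row of Clu_V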
clusterQR_random:
function [U, piv] = clusterQR_random(U,gamma)
% U is N x k and columns are the eigenvectors to be used
%
% U returns the cluster assignment vectors, generically,
% clustering is done by taking location of the max absolute entry in each
% row as the cluster assignment.
%
% piv encodes which columns of UU^T were picked by the QRCP
%
% gamma is the oversampling factor, i.e. gamma*k*log(k) columns are used
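k = size(U,2); % number of eigenvectors
NN = size(U,1); % number of data points
count = min(ceil(gamma*k*log(k)),NN); % number of rows to sample, capped at NN
rho = sum(U'.^2); % squared 2-norm of each row of U
rho = rho/sum(rho); % normalize into a sampling distribution
rhosum = cumsum(rho); % cumulative distribution
[~, I] = histc(rand(1,count),[0 rhosum]); % draw count row indices by inverse-CDF sampling
I = unique(I); % drop duplicate indices
[~, ~, idx] = qr(U(I,:)',0); % column-pivoted QR on the sampled rows; idx is a permutation vector
idx = idx(1:k); % keep the first k pivots
piv = I(idx); % map the pivots back to row indices of U
[Ut, ~, Vt] = svd(U(piv,:)',0); % SVD of the selected rows
U = U*(Ut*Vt'); % rotate U by the orthogonal polar factor of U(piv,:)'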
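For comparison, here is a minimal sketch of the same idea without random sampling, where the pivoted QR runs on all rows of the eigenvector matrix. The toy graph (two disconnected triangles) and the variable names are assumptions made for this example and are not part of the original script.

```matlab
% Toy graph (assumption for illustration): two disconnected triangles.
W = blkdiag(ones(3) - eye(3), ones(3) - eye(3));
k = 2;                                     % number of clusters

d = sum(W, 2);                             % node degrees
Dinv = diag(1 ./ sqrt(d));                 % D^(-1/2)
L_sym = eye(size(W,1)) - Dinv * W * Dinv;  % normalized Laplacian
L_sym = (L_sym + L_sym') / 2;              % enforce exact symmetry

[vecs, vals] = eig(L_sym);                 % dense solver is fine at this size
[~, order] = sort(diag(vals), 'ascend');
V = vecs(:, order(1:k));                   % eigenvectors of the k smallest eigenvalues

[~, ~, piv] = qr(V', 0);                   % column-pivoted QR; piv is a permutation vector
piv = piv(1:k);                            % k representative rows of V
[Ut, ~, Vt] = svd(V(piv, :)', 0);
Q = V * (Ut * Vt');                        % rotate V by the orthogonal polar factor
[~, labels] = max(abs(Q), [], 2);          % label = column of the largest absolute entry
```

On this graph, nodes 1 to 3 receive one label and nodes 4 to 6 the other. Skipping the sampling step is simpler but applies the QR factorization to all N rows, which is the cost the randomized version is meant to reduce.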
[1] Anil Damle, Victor Minden, Lexing Ying, Simple, direct and efficient multi-way spectral clustering, Information and Inference: A Journal of the IMA, Volume 8, Issue 1, March 2019, Pages 181–203