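Preface
Suppose the true labels have m classes and the clustering result also has m classes. The label values themselves may still differ: for example, if the true labels are [1,1,2,2,3,3] and the predicted labels are [3,3,1,1,2,2], then true label 1 corresponds to predicted label 3, 2 corresponds to 1, and 3 corresponds to 2. How, then, can the true labels be mapped to the cluster labels?
I. Determining the label mapping directly from the largest overlap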
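% True labels: La1; cluster labels: La2; mapped labels: NewLabel
Label1=unique(La1');
L1=length(Label1);
Label2=unique(La2');
L2=length(Label2);
NewLabel=zeros(size(La2));
for k=1:L2
    % Samples assigned to the k-th cluster label
    index=(La2==Label2(k));
    % Count how often each true label occurs inside this cluster
    tmp=zeros(1,L1);
    for j=1:L1
        tmp(j)=sum(La1(index)==Label1(j));
    end
    % Map the cluster to the true label with the largest overlap
    [~,l]=max(tmp);
    NewLabel(index)=Label1(l);
end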
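This method simply finds, for each predicted label, the true label with the largest overlap (the maximum of each row of counts) and records its position. However, the maxima of several different rows can fall in the same column, which means the mapping between true labels and cluster labels is no longer one-to-one. That is clearly unreasonable.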
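The many-to-one problem can be reproduced on a small example. The sketch below uses made-up labels (the data and the variable names `counts`, `best` and `label` are illustrative assumptions, not part of the original code) and applies the same majority rule as above:

```matlab
La1 = [1 1 1 1 2 2 3 3];   % true labels (made-up data)
La2 = [1 1 2 2 2 3 3 3];   % cluster labels (made-up data)
Label1 = unique(La1);
Label2 = unique(La2);
label = zeros(1, numel(Label2));
for k = 1:numel(Label2)
    counts = zeros(1, numel(Label1));
    for j = 1:numel(Label1)
        % overlap between cluster Label2(k) and true class Label1(j)
        counts(j) = sum(La1(La2 == Label2(k)) == Label1(j));
    end
    [~, best] = max(counts);
    label(k) = Label1(best);
end
disp(label);                                 % 1 1 3
disp(numel(unique(label)) < numel(label));   % 1 (true): the mapping is not one-to-one
```

Clusters 1 and 2 are both mapped to true class 1, so true class 2 never appears in the mapped labels.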
II. Using the Hungarian algorithm
1. Compute the overlap between the true labels and the cluster labels and store the result in matrix G
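% True labels: La1; cluster labels: La2; mapped labels: NewLabel
Label1=unique(La1');
L1=length(Label1);
Label2=unique(La2');
L2=length(Label2);
% Build the matrix G that counts the overlap between the two labelings;
% G is padded with zeros to a square matrix when L1 and L2 differ
G = zeros(max(L1,L2),max(L1,L2));
for i=1:L1
    index1= La1==Label1(i);
    for j=1:L2
        index2= La2==Label2(j);
        % Number of samples with true label Label1(i) and cluster label Label2(j)
        G(i,j)=sum(index1 & index2);
    end
end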
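For the example from the preface, the overlap matrix can be written out explicitly. The sketch below is an alternative, more compact way to build the same G with `accumarray`; it is offered as an illustration under the assumption that both label vectors have the same length, and the index variables `i1`, `i2` are names introduced here:

```matlab
La1 = [1 1 2 2 3 3];             % true labels from the preface
La2 = [3 3 1 1 2 2];             % predicted labels from the preface
[Label1, ~, i1] = unique(La1);   % i1(s): position of sample s's true label in Label1
[Label2, ~, i2] = unique(La2);   % i2(s): position of sample s's cluster label in Label2
n = max(numel(Label1), numel(Label2));
% Add 1 to G(i1(s), i2(s)) for every sample s; pad to n-by-n with zeros
G = accumarray([i1(:), i2(:)], 1, [n n]);
disp(G);
%      0     0     2
%      2     0     0
%      0     2     0
```

Row i of G belongs to true label Label1(i) and column j to cluster label Label2(j), so the single nonzero entry in each column shows which true class that cluster overlaps with.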
2. Apply the Hungarian algorithm
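% Run the Hungarian algorithm on -G to obtain the assignment matrix
% (munkres minimizes cost, so negating G maximizes the total overlap)
[index]=munkres(-G);
% Convert the assignment matrix into a row vector:
% temp(j) is the row of G (true-label index) matched to column j (cluster label)
[temp]=MarkReplace(index);
% Generate the remapped labels NewLabel
% (assumes L2 <= L1, so that every cluster is matched to a real true label)
NewLabel=zeros(size(La2));
for i=1:L2
    NewLabel(La2==Label2(i))=Label1(temp(i));
end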
function [assignment] = munkres(costMat)
% MUNKRES Munkres (Hungarian) Assignment Algorithm
%
% ASSIGN = munkres(COSTMAT) returns the optimal assignment in ASSIGN, a
% logical matrix of the same size as COSTMAT, with the minimum total cost
% for the assignment problem represented by COSTMAT, where the (i,j)th
% element represents the cost to assign the jth job to the ith worker.
% The COST output of the original implementation is not needed for label
% mapping and has been removed in this copy.
%
% This is a vectorized implementation of the algorithm. According to its
% author, it is the fastest among all Matlab implementations of the algorithm.
% Examples
% Example 1: a 5 x 5 example
%{
M = magic(5);
assignment = munkres(M);
[assignedrows,dum]=find(assignment);
disp(assignedrows'); % 3 2 1 5 4
disp(sum(M(assignment))); % 15
%}
% Example 2: 5 x 5 random data
%{
n=5;
A=rand(n);
tic
a=munkres(A);
toc
%}
% Reference:
% "Munkres' Assignment Algorithm, Modified for Rectangular Matrices",
% http://csclab.murraystate.edu/bob.pilgrim/445/munkres.html
% version 1.0 by Yi Cao at Cranfield University on 17th June 2008
assignment = false(size(costMat));
costMat(costMat~=costMat)=Inf; % NaN entries are treated as infinite cost
validMat = costMat<Inf;
validCol = any(validMat);
validRow = any(validMat,2);
nRows = sum(validRow);
nCols = sum(validCol);
n = max(nRows,nCols);
if ~n
    return
end
dMat = zeros(n);
dMat(1:nRows,1:nCols) = costMat(validRow,validCol);
%*************************************************
% Munkres' Assignment Algorithm starts here
%*************************************************
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% STEP 1: Subtract the row minimum from each row.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
dMat = bsxfun(@minus, dMat, min(dMat,[],2));
%**************************************************************************
% STEP 2: Find a zero of dMat. If there are no starred zeros in its
% column or row, star the zero. Repeat for each zero.
%**************************************************************************
zP = ~dMat;
starZ = false(n);
while any(zP(:))
    [r,c]=find(zP,1);
    starZ(r,c)=true;
    zP(r,:)=false;
    zP(:,c)=false;
end
while 1
    %**************************************************************************
    % STEP 3: Cover each column with a starred zero. If all the columns are
    % covered then the matching is maximum
    %**************************************************************************
    primeZ = false(n);
    coverColumn = any(starZ);
    if ~any(~coverColumn)
        break
    end
    coverRow = false(n,1);
    while 1
        %**************************************************************************
        % STEP 4: Find a noncovered zero and prime it. If there is no starred
        % zero in the row containing this primed zero, Go to Step 5.
        % Otherwise, cover this row and uncover the column containing
        % the starred zero. Continue in this manner until there are no
        % uncovered zeros left. Save the smallest uncovered value and
        % Go to Step 6.
        %**************************************************************************
        zP(:) = false;
        zP(~coverRow,~coverColumn) = ~dMat(~coverRow,~coverColumn);
        Step = 6;
        while any(any(zP(~coverRow,~coverColumn)))
            [uZr,uZc] = find(zP,1);
            primeZ(uZr,uZc) = true;
            stz = starZ(uZr,:);
            if ~any(stz)
                Step = 5;
                break;
            end
            coverRow(uZr) = true;
            coverColumn(stz) = false;
            zP(uZr,:) = false;
            zP(~coverRow,stz) = ~dMat(~coverRow,stz);
        end
        if Step == 6
            % *************************************************************************
            % STEP 6: Add the minimum uncovered value to every element of each covered
            % row, and subtract it from every element of each uncovered column.
            % Return to Step 4 without altering any stars, primes, or covered lines.
            %**************************************************************************
            M=dMat(~coverRow,~coverColumn);
            minval=min(min(M));
            if minval==inf
                return
            end
            dMat(coverRow,coverColumn)=dMat(coverRow,coverColumn)+minval;
            dMat(~coverRow,~coverColumn)=M-minval;
        else
            break
        end
    end
    %**************************************************************************
    % STEP 5:
    % Construct a series of alternating primed and starred zeros as
    % follows:
    % Let Z0 represent the uncovered primed zero found in Step 4.
    % Let Z1 denote the starred zero in the column of Z0 (if any).
    % Let Z2 denote the primed zero in the row of Z1 (there will always
    % be one). Continue until the series terminates at a primed zero
    % that has no starred zero in its column. Unstar each starred
    % zero of the series, star each primed zero of the series, erase
    % all primes and uncover every line in the matrix. Return to Step 3.
    %**************************************************************************
    rowZ1 = starZ(:,uZc);
    starZ(uZr,uZc)=true;
    while any(rowZ1)
        starZ(rowZ1,uZc)=false;
        uZc = primeZ(rowZ1,:);
        uZr = rowZ1;
        rowZ1 = starZ(:,uZc);
        starZ(uZr,uZc)=true;
    end
end
% Build the assignment (label) matrix
assignment(validRow,validCol) = starZ(1:nRows,1:nCols);
% The total cost is not needed for the label-mapping problem, so it is commented out
%cost = 0;
%cost = sum(costMat(assignment));
end
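% Convert the assignment matrix into a row vector that stores the label order:
% assignment(j) is the row index i of the entry selected in column j
function [assignment] = MarkReplace(MarkMat)
[rows,cols]=size(MarkMat);
assignment=zeros(1,cols);
for i=1:rows
    for j=1:cols
        if MarkMat(i,j)==1
            assignment(1,j)=i;
        end
    end
end
end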
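On small inputs, the result of the Hungarian step can be checked against exhaustive search, since the goal is simply the one-to-one mapping with the largest total overlap. The following is a minimal sketch of that check using `perms`, with the overlap matrix of the preface example typed in by hand. The names `best` and `bestPerm` are introduced here for illustration, and the approach is only feasible for a handful of classes because the number of permutations grows factorially.

```matlab
La2 = [3 3 1 1 2 2];               % predicted labels from the preface
G = [0 0 2; 2 0 0; 0 2 0];         % overlap matrix (rows: true labels, columns: clusters)
n = size(G, 1);
P = perms(1:n);                    % each row assigns a true-label index to clusters 1..n
best = -Inf;
bestPerm = [];
for p = 1:size(P, 1)
    % total overlap when cluster j is mapped to true label P(p,j)
    score = sum(G(sub2ind([n n], P(p, :), 1:n)));
    if score > best
        best = score;
        bestPerm = P(p, :);
    end
end
disp(bestPerm);        % 2 3 1
disp(bestPerm(La2));   % 1 1 2 2 3 3
```

For this G, `MarkReplace(munkres(-G))` should return the same vector, [2 3 1], because the maximum is unique.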
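Summary
Clearly, the second method is the better one. However, when the number of predicted classes exceeds the number of true classes (for example, 10 true classes but 15 predicted clusters), the Hungarian algorithm cannot be used to map every cluster to a true class. It solves an assignment problem, which only supports one-to-one assignment.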
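The limitation is visible in the padded overlap matrix itself. In the sketch below (made-up labels with 2 true classes and 3 clusters; `dummyRows` is a name introduced here), G has to be padded with an all-zero row to become square, and any one-to-one assignment must send at least that many clusters to rows that do not correspond to a real class:

```matlab
La1 = [1 1 1 2 2 2];             % true labels (made-up data, 2 classes)
La2 = [1 1 2 3 3 3];             % cluster labels (made-up data, 3 clusters)
[Label1, ~, i1] = unique(La1);
[Label2, ~, i2] = unique(La2);
n = max(numel(Label1), numel(Label2));
G = accumarray([i1(:), i2(:)], 1, [n n]);
disp(G);
%      2     1     0
%      0     0     3
%      0     0     0    <- dummy row, no true class behind it
dummyRows = n - numel(Label1);
disp(dummyRows);                 % 1
```

Here one of the three clusters necessarily ends up matched to the dummy row, so it receives no valid true label.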