Use an input matrix of proposed clustering solutions to evaluate the optimal number of clusters.
Load the sample data.
load fisheriris;
The data contains length and width measurements from the sepals and petals of three species of iris flowers.
Use kmeans to create an input matrix of proposed clustering solutions for the four measurements in meas (sepal length, sepal width, petal length, and petal width), using 1, 2, 3, 4, 5, and 6 clusters.
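clust = zeros(size(meas,1),6);   % one column per candidate solution
for i = 1:6
    % Cluster the rows of meas into i clusters, keeping the best of 5 replicates
    clust(:,i) = kmeans(meas,i,'EmptyAction','singleton',...
        'Replicates',5);
end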
Each row of clust corresponds to one observation (one flower) in meas. Each of the six columns corresponds to a clustering solution containing 1 to 6 clusters.
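As a quick sanity check on the layout described above, the following minimal sketch rebuilds clust and counts the distinct labels in each column. The variable name numLabels is introduced here for illustration only. Because 'EmptyAction','singleton' replaces any empty cluster, column j is expected to contain j distinct labels.

```matlab
load fisheriris

% Rebuild the matrix of proposed clustering solutions
clust = zeros(size(meas,1),6);
for i = 1:6
    clust(:,i) = kmeans(meas,i,'EmptyAction','singleton','Replicates',5);
end

% Count the distinct cluster labels in each column (illustrative helper variable)
numLabels = arrayfun(@(j) numel(unique(clust(:,j))), 1:size(clust,2));
disp(numLabels)
```

If a column reports fewer labels than its index, the corresponding solution does not match the number of clusters it is meant to represent and should be recomputed before evaluation.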
Evaluate the optimal number of clusters using the Calinski-Harabasz criterion.
eva = evalclusters(meas,clust,'CalinskiHarabasz')
eva = 
  CalinskiHarabaszEvaluation with properties:

    NumObservations: 150
         InspectedK: [1 2 3 4 5 6]
    CriterionValues: [NaN 513.9245 561.6278 530.4871 456.1279 469.5068]
           OptimalK: 3
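The OptimalK value indicates that, based on the Calinski-Harabasz criterion, the optimal number of clusters is three, which is the solution with the largest criterion value. The criterion value for one cluster is NaN because the Calinski-Harabasz index divides by the number of clusters minus one and is therefore undefined for a single cluster.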
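To make the criterion concrete, the sketch below computes the Calinski-Harabasz index by hand for a single three-cluster solution and compares it with the value that evalclusters reports for the same solution. It assumes the usual definition of the index, the between-cluster sum of squares divided by (k - 1) over the within-cluster sum of squares divided by (N - k), and it relies on implicit expansion, so it assumes R2016b or later. The names idx, ssb, ssw, and ch are illustrative only. Because kmeans starts from random initial centroids, the value may differ slightly from the one shown above.

```matlab
load fisheriris

k = 3;
idx = kmeans(meas,k,'EmptyAction','singleton','Replicates',5);

N = size(meas,1);
overallMean = mean(meas,1);
ssb = 0;   % between-cluster sum of squares
ssw = 0;   % within-cluster sum of squares
for j = 1:k
    members  = meas(idx == j,:);
    centroid = mean(members,1);
    ssb = ssb + size(members,1) * sum((centroid - overallMean).^2);
    ssw = ssw + sum(sum((members - centroid).^2));
end

% Calinski-Harabasz index under the assumed definition
ch = (ssb/(k-1)) / (ssw/(N-k));

% Value reported by evalclusters for the same clustering solution
evaSingle = evalclusters(meas,idx,'CalinskiHarabasz');
fprintf('manual: %.4f   evalclusters: %.4f\n', ch, evaSingle.CriterionValues)
```

Setting k = 1 in the manual computation makes the divisor k - 1 equal to zero, which is why no finite criterion value exists for the one-cluster solution.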