Evaluate the optimal number of clusters using the Calinski-Harabasz clustering evaluation criterion.
Load the sample data.
load fisheriris;
The data contains length and width measurements from the sepals and petals of three species of iris flowers.
Cluster the data using kmeans, and evaluate candidate cluster counts from 1 to 6 using the Calinski-Harabasz criterion.
rng('default'); % For reproducibility
eva = evalclusters(meas,'kmeans','CalinskiHarabasz','KList',[1:6])
eva =
CalinskiHarabaszEvaluation with properties:
NumObservations: 150
InspectedK: [1 2 3 4 5 6]
CriterionValues: [NaN 513.9245 561.6278 530.4871 456.1279 469.5068]
OptimalK: 3
The OptimalK value indicates that, based on the Calinski-Harabasz criterion, the optimal number of clusters is three.
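To make the criterion concrete, the following is a minimal sketch that computes the Calinski-Harabasz index by hand for a single k-means solution with three clusters. It assumes the usual definition, (SSB/(k-1))/(SSW/(N-k)), where SSB is the between-cluster sum of squares and SSW is the within-cluster sum of squares. The variable names (idx, SSB, SSW, CH) are illustrative, and the value may differ slightly from the one reported by evalclusters if the two k-means runs converge to different solutions.

```matlab
load fisheriris;
rng('default'); % For reproducibility

k = 3;                  % number of clusters to score
idx = kmeans(meas,k);   % cluster assignment for each observation
N = size(meas,1);       % number of observations
mu = mean(meas,1);      % overall centroid

SSB = 0;                % between-cluster sum of squares
SSW = 0;                % within-cluster sum of squares
for j = 1:k
    Xj = meas(idx == j,:);      % observations in cluster j
    cj = mean(Xj,1);            % centroid of cluster j
    SSB = SSB + size(Xj,1)*sum((cj - mu).^2);
    % Implicit expansion (R2016b or later); use bsxfun in older releases
    SSW = SSW + sum(sum((Xj - cj).^2));
end

CH = (SSB/(k-1))/(SSW/(N-k))
```

Larger values indicate clusters that are compact relative to how far apart their centroids are, which is why the criterion is maximized when choosing the number of clusters.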
Plot the Calinski-Harabasz criterion values for each number of clusters tested.
figure;
plot(eva);
The plot shows that the highest Calinski-Harabasz value occurs at three clusters, suggesting that the optimal number of clusters is three.
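The same conclusion can be checked numerically rather than read off the plot. This short sketch, which uses an illustrative variable name bestK, looks up the inspected cluster count with the largest criterion value. It relies on max skipping NaN entries by default.

```matlab
load fisheriris;
rng('default'); % For reproducibility
eva = evalclusters(meas,'kmeans','CalinskiHarabasz','KList',1:6);

% Locate the largest criterion value; max ignores NaN entries by default
[~,iBest] = max(eva.CriterionValues);
bestK = eva.InspectedK(iBest)
```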
Create a grouped scatter plot to examine the relationship between petal length and width. Group the data by suggested clusters.
PetalLength = meas(:,3);
PetalWidth = meas(:,4);
ClusterGroup = eva.OptimalY;
figure;
gscatter(PetalLength,PetalWidth,ClusterGroup,'rbg','xod');
The plot shows cluster 3 in the lower-left corner, completely separated from the other two clusters. Cluster 3 contains flowers with the smallest petal widths and lengths. Cluster 1 is in the upper-right corner, and contains flowers with the largest petal widths and lengths. Cluster 2 is near the center of the plot, and contains flowers with measurements between these two extremes.
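As a rough check on this reading of the plot, the sketch below computes the mean petal length and width for each suggested cluster. The name petalMeans is illustrative. Cluster labels from k-means are arbitrary, so which label corresponds to which region can change with the random number generator state or the software release.

```matlab
load fisheriris;
rng('default'); % For reproducibility
eva = evalclusters(meas,'kmeans','CalinskiHarabasz','KList',1:6);
ClusterGroup = eva.OptimalY;

% Row j holds [mean petal length, mean petal width] for cluster j
petalMeans = zeros(eva.OptimalK,2);
for j = 1:eva.OptimalK
    petalMeans(j,:) = mean(meas(ClusterGroup == j,3:4),1);
end
disp(petalMeans)
```

The cluster with the smallest row values corresponds to the group in the lower-left corner of the scatter plot, and the cluster with the largest values corresponds to the group in the upper-right corner.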