silhouette_matlab

Latest recommended article published on 2023-06-08 10:49:41

ShiningJotes

Categories: matlab, other. Tags: evaluation metrics

Reposted from: https://ww2.mathworks.cn/help/stats/silhouette.html

Silhouette plot

Syntax

silhouette(X,clust)
s = silhouette(X,clust)
[s,h] = silhouette(X,clust)
[...] = silhouette(X,clust,metric)
[...] = silhouette(X,clust,distfun,p1,p2,...)

Description

silhouette(X,clust) plots cluster silhouettes for the n-by-p data matrix X, with clusters defined by clust. Rows of X correspond to points, columns correspond to coordinates. clust can be a categorical variable, numeric vector, character matrix, string array, or cell array of character vectors containing a cluster name for each point. silhouette treats NaNs and empty values in clust as missing values, and ignores the corresponding rows of X. By default, silhouette uses the squared Euclidean distance between points in X.

s = silhouette(X,clust) returns the silhouette values in the n-by-1 vector s, but does not plot the cluster silhouettes.

[s,h] = silhouette(X,clust) plots the silhouettes, and returns the silhouette values in the n-by-1 vector s, and the figure handle in h.

[...] = silhouette(X,clust,metric) plots the silhouettes using the inter-point distance function specified in metric. Choices for metric are given in the following table.

Metric	Description
`'Euclidean'`	Euclidean distance
`'sqEuclidean'`	Squared Euclidean distance (default)
`'cityblock'`	Sum of absolute differences
`'cosine'`	One minus the cosine of the included angle between points (treated as vectors)
`'correlation'`	One minus the sample correlation between points (treated as sequences of values)
`'Hamming'`	Percentage of coordinates that differ
`'Jaccard'`	Percentage of nonzero coordinates that differ
Vector	A numeric distance matrix in upper triangular vector form, such as is created by `pdist`. `X` is not used in this case, and can safely be set to `[]`.

For more information on each metric, see Distance Metrics.
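
The vector form in the last table row can be sketched as follows, reusing the sample data from the examples in this article. This is only a minimal sketch: the variable names `D`, `sFromVector`, `sDirect`, and `maxDiff` are illustrative and do not come from the original page.

```matlab
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2); randn(10,2)-ones(10,2)];
cidx = kmeans(X,2);

% Pairwise city block distances in the upper triangular vector form
% that pdist returns (one entry per pair of rows in X).
D = pdist(X,'cityblock');

% Pass the distance vector as the metric. X is not used, so pass [].
sFromVector = silhouette([],cidx,D);

% Compare with the values computed directly from X with the same metric.
sDirect = silhouette(X,cidx,'cityblock');
maxDiff = max(abs(sFromVector - sDirect))
```

The two results should agree up to floating-point rounding. Precomputing the distances this way can be convenient when the same distances are reused for several candidate clusterings.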

[...] = silhouette(X,clust,distfun,p1,p2,...) accepts a function handle distfun to a metric of the form

d = distfun(X0,X,p1,p2,...)

where X0 is a 1-by-p point, X is an n-by-p matrix of points, and p1,p2,... are optional additional arguments. The function distfun returns an n-by-1 vector d of distances between X0 and each point (row) in X. The arguments p1, p2,... are passed directly to the function distfun.
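
As an illustration of the `distfun` form, the sketch below passes a hand-written Minkowski distance together with one extra argument. The handle name `minkDist` and its parameter `expo` are hypothetical names chosen for this example. `bsxfun` is used so that the subtraction also works on releases without implicit expansion.

```matlab
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2); randn(10,2)-ones(10,2)];
cidx = kmeans(X,2);

% Hypothetical custom metric: Minkowski distance with exponent expo.
% X0 is a 1-by-p point and X is an n-by-p matrix of points.
% The result is an n-by-1 vector of distances from X0 to each row of X.
minkDist = @(X0,X,expo) sum(abs(bsxfun(@minus,X,X0)).^expo, 2).^(1/expo);

% The trailing argument 3 is passed through to minkDist as expo.
s3 = silhouette(X,cidx,minkDist,3);

% Sanity check: with expo = 2 the custom metric is the Euclidean distance,
% so the result should match the built-in 'Euclidean' metric up to rounding.
s2 = silhouette(X,cidx,minkDist,2);
sEuc = silhouette(X,cidx,'Euclidean');
maxDiff = max(abs(s2 - sEuc))
```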

Examples

Create Silhouette Plot

Create a silhouette plot from clustered data.

Generate random sample data.

rng default  % For reproducibility
X = [randn(10,2)+ones(10,2);randn(10,2)-ones(10,2)];

Cluster the data in X using kmeans.

cidx = kmeans(X,2);

Create a silhouette plot from the clustered data.

silhouette(X,cidx)

Compute Silhouette Values

Compute the silhouette values from clustered data.

Generate random sample data.

rng default  % For reproducibility
X = [randn(10,2)+ones(10,2);randn(10,2)-ones(10,2)];

Use kmeans to cluster the data in X based on the sum of absolute differences in distance.

cidx = kmeans(X,2,'distance','cityblock');

Compute the silhouette values from the clustered data. Specify metric as 'cityblock' to indicate that the kmeans clustering is based on the sum of absolute differences.

s = silhouette(X,cidx,'cityblock')

More About

Silhouette Value

The silhouette value for each point is a measure of how similar that point is to points in its own cluster, when compared to points in other clusters. The silhouette value for the $i$th point, $S_i$, is defined as

$$S_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

where $a_i$ is the average distance from the $i$th point to the other points in the same cluster as $i$, and $b_i$ is the minimum average distance from the $i$th point to points in a different cluster, minimized over clusters.
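
The definition above can be checked by hand for a single point. The sketch below assumes the default squared Euclidean distance and assumes the point's cluster has at least two members. The names `d`, `same`, `a`, `b`, and `Si` are illustrative.

```matlab
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2); randn(10,2)-ones(10,2)];
cidx = kmeans(X,2);

i = 1;  % index of the point to inspect

% Squared Euclidean distance from point i to every point in X.
d = sum(bsxfun(@minus,X,X(i,:)).^2, 2);

% a: average distance to the other points in the same cluster.
same = (cidx == cidx(i));
same(i) = false;               % exclude the point itself
a = mean(d(same));

% b: smallest average distance to the points of any other cluster.
others = unique(cidx);
others(others == cidx(i)) = [];
b = min(arrayfun(@(k) mean(d(cidx == k)), others));

Si = (b - a) / max(a,b);

% Compare with the value returned by silhouette for the same point.
s = silhouette(X,cidx);
difference = abs(Si - s(i))
```

The difference should be zero up to floating-point rounding.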

The silhouette value ranges from -1 to +1. A high silhouette value indicates that i is well-matched to its own cluster, and poorly-matched to neighboring clusters. If most points have a high silhouette value, then the clustering solution is appropriate. If many points have a low or negative silhouette value, then the clustering solution may have either too many or too few clusters. The silhouette clustering evaluation criterion can be used with any distance metric.
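
One common way to act on this is to compare the mean silhouette value across several candidate numbers of clusters. The sketch below is one possible approach, not a procedure prescribed by the original page. The candidate range `2:5`, the use of five `kmeans` replicates, and the variable names are assumptions made for illustration.

```matlab
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2); randn(10,2)-ones(10,2)];

kValues = 2:5;                        % candidate cluster counts (illustrative)
meanSil = zeros(size(kValues));

for j = 1:numel(kValues)
    % Several replicates reduce the chance of a poor local solution.
    cidx = kmeans(X,kValues(j),'Replicates',5);
    s = silhouette(X,cidx);           % values only, no plot
    s = s(~isnan(s));                 % defensively ignore any NaN values
    meanSil(j) = mean(s);
end

% Pick the cluster count with the highest mean silhouette value.
[~,best] = max(meanSil);
bestK = kValues(best)
```

A higher mean indicates that, on average, points sit closer to their own cluster than to the nearest other cluster. It is still worth viewing the silhouette plot for the chosen count, because a good average can hide a few badly assigned points.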