silhouette_matlab

Reposted from: https://ww2.mathworks.cn/help/stats/silhouette.html

Silhouette plot

Syntax

silhouette(X,clust)
s = silhouette(X,clust)
[s,h] = silhouette(X,clust)
[...] = silhouette(X,clust,metric)
[...] = silhouette(X,clust,distfun,p1,p2,...)

Description

silhouette(X,clust) plots cluster silhouettes for the n-by-p data matrix X, with clusters defined by clust. Rows of X correspond to points, columns correspond to coordinates. clust can be a categorical variable, numeric vector, character matrix, string array, or cell array of character vectors containing a cluster name for each point. silhouette treats NaNs and empty values in clust as missing values, and ignores the corresponding rows of X. By default, silhouette uses the squared Euclidean distance between points in X.

s = silhouette(X,clust) returns the silhouette values in the n-by-1 vector s, but does not plot the cluster silhouettes.

[s,h] = silhouette(X,clust) plots the silhouettes, and returns the silhouette values in the n-by-1 vector s, and the figure handle in h.

[...] = silhouette(X,clust,metric) plots the silhouettes using the inter-point distance function specified in metric. Choices for metric are given in the following table.

Metric           Description
---------------  ------------------------------------------------------------
'Euclidean'      Euclidean distance
'sqEuclidean'    Squared Euclidean distance (default)
'cityblock'      Sum of absolute differences
'cosine'         One minus the cosine of the included angle between points (treated as vectors)
'correlation'    One minus the sample correlation between points (treated as sequences of values)
'Hamming'        Percentage of coordinates that differ
'Jaccard'        Percentage of nonzero coordinates that differ
Vector           A numeric distance matrix in upper triangular vector form, such as is created by pdist. X is not used in this case, and can safely be set to [].

For more information on each metric, see Distance Metrics.
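The vector form of metric can be illustrated with a short sketch. It assumes the same sample data as the examples on this page; the variable names dvec, sFromVector, sFromData, and maxDiff are illustrative only.

```matlab
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2); randn(10,2)-ones(10,2)];
cidx = kmeans(X,2,'distance','cityblock');

% Precompute the pairwise city block distances in pdist's vector form.
dvec = pdist(X,'cityblock');

% X is not used when a distance vector is supplied, so pass [].
sFromVector = silhouette([],cidx,dvec);

% For comparison, let silhouette compute the same metric from X.
sFromData = silhouette(X,cidx,'cityblock');
maxDiff = max(abs(sFromVector - sFromData));
```

If both calls use the same distances, maxDiff should be zero up to floating-point rounding. Precomputing with pdist can be convenient when the same distances are reused for several candidate clusterings.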

[...] = silhouette(X,clust,distfun,p1,p2,...) accepts a function handle distfun to a metric of the form

d = distfun(X0,X,p1,p2,...)

where X0 is a 1-by-p point, X is an n-by-p matrix of points, and p1,p2,... are optional additional arguments. The function distfun returns an n-by-1 vector d of distances between X0 and each point (row) in X. The arguments p1, p2,... are passed directly to the function distfun.
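As a minimal sketch of the distfun form, the following defines a Minkowski distance with one extra argument p. The handle name minkowskiDist is hypothetical and not part of the toolbox; bsxfun is used so that the subtraction of the 1-by-p point from every row also works in older MATLAB releases.

```matlab
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2); randn(10,2)-ones(10,2)];
cidx = kmeans(X,2);

% Hypothetical custom metric: Minkowski distance with exponent p.
% X0 is 1-by-p, X is n-by-p, and the result is an n-by-1 vector.
minkowskiDist = @(X0,X,p) sum(abs(bsxfun(@minus,X,X0)).^p,2).^(1/p);

% The trailing 3 is passed through to minkowskiDist as p.
s3 = silhouette(X,cidx,minkowskiDist,3);

% With p = 2 the custom metric is the Euclidean distance, so the result
% should agree with the built-in 'Euclidean' metric.
s2 = silhouette(X,cidx,minkowskiDist,2);
sEuclid = silhouette(X,cidx,'Euclidean');
maxDiff = max(abs(s2 - sEuclid));
```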

Examples

Create Silhouette Plot

Create a silhouette plot from clustered data.

Generate random sample data.

rng default  % For reproducibility
X = [randn(10,2)+ones(10,2);randn(10,2)-ones(10,2)];

Cluster the data in X using kmeans.

cidx = kmeans(X,2);

Create a silhouette plot from the clustered data.

silhouette(X,cidx)

Compute Silhouette Values

Compute the silhouette values from clustered data.

Generate random sample data.

rng default  % For reproducibility
X = [randn(10,2)+ones(10,2);randn(10,2)-ones(10,2)];

Use kmeans to cluster the data in X using the city block distance (sum of absolute differences).

cidx = kmeans(X,2,'distance','cityblock');

Compute the silhouette values from the clustered data. Specify metric as 'cityblock' to indicate that the kmeans clustering is based on the sum of absolute differences.

s = silhouette(X,cidx,'cityblock')
s = 20×1

    0.0816
    0.5848
    0.1906
    0.2781
    0.3954
    0.4050
    0.0897
    0.5416
    0.6203
    0.6664
      ⋮
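Once the vector s is available, it can be summarized in a few lines. This is a sketch only: the 0.1 threshold is an arbitrary assumption chosen for illustration, and the names meanS, lowIdx, and clusterMeans are not part of the original example.

```matlab
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2); randn(10,2)-ones(10,2)];
cidx = kmeans(X,2,'distance','cityblock');
s = silhouette(X,cidx,'cityblock');

% Overall quality: average silhouette value across all points.
meanS = mean(s);

% Points that sit close to a neighboring cluster (threshold is arbitrary).
lowIdx = find(s < 0.1);

% Average silhouette value per cluster; kmeans returns indices 1..k,
% so cidx can be used directly as the subscript vector.
clusterMeans = accumarray(cidx,s,[],@mean);
```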

More About

Silhouette Value

The silhouette value for each point is a measure of how similar that point is to points in its own cluster, when compared to points in other clusters. The silhouette value for the $i$th point, $S_i$, is defined as

$$S_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

where $a_i$ is the average distance from the $i$th point to the other points in the same cluster as $i$, and $b_i$ is the minimum average distance from the $i$th point to points in a different cluster, minimized over clusters.

The silhouette value ranges from -1 to +1. A high silhouette value indicates that i is well-matched to its own cluster, and poorly-matched to neighboring clusters. If most points have a high silhouette value, then the clustering solution is appropriate. If many points have a low or negative silhouette value, then the clustering solution may have either too many or too few clusters. The silhouette clustering evaluation criterion can be used with any distance metric.
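The definition above can be checked by hand. The following sketch computes the silhouette values directly from a pairwise distance matrix and compares them with the output of silhouette. It assumes the Euclidean metric and that every cluster contains at least two points; the variable names are illustrative.

```matlab
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2); randn(10,2)-ones(10,2)];
cidx = kmeans(X,2);

D = squareform(pdist(X,'euclidean'));  % n-by-n pairwise distance matrix
n = size(X,1);
labels = unique(cidx);
sManual = zeros(n,1);

for i = 1:n
    % a: average distance to the other points in the same cluster
    own = (cidx == cidx(i));
    own(i) = false;                    % exclude the point itself
    a = mean(D(i,own));                % assumes the cluster is not a singleton

    % b: smallest average distance to the points of any other cluster
    b = inf;
    for k = labels'
        if k ~= cidx(i)
            b = min(b, mean(D(i,cidx == k)));
        end
    end

    sManual(i) = (b - a) / max(a,b);
end

sBuiltin = silhouette(X,cidx,'Euclidean');
maxDiff = max(abs(sManual - sBuiltin));
```

If the sketch matches the definition, maxDiff should be zero up to floating-point rounding.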

References

[1] Kaufman L., and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Hoboken, NJ: John Wiley & Sons, Inc., 1990.

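### Answer 1:
The silhouette function in MATLAB computes the silhouette values (silhouette coefficients) of a clustering result. The silhouette coefficient is a measure for evaluating a clustering: it reflects how compact the clusters are and how well they are separated. With silhouette you can compute these values quickly and use their magnitude to judge the quality of the clustering.

### Answer 2:
silhouette is a MATLAB function that computes the silhouette coefficient. The silhouette coefficient is a measure of clustering quality: it quantifies how close each sample is to the other members of its own cluster (within-cluster similarity) relative to how far it is from the other clusters (between-cluster separation).

The function is called as [S,h] = silhouette(X,idx), where X is the sample data, idx is the cluster assignment, S is the vector of silhouette values, and h is the handle of the silhouette plot figure. Each element of S is the silhouette value of the corresponding sample. A value close to 1 means the sample fits its current cluster well, a value close to -1 means the sample would fit better in another cluster, and a value close to 0 means the sample lies between two clusters with no clear preference for either.

silhouette also accepts a third argument that specifies the distance metric (for example 'cityblock'), or a function handle to a custom distance function. Options such as 'Replicates' (the number of repeated clustering runs) belong to the clustering function, for example kmeans, not to silhouette. Changing the clustering options changes the cluster assignment, while the metric argument changes how the silhouette values are computed.

In practice, silhouette values can be used to choose the number of clusters: a clustering with higher silhouette values is generally more reliable. The criterion can be used with any distance metric, but the metric passed to silhouette should match the one used for clustering. For high-dimensional data it may be worth checking other evaluation criteria as well.

### Answer 3:
The MATLAB silhouette function measures clustering quality. It helps judge whether a clustering result is good, and the values it returns can help determine the best number of clusters.

The core idea is that, for each data point, the silhouette coefficient is obtained from its dissimilarity to the other points in its own cluster and its dissimilarity to the points in the other clusters. The basic formula is

$$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}$$

where a(i) is the average distance from point i to the other points in its own cluster (cohesion) and b(i) is the smallest average distance from point i to the points of any other cluster (separation). Values close to 1 indicate a good clustering result; values close to -1 indicate a poor one.

With the output of silhouette, visualization tools such as the silhouette plot can be used to compare different algorithms and parameter settings. This helps select the most suitable parameters for the cluster analysis and obtain a high-quality clustering result.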
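The idea of using silhouette values to pick the number of clusters, mentioned in the answers above, can be sketched as follows. This is one possible approach under stated assumptions, not a prescribed procedure: the candidate range 2:5, the use of the mean silhouette value as the score, and the 'Replicates' setting of 5 are all choices made for illustration.

```matlab
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2); randn(10,2)-ones(10,2)];

kValues = 2:5;                       % candidate numbers of clusters (assumed)
meanS = zeros(size(kValues));

for j = 1:numel(kValues)
    % kmeans and silhouette both default to squared Euclidean distance,
    % so the clustering and the evaluation use the same metric.
    cidx = kmeans(X,kValues(j),'Replicates',5);
    meanS(j) = mean(silhouette(X,cidx));
end

% Pick the candidate with the highest average silhouette value.
[~,best] = max(meanS);
bestK = kValues(best);
```

Plotting meanS against kValues is a simple way to see how sharply the score drops away from the selected value.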