function [idx, C, sumD, D] = kmeans(X, k, varargin)
%KMEANS K-means clustering.
% IDX = KMEANS(X, K) partitions the points in the N-by-P data matrix X
% into K clusters. This partition minimizes the sum, over all clusters, of
% the within-cluster sums of point-to-cluster-centroid distances. Rows of X
% correspond to points, columns correspond to variables. Note: when X is a
% vector, KMEANS treats it as an N-by-1 data matrix, regardless of its
% orientation. KMEANS returns an N-by-1 vector IDX containing the cluster
% indices of each point. By default, KMEANS uses squared Euclidean
% distances.
%
% KMEANS treats NaNs as missing data, and ignores any rows of X that
% contain NaNs.
%
% [IDX, C] = KMEANS(X, K) returns the K cluster centroid locations in
% the K-by-P matrix C.
%
% [IDX, C, SUMD] = KMEANS(X, K) returns the within-cluster sums of
% point-to-centroid distances in the 1-by-K vector SUMD.
%
% [IDX, C, SUMD, D] = KMEANS(X, K) returns distances from each point
% to every centroid in the N-by-K matrix D.
%
% [ ... ] = KMEANS(..., 'PARAM1',val1, 'PARAM2',val2, ...) specifies
% optional parameter name/value pairs to control the iterative algorithm
% used by KMEANS. Parameters are:
%
% 'Distance' - Distance measure, in P-dimensional space, that KMEANS
% should minimize with respect to. Choices are:
% 'sqEuclidean' - Squared Euclidean distance (the default)
% 'cityblock' - Sum of absolute differences, a.k.a. L1 distance
% 'cosine' - One minus the cosine of the included angle
% between points (treated as vectors)
% 'correlation' - One minus the sample correlation between points
% (treated as sequences of values)
% 'Hamming' - Percentage of bits that differ (only suitable
% for binary data)
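%
% The distance choices above can be illustrated with a short sketch. This is
% a hedged Python/NumPy rendering of the formulas as listed, not the actual
% KMEANS implementation; the function name `pairwise_distance` is an
% illustrative invention.
%
% ```python
% import numpy as np
%
% def pairwise_distance(X, C, metric="sqeuclidean"):
%     """N-by-K matrix of distances from each row of X to each row of C.
%
%     Illustrative sketch of the 'Distance' choices; not the KMEANS code.
%     """
%     if metric == "sqeuclidean":
%         # Squared Euclidean distance (the default).
%         diff = X[:, None, :] - C[None, :, :]
%         return np.sum(diff ** 2, axis=2)
%     if metric == "cityblock":
%         # Sum of absolute differences (L1 distance).
%         return np.sum(np.abs(X[:, None, :] - C[None, :, :]), axis=2)
%     if metric == "cosine":
%         # One minus the cosine of the included angle.
%         Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
%         Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
%         return 1.0 - Xn @ Cn.T
%     if metric == "correlation":
%         # One minus the sample correlation between points.
%         Xc = X - X.mean(axis=1, keepdims=True)
%         Cc = C - C.mean(axis=1, keepdims=True)
%         Xn = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)
%         Cn = Cc / np.linalg.norm(Cc, axis=1, keepdims=True)
%         return 1.0 - Xn @ Cn.T
%     if metric == "hamming":
%         # Fraction of coordinates that differ (binary data only).
%         return np.mean(X[:, None, :] != C[None, :, :], axis=2)
%     raise ValueError(metric)
% ```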
%
% 'Start' - Method used to choose initial cluster centroid positions,
% sometimes known as "seeds". Choices are:
% 'sample' - Select K observations from X at random (the default)
% 'uniform' - Select K points uniformly at random from the range
% of X. Not valid for Hamming distance.
% 'cluster' - Perform a preliminary clustering phase on a random 10%
% subsample of X. This preliminary phase is itself
% initialized using 'sample'.
% matrix - A K-by-P matrix of starting locations. In this case,
% you can pass in [] for K, and KMEANS infers K from
% the first dimension of the matrix. You can also
% supply a 3D array, implying a value for 'Replicates'
% from the array's third dimension.
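%
% The 'sample' and 'uniform' seeding strategies above can be sketched as
% follows. This is an illustrative Python/NumPy analogue under stated
% assumptions, not the KMEANS code; `init_centroids` is a hypothetical name.
%
% ```python
% import numpy as np
%
% def init_centroids(X, k, method="sample", rng=None):
%     """Illustrative analogues of the 'sample' and 'uniform' Start choices."""
%     rng = np.random.default_rng(rng)
%     if method == "sample":
%         # Select k distinct observations from X at random.
%         rows = rng.choice(X.shape[0], size=k, replace=False)
%         return X[rows]
%     if method == "uniform":
%         # Draw k points uniformly at random from the range (bounding box) of X.
%         lo, hi = X.min(axis=0), X.max(axis=0)
%         return rng.uniform(lo, hi, size=(k, X.shape[1]))
%     raise ValueError(method)
% ```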
%
% 'Replicates' - Number of times to repeat the clustering, each with a
% new set of initial centroids. A positive integer, default is 1.
%
% 'EmptyAction' - Action to take if a cluster loses all of its member
% observations. Choices are:
% 'error' - Treat an empty cluster as an error (the default)
% 'drop' - Remove any clusters that become empty, and set
% the corresponding values in C and D to NaN.
% 'singleton' - Create a new cluster consisting of the one
% observation furthest from its centroid.
%
% 'Options' - Options for the iterative algorithm used to minimize the
% fitting criterion, as created by STATSET. Choices of STATSET
% parameters are:
%
% 'display' - Level of display output. Choices are 'off' (the
% default), 'iter', and 'final'.
% 'MaxIter' - Maximum number of iterations allowed. Default is 100.
%
% 'OnlinePhase' - Flag indicating whether KMEANS should perform an "on-line
% update" phase in addition to a "batch update" phase. The on-line phase
% can be time consuming for large data sets, but guarantees a solution
% that is a local minimum of the distance criterion, i.e., a partition of
% the data where moving any single point to a different cluster increases
% the total sum of distances. 'on' (the default) or 'off'.
%
% Example:
%
% X = [randn(20,2)+ones(20,2); randn(20,2)-ones(20,2)];
% opts = statset('Display','final');
% [cidx, ctrs] = kmeans(X, 2, 'Distance','city', ...
% 'Replicates',5, 'Options',opts);
% plot(X(cidx==1,1),X(cidx==1,2),'r.', ...
% X(cidx==2,1),X(cidx==2,2),'b.', ctrs(:,1),ctrs(:,2),'kx');
%
% See also LINKAGE, CLUSTERDATA, SILHOUETTE.

% KMEANS uses a two-phase iterative algorithm to minimize the sum of
% point-to-centroid distances, summed over all K clusters. The first phase
% uses what the literature often describes as "batch" updates, where each
% iteration consists of reassigning points to their nearest cluster
% centroid, all at once, followed by recalculation of cluster centroids.
% This phase occasionally (especially for small data sets) does not converge
% to a solution that is a local minimum, i.e., a partition of the data where
% moving any single point to a different cluster increases the total sum of
% distances. Thus, the batch phase can be thought of as providing a fast but
% potentially only approximate solution as a starting point for the second
% phase. The second phase uses what the literature often describes as
% "on-line" updates, where points are individually reassigned if doing so
% will reduce the sum of distances, and cluster centroids are recomputed
% after each reassignment. Each iteration during this second phase consists
% of one pass through all the points. The on-line phase will converge to a
% local minimum, although there may be other local minima with a lower total
% sum of distances.
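%
% The batch phase described above can be sketched as follows. This is a
% hedged Python/NumPy illustration of batch updates with squared Euclidean
% distance only; the on-line refinement phase is omitted, `kmeans_sketch` is
% a hypothetical name, and this is not the KMEANS implementation.
%
% ```python
% import numpy as np
%
% def kmeans_sketch(X, k, C0=None, max_iter=100, rng=0):
%     """Batch-update k-means sketch: reassign all points at once, then
%     recompute centroids; stop when assignments no longer change."""
%     rng = np.random.default_rng(rng)
%     if C0 is None:
%         # 'sample' start: pick k distinct observations at random.
%         C = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
%     else:
%         C = np.array(C0, dtype=float)
%     idx = np.full(X.shape[0], -1)
%     for _ in range(max_iter):
%         # Batch update: assign every point to its nearest centroid at once.
%         D = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
%         new_idx = D.argmin(axis=1)
%         if np.array_equal(new_idx, idx):
%             break  # assignments stable: the batch phase has converged
%         idx = new_idx
%         # Then recompute each centroid as the mean of its member points.
%         for j in range(k):
%             if np.any(idx == j):
%                 C[j] = X[idx == j].mean(axis=0)
%     return idx, C
% ```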