
3.2 The Actual K-Means Algorithm

In this section we want to study the actual K-means algorithm. In

particular, we want to investigate when and how it gets stuck in dif-

ferent local optima. The general insight is that even though, from an

algorithmic point of view, it is an annoying property of the K-means

algorithm that it can get stuck in different local optima, this property

might actually help us for the purpose of model selection. We now want

to focus on the effect of the random initialization of the K-means algo-

rithm. For simplicity, we ignore sampling artifacts and assume that we

always work with “infinitely many” data points; that is, we work on

the underlying distribution directly.

    The following observation is the key to our analysis. Assume we are

given a data set with Ktrue well-separated clusters, and assume that

we initialize the K-means algorithm with Kinit ≥ Ktrue initial centers.

The key observation is that if there is at least one initial center in each

of the underlying clusters, then the initial centers tend to stay in the

clusters they had been placed in. This means that during the course

of the K-means algorithm, cluster centers are only re-adjusted within

the underlying clusters and do not move between them. If this prop-

erty is true, then the final clustering result is essentially determined by

the number of initial centers in each of the true clusters. In particular,

if we call the number of initial centers per cluster the initial config-

uration, one can say that each initial configuration leads to a unique
clustering, and different configurations lead to different clusterings; see
Figure 3.3 for an illustration. Thus, if the initialization method used
in K-means regularly leads to different initial configurations, then we
observe instability.

Fig. 3.3 Different initial configurations and the corresponding outcomes of the K-means
algorithm. Figure a: the two boxes in the top row depict a data set with three clusters and
four initial centers. Both boxes show different realizations of the same initial configuration.
As can be seen in the bottom, both initializations lead to the same K-means clustering.
Figure b: here the initial configuration is different from the one in Figure a, which leads to
a different K-means clustering.
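To make the configuration argument concrete, here is a minimal numerical sketch (not from the text; the three cluster means at −10, 0, 10, the sample size, and the plain NumPy Lloyd loop are illustrative choices): two different random realizations of the same initial configuration should end in the same clustering.

```python
# Minimal sketch (not from the text): check the configuration argument on a
# 1-D mixture of three well-separated Gaussians.  The means, the sample size
# and the (2, 1, 1) configuration are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
means = np.array([-10.0, 0.0, 10.0])                 # three well-separated clusters
X = np.concatenate([rng.normal(m, 1.0, 2000) for m in means])

def lloyd(X, centers, n_iter=100):
    """Plain Lloyd iterations in one dimension."""
    centers = centers.astype(float).copy()
    for _ in range(n_iter):
        labels = np.argmin(np.abs(X[:, None] - centers[None, :]), axis=1)
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean()
    return np.sort(centers)

def configuration(centers, means):
    """Number of centers that are closest to each true cluster mean."""
    owner = np.argmin(np.abs(centers[:, None] - means[None, :]), axis=1)
    return tuple(np.bincount(owner, minlength=len(means)))

# Two random realizations of the same initial configuration (2, 1, 1):
for seed in (1, 2):
    r = np.random.default_rng(seed)
    init = np.array([r.normal(-10, 1), r.normal(-10, 1),   # two centers in cluster 1
                     r.normal(0, 1), r.normal(10, 1)])     # one in each other cluster
    final = lloyd(X, init)
    print("initial config:", configuration(init, means),
          "-> final config:", configuration(final, means),
          "centers:", final.round(2))
# Expected: both runs keep the (2, 1, 1) configuration and end in the same
# clustering, because no center leaves the cluster it was placed in.
```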

    In [9], the first results in this direction were proved. They are still

preliminary in the sense that so far, proofs only exist for a simple

setting. However, we believe that the results also hold in a more general

context.

Theorem 3.6 (Stability of the actual K-means algorithm).

Assume that the underlying distribution P is a mixture of two

well-separated Gaussians on R. Denote the means of the Gaussians

by µ1 and µ2 .

(1) Assume that we run the K-means algorithm with K = 2 and

    that we use an initialization scheme that places one initial

    center in each of the true clusters (with high probability).

    Then the K-means algorithm is stable in the sense that with

    high probability, it terminates in a solution with one center

    close to µ1 and one center close to µ2 .

(2) Assume that we run the K-means algorithm with K = 3 and

    that we use an initialization scheme that places at least one



of the initial centers in each of the true clusters (with high

probability). Then the K-means algorithm is instable in the

sense that with probability close to 0.5 it terminates in a

solution that considers the first Gaussian as one cluster, but splits

the second Gaussian into two clusters; and with probability

close to 0.5 it does it the other way round.

Fig. 3.4 Stable regions used in the proof of Theorem 3.6. See text for details.

    Proof idea. The idea of this proof is best described with Figure 3.4.

In the case of Kinit = 2 one has to prove that if one center lies

in a large region around µ1 and the second center in a similar region

around µ2 , then the next step of K-means does not move the cen-

ters out of their regions (in Figure 3.4, these regions are indicated

by the black bars). If this is true, and if we know that there is one

initial center in each of the regions, the same is true when the algo-

rithm stops. Similarly, in the case of Kinit = 3, one proves that if there

are two initial centers in the first region and one initial center in the

second region, then all centers stay in their regions in one step of

K-means.
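The statement of Theorem 3.6 is easy to probe empirically. The following is an illustrative simulation only (the means −10 and 10, the sample sizes, and the use of scikit-learn's KMeans with explicit initial centers are my choices, not part of [9]).

```python
# Illustrative simulation of Theorem 3.6 (not the proof from [9]): a mixture of
# two well-separated Gaussians on R with mu1 = -10 and mu2 = 10.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
mu1, mu2 = -10.0, 10.0
X = np.concatenate([rng.normal(mu1, 1.0, 3000),
                    rng.normal(mu2, 1.0, 3000)]).reshape(-1, 1)

def run(init):
    km = KMeans(n_clusters=len(init), init=np.array(init).reshape(-1, 1),
                n_init=1).fit(X)
    return np.sort(km.cluster_centers_.ravel())

# Part (1): K = 2, one initial center per true cluster -> one final center
# close to mu1 and one close to mu2 (stability).
print(run([rng.normal(mu1, 1), rng.normal(mu2, 1)]))

# Part (2): K = 3, at least one initial center per true cluster.  The Gaussian
# that received two initial centers gets split; over many runs each Gaussian is
# the split one with probability close to 0.5 (instability).
splits = {"first": 0, "second": 0}
for _ in range(200):
    doubled = mu1 if rng.random() < 0.5 else mu2
    centers = run([rng.normal(mu1, 1), rng.normal(mu2, 1), rng.normal(doubled, 1)])
    splits["first" if np.sum(centers < 0) == 2 else "second"] += 1
print(splits)   # roughly 100 / 100: two different stable outcomes
```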

    All that is left to do now is to find an initialization scheme that

satisfies the conditions in Theorem 3.6. Luckily, we can adapt a scheme

that has already been used in Dasgupta and Schulman [10]. For simplic-

ity, assume that all clusters have similar weights (for the general case

see [9]), and that we want to select K initial centers for the K-means

algorithm. Then the following initialization should be used:

Initialization (I):

 (1) Select L preliminary centers uniformly at random from the

     given data set, where L ≈ K log(K).



(2) Run one step of K-means, that is assign the data points to

    the preliminary centers and re-adjust the centers once.

(3) Remove all centers for which the mass of the assigned data

    points is smaller than p0 ≈ 1/L.

(4) Among the remaining centers, select K centers by the

    following procedure:

(a) Choose the first center uniformly at random.

(b) Repeat until K centers are selected: Select the next

    center as the one that maximizes the minimum distance

    to the centers already selected.

   One can prove that this initialization scheme satisfies the conditions

needed in Theorem 3.6 (for exact details see [9]).
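For concreteness, here is a sketch of Initialization (I) in NumPy. It follows steps (1)–(4) above, but the exact constants are assumptions on my part (L = ⌈3 K log K⌉ preliminary centers and a removal threshold of 1/(2L), i.e. of order 1/L; see [9] for the precise choices), and the function name initialization_I is purely illustrative.

```python
# Sketch of Initialization (I).  Constants are assumptions: L = ceil(3 K log K)
# preliminary centers and a removal threshold p0 = 1/(2L), i.e. of order 1/L.
import numpy as np

def initialization_I(X, K, rng):
    n = len(X)
    # (1) Select L preliminary centers uniformly at random from the data set.
    L = int(np.ceil(3 * K * np.log(max(K, 2))))
    centers = X[rng.choice(n, size=L, replace=False)]
    # (2) One step of K-means: assign the points, re-adjust the centers once.
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = np.argmin(dist, axis=1)
    mass = np.bincount(labels, minlength=L) / n
    for j in range(L):
        if mass[j] > 0:
            centers[j] = X[labels == j].mean(axis=0)
    # (3) Remove centers whose mass of assigned points is too small.
    centers = centers[mass > 1.0 / (2 * L)]
    # (4) Greedy selection: first center at random, then repeatedly the center
    #     maximizing the minimum distance to the centers already selected.
    chosen = [centers[rng.integers(len(centers))]]
    while len(chosen) < K:
        dmin = np.min(np.linalg.norm(centers[:, None, :] - np.array(chosen)[None, :, :],
                                     axis=2), axis=1)
        chosen.append(centers[np.argmax(dmin)])
    return np.array(chosen)

# Usage on three well-separated 2-D Gaussians: with K = 3 the scheme should
# return one initial center inside each true cluster (with high probability).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(m, 1.0, size=(500, 2))
                    for m in [(-10, 0), (0, 10), (10, 0)]])
print(initialization_I(X, 3, rng))
```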

Theorem 3.7 (Initialization). Assume we are given a mixture of

Ktrue well-separated Gaussians in R, and denote the centers of the

Gaussians by µi . If we use the Initialization (I) to select Kinit centers,

then there exist Ktrue disjoint regions Ak with µk ∈ Ak , so that all Kinit

centers are contained in one of the Ak and

if Kinit = Ktrue , each Ak contains exactly one center,

if Kinit < Ktrue , each Ak contains at most one center,

if Kinit > Ktrue , each Ak contains at least one center.

    Proof sketch. The following statements can be proved to hold with

high probability. By selecting Ktrue log(Ktrue ) preliminary centers, each

of the Gaussians receives at least one of these centers. By running

one step of K-means and removing the centers with too small mass,

one removes all preliminary centers that sit on outliers. Moreover, one

can prove that “ambiguous centers” (that is, centers that sit between

two clusters) attract only a few data points and will be removed as well.

Next one shows that centers that are “unambiguous” are reasonably

close to a true cluster center µk . Consequently, the method for selecting

the final centers from the remaining preliminary ones “cycles through

different Gaussians” before visiting a particular Gaussian for the second

time.



    When combined, the results of Theorems 3.6 and 3.7 show that if

the data set contains Ktrue well-separated clusters, then the K-means

algorithm is stable if it is started with the true number of clusters, and

instable if the number of clusters is too large. Unfortunately, in the

case where K is too small one cannot make any useful statement about

stability because the aforementioned configuration argument does not

hold any more. In particular, initial cluster centers do not stay inside

their initial clusters, but move out of the clusters. Often, the final cen-

ters constructed by the K-means algorithm lie in between several true

clusters, and it is very hard to predict the final positions of the cen-

ters from the initial ones. This can be seen with the example shown in

Figure 3.5. We consider two data sets from a mixture of three Gaus-

sians. The only difference between the two data sets is that in the

left plot all mixture components have the same weight, while in the

right plot the top right component has a larger weight than the other

two components. One can verify experimentally that if initialized with

Kinit = 2, the K-means algorithm is rather stable in the left figure (it

always merges the top two clusters). But it is instable in the right

figure (sometimes it merges the top clusters, sometimes the left two

clusters). This example illustrates that if the number of clusters is too

small, subtle differences in the distribution can decide on stability or

instability of the actual K-means algorithm.

[Figure 3.5: two scatter plots of the two data sets; the left panel is labeled “stable”, the right panel “instable”.]

Fig. 3.5 Illustration for the case where K is too small. We consider two data sets that have

been drawn from a mixture of three Gaussians with means µ1 = (−5, −7), µ2 = (−5, 7),

µ3 = (5, 7) and unit variances. In the left figure, all clusters have the same weight 1/3,

whereas in the right figure the top right cluster has larger weight 0.6 than the other two

clusters with weights 0.2 each. If we run K-means with K = 2, we can verify experimentally

that the algorithm is pretty stable if applied to points from the distribution in the left

figure. It nearly always merges the top two clusters. On the distribution shown in the right

figure, however, the algorithm is instable. Sometimes the top two clusters are merged, and

sometimes the left two clusters.
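The experiment behind Figure 3.5 can be reproduced along the following lines (a sketch; the sample size, the number of restarts, and the use of scikit-learn's KMeans with plain random initialization are my choices): for each of the two mixtures we run K-means with K = 2 from many random initializations and record which pair of true clusters ends up merged.

```python
# Sketch reproducing the experiment behind Figure 3.5 (illustrative choices of
# sample size, restarts and random initialization).
import numpy as np
from sklearn.cluster import KMeans

means = np.array([[-5.0, -7.0], [-5.0, 7.0], [5.0, 7.0]])  # 0: bottom, 1: top-left, 2: top-right

def sample(weights, rng, n=1500):
    comp = rng.choice(3, size=n, p=weights)
    return means[comp] + rng.normal(size=(n, 2)), comp

def merged_pair(X, comp, seed):
    """Run K-means with K = 2 and report which two true clusters share a label."""
    labels = KMeans(n_clusters=2, init="random", n_init=1,
                    random_state=seed).fit_predict(X)
    major = [np.bincount(labels[comp == c]).argmax() for c in range(3)]
    for a, b in [(1, 2), (0, 1), (0, 2)]:
        if major[a] == major[b]:
            return (a, b)
    return None

rng = np.random.default_rng(0)
for name, w in [("equal weights", [1 / 3, 1 / 3, 1 / 3]),
                ("top-right heavier", [0.2, 0.2, 0.6])]:
    X, comp = sample(np.array(w), rng)
    counts = {}
    for seed in range(100):
        key = merged_pair(X, comp, seed)
        counts[key] = counts.get(key, 0) + 1
    print(name, counts)
# Expected (as described in the text): with equal weights the top pair (1, 2) is
# merged nearly always (stable); with the heavier top-right cluster the merged
# pair switches between (1, 2) and the left pair (0, 1) (instable).
```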



   In general, we expect that the following statements hold (but they

have not yet been proved in a context more general than in Theo-

rems 3.6 and 3.7).

Conjecture 3.8 (Stability of the actual K-means algorithm).

Assume that the underlying distribution has Ktrue well-separated

clusters, and that these clusters can be represented by a center-based

clustering model. Then, if one uses Initialization (I) to construct Kinit

initial centers, the following statements hold:

  If Kinit = Ktrue , we have one center per cluster, with high proba-

   bility. The clustering results are stable.

If Kinit > Ktrue , different initial configurations occur. By the above

argument, different configurations lead to different clusterings, so

 we observe instability.

If Kinit < Ktrue , then depending on subtle differences in the under-

 lying distribution we can have either stability or instability.
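A sketch of how this conjecture translates into a stability score in practice (illustrative only: the data set, the plain random initialization instead of Initialization (I), and the use of 1 − adjusted Rand index as the disagreement measure are my choices):

```python
# Sketch of a stability score in the spirit of the conjecture.
import numpy as np
from itertools import combinations
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(m, 1.0, size=(400, 2))
                    for m in [(-10, 0), (0, 10), (10, 0)]])   # K_true = 3

def instability(X, k, n_restarts=20):
    labelings = [KMeans(n_clusters=k, init="random", n_init=1,
                        random_state=s).fit_predict(X) for s in range(n_restarts)]
    return float(np.mean([1.0 - adjusted_rand_score(a, b)
                          for a, b in combinations(labelings, 2)]))

for k in range(2, 7):
    print(k, round(instability(X, k), 3))
# Expected pattern: smallest score at K = K_true = 3, clearly larger scores for
# K > 3 (different initial configurations lead to different local optima), and
# either behavior for K = 2, depending on the distribution.
```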

3.3 Relationships between the results

In this section we discuss conceptual aspects of the results and relate

them to each other.

3.3.1 Jittering versus Jumping

There are two main effects that lead to instability of the K-means

algorithm. Both effects are visualized in Figure 3.6.

Fig. 3.6 The x-axis depicts the space of all clusterings for a fixed distribution P and for
a fixed parameter K (this is an abstract sketch only). The y-axis shows the value of the
objective function of the different solutions. The solid line corresponds to the true limit
objective function Q_K^(∞), the dotted lines show the sample-based function Q_K^(n) on
different samples. The idealized K-means algorithm only studies the jittering of the global
optimum, that is, how far the global optimum varies due to the sampling process. The
jumping between different local optima is induced by different random initializations, as
investigated for the actual K-means algorithm.

    Jittering of the cluster boundaries. Consider a fixed local (or global)

optimum of Q_K^(∞) and the corresponding clustering on different random

samples. Due to the fact that different samples lead to slightly different

positions of the cluster centers, the cluster boundaries “jitter”. That is,

the cluster boundaries corresponding to different samples are slightly

shifted with respect to one another. We call this behavior the “jittering”

of a particular clustering solution. For the special case of the global

optimum, this jittering has been investigated in Sections 3.1.2 and 3.1.3.

It has been established that different parameters K lead to different

amounts of jittering (measured in terms of rescaled instability). The




jittering is larger if the cluster boundaries are in a high-density region

and smaller if the cluster boundaries are in low-density regions of the

space. The main “source” of jittering is the sampling variation.
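A sketch that isolates the jittering effect (the 1-D mixture, the sample size, and the fixed initialization at the true means are illustrative choices): with K and the initialization held fixed, only the sampling variation moves the boundary, and its standard deviation over fresh samples quantifies the jitter.

```python
# Sketch isolating jittering: K and the initialization are fixed, so only the
# sampling variation moves the cluster boundary.
import numpy as np
from sklearn.cluster import KMeans

def boundary(sample_seed, n=500):
    rng = np.random.default_rng(sample_seed)
    X = np.concatenate([rng.normal(-2.0, 1.0, n),
                        rng.normal(2.0, 1.0, n)]).reshape(-1, 1)
    km = KMeans(n_clusters=2, init=np.array([[-2.0], [2.0]]), n_init=1).fit(X)
    c = np.sort(km.cluster_centers_.ravel())
    return 0.5 * (c[0] + c[1])      # 1-D K-means boundary = midpoint of the centers

bounds = np.array([boundary(s) for s in range(200)])
print("boundary mean %.3f, std %.3f" % (bounds.mean(), bounds.std()))
# The standard deviation shrinks as n grows and is larger when the boundary
# lies in a region of higher density (e.g. move the two component means closer).
```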

    Jumping between different local optima. By “jumping” we refer to

the fact that an algorithm terminates in different local optima. Investi-

gating jumping has been the major goal in Section 3.2. The main source

of jumping is the random initialization. If we initialize the K-means

algorithm in different configurations, we end in different local optima.

The key point in favor of clustering stability is that one can relate the

number of local optima of Q_K^(∞) to whether the number K of clusters

is correct or too large (this has happened implicitly in Section 3.2).
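Jumping can be made visible in the same spirit (again a sketch with illustrative data and a crude criterion for identifying local optima: two runs count as the same optimum if their sorted, rounded centers coincide):

```python
# Sketch of jumping: count how many distinct local optima the actual K-means
# algorithm reaches under random initialization.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(m, 1.0, size=(400, 2))
                    for m in [(-10, 0), (0, 10), (10, 0)]])   # K_true = 3

def distinct_optima(X, k, n_restarts=50):
    reached = []
    for s in range(n_restarts):
        km = KMeans(n_clusters=k, init="random", n_init=1, random_state=s).fit(X)
        key = np.round(np.sort(km.cluster_centers_, axis=0), 1)
        if not any(np.allclose(key, r) for r in reached):
            reached.append(key)
    return len(reached)

for k in range(2, 7):
    print(k, distinct_optima(X, k))
# Larger K typically produces several local optima with sizeable regions of
# attraction, so the count (and with it the instability) grows; an
# initialization like Initialization (I) would concentrate the K = K_true runs
# on a single optimum.
```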

3.3.2 Discussion of the Main Theorems

Theorem 3.1 works in the idealized setting. In Part 1 it shows that if

the underlying distribution is not symmetric, the idealized clustering

results are stable in the sense that different samples always lead to the

same clustering. That is, no jumping between different solutions takes

place. In hindsight, this result can be considered as an artifact of the

idealized clustering scenario. The idealized K-means algorithm artifi-

cially excludes the possibility of ending in different local optima. Unless



there exist several global optima, jumping between different solutions

cannot happen. In particular, the conclusion that clustering results are

stable for all values of K does not carry over to the realistic K-means

algorithm (as can be seen from the results in Section 3.2). Put plainly,

even though the idealized K-means algorithm with K = 2 is stable in

the example of Figure 3.1a, the actual K-means algorithm is instable.

Part 2 of Theorem 3.1 states that if the objective function has several

global optima, for example due to symmetry, then jumping takes place

even for the idealized K-means algorithm and results in instability. In

the setting of the theorem, the jumping is merely induced by having

different random samples. However, a similar result can be shown to

hold for the actual K-means algorithm, where it is induced due to ran-

dom initialization. Namely, if the underlying distribution is perfectly

symmetric, then “symmetric initializations” lead to the different local

optima corresponding to the different symmetric solutions.

    To summarize, Theorem 3.1 investigates whether jumping between

different solutions takes place due to the random sampling process. The

negative connotation of Part 1 is an artifact of the idealized setting

that does not carry over to the actual K-means algorithm, whereas the

positive connotation of Part 2 does carry over.

    Theorem 3.2 studies how different samples affect the jittering of a

unique solution of the idealized K-means algorithm. In general, one

can expect that similar jittering takes place for the actual K-means

algorithm as well. In this sense, we believe that the results of this the-

orem can be carried over to the actual K-means algorithm. However, if

we reconsider the intuition stated in the introduction and depicted in

Figure 1.1, we realize that jittering was not really what we had been

looking for. The main intuition in the beginning was that the algo-

rithm might jump between dierent solutions, and that such jumping

shows that the underlying parameter K is wrong. In practice, stability

is usually computed for the actual K-means algorithm with random

initialization and on different samples. Here both effects (jittering and

jumping) and both random processes (random samples and random

initialization) play a role. We suspect that the effect of jumping to

different local optima due to different initialization has higher impact

on stability than the jittering of a particular solution due to sampling



variation. Our reason to believe so is that the distance between two

clusterings is usually higher if the two clusterings correspond to differ-

ent local optima than if they correspond to the same solution with a

slightly shifted boundary.
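The clustering distance we have in mind here is, for example, the minimal matching distance: the smallest fraction of points on which two label vectors disagree, minimized over all permutations of the K labels. A small sketch (the brute-force permutation search is only feasible for small K, and the toy label vectors are illustrative):

```python
# Sketch of the minimal matching distance between two label vectors.
import numpy as np
from itertools import permutations

def minimal_matching_distance(labels_a, labels_b, k):
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    best = 1.0
    for perm in permutations(range(k)):              # feasible for small K only
        best = min(best, float(np.mean(labels_a != np.array(perm)[labels_b])))
    return best

# A boundary that jitters by a few points gives a small distance ...
a = np.array([0] * 50 + [1] * 50)
b = np.array([0] * 53 + [1] * 47)
print(minimal_matching_distance(a, b, 2))      # 0.03
# ... whereas two different local optima (say "merge the top pair" vs "merge the
# left pair" in Figure 3.5) disagree on a whole cluster and give a large distance.
c = np.array([0] * 50 + [1] * 25 + [0] * 25)
print(minimal_matching_distance(a, c, 2))      # 0.25
```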

    To summarize, Theorem 3.2 describes the jittering behavior of an

individual solution of the idealized K-means algorithm. We believe that

similar effects take place for the actual K-means algorithm. However,

we also believe that the influence of jittering on stability plays a minor

role compared to the one of jumping.

    Theorem 3.6 investigates the jumping behavior of the actual

K-means algorithm. As the source of jumping, it considers the random

initialization only. It does not take into account variations due to ran-

dom samples (this is hidden in the proof, which works on the underlying

distribution rather than with finitely many sample points). However, we

believe that the results of this theorem also hold for finite samples. The-

orem 3.6 is not yet as general as we would like it to be. But we believe

that studying the jumping behavior of the actual K-means algorithm is

the key to understanding the stability of the K-means algorithm used

in practice, and Theorem 3.6 points in the right direction.

    Altogether, the results obtained in the idealized and realistic setting

perfectly complement each other and describe two sides of the same

coin. The idealized setting mainly studies what influence the differ-

ent samples can have on the stability of one particular solution. The

realistic setting focuses on how the random initialization makes the

algorithm jump between different local optima. In both settings, sta-

bility “pushes” in the same direction: If the number of clusters is too

large, results tend to be instable. If the number of clusters is correct,

results tend to be stable. If the number of clusters is too small, both

stability and instability can occur, depending on subtle properties of

the underlying distribution.


4 Beyond K-Means

Most of the theoretical results in the literature on clustering stability

have been proved with the K-means algorithm in mind. However, some

of them hold for more general clustering algorithms. This is mainly the

case for the idealized clustering setting.

    Assume a general clustering objective function Q and an ideal clus-

tering algorithm that globally minimizes this objective function. If this

clustering algorithm is consistent in the sense that the optimal clus-

tering on the finite sample converges to the optimal clustering of the

underlying space, then the results of Theorem 3.1 can be carried over

to this general objective function [4]. Namely, if the objective function

has a unique global optimum, the clustering algorithm is stable, and

it is instable if the objective function has several global minima (for exam-

ple due to symmetry). It is not too surprising that one can extend

the stability results of the K-means algorithm to more general vector-

quantization-type algorithms. However, the setup of this theorem is so

general that it also holds for completely different algorithms such as

spectral clustering. The consistency requirement sounds like a rather

strong assumption. But note that clustering algorithms that are not

consistent are completely unreliable and should not be used anyway.


    Similarly as above, one can also generalize the characterization of

instable clusterings stated in Conclusion 3.3, cf. Ben-David and von

Luxburg [3]. Again we are dealing with algorithms that minimize an

objective function. The consistency requirements are slightly stronger

in that we need uniform consistency over the space (or a subspace)

of probability distributions. Once such uniform consistency holds, the

characterization that instable clusterings tend to have their boundary

in high-density regions of the space can be established.

    While the two results mentioned above can be carried over to a

large class of clustering algorithms, it is not as simple for the refined

convergence analysis of Theorem 3.2. Here we need to make one cru-

cial additional assumption, namely the existence of a central limit type

result. This is a rather strong assumption which is not satisfied for many

clustering objective functions. However, a few results can be estab-

lished [24]: in addition to the traditional K-means objective function,

a central limit theorem can be proved for other variants of K-means

such as kernel K-means (a kernelized version of the traditional K-

means algorithm) or Bregman divergence clustering (where one selects

a set of centroids such that the average divergence between points and

centroids is minimized). Moreover, central limit theorems are known

for maximum likelihood estimators, which leads to stability results for

certain types of model-based clusterings using maximum likelihood esti-

mators. Still the results of Theorem 3.2 are limited to a small number

of clustering objective functions, and one cannot expect to be able to

extend them to a wide range of clustering algorithms.
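As a concrete instance of the Bregman divergence clustering mentioned above, here is a sketch with the generalized KL divergence on positive data (the data and K are illustrative; the one property the sketch relies on is that, for any Bregman divergence, the average-divergence-minimizing centroid of a cell is its arithmetic mean):

```python
# Sketch of Bregman divergence clustering with the generalized KL divergence.
import numpy as np

def kl_divergence(x, c):
    """Generalized KL divergence d(x, c) = sum_i x_i log(x_i / c_i) - x_i + c_i."""
    return np.sum(x * np.log(x / c) - x + c, axis=-1)

def bregman_clustering(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign every point to the centroid with the smallest divergence
        d = np.stack([kl_divergence(X, c) for c in centroids], axis=1)
        labels = np.argmin(d, axis=1)
        # re-adjust: the optimal Bregman centroid of each cell is its mean
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

rng = np.random.default_rng(1)
X = np.concatenate([rng.gamma(shape=s, scale=1.0, size=(300, 5))
                    for s in (1.0, 5.0, 20.0)])      # three positive-valued groups
centroids, labels = bregman_clustering(X, k=3)
print(np.round(centroids, 2))
```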

    Even stronger limitations hold for the results about the actual K-

means algorithm. The methods used in Section 3.2 were particularly

designed for the K-means algorithm. It might be possible to extend

them to more general centroid-based algorithms, but it is not obvi-

ous how to advance further. In spite of this shortcoming, we believe

that these results hold in a much more general context of random-

ized clustering algorithms. From a high level point of view, the actual

K-means algorithm is a randomized algorithm due to its random ini-

tialization. The randomization is used to explore dierent local optima

of the objective function. There were two key insights in our stability

analysis of the actual K-means algorithm: First, we could describe the



“regions of attraction” of different local minima, that is, we could prove

which initial centers lead to which solution in the end (this was the con-

figurations idea). Second, we could relate the “size” of the regions of

attraction to the number of clusters. Namely, if the number of clusters

is correct, the global minimum will have a huge region of attraction in

the sense that it is very likely that we will end in the global minimum.

If the number of clusters is too large, we could show that there exist

several local optima with large regions of attraction. This leads to a

significant likelihood of ending in different local optima and observing

instability.
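An empirical proxy for the "size" of a region of attraction is simply the fraction of random restarts that end in the corresponding optimum. A sketch (same illustrative data and the same crude rounded-centers identification of optima as in the earlier jumping sketch):

```python
# Sketch of the region-of-attraction argument: the fraction of random restarts
# ending in an optimum is a proxy for the size of its region of attraction.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(m, 1.0, size=(400, 2))
                    for m in [(-10, 0), (0, 10), (10, 0)]])   # K_true = 3

def largest_basin_share(X, k, n_restarts=100):
    keys = []
    for s in range(n_restarts):
        km = KMeans(n_clusters=k, init="random", n_init=1, random_state=s).fit(X)
        keys.append(tuple(np.round(np.sort(km.cluster_centers_, axis=0), 1).ravel()))
    return Counter(keys).most_common(1)[0][1] / n_restarts

for k in (2, 3, 4, 5):
    print(k, largest_basin_share(X, k))
# A share close to 1 means one optimum attracts almost every run; for too large
# K several optima keep sizeable shares, so the runs scatter across different
# solutions and we observe instability.
```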

    We believe that similar arguments can be used to investigate stabil-

ity of other kinds of randomized clustering algorithms. However, such

an analysis always has to be adapted to the particular algorithm under

consideration. In particular, it is not obvious whether the number of

clusters can always be related to the number of large regions of attrac-

tion. Hence it is an open question whether results similar to the ones

for the actual K-means algorithm also hold for completely different

randomized clustering algorithms.


5 Outlook

Based on the results presented above one can draw a cautiously opti-

mistic picture about model selection based on clustering stability for

the K-means algorithm. Stability can discriminate between different

values of K, and the values of K that lead to stable results have desir-

able properties. If the data set contains a few well-separated clusters

that can be represented by a center-based clustering, then stability has

the potential to discover the correct number of clusters.

    An important point to stress is that stability-based model selec-

tion for the K-means algorithm can only lead to convincing results if

the underlying distribution can be represented by center-based clus-

ters. If the clusters are very elongated or have complicated shapes, the

K-means algorithm cannot find a good representation of this data set,

regardless what number K one uses. In this case, stability-based model

selection breaks down, too. It is a legitimate question what implications

this has in practice. We usually do not know whether a given data set

can be represented by center-based clusterings, and often the K-means

algorithm is used anyway. In my opinion, however, the question of

selecting the “correct” number of clusters is not so important in this

case. The only way in which complicated structure can be represented


using K-means is to break each true cluster into several small, spherical

clusters and either live with the fact that the true clusters are split into

pieces, or use some mechanism to join these pieces afterwards to form a

bigger cluster of general shape. In such a scenario it is not so important

what number of clusters we use in the K-means step: it does not really

matter whether we split an underlying cluster into, say, 5 or 7 pieces.
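A sketch of this break-then-join strategy (the two elongated clusters, the choice of 20 pieces, and the use of single-linkage merging of the piece centers are all illustrative choices, not a recommendation from the text):

```python
# Sketch of break-then-join: over-cluster with K-means, then merge the small
# spherical pieces whose centers are close.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

# two elongated, non-spherical clusters that K-means with K = 2 represents badly
rng = np.random.default_rng(0)
t = rng.uniform(0, 6 * np.pi, 1000)
band1 = np.c_[t, np.sin(t)] + 0.1 * rng.normal(size=(1000, 2))
band2 = np.c_[t, np.sin(t) + 4.0] + 0.1 * rng.normal(size=(1000, 2))
X = np.concatenate([band1, band2])

# Step 1: split the data into many small pieces; whether we use 15 or 25 pieces
# hardly matters, as argued above.
pieces = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X)

# Step 2: join pieces whose centers are close (single linkage on the centers),
# recovering the two elongated clusters of general shape.
merge = AgglomerativeClustering(n_clusters=2, linkage="single").fit(pieces.cluster_centers_)
final_labels = merge.labels_[pieces.labels_]
print(np.bincount(final_labels))     # roughly 1000 points in each final cluster
```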

    There are a few technical questions that deserve further considera-

tion. Obviously, the results in Section 3.2 are still somewhat preliminary

and should be worked out in more generality. The results in Section 3.1

are large sample results. It is not clear what “large sample size” means

in practice, and one can construct examples where the sample size

has to be arbitrarily large to make valid statements [3]. However, such

examples can either be countered by introducing assumptions on the

underlying probability distribution, or one can state that the sample

size has to be large enough to ensure that the cluster structure is well-

represented in the data and that we do not miss any clusters.

    There is yet another limitation that is more severe, namely the

number of clusters to which the results apply. The conclusions in Sec-

tion 3.1 as well as the results in Section 3.2 only hold if the true number

of clusters is relatively small (say, on the order of 10 rather than on

the order of 100), and if the parameter K used by K-means is in the

same order of magnitude. Let us briefly explain why this is the case.

In the idealized setting, the limit results in Theorems 3.1 and 3.2 of

course hold regardless of what the true number of clusters is. But the

subsequent interpretation regarding cluster boundaries in high and low

density areas breaks down if the number of clusters is too large. The

reason is that the influence of one tiny bit of cluster boundary between

two clusters is negligible compared to the rest of the cluster boundary

if there are many clusters, such that other factors might dominate the

behavior of clustering stability. In the realistic setting of Section 3.2,

we use an initialization scheme which, with high probability, places

centers in dierent clusters before placing them into the same cluster.

The procedure works well if the number of clusters is small. However,

the larger the number of clusters, the higher the likelihood to fail with

this scheme. Similarly problematic is the situation where the true num-

ber of clusters is small, but the K-means algorithm is run with a very



large K. Finally, note that similar limitations hold for all model selec-

tion criteria. It is simply a very difficult (and pretty useless) question

whether a data set contains 100 or 105 clusters, say.

    While stability is relatively well-studied for the K-means algorithm,

there does not exist much work on the stability of completely different

clustering mechanisms. We have seen in Section 4 that some of the

results for the idealized K-means algorithm also hold in a more general

context. However, this is not the case for the results about the actual

K-means algorithm. We consider the results about the actual K-means

algorithm as the strongest evidence in favor of stability-based model

selection for K-means. Whether this principle can be proved to work

well for algorithms very dierent from K-means is an open question.

    An important point we have not discussed in depth is how clustering

stability should be implemented in practice. As we have outlined in

Section 2, there exist many different protocols for computing stability

scores. It would be very important to compare and evaluate all these

approaches in practice, in particular as there are several unresolved

issues (such as the normalization). Unfortunately, a thorough study

that compares all different protocols in practice does not exist.


References

[1] S. Ben-David, “A framework for statistical clustering with constant time

    approximation algorithms for K-median and K-means clustering,” Machine

    Learning, vol. 66, pp. 243–257, 2007.

[2] S. Ben-David, D. Pál, and H.-U. Simon, “Stability of k-Means Clustering,” in

    Conference on Learning Theory (COLT), (N. Bshouty and C. Gentile, eds.),

    pp. 20–34, Springer, 2007.

[3] S. Ben-David and U. von Luxburg, “Relating clustering stability to properties

    of cluster boundaries,” in Proceedings of the 21st Annual Conference on Learn-

    ing Theory (COLT), (R. Servedio and T. Zhang, eds.), pp. 379–390, Springer,

    Berlin, 2008.

[4] S. Ben-David, U. von Luxburg, and D. Pál, “A sober look at clustering stabil-

    ity,” in Proceedings of the 19th Annual Conference on Learning Theory (COLT),

    (G. Lugosi and H. Simon, eds.), pp. 5–19, Springer, Berlin, 2006.

[5] A. Ben-Hur, A. Elisseeff, and I. Guyon, “A stability based method for dis-

    covering structure in clustered data,” in Pacific Symposium on Biocomputing,

    pp. 6–17, 2002.

[6] A. Bertoni and G. Valentini, “Model order selection for bio-molecular data

    clustering,” BMC Bioinformatics, vol. 8(Suppl 2):S7, 2007.

[7] A. Bertoni and G. Valentini, “Discovering multi-level structures in bio-

    molecular data through the Bernstein inequality,” BMC Bioinformatics,

    vol. 9(Suppl 2), 2008.

[8] M. Bittner, P. Meltzer, Y. Chen, Y. Jiang, E. Seftor, M. Hendrix, M. Radmacher,
    R. Simon, Z. Yakhini, A. Ben-Dor, N. Sampas, E. Dougherty, E. Wang, F. Marincola,
    C. Gooden, J. Lueders, A. Glatfelter, P. Pollock, J. Carpten, E. Gillanders, D. Leja,
    K. Dietrich, C. Beaudry, M. Berens, D. Alberts, V. Sondak, M. Hayward, and
    J. Trent, “Molecular classification of cutaneous malignant melanoma by gene
    expression profiling,” Nature, vol. 406, pp. 536–540, 2000.
[9] S. Bubeck, M. Meila, and U. von Luxburg, “How the initialization affects the
    stability of the k-means algorithm,” Draft, http://arxiv.org/abs/0907.5494, 2009.
[10] S. Dasgupta and L. Schulman, “A probabilistic analysis of EM for mixtures of
     separated, spherical Gaussians,” JMLR, vol. 8, pp. 203–226, 2007.
[11] B. Efron and R. Tibshirani, An Introduction to the Bootstrap. Chapman & Hall,
     1993.
[12] J. Fridlyand and S. Dudoit, “Applications of resampling methods to estimate the
     number of clusters and to improve the accuracy of a clustering method,” Technical
     Report 600, Department of Statistics, University of California, Berkeley, 2001.
[13] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning.
     New York: Springer, 2001.
[14] M. K. Kerr and G. A. Churchill, “Bootstrapping cluster analysis: Assessing the
     reliability of conclusions from microarray experiments,” PNAS, vol. 98, no. 16,
     pp. 8961–8965, 2001.
[15] T. Lange, V. Roth, M. Braun, and J. Buhmann, “Stability-based validation of
     clustering solutions,” Neural Computation, vol. 16, no. 6, pp. 1299–1323, 2004.
[16] J. Lember, “On minimizing sequences for k-centres,” Journal of Approximation
     Theory, vol. 120, pp. 20–35, 2003.
[17] E. Levine and E. Domany, “Resampling method for unsupervised estimation of
     cluster validity,” Neural Computation, vol. 13, no. 11, pp. 2573–2593, 2001.
[18] M. Meila, “Comparing clusterings by the variation of information,” in Proceedings
     of the 16th Annual Conference on Computational Learning Theory (COLT),
     (B. Schölkopf and M. Warmuth, eds.), pp. 173–187, Springer, 2003.
[19] U. Möller and D. Radke, “A cluster validity approach based on nearest-neighbor
     resampling,” in Proceedings of the 18th International Conference on Pattern
     Recognition (ICPR), pp. 892–895, Washington, DC, USA: IEEE Computer Society,
     2006.
[20] D. Pollard, “Strong consistency of k-means clustering,” Annals of Statistics, vol. 9,
     no. 1, pp. 135–140, 1981.
[21] D. Pollard, “A central limit theorem for k-means clustering,” Annals of Probability,
     vol. 10, no. 4, pp. 919–926, 1982.
[22] O. Shamir and N. Tishby, “Cluster stability for finite samples,” in Advances in
     Neural Information Processing Systems (NIPS) 21, (J. Platt, D. Koller, Y. Singer,
     and S. Roweis, eds.), Cambridge, MA: MIT Press, 2008.
[23] O. Shamir and N. Tishby, “Model selection and stability in k-means clustering,”
     in Proceedings of the 21st Annual Conference on Learning Theory (COLT),
     (R. Servedio and T. Zhang, eds.), 2008.
[24] O. Shamir and N. Tishby, “On the reliability of clustering stability in the large
     sample regime,” in Advances in Neural Information Processing Systems 21 (NIPS),
     (D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, eds.), 2009.



[25] M. Smolkin and D. Ghosh, “Cluster stability scores for microarray data in

     cancer studies,” BMC Bioinformatics, vol. 36, no. 4, 2003.

[26] A. Strehl and J. Ghosh, “Cluster ensembles — A knowledge reuse framework

     for combining multiple partitions,” JMLR, vol. 3, pp. 583–617, 2002.

[27] N. Vinh and J. Epps, “A novel approach for automatic number of clusters detec-

     tion in microarray data based on consensus clustering,” in Proceedings of the

     Ninth IEEE International Conference on Bioinformatics and Bioengineering,

     pp. 84–91, IEEE Computer Society, 2009.

 
