3.2 The Actual K-Means Algorithm
In this section we want to study the actual K-means algorithm. In particular, we want to investigate when and how it gets stuck in different local optima. The general insight is that even though, from an algorithmic point of view, it is an annoying property of the K-means algorithm that it can get stuck in different local optima, this property might actually help us for the purpose of model selection. We now want to focus on the effect of the random initialization of the K-means algorithm. For simplicity, we ignore sampling artifacts and assume that we always work with “infinitely many” data points; that is, we work on the underlying distribution directly.

The following observation is the key to our analysis. Assume we are given a data set with Ktrue well-separated clusters, and assume that we initialize the K-means algorithm with Kinit ≥ Ktrue initial centers. The key observation is that if there is at least one initial center in each of the underlying clusters, then the initial centers tend to stay in the clusters they had been placed in. This means that during the course of the K-means algorithm, cluster centers are only re-adjusted within the underlying clusters and do not move between them. If this property holds, then the final clustering result is essentially determined by the number of initial centers in each of the true clusters. In particular, if we call the number of initial centers per cluster the initial configuration, one can say that each initial configuration leads to a unique clustering, and different configurations lead to different clusterings; see Figure 3.3 for an illustration. Thus, if the initialization method used in K-means regularly leads to different initial configurations, then we observe instability. In [9], the first results in this direction were proved. They are still preliminary in the sense that so far, proofs only exist for a simple setting. However, we believe that the results also hold in a more general context.

Fig. 3.3 Different initial configurations and the corresponding outcomes of the K-means algorithm. Panel (a): the two boxes in the top row depict a data set with three clusters and four initial centers. Both boxes show different realizations of the same initial configuration. As can be seen in the bottom row, both initializations lead to the same K-means clustering. Panel (b): here the initial configuration is different from the one in Panel (a), which leads to a different K-means clustering.
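To make the configuration argument concrete, here is a minimal simulation sketch (our own illustration, not code from [9]): three well-separated clusters, four initial centers, and K-means started from explicitly placed initial positions. The cluster locations, spreads, and the use of scikit-learn's KMeans are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
means = np.array([[-10.0, 0.0], [0.0, 0.0], [10.0, 0.0]])   # three well-separated clusters
X = np.vstack([m + 0.5 * rng.standard_normal((300, 2)) for m in means])

def run_from_configuration(config, rng):
    """Place config[k] initial centers inside true cluster k, then run K-means."""
    init = np.vstack([means[k] + 0.5 * rng.standard_normal((c, 2))
                      for k, c in enumerate(config)])
    km = KMeans(n_clusters=init.shape[0], init=init, n_init=1).fit(X)
    # Summarize the solution by the (rounded, sorted) x-coordinates of the final centers.
    return np.sort(np.round(km.cluster_centers_[:, 0]))

# Two different placements of the same configuration (2, 1, 1) give the same solution;
# the different configuration (1, 1, 2) gives a different solution.
print(run_from_configuration((2, 1, 1), rng))
print(run_from_configuration((2, 1, 1), rng))
print(run_from_configuration((1, 1, 2), rng))
```

In this sketch the outcome is determined by how many initial centers land in each true cluster, not by their exact positions, which is exactly the configuration argument described above.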
Theorem 3.6 (Stability of the actual K-means algorithm). Assume that the underlying distribution P is a mixture of two well-separated Gaussians on R. Denote the means of the Gaussians by µ1 and µ2.

(1) Assume that we run the K-means algorithm with K = 2 and that we use an initialization scheme that places one initial center in each of the true clusters (with high probability). Then the K-means algorithm is stable in the sense that with high probability, it terminates in a solution with one center close to µ1 and one center close to µ2.

(2) Assume that we run the K-means algorithm with K = 3 and that we use an initialization scheme that places at least one of the initial centers in each of the true clusters (with high probability). Then the K-means algorithm is instable in the sense that with probability close to 0.5 it terminates in a solution that considers the first Gaussian as one cluster but splits the second Gaussian into two clusters; and with probability close to 0.5 it does it the other way round.

Fig. 3.4 Stable regions used in the proof of Theorem 3.6. See text for details.
Proof idea. The idea of this proof is best described with Figure 3.4. In the case of Kinit = 2 one has to prove that if one center lies in a large region around µ1 and the second center in a similar region around µ2, then the next step of K-means does not move the centers out of their regions (in Figure 3.4, these regions are indicated by the black bars). If this is true, and if we know that there is one initial center in each of the regions, the same is true when the algorithm stops. Similarly, in the case of Kinit = 3, one proves that if there are two initial centers in the first region and one initial center in the second region, then all centers stay in their regions in one step of K-means.
All that is left to do now is to find an initialization scheme that satisfies the conditions in Theorem 3.6. Luckily, we can adapt a scheme that has already been used by Dasgupta and Schulman [10]. For simplicity, assume that all clusters have similar weights (for the general case see [9]), and that we want to select K initial centers for the K-means algorithm. Then the following initialization should be used (a code sketch is given after the procedure):
Initialization (I):

(1) Select L preliminary centers uniformly at random from the given data set, where L ≈ K log(K).

(2) Run one step of K-means, that is, assign the data points to the preliminary centers and re-adjust the centers once.

(3) Remove all centers for which the mass of the assigned data points is smaller than p0 ≈ 1/L.

(4) Among the remaining centers, select K centers by the following procedure:
    (a) Choose the first center uniformly at random.
    (b) Repeat until K centers are selected: select the next center as the one that maximizes the minimum distance to the centers already selected.
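The following is a minimal sketch of Initialization (I) under stated assumptions: Euclidean distance, L = ⌈K log K⌉, and a concrete mass threshold p0 = 1/(2L) (the text only specifies p0 ≈ 1/L); the exact constants used in [9] may differ.

```python
import numpy as np

def initialization_I(X, K, rng=None):
    """Sketch of Initialization (I) for selecting K initial K-means centers.

    Assumptions not fixed by the text: Euclidean distance, L = ceil(K*log(K)),
    and the mass threshold p0 = 1/(2L); the constants in [9] may differ.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]

    # (1) Select L preliminary centers uniformly at random from the data set.
    L = max(K, int(np.ceil(K * np.log(K))))
    centers = X[rng.choice(n, size=L, replace=False)]

    # (2) One step of K-means: assign points to the nearest preliminary center
    #     and re-adjust each center to the mean of its assigned points.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    masses = np.bincount(labels, minlength=L) / n
    centers = np.array([X[labels == j].mean(axis=0) if masses[j] > 0 else centers[j]
                        for j in range(L)])

    # (3) Remove centers whose assigned mass is below the threshold p0.
    p0 = 1.0 / (2 * L)
    centers = centers[masses > p0]

    # (4) Greedy farthest-first selection of K centers among the remaining ones:
    #     (a) first center uniformly at random, (b) then repeatedly the center
    #     maximizing the minimum distance to the centers chosen so far.
    chosen = [centers[rng.integers(len(centers))]]
    while len(chosen) < K:
        d = np.linalg.norm(centers[:, None, :] - np.array(chosen)[None, :, :], axis=2)
        chosen.append(centers[d.min(axis=1).argmax()])
    return np.array(chosen)
```

For one-dimensional data as in Theorems 3.6 and 3.7, X should be passed as an array of shape (n, 1).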
One can prove that this initialization scheme satisfies the conditions needed in Theorem 3.6 (for exact details see [9]). |
Theorem 3.7 (Initialization). Assume we are given a mixture of Ktrue well-separated Gaussians in R, and denote the centers of the Gaussians by µi. If we use Initialization (I) to select Kinit centers, then there exist Ktrue disjoint regions Ak with µk ∈ Ak, such that every one of the Kinit selected centers is contained in one of the Ak and

• if Kinit = Ktrue, each Ak contains exactly one center,
• if Kinit < Ktrue, each Ak contains at most one center,
• if Kinit > Ktrue, each Ak contains at least one center.
Proof sketch. The following statements can be proved to hold with high probability. By selecting Ktrue log(Ktrue) preliminary centers, each of the Gaussians receives at least one of these centers. By running one step of K-means and removing the centers with too small mass, one removes all preliminary centers that sit on outliers. Moreover, one can prove that “ambiguous centers” (that is, centers that sit between two clusters) attract only few data points and will be removed as well. Next one shows that centers that are “unambiguous” are reasonably close to a true cluster center µk. Consequently, the method for selecting the final centers from the remaining preliminary ones “cycles through different Gaussians” before visiting a particular Gaussian for the second time.
When combined, the results of Theorems 3.6 and 3.7 show that if the data set contains Ktrue well-separated clusters, then the K-means algorithm is stable if it is started with the true number of clusters, and instable if the number of clusters is too large. Unfortunately, in the case where K is too small one cannot make any useful statement about stability because the aforementioned configuration argument does not hold any more. In particular, initial cluster centers do not stay inside their initial clusters, but move out of them. Often, the final centers constructed by the K-means algorithm lie in between several true clusters, and it is very hard to predict the final positions of the centers from the initial ones. This can be seen in the example shown in Figure 3.5. We consider two data sets from a mixture of three Gaussians. The only difference between the two data sets is that in the left plot all mixture components have the same weight, while in the right plot the top right component has a larger weight than the other two components. One can verify experimentally that if initialized with Kinit = 2, the K-means algorithm is rather stable in the left figure (it always merges the top two clusters), but it is instable in the right figure (sometimes it merges the top two clusters, sometimes the left two clusters). This example illustrates that if the number of clusters is too small, subtle differences in the distribution can decide on stability or instability of the actual K-means algorithm.
Fig. 3.5 Illustration for the case where K is too small. We consider two data sets that have been drawn from a mixture of three Gaussians with means µ1 = (−5, −7), µ2 = (−5, 7), µ3 = (5, 7) and unit variances. In the left figure, all clusters have the same weight 1/3, whereas in the right figure the top right cluster has larger weight 0.6 than the other two clusters with weights 0.2 each. If we run K-means with K = 2, we can verify experimentally that the algorithm is pretty stable if applied to points from the distribution in the left figure. It nearly always merges the top two clusters. On the distribution shown in the right figure, however, the algorithm is instable. Sometimes the top two clusters are merged, and sometimes the left two clusters. |
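As a hedged sketch (not code from the text), the experiment behind Figure 3.5 can be reproduced roughly as follows. The sample size, the number of random restarts, and the heuristic used to decide which pair of true components got merged are our own choices for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

means = np.array([[-5.0, -7.0], [-5.0, 7.0], [5.0, 7.0]])   # component means from Fig. 3.5

def sample(weights, n=1500, rng=None):
    """Draw n points from the mixture of three unit-variance Gaussians."""
    rng = np.random.default_rng(rng)
    comps = rng.choice(3, size=n, p=weights)
    return means[comps] + rng.standard_normal((n, 2))

def merged_pair(X, seed):
    """Run K-means with K = 2 from one random initialization and report which
    two true components end up sharing a K-means cluster."""
    km = KMeans(n_clusters=2, init="random", n_init=1, random_state=seed).fit(X)
    comp = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2).argmin(axis=1)
    # For each true component, find the K-means cluster that most of its points join.
    vote = [np.bincount(km.labels_[comp == k], minlength=2).argmax() for k in range(3)]
    merged = [k for k in range(3) if vote.count(vote[k]) == 2]
    return tuple(merged) if len(merged) == 2 else "other"

for weights in ([1 / 3, 1 / 3, 1 / 3], [0.2, 0.2, 0.6]):
    X = sample(np.array(weights), rng=0)
    outcomes = [merged_pair(X, seed) for seed in range(50)]
    print(weights, {o: outcomes.count(o) for o in set(outcomes)})
```

With equal weights one should see the top pair (components 1 and 2) merged in nearly every restart, whereas with weights (0.2, 0.2, 0.6) the merged pair should switch between the top pair and the left pair, matching the stable/instable behavior described in the caption.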
In general, we expect that the following statements hold (but they have not yet been proved in a context more general than in Theorems 3.6 and 3.7).
Conjecture 3.8 (Stability of the actual K-means algorithm). Assume that the underlying distribution has Ktrue well-separated clusters, and that these clusters can be represented by a center-based clustering model. Then, if one uses Initialization (I) to construct Kinit initial centers, the following statements hold:

• If Kinit = Ktrue, we have one center per cluster, with high probability. The clustering results are stable.
• If Kinit > Ktrue, different initial configurations occur. By the above argument, different configurations lead to different clusterings, so we observe instability.
• If Kinit < Ktrue, then depending on subtle differences in the underlying distribution we can have either stability or instability.
3.3 Relationships Between the Results
In this section we discuss conceptual aspects of the results and relate them to each other. |
3.3.1 Jittering versus Jumping
There are two main effects that lead to instability of the K-means algorithm. Both effects are visualized in Figure 3.6.

Jittering of the cluster boundaries. Consider a fixed local (or global) optimum of Q_K^(∞) and the corresponding clustering on different random samples. Due to the fact that different samples lead to slightly different positions of the cluster centers, the cluster boundaries “jitter”. That is, the cluster boundaries corresponding to different samples are slightly shifted with respect to one another. We call this behavior the “jittering” of a particular clustering solution. For the special case of the global optimum, this jittering has been investigated in Sections 3.1.2 and 3.1.3. It has been established that different parameters K lead to different amounts of jittering (measured in terms of rescaled instability). The jittering is larger if the cluster boundaries are in a high-density region and smaller if the cluster boundaries are in low-density regions of the space. The main “source” of jittering is the sampling variation.

Fig. 3.6 The x-axis depicts the space of all clusterings for a fixed distribution P and for a fixed parameter K (this is an abstract sketch only). The y-axis shows the value of the objective function of the different solutions. The solid line corresponds to the true limit objective function Q_K^(∞); the dotted lines show the sample-based objective function on different samples. The idealized K-means algorithm only studies the jittering of the global optimum, that is, how far the global optimum varies due to the sampling process. The jumping between different local optima is induced by different random initializations, as investigated for the actual K-means algorithm.

Jumping between different local optima. By “jumping” we refer to the fact that an algorithm terminates in different local optima. Investigating jumping has been the major goal in Section 3.2. The main source of jumping is the random initialization. If we initialize the K-means algorithm in different configurations, we end in different local optima. The key point in favor of clustering stability is that one can relate the number of local optima of Q_K^(∞) to whether the number K of clusters is correct or too large (this has happened implicitly in Section 3.2).
3.3.2 Discussion of the Main Theorems
Theorem 3.1 works in the idealized setting. In Part 1 it shows that if the underlying distribution is not symmetric, the idealized clustering results are stable in the sense that different samples always lead to the same clustering. That is, no jumping between different solutions takes place. In hindsight, this result can be considered as an artifact of the idealized clustering scenario. The idealized K-means algorithm artificially excludes the possibility of ending in different local optima. Unless
there exist several global optima, jumping between different solutions cannot happen. In particular, the conclusion that clustering results are stable for all values of K does not carry over to the realistic K-means algorithm (as can be seen from the results in Section 3.2). Put plainly, even though the idealized K-means algorithm with K = 2 is stable in the example of Figure 3.1a, the actual K-means algorithm is instable. Part 2 of Theorem 3.1 states that if the objective function has several global optima, for example due to symmetry, then jumping takes place even for the idealized K-means algorithm and results in instability. In the setting of the theorem, the jumping is merely induced by having different random samples. However, a similar result can be shown to hold for the actual K-means algorithm, where it is induced by the random initialization. Namely, if the underlying distribution is perfectly symmetric, then “symmetric initializations” lead to the different local optima corresponding to the different symmetric solutions.

To summarize, Theorem 3.1 investigates whether jumping between different solutions takes place due to the random sampling process. The negative connotation of Part 1 is an artifact of the idealized setting that does not carry over to the actual K-means algorithm, whereas the positive connotation of Part 2 does carry over.

Theorem 3.2 studies how different samples affect the jittering of a unique solution of the idealized K-means algorithm. In general, one can expect that similar jittering takes place for the actual K-means algorithm as well. In this sense, we believe that the results of this theorem can be carried over to the actual K-means algorithm. However, if we reconsider the intuition stated in the introduction and depicted in Figure 1.1, we realize that jittering was not really what we had been looking for. The main intuition in the beginning was that the algorithm might jump between different solutions, and that such jumping shows that the underlying parameter K is wrong. In practice, stability is usually computed for the actual K-means algorithm with random initialization and on different samples. Here both effects (jittering and jumping) and both random processes (random samples and random initialization) play a role. We suspect that the effect of jumping to different local optima due to different initialization has a higher impact on stability than the jittering of a particular solution due to sampling
variation. Our reason to believe so is that the distance between two clusterings is usually higher if the two clusterings correspond to different local optima than if they correspond to the same solution with a slightly shifted boundary. To summarize, Theorem 3.2 describes the jittering behavior of an individual solution of the idealized K-means algorithm. We believe that similar effects take place for the actual K-means algorithm. However, we also believe that the influence of jittering on stability plays a minor role compared to the one of jumping.

Theorem 3.6 investigates the jumping behavior of the actual K-means algorithm. As the source of jumping, it considers the random initialization only. It does not take into account variations due to random samples (this is hidden in the proof, which works on the underlying distribution rather than with finitely many sample points). However, we believe that the results of this theorem also hold for finite samples. Theorem 3.6 is not yet as general as we would like it to be. But we believe that studying the jumping behavior of the actual K-means algorithm is the key to understanding the stability of the K-means algorithm used in practice, and Theorem 3.6 points in the right direction.

Altogether, the results obtained in the idealized and realistic setting perfectly complement each other and describe two sides of the same coin. The idealized setting mainly studies what influence the different samples can have on the stability of one particular solution. The realistic setting focuses on how the random initialization makes the algorithm jump between different local optima. In both settings, stability “pushes” in the same direction: If the number of clusters is too large, results tend to be instable. If the number of clusters is correct, results tend to be stable. If the number of clusters is too small, both stability and instability can occur, depending on subtle properties of the underlying distribution.
4 Beyond K-Means
Most of the theoretical results in the literature on clustering stability have been proved with the K-means algorithm in mind. However, some of them hold for more general clustering algorithms. This is mainly the case for the idealized clustering setting.

Assume a general clustering objective function Q and an ideal clustering algorithm that globally minimizes this objective function. If this clustering algorithm is consistent in the sense that the optimal clustering on the finite sample converges to the optimal clustering of the underlying space, then the results of Theorem 3.1 can be carried over to this general objective function [4]. Namely, if the objective function has a unique global optimum, the clustering algorithm is stable, and it is instable if the objective function has several global minima (for example due to symmetry). It is not too surprising that one can extend the stability results of the K-means algorithm to more general vector-quantization-type algorithms. However, the setup of this theorem is so general that it also holds for completely different algorithms such as spectral clustering. The consistency requirement sounds like a rather strong assumption. But note that clustering algorithms that are not consistent are completely unreliable and should not be used anyway.
Similarly as above, one can also generalize the characterization of instable clusterings stated in Conclusion 3.3, cf. Ben-David and von Luxburg [3]. Again we are dealing with algorithms that minimize an objective function. The consistency requirements are slightly stronger in that we need uniform consistency over the space (or a subspace) of probability distributions. Once such uniform consistency holds, the characterization that instable clusterings tend to have their boundary in high-density regions of the space can be established.

While the two results mentioned above can be carried over to a huge bulk of clustering algorithms, it is not as simple for the refined convergence analysis of Theorem 3.2. Here we need to make one crucial additional assumption, namely the existence of a central limit type result. This is a rather strong assumption which is not satisfied for many clustering objective functions. However, a few results can be established [24]: in addition to the traditional K-means objective function, a central limit theorem can be proved for other variants of K-means such as kernel K-means (a kernelized version of the traditional K-means algorithm) or Bregman divergence clustering (where one selects a set of centroids such that the average divergence between points and centroids is minimized). Moreover, central limit theorems are known for maximum likelihood estimators, which leads to stability results for certain types of model-based clusterings using maximum likelihood estimators. Still the results of Theorem 3.2 are limited to a small number of clustering objective functions, and one cannot expect to be able to extend them to a wide range of clustering algorithms.

Even stronger limitations hold for the results about the actual K-means algorithm. The methods used in Section 3.2 were particularly designed for the K-means algorithm. It might be possible to extend them to more general centroid-based algorithms, but it is not obvious how to advance further. In spite of this shortcoming, we believe that these results hold in a much more general context of randomized clustering algorithms. From a high-level point of view, the actual K-means algorithm is a randomized algorithm due to its random initialization. The randomization is used to explore different local optima of the objective function. There were two key insights in our stability analysis of the actual K-means algorithm: First, we could describe the
“regions of attraction” of different local minima, that is, we could prove which initial centers lead to which solution in the end (this was the configurations idea). Second, we could relate the “size” of the regions of attraction to the number of clusters. Namely, if the number of clusters is correct, the global minimum will have a huge region of attraction in the sense that it is very likely that we will end in the global minimum. If the number of clusters is too large, we could show that there exist several local optima with large regions of attraction. This leads to a significant likelihood of ending in different local optima and observing instability.

We believe that similar arguments can be used to investigate stability of other kinds of randomized clustering algorithms. However, such an analysis always has to be adapted to the particular algorithm under consideration. In particular, it is not obvious whether the number of clusters can always be related to the number of large regions of attraction. Hence it is an open question whether results similar to the ones for the actual K-means algorithm also hold for completely different randomized clustering algorithms.
5 Outlook
Based on the results presented above one can draw a cautiously optimistic picture about model selection based on clustering stability for the K-means algorithm. Stability can discriminate between different values of K, and the values of K that lead to stable results have desirable properties. If the data set contains a few well-separated clusters that can be represented by a center-based clustering, then stability has the potential to discover the correct number of clusters.

An important point to stress is that stability-based model selection for the K-means algorithm can only lead to convincing results if the underlying distribution can be represented by center-based clusters. If the clusters are very elongated or have complicated shapes, the K-means algorithm cannot find a good representation of this data set, regardless of what number K one uses. In this case, stability-based model selection breaks down, too. It is a legitimate question what implications this has in practice. We usually do not know whether a given data set can be represented by center-based clusterings, and often the K-means algorithm is used anyway. In my opinion, however, the question of selecting the “correct” number of clusters is not so important in this case. The only way in which complicated structure can be represented
using K-means is to break each true cluster into several small, spherical clusters and either live with the fact that the true clusters are split into pieces, or use some mechanism to join these pieces afterwards to form a bigger cluster of general shape. In such a scenario it is not so important what number of clusters we use in the K-means step: it does not really matter whether we split an underlying cluster into, say, 5 or 7 pieces.

There are a few technical questions that deserve further consideration. Obviously, the results in Section 3.2 are still somewhat preliminary and should be worked out in more generality. The results in Section 3.1 are large sample results. It is not clear what “large sample size” means in practice, and one can construct examples where the sample size has to be arbitrarily large to make valid statements [3]. However, such examples can either be countered by introducing assumptions on the underlying probability distribution, or one can state that the sample size has to be large enough to ensure that the cluster structure is well-represented in the data and that we do not miss any clusters.

There is yet another limitation that is more severe, namely the number of clusters to which the results apply. The conclusions in Section 3.1 as well as the results in Section 3.2 only hold if the true number of clusters is relatively small (say, on the order of 10 rather than on the order of 100), and if the parameter K used by K-means is of the same order of magnitude. Let us briefly explain why this is the case. In the idealized setting, the limit results in Theorems 3.1 and 3.2 of course hold regardless of what the true number of clusters is. But the subsequent interpretation regarding cluster boundaries in high- and low-density areas breaks down if the number of clusters is too large. The reason is that the influence of one tiny bit of cluster boundary between two clusters is negligible compared to the rest of the cluster boundary if there are many clusters, such that other factors might dominate the behavior of clustering stability. In the realistic setting of Section 3.2, we use an initialization scheme which, with high probability, places centers in different clusters before placing them into the same cluster. The procedure works well if the number of clusters is small. However, the larger the number of clusters, the higher the likelihood that this scheme fails. Similarly problematic is the situation where the true number of clusters is small, but the K-means algorithm is run with a very
large K. Finally, note that similar limitations hold for all model selection criteria. It is simply a very difficult (and pretty useless) question whether a data set contains 100 or 105 clusters, say.

While stability is relatively well-studied for the K-means algorithm, there does not exist much work on the stability of completely different clustering mechanisms. We have seen in Section 4 that some of the results for the idealized K-means algorithm also hold in a more general context. However, this is not the case for the results about the actual K-means algorithm. We consider the results about the actual K-means algorithm as the strongest evidence in favor of stability-based model selection for K-means. Whether this principle can be proved to work well for algorithms very different from K-means is an open question.

An important point we have not discussed in depth is how clustering stability should be implemented in practice. As we have outlined in Section 2, there exist many different protocols for computing stability scores; one common pattern is sketched below. It would be very important to compare and evaluate all these approaches in practice, in particular as there are several unresolved issues (such as the normalization). Unfortunately, a thorough study that compares all different protocols in practice does not exist.
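For concreteness, here is a minimal sketch of one such protocol (our own illustrative choice, not the specific scheme of Section 2): cluster pairs of overlapping subsamples, compare the two clusterings on the points they share, and report the average disagreement for each K. The subsample size, the number of repetitions, and the use of the adjusted Rand index as a similarity measure are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def instability(X, K, n_pairs=20, frac=0.8, rng=None):
    """Average disagreement between clusterings of overlapping random subsamples."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    scores = []
    for _ in range(n_pairs):
        # Draw two subsamples and cluster each with K-means from a random start.
        idx1 = rng.choice(n, size=int(frac * n), replace=False)
        idx2 = rng.choice(n, size=int(frac * n), replace=False)
        common = np.intersect1d(idx1, idx2)
        labels1 = KMeans(n_clusters=K, init="random", n_init=1,
                         random_state=int(rng.integers(1 << 30))).fit(X[idx1]).predict(X[common])
        labels2 = KMeans(n_clusters=K, init="random", n_init=1,
                         random_state=int(rng.integers(1 << 30))).fit(X[idx2]).predict(X[common])
        # Disagreement on the shared points; 0 means identical partitions.
        scores.append(1.0 - adjusted_rand_score(labels1, labels2))
    return float(np.mean(scores))

# Usage sketch (X is a hypothetical data matrix): pick the K with the smallest
# instability score, possibly after a normalization step.
# for K in range(2, 8):
#     print(K, instability(X, K, rng=0))
```

Whether and how to normalize these scores across different values of K is exactly one of the unresolved issues mentioned above.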
References |
[1] S. Ben-David, “A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering,” Machine Learning, vol. 66, pp. 243–257, 2007.
[2] S. Ben-David, D. Pál, and H.-U. Simon, “Stability of k-means clustering,” in Conference on Learning Theory (COLT), (N. Bshouty and C. Gentile, eds.), pp. 20–34, Springer, 2007.
[3] S. Ben-David and U. von Luxburg, “Relating clustering stability to properties of cluster boundaries,” in Proceedings of the 21st Annual Conference on Learning Theory (COLT), (R. Servedio and T. Zhang, eds.), pp. 379–390, Springer, Berlin, 2008.
[4] S. Ben-David, U. von Luxburg, and D. Pál, “A sober look on clustering stability,” in Proceedings of the 19th Annual Conference on Learning Theory (COLT), (G. Lugosi and H. Simon, eds.), pp. 5–19, Springer, Berlin, 2006.
[5] A. Ben-Hur, A. Elisseeff, and I. Guyon, “A stability based method for discovering structure in clustered data,” in Pacific Symposium on Biocomputing, pp. 6–17, 2002.
[6] A. Bertoni and G. Valentini, “Model order selection for bio-molecular data clustering,” BMC Bioinformatics, vol. 8(Suppl 2):S7, 2007.
[7] A. Bertoni and G. Valentini, “Discovering multi-level structures in bio-molecular data through the Bernstein inequality,” BMC Bioinformatics, vol. 9(Suppl 2), 2008.
[8] M. Bittner, P. Meltzer, Y. Chen, Y. Jiang, E. Seftor, M. Hendrix, M. Radmacher, R. Simon, Z. Yakhini, A. Ben-Dor, N. Sampas, E. Dougherty, E. Wang, F. Marincola, C. Gooden, J. Lueders, A. Glatfelter, P. Pollock, J. Carpten, E. Gillanders, D. Leja, K. Dietrich, C. Beaudry, M. Berens, D. Alberts, V. Sondak, M. Hayward, and J. Trent, “Molecular classification of cutaneous malignant melanoma by gene expression profiling,” Nature, vol. 406, pp. 536–540, 2000.
[9] S. Bubeck, M. Meila, and U. von Luxburg, “How the initialization affects the stability of the k-means algorithm,” Draft, http://arxiv.org/abs/0907.5494, 2009.
[10] S. Dasgupta and L. Schulman, “A probabilistic analysis of EM for mixtures of separated, spherical Gaussians,” JMLR, vol. 8, pp. 203–226, 2007.
[11] B. Efron and R. Tibshirani, An Introduction to the Bootstrap. Chapman & Hall, 1993.
[12] J. Fridlyand and S. Dudoit, “Applications of resampling methods to estimate the number of clusters and to improve the accuracy of a clustering method,” Technical Report 600, Department of Statistics, University of California, Berkeley, 2001.
[13] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. New York: Springer, 2001.
[14] M. K. Kerr and G. A. Churchill, “Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments,” PNAS, vol. 98, no. 16, pp. 8961–8965, 2001.
[15] T. Lange, V. Roth, M. Braun, and J. Buhmann, “Stability-based validation of clustering solutions,” Neural Computation, vol. 16, no. 6, pp. 1299–1323, 2004.
[16] J. Lember, “On minimizing sequences for k-centres,” Journal of Approximation Theory, vol. 120, pp. 20–35, 2003.
[17] E. Levine and E. Domany, “Resampling method for unsupervised estimation of cluster validity,” Neural Computation, vol. 13, no. 11, pp. 2573–2593, 2001.
[18] M. Meila, “Comparing clusterings by the variation of information,” in Proceedings of the 16th Annual Conference on Computational Learning Theory (COLT), (B. Schölkopf and M. Warmuth, eds.), pp. 173–187, Springer, 2003.
[19] U. Möller and D. Radke, “A cluster validity approach based on nearest-neighbor resampling,” in Proceedings of the 18th International Conference on Pattern Recognition (ICPR), pp. 892–895, Washington, DC, USA: IEEE Computer Society, 2006.
[20] D. Pollard, “Strong consistency of k-means clustering,” Annals of Statistics, vol. 9, no. 1, pp. 135–140, 1981.
[21] D. Pollard, “A central limit theorem for k-means clustering,” Annals of Probability, vol. 10, no. 4, pp. 919–926, 1982.
[22] O. Shamir and N. Tishby, “Cluster stability for finite samples,” in Advances in Neural Information Processing Systems (NIPS) 21, (J. Platt, D. Koller, Y. Singer, and S. Roweis, eds.), Cambridge, MA: MIT Press, 2008.
[23] O. Shamir and N. Tishby, “Model selection and stability in k-means clustering,” in Proceedings of the 21st Annual Conference on Learning Theory (COLT), (R. Servedio and T. Zhang, eds.), 2008.
[24] O. Shamir and N. Tishby, “On the reliability of clustering stability in the large sample regime,” in Advances in Neural Information Processing Systems 21 (NIPS), (D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, eds.), 2009.
[25] M. Smolkin and D. Ghosh, “Cluster stability scores for microarray data in cancer studies,” BMC Bioinformatics, vol. 36, no. 4, 2003.
[26] A. Strehl and J. Ghosh, “Cluster ensembles — A knowledge reuse framework for combining multiple partitions,” JMLR, vol. 3, pp. 583–617, 2002.
[27] N. Vinh and J. Epps, “A novel approach for automatic number of clusters detection in microarray data based on consensus clustering,” in Proceedings of the Ninth IEEE International Conference on Bioinformatics and Bioengineering, pp. 84–91, IEEE Computer Society, 2009.