Clustering - Random initialization

This article introduces the random initialization procedure in the K-means clustering algorithm and explains why it matters, emphasizing that multiple rounds of random initialization can avoid getting stuck in local optima. By randomly selecting training examples as the initial cluster centroids, it shows that different initializations can lead to different converged results. When K is small, running multiple initializations raises the probability of finding the global optimum; when K is large, the first initialization may already be good enough. In practice, multiple random initializations are recommended for finding a better clustering.

Abstract: This article is the transcript of Lecture 111, "Random Initialization", from Chapter 14, "Unsupervised Learning", of Andrew Ng's Machine Learning course. I wrote it down while watching the videos and lightly edited it to make it more concise and easier to read, for future reference. I am sharing it here; if you spot any mistakes, corrections are welcome and sincerely appreciated. I also hope it is helpful for your own studies.
————————————————

In this video, I'd like to talk about how to initialize K-means. And more importantly, this will lead to a discussion of how to make K-means avoid local optima as well.

Here's the K-means clustering algorithm that we talked about earlier. One step that we never really talked much about was this step of how you randomly initialize the cluster centroids. There are a few different ways that one can use to randomly initialize the cluster centroids. But it turns out that there is one method that is much more recommended than most of the other options. So, let me tell you about that option since it's what often seems to work best.

Here's how I usually initialize my cluster centroids. When running K-means, you should have the number of cluster centroids, K, set to be less than the number of training examples m (K<m). It would be really weird to run K-means with a number of cluster centroids that's equal to or greater than the number of examples you have. So, the way I usually initialize K-means is, I randomly pick K training examples and then set \mu_{1}, ..., \mu_{K} equal to these K examples. Let me show you a concrete example. Let's say K=2. So in this example on the right, let's say I want to find two clusters. What I'm going to do in order to initialize my cluster centroids is randomly pick a couple of examples. Let's say I pick this one and that one (the 2 arrows). The way I'm going to initialize my cluster centroids is to place them right on top of those examples. So the red and blue crosses are my selected cluster centroids, and that's one random initialization of K-means. The one I drew looks like a particularly good one. Sometimes I might get less lucky and maybe I'll end up picking that as my first random initial example, and that as my second one (bottom right figure). Here I'm picking two examples because K=2. So that's how you can randomly initialize your cluster centroids. And so at initialization, your first cluster centroid \mu_{1}=x^{(i)} for some random value of i, \mu_{2}=x^{(j)} for some different randomly chosen value of j, and so on if you have more clusters and more cluster centroids. As a sort of side comment, I should say that in the earlier video where I first illustrated K-means with the animation, just for illustration, I actually used a different method of initialization for my cluster centroids. But the method described on this slide is really the recommended way. So from these two illustrations on the right, you might guess that K-means can end up converging to different solutions depending on exactly how the cluster centroids were initialized. So depending on the random initialization, K-means can end up at different solutions.
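
As a rough sketch of just this initialization step (a NumPy illustration of my own, not code from the lecture), picking K distinct training examples and placing the centroids right on top of them could look like this:

```python
import numpy as np

def init_centroids(X, K):
    """Randomly pick K distinct training examples as the initial centroids.

    X is an (m, n) matrix with one training example per row; as recommended
    above, we assume K < m.
    """
    m = X.shape[0]
    # Permute the example indices and keep the first K, so that
    # mu_1, ..., mu_K sit exactly on top of K randomly chosen examples.
    rand_idx = np.random.permutation(m)[:K]
    return X[rand_idx, :]
```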

And in particular, K-means can actually end up at local optima. If you're given a data set like this, well, it looks like there are three clusters. So, if you run K-means and it ends up at a good local optimum, which might really be the global optimum, you might end up with that clustering. But if you had a particularly unlucky random initialization, K-means can also get stuck at different local optima. So, in this example on the left, it looks like the blue cluster has captured a lot of points on the left, while the red and green clusters are each capturing relatively small numbers of points. And so, this corresponds to a bad local optimum, because it has basically taken these two clusters and merged them into one, and furthermore, it has split the second cluster into two separate sub-clusters. So, both of these examples correspond to different local optima of K-means, and in this example here (lower right), the red cluster has captured only a single unlabeled example. The term local optima, by the way, refers to local optima of the distortion function J, and what these solutions, these local optima, correspond to are really solutions where K-means has gotten stuck at a local optimum and is not doing a very good job minimizing the distortion function J. So, if you're worried about K-means getting stuck in local optima, and if you want to increase the odds of K-means finding the best possible clustering, what we can do is try multiple random initializations. So, instead of just initializing K-means once and hoping that works, we can initialize and run K-means lots of times, and use that to try to make sure we get as good a solution, as good a global optimum, as possible.
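
For reference, the distortion (cost) function that K-means is minimizing, as defined in the earlier video on the optimization objective, is the average squared distance between each example and the centroid of the cluster it is assigned to:

J(c^{(1)},\dots ,c^{(m)},\mu_{1},\dots ,\mu_{K})=\frac{1}{m}\sum_{i=1}^{m}\left \| x^{(i)}-\mu_{c^{(i)}} \right \|^{2}

where c^{(i)} is the index of the cluster to which example x^{(i)} is currently assigned.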

Concretely, here's how you could do that. Let's say I decide to run K-means a hundred times, so I'll execute this loop a hundred times. A fairly typical number of times to run K-means would be something from 50 up to maybe 1000. So let's say you decide to run K-means 100 times. What that means is that, for each of these one hundred iterations, we randomly initialize K-means, run K-means, and that gives us a set of cluster assignments and a set of cluster centroids. Then we compute the distortion J, that is, compute the cost function on the set of cluster assignments and cluster centroids we got. Finally, having done this whole procedure a hundred times, you will have a hundred different ways of clustering the data, and what you do is, out of these hundred ways you have found to cluster the data, just pick the one that gives the lowest cost (this is sketched in code below). It turns out that if you are running K-means with a fairly small number of clusters, say anywhere from 2 to maybe 10, then doing multiple random initializations can sometimes make sure you find a better local optimum, make sure you find a better clustering of the data. But if K is very large, much greater than 10, say if you're trying to find hundreds of clusters, then multiple random initializations are less likely to make a huge difference; there's a much higher chance that your first random initialization will already give you a pretty decent solution, and doing multiple random initializations will probably give you a slightly better solution, but maybe not that much better. It's really in the regime where you have a relatively small number of clusters, especially maybe 2 or 3 or 4 clusters, that multiple random initializations can make a huge difference in terms of making sure you do a good job minimizing the distortion function and getting a good clustering.
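
Here is a rough sketch of this procedure in Python (the lecture itself does not prescribe any library; scikit-learn, the function name, and the parameters below are my own choices). Each run starts from a fresh random initialization that picks K training examples as the starting centroids, and we keep the run with the lowest distortion:

```python
import numpy as np
from sklearn.cluster import KMeans

def best_of_many_inits(X, K, num_inits=100):
    """Run K-means num_inits times from different random initializations
    and keep the clustering with the lowest distortion.

    init='random' picks K training examples as the initial centroids,
    matching the initialization described above; n_init=1 makes each call
    a single run, so the outer loop mirrors the loop in the lecture.
    inertia_ is the sum of squared distances of the examples to their
    assigned centroids, i.e. m times the distortion J, so minimizing it
    is the same as minimizing J.
    """
    best = None
    for _ in range(num_inits):
        km = KMeans(n_clusters=K, init='random', n_init=1).fit(X)
        if best is None or km.inertia_ < best.inertia_:
            best = km
    return best.labels_, best.cluster_centers_

# Example usage on synthetic data with three true clusters:
# X = np.vstack([np.random.randn(50, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
# labels, centroids = best_of_many_inits(X, K=3)
```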

So, that's K-means with random initialization. If you're trying to learn a clustering with a relatively small number of clusters, using multiple random initializations can sometimes help you find a much better clustering of the data. But if you're learning a large number of clusters, even a single random initialization of this kind should give K-means a reasonable starting point for finding a good set of clusters.

<end>
