Chapter 8 Clustering
1 Unsupervised Learning
- Unlabeled data
- Clustering: a cluster is a set of points, a small "tribe" formed by points that group together
2 K-Means Algorithm
- an iterative algorithm with two steps:
- cluster assignment
- move centroid
- Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \cdots, \mu_K \in \mathbb{R}^n$
Repeat {
    (1) Cluster assignment step
        for $i = 1$ to $m$
            $c^{(i)}$ := index (from $1$ to $K$) of the cluster centroid closest to $x^{(i)}$, i.e.
            $c^{(i)} = \mathop{\text{arg min}}\limits_{k} ||x^{(i)} - \mu_k||^2$
    (2) Move centroid step
        for $k = 1$ to $K$
            $\mu_k$ := average (mean) of the points assigned to cluster $k$
}
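The two steps above can be sketched in NumPy. This is a minimal illustration, not the course's reference implementation; the function name `k_means` and the empty-cluster guard are my own choices:

```python
import numpy as np

def k_means(X, K, n_iters=100, seed=None):
    """One run of K-means on an (m, n) data matrix X; a minimal sketch."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Randomly initialize K centroids as K distinct training examples
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iters):
        # (1) Cluster assignment: c[i] = index of the centroid closest to x^(i)
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # (2) Move centroids: mean of the points assigned to each cluster
        for k in range(K):
            if np.any(c == k):  # leave a centroid in place if its cluster is empty
                mu[k] = X[c == k].mean(axis=0)
    return c, mu
```

Running more iterations than needed is harmless here: once the assignments stop changing, the centroids stop moving too.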
3 Optimization Objective
$c^{(i)}$ = index of the cluster ($1, 2, \cdots, K$) to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
$\mu_{c^{(i)}}$ = cluster centroid of the cluster to which example $x^{(i)}$ has been assigned
- Distortion function:
$$\mathop{\text{min}}\limits_{\begin{aligned}c^{(1)},\cdots,c^{(m)}\\\mu_1,\cdots,\mu_K\ \ \end{aligned}} J(c^{(1)},\cdots,c^{(m)},\mu_1,\cdots,\mu_K)=\frac{1}{m}\sum_{i=1}^m{||x^{(i)}-\mu_{c^{(i)}}||}^2$$
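For a fixed assignment, the distortion is just the mean squared distance from each example to its assigned centroid. A short NumPy sketch (the function name is my own):

```python
import numpy as np

def distortion(X, c, mu):
    # J = (1/m) * sum_i ||x^(i) - mu_{c^(i)}||^2
    X, mu, c = np.asarray(X, float), np.asarray(mu, float), np.asarray(c)
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))
```

Note that `mu[c]` broadcasts the centroid of each example's assigned cluster into an (m, n) array, so the whole sum is one vectorized expression.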
4 Random Initialization
- Should have $K < m$ (number of cluster centroids < number of training examples)
- Randomly pick $K$ training examples
- Set $\mu_1, \cdots, \mu_K$ equal to these $K$ examples
To deal with the local-minimum problem, when $K$ is small, run the K-means algorithm many times, each time with a fresh random initialization; then compare the results of the runs and pick the one with the smallest cost function.
- for i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)},\cdots,c^{(m)},\mu_1,\cdots,\mu_K$
    Compute cost function (distortion) $J(c^{(1)},\cdots,c^{(m)},\mu_1,\cdots,\mu_K)$
}
Pick the clustering that gave the lowest cost $J(c^{(1)},\cdots,c^{(m)},\mu_1,\cdots,\mu_K)$
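A self-contained sketch of this restart loop, assuming nothing beyond NumPy (the helper names and the seed scheme are my own, not from the course):

```python
import numpy as np

def kmeans_once(X, K, n_iters=50, seed=None):
    # One K-means run: random init from the training examples, then iterate
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iters):
        c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
        for k in range(K):
            if np.any(c == k):
                mu[k] = X[c == k].mean(axis=0)
    J = np.mean(np.sum((X - mu[c]) ** 2, axis=1))  # distortion of this run
    return c, mu, J

def best_of_runs(X, K, n_runs=100, seed=0):
    # Repeat with fresh random initializations; keep the lowest-cost clustering
    runs = [kmeans_once(X, K, seed=seed + i) for i in range(n_runs)]
    return min(runs, key=lambda run: run[2])
```

Each run may land in a different local minimum of $J$; taking the minimum over runs is what makes the procedure robust for small $K$.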
5 Choosing the Number of Clusters
- Elbow method: run K-means for a range of values of $K$ and plot the distortion $J$ against $K$; the point where the curve stops dropping sharply (the "elbow") suggests a good choice of $K$
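The elbow method can be sketched as a loop over candidate values of $K$, taking the best of a few random restarts for each. A self-contained NumPy sketch (all names are my own):

```python
import numpy as np

def kmeans_distortion(X, K, n_iters=50, seed=None):
    # Compact K-means run that returns only the final distortion J
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iters):
        c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
        for k in range(K):
            if np.any(c == k):
                mu[k] = X[c == k].mean(axis=0)
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))

def elbow_curve(X, K_max, n_runs=10):
    # J(K) for K = 1..K_max, taking the best of n_runs random initializations
    return [min(kmeans_distortion(X, K, seed=r) for r in range(n_runs))
            for K in range(1, K_max + 1)]
```

Plotting the returned list against $K$ and picking the bend is the manual step; the method is a heuristic and does not always produce a clear elbow.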
6 References
- Andrew Ng, Machine Learning, Coursera
- Huang Haiguang (黄海广), Machine Learning Notes