1. K-Means Algorithm
- Randomly choose $k$ points as the initial centroids; the $i$-th centroid is $\mu_i$
- Divide all points into $k$ groups by assigning each point to the centroid nearest to it
- Move each centroid to the average of the points in its group
- Repeat the assignment and update steps until no centroid changes
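
The steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names, the `max_iters` cap, the fixed `seed`, and the rule that an empty group keeps its previous centroid are all assumptions made for the example, and `k` stands for the number of centroids.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    """Return (centroids, assignments) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Step 1: pick k of the data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = []
    for _ in range(max_iters):
        # Step 2: assign every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 3: move each centroid to the mean of its group.
        new_centroids = []
        for j in range(k):
            group = [p for p, c in zip(points, assignments) if c == j]
            # Assumption: an empty group keeps its previous centroid.
            new_centroids.append(mean_point(group) if group else centroids[j])
        # Step 4: stop once no centroid moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```

On this toy data the two visibly separated groups end up with different labels; which group gets label 0 depends on the random initialization.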
2. Optimization Objective of K-Means
Let $c_{(i)}$ denote the group the $i$-th point belongs to, then our task is
$$
\min_{c,\mu} \quad J(c,\mu)=\sum_{i} (x^{(i)}-\mu_{c_{(i)}})^2
$$
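
As a small worked example, the objective can be evaluated directly for a given assignment. The sketch below assumes that, for points with more than one coordinate, the square is read as the squared Euclidean distance; the function name `distortion` and the toy data are made up for illustration.

```python
def distortion(points, centroids, assignments):
    """Sum of squared distances from each point to its assigned centroid."""
    total = 0.0
    for x, c in zip(points, assignments):
        mu = centroids[c]  # centroid of the group that x belongs to
        total += sum((a - b) ** 2 for a, b in zip(x, mu))
    return total


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
assignments = [0, 0, 1, 1]  # group index of each point

J = distortion(points, centroids, assignments)
print(J)  # 4.0: every point is at distance 1 from its centroid
```

Moving the last point into group 0 would raise $J$ to 104, which is why the assignment step always picks the nearest centroid.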
3. Random Initialization
Randomly pick $k$ examples as the initial centroids, where $k$ is the number of centroids
K-Means may get stuck in a local optimum: initialize and run K-Means many times, then pick the solution with the lowest $J$
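
The restart strategy can be sketched as follows. The helper names, the default of 20 restarts, the fixed seed, and the empty-group rule are assumptions for the example, not part of the notes.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def run_k_means(points, k, rng, max_iters=100):
    """One K-Means run; returns (J, centroids, labels)."""
    centroids = rng.sample(points, k)  # random initialization from the examples
    for _ in range(max_iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            group = [p for p, c in zip(points, labels) if c == j]
            # Assumption: an empty group keeps its previous centroid.
            if group:
                updated.append(tuple(sum(v) / len(group) for v in zip(*group)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    cost = sum(sq_dist(p, centroids[c]) for p, c in zip(points, labels))
    return cost, centroids, labels


def best_of_restarts(points, k, n_restarts=20, seed=0):
    """Run K-Means n_restarts times and keep the run with the lowest J."""
    rng = random.Random(seed)
    runs = [run_k_means(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[0])


points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
]
best_J, best_centroids, best_labels = best_of_restarts(points, k=3)
print(best_J, best_centroids)
```

On this data a single run can stall with two centroids inside one blob and one centroid shared by the other two, so comparing $J$ across runs is what filters out such solutions.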
4. Choose the Number of Clusters
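Use the elbow method (plot $J$ against the number of clusters and pick the point where the decrease levels off), or choose the number based on the later purpose of the clustering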
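
A rough sketch of how the elbow curve could be computed: for each candidate number of clusters, run K-Means a few times and record the lowest $J$. The function names, the candidate range, the restart count, and the toy data are assumptions for illustration; in practice the resulting values would be plotted and inspected by eye.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def k_means_cost(points, k, rng, max_iters=100):
    """Run K-Means once from a random initialization and return its J."""
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            group = [p for p, c in zip(points, labels) if c == j]
            # Assumption: an empty group keeps its previous centroid.
            if group:
                updated.append(tuple(sum(v) / len(group) for v in zip(*group)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points)


def elbow_curve(points, k_values, n_restarts=10, seed=0):
    """Map each candidate k to the lowest J found over n_restarts runs."""
    rng = random.Random(seed)
    return {
        k: min(k_means_cost(points, k, rng) for _ in range(n_restarts))
        for k in k_values
    }


points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
]
costs = elbow_curve(points, range(1, 6))
for k, J in costs.items():
    print(k, round(J, 2))
```

The value of `k` after which $J$ stops dropping sharply is the elbow candidate.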