Clustering
Unsupervised learning introduction
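In a supervised learning problem, we are given a training set with labels and fit a hypothesis to it. In unsupervised learning, the data has no labels, and the algorithm has to find structure in the data by itself, for example by grouping it into clusters.
K-means algorithm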
- The goal is to group the data into coherent subsets.
- These coherent subsets are called clusters.
- The first step is to randomly initialize two points, called the cluster centroids (two, because I want to group my data into two clusters). K-means is an iterative algorithm and does two things in each iteration: the first is a cluster assignment step, the second is a move centroid step. In the cluster assignment step, the algorithm goes through the data set and colors each point with the color of one of the cluster centroids (red or blue), depending on which centroid the point is closer to.
- The second step in the inner loop of K-means is the move centroid step: take the two cluster centroids (red and blue) and move each one to the average of the points colored the same color. (Compute the mean of the points of each color, and move that centroid there.)
- Then repeat the inner loop (cluster assignment, then move centroid) until the centroids stop moving.
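The two steps above can be sketched in plain Python. This is only a minimal sketch, not a reference implementation: the function names (`assign_clusters`, `move_centroids`, `k_means`), the toy data, the choice to initialize the centroids at randomly picked training examples, and the rule that an empty cluster keeps its old centroid are all assumptions made for illustration.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign_clusters(points, centroids):
    """Cluster assignment step: index of the closest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]

def move_centroids(points, assignments, centroids):
    """Move centroid step: each centroid moves to the mean of its points."""
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, c in zip(points, assignments) if c == k]
        if not members:
            # Assumption: a centroid with no assigned points stays where it is.
            updated.append(old)
        else:
            updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return updated

def k_means(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Assumption: initialize the centroids at k distinct training examples.
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        assignments = assign_clusters(points, centroids)
        updated = move_centroids(points, assignments, centroids)
        if updated == centroids:  # centroids stopped moving
            break
        centroids = updated
    return centroids, assign_clusters(points, centroids)

# Toy data: two visually separate groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random initialization.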
Optimization objective
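The optimization objective of K-means, where $c^{(i)}$ is the index of the cluster to which example $x^{(i)}$ is currently assigned and $\mu_k$ is cluster centroid $k$: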
$$J(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K)=\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}-\mu_{c^{(i)}}\right\|^2$$
$$\min_{c^{(1)},\dots,c^{(m)},\,\mu_1,\dots,\mu_K} J(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K)$$
- J is called the distortion cost function, or the distortion of the K-means algorithm.
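To make the distortion concrete, here is a small sketch that evaluates J for a given assignment and set of centroids: the average squared distance from each example to the centroid of the cluster it is assigned to. The function name `distortion` and the toy numbers are assumptions chosen so the result is easy to check by hand.

```python
def distortion(points, assignments, centroids):
    """J = (1/m) * sum over i of ||x_i - mu_{c_i}||^2."""
    m = len(points)
    total = 0.0
    for x, c in zip(points, assignments):
        total += sum((a - b) ** 2 for a, b in zip(x, centroids[c]))
    return total / m

# Four points, two clusters; every point is at distance 1 from its centroid.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
assignments = [0, 0, 1, 1]
centroids = [(1.0, 0.0), (11.0, 0.0)]
cost = distortion(points, assignments, centroids)
print(cost)  # 1.0
```

Each squared distance is 1, so the sum is 4 and dividing by m = 4 gives 1.0.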
Random initialization
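How to initialize K-means, and how to help K-means avoid local optima.
Running K-means with many different random initializations and keeping the clustering with the lowest distortion J can noticeably improve the result when the number of clusters is small. When the number of clusters is large, extra random initializations tend to help less.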
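A rough sketch of the multiple-random-initialization idea: run K-means several times from different random starting centroids and keep the run with the lowest distortion. The helper names (`run_k_means`, `best_of_n`), the value `n_init=50`, the toy data, and initializing at randomly chosen training examples are assumptions for illustration only.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def closest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def run_k_means(points, k, rng, max_iters=100):
    """One K-means run from a random start; returns (distortion, centroids, labels)."""
    centroids = rng.sample(points, k)  # assumption: start at k distinct examples
    for _ in range(max_iters):
        labels = [closest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # assumption: empty cluster stays put
        if updated == centroids:
            break
        centroids = updated
    labels = [closest(p, centroids) for p in points]
    cost = sum(sq_dist(p, centroids[c]) for p, c in zip(points, labels)) / len(points)
    return cost, centroids, labels

def best_of_n(points, k, n_init=50, seed=0):
    """Keep the run with the lowest distortion among n_init random starts."""
    rng = random.Random(seed)
    runs = [run_k_means(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[0])

# Three well-separated pairs; a bad start can merge two pairs into one cluster.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
best_cost, best_centroids, best_labels = best_of_n(points, k=3)
print(best_cost, best_centroids)
```

A single run on this data can get stuck with two centroids inside one pair; taking the minimum over many runs makes it very likely that one of them recovers the three pairs.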
Choosing the number of clusters
How to choose the number of clusters, that is, the value of the parameter K.
Elbow method: plot the distortion J against the number of clusters K. The distortion goes down rapidly at first and then goes down slowly after that; choose K at the bend (the elbow of the curve).
Sometimes you are running K-means to get clusters to use for some later (downstream) purpose. In that case, evaluate K-means based on a metric for how well it performs for that later purpose.
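As a sketch of how the elbow curve might be produced, the snippet below computes the distortion for several values of K and prints it, so the point where the decrease slows down can be read off. The names (`k_means_cost`, `elbow_curve`), the 30 random starts per K, and the toy data are assumptions, not part of the notes.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def closest(p, centroids):
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def k_means_cost(points, k, rng, max_iters=100):
    """Run K-means once from a random start and return its distortion J."""
    centroids = rng.sample(points, k)  # assumption: start at k distinct examples
    for _ in range(max_iters):
        labels = [closest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # assumption: empty cluster stays put
        if updated == centroids:
            break
        centroids = updated
    return sum(sq_dist(p, centroids[closest(p, centroids)]) for p in points) / len(points)

def elbow_curve(points, max_k, n_init=30, seed=0):
    """Lowest distortion found for each K from 1 to max_k."""
    rng = random.Random(seed)
    return {
        k: min(k_means_cost(points, k, rng) for _ in range(n_init))
        for k in range(1, max_k + 1)
    }

# Three well-separated pairs, so the elbow is expected at K = 3.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
costs = elbow_curve(points, max_k=4)
for k, cost in costs.items():
    print(k, round(cost, 4))
```

On this data the distortion falls steeply from K = 1 to K = 3 and only slightly from K = 3 to K = 4. On real data the curve is often smoother and the elbow is ambiguous, which is when the downstream metric is the better guide.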