Clustering Analysis
K-means
Some basic terms
- centroid: the center (mean point) of the samples in a cluster
- medoid: the most representative or most frequently occurring point in a cluster
Steps
- Randomly pick $k$ centroids from the sample points as the initial cluster centers.
- Assign each sample to the nearest centroid $\mu^{(j)},\ j \in \{1, \dots, k\}$.
- Move each centroid to the center of the samples that were assigned to it.
- Repeat steps 2 and 3 until the cluster assignments do not change, a user-defined tolerance is met, or a maximum number of iterations is reached (a minimal sketch of this loop follows the list).
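A minimal NumPy sketch of these four steps, assuming the data is an `(n_samples, n_features)` array; the function name and parameters are illustrative, not a reference implementation:

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-4, seed=0):
    """A minimal k-means loop; X is an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick k distinct samples as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each sample to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of the samples assigned to it
        # (empty clusters are not handled in this sketch)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop once the centroids move less than the tolerance
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return centroids, labels
```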
SSE
Based on the Euclidean distance metric, we can describe the k-means algorithm as a simple optimization problem: an iterative approach for minimizing the within-cluster sum of squared errors (SSE), which is sometimes also called cluster inertia.
$$d(x, y)^2 = \sum_{j=1}^{m} (x_j - y_j)^2 = ||x - y||_2^2$$

$$SSE = \sum_{i=1}^{n}\sum_{j=1}^{k} w^{(i,j)} ||x^{(i)} - \mu^{(j)}||_2^2$$
Here, $\mu^{(j)}$ is the representative point (centroid) for cluster $j$, and $w^{(i,j)} = 1$ if the sample $x^{(i)}$ is in cluster $j$, and $w^{(i,j)} = 0$ otherwise.
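A short NumPy sketch of this formula, assuming `labels` and `centroids` as produced by the `kmeans` sketch above (the names are illustrative):

```python
import numpy as np

def cluster_sse(X, labels, centroids):
    """Within-cluster sum of squared errors (cluster inertia)."""
    # x^(i) - mu^(j) for the centroid each sample was assigned to
    diffs = X - centroids[labels]
    # sum of squared Euclidean distances over all samples
    return float((diffs ** 2).sum())
```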
K-means++
Place the initial centroids far away from each other via the k-means++ algorithm (a usage sketch follows the steps below).
Steps
- Initialize an empty set $M$ to store the $k$ centroids being selected.
- Randomly choose the first centroid $\mu^{(j)}$ from the input samples and assign it to $M$.
- For each sample $x^{(i)}$ that is not in $M$, find the minimum squared distance $d(x^{(i)}, M)^2$ to any of the centroids in $M$.
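In practice the full selection loop is rarely hand-coded; scikit-learn's `KMeans` supports this initialization via `init='k-means++'`. A small usage sketch (the data array `X` and the parameter values are assumptions):

```python
from sklearn.cluster import KMeans

# k-means with k-means++ initialization of the centroids
km = KMeans(n_clusters=3, init='k-means++', n_init=10,
            max_iter=300, tol=1e-4, random_state=0)
labels = km.fit_predict(X)   # cluster index for each sample
print(km.inertia_)           # within-cluster SSE (cluster inertia) of the fit
```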