【笔记】【机器学习基础】凝聚聚类

最新推荐文章于 2022-11-09 21:02:40 发布

'VeNus

最新推荐文章于 2022-11-09 21:02:40 发布

阅读量267

点赞数

分类专栏：读书笔记文章标签：聚类

本文链接：https://blog.csdn.net/qq_47809408/article/details/125121843

版权

读书笔记专栏收录该内容

82 篇文章 4 订阅

订阅专栏

二、凝聚聚类

凝聚聚类（agglomerative clustering）指的是许多基于相同原则构建的聚类算法。
原则：先声明每个点是自己的簇，然后合并两个最相似的簇，直到满足某种停止准则为止。
停止准则：簇的个数
因此相似的簇被合并，直到仅剩下指定个数的簇。
链接准则：规定如何度量“最相似的簇”。
这种度量总是定义在两个现有的簇之间

3种选项：
1、ward：ward为默认的选项。word挑选两个簇来合并，使得所有簇的方差增加最小。这通常会得到大小差不多相同的簇。
2、average：将簇中所有点之间平均距离最小的两个簇合并
3、complete（最大链接）：将簇中点之间最大距离最小的两个簇合并
（1）聚类三个簇

mglearn.plots.plot_agglomerative_algorithm()

在这里插入图片描述
最开始时，每个点各成一簇。然后在每一步中，距离相聚最近的两个簇被合并，直到只剩下3个簇。

（2）构造模型并得到训练集上簇的成员关系（fit_predict）

from sklearn.cluster import AgglomerativeClustering
X, y = make_blobs(random_state=1)

agg = AgglomerativeClustering(n_clusters=3)
assignment = agg.fit_predict(X)

mglearn.discrete_scatter(X[:, 0], X[:, 1], assignment)
plt.legend(["Cluster 0", "Cluster 1", "Cluster 2"], loc="best")
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")

在这里插入图片描述
凝聚聚类方法为选择正确的个数提供了一些帮助，如下：

1、层次聚类与树状图

凝聚聚类生成了所谓的层次聚类（hierarchical clustering）。聚类过程迭代进行，每个点都从一个单点簇变为属于最终的某个簇。

（1）凝聚聚类生成层次化的簇分配以及带有编号的数据点

mglearn.plots.plot_agglomerative()

在这里插入图片描述
（2）绘制树状图（使用scipy）

from scipy.cluster.hierarchy import dendrogram, ward

X, y = make_blobs(random_state=0, n_samples=12)

linkage_array = ward(X)

dendrogram(linkage_array)

ax = plt.gca()
bounds = ax.get_xbound()
ax.plot(bounds, [7.25, 7.25], '--', c='k')
ax.plot(bounds, [4, 4], '--', c='k')

ax.text(bounds[1], 7.25, ' two clusters', va='center', fontdict={'size': 15})
ax.text(bounds[1], 4, ' three clusters', va='center', fontdict={'size': 15})
plt.xlabel("Sample index")
plt.ylabel("Cluster distance")