The Complete Guide to Clustering Analysis: k-means and Hierarchical Clustering (Worked Calculations and R Code) (Part 1)

Latest recommended article published on 2024-04-23 21:41:53

Y.S.Boston


Category: Algorithms. Tags: algorithms, clustering, machine learning, probability theory

Article link: https://blog.csdn.net/Y_Shen_Boston/article/details/107477109


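What is clustering analysis?

Clustering analysis is a form of exploratory data analysis in which observations are divided into groups that share common characteristics.

The goal of clustering analysis (sometimes called unsupervised classification) is to build groups (also called classes or clusters) with the following property: observations within a group should be as similar as possible, while observations in different groups should be as different as possible.

There are two main types of clustering: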

K-means clustering
Hierarchical clustering

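The first method is generally used when the number of classes is fixed in advance, while the second is generally used when the number of classes is unknown and helps to determine the optimal number. Both methods are illustrated below through worked calculations and applications in R. Note that for hierarchical clustering, this article covers only the ascending (agglomerative, bottom-up) approach.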
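As a rough preview of that difference, the sketch below runs both methods on three small points using base R's `kmeans()`, `hclust()` and `cutree()`. The data, the seed, the choice of two clusters and the single-linkage setting are all assumptions made for illustration only; they are not taken from the article's later sections.

```r
# Three illustrative points (assumed data, one row per point)
X <- rbind(a = c(0, 0), b = c(1, 0), c = c(5, 5))

# k-means: the number of clusters (centers = 2) must be chosen up front
set.seed(42)                      # arbitrary seed, only for reproducibility
km <- kmeans(X, centers = 2, nstart = 10)
km$cluster                        # cluster label assigned to each point

# Hierarchical (agglomerative) clustering: build the full tree first,
# then decide where to cut it
hc <- hclust(dist(X), method = "single")  # linkage choice is an assumption
groups <- cutree(hc, k = 2)       # cut the tree into 2 groups afterwards
groups
# plot(hc) would draw the dendrogram used to judge a sensible number of groups
```

With k-means the value 2 is an input, whereas with `hclust()` it is only applied after the tree has been built, which is why the hierarchical approach can help in choosing the number of classes.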

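Clustering algorithms use distances to separate observations into groups. So, before presenting the two methods in detail, we first work through how to compute the distance between points.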

Application 1: Computing distances

Consider a data set containing the points $a = (0, 0)'$, $b = (1, 0)'$ and $c = (5, 5)'$. Compute the matrix of Euclidean distances between the points.

Solution

# We create the points in R
a <- c(0, 0)
b <- c(1, 0)
c <- c(5, 5)

X <- rbind(a, b, c) # a, b and c are combined per row
colnames(X) <- c("x", "y") # rename columns

X # display the points

OUTPUT:

##   x y
## a 0 0
## b 1 0
## c 5 5

By the Pythagorean theorem, the distance between $(x_a, y_a)$ and $(x_b, y_b)$ in $\mathbb{R}^2$ is $\sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}$.
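Applying this formula to the three points gives $d(a, b) = \sqrt{1^2 + 0^2} = 1$, $d(a, c) = \sqrt{5^2 + 5^2} = \sqrt{50} \approx 7.07$ and $d(b, c) = \sqrt{4^2 + 5^2} = \sqrt{41} \approx 6.40$. The sketch below checks these values in R, first with a small helper written directly from the formula (`euclid` is a hypothetical name introduced here, not part of the original article) and then with the built-in `dist()` function.

```r
# Points from the exercise, one row per point
X <- rbind(a = c(0, 0), b = c(1, 0), c = c(5, 5))
colnames(X) <- c("x", "y")

# Hypothetical helper: Euclidean distance written out from the formula
euclid <- function(p, q) sqrt(sum((p - q)^2))

d_ab <- euclid(X["a", ], X["b", ])
d_ac <- euclid(X["a", ], X["c", ])
d_bc <- euclid(X["b", ], X["c", ])

# The same distances from base R's dist(), shown as a full symmetric matrix
D <- dist(X, method = "euclidean")
round(as.matrix(D), 2)
```

`dist()` returns only the lower triangle by default; converting with `as.matrix()` is a presentation choice that makes the symmetric matrix and its zero diagonal explicit.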
