#Machine Learning (2) K-means Algorithm

灯火君

Article link: https://blog.csdn.net/2301_78350263/article/details/131037789

K-means is an unsupervised learning algorithm used to partition a dataset into K clusters.

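The algorithm proceeds as follows:

1. Choose K initial cluster centers: randomly pick K data points from the dataset as the initial centers, or initialize them with another heuristic.
2. Assign each data point to the nearest cluster center: for every data point, compute its distance to each cluster center and assign it to the closest one.
3. Update the cluster centers: for each cluster, compute the mean of all data points assigned to it and use that mean as the new cluster center.
4. Repeat steps 2 and 3 until a termination condition is met, such as reaching a maximum number of iterations or the cluster centers no longer changing significantly.
5. Output the clustering result: the final cluster center that each data point is assigned to.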
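
For reference, steps 2 and 3 can also be written as formulas. The notation here (data points $x_i$, cluster centers $\mu_k$, clusters $C_k$, labels $c_i$) is introduced only for illustration and does not appear in the original post; the distance is assumed to be Euclidean, which is what the code in this post uses.

```latex
% Step 2 (assignment): each point takes the label of its nearest center
c_i = \arg\min_{k \in \{1, \dots, K\}} \lVert x_i - \mu_k \rVert_2

% Step 3 (update): each center moves to the mean of the points assigned to it
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i,
\qquad C_k = \{\, x_i : c_i = k \,\}
```

Neither step can increase the within-cluster sum of squared distances $\sum_k \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert_2^2$, which is why the centers eventually stop moving and the termination condition in step 4 is reached.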

Code implementation:

import numpy as np
import matplotlib.pyplot as plt
import math


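def printpoint_add(tpPoints, size, color='b', marker=None):
    # An empty cluster has nothing to draw; unpacking it would call
    # plt.scatter() without x and y and raise a TypeError
    if not tpPoints:
        return
    xs, ys = zip(*tpPoints)
    plt.scatter(xs, ys, s=size, c=color, marker=marker)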


def printpoint_show():
    plt.show()


def randpoint(size=0):
    if size == 0:
        return (np.random.randint(0, 10) + np.random.rand(),
                np.random.randint(0, 10) + np.random.rand())
    else:
        point = list()
        for _ in range(size):
            point.append(randpoint())
        return point


def distance(point, centroid):
    return math.sqrt((point[0] - centroid[0])**2 + (point[1] - centroid[1])**2)


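countCluster = np.random.randint(2, 4)  # number of clusters (2 or 3; the upper bound is exclusive)
listCentroid = randpoint(size=countCluster)  # initial centroids, placed at random positions
dicElement = dict()  # cluster index --> list of data points, so points are grouped by cluster
listPoint = randpoint(size=np.random.randint(50, 150))  # all data points
listColor = ['b', 'g', 'r', 'c', 'm', 'y', 'k', 'w']  # matplotlib color codes, one per cluster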

count = 100  # number of iterations to run (fixed; no early stop)
while count:
    # Clear the previous assignments on every pass so the points can be reassigned
    for i in range(countCluster):
        dicElement[i] = list()

    # Group the data points into clusters around the centroids
    for point in listPoint:
        dist = math.inf
        index = 0

        # Scan the centroids to find the index of the nearest one
        for i, centroid in enumerate(listCentroid):
            newdist = distance(point, centroid)
            if newdist < dist:
                dist = newdist
                index = i

        # Add the data point to that cluster
        dicElement[index].append(point)

    # Recompute each centroid as the mean of its cluster; an empty cluster keeps its old centroid
    for i, points in dicElement.items():
        if (length := len(points)) == 0: continue
        x, y = 0, 0
        for point in points:
            x += point[0]
            y += point[1]
        x /= length
        y /= length
        listCentroid[i] = (x, y)
    count -= 1

for i, centroid in enumerate(listCentroid):
    printpoint_add([centroid], 120, listColor[i], '*')
    printpoint_add(dicElement[i], 20, listColor[i])

print('Centroid positions after clustering:', listCentroid)
printpoint_show()

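Run result: a scatter plot of the clustered points, with each centroid drawn as a star in the color of its cluster (the image is not preserved here).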
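
The loop above always runs a fixed 100 iterations. As a complement, here is a minimal sketch of the same five steps that also uses the second termination condition from step 4 (stop once the centers no longer move noticeably). It is not the code from this post: the function names `kmeans` and `nearest_center`, the parameters `max_iter`, `tol` and `seed`, and the sample data are all assumptions made for illustration.

```python
import numpy as np


def nearest_center(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    # Pairwise Euclidean distances, shape (n_points, n_centroids)
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Illustrative K-means on an (n, d) array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)

    # Step 1: pick k distinct data points as the initial centers
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center
        labels = nearest_center(points, centroids)

        # Step 3: move each center to the mean of its points;
        # an empty cluster keeps its previous center
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)

        # Step 4: stop once no center moved by more than tol
        shift = np.linalg.norm(new_centroids - centroids, axis=1).max()
        centroids = new_centroids
        if shift <= tol:
            break

    # Step 5: final labels, computed against the returned centers
    return centroids, nearest_center(points, centroids)


# Hypothetical sample data: two well-separated groups of three points
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [9.0, 9.0], [9.0, 10.0], [10.0, 9.0]])
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

Stopping on a tolerance usually ends the loop long before `max_iter` is reached on small datasets like this one, while `max_iter` remains as a safety limit.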
