K-means Clustering Algorithm, Level 5: Combining the Implemented Functions into the K-means Algorithm

畜牧当道

Last modified 2022-12-19 16:20:45

Column: K-means Clustering Algorithm | Tags: algorithm, kmeans

First published 2022-12-19 15:49:52

Article link: https://blog.csdn.net/Online_Yan/article/details/128373571

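Task

This level combines the work of the previous four levels to implement the K-means clustering algorithm.

Programming Task

This level asks you to complete the region marked by asterisks in the code block below, implementing the core steps of the K-means algorithm: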
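The template imports four helpers written in the earlier levels: euclid_distance, nearest_cluster_center, estimate_centers and acc. Their code is not shown in this article, so the following is only a minimal sketch of what they might look like. The signatures are inferred from how the template calls them; the bodies are assumptions, in particular acc, which here is simply the fraction of matching labels (the platform's version may differ, e.g. by first matching cluster ids to classes).

```python
import numpy as np

# Hypothetical versions of the helpers from the earlier levels.
# Signatures are inferred from the template's calls; the bodies are assumptions.

def euclid_distance(x1, x2):
    """Euclidean distance between two 1-D feature vectors."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return np.sqrt(np.sum(diff ** 2))

def nearest_cluster_center(x, centers):
    """Index of the center closest to sample x."""
    distances = [euclid_distance(x, c) for c in centers]
    return int(np.argmin(distances))

def estimate_centers(X, y_estimated, n_clusters):
    """Mean of the samples assigned to each cluster.
    Assumes no cluster is empty (an empty cluster would yield NaN)."""
    return np.array([X[y_estimated == k].mean(axis=0) for k in range(n_clusters)])

def acc(y_estimated, y):
    """Fraction of positions where the two label arrays agree."""
    return np.mean(np.asarray(y_estimated) == np.asarray(y))
```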

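# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
from distance import euclid_distance
from estimate import estimate_centers
from loss import acc
from near import nearest_cluster_center
# The random seed affects the clustering result; fix it so the result is reproducible for testing
np.random.seed(5)
# Load the dataset
dataset = pd.read_csv('./data/iris.csv')
# Get the sample feature matrix. The header row of this iris.csv is
# '150,4,setosa,versicolor,virginica', so those strings name the four feature
# columns and the label column.
# .as_matrix() was removed in pandas 1.0; .values works in old and new versions.
X = dataset[['150','4','setosa','versicolor']].values
y = np.array(dataset['virginica'])
# Read the input: "n_clusters,n_iteration" on one line
n_clusters, n_iteration = input().split(',')
n_clusters = int(n_clusters)    # number of cluster centers
n_iteration = int(n_iteration)  # number of iterations
# Randomly pick n_clusters samples as the initial cluster centers
point_index_lst = np.arange(len(y))
np.random.shuffle(point_index_lst)
cluster_centers = X[point_index_lst[:n_clusters]]
# Run the algorithm
y_estimated = np.zeros(len(y))
# Add your implementation here #
#********** Begin *********#
#********** End ***********#
print('%.3f' % acc(y_estimated, y))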
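The region between Begin and End has to alternate the two standard K-means steps for n_iteration rounds. Written out, with $x_i$ the $i$-th row of X, $\mu_j$ the $j$-th row of cluster_centers and $c_i$ the entry y_estimated[i] (notation introduced here, not in the original exercise):

```latex
\begin{aligned}
c_i   &= \arg\min_{j} \lVert x_i - \mu_j \rVert_2
      && \text{(assignment, for each sample } i\text{)} \\
\mu_j &= \frac{1}{\lvert C_j \rvert} \sum_{i \in C_j} x_i,
      \quad C_j = \{\, i : c_i = j \,\}
      && \text{(update, for each cluster } j\text{)}
\end{aligned}
```

The first line is presumably what nearest_cluster_center computes for one sample, and the second what estimate_centers computes for all clusters. Since the argmin of the Euclidean distance equals the argmin of its square, neither step can increase the total within-cluster sum of squared distances, which is why a fixed, modest number of iterations is typically enough on a small dataset such as iris.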

Test Description

The platform compares the output of your code with the expected result; if they match, you have completed this exercise.

Reference solution:

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

from distance import euclid_distance
from estimate import estimate_centers
from loss import acc
from near import nearest_cluster_center

# The random seed affects the clustering result; fix it so the result is reproducible for testing
np.random.seed(5)

# Load the dataset
dataset = pd.read_csv('./data/iris.csv')

# Get the sample feature matrix and the labels
X = dataset[['150','4','setosa','versicolor']].values
y = np.array(dataset['virginica'])

# Read the input: "n_clusters,n_iteration" on one line
n_clusters, n_iteration = input().split(',')
n_clusters = int(n_clusters)    # number of cluster centers
n_iteration = int(n_iteration)  # number of iterations

# Randomly pick n_clusters samples as the initial cluster centers
point_index_lst = np.arange(len(y))
np.random.shuffle(point_index_lst)
cluster_centers = X[point_index_lst[:n_clusters]]

# Run the algorithm
y_estimated = np.zeros(len(y))
#   Add your implementation here     #
#********** Begin *********#
for _ in range(n_iteration):  # `_` rather than `iter`, which would shadow the built-in
    for xx_index in range(len(X)):
        # Assign each point to its nearest cluster center
        y_estimated[xx_index] = nearest_cluster_center(X[xx_index], cluster_centers)
    # Recompute each cluster center from the points assigned to it
    cluster_centers = estimate_centers(X, y_estimated, n_clusters)
#********** End ***********#
print('%.3f' % acc(y_estimated, y))
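For comparison, here is a self-contained sketch of the same loop that does not depend on the helper modules. It keeps the exercise's initialisation scheme (shuffle the sample indices, take the first n_clusters samples as centers) but vectorises the nearest-center search with NumPy broadcasting, keeps a center unchanged if its cluster becomes empty, and stops early once the centers stop moving. The function name kmeans and the empty-cluster and early-stopping rules are illustrative additions, not part of the exercise.

```python
import numpy as np

def kmeans(X, n_clusters, n_iteration, seed=5):
    """Plain K-means; returns (labels, centers)."""
    rng = np.random.RandomState(seed)
    # Same initialisation idea as the exercise: shuffled indices, first k samples
    point_index_lst = np.arange(len(X))
    rng.shuffle(point_index_lst)
    centers = X[point_index_lst[:n_clusters]].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iteration):
        # (n_samples, n_clusters) distance matrix via broadcasting
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)          # assignment step
        new_centers = centers.copy()
        for k in range(n_clusters):                # update step
            members = X[labels == k]
            if len(members) > 0:                   # empty cluster keeps its old center
                new_centers[k] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:                              # centers stopped moving
            break
    return labels, centers
```

On the exercise data it would be called as `labels, centers = kmeans(X, n_clusters, n_iteration)`; whether it reproduces the platform's expected accuracy exactly has not been checked.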