A Simple K-medoids Implementation in Python

haiya2001

Last modified: 2023-06-21 16:26:54

Categories: Python, AI. Tags: python, clustering, machine learning

First published: 2023-04-04 13:55:58

Article link: https://blog.csdn.net/qq_43928549/article/details/129951019

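K-medoids is a variant of the K-means algorithm that is less sensitive to outliers and noise. It is a partition-based clustering method in which every cluster is represented by one of its own data points, called a medoid. Its best-known algorithm is Partitioning Around Medoids (PAM), and the two names are often used interchangeably.

This post explains how K-medoids works and then implements a simple version of it in Python.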

How the K-medoids algorithm works

Like K-means, K-medoids is an iterative clustering algorithm. It can be summarized in the following steps:

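1. Randomly pick $k$ data points as the initial cluster centers.
2. For every data point, compute its distance to each cluster center and assign it to the cluster of the nearest center.
3. For every cluster, choose as the new center the member whose total distance to the other members of that cluster is smallest.
4. Repeat steps 2 and 3 until the cluster centers stop changing or the maximum number of iterations is reached.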
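
Steps 2 and 3 can also be written compactly. The notation here is an assumption added for illustration and does not come from the original post: $x_1, \dots, x_n$ are the data points, $m_1, \dots, m_k$ the current centers (medoids), $C_j$ the cluster belonging to $m_j$, and $d$ the distance function.

$$
C_j = \{\, x_i : d(x_i, m_j) \le d(x_i, m_l) \ \text{for all } l \,\}, \qquad
m_j \leftarrow \arg\min_{x \in C_j} \sum_{y \in C_j} d(x, y)
$$

Neither step can increase the total cost $\sum_{j=1}^{k} \sum_{x \in C_j} d(x, m_j)$, so the cost never goes up from one round to the next.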

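The defining feature of K-medoids is that every cluster center is an actual point from the data set, whereas a K-means center is a computed mean that need not coincide with any sample. This keeps a few outliers from dragging a center away from the bulk of its cluster.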
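
To make the difference concrete, here is a minimal sketch with made-up points (four close together plus one outlier). The variable names are illustrative only. The mean is pulled toward the outlier and matches no sample, while the medoid stays on a real point inside the dense group.

```python
import numpy as np

# Four points close together plus one outlier (made-up values)
points = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0], [100.0, 100.0]])

# K-means style center: the coordinate-wise mean, usually not a data point
mean_center = points.mean(axis=0)

# K-medoids style center: the data point with the smallest total distance to all others
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
medoid_index = int(np.argmin(dist.sum(axis=1)))
medoid = points[medoid_index]

print("mean:  ", mean_center)
print("medoid:", medoid, "(row", medoid_index, "of the data)")
```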

Implementing K-medoids

Below is a simple K-medoids implementation in Python that uses NumPy to compute Euclidean distances.

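import random

import numpy as np


class KMediod:
    def __init__(self, data, k_num_center):
        self.k_num_center = k_num_center
        self.data = data

    def init_medoids(self, data, k):
        '''
        Randomly pick k distinct samples as the initial medoids.
        :param data: the samples to cluster
        :param k: number of clusters
        :return: list of k row indices into data
        '''
        n = len(data)
        medoids_idx = random.sample(range(n), k)
        return medoids_idx

    # Euclidean distance between two points
    def euclidean_distance(self, a, b):
        return np.linalg.norm(np.array(a) - np.array(b))

    def run(self, max_iter=100):
        # Medoids are stored as row indices into self.data
        medoids = self.init_medoids(self.data, self.k_num_center)
        for _ in range(max_iter):
            classify_points = [[] for _ in medoids]
            sample_target = []
            for i in range(len(self.data)):
                if i in medoids:
                    # A medoid always stays in its own cluster, so no cluster is empty
                    cur_level = medoids.index(i)
                else:
                    # Distance from sample i to every medoid
                    distances = [self.euclidean_distance(self.data[i], self.data[m]) for m in medoids]
                    cur_level = int(np.argmin(distances))
                # Cluster label of sample i
                sample_target.append(cur_level)
                # Collect member indices so the medoids can be recomputed
                classify_points[cur_level].append(i)
            new_medoids = self.select_new_medoids(classify_points, self.euclidean_distance)
            # Stop once the medoids no longer change
            if new_medoids == medoids:
                break
            medoids = new_medoids
        return new_medoids, classify_points, sample_target

    # Pick the new medoid of each cluster
    def select_new_medoids(self, classify_points, func_of_dis):
        new_medoids = []
        for points in classify_points:
            # Total distance from each member to all members of its cluster
            distances = [sum(func_of_dis(self.data[i], self.data[j]) for j in points) for i in points]
            new_medoid_index = int(np.argmin(distances))
            new_medoids.append(points[new_medoid_index])
        return new_medoids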
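
The class above recomputes distances point by point. As a rough alternative sketch, the same four steps can be run on a distance matrix computed once up front, assuming the data set is small enough for an n-by-n matrix to fit in memory. The function `k_medoids` and its `init` and `seed` parameters are hypothetical names introduced for this example and are not part of the original code.

```python
import numpy as np


def k_medoids(data, k, max_iter=100, init=None, seed=None):
    """Alternating K-medoids on a precomputed distance matrix (illustrative sketch)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    # Pairwise Euclidean distances, shape (n, n)
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)

    # Step 1: initial medoids, either given explicitly or drawn at random
    if init is None:
        rng = np.random.default_rng(seed)
        medoids = rng.choice(n, size=k, replace=False)
    else:
        medoids = np.asarray(init)

    def assign(current):
        # Step 2: nearest medoid for every sample; a medoid keeps its own cluster
        labels = np.argmin(dist[:, current], axis=1)
        labels[current] = np.arange(k)
        return labels

    for _ in range(max_iter):
        labels = assign(medoids)
        # Step 3: in each cluster, pick the member with the smallest total distance
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        # Step 4: stop when the medoids no longer change
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids

    return medoids, assign(medoids)


# Two small, well-separated groups (made-up values)
toy = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
medoids, labels = k_medoids(toy, k=2, init=[1, 4])
print(medoids, labels)
```

Starting from rows 1 and 4, the medoids move to rows 0 and 3 after one update, since those points have the smallest total distance within their groups, and the next round confirms that nothing changes.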
