Clustering with Gaussian Mixture Models

Latest recommended article published on 2024-07-19 10:51:16

薛定谔的智能

Category: Machine Learning. Tags: clustering

Article link: https://blog.csdn.net/fanzonghao/article/details/85158359


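Overview

Most clustering algorithms assign points by similarity, and similarity is most often measured by Euclidean distance. GMM uses a different criterion, probability: each point is assigned to the cluster to which it most probably belongs.
The basic idea of GMM is that a probability distribution of almost any shape can be approximated by several Gaussian distributions. A GMM is therefore composed of multiple single Gaussian densities, each called a "Component", and their weighted linear combination forms the GMM probability density.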
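
To make the "assign by probability" idea concrete, here is a minimal sketch using scikit-learn's GaussianMixture. The synthetic data, the seed, and the variable names (X_demo, gmm_demo) are assumptions made for illustration and are not from the original article. It shows that the hard label from predict is the component with the largest posterior probability from predict_proba.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data (an assumption for illustration): two Gaussian blobs in 2-D
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2)),
    rng.normal(loc=[5.0, 5.0], scale=1.0, size=(200, 2)),
])

gmm_demo = GaussianMixture(n_components=2, covariance_type='full', random_state=0)
gmm_demo.fit(X_demo)

# Posterior probability of each component for every sample, shape (n_samples, 2)
proba = gmm_demo.predict_proba(X_demo)

# Hard cluster label: the component with the highest posterior probability
labels = gmm_demo.predict(X_demo)

# Mixing weights of the components in the linear combination; they sum to 1
weights = gmm_demo.weights_

print(proba[:3])
print(labels[:3])
print(weights)
```

Each row of proba sums to 1, so a point near the boundary between two components gets a soft assignment rather than only a hard label.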

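Algorithm parameters

n_components: the number of Gaussian components, i.e. the target number of clusters
covariance_type: the type of covariance used when estimating the parameters with the EM algorithm; the default is "full"
full: each component has its own general covariance matrix
tied: all components share a single general covariance matrix
diag: each component has its own diagonal covariance matrix
spherical: each component has its own single variance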
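
The four covariance_type options differ in how many covariance parameters are estimated, which shows up in the shape of the fitted covariances_ attribute. The sketch below is illustrative only: the synthetic data and the names X_demo and cov_shapes are assumptions, not part of the original article.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data (an assumption for illustration): three blobs, 2 features
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=[0.0, 0.0], size=(100, 2)),
    rng.normal(loc=[6.0, 0.0], size=(100, 2)),
    rng.normal(loc=[0.0, 6.0], size=(100, 2)),
])

cov_shapes = {}
for cov_type in ['full', 'tied', 'diag', 'spherical']:
    model = GaussianMixture(n_components=3, covariance_type=cov_type, random_state=0)
    model.fit(X_demo)
    cov_shapes[cov_type] = model.covariances_.shape
    print(cov_type, model.covariances_.shape)

# With 3 components and 2 features the shapes are:
# full      -> (3, 2, 2)  one full matrix per component
# tied      -> (2, 2)     one matrix shared by all components
# diag      -> (3, 2)     one diagonal (vector of variances) per component
# spherical -> (3,)       one variance per component
```

Fewer parameters (diag, spherical) mean a more constrained cluster shape, while full lets each component be an arbitrarily oriented ellipse.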

For comparison, see the K-means clustering article:

https://blog.csdn.net/fanzonghao/article/details/85045232

# coding: utf-8
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans  # only used by the commented-out K-means comparison below
from sklearn.mixture import GaussianMixture

data_path = "./Aggregation_cluster=7.txt"


# Load the data
def load_data():
    points = pd.read_table(data_path, header=None)
    return points


def plotRes(data, clusterRes, clusterNum):
    """
    Visualize the clustering result.
    :param data: sample set, array of shape (n_samples, 2)
    :param clusterRes: cluster label of each sample
    :param clusterNum: number of clusters
    :return: None
    """
    clusterRes = np.asarray(clusterRes)
    scatterColors = ['black', 'blue', 'green', 'yellow', 'red', 'purple', 'orange', 'brown']
    for i in range(clusterNum):
        color = scatterColors[i % len(scatterColors)]
        # Boolean mask selecting the samples assigned to cluster i
        mask = clusterRes == i
        plt.scatter(data[mask, 0], data[mask, 1], c=color, alpha=1, marker='+')
    plt.show()


if __name__ == '__main__':
    n_cluster = 7
    points = load_data()
    # Use every column except the last as features
    # (the last column is presumably the ground-truth label)
    X = points.iloc[:, :-1].values
    print(X)
    print('========== Do clustering ==========')
    start_time = time.time()
    gmm = GaussianMixture(n_components=n_cluster, covariance_type='full')
    # gmm = KMeans(n_clusters=n_cluster)
    gmm.fit(X)

    y_pred = gmm.predict(X)
    end_time = time.time()
    print('--- {} s ---'.format(end_time - start_time))

    plotRes(X, y_pred, n_cluster)

[Figures: clustering results for n_cluster == 3, 4, 5 and 7]

Comparing the cluster plots above with the K-means results, GMM gives visibly better clusters than K-means on this dataset.
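
The comparison above is visual. When ground-truth labels are available, one way to put a number on such a comparison is the adjusted Rand index from sklearn.metrics. The sketch below is only an illustration under stated assumptions: it uses synthetic stretched blobs rather than the Aggregation dataset from the article, the transform matrix and variable names are made up for the example, and it makes no claim about which method scores higher.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Synthetic data (an assumption for illustration): three blobs with known labels
X_blobs, y_true = make_blobs(n_samples=600, centers=3, random_state=0)

# Stretch the blobs with a linear map so the clusters become elongated
transform = np.array([[0.6, -0.6], [-0.4, 0.8]])
X_aniso = X_blobs @ transform

# Cluster the same data with both methods
gmm_labels = GaussianMixture(
    n_components=3, covariance_type='full', random_state=0
).fit_predict(X_aniso)
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_aniso)

# Adjusted Rand index: 1.0 means perfect agreement with the true labels,
# values near 0 mean agreement no better than chance
ari_gmm = adjusted_rand_score(y_true, gmm_labels)
ari_kmeans = adjusted_rand_score(y_true, km_labels)
print('GMM ARI:     {:.3f}'.format(ari_gmm))
print('K-means ARI: {:.3f}'.format(ari_kmeans))
```

The same two score lines could be applied to the article's dataset by passing the dropped last column as y_true, assuming that column really holds the true cluster labels.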
