GMM 高斯混合模型聚类 Python实现

最新推荐文章于 2024-05-12 23:46:48 发布

原创

最新推荐文章于 2024-05-12 23:46:48 发布 · 1.4w 阅读

110 ·

CC 4.0 BY-SA版权

文章标签：

#GMM #高斯混合 #聚类

本文介绍了GMM（高斯混合模型）的聚类原理和Python实现，通过实例展示了如何进行EM迭代，并与sklearn库的GMM进行对比。同时，用随机数据验证了模型在不同类别数下的聚类效果，讨论了分类挑战和解决方案。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

原理也不多说了。。大致思路就是把数据建立成k个高斯分布，EM迭代N次。最后看每个点在哪个高斯分布的概率最高，就分到那个分布。

这里的computeGamma函数，用来算第i个簇的后验概率γji，用了2层循环，效率不高，本来想向量化的，搞了半天没搞出来。。干脆就先循环吧。。包括后面的fit方法也是一样。。用了很多循环。

X的shape是(n_samples,n_features)

mu的shape是(n_clusters,n_features)

sigma的shape是(n_clusters,n_features,n_features)

alpha的shape是(n_clusters,)

输出的gamma是(n_samples,n_clusters)

类方法fit里面，所有alpha初始为1/n_clusters，所有sigma协方差初始化为0.1，mu随机在样本里面选n_clusters个。

这里为了复现西瓜书的效果，所以指定mu为[[.403,.237],[.714,.346],[.532,.472]]。

import numpy as np
import matplotlib.pyplot as plt

def multiGaussian(x,mu,sigma):
    return 1/((2*np.pi)*pow(np.linalg.det(sigma),0.5))*np.exp(-0.5*(x-mu).dot(np.linalg.pinv(sigma)).dot((x-mu).T))

def computeGamma(X,mu,sigma,alpha,multiGaussian):
    n_samples=X.shape[0]
    n_clusters=len(alpha)
    gamma=np.zeros((n_samples,n_clusters))
    p=np.zeros(n_clusters)
    g=np.zeros(n_clust