Python Implementation of GMM Parameter Estimation with the EM Algorithm

swings_ss

Category: Machine Learning

Article link: https://blog.csdn.net/zhwenx3/article/details/85268985

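This article implements the expectation-maximization (EM) algorithm in Python to fit a Gaussian mixture model (GMM) to the four-cluster data in TrainingData_GMM.csv. It covers data preprocessing, model training, cluster assignment, and how the log-likelihood changes during training.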

Requirements

Please build a Gaussian mixture model (GMM) to model the data in file TrainingData_GMM.csv. Note that the data is composed of 4 clusters, and the model should be trained by the expectation maximization (EM) algorithm.
Based on the GMM learned above, assign each training data point to one of the 4 clusters.
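
The second requirement, assigning each point to a cluster, can be sketched as follows. This is a minimal illustration rather than the article's own code: the helper names `responsibilities` and `assign_clusters` are hypothetical, the toy data stands in for TrainingData_GMM.csv, and the mixture parameters are assumed to have already been estimated.

```python
import numpy as np
from scipy import stats


def responsibilities(data, weights, means, covs):
    """E-step: posterior probability that each point came from each component."""
    # dens[j, i] = weights[j] * N(data[i] | means[j], covs[j])
    dens = np.array([
        w * stats.multivariate_normal(m, c).pdf(data)
        for w, m, c in zip(weights, means, covs)
    ])
    # Normalize over components so each column sums to 1
    return dens / dens.sum(axis=0, keepdims=True)


def assign_clusters(data, weights, means, covs):
    """Hard assignment: pick the component with the largest responsibility."""
    return responsibilities(data, weights, means, covs).argmax(axis=0)


# Hypothetical toy data: four well-separated 2-D clusters of 20 points each
rng = np.random.default_rng(0)
toy_means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
toy_data = np.vstack([rng.normal(m, 0.5, size=(20, 2)) for m in toy_means])
toy_covs = [np.eye(2) * 0.25] * 4
toy_weights = np.full(4, 0.25)

labels = assign_clusters(toy_data, toy_weights, toy_means, toy_covs)
```

Taking the argmax of the responsibilities gives a hard clustering; the responsibilities themselves remain available when a soft assignment is more useful.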

Results

Figure: Original data
Figure: GMM parameter estimates
Figure: Log-likelihood versus training step
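
The log-likelihood tracked across training steps is the sum over data points of the log of the weighted component densities. A numerically stable way to evaluate it is to work in log space, as in the sketch below. The function name `gmm_log_likelihood` is hypothetical and this is not the article's implementation; it only assumes that mixing weights, means, and covariances are available.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp


def gmm_log_likelihood(data, weights, means, covs):
    """Return sum_i log(sum_j weights[j] * N(data[i] | means[j], covs[j]))."""
    # log_terms[j, i] = log(weights[j]) + log N(data[i] | means[j], covs[j])
    log_terms = np.array([
        np.log(w) + stats.multivariate_normal(m, c).logpdf(data)
        for w, m, c in zip(weights, means, covs)
    ])
    # logsumexp over components avoids underflow when densities are tiny
    return float(logsumexp(log_terms, axis=0).sum())
```

Under EM this quantity should not decrease from one step to the next, so a drop in the curve usually points to a bug in the update equations.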

Code

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import math


def get_pdf(sample, mu, sigma):
    res = stats.multivariate_normal(mu, sigma).pdf(sample)
    return res


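def get_log_likelihood(data, k, mu, sigma, gama):
    # Mixture log-likelihood: sum_i log(sum_j k[j] * N(data[i] | mu[j], sigma[j])).
    # Assumes k holds the mixing weights; gama is unused and kept only so existing calls still work.
    res = 0.0
    for i in range(len(data)):
        cur = 0.0
        for j in range(len(k)):
            cur += k[j] * get_pdf(data[i], mu[j], sigma[j])
        res += math.log(cur)
    return res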


def em(data, k, mu, sigma, steps=1000):
    num_gau = len(k)  # number of Gaussian components
    num_data = data.shape[0]  # number of data points
    gama = np.zeros((num_gau, num_data))  # gama[j][i]: probability that sample i came from Gaussian j
    likelihood_record = [