Please build a Gaussian mixture model (GMM) to model the data in the file TrainingData_GMM.csv. The data consist of 4 clusters, and the model should be trained with the expectation-maximization (EM) algorithm.
Based on the GMM learned above, assign each training data point to one of the 4 clusters.
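Both tasks can be sketched end to end as follows. This is a minimal sketch, not the author's implementation. It assumes the data are already in a NumPy array `X` of shape (N, d) with d ≥ 2. The helper names `e_step` and `gmm_em` are hypothetical. Three further choices are assumptions made for robustness: a farthest-point initialisation, a small ridge `reg` added to every covariance, and a log-space E-step using `scipy.special.logsumexp`. Each point's cluster is the component with the largest responsibility.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal


def e_step(X, weights, means, covs):
    """Return responsibilities gamma[j][i] and the total log-likelihood."""
    # log(pi_j * N(x_i | mu_j, Sigma_j)) for every component j and sample i, shape (K, N)
    log_p = np.array([np.log(w) + multivariate_normal(m, c).logpdf(X)
                      for w, m, c in zip(weights, means, covs)])
    log_norm = logsumexp(log_p, axis=0)  # log sum_j pi_j N(x_i | mu_j, Sigma_j)
    return np.exp(log_p - log_norm), log_norm.sum()


def gmm_em(X, k=4, n_iter=200, tol=1e-6, reg=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumed init: farthest-point means, shared data covariance, equal weights.
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        dist2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(dist2.argmax()))
    means = X[idx].astype(float)
    covs = np.array([np.cov(X, rowvar=False) + reg * np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    history = []
    for _ in range(n_iter):
        gamma, ll = e_step(X, weights, means, covs)  # E-step
        history.append(ll)
        nk = gamma.sum(axis=1)                       # effective size N_j of each component
        weights = nk / n                             # M-step: mixing weights
        means = (gamma @ X) / nk[:, None]            # M-step: means
        for j in range(k):                           # M-step: covariances (ridge added)
            diff = X - means[j]
            covs[j] = (gamma[j][:, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
    gamma, ll = e_step(X, weights, means, covs)      # final E-step with the fitted parameters
    history.append(ll)
    labels = gamma.argmax(axis=0)                    # hard assignment: most responsible component
    return weights, means, covs, labels, history
```

To apply the sketch to the assignment, `X = pd.read_csv("TrainingData_GMM.csv").to_numpy()` would work, provided the file holds only numeric feature columns under a header row. That file layout is an assumption. `history` holds the log-likelihood at each E-step, which is the quantity one would plot against training steps.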
Results
Raw data
Estimated GMM parameters
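The parameters are estimated by iterating the standard EM updates for a GMM with $K$ components and $N$ samples $x_i$. The symbols below are assumed to correspond to the code's variables as follows: mixing weights $\pi_j$ to `k[j]`, means $\mu_j$ to `mu[j]`, covariances $\Sigma_j$ to `sigma[j]`, and responsibilities $\gamma_{ji}$ to `gama[j][i]`.

```latex
\begin{aligned}
\text{E-step:}\quad \gamma_{ji} &= \frac{\pi_j\,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{K} \pi_l\,\mathcal{N}(x_i \mid \mu_l, \Sigma_l)} \\
\text{M-step:}\quad N_j &= \sum_{i=1}^{N} \gamma_{ji}, \qquad \pi_j = \frac{N_j}{N}, \qquad \mu_j = \frac{1}{N_j}\sum_{i=1}^{N} \gamma_{ji}\, x_i, \\
\Sigma_j &= \frac{1}{N_j}\sum_{i=1}^{N} \gamma_{ji}\,(x_i - \mu_j)(x_i - \mu_j)^{\top}
\end{aligned}
```

The covariance update uses the newly computed $\mu_j$ from the same M-step.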
Log-likelihood versus number of training steps
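The quantity on this curve is the GMM log-likelihood of the training data. In the formula below, $\pi_j$, $\mu_j$ and $\Sigma_j$ are the mixing weight, mean and covariance of component $j$. Note that each component density is weighted by the mixing weight $\pi_j$, not by the per-sample responsibility. An exact EM iteration never decreases this value. The curve should therefore be non-decreasing and flatten out at convergence, and a drop usually indicates a bug.

```latex
\ln p(X \mid \pi, \mu, \Sigma) = \sum_{i=1}^{N} \ln\left( \sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j) \right)
```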
Code
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import math
def get_pdf(sample, mu, sigma):
    res = stats.multivariate_normal(mu, sigma).pdf(sample)
    return res
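def get_log_likelihood(data, k, mu, sigma, gama):
    # GMM log-likelihood: sum_i log(sum_j k[j] * N(data[i] | mu[j], sigma[j])),
    # assuming k holds the mixing weights. Each component is weighted by k[j],
    # not by the responsibility gama[j][i]; gama stays in the signature so existing calls still work.
    res = 0.0
    for i in range(len(data)):
        cur = 0.0
        for j in range(len(k)):
            cur += k[j] * get_pdf(data[i], mu[j], sigma[j])
        res += math.log(cur)
    return res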
def em(data, k, mu, sigma, steps=1000):
    num_gau = len(k)  # number of Gaussian components
    num_data = data.shape[0]  # number of data points
    gama = np.zeros((num_gau, num_data))  # gama[j][i]: probability that sample i was generated by Gaussian j
    likelihood_record = [