# 【机器学习】聚类分析（三）——高斯混合模型

1.我们使用EM算法来求解高斯混合模型的相关参数。算法流程如下：
Repeat until convergence {
E-step: For each i, j, set

M-step: Update parameters

}

2.EM算法求解GMM的简单推导

%% 导入数据

%% 初始化混合模型参数
K = 3;
% 随机初始化均值和协方差
means = randn(K,2);
for k = 1:K
covs(:,:,k) = rand*eye(2);
end
priors = repmat(1/K,1,K);   % 初始化，假设隐含变量服从先验均匀分布

%% 主算法
MaxIts = 100;   % 最大迭代次数
N = size(X,1);  % 样本数
q = zeros(N,K); % 后验概率
D = size(X,2);  % 维数
cols = {'r','g','b'};
plotpoints = [1:1:10,12:2:30 40 50];
B(1) = -inf;
converged = 0;
it = 0;
tol = 1e-2;
while 1
it = it + 1;
% 把乘除化为对数加减运算，防止乘积结果过于接近于0
for k = 1:K
const = -(D/2)*log(2*pi) - 0.5*log(det(covs(:,:,k)));
Xm = X - repmat(means(k,:),N,1);
temp(:,k) = const - 0.5 * diag(Xm*inv(covs(:,:,k))*Xm');
end

% 计算似然下界
if it > 1
B(it) = sum(sum(q.*log(repmat(priors,N,1)))) + sum(sum(q.*temp)) - sum(sum(q.*log(q)));
if abs(B(it)-B(it-1))<tol
converged = 1;

end
end
if converged == 1 || it > MaxIts
break
end
% 计算每个样本属于第k类的后验概率
temp = temp + repmat(priors,N,1);
q = exp(temp - repmat(max(temp,[],2),1,K));

q(q < 1e-60) = 1e-60;
q(q > (1-(1e-60))) = 1-(1e-60);
q = q./repmat(sum(q,2),1,K);
% 更新先验分布
priors = mean(q,1);
% 更新均值
for k = 1:K
means(k,:) = sum(X.*repmat(q(:,k),1,D),1)./sum(q(:,k));
end
% 更新方差
for k = 1:K
Xm = X - repmat(means(k,:),N,1);
covs(:,:,k) = (Xm.*repmat(q(:,k),1,D))'*Xm;
covs(:,:,k) = covs(:,:,k)./sum(q(:,k));
end
end

%% plot the data
figure(1);hold on;

plot(X(:,1),X(:,2),'ko');

for k = 1:K
plot_2D_gauss(means(k,:), covs(:,:,k), -2:0.1:5,-6:0.1:6);
end
ti = sprintf('After %g iterations',it);
title(ti)
%% 绘制似然下界迭代过程图
figure(2);hold off
plot(2:length(B),B(2:end),'k');
xlabel('Iterations');
ylabel('Bound');