I would like to share my understanding of the EM algorithm.
This post is mainly based on Richard Xu’s machine learning course.
Gaussian Mixture Model
A Gaussian Mixture Model (GMM) with $k$ mixture components is defined as:
$$p(x \mid \Theta) = \sum_{l=1}^{k} \alpha_l \, \mathcal{N}(x \mid \mu_l, \Sigma_l),$$
and
$$\sum_{l=1}^{k} \alpha_l = 1, \qquad \alpha_l \ge 0,$$
where $\Theta = \{\alpha_1, \dots, \alpha_k, \mu_1, \dots, \mu_k, \Sigma_1, \dots, \Sigma_k\}$ collects the mixture weights, means, and covariances.
For data $X = \{x_1, \dots, x_n\}$, we introduce latent variables $Z = \{z_1, \dots, z_n\}$, where each $z_i \in \{1, \dots, k\}$ indicates which mixture component $x_i$ belongs to. (The introduction of the latent variables should not change the marginal distribution $p(X)$.)
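As an illustration, here is a minimal sketch of how data are generated from a GMM, assuming NumPy and some hypothetical 1-D parameter values (not from the original post): first draw the latent component $z_i$ from the mixture weights, then draw $x_i$ from the Gaussian that $z_i$ selects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D GMM parameters (k = 3 components).
alpha = np.array([0.5, 0.3, 0.2])   # mixture weights, sum to 1
mu    = np.array([-2.0, 0.0, 3.0])  # component means
sigma = np.array([0.5, 1.0, 0.8])   # component standard deviations

n = 1000
# Latent variables: z_i picks the component for each sample.
z = rng.choice(len(alpha), size=n, p=alpha)
# Observed data: x_i drawn from the Gaussian selected by z_i.
x = rng.normal(loc=mu[z], scale=sigma[z])
```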
Then we can use MLE to estimate $\Theta$:
$$\hat{\Theta}_{\mathrm{MLE}} = \arg\max_{\Theta} \log p(X \mid \Theta) = \arg\max_{\Theta} \sum_{i=1}^{n} \log \left( \sum_{l=1}^{k} \alpha_l \, \mathcal{N}(x_i \mid \mu_l, \Sigma_l) \right).$$
This formula is difficult to solve because it is in 'log-of-sum' form: the logarithm acts on a sum over components, so setting the derivatives to zero yields no closed-form solution. Instead, we solve the problem in an iterative way, called Expectation-Maximization (EM).
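To make the 'log-of-sum' structure concrete, here is a small sketch that evaluates this log-likelihood with a numerically stable log-sum-exp, continuing the hypothetical example above (it reuses `x`, `alpha`, `mu`, `sigma` and the NumPy import from the previous snippet).

```python
from scipy.special import logsumexp
from scipy.stats import norm

def gmm_log_likelihood(x, alpha, mu, sigma):
    # log_p[i, l] = log alpha_l + log N(x_i | mu_l, sigma_l)
    log_p = np.log(alpha) + norm.logpdf(x[:, None], loc=mu, scale=sigma)
    # For each x_i: log sum_l exp(log_p[i, l]) -- the "log of sum" term.
    return logsumexp(log_p, axis=1).sum()

print(gmm_log_likelihood(x, alpha, mu, sigma))
```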
Expectation Maximization
Instead of performing
$$\hat{\Theta}_{\mathrm{MLE}} = \arg\max_{\Theta} \log p(X \mid \Theta),$$
we introduce a latent variable $Z$ into the model, such that we generate a series of estimates $\Theta^{(1)}, \Theta^{(2)}, \dots, \Theta^{(t)}, \dots$ For each iteration of the EM algorithm, we perform:
$$\Theta^{(t+1)} = \arg\max_{\Theta} \int_Z \log p(X, Z \mid \Theta) \, p(Z \mid X, \Theta^{(t)}) \, dZ = \arg\max_{\Theta} \, \mathbb{E}_{Z \mid X, \Theta^{(t)}} \big[ \log p(X, Z \mid \Theta) \big].$$
We must ensure convergence, i.e. that each iteration does not decrease the observed-data log-likelihood:
$$\log p(X \mid \Theta^{(t+1)}) \ge \log p(X \mid \Theta^{(t)}).$$
Proof: Since $p(X \mid \Theta) = p(X, Z \mid \Theta) / p(Z \mid X, \Theta)$, taking the logarithm and then the expectation with respect to $p(Z \mid X, \Theta^{(t)})$ (the left-hand side does not depend on $Z$) gives
$$\log p(X \mid \Theta) = \int_Z \log p(X, Z \mid \Theta) \, p(Z \mid X, \Theta^{(t)}) \, dZ - \int_Z \log p(Z \mid X, \Theta) \, p(Z \mid X, \Theta^{(t)}) \, dZ.$$
Denote
$$Q(\Theta, \Theta^{(t)}) = \int_Z \log p(X, Z \mid \Theta) \, p(Z \mid X, \Theta^{(t)}) \, dZ, \qquad H(\Theta, \Theta^{(t)}) = \int_Z \log p(Z \mid X, \Theta) \, p(Z \mid X, \Theta^{(t)}) \, dZ,$$
then we have
$$\log p(X \mid \Theta) = Q(\Theta, \Theta^{(t)}) - H(\Theta, \Theta^{(t)}).$$
Because $\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta, \Theta^{(t)})$, we have
$$Q(\Theta^{(t+1)}, \Theta^{(t)}) \ge Q(\Theta^{(t)}, \Theta^{(t)}) \quad \text{and} \quad H(\Theta^{(t+1)}, \Theta^{(t)}) \le H(\Theta^{(t)}, \Theta^{(t)});$$
the second inequality can be derived using Jensen's inequality:
$$H(\Theta^{(t+1)}, \Theta^{(t)}) - H(\Theta^{(t)}, \Theta^{(t)}) = \int_Z \log \frac{p(Z \mid X, \Theta^{(t+1)})}{p(Z \mid X, \Theta^{(t)})} \, p(Z \mid X, \Theta^{(t)}) \, dZ \le \log \int_Z p(Z \mid X, \Theta^{(t+1)}) \, dZ = \log 1 = 0.$$
Hence,
$$\log p(X \mid \Theta^{(t+1)}) = Q(\Theta^{(t+1)}, \Theta^{(t)}) - H(\Theta^{(t+1)}, \Theta^{(t)}) \ge Q(\Theta^{(t)}, \Theta^{(t)}) - H(\Theta^{(t)}, \Theta^{(t)}) = \log p(X \mid \Theta^{(t)}).$$
Using the EM algorithm to solve GMM
Now we put the GMM into this framework.
E-Step:
Define $p(X, Z \mid \Theta)$:
$$p(X, Z \mid \Theta) = \prod_{i=1}^{n} p(x_i, z_i \mid \Theta) = \prod_{i=1}^{n} \alpha_{z_i} \, \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}).$$
Define $p(Z \mid X, \Theta)$:
$$p(Z \mid X, \Theta) = \prod_{i=1}^{n} p(z_i \mid x_i, \Theta), \qquad p(z_i = l \mid x_i, \Theta) = \frac{\alpha_l \, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}{\sum_{j=1}^{k} \alpha_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}.$$
Then the expected complete-data log-likelihood is
$$Q(\Theta, \Theta^{(t)}) = \sum_{Z} \log p(X, Z \mid \Theta) \, p(Z \mid X, \Theta^{(t)}) = \sum_{i=1}^{n} \sum_{l=1}^{k} \big[ \log \alpha_l + \log \mathcal{N}(x_i \mid \mu_l, \Sigma_l) \big] \, p(z_i = l \mid x_i, \Theta^{(t)}).$$
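In code, the E-step amounts to computing the responsibilities $p(z_i = l \mid x_i, \Theta^{(t)})$. A minimal sketch, assuming the same hypothetical 1-D setup (and imports) as in the snippets above:

```python
def e_step(x, alpha, mu, sigma):
    # log of alpha_l * N(x_i | mu_l, sigma_l) for every (i, l) pair
    log_p = np.log(alpha) + norm.logpdf(x[:, None], loc=mu, scale=sigma)
    # Normalize over components: gamma[i, l] = p(z_i = l | x_i, Theta)
    gamma = np.exp(log_p - logsumexp(log_p, axis=1, keepdims=True))
    return gamma
```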
M-Step:
The first term contains only $\alpha$ and the second term contains only $\mu, \Sigma$, so we can maximize the two terms independently.
Maximizing over $\alpha$ means solving:
$$\max_{\alpha} \; \sum_{i=1}^{n} \sum_{l=1}^{k} \log \alpha_l \; p(z_i = l \mid x_i, \Theta^{(t)}),$$
subject to $\sum_{l=1}^{k} \alpha_l = 1$.
Solving this problem via a Lagrange multiplier, we have
$$\alpha_l^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} p(z_i = l \mid x_i, \Theta^{(t)}).$$
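For completeness, here is a sketch of that Lagrange-multiplier step, writing $\gamma_{il} = p(z_i = l \mid x_i, \Theta^{(t)})$ for brevity. Form the Lagrangian
$$\mathcal{L}(\alpha, \lambda) = \sum_{i=1}^{n} \sum_{l=1}^{k} \gamma_{il} \log \alpha_l + \lambda \Big( \sum_{l=1}^{k} \alpha_l - 1 \Big).$$
Setting $\partial \mathcal{L} / \partial \alpha_l = \sum_{i=1}^{n} \gamma_{il} / \alpha_l + \lambda = 0$ gives $\alpha_l = -\tfrac{1}{\lambda} \sum_{i=1}^{n} \gamma_{il}$; the constraint $\sum_l \alpha_l = 1$ then forces $\lambda = -\sum_{i}\sum_{l} \gamma_{il} = -n$, which yields the update above.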
Similarly, we can solve for $\mu$ and $\Sigma$.
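Putting the two steps together, here is a minimal sketch of the resulting EM loop for the hypothetical 1-D example used in the snippets above. The $\mu$ and $\sigma$ updates below are the standard responsibility-weighted mean and standard deviation; the initial guess is arbitrary.

```python
def m_step(x, gamma):
    # Effective number of points assigned to each component.
    n_l = gamma.sum(axis=0)
    alpha = n_l / len(x)                                   # mixture weights
    mu = (gamma * x[:, None]).sum(axis=0) / n_l            # weighted means
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / n_l
    sigma = np.sqrt(var)                                   # weighted std devs
    return alpha, mu, sigma

# A few EM iterations from a rough initial guess.
alpha_t = np.ones(3) / 3
mu_t = np.array([-1.0, 0.0, 1.0])
sigma_t = np.ones(3)
for _ in range(50):
    gamma = e_step(x, alpha_t, mu_t, sigma_t)
    alpha_t, mu_t, sigma_t = m_step(x, gamma)
print(alpha_t, mu_t, sigma_t)
```

Each pass through the loop performs one E-step followed by one M-step, so by the convergence argument above the log-likelihood of `x` never decreases from one iteration to the next.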