EM is typically used to compute maximum likelihood estimates given incomplete samples. The EM algorithm estimates the parameters of a model iteratively.
Starting from some initial guess, each iteration consists of
- an E step (Expectation step)
- an M step (Maximization step)
EM Applications
- Filling in missing data in samples
- Discovering the value of latent variables
- Estimating the parameters of HMMs
- Estimating parameters of finite mixtures
- Unsupervised learning of clusters
Silly Example
Let events be “grades in a class”:
event | likelihood |
---|---|
w1 = Gets an A | P(A) = 1/2 |
w2 = Gets a B | P(B) = μ |
w3 = Gets a C | P(C) = 2μ |
w4 = Gets a D | P(D) = 1/2 − 3μ |
(Note 0 ≤ μ ≤ 1/6.)
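The table above defines a valid distribution for any μ in the allowed range; a minimal sketch (the helper name `grade_probs` is ours, not from the notes):

```python
def grade_probs(mu):
    # Grade distribution from the table; the constraint 0 <= mu <= 1/6
    # keeps every probability non-negative.
    assert 0 <= mu <= 1/6, "the table requires 0 <= mu <= 1/6"
    return {"A": 1/2, "B": mu, "C": 2 * mu, "D": 1/2 - 3 * mu}
```

For any legal μ the four probabilities sum to 1, which is what makes the table a proper model of the class.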
Assume we want to estimate µ from data. In a given class there were
a A’s
b B’s
c C’s
d D’s
What is the maximum likelihood estimate of µ given a,b,c,d ?
Trivial Statistics
P(a, b, c, d | μ) = (1/2)^a (μ)^b (2μ)^c (1/2 − 3μ)^d
log P(a, b, c, d | μ) = a log(1/2) + b log μ + c log(2μ) + d log(1/2 − 3μ)
For the maximum-likelihood μ, set ∂logP/∂μ = 0:
∂logP/∂μ = b/μ + c/μ − 3d/(1/2 − 3μ) = 0
which gives the maximum-likelihood estimate μ = (b + c) / (6(b + c + d)).
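The closed-form estimate can be sketched directly (the function name `mle_mu` is ours, not from the notes):

```python
def mle_mu(a, b, c, d):
    # Setting d(log P)/d(mu) = b/mu + c/mu - 3d/(1/2 - 3mu) to zero
    # and solving yields mu = (b + c) / (6 (b + c + d)).
    return (b + c) / (6 * (b + c + d))
```

Note that the count of A's drops out: the a log(1/2) term is constant in μ, so only b, c, d inform the estimate.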
Same Problem with Hidden Information
Someone tells us that
Number of High grades (A’s + B’s) = h
Number of C’s = c
Number of D’s = d
What is the max. like estimate of µ now?
We can answer this question circularly:
EXPECTATION
If we know the value of µ we could compute the expected value of a and b
a : b = 1/2 : μ, so
a = (1/2)/(1/2 + μ) · h  and  b = μ/(1/2 + μ) · h
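This E step can be sketched as follows (the function name `expected_a_b` is ours, not from the notes): the h observed high grades are split into expected counts of A's and B's in the ratio P(A) : P(B) = 1/2 : μ.

```python
def expected_a_b(mu, h):
    # Expected counts given mu: distribute h in proportion 1/2 : mu.
    a = (1/2) / (1/2 + mu) * h
    b = mu / (1/2 + mu) * h
    return a, b
```

By construction the two expected counts sum back to h, and a/b equals (1/2)/μ as required.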