EM Algorithm
The EM (expectation-maximization) algorithm is an iterative optimization algorithm. It does not apply to all optimization problems.
Remember: EM always converges, but possibly only to a local optimum.
EM applies to GMMs, K-Means clustering, HMMs, etc.
Jensen’s Inequality
Convex function: $f,\ f'' > 0$; concave function: $f,\ f'' < 0$.

$$E[f(x)] \geq f(E[x]) \quad\quad f \text{ is convex}$$

$$E[f(x)] \leq f(E[x]) \quad\quad f \text{ is concave}$$

The equal sign holds if and only if $x$ is constant.
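A quick numerical sanity check of Jensen's inequality, using the convex function $f(x) = x^2$ (so $f'' = 2 > 0$) and samples from a normal distribution (the distribution and sample size here are arbitrary choices for illustration):

```python
import random
import statistics

random.seed(0)
xs = [random.gauss(1.0, 2.0) for _ in range(100_000)]

def f(x):
    # f(x) = x^2 is convex: f''(x) = 2 > 0
    return x * x

E_fx = statistics.mean(f(x) for x in xs)   # E[f(x)]
f_Ex = f(statistics.mean(xs))              # f(E[x])

print(E_fx >= f_Ex)  # True: E[f(x)] >= f(E[x]) for a convex f
```

For this choice of $f$ the gap $E[x^2] - (E[x])^2$ is exactly the variance of $x$, which also explains the equality condition: the gap vanishes only when $x$ is constant.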
EM
$\{z_i\}_{i=1\sim N}$ — the latent variables; $x_i$ belongs to the $(z_i)^{th}$ distribution.
$Q_i(z_i)$ — the probability distribution of $z_i$. (You can also turn $z_i$ into a vector in which each dimension represents a probability.)
MLE: $\max\limits_{\theta} E(\theta)$

$$E(\theta) = \prod\limits_{i=1}^N P(x_i|\theta) = \prod\limits_{i=1}^N \Big[\sum\limits_{z_i}P(x_i, z_i | \theta)\Big]$$
$$l(\theta) = \sum\limits_{i=1}^N\log\Big[\sum\limits_{z_i} P(x_i,z_i|\theta)\Big]$$
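To make the marginal log-likelihood concrete, here is a small sketch that evaluates $l(\theta)$ for a hypothetical two-component 1-D Gaussian mixture, where $P(x_i, z_i{=}k|\theta) = \pi_k\,\mathcal{N}(x_i;\mu_k,\sigma_k)$ (all parameter values below are illustrative, not from the text):

```python
import math

def normal_pdf(x, mu, sigma):
    # density of N(mu, sigma^2) at x
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def log_likelihood(xs, pi, mu, sigma):
    # l(theta) = sum_i log[ sum_{z_i} P(x_i, z_i | theta) ]
    # with joint P(x_i, z_i=k | theta) = pi_k * N(x_i; mu_k, sigma_k)
    return sum(
        math.log(sum(p * normal_pdf(x, m, s) for p, m, s in zip(pi, mu, sigma)))
        for x in xs
    )

ll = log_likelihood([-1.0, 0.5, 2.0], pi=[0.5, 0.5], mu=[-1.0, 2.0], sigma=[1.0, 1.0])
print(ll)
```

The log of a sum over $z_i$ is exactly what makes direct maximization awkward, and what the E-step's lower bound works around.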
$\theta_k$ — the parameter after the $k^{th}$ iteration.
$$
\begin{aligned}
&randomize\ \theta_0 \\
&while(!converge)\ \{ \\
&\quad\ E\text{-}step: \\
&\qquad\quad Q_i(z_i) = \frac{P(x_i, z_i|\theta_k)}{\sum\limits_{z_i} P(x_i,z_i|\theta_k)} \\
&\quad\ M\text{-}step: \\
&\qquad\quad \theta_{k+1} = \arg\max\limits_{\theta}\sum\limits_{i=1}^N\sum\limits_{z_i} Q_i(z_i)\log \frac{P(x_i,z_i|\theta)}{Q_i(z_i)} \\
&\}
\end{aligned}
$$
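The E/M loop above can be sketched for a 1-D, two-component Gaussian mixture. This is a minimal illustration, not a production implementation: variable names (`pi`, `mu`, `sigma`) are my own, the convergence test is replaced by a fixed iteration count, and the M-step uses the closed-form GMM updates that maximize the weighted complete-data log-likelihood:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    # density of N(mu, sigma^2) at x
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_gmm(xs, iters=50):
    # randomize theta_0: mixture weights, means, std-devs of the two components
    pi = [0.5, 0.5]
    mu = [min(xs), max(xs)]
    sigma = [1.0, 1.0]
    for _ in range(iters):  # fixed iterations stand in for while(!converge)
        # E-step: Q_i(z_i) = P(x_i, z_i | theta_k) / sum_{z_i} P(x_i, z_i | theta_k)
        Q = []
        for x in xs:
            joint = [pi[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = sum(joint)
            Q.append([j / total for j in joint])
        # M-step: closed-form argmax of sum_i sum_{z_i} Q_i(z_i) log P(x_i, z_i | theta)
        for k in range(2):
            nk = sum(q[k] for q in Q)
            pi[k] = nk / len(xs)
            mu[k] = sum(q[k] * x for q, x in zip(Q, xs)) / nk
            sigma[k] = math.sqrt(
                sum(q[k] * (x - mu[k]) ** 2 for q, x in zip(Q, xs)) / nk
            ) or 1e-6  # guard against a collapsed component
    return pi, mu, sigma

random.seed(0)
xs = [random.gauss(-3, 1) for _ in range(300)] + [random.gauss(3, 1) for _ in range(300)]
pi, mu, sigma = em_gmm(xs)
print(sorted(mu))  # the two estimated means should land near -3 and 3
```

Note that the $Q_i(z_i)$ term in the denominator of the M-step objective does not depend on $\theta$, so it drops out of the argmax; the code maximizes only the numerator's expectation.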
Jensen's inequality is applied in the M-step and in the proof of convergence. More details