Maximum-Likelihood Expectation-Maximization (ML-EM)

I. Notations

$X = \{x_1, x_2, \dots, x_N\}$ : i.i.d. observed variables

$Z = \{z_1, z_2, \dots, z_N\}$ : latent variables

$\Theta^{(t)}$ : the estimate of the parameters at iteration $t$

$\ell(\Theta)$ : the marginal log-likelihood $\log p(X \mid \Theta)$

II. Derivations

1. Maximum-Likelihood:

$$\hat{\Theta} = \arg\max_\Theta \log p(X \mid \Theta) = \arg\max_\Theta \sum_{i=1}^{N} \log p(x_i \mid \Theta) = \arg\max_\Theta \sum_{i=1}^{N} \log \sum_{k=1}^{K} P(x_i, z=k \mid \Theta)$$

which is hard to optimize with gradient methods, because the sum over latent assignments sits inside the logarithm. Introducing an arbitrary density $q$ over $Z$, the log-likelihood can be rewritten as

$$\ell(\Theta) = \log p(X \mid \Theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} P(x_i, z=k \mid \Theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} q(z=k \mid x_i, \Theta) \frac{P(x_i, z=k \mid \Theta)}{q(z=k \mid x_i, \Theta)}$$

$$\geq \sum_{i=1}^{N} \sum_{k=1}^{K} q(z=k \mid x_i, \Theta) \log \frac{P(x_i, z=k \mid \Theta)}{q(z=k \mid x_i, \Theta)} \triangleq Q(q, \Theta)$$

where $q(z \mid x, \Theta)$ is an arbitrary density over $Z$, and the inequality follows from Jensen's inequality, i.e., $\mathbb{E}[f(x)] \geq f(\mathbb{E}[x])$ for a convex function and $\mathbb{E}[f(x)] \leq f(\mathbb{E}[x])$ for a concave function. Here $f(x) = \log(x)$ is a concave function.
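As a quick numerical sanity check of the concave case, here is a minimal Python sketch (assuming NumPy; the Gamma distribution and sample size are arbitrary choices for illustration):

```python
import numpy as np

# Check Jensen's inequality for the concave f(x) = log(x): E[log x] <= log E[x].
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.5, size=100_000)  # arbitrary positive samples

lhs = np.mean(np.log(x))  # E[log x]
rhs = np.log(np.mean(x))  # log E[x]
print(f"E[log x] = {lhs:.4f} <= log E[x] = {rhs:.4f}")
assert lhs <= rhs
```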

2. Expectation-Maximization:

Thus we have a lower bound on the target function $\ell(\Theta)$. Instead of maximizing $\ell(\Theta)$ directly, EM maximizes the lower bound $Q(q, \Theta)$ via coordinate ascent:

$$\text{E-step:} \quad q^{(t+1)} = \arg\max_q Q(q, \Theta^{(t)})$$

$$\text{M-step:} \quad \Theta^{(t+1)} = \arg\max_\Theta Q(q^{(t+1)}, \Theta)$$
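Before deriving each step in closed form, note that the alternation is just a short loop. A minimal sketch, where `e_step`, `m_step`, and `log_likelihood` are hypothetical callables standing in for the model-specific computations:

```python
def em(x, theta0, e_step, m_step, log_likelihood, max_iter=100, tol=1e-6):
    """Generic EM coordinate ascent on the lower bound Q(q, theta).

    e_step(x, theta) -> q   : argmax_q Q(q, theta) for fixed theta
    m_step(x, q) -> theta   : argmax_theta Q(q, theta) for fixed q
    log_likelihood(x, theta): marginal log-likelihood, used to monitor convergence
    """
    theta, prev_ll = theta0, -float("inf")
    for _ in range(max_iter):
        q = e_step(x, theta)    # E-step: tighten the bound at the current theta
        theta = m_step(x, q)    # M-step: maximize the bound over theta
        ll = log_likelihood(x, theta)
        if ll - prev_ll < tol:  # EM guarantees ll is non-decreasing
            break
        prev_ll = ll
    return theta
```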

E-Step: compute $q^{(t+1)} = \arg\max_q Q(q, \Theta^{(t)})$ subject to the constraint $\sum_{k=1}^{K} q(z=k \mid x, \Theta) = 1$ ($q$ is an arbitrary density function over $Z$). Introducing the Lagrange multiplier $\lambda$, we define

$$G(q) = \lambda\left(1 - \sum_{k=1}^{K} q(z=k \mid x, \Theta)\right) + \sum_{k=1}^{K} q(z=k \mid x, \Theta) \log P(x, z=k \mid \Theta) - \sum_{k=1}^{K} q(z=k \mid x, \Theta) \log q(z=k \mid x, \Theta)$$

$$\frac{\partial G(q)}{\partial q(z=k \mid x, \Theta)} = -\lambda + \log P(x, z=k \mid \Theta) - \log q(z=k \mid x, \Theta) - 1 = 0$$

$$\implies q(z=k \mid x, \Theta) \propto P(x, z=k \mid \Theta) \implies q(z=k \mid x, \Theta) = \frac{P(x, z=k \mid \Theta)}{\sum_{k'=1}^{K} P(x, z=k' \mid \Theta)} = P(z=k \mid x, \Theta)$$

Thus $q = P(z \mid x, \Theta)$ gives the tightest lower bound of $\ell(\Theta)$; indeed, substituting this choice back into $Q$ makes the bound hold with equality, $Q(q^{(t+1)}, \Theta^{(t)}) = \ell(\Theta^{(t)})$.

M-Step: update the parameters $\Theta$ with

$$\Theta^{(t+1)} = \arg\max_\Theta \sum_{i=1}^{N} \sum_{k=1}^{K} P(z=k \mid x_i, \Theta^{(t)}) \log \frac{P(x_i, z=k \mid \Theta)}{P(z=k \mid x_i, \Theta^{(t)})}$$

$$= \arg\max_\Theta \sum_{i=1}^{N} \sum_{k=1}^{K} P(z=k \mid x_i, \Theta^{(t)}) \log P(x_i, z=k \mid \Theta)$$

where the denominator drops out in the second line because it does not depend on $\Theta$.

III. Applications to Gaussian Mixture Models (GMMs)

For general mixture models, we have

$$P(x \mid \Theta) = \sum_{k=1}^{K} P(x, z=k \mid \Theta) = \sum_{k=1}^{K} P(z=k \mid \Theta) P(x \mid z=k, \Theta)$$

For Gaussian Mixture Models (GMMs), we have

$$P(x \mid \Theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \quad \text{with} \quad \sum_{k=1}^{K} \pi_k = 1$$
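In code, the mixture density is a weighted sum of Gaussian densities. A minimal sketch, assuming `scipy.stats.multivariate_normal` for $\mathcal{N}(x \mid \mu_k, \Sigma_k)$:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(X, pi, mu, Sigma):
    """P(x | Theta) = sum_k pi_k * N(x | mu_k, Sigma_k), evaluated at each row of X.

    X: (N, d) data; pi: (K,) mixing weights; mu: (K, d) means; Sigma: (K, d, d) covariances.
    """
    return sum(pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
               for k in range(len(pi)))
```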

E-Step for GMMs:

Define $q_{i,k} \triangleq P(z=k \mid x_i, \Theta)$, then

$$q_{i,k} = \frac{P(z=k, x_i \mid \Theta)}{\sum_{k'=1}^{K} P(z=k', x_i \mid \Theta)} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{k'=1}^{K} \pi_{k'} \, \mathcal{N}(x_i \mid \mu_{k'}, \Sigma_{k'})}$$
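A minimal NumPy sketch of this E-step, with variable names mirroring the formula above (again assuming `scipy.stats.multivariate_normal`; a numerically robust version would work in log space):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pi, mu, Sigma):
    """Responsibilities q[i, k] = P(z=k | x_i, Theta) for a GMM.

    X: (N, d) data; pi: (K,) mixing weights; mu: (K, d) means; Sigma: (K, d, d) covariances.
    """
    N, K = X.shape[0], pi.shape[0]
    q = np.empty((N, K))
    for k in range(K):
        # Unnormalized responsibility: pi_k * N(x_i | mu_k, Sigma_k)
        q[:, k] = pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
    q /= q.sum(axis=1, keepdims=True)  # normalize over k
    return q
```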

M-Step for GMMs:
update the parameters by maximizing the expected complete-data log-likelihood below

$$\Theta^{(t+1)} = \arg\max_\Theta \sum_{i=1}^{N} \sum_{k=1}^{K} q_{i,k} \log P(x_i, z=k \mid \Theta) = \arg\max_\Theta \sum_{i=1}^{N} \sum_{k=1}^{K} q_{i,k} \log\left(\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)\right)$$

subject to $\sum_{k=1}^{K} \pi_k = 1$. Introducing the Lagrange multiplier into the objective function, we thus have

$$G(\Theta, \lambda) = \lambda\left(1 - \sum_{k=1}^{K} \pi_k\right) + \sum_{i=1}^{N} \sum_{k=1}^{K} q_{i,k} \log \pi_k + \sum_{i=1}^{N} \sum_{k=1}^{K} q_{i,k} \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$

$$\frac{\partial G(\Theta, \lambda)}{\partial \pi_k} = 0 \;\text{ and }\; \sum_{k=1}^{K} \pi_k = 1 \implies \pi_k = \frac{\sum_{i=1}^{N} q_{i,k}}{\sum_{k'=1}^{K} \sum_{i=1}^{N} q_{i,k'}} = \frac{\sum_{i=1}^{N} q_{i,k}}{N}$$

$$\frac{\partial G(\Theta, \lambda)}{\partial \mu_k} = 0 \implies \mu_k = \frac{\sum_{i=1}^{N} q_{i,k}\, x_i}{\sum_{i=1}^{N} q_{i,k}}$$

$$\frac{\partial G(\Theta, \lambda)}{\partial \Sigma_k} = 0 \implies \Sigma_k = \frac{\sum_{i=1}^{N} q_{i,k}\, (x_i - \mu_k)(x_i - \mu_k)^{\mathsf{T}}}{\sum_{i=1}^{N} q_{i,k}}$$
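All three closed-form updates translate directly into NumPy. A minimal sketch, consuming the responsibilities `q` produced by the `e_step` above (initialization and numerical safeguards such as covariance regularization are omitted):

```python
import numpy as np

def m_step(X, q):
    """Closed-form M-step for a GMM given responsibilities q of shape (N, K)."""
    N, d = X.shape
    K = q.shape[1]
    Nk = q.sum(axis=0)            # effective counts per component, shape (K,)
    pi = Nk / N                   # pi_k = sum_i q_ik / N
    mu = (q.T @ X) / Nk[:, None]  # mu_k = sum_i q_ik x_i / sum_i q_ik
    Sigma = np.empty((K, d, d))
    for k in range(K):
        Xc = X - mu[k]            # centered data, shape (N, d)
        # Sigma_k = sum_i q_ik (x_i - mu_k)(x_i - mu_k)^T / sum_i q_ik
        Sigma[k] = (q[:, k, None] * Xc).T @ Xc / Nk[k]
    return pi, mu, Sigma
```

Alternating `e_step` and `m_step` from an initial guess of $(\pi, \mu, \Sigma)$ gives the full EM loop for GMMs; by the argument above, each iteration cannot decrease the marginal log-likelihood $\ell(\Theta)$.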
