EM Algorithm

I would like to write down my understanding of the EM algorithm.

This post is mainly based on Richard Xu’s machine learning course.

Gaussian Mixture Model

A Gaussian Mixture Model (GMM) with k mixture components is defined as:

$$p(X|\Theta)=\sum_{l=1}^{k}\alpha_l\,\mathcal{N}(X|\mu_l,\Sigma_l)\tag{1}$$

$$\sum_{l=1}^{k}\alpha_l=1\tag{2}$$

and

$$\Theta=\{\alpha_1,\dots,\alpha_k,\mu_1,\dots,\mu_k,\Sigma_1,\dots,\Sigma_k\}\tag{3}$$

For data $X=\{x_1,\dots,x_n\}$, we introduce a latent variable $Z=\{z_1,\dots,z_n\}$, where each $z_i$ indicates which mixture component $x_i$ belongs to. (Introducing the latent variable must not change the marginal distribution $p(X)$.)
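As a concrete illustration of (1)-(3), here is a minimal numpy/scipy sketch that evaluates a k-mixture density; the toy parameter values are my own invention:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, alphas, mus, sigmas):
    # p(x|Theta) = sum_l alpha_l * N(x | mu_l, Sigma_l), as in (1)
    return sum(a * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for a, mu, cov in zip(alphas, mus, sigmas))

# Toy 2-component mixture in 2D; the weights sum to 1 as required by (2).
alphas = [0.3, 0.7]
mus = [np.zeros(2), np.array([3.0, 3.0])]
sigmas = [np.eye(2), 2.0 * np.eye(2)]
print(gmm_density(np.array([1.0, 1.0]), alphas, mus, sigmas))
```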

Then we can use MLE to estimate Θ:

$$\Theta_{\mathrm{MLE}}=\arg\max_{\Theta}\left(\sum_{i=1}^{N}\log\left[\sum_{l=1}^{k}\alpha_l\,\mathcal{N}(x_i|\mu_l,\Sigma_l)\right]\right)\tag{4}$$

This formula is difficult to maximize directly because it is in 'log-of-sum' form: the sum over components sits inside the logarithm, so the terms do not decouple and there is no closed-form solution. Instead, we solve the problem iteratively with Expectation Maximization.
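To see the 'log-of-sum' obstacle concretely, here is a sketch of the objective in (4); the helper names are mine, and `logsumexp` is used only for numerical stability:

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, alphas, mus, sigmas):
    # log_w[i, l] = log alpha_l + log N(x_i | mu_l, Sigma_l)
    log_w = np.stack([np.log(a) + multivariate_normal.logpdf(X, mean=mu, cov=cov)
                      for a, mu, cov in zip(alphas, mus, sigmas)], axis=1)
    # The sum over l sits inside the log -- this coupling is what
    # prevents a closed-form maximizer.
    return logsumexp(log_w, axis=1).sum()
```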

Expectation Maximization

Instead of performing

$$\Theta_{\mathrm{MLE}}=\arg\max_{\Theta}\mathcal{L}(\Theta)=\arg\max_{\Theta}\log p(X|\Theta)\tag{5}$$

we introduce a latent variable $Z$ into the model and generate a sequence of estimates $\Theta=\{\Theta^{(1)},\Theta^{(2)},\dots,\Theta^{(t)}\}$.

For each iteration of the E-M algorithm, we perform:

$$\Theta^{(g+1)}=\arg\max_{\Theta}\int_{Z}\log\big(p(X,Z|\Theta)\big)\,p(Z|X,\Theta^{(g)})\,dZ\tag{6}$$
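Schematically, (6) leads to the following generic loop; `e_step`, `m_step`, and `log_likelihood` are placeholder callables of my own, to be filled in for a concrete model such as the GMM below:

```python
import numpy as np

def em(X, theta, e_step, m_step, log_likelihood, max_iters=100, tol=1e-6):
    # Generic EM loop for (6): alternate computing p(Z|X, theta^(g))
    # and maximizing the expected complete-data log-likelihood.
    prev_ll = -np.inf
    for _ in range(max_iters):
        posterior = e_step(X, theta)   # E: p(Z | X, theta^(g))
        theta = m_step(X, posterior)   # M: argmax_theta Q(theta, theta^(g))
        ll = log_likelihood(X, theta)
        if ll - prev_ll < tol:         # non-decreasing by (7)
            break
        prev_ll = ll
    return theta
```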

To justify the algorithm, we must show that each iteration never decreases the log-likelihood:

$$\log p(X|\Theta^{(g+1)})\ge\log p(X|\Theta^{(g)})\tag{7}$$

Proof: Take the expectation of both sides of $\log p(X|\Theta)=\log p(X,Z|\Theta)-\log p(Z|X,\Theta)$ with respect to $p(Z|X,\Theta^{(g)})$. Since $\log p(X|\Theta)$ does not depend on $Z$, the left-hand side is unchanged:

$$\mathbb{E}_{p(Z|X,\Theta^{(g)})}[\log p(X|\Theta)]=\mathbb{E}_{p(Z|X,\Theta^{(g)})}\big[\log p(X,Z|\Theta)-\log p(Z|X,\Theta)\big]\tag{8}$$

$$\log p(X|\Theta)=\int_{Z}\log p(X,Z|\Theta)\,p(Z|X,\Theta^{(g)})\,dZ-\int_{Z}\log p(Z|X,\Theta)\,p(Z|X,\Theta^{(g)})\,dZ\tag{9}$$

Denote

$$Q(\Theta,\Theta^{(g)})=\int_{Z}\log p(X,Z|\Theta)\,p(Z|X,\Theta^{(g)})\,dZ$$

$$H(\Theta,\Theta^{(g)})=\int_{Z}\log p(Z|X,\Theta)\,p(Z|X,\Theta^{(g)})\,dZ$$

then we have
$$\log p(X|\Theta)=Q(\Theta,\Theta^{(g)})-H(\Theta,\Theta^{(g)})\tag{10}$$

Because
$$Q(\Theta^{(g)},\Theta^{(g)})\le Q(\Theta^{(g+1)},\Theta^{(g)})\qquad H(\Theta^{(g)},\Theta^{(g)})\ge H(\Theta^{(g+1)},\Theta^{(g)})$$

The first inequality holds because $\Theta^{(g+1)}$ is chosen to maximize $Q(\cdot,\Theta^{(g)})$; the second inequality can be derived using Jensen's inequality.
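Spelled out, the Jensen step says the gap between the two $H$ terms is a Kullback-Leibler divergence, which is always non-negative:

$$H(\Theta^{(g)},\Theta^{(g)})-H(\Theta,\Theta^{(g)})=\int_{Z}p(Z|X,\Theta^{(g)})\log\frac{p(Z|X,\Theta^{(g)})}{p(Z|X,\Theta)}\,dZ=\mathrm{KL}\big(p(Z|X,\Theta^{(g)})\,\big\|\,p(Z|X,\Theta)\big)\ge 0$$

Non-negativity follows from Jensen's inequality applied to the concave logarithm: $-\mathrm{KL}(q\|p)=\int q\log\frac{p}{q}\le\log\int q\cdot\frac{p}{q}=\log 1=0$. Taking $\Theta=\Theta^{(g+1)}$ gives the second inequality above.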

Hence,

$$\log p(X|\Theta^{(g+1)})\ge\log p(X|\Theta^{(g)})\tag{11}$$

Using the EM Algorithm to Solve GMM

Putting the GMM into this framework:

$$\Theta^{(g+1)}=\arg\max_{\Theta}\big[Q(\Theta,\Theta^{(g)})\big]=\arg\max_{\Theta}\int_{Z}\log\big(p(X,Z|\Theta)\big)\,p(Z|X,\Theta^{(g)})\,dZ\tag{12}$$

E-Step:

Define $p(X,Z|\Theta)$:

$$p(X,Z|\Theta)=\prod_{i=1}^{n}p(x_i,z_i|\Theta)=\prod_{i=1}^{n}p(x_i|z_i,\Theta)\,p(z_i|\Theta)=\prod_{i=1}^{n}\alpha_{z_i}\,\mathcal{N}(x_i|\mu_{z_i},\Sigma_{z_i})\tag{13}$$

Define $p(Z|X,\Theta)$:

$$p(Z|X,\Theta)=\prod_{i=1}^{n}p(z_i|x_i,\Theta)=\prod_{i=1}^{n}\frac{\alpha_{z_i}\,\mathcal{N}(x_i|\mu_{z_i},\Sigma_{z_i})}{\sum_{l=1}^{k}\alpha_l\,\mathcal{N}(x_i|\mu_l,\Sigma_l)}\tag{14}$$

Then

$$\begin{aligned}Q(\Theta,\Theta^{(g)})&=\sum_{z_1=1}^{k}\sum_{z_2=1}^{k}\cdots\sum_{z_N=1}^{k}\left(\sum_{i=1}^{N}\big[\log\alpha_{z_i}+\log\mathcal{N}(x_i|\mu_{z_i},\Sigma_{z_i})\big]\right)\prod_{i=1}^{N}p(z_i|x_i,\Theta^{(g)})\\&=\sum_{i=1}^{N}\sum_{l=1}^{k}\big(\log\alpha_l+\log\mathcal{N}(x_i|\mu_l,\Sigma_l)\big)\,p(l|x_i,\Theta^{(g)})\tag{15}\end{aligned}$$

(The $N$-fold sum collapses because each bracketed term depends on only one $z_i$, and the posterior factors over the remaining $z_j$ sum to 1.)
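In code, the E-step just evaluates the responsibilities $p(l|x_i,\Theta^{(g)})$ from (14); a minimal numpy sketch, using the same parameter layout as the earlier snippets:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, alphas, mus, sigmas):
    # gamma[i, l] = p(l | x_i, Theta^(g)), the responsibilities of (14)
    weighted = np.stack([a * multivariate_normal.pdf(X, mean=mu, cov=cov)
                         for a, mu, cov in zip(alphas, mus, sigmas)], axis=1)
    return weighted / weighted.sum(axis=1, keepdims=True)
```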

M-Step:
$$Q(\Theta,\Theta^{(g)})=\sum_{i=1}^{N}\sum_{l=1}^{k}\log(\alpha_l)\,p(l|x_i,\Theta^{(g)})+\sum_{i=1}^{N}\sum_{l=1}^{k}\log\mathcal{N}(x_i|\mu_l,\Sigma_l)\,p(l|x_i,\Theta^{(g)})\tag{16}$$

The first term involves only α and the second term involves only μ and Σ, so we can maximize the two terms independently.

Maximizing with respect to α means solving

$$\frac{\partial}{\partial\alpha_l}\sum_{i=1}^{N}\sum_{l'=1}^{k}\log(\alpha_{l'})\,p(l'|x_i,\Theta^{(g)})=0,\qquad l=1,\dots,k\tag{17}$$

subject to $\sum_{l=1}^{k}\alpha_l=1$.

Solving this constrained problem with a Lagrange multiplier, we have

$$\alpha_l=\frac{1}{N}\sum_{i=1}^{N}p(l|x_i,\Theta^{(g)})\tag{18}$$
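For completeness, the Lagrange-multiplier step behind (18) can be spelled out. Form

$$\mathcal{L}(\alpha,\lambda)=\sum_{i=1}^{N}\sum_{l=1}^{k}\log(\alpha_l)\,p(l|x_i,\Theta^{(g)})+\lambda\left(\sum_{l=1}^{k}\alpha_l-1\right)$$

Setting $\partial\mathcal{L}/\partial\alpha_l=\frac{1}{\alpha_l}\sum_{i=1}^{N}p(l|x_i,\Theta^{(g)})+\lambda=0$ gives $\alpha_l=-\frac{1}{\lambda}\sum_{i=1}^{N}p(l|x_i,\Theta^{(g)})$. Summing over $l$ and using $\sum_{l=1}^{k}p(l|x_i,\Theta^{(g)})=1$ forces $\lambda=-N$, which is exactly (18).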

Similarly, we can solve for μ and Σ.
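For reference, maximizing the second term of (16) gives the standard weighted-mean and weighted-covariance updates:

$$\mu_l=\frac{\sum_{i=1}^{N}p(l|x_i,\Theta^{(g)})\,x_i}{\sum_{i=1}^{N}p(l|x_i,\Theta^{(g)})},\qquad\Sigma_l=\frac{\sum_{i=1}^{N}p(l|x_i,\Theta^{(g)})\,(x_i-\mu_l)(x_i-\mu_l)^{\top}}{\sum_{i=1}^{N}p(l|x_i,\Theta^{(g)})}$$

Putting all three updates together, here is a minimal M-step sketch in numpy (the naming is my own; `gamma` holds the responsibilities from the `e_step` sketch above):

```python
import numpy as np

def m_step(X, gamma):
    # gamma[i, l] = p(l | x_i, Theta^(g)) from the E-step
    N, k = gamma.shape
    Nl = gamma.sum(axis=0)             # effective sample count per component
    alphas = Nl / N                    # update (18)
    mus = (gamma.T @ X) / Nl[:, None]  # weighted means
    sigmas = []
    for l in range(k):
        d = X - mus[l]                 # center data at the component mean
        sigmas.append((gamma[:, l, None] * d).T @ d / Nl[l])
    return alphas, mus, sigmas
```

Alternating this `m_step` with the `e_step` above reproduces the full EM loop for the GMM.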
