Gaussian Mixture Models
Assume the data is generated by the following process:

$$z \sim \mathrm{Mult}(\pi)$$

$$x \mid z \sim N(\mu_z, \sigma_z^2)$$
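The two-stage process above can be sketched in a few lines of numpy (the parameter values here are illustrative, not from the text): first draw a component index $z$ from $\mathrm{Mult}(\pi)$, then draw $x$ from that component's Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.3, 0.5, 0.2])     # mixture weights, sum to 1
mu = np.array([-2.0, 0.0, 3.0])    # component means mu_k
sigma = np.array([0.5, 1.0, 0.8])  # component std devs sigma_k

def sample_gmm(n):
    z = rng.choice(len(pi), size=n, p=pi)  # z ~ Mult(pi)
    x = rng.normal(mu[z], sigma[z])        # x | z ~ N(mu_z, sigma_z^2)
    return x, z

x, z = sample_gmm(1000)
```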
Let $\theta = \{\pi, \mu_1, \dots, \mu_K, \sigma_1, \dots, \sigma_K\}$ denote all parameters of the model, and let the observed data be $D = \{x_i\}_{i=1}^N$. We estimate $\theta$ by maximum likelihood:
$$\log p(D;\theta) = \sum_{i=1}^N \log p(x_i;\theta)$$
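This log-likelihood can be evaluated numerically; a sketch (function and variable names are ours) that works in log space, using the log-sum-exp trick so distant components don't underflow:

```python
import numpy as np

def log_gaussian(x, mu, sigma):
    # elementwise log N(x; mu, sigma^2)
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def log_likelihood(x, pi, mu, sigma):
    # lp[i, k] = log pi_k + log N(x_i; mu_k, sigma_k^2)
    lp = np.log(pi)[None, :] + log_gaussian(x[:, None], mu[None, :], sigma[None, :])
    m = lp.max(axis=1, keepdims=True)            # log-sum-exp over k for each x_i
    log_px = m[:, 0] + np.log(np.exp(lp - m).sum(axis=1))
    return log_px.sum()                          # log p(D; theta)

x = np.array([0.1, 2.9, -1.0])
ll = log_likelihood(x, np.array([0.5, 0.5]), np.array([0.0, 3.0]), np.array([1.0, 1.0]))
```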
The EM Algorithm
Optimizing this objective directly is difficult: the log is applied to a sum of Gaussian densities, so the objective does not decompose into per-component terms and has no closed-form maximizer. Note, however, that
$$
\begin{aligned}
\log p(D;\theta) &= \sum_{i=1}^N \log p(x_i;\theta) \\
&= \sum_{i=1}^N \log\left(\sum_{z_i=1}^K p(x_i, z_i;\theta)\right) \\
&= \sum_{i=1}^N \log\left(\sum_{z_i=1}^K Q(z_i)\,\frac{p(x_i, z_i;\theta)}{Q(z_i)}\right) \\
&\geq \sum_{i=1}^N \sum_{z_i=1}^K Q(z_i)\log\left(\frac{p(x_i, z_i;\theta)}{Q(z_i)}\right)
\end{aligned}
$$

where $Q$ is any probability distribution over the $K$ components.
The last step follows from Jensen's inequality, using the concavity of $\log$. Jensen's inequality holds with equality when
$$\frac{p(x_i, z_i;\theta)}{Q(z_i)} = c \quad \text{for all } z_i$$
Since $Q(z_i)$ must sum to one over $z_i$, this means $Q(z_i) \propto p(x_i, z_i;\theta)$, so the bound is tight exactly when $Q(z_i) = p(z_i \mid x_i;\theta)$.
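The equality condition can be checked numerically. In this toy example (parameters chosen for illustration), the lower bound evaluated at the posterior $Q(z_i) = p(z_i \mid x_i;\theta)$ matches $\log p(D;\theta)$ exactly, while any other valid $Q$ falls strictly below it:

```python
import numpy as np

pi = np.array([0.4, 0.6])
mu = np.array([-1.0, 2.0])
sigma = np.array([1.0, 0.5])
x = np.array([-0.5, 0.3, 1.8])

def gauss(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

joint = pi[None, :] * gauss(x[:, None], mu[None, :], sigma[None, :])  # p(x_i, z_i; theta)
loglik = np.log(joint.sum(axis=1)).sum()                              # log p(D; theta)

def lower_bound(Q):
    # sum_i sum_k Q_i(k) log( p(x_i, k; theta) / Q_i(k) )
    return np.sum(Q * np.log(joint / Q))

posterior = joint / joint.sum(axis=1, keepdims=True)  # Q(z_i) = p(z_i | x_i; theta)
uniform = np.full_like(joint, 0.5)                    # some other valid Q
```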
We can therefore use coordinate ascent, alternately optimizing over $Q(z_i)$ and $\theta$:
E-step:
Fix $\theta$ and optimize $Q(z_i)$:

$$Q(z_i) = p(z_i \mid x_i;\theta)$$
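The E-step above can be sketched for the 1-D model (notation follows the text; the helper name is ours): with $\theta$ fixed, $Q(z_i) = p(z_i \mid x_i;\theta)$ is obtained by Bayes' rule, normalizing $p(x_i, z_i;\theta)$ over the $K$ components.

```python
import numpy as np

def e_step(x, pi, mu, sigma):
    # joint[i, k] = pi_k * N(x_i; mu_k, sigma_k^2) = p(x_i, z_i = k; theta)
    joint = (pi[None, :]
             * np.exp(-(x[:, None] - mu[None, :])**2 / (2 * sigma[None, :]**2))
             / np.sqrt(2 * np.pi * sigma[None, :]**2))
    # responsibilities: row i is the posterior over z_i, so each row sums to 1
    return joint / joint.sum(axis=1, keepdims=True)

Q = e_step(np.array([-0.5, 0.3, 1.8]),
           np.array([0.4, 0.6]), np.array([-1.0, 2.0]), np.array([1.0, 0.5]))
```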