The First Level of Understanding the EM Algorithm: Expectation (E) and Maximization (M), Part 2

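Preface: I first met the EM algorithm as a student getting started in machine learning, and at the time I felt there had to be some mathematical reasoning behind it that guarantees EM actually works. Recently I came across 史博's answer on Zhihu, "EM算法理解的九层境界" (Nine Levels of Understanding the EM Algorithm) [1], and realized how narrow my view had been. After some time relearning and rethinking, I now understand EM more deeply.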

4. Understanding the Form of the EM Algorithm Through Mathematical Derivation

Next we work through a rigorous derivation to see what the EM update formula actually means. Picking up the EM formula given earlier:
$$\theta^{(t+1)}=\arg\max_{\theta}Q(\theta|\theta^{(t)})=\arg\max_{\theta}E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)]$$
We derive it step by step.
First, the hidden variable is $Z=[z_1, z_2, \ldots, z_n]$ with $z_i\in\{0,1\}$. The component $z_i=1$ means round $i$ used coin $A$, and $z_i=0$ means round $i$ used coin $B$; there are $n$ rounds of tossing in total. The space of $Z$ is therefore $\{0, 1\}^n$, a discrete space of $2^n$ points. So we have:
$$E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)]=\sum_{Z \in \{0, 1\}^n}\left[P(Z|X, \theta^{(t)}) * \log L(\theta;X,Z)\right]$$
The rounds are independent of one another, so given $X$ and $\theta^{(t)}$ the posterior of $Z$ factorizes over the rounds:
$$P(Z|X, \theta^{(t)}) = \prod_{i=1}^{n}P(z_i|x_i, \theta^{(t)})$$
For each round $i$, given the observation $x_i \in [0, \delta]$ and the coin parameters $\theta^{(t)}$ (that is, $\theta_A^{(t)}$, $\theta_B^{(t)}$, $\theta_C^{(t)}$), we have:
$$
\begin{aligned}
P(z_i|x_i, \theta^{(t)}) & = \frac{P(z_i, x_i | \theta^{(t)})}{P(x_i| \theta^{(t)})} \\
& = \frac{P(z_i, x_i|\theta^{(t)})}{\sum_{z_i\in\{0,1\}} P(z_i, x_i| \theta^{(t)})}
\end{aligned}
$$
Given $\theta^{(t)}$, the joint distribution of $(z_i, x_i)$ can be written down directly. $P(z_i=1, x_i | \theta^{(t)})$ is the probability that round $i$ used coin $A$ and produced $x_i$ heads (the binomial coefficient $\binom{\delta}{x_i}$ is left out here; it is the same for both coins, so it cancels in the posterior below):
$$
\begin{aligned}
P(z_i=1, x_i | \theta^{(t)}) & = P(z_i=1 | \theta^{(t)}) * P(x_i | z_i=1, \theta^{(t)})\\
& = P(z_i=1 | \theta^{(t)})* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i} \\
& = \theta_C^{(t)} * (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}
\end{aligned}
$$

$P(z_i=0, x_i | \theta^{(t)})$ is the probability that round $i$ used coin $B$ and produced $x_i$ heads:

$$
\begin{aligned}
P(z_i=0, x_i| \theta^{(t)}) & = P(z_i=0|\theta^{(t)}) * P(x_i|z_i=0, \theta^{(t)})\\
& = P(z_i=0| \theta^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i} \\
& = (1-\theta_C^{(t)}) * (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}
\end{aligned}
$$

Next, given $x_i$ and $\theta^{(t)}$, the probability that round $i$ used coin $A$, $P(z_i=1|x_i, \theta^{(t)})$, and the probability that it used coin $B$, $P(z_i=0|x_i, \theta^{(t)})$, are:
$$
\begin{aligned}
P(z_i=1|x_i, \theta^{(t)}) & = \frac{P(z_i=1, x_i | \theta^{(t)})}{P(z_i=1, x_i | \theta^{(t)}) + P(z_i=0, x_i | \theta^{(t)})} \\
& = \frac{\theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}}{\theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}+(1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}}
\end{aligned}
$$

$$
\begin{aligned}
P(z_i=0|x_i, \theta^{(t)}) & = \frac{P(z_i=0, x_i|\theta^{(t)})}{P(z_i=1, x_i | \theta^{(t)}) + P(z_i=0, x_i | \theta^{(t)})} \\
& = \frac{(1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}}{\theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}+(1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}}
\end{aligned}
$$

The two expressions share the same denominator, which we denote by
$$\eta_i=\theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}+(1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}$$
Then:
$$P(z_i=1|x_i, \theta^{(t)})=\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}$$
$$P(z_i=0|x_i, \theta^{(t)})=\frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}$$
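The posterior for $z_i=1$ derived above is easy to evaluate numerically. The following is a minimal sketch, assuming $\delta$ is the number of tosses per round (as the exponents suggest); the function name `posterior_coin_a`, the data, and the parameter values are illustrative and do not come from the original post.

```python
import numpy as np

def posterior_coin_a(x, delta, theta_a, theta_b, theta_c):
    """Return P(z_i = 1 | x_i, theta^(t)) for every round i (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Joint P(z_i = 1, x_i | theta): pick coin A, then see x_i heads in delta tosses.
    joint_a = theta_c * theta_a**x * (1.0 - theta_a)**(delta - x)
    # Joint P(z_i = 0, x_i | theta): pick coin B, then see x_i heads in delta tosses.
    joint_b = (1.0 - theta_c) * theta_b**x * (1.0 - theta_b)**(delta - x)
    # eta_i is the shared denominator; the binomial coefficient would cancel here.
    eta = joint_a + joint_b
    return joint_a / eta

# Illustrative data: heads counts for 5 rounds of 10 tosses each (assumed values).
x = [5, 9, 8, 4, 7]
w = posterior_coin_a(x, delta=10, theta_a=0.6, theta_b=0.5, theta_c=0.5)
print(np.round(w, 3))
```

Because $\theta_A^{(t)} > \theta_B^{(t)}$ in this toy setting, rounds with more heads receive a larger posterior weight for coin $A$, and the posterior for coin $B$ is simply `1 - w`.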

We have now written down the distribution of $Z$ given $X$ and $\theta^{(t)}$. The next step is the complete-data log-likelihood $\log L(\theta;X,Z)$.
For the likelihood of a single round, with the prior $P(\theta)$ taken to be the constant 1, we have:
$$
\begin{aligned}
L(\theta; x_i, z_i) & = P(x_i|z_i, \theta) * P(z_i|\theta) * P(\theta) \\
&= (z_i*\theta_A+(1-z_i)\theta_B)^{x_i}*(1-z_i*\theta_A-(1-z_i)\theta_B)^{(\delta-x_i)} * \theta_C^{z_i} * (1-\theta_C)^{1-z_i} * 1
\end{aligned}
$$
Likewise, since the rounds are independent and identically distributed, we have:
$$
\begin{aligned}
L(\theta;X,Z) & =\prod_{i=1}^{n} L(\theta; x_i, z_i)\\
&=\prod_{i=1}^{n}(z_i*\theta_A+(1-z_i)\theta_B)^{x_i}*(1-z_i*\theta_A-(1-z_i)\theta_B)^{(\delta-x_i)}*\theta_C^{z_i} * (1-\theta_C)^{1-z_i}
\end{aligned}
$$
Taking the $\log$ gives:
$$
\begin{aligned}
\log L(\theta;X,Z) &= \log \prod_{i=1}^{n}(z_i*\theta_A+(1-z_i)\theta_B)^{x_i}*(1-z_i*\theta_A-(1-z_i)\theta_B)^{(\delta-x_i)}*\theta_C^{z_i} * (1-\theta_C)^{1-z_i}\\
&= \sum_{i=1}^{n}\left[x_i*\log(z_i*\theta_A+(1-z_i)\theta_B) + (\delta-x_i)*\log(1-z_i*\theta_A-(1-z_i)\theta_B)+z_i*\log \theta_C+(1-z_i)*\log (1-\theta_C)\right]
\end{aligned}
$$
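As a quick check of the sum form of $\log L(\theta;X,Z)$, the sketch below evaluates it for a fully specified assignment of coins. It assumes $\delta$ tosses per round and omits the binomial coefficient, as the derivation does; `complete_log_likelihood` and the sample values are hypothetical names and numbers chosen for illustration.

```python
import numpy as np

def complete_log_likelihood(x, z, delta, theta_a, theta_b, theta_c):
    """Sum over rounds of the per-round complete-data log-likelihood."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    # Heads probability of the coin actually used: theta_A if z_i = 1, else theta_B.
    p = z * theta_a + (1.0 - z) * theta_b
    per_round = (
        x * np.log(p)
        + (delta - x) * np.log(1.0 - p)
        + z * np.log(theta_c)
        + (1.0 - z) * np.log(1.0 - theta_c)
    )
    return float(per_round.sum())

# Two rounds of 10 tosses: round 1 used coin B, round 2 used coin A (assumed values).
ll = complete_log_likelihood([5, 9], [0, 1], 10, theta_a=0.6, theta_b=0.5, theta_c=0.5)
print(ll)
```

Working in log space keeps the computation stable when $n$ or $\delta$ is large, where the raw product of probabilities would underflow.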

OK, at this point both $\log L(\theta;X,Z)$ and $P(Z|X, \theta^{(t)})$ have been written out, so the next step is to take the expectation:
$$
\begin{aligned}
E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)] & =\sum_{Z \in \{0, 1\}^n}\left[P(Z|X, \theta^{(t)}) * \log L(\theta;X,Z)\right] \\
&=\sum_{Z \in \{0, 1\}^n} \prod_{i=1}^{n} P(z_i|x_i, \theta^{(t)})*\sum_{j=1}^{n}l(x_j, z_j, \theta)
\end{aligned}
$$
Note that to avoid a clash of index names, the summation index $i$ inside $\log L(\theta;X,Z)$ has been renamed to $j$. Here
$$l(x_j, z_j, \theta) = x_j*\log(z_j*\theta_A+(1-z_j)\theta_B) + (\delta-x_j)*\log(1-z_j*\theta_A-(1-z_j)\theta_B)+z_j*\log \theta_C+(1-z_j)*\log (1-\theta_C)$$
$$P(z_i=1|x_i, \theta^{(t)})=\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}$$
$$P(z_i=0|x_i, \theta^{(t)})=\frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}$$

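The resulting expression for $E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)]$ looks intimidating. It could be reduced to a simple form directly from the properties of expectation, but here we do not rely on those properties for independent distributions and instead simplify it with a few elementary manipulations.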

We split the space $Z \in \{0, 1\}^n$: keep $z_1$ on its own, collect $z_2, z_3, \ldots, z_n$ as $Z_2^n$, and separate the sum into the part with $z_1=1$ and the part with $z_1=0$. The expression above then decomposes as:
$$
\begin{aligned}
E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)] &= \sum_{z_1=1,\, Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=1}^{n} P(z_i|x_i, \theta^{(t)})*\sum_{j=1}^{n}l(x_j, z_j, \theta) \\
&+ \sum_{z_1=0,\, Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=1}^{n} P(z_i|x_i, \theta^{(t)})*\sum_{j=1}^{n}l(x_j, z_j, \theta) \\
&= \sum_{z_1=1,\, Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)})*P(z_1=1|x_1, \theta^{(t)})*\Big[l(x_1, z_1=1, \theta)+\sum_{j=2}^{n}l(x_j, z_j, \theta)\Big] \\
&+ \sum_{z_1=0,\, Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)})*P(z_1=0|x_1, \theta^{(t)})*\Big[l(x_1, z_1=0, \theta)+\sum_{j=2}^{n}l(x_j, z_j, \theta)\Big] \\
& = P(z_1=1|x_1, \theta^{(t)}) * l(x_1, z_1=1, \theta) * \sum_{Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)}) \\
& + P(z_1=1|x_1, \theta^{(t)}) * \sum_{Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)}) * \sum_{j=2}^{n}l(x_j, z_j, \theta)\\
& + P(z_1=0|x_1, \theta^{(t)}) * l(x_1, z_1=0, \theta) * \sum_{Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)}) \\
& + P(z_1=0|x_1, \theta^{(t)}) * \sum_{Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)}) * \sum_{j=2}^{n}l(x_j, z_j, \theta)
\end{aligned}
$$

Because a probability distribution sums to 1, we have:
$$P(z_1=1|x_1, \theta^{(t)})+P(z_1=0|x_1, \theta^{(t)})=1$$
$$\sum_{Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)})=\sum_{Z_2^n \in \{0, 1\}^{n-1}}P(Z_2^n|X_2^n, \theta^{(t)})=1$$
So we have:
$$
\begin{aligned}
E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)] &= P(z_1=1|x_1, \theta^{(t)}) * l(x_1, z_1=1, \theta) \\
&+ P(z_1=0|x_1, \theta^{(t)}) * l(x_1, z_1=0, \theta) \\
&+ \sum_{Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)}) * \sum_{j=2}^{n}l(x_j, z_j,\theta)\\
&= P(z_1=1|x_1, \theta^{(t)}) * l(x_1, z_1=1, \theta) \\
&+ P(z_1=0|x_1, \theta^{(t)}) * l(x_1, z_1=0, \theta) \\
&+ P(z_2=1|x_2, \theta^{(t)}) * l(x_2, z_2=1, \theta) \\
&+ P(z_2=0|x_2, \theta^{(t)}) * l(x_2, z_2=0, \theta) \\
&+ \sum_{Z_3^n \in \{0, 1\}^{n-2}} \prod_{i=3}^{n} P(z_i|x_i, \theta^{(t)}) * \sum_{j=3}^{n}l(x_j, z_j, \theta)\\
&= \cdots\\
&= \sum_{i=1}^n P(z_i=1|x_i, \theta^{(t)}) * l(x_i, z_i=1, \theta) + \sum_{i=1}^n P(z_i=0|x_i, \theta^{(t)}) * l(x_i, z_i=0, \theta)
\end{aligned}
$$
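The collapse from a sum over $2^n$ assignments to a sum over $n$ rounds can be checked numerically for small $n$. The sketch below enumerates every $Z \in \{0,1\}^n$ and compares the brute-force expectation with the per-round form; it assumes $\delta$ tosses per round, and all function names and sample values are hypothetical.

```python
import itertools
import numpy as np

def per_round_log_lik(x_i, z_i, delta, theta):
    """l(x_i, z_i, theta) for a single round; theta = (theta_A, theta_B, theta_C)."""
    theta_a, theta_b, theta_c = theta
    p = theta_a if z_i == 1 else theta_b          # heads probability of the coin used
    prior = theta_c if z_i == 1 else 1.0 - theta_c  # probability of picking that coin
    return x_i * np.log(p) + (delta - x_i) * np.log(1.0 - p) + np.log(prior)

def posterior(x_i, z_i, delta, theta_t):
    """P(z_i | x_i, theta^(t)); exp(l) is exactly the joint P(z_i, x_i | theta^(t))."""
    joint = {z: np.exp(per_round_log_lik(x_i, z, delta, theta_t)) for z in (0, 1)}
    return joint[z_i] / (joint[0] + joint[1])

def q_brute_force(x, delta, theta, theta_t):
    """Sum over all 2^n assignments Z of P(Z | X, theta^(t)) * log L(theta; X, Z)."""
    total = 0.0
    for z in itertools.product((0, 1), repeat=len(x)):
        p_z = np.prod([posterior(xi, zi, delta, theta_t) for xi, zi in zip(x, z)])
        log_l = sum(per_round_log_lik(xi, zi, delta, theta) for xi, zi in zip(x, z))
        total += p_z * log_l
    return total

def q_per_round(x, delta, theta, theta_t):
    """Simplified form: one posterior-weighted term per round and per coin."""
    return sum(
        posterior(xi, z, delta, theta_t) * per_round_log_lik(xi, z, delta, theta)
        for xi in x
        for z in (0, 1)
    )

x = [5, 9, 8, 4, 7]          # assumed heads counts, 10 tosses per round
theta_t = (0.6, 0.5, 0.5)    # current estimate theta^(t)
theta = (0.7, 0.4, 0.3)      # arbitrary candidate theta at which Q is evaluated
print(q_brute_force(x, 10, theta, theta_t), q_per_round(x, 10, theta, theta_t))
```

The two printed values should agree up to floating-point error, while the brute-force version costs $O(n \cdot 2^n)$ and the per-round version only $O(n)$.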
This finally gives the expression to be maximized:
$$E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)] = \sum_{i=1}^n P(z_i=1|x_i, \theta^{(t)}) * l(x_i, z_i=1, \theta) + \sum_{i=1}^n P(z_i=0|x_i, \theta^{(t)}) * l(x_i, z_i=0, \theta)$$
where
$$l(x_j, z_j, \theta) = x_j*\log(z_j*\theta_A+(1-z_j)\theta_B) + (\delta-x_j)*\log(1-z_j*\theta_A-(1-z_j)\theta_B)+z_j*\log \theta_C+(1-z_j)*\log (1-\theta_C)$$
$$P(z_i=1|x_i, \theta^{(t)})=\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}$$
$$P(z_i=0|x_i, \theta^{(t)})=\frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}$$
Substituting these expressions gives:
$$
\begin{aligned}
E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)] & = \sum_{i=1}^n P(z_i=1|x_i, \theta^{(t)}) * l(x_i, z_i=1, \theta) + \sum_{i=1}^n P(z_i=0|x_i, \theta^{(t)}) * l(x_i, z_i=0, \theta) \\
&= \sum_{i=1}^n \frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i} * [x_i*\log \theta_A + (\delta-x_i)*\log(1-\theta_A)+\log \theta_C] \\
&+\sum_{i=1}^n \frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i} * [x_i*\log \theta_B + (\delta-x_i)*\log(1-\theta_B)+\log (1-\theta_C)]
\end{aligned}
$$
where
$$\eta_i=\theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}+(1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}$$

In this expression every hidden variable in $Z$ has been summed out, so we can now solve the problem. We need to solve

$$
\begin{aligned}
\theta^{(t+1)}&=\arg\max_{\theta}Q(\theta|\theta^{(t)})=\arg\max_{\theta}E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)]\\
&=\arg\max_{\theta} \Big\{ \sum_{i=1}^n \frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i} * [x_i*\log \theta_A + (\delta-x_i)*\log(1-\theta_A)+\log \theta_C] \\
&+\sum_{i=1}^n \frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i} * [x_i*\log \theta_B + (\delta-x_i)*\log(1-\theta_B)+\log (1-\theta_C)] \Big\}
\end{aligned}
$$

Here $(\theta_A, \theta_B, \theta_C)$ are the variables to solve for, while $(\theta_A^{(t)}, \theta_B^{(t)}, \theta_C^{(t)})$ and $\eta_i$ are all constants. We solve the system of equations obtained by setting the partial derivatives to zero:
$$
\begin{cases}
\dfrac{\partial Q}{\partial \theta_A} = 0 \\
\dfrac{\partial Q}{\partial \theta_B} = 0 \\
\dfrac{\partial Q}{\partial \theta_C} = 0
\end{cases}
$$
Taking them one at a time:
$$
\begin{aligned}
\frac{\partial Q}{\partial \theta_A} &= \sum_{i=1}^n \frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i} * \left[x_i*\frac{1}{\theta_A} - (\delta-x_i)*\frac{1}{1-\theta_A}\right] \\
&=\sum_{i=1}^n \frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i} * \left[\frac{x_i-\delta*\theta_A}{\theta_A*(1-\theta_A)}\right]
\end{aligned}
$$
Setting $\frac{\partial Q}{\partial \theta_A}=0$, we have:
$$\theta_A^{(t+1)} = \frac{\sum_{i=1}^{n}\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}*x_i}{\sum_{i=1}^{n}\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}*\delta}$$
In the same way we obtain:
$$\theta_B^{(t+1)} = \frac{\sum_{i=1}^{n}\frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}*x_i}{\sum_{i=1}^{n}\frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}*\delta}$$
$$
\begin{aligned}
\theta_C^{(t+1)} &= \frac{\sum_{i=1}^{n}\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}}{\sum_{i=1}^{n}\frac{1}{\eta_i} [\theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}+(1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}]} \\
&=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}
\end{aligned}
$$
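The $\theta_C$ update is stated without its intermediate step, so here is a sketch of how it follows from $\frac{\partial Q}{\partial \theta_C}=0$. The shorthand $w_i$ is introduced only for this note and stands for the posterior $P(z_i=1|x_i,\theta^{(t)})$; only the $\log\theta_C$ and $\log(1-\theta_C)$ terms of $Q$ depend on $\theta_C$.

$$
\begin{aligned}
\frac{\partial Q}{\partial \theta_C} &= \sum_{i=1}^{n}\left[\frac{w_i}{\theta_C}-\frac{1-w_i}{1-\theta_C}\right] = \frac{\sum_{i=1}^{n} w_i - n\,\theta_C}{\theta_C\,(1-\theta_C)} = 0 \\
\Rightarrow\quad \theta_C^{(t+1)} &= \frac{1}{n}\sum_{i=1}^{n} w_i
\end{aligned}
$$

The factor $\frac{1}{n}$ appears because $w_i + (1-w_i) = 1$ for every round, so the denominator of the first form sums to $n$.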

Compare this with the intuitive result from earlier; the two are very close in form, and they coincide once $P(z_i=1|*)$ is read as the posterior $P(z_i=1|x_i, \theta^{(t)})$:
$$\hat\theta_A^{(t+1)} = \frac{\sum_{i=1}^{n}x_i * P(z_i=1|*)}{\sum_{i=1}^{n}\delta * P(z_i=1|*)}$$
$$\hat\theta_B^{(t+1)} = \frac{\sum_{i=1}^{n}x_i * P(z_i=0|*)}{\sum_{i=1}^{n}\delta * P(z_i=0|*)}$$
$$\hat\theta_C^{(t+1)} = \frac{\sum_{i=1}^{n} P(z_i=1|*)}{\sum_{i=1}^{n} [P(z_i=0|*)+P(z_i=1|*)]}=\frac{1}{n}\sum_{i=1}^{n} P(z_i=1|*)$$
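Putting the posterior and the three closed-form updates together gives one full EM iteration. The following is a minimal sketch rather than a reference implementation: it assumes $\delta$ tosses per round, and the function names `em_step` and `observed_log_likelihood`, the data, and the starting values are all illustrative choices that do not appear in the original.

```python
import numpy as np

def em_step(x, delta, theta_a, theta_b, theta_c):
    """One EM iteration: posterior weights (E-step), then closed-form updates (M-step)."""
    x = np.asarray(x, dtype=float)
    # E-step: w_i = P(z_i = 1 | x_i, theta^(t)).
    joint_a = theta_c * theta_a**x * (1.0 - theta_a)**(delta - x)
    joint_b = (1.0 - theta_c) * theta_b**x * (1.0 - theta_b)**(delta - x)
    w = joint_a / (joint_a + joint_b)
    # M-step: posterior-weighted heads divided by posterior-weighted tosses.
    new_a = np.sum(w * x) / np.sum(w * delta)
    new_b = np.sum((1.0 - w) * x) / np.sum((1.0 - w) * delta)
    # theta_C is the average posterior probability of having used coin A.
    new_c = np.mean(w)
    return float(new_a), float(new_b), float(new_c)

def observed_log_likelihood(x, delta, theta_a, theta_b, theta_c):
    """Sum of log eta_i at the given parameters (binomial coefficient omitted)."""
    x = np.asarray(x, dtype=float)
    eta = (theta_c * theta_a**x * (1.0 - theta_a)**(delta - x)
           + (1.0 - theta_c) * theta_b**x * (1.0 - theta_b)**(delta - x))
    return float(np.sum(np.log(eta)))

x = [5, 9, 8, 4, 7]        # assumed heads counts, 10 tosses per round
theta = (0.6, 0.5, 0.5)    # assumed starting point (theta_A, theta_B, theta_C)
for t in range(20):
    theta = em_step(x, 10, *theta)
    print(t + 1, np.round(theta, 4), round(observed_log_likelihood(x, 10, *theta), 6))
```

Printing the observed-data log-likelihood at each iteration is a useful sanity check: under EM it should never decrease from one iteration to the next.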

Closing Remarks

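This concludes the first level of understanding the EM algorithm. Along two lines, intuition and derivation, we have understood what the EM algorithm is in terms of its form, which answers the "What" question. Next comes the "Why" question, namely the necessity and feasibility of the EM algorithm, which is the subject of the second level: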

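  • Necessity: the original problem cannot be solved directly, or is very hard to solve.
  • Feasibility: by iterating, the EM algorithm "can" reach the theoretical optimum.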

References

[1] https://www.zhihu.com/question/40797593/answer/275171156
[2] Do C B, Batzoglou S. What is the expectation maximization algorithm? Nature Biotechnology, 2008, 26(8): 897-899.
