Derivation, Proof, and an Example of the EM Algorithm

EM

Derivation [1]

$\mathcal{X}=\{x_1, x_2, \cdots, x_N\}$: the observed data

$z$: the latent variable

The log-likelihood is

$$\log P(\mathcal{X};\theta)=\log \prod_i^N P(x_i;\theta) = \sum_i^N \log P(x_i;\theta) \tag{1}$$

Our goal is to find the parameter $\theta$ that maximizes it:

$$\underset{\theta}{\arg \max}\, \log P(\mathcal{X};\theta) \tag{2}$$
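As a quick numerical check of Eq. (1), the log of a product of per-sample likelihoods equals the sum of their logs; the probabilities below are made-up placeholders:

```python
import math

# Eq. (1): log of a product equals the sum of logs.
# Hypothetical per-sample likelihoods P(x_i; theta) for N = 4 samples.
p = [0.3, 0.8, 0.5, 0.9]

log_prod = math.log(math.prod(p))           # log of the product
sum_log = sum(math.log(v) for v in p)       # sum of the logs

assert abs(log_prod - sum_log) < 1e-12
```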

By the product rule,

$$P(x,z;\theta) = P(x;\theta)\,P(z|x;\theta) \tag{3}$$

so

$$\log P(x;\theta) = \log \frac{P(x,z;\theta)}{P(z|x;\theta)} = \log P(x,z;\theta) - \log P(z|x;\theta) \tag{4}$$
Suppose $z$ follows a distribution $Q(z;\phi)$. Adding and subtracting $\log Q(z;\phi)$ on the right-hand side gives

$$\begin{aligned} \log P(x;\theta) & = \log P(x,z;\theta) - \log P(z|x;\theta) + \log Q(z;\phi) - \log Q(z;\phi) \\ & = \big(\log P(x,z;\theta) - \log Q(z;\phi)\big) - \big(\log P(z|x;\theta) - \log Q(z;\phi)\big) \\ & = \log \frac{P(x,z;\theta)}{Q(z;\phi)} - \log \frac{P(z|x;\theta)}{Q(z;\phi)} \end{aligned} \tag{5}$$
Take the expectation of both sides of Eq. (5) with respect to $z \sim Q(z;\phi)$. The left-hand side is

$$\text{left} = \int_z Q(z;\phi) \log P(x;\theta)\, dz = \log P(x;\theta) \int_z Q(z;\phi)\, dz = \log P(x;\theta) \tag{6}$$

since $\int_z Q(z;\phi)\, dz = 1$.

$$\begin{aligned} \text{right} & = \int_z Q(z;\phi) \log \frac{P(x,z;\theta)}{Q(z;\phi)}\, dz - \int_z Q(z;\phi) \log \frac{P(z|x;\theta)}{Q(z;\phi)}\, dz \\ & = \text{ELBO} + \text{KL}(Q(z;\phi)\,\|\,P(z|x;\theta)) \end{aligned} \tag{7}$$

The first term of Eq. (7) is called the ELBO (evidence lower bound); the second term is a KL divergence.
We therefore obtain

$$\log P(x;\theta) = \text{ELBO} + \text{KL}(Q(z;\phi)\,\|\,P(z|x;\theta)) \tag{8}$$

Since $\text{KL}(\cdot) \ge 0$, we have $\log P(x;\theta) \ge \text{ELBO}$, with equality if and only if $Q(z;\phi) = P(z|x;\theta)$. The ELBO is thus a lower bound: by repeatedly raising it we push up $\log P(x;\theta)$, which achieves our goal of maximizing the likelihood.
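The decomposition in Eq. (8) can be checked numerically on a toy discrete model; the joint probabilities below are invented purely for illustration:

```python
import math

# Toy check of Eq. (8): log P(x) = ELBO + KL(Q || P(z|x)).
# Hypothetical discrete model: z in {0, 1}, one fixed observation x,
# with hand-picked joint probabilities P(x, z).
P_xz = [0.12, 0.28]                      # P(x, z=0), P(x, z=1)
P_x = sum(P_xz)                          # P(x), marginalizing out z
P_z_given_x = [p / P_x for p in P_xz]    # posterior P(z|x)

Q = [0.5, 0.5]                           # an arbitrary variational Q(z)

elbo = sum(q * math.log(pxz / q) for q, pxz in zip(Q, P_xz))
kl = sum(q * math.log(q / p) for q, p in zip(Q, P_z_given_x))

assert abs((elbo + kl) - math.log(P_x)) < 1e-12   # Eq. (8) holds
assert kl >= 0 and math.log(P_x) >= elbo           # ELBO is a lower bound
```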

Suppose we currently have $\theta^{(t)}$. To maximize the ELBO, we minimize $\text{KL}(Q(z;\phi)\,\|\,P(z|x;\theta))$:

$$\phi^{(t)} = \underset{\phi}{\arg \min}\, \text{KL}(Q(z;\phi)\,\|\,P(z|x;\theta^{(t)})) = \underset{\phi}{\arg \max}\, \text{ELBO}(\phi, \theta^{(t)}) \tag{9}$$

At the exact optimum $\phi^{(t)}$, we would have $Q(z;\phi^{(t)}) = P(z|x;\theta^{(t)})$. In practice the exact optimum is hard to reach; the aim is simply to make the ELBO as large as possible. Once the ELBO is in hand, we turn around and solve $\theta^{(t+1)} = \underset{\theta}{\arg \max}\, \text{ELBO}(\phi^{(t)}, \theta)$.

By alternating these two maximizations, we can approximately find the parameter $\theta$ that maximizes the likelihood.

Let us take a closer look at $\text{ELBO}(\phi^{(t)}, \theta)$:

$$\begin{aligned} \text{ELBO}(\phi^{(t)}, \theta) &= \int_z Q(z;\phi^{(t)}) \log \frac{P(x,z;\theta)}{Q(z;\phi^{(t)})}\, dz \\ &= \int_z Q(z;\phi^{(t)}) \log P(x,z;\theta)\, dz - \int_z Q(z;\phi^{(t)}) \log Q(z;\phi^{(t)})\, dz \\ &= E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)] - E_{z\sim Q(z;\phi^{(t)})}[\log Q(z;\phi^{(t)})] \end{aligned} \tag{10}$$

In the last line of Eq. (10), the first term is an expectation over $z$ and the second term is a constant ($\phi^{(t)}$ is known), so

$$\begin{aligned} \theta^{(t+1)} &= \underset{\theta}{\arg \max}\, \text{ELBO}(\phi^{(t)}, \theta) \\ &= \underset{\theta}{\arg \max}\, E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)] \end{aligned} \tag{11}$$

This is why EM is called the expectation-maximization algorithm.

To summarize, the EM iteration is:

  • E-step: fix $\theta^{(t)}$, solve $\phi^{(t)} = \underset{\phi}{\arg \max}\, \text{ELBO}(\phi, \theta^{(t)})$, i.e. fit $Q(z;\phi^{(t)})$ to the posterior $P(z|x;\theta^{(t)})$
  • M-step: fix $\phi^{(t)}$, solve $\theta^{(t+1)} = \underset{\theta}{\arg \max}\, E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)]$

The order of the E-step and the M-step can be swapped.

Convergence Proof of the EM Algorithm [1]

A simple argument: as long as every update $\theta^{(t)} \to \theta^{(t+1)}$ satisfies $\log P(x;\theta^{(t)}) \le \log P(x;\theta^{(t+1)})$, the likelihood never decreases and the algorithm converges.

Starting from Eq. (4), take the expectation of both sides with respect to $z \sim Q(z;\phi^{(t)})$:

$$\begin{aligned} \text{left} &= \int_z Q(z;\phi^{(t)}) \log P(x;\theta)\, dz \\ &= \log P(x;\theta) \int_z Q(z;\phi^{(t)})\, dz \\ &= \log P(x;\theta) \end{aligned} \tag{12}$$

$$\text{right} = \int_z Q(z;\phi^{(t)}) \log P(x,z;\theta)\, dz - \int_z Q(z;\phi^{(t)}) \log P(z|x;\theta)\, dz \tag{13}$$

Since $\phi^{(t)}$ is obtained from Eq. (9), assume it is the exact optimum, so that $Q(z;\phi^{(t)}) = P(z|x;\theta^{(t)})$. Substituting into Eq. (13) gives

$$\begin{aligned} \text{right} &= \int_z P(z|x;\theta^{(t)}) \log P(x,z;\theta)\, dz - \int_z P(z|x;\theta^{(t)}) \log P(z|x;\theta)\, dz \\ &= H_1(\theta, \theta^{(t)}) - H_2(\theta, \theta^{(t)}) \end{aligned} \tag{14}$$

where $H_1$ and $H_2$ denote the two terms of Eq. (14).

From Eqs. (12) and (14),

$$\log P(x;\theta) = H_1(\theta, \theta^{(t)}) - H_2(\theta, \theta^{(t)}) \tag{15}$$

Since $\theta^{(t+1)} = \underset{\theta}{\arg \max}\, E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)]$, we have $H_1(\theta^{(t+1)}, \theta^{(t)}) \ge H_1(\theta^{(t)}, \theta^{(t)})$. It remains to show that $-H_2(\theta^{(t+1)}, \theta^{(t)}) \ge -H_2(\theta^{(t)}, \theta^{(t)})$; together these prove $\log P(x;\theta^{(t+1)}) \ge \log P(x;\theta^{(t)})$.

Now,

$$\begin{aligned} & H_2(\theta^{(t+1)}, \theta^{(t)}) - H_2(\theta^{(t)}, \theta^{(t)}) \\ =\; & \int_z P(z|x;\theta^{(t)}) \log P(z|x;\theta^{(t+1)})\, dz - \int_z P(z|x;\theta^{(t)}) \log P(z|x;\theta^{(t)})\, dz \\ =\; & \int_z P(z|x;\theta^{(t)}) \log \frac{P(z|x;\theta^{(t+1)})}{P(z|x;\theta^{(t)})}\, dz \end{aligned} \tag{16}$$

We show that Eq. (16) is at most 0 in two ways.

Method 1: Eq. (16) is a negative KL divergence, $-\text{KL}(P(z|x;\theta^{(t)})\,\|\,P(z|x;\theta^{(t+1)})) \le 0$.

Method 2: Eq. (16) equals $E_{z \sim P(z|x;\theta^{(t)})}\big[\log \frac{P(z|x;\theta^{(t+1)})}{P(z|x;\theta^{(t)})}\big]$. By Jensen's inequality, $E[\log X] \le \log E[X]$, so

$$\begin{aligned} & E_{z \sim P(z|x;\theta^{(t)})}\left[\log \frac{P(z|x;\theta^{(t+1)})}{P(z|x;\theta^{(t)})}\right] \\ \le\; & \log E_{z \sim P(z|x;\theta^{(t)})}\left[\frac{P(z|x;\theta^{(t+1)})}{P(z|x;\theta^{(t)})}\right] \\ =\; & \log \int_z P(z|x;\theta^{(t)})\, \frac{P(z|x;\theta^{(t+1)})}{P(z|x;\theta^{(t)})}\, dz \\ =\; & \log \int_z P(z|x;\theta^{(t+1)})\, dz \\ =\; & \log 1 = 0 \end{aligned} \tag{17}$$
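The inequality in Eqs. (16)-(17) can be verified numerically: for any two distributions $p$ and $q$ (hypothetical stand-ins for the old and new posteriors), $E_p[\log(q/p)] \le \log E_p[q/p] = \log 1 = 0$.

```python
import math

# Numeric check of Eqs. (16)-(17) on made-up discrete posteriors.
# p plays the role of P(z|x; theta_t), q of P(z|x; theta_{t+1}).
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

lhs = sum(pi * math.log(qi / pi) for pi, qi in zip(p, q))    # E_p[log(q/p)]
rhs = math.log(sum(pi * (qi / pi) for pi, qi in zip(p, q)))  # log E_p[q/p]

assert lhs <= rhs          # Jensen's inequality
assert abs(rhs) < 1e-12    # log E_p[q/p] = log 1 = 0
```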

Example [2]

Coin tossing: we have two coins made of different materials, so their probabilities of landing heads differ. Given a single set of observations, we want to estimate each coin's heads probability. Five rounds were tossed, with five tosses per round. We do not know which coin was used in each round, so the problem gains a latent variable: the type of coin chosen in each round.

(Figure: the coin-tossing data; image from https://blog.csdn.net/u010834867/article/details/90762296)

Call the two coins A and B, with $P(\text{H}|A) = x_1$, $P(\text{T}|A) = 1-x_1$, $P(\text{H}|B) = x_2$, $P(\text{T}|B) = 1-x_2$.

Suppose round $i$ chooses coin A with probability $P(z_i = A) = y_i$ and coin B with probability $P(z_i = B) = 1-y_i$.

Writing the $j$-th toss of round $i$ as $x_{ij}$, the log-likelihood is

$$\log P(x) = \log \prod_i \prod_j P(x_{ij}) = \sum_i \sum_j \log P(x_{ij})$$

$$P(x_{ij}) = \frac{P(x,z)}{P(z|x)} = \frac{P(z)P(x|z)}{P(z|x)}$$

Here $P(z|x)$ is hard to compute directly, so we use the EM algorithm to solve for $x_1$ and $x_2$.

First, we write out the expectation:

$$\begin{aligned} & E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)] \\ =\; & y_1 \log\big(y_1\, x_1 x_1 (1-x_1) x_1 (1-x_1)\big) + (1-y_1)\log\big((1-y_1)\, x_2 x_2 (1-x_2) x_2 (1-x_2)\big) \\ +\; & y_2 \log\big(y_2\, (1-x_1)(1-x_1) x_1 x_1 (1-x_1)\big) + (1-y_2)\log\big((1-y_2)\, (1-x_2)(1-x_2) x_2 x_2 (1-x_2)\big) \\ +\; & y_3 \log\big(y_3\, x_1 (1-x_1)(1-x_1)(1-x_1)(1-x_1)\big) + (1-y_3)\log\big((1-y_3)\, x_2 (1-x_2)(1-x_2)(1-x_2)(1-x_2)\big) \\ +\; & y_4 \log\big(y_4\, x_1 (1-x_1)(1-x_1) x_1 x_1\big) + (1-y_4)\log\big((1-y_4)\, x_2 (1-x_2)(1-x_2) x_2 x_2\big) \\ +\; & y_5 \log\big(y_5\, (1-x_1) x_1 x_1 (1-x_1)(1-x_1)\big) + (1-y_5)\log\big((1-y_5)\, (1-x_2) x_2 x_2 (1-x_2)(1-x_2)\big) \end{aligned}$$

Assuming $x_1 = 0.2$ and $x_2 = 0.7$ and substituting into the expression above:

$$\begin{aligned} & E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)] \\ =\; & y_1 \log(0.00512\, y_1) + (1-y_1)\log(0.03087\,(1-y_1)) \\ +\; & y_2 \log(0.02048\, y_2) + (1-y_2)\log(0.01323\,(1-y_2)) \\ +\; & y_3 \log(0.08192\, y_3) + (1-y_3)\log(0.00567\,(1-y_3)) \\ +\; & y_4 \log(0.00512\, y_4) + (1-y_4)\log(0.03087\,(1-y_4)) \\ +\; & y_5 \log(0.02048\, y_5) + (1-y_5)\log(0.01323\,(1-y_5)) \end{aligned}$$
(Figure: round likelihoods when the heads probabilities are known; image from https://blog.csdn.net/u010834867/article/details/90762296)
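The per-round likelihoods plugged in above can be reproduced directly; the heads counts per round (3, 2, 1, 3, 2) are read off the five rounds in the expectation.

```python
# Reproduce the round likelihoods for x1 = 0.2, x2 = 0.7.
# heads[i] = number of heads in round i; each round has 5 tosses.
heads = [3, 2, 1, 3, 2]
x1, x2 = 0.2, 0.7

lik_A = [x1**h * (1 - x1)**(5 - h) for h in heads]  # likelihood if coin A
lik_B = [x2**h * (1 - x2)**(5 - h) for h in heads]  # likelihood if coin B

assert [round(v, 5) for v in lik_A] == [0.00512, 0.02048, 0.08192, 0.00512, 0.02048]
assert [round(v, 5) for v in lik_B] == [0.03087, 0.01323, 0.00567, 0.03087, 0.01323]
```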

Now we solve

$$\max\, E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)]$$

To keep the computation simple, we take $Q(z;\phi^{(t)})$ to be $y = \{0,1,1,0,1\}$, i.e. $z = \{B,A,A,B,A\}$. The resulting expectation is not the maximum, but this does not affect convergence. Since $z$ is now fixed, $\theta^{(t+1)}$ can be computed directly:

$$x_1 = (2+1+2)/15 = 0.33, \quad x_2 = (3+3)/10 = 0.6$$

Then keep iterating until $z$, or $x_1$ and $x_2$, converge.
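The whole procedure can be sketched as a short hard-assignment EM loop, mirroring the text's simplification of fixing $z$ each round; starting from the assumed $x_1 = 0.2$, $x_2 = 0.7$, the first E-step yields $z = \{B,A,A,B,A\}$ and the estimates settle at $x_1 = 5/15 \approx 0.33$, $x_2 = 0.6$.

```python
# Minimal hard-EM sketch for the two-coin example from the text.
heads = [3, 2, 1, 3, 2]   # heads per round, 5 tosses each
n = 5
x1, x2 = 0.2, 0.7          # initial guesses assumed in the text

for _ in range(20):
    # E-step (hard assignment): pick the coin that makes each round likelier.
    z = ['A' if x1**h * (1 - x1)**(n - h) >= x2**h * (1 - x2)**(n - h) else 'B'
         for h in heads]
    # M-step: re-estimate each coin's heads probability from its rounds.
    hA = sum(h for h, zi in zip(heads, z) if zi == 'A')
    nA = n * sum(1 for zi in z if zi == 'A')
    hB = sum(h for h, zi in zip(heads, z) if zi == 'B')
    nB = n * sum(1 for zi in z if zi == 'B')
    new1 = hA / nA if nA else x1
    new2 = hB / nB if nB else x2
    if (new1, new2) == (x1, x2):   # assignments and estimates stopped changing
        break
    x1, x2 = new1, new2

assert z == ['B', 'A', 'A', 'B', 'A']
assert abs(x1 - 5 / 15) < 1e-9 and abs(x2 - 0.6) < 1e-9
```

Soft EM would instead weight each round by the posterior responsibility of each coin; the hard version shown here is the simplification the text adopts.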

Corrections are welcome.


1. https://www.bilibili.com/video/av31906558?p=1
2. https://blog.csdn.net/u010834867/article/details/90762296
