EM
Derivation
$\mathcal{X}=\{x_1, x_2, \cdots, x_N\}$: observed data
$z$: latent variable
The log-likelihood function is
$$\log P(\mathcal{X};\theta)=\log \prod_i^N P(x_i;\theta) = \sum_i^N \log P(x_i;\theta) \tag{1}$$
Our goal is to find the parameter $\theta$ that maximizes this log-likelihood:
$$\underset{\theta}{\arg \max} \log P(\mathcal{X};\theta) \tag{2}$$
From the product rule of probability,
$$P(x,z;\theta) = P(x;\theta)P(z|x;\theta) \tag{3}$$
Therefore,
$$\log P(x;\theta) = \log \frac{P(x,z;\theta)}{P(z|x;\theta)} = \log P(x,z;\theta) - \log P(z|x;\theta) \tag{4}$$
Assume $z$ follows a distribution $Q(z;\phi)$. Adding and subtracting $\log Q(z;\phi)$,
$$\begin{aligned} \log P(x;\theta) & = \log P(x,z;\theta) - \log P(z|x;\theta) + \log Q(z;\phi) - \log Q(z;\phi) \\ & = (\log P(x,z;\theta) - \log Q(z;\phi)) - (\log P(z|x;\theta) - \log Q(z;\phi)) \\ & = \log \frac{P(x,z;\theta)}{Q(z;\phi)} - \log \frac{P(z|x;\theta)}{Q(z;\phi)} \end{aligned} \tag{5}$$
Take the expectation of both sides of (5) with respect to $z \sim Q(z;\phi)$:
$$\text{left} = \int_z Q(z;\phi) \log P(x;\theta) dz = \log P(x;\theta) \int_z Q(z;\phi) dz = \log P(x;\theta) \tag{6}$$
since $\int_z Q(z;\phi) dz = 1$.
$$\begin{aligned} \text{right} & = \int_z Q(z;\phi) \log \frac{P(x,z;\theta)}{Q(z;\phi)} dz - \int_z Q(z;\phi) \log \frac{P(z|x;\theta)}{Q(z;\phi)} dz \\ & = \text{ELBO} + \text{KL}(Q(z;\phi)||P(z|x;\theta)) \end{aligned} \tag{7}$$
The first term of (7) is called the ELBO (evidence lower bound), and the second term is a KL divergence. We therefore obtain
$$\log P(x;\theta) = \text{ELBO} + \text{KL}(Q(z;\phi)||P(z|x;\theta)) \tag{8}$$
Since $\text{KL}(\cdot) \ge 0$, we have $\log P(x;\theta) \ge \text{ELBO}$, with equality if and only if $Q(z;\phi) = P(z|x;\theta)$. The ELBO is thus a lower bound: by repeatedly raising the ELBO we raise $\log P(x;\theta)$, which achieves our goal of maximizing the likelihood.
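As a sanity check, the identity in (8) can be verified numerically on a tiny discrete model. This is an illustrative sketch: the prior, likelihood, and $Q$ values below are arbitrary assumptions, not from the derivation.

```python
import math

# Hypothetical discrete model: latent z in {0, 1}, observed x = 1 (Bernoulli likelihood)
prior = [0.4, 0.6]   # P(z)
lik = [0.9, 0.3]     # P(x = 1 | z)

# Exact evidence: P(x) = sum_z P(z) P(x|z)
p_x = sum(prior[z] * lik[z] for z in (0, 1))
log_p_x = math.log(p_x)

# Exact posterior P(z|x) by Bayes' rule
post = [prior[z] * lik[z] / p_x for z in (0, 1)]

# An arbitrary variational distribution Q(z; phi)
Q = [0.5, 0.5]

# ELBO = sum_z Q(z) log( P(x,z) / Q(z) )
elbo = sum(Q[z] * math.log(prior[z] * lik[z] / Q[z]) for z in (0, 1))
# KL(Q || P(z|x)) = sum_z Q(z) log( Q(z) / P(z|x) )
kl = sum(Q[z] * math.log(Q[z] / post[z]) for z in (0, 1))

assert abs(log_p_x - (elbo + kl)) < 1e-12   # the identity in (8)
assert kl >= 0 and elbo <= log_p_x          # ELBO is a lower bound
```

With these numbers the evidence is $0.54$, and the gap between $\log P(x)$ and the ELBO is exactly the KL divergence, as (8) states.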
Suppose we have $\theta^{(t)}$ and want to maximize the ELBO, i.e. minimize $\text{KL}(Q(z;\phi)||P(z|x;\theta))$:
$$\phi^{(t)} = \underset{\phi}{\arg \min}\, \text{KL}(Q(z;\phi)||P(z|x;\theta^{(t)})) = \underset{\phi}{\arg \max}\, \text{ELBO}(\phi, \theta^{(t)}) \tag{9}$$
At the optimal $\phi^{(t)}$ we have $Q(z;\phi^{(t)}) = P(z|x;\theta^{(t)})$. In practice the optimal $\phi^{(t)}$ is hard to obtain, and our aim is simply to make the ELBO as large as possible. Once the ELBO is computed, we can in turn solve
$$\theta^{(t+1)} = \underset{\theta}{\arg \max}\,\text{ELBO}(\phi^{(t)}, \theta).$$
By repeating these two maximization steps, we can approximately find the parameter $\theta$ that maximizes the likelihood.
Let us look more closely at $\text{ELBO}(\phi^{(t)}, \theta)$:
$$\begin{aligned} \text{ELBO}(\phi^{(t)}, \theta) &= \int_z Q(z;\phi^{(t)}) \log \frac{P(x,z;\theta)}{Q(z;\phi^{(t)})} dz \\ &= \int_z Q(z;\phi^{(t)}) \log P(x,z;\theta) dz - \int_z Q(z;\phi^{(t)}) \log Q(z;\phi^{(t)}) dz \\ &= E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)] - E_{z\sim Q(z;\phi^{(t)})}[\log Q(z;\phi^{(t)})] \end{aligned} \tag{10}$$
In the last line of (10), the first term is an expectation over $z$ and the second term is a constant ($\phi^{(t)}$ is known), so
$$\begin{aligned} \theta^{(t+1)} &= \underset{\theta}{\arg \max}\,\text{ELBO}(\phi^{(t)}, \theta) \\ &= \underset{\theta}{\arg \max}\, E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)] \end{aligned} \tag{11}$$
This is why EM is called the expectation-maximization algorithm.
In summary, the EM algorithm iterates as follows:
- E-step: fix $\theta^{(t)}$ and solve $\phi^{(t)}=\underset{\phi}{\arg \max}\, \text{ELBO}(\phi, \theta^{(t)})$, i.e. minimize $\text{KL}(Q(z;\phi)||P(z|x;\theta^{(t)}))$ as in (9);
- M-step: fix $\phi^{(t)}$ and solve $\theta^{(t+1)} = \underset{\theta}{\arg \max}\,E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)]$.
The order of the E-step and the M-step can be interchanged.
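To make the E-step/M-step loop concrete, here is a minimal sketch for a two-component 1-D Gaussian mixture, where both steps have closed forms: the E-step computes the exact posterior $P(z|x;\theta^{(t)})$ (making the KL term in (8) zero), and the M-step maximizes the expected complete-data log-likelihood. The synthetic data and initial parameters are arbitrary assumptions, not from the text.

```python
import math
import random

random.seed(0)
# Synthetic 1-D data drawn from two Gaussians (illustrative parameters)
data = ([random.gauss(-2.0, 1.0) for _ in range(200)] +
        [random.gauss(3.0, 1.0) for _ in range(200)])

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# theta = (pi_, mu, var); rough initial guesses
pi_ = 0.5
mu = [-1.0, 1.0]
var = [1.0, 1.0]

def log_likelihood():
    return sum(math.log(pi_ * normal_pdf(x, mu[0], var[0]) +
                        (1 - pi_) * normal_pdf(x, mu[1], var[1]))
               for x in data)

prev = log_likelihood()
for _ in range(50):
    # E-step: responsibilities Q(z_i = 1) = P(z_i = 1 | x_i; theta), the exact posterior
    r = []
    for x in data:
        a = pi_ * normal_pdf(x, mu[0], var[0])
        b = (1 - pi_) * normal_pdf(x, mu[1], var[1])
        r.append(a / (a + b))
    # M-step: argmax of E_{z~Q}[log P(x, z; theta)], closed form for a Gaussian mixture
    n1 = sum(r)
    n2 = len(data) - n1
    pi_ = n1 / len(data)
    mu[0] = sum(ri * x for ri, x in zip(r, data)) / n1
    mu[1] = sum((1 - ri) * x for ri, x in zip(r, data)) / n2
    var[0] = sum(ri * (x - mu[0]) ** 2 for ri, x in zip(r, data)) / n1
    var[1] = sum((1 - ri) * (x - mu[1]) ** 2 for ri, x in zip(r, data)) / n2
    cur = log_likelihood()
    assert cur >= prev - 1e-9   # monotonicity, as proved in (15)-(17)
    prev = cur

print(mu)   # the two component means end up close to the true values -2 and 3
```

The assertion inside the loop checks exactly the property the convergence proof below establishes: the observed-data log-likelihood never decreases across iterations.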
Convergence proof of EM
A simple argument: as long as each update $\theta^{(t)} \to \theta^{(t+1)}$ satisfies $\log P(x;\theta^{(t)}) \le \log P(x;\theta^{(t+1)})$, the algorithm is guaranteed to converge.
Starting from (4), take the expectation of both sides with respect to $z \sim Q(z;\phi^{(t)})$:
$$\begin{aligned} \text{left} &= \int_z Q(z;\phi^{(t)}) \log P(x;\theta) dz \\ &= \log P(x;\theta) \int_z Q(z; \phi^{(t)}) dz \\ &= \log P(x;\theta) \end{aligned} \tag{12}$$
$$\text{right} = \int_z Q(z;\phi^{(t)}) \log P(x,z;\theta) dz - \int_z Q(z;\phi^{(t)}) \log P(z|x;\theta) dz \tag{13}$$
Since $\phi^{(t)}$ is obtained by solving (9), suppose it is the optimal solution; then $Q(z;\phi^{(t)}) = P(z|x;\theta^{(t)})$. Substituting into (13) gives
$$\begin{aligned} \text{right} &= \int_z P(z|x;\theta^{(t)}) \log P(x,z;\theta) dz - \int_z P(z|x;\theta^{(t)}) \log P(z|x;\theta) dz \\ &= H_1(\theta, \theta^{(t)}) - H_2(\theta, \theta^{(t)}) \end{aligned} \tag{14}$$
where $H_1$ and $H_2$ denote the two terms of (14). Combining (12) and (14),
$$\log P(x;\theta) = H_1(\theta, \theta^{(t)}) - H_2(\theta, \theta^{(t)}) \tag{15}$$
Since $\theta^{(t+1)} = \underset{\theta}{\arg \max}\,E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)]$, we have $H_1(\theta^{(t+1)}, \theta^{(t)}) \ge H_1(\theta^{(t)}, \theta^{(t)})$. It remains to show that $-H_2(\theta^{(t+1)}, \theta^{(t)}) \ge -H_2(\theta^{(t)}, \theta^{(t)})$, which then proves $\log P(x;\theta^{(t+1)}) \ge \log P(x;\theta^{(t)})$.
Now
$$\begin{aligned} & H_2(\theta^{(t+1)}, \theta^{(t)}) - H_2(\theta^{(t)}, \theta^{(t)}) \\ =\ & \int_z P(z|x;\theta^{(t)}) \log P(z|x; \theta^{(t+1)}) dz - \int_z P(z|x;\theta^{(t)}) \log P(z|x; \theta^{(t)}) dz \\ =\ & \int_z P(z|x;\theta^{(t)}) \log \frac{P(z|x; \theta^{(t+1)})}{P(z|x; \theta^{(t)})} dz \end{aligned} \tag{16}$$
We show that (16) is less than or equal to 0 in two ways.
Method 1: (16) is a negative KL divergence, $-\text{KL}(P(z|x;\theta^{(t)})||P(z|x;\theta^{(t+1)})) \le 0$.
Method 2: (16) equals $E_{z \sim P(z|x;\theta^{(t)})}\left[\log \frac{P(z|x; \theta^{(t+1)})}{P(z|x; \theta^{(t)})}\right]$. By Jensen's inequality, $E[\log X] \le \log E[X]$, so
$$\begin{aligned} & E_{z \sim P(z|x;\theta^{(t)})}\left[\log \frac{P(z|x; \theta^{(t+1)})}{P(z|x; \theta^{(t)})}\right] \\ \le\ & \log E_{z \sim P(z|x;\theta^{(t)})}\left[\frac{P(z|x; \theta^{(t+1)})}{P(z|x; \theta^{(t)})}\right] \\ =\ & \log \int_z P(z|x;\theta^{(t)}) \frac{P(z|x; \theta^{(t+1)})}{P(z|x; \theta^{(t)})} dz \\ =\ & \log \int_z P(z|x; \theta^{(t+1)}) dz \\ =\ & \log 1 = 0 \end{aligned} \tag{17}$$
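The Jensen step in (17) is easy to verify numerically. Below, two arbitrary illustrative distributions stand in for $P(z|x;\theta^{(t)})$ and $P(z|x;\theta^{(t+1)})$:

```python
import math

# Two arbitrary distributions over a finite support (illustrative values)
p = [0.5, 0.3, 0.2]   # plays the role of P(z|x; theta^(t))
q = [0.2, 0.3, 0.5]   # plays the role of P(z|x; theta^(t+1))

# E_p[log(q/p)]: the left-hand side of (17), a negative KL divergence
lhs = sum(pi * math.log(qi / pi) for pi, qi in zip(p, q))
# log E_p[q/p] = log sum_z q(z) = log 1 = 0: the Jensen upper bound
rhs = math.log(sum(pi * (qi / pi) for pi, qi in zip(p, q)))

assert abs(rhs) < 1e-12   # log 1 = 0
assert lhs <= rhs          # Jensen: E[log X] <= log E[X]
```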
Example
Coin tossing: there are two coins made of different materials, so their probabilities of landing heads differ. Given one set of observations, we want to estimate each coin's probability of heads. Five rounds were tossed, with five tosses per round. Suppose we do not know which coin was used in each round; this adds a latent variable to the problem, namely the type of coin chosen in each round.
(Image from https://blog.csdn.net/u010834867/article/details/90762296)
Let the two coins be A and B, with $P(\text{heads}|A) = x_1$, $P(\text{tails}|A) = 1-x_1$, $P(\text{heads}|B) = x_2$, and $P(\text{tails}|B) = 1-x_2$.
Suppose the probability that round $i$ uses coin A is $P(z_i=A)=y_i$, and the probability that it uses coin B is $P(z_i=B)=1-y_i$.
Denoting toss $j$ of round $i$ by $x_{ij}$, the log-likelihood is
$$\log P(x) = \log \prod_i \prod_j P(x_{ij}) = \sum_i \sum_j \log P(x_{ij})$$
$$P(x_{ij})=\frac{P(x,z)}{P(z|x)}=\frac{P(z)P(x|z)}{P(z|x)}$$
Here $P(z|x)$ is hard to compute directly, so we use the EM algorithm to solve for $x_1$ and $x_2$.
First, we write out the expectation:
$$\begin{aligned} & E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)] \\ =\ & y_1 \log y_1(x_1 x_1 (1-x_1) x_1 (1-x_1)) &+& (1-y_1)\log (1-y_1)(x_2 x_2 (1-x_2) x_2 (1-x_2)) \\ +\ & y_2 \log y_2((1-x_1)(1-x_1)x_1 x_1(1-x_1)) &+& (1-y_2)\log (1-y_2)((1-x_2)(1-x_2)x_2 x_2(1-x_2)) \\ +\ & y_3 \log y_3(x_1 (1-x_1)(1-x_1)(1-x_1)(1-x_1)) &+& (1-y_3)\log (1-y_3)(x_2 (1-x_2)(1-x_2)(1-x_2)(1-x_2)) \\ +\ & y_4 \log y_4(x_1 (1-x_1)(1-x_1) x_1 x_1) &+& (1-y_4) \log (1-y_4)(x_2 (1-x_2)(1-x_2) x_2 x_2) \\ +\ & y_5 \log y_5((1-x_1)x_1 x_1 (1-x_1) (1-x_1)) &+& (1-y_5) \log (1-y_5)((1-x_2)x_2 x_2 (1-x_2) (1-x_2)) \end{aligned}$$
Assume $x_1=0.2$ and $x_2=0.7$; substituting into the expression above gives
$$\begin{aligned} & E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)] \\ =\ & y_1 \log 0.00512 y_1 &+& (1-y_1)\log 0.03087(1-y_1) \\ +\ & y_2 \log 0.02048 y_2 &+& (1-y_2)\log 0.01323(1-y_2) \\ +\ & y_3 \log 0.08192 y_3 &+& (1-y_3)\log 0.00567(1-y_3) \\ +\ & y_4 \log 0.00512 y_4 &+& (1-y_4) \log 0.03087(1-y_4) \\ +\ & y_5 \log 0.02048 y_5 &+& (1-y_5) \log 0.01323(1-y_5) \end{aligned}$$
(Image from https://blog.csdn.net/u010834867/article/details/90762296)
Now we solve $\max E_{z\sim Q(z;\phi^{(t)})}[\log P(x,z;\theta)]$.
To keep things simple, we take $Q(z;\phi^{(t)})$ to be $y=\{0,1,1,0,1\}$, i.e. $z=\{B,A,A,B,A\}$. Although this does not yield the maximal expectation, it does not affect the convergence of the algorithm. Because $z$ is now fixed, we can compute $\theta^{(t+1)}$ directly:
$$x_1 = (2+1+2) / 15 \approx 0.33, \quad x_2 = (3+3) / 10 = 0.6$$
We then keep iterating until $z$, or $x_1$ and $x_2$, converge.
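This hard-assignment iteration can be sketched in a few lines. The per-round head counts `[3, 2, 1, 3, 2]` are read directly off the likelihood factors in the expectation above; the loop bound of 10 is an arbitrary assumption (here the iteration reaches a fixed point after one step):

```python
# Hard-assignment EM for the two-coin example
heads = [3, 2, 1, 3, 2]   # heads observed in each of the 5 rounds
n = 5                     # tosses per round

x1, x2 = 0.2, 0.7         # initial guesses for P(heads|A), P(heads|B)
for _ in range(10):
    # E-step (hard): assign each round to the coin with the larger likelihood
    z = ['A' if x1 ** h * (1 - x1) ** (n - h) >= x2 ** h * (1 - x2) ** (n - h)
         else 'B' for h in heads]
    # M-step: MLE of each coin's heads probability under the assignment
    heads_A = sum(h for h, zi in zip(heads, z) if zi == 'A')
    tosses_A = n * z.count('A')
    x1 = heads_A / tosses_A
    x2 = (sum(heads) - heads_A) / (n * len(heads) - tosses_A)

print(z, round(x1, 2), round(x2, 2))   # → ['B', 'A', 'A', 'B', 'A'] 0.33 0.6
```

With the initial guesses $x_1=0.2$, $x_2=0.7$, the first E-step reproduces $z=\{B,A,A,B,A\}$ and the first M-step gives $x_1 \approx 0.33$, $x_2 = 0.6$, matching the numbers above.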
Corrections are welcome if anything here is inaccurate.