Preface: I first encountered the EM algorithm as a student learning machine learning, and even then I felt there had to be a body of mathematical logic underpinning its validity. Recently I happened to read Shi Bo's "The Nine Levels of Understanding the EM Algorithm" [1] on Zhihu, and realized how limited my own view had been. After revisiting and rethinking the topic for a while, I now have a deeper understanding of the EM algorithm.
4. Understanding the Form of the EM Algorithm via Mathematical Derivation
Next, we use a rigorous mathematical derivation to understand the meaning of the EM algorithm's formula. Continuing from the EM update given above:
$$\theta^{(t+1)}=\argmax_{\theta}Q(\theta|\theta^{(t)})=\argmax_{\theta}E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)]$$
Let us derive this step by step.
First, the hidden variable is $Z=[z_1, z_2, ..., z_n]$ with $z_i\in\{0,1\}$: the component $z_i=1$ indicates that coin $A$ was used in round $i$, and $z_i=0$ indicates that coin $B$ was used in round $i$; there are $n$ rounds of tosses in total. The variable space of $Z$ is therefore $\{0, 1\}^n$, a discrete space consisting of $2^n$ points. Thus we have:
$$E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)]=\sum_{Z \in \{0, 1\}^n}[P(Z|X, \theta^{(t)}) * \log L(\theta;X,Z)]$$
Since the hidden variables $z_i$ are independent and identically distributed, we have:
$$P(Z|X, \theta^{(t)}) = \prod_{i=1}^{n}P(z_i|x_i, \theta^{(t)})$$
For each round $i$, given the observed variable $x_i \in \{0, 1, \dots, \delta\}$ (the number of heads out of $\delta$ tosses) and the coin probability parameters $\theta^{(t)}$ (i.e. $\theta_A^{(t)}$, $\theta_B^{(t)}$ and $\theta_C^{(t)}$), we have:
$$\begin{aligned} P(z_i|x_i, \theta^{(t)}) & = \frac{P(z_i, x_i | \theta^{(t)})}{P(x_i| \theta^{(t)})} \\ & = \frac{P(z_i, x_i|\theta^{(t)})}{\sum_{z_i\in\{0,1\}} P(z_i, x_i| \theta^{(t)})} \end{aligned}$$
Given $\theta^{(t)}$, the joint distribution of $(z_i, x_i)$ can be written down directly. $P(z_i=1, x_i | \theta^{(t)})$ is the probability that round $i$ uses coin $A$ and produces $x_i$ heads:
$$\begin{aligned} P(z_i=1, x_i | \theta^{(t)}) & = P(z_i=1 | \theta^{(t)}) * P(x_i | z_i=1, \theta^{(t)})\\ & = P(z_i=1 | \theta^{(t)})* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i} \\ & = \theta_C^{(t)} * (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i} \end{aligned}$$
$P(z_i=0, x_i \mid \theta^{(t)})$ is the probability that round $i$ uses coin $B$ and produces $x_i$ heads:
$$\begin{aligned} P(z_i=0, x_i| \theta^{(t)}) & = P(z_i=0|\theta^{(t)}) * P(x_i|z_i=0, \theta^{(t)})\\ & = P(z_i=0| \theta^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i} \\ & = (1-\theta_C^{(t)}) * (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i} \end{aligned}$$
Next, given $x_i$ and $\theta^{(t)}$, the probability that round $i$ uses coin A, $P(z_i=1|x_i, \theta^{(t)})$, and the probability that it uses coin B, $P(z_i=0|x_i, \theta^{(t)})$, are as follows:
$$\begin{aligned} P(z_i=1|x_i, \theta^{(t)}) & = \frac{P(z_i=1, x_i | \theta^{(t)})}{P(z_i=1, x_i | \theta^{(t)}) + P(z_i=0, x_i | \theta^{(t)})} \\ & = \frac{\theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}}{\theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}+(1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}} \end{aligned}$$
$$\begin{aligned} P(z_i=0|x_i, \theta^{(t)}) & = \frac{P(z_i=0, x_i|\theta^{(t)})}{P(z_i=1, x_i | \theta^{(t)}) + P(z_i=0, x_i | \theta^{(t)})} \\ & = \frac{(1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}}{\theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}+(1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}} \end{aligned}$$
The two expressions share the same denominator, which we denote

$$\eta_i=\theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}+(1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}$$
Then we have:
$$P(z_i=1|x_i, \theta^{(t)})=\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}$$

$$P(z_i=0|x_i, \theta^{(t)})=\frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}$$
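As a quick numerical illustration (a sketch, not part of the derivation; the data and parameter values below are made up), these posteriors can be computed directly:

```python
def responsibilities(x, delta, theta_A, theta_B, theta_C):
    """Per-round posteriors (P(z_i=1 | x_i, theta^(t)), P(z_i=0 | x_i, theta^(t))).

    Follows the formulas above: eta_i is the shared denominator. The binomial
    coefficient C(delta, x_i) is omitted, as in the text; it cancels anyway.
    """
    post = []
    for xi in x:
        pa = theta_C * theta_A ** xi * (1 - theta_A) ** (delta - xi)        # P(z_i=1, x_i | theta^(t))
        pb = (1 - theta_C) * theta_B ** xi * (1 - theta_B) ** (delta - xi)  # P(z_i=0, x_i | theta^(t))
        eta = pa + pb                                                       # eta_i
        post.append((pa / eta, pb / eta))
    return post

# toy data: 5 rounds of delta = 10 tosses each, x_i heads observed per round
x = [8, 2, 7, 3, 9]
post = responsibilities(x, delta=10, theta_A=0.7, theta_B=0.4, theta_C=0.5)
```

Each pair sums to 1 by construction, and rounds with many heads are assigned a posterior close to 1 for the coin with the larger heads probability.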
We have now expressed the distribution of $(Z|X, \theta^{(t)})$. The next step is to compute the log-likelihood of the joint $(\theta;X,Z)$: $\log L(\theta;X,Z)$.
For the likelihood of each round, we have:
$$\begin{aligned} L(\theta; x_i, z_i) & = P(x_i|z_i, \theta) * P(z_i|\theta) * P(\theta) \\ &= (z_i*\theta_A+(1-z_i)\theta_B)^{x_i}*(1-z_i*\theta_A-(1-z_i)\theta_B)^{(\delta-x_i)} * \theta_C^{z_i} * (1-\theta_C)^{1-z_i} * 1 \end{aligned}$$
Again, since the rounds are independent and identically distributed, we have:
$$\begin{aligned} L(\theta;X,Z) & =\prod_{i=1}^{n} L(\theta; x_i, z_i)\\ &=\prod_{i=1}^{n}(z_i*\theta_A+(1-z_i)\theta_B)^{x_i}*(1-z_i*\theta_A-(1-z_i)\theta_B)^{(\delta-x_i)}*\theta_C^{z_i} * (1-\theta_C)^{1-z_i} \end{aligned}$$
Taking the $\log$, we have:
$$\begin{aligned} \log L(\theta;X,Z) &= \log \prod_{i=1}^{n}(z_i*\theta_A+(1-z_i)\theta_B)^{x_i}*(1-z_i*\theta_A-(1-z_i)\theta_B)^{(\delta-x_i)}*\theta_C^{z_i} * (1-\theta_C)^{1-z_i}\\ &= \sum_{i=1}^{n}[x_i*\log(z_i*\theta_A+(1-z_i)\theta_B) + (\delta-x_i)*\log(1-z_i*\theta_A-(1-z_i)\theta_B)+z_i*\log \theta_C+(1-z_i)*\log (1-\theta_C)] \end{aligned}$$
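As a sanity check, the complete-data log-likelihood above can be evaluated numerically; the following Python sketch uses made-up data and hypothetical parameter values:

```python
import math

def complete_log_likelihood(x, z, delta, theta_A, theta_B, theta_C):
    """log L(theta; X, Z) as in the sum above; z_i in {0, 1} picks coin A or B.
    Binomial coefficients are dropped, matching the text."""
    total = 0.0
    for xi, zi in zip(x, z):
        p = zi * theta_A + (1 - zi) * theta_B  # heads probability of the coin chosen this round
        total += xi * math.log(p) + (delta - xi) * math.log(1 - p)          # toss term
        total += zi * math.log(theta_C) + (1 - zi) * math.log(1 - theta_C)  # coin-choice term
    return total

ll = complete_log_likelihood(x=[8, 2, 7], z=[1, 0, 1], delta=10,
                             theta_A=0.7, theta_B=0.4, theta_C=0.5)
```

Exponentiating `ll` recovers the product form of $L(\theta;X,Z)$, which is a useful cross-check of the log expansion.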
OK, at this point we have expressed both $\log L(\theta;X,Z)$ and $P(Z|X, \theta^{(t)})$; what remains is to take the expectation:
$$\begin{aligned} E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)] & =\sum_{Z \in \{0, 1\}^n}[P(Z|X, \theta^{(t)}) * \log L(\theta;X,Z)] \\ &=\sum_{Z \in \{0, 1\}^n} \prod_{i=1}^{n} P(z_i|x_i, \theta^{(t)})*\sum_{j=1}^{n}l(x_j, z_j, \theta) \end{aligned}$$
Note that, to avoid a clash of index variables, we have renamed the summation index of $\log L(\theta;X,Z)$ from $i$ to $j$. Here:
$$l(x_j, z_j, \theta) = x_j*\log(z_j*\theta_A+(1-z_j)\theta_B) + (\delta-x_j)*\log(1-z_j*\theta_A-(1-z_j)\theta_B)+z_j*\log \theta_C+(1-z_j)*\log (1-\theta_C)$$
Recall the posteriors derived above:

$$P(z_i=1|x_i, \theta^{(t)})=\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}$$

$$P(z_i=0|x_i, \theta^{(t)})=\frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}$$
The expression we obtained for $E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)]$ looks intimidating. It could be simplified immediately using the properties of expectations of independent variables; here, however, we will not invoke those properties and will instead simplify it directly with some algebraic manipulation.
For the full variable space $Z \in \{0, 1\}^n$, we split it: write $z_1$ and $(z_2, z_3, ..., z_n)$ as $z_1$ and $Z_2^n$ respectively, and then divide the space into the two parts $z_1=1$ and $z_1=0$. The expression above then decomposes as:
$$\begin{aligned} E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)] &= \sum_{z_1=1, Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=1}^{n} P(z_i|x_i, \theta^{(t)})*\sum_{j=1}^{n}l(x_j, z_j, \theta) \\ &+ \sum_{z_1=0, Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=1}^{n} P(z_i|x_i, \theta^{(t)})*\sum_{j=1}^{n}l(x_j, z_j, \theta) \\ &= \sum_{z_1=1, Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)})*P(z_1=1|x_1, \theta^{(t)})*[l(x_1, z_1=1, \theta)+\sum_{j=2}^{n}l(x_j, z_j, \theta)] \\ &+ \sum_{z_1=0, Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)})*P(z_1=0|x_1, \theta^{(t)})*[l(x_1, z_1=0, \theta)+\sum_{j=2}^{n}l(x_j, z_j, \theta)] \\ & = P(z_1=1|x_1, \theta^{(t)}) * l(x_1, z_1=1, \theta) * \sum_{Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)}) \\ & + P(z_1=1|x_1, \theta^{(t)}) * \sum_{Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)}) * \sum_{j=2}^{n}l(x_j, z_j, \theta)\\ & + P(z_1=0|x_1, \theta^{(t)}) * l(x_1, z_1=0, \theta) * \sum_{Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)}) \\ & + P(z_1=0|x_1, \theta^{(t)}) * \sum_{Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)}) * \sum_{j=2}^{n}l(x_j, z_j, \theta) \end{aligned}$$
Since a probability distribution sums to 1, we have:
$$P(z_1=1|x_1, \theta^{(t)})+P(z_1=0|x_1, \theta^{(t)})=1$$
$$\sum_{Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)})=\sum_{Z_2^n \in \{0, 1\}^{n-1}}P(Z_2^n|X_2^n, \theta^{(t)})=1$$
Therefore:
$$\begin{aligned} E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)] &= P(z_1=1|x_1, \theta^{(t)}) * l(x_1, z_1=1, \theta) \\ &+ P(z_1=0|x_1, \theta^{(t)}) * l(x_1, z_1=0, \theta) \\ &+ \sum_{Z_2^n \in \{0, 1\}^{n-1}} \prod_{i=2}^{n} P(z_i|x_i, \theta^{(t)}) * \sum_{j=2}^{n}l(x_j, z_j,\theta)\\ &= P(z_1=1|x_1, \theta^{(t)}) * l(x_1, z_1=1, \theta) \\ &+ P(z_1=0|x_1, \theta^{(t)}) * l(x_1, z_1=0, \theta) \\ &+ P(z_2=1|x_2, \theta^{(t)}) * l(x_2, z_2=1, \theta) \\ &+ P(z_2=0|x_2, \theta^{(t)}) * l(x_2, z_2=0, \theta) \\ &+ \sum_{Z_3^n \in \{0, 1\}^{n-2}} \prod_{i=3}^{n} P(z_i|x_i, \theta^{(t)}) * \sum_{j=3}^{n}l(x_j, z_j, \theta)\\ &=......\\ &= \sum_{i=1}^n P(z_i=1|x_i, \theta^{(t)}) * l(x_i, z_i=1, \theta) + \sum_{i=1}^n P(z_i=0|x_i, \theta^{(t)}) * l(x_i, z_i=0, \theta) \end{aligned}$$
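This simplification is easy to verify numerically: for a small $n$, enumerate all $2^n$ assignments of $Z$ by brute force and compare with the per-round posterior-weighted sum. The sketch below (not from the original post) uses arbitrary toy numbers:

```python
import itertools
import math

def l_term(xj, zj, delta, tA, tB, tC):
    # l(x_j, z_j, theta) from the text
    p = zj * tA + (1 - zj) * tB
    return (xj * math.log(p) + (delta - xj) * math.log(1 - p)
            + zj * math.log(tC) + (1 - zj) * math.log(1 - tC))

def posterior(xi, delta, tA, tB, tC):
    # (P(z_i=1 | x_i, theta^(t)), P(z_i=0 | x_i, theta^(t)))
    pa = tC * tA ** xi * (1 - tA) ** (delta - xi)
    pb = (1 - tC) * tB ** xi * (1 - tB) ** (delta - xi)
    return pa / (pa + pb), pb / (pa + pb)

x, delta = [8, 2, 7], 10    # keep n small: brute force enumerates 2^n assignments
t_prev = (0.7, 0.4, 0.6)    # theta^(t) = (theta_A, theta_B, theta_C)
theta = (0.65, 0.45, 0.55)  # theta at which the expectation is evaluated

# brute force: sum over all Z in {0,1}^n of P(Z|X, theta^(t)) * log L(theta; X, Z)
brute = 0.0
for Z in itertools.product([0, 1], repeat=len(x)):
    weight = 1.0
    for xi, zi in zip(x, Z):
        p1, p0 = posterior(xi, delta, *t_prev)
        weight *= p1 if zi == 1 else p0
    brute += weight * sum(l_term(xi, zi, delta, *theta) for xi, zi in zip(x, Z))

# the simplified per-round form derived above
simple = 0.0
for xi in x:
    p1, p0 = posterior(xi, delta, *t_prev)
    simple += p1 * l_term(xi, 1, delta, *theta) + p0 * l_term(xi, 0, delta, *theta)
```

The two quantities agree to floating-point precision, confirming that the $2^n$-term sum collapses into $2n$ terms.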
We have finally obtained the expression to be maximized:
$$E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)] = \sum_{i=1}^n P(z_i=1|x_i, \theta^{(t)}) * l(x_i, z_i=1, \theta) + \sum_{i=1}^n P(z_i=0|x_i, \theta^{(t)}) * l(x_i, z_i=0, \theta)$$
where
$$l(x_j, z_j, \theta) = x_j*\log(z_j*\theta_A+(1-z_j)\theta_B) + (\delta-x_j)*\log(1-z_j*\theta_A-(1-z_j)\theta_B)+z_j*\log \theta_C+(1-z_j)*\log (1-\theta_C)$$
together with the posteriors derived earlier:

$$P(z_i=1|x_i, \theta^{(t)})=\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}$$

$$P(z_i=0|x_i, \theta^{(t)})=\frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}$$
Substituting these expressions, we get:
$$\begin{aligned} E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)] & = \sum_{i=1}^n P(z_i=1|x_i, \theta^{(t)}) * l(x_i, z_i=1, \theta) + \sum_{i=1}^n P(z_i=0|x_i, \theta^{(t)}) * l(x_i, z_i=0, \theta) \\ &= \sum_{i=1}^n \frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i} * [x_i*\log \theta_A + (\delta-x_i)*\log(1-\theta_A)+\log \theta_C] \\ &+\sum_{i=1}^n \frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i} * [x_i*\log \theta_B + (\delta-x_i)*\log(1-\theta_B)+\log (1-\theta_C)] \end{aligned}$$
where

$$\eta_i=\theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}+(1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}$$
Notice that in this expression all the hidden variables $Z$ have been summed out, so we can now carry out the maximization. We need to solve:
$$\begin{aligned} \theta^{(t+1)}&=\argmax_{\theta}Q(\theta|\theta^{(t)})=\argmax_{\theta}E_{Z|X,\theta^{(t)}}[\log L(\theta;X,Z)]\\ &=\argmax_{\theta} \sum_{i=1}^n \frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i} * [x_i*\log \theta_A + (\delta-x_i)*\log(1-\theta_A)+\log \theta_C] \\ &+\sum_{i=1}^n \frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i} * [x_i*\log \theta_B + (\delta-x_i)*\log(1-\theta_B)+\log (1-\theta_C)] \end{aligned}$$
Here $(\theta_A, \theta_B, \theta_C)$ are the variables to solve for, while $(\theta_A^{(t)}, \theta_B^{(t)}, \theta_C^{(t)})$ and $\eta_i$ are all constants. We solve the system of partial-derivative equations:
$$\begin{cases} \dfrac{\partial Q}{\partial \theta_A} = 0 \\ \dfrac{\partial Q}{\partial \theta_B} = 0 \\ \dfrac{\partial Q}{\partial \theta_C} = 0 \end{cases}$$
Taking them one at a time:
$$\begin{aligned} \frac{\partial Q}{\partial \theta_A} &= \sum_{i=1}^n \frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i} * [x_i*\frac{1}{\theta_A} - (\delta-x_i)*\frac{1}{(1-\theta_A)}] \\ &=\sum_{i=1}^n \frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i} * \frac{x_i-\delta*\theta_A}{\theta_A*(1-\theta_A)} \end{aligned}$$
Setting $\frac{\partial Q}{\partial \theta_A}=0$, we obtain:
$$\theta_A^{(t+1)} = \frac{\sum_{i=1}^{n}\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}*x_i}{\sum_{i=1}^{n}\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}*\delta}$$
Similarly, we have:
$$\theta_B^{(t+1)} = \frac{\sum_{i=1}^{n}\frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}*x_i}{\sum_{i=1}^{n}\frac{1}{\eta_i} (1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}*\delta}$$
$$\begin{aligned} \theta_C^{(t+1)} &= \frac{\sum_{i=1}^{n}\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}}{\sum_{i=1}^{n}\frac{1}{\eta_i} [\theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i}+(1-\theta_C^{(t)})* (\theta_B^{(t)})^{x_i}*(1-\theta_B^{(t)})^{\delta - x_i}]} \\ &=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{\eta_i} \theta_C^{(t)}* (\theta_A^{(t)})^{x_i}*(1-\theta_A^{(t)})^{\delta - x_i} \end{aligned}$$
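Collecting the three closed-form updates, one EM iteration can be sketched in Python as follows (the data and initial values are arbitrary, for illustration only):

```python
def em_step(x, delta, tA, tB, tC):
    """One EM iteration using the closed-form updates derived above.

    mu = P(z_i=1 | x_i, theta^(t)); note that the factor 1/eta_i cancels
    between the numerator and denominator of each update.
    """
    num_A = den_A = num_B = den_B = num_C = 0.0
    for xi in x:
        pa = tC * tA ** xi * (1 - tA) ** (delta - xi)
        pb = (1 - tC) * tB ** xi * (1 - tB) ** (delta - xi)
        mu = pa / (pa + pb)
        num_A += mu * xi
        den_A += mu * delta
        num_B += (1 - mu) * xi
        den_B += (1 - mu) * delta
        num_C += mu
    return num_A / den_A, num_B / den_B, num_C / len(x)

# run EM to (approximate) convergence on toy data
x, delta = [8, 2, 7, 3, 9], 10
theta = (0.6, 0.5, 0.5)  # initial guess (theta_A, theta_B, theta_C)
for _ in range(200):
    theta = em_step(x, delta, *theta)
```

With this initialization the high-heads rounds are attributed to coin A and the low-heads rounds to coin B, so the estimates separate accordingly.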
Comparing with the intuitive results obtained earlier, the forms match very closely:
$$\hat\theta_A^{(t+1)} = \frac{\sum_{i=1}^{n}x_i * P(z_i=1|*)}{\sum_{i=1}^{n}\delta * P(z_i=1|*)}$$
$$\hat\theta_B^{(t+1)} = \frac{\sum_{i=1}^{n}x_i * P(z_i=0|*)}{\sum_{i=1}^{n}\delta * P(z_i=0|*)}$$
$$\hat\theta_C^{(t+1)} = \frac{\sum_{i=1}^{n} P(z_i=1|*)}{\sum_{i=1}^{n} [P(z_i=0|*)+P(z_i=1|*)]}=\frac{1}{n}\sum_{i=1}^{n} P(z_i=1|*)$$
Closing Remarks
This concludes the first level of understanding of the EM algorithm: through both intuition and derivation, we have understood in form what the EM algorithm is, answering the What question. What we need to address next is the Why question, i.e. the necessity and feasibility of the EM algorithm, which belongs to the second level:
- Necessity: the original problem cannot be solved directly, or is very hard to solve.
- Feasibility: the EM algorithm "can", through iteration, reach the theoretically optimal point.
References
[1] https://www.zhihu.com/question/40797593/answer/275171156
[2] Do C B, Batzoglou S. What is the expectation maximization algorithm?[J]. Nature biotechnology, 2008, 26(8): 897-899.