Introducing the latent topic variable $z_k$ (with $w_j$ assumed conditionally independent of $d_i$ given $z_k$), the joint probability and the conditional probabilities are:
$$p(d_i, z_k, w_j)=p(d_i)\,p(z_k \mid d_i)\,p(w_j \mid z_k)$$
$$P(w_j \mid d_i)=\sum_{k=1}^{K} P(z_k \mid d_i)\,P(w_j \mid z_k)$$
$$P(d_i, w_j)=P(d_i)\sum_{k=1}^{K} P(w_j \mid z_k)\,P(z_k \mid d_i)$$
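The mixture formulas can be checked numerically. The following is a minimal sketch in plain Python. It assumes a hypothetical nested-list layout: `p_d[i]` = $P(d_i)$, `p_z_d[i][k]` = $P(z_k \mid d_i)$, `p_w_z[k][j]` = $P(w_j \mid z_k)$. The function names `word_given_doc` and `doc_word_joint` are illustrative only, and the toy values are made up.

```python
# Sketch: PLSA mixture P(w_j|d_i) = sum_k P(z_k|d_i) P(w_j|z_k)
# and joint P(d_i, w_j) = P(d_i) * P(w_j|d_i).
# Hypothetical layout (plain nested lists):
#   p_d[i] = P(d_i), p_z_d[i][k] = P(z_k|d_i), p_w_z[k][j] = P(w_j|z_k)


def word_given_doc(p_z_d, p_w_z):
    """Return P(w_j | d_i) as an N x M nested list."""
    K, M = len(p_w_z), len(p_w_z[0])
    return [[sum(row[k] * p_w_z[k][j] for k in range(K)) for j in range(M)]
            for row in p_z_d]


def doc_word_joint(p_d, p_z_d, p_w_z):
    """Return P(d_i, w_j) = P(d_i) * sum_k P(w_j|z_k) P(z_k|d_i)."""
    p_w_d = word_given_doc(p_z_d, p_w_z)
    return [[p_d[i] * v for v in p_w_d[i]] for i in range(len(p_d))]


# Toy values: 2 documents, 2 topics, 3 words (made up for illustration).
p_d = [0.4, 0.6]
p_z_d = [[0.9, 0.1], [0.2, 0.8]]
p_w_z = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
p_w_d = word_given_doc(p_z_d, p_w_z)
joint = doc_word_joint(p_d, p_z_d, p_w_z)
print(p_w_d)
print(joint)
```

Each row of `p_w_d` sums to 1, because it is a convex combination of the topic-word distributions. The entries of `joint` sum to 1 over all $(d_i, w_j)$.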
Let $n(d_i, w_j)$ denote the number of times word $w_j$ occurs in document $d_i$. The log-likelihood function is then:
$$L=\sum_{i=1}^{N}\sum_{j=1}^{M}\left[n(d_i, w_j)\log P(d_i)+n(d_i, w_j)\log\sum_{k=1}^{K} P(w_j \mid z_k)\,P(z_k \mid d_i)\right]$$
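This is a sketch of evaluating $L$ from a count matrix, in plain Python. It assumes the hypothetical layout `n[i][j]` = $n(d_i, w_j)$, `p_d[i]` = $P(d_i)$, `p_z_d[i][k]` = $P(z_k \mid d_i)$, `p_w_z[k][j]` = $P(w_j \mid z_k)$. The name `log_likelihood` is illustrative. Pairs with a zero count are skipped, since they contribute nothing to $L$; this also avoids taking the log of a zero mixture probability.

```python
import math


def log_likelihood(n, p_d, p_z_d, p_w_z):
    """L = sum_ij n(d_i,w_j) * [log P(d_i) + log sum_k P(w_j|z_k) P(z_k|d_i)].

    Hypothetical layout: n[i][j] = n(d_i, w_j), p_d[i] = P(d_i),
    p_z_d[i][k] = P(z_k|d_i), p_w_z[k][j] = P(w_j|z_k).
    """
    K = len(p_w_z)
    total = 0.0
    for i, row in enumerate(n):
        for j, count in enumerate(row):
            if count == 0:
                continue  # a zero count contributes nothing to L
            mix = sum(p_w_z[k][j] * p_z_d[i][k] for k in range(K))
            total += count * (math.log(p_d[i]) + math.log(mix))
    return total


n = [[3, 1, 0], [0, 2, 4]]  # toy word counts, made up for illustration
print(log_likelihood(n, [0.4, 0.6], [[0.9, 0.1], [0.2, 0.8]],
                     [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]))
```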
E-step: compute the posterior probability of the latent variable. Using the joint probability from the first step:
$$\gamma(z_{ijk})=p(z_k \mid d_i, w_j)=\frac{p(d_i)\,p(z_k \mid d_i)\,p(w_j \mid z_k)}{\sum_{k'=1}^{K} p(d_i)\,p(z_{k'} \mid d_i)\,p(w_j \mid z_{k'})}$$
The factor $p(d_i)$ does not depend on $k$, so it cancels between numerator and denominator:
$$\gamma(z_{ijk})=\frac{p(z_k \mid d_i)\,p(w_j \mid z_k)}{\sum_{k'=1}^{K} p(z_{k'} \mid d_i)\,p(w_j \mid z_{k'})}$$
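The posterior above can be sketched directly in plain Python. This assumes the hypothetical layout `p_z_d[i][k]` = $p(z_k \mid d_i)$ and `p_w_z[k][j]` = $p(w_j \mid z_k)$, with the result stored as `gamma[i][j][k]` = $\gamma(z_{ijk})$. The name `e_step` and the uniform fallback for an all-zero denominator are assumptions of this sketch, not part of the derivation.

```python
def e_step(p_z_d, p_w_z):
    """gamma[i][j][k] = p(z_k | d_i, w_j) for every document i and word j.

    p(d_i) is left out: it multiplies numerator and denominator and cancels.
    Hypothetical layout: p_z_d[i][k] = p(z_k|d_i), p_w_z[k][j] = p(w_j|z_k).
    """
    K, M = len(p_w_z), len(p_w_z[0])
    gamma = []
    for row in p_z_d:                      # document d_i
        g_i = []
        for j in range(M):                 # word w_j
            num = [row[k] * p_w_z[k][j] for k in range(K)]
            s = sum(num)                   # denominator: sum over k'
            # Uniform fallback if every topic gives w_j zero mass
            # (an assumption made here only to avoid division by zero).
            g_i.append([x / s for x in num] if s > 0 else [1.0 / K] * K)
        gamma.append(g_i)
    return gamma


gamma = e_step([[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
print(gamma[0][0])  # posterior over the 2 topics for the pair (d_0, w_0)
```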
M-step
(1) Compute the Q function. For a single (document, word) pair $(d_i, w_j)$, the expected complete-data log-likelihood is:
$$\sum_{k=1}^{K}\gamma(z_{ijk})\log p(d_i, z_k, w_j)=\sum_{k=1}^{K}\gamma(z_{ijk})\left(\log\left[p(z_k \mid d_i)\,p(w_j \mid z_k)\right]+\log p(d_i)\right)$$
Since $\sum_{k=1}^{K}\gamma(z_{ijk})=1$, the $\log p(d_i)$ term contributes just $\log p(d_i)$. For a single sample this is a constant that involves neither $p(z_k \mid d_i)$ nor $p(w_j \mid z_k)$, so it can be left out of the optimization:
$$\sum_{k=1}^{K}\gamma(z_{ijk})\log\left[p(z_k \mid d_i)\,p(w_j \mid z_k)\right]$$
(2) Summing over all samples, with each pair weighted by its count $n(d_i, w_j)$:
$$Q=\sum_{i=1}^{N}\sum_{j=1}^{M} n(d_i, w_j)\sum_{k=1}^{K}\gamma(z_{ijk})\log\left[p(z_k \mid d_i)\,p(w_j \mid z_k)\right]$$
Here $\gamma(z_{ijk})$ is computed from the current parameters $\theta^{old}$, so $Q=Q(\theta, \theta^{old})$.
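The following is a plain-Python sketch of $Q$ for given responsibilities. It assumes the hypothetical layout `n[i][j]` = $n(d_i, w_j)$, `gamma[i][j][k]` = $\gamma(z_{ijk})$ (held fixed from the old parameters), `p_z_d[i][k]` = $p(z_k \mid d_i)$ and `p_w_z[k][j]` = $p(w_j \mid z_k)$ as the candidate new parameters. It also assumes those parameters are strictly positive wherever $\gamma > 0$. The name `q_function` and the toy values are illustrative.

```python
import math


def q_function(n, gamma, p_z_d, p_w_z):
    """Q = sum_ij n(d_i,w_j) sum_k gamma_ijk * log[p(z_k|d_i) p(w_j|z_k)].

    gamma comes from the E-step under the old parameters; p_z_d and p_w_z are
    candidate new parameters (assumed strictly positive where gamma > 0).
    """
    total = 0.0
    for i, row in enumerate(n):
        for j, count in enumerate(row):
            if count == 0:
                continue
            for k, g in enumerate(gamma[i][j]):
                if g > 0:
                    total += count * g * math.log(p_z_d[i][k] * p_w_z[k][j])
    return total


# One document, one word seen twice, two topics with equal responsibility.
n, gamma, p_w_z = [[2]], [[[0.5, 0.5]]], [[1.0], [1.0]]
print(q_function(n, gamma, [[0.5, 0.5]], p_w_z))
print(q_function(n, gamma, [[0.6, 0.4]], p_w_z))
```

With equal responsibilities, the even split $p(z_k \mid d_i) = 0.5$ gives the larger $Q$. This agrees with maximizing $Q$ under the constraint $\sum_k p(z_k \mid d_i)=1$.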
(3) Maximize Q subject to two normalization constraints: $\sum_{k=1}^{K} p(z_k \mid d_i)=1$ for each document, and $\sum_{w \in V} p(w \mid z_k)=1$ for each topic, where $V$ is the vocabulary. The two constraints yield, respectively, the following updates.
1) For $p(z_k \mid d_i)$, apply the method of Lagrange multipliers for a fixed document $d_i$:
$$Lg=Q(\theta, \theta^{old})+\lambda\left(\sum_{k=1}^{K} p(z_k \mid d_i)-1\right)$$
2) Take the partial derivative with respect to $p(z_k \mid d_i)$ and set it to zero:
$$\frac{\partial Lg}{\partial p(z_k \mid d_i)}=\sum_{j=1}^{M}\frac{n(d_i, w_j)\,\gamma(z_{ijk})}{p(z_k \mid d_i)}+\lambda=0 \quad\Longrightarrow\quad -\sum_{j=1}^{M} n(d_i, w_j)\,\gamma(z_{ijk})=\lambda\, p(z_k \mid d_i)$$
3) Sum both sides over $k$. Since $\sum_{k=1}^{K}\gamma(z_{ijk})=1$ and $\sum_{k=1}^{K}p(z_k \mid d_i)=1$, this gives:
$$\lambda=-\sum_{j=1}^{M} n(d_i, w_j)$$
4) Substitute $\lambda$ back into the equation from step 2) to obtain $p(z_k \mid d_i)$:
$$p(z_k \mid d_i)=\frac{\sum_{j=1}^{M} n(d_i, w_j)\,\gamma(z_{ijk})}{\sum_{j=1}^{M} n(d_i, w_j)}$$
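This update can be sketched in plain Python. It assumes the hypothetical layout `n[i][j]` = $n(d_i, w_j)$ and `gamma[i][j][k]` = $\gamma(z_{ijk})$, and that every document contains at least one word, so the denominator is non-zero. The name `update_p_z_d` is illustrative.

```python
def update_p_z_d(n, gamma):
    """p(z_k|d_i) = sum_j n(d_i,w_j) gamma_ijk / sum_j n(d_i,w_j).

    Assumes every document has at least one word (non-zero denominator).
    Hypothetical layout: n[i][j] = n(d_i,w_j), gamma[i][j][k] from the E-step.
    """
    K = len(gamma[0][0])
    new = []
    for i, row in enumerate(n):
        doc_len = sum(row)  # denominator: total word count of d_i
        new.append([sum(row[j] * gamma[i][j][k] for j in range(len(row))) / doc_len
                    for k in range(K)])
    return new


# One document with counts 3, 1, 0 and hand-picked responsibilities.
print(update_p_z_d([[3, 1, 0]], [[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]]))  # [[0.75, 0.25]]
```

Because $\sum_k \gamma(z_{ijk}) = 1$, summing the numerator over $k$ reproduces the denominator. The update is therefore equivalent to normalizing each document's expected topic counts.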
Similarly, the Lagrange multiplier method yields the expression for $p(w_j \mid z_k)$, as follows.
1) The Lagrangian. For a fixed topic $z_k$, the constraint sums over all words $j$:
$$Lg=Q(\theta, \theta^{old})+\lambda\left(\sum_{j=1}^{M} p(w_j \mid z_k)-1\right)$$
2) Take the partial derivative with respect to $p(w_j \mid z_k)$ and set it to zero:
$$-\sum_{i=1}^{N} n(d_i, w_j)\,\gamma(z_{ijk})=\lambda\, p(w_j \mid z_k)$$
3) Sum both sides over all words $j$. Since $\sum_{j=1}^{M} p(w_j \mid z_k)=1$:
$$\lambda=-\sum_{i=1}^{N}\sum_{j=1}^{M} n(d_i, w_j)\,\gamma(z_{ijk})$$
4) Substitute back into the equation from step 2) to obtain:
$$p(w_j \mid z_k)=\frac{\sum_{i=1}^{N} n(d_i, w_j)\,\gamma(z_{ijk})}{\sum_{i=1}^{N}\sum_{j'=1}^{M} n(d_i, w_{j'})\,\gamma(z_{ij'k})}$$
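This update can be sketched in plain Python. It assumes the hypothetical layout `n[i][j]` = $n(d_i, w_j)$ and `gamma[i][j][k]` = $\gamma(z_{ijk})$, and that every topic receives some expected count, so the denominator is non-zero. The name `update_p_w_z` is illustrative.

```python
def update_p_w_z(n, gamma):
    """p(w_j|z_k) = sum_i n(d_i,w_j) gamma_ijk / sum_i sum_j' n(d_i,w_j') gamma_ij'k.

    Assumes each topic receives some expected count (non-zero denominator).
    Hypothetical layout: n[i][j] = n(d_i,w_j), gamma[i][j][k] from the E-step.
    """
    N, M, K = len(n), len(n[0]), len(gamma[0][0])
    new = []
    for k in range(K):
        # Expected number of times word j is generated by topic k, over all documents.
        expected = [sum(n[i][j] * gamma[i][j][k] for i in range(N)) for j in range(M)]
        total = sum(expected)  # denominator: the same sum taken over every word
        new.append([c / total for c in expected])
    return new


# With a single topic (gamma = 1 everywhere) the update reduces to corpus word frequencies.
n = [[3, 1, 0], [0, 2, 4]]
gamma_one = [[[1.0] for _ in range(3)] for _ in range(2)]
print(update_p_w_z(n, gamma_one))  # [[0.3, 0.3, 0.4]]
```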
In summary, the optimization alternates between the following two steps until convergence.
E-step, compute the posterior probabilities:
$$\gamma(z_{ijk})=\frac{p(z_k \mid d_i)\,p(w_j \mid z_k)}{\sum_{k'=1}^{K} p(z_{k'} \mid d_i)\,p(w_j \mid z_{k'})}$$
M-step:
$$p(z_k \mid d_i)=\frac{\sum_{j=1}^{M} n(d_i, w_j)\,\gamma(z_{ijk})}{\sum_{j=1}^{M} n(d_i, w_j)}$$
$$p(w_j \mid z_k)=\frac{\sum_{i=1}^{N} n(d_i, w_j)\,\gamma(z_{ijk})}{\sum_{i=1}^{N}\sum_{j'=1}^{M} n(d_i, w_{j'})\,\gamma(z_{ij'k})}$$
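Putting the two steps together gives the following sketch of the full EM loop in plain Python. It is not a reference implementation. Assumptions include: the count matrix is a nested list `n[i][j]` = $n(d_i, w_j)$; parameters start from a random strictly positive initialization, which the derivation does not specify; and a uniform fallback avoids $0/0$. The name `plsa_em` and its parameters `K`, `iters` and `seed` are hypothetical. The recorded value omits the constant $\sum_{ij} n(d_i,w_j)\log P(d_i)$ term, which these updates do not change.

```python
import math
import random


def plsa_em(n, K, iters=50, seed=0):
    """Fit PLSA by EM on an N x M count matrix n (nested lists).

    Returns (p_z_d, p_w_z, history). history[t] is the word part of the
    log-likelihood, sum_ij n_ij log sum_k p(z_k|d_i) p(w_j|z_k), after
    iteration t; the constant sum_ij n_ij log P(d_i) term is omitted.
    """
    rng = random.Random(seed)
    N, M = len(n), len(n[0])

    def normalise(v):
        s = sum(v)
        # Uniform fallback when all entries are zero (an assumption, avoids 0/0).
        return [x / s for x in v] if s > 0 else [1.0 / len(v)] * len(v)

    # Random strictly positive initialisation (an assumption; not from the derivation).
    p_z_d = [normalise([rng.random() + 0.1 for _ in range(K)]) for _ in range(N)]
    p_w_z = [normalise([rng.random() + 0.1 for _ in range(M)]) for _ in range(K)]
    history = []
    for _ in range(iters):
        # E-step: gamma[i][j][k] = p(z_k | d_i, w_j)
        gamma = [[normalise([p_z_d[i][k] * p_w_z[k][j] for k in range(K)])
                  for j in range(M)] for i in range(N)]
        # M-step for p(z_k|d_i): normalising over k equals dividing by
        # sum_j n(d_i, w_j), because sum_k gamma_ijk = 1.
        p_z_d = [normalise([sum(n[i][j] * gamma[i][j][k] for j in range(M))
                            for k in range(K)]) for i in range(N)]
        # M-step for p(w_j|z_k): normalise expected counts over all words j.
        p_w_z = [normalise([sum(n[i][j] * gamma[i][j][k] for i in range(N))
                            for j in range(M)]) for k in range(K)]
        ll = sum(n[i][j] * math.log(sum(p_z_d[i][k] * p_w_z[k][j] for k in range(K)))
                 for i in range(N) for j in range(M) if n[i][j] > 0)
        history.append(ll)
    return p_z_d, p_w_z, history


counts = [[4, 2, 0, 0], [3, 3, 1, 0], [0, 0, 5, 3], [0, 1, 4, 4]]  # toy data
p_z_d, p_w_z, history = plsa_em(counts, K=2, iters=30, seed=1)
print(history[0], history[-1])
```

EM guarantees that the log-likelihood does not decrease from one iteration to the next, so `history` should be non-decreasing up to floating-point rounding. Which topics are found can depend on the random start (`seed`), since EM only reaches a local maximum.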