Introduction
Probabilistic Latent Semantic Analysis is abbreviated pLSA. Its parameters can be estimated with the EM algorithm.
Given
We have a document collection $D=\{d_1,\dots,d_N\}$, a vocabulary $W=\{w_1,\dots,w_M\}$, and a set of document classes $Z=\{z_1,\dots,z_K\}$, which are unobservable latent variables. The generative process is as follows:
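select document $d_i$ with probability $p(d_i)$ $\implies$ assign document $d_i$ to class $z_k$ with probability $p(z_k|d_i)$ $\implies$ emit word $w_j$ from class $z_k$ with probability $p(w_j|z_k)$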
The observable data are the counts $n(d_i,w_j)$, the number of times word $w_j$ occurs in document $d_i$; $Z$ cannot be observed.
Independence assumption: given the class $z_k$, the document and the word are conditionally independent:
$$p(d_i,w_j|z_k)=p(d_i|z_k)\,p(w_j|z_k)\qquad(1)$$
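The generative process above can be simulated directly. The following is a minimal sketch, assuming NumPy; the function name `sample_counts` and all toy probabilities are made up for illustration and are not part of the original derivation.

```python
import numpy as np

def sample_counts(p_d, p_z_given_d, p_w_given_z, n_tokens, seed=0):
    """Draw n_tokens (document, word) pairs and return the count matrix n(d_i, w_j)."""
    rng = np.random.default_rng(seed)
    N, K = p_z_given_d.shape
    M = p_w_given_z.shape[1]
    counts = np.zeros((N, M), dtype=int)
    for _ in range(n_tokens):
        i = rng.choice(N, p=p_d)             # select a document with p(d_i)
        k = rng.choice(K, p=p_z_given_d[i])  # select a class with p(z_k | d_i)
        j = rng.choice(M, p=p_w_given_z[k])  # emit a word with p(w_j | z_k)
        counts[i, j] += 1                    # z_k is thrown away: only (d_i, w_j) is observed
    return counts

# Hypothetical toy parameters: N=3 documents, K=2 classes, M=4 words.
p_d = np.array([0.5, 0.3, 0.2])
p_z_given_d = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
p_w_given_z = np.array([[0.4, 0.4, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]])

counts = sample_counts(p_d, p_z_given_d, p_w_given_z, n_tokens=1000)
print(counts)
```

The returned matrix plays the role of $n(d_i,w_j)$: the sampled class index never leaves the loop, which mirrors the fact that $Z$ is unobserved.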
Parameters
The pLSA parameters to be estimated are $p(z_k|d_i)$ and $p(w_j|z_k)$, because:
$$\begin{aligned}p(d_i,w_j)&=\sum_{k=1}^{K}p(z_k,d_i,w_j)=\sum_{k=1}^{K}p(d_i,w_j|z_k)\,p(z_k)\\&=\sum_{k=1}^{K}p(d_i|z_k)\,p(w_j|z_k)\,p(z_k)\qquad\text{[independence assumption (1)]}\\&=\sum_{k=1}^{K}p(d_i,z_k)\,p(w_j|z_k)\\&=\sum_{k=1}^{K}p(z_k|d_i)\,p(d_i)\,p(w_j|z_k)\\&=p(d_i)\sum_{k=1}^{K}p(z_k|d_i)\,p(w_j|z_k)\qquad(2)\end{aligned}$$
Dividing (2) by $p(d_i)$ turns the joint probability into a conditional probability:
$$p(w_j|d_i)=\frac{p(d_i,w_j)}{p(d_i)}=\sum_{k=1}^{K}p(z_k|d_i)\,p(w_j|z_k)\qquad(3)$$
(Another way to look at it: document $d_i$ generates class $z_k$ with probability $p(z_k|d_i)$. With $d_i$ fixed, the class can take any of the $K$ values, which is where the sum $\sum_{k=1}^{K}$ comes from. Each class then emits the single word $w_j$ with probability $p(w_j|z_k)$, so $p(w_j|d_i)$ takes the form above.)
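Read as arrays, equation (3) is a matrix product: the $N\times K$ matrix of $p(z_k|d_i)$ times the $K\times M$ matrix of $p(w_j|z_k)$. A small sketch, assuming NumPy and hypothetical toy values:

```python
import numpy as np

# Hypothetical toy parameters; every row is a probability distribution.
p_z_given_d = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])          # shape (N, K)
p_w_given_z = np.array([[0.4, 0.4, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]])  # shape (K, M)

# Equation (3): p(w_j | d_i) = sum_k p(z_k | d_i) p(w_j | z_k)
p_w_given_d = p_z_given_d @ p_w_given_z                               # shape (N, M)

print(p_w_given_d)
print(p_w_given_d.sum(axis=1))  # each row is again a distribution over words
```

Because both factors have rows that sum to 1, every row of the product does too, so each document gets a valid word distribution.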
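“Maximum likelihood”
We want to maximize $p(D,W)$, that is, to maximize $L$. The pair $(d_i,w_j)$ is observed $n(d_i,w_j)$ times, so it contributes the factor $p(d_i,w_j)$ raised to the power $n(d_i,w_j)$; multiplying over all pairs gives $L$. This is the same as in ordinary maximum likelihood estimation: choose the parameters under which generating the observed sample is most probable: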
$$\begin{aligned}L&=\prod_{i=1}^{N}\prod_{j=1}^{M}\left[p(d_i,w_j)\right]^{n(d_i,w_j)}\\&=\prod_{i=1}^{N}\prod_{j=1}^{M}\left[p(d_i)\sum_{k=1}^{K}p(z_k|d_i)\,p(w_j|z_k)\right]^{n(d_i,w_j)}\end{aligned}$$
Taking the log-likelihood $\log L$ turns the products into sums:
$$\begin{aligned}\log L&=\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\log\left[p(d_i)\sum_{k=1}^{K}p(z_k|d_i)\,p(w_j|z_k)\right]\\&=\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\log p(d_i)\;+\;\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\log\left[\sum_{k=1}^{K}p(z_k|d_i)\,p(w_j|z_k)\right]\qquad(4)\end{aligned}$$
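The two terms of (4) can be evaluated separately. The sketch below assumes NumPy; the function name `log_likelihood`, the toy counts, and the choice of estimating $p(d_i)$ as the relative frequency of each document are illustrative assumptions.

```python
import numpy as np

def log_likelihood(counts, p_d, p_z_given_d, p_w_given_z):
    """Return the two terms of equation (4); their sum is log L."""
    p_w_given_d = p_z_given_d @ p_w_given_z          # the sum over k inside the log
    term_d = np.sum(counts * np.log(p_d)[:, None])   # sum_ij n(d_i,w_j) log p(d_i)
    term_w = np.sum(counts * np.log(p_w_given_d))    # sum_ij n(d_i,w_j) log sum_k ...
    return term_d, term_w

# Hypothetical toy data: N=3 documents, M=4 words, K=2 classes.
counts = np.array([[5, 4, 1, 1], [2, 2, 2, 2], [1, 1, 4, 5]])
p_d = counts.sum(axis=1) / counts.sum()   # assumption: p(d_i) taken as relative frequency
p_z_given_d = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
p_w_given_z = np.array([[0.4, 0.4, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]])

term_d, term_w = log_likelihood(counts, p_d, p_z_given_d, p_w_given_z)
print(term_d, term_w, term_d + term_w)
```

Only `term_w` changes when `p_z_given_d` or `p_w_given_z` change; `term_d` depends on the counts and $p(d_i)$ alone.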
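Looking at (4), the goal is to maximize $\log L$. In the first term, $n(d_i,w_j)$ is observed and $p(d_i)$ can likewise be obtained from the observations, so neither is a variable: the first term is a constant with respect to $p(z_k|d_i)$ and $p(w_j|z_k)$. Maximizing $\log L$ therefore only involves the second term, which we denote $L_1$. Continuing the derivation: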
$$\begin{aligned}L_1&=\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\log\left[\sum_{k=1}^{K}Q_k(z_k)\frac{p(z_k|d_i)\,p(w_j|z_k)}{Q_k(z_k)}\right]\\&\ge\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\sum_{k=1}^{K}Q_k(z_k)\log\left[\frac{p(z_k|d_i)\,p(w_j|z_k)}{Q_k(z_k)}\right]\qquad(5)\end{aligned}$$
The inequality $L_1\ge(5)$ follows from Jensen's inequality for the concave function $\log$, applied with weights $\lambda_k=Q_k(z_k)$:
$$\log\sum_{j}\lambda_j y_j\ge\sum_{j}\lambda_j\log y_j,\qquad\lambda_j\ge0,\ \sum_{j}\lambda_j=1$$
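Jensen's inequality for $\log$ is easy to check numerically. This is only an illustrative sketch with randomly drawn weights and values, assuming NumPy; it also shows the equality case, which is what the next step of the derivation relies on.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = rng.random(5)
lam /= lam.sum()             # weights lambda_j >= 0 that sum to 1
y = rng.random(5) + 0.1      # arbitrary positive values y_j

lhs = np.log(np.sum(lam * y))    # log of the weighted average
rhs = np.sum(lam * np.log(y))    # weighted average of the logs
print(lhs, rhs, lhs >= rhs)

# Equality case: all y_j equal to the same constant c.
c = np.full(5, 2.0)
lhs_c = np.log(np.sum(lam * c))
rhs_c = np.sum(lam * np.log(c))
print(np.isclose(lhs_c, rhs_c))
```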
For the bound to be tight, that is, for Jensen's inequality to hold with equality, we need to choose $Q_k(z_k)$ such that
$$\frac{p(z_k|d_i)\,p(w_j|z_k)}{Q_k(z_k)}=c$$
where $c$ is a constant that does not depend on $z_k$. Since $Q_k(z_k)$ must also sum to 1 over $k$, it has to be proportional to $p(z_k|d_i)\,p(w_j|z_k)$, and normalizing gives the posterior $p(z_k|d_i,w_j)$. This is because:
$$\begin{aligned}p(z_k|d_i,w_j)&=\frac{p(z_k,d_i,w_j)}{p(d_i,w_j)}=\frac{p(d_i,w_j|z_k)\,p(z_k)}{p(d_i,w_j)}\\&=\frac{p(d_i|z_k)\,p(w_j|z_k)\,p(z_k)}{p(d_i,w_j)}\\&=\frac{p(w_j|z_k)\,p(z_k|d_i)\,p(d_i)}{p(d_i,w_j)}\\&=\frac{p(w_j|z_k)\,p(z_k|d_i)}{\sum_{l=1}^{K}p(z_l|d_i)\,p(w_j|z_l)}\qquad(6)\end{aligned}$$
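Equation (6) amounts to multiplying the two parameter tables and normalizing over $k$. The following sketch assumes NumPy; the function name `posterior`, the $(N, M, K)$ array layout, and the toy values are choices made for this illustration.

```python
import numpy as np

def posterior(p_z_given_d, p_w_given_z):
    """Equation (6): array of shape (N, M, K) holding p(z_k | d_i, w_j)."""
    # joint[i, j, k] = p(z_k | d_i) * p(w_j | z_k), the numerator of (6)
    joint = p_z_given_d[:, None, :] * p_w_given_z.T[None, :, :]
    # the denominator of (6) is the same product summed over the class index
    return joint / joint.sum(axis=2, keepdims=True)

# Hypothetical toy parameters: N=3 documents, K=2 classes, M=4 words.
p_z_given_d = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
p_w_given_z = np.array([[0.4, 0.4, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]])

post = posterior(p_z_given_d, p_w_given_z)
print(post.shape)
print(post[0, 0])   # distribution over classes for the pair (d_1, w_1)
```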
Substituting this $Q_k(z_k)$ into (5) gives:
$$\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\sum_{k=1}^{K}p(z_k|d_i,w_j)\log\left[\frac{p(z_k|d_i)\,p(w_j|z_k)}{p(z_k|d_i,w_j)}\right]\qquad(7)$$
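A quick numerical check, offered as a sketch under assumed toy values rather than a proof: with $Q$ set to the posterior, the lower bound coincides with $L_1$, while another valid $Q$ (here a uniform distribution over classes) gives a smaller value. The helper name `lower_bound` is hypothetical.

```python
import numpy as np

# Hypothetical toy data: N=3 documents, M=4 words, K=2 classes.
counts = np.array([[5, 4, 1, 1], [2, 2, 2, 2], [1, 1, 4, 5]])
p_z_given_d = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
p_w_given_z = np.array([[0.4, 0.4, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]])

# joint[i, j, k] = p(z_k | d_i) * p(w_j | z_k)
joint = p_z_given_d[:, None, :] * p_w_given_z.T[None, :, :]
L1 = np.sum(counts * np.log(joint.sum(axis=2)))

def lower_bound(Q):
    """Right-hand side of (5) for a given Q of shape (N, M, K)."""
    return np.sum(counts[:, :, None] * Q * np.log(joint / Q))

Q_post = joint / joint.sum(axis=2, keepdims=True)    # posterior, equation (6)
Q_unif = np.full_like(joint, 1.0 / joint.shape[2])   # some other distribution over k

print(L1, lower_bound(Q_post), lower_bound(Q_unif))
```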
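In (7), the denominator inside the $\log$ can be dropped when maximizing. Since $\log\frac{a}{b}=\log a-\log b$ and the posterior $p(z_k|d_i,w_j)$ is held fixed at the value computed from the current parameters, the term $-\log p(z_k|d_i,w_j)$ is a constant when differentiating with respect to $p(z_k|d_i)$ and $p(w_j|z_k)$, in the same way that $(\ln 5x)'=1/x$ does not depend on the constant factor 5. Keeping it makes no difference, as will also be clear below.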
Dropping the denominator inside the $\log$ in (7) gives:
$$f=\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\sum_{k=1}^{K}p(z_k|d_i,w_j)\log\left[p(z_k|d_i)\,p(w_j|z_k)\right]\qquad(8)$$
So what remains is to maximize (8).
EM algorithm
E-step: update $Q_k(z_k)=p(z_k|d_i,w_j)$ from the current parameters, using (6).
M-step: maximize (8) to obtain the parameters $p(z_k|d_i)$ and $p(w_j|z_k)$.
Constraints:
$$\begin{aligned}\text{s.t.}\quad&\sum_{k=1}^{K}p(z_k|d_i)=1\\&\sum_{j=1}^{M}p(w_j|z_k)=1\end{aligned}$$
By repeatedly maximizing the lower bound (the $\ge$ in (5)), the iteration approaches a maximum of the likelihood, which in general is only a local one.
Maximizing (8) with Lagrange multipliers
Use the method of Lagrange multipliers to find the stationary point. Construct the function $Lg$, with one multiplier $\rho_i$ per document and one multiplier $\tau_k$ per class:
$$Lg=f+\sum_{i=1}^{N}\rho_i\left[1-\sum_{k=1}^{K}p(z_k|d_i)\right]+\sum_{k=1}^{K}\tau_k\left[1-\sum_{j=1}^{M}p(w_j|z_k)\right]$$
Take the partial derivative of $Lg$ with respect to the variable $p(z_k|d_i)$ for a fixed $i$ and $k$. Only the terms of $f$ with this $i$ and $k$, and the multiplier $\rho_i$, survive:
$$\frac{\partial Lg}{\partial p(z_k|d_i)}=\sum_{j=1}^{M}n(d_i,w_j)\frac{p(z_k|d_i,w_j)}{p(z_k|d_i)}\;-\;\rho_i\;=0$$
Multiplying both sides by $p(z_k|d_i)$ gives:
$$\sum_{j=1}^{M}n(d_i,w_j)\,p(z_k|d_i,w_j)=\rho_i\,p(z_k|d_i)\qquad(9)$$
Summing (9) over $k$ and using the constraint $\sum_{k=1}^{K}p(z_k|d_i)=1$, we obtain:
$$\rho_i=\sum_{j=1}^{M}n(d_i,w_j)\sum_{k=1}^{K}p(z_k|d_i,w_j)$$
Because $\sum_{k=1}^{K}p(z_k|d_i,w_j)=1$, $\rho_i$ can also be written as $\rho_i=n(d_i)$, where $n(d_i)=\sum_{j=1}^{M}n(d_i,w_j)$ is the total number of words in document $d_i$.
Next, take the partial derivative of $Lg$ with respect to the variable $p(w_j|z_k)$ for a fixed $j$ and $k$:
$$\frac{\partial Lg}{\partial p(w_j|z_k)}=\sum_{i=1}^{N}n(d_i,w_j)\frac{p(z_k|d_i,w_j)}{p(w_j|z_k)}\;-\;\tau_k\;=0$$
Multiplying both sides by $p(w_j|z_k)$ gives:
$$\sum_{i=1}^{N}n(d_i,w_j)\,p(z_k|d_i,w_j)=\tau_k\,p(w_j|z_k)\qquad(10)$$
Summing (10) over $j$ and using the constraint $\sum_{j=1}^{M}p(w_j|z_k)=1$:
$$\therefore\quad\tau_k=\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\,p(z_k|d_i,w_j)$$
The two parameters updated in the M-step, $p(w_j|z_k)$ and $p(z_k|d_i)$, can now be expressed through these multipliers. Start with (9): its unknown $\rho_i$ has already been found, so solving (9) for $p(z_k|d_i)$, for each $i$ and $k$, gives:
$$p(z_k|d_i)\;=\;\frac{\sum_{j=1}^{M}n(d_i,w_j)\,p(z_k|d_i,w_j)}{\rho_i}\;=\;\frac{\sum_{j=1}^{M}n(d_i,w_j)\,p(z_k|d_i,w_j)}{n(d_i)}$$
In the same way, (10) can be solved for $p(w_j|z_k)$:
$$p(w_j|z_k)=\frac{\sum_{i=1}^{N}n(d_i,w_j)\,p(z_k|d_i,w_j)}{\tau_k}$$
$$p(w_j|z_k)=\frac{\sum_{i=1}^{N}n(d_i,w_j)\,p(z_k|d_i,w_j)}{\sum_{i=1}^{N}\sum_{m=1}^{M}n(d_i,w_m)\,p(z_k|d_i,w_m)}$$
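Putting the E-step (6) and the two M-step updates together gives a complete iteration. The following is a minimal sketch, assuming NumPy, random initialization, a fixed number of iterations, and a count matrix in which every document and every word occurs at least once; the function name `plsa_em` and the toy counts are hypothetical.

```python
import numpy as np

def plsa_em(counts, K, n_iter=30, seed=0):
    """Run EM for pLSA on an (N, M) count matrix; returns parameters and L_1 per iteration."""
    rng = np.random.default_rng(seed)
    N, M = counts.shape
    # Random initialization, each row normalized to a probability distribution.
    p_z_given_d = rng.random((N, K))
    p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True)
    p_w_given_z = rng.random((K, M))
    p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)
    history = []
    for _ in range(n_iter):
        # E-step, equation (6): post[i, j, k] = p(z_k | d_i, w_j)
        joint = p_z_given_d[:, None, :] * p_w_given_z.T[None, :, :]
        mix = joint.sum(axis=2)                                # p(w_j | d_i), equation (3)
        history.append(float(np.sum(counts * np.log(mix))))   # L_1 at the current parameters
        post = joint / mix[:, :, None]
        # M-step: weighted[i, j, k] = n(d_i, w_j) * p(z_k | d_i, w_j)
        weighted = counts[:, :, None] * post
        # p(z_k | d_i) = sum_j weighted / n(d_i)
        p_z_given_d = weighted.sum(axis=1) / counts.sum(axis=1, keepdims=True)
        # p(w_j | z_k) = sum_i weighted / tau_k
        tau = weighted.sum(axis=(0, 1))
        p_w_given_z = weighted.sum(axis=0).T / tau[:, None]
    return p_z_given_d, p_w_given_z, history

# Hypothetical toy counts: N=3 documents, M=4 words.
counts = np.array([[5, 4, 1, 1], [2, 2, 2, 2], [1, 1, 4, 5]])
p_z_given_d, p_w_given_z, history = plsa_em(counts, K=2)
print(p_z_given_d)
print(p_w_given_z)
print(history[0], history[-1])
```

The recorded values of $L_1$ should never decrease from one iteration to the next, which is the practical sign that the lower bound is being pushed up as described. A real implementation would typically add a convergence test and guard against zero counts; both are left out here to keep the sketch short.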
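...
As someone whose math grades hover on the edge of failing, I just hope the pattern recognition course goes well. The first time I worked through the derivation above, it took me a whole afternoon.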