18.2 The Algorithm for Probabilistic Latent Semantic Analysis
The log-likelihood function of the generative model is:
$$
\begin{aligned}
L &= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log P(w_i,d_j)\\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log\left[\sum_{k=1}^{K} P(w_i\mid z_k)P(z_k\mid d_j)P(d_j)\right]\\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\left[\log P(d_j)+\log\left(\sum_{k=1}^{K} P(w_i\mid z_k)P(z_k\mid d_j)\right)\right]\\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log P(d_j)+\sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log\left(\sum_{k=1}^{K} P(w_i\mid z_k)P(z_k\mid d_j)\right)
\end{aligned}
$$
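The first term does not depend on the parameters $P(w_i\mid z_k)$ and $P(z_k\mid d_j)$, so it is a constant for the optimization and can be dropped. This gives the likelihood function in the book: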
$$
L=\sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log\left[\sum_{k=1}^{K} P(w_i\mid z_k)P(z_k\mid d_j)\right]
$$
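To make the indices concrete, here is a minimal sketch that evaluates this log-likelihood on a toy corpus. The count matrix and both parameter tables are made-up values for illustration, and the array layout (`p_w_z[i][k]`, `p_z_d[k][j]`) is my own assumption, not something the book prescribes.

```python
import math

def log_likelihood(n, p_w_z, p_z_d):
    """L = sum_i sum_j n[i][j] * log(sum_k p_w_z[i][k] * p_z_d[k][j])."""
    K = len(p_z_d)
    total = 0.0
    for i, row in enumerate(n):
        for j, count in enumerate(row):
            if count == 0:
                continue  # a zero count contributes nothing to the sum
            mix = sum(p_w_z[i][k] * p_z_d[k][j] for k in range(K))
            total += count * math.log(mix)
    return total

# Toy data (hypothetical): M = 3 words, N = 2 documents, K = 2 topics.
n = [[2, 0],
     [1, 1],
     [0, 3]]                 # n[i][j] = n(w_i, d_j)
p_w_z = [[0.5, 0.1],
         [0.3, 0.3],
         [0.2, 0.6]]         # p_w_z[i][k] = P(w_i | z_k); each column sums to 1
p_z_d = [[0.7, 0.2],
         [0.3, 0.8]]         # p_z_d[k][j] = P(z_k | d_j); each column sums to 1

L = log_likelihood(n, p_w_z, p_z_d)
print(L)
```

Only word-document pairs with a nonzero count contribute, which is why the loop skips the zeros.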
E-step: computing the Q function
$$
\begin{aligned}
L &= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log\left[\sum_{k=1}^{K} P(w_i\mid z_k)P(z_k\mid d_j)\right]\\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log\left[\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\frac{P(w_i\mid z_k)P(z_k\mid d_j)}{P(z_k\mid w_i,d_j)}\right]
\end{aligned}
$$
To the expression above we apply Jensen's inequality:
$$
\log \sum_{j} \lambda_{j} y_{j} \geq \sum_{j} \lambda_{j} \log y_{j}, \quad \lambda_{j} \geq 0,\ \sum_{j} \lambda_{j}=1
$$
Taking the weights to be $\lambda_k = P(z_k\mid w_i,d_j)$, which are non-negative and sum to 1 over $k$, gives

$$
\begin{aligned}
L &= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log\left[\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\frac{P(w_i\mid z_k)P(z_k\mid d_j)}{P(z_k\mid w_i,d_j)}\right]\\
&\geqslant \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\log\left[\frac{P(w_i\mid z_k)P(z_k\mid d_j)}{P(z_k\mid w_i,d_j)}\right]
\end{aligned}
$$
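As a quick sanity check on the inequality itself, the sketch below evaluates both sides of Jensen's inequality for randomly drawn values. The numbers are arbitrary and only illustrate the direction of the inequality and its equality case; they are not tied to any particular corpus.

```python
import math
import random

random.seed(0)

# Random positive values y_k and non-negative weights lambda_k that sum to 1.
y = [random.uniform(0.1, 5.0) for _ in range(4)]
raw = [random.random() for _ in range(4)]
lam = [r / sum(raw) for r in raw]

lhs = math.log(sum(l * v for l, v in zip(lam, y)))   # log of the weighted mean
rhs = sum(l * math.log(v) for l, v in zip(lam, y))   # weighted mean of the logs
print(lhs >= rhs)

# Equality case: if every y_k is the same, the two sides coincide.
y_const = [2.0] * 4
lhs_eq = math.log(sum(l * v for l, v in zip(lam, y_const)))
rhs_eq = sum(l * math.log(v) for l, v in zip(lam, y_const))
print(abs(lhs_eq - rhs_eq) < 1e-12)
```

The equality case matters for EM: when the weights are the exact posterior $P(z_k\mid w_i,d_j)$, every ratio inside the logarithm equals $\sum_{k} P(w_i\mid z_k)P(z_k\mid d_j)$, so the bound touches $L$ at the current parameters.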
This gives a lower bound on $L$:
$$
\begin{aligned}
L &\geqslant \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\log\left[\frac{P(w_i\mid z_k)P(z_k\mid d_j)}{P(z_k\mid w_i,d_j)}\right]\\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\Big[\log\big[P(w_i\mid z_k)P(z_k\mid d_j)\big]-\log P(z_k\mid w_i,d_j)\Big]\\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\log\big[P(w_i\mid z_k)P(z_k\mid d_j)\big]\\
&\quad-\sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\log P(z_k\mid w_i,d_j)
\end{aligned}
$$
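When maximizing the Q function we take partial derivatives with respect to $P(w_i\mid z_k)$ and $P(z_k\mid d_j)$. In the second term the posterior $P(z_k\mid w_i,d_j)$ is computed from the current parameter estimates and held fixed, so its partial derivatives are 0. That term can therefore be dropped here; keeping it would change nothing, since it vanishes on differentiation anyway. Therefore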
$$
Q^{\prime}=\sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\log\big[P(w_i\mid z_k)P(z_k\mid d_j)\big]
$$
This is the $Q^{\prime}$ function given in the book, where
$$
P\left(z_{k} \mid w_{i}, d_{j}\right)=\frac{P\left(w_{i} \mid z_{k}\right) P\left(z_{k} \mid d_{j}\right)}{\sum_{k^{\prime}=1}^{K} P\left(w_{i} \mid z_{k^{\prime}}\right) P\left(z_{k^{\prime}} \mid d_{j}\right)}
$$
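The posterior above is all the E-step has to compute. A minimal sketch follows, assuming the same toy layout as before (`p_w_z[i][k]` for $P(w_i\mid z_k)$, `p_z_d[k][j]` for $P(z_k\mid d_j)$) and strictly positive parameters so the normalizer is never zero; the function name `e_step` is my own.

```python
def e_step(p_w_z, p_z_d):
    """Return post[i][j][k] = P(z_k | w_i, d_j) under the current parameters."""
    M, K, N = len(p_w_z), len(p_z_d), len(p_z_d[0])
    post = [[None] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            # Numerator of the posterior for each topic k.
            joint = [p_w_z[i][k] * p_z_d[k][j] for k in range(K)]
            norm = sum(joint)  # denominator: sum over all topics
            post[i][j] = [v / norm for v in joint]
    return post

# Hypothetical parameter values, for illustration only.
p_w_z = [[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]]
p_z_d = [[0.7, 0.2], [0.3, 0.8]]
post = e_step(p_w_z, p_z_d)
print(post[0][0])
```

For each word-document pair the posterior is a distribution over the $K$ topics, so the entries of `post[i][j]` sum to 1.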
M-step: maximizing the Q function
Because the variables $P\left(w_{i} \mid z_{k}\right)$ and $P\left(z_{k} \mid d_{j}\right)$ form probability distributions, they satisfy the constraints
$$
\begin{aligned}
&\sum_{i=1}^{M} P\left(w_{i} \mid z_{k}\right)=1, \quad k=1,2, \cdots, K \\
&\sum_{k=1}^{K} P\left(z_{k} \mid d_{j}\right)=1, \quad j=1,2, \cdots, N
\end{aligned}
$$
Applying the method of Lagrange multipliers, we introduce multipliers $\tau_{k}$ and $\rho_{j}$ and define the Lagrangian $\Lambda$:
$$
\Lambda=Q^{\prime}+\sum_{k=1}^{K} \tau_{k}\left(1-\sum_{i=1}^{M} P\left(w_{i} \mid z_{k}\right)\right)+\sum_{j=1}^{N} \rho_{j}\left(1-\sum_{k=1}^{K} P\left(z_{k} \mid d_{j}\right)\right)
$$
Taking the partial derivatives of $\Lambda$ with respect to $P\left(w_{i} \mid z_{k}\right)$ and $P\left(z_{k} \mid d_{j}\right)$ and setting them to 0 gives the following system of equations:
$$
\begin{aligned}
&\sum_{j=1}^{N} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)-\tau_{k} P\left(w_{i} \mid z_{k}\right)=0, \quad i=1,2, \cdots, M ; \quad k=1,2, \cdots, K\\
&\sum_{i=1}^{M} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)-\rho_{j} P\left(z_{k} \mid d_{j}\right)=0, \quad j=1,2, \cdots, N ; \quad k=1,2, \cdots, K
\end{aligned}
$$
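The intermediate step is easy to miss, so here it is written out as I understand it. Since $\log[P(w_i\mid z_k)P(z_k\mid d_j)] = \log P(w_i\mid z_k) + \log P(z_k\mid d_j)$, each parameter appears in $Q^{\prime}$ only through its own logarithm, and differentiating $\Lambda$ gives

$$
\begin{aligned}
\frac{\partial \Lambda}{\partial P(w_i\mid z_k)} &= \frac{\sum_{j=1}^{N} n(w_i,d_j)P(z_k\mid w_i,d_j)}{P(w_i\mid z_k)}-\tau_k = 0\\
\frac{\partial \Lambda}{\partial P(z_k\mid d_j)} &= \frac{\sum_{i=1}^{M} n(w_i,d_j)P(z_k\mid w_i,d_j)}{P(z_k\mid d_j)}-\rho_j = 0
\end{aligned}
$$

Multiplying the first equation by $P(w_i\mid z_k)$ and the second by $P(z_k\mid d_j)$ yields the system above.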
To solve for $\tau_k$ and $\rho_j$, sum the first equation over $i$ and the second over $k$:
$$
\begin{aligned}
&\sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)P(z_k\mid w_i,d_j)=\sum_{i=1}^{M}\tau_k P(w_i\mid z_k)=\tau_k\\
&\sum_{k=1}^{K}\sum_{i=1}^{M} n(w_i,d_j)P(z_k\mid w_i,d_j)=\sum_{k=1}^{K}\rho_j P(z_k\mid d_j)=\rho_j
\end{aligned}
$$
Hence:
$$
\begin{aligned}
\rho_j&=\sum_{k=1}^{K}\sum_{i=1}^{M} n(w_i,d_j)P(z_k\mid w_i,d_j)=\sum_{i=1}^{M} n(w_i,d_j)=n(d_j)\\
\tau_k&=\sum_{j=1}^{N}\sum_{i=1}^{M} n(w_i,d_j)P(z_k\mid w_i,d_j)
\end{aligned}
$$
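The simplification $\rho_j = n(d_j)$ relies on the posterior summing to 1 over $k$. The sketch below checks this numerically on made-up values; the data and the helper `posterior` are my own illustrative choices.

```python
# Toy inputs (hypothetical values, chosen only for illustration).
n = [[2, 0], [1, 1], [0, 3]]                    # n[i][j] = n(w_i, d_j)
p_w_z = [[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]]    # P(w_i | z_k)
p_z_d = [[0.7, 0.2], [0.3, 0.8]]                # P(z_k | d_j)
M, N, K = 3, 2, 2

def posterior(i, j):
    """P(z_k | w_i, d_j) for k = 0..K-1, from the current parameters."""
    joint = [p_w_z[i][k] * p_z_d[k][j] for k in range(K)]
    total = sum(joint)
    return [v / total for v in joint]

# rho_j sums over topics and words; tau_k sums over documents and words.
rho = [sum(n[i][j] * posterior(i, j)[k] for k in range(K) for i in range(M))
       for j in range(N)]
tau = [sum(n[i][j] * posterior(i, j)[k] for j in range(N) for i in range(M))
       for k in range(K)]
n_d = [sum(n[i][j] for i in range(M)) for j in range(N)]  # document lengths

print(rho, n_d)
print(sum(tau), sum(n_d))
```

Each $\rho_j$ matches the document length $n(d_j)$, and the $\tau_k$ together account for every word occurrence in the corpus.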
Substituting $\tau_k$ and $\rho_j$ back into the system of equations gives the parameter update formulas:
$$
\begin{aligned}
&P\left(w_{i} \mid z_{k}\right)=\frac{\sum_{j=1}^{N} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)}{\sum_{m=1}^{M} \sum_{j=1}^{N} n\left(w_{m}, d_{j}\right) P\left(z_{k} \mid w_{m}, d_{j}\right)}\\
&P\left(z_{k} \mid d_{j}\right)=\frac{\sum_{i=1}^{M} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)}{n\left(d_{j}\right)}
\end{aligned}
$$
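Putting the E-step posterior and the two update formulas together, the whole procedure can be sketched in plain Python as follows. This is an illustrative sketch rather than a reference implementation: the function name `plsa_em`, the random initialization, the fixed iteration count, and the toy count matrix are all my own assumptions, and it presumes every document contains at least one word.

```python
import math
import random

def plsa_em(n, K, iters=30, seed=0):
    """Fit P(w|z) and P(z|d) to counts n[i][j] by EM.

    Returns (p_w_z, p_z_d, history), where history[t] is the
    log-likelihood at the start of iteration t.
    """
    rng = random.Random(seed)
    M, N = len(n), len(n[0])

    def random_columns(rows, cols):
        # Random positive matrix whose columns each sum to 1.
        mat = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]
        for c in range(cols):
            s = sum(mat[r][c] for r in range(rows))
            for r in range(rows):
                mat[r][c] /= s
        return mat

    p_w_z = random_columns(M, K)   # p_w_z[i][k] = P(w_i | z_k)
    p_z_d = random_columns(K, N)   # p_z_d[k][j] = P(z_k | d_j)
    n_d = [sum(n[i][j] for i in range(M)) for j in range(N)]  # n(d_j)
    history = []

    for _ in range(iters):
        num_w = [[0.0] * K for _ in range(M)]  # sum_j n(w_i,d_j) P(z_k|w_i,d_j)
        num_z = [[0.0] * N for _ in range(K)]  # sum_i n(w_i,d_j) P(z_k|w_i,d_j)
        loglik = 0.0
        for i in range(M):
            for j in range(N):
                if n[i][j] == 0:
                    continue
                # E-step: posterior over topics for this (word, document) pair.
                joint = [p_w_z[i][k] * p_z_d[k][j] for k in range(K)]
                norm = sum(joint)
                loglik += n[i][j] * math.log(norm)
                for k in range(K):
                    r = n[i][j] * joint[k] / norm
                    num_w[i][k] += r
                    num_z[k][j] += r
        history.append(loglik)
        # M-step: normalize by tau_k and by rho_j = n(d_j).
        tau = [sum(num_w[i][k] for i in range(M)) for k in range(K)]
        p_w_z = [[num_w[i][k] / tau[k] for k in range(K)] for i in range(M)]
        p_z_d = [[num_z[k][j] / n_d[j] for j in range(N)] for k in range(K)]
    return p_w_z, p_z_d, history

# Hypothetical corpus: 4 words, 4 documents, two loosely separated groups.
counts = [[4, 3, 0, 0],
          [3, 4, 0, 1],
          [0, 0, 5, 4],
          [0, 1, 4, 5]]
p_w_z, p_z_d, history = plsa_em(counts, K=2)
print(history[0], history[-1])
```

Because each M-step maximizes a lower bound that is tight at the current parameters, the recorded log-likelihood should never decrease from one iteration to the next. EM only finds a local maximum, so different seeds may give different solutions.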