Deriving pLSA with the EM Algorithm

Introduction

  Probabilistic Latent Semantic Analysis is abbreviated pLSA. Its parameters can be estimated with the EM algorithm.

Given

  We have a document collection $D=\{d_1,...,d_N\}$, a vocabulary $W=\{w_1,...,w_M\}$, and a set of latent (unobservable) document classes $Z=\{z_1,...,z_K\}$. The generative process is:
select document $d_i$ with probability $p(d_i)$ $\implies$ document $d_i$ belongs to class $z_k$ with probability $p(z_k|d_i)$ $\implies$ a document of class $z_k$ contains word $w_j$ with probability $p(w_j|z_k)$
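The three-stage process above can be simulated directly. The following is a minimal sketch using only the Python standard library; the function name `sample_corpus` and the toy probability tables are assumptions made for illustration, not values from the derivation.

```python
import random

def sample_corpus(p_d, p_z_given_d, p_w_given_z, num_tokens, seed=0):
    """Draw (d_i, w_j) pairs from the pLSA generative process.

    Returns the count matrix n[i][j]; the sampled class z_k is thrown away,
    mirroring the fact that Z is never observed.
    """
    rng = random.Random(seed)
    N = len(p_d)
    K = len(p_w_given_z)
    M = len(p_w_given_z[0])
    counts = [[0] * M for _ in range(N)]
    for _ in range(num_tokens):
        i = rng.choices(range(N), weights=p_d)[0]             # d_i ~ p(d)
        k = rng.choices(range(K), weights=p_z_given_d[i])[0]  # z_k ~ p(z | d_i)
        j = rng.choices(range(M), weights=p_w_given_z[k])[0]  # w_j ~ p(w | z_k)
        counts[i][j] += 1
    return counts

# Hypothetical toy model: 2 documents, 2 classes, 3 words.
p_d = [0.5, 0.5]
p_z_given_d = [[0.9, 0.1], [0.2, 0.8]]
p_w_given_z = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
counts = sample_corpus(p_d, p_z_given_d, p_w_given_z, num_tokens=1000)
```

Only `counts` would be available to an estimation procedure; the tables used for sampling are what EM tries to recover.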

  The observable data are the counts $n(d_i,w_j)$, the number of times word $w_j$ occurs in document $d_i$, while $Z$ cannot be observed.

  Independence assumption: $p(d_i,w_j|z_k)=p(d_i|z_k)\,p(w_j|z_k)\qquad(1)$

Parameters

  The pLSA parameters to be estimated are $p(z_k|d_i)$ and $p(w_j|z_k)$, because:
$$\begin{aligned}p(d_i,w_j)&=\sum_{k=1}^{K}p(z_k,d_i,w_j) =\sum_{k=1}^{K}p(d_i,w_j|z_k)p(z_k)\\&=\sum_{k=1}^{K}p(d_i|z_k)p(w_j|z_k)p(z_k)\qquad\text{[by the independence assumption (1)]}\\&=\sum_{k=1}^{K}p(d_i,z_k)p(w_j|z_k) \\&=\sum_{k=1}^{K}p(z_k|d_i)p(d_i)p(w_j|z_k) \\&=p(d_i)\sum_{k=1}^{K}p(z_k|d_i)p(w_j|z_k) \qquad\qquad(2) \end{aligned}$$
From (2), converting the joint probability into a conditional probability:
$$p(w_j|d_i)=\frac{p(d_i,w_j)}{p(d_i)}=\sum_{k=1}^{K}p(z_k|d_i)p(w_j|z_k)\qquad(3)$$
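Equation (3) is a mixture: each row of $p(w_j|d_i)$ is a weighted average of the class-specific word distributions. A small sketch, with the helper name `word_given_doc` and the toy tables assumed for illustration:

```python
def word_given_doc(p_z_given_d, p_w_given_z):
    """Return p(w_j | d_i) = sum_k p(z_k | d_i) * p(w_j | z_k) as an N x M nested list."""
    K = len(p_w_given_z)
    M = len(p_w_given_z[0])
    return [
        [sum(row[k] * p_w_given_z[k][j] for k in range(K)) for j in range(M)]
        for row in p_z_given_d
    ]

# Hypothetical toy parameters: 2 documents, 2 classes, 3 words.
p_z_given_d = [[0.9, 0.1], [0.2, 0.8]]
p_w_given_z = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
p_w_given_d = word_given_doc(p_z_given_d, p_w_given_z)
# First entry: 0.9 * 0.7 + 0.1 * 0.1 = 0.64
```

Because both input tables have rows summing to 1, every row of the result also sums to 1.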

(Another way to see this: document $d_i$ generates class $z_k$ with probability $p(z_k|d_i)$. With $d_i$ fixed there are $K$ possible classes, hence the sum $\sum_{k=1}^{K}$, and each class emits the single word $w_j$ with probability $p(w_j|z_k)$, so $p(w_j|d_i)$ takes the form above.)

“Maximum likelihood”

  We want to maximize $p(D,W)$, that is, to maximize $L$. Word $w_j$ occurs $n(d_i,w_j)$ times in document $d_i$, so the pair $(d_i,w_j)$ contributes $p(d_i,w_j)^{n(d_i,w_j)}$, and multiplying over all pairs gives $L$. This is the same as in ordinary maximum likelihood estimation: we look for the parameters under which the observed sample is most probable:
$$\begin{aligned}L &=\prod_{i=1}^{N}\prod_{j=1}^{M}[p(d_i,w_j)]^{n(d_i,w_j)} \\&=\prod_{i=1}^{N}\prod_{j=1}^{M}\Big[p(d_i)\sum_{k=1}^{K}p(z_k|d_i)p(w_j|z_k)\Big]^{n(d_i,w_j)} \end{aligned}$$

Taking the log-likelihood $\log L$ turns the products into sums:
$$\begin{aligned}\log L&=\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\log\Big[p(d_i)\sum_{k=1}^{K}p(z_k|d_i)p(w_j|z_k)\Big]\\ &=\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\log p(d_i)\;+\;\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\log\Big[\sum_{k=1}^{K}p(z_k|d_i)p(w_j|z_k)\Big]\;(4) \end{aligned}$$
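As a quick numerical check of the split in (4), the two sums can be evaluated separately. This is a sketch under assumptions: the function name `log_likelihood_terms`, the toy counts, and the estimate $p(d_i)=n(d_i)/\sum_{i'}n(d_{i'})$ are illustrative choices, not part of the derivation.

```python
import math

def log_likelihood_terms(counts, p_d, p_z_given_d, p_w_given_z):
    """Return the two sums of equation (4) as (doc_term, mixture_term)."""
    K = len(p_w_given_z)
    doc_term = 0.0
    mixture_term = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            if n_ij == 0:
                continue  # pairs that never occur contribute nothing
            mix = sum(p_z_given_d[i][k] * p_w_given_z[k][j] for k in range(K))
            doc_term += n_ij * math.log(p_d[i])    # depends only on counts and p(d_i)
            mixture_term += n_ij * math.log(mix)   # depends on the two parameter tables
    return doc_term, mixture_term

# Hypothetical toy data: 2 documents, 3 words, 2 classes.
counts = [[3, 1, 0], [0, 2, 4]]
total = sum(map(sum, counts))
p_d = [sum(row) / total for row in counts]  # assumed estimate of p(d_i)
p_z_given_d = [[0.9, 0.1], [0.2, 0.8]]
p_w_given_z = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
doc_term, mixture_term = log_likelihood_terms(counts, p_d, p_z_given_d, p_w_given_z)
log_l = doc_term + mixture_term
```

Changing `p_z_given_d` or `p_w_given_z` moves only `mixture_term`, while `doc_term` stays fixed.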
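Looking at (4), we want to maximize $\log L$. In the first term, $n(d_i,w_j)$ is observed, and $p(d_i)$ can be estimated directly from the observed counts, so neither depends on the parameters $p(z_k|d_i)$ and $p(w_j|z_k)$: the first term is a constant. Maximizing $\log L$ therefore only involves the second term, which we denote $L_1$. Continuing the derivation: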
$$\begin{aligned}L_1&=\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\log\Big[\sum_{k=1}^{K}Q_k(z_k)\frac{p(z_k|d_i)p(w_j|z_k)}{Q_k(z_k)}\Big]\\ &\ge\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\sum_{k=1}^{K}Q_k(z_k)\log\Big[\frac{p(z_k|d_i)p(w_j|z_k)}{Q_k(z_k)}\Big]\quad(5) \end{aligned}$$
The inequality $L_1\ge(5)$ follows from Jensen's inequality for the concave function $\log$: $$\log\sum_{j}\lambda_jy_j\ge\sum_{j}\lambda_j\log y_j\qquad\lambda_j\ge0,\ \sum_{j}\lambda_j=1$$ Here $Q_k(z_k)$ plays the role of $\lambda$, so it must be a probability distribution over the $K$ classes.
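Jensen's inequality is easy to check numerically. The sketch below uses a hypothetical helper `jensen_gap` and arbitrary example numbers; it returns the left side minus the right side, which should never be negative and should vanish when all $y_j$ are equal.

```python
import math

def jensen_gap(weights, values):
    """Return log(sum_j l_j * y_j) - sum_j l_j * log(y_j) for weights summing to 1."""
    lhs = math.log(sum(l * y for l, y in zip(weights, values)))
    rhs = sum(l * math.log(y) for l, y in zip(weights, values))
    return lhs - rhs

weights = [0.2, 0.5, 0.3]                             # lambda_j >= 0, sums to 1
gap_spread = jensen_gap(weights, [1.0, 4.0, 9.0])     # unequal y_j: strictly positive gap
gap_constant = jensen_gap(weights, [2.0, 2.0, 2.0])   # equal y_j: gap is zero up to rounding
```

The zero gap for constant values is the equality condition that the choice of $Q_k(z_k)$ exploits.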

For the bound to be tight, that is, for Jensen's inequality to hold with equality, we choose $Q_k(z_k)$, separately for each pair $(d_i,w_j)$, such that $$\frac{p(z_k|d_i)p(w_j|z_k)}{Q_k(z_k)}=c$$
where $c$ is a constant that does not depend on $z_k$. Together with the normalization $\sum_{k=1}^{K}Q_k(z_k)=1$, this means $Q_k(z_k)\propto p(z_k|d_i)p(w_j|z_k)$, which is the posterior $p(z_k|d_i,w_j)$, because:
$$\begin{aligned} p(z_k|d_i,w_j)&=\frac{p(z_k,d_i,w_j)}{p(d_i,w_j)}=\frac{p(d_i,w_j|z_k)p(z_k)}{p(d_i,w_j)}\\ &=\frac{p(d_i|z_k)p(w_j|z_k)p(z_k)}{p(d_i,w_j)}\\ &=\frac{p(w_j|z_k)p(z_k|d_i)p(d_i)}{p(d_i,w_j)}\\ &=\frac{p(w_j|z_k)p(z_k|d_i)}{\sum_{k'=1}^{K}p(z_{k'}|d_i)p(w_j|z_{k'})}\qquad(6) \end{aligned}$$

Substituting $Q_k(z_k)=p(z_k|d_i,w_j)$ into (5) gives:
$$\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\sum_{k=1}^{K}p(z_k|d_i,w_j)\log\Big[\frac{p(z_k|d_i)p(w_j|z_k)}{p(z_k|d_i,w_j)}\Big]\qquad(7)$$
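The claim that the posterior makes the bound tight can be verified on a small example. In this sketch the function names `posterior`, `l1` and `lower_bound` and the toy numbers are assumptions for illustration; all probabilities are strictly positive so every logarithm is defined.

```python
import math

def posterior(p_z_given_d, p_w_given_z):
    """q[i][j][k] = p(z_k | d_i, w_j), the normalized product p(z_k|d_i) * p(w_j|z_k)."""
    K, M = len(p_w_given_z), len(p_w_given_z[0])
    q = []
    for row in p_z_given_d:
        q_row = []
        for j in range(M):
            joint = [row[k] * p_w_given_z[k][j] for k in range(K)]
            norm = sum(joint)
            q_row.append([x / norm for x in joint])
        q.append(q_row)
    return q

def l1(counts, p_z_given_d, p_w_given_z):
    """Second term of the log-likelihood: sum_ij n_ij * log sum_k p(z_k|d_i) p(w_j|z_k)."""
    K = len(p_w_given_z)
    return sum(
        n * math.log(sum(p_z_given_d[i][k] * p_w_given_z[k][j] for k in range(K)))
        for i, row in enumerate(counts) for j, n in enumerate(row) if n > 0
    )

def lower_bound(counts, q, p_z_given_d, p_w_given_z):
    """Jensen lower bound for an arbitrary distribution q[i][j] over the classes."""
    K = len(p_w_given_z)
    return sum(
        n * q[i][j][k] * math.log(p_z_given_d[i][k] * p_w_given_z[k][j] / q[i][j][k])
        for i, row in enumerate(counts) for j, n in enumerate(row)
        for k in range(K) if n > 0
    )

counts = [[3, 1, 0], [0, 2, 4]]
p_z_given_d = [[0.9, 0.1], [0.2, 0.8]]
p_w_given_z = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
q_post = posterior(p_z_given_d, p_w_given_z)
q_uniform = [[[0.5, 0.5] for _ in range(3)] for _ in range(2)]
exact = l1(counts, p_z_given_d, p_w_given_z)
tight = lower_bound(counts, q_post, p_z_given_d, p_w_given_z)     # matches exact
loose = lower_bound(counts, q_uniform, p_z_given_d, p_w_given_z)  # strictly below exact
```

With the posterior, the ratio inside the log is the same for every class, so the bound coincides with $L_1$; a uniform distribution leaves a gap.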

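In (7), the denominator inside the log can be dropped when maximizing. During the M-step the posterior $p(z_k|d_i,w_j)$ is held fixed at the value computed from the current parameters, so the term it contributes is a constant with respect to $p(z_k|d_i)$ and $p(w_j|z_k)$ and disappears when taking partial derivatives, just as $(\ln 5x)'=1/x$ does not depend on the factor 5. If it is kept, the steps below show that it makes no difference.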

After dropping the denominator inside the log in (7), we get:
$$f=\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)\sum_{k=1}^{K}p(z_k|d_i,w_j)\log[p(z_k|d_i)p(w_j|z_k)]\qquad(8)$$
So what remains is to maximize (8).

EM algorithm

E-step: update $Q_k(z_k)=p(z_k|d_i,w_j)$ from the current parameters using (6)
M-step: maximize (8) to obtain the parameters $p(z_k|d_i)$ and $p(w_j|z_k)$
Constraints:
$$\begin{aligned}s.t.&\sum_{k=1}^{K}p(z_k|d_i)=1\\ &\sum_{j=1}^{M}p(w_j|z_k)=1 \end{aligned}$$
Repeatedly maximizing the lower bound ($\ge$) pushes the likelihood toward a (local) maximum.

Maximizing (8) with Lagrange multipliers

  To find the stationary point we use Lagrange multipliers, one multiplier $\rho_i$ per document and one multiplier $\tau_k$ per class, and construct the function $Lg$:
$$Lg=f+\sum_{i=1}^{N}\rho_i\Big[1-\sum_{k=1}^{K}p(z_k|d_i)\Big]+\sum_{k=1}^{K}\tau_k\Big[1-\sum_{j=1}^{M}p(w_j|z_k)\Big]$$

Taking the partial derivative of $Lg$ with respect to the variable $p(z_k|d_i)$, for one fixed $i$ and $k$, gives:
$$\nabla_{p(z_k|d_i)}Lg=\sum_{j=1}^{M}n(d_i,w_j)\frac{p(z_k|d_i,w_j)}{p(z_k|d_i)}\;-\;\rho_i\;=0$$
Only the terms with this particular $i$ and $k$ contain $p(z_k|d_i)$, so the sums over $i$ and $k$ drop out of the derivative. Multiplying both sides by $p(z_k|d_i)$ gives:
$$\sum_{j=1}^{M}n(d_i,w_j)p(z_k|d_i,w_j)=\rho_i\,p(z_k|d_i)\qquad(9)$$
Summing (9) over $k$ and using the constraint $\sum_{k=1}^{K}p(z_k|d_i)=1$, we obtain:
$$\rho_i=\sum_{j=1}^{M}n(d_i,w_j)\sum_{k=1}^{K}p(z_k|d_i,w_j)$$
Since $\sum_{k=1}^{K}p(z_k|d_i,w_j)=1$, $\rho_i$ can also be written as $\rho_i=n(d_i)$, where $n(d_i)=\sum_{j=1}^{M}n(d_i,w_j)$ is the total number of words in document $d_i$.

Next, taking the partial derivative of $Lg$ with respect to the variable $p(w_j|z_k)$, for one fixed $j$ and $k$, gives:
$$\nabla_{p(w_j|z_k)}Lg=\sum_{i=1}^{N}n(d_i,w_j)\frac{p(z_k|d_i,w_j)}{p(w_j|z_k)}\;-\;\tau_k\;=0$$
$$\sum_{i=1}^{N}n(d_i,w_j)\frac{p(z_k|d_i,w_j)}{p(w_j|z_k)}=\tau_k$$
Multiplying both sides by $p(w_j|z_k)$:
$$\sum_{i=1}^{N}n(d_i,w_j)p(z_k|d_i,w_j)=\tau_k\,p(w_j|z_k)\qquad(10)$$
Summing (10) over $j$ and using the constraint $\sum_{j=1}^{M}p(w_j|z_k)=1$:
$$\sum_{j=1}^{M}\sum_{i=1}^{N}n(d_i,w_j)p(z_k|d_i,w_j)=\tau_k\sum_{j=1}^{M}p(w_j|z_k)=\tau_k$$
$\therefore$
$$\tau_k=\sum_{i=1}^{N}\sum_{j=1}^{M}n(d_i,w_j)p(z_k|d_i,w_j)$$
The two parameters updated in the M-step, $p(w_j|z_k)$ and $p(z_k|d_i)$, can now be expressed through these multipliers. Start with (9): the unknown $\rho_i$ has already been determined, so (9) can be solved for $p(z_k|d_i)$:
$$p(z_k|d_i)\;=\;\frac{\sum_{j=1}^{M}n(d_i,w_j)p(z_k|d_i,w_j)}{\rho_i}$$
$$p(z_k|d_i)\;=\;\frac{\sum_{j=1}^{M}n(d_i,w_j)p(z_k|d_i,w_j)}{n(d_i)}$$
Likewise, (10) can be solved for $p(w_j|z_k)$:
$$p(w_j|z_k)=\frac{\sum_{i=1}^{N}n(d_i,w_j)p(z_k|d_i,w_j)}{\tau_k}$$
$$p(w_j|z_k)=\frac{\sum_{i=1}^{N}n(d_i,w_j)p(z_k|d_i,w_j)}{\sum_{i=1}^{N}\sum_{j'=1}^{M}n(d_i,w_{j'})p(z_k|d_i,w_{j'})}$$
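Putting the E-step posterior and the two M-step updates together gives a complete iteration. The sketch below is one possible arrangement, not a reference implementation: the function name `plsa_em`, the random initialization, the fixed iteration count, the toy counts and the zero guards are all assumptions, and it presumes every document contains at least one word.

```python
import math
import random

def plsa_em(counts, K, iterations=30, seed=0):
    """Run EM for pLSA on a count matrix n[i][j]; returns (p_z_d, p_w_z, history)."""
    rng = random.Random(seed)
    N, M = len(counts), len(counts[0])

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    # Assumed initialization: random strictly positive tables with rows summing to 1.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(K)]) for _ in range(N)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(M)]) for _ in range(K)]
    history = []
    for _ in range(iterations):
        # E-step: posterior p(z_k | d_i, w_j) from the current parameters.
        q = [[None] * M for _ in range(N)]
        for i in range(N):
            for j in range(M):
                joint = [p_z_d[i][k] * p_w_z[k][j] for k in range(K)]
                norm = sum(joint)
                # Guard (assumption): fall back to uniform if the product is all zeros.
                q[i][j] = [x / norm for x in joint] if norm > 0 else [1.0 / K] * K
        # M-step: p(z_k | d_i) = sum_j n_ij * q_ijk / n(d_i).
        for i in range(N):
            n_i = sum(counts[i])
            p_z_d[i] = [sum(counts[i][j] * q[i][j][k] for j in range(M)) / n_i
                        for k in range(K)]
        # M-step: p(w_j | z_k) = sum_i n_ij * q_ijk / tau_k.
        for k in range(K):
            tau_k = sum(counts[i][j] * q[i][j][k] for i in range(N) for j in range(M))
            if tau_k > 0:  # guard (assumption): leave an empty class unchanged
                p_w_z[k] = [sum(counts[i][j] * q[i][j][k] for i in range(N)) / tau_k
                            for j in range(M)]
        # Track L_1, the parameter-dependent part of the log-likelihood.
        history.append(sum(
            counts[i][j] * math.log(sum(p_z_d[i][k] * p_w_z[k][j] for k in range(K)))
            for i in range(N) for j in range(M) if counts[i][j] > 0
        ))
    return p_z_d, p_w_z, history

# Hypothetical toy corpus: 4 documents, 3 words, 2 classes.
counts = [[8, 2, 0], [7, 3, 1], [0, 2, 9], [1, 1, 8]]
p_z_d, p_w_z, history = plsa_em(counts, K=2)
```

The values in `history` should never decrease from one iteration to the next, which is a convenient sanity check on the update formulas. Different seeds can converge to different local maxima.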

……

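  As someone whose math grades hover on the edge of failing, I just hope the pattern recognition course goes well. The first time I worked through the derivation above, it took me a whole afternoon.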
