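Statistical Learning Methods (统计学习方法), Chapter 18: Probabilistic Latent Semantic Analysis

18.2 The Algorithm for Probabilistic Latent Semantic Analysis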

The log-likelihood function of the generative model is:
$$
\begin{aligned}
L &= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log P(w_i,d_j)\\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log\left[\sum_{k=1}^{K} P(w_i\mid z_k)P(z_k\mid d_j)P(d_j)\right]\\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\left[\log P(d_j)+\log\left(\sum_{k=1}^{K} P(w_i\mid z_k)P(z_k\mid d_j)\right)\right]\\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log P(d_j)+\sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log\left(\sum_{k=1}^{K} P(w_i\mid z_k)P(z_k\mid d_j)\right)
\end{aligned}
$$
The first term is a constant that does not depend on the model parameters $P(w_i\mid z_k)$ and $P(z_k\mid d_j)$, so it can be dropped. This gives the likelihood function in the book:
$$
L=\sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log\left[\sum_{k=1}^{K} P(w_i\mid z_k)P(z_k\mid d_j)\right]
$$
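To make the indices concrete, here is a minimal sketch in plain Python that evaluates this log-likelihood on a tiny example. The function name `log_likelihood`, the nested-list layout, and the toy numbers are all assumptions made for illustration; none of them come from the book.

```python
import math

def log_likelihood(n, p_w_z, p_z_d):
    """L = sum_i sum_j n(w_i, d_j) * log( sum_k P(w_i|z_k) * P(z_k|d_j) ).

    n[i][j]     : count of word w_i in document d_j   (M x N)
    p_w_z[i][k] : P(w_i | z_k), each column sums to 1 (M x K)
    p_z_d[k][j] : P(z_k | d_j), each column sums to 1 (K x N)
    """
    M, N, K = len(n), len(n[0]), len(p_z_d)
    total = 0.0
    for i in range(M):
        for j in range(N):
            if n[i][j] == 0:
                continue  # a zero count contributes nothing to the sum
            mix = sum(p_w_z[i][k] * p_z_d[k][j] for k in range(K))
            total += n[i][j] * math.log(mix)
    return total

# Made-up toy data: M = 3 words, N = 2 documents, K = 2 topics.
n = [[2, 0],
     [1, 1],
     [0, 3]]
p_w_z = [[0.5, 0.1],
         [0.3, 0.3],
         [0.2, 0.6]]
p_z_d = [[0.7, 0.2],
         [0.3, 0.8]]

L = log_likelihood(n, p_w_z, p_z_d)
print(L)
```

Only the cells with a nonzero count matter, so for sparse word-document matrices most of the double sum can be skipped.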
E-step: compute the Q function
$$
\begin{aligned}
L &= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log\left[\sum_{k=1}^{K} P(w_i\mid z_k)P(z_k\mid d_j)\right]\\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log\left[\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\frac{P(w_i\mid z_k)P(z_k\mid d_j)}{P(z_k\mid w_i,d_j)}\right]
\end{aligned}
$$
Apply Jensen's inequality to the expression above:
$$
\log \sum_{j} \lambda_{j} y_{j} \geq \sum_{j} \lambda_{j} \log y_{j}, \quad \lambda_{j} \geq 0,\ \sum_{j} \lambda_{j}=1
$$

Taking $\lambda_k = P(z_k\mid w_i,d_j)$ as the weights gives:
$$
\begin{aligned}
L &= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\log\left[\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\frac{P(w_i\mid z_k)P(z_k\mid d_j)}{P(z_k\mid w_i,d_j)}\right]\\
&\geqslant \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\log\left[\frac{P(w_i\mid z_k)P(z_k\mid d_j)}{P(z_k\mid w_i,d_j)}\right]
\end{aligned}
$$
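The inequality can be checked numerically for a single word-document pair. The sketch below is only an illustration: the helper `jensen_gap` and the two made-up values standing in for $P(w_i\mid z_k)P(z_k\mid d_j)$ are assumptions. It also shows the standard fact that the bound becomes an equality when the weights are the posterior, because every ratio $y_k$ is then the same number.

```python
import math

def jensen_gap(lam, y):
    """Return log(sum_k lam_k * y_k) - sum_k lam_k * log(y_k), which is >= 0."""
    lhs = math.log(sum(l * v for l, v in zip(lam, y)))
    rhs = sum(l * math.log(v) for l, v in zip(lam, y) if l > 0)
    return lhs - rhs

# joint[k] stands for P(w_i|z_k) * P(z_k|d_j) for one fixed pair (i, j).
# The values are made up for illustration.
joint = [0.35, 0.03]

# Arbitrary weights: the bound is strict.
q_arbitrary = [0.5, 0.5]
gap_arbitrary = jensen_gap(q_arbitrary,
                           [v / q for v, q in zip(joint, q_arbitrary)])

# Posterior weights: every ratio joint[k] / q[k] equals sum(joint),
# so both sides of the inequality coincide.
q_posterior = [v / sum(joint) for v in joint]
gap_posterior = jensen_gap(q_posterior,
                           [v / q for v, q in zip(joint, q_posterior)])

print(gap_arbitrary, gap_posterior)
```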

This gives a lower bound on $L$. Writing the bound as $\tilde{L}$ to keep it distinct from $L$, it splits into two terms:
$$
\begin{aligned}
\tilde{L} &= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\log\left[\frac{P(w_i\mid z_k)P(z_k\mid d_j)}{P(z_k\mid w_i,d_j)}\right]\\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\Big[\log\big[P(w_i\mid z_k)P(z_k\mid d_j)\big]-\log P(z_k\mid w_i,d_j)\Big]\\
&= \sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\log\big[P(w_i\mid z_k)P(z_k\mid d_j)\big]\\
&\quad -\sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\log P(z_k\mid w_i,d_j)
\end{aligned}
$$
Maximizing the Q function means taking partial derivatives with respect to $P(w_i\mid z_k)$ and $P(z_k\mid d_j)$. In the second term, the posterior $P(z_k\mid w_i,d_j)$ is computed from the current parameter estimates and held fixed, so the partial derivatives of that term are 0. It can therefore be dropped at this point (keeping it changes nothing, since it vanishes on differentiation). Hence
$$
Q^{\prime}=\sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)\sum_{k=1}^{K} P(z_k\mid w_i,d_j)\log\big[P(w_i\mid z_k)P(z_k\mid d_j)\big]
$$
which is the $Q^{\prime}$ function in the book, where
$$
P\left(z_{k} \mid w_{i}, d_{j}\right)=\frac{P\left(w_{i} \mid z_{k}\right) P\left(z_{k} \mid d_{j}\right)}{\sum_{k=1}^{K} P\left(w_{i} \mid z_{k}\right) P\left(z_{k} \mid d_{j}\right)}
$$
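The posterior formula translates almost directly into code. The following is a sketch under assumptions: the name `e_step`, the nested-list layout `post[i][j][k]`, the toy parameter values, and the uniform fallback for a zero normalizer are illustrative choices, not part of the book's algorithm statement.

```python
def e_step(p_w_z, p_z_d):
    """Return post[i][j][k] = P(z_k | w_i, d_j) under the current parameters.

    p_w_z[i][k] : P(w_i | z_k)  (M x K)
    p_z_d[k][j] : P(z_k | d_j)  (K x N)
    """
    M, K, N = len(p_w_z), len(p_z_d), len(p_z_d[0])
    post = [[[0.0] * K for _ in range(N)] for _ in range(M)]
    for i in range(M):
        for j in range(N):
            joint = [p_w_z[i][k] * p_z_d[k][j] for k in range(K)]
            norm = sum(joint)
            for k in range(K):
                # Assumption: fall back to a uniform posterior if the
                # normalizer is exactly zero, to avoid dividing by zero.
                post[i][j][k] = joint[k] / norm if norm > 0 else 1.0 / K
    return post

# Made-up parameters: M = 3 words, K = 2 topics, N = 2 documents.
p_w_z = [[0.5, 0.1],
         [0.3, 0.3],
         [0.2, 0.6]]
p_z_d = [[0.7, 0.2],
         [0.3, 0.8]]

post = e_step(p_w_z, p_z_d)
print(post[0][0])  # posterior over the two topics for (w_1, d_1)
```

For each pair $(w_i, d_j)$ the entries `post[i][j]` sum to 1 over the topics, which is the property used later when solving for $\rho_j$.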
M-step: maximize the Q function

Because the variables $P\left(w_{i} \mid z_{k}\right)$ and $P\left(z_{k} \mid d_{j}\right)$ form probability distributions, they satisfy the constraints
$$
\begin{aligned}
&\sum_{i=1}^{M} P\left(w_{i} \mid z_{k}\right)=1, \quad k=1,2, \cdots, K \\
&\sum_{k=1}^{K} P\left(z_{k} \mid d_{j}\right)=1, \quad j=1,2, \cdots, N
\end{aligned}
$$
Applying the method of Lagrange multipliers, introduce the multipliers $\tau_{k}$ and $\rho_{j}$ and define the Lagrangian $\Lambda$:
$$
\Lambda=Q^{\prime}+\sum_{k=1}^{K} \tau_{k}\left(1-\sum_{i=1}^{M} P\left(w_{i} \mid z_{k}\right)\right)+\sum_{j=1}^{N} \rho_{j}\left(1-\sum_{k=1}^{K} P\left(z_{k} \mid d_{j}\right)\right)
$$
Taking the partial derivatives of $\Lambda$ with respect to $P\left(w_{i} \mid z_{k}\right)$ and $P\left(z_{k} \mid d_{j}\right)$ and setting them to 0 gives the following system of equations:
$$
\begin{aligned}
&\sum_{j=1}^{N} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)-\tau_{k} P\left(w_{i} \mid z_{k}\right)=0, \quad i=1,2, \cdots, M ; \quad k=1,2, \cdots, K\\
&\sum_{i=1}^{M} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)-\rho_{j} P\left(z_{k} \mid d_{j}\right)=0, \quad j=1,2, \cdots, N ; \quad k=1,2, \cdots, K
\end{aligned}
$$
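The intermediate step behind this system is spelled out here as a sketch. Since $\log\big[P(w_i\mid z_k)P(z_k\mid d_j)\big]=\log P(w_i\mid z_k)+\log P(z_k\mid d_j)$ and the posterior $P(z_k\mid w_i,d_j)$ is held fixed, differentiating $\Lambda$ gives
$$
\begin{aligned}
\frac{\partial \Lambda}{\partial P(w_i\mid z_k)} &= \frac{1}{P(w_i\mid z_k)}\sum_{j=1}^{N} n(w_i,d_j)P(z_k\mid w_i,d_j)-\tau_k=0\\
\frac{\partial \Lambda}{\partial P(z_k\mid d_j)} &= \frac{1}{P(z_k\mid d_j)}\sum_{i=1}^{M} n(w_i,d_j)P(z_k\mid w_i,d_j)-\rho_j=0
\end{aligned}
$$
Multiplying the first line by $P(w_i\mid z_k)$ and the second by $P(z_k\mid d_j)$ yields the two equations of the system.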
To solve for $\tau_k$ and $\rho_j$, sum the first equation over $i$ and the second over $k$, using the normalization constraints:
$$
\begin{aligned}
&\sum_{i=1}^{M}\sum_{j=1}^{N} n(w_i,d_j)P(z_k\mid w_i,d_j)=\sum_{i=1}^{M}\tau_k P(w_i\mid z_k)=\tau_k\\
&\sum_{k=1}^{K}\sum_{i=1}^{M} n(w_i,d_j)P(z_k\mid w_i,d_j)=\sum_{k=1}^{K}\rho_j P(z_k\mid d_j)=\rho_j
\end{aligned}
$$
Since $\sum_{k=1}^{K} P(z_k\mid w_i,d_j)=1$, this gives:
$$
\begin{aligned}
\rho_j&=\sum_{k=1}^{K}\sum_{i=1}^{M} n(w_i,d_j)P(z_k\mid w_i,d_j)=\sum_{i=1}^{M} n(w_i,d_j)=n(d_j)\\
\tau_k&=\sum_{j=1}^{N}\sum_{i=1}^{M} n(w_i,d_j)P(z_k\mid w_i,d_j)
\end{aligned}
$$

Substituting $\tau_k$ and $\rho_j$ back into the system of equations gives the parameter update formulas:
$$
\begin{aligned}
&P\left(w_{i} \mid z_{k}\right)=\frac{\sum_{j=1}^{N} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)}{\sum_{m=1}^{M} \sum_{j=1}^{N} n\left(w_{m}, d_{j}\right) P\left(z_{k} \mid w_{m}, d_{j}\right)}\\
&P\left(z_{k} \mid d_{j}\right)=\frac{\sum_{i=1}^{M} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)}{n\left(d_{j}\right)}
\end{aligned}
$$
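Putting the two steps together, the sketch below alternates the posterior computation with these update formulas on a small made-up count matrix. Everything about its shape is an assumption for illustration: the function names (`e_step`, `m_step`, `plsa`, `random_columns`), the random initialization, the fixed iteration count, and the requirements that every document is non-empty and every topic keeps some probability mass (so no denominator is zero).

```python
import math
import random

def log_likelihood(n, p_w_z, p_z_d):
    """sum_ij n(w_i,d_j) * log(sum_k P(w_i|z_k) P(z_k|d_j)), skipping zero counts."""
    K = len(p_z_d)
    return sum(n[i][j] * math.log(sum(p_w_z[i][k] * p_z_d[k][j] for k in range(K)))
               for i in range(len(n)) for j in range(len(n[0])) if n[i][j] > 0)

def e_step(p_w_z, p_z_d):
    """post[i][j][k] = P(z_k | w_i, d_j) under the current parameters."""
    M, K, N = len(p_w_z), len(p_z_d), len(p_z_d[0])
    post = [[[0.0] * K for _ in range(N)] for _ in range(M)]
    for i in range(M):
        for j in range(N):
            joint = [p_w_z[i][k] * p_z_d[k][j] for k in range(K)]
            norm = sum(joint)
            # Assumption: uniform fallback if the normalizer is zero.
            post[i][j] = [v / norm if norm > 0 else 1.0 / K for v in joint]
    return post

def m_step(n, post, K):
    """Apply the two update formulas; assumes tau_k > 0 and n(d_j) > 0."""
    M, N = len(n), len(n[0])
    # Numerator of P(w_i|z_k): sum_j n(w_i,d_j) P(z_k|w_i,d_j)
    p_w_z = [[sum(n[i][j] * post[i][j][k] for j in range(N)) for k in range(K)]
             for i in range(M)]
    for k in range(K):
        tau_k = sum(p_w_z[i][k] for i in range(M))  # normalizer tau_k
        for i in range(M):
            p_w_z[i][k] /= tau_k
    # P(z_k|d_j): sum_i n(w_i,d_j) P(z_k|w_i,d_j) / n(d_j)
    n_d = [sum(n[i][j] for i in range(M)) for j in range(N)]
    p_z_d = [[sum(n[i][j] * post[i][j][k] for i in range(M)) / n_d[j]
              for j in range(N)] for k in range(K)]
    return p_w_z, p_z_d

def random_columns(rows, cols, rng):
    """rows x cols table of positive numbers whose columns each sum to 1."""
    t = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]
    for c in range(cols):
        s = sum(t[r][c] for r in range(rows))
        for r in range(rows):
            t[r][c] /= s
    return t

def plsa(n, K, iters=30, seed=0):
    """Run a fixed number of EM iterations; returns parameters and the L history."""
    rng = random.Random(seed)
    p_w_z = random_columns(len(n), K, rng)
    p_z_d = random_columns(K, len(n[0]), rng)
    history = []
    for _ in range(iters):
        post = e_step(p_w_z, p_z_d)
        p_w_z, p_z_d = m_step(n, post, K)
        history.append(log_likelihood(n, p_w_z, p_z_d))
    return p_w_z, p_z_d, history

# Made-up counts: M = 4 words (rows), N = 3 documents (columns).
n = [[4, 3, 0],
     [3, 4, 0],
     [0, 1, 5],
     [0, 0, 4]]
p_w_z, p_z_d, history = plsa(n, K=2)
print(history[0], history[-1])
```

Each EM iteration should leave the log-likelihood unchanged or higher, so printing `history` is a cheap sanity check on an implementation. A practical version would likely stop on a convergence threshold rather than a fixed iteration count, and would use array operations instead of nested loops.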
