Machine Learning: Latent Dirichlet Allocation (LDA), the Variational EM Algorithm, and a Python Implementation

The Variational EM Algorithm for LDA

Variational inference is a method commonly used in Bayesian learning for models with latent variables. It differs from the Gibbs sampling of the previous chapter: Gibbs sampling approximates the model's posterior by random sampling, whereas variational inference computes an analytic approximation to the posterior.

This chapter follows Dr. Hang Li's Statistical Learning Methods.
The book gives a simplified derivation of the variational EM algorithm; this chapter carries out the full derivation.
The previous chapter covered the Gibbs sampling method for LDA.

1. Variational Inference

The idea of variational inference is as follows:
Suppose the model is a joint distribution $p(x, z)$, where $x$ is the observed variable and $z$ is the latent variable, including the parameters. The goal is to learn the model's posterior $p(z|x)$. This distribution is too complex to solve directly, so we approximate it with a distribution $q(z)$ and measure the closeness of the two with the KL divergence $KL(q(z) \| p(z|x))$; $q(z)$ is called the variational distribution.

The KL divergence can be written as:

$$
\begin{aligned}
KL(q(z) \| p(z|x)) &= \sum_z q(z) \log \frac{q(z)}{p(z|x)} \\
&= \sum_z q(z) \log q(z) - \sum_z q(z) \log p(z|x) \\
&= \sum_z q(z) \log q(z) - \sum_z q(z) \log \frac{p(z,x)}{p(x)} \\
&= \sum_z q(z) \log q(z) - \sum_z q(z) \log p(z,x) + \sum_z q(z) \log p(x) \\
&= \sum_z q(z) \log q(z) - \sum_z q(z) \log p(z,x) + \log p(x) \\
&= \log p(x) - \{ E_q[\log p(x,z)] - E_q[\log q(z)] \}
\end{aligned}
$$

Since the KL divergence is nonnegative, and equals 0 if and only if the two distributions coincide, we obtain:

$$\log p(x) \ge E_q[\log p(x,z)] - E_q[\log q(z)]$$

The right-hand side of the inequality is a lower bound on the left-hand side. The left-hand side is called the evidence, and the right-hand side the evidence lower bound (ELBO), written:

$$L(q) = E_q[\log p(x,z)] - E_q[\log q(z)]$$

Since $x$ is observed, $p(x)$ is fixed, so $q(z)$ is found by maximizing the evidence lower bound: the larger the bound, the closer it is to $\log p(x)$ and the smaller the KL divergence, meaning $q(z)$ is closer to $p(z|x)$.
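The relation above, $\log p(x) = L(q) + KL(q(z) \| p(z|x))$, can be sanity-checked numerically. A minimal sketch with a made-up three-state model (the joint table `p_xz` and the variational `q` are arbitrary choices, not from the text):

```python
import numpy as np

# Made-up joint p(x, z) for a fixed observation x and a 3-valued z
p_xz = np.array([0.10, 0.25, 0.15])
p_x = p_xz.sum()                 # evidence p(x)
p_z_given_x = p_xz / p_x         # posterior p(z | x)

# An arbitrary variational distribution q(z)
q = np.array([0.5, 0.3, 0.2])

elbo = np.sum(q * (np.log(p_xz) - np.log(q)))        # E_q[log p(x,z)] - E_q[log q(z)]
kl = np.sum(q * (np.log(q) - np.log(p_z_given_x)))   # KL(q || p(z|x))

# log p(x) = ELBO + KL, and KL >= 0, so the ELBO is indeed a lower bound
assert np.isclose(np.log(p_x), elbo + kl)
assert kl >= 0.0
```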

The evidence lower bound can also be obtained from Jensen's inequality:

$$
\begin{aligned}
\log p(x) &= \log \int_z p(x,z) \, dz \\
&= \log \int_z p(x,z) \frac{q(z)}{q(z)} \, dz \\
&= \log \int_z q(z) \frac{p(x,z)}{q(z)} \, dz \\
&= \log E_q\left[\frac{p(x,z)}{q(z)}\right] \\
&\ge E_q\left[\log \frac{p(x,z)}{q(z)}\right] \\
&= E_q[\log p(x,z)] - E_q[\log q(z)]
\end{aligned}
$$

It is usually assumed that under $q(z)$ all components of $z$ are mutually independent (in fact, conditionally independent given the parameters), i.e.

$$q(z) = q(z_1) q(z_2) \cdots q(z_n)$$

A variational distribution of this form is called a mean field.

2. Derivation of the EM Algorithm for LDA
2.1 Definition of the Evidence Lower Bound

(Figure: plate diagram of LDA with its variational distribution)
As the figure shows, the solid circles are the LDA model parameters, and the dashed circles are the variational parameters.

With the same notation as the previous chapter, the joint distribution of the random variables $w, z, \theta, \varphi$ is:

$$p(w, z, \theta, \varphi \mid \alpha, \beta) = p(\theta|\alpha) \, p(\varphi|\beta) \, p(z|\theta) \, p(w|z,\varphi)$$

where $w$ is the observed variable, $\theta, \varphi, z$ are latent variables, and $\alpha, \beta$ are parameters.

Define the mean-field variational distribution

$$q(\theta, \varphi, z \mid \gamma, \rho, \eta) = q(\theta|\gamma) \, q(\varphi|\rho) \, q(z|\eta)$$

where $\gamma$ and $\rho$ are the Dirichlet parameters of $\theta$ and $\varphi$ respectively, and $\eta$ is the multinomial parameter of $z$.

From this we obtain the evidence lower bound for the corpus:

$$L(\gamma, \eta, \rho, \alpha, \beta) = E_q[\log p(w, z, \theta, \varphi \mid \alpha, \beta)] - E_q[\log q(\theta, \varphi, z \mid \gamma, \rho, \eta)]$$

Expanding:

$$
\begin{aligned}
L(\gamma, \eta, \rho, \alpha, \beta) = {} & E_q[\log p(\theta|\alpha)] + E_q[\log p(\varphi|\beta)] + E_q[\log p(z|\theta)] + E_q[\log p(w|z,\varphi)] \\
& - E_q[\log q(\theta|\gamma)] - E_q[\log q(\varphi|\rho)] - E_q[\log q(z|\eta)]
\end{aligned}
$$

The variational parameters introduce independence among the latent variables.

Each term above is now expanded in terms of the variational and model parameters.

Term 1:

$$
\begin{aligned}
E_q[\log p(\theta|\alpha)] &= E_q\left[\log \prod_{m=1}^M p(\theta_m|\alpha)\right] \\
&= E_q\left[\log \prod_{m=1}^M \frac{\Gamma(\sum_{k=1}^K \alpha_k)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^K \theta_{mk}^{\alpha_k-1}\right] \\
&= \sum_{m=1}^M E_q\left[\log \Gamma\left(\sum_{k=1}^K \alpha_k\right) - \sum_{k=1}^K \log \Gamma(\alpha_k) + \sum_{k=1}^K (\alpha_k-1) \log \theta_{mk}\right] \\
&= \sum_{m=1}^M \left\{\log \Gamma\left(\sum_{k=1}^K \alpha_k\right) - \sum_{k=1}^K \log \Gamma(\alpha_k) + \sum_{k=1}^K (\alpha_k-1) E_q[\log \theta_{mk}]\right\} \\
&= \sum_{m=1}^M \left\{\log \Gamma\left(\sum_{k=1}^K \alpha_k\right) - \sum_{k=1}^K \log \Gamma(\alpha_k) + \sum_{k=1}^K (\alpha_k-1)\left[\Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right]\right\}
\end{aligned}
$$

where $\Psi$ is the digamma function, the first derivative of the log-gamma function.
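The step $E_q[\log \theta_{mk}] = \Psi(\gamma_{mk}) - \Psi(\sum_l \gamma_{ml})$ used above can be verified by Monte Carlo; a small sketch with made-up Dirichlet parameters:

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
gamma = np.array([2.0, 5.0, 1.5])   # made-up Dirichlet parameters

# Monte Carlo estimate of E[log theta] under theta ~ Dirichlet(gamma)
samples = rng.dirichlet(gamma, size=200_000)
mc = np.log(samples).mean(axis=0)

# Closed form via the digamma function
closed_form = digamma(gamma) - digamma(gamma.sum())

assert np.allclose(mc, closed_form, atol=1e-2)
```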

Term 2:

$$
\begin{aligned}
E_q[\log p(\varphi|\beta)] &= E_q\left[\log \prod_{k=1}^K p(\varphi_k|\beta)\right] \\
&= E_q\left[\log \prod_{k=1}^K \frac{\Gamma(\sum_{v=1}^V \beta_v)}{\prod_{v=1}^V \Gamma(\beta_v)} \prod_{v=1}^V \varphi_{kv}^{\beta_v-1}\right] \\
&= \sum_{k=1}^K E_q\left[\log \Gamma\left(\sum_{v=1}^V \beta_v\right) - \sum_{v=1}^V \log \Gamma(\beta_v) + \sum_{v=1}^V (\beta_v-1) \log \varphi_{kv}\right] \\
&= \sum_{k=1}^K \left\{\log \Gamma\left(\sum_{v=1}^V \beta_v\right) - \sum_{v=1}^V \log \Gamma(\beta_v) + \sum_{v=1}^V (\beta_v-1) E_q[\log \varphi_{kv}]\right\} \\
&= \sum_{k=1}^K \left\{\log \Gamma\left(\sum_{v=1}^V \beta_v\right) - \sum_{v=1}^V \log \Gamma(\beta_v) + \sum_{v=1}^V (\beta_v-1)\left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right]\right\}
\end{aligned}
$$

Term 3:

$$
\begin{aligned}
E_q[\log p(z|\theta)] &= E_q\left[\log \prod_{m=1}^M \prod_{n=1}^{N_m} \prod_{k=1}^K \theta_{mk}^{I(z_{mn}=k)}\right] \\
&= \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{k=1}^K E_q[I(z_{mn}=k)] \, E_q[\log \theta_{mk}] \\
&= \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{k=1}^K \eta_{mnk} \, E_q[\log \theta_{mk}] \\
&= \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{k=1}^K \eta_{mnk} \left[\Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right]
\end{aligned}
$$

Term 4:

$$
\begin{aligned}
E_q[\log p(w|z,\varphi)] &= E_q\left[\log \prod_{m=1}^M \prod_{n=1}^{N_m} \prod_{v=1}^V \prod_{k=1}^K \varphi_{kv}^{I(z_{mn}=k,\, w_{mn}=v)}\right] \\
&= E_q\left[\sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{v=1}^V \sum_{k=1}^K I(z_{mn}=k,\, w_{mn}=v) \log \varphi_{kv}\right] \\
&= \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{v=1}^V \sum_{k=1}^K I(z_{mn}=k) \, I(w_{mn}=v) \, E_q[\log \varphi_{kv}] \\
&= \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{v=1}^V \sum_{k=1}^K \eta_{mnk} \, w_{mn}^v \left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right]
\end{aligned}
$$

where $w_{mn}^v = I(w_{mn} = v)$ indicates that word $n$ of document $m$ is vocabulary item $v$.

Term 5:

$$
\begin{aligned}
E_q[\log q(\theta|\gamma)] &= E_q\left[\log \prod_{m=1}^M q(\theta_m|\gamma_m)\right] \\
&= E_q\left[\log \prod_{m=1}^M \frac{\Gamma(\sum_{k=1}^K \gamma_{mk})}{\prod_{k=1}^K \Gamma(\gamma_{mk})} \prod_{k=1}^K \theta_{mk}^{\gamma_{mk}-1}\right] \\
&= \sum_{m=1}^M E_q\left[\log \Gamma\left(\sum_{k=1}^K \gamma_{mk}\right) - \sum_{k=1}^K \log \Gamma(\gamma_{mk}) + \sum_{k=1}^K (\gamma_{mk}-1) \log \theta_{mk}\right] \\
&= \sum_{m=1}^M \left\{\log \Gamma\left(\sum_{k=1}^K \gamma_{mk}\right) - \sum_{k=1}^K \log \Gamma(\gamma_{mk}) + \sum_{k=1}^K (\gamma_{mk}-1) E_q[\log \theta_{mk}]\right\} \\
&= \sum_{m=1}^M \left\{\log \Gamma\left(\sum_{k=1}^K \gamma_{mk}\right) - \sum_{k=1}^K \log \Gamma(\gamma_{mk}) + \sum_{k=1}^K (\gamma_{mk}-1)\left[\Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right]\right\}
\end{aligned}
$$

Term 6:

$$
\begin{aligned}
E_q[\log q(\varphi|\rho)] &= E_q\left[\log \prod_{k=1}^K q(\varphi_k|\rho_k)\right] \\
&= E_q\left[\log \prod_{k=1}^K \frac{\Gamma(\sum_{v=1}^V \rho_{kv})}{\prod_{v=1}^V \Gamma(\rho_{kv})} \prod_{v=1}^V \varphi_{kv}^{\rho_{kv}-1}\right] \\
&= \sum_{k=1}^K E_q\left[\log \Gamma\left(\sum_{v=1}^V \rho_{kv}\right) - \sum_{v=1}^V \log \Gamma(\rho_{kv}) + \sum_{v=1}^V (\rho_{kv}-1) \log \varphi_{kv}\right] \\
&= \sum_{k=1}^K \left\{\log \Gamma\left(\sum_{v=1}^V \rho_{kv}\right) - \sum_{v=1}^V \log \Gamma(\rho_{kv}) + \sum_{v=1}^V (\rho_{kv}-1) E_q[\log \varphi_{kv}]\right\} \\
&= \sum_{k=1}^K \left\{\log \Gamma\left(\sum_{v=1}^V \rho_{kv}\right) - \sum_{v=1}^V \log \Gamma(\rho_{kv}) + \sum_{v=1}^V (\rho_{kv}-1)\left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right]\right\}
\end{aligned}
$$

Term 7:

$$
\begin{aligned}
E_q[\log q(z|\eta)] &= E_q\left[\log \prod_{m=1}^M \prod_{n=1}^{N_m} q(z_{mn}|\eta_{mn})\right] \\
&= E_q\left[\log \prod_{m=1}^M \prod_{n=1}^{N_m} \prod_{k=1}^K \eta_{mnk}^{I(z_{mn}=k)}\right] \\
&= \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{k=1}^K E_q[I(z_{mn}=k)] \log \eta_{mnk} \\
&= \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{k=1}^K \eta_{mnk} \log \eta_{mnk}
\end{aligned}
$$

Substituting the formulas above into the evidence lower bound:

$$
\begin{aligned}
L(\gamma, \eta, \rho, \alpha, \beta) = {} & E_q[\log p(\theta|\alpha)] + E_q[\log p(\varphi|\beta)] + E_q[\log p(z|\theta)] + E_q[\log p(w|z,\varphi)] \\
& - E_q[\log q(\theta|\gamma)] - E_q[\log q(\varphi|\rho)] - E_q[\log q(z|\eta)] \\
= {} & \sum_{m=1}^M \left\{\log \Gamma\left(\sum_{k=1}^K \alpha_k\right) - \sum_{k=1}^K \log \Gamma(\alpha_k) + \sum_{k=1}^K (\alpha_k-1)\left[\Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right]\right\} \\
& + \sum_{k=1}^K \left\{\log \Gamma\left(\sum_{v=1}^V \beta_v\right) - \sum_{v=1}^V \log \Gamma(\beta_v) + \sum_{v=1}^V (\beta_v-1)\left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right]\right\} \\
& + \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{k=1}^K \eta_{mnk} \left[\Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right] \\
& + \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{v=1}^V \sum_{k=1}^K \eta_{mnk} \, w_{mn}^v \left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right] \\
& - \sum_{m=1}^M \left\{\log \Gamma\left(\sum_{k=1}^K \gamma_{mk}\right) - \sum_{k=1}^K \log \Gamma(\gamma_{mk}) + \sum_{k=1}^K (\gamma_{mk}-1)\left[\Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right]\right\} \\
& - \sum_{k=1}^K \left\{\log \Gamma\left(\sum_{v=1}^V \rho_{kv}\right) - \sum_{v=1}^V \log \Gamma(\rho_{kv}) + \sum_{v=1}^V (\rho_{kv}-1)\left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right]\right\} \\
& - \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{k=1}^K \eta_{mnk} \log \eta_{mnk}
\end{aligned}
$$

2.2 Estimating the Variational Parameters
  1. $\gamma_{mk}$:
First collect the terms of the evidence lower bound that contain $\gamma_{mk}$:

$$
\begin{aligned}
L(\gamma_{mk}) = {} & (\alpha_k-1)\left[\Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right] \\
& + \sum_{n=1}^{N_m} \eta_{mnk} \left[\Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right] \\
& - \log \Gamma\left(\sum_{l=1}^K \gamma_{ml}\right) + \log \Gamma(\gamma_{mk}) - (\gamma_{mk}-1)\left[\Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right]
\end{aligned}
$$

Taking the derivative:

$$
\begin{aligned}
\frac{\partial L(\gamma_{mk})}{\partial \gamma_{mk}} = {} & (\alpha_k-1)\left[\Psi'(\gamma_{mk}) - \Psi'\left(\sum_{l=1}^K \gamma_{ml}\right)\right] \\
& + \sum_{n=1}^{N_m} \eta_{mnk} \left[\Psi'(\gamma_{mk}) - \Psi'\left(\sum_{l=1}^K \gamma_{ml}\right)\right] \\
& - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right) + \Psi(\gamma_{mk}) \\
& - (\gamma_{mk}-1)\left[\Psi'(\gamma_{mk}) - \Psi'\left(\sum_{l=1}^K \gamma_{ml}\right)\right] - \left[\Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right] \\
= {} & \left[(\alpha_k-1) + \sum_{n=1}^{N_m} \eta_{mnk} - (\gamma_{mk}-1)\right]\left[\Psi'(\gamma_{mk}) - \Psi'\left(\sum_{l=1}^K \gamma_{ml}\right)\right]
\end{aligned}
$$

Setting this to zero, and noting that $\Psi'(\gamma_{mk}) - \Psi'(\sum_{l=1}^K \gamma_{ml}) \ne 0$, we have

$$(\alpha_k - 1) + \sum_{n=1}^{N_m} \eta_{mnk} - (\gamma_{mk} - 1) = 0$$

Solving:

$$\gamma_{mk} = \alpha_k + \sum_{n=1}^{N_m} \eta_{mnk}$$

  2. $\rho_{kv}$:
First collect the terms of the evidence lower bound that contain $\rho_{kv}$:

$$
\begin{aligned}
L(\rho_{kv}) = {} & (\beta_v-1)\left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right] \\
& + \sum_{m=1}^M \sum_{n=1}^{N_m} \eta_{mnk} \, w_{mn}^v \left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right] \\
& - \log \Gamma\left(\sum_{s=1}^V \rho_{ks}\right) + \log \Gamma(\rho_{kv}) - (\rho_{kv}-1)\left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right]
\end{aligned}
$$

Taking the derivative:

$$
\begin{aligned}
\frac{\partial L(\rho_{kv})}{\partial \rho_{kv}} = {} & (\beta_v-1)\left[\Psi'(\rho_{kv}) - \Psi'\left(\sum_{s=1}^V \rho_{ks}\right)\right] \\
& + \sum_{m=1}^M \sum_{n=1}^{N_m} \eta_{mnk} \, w_{mn}^v \left[\Psi'(\rho_{kv}) - \Psi'\left(\sum_{s=1}^V \rho_{ks}\right)\right] \\
& - \Psi\left(\sum_{s=1}^V \rho_{ks}\right) + \Psi(\rho_{kv}) - (\rho_{kv}-1)\left[\Psi'(\rho_{kv}) - \Psi'\left(\sum_{s=1}^V \rho_{ks}\right)\right] \\
& - \Psi(\rho_{kv}) + \Psi\left(\sum_{s=1}^V \rho_{ks}\right) \\
= {} & \left[(\beta_v-1) + \sum_{m=1}^M \sum_{n=1}^{N_m} \eta_{mnk} \, w_{mn}^v - (\rho_{kv}-1)\right]\left[\Psi'(\rho_{kv}) - \Psi'\left(\sum_{s=1}^V \rho_{ks}\right)\right] \\
= {} & 0
\end{aligned}
$$

Since $\Psi'(\rho_{kv}) - \Psi'(\sum_{s=1}^V \rho_{ks}) \ne 0$, we obtain:

$$\rho_{kv} = \beta_v + \sum_{m=1}^M \sum_{n=1}^{N_m} \eta_{mnk} \, w_{mn}^v$$

  3. $\eta_{mnk}$:
First collect the terms of the evidence lower bound that contain $\eta_{mnk}$. Since $\sum_{k=1}^K \eta_{mnk} = 1$, the Lagrangian of the constrained optimization problem is

$$
\begin{aligned}
L(\eta_{mnk}) = {} & \eta_{mnk} \left[\Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right] \\
& + \sum_{v=1}^V \eta_{mnk} \, w_{mn}^v \left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right] \\
& - \eta_{mnk} \log \eta_{mnk} + \lambda\left(\sum_{l=1}^K \eta_{mnl} - 1\right)
\end{aligned}
$$

Taking the derivative:

$$
\begin{aligned}
\frac{\partial L(\eta_{mnk})}{\partial \eta_{mnk}} = {} & \Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right) \\
& + \sum_{v=1}^V w_{mn}^v \left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right] \\
& - 1 - \log \eta_{mnk} + \lambda = 0
\end{aligned}
$$

Then:

$$\log \eta_{mnk} = \left[\Psi(\gamma_{mk}) + \sum_{v=1}^V w_{mn}^v \left(\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right)\right] + \left[\lambda - 1 - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right]$$

As $k$ varies, the second bracket does not affect $\eta_{mnk}$, so we can write

$$\eta_{mnk} \propto \exp\left(\Psi(\gamma_{mk}) + \sum_{v=1}^V w_{mn}^v \left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right]\right)$$

In the actual implementation, $\eta_{mnk}$ must be normalized over $k$.
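That normalization can be done stably in log space with a log-sum-exp shift instead of exponentiating directly. A minimal sketch for one word position; `gamma_m`, `rho` and the word index `v` are made-up stand-ins:

```python
import numpy as np
from scipy.special import digamma

K, V = 3, 5
rng = np.random.default_rng(1)
gamma_m = rng.uniform(0.5, 3.0, size=K)    # variational gamma for document m
rho = rng.uniform(0.5, 3.0, size=(K, V))   # variational rho
v = 2                                      # vocabulary index of word w_mn

# Unnormalized log eta_mnk over topics k
log_eta = digamma(gamma_m) + digamma(rho[:, v]) - digamma(rho.sum(axis=1))

# Shift by the max before exponentiating (log-sum-exp trick), then normalize
log_eta -= log_eta.max()
eta_mn = np.exp(log_eta)
eta_mn /= eta_mn.sum()

assert np.isclose(eta_mn.sum(), 1.0)
```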

2.3 Estimating the Model Parameters
  1. $\alpha_k$:
First collect the terms of the evidence lower bound that contain $\alpha_k$:

$$L(\alpha_k) = \sum_{m=1}^M \left\{\log \Gamma\left(\sum_{l=1}^K \alpha_l\right) - \log \Gamma(\alpha_k) + (\alpha_k - 1)\left[\Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right]\right\}$$

Taking the partial derivative:

$$\frac{\partial L(\alpha_k)}{\partial \alpha_k} = M\left[\Psi\left(\sum_{l=1}^K \alpha_l\right) - \Psi(\alpha_k)\right] + \sum_{m=1}^M \left[\Psi(\gamma_{mk}) - \Psi\left(\sum_{l=1}^K \gamma_{ml}\right)\right]$$

This is the gradient $g(\alpha)$ of $L$ with respect to $\alpha$.

Taking the partial derivative again, with respect to $\alpha_l$:

$$\frac{\partial^2 L}{\partial \alpha_k \, \partial \alpha_l} = M\left[\Psi'\left(\sum_{s=1}^K \alpha_s\right) - \delta(k,l) \, \Psi'(\alpha_k)\right]$$

This is the Hessian matrix $H(\alpha)$.
The estimate of $\alpha$ is obtained by iterating

$$\alpha_{new} = \alpha_{old} - H(\alpha_{old})^{-1} g(\alpha_{old})$$
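The Hessian has the special form $H(\alpha) = \mathrm{diag}(h) + z \, \mathbf{1}\mathbf{1}^T$ with $h_k = -M\Psi'(\alpha_k)$ and $z = M\Psi'(\sum_l \alpha_l)$, so the Newton step $H^{-1}g$ can be computed in $O(K)$ without forming or inverting the matrix; the implementation later in this article relies on this. A sketch with made-up $M$ and $\gamma$, checked against the explicit Hessian:

```python
import numpy as np
from scipy.special import digamma, polygamma

M, K = 4, 3                                 # made-up corpus size and topic count
rng = np.random.default_rng(2)
alpha = np.ones(K)
gamma = rng.uniform(0.5, 5.0, size=(M, K))  # made-up variational gamma

# Gradient g(alpha) from the derivation above
g = M * (digamma(alpha.sum()) - digamma(alpha)) \
    + (digamma(gamma) - digamma(gamma.sum(axis=1, keepdims=True))).sum(axis=0)

h = -M * polygamma(1, alpha)       # diagonal part of H
z = M * polygamma(1, alpha.sum())  # coefficient of the rank-one part
c = (g / h).sum() / (1.0 / z + (1.0 / h).sum())

step = (g - c) / h                 # equals H^{-1} g (Sherman-Morrison)

H = np.diag(h) + z                 # explicit Hessian, for verification only
assert np.allclose(H @ step, g)
```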

  2. $\beta_v$:
First collect the terms of the evidence lower bound that contain $\beta_v$:

$$L(\beta_v) = \sum_{k=1}^K \left\{\log \Gamma\left(\sum_{s=1}^V \beta_s\right) - \log \Gamma(\beta_v) + (\beta_v - 1)\left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right]\right\}$$

Similarly:

$$g(\beta) = \frac{\partial L}{\partial \beta_v} = K\left[\Psi\left(\sum_{s=1}^V \beta_s\right) - \Psi(\beta_v)\right] + \sum_{k=1}^K \left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right]$$

$$H(\beta) = \frac{\partial^2 L}{\partial \beta_v \, \partial \beta_s} = K\left[\Psi'\left(\sum_{t=1}^V \beta_t\right) - \delta(v,s) \, \Psi'(\beta_v)\right]$$

Iterate with:

$$\beta_{new} = \beta_{old} - H(\beta_{old})^{-1} g(\beta_{old})$$

The Variational EM Algorithm
E-step: solve for the variational parameters

$$\gamma_{mk} = \alpha_k + \sum_{n=1}^{N_m} \eta_{mnk}$$

$$\rho_{kv} = \beta_v + \sum_{m=1}^M \sum_{n=1}^{N_m} \eta_{mnk} \, w_{mn}^v$$

$$\eta_{mnk} \propto \exp\left(\Psi(\gamma_{mk}) + \sum_{v=1}^V w_{mn}^v \left[\Psi(\rho_{kv}) - \Psi\left(\sum_{s=1}^V \rho_{ks}\right)\right]\right)$$

M-step: solve for the model parameters

$$\alpha_{new} = \alpha_{old} - H(\alpha_{old})^{-1} g(\alpha_{old})$$

$$\beta_{new} = \beta_{old} - H(\beta_{old})^{-1} g(\beta_{old})$$

3. Python Implementation of LDA

Correspondence between variable names in the code and the parameters of the derivation:
:param gamma: $\gamma$
:param rho: $\rho$
:param eta: $\eta$
:param alpha: $\alpha$
:param beta: $\beta$

import numpy as np
from scipy.special import digamma, polygamma, gammaln
from copy import deepcopy


class LDAEM(object):
    def __init__(self, K=3):
        """
        变分EM算法实现LDA模型
        :param K: 主题个数,默认值为3
        :param M: 文档个数
        :param Nm: 第m个文档的单词个数
        :param N: 文本单词个数列表,(M,)
        :param V: 单词集合的个数
        :param tockens: 单词tockens
        :param gamma: 变分参数,对应文档主题,(M, K)
        :param rho: 变分参数,对应单词主题,(K, V)
        :param eta: 变分参数,对应文本中单词的主题(M, Nm, K)
        :param alpha: 模型参数文档主题theta的参数,(K,)
        :param beta: 模型参数单词主题varphi的参数,(V,)
        """
        self.K = K

        self.M = None
        self.V = None
        self.tockens = None
        self.N = None

        self.params = {
            'gamma': None,
            'rho': None,
            'eta': None,
            'alpha': None,
            'beta': None
        }

    def _init_params(self):
        """
        初始化参数
        :return:
        """
        # 初始化变分参数
        # 约束条件sum_k gamma_mk = 1, 即对于第m个文档主题之和为1
        gamma = np.random.dirichlet(100 * np.ones(self.K), self.M)

        # 约束条件sum_v rho_kv = 1,即对于第k个主题,所有单词之和为1
        rho = np.random.dirichlet(100 * np.ones(self.V), self.K)

        # 约束条件 sum_k eta_mnk=1,即对于第m个文档,第n个单词,文档主题概率之和为1
        eta = np.array([np.random.dirichlet(100 * np.ones(self.K), Nm) for Nm in self.N])

        # 初始化模型参数
        alpha = np.ones(self.K)
        beta = np.ones(self.V)

        self.params = {
            'gamma': gamma,
            'rho': rho,
            'eta': eta,
            'alpha': alpha,
            'beta': beta
        }

    def _update_gamma(self):
        """
         $$\gamma_{mk} = \alpha_k + \sum_{n=1}^{N_m} \eta_{mnk}$$
        :return:
        """
        alpha = deepcopy(self.params['alpha'])
        eta = deepcopy(self.params['eta'])

        # 预定义占位
        gamma = np.zeros((self.M, self.K))

        for m in range(self.M):
            gamma[m] = alpha + np.sum(eta[m], axis=0)

        # 归一化
        gamma = np.array([gamma[:, k] / np.sum(gamma, axis=1) for k in range(self.K)]).T
        return gamma

    def _update_rho(self):
        """
        $$\rho_{kv} = \beta_v + \sum_{m=1}^M \sum_{n=1}^{N_m}  \eta_{nmk} w_{mn}^v $$
        :return:
        """
        beta = deepcopy(self.params['beta'])
        eta = deepcopy(self.params['eta'])

        rho = np.zeros((self.K, self.V))

        for k in range(self.K):
            for v in range(self.V):
                sum_mn = 0
                for m in range(self.M):
                    for n in range(self.N[m]):
                        sum_mn += eta[m][n][k] * (self.texts[m][n] == self.tockens[v])
                rho[k][v] = sum_mn + beta[v]

        # 归一化
        rho = np.array([rho[k] / np.sum(rho[k]) for k in range(self.K)])
        return rho


    def _update_eta(self):
        """
        $$\eta_{mnk} \ltimes \exp(\Psi(\gamma_{mk})
        +\sum_{v=1}^V w_{mn}^v  \Psi(\rho_{kv}) - \Psi(\sum_{s=1}^V \rho_{ks}))$$
        计算中需要归一化
        :return:
        """
        gamma = deepcopy(self.params['gamma'])
        rho = deepcopy(self.params['rho'])

        eta = np.array([np.ones((n, self.K)) for n in self.N])

        for m in range(self.M):
            for n in range(self.N[m]):
                sum_k = 0
                eta_mn_k_list = np.zeros(self.K)
                for k in range(self.K):
                    sum_digamma_rho_k = 0
                    for v in range(self.V):
                        sum_digamma_rho_k += digamma(rho[k][v]) * (self.tockens[v] == self.texts[m][n])
                    # The update subtracts Psi(sum_s rho_ks), per the derivation
                    a = digamma(gamma[m][k]) + sum_digamma_rho_k - digamma(np.sum(rho[k]))
                    # Cap the exponent to avoid overflow in exp
                    if a > 20:
                        a = 20
                    sum_k += np.exp(a)
                    eta_mn_k_list[k] = np.exp(a)
                # Normalize over the k dimension and assign
                eta[m][n] = eta_mn_k_list / sum_k

        return eta

    def _E_step(self):
        """
        更新变分参数,为了方便理解,将gamma,rho, eta分别计算
        :return:
        """
        # 更新gamma
        gamma = self._update_gamma()

        # 更新rho
        rho = self._update_rho()

        # 更新eta
        eta = self._update_eta()

        # 将结果写入
        self.params['gamma'] = gamma
        self.params['rho'] = rho
        self.params['eta'] = eta

    def _update_alpha(self, max_iter=1000, tol=0.1):
        """
        $$\alpha_{new} = \alpha_{old} - H(\alpha_{old})^{-1} g(\alpha_{old})$$
        其中g(alpha)是关于alpha的一阶导数
        H(alpha)是关于alpha的Henssian矩阵
        :return:
        """
        alpha = deepcopy(self.params['alpha'])
        gamma = deepcopy(self.params['gamma'])
        for _ in range(max_iter):
            alpha_old = alpha

            # Gradient with respect to alpha
            # np.tile repeats the row-sum digamma across the K columns
            g = self.M * (digamma(np.sum(alpha)) - digamma(alpha)) + \
                np.sum(
                    digamma(gamma) - np.tile(digamma(np.sum(gamma, axis=1)), (self.K, 1)).T,
                    axis=0
                )

            # The Hessian is diag(h) + z * 1 1^T, so H^{-1} g is computed
            # in O(K) as (g - c) / h rather than by matrix inversion
            h = -1 * self.M * polygamma(1, alpha)
            z = self.M * polygamma(1, np.sum(alpha))
            c = np.sum(g / h) / (z ** (-1.0) + np.sum(h ** (-1.0)))

            # update alpha
            alpha = alpha - (g - c) / h

            # Stopping criterion
            if np.sqrt(np.mean(np.square(alpha - alpha_old))) < tol:
                break
        return alpha

    def _update_beta(self, max_iter=1000, tol=0.1):
        """
        $$\beta_{new} = \beta_{old} - H(\beta_{old})^{-1}g(\beta_{old})$$
        """
        beta = deepcopy(self.params['beta'])
        rho = deepcopy(self.params['rho'])

        for _ in range(max_iter):
            beta_old = beta

            g = self.K * (digamma(np.sum(beta)) - digamma(beta)) + \
                np.sum(
                    digamma(rho) - np.tile(digamma(np.sum(rho, axis=1)), (self.V, 1)).T,
                    axis=0
                )

            h = -1 * self.K * polygamma(1, beta)
            z = self.K * polygamma(1, np.sum(beta))
            c = np.sum(g / h) / (z ** (-1.0) + np.sum(h ** (-1.0)))

            beta = beta - (g - c) / h

            if np.sqrt(np.mean(np.square(beta - beta_old))) < tol:
                break
        return beta


    def _M_step(self):
        """
        更新模型参数,alpha, beta
        :return:
        """
        # 更新alpha
        alpha = self._update_alpha()

        # 更新beta
        beta = self._update_beta()

        self.params['alpha'] = alpha
        self.params['beta'] = beta

    def fit(self, texts, tokens, max_iter=10):
        """
        训练入口
        :param texts:
        :param tokens:
        :param max_iter:
        :return:
        """
        self.M = len(texts)
        self.tockens = tokens
        self.V = len(tokens)
        self.N = np.array([len(d) for d in texts])

        self.texts = texts

        self._init_params()

        for i in range(max_iter):
            print('iter: ', i + 1)

            self._E_step()
            self._M_step()
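As a usage sketch, this is the input format `fit` expects: `texts` is a list of documents, each a list of word tokens, and `tokens` is the vocabulary. The corpus here is made up, and the training call is left commented because it needs the `LDAEM` class above:

```python
# A tiny made-up corpus in the format fit() expects
texts = [
    ['apple', 'banana', 'apple'],
    ['banana', 'cherry'],
    ['apple', 'cherry', 'cherry', 'banana'],
]
tokens = sorted({w for doc in texts for w in doc})

assert tokens == ['apple', 'banana', 'cherry']

# With the class above:
# model = LDAEM(K=2)
# model.fit(texts, tokens, max_iter=10)
# model.params['gamma']  # (M, K) document-topic variational parameters
```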


