10.2 Variational Mixture of Gaussians (unfinished)

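This article works through the probability distributions and conditional distributions of the Gaussian mixture model, and the variational distribution and variational lower bound in the Bayesian setting. After introducing a Dirichlet prior and a Gaussian-Wishart prior, it explains how the parameters are estimated and the model is optimized, and discusses the role of variational inference during model training.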

Source

PRML, Chinese translation by 马春鹏 (Ma Chunpeng)

Highly recommended course:

https://www.bilibili.com/video/BV1yu411Y7Xy/?spm_id_from=333.999.0.0&vd_source=439c3cef17017b914dd03acc84a34958

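Gaussian mixture model

Consider the likelihood function of the Gaussian mixture model.

For each observation x_n we have a corresponding latent variable z_n, which is a 1-of-K binary vector with elements z_{nk} for k = 1,\cdots, K.
We denote the observed data set by X = \{x_1, \cdots, x_N\} and, similarly, the latent variables by Z = \{z_1, \cdots, z_N\}. Given the mixing coefficients \pi, we can use (9.10) to write the conditional distribution of Z in the form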

p(Z| \pi)= \prod _{n=1}^{N}\prod _{k=1}^{K}\pi _{k}^{z_{nk}} (10.37)

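Equation (9.10) is the distribution of a single K-dimensional binary variable z in the 1-of-K representation: with p(z_k=1)=\pi_k, where the parameters \{\pi_k\} satisfy 0 \leq \pi_k \leq 1 and \sum_{k=1}^{K}\pi_k=1, the distribution can be written as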

p(z)=\prod_{k=1}^{K}\pi_k^{z_k} (9.10)

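Given the latent variables and the component parameters, we can use (9.11) to write the conditional distribution of the observed data vectors in the form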

p(X|Z, \mu , \Lambda)= \prod _{n=1}^{N}\prod _{k=1}^{K} \mathcal{N}(x_{n}| \mu _{k}, \Lambda _{k}^{-1})^{z_{nk}} (10.38)

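where \mu = \{\mu_k\} and \Lambda = \{\Lambda_k\}. Each Gaussian is parameterized by its precision matrix \Lambda_k (the inverse covariance) rather than its covariance, which makes the subsequent calculations more convenient.

Next we introduce priors over the parameters \mu, \Lambda and \pi. The analysis is greatly simplified if we use conjugate priors, so we choose a Dirichlet distribution over the mixing coefficients \pi: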

p(\pi)=\mathrm{Dir}(\pi | \alpha _{0})=C(\alpha _{0})\prod _{k=1}^{K}\pi _{k}^{\alpha _{0}-1} (10.39)

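By symmetry we use the same parameter \alpha_0 for every component, and C(\alpha_0) is the normalization constant of the Dirichlet distribution. As we have seen before, \alpha_0 can be interpreted as the effective prior number of observations associated with each component of the mixture. If \alpha_0 is small, the posterior distribution is dominated by the data set and only weakly influenced by the prior.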
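To make the "effective prior number of observations" reading of \alpha_0 concrete, here is a minimal Python sketch. It borrows the conjugate update \alpha_k = \alpha_0 + N_k derived later in this section and the Dirichlet mean \mathbb{E}[\pi_k] = \alpha_k / \sum_j \alpha_j; the function name and the soft counts are hypothetical.

```python
def dirichlet_posterior_mean(alpha0, counts):
    """Posterior mean of the mixing coefficients under a symmetric Dir(alpha0) prior.

    counts[k] plays the role of N_k (it may be a fractional soft count).
    Uses alpha_k = alpha0 + N_k and E[pi_k] = alpha_k / sum_j alpha_j.
    """
    alphas = [alpha0 + n for n in counts]
    total = sum(alphas)
    return [a / total for a in alphas]


counts = [8.0, 2.0]  # hypothetical soft counts N_k for K = 2 components
for alpha0 in (1e-3, 1.0, 100.0):
    print(alpha0, [round(p, 3) for p in dirichlet_posterior_mean(alpha0, counts)])
```

With these counts, a tiny \alpha_0 gives roughly the empirical proportions 0.8 and 0.2, while \alpha_0 = 100 pulls both coefficients toward the uniform value 0.5.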

We likewise introduce an independent Gaussian-Wishart prior governing the mean and precision of each Gaussian component:

p(\mu, \Lambda) = p(\mu | \Lambda)p(\Lambda) \\ = \prod_{k=1}^{K}\mathcal{N}(\mu_k | m_0, (\beta_0\Lambda_k)^{-1})\,\mathcal{W}(\Lambda_k | W_0, \nu_0) (10.40)

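This choice is made because the Gaussian-Wishart is the conjugate prior when both the mean and the precision are unknown. Typically we choose m_0 = 0 by symmetry. (Note: I am not sure what "symmetry" refers to here; presumably that, with no prior preference for any particular direction in data space, a prior mean at the origin favors none.)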

10.2.1 Variational distribution

The joint distribution of all the random variables is given by

p(X,Z,\pi,\mu,\Lambda) = p(X|Z,\mu,\Lambda)p(Z\vert \pi)p(\pi)p(\mu \vert \Lambda)p(\Lambda) (10.41)

Consider a variational distribution that factorizes between the latent variables and the parameters:

q(Z, \pi , \mu , \Lambda)=q(Z)q(\pi , \mu , \Lambda) (10.42)

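This is the only assumption we need in order to obtain a tractable, practical solution for the Bayesian mixture model. In particular, the functional forms of the factors q(Z) and q(\pi, \mu, \Lambda) are determined automatically by optimizing the variational distribution.

The factor q(Z)

By (10.9), the optimal factor satisfies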

\ln q^{*}(Z)=\mathbb{E}_{\pi , \mu , \Lambda}\left[ \ln p(X,Z, \pi , \mu , \Lambda)\right] + Const (10.43)

Using the decomposition (10.41), we obtain

\ln q^{*}(Z)=\mathbb{E}_{\pi}\left[ \ln p(Z| \pi)\right] +\mathbb{E}_{\mu , \Lambda}\left[ \ln p(X|Z, \mu , \Lambda)\right] + Const (10.44)

\ln q^{\ast}(Z) = \mathbb{E}_{\pi,\mu,\Lambda}\left[\ln p(X,Z,\pi,\mu,\Lambda)\right] + \text{Const} \\ \because \text{(10.41)} \\ = \mathbb{E}_{\pi,\mu,\Lambda}\left[\ln p(X|Z,\mu,\Lambda)p(Z|\pi)p(\pi)p(\mu|\Lambda)p(\Lambda)\right] + \text{Const} \\ = \mathbb{E}_{\pi,\mu,\Lambda}[\ln p(X|Z,\mu,\Lambda)] + \mathbb{E}_{\pi,\mu,\Lambda}[\ln p(Z|\pi)] + \mathbb{E}_{\pi,\mu,\Lambda}[\ln p(\pi)] + \mathbb{E}_{\pi,\mu,\Lambda}[\ln p(\mu|\Lambda)] + \mathbb{E}_{\pi,\mu,\Lambda}[\ln p(\Lambda)] + \text{Const} \\ \because \text{only terms that depend on } Z \text{ are kept; the rest are absorbed into the constant} \\ = \mathbb{E}_{\pi,\mu,\Lambda}[\ln p(X|Z,\mu,\Lambda)] + \mathbb{E}_{\pi,\mu,\Lambda}[\ln p(Z|\pi)] + \text{Const} \\ = \mathbb{E}_{\mu,\Lambda}[\ln p(X|Z,\mu,\Lambda)] + \mathbb{E}_{\pi}[\ln p(Z|\pi)] + \text{Const}

For \mathbb{E}_{\mu,\Lambda}[\ln p(X|Z,\mu,\Lambda)] we have:

First, by (10.38),

\ln p(X|Z,\mu,\Lambda) = \ln\prod_{n=1}^{N}\prod_{k=1}^{K}\mathcal{N}(x_n|\mu_k,\Lambda_k^{-1})^{z_{nk}} = \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\ln\mathcal{N}(x_n|\mu_k,\Lambda_k^{-1})

Then, taking the expectation over \mu and \Lambda (with z_{nk} held fixed),

\mathbb{E}_{\mu,\Lambda}[\ln p(X|Z,\mu,\Lambda)] = \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\,\mathbb{E}_{\mu,\Lambda}\left[\ln\mathcal{N}(x_n|\mu_k,\Lambda_k^{-1})\right] \\ = \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\,\mathbb{E}_{\mu,\Lambda}\left[-\frac{D}{2}\ln 2\pi + \frac{1}{2}\ln\vert\Lambda_k\vert - \frac{1}{2}(x_n-\mu_k)^T\Lambda_k(x_n-\mu_k)\right] \\ = \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\left[-\frac{D}{2}\ln 2\pi + \frac{1}{2}\mathbb{E}_{\Lambda}[\ln\vert\Lambda_k\vert] - \frac{1}{2}\mathbb{E}_{\mu,\Lambda}[(x_n-\mu_k)^T\Lambda_k(x_n-\mu_k)]\right]

For \mathbb{E}_{\pi}[\ln p(Z|\pi)] we have:

 \mathbb{E}_{\pi}[ \ln p(Z| \pi)] = \mathbb{E}_{\pi}[ \sum_{n=1}^N\sum_{k=1}^K z_{nk} \ln \pi_k] \\ = \sum_{n=1}^N\sum_{k=1}^K z_{nk} \mathbb{E}_{\pi} [\ln \pi_k]

Therefore:

\ln q^{*}(Z) = \mathbb{E}_{\mu,\Lambda}[\ln p(X|Z,\mu,\Lambda)] + \mathbb{E}_{\pi}[\ln p(Z|\pi)] + \text{Const} \\ = \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\left[-\frac{D}{2}\ln 2\pi + \frac{1}{2}\mathbb{E}_{\Lambda}[\ln\vert\Lambda_k\vert] - \frac{1}{2}\mathbb{E}_{\mu,\Lambda}[(x_n-\mu_k)^T\Lambda_k(x_n-\mu_k)]\right] + \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\mathbb{E}_{\pi}[\ln\pi_k] + \text{Const} \\ = \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\left[\mathbb{E}_{\pi}[\ln\pi_k] + \frac{1}{2}\mathbb{E}_{\Lambda}[\ln\vert\Lambda_k\vert] - \frac{D}{2}\ln 2\pi - \frac{1}{2}\mathbb{E}_{\mu,\Lambda}[(x_n-\mu_k)^T\Lambda_k(x_n-\mu_k)]\right] + \text{Const} (10.44.1)

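Substituting for the two conditional distributions on the right-hand side of (10.44), and again absorbing terms that do not depend on Z into the additive constant, we obtain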

\ln q^{\ast}(Z)= \sum _{n=1}^{N}\sum _{k=1}^{K}z_{nk}\ln \rho _{nk}+ Const (10.45)

where, from (10.44.1),

\ln\rho_{nk} = \mathbb{E}[\ln\pi_k] + \frac{1}{2}\mathbb{E}[\ln\vert\Lambda_k\vert] - \frac{D}{2}\ln(2\pi) \\ - \frac{1}{2}\mathbb{E}_{\mu_k,\Lambda_k}[(x_n-\mu_k)^T\Lambda_k(x_n-\mu_k)] (10.46)
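The three expectations in (10.46) have closed forms under the variational posteriors derived later in this section (PRML's (10.64)-(10.66)): \mathbb{E}[\ln\pi_k] = \psi(\alpha_k) - \psi(\sum_j\alpha_j), \mathbb{E}[\ln\vert\Lambda_k\vert] = \sum_{i=1}^{D}\psi((\nu_k+1-i)/2) + D\ln 2 + \ln\vert W_k\vert, and \mathbb{E}[(x_n-\mu_k)^T\Lambda_k(x_n-\mu_k)] = D\beta_k^{-1} + \nu_k(x_n-m_k)^TW_k(x_n-m_k). A minimal NumPy/SciPy sketch that evaluates \ln\rho_{nk} for all n and k might look like this; the function name, array layout and toy parameters are assumptions for illustration.

```python
import numpy as np
from scipy.special import digamma


def log_rho(X, alpha, beta, m, W, nu):
    """ln rho_{nk} of (10.46) for all n, k.

    X: (N, D) data; alpha, beta, nu: (K,); m: (K, D); W: (K, D, D).
    """
    N, D = X.shape
    K = alpha.shape[0]
    # E[ln pi_k] = psi(alpha_k) - psi(sum_j alpha_j)
    e_ln_pi = digamma(alpha) - digamma(alpha.sum())
    # E[ln|Lambda_k|] = sum_i psi((nu_k + 1 - i) / 2) + D ln 2 + ln|W_k|
    i = np.arange(1, D + 1)
    e_ln_det = np.array([
        digamma((nu[k] + 1 - i) / 2.0).sum() + D * np.log(2.0) + np.linalg.slogdet(W[k])[1]
        for k in range(K)
    ])
    # E[(x_n - mu_k)^T Lambda_k (x_n - mu_k)] = D / beta_k + nu_k (x_n - m_k)^T W_k (x_n - m_k)
    e_quad = np.empty((N, K))
    for k in range(K):
        diff = X - m[k]
        e_quad[:, k] = D / beta[k] + nu[k] * np.einsum("nd,de,ne->n", diff, W[k], diff)
    # Broadcast the per-component terms (K,) against the per-point terms (N, K)
    return e_ln_pi + 0.5 * e_ln_det - 0.5 * D * np.log(2.0 * np.pi) - 0.5 * e_quad


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))          # toy data: N = 5 points in D = 2
K = 3
alpha = np.full(K, 1.0)
beta = np.full(K, 1.0)
m = rng.normal(size=(K, 2))
W = np.stack([np.eye(2)] * K)
nu = np.full(K, 3.0)
print(log_rho(X, alpha, beta, m, W, nu).shape)  # (5, 3)
```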

Here D is the dimensionality of the data variable x. Taking the exponential of both sides of (10.45) gives

q^{\ast}(Z)\propto \prod _{n=1}^{N}\prod _{k=1}^{K}\rho _{nk}^{z_{nk}} (10.47)

Requiring this distribution to be normalized, and noting that for each n the quantities z_{nk} are binary and sum to 1 over all values of k, we obtain

q^{\ast}(Z)= \prod _{n=1}^{N}\prod _{k=1}^{K}r_{nk}^{z_{nk}}

r_{nk}= \frac{\rho _{nk}}{\sum _{j=1}^{K}\rho _{nj}}
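Computing \rho_{nk} by exponentiating \ln\rho_{nk} directly can underflow in practice (for example, when every \ln\rho_{nk} in a row is around -1000). A common remedy, sketched here in plain Python under that assumption, is to normalize in log space with the log-sum-exp trick; the function name is hypothetical.

```python
import math


def responsibilities(log_rho_row):
    """Normalize one row ln rho_{n1..nK} into r_{n1..nK} = rho_nk / sum_j rho_nj.

    Works in log space (log-sum-exp): subtracting the row maximum before
    exponentiating keeps the largest term at exp(0) = 1, so nothing underflows
    to an all-zero row.
    """
    shift = max(log_rho_row)
    log_norm = shift + math.log(sum(math.exp(v - shift) for v in log_rho_row))
    return [math.exp(v - log_norm) for v in log_rho_row]


print(responsibilities([-1000.0, -1001.0, -1003.0]))
```

Naively, each \exp(\ln\rho_{nk}) here would underflow to 0.0 and the ratio would be undefined; subtracting the row maximum first avoids this without changing r_{nk}.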

For the discrete distribution q^{\ast}(Z) we have the standard result

\mathbb{E} \left[ z_{nk}\right] =r_{nk}

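This holds because z_{nk} is a component of a one-hot vector and takes only the values 0 and 1:

z_{nk}=1 \text{ with probability } r_{nk}, \qquad z_{nk}=0 \text{ with probability } 1-r_{nk}

Hence \mathbb{E}[z_{nk}] = 1\cdot r_{nk} + 0\cdot(1-r_{nk}) = r_{nk}, which is (10.50).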

We now define three statistics of the observed data set evaluated with respect to the responsibilities:

N_{k}= \sum _{n=1}^{N}r_{nk}    (10.51)

\overline{x}_{k}= \frac{1}{N_{k}}\sum _{n=1}^{N}r_{nk}x_{n}    (10.52)

S_{k}= \frac{1}{N_{k}}\sum _{n=1}^{N}r_{nk}(x_{n}- \overline{x}_{k})(x_{n}- \overline{x}_{k})^{T}  (10.53)
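A short NumPy sketch of (10.51)-(10.53), assuming the responsibilities are stored as an N \times K array R whose rows sum to 1. The function name is hypothetical, and the code assumes every N_k > 0 (an empty component would make the division undefined).

```python
import numpy as np


def responsibility_stats(X, R):
    """N_k, xbar_k and S_k of (10.51)-(10.53).

    X: (N, D) data; R: (N, K) responsibilities. Assumes every N_k > 0.
    """
    Nk = R.sum(axis=0)                           # (10.51)
    xbar = (R.T @ X) / Nk[:, None]               # (10.52), shape (K, D)
    K, D = xbar.shape
    S = np.empty((K, D, D))
    for k in range(K):
        diff = X - xbar[k]                       # x_n - xbar_k for every n
        S[k] = (R[:, k, None] * diff).T @ diff / Nk[k]   # (10.53)
    return Nk, xbar, S


X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 14.0]])
R = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])  # hard assignments
Nk, xbar, S = responsibility_stats(X, R)
print(Nk, xbar, S, sep="\n")
```

With one-hot responsibilities (hard assignments), as in this toy input, the statistics reduce to the ordinary count, mean and (biased) covariance of each cluster.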

The factor q(\pi, \mu, \Lambda)

Similarly, by (10.9) we have

\ln q^{\ast}(\pi,\mu,\Lambda) = \ln p(\pi) + \sum_{k=1}^{K}\ln p(\mu_k,\Lambda_k) + \mathbb{E}_{Z}[\ln p(Z\vert\pi)] + \sum_{k=1}^{K}\sum_{n=1}^{N}\mathbb{E}[z_{nk}]\ln\mathcal{N}(x_n|\mu_k,\Lambda_k^{-1}) + \text{Const} (10.54)

\ln q^{*}(\pi,\mu,\Lambda) = \mathbb{E}_{Z}[\ln p(X,Z,\pi,\mu,\Lambda)] + \text{Const} \\ = \mathbb{E}_{Z}[\ln p(X|Z,\mu,\Lambda) + \ln p(Z|\pi) + \ln p(\pi) + \ln p(\mu,\Lambda)] + \text{Const} \\ = \mathbb{E}_{Z}[\ln p(X|Z,\mu,\Lambda)] + \mathbb{E}_{Z}[\ln p(Z|\pi)] + \mathbb{E}_{Z}[\ln p(\pi)] + \mathbb{E}_{Z}[\ln p(\mu,\Lambda)] + \text{Const} \\ \because \ln p(\pi) \text{ and } \ln p(\mu,\Lambda) \text{ do not depend on } Z \\ = \mathbb{E}_{Z}[\ln p(X|Z,\mu,\Lambda)] + \mathbb{E}_{Z}[\ln p(Z|\pi)] + \ln p(\pi) + \ln p(\mu,\Lambda) + \text{Const}

For \mathbb{E}_{Z}[\ln p(X|Z,\mu,\Lambda)] we have

\mathbb{E}_{Z}[\ln p(X|Z,\mu,\Lambda)] = \mathbb{E}_{Z}\left[\sum_{n=1}^N\sum_{k=1}^{K} z_{nk}\ln\mathcal{N}(x_n|\mu_k,\Lambda_k^{-1})\right] \\ = \sum_{n=1}^N\sum_{k=1}^{K}\mathbb{E}_{Z}[z_{nk}]\ln\mathcal{N}(x_n|\mu_k,\Lambda_k^{-1})

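Substituting this back gives (10.54).

Observe that the right-hand side of (10.54) decomposes into a sum of terms, some involving only \pi and others involving only \mu and \Lambda, which implies that the variational posterior q(\pi, \mu, \Lambda) factorizes as q(\pi)q(\mu, \Lambda). Moreover, the terms involving \mu and \Lambda themselves form a sum over k of terms, each involving only \mu_k and \Lambda_k, so the factorization goes further: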

q(\pi , \mu , \Lambda)= q(\pi)q(\mu , \Lambda) =q(\pi)\prod _{k=1}^{K}q(\mu _{k}, \Lambda _{k})  (10.55)

Collecting the terms on the right-hand side of (10.54) that depend on \pi, we have

\ln q^{\ast}(\pi)=(\alpha _{0}-1)\sum _{k=1}^{K}\ln \pi _{k}+ \sum _{k=1}^{K}\sum _{n=1}^{N}r_{nk}\ln \pi _{k}+ Const (10.56)

\ln q^{\ast}(\pi) = \ln p(\pi) + \mathbb{E}_{Z}[\ln p(Z\vert\pi)] + \text{Const} \\ \because \text{Dirichlet prior (10.39)} \\ = \ln\left[C(\alpha_0)\prod_{k=1}^{K}\pi_k^{\alpha_0-1}\right] + \mathbb{E}_Z\left[\sum_{n=1}^N\sum_{k=1}^K z_{nk}\ln\pi_k\right] + \text{Const} \\ = \ln C(\alpha_0) + (\alpha_0-1)\sum_{k=1}^K\ln\pi_k + \sum_{n=1}^N\sum_{k=1}^K\mathbb{E}_Z[z_{nk}]\ln\pi_k + \text{Const} \\ = (\alpha_0-1)\sum_{k=1}^K\ln\pi_k + \sum_{k=1}^K\ln\pi_k\sum_{n=1}^N r_{nk} + \text{Const} \\ = (\alpha_0-1)\sum_{k=1}^K\ln\pi_k + \sum_{k=1}^K N_k\ln\pi_k + \text{Const} \quad \text{(by (10.51))} \\ = \sum_{k=1}^K(\alpha_0 - 1 + N_k)\ln\pi_k + \text{Const}

Taking the exponential of both sides, we recognize q^{\ast}(\pi) as a Dirichlet distribution

q^{\ast}(\pi)=\mathrm{Dir}(\pi | \alpha)

where \alpha has components \alpha_k given by

\alpha_k = \alpha_0 +N_k 

Recall the Dirichlet distribution:

\mathrm{Dir}(\pi \vert \alpha) = \frac{\Gamma(\sum_{k=1}^{K}\alpha_k)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)}\prod_{k=1}^K\pi_k^{\alpha_k-1} \\ \ln\mathrm{Dir}(\pi \vert \alpha) = \ln\frac{\Gamma(\sum_{k=1}^{K}\alpha_k)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)} + \ln\prod_{k=1}^K\pi_k^{\alpha_k-1} = \ln C(\alpha) + \sum_{k=1}^K(\alpha_k-1)\ln\pi_k
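The log form above is easy to check numerically. A minimal plain-Python sketch, using `math.lgamma` for \ln C(\alpha); the function names are hypothetical.

```python
import math


def dirichlet_log_norm(alpha):
    """ln C(alpha) = ln Gamma(sum_k alpha_k) - sum_k ln Gamma(alpha_k)."""
    return math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)


def dirichlet_log_pdf(pi, alpha):
    """ln Dir(pi | alpha) = ln C(alpha) + sum_k (alpha_k - 1) ln pi_k."""
    return dirichlet_log_norm(alpha) + sum((a - 1.0) * math.log(p) for a, p in zip(alpha, pi))


alpha = [2.0, 3.0]  # e.g. alpha_0 = 1 with soft counts N = (1, 2), via alpha_k = alpha_0 + N_k
print(math.exp(dirichlet_log_pdf([0.4, 0.6], alpha)))
```

For \alpha = (2, 3), C(\alpha) = \Gamma(5)/(\Gamma(2)\Gamma(3)) = 12, so the density at \pi = (0.4, 0.6) is 12 \cdot 0.4 \cdot 0.6^2 = 1.728.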

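Because the variational distribution over (\mu, \Lambda) factorizes over the components, it suffices to analyze the Gaussian-Wishart factor for a single component k: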

\ln q^{\ast}(\mu,\Lambda) = \sum_{k=1}^{K}\ln p(\mu_k,\Lambda_k) + \sum_{k=1}^{K}\sum_{n=1}^{N}\mathbb{E}[z_{nk}]\ln\mathcal{N}(x_n|\mu_k,\Lambda_k^{-1}) + \text{Const}

For the Wishart distribution:

\mathcal{W}(\Lambda_k\vert W_0,\nu_0) = B\vert\Lambda_k\vert^{(\nu_0-D-1)/2}\exp\left(-\frac{1}{2}\mathrm{Tr}(W_0^{-1}\Lambda_k)\right)

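where B = B(W_0, \nu_0) is the normalization constant. The log of the Gaussian-Wishart prior for component k is therefore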

\ln p(\mu_k, \Lambda_k) = -\frac{D}{2}\ln 2\pi +\frac{1}{2}\ln \vert \beta_0\Lambda_k \vert - \frac{\beta_0}{2}(\mu_k-m_0)^T\Lambda_k(\mu_k-m_0)+ \ln B + \frac{\nu_0 -D-1}{2} \ln \vert \Lambda_k \vert -\frac{1}{2}\mathrm{Tr}(W_0^{-1}\Lambda_k) \\

\ln q^{\ast}(\mu_k,\Lambda_k) = \ln p(\mu_k,\Lambda_k) + \sum_{n=1}^{N}\mathbb{E}[z_{nk}]\ln\mathcal{N}(x_n|\mu_k,\Lambda_k^{-1}) + \text{Const} \\ = \sum_{n=1}^{N} r_{nk}\ln\left[(2\pi)^{-D/2}\vert\Lambda_k\vert^{1/2}\exp\left(-\frac{1}{2}(x_n-\mu_k)^T\Lambda_k(x_n-\mu_k)\right)\right] \\ \quad - \frac{D}{2}\ln 2\pi + \frac{1}{2}\ln\vert\beta_0\Lambda_k\vert - \frac{\beta_0}{2}(\mu_k-m_0)^T\Lambda_k(\mu_k-m_0) + \ln B + \frac{\nu_0-D-1}{2}\ln\vert\Lambda_k\vert - \frac{1}{2}\mathrm{Tr}(W_0^{-1}\Lambda_k) + \text{Const} \\ = \frac{1}{2}\sum_{n=1}^N r_{nk}\ln\vert\Lambda_k\vert - \frac{1}{2}\sum_{n=1}^N r_{nk}(x_n-\mu_k)^T\Lambda_k(x_n-\mu_k) + \frac{1}{2}\ln\vert\beta_0\Lambda_k\vert \\ \quad - \frac{\beta_0}{2}(\mu_k-m_0)^T\Lambda_k(\mu_k-m_0) + \frac{\nu_0-D-1}{2}\ln\vert\Lambda_k\vert - \frac{1}{2}\mathrm{Tr}(W_0^{-1}\Lambda_k) + \text{Const}

The terms quadratic in \mu_k are:

Q_2(\mu_k) = -\frac{1}{2}\sum_{n=1}^N r_{nk}\mu_k^T\Lambda_k\mu_k - \frac{\beta_0}{2}\mu_k^T\Lambda_k\mu_k \\ = -\frac{1}{2}\mu_k^T\left(\sum_{n=1}^N r_{nk}\Lambda_k + \beta_0\Lambda_k\right)\mu_k \\ = -\frac{1}{2}\mu_k^T(\beta_0+N_k)\Lambda_k\mu_k

The terms linear in \mu_k are:

Q_1(\mu_k) = \sum_{n=1}^N r_{nk}\mu_k^T\Lambda_k x_n +\beta_0\mu_k^T \Lambda_k m_0\\ = \mu_k^T \Lambda_k (\sum_{n=1}^N r_{nk}x_n+\beta_0 m_0)\\ = \mu_k^T \Lambda_k (\beta_0 +N_k)(\frac{N_k\bar{x_k}+\beta_0m_0}{\beta_0+N_k})

\bar{x_k}=\frac{1}{N_k}\sum_{n=1}^N r_{nk}x_n

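Hence, conditioned on \Lambda_k, q^{\ast}(\mu_k | \Lambda_k) is the Gaussian \mathcal{N}(\mu_k | m_k, (\beta_k\Lambda_k)^{-1}): matching the quadratic term gives the precision \beta_k\Lambda_k and matching the linear term gives the mean m_k, with \beta_k = \beta_0 + N_k and m_k = \frac{1}{\beta_k}(\beta_0 m_0 + N_k\bar{x}_k).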
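The update for the parameters of q^{\ast}(\mu_k | \Lambda_k) can be sketched in a few lines of NumPy, assuming N_k and \bar{x}_k have already been computed from (10.51)-(10.52); the function name and array shapes are illustrative choices.

```python
import numpy as np


def update_mean_params(beta0, m0, Nk, xbar):
    """beta_k = beta_0 + N_k and m_k = (beta_0 m_0 + N_k xbar_k) / beta_k.

    m0: (D,); Nk: (K,); xbar: (K, D). Returns beta (K,) and m (K, D).
    """
    beta = beta0 + Nk
    m = (beta0 * m0[None, :] + Nk[:, None] * xbar) / beta[:, None]
    return beta, m


beta, m = update_mean_params(1.0, np.zeros(2), np.array([3.0]), np.array([[4.0, 8.0]]))
print(beta, m)
```

A large \beta_0 pins m_k near m_0, while \beta_0 \to 0 gives m_k \to \bar{x}_k, consistent with \beta_0 acting like a prior pseudo-count for the mean.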

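The terms that remain after subtracting \ln q^{\ast}(\mu_k | \Lambda_k) = \frac{1}{2}\ln\vert\beta_k\Lambda_k\vert - \frac{\beta_k}{2}(\mu_k - m_k)^T\Lambda_k(\mu_k - m_k) + \text{Const} depend only on \Lambda_k. Completing the square in \mu_k leaves behind a term +\frac{\beta_k}{2}m_k^T\Lambda_k m_k, so

\ln q^{\ast}(\Lambda_k) = \frac{1}{2}\sum_{n=1}^{N}r_{nk}\ln\vert\Lambda_k\vert + \frac{1}{2}\ln\vert\beta_0\Lambda_k\vert - \frac{1}{2}\ln\vert\beta_k\Lambda_k\vert + \frac{\nu_0-D-1}{2}\ln\vert\Lambda_k\vert \\ - \frac{1}{2}\sum_{n=1}^{N}r_{nk}x_n^T\Lambda_k x_n - \frac{\beta_0}{2}m_0^T\Lambda_k m_0 + \frac{\beta_k}{2}m_k^T\Lambda_k m_k - \frac{1}{2}\mathrm{Tr}(W_0^{-1}\Lambda_k) + C

The two terms \frac{1}{2}\ln\vert\beta_0\Lambda_k\vert - \frac{1}{2}\ln\vert\beta_k\Lambda_k\vert reduce to the constant \frac{D}{2}\ln(\beta_0/\beta_k), so the \ln\vert\Lambda_k\vert terms have total coefficient \frac{N_k + \nu_0 - D - 1}{2}. For the remaining quadratic forms, writing a^T\Lambda_k a = \mathrm{Tr}(aa^T\Lambda_k) gives

-\frac{1}{2}\sum_{n=1}^{N}r_{nk}x_n^T\Lambda_k x_n - \frac{\beta_0}{2}m_0^T\Lambda_k m_0 + \frac{\beta_k}{2}m_k^T\Lambda_k m_k - \frac{1}{2}\mathrm{Tr}(W_0^{-1}\Lambda_k) = -\frac{1}{2}\mathrm{Tr}(W_k^{-1}\Lambda_k) \\ W_k^{-1} = W_0^{-1} + N_kS_k + \frac{\beta_0 N_k}{\beta_0 + N_k}(\bar{x}_k - m_0)(\bar{x}_k - m_0)^T

with N_k, \bar{x}_k and S_k as defined in (10.51)-(10.53). The last line follows from \sum_n r_{nk}x_nx_n^T = N_kS_k + N_k\bar{x}_k\bar{x}_k^T together with expanding \beta_k m_k m_k^T using m_k = (\beta_0 m_0 + N_k\bar{x}_k)/\beta_k.

Therefore

\ln q^{\ast}(\Lambda_k) = \frac{\nu_0 + N_k - D - 1}{2}\ln\vert\Lambda_k\vert - \frac{1}{2}\mathrm{Tr}(W_k^{-1}\Lambda_k) + C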

Hence q^{\ast}(\Lambda_k) is the Wishart distribution \mathcal{W}(\Lambda_k\vert W_k, \nu_k) with \nu_k = \nu_0 + N_k.
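A NumPy sketch of the Wishart-factor update, assuming the statistics (10.51)-(10.53) are available as arrays; the function name and shapes are hypothetical. It inverts W_k^{-1} explicitly for clarity; a production implementation might prefer Cholesky-based solves.

```python
import numpy as np


def update_wishart_params(W0, nu0, beta0, m0, Nk, xbar, S):
    """nu_k = nu_0 + N_k and
    W_k^{-1} = W_0^{-1} + N_k S_k + beta_0 N_k / (beta_0 + N_k) (xbar_k - m_0)(xbar_k - m_0)^T.

    W0: (D, D); m0: (D,); Nk: (K,); xbar: (K, D); S: (K, D, D).
    """
    W0_inv = np.linalg.inv(W0)
    K = Nk.shape[0]
    W = np.empty_like(S)
    for k in range(K):
        d = (xbar[k] - m0)[:, None]          # column vector xbar_k - m_0
        Wk_inv = W0_inv + Nk[k] * S[k] + beta0 * Nk[k] / (beta0 + Nk[k]) * (d @ d.T)
        W[k] = np.linalg.inv(Wk_inv)
    nu = nu0 + Nk
    return W, nu


# One-dimensional toy case: W_k^{-1} = 1 + 0 + (1 * 1 / 2) * 2^2 = 3
W, nu = update_wishart_params(np.array([[1.0]]), 2.0, 1.0, np.array([0.0]),
                              np.array([1.0]), np.array([[2.0]]), np.array([[[0.0]]]))
print(W, nu)
```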

10.2.2 Variational lower bound

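In practical applications it is useful to monitor the lower bound during re-estimation in order to test for convergence. It also provides a valuable check on both the mathematical expressions for the solutions and their software implementation, because at each step of the iterative re-estimation procedure the value of this bound should not decrease. We can go further and use the bound to test the correctness of both the derivation of the update equations and their implementation, by using finite differences to check that each update does indeed give a (constrained) maximum of the bound.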
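The monitoring idea above can be sketched as two small plain-Python helpers: one flags any iteration where the bound drops (which would point to a bug in the derivation or the code), and one implements a simple relative-change stopping rule. Both names and the tolerance values are assumptions, not prescribed by the text; a small tolerance is allowed because floating-point round-off can produce tiny spurious decreases.

```python
def check_bound_monotone(bounds, tol=1e-8):
    """Return the first iteration at which the bound decreased by more than tol, or None."""
    for it in range(1, len(bounds)):
        if bounds[it] < bounds[it - 1] - tol:
            return it
    return None


def has_converged(bounds, rel_tol=1e-6):
    """Hypothetical stopping rule: relative change of the last two bound values below rel_tol."""
    if len(bounds) < 2:
        return False
    prev, curr = bounds[-2], bounds[-1]
    return abs(curr - prev) <= rel_tol * max(1.0, abs(prev))


bounds = [-520.3, -498.7, -497.9, -497.89]  # hypothetical bound values from successive iterations
print(check_bound_monotone(bounds), has_converged(bounds))
```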

For the variational mixture of Gaussians, the lower bound (10.3) is given by

\mathcal{L} = \sum_{Z}\iiint q(Z,\pi,\mu,\Lambda)\ln\left\{\frac{p(X,Z,\pi,\mu,\Lambda)}{q(Z,\pi,\mu,\Lambda)}\right\}\mathrm{d}\pi\,\mathrm{d}\mu\,\mathrm{d}\Lambda \\ = \mathbb{E}[\ln p(X,Z,\pi,\mu,\Lambda)] - \mathbb{E}[\ln q(Z,\pi,\mu,\Lambda)] \\ = \mathbb{E}[\ln p(X|Z,\mu,\Lambda)] + \mathbb{E}[\ln p(Z|\pi)] + \mathbb{E}[\ln p(\pi)] + \mathbb{E}[\ln p(\mu,\Lambda)] \\ \quad - \mathbb{E}[\ln q(Z)] - \mathbb{E}[\ln q(\pi)] - \mathbb{E}[\ln q(\mu,\Lambda)] (10.70)

\mathbb{E}[\ln p(X|Z, \mu , \Lambda)]= \frac{1}{2}\sum _{k=1}^{K}N_{k}\{ \ln \tilde{\Lambda}_{k}-D \beta _{k}^{-1}-\nu_{k}\mathrm{Tr}(S_{k}W_{k}) -\nu_{k}(\bar{x} _{k}-m_{k})^{T}W_{k}(\bar{x} _{k}-m_{k})-D \ln(2 \pi)\}  (10.71)

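The intermediate steps are given in the appendix. Here \ln\tilde{\Lambda}_k \equiv \mathbb{E}[\ln\vert\Lambda_k\vert] and \ln\tilde{\pi}_k \equiv \mathbb{E}[\ln\pi_k], as defined in (10.65) and (10.66).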

\mathbb{E} \left[ \ln p(Z\vert \pi)\right] = \sum _{n=1}^{N}\sum _{k=1}^{K}r_{nk}\ln \tilde{\pi}_{k}

\mathbb{E} \left[ \ln p(\pi)\right] = \ln C(\mathbf{\alpha} _{0})+(\alpha _{0}-1)\sum _{k=1}^{K}\ln \tilde{\pi}_{k}

\mathbb{E}[\ln p(\mu,\Lambda)] = \frac{1}{2}\sum_{k=1}^{K}\left\{D\ln\left(\frac{\beta_0}{2\pi}\right) + \ln\tilde{\Lambda}_k - \frac{D\beta_0}{\beta_k} - \beta_0\nu_k(m_k-m_0)^TW_k(m_k-m_0)\right\} \\ + K\ln B(W_0,\nu_0) + \frac{\nu_0-D-1}{2}\sum_{k=1}^{K}\ln\tilde{\Lambda}_k - \frac{1}{2}\sum_{k=1}^{K}\nu_k\mathrm{Tr}(W_0^{-1}W_k)

\mathbb{E} \left[ \ln q(Z)\right] = \sum _{n=1}^{N}\sum _{k=1}^{K}r_{nk}\ln r_{nk}
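This entropy term is the one place in the bound that needs care with hard assignments, since r_{nk} can be exactly 0. A plain-Python sketch, assuming the usual convention 0 \ln 0 = 0 (the function name is hypothetical):

```python
import math


def expected_log_qz(R):
    """E[ln q(Z)] = sum_n sum_k r_nk ln r_nk, i.e. minus the entropy of q(Z).

    Terms with r_nk = 0 are skipped, implementing the convention 0 * ln 0 = 0.
    """
    return sum(r * math.log(r) for row in R for r in row if r > 0.0)


print(expected_log_qz([[1.0, 0.0], [0.0, 1.0]]))  # hard assignments
print(expected_log_qz([[0.5, 0.5]]))              # maximally uncertain single point
```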

\mathbb{E} \left[ \ln q(\pi)\right] = \sum _{k=1}^{K}(\alpha _{k}-1)\ln \tilde{\pi}_{k}+ \ln C(\alpha)
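Combining the \mathbb{E}[\ln p(\pi)] and \mathbb{E}[\ln q(\pi)] terms, the Dirichlet part of the bound is \mathbb{E}[\ln p(\pi)] - \mathbb{E}[\ln q(\pi)] = -\mathrm{KL}(q(\pi)\,\|\,p(\pi)). A NumPy/SciPy sketch, assuming \ln\tilde{\pi}_k = \psi(\alpha_k) - \psi(\sum_j\alpha_j) (PRML (10.66)); the function names are hypothetical.

```python
import numpy as np
from scipy.special import digamma, gammaln


def log_dirichlet_norm(alpha):
    """ln C(alpha) = ln Gamma(sum_k alpha_k) - sum_k ln Gamma(alpha_k)."""
    return gammaln(alpha.sum()) - gammaln(alpha).sum()


def dirichlet_bound_terms(alpha0, alpha):
    """E[ln p(pi)] - E[ln q(pi)] for p = Dir(pi | alpha_0) (symmetric) and q = Dir(pi | alpha)."""
    K = alpha.shape[0]
    ln_pi_tilde = digamma(alpha) - digamma(alpha.sum())   # ln pi~_k = E_q[ln pi_k]
    e_ln_p = log_dirichlet_norm(np.full(K, alpha0)) + (alpha0 - 1.0) * ln_pi_tilde.sum()
    e_ln_q = log_dirichlet_norm(alpha) + ((alpha - 1.0) * ln_pi_tilde).sum()
    return e_ln_p - e_ln_q


print(dirichlet_bound_terms(1.0, np.array([1.0, 1.0, 1.0])))
print(dirichlet_bound_terms(1.0, np.array([3.0, 5.0, 2.0])))
```

The result is zero when \alpha_k = \alpha_0 for every k (no data assigned) and negative otherwise, as expected of a negative KL divergence.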

\mathbb{E} \left[ \ln q(\mu , \Lambda)\right] = \sum _{k=1}^{K}\left\{ \frac{1}{2}\ln \tilde{\Lambda}_{k}+ \frac{D}{2}\ln(\frac{\beta _{k}}{2 \pi})- \frac{D}{2}-H \left[ q(\Lambda _{k})\right] \right\}

Appendix

  • 10.71

\mathbb{E}[\ln p(X|Z,\mu,\Lambda)] = \sum_{n=1}^{N}\sum_{k=1}^{K}\mathbb{E}_q\left[z_{nk}\ln\mathcal{N}(x_n\vert\mu_k,\Lambda_k^{-1})\right] \\ \quad \because \text{ under (10.42), } Z \text{ is independent of } (\mu,\Lambda) \\ = \sum_{n=1}^{N}\sum_{k=1}^{K}\mathbb{E}_q[z_{nk}]\,\mathbb{E}_q\left[\ln\mathcal{N}(x_n\vert\mu_k,\Lambda_k^{-1})\right] \\ = \sum_{n=1}^{N}\sum_{k=1}^{K} r_{nk}\,\mathbb{E}_q\left[\ln\left((2\pi)^{-D/2}\vert\Lambda_k\vert^{1/2}\exp\left(-\frac{1}{2}(x_n-\mu_k)^T\Lambda_k(x_n-\mu_k)\right)\right)\right] \\ = \sum_{n=1}^{N}\sum_{k=1}^{K} r_{nk}\left(-\frac{D}{2}\ln 2\pi + \frac{1}{2}\mathbb{E}_q[\ln\vert\Lambda_k\vert] - \frac{1}{2}\mathbb{E}_q\left[(x_n-\mu_k)^T\Lambda_k(x_n-\mu_k)\right]\right) \\ \quad \because \text{(10.64), (10.65)} \\ = \sum_{n=1}^{N}\sum_{k=1}^{K}\frac{r_{nk}}{2}\left[-D\ln 2\pi + \ln\tilde{\Lambda}_k - D\beta_k^{-1}\right] - \sum_{n=1}^{N}\sum_{k=1}^{K}\frac{\nu_k r_{nk}}{2}(x_n-m_k)^TW_k(x_n-m_k)

\quad \because a^TW_ka = \mathrm{Tr}(aa^TW_k) \text{ and } \sum_n r_{nk} = N_k \text{ (10.51)} \\ = \sum_{k=1}^{K}\frac{N_k}{2}\left[-D\ln 2\pi + \ln\tilde{\Lambda}_k - D\beta_k^{-1}\right] - \sum_{k=1}^{K}\frac{\nu_k}{2}\mathrm{Tr}\left(\left[\sum_{n=1}^{N} r_{nk}(x_n-m_k)(x_n-m_k)^T\right]W_k\right)

For the bracketed matrix, expand using (10.52) and add and subtract N_k\bar{x}_k\bar{x}_k^T:

\sum_{n=1}^{N} r_{nk}(x_n-m_k)(x_n-m_k)^T = \sum_{n=1}^{N} r_{nk}x_nx_n^T - N_k(\bar{x}_km_k^T + m_k\bar{x}_k^T) + N_km_km_k^T \\ = \left[\sum_{n=1}^{N} r_{nk}x_nx_n^T - N_k\bar{x}_k\bar{x}_k^T\right] + N_k(\bar{x}_k-m_k)(\bar{x}_k-m_k)^T

and, by (10.53),

N_kS_k = \sum_{n=1}^{N} r_{nk}(x_n-\bar{x}_k)(x_n-\bar{x}_k)^T = \sum_{n=1}^{N} r_{nk}x_nx_n^T - 2N_k\bar{x}_k\bar{x}_k^T + N_k\bar{x}_k\bar{x}_k^T = \sum_{n=1}^{N} r_{nk}x_nx_n^T - N_k\bar{x}_k\bar{x}_k^T

Substituting, and using \mathrm{Tr}(aa^TW_k) = a^TW_ka once more,

\mathbb{E}[\ln p(X|Z,\mu,\Lambda)] = \frac{1}{2}\sum_{k=1}^{K}N_k\left\{\ln\tilde{\Lambda}_k - D\beta_k^{-1} - \nu_k\mathrm{Tr}(S_kW_k) - \nu_k(\bar{x}_k-m_k)^TW_k(\bar{x}_k-m_k) - D\ln(2\pi)\right\}

which is (10.71).
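The key identity in this derivation, \sum_n r_{nk}(x_n-m_k)^TW_k(x_n-m_k) = N_k\{\mathrm{Tr}(S_kW_k) + (\bar{x}_k-m_k)^TW_k(\bar{x}_k-m_k)\}, is easy to get wrong by hand. A quick NumPy check for a single component k on random data (all values are arbitrary test inputs, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 3
X = rng.normal(size=(N, D))
r = rng.uniform(size=N)                 # responsibilities r_nk for one fixed k
m = rng.normal(size=D)                  # plays the role of m_k
A = rng.normal(size=(D, D))
W = A @ A.T + D * np.eye(D)             # a symmetric positive-definite W_k

Nk = r.sum()                                            # (10.51)
xbar = r @ X / Nk                                       # (10.52)
S = (r[:, None] * (X - xbar)).T @ (X - xbar) / Nk       # (10.53)

lhs = sum(r[n] * (X[n] - m) @ W @ (X[n] - m) for n in range(N))
rhs = Nk * (np.trace(S @ W) + (xbar - m) @ W @ (xbar - m))
print(lhs, rhs, np.isclose(lhs, rhs))
```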

  •  10.72

\mathbb{E} \left[ \ln p(Z\vert \pi)\right] = \mathbb{E} \left[ \ln \prod_{n=1}^N\prod_{k=1}^K \pi_k^{z_{nk}} \right]\\ = \sum_{n=1}^{N}\sum_{k=1}^K\mathbb{E}\left[ z_{nk}\ln \pi_k \right]\\ = \sum _{n=1}^{N}\sum _{k=1}^{K}r_{nk}\ln \tilde{\pi}_{k} \\

  •  10.73

Dirichlet distribution

\mathbb{E}[\ln p(\pi)] = \mathbb{E}[\ln\mathrm{Dir}(\pi|\alpha_0)] \\ = \mathbb{E}\left[\ln C(\alpha_0)\prod_{k=1}^{K}\pi_k^{\alpha_0-1}\right] \\ = \ln C(\alpha_0) + \sum_{k=1}^{K}(\alpha_0-1)\mathbb{E}[\ln\pi_k] \\ = \ln C(\alpha_0) + (\alpha_0-1)\sum_{k=1}^{K}\ln\tilde{\pi}_k

  •  10.74

\mathbb{E}[\ln p(\mu,\Lambda)] = \mathbb{E}\left[\ln\prod_{k=1}^{K}\mathcal{N}(\mu_k\vert m_0,(\beta_0\Lambda_k)^{-1})\,\mathcal{W}(\Lambda_k\vert W_0,\nu_0)\right] \\ = \sum_{k=1}^{K}\mathbb{E}\left[\ln\mathcal{N}(\mu_k\vert m_0,(\beta_0\Lambda_k)^{-1}) + \ln\mathcal{W}(\Lambda_k\vert W_0,\nu_0)\right] \\ = \sum_{k=1}^{K}\mathbb{E}\left\{-\frac{D}{2}\ln 2\pi + \frac{1}{2}\ln\vert\beta_0\Lambda_k\vert - \frac{1}{2}(\mu_k-m_0)^T(\beta_0\Lambda_k)(\mu_k-m_0) + \ln B(W_0,\nu_0) + \frac{\nu_0-D-1}{2}\ln\vert\Lambda_k\vert - \frac{1}{2}\mathrm{Tr}(W_0^{-1}\Lambda_k)\right\}

Evaluating the expectations with \mathbb{E}[\ln\vert\Lambda_k\vert] = \ln\tilde{\Lambda}_k, \mathbb{E}[\Lambda_k] = \nu_kW_k and \mathbb{E}[(\mu_k-m_0)^T\Lambda_k(\mu_k-m_0)] = D\beta_k^{-1} + \nu_k(m_k-m_0)^TW_k(m_k-m_0) (the same steps as for (10.64)) gives (10.74).

  • 10.75

\mathbb{E} \left[ \ln q ( Z ) \right] = \mathbb{E} \left[ \ln \prod_{n=1}^N \prod _ {k = 1 } ^ {K} r_{nk}^{z_{nk}} \right] \\ = \sum _{n=1}^{N}\sum _{k=1}^{K}\mathbb{E} \left[ z_{nk}\ln r_{nk}\right]\\ = \sum _ { n = 1 } ^ { N } \sum _ { k = 1 } ^ {K } r _ { n k } \ln r _ { n k }

  • 10.76

\mathbb{E}[\ln q(\pi)] = \mathbb{E}[\ln\mathrm{Dir}(\pi|\alpha)] \\ = \mathbb{E}\left[\ln C(\alpha)\prod_{k=1}^{K}\pi_k^{\alpha_k-1}\right] \\ = \ln C(\alpha) + \sum_{k=1}^{K}(\alpha_k-1)\mathbb{E}[\ln\pi_k] \\ = \ln C(\alpha) + \sum_{k=1}^{K}(\alpha_k-1)\ln\tilde{\pi}_k

  • 10.77

\mathbb{E}[\ln q(\mu,\Lambda)] = \mathbb{E}[\ln q(\mu|\Lambda)q(\Lambda)] \\ = \sum_{k=1}^{K}\mathbb{E}\left[\ln\mathcal{N}(\mu_k\vert m_k,(\beta_k\Lambda_k)^{-1}) + \ln\mathcal{W}(\Lambda_k\vert W_k,\nu_k)\right] \\ = \sum_{k=1}^{K}\mathbb{E}\left\{-\frac{D}{2}\ln 2\pi + \frac{D}{2}\ln\beta_k + \frac{1}{2}\ln\vert\Lambda_k\vert - \frac{1}{2}(\mu_k-m_k)^T(\beta_k\Lambda_k)(\mu_k-m_k) + \ln B(W_k,\nu_k) + \frac{\nu_k-D-1}{2}\ln\vert\Lambda_k\vert - \frac{1}{2}\mathrm{Tr}(W_k^{-1}\Lambda_k)\right\} \\ = \sum_{k=1}^{K}\left\{\frac{D}{2}\ln\frac{\beta_k}{2\pi} + \frac{1}{2}\ln\tilde{\Lambda}_k - \frac{D}{2} - H[q(\Lambda_k)]\right\}

since the expected quadratic form equals D and the Wishart terms sum to -H[q(\Lambda_k)], with

H[q(\Lambda_k)] = -\ln B(W_k,\nu_k) - \frac{\nu_k-D-1}{2}\mathbb{E}[\ln\vert\Lambda_k\vert] + \frac{\nu_kD}{2}
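The Wishart quantities used in (10.77) can be evaluated with SciPy. This sketch assumes the standard normalizer \ln B(W,\nu) = -\frac{\nu}{2}\ln\vert W\vert - \frac{\nu D}{2}\ln 2 - \ln\Gamma_D(\nu/2), with \Gamma_D the multivariate gamma function (`scipy.special.multigammaln`), and requires \nu > D - 1; the function names are hypothetical.

```python
import numpy as np
from scipy.special import digamma, multigammaln


def wishart_log_B(W, nu):
    """ln B(W, nu) = -(nu/2) ln|W| - (nu D / 2) ln 2 - ln Gamma_D(nu / 2)."""
    D = W.shape[0]
    return -0.5 * nu * np.linalg.slogdet(W)[1] - 0.5 * nu * D * np.log(2.0) - multigammaln(0.5 * nu, D)


def wishart_expected_log_det(W, nu):
    """E[ln|Lambda|] = sum_{i=1}^D psi((nu + 1 - i) / 2) + D ln 2 + ln|W|."""
    D = W.shape[0]
    i = np.arange(1, D + 1)
    return digamma(0.5 * (nu + 1 - i)).sum() + D * np.log(2.0) + np.linalg.slogdet(W)[1]


def wishart_entropy(W, nu):
    """H[q(Lambda)] = -ln B(W, nu) - (nu - D - 1)/2 E[ln|Lambda|] + nu D / 2."""
    D = W.shape[0]
    return -wishart_log_B(W, nu) - 0.5 * (nu - D - 1) * wishart_expected_log_det(W, nu) + 0.5 * nu * D


W = np.array([[2.0, 0.3], [0.3, 1.0]])
print(wishart_entropy(W, 5.0))
```

As a sanity check, the result should agree with `scipy.stats.wishart(df=nu, scale=W).entropy()`.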
