Ladder VAE


Ghy817920


Category: Variational Autoencoders

Article link: https://blog.csdn.net/Ghy817920/article/details/96176560


IWAE considers only two hidden (latent) layers. As more layers are added, many of the higher layers fail to train well, which is the motivation for the Ladder VAE (LVAE). We first compare the model with IWAE.

[Figure: image not available]

Each layer of the generative model is defined as follows:

$$\begin{aligned} p_{\theta}(\mathbf{z}) &= p_{\theta}\left(\mathbf{z}_{L}\right) \prod_{i=1}^{L-1} p_{\theta}\left(\mathbf{z}_{i} \mid \mathbf{z}_{i+1}\right) \\ p_{\theta}\left(\mathbf{z}_{i} \mid \mathbf{z}_{i+1}\right) &= \mathcal{N}\left(\mathbf{z}_{i} \mid \mu_{p, i}\left(\mathbf{z}_{i+1}\right), \sigma_{p, i}^{2}\left(\mathbf{z}_{i+1}\right)\right), \quad p_{\theta}\left(\mathbf{z}_{L}\right)=\mathcal{N}\left(\mathbf{z}_{L} \mid \mathbf{0}, \mathbf{I}\right) \\ p_{\theta}\left(\mathbf{x} \mid \mathbf{z}_{1}\right) &= \mathcal{N}\left(\mathbf{x} \mid \mu_{p, 0}\left(\mathbf{z}_{1}\right), \sigma_{p, 0}^{2}\left(\mathbf{z}_{1}\right)\right) \text { or } p_{\theta}\left(\mathbf{x} \mid \mathbf{z}_{1}\right)=\mathcal{B}\left(\mathbf{x} \mid \mu_{p, 0}\left(\mathbf{z}_{1}\right)\right) \end{aligned}$$

The inference model of a standard VAE is:

$$\begin{aligned} q_{\phi}(\mathbf{z} \mid \mathbf{x}) &= q_{\phi}\left(\mathbf{z}_{1} \mid \mathbf{x}\right) \prod_{i=2}^{L} q_{\phi}\left(\mathbf{z}_{i} \mid \mathbf{z}_{i-1}\right) \\ q_{\phi}\left(\mathbf{z}_{1} \mid \mathbf{x}\right) &= \mathcal{N}\left(\mathbf{z}_{1} \mid \mu_{q, 1}(\mathbf{x}), \sigma_{q, 1}^{2}(\mathbf{x})\right) \\ q_{\phi}\left(\mathbf{z}_{i} \mid \mathbf{z}_{i-1}\right) &= \mathcal{N}\left(\mathbf{z}_{i} \mid \mu_{q, i}\left(\mathbf{z}_{i-1}\right), \sigma_{q, i}^{2}\left(\mathbf{z}_{i-1}\right)\right), \quad i=2 \ldots L \end{aligned}$$

Trained this way, however, the model is hard to optimize, so the paper proposes a new inference model. First, an ordinary deterministic neural network computes

$$\begin{aligned} \mathbf{d}_{n} &= \operatorname{MLP}\left(\mathbf{d}_{n-1}\right), \quad n=1 \ldots L \\ \hat{\mu}_{q, i} &= \operatorname{Linear}\left(\mathbf{d}_{i}\right), \quad i=1 \ldots L \\ \hat{\sigma}_{q, i}^{2} &= \operatorname{Softplus}\left(\operatorname{Linear}\left(\mathbf{d}_{i}\right)\right), \quad i=1 \ldots L \end{aligned}$$

where $\mathbf{d}_{0}=\mathbf{x}$. Then

$$\begin{aligned} q_{\phi}(\mathbf{z} \mid \mathbf{x}) &= q_{\phi}\left(\mathbf{z}_{L} \mid \mathbf{x}\right) \prod_{i=1}^{L-1} q_{\phi}\left(\mathbf{z}_{i} \mid \mathbf{z}_{i+1}, \mathbf{x}\right) \\ \sigma_{q, i}^{2} &= \frac{1}{\hat{\sigma}_{q, i}^{-2}+\sigma_{p, i}^{-2}} \\ \mu_{q, i} &= \frac{\hat{\mu}_{q, i} \hat{\sigma}_{q, i}^{-2}+\mu_{p, i} \sigma_{p, i}^{-2}}{\hat{\sigma}_{q, i}^{-2}+\sigma_{p, i}^{-2}} \\ q_{\phi}\left(\mathbf{z}_{i} \mid \cdot\right) &= \mathcal{N}\left(\mathbf{z}_{i} \mid \mu_{q, i}, \sigma_{q, i}^{2}\right) \end{aligned}$$

where $\mu_{q, L}=\hat{\mu}_{q, L}$ and $\sigma_{q, L}^{2}=\hat{\sigma}_{q, L}^{2}$. In other words, $\mu_{q, i}$ and $\sigma_{q, i}^{2}$ form a "posterior" that combines the generative model's "prior" $\mu_{p,i}, \sigma_{p,i}^{2}$ with the bottom-up "likelihood" $\hat{\mu}_{q, i}, \hat{\sigma}_{q,i}^{2}$, each weighted by its precision.
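The inference procedure can be sketched in NumPy as below. This is only an illustration under stated assumptions, not the paper's implementation: the layer sizes, the random weights, the single `tanh` layer standing in for each MLP, and the helper names (`linear`, `precision_weighted_merge`, `lvae_inference`) are all hypothetical. It shows the deterministic upward pass followed by the top-down pass, in which each layer's bottom-up estimate is merged with the prior by precision weighting (the same rule as multiplying two Gaussian densities and renormalizing).

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(a):
    # Numerically stable log(1 + exp(a)); keeps variances positive.
    return np.logaddexp(0.0, a)

def linear(in_dim, out_dim):
    # Hypothetical randomly initialised affine layer (no training here).
    W = rng.normal(scale=0.1, size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return lambda h: h @ W + b

def precision_weighted_merge(mu_hat, var_hat, mu_p, var_p):
    # sigma_q^2 = 1 / (sigma_hat^-2 + sigma_p^-2)
    # mu_q      = (mu_hat * sigma_hat^-2 + mu_p * sigma_p^-2) * sigma_q^2
    prec_hat, prec_p = 1.0 / var_hat, 1.0 / var_p
    var_q = 1.0 / (prec_hat + prec_p)
    mu_q = (mu_hat * prec_hat + mu_p * prec_p) * var_q
    return mu_q, var_q

def lvae_inference(x, up_layers, q_heads, p_heads):
    """Return one sample per layer, ordered [z_1, ..., z_L]."""
    # Deterministic upward pass: d_n = MLP(d_{n-1}), with d_0 = x.
    d, stats = x, []
    for mlp, (mu_head, var_head) in zip(up_layers, q_heads):
        d = np.tanh(mlp(d))
        stats.append((mu_head(d), softplus(var_head(d))))
    # Top layer: mu_{q,L} = mu_hat_{q,L}, sigma^2_{q,L} = sigma_hat^2_{q,L}.
    mu_q, var_q = stats[-1]
    z = mu_q + np.sqrt(var_q) * rng.standard_normal(mu_q.shape)
    zs = [z]
    # Top-down pass: prior parameters come from the layer above.
    for i in range(len(up_layers) - 2, -1, -1):
        mu_head_p, var_head_p = p_heads[i]
        mu_p, var_p = mu_head_p(z), softplus(var_head_p(z))
        mu_q, var_q = precision_weighted_merge(*stats[i], mu_p, var_p)
        z = mu_q + np.sqrt(var_q) * rng.standard_normal(mu_q.shape)
        zs.append(z)
    return zs[::-1]

# Hypothetical sizes: 8-dim input, 16 hidden units, L = 3 latent layers.
x_dim, hidden, z_dims = 8, 16, [4, 3, 2]
in_dims = [x_dim] + [hidden] * (len(z_dims) - 1)
up_layers = [linear(n, hidden) for n in in_dims]
q_heads = [(linear(hidden, k), linear(hidden, k)) for k in z_dims]
p_heads = [(linear(z_dims[i + 1], z_dims[i]), linear(z_dims[i + 1], z_dims[i]))
           for i in range(len(z_dims) - 1)]

zs = lvae_inference(rng.normal(size=(5, x_dim)), up_layers, q_heads, p_heads)
print([z.shape for z in zs])  # [(5, 4), (5, 3), (5, 2)]
```

Note that the merged variance is never larger than either input variance, and when the bottom-up estimate is very uncertain (large $\hat{\sigma}_{q,i}^{2}$) the merged distribution falls back toward the prior.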
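As the network gets deeper, some latent units become uninformative. This happens early in training, and once a unit has become uninformative it stays that way: further training does not reactivate it. To counter this, the paper initializes training as a standard deterministic auto-encoder (reconstruction term only) and then phases in the KL term:

$$\mathcal{L}(\theta, \phi ; \mathbf{x})_{WU}=-\beta\, KL\left(q_{\phi}(\mathbf{z} \mid \mathbf{x}) \,\|\, p_{\theta}(\mathbf{z})\right)+E_{q_{\phi}(\mathbf{z} \mid \mathbf{x})}\left[\log p_{\theta}(\mathbf{x} \mid \mathbf{z})\right]$$

where $\beta$ is increased linearly from 0 to 1 over the first $N_t$ epochs. This scheme is called warm-up (WU).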
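A minimal sketch of the warm-up schedule, assuming $\beta$ is updated once per epoch and held at 1 after epoch $N_t$. The function names and the numeric values in the usage lines are hypothetical and chosen only for illustration.

```python
def warmup_beta(epoch, n_t):
    """Linear warm-up: beta rises from 0 to 1 over the first n_t epochs, then stays at 1."""
    if n_t <= 0:
        return 1.0  # no warm-up requested
    return min(1.0, max(0.0, epoch / n_t))

def warmup_objective(recon_log_lik, kl, beta):
    # L_WU = -beta * KL(q || p) + E_q[log p(x | z)], a quantity to be maximised.
    return -beta * kl + recon_log_lik

# Illustrative values only: N_t = 200 epochs.
print(warmup_beta(0, 200))    # 0.0 -> pure reconstruction (auto-encoder regime)
print(warmup_beta(100, 200))  # 0.5
print(warmup_beta(400, 200))  # 1.0 -> the full variational lower bound
print(warmup_objective(recon_log_lik=-90.0, kl=10.0, beta=0.5))  # -95.0
```

With $\beta=0$ the objective reduces to the reconstruction term, which matches the deterministic auto-encoder initialization; with $\beta=1$ it is the usual lower bound.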
[Figure: image not available]
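As the results above show, the proposed method yields more meaningful (informative) latent units. In addition, the visualizations show that LVAE learns structured high-level features.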
