Variational Diffusion Models

1. VDM

1.1 Introduction to VDM

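VDM (Variational Diffusion Models) builds on the [[MHVAE]] model, but differs from [[MHVAE]] in three ways:

  • For every timestep $t$: the latent variable $\boldsymbol{z}_t$ has the same dimension as the data $\boldsymbol{x}$, i.e. $\boldsymbol{z}_t \in \mathbb{R}^d$ and $\boldsymbol{x} \in \mathbb{R}^d$;

  • For every timestep $t$: the distribution of the latent variable $\boldsymbol{z}_t$ is not learned by a neural network. It is a fixed Gaussian centered on the (scaled) previous step $\boldsymbol{z}_{t-1}$. Diffusion models therefore do not need to learn an Encoder with a neural network;

  • As the timestep $t$ grows, the latent variable $\boldsymbol{z}_t$ gradually approaches a standard normal distribution, so that at the final step $T$ (for $T$ large enough) $\boldsymbol{z}_T \sim \mathcal{N}(\mathbf{0},\mathbf{I})$.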

By analogy with the joint distribution of [[MHVAE]], Eq. (1), the joint distribution of VDM is given by Eq. (9):

$$\begin{align}
\underbrace{p\left(\boldsymbol{x}_{0:T}\right)}_{\text{Joint Distribution}} = \underbrace{p\left(\boldsymbol{x}_T\right)}_{\text{Prior}} \prod_{t=1}^{T} \underbrace{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1}\mid \boldsymbol{x}_t\right)}_{\text{Decoder}}
\end{align}$$

By analogy with the posterior of [[MHVAE]], Eq. (2), the posterior of VDM is given by Eq. (10):

$$\begin{align}
\underbrace{q\left(\boldsymbol{x}_{1:T}\mid \boldsymbol{x}_0\right)}_{\text{Posterior Distribution}} = \prod_{t=1}^{T} \underbrace{q\left(\boldsymbol{x}_t\mid \boldsymbol{x}_{t-1}\right)}_{\text{Encoder}}
\end{align}$$

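Note the following differences between the [[MHVAE]] formulas and the VDM formulas:

  • $q_{\phi}$ is replaced by $q$ everywhere, because the Encoder of VDM is not modeled by a neural network;

  • $\boldsymbol{z}_t$ is replaced by $\boldsymbol{x}_t$ everywhere, because in VDM the latent $\boldsymbol{z}_t$ has the same dimension as the data $\boldsymbol{x}$;

  • $\boldsymbol{x}$ is replaced by $\boldsymbol{x}_0$ everywhere.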

1.2 How is the ELBo of VDM derived?

Starting from the ELBo of [[MHVAE]], Eq. (8), replace $q_{\phi}$ with $q$ and $\boldsymbol{z}_t$ with $\boldsymbol{x}_t$ to obtain the ELBo of VDM:

$$
\begin{align}
& \log \underbrace{p(\boldsymbol{x}_0)}_{\text{Evidence}} \\
\geq & \underbrace{\mathbb{E}_{q\left(\boldsymbol{x}_{1:T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_{0:T}\right)}{q\left(\boldsymbol{x}_{1:T} \mid \boldsymbol{x}_0\right)}\right]}_{\text{ELBo of VDM}} \\
= & \mathbb{E}_{q\left(\boldsymbol{x}_{1:T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right) \prod_{t=1}^T p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{\prod_{t=1}^T q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_{t-1}\right)}\right] \\
= & \mathbb{E}_{q\left(\boldsymbol{x}_{1:T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right) p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right) \prod_{t=2}^T p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right) \prod_{t=2}^T q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_{t-1}\right)}\right] \\
= & \mathbb{E}_{q\left(\boldsymbol{x}_{1:T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right) p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right) \prod_{t=2}^T p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right) \prod_{t=2}^T q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_{t-1}, \boldsymbol{x}_0\right)}\right] \\
= & \mathbb{E}_{q\left(\boldsymbol{x}_{1:T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right) p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right)}{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}+\log \prod_{t=2}^T \frac{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_{t-1}, \boldsymbol{x}_0\right)}\right] \\
= & \mathbb{E}_{q\left(\boldsymbol{x}_{1:T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right) p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right)}{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}+\log \prod_{t=2}^T \frac{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{\frac{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t, \boldsymbol{x}_0\right) q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_0\right)}}\right] \\
= & \mathbb{E}_{q\left(\boldsymbol{x}_{1:T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right) p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right)}{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}+\log \prod_{t=2}^T \frac{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_0\right)}{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}+\log \prod_{t=2}^T \frac{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t, \boldsymbol{x}_0\right)}\right] \\
= & \mathbb{E}_{q\left(\boldsymbol{x}_{1:T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right) p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right)}{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}+\log \frac{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}{q\left(\boldsymbol{x}_T \mid \boldsymbol{x}_0\right)}+\log \prod_{t=2}^T \frac{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t, \boldsymbol{x}_0\right)}\right] \\
= & \mathbb{E}_{q\left(\boldsymbol{x}_{1:T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right) p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right)}{q\left(\boldsymbol{x}_T \mid \boldsymbol{x}_0\right)}+\sum_{t=2}^T \log \frac{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t, \boldsymbol{x}_0\right)}\right] \\
= & \mathbb{E}_{q\left(\boldsymbol{x}_{1:T} \mid \boldsymbol{x}_0\right)}\left[\log p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right)\right]+\mathbb{E}_{q\left(\boldsymbol{x}_{1:T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right)}{q\left(\boldsymbol{x}_T \mid \boldsymbol{x}_0\right)}\right]+\sum_{t=2}^T \mathbb{E}_{q\left(\boldsymbol{x}_{1:T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t, \boldsymbol{x}_0\right)}\right] \\
= & \mathbb{E}_{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}\left[\log p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right)\right]+\mathbb{E}_{q\left(\boldsymbol{x}_T \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right)}{q\left(\boldsymbol{x}_T \mid \boldsymbol{x}_0\right)}\right]+\sum_{t=2}^T \mathbb{E}_{q\left(\boldsymbol{x}_t, \boldsymbol{x}_{t-1} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t, \boldsymbol{x}_0\right)}\right] \\
= & \underbrace{\underbrace{\mathbb{E}_{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}\left[\log p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right)\right]}_{\boldsymbol{x}_0 \approx \boldsymbol{x}_1}}_{\text{reconstruction term } \textcolor{red}{\approx 0}}
-\underbrace{D_{\mathrm{KL}}\left(\underbrace{q\left(\boldsymbol{x}_T \mid \boldsymbol{x}_0\right)}_{\approx \mathcal{N}(\mathbf{0}, \mathbf{I})} \parallel \underbrace{p\left(\boldsymbol{x}_T\right)}_{=\mathcal{N}(\mathbf{0}, \mathbf{I})}\right)}_{\text{prior matching term } \textcolor{red}{\approx 0}}
-\underbrace{\sum_{t=2}^T \underbrace{\mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}\left[D_{\mathrm{KL}}\left(\underbrace{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t, \boldsymbol{x}_0\right)}_{\textcolor{red}{\text{complexity posterior}}} \parallel \underbrace{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}_{\textcolor{green}{\text{Decoder of VDM}}}\right)\right]}_{\text{denoising matching term}}}_{\textbf{Objective function to optimize}}
\end{align}
$$

1.3 How is the VDM objective derived from the ELBo?

1.3.1 Reconstruction term

Conclusion: this term can be computed with a Monte Carlo estimate, but in practice, when $T$ is large, each step adds very little noise, so $\boldsymbol{x}_0 \approx \boldsymbol{x}_1$ and the term is treated as negligible.

$$\begin{align}
\underbrace{\underbrace{\mathbb{E}_{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}\left[\log p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right)\right]}_{\boldsymbol{x}_0 \approx \boldsymbol{x}_1}}_{\text{reconstruction term}} \approx 0
\end{align}$$

1.3.2 Prior matching term

In VDM we make the following assumption:

$$\begin{align}
\underbrace{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_{t-1}\right)}_{\textcolor{red}{\text{Encoder of VDM}}} & =\mathcal{N}\left(\boldsymbol{x}_t ; \sqrt{\alpha_t}\, \boldsymbol{x}_{t-1},\left(1-\alpha_t\right) \mathbf{I}\right) \quad \text{with } \alpha_t \in (0, 1)
\end{align}$$

Using the reparameterization trick, we get:

$$\begin{align}
\boldsymbol{x}_t & =\sqrt{\alpha_t}\, \boldsymbol{x}_{t-1}+\sqrt{1-\alpha_t}\, \boldsymbol{\epsilon} \quad \text{with } \boldsymbol{\epsilon} \sim \mathcal{N}(\boldsymbol{\epsilon} ; \mathbf{0}, \mathbf{I}) \text{, } \alpha_t \in (0, 1)
\end{align}$$
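The reparameterized step above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper name `forward_step`, the toy vector `x_prev` and the value `alpha_t = 0.99` are hypothetical and do not come from the derivation.

```python
import math
import random


def forward_step(x_prev, alpha_t, rng):
    """Draw x_t ~ N(sqrt(alpha_t) * x_prev, (1 - alpha_t) I) via reparameterization."""
    if not 0.0 < alpha_t < 1.0:
        raise ValueError("alpha_t must lie in (0, 1)")
    scale = math.sqrt(alpha_t)        # shrinks the previous state
    sigma = math.sqrt(1.0 - alpha_t)  # standard deviation of the injected noise
    # epsilon ~ N(0, I): one independent standard normal draw per coordinate
    return [scale * x + sigma * rng.gauss(0.0, 1.0) for x in x_prev]


rng = random.Random(0)       # fixed seed, only for reproducibility
x_prev = [1.0, -2.0, 0.5]    # hypothetical 3-dimensional state x_{t-1}
x_t = forward_step(x_prev, alpha_t=0.99, rng=rng)
```

Because the randomness lives entirely in $\boldsymbol{\epsilon}$, the step has no learnable parameters, which matches the statement that VDM needs no learned Encoder.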
Applying the reparameterization trick recursively, and writing $\bar{\alpha}_t = \prod_{i=1}^t \alpha_i$, we get:

$$
\begin{align}
\boldsymbol{x}_t & =\sqrt{\alpha_t}\, \boldsymbol{x}_{t-1}+\sqrt{1-\alpha_t}\, \boldsymbol{\epsilon}_{t-1}^* \\
& =\sqrt{\alpha_t}\left(\sqrt{\alpha_{t-1}}\, \boldsymbol{x}_{t-2}+\sqrt{1-\alpha_{t-1}}\, \boldsymbol{\epsilon}_{t-2}^*\right)+\sqrt{1-\alpha_t}\, \boldsymbol{\epsilon}_{t-1}^* \\
& =\sqrt{\alpha_t \alpha_{t-1}}\, \boldsymbol{x}_{t-2}+\sqrt{\alpha_t-\alpha_t \alpha_{t-1}}\, \boldsymbol{\epsilon}_{t-2}^*+\sqrt{1-\alpha_t}\, \boldsymbol{\epsilon}_{t-1}^* \\
& =\sqrt{\alpha_t \alpha_{t-1}}\, \boldsymbol{x}_{t-2}+\sqrt{\sqrt{\alpha_t-\alpha_t \alpha_{t-1}}^{\,2}+\sqrt{1-\alpha_t}^{\,2}}\, \boldsymbol{\epsilon}_{t-2} \quad \text{(the sum of two independent Gaussians is Gaussian; } \boldsymbol{\epsilon}_{t-2} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\text{)} \\
& =\sqrt{\alpha_t \alpha_{t-1}}\, \boldsymbol{x}_{t-2}+\sqrt{\alpha_t-\alpha_t \alpha_{t-1}+1-\alpha_t}\, \boldsymbol{\epsilon}_{t-2} \\
& =\sqrt{\alpha_t \alpha_{t-1}}\, \boldsymbol{x}_{t-2}+\sqrt{1-\alpha_t \alpha_{t-1}}\, \boldsymbol{\epsilon}_{t-2} \\
& =\ldots \\
& =\sqrt{\prod_{i=1}^t \alpha_i}\, \boldsymbol{x}_0+\sqrt{1-\prod_{i=1}^t \alpha_i}\, \boldsymbol{\epsilon}_0 \quad \text{with } \boldsymbol{\epsilon}_0 \sim \mathcal{N}(\boldsymbol{\epsilon}; \mathbf{0}, \mathbf{I}) \\
& =\sqrt{\bar{\alpha}_t}\, \boldsymbol{x}_0+\sqrt{1-\bar{\alpha}_t}\, \boldsymbol{\epsilon}_0 \quad \text{with } \boldsymbol{\epsilon}_0 \sim \mathcal{N}(\boldsymbol{\epsilon}; \mathbf{0}, \mathbf{I}) \\
& \sim \mathcal{N}\left(\boldsymbol{x}_t ; \sqrt{\bar{\alpha}_t}\, \boldsymbol{x}_0,\left(1-\bar{\alpha}_t\right) \mathbf{I}\right)
\end{align}
$$

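Hence, for a given $\boldsymbol{x}_0$:

$$
\boldsymbol{x}_t \mid \boldsymbol{x}_0 \sim \mathcal{N}\left(\boldsymbol{x}_t ; \sqrt{\bar{\alpha}_t}\, \boldsymbol{x}_0,\left(1-\bar{\alpha}_t\right) \mathbf{I}\right)
$$

That is, because the forward process is a Markov chain of linear Gaussian transitions, $\boldsymbol{x}_t$ can be sampled from $\boldsymbol{x}_0$ in a single step at any timestep $t$:

$$\begin{align}
q(\boldsymbol{x}_t \mid \boldsymbol{x}_0) = \mathcal{N}\left(\boldsymbol{x}_t ; \sqrt{\bar{\alpha}_t}\, \boldsymbol{x}_0,\left(1-\bar{\alpha}_t\right) \mathbf{I}\right)
\end{align}$$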
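As a sanity check, the closed form can be compared numerically with the step-by-step chain. The sketch below is a scalar toy example. The three-step schedule `alphas = [0.9, 0.8, 0.7]`, the starting value `1.5` and all function names are hypothetical choices made for illustration.

```python
import math
import random


def alpha_bars(alphas):
    """Cumulative products: alpha_bar_t = alpha_1 * ... * alpha_t."""
    out, prod = [], 1.0
    for a in alphas:
        prod *= a
        out.append(prod)
    return out


def sample_iterative(x0, alphas, rng):
    """Apply the one-step Gaussian transition once per entry of `alphas`."""
    x = x0
    for a in alphas:
        x = math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
    return x


def sample_closed_form(x0, alpha_bar_t, rng):
    """Jump from x_0 to x_t in one draw using the closed form."""
    return math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * rng.gauss(0.0, 1.0)


def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)


alphas = [0.9, 0.8, 0.7]          # hypothetical schedule
a_bar = alpha_bars(alphas)[-1]    # 0.9 * 0.8 * 0.7
rng = random.Random(0)
n = 20000
stats_iter = mean_var([sample_iterative(1.5, alphas, rng) for _ in range(n)])
stats_closed = mean_var([sample_closed_form(1.5, a_bar, rng) for _ in range(n)])
```

Both sets of samples should have a mean close to $\sqrt{\bar{\alpha}_t}\, x_0$ and a variance close to $1-\bar{\alpha}_t$, up to Monte Carlo error.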
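When $T$ is large enough, for example $T = 1000$, and every $\alpha_t \in (0, 1)$ comes from a suitable noise schedule, the product $\bar{\alpha}_T = \prod_{i=1}^{T} \alpha_i$ becomes very small:

$$
\alpha_t \in (0, 1) \implies \bar{\alpha}_T \approx 0
$$

Therefore:

$$
q(\boldsymbol{x}_T \mid \boldsymbol{x}_0) \approx \mathcal{N}\left(\mathbf{0}, \mathbf{I}\right)
$$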

Conclusion: the prior matching term can be neglected:

$$
\underbrace{D_{\mathrm{KL}}\left(\underbrace{q\left(\boldsymbol{x}_T \mid \boldsymbol{x}_0\right)}_{\approx \mathcal{N}(\mathbf{0}, \mathbf{I})} \parallel \underbrace{p\left(\boldsymbol{x}_T\right)}_{=\mathcal{N}(\mathbf{0}, \mathbf{I})}\right)}_{\text{prior matching term}} \approx 0
$$
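To get a feel for how small this term is, the sketch below evaluates $\bar{\alpha}_T$ and the per-coordinate KL divergence to $\mathcal{N}(0, 1)$ for one concrete schedule. The schedule is an assumption, not something the derivation fixes: it takes $1-\alpha_t$ to grow linearly from `1e-4` to `0.02` over $T = 1000$ steps, and the value `x0 = 1.0` is likewise hypothetical.

```python
import math

T = 1000
beta_start, beta_end = 1e-4, 0.02   # assumed linear schedule for 1 - alpha_t
betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

# alpha_bar_T = product of alpha_t = product of (1 - beta_t)
alpha_bar_T = 1.0
for b in betas:
    alpha_bar_T *= 1.0 - b


def kl_to_standard_normal(mean, var):
    """KL( N(mean, var) || N(0, 1) ) for a single scalar coordinate."""
    return 0.5 * (var + mean * mean - 1.0 - math.log(var))


x0 = 1.0   # hypothetical data coordinate
kl = kl_to_standard_normal(math.sqrt(alpha_bar_T) * x0, 1.0 - alpha_bar_T)
```

Under this assumed schedule both `alpha_bar_T` and `kl` come out far below `1e-3`, which is the sense in which the prior matching term is treated as approximately zero. A schedule whose $\alpha_t$ stay too close to 1 would not give this result.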

1.3.3 Denoising matching term

We now derive each component of the denoising matching term in turn:

$$
\begin{align}
& \underbrace{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t, \boldsymbol{x}_0\right)}_{\text{complexity posterior}} \\
= & \frac{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_{t-1}, \boldsymbol{x}_0\right) q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_0\right)}{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)} \\
= & \frac{\overbrace{\mathcal{N}\left(\boldsymbol{x}_t ; \sqrt{\alpha_t}\, \boldsymbol{x}_{t-1},\left(1-\alpha_t\right) \mathbf{I}\right)}^{\text{apply Markov Property in Eq.(25)}} \overbrace{\mathcal{N}\left(\boldsymbol{x}_{t-1} ; \sqrt{\bar{\alpha}_{t-1}}\, \boldsymbol{x}_0,\left(1-\bar{\alpha}_{t-1}\right) \mathbf{I}\right)}^{\text{apply Eq.(37)}}}{\underbrace{\mathcal{N}\left(\boldsymbol{x}_t ; \sqrt{\bar{\alpha}_t}\, \boldsymbol{x}_0,\left(1-\bar{\alpha}_t\right) \mathbf{I}\right)}_{\text{apply Eq.(37)}}} \\
\propto & \exp \left\{-\left[\frac{\left(\boldsymbol{x}_t-\sqrt{\alpha_t}\, \boldsymbol{x}_{t-1}\right)^2}{2\left(1-\alpha_t\right)}+\frac{\left(\boldsymbol{x}_{t-1}-\sqrt{\bar{\alpha}_{t-1}}\, \boldsymbol{x}_0\right)^2}{2\left(1-\bar{\alpha}_{t-1}\right)}-\frac{\left(\boldsymbol{x}_t-\sqrt{\bar{\alpha}_t}\, \boldsymbol{x}_0\right)^2}{2\left(1-\bar{\alpha}_t\right)}\right]\right\} \\
= & \exp \left\{-\frac{1}{2}\left[\frac{\left(\boldsymbol{x}_t-\sqrt{\alpha_t}\, \boldsymbol{x}_{t-1}\right)^2}{1-\alpha_t}+\frac{\left(\boldsymbol{x}_{t-1}-\sqrt{\bar{\alpha}_{t-1}}\, \boldsymbol{x}_0\right)^2}{1-\bar{\alpha}_{t-1}}-\frac{\left(\boldsymbol{x}_t-\sqrt{\bar{\alpha}_t}\, \boldsymbol{x}_0\right)^2}{1-\bar{\alpha}_t}\right]\right\} \\
= & \exp \left\{-\frac{1}{2}\left[\frac{-2 \sqrt{\alpha_t}\, \boldsymbol{x}_t \boldsymbol{x}_{t-1}+\alpha_t \boldsymbol{x}_{t-1}^2}{1-\alpha_t}+\frac{\boldsymbol{x}_{t-1}^2-2 \sqrt{\bar{\alpha}_{t-1}}\, \boldsymbol{x}_{t-1} \boldsymbol{x}_0}{1-\bar{\alpha}_{t-1}}+C\left(\boldsymbol{x}_t, \boldsymbol{x}_0\right)\right]\right\} \\
\propto & \exp \left\{-\frac{1}{2}\left[-\frac{2 \sqrt{\alpha_t}\, \boldsymbol{x}_t \boldsymbol{x}_{t-1}}{1-\alpha_t}+\frac{\alpha_t \boldsymbol{x}_{t-1}^2}{1-\alpha_t}+\frac{\boldsymbol{x}_{t-1}^2}{1-\bar{\alpha}_{t-1}}-\frac{2 \sqrt{\bar{\alpha}_{t-1}}\, \boldsymbol{x}_{t-1} \boldsymbol{x}_0}{1-\bar{\alpha}_{t-1}}\right]\right\} \\
= & \exp \left\{-\frac{1}{2}\left[\left(\frac{\alpha_t}{1-\alpha_t}+\frac{1}{1-\bar{\alpha}_{t-1}}\right) \boldsymbol{x}_{t-1}^2-2\left(\frac{\sqrt{\alpha_t}\, \boldsymbol{x}_t}{1-\alpha_t}+\frac{\sqrt{\bar{\alpha}_{t-1}}\, \boldsymbol{x}_0}{1-\bar{\alpha}_{t-1}}\right) \boldsymbol{x}_{t-1}\right]\right\} \\
= & \exp \left\{-\frac{1}{2}\left[\frac{\alpha_t\left(1-\bar{\alpha}_{t-1}\right)+1-\alpha_t}{\left(1-\alpha_t\right)\left(1-\bar{\alpha}_{t-1}\right)} \boldsymbol{x}_{t-1}^2-2\left(\frac{\sqrt{\alpha_t}\, \boldsymbol{x}_t}{1-\alpha_t}+\frac{\sqrt{\bar{\alpha}_{t-1}}\, \boldsymbol{x}_0}{1-\bar{\alpha}_{t-1}}\right) \boldsymbol{x}_{t-1}\right]\right\} \\
= & \exp \left\{-\frac{1}{2}\left[\frac{\alpha_t-\bar{\alpha}_t+1-\alpha_t}{\left(1-\alpha_t\right)\left(1-\bar{\alpha}_{t-1}\right)} \boldsymbol{x}_{t-1}^2-2\left(\frac{\sqrt{\alpha_t}\, \boldsymbol{x}_t}{1-\alpha_t}+\frac{\sqrt{\bar{\alpha}_{t-1}}\, \boldsymbol{x}_0}{1-\bar{\alpha}_{t-1}}\right) \boldsymbol{x}_{t-1}\right]\right\} \\
= & \exp \left\{-\frac{1}{2}\left[\frac{1-\bar{\alpha}_t}{\left(1-\alpha_t\right)\left(1-\bar{\alpha}_{t-1}\right)} \boldsymbol{x}_{t-1}^2-2\left(\frac{\sqrt{\alpha_t}\, \boldsymbol{x}_t}{1-\alpha_t}+\frac{\sqrt{\bar{\alpha}_{t-1}}\, \boldsymbol{x}_0}{1-\bar{\alpha}_{t-1}}\right) \boldsymbol{x}_{t-1}\right]\right\} \\
= & \exp \left\{-\frac{1}{2}\left(\frac{1-\bar{\alpha}_t}{\left(1-\alpha_t\right)\left(1-\bar{\alpha}_{t-1}\right)}\right)\left[\boldsymbol{x}_{t-1}^2-2\, \frac{\frac{\sqrt{\alpha_t}\, \boldsymbol{x}_t}{1-\alpha_t}+\frac{\sqrt{\bar{\alpha}_{t-1}}\, \boldsymbol{x}_0}{1-\bar{\alpha}_{t-1}}}{\frac{1-\bar{\alpha}_t}{\left(1-\alpha_t\right)\left(1-\bar{\alpha}_{t-1}\right)}}\, \boldsymbol{x}_{t-1}\right]\right\} \\
= & \exp \left\{-\frac{1}{2}\left(\frac{1-\bar{\alpha}_t}{\left(1-\alpha_t\right)\left(1-\bar{\alpha}_{t-1}\right)}\right)\left[\boldsymbol{x}_{t-1}^2-2\, \frac{\left(\frac{\sqrt{\alpha_t}\, \boldsymbol{x}_t}{1-\alpha_t}+\frac{\sqrt{\bar{\alpha}_{t-1}}\, \boldsymbol{x}_0}{1-\bar{\alpha}_{t-1}}\right)\left(1-\alpha_t\right)\left(1-\bar{\alpha}_{t-1}\right)}{1-\bar{\alpha}_t}\, \boldsymbol{x}_{t-1}\right]\right\} \\
= & \exp \left\{-\frac{1}{2}\left(\frac{1}{\frac{\left(1-\alpha_t\right)\left(1-\bar{\alpha}_{t-1}\right)}{1-\bar{\alpha}_t}}\right)\left[\boldsymbol{x}_{t-1}^2-2\, \frac{\sqrt{\alpha_t}\left(1-\bar{\alpha}_{t-1}\right) \boldsymbol{x}_t+\sqrt{\bar{\alpha}_{t-1}}\left(1-\alpha_t\right) \boldsymbol{x}_0}{1-\bar{\alpha}_t}\, \boldsymbol{x}_{t-1}\right]\right\} \\
\propto & \mathcal{N}\Bigg(\boldsymbol{x}_{t-1} ; \underbrace{\frac{\sqrt{\alpha_t}\left(1-\bar{\alpha}_{t-1}\right) \boldsymbol{x}_t+\sqrt{\bar{\alpha}_{t-1}}\left(1-\alpha_t\right) \boldsymbol{x}_0}{1-\bar{\alpha}_t}}_{\boldsymbol{\mu}_q\left(\boldsymbol{x}_t, \boldsymbol{x}_0\right)}, \underbrace{\frac{\left(1-\alpha_t\right)\left(1-\bar{\alpha}_{t-1}\right)}{1-\bar{\alpha}_t}\, \mathbf{I}}_{\boldsymbol{\Sigma}_q(t)}\Bigg)
\end{align}
$$
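The result of this derivation can be checked numerically in the scalar case: by Bayes' rule, the log-density of the closed-form Gaussian must equal the combination of the three Gaussian log-densities it was derived from. The sketch below does exactly that. The function names and the test values (`alpha_t = 0.9`, `alpha_bar_prev = 0.6`, and the three points) are hypothetical.

```python
import math


def log_normal(x, mean, var):
    """Log-density of a scalar Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def posterior_params(x_t, x_0, alpha_t, alpha_bar_prev):
    """Mean and variance of q(x_{t-1} | x_t, x_0) in the scalar case."""
    alpha_bar_t = alpha_t * alpha_bar_prev
    mu = (math.sqrt(alpha_t) * (1.0 - alpha_bar_prev) * x_t
          + math.sqrt(alpha_bar_prev) * (1.0 - alpha_t) * x_0) / (1.0 - alpha_bar_t)
    var = (1.0 - alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    return mu, var


alpha_t, alpha_bar_prev = 0.9, 0.6      # hypothetical schedule values
alpha_bar_t = alpha_t * alpha_bar_prev
x_0, x_prev, x_t = 0.7, -0.3, 1.1       # arbitrary test points

# Bayes' rule: q(x_t | x_{t-1}) * q(x_{t-1} | x_0) / q(x_t | x_0), in log space
log_bayes = (log_normal(x_t, math.sqrt(alpha_t) * x_prev, 1.0 - alpha_t)
             + log_normal(x_prev, math.sqrt(alpha_bar_prev) * x_0, 1.0 - alpha_bar_prev)
             - log_normal(x_t, math.sqrt(alpha_bar_t) * x_0, 1.0 - alpha_bar_t))

# Closed-form Gaussian from the derivation, evaluated at the same x_{t-1}
mu_q, var_q = posterior_params(x_t, x_0, alpha_t, alpha_bar_prev)
log_closed = log_normal(x_prev, mu_q, var_q)
```

`log_bayes` and `log_closed` should agree up to floating-point rounding, including the normalizing constants, for any choice of the three points.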

Following Eq. (52), we also write $p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t)$ as a Gaussian, and set its variance equal to the one in Eq. (52):

$$\begin{align}
p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t) = \mathcal{N}(\boldsymbol{x}_{t-1}; \underbrace{\boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)}_{\text{learned by model}}, \boldsymbol{\Sigma}_{q}(t))
\end{align}$$

Mirroring the form of $\boldsymbol{\mu}_{q}(\boldsymbol{x}_t, \boldsymbol{x}_0)$ in Eq. (52), with the unknown $\boldsymbol{x}_0$ replaced by a network prediction $\hat{\boldsymbol{x}}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)$, we get:

$$\begin{align}
\boldsymbol{\mu}_{\boldsymbol{\theta}}\left(\boldsymbol{x}_t, t\right)
=\frac{\sqrt{\alpha_t}\left(1-\bar{\alpha}_{t-1}\right) \boldsymbol{x}_t+\sqrt{\bar{\alpha}_{t-1}}\left(1-\alpha_t\right) \overbrace{\hat{\boldsymbol{x}}_{\boldsymbol{\theta}}\left(\boldsymbol{x}_t, t\right)}^{\text{learned by model}}}{1-\bar{\alpha}_t}
\end{align}$$
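With this parameterization, one Decoder step amounts to drawing $\boldsymbol{x}_{t-1}$ from a Gaussian whose mean mixes $\boldsymbol{x}_t$ with the predicted clean sample. The sketch below illustrates that under assumptions: `reverse_step` is a hypothetical helper, and the list `x0_hat` stands in for the output of a trained network, which is not implemented here.

```python
import math
import random


def reverse_step(x_t, x0_hat, alpha_t, alpha_bar_prev, rng):
    """Draw x_{t-1} ~ N(mu_theta, sigma_q^2 I), given a prediction x0_hat of x_0."""
    alpha_bar_t = alpha_t * alpha_bar_prev
    # Coefficients of x_t and x0_hat in the mean mu_theta
    coef_xt = math.sqrt(alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    coef_x0 = math.sqrt(alpha_bar_prev) * (1.0 - alpha_t) / (1.0 - alpha_bar_t)
    # Standard deviation shared with the posterior q(x_{t-1} | x_t, x_0)
    sigma = math.sqrt((1.0 - alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t))
    return [coef_xt * xt + coef_x0 * xh + sigma * rng.gauss(0.0, 1.0)
            for xt, xh in zip(x_t, x0_hat)]


rng = random.Random(0)
x_t = [1.1, -0.2]        # hypothetical noisy sample
x0_hat = [0.7, 0.1]      # placeholder for the network output
x_prev = reverse_step(x_t, x0_hat, alpha_t=0.9, alpha_bar_prev=0.6, rng=rng)
```

Only the mean depends on the network. The variance is fixed by the noise schedule, which is what later reduces the KL divergence to a squared distance between means.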

1.3.4 The VDM objective

Since the reconstruction term and the prior matching term in Eq. (23) are small enough to be neglected, maximizing the ELBo reduces to minimizing the denoising matching term:

$$\begin{align}
&\operatorname{arg}\max_{\boldsymbol{\theta}} \log p(\boldsymbol{x}_0) \\
\approx &\operatorname{arg}\min_{\boldsymbol{\theta}} \underbrace{\sum_{t=2}^T \underbrace{\mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}\left[D_{\mathrm{KL}}\left(\underbrace{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t, \boldsymbol{x}_0\right)}_{\textcolor{red}{\text{complexity posterior}}} \parallel \underbrace{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}_{\textcolor{red}{\text{Decoder of VDM}}}\right)\right]}_{\text{denoising matching term}}}_{\textbf{Objective function to optimize}}
\end{align}$$

According to Eq. (56), the VDM objective is:

$$\begin{align}

&\operatorname{arg}\min_{\boldsymbol{\theta}} \sum_{t=2}^{T}\mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}\left[ D_{\mathrm{KL}}\left(q(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_{t}, \boldsymbol{x}_{0})\parallel p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_{t})\right)\right] \\
=& \operatorname{arg}\min_{\boldsymbol{\theta}}\sum_{t=2}^{T}\mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}\left[ D_{\mathrm{KL}}\left(\mathcal{N}(\boldsymbol{x}_{t-1}; \boldsymbol{\mu}_{q}(\boldsymbol{x}_t, \boldsymbol{x}_0), \boldsymbol{\Sigma}_{q}(t)) \parallel \mathcal{N}(\boldsymbol{x}_{t-1}; \boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t), \boldsymbol{\Sigma}_{q}(t))\right)\right] \\
=& \operatorname{arg}\min_{\boldsymbol{\theta}}\sum_{t=2}^{T}\mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}\left[ \frac{1}{2}\left[\log\frac{|\boldsymbol{\Sigma}_q(t)|}{|\boldsymbol{\Sigma}_q(t)|}-d+\operatorname{tr}\left(\boldsymbol{\Sigma}_q(t)^{-1}\boldsymbol{\Sigma}_q(t)\right)+\left(\boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)-\boldsymbol{\mu}_{q}(\boldsymbol{x}_t, \boldsymbol{x}_0)\right)^{T}\boldsymbol{\Sigma}_q(t)^{-1}\left(\boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)-\boldsymbol{\mu}_{q}(\boldsymbol{x}_t, \boldsymbol{x}_0)\right)\right]\right] \quad \text{(KL divergence between two Gaussians)} \\
=& \operatorname{arg}\min_{\boldsymbol{\theta}}\sum_{t=2}^{T}\mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}\left[ \frac{1}{2}\left[\log 1-d+d+\left(\boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)-\boldsymbol{\mu}_{q}(\boldsymbol{x}_t, \boldsymbol{x}_0)\right)^{T}\boldsymbol{\Sigma}_q(t)^{-1}\left(\boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)-\boldsymbol{\mu}_{q}(\boldsymbol{x}_t, \boldsymbol{x}_0)\right)\right]\right] \\
=& \operatorname{arg}\min_{\boldsymbol{\theta}} \sum_{t=2}^{T}\mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}\left[ \frac{1}{2}\left[\left(\boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)-\boldsymbol{\mu}_{q}(\boldsymbol{x}_t, \boldsymbol{x}_0)\right)^{T}\boldsymbol{\Sigma}_{q}(t)^{-1}\left(\boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)-\boldsymbol{\mu}_{q}(\boldsymbol{x}_t, \boldsymbol{x}_0)\right)\right]\right] \\
=& \operatorname{arg}\min_{\boldsymbol{\theta}} \sum_{t=2}^{T}\mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}\left[ \frac{1}{2}\left[\left(\boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)-\boldsymbol{\mu}_{q}(\boldsymbol{x}_t, \boldsymbol{x}_0)\right)^{T}\left(\sigma_{q}^{2}(t)\mathbf{I}\right)^{-1}\left(\boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)-\boldsymbol{\mu}_{q}(\boldsymbol{x}_t, \boldsymbol{x}_0)\right)\right]\right] \quad \left(\boldsymbol{\Sigma}_q(t) = \sigma_q^2(t)\mathbf{I}\right) \\
=& \operatorname{arg}\min_{\boldsymbol{\theta}}\sum_{t=2}^{T}\mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}\left[ \frac{1}{2\sigma_{q}^{2}(t)}\left[\left\|\boldsymbol{\mu}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)-\boldsymbol{\mu}_{q}(\boldsymbol{x}_t, \boldsymbol{x}_0)\right\|_{2}^{2}\right] \right] \\
=& \operatorname{arg}\min_{\boldsymbol{\theta}} \sum_{t=2}^{T}\mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}\left[ \frac{1}{2\sigma_{q}^{2}(t)}\left[\left\| \underbrace{\frac{\sqrt{\alpha_t}\left(1-\bar{\alpha}_{t-1}\right) \boldsymbol{x}_t+\sqrt{\bar{\alpha}_{t-1}}\left(1-\alpha_t\right) \overbrace{\hat{\boldsymbol{x}}_{\boldsymbol{\theta}}\left(\boldsymbol{x}_t, t\right)}^{\text{learned by model}}}{1-\bar{\alpha}_t}}_{\boldsymbol{\mu}_{\boldsymbol{\theta}} \text{, apply Eq.(54)}} - \underbrace{\frac{\sqrt{\alpha_t}\left(1-\bar{\alpha}_{t-1}\right) \boldsymbol{x}_t+\sqrt{\bar{\alpha}_{t-1}}\left(1-\alpha_t\right) \boldsymbol{x}_0}{1-\bar{\alpha}_t}}_{\boldsymbol{\mu}_{q} \text{, apply Eq.(52)}} \right\|_{2}^{2}\right] \right] \\
=& \operatorname{arg}\min_{\boldsymbol{\theta}} \sum_{t=2}^{T}\mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)} \left[ \frac{1}{2\sigma_{q}^{2}(t)} \left[ \left\| \frac{\sqrt{\bar{\alpha}_{t-1}}(1-\alpha_{t})\, \hat{\boldsymbol{x}}_{\boldsymbol{\theta}}(\boldsymbol{x}_t,t)}{1-\bar{\alpha}_t} - \frac{\sqrt{\bar{\alpha}_{t-1}}(1-\alpha_{t})\, \boldsymbol{x}_0}{1-\bar{\alpha}_t} \right\|_{2}^{2} \right] \right] \\
=& \operatorname{arg}\min_{\boldsymbol{\theta}} \sum_{t=2}^{T}\mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)} \left[ \frac{1}{2\sigma_{q}^{2}(t)} \left[ \left\| \frac{\sqrt{\bar{\alpha}_{t-1}}(1-\alpha_{t})}{1-\bar{\alpha}_t} \left(\hat{\boldsymbol{x}}_{\boldsymbol{\theta}}(\boldsymbol{x}_t,t) - \boldsymbol{x}_0\right) \right\|_{2}^{2} \right] \right] \\
=& \underbrace{\operatorname{arg}\min_{\boldsymbol{\theta}} \sum_{t=2}^{T}\mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)} \left[ \frac{1}{2\sigma_{q}^{2}(t)} \frac{\bar{\alpha}_{t-1}(1-\alpha_{t})^2}{(1-\bar{\alpha}_t)^2} \left[ \left\| \hat{\boldsymbol{x}}_{\boldsymbol{\theta}}(\boldsymbol{x}_t,t) - \boldsymbol{x}_0 \right\|_{2}^{2} \right] \right]}_{\text{Objective function of Diffusion Model}}
\end{align}$$
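The chain of equalities above says that the mean-matching loss and the weighted $\boldsymbol{x}_0$-prediction loss are the same number. The sketch below checks this for one arbitrary example. The vectors, the schedule values and the helper names are hypothetical, and `x0_hat` is a placeholder for a network output.

```python
import math


def posterior_mean(x_t, x0_like, alpha_t, alpha_bar_prev):
    """Posterior mean formula: pass x_0 to get mu_q, or a prediction to get mu_theta."""
    alpha_bar_t = alpha_t * alpha_bar_prev
    c_t = math.sqrt(alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    c_0 = math.sqrt(alpha_bar_prev) * (1.0 - alpha_t) / (1.0 - alpha_bar_t)
    return [c_t * a + c_0 * b for a, b in zip(x_t, x0_like)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


alpha_t, alpha_bar_prev = 0.95, 0.5     # hypothetical schedule values
alpha_bar_t = alpha_t * alpha_bar_prev
sigma2 = (1.0 - alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)

x_0 = [0.2, -1.0, 0.7]       # clean sample
x_t = [0.9, -0.4, 1.3]       # noisy sample
x0_hat = [0.1, -0.8, 1.0]    # placeholder for the network prediction

# Form 1: squared distance between the two means, scaled by 1 / (2 sigma_q^2)
mu_theta = posterior_mean(x_t, x0_hat, alpha_t, alpha_bar_prev)
mu_q = posterior_mean(x_t, x_0, alpha_t, alpha_bar_prev)
loss_means = sq_dist(mu_theta, mu_q) / (2.0 * sigma2)

# Form 2: weighted squared distance between the prediction and x_0
weight = alpha_bar_prev * (1.0 - alpha_t) ** 2 / (2.0 * sigma2 * (1.0 - alpha_bar_t) ** 2)
loss_x0 = weight * sq_dist(x0_hat, x_0)
```

The two values agree up to floating-point rounding because the $\boldsymbol{x}_t$ terms cancel in the difference of the means.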
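Replacing the sum over $t$ in Eq. (67) with a Monte Carlo estimate, i.e. sampling $t$ uniformly from $\{2, \dots, T\}$ (this changes the objective only by a constant factor), gives:

$$\begin{align}
\operatorname{arg}\min_{\boldsymbol{\theta}} \mathbb{E}_{t\sim \mathcal{U}\{2,T\}} \left[ \mathbb{E}_{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)} \left[ \frac{1}{2\sigma_{q}^{2}(t)} \frac{\bar{\alpha}_{t-1}(1-\alpha_{t})^2}{(1-\bar{\alpha}_t)^2} \left[ \left\| \hat{\boldsymbol{x}}_{\boldsymbol{\theta}}(\boldsymbol{x}_t,t) - \boldsymbol{x}_0 \right\|_{2}^{2} \right] \right] \right]
\end{align}$$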
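A Monte Carlo estimate of this objective for a single data point can be sketched as follows. Everything concrete here is an assumption for illustration: `mc_loss` and `predict_x0` are hypothetical names, `predict_x0` stands in for the network $\hat{\boldsymbol{x}}_{\boldsymbol{\theta}}$, and the constant schedule `alphas = [0.99] * 10` is a toy choice.

```python
import math
import random


def mc_loss(x0, alphas, predict_x0, n_samples, rng):
    """Average the weighted squared error over random timesteps t in {2, ..., T}."""
    T = len(alphas)
    # alpha_bar[t] for t = 0..T, with the convention alpha_bar[0] = 1
    alpha_bar = [1.0]
    for a in alphas:
        alpha_bar.append(alpha_bar[-1] * a)
    total = 0.0
    for _ in range(n_samples):
        t = rng.randint(2, T)   # uniform over 2..T, both ends included
        a_t, ab_t, ab_prev = alphas[t - 1], alpha_bar[t], alpha_bar[t - 1]
        # Sample x_t directly from q(x_t | x_0) using the closed form
        x_t = [math.sqrt(ab_t) * x + math.sqrt(1.0 - ab_t) * rng.gauss(0.0, 1.0)
               for x in x0]
        x0_hat = predict_x0(x_t, t)
        sigma2 = (1.0 - a_t) * (1.0 - ab_prev) / (1.0 - ab_t)
        weight = ab_prev * (1.0 - a_t) ** 2 / (2.0 * sigma2 * (1.0 - ab_t) ** 2)
        total += weight * sum((h - x) ** 2 for h, x in zip(x0_hat, x0))
    return total / n_samples


x0 = [0.5, -1.0]            # hypothetical data point
alphas = [0.99] * 10        # toy schedule with T = 10
loss_perfect = mc_loss(x0, alphas, lambda x_t, t: list(x0), 200, random.Random(0))
loss_zero = mc_loss(x0, alphas, lambda x_t, t: [0.0, 0.0], 200, random.Random(0))
```

A predictor that returns $\boldsymbol{x}_0$ exactly gives a loss of zero, while any other predictor gives a positive loss. In actual training the lambda would be replaced by a neural network and the loss minimized over its parameters.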

In summary, the VDM network learns to predict the original image $\boldsymbol{x}_0$ from its noised version $\boldsymbol{x}_t$, because the objective reduces to minimizing $\|\hat{\boldsymbol{x}}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t) - \boldsymbol{x}_0\|_{2}^{2}$.
