[Diffusion Models] DDPM and DDIM Explained

求求你来BUG行不行

Last modified: 2023-10-18 17:18:00

Tags: deep learning, diffusion models

First published: 2023-10-05 16:36:03

Article link: https://blog.csdn.net/aaatomaaa/article/details/133580069

Diffusion Models: DDPM and DDIM

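The article "Diffusion Models: DDPM" (扩散模型之DDPM) introduced the principles and implementation of DDPM, the classic diffusion model. In DDPM, generating a sample takes as many denoising steps as there are diffusion steps in training, which makes sampling slow. This post introduces another diffusion model, DDIM (Denoising Diffusion Implicit Models). The two share the same training objective, but DDIM no longer requires the diffusion process to be a Markov chain, which lets it sample with far fewer steps and so speeds up generation. Another feature of DDIM is that generating a sample from random noise is a deterministic process when the sampling noise is set to zero (no random noise is injected at the intermediate steps).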
Reference links:
Bilibili video: https://www.bilibili.com/video/BV1JY4y1N7dn/
https://zhuanlan.zhihu.com/p/565698027 Diffusion Models: DDIM

https://zhuanlan.zhihu.com/p/627616358 [Generative Models (3)] Understanding why DDIM can speed up DDPM sampling

https://blog.csdn.net/weixin_43850253/article/details/128413786 DDIM: principles and code (Denoising Diffusion Implicit Models)

https://blog.csdn.net/weixin_43850253/article/details/128275723 IDDPM: principles and code walkthrough
https://blog.csdn.net/D_Trump/article/details/126611014 008_SSSS_ Improved Denoising Diffusion Probabilistic Models
https://zhuanlan.zhihu.com/p/626062570 Improved Diffusion: code and understanding
https://blog.csdn.net/zzfive/article/details/127169343 IDDPM paper walkthrough

Code:
https://zhuanlan.zhihu.com/p/635144824
Diffusion: a brief look at the DDPM code, MNIST and Fashion-MNIST generation in practice, and DDIM-accelerated generation
https://blog.csdn.net/qq_41234663/article/details/128780745
Diffusion model code explained in detail

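Recap of DDPM:

[Image: equation]
The forward process is a Markov chain. Note that alpha in the formula above does not mean the same thing as in the original DDPM paper, probably for convenience: the notation most likely follows the DDIM paper, where α_t stands for the cumulative product that DDPM writes as ᾱ_t. From this one can derive: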
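
The equation images did not survive in this copy. As a hedged reconstruction, assuming the DDIM paper's convention in which α_t is the cumulative product (ᾱ_t in DDPM), the forward step and the marginal derived from it would read:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{\tfrac{\alpha_t}{\alpha_{t-1}}}\,x_{t-1},\ \left(1-\tfrac{\alpha_t}{\alpha_{t-1}}\right) I\right)

q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\alpha_t}\,x_0,\ (1-\alpha_t)\, I\right)
\quad\Longleftrightarrow\quad
x_t = \sqrt{\alpha_t}\,x_0 + \sqrt{1-\alpha_t}\,\epsilon,\qquad \epsilon \sim \mathcal{N}(0, I)
```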

The reverse process is also defined as a Markov chain:

Next, use the posterior:

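Solving it gives a variance that is a fixed value (it depends only on the noise schedule), and the mean is:
[Image: equation]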
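
The equations for the reverse chain and the posterior are also missing here. Under the same assumed notation (α_t cumulative, with β_t introduced only as shorthand), they would presumably be the standard DDPM expressions:

```latex
p_\theta(x_{0:T}) = p_\theta(x_T)\prod_{t=1}^{T} p_\theta(x_{t-1}\mid x_t)

q(x_{t-1}\mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde\mu_t(x_t, x_0),\ \tilde\beta_t I\right),
\qquad \beta_t := 1-\frac{\alpha_t}{\alpha_{t-1}}

\tilde\mu_t(x_t, x_0) = \frac{\sqrt{\alpha_{t-1}}\,\beta_t}{1-\alpha_t}\,x_0
 + \frac{\sqrt{\alpha_t/\alpha_{t-1}}\,(1-\alpha_{t-1})}{1-\alpha_t}\,x_t,
\qquad
\tilde\beta_t = \frac{1-\alpha_{t-1}}{1-\alpha_t}\,\beta_t
```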
Applying variational inference, the KL divergence, and a final simplification gives:
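
The simplified objective itself is not shown in this copy. Assuming it is the usual L_simple from the DDPM paper, written with the cumulative α_t used here, it would be:

```latex
L_{\text{simple}} = \mathbb{E}_{t,\,x_0,\,\epsilon}\left[\,\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\alpha_t}\,x_0 + \sqrt{1-\alpha_t}\,\epsilon,\ t\right)\right\rVert^2\right]
```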

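Notice that the DDPM training objective depends only on the marginal distributions q(x_t | x_0), not on the joint distribution of the whole forward chain, so the rest of the process is free to be modified.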
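
To make the "only the marginals matter" point concrete, here is a minimal pure-Python sketch of how a training input x_t can be drawn directly from x_0 without simulating the chain. The linear beta schedule and its default values are assumptions for illustration, and `make_alphas_cumprod` and `q_sample` are hypothetical helper names, not the code used later in this post.

```python
import math
import random

def make_alphas_cumprod(num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed values)."""
    alphas_cumprod, running = [], 1.0
    for i in range(num_timesteps):
        beta = beta_start + (beta_end - beta_start) * i / (num_timesteps - 1)
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod

def q_sample(x0, t, alphas_cumprod, rng=random):
    """Draw x_t ~ q(x_t | x_0) in one shot; returns (x_t, eps) for a list of floats x0."""
    a = alphas_cumprod[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps
```

The network would then be trained to predict `eps` from `x_t` and `t`; nothing in this step uses the transition q(x_t | x_{t-1}).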

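DDIM:

To construct a non-Markovian diffusion process while still reusing DDPM's training procedure, the formula above becomes:
[Image: equation]
We then construct the reverse conditional distribution ourselves; the only requirement is that it leaves the marginals q(x_t | x_0) unchanged. In other words, what the forward process looks like in detail no longer matters. The method of undetermined coefficients then gives: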
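
Neither the condition nor the result is visible in this copy. A hedged reconstruction, following the DDIM paper's notation and treating σ_t as a free parameter: the constructed conditional has to marginalize back to the DDPM marginal (first line), and solving a Gaussian ansatz by matching coefficients gives the second line.

```latex
\int q_\sigma(x_{t-1}\mid x_t, x_0)\; q(x_t\mid x_0)\; dx_t = q(x_{t-1}\mid x_0)

q_\sigma(x_{t-1}\mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \sqrt{\alpha_{t-1}}\,x_0
 + \sqrt{1-\alpha_{t-1}-\sigma_t^2}\,\cdot\frac{x_t-\sqrt{\alpha_t}\,x_0}{\sqrt{1-\alpha_t}},\ \sigma_t^2 I\right)
```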

This is our new reverse (generative) distribution, that is, the new "ultimate target" the model has to fit.
DDIM defines the following:
[Image: equation]
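
The image is missing; assuming it showed the definition of the non-Markovian inference distribution from the DDIM paper, it would be the factorization below, which starts at x_T and works backwards:

```latex
q_\sigma(x_{1:T}\mid x_0) := q_\sigma(x_T\mid x_0)\prod_{t=2}^{T} q_\sigma(x_{t-1}\mid x_t, x_0),
\qquad
q_\sigma(x_T\mid x_0) = \mathcal{N}\!\left(\sqrt{\alpha_T}\,x_0,\ (1-\alpha_T)\, I\right)
```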

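The formula above is anchored at T: the chain starts from x_T and is built backwards.
For t > 1, each factor q_σ(x_{t-1} | x_t, x_0) must be the Gaussian obtained above with the method of undetermined coefficients.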

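By mathematical induction (the proof is in the paper), one obtains the following marginal distribution, which is the same as DDPM's, so the DDPM training procedure can be used as is: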
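
The marginal in question is presumably q_σ(x_t | x_0) = N(sqrt(α_t) x_0, (1 - α_t) I). As a small numeric sanity check of the induction step, the sketch below pushes that marginal at step t through the DDIM conditional and returns the resulting mean coefficient and variance at step t-1. It works on scalars, and `propagate_marginal` is a hypothetical helper written for this post.

```python
import math

def propagate_marginal(alpha_t, alpha_prev, sigma):
    """Combine q(x_t | x0) = N(sqrt(alpha_t) x0, 1 - alpha_t) with q_sigma(x_{t-1} | x_t, x0).

    Returns (coefficient on x0 in the mean, variance) of x_{t-1} given x0.
    """
    # Weight on x_t inside the conditional mean.
    k = math.sqrt(1.0 - alpha_prev - sigma ** 2) / math.sqrt(1.0 - alpha_t)
    # Conditional mean is (sqrt(alpha_prev) - k * sqrt(alpha_t)) * x0 + k * x_t.
    coef_x0 = math.sqrt(alpha_prev) - k * math.sqrt(alpha_t)
    # Take the expectation over x_t | x0, whose mean is sqrt(alpha_t) * x0.
    mean_coef = coef_x0 + k * math.sqrt(alpha_t)
    # Variance: propagated variance of x_t plus the conditional variance sigma^2.
    variance = k ** 2 * (1.0 - alpha_t) + sigma ** 2
    return mean_coef, variance
```

For any admissible σ the result should come out as (sqrt(α_{t-1}), 1 - α_{t-1}), which is exactly the DDPM marginal one step earlier.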

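From the relation between x_t and x_0 (the process is no longer Markovian, so this change of variables is allowed), the following formula can be derived:
[Image: equation]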
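
The image is missing; assuming it is the sampling update from the DDIM paper, where x_0 is replaced by its prediction from x_t and the noise network ε_θ, it would read as below. The three terms line up with `x_start * alpha_next.sqrt()`, `c * pred_noise` and `sigma * noise` in the code at the end of this post.

```latex
x_{t-1} = \sqrt{\alpha_{t-1}}\,
\underbrace{\left(\frac{x_t-\sqrt{1-\alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\alpha_t}}\right)}_{\text{predicted } x_0}
+ \underbrace{\sqrt{1-\alpha_{t-1}-\sigma_t^2}\;\epsilon_\theta(x_t, t)}_{\text{direction pointing to } x_t}
+ \underbrace{\sigma_t\,\epsilon_t}_{\text{random noise}}
```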
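The paper also proves that the DDIM variational objective matches DDPM's L_simple up to a per-timestep weighting and an additive constant, so a model trained with L_simple can be reused.
The paper treats the standard deviation σ_t as a hyperparameter: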
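
The formula is missing here. Assuming it is the η-parameterized choice from the DDIM paper, which is also what the `sigma` line in the code at the end of this post computes, it would be:

```latex
\sigma_t(\eta) = \eta\,\sqrt{\frac{1-\alpha_{t-1}}{1-\alpha_t}}\;\sqrt{1-\frac{\alpha_t}{\alpha_{t-1}}}
```

With η = 1 this equals the square root of the DDPM posterior variance, and with η = 0 the noise term vanishes.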

When the standard deviation is 0, the generation process is deterministic.
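
A minimal scalar sketch of one update step may help here. It assumes the η-parameterized σ used in the code at the end of this post; `ddim_step` is a hypothetical helper, and `eps_pred` stands in for the network's noise prediction.

```python
import math

def ddim_step(x_t, eps_pred, alpha, alpha_next, eta=0.0, noise=0.0):
    """One DDIM update on a scalar, from cumulative alpha `alpha` to `alpha_next`."""
    # Predicted clean sample, from x_t = sqrt(alpha) * x0 + sqrt(1 - alpha) * eps.
    x0_pred = (x_t - math.sqrt(1.0 - alpha) * eps_pred) / math.sqrt(alpha)
    # Noise scale for this step; eta = 0 switches the random term off.
    sigma = eta * math.sqrt((1.0 - alpha / alpha_next) * (1.0 - alpha_next) / (1.0 - alpha))
    # Weight of the predicted noise ("direction pointing to x_t").
    c = math.sqrt(1.0 - alpha_next - sigma ** 2)
    return math.sqrt(alpha_next) * x0_pred + c * eps_pred + sigma * noise
```

With `eta=0` the `noise` argument has no effect, so the same starting noise always maps to the same sample. If the noise prediction were exact, the step would land on sqrt(alpha_next) * x0 + sqrt(1 - alpha_next) * eps, the same point on the trajectory at the less noisy level.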
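DDIM is a model formulation and is not in itself an acceleration method; the speed-up comes from a sampling trick applied on top of it.
Sampling is run on a shorter subsequence of the timesteps.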
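
As a rough illustration of how such a subsequence can be chosen, the sketch below mirrors the evenly spaced grid built with `torch.linspace` in the code that follows, but in pure Python. `ddim_time_pairs` is a hypothetical helper, and in rare rounding cases its values could differ by one from the float32 torch version.

```python
def ddim_time_pairs(total_timesteps, sampling_timesteps):
    """Evenly spaced (t, t_next) pairs from T-1 down to -1; -1 marks the final x_0 output."""
    # Same grid as linspace(-1, T-1, S+1), truncated to integers.
    times = [int(-1 + i * total_timesteps / sampling_timesteps)
             for i in range(sampling_timesteps + 1)]
    times.reverse()
    return list(zip(times[:-1], times[1:]))

print(ddim_time_pairs(1000, 5))
# [(999, 799), (799, 599), (599, 399), (399, 199), (199, -1)]
```

So a model trained with 1000 diffusion steps is only evaluated 5 times in this example, each step jumping 200 timesteps.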

ddim_sample.py

import torch
from tqdm import tqdm

# Method of a diffusion class that provides betas, alphas_cumprod, num_timesteps,
# sampling_timesteps, ddim_sampling_eta, self_condition, model_predictions and unnormalize.
@torch.no_grad()
def ddim_sample(self, shape, return_all_timesteps = False):
    batch, device = shape[0], self.betas.device
    total_timesteps, sampling_timesteps = self.num_timesteps, self.sampling_timesteps
    eta = self.ddim_sampling_eta

    # Evenly spaced timesteps from -1 to T-1; -1 marks the final step that outputs x_0.
    times = torch.linspace(-1, total_timesteps - 1, steps = sampling_timesteps + 1)   # [-1, 0, 1, 2, ..., T-1] when sampling_timesteps == total_timesteps
    times = list(reversed(times.int().tolist()))
    time_pairs = list(zip(times[:-1], times[1:])) # [(T-1, T-2), (T-2, T-3), ..., (1, 0), (0, -1)]

    img = torch.randn(shape, device = device)   # start from pure Gaussian noise x_T
    imgs = [img]
    x_start = None

    for time, time_next in tqdm(time_pairs, desc = 'sampling loop time step'):
        time_cond = torch.full((batch,), time, device = device, dtype = torch.long)
        self_cond = x_start if self.self_condition else None
        # Predict the noise and the clean image x_0 from the current x_t.
        pred_noise, x_start, *_ = self.model_predictions(img, time_cond, self_cond, clip_x_start = True)

        if time_next < 0:
            # Last step: the predicted x_0 is the sample.
            img = x_start
            imgs.append(img)
            continue

        alpha = self.alphas_cumprod[time]
        alpha_next = self.alphas_cumprod[time_next]
        # Noise scale for this step; eta = 0 gives the deterministic DDIM update.
        sigma = eta * ((1 - alpha / alpha_next) * (1 - alpha_next) / (1 - alpha)).sqrt()
        c = (1 - alpha_next - sigma ** 2).sqrt()
        noise = torch.randn_like(img)
        # predicted x_0 term + direction pointing to x_t + random noise
        img = x_start * alpha_next.sqrt() + \
              c * pred_noise + \
              sigma * noise
        # Record the image after the update (appending before it would duplicate
        # the initial noise and drop the final sample).
        imgs.append(img)

    ret = img if not return_all_timesteps else torch.stack(imgs, dim = 1)
    ret = self.unnormalize(ret)
    return ret