Classifier Guided Diffusion

Preface

Previously we went through OpenAI's DDPM (DDPM原理与代码剖析) and IDDPM (IDDPM原理和代码剖析), as well as Stanford's DDIM (DDIM原理及代码(Denoising diffusion implicit models)). This time we look at another OpenAI work: Diffusion Models Beat GANs on Image Synthesis.
github: https://github.com/openai/guided-diffusion

This post mainly draws on 66、Classifier Guided Diffusion条件扩散模型论文与PyTorch代码详细解读.

The code discussed here is largely based on the codebase of the IDDPM paper; see IDDPM原理和代码剖析.

A placeholder for now…
The code-analysis part is still a work in progress.



Theory

Preliminaries

(1) The authors first ran extensive ablation experiments on unconditional diffusion models, drew a set of conclusions, and used those conclusions to design the architecture.
(2) A straightforward way to build a conditional diffusion model is to embed the label and add it to the time embedding, but on its own this does not work very well. So the paper adds classifier guidance on top (without discarding the conventional conditioning just described).
Concretely, the gradient of a classifier's log-probability with respect to the image $X_t$ is used to steer the sampling process, as sketched below.
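For intuition, here is a minimal sketch of the conventional conditioning path (illustrative names, not the repo's exact code; the guided-diffusion UNet does something similar with an nn.Embedding for the label):

import torch.nn as nn

class LabelConditioning(nn.Module):
    # Minimal sketch: fuse the class label into the timestep embedding.
    # num_classes and emb_dim are illustrative parameters.
    def __init__(self, num_classes, emb_dim):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, emb_dim)

    def forward(self, time_emb, y):
        # Adding the label embedding means every block that consumes
        # the time embedding also sees the class information.
        return time_emb + self.label_emb(y)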



Introduction

(1) Diffusion models are likelihood-based models.
(2) The model borrows the learned-range variance parameterization from Improved DDPM (the $v$ in the formula):
$\Sigma_{\theta}(X_t, t) = \exp\big(v \log \beta_t + (1-v) \log \widetilde{\beta}_t\big)$
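In code, the interpolation looks roughly like this (a paraphrase of the IDDPM-style learned-range branch; model_var_values is the raw network output in [-1, 1]):

import torch as th

def learned_range_log_variance(model_var_values, beta_t, posterior_log_var_t):
    # v in the formula: map the raw output from [-1, 1] to [0, 1].
    frac = (model_var_values + 1) / 2
    max_log = th.log(beta_t)       # log(beta_t), the upper bound
    min_log = posterior_log_var_t  # log(beta_tilde_t), the lower bound
    return frac * max_log + (1 - frac) * min_log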

(3) 更改unet结构:
We explore the following architectural changes:
• Increasing depth versus width, holding model size relatively constant.
• Increasing the number of attention heads.
• Using attention at 32×32, 16×16, and 8×8 resolutions rather than only at 16×16.
• Using the BigGAN residual block for upsampling and downsampling the activations.
• Rescaling residual connections with $\frac{1}{\sqrt{2}}$, following [60, 27, 28].

Adaptive Group Normalization
The time embedding and label embedding are used to generate $y_s$ and $y_b$:
$AdaGN(h, y) = y_s\, \mathrm{GroupNorm}(h) + y_b$
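A minimal PyTorch sketch of AdaGN (the repo realizes this inside its residual blocks via the scale-shift normalization path; the module below is illustrative):

import torch.nn as nn

class AdaGN(nn.Module):
    # Adaptive Group Normalization: the conditioning embedding produces
    # a per-channel scale y_s and shift y_b.
    def __init__(self, emb_dim, channels, num_groups=32):
        super().__init__()
        # channels must be divisible by num_groups.
        self.norm = nn.GroupNorm(num_groups, channels)
        self.proj = nn.Linear(emb_dim, 2 * channels)

    def forward(self, h, emb):
        # emb is the fused time + label embedding, shape (N, emb_dim).
        y_s, y_b = self.proj(emb).chunk(2, dim=1)
        y_s = y_s[:, :, None, None]  # broadcast over spatial dims
        y_b = y_b[:, :, None, None]
        return y_s * self.norm(h) + y_b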

The following details are given in Appendix H of the paper (pp. 25-26).

Code

After all that derivation, the code is still largely the same as IDDPM's; here we only walk through the parts that differ.

p_sample

guided_diffusion/gaussian_diffusion.py

def p_sample(
        self,
        model,
        x,
        t,
        clip_denoised=True,
        denoised_fn=None,
        cond_fn=None,
        model_kwargs=None,
    ):
        """
        Sample x_{t-1} from the model at the given timestep.
        :param cond_fn: if not None, this is a gradient function that acts
                        similarly to the model.
        """
        out = self.p_mean_variance(
            model,
            x,
            t,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
        )
        noise = th.randn_like(x)
        nonzero_mask = (
            (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
        )  # no noise when t == 0
        if cond_fn is not None:
            out["mean"] = self.condition_mean(
                cond_fn, out, x, t, model_kwargs=model_kwargs
            )
        sample = out["mean"] + nonzero_mask * th.exp(0.5 * out["log_variance"]) * noise
        return {"sample": sample, "pred_xstart": out["pred_xstart"]}

Compared with the IDDPM version, the only addition is this step:

if cond_fn is not None:
   out["mean"] = self.condition_mean(
      cond_fn, out, x, t, model_kwargs=model_kwargs
   )



condition_mean

guided_diffusion/gaussian_diffusion.py

    def condition_mean(self, cond_fn, p_mean_var, x, t, model_kwargs=None):
        """
        Compute the mean for the previous step, given a function cond_fn that
        computes the gradient of a conditional log probability with respect to
        x. In particular, cond_fn computes grad(log(p(y|x))), and we want to
        condition on y.

        This uses the conditioning strategy from Sohl-Dickstein et al. (2015).
        """
        gradient = cond_fn(x, self._scale_timesteps(t), **model_kwargs)
        new_mean = (
            p_mean_var["mean"].float() + p_mean_var["variance"] * gradient.float()
        )
        return new_mean
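In equation form, this is the shifted-mean sampling step from Algorithm 1 of the paper; the guidance scale $s$ is folded into cond_fn, as shown below:

$X_{t-1} \sim \mathcal{N}\big(\mu_{\theta}(X_t) + s\,\Sigma_{\theta}(X_t)\,\nabla_{X_t} \log p_{\phi}(y|X_t),\; \Sigma_{\theta}(X_t)\big)$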



cond_fn

scripts/classifier_sample.py

It returns $s \times \nabla_{X_t} \log p_{\phi}(y|X_t)$, where $s$ is args.classifier_scale:

def cond_fn(x, t, y=None):
    assert y is not None
    with th.enable_grad():
        x_in = x.detach().requires_grad_(True)
        logits = classifier(x_in, t)
        log_probs = F.log_softmax(logits, dim=-1)
        selected = log_probs[range(len(logits)), y.view(-1)]
        return th.autograd.grad(selected.sum(), x_in)[0] * args.classifier_scale
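For context, here is a sketch of how cond_fn is wired into sampling (paraphrasing classifier_sample.py; NUM_CLASSES, batch_size, image_size and dev are placeholders):

# y holds the target class labels; cond_fn receives them via model_kwargs.
classes = th.randint(low=0, high=NUM_CLASSES, size=(batch_size,), device=dev)
model_kwargs = {"y": classes}
sample = diffusion.p_sample_loop(
    model,
    (batch_size, 3, image_size, image_size),
    clip_denoised=True,
    model_kwargs=model_kwargs,
    cond_fn=cond_fn,
    device=dev,
)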



ddim_sample

This is the DDIM sampling routine; it was introduced in IDDPM原理和代码剖析, so please refer there if anything is unclear. Here we only cover the main changes.

def ddim_sample(
        self,
        model,
        x,
        t,
        clip_denoised=True,
        denoised_fn=None,
        cond_fn=None,
        model_kwargs=None,
        eta=0.0,
    ):
        """
        Sample x_{t-1} from the model using DDIM.

        Same usage as p_sample().
        """
        out = self.p_mean_variance(
            model,
            x,
            t,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
        )
        if cond_fn is not None:
            out = self.condition_score(cond_fn, out, x, t, model_kwargs=model_kwargs)

        # Usually our model outputs epsilon, but we re-derive it
        # in case we used x_start or x_prev prediction.
        eps = self._predict_eps_from_xstart(x, t, out["pred_xstart"])

        alpha_bar = _extract_into_tensor(self.alphas_cumprod, t, x.shape)
        alpha_bar_prev = _extract_into_tensor(self.alphas_cumprod_prev, t, x.shape)
        sigma = (
            eta
            * th.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar))
            * th.sqrt(1 - alpha_bar / alpha_bar_prev)
        )
        # Equation 12.
        noise = th.randn_like(x)
        mean_pred = (
            out["pred_xstart"] * th.sqrt(alpha_bar_prev)
            + th.sqrt(1 - alpha_bar_prev - sigma ** 2) * eps
        )
        nonzero_mask = (
            (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
        )  # no noise when t == 0
        sample = mean_pred + nonzero_mask * sigma * noise
        return {"sample": sample, "pred_xstart": out["pred_xstart"]}

As in p_sample, the only addition relative to the plain DDIM sampler is the conditioning hook:

if cond_fn is not None:
    out = self.condition_score(cond_fn, out, x, t, model_kwargs=model_kwargs)



condition_score

    def condition_score(self, cond_fn, p_mean_var, x, t, model_kwargs=None):
        """
        Compute what the p_mean_variance output would have been, should the
        model's score function be conditioned by cond_fn.

        See condition_mean() for details on cond_fn.

        Unlike condition_mean(), this instead uses the conditioning strategy
        from Song et al (2020).
        """
        alpha_bar = _extract_into_tensor(self.alphas_cumprod, t, x.shape)

        eps = self._predict_eps_from_xstart(x, t, p_mean_var["pred_xstart"])
        eps = eps - (1 - alpha_bar).sqrt() * cond_fn(
            x, self._scale_timesteps(t), **model_kwargs
        )

        out = p_mean_var.copy()
        out["pred_xstart"] = self._predict_xstart_from_eps(x, t, eps)
        out["mean"], _, _ = self.q_posterior_mean_variance(
            x_start=out["pred_xstart"], x_t=x, t=t
        )
        return out

Here alpha_bar is $\overline{\alpha}_t$:

alpha_bar = _extract_into_tensor(self.alphas_cumprod, t, x.shape)

And eps is $\epsilon_{\theta}(X_t) - \sqrt{1-\overline{\alpha}_t}\, \nabla_{X_t} \log p_{\phi}(y|X_t)$, where cond_fn returns $\nabla_{X_t} \log p_{\phi}(y|X_t)$ (scaled by $s$):

eps = self._predict_eps_from_xstart(x, t, p_mean_var["pred_xstart"])
eps = eps - (1 - alpha_bar).sqrt() * cond_fn(
    x, self._scale_timesteps(t), **model_kwargs
)
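This is the score-function view the paper uses for DDIM conditioning: since $\epsilon$ and the score are related by $\nabla_{X_t} \log p(X_t) = -\epsilon_{\theta}(X_t)/\sqrt{1-\overline{\alpha}_t}$, adding the classifier score to the unconditional score gives the corrected noise prediction

$\hat{\epsilon}(X_t) = \epsilon_{\theta}(X_t) - \sqrt{1-\overline{\alpha}_t}\, \nabla_{X_t} \log p_{\phi}(y|X_t)$

which is exactly what the two lines above compute.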

From here on, everything matches the original DDIM formula, written out below.
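For reference, the generalized DDIM update (Equation 12 in the DDIM paper), which mean_pred and sigma in the code above implement:

$X_{t-1} = \sqrt{\overline{\alpha}_{t-1}}\, \hat{X}_0 + \sqrt{1-\overline{\alpha}_{t-1}-\sigma_t^2}\, \hat{\epsilon} + \sigma_t z, \quad z \sim \mathcal{N}(0, I)$

where $\sigma_t = \eta\, \sqrt{(1-\overline{\alpha}_{t-1})/(1-\overline{\alpha}_t)}\, \sqrt{1-\overline{\alpha}_t/\overline{\alpha}_{t-1}}$; setting $\eta = 0$ gives deterministic DDIM sampling.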

But looking at the code, the mean that condition_score recomputes is really the DDPM posterior mean:
$\widetilde{\mu}(X_t, X_0) = \frac{\sqrt{\overline{\alpha}_{t-1}}\,\beta_t}{1-\overline{\alpha}_t} X_0 + \frac{\sqrt{\alpha_t}\,(1-\overline{\alpha}_{t-1})}{1-\overline{\alpha}_t} X_t$

out = p_mean_var.copy()
out["pred_xstart"] = self._predict_xstart_from_eps(x, t, eps)
out["mean"], _, _ = self.q_posterior_mean_variance(
    x_start=out["pred_xstart"], x_t=x, t=t
)

The mean returned by q_posterior_mean_variance is computed as:

posterior_mean = (
            _extract_into_tensor(self.posterior_mean_coef1, t, x_t.shape) * x_start
            + _extract_into_tensor(self.posterior_mean_coef2, t, x_t.shape) * x_t
        )

$\widetilde{\mu}(X_t, X_0) = \frac{\sqrt{\overline{\alpha}_{t-1}}\,\beta_t}{1-\overline{\alpha}_t} X_0 + \frac{\sqrt{\alpha_t}\,(1-\overline{\alpha}_{t-1})}{1-\overline{\alpha}_t} X_t$

posterior_mean_coef1 is $\frac{\sqrt{\overline{\alpha}_{t-1}}\,\beta_t}{1-\overline{\alpha}_t}$ (note the $\beta_t$ factor, which matches the betas in the code below).

posterior_mean_coef2 is $\frac{\sqrt{\alpha_t}\,(1-\overline{\alpha}_{t-1})}{1-\overline{\alpha}_t}$

self.posterior_mean_coef1 = (
    betas * np.sqrt(self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)
)
self.posterior_mean_coef2 = (
    (1.0 - self.alphas_cumprod_prev)
    * np.sqrt(alphas)
    / (1.0 - self.alphas_cumprod)
)
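As a quick sanity check of these two coefficients, here is a standalone sketch on a toy linear schedule (not the repo's schedule): if $X_t$ is the noiseless forward sample $\sqrt{\overline{\alpha}_t} X_0$, the posterior mean must reduce to $\sqrt{\overline{\alpha}_{t-1}} X_0$.

import numpy as np

# Toy linear beta schedule, purely for illustration.
betas = np.linspace(1e-4, 0.02, 100)
alphas = 1.0 - betas
alphas_cumprod = np.cumprod(alphas)
alphas_cumprod_prev = np.append(1.0, alphas_cumprod[:-1])

coef1 = betas * np.sqrt(alphas_cumprod_prev) / (1.0 - alphas_cumprod)
coef2 = (1.0 - alphas_cumprod_prev) * np.sqrt(alphas) / (1.0 - alphas_cumprod)

# Noiseless forward sample x_t = sqrt(alpha_bar_t) * x_0.
x0 = 1.0
xt = np.sqrt(alphas_cumprod) * x0
mean = coef1 * x0 + coef2 * xt
assert np.allclose(mean, np.sqrt(alphas_cumprod_prev) * x0)
print("posterior mean identity holds")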