AIGC-Factorized Diffusion: Perceptual Illusions by Noise Decomposition

Project page: https://dangeng.github.io/factorized_diffusion/
Paper: https://arxiv.org/abs/2404.11615


Abstract

Given a factorization of an image into a sum of linear components, we present a zero-shot method to control each individual component through diffusion model sampling. For example, we can decompose an image into low and high spatial frequencies and condition these components on different text prompts. This produces hybrid images, which change appearance depending on viewing distance. By decomposing an image into three frequency subbands, we can generate hybrid images with three prompts. We also use a decomposition into grayscale and color components to produce images whose appearance changes when they are viewed in grayscale, a phenomenon that naturally occurs under dim lighting. And we explore a decomposition by a motion blur kernel, which produces images that change appearance under motion blurring. Our method works by denoising with a composite noise estimate, built from the components of noise estimates conditioned on different prompts. We also show that for certain decompositions, our method recovers prior approaches to compositional generation and spatial control. Finally, we show that we can extend our approach to generate hybrid images from real images. We do this by holding one component fixed and generating the remaining components, effectively solving an inverse problem.

CONTRIBUTIONS

  • Given a decomposition of an image into a sum of components, we propose a zero-shot adaptation of diffusion models to control these components during image generation.
  • Using our method, we produce a variety of perceptual illusions, such as images that change appearance under different viewing distances (hybrid images), illumination conditions (color hybrids), and motion blurring (motion hybrids). Each of these illusions corresponds to a different image decomposition.
  • We provide quantitative evaluations comparing our hybrid images to those produced by traditional methods, and show that ours outperform them.
  • Many types of transformations, like the multiscale processing considered in hybrid images, fail because they perturb the noise distribution; our approach instead manipulates the noise estimate rather than the noisy image, enabling us to handle illusions that prior work cannot. Please see Appendix G for additional discussion and results.
  • We show that a simple extension of our method allows us to solve inverse problems, and we apply this approach to synthesizing hybrid images from real images.

RELATED WORK: Diffusion model control

  1. Diffusion model control:

    • Beyond generating images, diffusion models can also edit images conditioned on text prompts. A range of techniques makes modifying an image's style, layout, and the appearance of its content a relatively easy task.
  2. Modifying the reverse process:

    • Generated images can be controlled by modifying the reverse process of the diffusion model, which progressively removes noise to recover a clean image.
  3. Finetuning:

    • The model can be finetuned to better fit a specific generation task or a specific type of text prompt.
  4. Textual inversion:

    • Textual inversion extracts representations from text prompts to better guide the image generation process.
  5. Swapping attention maps:

    • Swapping attention maps between images controls the importance or focus of different regions of an image.
  6. Supplying instructions:

    • Instructions can be supplied directly to the model to generate images that meet specific requirements.
  7. Using guidance:

    • Guidance techniques, such as conditional or style guidance, influence the direction of image generation.
  8. Compositional generation:

    • Diffusion models can generate images that conform to a combination of text prompts. For example, given the prompt "a cat sitting on grass", the model can generate an image containing both the cat and the grass (see the sketch after this list).
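The paper notes that, for certain decompositions, its method recovers prior approaches to compositional generation. As a hedged illustration of that prior line of work, the sketch below composes per-prompt guidance directions on top of an unconditional noise estimate. `eps_model` is a hypothetical stand-in for a text-conditioned noise predictor, and the guidance weight is an arbitrary choice, not a value from the paper.

```python
import torch

def composed_noise_estimate(eps_model, x_t, t, prompts, null_prompt="", weight=7.5):
    """Combine per-prompt noise estimates in the style of compositional generation.

    eps_model(x_t, prompt, t) -> noise estimate (hypothetical interface).
    """
    # Unconditional estimate serves as the base.
    eps_uncond = eps_model(x_t, null_prompt, t)
    eps = eps_uncond.clone()
    # Each prompt contributes its own classifier-free-guidance-style direction.
    for y in prompts:
        eps = eps + weight * (eps_model(x_t, y, t) - eps_uncond)
    return eps
```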

METHODS


Factorized Diffusion

$$\mathbf{x} = \sum_i^N f_i(\mathbf{x}),$$

  • The decomposition above expresses an image $\mathbf{x} \in \mathbb{R}^{3 \times H \times W}$ as a sum of $N$ components.

  • Each $f_i(\mathbf{x})$ is one component.

  • Each component corresponds to a different text prompt $y_i$ (example decompositions are sketched after this list).
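The decompositions used for the illusions in the paper are linear: low and high spatial frequencies for hybrid images, and a grayscale component plus a color residual for color hybrids. The sketch below is a minimal PyTorch illustration of such factorizations; the Gaussian blur parameters and the choice of channel mean as the grayscale component are assumptions for illustration, not values taken from the paper.

```python
import torch
import torchvision.transforms.functional as TF

def frequency_components(x, kernel_size=23, sigma=3.0):
    """Split an image x of shape (3, H, W) into low + high spatial frequencies.

    The low-pass is a Gaussian blur; the high-pass is the residual,
    so f_low(x) + f_high(x) == x exactly.
    """
    low = TF.gaussian_blur(x, kernel_size=kernel_size, sigma=sigma)
    return low, x - low

def color_components(x):
    """Split an image into a grayscale component and a color residual.

    The grayscale component is the channel mean replicated over channels;
    again the two components sum back to the original image.
    """
    gray = x.mean(dim=0, keepdim=True).expand_as(x)
    return gray, x - gray
```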

$$\tilde{\epsilon} = \sum_i f_i(\epsilon_i).$$

  • At each step of the reverse diffusion process, instead of computing a single noise estimate, we compute $N$ noise estimates, each conditioned on its own prompt $y_i$, which we denote by $\epsilon_i = \epsilon_\theta(\mathbf{x}_t, y_i, t)$.
  • We then construct the composite noise estimate $\tilde{\epsilon}$ above, made up of one component from each $\epsilon_i$.
  • This new noise estimate $\tilde{\epsilon}$ is used to perform the diffusion update step. Each component of the image is thus denoised while being conditioned on a different text prompt, resulting in a clean image whose components are conditioned on the different prompts (a minimal sketch of one such step follows this list).
  • Difference from prior work: our method modifies only the noise estimate, not the input to the diffusion model, $\mathbf{x}_t$.
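A single reverse step can therefore be sketched as: compute one noise estimate per prompt, assemble the composite estimate $\tilde{\epsilon}$ by taking component $i$ from the estimate conditioned on prompt $y_i$, and feed it to an ordinary update rule. The sketch below assumes a deterministic DDIM-style update and a hypothetical `eps_model(x_t, prompt, t)` interface; it is an illustration of the idea, not the authors' exact implementation.

```python
import torch

def factorized_step(eps_model, x_t, t, prompts, components, alpha_bar_t, alpha_bar_prev):
    """One reverse-diffusion step with a composite noise estimate.

    components: list of linear functions f_i with sum_i f_i(x) == x.
    prompts:    list of text prompts y_i, one per component.
    """
    # N noise estimates, each conditioned on its own prompt.
    eps_per_prompt = [eps_model(x_t, y_i, t) for y_i in prompts]
    # Composite estimate: component i is taken from the estimate for prompt i.
    eps_tilde = sum(f_i(e_i) for f_i, e_i in zip(components, eps_per_prompt))
    # Ordinary (deterministic DDIM-style) update using the composite estimate.
    x0_pred = (x_t - (1 - alpha_bar_t).sqrt() * eps_tilde) / alpha_bar_t.sqrt()
    return alpha_bar_prev.sqrt() * x0_pred + (1 - alpha_bar_prev).sqrt() * eps_tilde
```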

Analysis of Factorized Diffusion

$$
\begin{aligned}
\mathbf{x}_{t-1} &= \mathrm{update}(\mathbf{x}_t, \epsilon_\theta) && \text{(definition of the update step)} \\
&= \mathrm{update}\!\left(\sum_i f_i(\mathbf{x}_t), \sum_i f_i(\epsilon_\theta)\right) && \text{(applying the image decomposition)} \\
&= \sum_i \mathrm{update}\big(f_i(\mathbf{x}_t), f_i(\epsilon_\theta)\big) && \text{(assuming linearity of the update function)}
\end{aligned}
$$

Conditioning each component's noise estimate on its own prompt $y_i$ then gives

$$
\mathbf{x}_{t-1} = \sum_i^N \mathrm{update}\big(f_i(\mathbf{x}_t),\, f_i(\epsilon_\theta(\mathbf{x}_t, y_i, t))\big).
$$
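The linearity assumption can be checked numerically: a deterministic DDIM-style update (the same assumption as in the sketch above) is linear in $(\mathbf{x}_t, \epsilon)$, and the $f_i$ are linear and sum to the identity, so updating the whole image equals the sum of the per-component updates. A small self-contained check, with an arbitrary low/high-frequency decomposition and arbitrary $\bar{\alpha}$ values, is sketched below.

```python
import torch
import torchvision.transforms.functional as TF

def ddim_update(x_t, eps, alpha_bar_t, alpha_bar_prev):
    # Deterministic DDIM step; note it is linear in both x_t and eps.
    x0 = (x_t - (1 - alpha_bar_t).sqrt() * eps) / alpha_bar_t.sqrt()
    return alpha_bar_prev.sqrt() * x0 + (1 - alpha_bar_prev).sqrt() * eps

# Toy linear decomposition: low frequencies via Gaussian blur, high frequencies as the residual.
f = [lambda z: TF.gaussian_blur(z, 23, 3.0),
     lambda z: z - TF.gaussian_blur(z, 23, 3.0)]

x_t = torch.randn(3, 64, 64)
eps = torch.randn(3, 64, 64)
a_t, a_prev = torch.tensor(0.5), torch.tensor(0.7)

whole = ddim_update(x_t, eps, a_t, a_prev)
parts = sum(ddim_update(fi(x_t), fi(eps), a_t, a_prev) for fi in f)
print(torch.allclose(whole, parts, atol=1e-5))  # True: the update distributes over the components
```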
