AIGC-Factorized Diffusion: Perceptual Illusions by Noise Decomposition

Project page: https://dangeng.github.io/factorized_diffusion/
Paper: https://arxiv.org/abs/2404.11615


Abstract

Given a factorization of an image into a sum of linear components, we present a zero-shot method to control each individual component through diffusion model sampling. For example, we can decompose an image into low and high spatial frequencies and condition these components on different text prompts. This produces hybrid images, which change appearance depending on viewing distance. By decomposing an image into three frequency subbands, we can generate hybrid images with three prompts. We also use a decomposition into grayscale and color components to produce images whose appearance changes when they are viewed in grayscale, a phenomenon that naturally occurs under dim lighting. And we explore a decomposition by a motion blur kernel, which produces images that change appearance under motion blurring. Our method works by denoising with a composite noise estimate, built from the components of noise estimates conditioned on different prompts. We also show that for certain decompositions, our method recovers prior approaches to compositional generation and spatial control. Finally, we show that we can extend our approach to generate hybrid images from real images. We do this by holding one component fixed and generating the remaining components, effectively solving an inverse problem.

CONTRIBUTIONS

  • Given a decomposition of an image into a sum of components, we propose a zero-shot adaptation of diffusion models to control these components during image generation.
  • Using our method, we produce a variety of perceptual illusions, such as images that change appearance under different viewing distances (hybrid images), illumination conditions (color hybrids), and motion blurring (motion hybrids). Each of these illusions corresponds to a different image decomposition.
  • We provide quantitative evaluations comparing our hybrid images to those produced by traditional methods, and show that ours outperform them.
  • Many types of transformations, like the multiscale processing considered in hybrid images, fail because they perturb the noise distribution; our approach instead manipulates the noise estimate rather than the noisy image, enabling us to handle illusions that prior work cannot. Please see Appendix G for additional discussion and results.
  • We show that a simple extension of our method allows us to solve inverse problems, and we apply this approach to synthesizing hybrid images from real images.

RELATED WORK: Diffusion model control

  1. Diffusion model control:

    • Beyond generating images, diffusion models can also edit images conditioned on text prompts. A range of techniques makes modifying an image's style, layout, and the appearance of its content a relatively easy task.
  2. Modifying the reverse process:

    • Generated images can be controlled by modifying the reverse process of the diffusion model, which progressively removes noise to recover a clean image.
  3. Finetuning:

    • The model can be finetuned to better fit a specific generation task or a specific type of text prompt.
  4. Textual inversion:

    • Textual inversion extracts representations from text prompts to better guide the image generation process.
  5. Swapping attention maps:

    • Swapping attention maps between images controls the importance or focus of different regions of an image.
  6. Supplying instructions:

    • Instructions can be supplied directly to the model to generate images that meet specific requirements.
  7. Using guidance:

    • Guidance techniques, such as conditional or style guidance, influence the direction of image generation.
  8. Compositional generation:

    • Diffusion models can generate images that conform to a combination of text prompts. For example, given the prompt "a cat sitting on grass", the model can generate an image containing both the cat and the grass (see the sketch after this list).
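The paper notes that, for certain decompositions, its method recovers prior approaches to compositional generation. As a hedged illustration of that prior line of work, the sketch below composes per-prompt guidance directions on top of an unconditional noise estimate. `eps_model` is a hypothetical stand-in for a text-conditioned noise predictor, and the guidance weight is an arbitrary choice, not a value from the paper.

```python
import torch

def composed_noise_estimate(eps_model, x_t, t, prompts, null_prompt="", weight=7.5):
    """Combine per-prompt noise estimates in the style of compositional generation.

    eps_model(x_t, prompt, t) -> noise estimate (hypothetical interface).
    """
    # Unconditional estimate serves as the base.
    eps_uncond = eps_model(x_t, null_prompt, t)
    eps = eps_uncond.clone()
    # Each prompt contributes its own classifier-free-guidance-style direction.
    for y in prompts:
        eps = eps + weight * (eps_model(x_t, y, t) - eps_uncond)
    return eps
```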

METHODS


Factorized Diffusion

$$\mathbf{x} = \sum_i^N f_i(\mathbf{x}),$$

  • The decomposition above expresses an image $\mathbf{x} \in \mathbb{R}^{3 \times H \times W}$ as a sum of $N$ components.

  • Each $f_i(\mathbf{x})$ is one component.

  • Each component corresponds to a different text prompt $y_i$ (example decompositions are sketched after this list).
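The decompositions used for the illusions in the paper are linear: low and high spatial frequencies for hybrid images, and a grayscale component plus a color residual for color hybrids. The sketch below is a minimal PyTorch illustration of such factorizations; the Gaussian blur parameters and the choice of channel mean as the grayscale component are assumptions for illustration, not values taken from the paper.

```python
import torch
import torchvision.transforms.functional as TF

def frequency_components(x, kernel_size=23, sigma=3.0):
    """Split an image x of shape (3, H, W) into low + high spatial frequencies.

    The low-pass is a Gaussian blur; the high-pass is the residual,
    so f_low(x) + f_high(x) == x exactly.
    """
    low = TF.gaussian_blur(x, kernel_size=kernel_size, sigma=sigma)
    return low, x - low

def color_components(x):
    """Split an image into a grayscale component and a color residual.

    The grayscale component is the channel mean replicated over channels;
    again the two components sum back to the original image.
    """
    gray = x.mean(dim=0, keepdim=True).expand_as(x)
    return gray, x - gray
```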

$$\tilde{\epsilon} = \sum_i f_i(\epsilon_i).$$

  • At each step of the reverse diffusion process, instead of computing a single noise estimate, we compute $N$ noise estimates, each conditioned on its own prompt $y_i$, which we denote by $\epsilon_i = \epsilon_\theta(\mathbf{x}_t, y_i, t)$.
  • We then construct the composite noise estimate $\tilde{\epsilon}$ above, made up of one component from each $\epsilon_i$.
  • This new noise estimate $\tilde{\epsilon}$ is used to perform the diffusion update step. Each component of the image is thus denoised while being conditioned on a different text prompt, resulting in a clean image whose components are conditioned on the different prompts (a minimal sketch of one such step follows this list).
  • Difference from prior work: our method modifies only the noise estimate, not the input to the diffusion model, $\mathbf{x}_t$.
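A single reverse step can therefore be sketched as: compute one noise estimate per prompt, assemble the composite estimate $\tilde{\epsilon}$ by taking component $i$ from the estimate conditioned on prompt $y_i$, and feed it to an ordinary update rule. The sketch below assumes a deterministic DDIM-style update and a hypothetical `eps_model(x_t, prompt, t)` interface; it is an illustration of the idea, not the authors' exact implementation.

```python
import torch

def factorized_step(eps_model, x_t, t, prompts, components, alpha_bar_t, alpha_bar_prev):
    """One reverse-diffusion step with a composite noise estimate.

    components: list of linear functions f_i with sum_i f_i(x) == x.
    prompts:    list of text prompts y_i, one per component.
    """
    # N noise estimates, each conditioned on its own prompt.
    eps_per_prompt = [eps_model(x_t, y_i, t) for y_i in prompts]
    # Composite estimate: component i is taken from the estimate for prompt i.
    eps_tilde = sum(f_i(e_i) for f_i, e_i in zip(components, eps_per_prompt))
    # Ordinary (deterministic DDIM-style) update using the composite estimate.
    x0_pred = (x_t - (1 - alpha_bar_t).sqrt() * eps_tilde) / alpha_bar_t.sqrt()
    return alpha_bar_prev.sqrt() * x0_pred + (1 - alpha_bar_prev).sqrt() * eps_tilde
```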

Analysis of Factorized Diffusion

$$
\begin{aligned}
\mathbf{x}_{t-1} &= \mathrm{update}(\mathbf{x}_t, \epsilon_\theta) && \text{(definition of the update step)} \\
&= \mathrm{update}\!\left(\sum_i f_i(\mathbf{x}_t), \sum_i f_i(\epsilon_\theta)\right) && \text{(applying the image decomposition)} \\
&= \sum_i \mathrm{update}\big(f_i(\mathbf{x}_t), f_i(\epsilon_\theta)\big) && \text{(assuming linearity of the update function)}
\end{aligned}
$$

Conditioning each component's noise estimate on its own prompt $y_i$ then gives

$$
\mathbf{x}_{t-1} = \sum_i^N \mathrm{update}\big(f_i(\mathbf{x}_t),\, f_i(\epsilon_\theta(\mathbf{x}_t, y_i, t))\big).
$$
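The linearity assumption can be checked numerically: a deterministic DDIM-style update (the same assumption as in the sketch above) is linear in $(\mathbf{x}_t, \epsilon)$, and the $f_i$ are linear and sum to the identity, so updating the whole image equals the sum of the per-component updates. A small self-contained check, with an arbitrary low/high-frequency decomposition and arbitrary $\bar{\alpha}$ values, is sketched below.

```python
import torch
import torchvision.transforms.functional as TF

def ddim_update(x_t, eps, alpha_bar_t, alpha_bar_prev):
    # Deterministic DDIM step; note it is linear in both x_t and eps.
    x0 = (x_t - (1 - alpha_bar_t).sqrt() * eps) / alpha_bar_t.sqrt()
    return alpha_bar_prev.sqrt() * x0 + (1 - alpha_bar_prev).sqrt() * eps

# Toy linear decomposition: low frequencies via Gaussian blur, high frequencies as the residual.
f = [lambda z: TF.gaussian_blur(z, 23, 3.0),
     lambda z: z - TF.gaussian_blur(z, 23, 3.0)]

x_t = torch.randn(3, 64, 64)
eps = torch.randn(3, 64, 64)
a_t, a_prev = torch.tensor(0.5), torch.tensor(0.7)

whole = ddim_update(x_t, eps, a_t, a_prev)
parts = sum(ddim_update(fi(x_t), fi(eps), a_t, a_prev) for fi in f)
print(torch.allclose(whole, parts, atol=1e-5))  # True: the update distributes over the components
```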
