AIGC AI QR Codes: A Reading of the Paper "Diffusion-based Aesthetic QR Code Generation via Scanning-Robust Perceptual Guidance"

Link to this post: https://blog.csdn.net/hwjokcq/article/details/138394808

Diffusion-based Aesthetic QR Code Generation via Scanning-Robust Perceptual Guidance (CVPR 2024): aesthetic QR codes based on diffusion models

Paper:
https://arxiv.org/abs/2403.15878
Project page:
https://jwliao1209.github.io/DiffQRCode/
Common aesthetic QR code sites and tools:
https://cli.im/
https://qrdiffusion.com/
https://github.com/latentcat/qrbtf
https://www.midjourney.com/home
http://logoq.net/logoq/index.php
https://huggingface.co/monster-labs/control_v1p_sd15_qrcode_monster
https://openart.ai/apps/ai_qrcode
https://research.swtch.com/qart

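The method uses the strong generative ability of the Latent Diffusion Model (LDM) and ControlNet as a prior over aesthetic QR code images. Combined with the paper's proposed Scanning-Robust (Perceptual) Guidance, it generates custom-styled QR codes that follow the user's prompt while remaining both scannable and aesthetically pleasing.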

Abstract

QR codes, prevalent in daily applications, lack visual appeal due to their conventional black-and-white design. Integrating aesthetics while maintaining scannability poses a challenge. In this paper, we introduce a novel diffusion-model-based aesthetic QR code generation pipeline, utilizing pre-trained ControlNet and guided iterative refinement via a novel classifier guidance (SRG) based on the proposed Scanning-Robust Loss (SRL) tailored with QR code mechanisms, which ensures both aesthetics and scannability. To further improve the scannability while preserving aesthetics, we propose a two-stage pipeline with Scanning-Robust Perceptual Guidance (SRPG). Moreover, we can further enhance the scannability of the generated QR code by post-processing it through the proposed Scanning-Robust Projected Gradient Descent (SRPGD) post-processing technique based on SRL with proven convergence. With extensive quantitative, qualitative, and subjective experiments, the results demonstrate that the proposed approach can generate diverse aesthetic QR codes with flexibility in detail. In addition, our pipelines outperform existing models in terms of Scanning Success Rate (SSR) 86.67% (+40%) with comparable aesthetic scores. The pipeline combined with SRPGD further achieves 96.67% (+50%).

MOTIVATION

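Integrating a QR code pattern with a semantically meaningful reference image without relying on style-transfer-based methods is challenging, and a suboptimal integration can hurt both scanning performance and aesthetics.
- The mainstream approach is to tune the ControlNet classifier-free guidance (CFG) weight to generate aesthetic QR codes.
- However, a guidance weight that is too low suppresses the scannability of the QR code, while a higher weight harms its aesthetics. In practice, manual post-processing is usually applied, which is time-consuming and laborious. (SRG addresses this.)
Difficulties faced by existing methods:
- QR Code Monster and QRBTF can generate visually appealing QR codes, but their scannability is not guaranteed;
- QR Code AI Art and QR Diffusion can generate scannable QR codes, but with limited aesthetics.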

CONTRIBUTION

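- A new diffusion-model-based iterative refinement method that incorporates Scanning-Robust Guidance (SRG), tailored to the mechanisms of QR codes.
- A two-stage generation pipeline with Scanning-Robust Perceptual Guidance (SRPG), which further improves scannability while preserving aesthetics.
- A post-processing technique, Scanning-Robust Projected Gradient Descent (SRPGD), built on the Scanning-Robust Loss (SRL), which further enhances the scannability of the generated QR code; the convergence of SRPGD is proven.
- Extensive quantitative, qualitative, and subjective experiments that demonstrate the effectiveness of the approach, which outperforms existing open-source and proprietary generative models in Scanning Success Rate (SSR) while achieving a comparable LAION Aesthetics Score (LAS).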

Related Work-Aesthetic QR Codes

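Module-based techniques: earlier work focused mainly on two techniques, module deformation and module reshuffling.
- Module deformation: deforms and rescales each module of the QR code so that it blends into the reference image. Visualead, LogoQ, and Halftone QR codes are examples of this technique.
- Module reshuffling: rearranges the modules of the QR code with mathematical algorithms (such as Gauss-Jordan elimination) so that they match the pattern of the reference image while the QR code stays decodable.
Image-processing techniques:
- Methods based on the region of interest, central saliency, and global gray values further improve the visual quality of aesthetic QR codes.
- Stylized aEsthEtic (SEE) QR code: SEE uses a post-processing algorithm to repair the loss of scannability caused by style transfer. However, it can concentrate black and white pixels on the QR code modules, which produces unnatural artifacts.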

Diffusion Model

As a refresher, here is a brief recap of diffusion models.

https://www.zhangzhenhu.com/aigc/ddim.html

DDPM

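In DDPM, real-world image data is modeled as a random variable $x_0$ whose probability density $q(x_0)$ has an unknown form. We cannot sample from $q(x_0)$ directly to generate new images, but we do have many observed samples of $x_0$. We can use these samples to estimate an approximation of $q(x_0)$ and then draw new samples from that approximation.
The core idea is to build a Markov chain that gradually adds Gaussian noise to $x_0$ until it becomes pure Gaussian noise (a standard normal distribution). This is called the noising process, or forward process. Its reverse is a step-by-step denoising process: once we can estimate the denoising transition kernel $p(x_{t-1} \mid x_t)$ for every step, we can start from standard Gaussian noise $x_T$ and denoise it step by step into an image.
The DDPM framework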

Step 1: Training
Closed-form expression for the forward (diffusion) process, obtained by unrolling the per-step noising recursion:
$x_{t} = \sqrt{\bar{\alpha}_{t}}\, x_{0}+\sqrt{1-\bar{\alpha}_{t}}\, \epsilon_{t}, \quad \bar{\alpha}_{t}=\prod_{i=1}^{t} \alpha_{i}$

$\epsilon_t\sim\mathcal{N}(0,I)$, which is equivalent to $x_t \mid x_0 \sim \mathcal{N}\left(\sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)I\right)$

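$x_0$: the initial (clean) image
$x_t$: the image at step $t$
$\epsilon_t$: Gaussian noise, representing the noise added at time step $t$
At each diffusion step, the noise term determines how the diffused representation $x_t$ is distributed around its mean $\sqrt{\bar{\alpha}_t}\,x_0$.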
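
The closed-form forward process above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the linear noise schedule, the relation $\alpha_i = 1 - \beta_i$ with the specific $\beta$ range, the four-pixel toy "image", and the helper names `make_alpha_bars` and `q_sample` are all assumptions made for the example. A real implementation would operate on image tensors with an array library.

```python
import math
import random

def make_alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{i<=t} alpha_i, with alpha_i = 1 - beta_i."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, alpha_bar_t, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in a single step."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [
        math.sqrt(alpha_bar_t) * x + math.sqrt(1.0 - alpha_bar_t) * e
        for x, e in zip(x0, eps)
    ]
    return x_t, eps

# Hypothetical linear schedule with T = 1000 steps (an assumption, not from the post).
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alpha_bars = make_alpha_bars(betas)

rng = random.Random(0)
x0 = [0.5, -0.25, 1.0, 0.0]  # toy "image" flattened to four pixels
x_t, eps = q_sample(x0, alpha_bars[T // 2], rng)
```

Because $\bar{\alpha}_t$ shrinks toward zero as $t$ grows, the signal term fades and $x_T$ is close to pure standard Gaussian noise, which is what lets sampling start from noise.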

Training minimizes the loss function:
$L_\gamma:=\sum_{t=1}^T\gamma_t\,\mathbb{E}_{q(x_t|x_0)}\left[\left\|\epsilon_t-{\epsilon}_\theta(x_t,t)\right\|_2^2\right],\quad \epsilon_t\sim\mathcal{N}(0,I)$

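$T$: the total number of diffusion steps
$\gamma_t$: a weighting factor that weights the loss at different time steps
${\epsilon}_\theta(x_t,t)$: the noise predicted for $x_t$ by the model with parameters $\theta$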
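
As a rough sketch of the inner term of this loss, the snippet below computes a single-sample estimate of $\|\epsilon_t-\epsilon_\theta(x_t,t)\|_2^2$ for one time step; the weight $\gamma_t$ and the sum over $t$ would be applied outside. The two stand-in "models" (`zero_model` and the oracle built by `make_oracle`) are hypothetical placeholders for a trained network and exist only to show the two extremes of the loss.

```python
import math
import random

def noise_prediction_loss(x0, alpha_bar_t, eps_model, t, rng):
    """Single-sample Monte Carlo estimate of ||eps_t - eps_theta(x_t, t)||^2."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [
        math.sqrt(alpha_bar_t) * x + math.sqrt(1.0 - alpha_bar_t) * e
        for x, e in zip(x0, eps)
    ]
    eps_hat = eps_model(x_t, t)
    return sum((e - eh) ** 2 for e, eh in zip(eps, eps_hat))

def zero_model(x_t, t):
    """Hypothetical placeholder network that always predicts zero noise."""
    return [0.0 for _ in x_t]

def make_oracle(x0, alpha_bar_t):
    """Hypothetical 'perfect' predictor that cheats by knowing the clean x0."""
    def oracle(x_t, t):
        return [
            (xt - math.sqrt(alpha_bar_t) * x) / math.sqrt(1.0 - alpha_bar_t)
            for xt, x in zip(x_t, x0)
        ]
    return oracle

rng = random.Random(0)
x0 = [0.5, -0.25, 1.0, 0.0]
loss_zero = noise_prediction_loss(x0, 0.5, zero_model, t=10, rng=rng)
loss_oracle = noise_prediction_loss(x0, 0.5, make_oracle(x0, 0.5), t=10, rng=rng)
```

The oracle drives the loss to (numerically) zero because it recovers the injected noise exactly. A trained network can only approximate this, since it sees $x_t$ and $t$ but not $x_0$.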

Step 2: Sampling
Iterate from $t = T$ down to $1$, applying the key equation that predicts the latent $x_{t-1}$ of the previous time step:

$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1 - \bar{\alpha}_t}} \epsilon_\theta(x_t, t)\right) + \sigma_t z$

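$x_t$: the latent representation at the current time step $t$, i.e. an intermediate state of the diffusion process
$x_{t-1}$: the latent representation at the previous time step $t-1$, which we want to compute
$z \sim \mathcal{N}(0, I)$: a random noise term sampled from the standard normal distribution
$\sigma_t$: the standard deviation of the noise injected at step $t$
${\epsilon}_\theta(x_t,t)$: the noise predicted for $x_t$ by the model with parameters $\theta$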
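
The update rule above translates almost directly into code. The sketch below is an illustration under assumptions: `ddpm_step` is a hypothetical helper name, the noise prediction `eps_hat` is passed in rather than produced by a real network, and the toy values are made up. The sanity check uses the fact that at $t = 1$ we have $\bar{\alpha}_1 = \alpha_1$, so feeding in the true noise with $\sigma_t = 0$ should give back $x_0$.

```python
import math
import random

def ddpm_step(x_t, eps_hat, alpha_t, alpha_bar_t, sigma_t, rng):
    """One reverse step: x_{t-1} = (x_t - coef * eps_hat) / sqrt(alpha_t) + sigma_t * z."""
    coef = (1.0 - alpha_t) / math.sqrt(1.0 - alpha_bar_t)
    return [
        (x - coef * e) / math.sqrt(alpha_t) + sigma_t * rng.gauss(0.0, 1.0)
        for x, e in zip(x_t, eps_hat)
    ]

# Sanity check at t = 1 with the true noise and no added randomness.
rng = random.Random(0)
alpha_1 = 0.98                      # toy value, an assumption for the example
x0 = [0.5, -0.25, 1.0, 0.0]
eps = [rng.gauss(0.0, 1.0) for _ in x0]
x1 = [
    math.sqrt(alpha_1) * x + math.sqrt(1.0 - alpha_1) * e
    for x, e in zip(x0, eps)
]
x0_rec = ddpm_step(x1, eps, alpha_1, alpha_1, sigma_t=0.0, rng=rng)
```

In a full sampler this function would be called for $t = T, \dots, 1$ with `eps_hat` coming from the trained network. A common choice is $\sigma_t^2 = 1 - \alpha_t$, with no noise added on the final step, but that choice is an assumption here rather than something the post specifies.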

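DDIM (a non-Markovian diffusion process)

In this new formulation, the forward process drops the Markov assumption, and the reverse transition kernel no longer satisfies the definition of a Markov chain (dependence on the previous state only).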

$\begin{aligned} x_{t-1} &= \sqrt{\bar{\alpha}_{t-1}}\,\hat{x}_0+\sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\cdot\frac{x_t-\sqrt{\bar{\alpha}_t}\,\hat{x}_0}{\sqrt{1-\bar{\alpha}_t}}+\sigma_t z \\ &=\sqrt{\bar{\alpha}_{t-1}}\underbrace{\left(\frac{x_t-\sqrt{1-\bar{\alpha}_t}\,{\epsilon}_\theta(x_t,t)}{\sqrt{\bar{\alpha}_t}}\right)}_{\text{predict }x_0}+\underbrace{\sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\cdot{\epsilon}_\theta(x_t,t)}_{\text{direction pointing to }x_t}+\underbrace{\sigma_t z}_{\text{random noise}} \\ &\mathrm{where}\quad z\sim\mathcal{N}(0,I) \end{aligned}$
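
The three labeled terms of the DDIM update (predicted $x_0$, direction pointing to $x_t$, random noise) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `ddim_step` is a hypothetical helper name, the noise prediction is passed in instead of coming from a network, and the $\bar{\alpha}$ values are toy numbers. With $\sigma_t = 0$ the step is deterministic, and because it depends only on $\bar{\alpha}_t$ and $\bar{\alpha}_{t-1}$ it can jump between non-adjacent time steps.

```python
import math
import random

def ddim_step(x_t, eps_hat, alpha_bar_t, alpha_bar_prev, sigma_t, rng):
    """One DDIM update from noise level alpha_bar_t to alpha_bar_prev."""
    # Term 1: predicted clean image x0_hat.
    x0_hat = [
        (x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
        for x, e in zip(x_t, eps_hat)
    ]
    # Term 2 coefficient; clamp at 0 to guard against tiny negative rounding errors.
    dir_coef = math.sqrt(max(0.0, 1.0 - alpha_bar_prev - sigma_t ** 2))
    # Term 3: fresh Gaussian noise scaled by sigma_t.
    return [
        math.sqrt(alpha_bar_prev) * x0h + dir_coef * e + sigma_t * rng.gauss(0.0, 1.0)
        for x0h, e in zip(x0_hat, eps_hat)
    ]

rng = random.Random(0)
x0 = [0.5, -0.25, 1.0, 0.0]
eps = [rng.gauss(0.0, 1.0) for _ in x0]
ab_t, ab_prev = 0.3, 0.7            # toy noise levels, assumptions for the example
x_t = [math.sqrt(ab_t) * x + math.sqrt(1.0 - ab_t) * e for x, e in zip(x0, eps)]

# Deterministic jump to a less noisy level, using the true noise as the prediction.
x_prev = ddim_step(x_t, eps, ab_t, ab_prev, sigma_t=0.0, rng=rng)
# Jumping straight to alpha_bar = 1 (no noise left) should return x0 itself.
x0_rec = ddim_step(x_t, eps, ab_t, 1.0, sigma_t=0.0, rng=rng)
```

With a perfect noise prediction, `x_prev` equals the same clean image re-noised to level `ab_prev` with the same noise. In practice the prediction is imperfect, which is why several smaller steps are used instead of one large jump.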