3. Proposed Method
3.1 Limitation of Skip Connections in AttGAN
StarGAN and AttGAN adopt an encoder-decoder structure, where spatial pooling or downsampling is essential to obtain a high-level abstract representation for attribute manipulation. Unfortunately, downsampling irreversibly diminishes the spatial resolution and fine details of the feature maps, which cannot be fully recovered by transposed convolutions, so the results are prone to blurring or missing details. In short, the downsampling in the encoder-decoder structure loses spatial detail that deconvolution cannot recover, and the generated images therefore tend to be blurry.
AttGAN adds a skip connection between the encoder and decoder, but its effect is still limited. The authors do not analyze this theoretically; instead, they demonstrate the limitation of skip connections experimentally.
Consider four variants of AttGAN:
- AttGAN-ED: no skip connection
- AttGAN: the official version, with 1 skip connection
- AttGAN-2s: 2 skip connections
- AttGAN-UNet: skip connections at every layer, equivalent to a U-Net
On a face image dataset, setting the target attribute vector equal to the source attribute vector turns editing into a face reconstruction task. Table 1 lists two reconstruction metrics (PSNR/SSIM) and Figure 3 shows the reconstructed faces; skip connections indeed improve reconstruction quality.
For a second task, a classifier for 13 attributes is first trained on CelebA, reaching an average accuracy of 94.5%. Images with a new attribute are then generated and fed to this classifier to check whether the added attribute is recognized, which yields the attribute generation accuracy.
Figure 3 also shows the attribute generation accuracy of the four models: the more skip connections are added, the lower the attribute generation accuracy.
Putting these results together: adding skip connections does improve reconstruction quality, but weakens the ability to generate new attributes.
3.2 Taking Difference Attribute Vector as Input
Define $\text{att}_s$ as the source attribute vector and $\text{att}_t$ as the target attribute vector, and consider only their difference:

$$\text{att}_{diff} = \text{att}_t - \text{att}_s \qquad(1)$$

Taking only the difference between the source and target attribute vectors has three benefits:
- The difference is a simpler representation, making the network easier to train.
- The difference encodes which attributes need (or need not) to be edited, i.e., the direction of each attribute edit.
- The difference is easier for users to provide: a user only needs to specify which attributes to change and in which direction.
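The difference vector of Eq. (1) is straightforward to compute. A minimal sketch with hypothetical binary attribute vectors (the attribute names and ordering below are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical binary attribute vectors over 5 attributes
# (names/order are illustrative only).
att_s = np.array([1, 0, 0, 1, 0])  # source, e.g. [male, blond, glasses, young, smiling]
att_t = np.array([1, 0, 1, 1, 0])  # target: add "glasses", keep everything else

# Eq. (1): the difference encodes which attributes to edit and in which direction:
#   +1 = add the attribute, -1 = remove it, 0 = leave unchanged.
att_diff = att_t - att_s
print(att_diff)  # -> [0 0 1 0 0]
```

Note how a user only has to flip the entries they care about; all untouched attributes automatically map to 0.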
3.3 Selective Transfer Units
作者提出一种更高级的skip connection,称为Selective Transfer Units,模型框架图如Figure 5所示
STU是在GRU的基础上进行改进
公式(2)~(7)
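Since Equations (2)~(7) are not reproduced in these notes, the following is only a GRU-style gating sketch of the idea, not the paper's exact STU: the hidden state passed up from the deeper layer and $\text{att}_{diff}$ jointly gate how much of the encoder feature is transferred. All weight shapes and the vector (rather than convolutional) form are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stu_sketch(f_enc, s_next, att_diff, rng):
    """GRU-style gating sketch of an STU (an assumption, not the paper's
    exact Eqs. (2)-(7)): reset/update gates conditioned on the deeper
    hidden state s_next and on att_diff select what to transfer."""
    d = f_enc.shape[0]
    n_in = 2 * d + att_diff.shape[0]
    # Hypothetical random weights; a real STU learns these (as convolutions).
    W_r = rng.standard_normal((d, n_in)) * 0.1
    W_z = rng.standard_normal((d, n_in)) * 0.1
    W_h = rng.standard_normal((d, 2 * d)) * 0.1
    inp = np.concatenate([f_enc, s_next, att_diff])
    r = sigmoid(W_r @ inp)                                   # reset gate
    z = sigmoid(W_z @ inp)                                   # update gate
    h = np.tanh(W_h @ np.concatenate([f_enc, r * s_next]))   # candidate state
    s = (1 - z) * s_next + z * h                             # new hidden state
    f_t = np.tanh(s)                                         # transformed feature
    return f_t, s

rng = np.random.default_rng(0)
f_t, s = stu_sketch(rng.standard_normal(8), rng.standard_normal(8),
                    np.array([0.0, 1.0, -1.0]), rng)
print(f_t.shape, s.shape)  # -> (8,) (8,)
```

The key design point carried over from the GRU is that the transferred feature is gated, rather than copied verbatim as in a plain skip connection.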
3.4 Network Architecture
STGAN consists of two networks: a generator $G$ and a discriminator $D$.
$G$ comprises an encoder $G_{enc}$ and a decoder $G_{dec}$; $D$ comprises a discriminator network $D_{adv}$ and an attribute classification network $D_{att}$.
$G_{enc}$ contains 5 convolutional layers (kernel_size=4, stride=2), which is why five cubes follow the input image in Figure 5.
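With kernel_size=4 and stride=2 (padding=1 assumed, as is standard for this setup), each encoder layer halves the spatial resolution. A quick check using the standard convolution output-size formula (the 128×128 input resolution is an assumption) shows why five feature maps appear after the input image:

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Output spatial size of one conv layer: floor((n + 2p - k)/s) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

size = 128  # assumed input resolution
sizes = []
for layer in range(5):  # the 5 conv layers of G_enc
    size = conv_out(size)
    sizes.append(size)
print(sizes)  # -> [64, 32, 16, 8, 4]
```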
3.5 Loss Functions
$$\mathbf{f}=G_{enc}(\mathbf{x}) \qquad(8)$$

where $\mathbf{f}=\left\{ \mathbf{f}_{enc}^1, \cdots, \mathbf{f}_{enc}^5 \right\}$.
The four STU units perform the following computation:
$$\left( \mathbf{f}_t^l, s^l \right)=G_{st}^l \left( \mathbf{f}_{enc}^l, s^{l+1}, \mathbf{att}_{diff} \right) \qquad(9)$$
The above can be understood as follows: $\mathbf{f}=\left\{ \mathbf{f}_{enc}^1, \cdots, \mathbf{f}_{enc}^5 \right\}$ is transformed by the STUs into $\mathbf{f}_t=\left\{ \mathbf{f}_t^1, \cdots, \mathbf{f}_t^4 \right\}$, after which $\mathbf{f}_t$ and $\mathbf{f}_{enc}^5$ are fed into $G_{dec}$ to generate the image $\hat{\mathbf{y}}$, i.e.,

$$\hat{\mathbf{y}}=G_{dec}\left( \mathbf{f}_{enc}^5, \mathbf{f}_t \right) \qquad(10)$$
Combining Equations (8)~(10) gives

$$\hat{\mathbf{y}}=G\left( \mathbf{x}, \mathbf{att}_{diff} \right) \qquad(11)$$
Reconstruction loss
When $\mathbf{att}_{diff}=\mathbf{0}$, the generated (reconstructed) image should equal the input image, so the reconstruction loss is defined as

$$\mathcal{L}_{rec}=\left\| \mathbf{x} - G(\mathbf{x}, \mathbf{0}) \right\|_1 \qquad(12)$$
The $\ell_1$-norm $\left\| \cdot \right\|_1$ is used to preserve the sharpness of the reconstructed image.
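Eq. (12) is just the absolute pixel-wise difference between the input and its reconstruction. A minimal numpy sketch (the generator is stubbed out as identity-plus-noise purely for illustration; the mean reduction and the image shape are assumptions):

```python
import numpy as np

def l1_loss(x, y):
    """Eq. (12): l1 distance between images, averaged over pixels here."""
    return np.abs(x - y).mean()

rng = np.random.default_rng(0)
x = rng.random((3, 64, 64))                       # input image (assumed shape)
recon = x + 0.01 * rng.standard_normal(x.shape)   # stand-in for G(x, 0)

L_rec = l1_loss(x, recon)
print(float(L_rec))  # small, since the stub barely perturbs x
```

An $\ell_2$ loss in the same position would average out high-frequency errors and encourage blur, which is why $\ell_1$ is the common choice here.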
Adversarial loss
When $\mathbf{att}_{diff}\neq\mathbf{0}$, the ground truth of the generated image is unknown, so only an adversarial loss can be used.
The WGAN-GP form of the adversarial loss is used; the losses for $D_{adv}$ and $G$ are defined as
$$\begin{aligned} \underset{D_{adv}}{\max}\ \mathcal{L}_{D_{adv}} =\ &\mathbb{E}_\mathbf{x}D_{adv}(\mathbf{x})-\mathbb{E}_\mathbf{\hat{y}}D_{adv}(\mathbf{\hat{y}})\ + \\ &\lambda\,\mathbb{E}_\mathbf{\hat{x}}\left( \left\| \nabla_\mathbf{\hat{x}}D_{adv}\left( \mathbf{\hat{x}} \right) \right\|_2 - 1 \right)^2 \end{aligned} \qquad(13)$$
$$\underset{G}{\max}\ \mathcal{L}_{G_{adv}}=\mathbb{E}_{\mathbf{x},\mathbf{att}_{diff}}D_{adv}\left( G\left( \mathbf{x},\mathbf{att}_{diff} \right) \right) \qquad(14)$$
where $\hat{\mathbf{x}}$ is sampled along lines between pairs of real and generated images.
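The sampling of $\hat{\mathbf{x}}$ in Eq. (13) is the standard WGAN-GP interpolation: for each real/generated pair, draw a random $\alpha$ and take a point on the segment between them. A numpy sketch of just this sampling step (batch and image shapes are assumptions; the gradient norm itself would come from autodiff in practice):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 3, 64, 64))      # batch of real images (assumed shape)
y_hat = rng.random((4, 3, 64, 64))  # batch of generated images

# One alpha per sample, broadcast over channels and pixels:
# each x_hat lies on the line between its real/generated pair.
alpha = rng.random((4, 1, 1, 1))
x_hat = alpha * x + (1 - alpha) * y_hat

# Sanity check: every interpolate stays between its endpoints.
assert np.all(x_hat <= np.maximum(x, y_hat) + 1e-12)
assert np.all(x_hat >= np.minimum(x, y_hat) - 1e-12)
print(x_hat.shape)
```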
Attribute manipulation loss
An attribute classifier $D_{att}$ is introduced, sharing its convolutional layers with $D_{adv}$.
The attribute manipulation losses for $D_{att}$ and $G$ are defined as
$$\begin{aligned} \mathcal{L}_{D_{att}}=-\sum_{i=1}^{c}\Big[ &\mathbf{att}_s^{(i)}\log D_{att}^{(i)}(\mathbf{x})\ + \\ &\left( 1-\mathbf{att}_s^{(i)} \right)\log\left( 1-D_{att}^{(i)}(\mathbf{x}) \right) \Big] \end{aligned} \qquad(15)$$
$$\begin{aligned} \mathcal{L}_{G_{att}}=-\sum_{i=1}^{c}\Big[ &\mathbf{att}_t^{(i)}\log D_{att}^{(i)}(\mathbf{\hat{y}})\ + \\ &\left( 1-\mathbf{att}_t^{(i)} \right)\log\left( 1-D_{att}^{(i)}(\mathbf{\hat{y}}) \right) \Big] \end{aligned} \qquad(16)$$
The superscript $^{(i)}$ denotes the $i$-th attribute component; there are $c$ attributes in total.
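Eqs. (15)~(16) are a sum of $c$ independent binary cross-entropies, one per attribute. A numpy sketch with hypothetical classifier outputs (the label and probability values are made up for illustration):

```python
import numpy as np

def attribute_bce(att, probs, eps=1e-12):
    """Sum of per-attribute binary cross-entropies, as in Eqs. (15)/(16)."""
    probs = np.clip(probs, eps, 1 - eps)  # guard against log(0)
    return -np.sum(att * np.log(probs) + (1 - att) * np.log(1 - probs))

att_t = np.array([1.0, 0.0, 1.0])     # target labels for c = 3 attributes
probs = np.array([0.9, 0.2, 0.8])     # hypothetical D_att outputs on y_hat

L_G_att = attribute_bce(att_t, probs)
print(round(float(L_G_att), 4))  # -> 0.5516
```

For Eq. (15) the same function would be called with $\text{att}_s$ and the classifier outputs on the real image $\mathbf{x}$.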
Model Objective
The objective functions for $D$ and $G$ are
$$\underset{D}{\min}\ \mathcal{L}_D=-\mathcal{L}_{D_{adv}}+\lambda_1\mathcal{L}_{D_{att}} \qquad(17)$$
$$\underset{G}{\min}\ \mathcal{L}_G=-\mathcal{L}_{G_{adv}}+\lambda_2\mathcal{L}_{G_{att}}+\lambda_3\mathcal{L}_{rec} \qquad(18)$$

Note that the generator objective uses its own terms $\mathcal{L}_{G_{adv}}$ and $\mathcal{L}_{G_{att}}$ from Eqs. (14) and (16).
In the experiments, $\lambda_1=1$, $\lambda_2=10$, and $\lambda_3=100$.
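Plugging these weights into Eqs. (17)~(18) is simple arithmetic; a toy check with entirely made-up loss values (chosen only to show the weighting, not taken from any experiment):

```python
# Weights reported in the experiments.
lam1, lam2, lam3 = 1.0, 10.0, 100.0

# Hypothetical loss values, for illustration only.
L_D_adv, L_D_att = 0.5, 0.7
L_G_adv, L_G_att, L_rec = -0.3, 0.6, 0.02

# Eq. (17): discriminator objective (minimized).
L_D = -L_D_adv + lam1 * L_D_att
# Eq. (18): generator objective (minimized).
L_G = -L_G_adv + lam2 * L_G_att + lam3 * L_rec

print(L_D, L_G)
```

The large $\lambda_3$ reflects that the per-pixel $\ell_1$ term is numerically tiny compared with the adversarial and attribute terms.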