StarGAN v2: Diverse Image Synthesis for Multiple Domains

o0Helloworld0o

Category: Reading notes

Article link: https://blog.csdn.net/o0Helloworld0o/article/details/103642514

1. Introduction

Defining domain and style

Here, domain implies a set of images that can be grouped as a visually distinctive category, and each image has a unique appearance, which
we call style.
For example, we can set image domains based on the gender of a person, in which case the style includes makeup, beard, and hairstyle.

2. StarGAN v2

2.1. Proposed framework

The input is an image $\mathbf{x}\in\mathcal{X}$ together with an arbitrary domain $y\in\mathcal{Y}$, where $y$ acts as the domain id.
Generator (Figure 2a)
The generator $G$ translates an input image $\mathbf{x}$ according to a given style code $\mathbf{s}$, producing the image $G(\mathbf{x}, \mathbf{s})$.

Earlier approaches would feed the target domain $y$ into $G$. Here the style code $\mathbf{s}$ is fed into $G$ instead, with the requirement that $\mathbf{s}$ be a style belonging to the target domain $y$.

Mapping network (Figure 2b)
Given a latent code $\mathbf{z}$ (the paper later states that $\mathbf{z}$ is sampled randomly from a Gaussian distribution), the mapping network $F$ converts $\mathbf{z}$ into a style code $\mathbf{s}$ in the specified domain $y$, i.e. $\mathbf{s}=F_y(\mathbf{z})$. The subscript $y$ denotes one branch of $F$; because $F$ has multiple branches, it is described as a multi-task architecture.
Note: the multi-branch architecture $F_y$ could be replaced by a conditional architecture $F(\mathbf{z}, y)$, which would be more flexible.
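
The multi-branch idea can be sketched in NumPy as follows. This is only an illustration under loud assumptions: the layer sizes, the single shared hidden layer, the random untrained weights, and the names `MappingNetwork`, `latent_dim`, `hidden_dim`, and `style_dim` are all hypothetical and not taken from the paper. It shows nothing more than how one output branch per domain yields $\mathbf{s}=F_y(\mathbf{z})$.

```python
import numpy as np


class MappingNetwork:
    """Toy multi-branch mapping network F: shared trunk plus one linear head per domain.

    All sizes and the random (untrained) weights are illustrative assumptions.
    """

    def __init__(self, latent_dim=16, hidden_dim=32, style_dim=8, num_domains=3, seed=0):
        rng = np.random.default_rng(seed)
        self.w_shared = rng.normal(0.0, 0.1, size=(latent_dim, hidden_dim))
        # One output head per domain: shape (num_domains, hidden_dim, style_dim).
        self.w_heads = rng.normal(0.0, 0.1, size=(num_domains, hidden_dim, style_dim))

    def __call__(self, z, y):
        """z: (N, latent_dim) latent codes; y: (N,) integer domain ids -> (N, style_dim)."""
        h = np.maximum(z @ self.w_shared, 0.0)  # shared layer with ReLU
        # Evaluate every branch, then keep only the branch selected by each sample's y.
        all_styles = np.einsum("nh,khs->nks", h, self.w_heads)  # (N, K, style_dim)
        return all_styles[np.arange(len(y)), y]


rng = np.random.default_rng(1)
z = rng.standard_normal((4, 16))  # latent codes sampled from a standard Gaussian
y = np.array([0, 2, 1, 2])        # target domain ids
F = MappingNetwork()
s = F(z, y)                       # s = F_y(z), one style code per sample
print(s.shape)
```

Selecting a branch by index means the outputs of all other branches are simply discarded for that sample. The conditional variant $F(\mathbf{z}, y)$ from the note would instead pass $y$ in as an extra input to a single-headed network.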

Style encoder (Figure 2c)
Given an input image $\mathbf{x}$ and its corresponding domain $y$, the encoder network $E$ extracts the style code contained in the image, $\mathbf{s}=E_y(\mathbf{x})$.
Question: $E$ is also a multi-branch architecture. Suppose $\mathbf{x}$ belongs to domain 1; then $E_{y=1}(\mathbf{x})$ is the style code extracted from $\mathbf{x}$, but the outputs of the other branches $E_{y\neq1}(\mathbf{x})$ carry no meaning.

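Discriminator (Figure 2d)
The discriminator $D$ has multiple branches; each branch $D_y$ decides whether an image is real or fake within domain $y$.
Q: The discriminator in the original StarGAN takes a single input. What would happen if it also took a label as input?
A: A discriminator that takes only the image as input and outputs real/fake together with domain_id is equivalent to a discriminator that takes the image and a label as input and outputs real/fake.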

2.2. Training objectives

Adversarial objective
During training, a latent code $\mathbf{z}\in\mathcal{Z}$ and a target domain $\tilde{y}\in\mathcal{Y}$ are sampled at random, and a target style code $\tilde{\mathbf{s}}=F_{\tilde{y}}(\mathbf{z})$ is generated. $G$ then produces the image $G(\mathbf{x}, \tilde{\mathbf{s}})$, and the adversarial loss is defined as
$\begin{aligned} \mathcal{L}_{adv}=&\mathbb{E}_{\mathbf{x},y}\left [ \log D_y(\mathbf{x}) \right ]+\\ &\mathbb{E}_{\mathbf{x},\tilde{y},\mathbf{z}}\left [ \log\left ( 1-D_{\tilde{y}}\left ( G\left ( \mathbf{x},\tilde{\mathbf{s}} \right ) \right ) \right ) \right ] \qquad(1) \end{aligned}$
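
As a rough illustration, equation (1) can be estimated on a batch as below. The sketch assumes the multi-branch discriminator outputs are already available as probabilities in arrays of shape (N, K), one column per domain branch. The names `d_real`, `d_fake`, and the `eps` term added for numerical safety are hypothetical, and the numbers are made up.

```python
import numpy as np


def adversarial_loss(d_real, y, d_fake, y_tilde, eps=1e-8):
    """Batch estimate of equation (1).

    d_real:  (N, K) probabilities D_k(x) for real images, one column per domain branch.
    y:       (N,) source domain ids of the real images.
    d_fake:  (N, K) probabilities D_k(G(x, s_tilde)) for generated images.
    y_tilde: (N,) target domain ids used to produce s_tilde.
    """
    idx = np.arange(len(y))
    real_term = np.log(d_real[idx, y] + eps)              # log D_y(x)
    fake_term = np.log(1.0 - d_fake[idx, y_tilde] + eps)  # log(1 - D_y~(G(x, s~)))
    return real_term.mean() + fake_term.mean()


# Hypothetical discriminator outputs for 2 images and 3 domains.
d_real = np.array([[0.9, 0.5, 0.5], [0.5, 0.8, 0.5]])
d_fake = np.array([[0.5, 0.5, 0.1], [0.2, 0.5, 0.5]])
y = np.array([0, 1])
y_tilde = np.array([2, 0])
l_adv = adversarial_loss(d_real, y, d_fake, y_tilde)
print(l_adv)
```

Only the branch matching each image's domain contributes to the loss: $D_y$ for real images and $D_{\tilde{y}}$ for generated ones.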

Style reconstruction
To ensure that the image generated by $G$ actually carries the style code $\tilde{\mathbf{s}}$, the style reconstruction loss is defined as
$\mathcal{L}_{sty}=\mathbb{E}_{\mathbf{x},\tilde{y},\mathbf{z}}\left \| \tilde{\mathbf{s}}-E_{\tilde{y}}(G(\mathbf{x},\tilde{\mathbf{s}})) \right \|_1 \qquad(2)$
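In other words, $E$ is used to compute the loss: if the style code that $E$ recovers from the generated image $G(\mathbf{x},\tilde{\mathbf{s}})$ differs greatly from $\tilde{\mathbf{s}}$, then $G$ did not follow $\tilde{\mathbf{s}}$ when generating the image.
The style reconstruction loss can be viewed as a loss that constrains "appearance".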
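
A small numeric sketch of equation (2), assuming the target style codes and the codes recovered by $E$ are already given as arrays. The function name and the example values are hypothetical.

```python
import numpy as np


def style_reconstruction_loss(s_tilde, s_rec):
    """Equation (2): batch mean of the L1 norm || s_tilde - E(G(x, s_tilde)) ||_1.

    s_tilde: (N, style_dim) target style codes fed to G.
    s_rec:   (N, style_dim) style codes the encoder recovers from the generated images.
    """
    return np.abs(s_tilde - s_rec).sum(axis=1).mean()


s_tilde = np.array([[1.0, 0.0], [0.0, 1.0]])
faithful = s_tilde + 0.1           # G followed the style code closely
ignored = np.zeros_like(s_tilde)   # G ignored the style code
loss_faithful = style_reconstruction_loss(s_tilde, faithful)
loss_ignored = style_reconstruction_loss(s_tilde, ignored)
print(loss_faithful, loss_ignored)
```

Averaging over elements instead of summing per sample would only rescale the loss by a constant factor.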

Style diversification
To ensure that the images generated by $G$ are diverse, the diversity sensitive loss is defined as
$\mathcal{L}_{ds}=\mathbb{E}_{\mathbf{x},\tilde{y},\mathbf{z}_1,\mathbf{z}_2}\left \| G(\mathbf{x},\tilde{\mathbf{s}}_1)-G(\mathbf{x},\tilde{\mathbf{s}}_2) \right \|_1 \qquad(3)$
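$G$ must maximize equation (3), where $\tilde{\mathbf{s}}_i=F_{\tilde{y}}(\mathbf{z}_i)$ for $i\in\{1,2\}$. If two different style codes $\tilde{\mathbf{s}}_1, \tilde{\mathbf{s}}_2$, produced from two different latent codes $\mathbf{z}_1, \mathbf{z}_2$, yield very similar images for the same input $\mathbf{x}$, then $G$ lacks diversity.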

Since the objective does not have an optimal point, we linearly decay the weight of the loss to zero during training.
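
The sketch below illustrates equation (3) and the linear decay mentioned in the quote. The initial weight, the total number of steps, and the function names are assumptions made for illustration; the notes do not give the actual schedule values.

```python
import numpy as np


def diversity_sensitive_loss(img1, img2):
    """Equation (3): batch mean L1 distance between two outputs for the same x.

    img1, img2: (N, C, H, W) arrays standing for G(x, s1) and G(x, s2).
    """
    return np.abs(img1 - img2).reshape(len(img1), -1).sum(axis=1).mean()


def lambda_ds_schedule(step, total_steps, initial_weight=1.0):
    """Linearly decay the diversity weight from initial_weight to 0 (assumed values)."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return initial_weight * remaining


img_a = np.zeros((2, 1, 2, 2))
img_b = np.ones((2, 1, 2, 2))
l_ds_collapsed = diversity_sensitive_loss(img_a, img_a)  # identical outputs
l_ds_diverse = diversity_sensitive_loss(img_a, img_b)    # clearly different outputs
weights = [lambda_ds_schedule(t, 100000) for t in (0, 50000, 100000)]
print(l_ds_collapsed, l_ds_diverse, weights)
```

Because $G$ maximizes this term, it enters the generator objective with a negative sign, and the decaying weight gradually switches it off.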

Preserving source characteristics
$G$ should change only the domain-related parts of the image and leave domain-invariant characteristics (e.g. pose and facial expression) unchanged; this is enforced with the cycle consistency loss
$\mathcal{L}_{cyc}=\mathbb{E}_{\mathbf{x},y,\tilde{y},\mathbf{z}}\left \| \mathbf{x}-G\left ( G\left ( \mathbf{x},\tilde{\mathbf{s}} \right ),\hat{\mathbf{s}} \right ) \right \|_1 \qquad(4)$
where $\hat{\mathbf{s}}=E_y(\mathbf{x})$ is the estimated style code of image $\mathbf{x}$
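
To make equation (4) concrete, here is a toy sketch. Its "style" is just the mean intensity of an image, and the domain branches of $E$ are dropped; both are simplifications invented for this example. `toy_encoder` and `toy_generator` are hypothetical stand-ins, not the networks from the paper.

```python
import numpy as np


def toy_encoder(x):
    """Toy E: the 'style' of an image is its mean intensity, shape (N, 1, 1, 1)."""
    return x.mean(axis=(1, 2, 3), keepdims=True)


def toy_generator(x, s):
    """Toy G: keep the zero-mean 'content' of x and set its mean intensity to s."""
    return x - toy_encoder(x) + s


def cycle_consistency_loss(x, s_tilde, generator, encoder):
    """Equation (4): batch mean of || x - G(G(x, s_tilde), s_hat) ||_1, s_hat = E(x)."""
    s_hat = encoder(x)                            # estimated style code of the input
    translated = generator(x, s_tilde)            # G(x, s_tilde)
    reconstructed = generator(translated, s_hat)  # translate back with s_hat
    return np.abs(x - reconstructed).reshape(len(x), -1).sum(axis=1).mean()


rng = np.random.default_rng(0)
x = rng.random((2, 3, 4, 4))  # two random "images"
s_tilde = np.array([0.9, 0.1]).reshape(2, 1, 1, 1)
loss_good = cycle_consistency_loss(x, s_tilde, toy_generator, toy_encoder)
# A generator that ignores its input image destroys the source characteristics.
loss_bad = cycle_consistency_loss(
    x, s_tilde, lambda img, s: np.zeros_like(img) + s, toy_encoder
)
print(loss_good, loss_bad)
```

The toy generator preserves the content of $\mathbf{x}$, so translating there and back recovers the input and the loss is essentially zero. The generator that ignores its input is penalized.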

Full objective
$\mathcal{L}_D=-\mathcal{L}_{adv} \qquad(5)$
$\begin{aligned} \mathcal{L}_{F,G,E}&=\mathcal{L}_{adv}+\lambda_{sty}\mathcal{L}_{sty}\\ &-\lambda_{ds}\mathcal{L}_{ds}+\lambda_{cyc}\mathcal{L}_{cyc} \qquad(6) \end{aligned}$
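
The signs in equations (5) and (6) can be checked with a few lines of plain Python. The default weights of 1.0 and the numeric loss values are placeholders chosen for the example, not values from the notes.

```python
def discriminator_loss(l_adv):
    """Equation (5): the discriminator minimizes -L_adv."""
    return -l_adv


def generator_loss(l_adv, l_sty, l_ds, l_cyc,
                   lambda_sty=1.0, lambda_ds=1.0, lambda_cyc=1.0):
    """Equation (6): objective minimized jointly by F, G and E.

    The default weights are placeholders, not values from the notes.
    """
    return l_adv + lambda_sty * l_sty - lambda_ds * l_ds + lambda_cyc * l_cyc


# Hypothetical loss values for one training step.
total = generator_loss(l_adv=-0.3, l_sty=0.2, l_ds=4.0, l_cyc=0.5, lambda_ds=0.5)
d_loss = discriminator_loss(-0.3)
print(total, d_loss)
```

The diversity term is the only one subtracted, so a larger $\mathcal{L}_{ds}$ lowers the generator objective, while larger style reconstruction or cycle errors raise it.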

3. Experiments

Evaluation metrics.

The metrics used to evaluate generated images are Fréchet inception distance (FID) and learned perceptual image patch similarity (LPIPS).

3.1. Analysis of individual components

The baseline StarGAN configuration consists of WGAN-GP, an ACGAN discriminator, and depth-wise concatenation.

Summary
An advantage of StarGAN v2 is that the training data needs only domain-level labels, not style-level labels.
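The limitation of handling only one pair of domains at a time applies to the baseline methods in the comparison, which is why the paper retrains them for every domain pair: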

For multi-domain comparisons, we train these models multiple times for every pair of image domains.
