3. Our Method
3.1. Review of CycleGAN
Given unpaired training samples $x\in X$ and $y\in Y$ from two domains, the adversarial loss for the mapping $G_{X\rightarrow Y}$ from $X$ to $Y$ and its discriminator $D_Y$ is defined as follows:
$$
\begin{aligned}
\mathcal{L}_{GAN}\left(G_{X\rightarrow Y}, D_Y\right) = {} & \mathbb{E}_y\left[\log D_Y(y)\right] \\
& + \mathbb{E}_x\left[\log\left(1 - D_Y\left(G_{X\rightarrow Y}(x)\right)\right)\right] \qquad (1)
\end{aligned}
$$
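A minimal NumPy sketch of how Eq. (1) is estimated on a batch: the expectations become sample means over discriminator outputs. The function name and the `eps` guard are illustrative choices, not part of the original method.

```python
import numpy as np

def gan_adversarial_loss(d_real, d_fake, eps=1e-8):
    """Batch estimate of Eq. (1).

    d_real: discriminator outputs D_Y(y) on real cartoon samples, in (0, 1)
    d_fake: discriminator outputs D_Y(G_{X->Y}(x)) on translated samples, in (0, 1)
    The discriminator maximizes this value; the generator minimizes it.
    """
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    # eps guards against log(0) when the discriminator saturates
    return float(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))

# A maximally confused discriminator (0.5 everywhere) gives 2 * log(0.5)
print(round(gan_adversarial_loss([0.5, 0.5], [0.5, 0.5]), 4))  # -1.3863
```

The CycleGAN paper itself replaces this log-likelihood form with a least-squares loss during training for stability; the sketch follows Eq. (1) as written.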
CycleGAN learns both the forward and the backward mapping; the cycle consistency loss is:
$$
\begin{aligned}
\mathcal{L}_{cyc} = {} & \left\| G_{Y\rightarrow X}\left(G_{X\rightarrow Y}(x)\right) - x \right\|_1 \\
& + \left\| G_{X\rightarrow Y}\left(G_{Y\rightarrow X}(y)\right) - y \right\|_1 \qquad (2)
\end{aligned}
$$
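The round trip in Eq. (2) can be checked with toy callables in place of the two generators. This is a sketch under the assumption that the L1 norm is averaged per element (mean absolute error), the usual implementation choice; `g_xy` and `g_yx` are hypothetical stand-ins, not networks.

```python
import numpy as np

def cycle_consistency_loss(x, y, g_xy, g_yx):
    """Eq. (2): L1 reconstruction error after a round trip through both generators.

    g_xy stands in for G_{X->Y}, g_yx for G_{Y->X}. The L1 norm is averaged
    over elements here, whereas Eq. (2) writes the un-normalized norm.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    forward_cycle = np.mean(np.abs(g_yx(g_xy(x)) - x))   # x -> G(x) -> back to X
    backward_cycle = np.mean(np.abs(g_xy(g_yx(y)) - y))  # y -> F(y) -> back to Y
    return float(forward_cycle + backward_cycle)

# Toy "generators": an exactly invertible pair gives (near) zero cycle loss
g_xy = lambda a: 2.0 * a + 1.0
g_yx = lambda b: (b - 1.0) / 2.0
x = np.random.rand(4, 8, 8)
y = np.random.rand(4, 8, 8)
print(cycle_consistency_loss(x, y, g_xy, g_yx))  # close to 0, up to floating-point error
```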
The total objective function of CycleGAN is defined as:
$$
\begin{aligned}
\mathcal{L}\big(G_{X\rightarrow Y}, G_{Y\rightarrow X}, D_X, D_Y\big) = {} & \mathcal{L}_{GAN}\left(G_{X\rightarrow Y}, D_Y\right) \\
& + \mathcal{L}_{GAN}\left(G_{Y\rightarrow X}, D_X\right) + \mathcal{L}_{cyc} \qquad (3)
\end{aligned}
$$

The second adversarial term is Eq. (1) applied to the reverse mapping $G_{Y\rightarrow X}$ with its discriminator $D_X$.
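To make Eq. (3) concrete, the sketch below evaluates the objective for one batch from discriminator outputs and a precomputed cycle term. `lambda_cyc` is a hypothetical knob: 1.0 reproduces Eq. (3) as written, while the original CycleGAN paper weights the cycle term with λ = 10.

```python
import numpy as np

EPS = 1e-8

def adversarial_term(d_real, d_fake):
    # Batch estimate of E[log D(real)] + E[log(1 - D(fake))], the form of Eq. (1)
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return float(np.mean(np.log(d_real + EPS)) + np.mean(np.log(1.0 - d_fake + EPS)))

def cyclegan_objective(dy_real, dy_fake, dx_real, dx_fake, l_cyc, lambda_cyc=1.0):
    """Value of Eq. (3) for one batch.

    dy_real, dy_fake: D_Y outputs on real y and on G_{X->Y}(x)
    dx_real, dx_fake: D_X outputs on real x and on G_{Y->X}(y)
    l_cyc: cycle-consistency value from Eq. (2)
    """
    l_gan_xy = adversarial_term(dy_real, dy_fake)  # L_GAN(G_{X->Y}, D_Y)
    l_gan_yx = adversarial_term(dx_real, dx_fake)  # L_GAN(G_{Y->X}, D_X)
    return l_gan_xy + l_gan_yx + lambda_cyc * l_cyc
```

Training then alternates between the discriminators $D_X, D_Y$ maximizing this value and the generators $G_{X\rightarrow Y}, G_{Y\rightarrow X}$ minimizing it.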
In this paper, $X$ denotes the real face domain and $Y$ the cartoon face domain.
3.2. Cartoon Face Landmark Assisted CycleGAN
3.2.1 Landmark Consistency Loss
$$
\mathcal{L}_c\big(G_{(X,L)\rightarrow Y}\big) = \left\| R_Y\left(G_{(X,L)\rightarrow Y}(x, l)\right) - l \right\|_2 \qquad (4)
$$
where $l\in L$ is the input landmark heatmap, $R$ is a pre-trained U-Net that predicts landmark heatmaps, and $R_Y$ denotes the landmark regressor for domain $Y$.
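The notes do not say how the landmark heatmap $l$ is encoded. A common choice, assumed here, is one Gaussian channel per landmark; the function name and `sigma` are hypothetical.

```python
import numpy as np

def landmarks_to_heatmaps(landmarks, height, width, sigma=2.0):
    """Render K (x, y) landmark coordinates as a K x H x W stack of Gaussian heatmaps.

    Assumption: one channel per landmark, peak value 1 at the landmark pixel.
    """
    ys, xs = np.mgrid[0:height, 0:width]  # row and column index grids, each H x W
    maps = np.empty((len(landmarks), height, width), dtype=np.float64)
    for k, (cx, cy) in enumerate(landmarks):
        d2 = (xs - cx) ** 2 + (ys - cy) ** 2          # squared distance to landmark k
        maps[k] = np.exp(-d2 / (2.0 * sigma ** 2))
    return maps

heatmaps = landmarks_to_heatmaps([(10, 12), (20, 12), (15, 20)], height=32, width=32)
print(heatmaps.shape)         # (3, 32, 32)
print(heatmaps[0, 12, 10])    # 1.0, the peak sits at row y=12, column x=10
```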
In words, Eq. (4) says: a real face image $x$ and its landmarks $l$ are fed into the generator $G_{(X,L)\rightarrow Y}$ to produce a cartoon image, and the landmarks that $R_Y$ predicts on this generated image should be as close as possible to $l$.
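Eq. (4) can be sketched with two callables standing in for the generator and the frozen regressor $R_Y$. The names below are hypothetical, and the toy pair in the usage lines only exists to show that a landmark-preserving generator drives the loss to zero.

```python
import numpy as np

def landmark_consistency_loss(x, l, generator, regressor_y):
    """Eq. (4): || R_Y(G_{(X,L)->Y}(x, l)) - l ||_2.

    generator:   callable standing in for G_{(X,L)->Y}: (image, heatmaps) -> cartoon image
    regressor_y: callable standing in for the pre-trained U-Net R_Y:
                 cartoon image -> predicted heatmaps with the same shape as l
    """
    fake_y = generator(x, l)
    predicted = regressor_y(fake_y)
    # L2 norm over all heatmap entries
    return float(np.linalg.norm((predicted - l).ravel()))

# Toy stand-ins: the "generator" appends the heatmaps as extra channels and the
# "regressor" reads them back, so the landmarks are preserved exactly.
x = np.zeros((3, 16, 16))
l = np.random.rand(5, 16, 16)
toy_g = lambda img, hm: np.concatenate([img, hm], axis=0)
toy_r = lambda img: img[3:]
print(landmark_consistency_loss(x, l, toy_g, toy_r))  # 0.0
```

In an actual training loop $R_Y$ stays frozen, so this term only sends gradients into the generator.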
3.2.2 Landmark Matched Global Discriminator
As shown in Figure 2, for the translation $X\rightarrow Y$, the unconditional global discriminator $D_Y$ pushes the generator toward more realistic cartoon faces, while the conditional global discriminator $D_Y^{g_c}$, which takes the landmark heatmap $l\in L$ as part of its input, pushes it toward cartoon faces whose structure matches those landmarks:
$$
\begin{aligned}
\mathcal{L}_{GAN}\big(G_{(X,L)\rightarrow Y}, D_Y^{g_c}\big) = {} & \mathbb{E}_y\left[\log D_Y^{g_c}(y, l)\right] \\
& + \mathbb{E}_x\left[\log\left(1 - D_Y^{g_c}\left(G_{(X,L)\rightarrow Y}(x, l), l\right)\right)\right] \qquad (5)
\end{aligned}
$$
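One common way to condition a discriminator on landmarks, assumed here rather than taken from the notes, is to concatenate the heatmap channels with the image channels. The sketch evaluates one sample of Eq. (5); it also assumes the real cartoon is scored together with its own landmark heatmaps, while the generated image is scored with the heatmaps it was generated from. All names are hypothetical.

```python
import numpy as np

EPS = 1e-8

def conditional_input(image, heatmaps):
    # Stack heatmaps as extra channels: (C, H, W) + (K, H, W) -> (C + K, H, W)
    return np.concatenate([image, heatmaps], axis=0)

def conditional_gan_loss(d_cond, real_y, l_real, fake_y, l_input):
    """One-sample estimate of Eq. (5).

    d_cond:  callable standing in for D_Y^{g_c}; maps a (C+K, H, W) array to a probability
    real_y:  real cartoon image, scored with its own landmark heatmaps l_real
    fake_y:  output of G_{(X,L)->Y}(x, l_input), scored with l_input
    """
    p_real = d_cond(conditional_input(real_y, l_real))
    p_fake = d_cond(conditional_input(fake_y, l_input))
    return float(np.log(p_real + EPS) + np.log(1.0 - p_fake + EPS))

# Hypothetical toy discriminator: sigmoid of the mean input value
toy_d = lambda inp: 1.0 / (1.0 + np.exp(-inp.mean()))
rng = np.random.default_rng(0)
y_real, l_real = rng.random((3, 64, 64)), rng.random((5, 64, 64))
y_fake, l_in = rng.random((3, 64, 64)), rng.random((5, 64, 64))
print(conditional_gan_loss(toy_d, y_real, l_real, y_fake, l_in))
```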
3.2.3 Landmark Guided Local Discriminator
Three local discriminators are introduced for the eye, nose, and mouth regions. Their adversarial loss is defined as follows:
$$
\begin{aligned}
\mathcal{L}_{GAN_{local}}^{X\rightarrow Y} &= \sum_{i=1}^{3} \lambda_{l_i} \cdot \mathcal{L}_{GAN_{patch}}\left(G_{(X,L)\rightarrow Y}, D_Y^{l_i}\right) \\
&= \sum_{i=1}^{3} \lambda_{l_i} \Big\{ \mathbb{E}_y\left[\log D_Y^{l_i}(y_p)\right] \\
&\qquad + \mathbb{E}_x\left[\log\left(1 - D_Y^{l_i}\left(\left[G_{(X,L)\rightarrow Y}(x, l)\right]_p\right)\right)\right] \Big\} \qquad (6)
\end{aligned}
$$
where $y_p$ and $\left[G_{(X,L)\rightarrow Y}(x, l)\right]_p$ denote the local patches of the real cartoon image and of the generated cartoon image, respectively, and $\lambda_{l_i}$ is the weight of the $i$-th local discriminator.
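The patch extraction behind Eq. (6) can be sketched as cropping a fixed-size window around each facial region and scoring it with that region's discriminator. The patch size, the region centers, and the clamping rule are hypothetical choices for illustration.

```python
import numpy as np

EPS = 1e-8

def crop_patch(image, center, size):
    """Crop a size x size patch from a (C, H, W) image around center=(cx, cy),
    shifted so the window stays inside the image."""
    _, h, w = image.shape
    cx, cy = int(round(center[0])), int(round(center[1]))
    half = size // 2
    top = min(max(cy - half, 0), h - size)
    left = min(max(cx - half, 0), w - size)
    return image[:, top:top + size, left:left + size]

def local_adversarial_loss(real_y, fake_y, real_centers, fake_centers,
                           discriminators, weights, size=32):
    """One-sample estimate of Eq. (6): weighted sum of patch-level GAN terms.

    real_centers / fake_centers: one (cx, cy) per region (eyes, nose, mouth),
    e.g. the mean of that region's landmark coordinates (an assumption).
    discriminators: callables standing in for D_Y^{l_1..l_3}; weights: lambda_{l_i}.
    """
    total = 0.0
    for d_i, lam_i, c_real, c_fake in zip(discriminators, weights, real_centers, fake_centers):
        y_p = crop_patch(real_y, c_real, size)  # patch of the real cartoon image
        g_p = crop_patch(fake_y, c_fake, size)  # patch of the generated cartoon image
        total += lam_i * (np.log(d_i(y_p) + EPS) + np.log(1.0 - d_i(g_p) + EPS))
    return float(total)

# Hypothetical region centers (eyes, nose, mouth) on a 128 x 128 face
centers = [(64, 50), (64, 72), (64, 96)]
toy_ds = [lambda p: 0.5] * 3
real, fake = np.zeros((3, 128, 128)), np.zeros((3, 128, 128))
print(local_adversarial_loss(real, fake, centers, centers, toy_ds, weights=[1.0, 1.0, 1.0]))  # ≈ -4.159, i.e. 6 * log(0.5)
```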
3.3. Network Training
3.3.1 Two Stage Training
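Stage I: train the framework without the local discriminators for 100K iterations to obtain coarse results.
Stage II: use the pre-trained landmark prediction network to predict landmarks on the coarse images, crop local patches around those landmarks, and feed the patches to the local discriminators to refine the generated results.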
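The two-stage schedule amounts to switching the local adversarial term on after Stage I. The sketch below only shows that switch; the dictionary keys and the unit weights are hypothetical, since the notes do not give the loss weights.

```python
STAGE_ONE_ITERS = 100_000  # Stage I length given in the notes

def training_stage(iteration, stage_one_iters=STAGE_ONE_ITERS):
    """Stage I (no local discriminators) for the first 100K iterations, Stage II afterwards."""
    return 1 if iteration < stage_one_iters else 2

def total_generator_loss(terms, iteration):
    """Combine precomputed per-batch loss values according to the current stage.

    terms: dict of scalars; key names and unit weights are hypothetical and only
    show which terms are active in each stage.
    """
    loss = terms["gan_global"] + terms["gan_conditional"] + terms["cycle"] + terms["landmark"]
    if training_stage(iteration) == 2:
        # Stage II: local patches come from landmarks predicted on coarse outputs
        loss += terms["gan_local"]
    return loss

terms = {"gan_global": 1.0, "gan_conditional": 1.0, "cycle": 0.5, "landmark": 0.25, "gan_local": 2.0}
print(total_generator_loss(terms, 50_000))   # 2.75
print(total_generator_loss(terms, 150_000))  # 4.75
```

Presumably the local term is held back because patches can only be cropped reliably once the generator already produces coarse faces whose landmarks the pre-trained network can locate.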